iOS PencilKit Meets Core ML

We’ve had a good look at PencilKit in our previous article. It’s time to bring Core ML into the drawing arena. The goal of this article is to run handwritten digit classification on our PencilKit drawings using the famous MNIST model.

[Image: PencilKit and Core ML together — demo]

MNIST: A Quick Word

The MNIST dataset is a collection of grayscale images of handwritten digits, each 28x28 pixels. The digits themselves occupy a 20x20 region and are normalized to sit at the center of the image, so predictions are most accurate when the input digit is centered as well.

We won’t be digging deep into the model layers or training the dataset in this article. Let’s assume we’ve been gifted the Core ML MNIST model and jump straight into the implementation.
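
If you want to double-check what the model expects before writing any preprocessing code, you can print its input description from the class Xcode generates for the .mlmodel file. A minimal sketch, assuming the generated class is named MNIST as used later in this article:

import CoreML

// Inspect the generated MNIST class: prints the expected input feature,
// e.g. a 28 x 28 grayscale image.
let mnist = MNIST()
for (name, description) in mnist.model.modelDescription.inputDescriptionsByName {
    print(name, description)
}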

Our Final Destination

An image is worth a thousand words. A GIF is composed of thousands of images. Here’s the final outcome you’ll get by the end of this article.

[GIF: MNIST digit detection on a PencilKit drawing — output demo]

Implementation

Setting Up The Canvas

First, we create a PKCanvasView, give it a black background, and pin it edge-to-edge below the navigation bar:

let canvasView = PKCanvasView(frame: .zero)
canvasView.backgroundColor = .black
canvasView.translatesAutoresizingMaskIntoConstraints = false
view.addSubview(canvasView)
NSLayoutConstraint.activate([
   canvasView.topAnchor.constraint(equalTo: navigationBar.bottomAnchor),
   canvasView.bottomAnchor.constraint(equalTo: view.bottomAnchor),
   canvasView.leadingAnchor.constraint(equalTo: view.leadingAnchor),
   canvasView.trailingAnchor.constraint(equalTo: view.trailingAnchor),
])
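
If you’re testing on a device without an Apple Pencil, you may also want to allow finger input on the canvas. A small optional addition, not part of the original setup:

// Optional: accept finger input as well as Apple Pencil.
if #available(iOS 14.0, *) {
    canvasView.drawingPolicy = .anyInput
} else {
    canvasView.allowsFingerDrawing = true
}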

Setting Up the ToolPicker

The following code sets up the ToolPicker UI, which consists of the drawing tools and the color palette.

override func viewDidAppear(_ animated: Bool) {
    super.viewDidAppear(animated)

    guard
        let window = view.window,
        let toolPicker = PKToolPicker.shared(for: window) else { return }

    toolPicker.setVisible(true, forFirstResponder: canvasView)
    toolPicker.addObserver(canvasView)
    canvasView.becomeFirstResponder()
}
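
PKToolPicker.shared(for:) is the iOS 13 API; if you target iOS 14 or later, you can create and hold your own picker instance instead. A minimal variation, assuming you store the picker in a property so it isn’t deallocated:

// iOS 14+: create your own tool picker instead of the window-shared one.
// Keep a strong reference (e.g. a property) since the system no longer owns it.
let toolPicker = PKToolPicker()
toolPicker.setVisible(true, forFirstResponder: canvasView)
toolPicker.addObserver(canvasView)
canvasView.becomeFirstResponder()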

Setting Up the NavigationBar

The NavigationBar itself is added in the storyboard; in the following code, we attach three bar button items to it.

func setNavigationBar() {
    if let navItem = navigationBar.topItem {

        let detectItem = UIBarButtonItem(title: "Detect", style: .done, target: self, action: #selector(detectImage))
        let clearItem = UIBarButtonItem(title: "Clear", style: .plain, target: self, action: #selector(clear))

        navItem.rightBarButtonItems = [clearItem, detectItem]
        navItem.leftBarButtonItem = UIBarButtonItem(title: "", style: .plain, target: self, action: nil)
    }
}

The left Bar Button is where the final predicted output is displayed.
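
The clear action referenced in setNavigationBar isn’t shown in this article; a minimal sketch, assuming it simply resets the canvas, might look like this:

// Hypothetical sketch: reset the canvas by replacing the drawing with an empty one.
@objc func clear() {
    canvasView.drawing = PKDrawing()
}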

Preprocessing the Drawing Input

Converting a PKDrawing into a UIImage is straightforward. The real challenge is in preprocessing it for the Core ML Model.

The UIImage we get from the PKDrawing contains only the drawn rectangle. We need to place it on a contrasting background and center it there; passing the raw drawing image to the model directly leads to inaccurate predictions. The following code takes care of that.

func preprocessImage() -> UIImage {
    // Render the drawing at a higher scale so the strokes stay sharp when resized later.
    var image = canvasView.drawing.image(from: canvasView.drawing.bounds, scale: 10.0)

    // Overlay the drawing on a black, view-sized background so the digit sits on a
    // contrasting canvas before it is resized for the model.
    if let newImage = UIImage(color: .black, size: CGSize(width: view.frame.width, height: view.frame.height)),
       let overlayedImage = newImage.image(byDrawingImage: image, inRect: CGRect(x: view.center.x, y: view.center.y, width: view.frame.width, height: view.frame.height)) {
        image = overlayedImage
    }

    return image
}

In the above code, we create a new UIImage the size of the view’s bounds and draw the image obtained from the PKDrawing at its center. The following extensions are used to do so:

extension UIImage {
    
    public convenience init?(color: UIColor, size: CGSize = CGSize(width: 1, height: 1)) {
        let rect = CGRect(origin: .zero, size: size)
        UIGraphicsBeginImageContextWithOptions(rect.size, false, 0.0)
        color.setFill()
        UIRectFill(rect)
        let image = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()

        guard let cgImage = image?.cgImage else { return nil }
        self.init(cgImage: cgImage)
    }

    func image(byDrawingImage image: UIImage, inRect rect: CGRect) -> UIImage! {
        UIGraphicsBeginImageContext(size)

        draw(in: CGRect(x: 0, y: 0, width: size.width, height: size.height))
        image.draw(in: rect)
        let result = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()
        return result
    }
}

extension CGRect {
    var center: CGPoint { return CGPoint(x: midX, y: midY) }
}

Prediction Using Core ML

Now that the image is ready, we resize it to the training input size (28x28), convert it to a CVPixelBuffer, and feed that to the model’s prediction function.

private let trainedImageSize = CGSize(width: 28, height: 28)

func predictImage(image: UIImage) {
    if let resizedImage = image.resize(newSize: trainedImageSize),
       let pixelBuffer = resizedImage.toCVPixelBuffer() {

        guard let result = try? MNIST().prediction(image: pixelBuffer) else {
            return
        }
        navigationBar.topItem?.leftBarButtonItem?.title = "Predicted: \(result.classLabel)"
        print("result is \(result.classLabel)")
    }
}

The UIImage is converted to a CVPixelBuffer in a grayscale color space using the following extensions:

extension UIImage {

    func resize(newSize: CGSize) -> UIImage? {
        UIGraphicsBeginImageContextWithOptions(newSize, false, 0.0)
        self.draw(in: CGRect(x: 0, y: 0, width: newSize.width, height: newSize.height))
        let newImage = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()
        return newImage
    }

    func toCVPixelBuffer() -> CVPixelBuffer? {
        var pixelBuffer: CVPixelBuffer? = nil

        let attr = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
                    kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue] as CFDictionary

        let width = Int(self.size.width)
        let height = Int(self.size.height)

        // Create a single-channel (grayscale) pixel buffer, matching the model's input format.
        let status = CVPixelBufferCreate(kCFAllocatorDefault, width, height, kCVPixelFormatType_OneComponent8, attr, &pixelBuffer)
        guard status == kCVReturnSuccess, let buffer = pixelBuffer, let cgImage = self.cgImage else {
            return nil
        }

        CVPixelBufferLockBaseAddress(buffer, CVPixelBufferLockFlags(rawValue: 0))
        // Make sure the base address is unlocked once drawing is done.
        defer { CVPixelBufferUnlockBaseAddress(buffer, CVPixelBufferLockFlags(rawValue: 0)) }

        // Draw the image into the buffer using a grayscale color space.
        guard let bitmapContext = CGContext(data: CVPixelBufferGetBaseAddress(buffer),
                                            width: width,
                                            height: height,
                                            bitsPerComponent: 8,
                                            bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
                                            space: CGColorSpaceCreateDeviceGray(),
                                            bitmapInfo: 0) else {
            return nil
        }

        bitmapContext.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))

        return buffer
    }
}
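
The detectImage action wired up in setNavigationBar isn’t spelled out in this article either; a minimal sketch, assuming it simply chains the two helpers above, might be:

// Hypothetical sketch: preprocess the current drawing, then run the prediction.
@objc func detectImage() {
    let image = preprocessImage()
    predictImage(image: image)
}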

That’s it! You’ll arrive at an outcome similar to the screengrab shown earlier in this article. And that’s how PencilKit works with Core ML.

That’s a wrap from my side. You can download the full source code from our GitHub repository.
