
iOS 13 Vision Text Recognition with Document Scanner

Previously, we used Vision and Core ML to scan an image and recognize the text in it.
Now that iOS 13 is here, the Vision API is vastly improved. Moreover, the new VisionKit framework lets us scan documents using the camera.

Vision and VisionKit

The Vision API came out with iOS 11. Until now, it could only detect text, not return its actual content, so we had to bring in Core ML for the recognition part.

Now that the Vision API is upgraded in iOS 13, VNRecognizedTextObservation returns the text, its confidence level, and the bounding box coordinates.

Furthermore, VisionKit allows us to access the system’s document camera to scan pages.

VNDocumentCameraViewController is the view controller and VNDocumentCameraViewControllerDelegate is used to handle the delegate callbacks.

Launching a Document Camera

The following code is used to present the Document Camera on the screen.

let scannerViewController = VNDocumentCameraViewController()
scannerViewController.delegate = self
present(scannerViewController, animated: true)

Once the scan(s) are done and you tap 'Save', the following delegate method gets triggered:

documentCameraViewController(_ controller: VNDocumentCameraViewController, didFinishWith scan: VNDocumentCameraScan)

To get a particular scanned image when multiple pages were captured, pass the page index to the method:
scan.imageOfPage(at: index).

We can then process that image and detect the texts using the Vision API.

To process multiple pages, we can loop through the scan in the delegate method in the following way:

for i in 0 ..< scan.pageCount {
    let img = scan.imageOfPage(at: i)
    // process each page image here
}

Creating a VNRecognizeTextRequest

let request = VNRecognizeTextRequest(completionHandler: nil)
request.recognitionLevel = .accurate
request.recognitionLanguages = ["en-US"]

recognitionLevel can also be set to fast, but at the cost of accuracy.
recognitionLanguages is an array of languages passed in priority order from left to right.
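Before passing a language, we can ask Vision which languages a given recognition level actually supports. A minimal sketch, assuming iOS 13 and revision 1 of the request (the class method below throws, so we wrap it in try?):

```swift
import Vision

// Query the languages supported by the accurate recognition level
// for revision 1 of the text recognition request (iOS 13).
let supported = try? VNRecognizeTextRequest.supportedRecognitionLanguages(
    for: .accurate,
    revision: VNRecognizeTextRequestRevision1
)
print(supported ?? [])
```

On iOS 13, the accurate level primarily supports English variants; checking at runtime avoids silently passing an unsupported language.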

We can also pass custom words that are NOT part of the dictionary for Vision to recognize.

request.customWords = ["IOC", "COS"]

In the following section, let's create a simple Xcode project in which we'll recognize texts from the captured images using a Vision request handler.

We're setting our deployment target to iOS 13.

Our Storyboard



The code for the ViewController.swift file is given below:

import UIKit
import Vision
import VisionKit

class ViewController: UIViewController, VNDocumentCameraViewControllerDelegate {

    @IBOutlet weak var imageView: UIImageView!
    @IBOutlet weak var textView: UITextView!

    var textRecognitionRequest = VNRecognizeTextRequest(completionHandler: nil)
    private let textRecognitionWorkQueue = DispatchQueue(label: "MyVisionScannerQueue", qos: .userInitiated, attributes: [], autoreleaseFrequency: .workItem)

    override func viewDidLoad() {
        super.viewDidLoad()
        textView.isEditable = false
        setupVision()
    }

    @IBAction func btnTakePicture(_ sender: Any) {
        let scannerViewController = VNDocumentCameraViewController()
        scannerViewController.delegate = self
        present(scannerViewController, animated: true)
    }

    private func setupVision() {
        textRecognitionRequest = VNRecognizeTextRequest { (request, error) in
            guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
            var detectedText = ""
            for observation in observations {
                guard let topCandidate = observation.topCandidates(1).first else { continue }
                print("text \(topCandidate.string) has confidence \(topCandidate.confidence)")
                detectedText += topCandidate.string
                detectedText += "\n"
            }
            DispatchQueue.main.async {
                self.textView.text = detectedText
            }
        }
        textRecognitionRequest.recognitionLevel = .accurate
    }

    private func processImage(_ image: UIImage) {
        imageView.image = image
        recognizeTextInImage(image)
    }

    private func recognizeTextInImage(_ image: UIImage) {
        guard let cgImage = image.cgImage else { return }
        textView.text = ""
        textRecognitionWorkQueue.async {
            let requestHandler = VNImageRequestHandler(cgImage: cgImage, options: [:])
            do {
                try requestHandler.perform([self.textRecognitionRequest])
            } catch {
                print(error)
            }
        }
    }

    func documentCameraViewController(_ controller: VNDocumentCameraViewController, didFinishWith scan: VNDocumentCameraScan) {
        guard scan.pageCount >= 1 else {
            controller.dismiss(animated: true)
            return
        }
        let originalImage = scan.imageOfPage(at: 0)
        let newImage = compressedImage(originalImage)
        controller.dismiss(animated: true)
        processImage(newImage)
    }

    func documentCameraViewController(_ controller: VNDocumentCameraViewController, didFailWithError error: Error) {
        controller.dismiss(animated: true)
    }

    func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController) {
        controller.dismiss(animated: true)
    }

    func compressedImage(_ originalImage: UIImage) -> UIImage {
        guard let imageData = originalImage.jpegData(compressionQuality: 1),
            let reloadedImage = UIImage(data: imageData) else {
                return originalImage
        }
        return reloadedImage
    }
}

The textRecognitionWorkQueue is a DispatchQueue used to run the Vision request handler off the main thread.

In the recognizeTextInImage function, we pass the image to a VNImageRequestHandler, which performs the text recognition.
A VNRecognizedTextObservation is returned for each of the request's results.
From a VNRecognizedTextObservation we can look up to 10 candidate strings. Typically the top candidate gives us the most accurate result.

topCandidate.string returns the text and topCandidate.confidence returns the confidence of the recognized text.

To get the bounding box for a string in the image, we can use the boundingBox(for:) function, which throws and returns an optional VNRectangleObservation:

if let box = try? topCandidate.boundingBox(for: topCandidate.string.startIndex ..< topCandidate.string.endIndex) {
    let normalizedRect = box.boundingBox
}

This gives us a normalized CGRect which we can scale up and draw over the image.

Note: Vision uses a different coordinate space than UIKit. Its normalized coordinates have their origin at the bottom-left, so when drawing the bounding boxes you need to flip the Y axis.
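The conversion from Vision's normalized, bottom-left-origin rect to UIKit's top-left-origin pixel coordinates can be sketched as below. The helper name imageRect(forNormalized:imageWidth:imageHeight:) is our own; on iOS you could also scale with VNImageRectForNormalizedRect and then flip the Y axis yourself:

```swift
import Foundation

// Convert a Vision normalized rect (origin at bottom-left, values in 0...1)
// into a UIKit-style rect in image pixel coordinates (origin at top-left).
// This helper is illustrative, not part of the Vision API.
func imageRect(forNormalized rect: CGRect, imageWidth: CGFloat, imageHeight: CGFloat) -> CGRect {
    CGRect(x: rect.origin.x * imageWidth,
           // Flip the Y axis: Vision measures y from the bottom edge.
           y: (1 - rect.origin.y - rect.height) * imageHeight,
           width: rect.width * imageWidth,
           height: rect.height * imageHeight)
}

// A rect covering the bottom-left quarter of a 100x200 image
// maps to the top-left-origin rect (25, 100, 50, 50).
let uiRect = imageRect(forNormalized: CGRect(x: 0.25, y: 0.25, width: 0.5, height: 0.25),
                       imageWidth: 100, imageHeight: 200)
print(uiRect)
```

You can then draw the resulting rect over the UIImageView (after also accounting for the image view's content mode and scale).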


Let's look at the output of the application in action.


So we just captured the cover of a bestselling novel and, guess what, we were able to recognize and display its text in a text view on our screen.

That sums up text recognition with Vision on iOS 13.
The full source code is available here.

By Anupam Chugh

iOS Developer exploring the depths of ML and AR on Mobile.
Loves writing about thoughts, technology, and code.
