Did you ever need the content of a printed document but couldn’t find the associated file?
Copying it word for word into a new document is practical only for short notes.
Did you ever see a text far away from your device you would like to quote?
As you read these words on iNotes4You, your eyes and brain are carrying out OCR operations without you even noticing. Your eyes recognize the patterns of light and dark that make up the characters displayed on your device. It took you a long time to learn all these patterns and absorb their meaning. Your brain can now figure out what I’m trying to say, sometimes by reading individual characters but mostly by scanning entire words and whole groups of words at once.
Reading the handwriting of different people improves your ability to extract the information it conceals. Your brain is like a database that stores all the different variants of a character.
Transferring printed documents to digital text makes the job easier, and you don’t have to spend your precious time typing the text manually into a new document. Luckily, you can use your iOS device to do OCR (Optical Character Recognition).
Take a snap of the document with your iPhone and hey, presto.
OCR software extracts the information from the image and converts it into text. A stand-alone recognition algorithm is not sufficient for this job. It must be supported by a database when a result is ambiguous, e.g. a character in a word that could be a ‘g’ or a ‘q’. The algorithm then asks the database whether the first or an alternate variant forms a well-known word.
Many things can prevent the technique from doing the job without errors. The error rate is mainly determined by the following disturbances:
- Incomplete characters
- Font type
- Surrounding non-text elements
- and more …
Many of these problems can be avoided when the document is printed in a font designed for machine reading, such as OCR-A.
Converting with 100% accuracy 100% of the time would be nice, but we’re talking about OCR on an iPhone or iPad and not about expensive professional equipment.
The important thing is that you provide the best source image possible. This generally means a flat page with clear text and sufficient lighting. If you do your part and take a good picture, all apps seem to have no trouble doing the job, apart from occasional manual corrections.
Microsoft OneNote …
Microsoft OneNote is part of the Office suite and offers OCR.
Just open OneNote, drag the image into the workspace, and use ‘Copy text from image’. Depending on the image quality and the amount of unwanted surrounding information, you may have to do some manual editing.
Unfortunately, the free Microsoft OneNote app for iOS devices does not support OCR.
FreeOCR …
FreeOCR is free OCR software built on the Tesseract OCR engine (originally developed at HP), one of the most accurate open source OCR engines available.
FreeOCR offers a simple UI with two windows (left=the original image / right=the extracted text). It supports most image files and multi-page TIFF files.
Steps performed by OCR software …
- Loading image as bitmap
The source is usually a file in one of the well-known formats such as BMP, JPEG, or PNG. PDF files must be supported as well, because many documents are stored as images in PDF format and the only way to extract text from such files is to perform OCR.
- Detecting the relevant image features
Many OCR algorithms expect some predefined range of font sizes and foreground/background colors so the image must be rescaled and inverted before processing when necessary.
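To make this step a bit more tangible, here is a minimal sketch in plain Python. It assumes the image is an 8-bit grayscale bitmap stored as a list of rows, and the function names (`normalize_polarity`, `scale_nearest`) are my own invention for illustration. The polarity test is a crude guess based on the average brightness, not what any particular OCR engine does.

```python
def invert(gray):
    """Invert an 8-bit grayscale image given as a list of rows."""
    return [[255 - p for p in row] for row in gray]


def is_light_on_dark(gray):
    """Crude guess: a dark average means light text on a dark background."""
    pixels = [p for row in gray for p in row]
    return sum(pixels) / len(pixels) < 128


def normalize_polarity(gray):
    """Return the image as dark text on a light background."""
    return invert(gray) if is_light_on_dark(gray) else gray


def scale_nearest(gray, factor):
    """Upscale by an integer factor by repeating pixels (nearest neighbour)."""
    out = []
    for row in gray:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# A tiny white dot on a black page gets flipped to a black dot on white.
page = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
normalized = normalize_polarity(page)
enlarged = scale_nearest([[0, 255]], 2)
```

Real software would pick the scale factor from the measured text height instead of a fixed number.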
- Reducing disturbances
An image can be skewed or it can contain a lot of optical noise, so deskew and despeckle algorithms are applied to improve the image quality.
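As a rough illustration of despeckling only (deskewing is not shown), the sketch below removes isolated ink pixels. It assumes a two-color image stored as a list of rows where 1 means ink and 0 means paper; the function name `despeckle` and the "no inked neighbour" rule are simplifications of mine.

```python
def despeckle(binary):
    """Remove ink pixels that have no ink among their 8 neighbours."""
    height, width = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1:
                continue
            neighbours = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        neighbours += binary[ny][nx]
            if neighbours == 0:
                out[y][x] = 0  # a lonely speck, treat it as noise
    return out


noisy = [
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
cleaned = despeckle(noisy)
```

Note that such a simple rule would also erase a one-pixel dot on an ‘i’ in a very small scan, so the speck size has to fit the resolution.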
- Converting to bi-tonal image
Many OCR algorithms require a bi-tonal image, so color or grayscale images must be converted to black and white. This process is called ‘binarization’ (reducing to two colors), and in some cases it is an important step because incorrect binarization causes a lot of recognition problems.
In other cases, the algorithm performs better on the original image and so this step is skipped.
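One common way to choose the black/white cut-off automatically is Otsu's method, which picks the threshold that best separates dark and light pixels. The sketch below is my own plain Python version for an 8-bit grayscale image stored as a list of rows. I am not claiming that any of the apps mentioned here work this way.

```python
def otsu_threshold(gray):
    """Return the threshold that maximizes the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_var:
            best_var, best_t = variance, t
    return best_t


def binarize(gray):
    """Map pixels at or below the threshold to ink (1), the rest to paper (0)."""
    t = otsu_threshold(gray)
    return [[1 if p <= t else 0 for p in row] for row in gray]


scan = [[10, 12, 200], [11, 210, 205]]
threshold = otsu_threshold(scan)
bitonal = binarize(scan)
```

A single global threshold struggles with uneven lighting, which is one reason a badly lit photo gives poor results.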
- Line detection and removal
This step is required to improve page layout analysis, to achieve better recognition quality for underlined text, to detect tables, etc.
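A naive version of this step can be sketched as follows: find long horizontal runs of ink and erase them. The image is assumed to be bi-tonal (1 = ink, 0 = paper), and `remove_horizontal_lines` and its `min_run` parameter are hypothetical names of mine. Real software also repairs the character strokes that cross such a line, which this sketch does not.

```python
def remove_horizontal_lines(binary, min_run):
    """Erase horizontal ink runs that are at least min_run pixels long."""
    out = [row[:] for row in binary]
    for y, row in enumerate(binary):
        x = 0
        while x < len(row):
            if row[x] == 1:
                start = x
                while x < len(row) and row[x] == 1:
                    x += 1
                if x - start >= min_run:
                    # Long run: treat it as a ruling or underline.
                    for i in range(start, x):
                        out[y][i] = 0
            else:
                x += 1
    return out


underlined = [
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
without_line = remove_horizontal_lines(underlined, min_run=5)
```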
- Page layout analysis
This step is also called ‘zoning’. At this stage the OCR system must detect the positions and types of the important areas of the image. It has to identify columns, paragraphs, captions, etc. as distinct blocks. This is especially important for tables and for the multi-column layouts often used in newspapers.
- Detection of text lines and words
This might be a complex task when analyzing layout-oriented articles in magazines because of different font sizes and space between words and lines.
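For a clean, single-column page the basic idea can be sketched with so-called projection profiles: rows without ink separate text lines, and wide enough runs of empty columns separate words. The code below is a simplified illustration of mine (bi-tonal image, 1 = ink); the `min_gap` parameter is an assumption and would have to be tuned to the font size, which is exactly what makes magazine layouts hard.

```python
def find_text_lines(binary):
    """Return (top, bottom) row ranges containing ink; bottom is exclusive."""
    lines, start = [], None
    for y, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            lines.append((start, y))
            start = None
    if start is not None:
        lines.append((start, len(binary)))
    return lines


def find_words(binary, top, bottom, min_gap):
    """Split one text line into (left, right) column ranges; right is exclusive."""
    width = len(binary[0])
    col_has_ink = [any(binary[y][x] for y in range(top, bottom))
                   for x in range(width)]
    words, start, gap = [], None, 0
    for x, ink in enumerate(col_has_ink):
        if ink:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                words.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:
        words.append((start, width - gap))
    return words


page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
]
text_lines = find_text_lines(page)
first_line_words = find_words(page, 1, 3, min_gap=2)
```

Here the one-column gap inside the first word is treated as letter spacing, while the three-column gap starts a new word.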
- Analysis of touching and broken characters
It’s common for some characters to look broken or to touch each other. So the OCR software has to separate touching characters and virtually complete the shapes of broken ones.
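Separating two touching characters can be sketched like this: if a blob of ink is wider than a single character is expected to be, cut it at the interior column with the least ink. This is a simplified heuristic of mine, with hypothetical names (`split_touching`, `max_width`), and it does not cover the repair of broken characters.

```python
def split_touching(glyph, max_width):
    """Split a bi-tonal glyph wider than max_width at its weakest interior column."""
    width = len(glyph[0])
    if width <= max_width:
        return [glyph]
    col_ink = [sum(row[x] for row in glyph) for x in range(width)]
    # Only interior columns are candidates, so both halves stay non-empty.
    cut = min(range(1, width - 1), key=lambda x: col_ink[x])
    left = [row[:cut] for row in glyph]
    right = [row[cut + 1:] for row in glyph]  # the cut column itself is dropped
    return [left, right]


# Two 'o' shapes joined by a single pixel in the middle row.
joined = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 0, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 1],
]
parts = split_touching(joined, max_width=4)
```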
- Recognition of characters
This is the main algorithm of OCR. The image of every character must be converted to the appropriate character code. Sometimes this algorithm produces several candidate codes. For instance, recognition of the image of ‘I’ may produce ‘I’, ‘1’, or ‘l’, and the final character code has to be selected by looking at the context.
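One of the simplest recognition techniques is template matching: compare the character image with a stored pattern for every known character and rank the characters by how many pixels differ. The sketch below uses three made-up 3x5 templates of my own to show why ‘I’, ‘1’, and ‘l’ end up as close candidates. Modern engines use far more elaborate classifiers.

```python
# Made-up 3x5 templates for illustration; '1' marks an ink pixel.
TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "l": ["010", "010", "010", "010", "010"],
    "1": ["010", "110", "010", "010", "111"],
}


def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def rank_candidates(glyph, templates=TEMPLATES):
    """Return (character, distance) pairs, best match first."""
    scored = sorted((hamming(glyph, t), ch) for ch, t in templates.items())
    return [(ch, distance) for distance, ch in scored]


# A vertical stroke with a foot: is it 'I', '1' or 'l'?
unknown = ["010", "010", "010", "010", "111"]
candidates = rank_candidates(unknown)
```

The best match wins by a single pixel, so a little noise can easily change the result. That is why the context is needed for the final decision.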
- Dictionary support
This step can improve recognition quality. Some characters, like ‘1’ and ‘I’ or ‘C’ and ‘G’, may look similar, and the dictionary must help to make the decision.
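The dictionary lookup can be sketched as follows: replace every character that has look-alikes with each of its variants and keep the spellings that are known words. The groups of confusable characters and the three-word dictionary are made up by me for this example; a real dictionary holds many thousands of words.

```python
from itertools import product

# Hypothetical groups of characters that OCR tends to mix up.
CONFUSABLE = [set("1Il"), set("CG"), set("gq"), set("0O")]


def variants(ch):
    """Return all characters that could be meant by ch."""
    for group in CONFUSABLE:
        if ch in group:
            return sorted(group)
    return [ch]


def correct_word(raw, dictionary):
    """Return the dictionary words reachable by swapping look-alike characters."""
    matches = []
    for combo in product(*(variants(c) for c in raw)):
        word = "".join(combo)
        if word in dictionary:
            matches.append(word)
    return matches or [raw]  # nothing found: keep the raw OCR result


words = {"Illegal", "Cat", "quote"}
fixed_a = correct_word("1llegal", words)
fixed_b = correct_word("Gat", words)
```

Trying every combination gets slow for long words with many look-alikes, so this only works as a sketch for short inputs.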
- Saving results
The last step is to transfer the pure text into a suitable output format such as a searchable PDF, DOCX, RTF, or TXT. The greatest challenge is to keep the original page layout with its columns, fonts, colors, pictures, background, etc.
A sneak peek …
There are a lot of OCR apps for iOS devices. Cost varies and so do feature sets. Some are limited to scanning business cards. In any case, the image should not look like my image above.
Anyway, you must keep your feet on the ground. Don’t believe app developers who suggest that you can convert all your printed documents into readable text.
Part 2 of ‘Text recognition’ is coming soon and compares useful apps for iOS devices such as ImageToText, Text Grabber, and Prizmo.
Thanks for dropping by.