Introduction
A crucial module in the optical character recognition (OCR) pipeline is the detection and segmentation of text, also known as text localization. In the previous blog we discussed several methods for pre-processing the input image that can help increase OCR accuracy, along with approaches to gathering OCR datasets. In this blog, we'll look at how to locate the text in an image, cut out the text regions, and later feed them into our text recognition program.
What is text detection and segmentation?
It's the process of locating every instance of text in an image, dividing it into meaningful parts such as words, characters, or text lines, and making a segment of each of these components.
Character-based detection first recognizes individual characters and then groups those characters into words. One way to do this is to find characters as Maximally Stable Extremal Regions (MSER) and then group the detected characters through an exhaustive search.
Word-based detection generally operates in the same way as object detection. Algorithms such as Faster R-CNN or YOLO can be used to achieve this.
Text-line-based detection detects lines of text and then breaks them into words.
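The text-line-based approach above can be sketched with a simple vertical projection profile: columns with no ink separate one word from the next. This is a minimal illustration, assuming a binarized line image (text pixels = 1); the function name and the `min_gap` parameter are made up for this example.

```python
import numpy as np

def split_line_into_words(line_img, min_gap=3):
    """Split a binarized text-line image (text=1, background=0) into word
    spans by finding runs of empty columns in the vertical projection.
    `min_gap` empty columns in a row are treated as a word boundary."""
    col_sums = line_img.sum(axis=0)          # amount of "ink" per column
    inked = col_sums > 0
    words, start, gap = [], None, 0
    for x, has_ink in enumerate(inked):
        if has_ink:
            if start is None:
                start = x                    # a new word begins here
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:               # gap wide enough: close the word
                words.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:                    # close a word that runs to the edge
        words.append((start, len(inked) - gap))
    return words                             # list of (x_start, x_end) spans

# Two "words" separated by a 5-column gap
line = np.zeros((10, 20), dtype=np.uint8)
line[:, 2:7] = 1
line[:, 12:18] = 1
print(split_line_into_words(line))  # [(2, 7), (12, 18)]
```

Real pipelines usually add slant correction and adaptive gap thresholds, but the core idea is exactly this column-profile split.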
Text recognition programs take two kinds of text images as input. One is scanned documents; the other is natural scene text such as storefronts, street signs, and so on.
Scanned Documents
Scanned documents typically contain hundreds or thousands of words. We can use deep neural networks such as Faster R-CNN or YOLO to detect the words present in a document. However, in some instances they might not locate all the text in an image, because they are usually designed to detect only a small number of objects per image. In such a scenario, we have to apply some post-processing after the deep network in order to detect the remaining text.
Another technique that can be used for scanned documents is to compute Maximally Stable Extremal Regions (MSER) using OpenCV.

MSER is a technique for detecting blobs in images. With this method, we can determine the coordinates of text regions and draw bounding boxes around every word in the image. This gives us the images we need to feed into our text recognition program.
Natural Scenes
Natural scenes contain fewer words, but they come with other issues such as distortions, occlusions, blurred backgrounds, directional blur, and so on. To address these issues we need a deep-learning algorithm that focuses on natural scene text while staying robust to the distortions above. There are a number of solid open-source algorithms and machine-learning datasets for this. They can also be utilized to localize text in scanned documents; however, as mentioned previously, you will need to perform some post-processing in order to find every word in the image.
Efficient and Accurate Scene Text Detector (EAST)
EAST is a deep-learning text detection technique with two stages: a fully convolutional network (FCN) and a non-maximum suppression (NMS) merging stage. The FCN uses a U-shaped network that produces text regions at either the text-line or the word level. Below is a schematic of the FCN used by the algorithm.
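To make the second stage concrete, here is a sketch of plain non-maximum suppression over axis-aligned boxes. Note this is the standard IoU-based variant for illustration; EAST itself uses a locality-aware NMS that first merges geometrically close boxes row by row, but the core idea of keeping the best-scoring box and dropping overlapping duplicates is the same.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring boxes, suppressing any box whose IoU with
    an already-kept box exceeds `iou_thresh`. Boxes are (x1, y1, x2, y2)."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]       # drop overlapping duplicates
    return keep

# Two near-duplicate detections of one word, plus one distinct word
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 0, 60, 10]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2] -- the lower-scoring duplicate is suppressed
```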
Text Recognition
As you may remember, during the text detection step we segmented the text regions. It's now time to identify the text found in these segments; this is called text recognition. As an example, take a look at the image below, where we see segments on the left and the recognized text in the center. This is what we are trying to achieve: identify the text present in the segments. Next, we send each segment one-by-one into our text recognition model, which generates the recognized text. In general, the text recognition step outputs a text file that includes each segment's bounding-box coordinates along with the recognized text. The image above (right) contains three columns: the segment name, the coordinates, and the recognized text.
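The three-column output described above can be sketched as a simple list of records. The segment names, coordinates, and strings below are made up purely for illustration; a real pipeline would fill them in from the detector and recognizer.

```python
# One record per segment: name, bounding-box coordinates, recognized text.
# All values here are hypothetical, for illustration only.
results = [
    {"segment": "seg_001", "bbox": (12, 8, 96, 30), "text": "Invoice"},
    {"segment": "seg_002", "bbox": (12, 40, 150, 62), "text": "Total: $42.00"},
]

# Write it out in the three-column layout described above
lines = [f"{r['segment']}\t{r['bbox']}\t{r['text']}" for r in results]
output = "\n".join(lines)
print(output)
```

Keeping the coordinates alongside the text is what makes the later restructuring step possible, since each string can be placed back at its position on the page.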
Now, you might be wondering: why coordinates? This will become clearer when we discuss restructuring (the following step).
Like text detection, text recognition has been a long-running research subject in computer vision. Traditional text recognition methods typically comprise three main stages:
- Image pre-processing
- Character segmentation
- Character recognition
These generally operate at the character level. However, for images with a complicated background, unusual fonts, or other distortions, character segmentation is a difficult task. To avoid character segmentation, two primary methods are employed:
- Connectionist Temporal Classification (CTC) based
- Attention-based
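The CTC-based approach can be illustrated with its simplest decoding rule, greedy decoding: take the most likely symbol at each time step, collapse consecutive repeats, and drop the blank symbol. The alphabet and the frame probabilities below are made up for this sketch.

```python
import numpy as np

BLANK = 0                                    # CTC blank class, by convention
alphabet = {1: "c", 2: "a", 3: "t"}          # hypothetical 3-letter alphabet

def ctc_greedy_decode(probs):
    """probs: (timesteps, num_classes) per-frame class probabilities.
    Greedy CTC decode: argmax per frame, collapse repeats, drop blanks."""
    best = probs.argmax(axis=1)              # best class at each time step
    decoded, prev = [], BLANK
    for cls in best:
        if cls != prev and cls != BLANK:     # new non-blank symbol
            decoded.append(alphabet[cls])
        prev = cls
    return "".join(decoded)

# Frame-wise predictions spelling "c c a <blank> t t" -> "cat"
probs = np.array([
    [0.1, 0.8, 0.05, 0.05],   # c
    [0.1, 0.7, 0.1, 0.1],     # c (repeat, collapsed)
    [0.1, 0.1, 0.7, 0.1],     # a
    [0.9, 0.02, 0.03, 0.05],  # blank
    [0.1, 0.1, 0.1, 0.7],     # t
    [0.1, 0.1, 0.1, 0.7],     # t (repeat, collapsed)
])
print(ctc_greedy_decode(probs))  # cat
```

The blank symbol is what lets CTC represent genuinely doubled letters (e.g. "ll") while still collapsing repeated frames, which is why no explicit character segmentation is needed.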
GTS Gives You Text Segmentation Through OCR Services
Global Technology Solutions (GTS) OCR has your business covered. With remarkable accuracy of more than 90% and fast real-time results, GTS helps businesses automate their data extraction processes. In mere seconds, businesses in banking, e-commerce, digital payments, and many other industries can pull user information out of any type of document by taking advantage of OCR technology, alongside services such as Image Data Collection, AI Training Datasets, and Video Datasets. This reduces the overhead of manual data entry and time-consuming data collection tasks.