July 24, 2022
Today’s photos are made up of tens of millions of pixels and carry a huge amount of data. This raises a question: how do we get at that information? In this post, I will show you, step by step, how to read text from a photo using the two libraries named in the title: OpenCV and Tesseract OCR.
First, let's prepare the development environment by installing the required Python packages:
pip install matplotlib
pip install pytesseract
pip install opencv-python
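The Tesseract engine itself has to be installed separately. On Debian or Ubuntu:
sudo apt-get install -y tesseract-ocr
Users of other operating systems can follow the installation instructions in the Tesseract GitHub repository.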
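Before moving on, it can help to confirm that everything is in place. The snippet below is a minimal sketch that only uses the Python standard library; the helper name `check_environment` is my own, and it assumes the Tesseract executable is called `tesseract` and is on the `PATH`.

```python
import importlib.util
import shutil

# Python modules provided by the pip packages installed above
REQUIRED_MODULES = ("cv2", "pytesseract", "matplotlib")


def check_environment():
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    status = {
        name: importlib.util.find_spec(name) is not None
        for name in REQUIRED_MODULES
    }
    # pytesseract is only a wrapper; it needs the tesseract binary as well
    status["tesseract"] = shutil.which("tesseract") is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If any entry is reported as missing, repeat the corresponding installation step before continuing.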
The most important task is to separate the text from the background. If we filter the image so that the background becomes white pixels, Tesseract will have no trouble reading the text. Let's start by loading the photo and performing a few operations on it.
import cv2
import matplotlib.pyplot as plt

img = cv2.imread('example.jpg')  # load the image (OpenCV uses BGR channel order)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale
# cv2.threshold returns a pair: (threshold used, output image)
# all values over 127 become 255 (white), the rest become 0 (black)
_, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
plt.imshow(thresh, cmap='gray')  # display the binarized picture
plt.show()
Sample photo of the inscription from the glasses cleaning wipes that I found on the cupboard, before and after applying the above operations
Now we pass the resulting image to Tesseract for processing.
from pytesseract import image_to_string

output = image_to_string(thresh, lang='eng', config='--psm 7')
print('Output: ', output)
The output in this case is "Output: GLASSES WIPES", so everything works. The value passed in config='--psm 7' selects the page segmentation mode, and the right choice depends on what exactly we want to achieve. The available modes are:
Page segmentation modes:
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
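Since the mode is passed as a plain string, a typo is easy to make. As a small sketch, a helper can build the config string from the mode number and reject values outside the list above. The name `psm_config` is hypothetical and not part of pytesseract.

```python
PSM_MIN, PSM_MAX = 0, 13  # range of modes listed above


def psm_config(mode):
    """Build a Tesseract config string for the given page segmentation mode."""
    if not isinstance(mode, int) or not PSM_MIN <= mode <= PSM_MAX:
        raise ValueError(
            f"psm must be an integer from {PSM_MIN} to {PSM_MAX}, got {mode!r}"
        )
    return f"--psm {mode}"


print(psm_config(7))
```

The result can then be used as `image_to_string(thresh, lang='eng', config=psm_config(7))`.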
Let's try text that spans multiple lines (config='--psm 6').
This time, too, the result turned out to be correct:
Output: Ala ma kota, kot ma Ale, ale Ala, nie ma psa.
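In the examples so far we judged the result by eye. For less clean photos, pytesseract can also return a confidence value for every recognized word through `image_to_data`. The sketch below keeps only words above a cutoff; the helper names and the cutoff of 60 are my assumptions, not recommended values.

```python
def confident_words(data, min_conf=60.0):
    """Keep words whose confidence is at least min_conf.

    `data` is the dict returned by pytesseract.image_to_data with
    output_type=Output.DICT; it holds parallel lists under 'text' and 'conf'.
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # Entries that are not words have empty text and a confidence of -1
        if text.strip() and float(conf) >= min_conf:
            words.append(text)
    return words


def ocr_confident_words(image, config="--psm 6", min_conf=60.0):
    """Run Tesseract on an image and return only the confident words."""
    # Imported here so that confident_words can be used without pytesseract
    import pytesseract
    from pytesseract import Output

    data = pytesseract.image_to_data(
        image, lang="eng", config=config, output_type=Output.DICT
    )
    return confident_words(data, min_conf)
```

Calling `ocr_confident_words(thresh)` on the binarized image returns a list of words, which makes it easier to spot lines where the recognition was shaky.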
As you can see, reading text from photos or pictures is not difficult. Of course, you can do more with the photo: stretch the pixel values over the full range ([0, 255]), remove elements that we consider background noise, or fill in gaps in the text area. It is worth noting that our "home OCR" can even handle handwriting, provided it is at least a little legible.
Title photo: pexels.com