Text detection on an LCD (16x2) display via Tesseract OCR

I need to detect the value from an energy meter shown on an LCD display.
Is it possible to detect the text on the LCD display below using Tesseract OCR?
Or is there another algorithm for detecting it? Can anyone guide me with this issue?
(Image: LCD display labeled "ENERGY METER")

Related

Tesseract Number Recognition from image

tesseract ../spliced-time.png spliced-time -l eng --psm 13 --oem 3 txt pdf hocr
Gives me a result of: Al
I am confused whether there is more I should be doing, or what the best approach would be for an image like this, where the font and alignment should generally be the same; only the numbers would differ. I was looking at OpenCV as well, but I feel this image shouldn't be that hard to handle with some extra work, configuration, or training to recognize the numbers well.
Image Attachment:
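One thing worth trying for a fixed-layout numeric image is restricting Tesseract's page segmentation mode and character set. A minimal Python sketch that builds such a command (the `--psm 7` value and the whitelist are suggestions to experiment with, not a verified fix for this particular image):

```python
import shutil
import subprocess

def build_tesseract_cmd(image_path, out_base, psm=7,
                        whitelist="0123456789:"):
    # --psm 7 treats the image as a single text line;
    # tessedit_char_whitelist restricts recognition to the listed characters.
    return ["tesseract", image_path, out_base,
            "--psm", str(psm),
            "-c", f"tessedit_char_whitelist={whitelist}"]

cmd = build_tesseract_cmd("spliced-time.png", "spliced-time")
if shutil.which("tesseract"):  # run only if the binary is installed
    subprocess.run(cmd, check=True)
```

Whitelisting is most useful when, as here, the content is known to be digits only, so confusions like 1/l or 0/O cannot occur.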

OCR - optical character recognition image size/quality decrease

I have a question about OCR, specifically parsing text from an image. I am making an Android application that uses the Google Cloud API to parse text from an image. The problem is that sending/uploading the image takes too much time, so I thought I could resize the image or decrease its quality. But in that case the OCR detection results usually suffer. Can anyone please tell me the best way to do this? Maybe someone knows WhatsApp's image compression approach, the best image file format (JPG, PNG), the best quality/size reduction ratio, or something like that.
Thanks in advance
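One low-dependency way to cut upload time is to downscale the image before sending it. A minimal nearest-neighbour sketch in pure Python (no imaging library assumed; in the actual app you would use something like Android's Bitmap scaling instead):

```python
def downscale(image, factor):
    # Nearest-neighbour downscale: keep every `factor`-th pixel
    # along both axes. `image` is a list of rows of pixel values.
    return [row[::factor] for row in image[::factor]]

# 8x8 synthetic test image; a factor of 2 quarters the pixel count
img = [[x + 10 * y for x in range(8)] for y in range(8)]
small = downscale(img, 2)  # now 4x4
```

The usual trade-off applies: halving each dimension quarters the upload size, but once characters drop below roughly 15 px in height, recognition accuracy tends to fall off sharply, so downscale conservatively.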

Algorithm to detect dithered image

I am trying to detect whether a G4-compressed TIFF image will produce good OCR output. Currently, dithered TIFFs yield poor OCR results. Therefore, before I send a TIFF to the OCR engine, I would like to determine whether the image is dithered. If a TIFF was dithered, Ghostscript was used to perform the dithering.
Is there an algorithm to determine if an image is dithered?
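There is no single canonical test, but one common heuristic is that dithering produces many isolated black/white transitions, so measuring transition density separates dithered regions from solid fills and text. A minimal sketch over a 1-bit image represented as nested lists of 0/1 (the 0.5 threshold is an illustrative assumption to tune on real scans):

```python
def transition_ratio(bitmap):
    # Fraction of horizontally adjacent pixel pairs whose values differ.
    transitions = total = 0
    for row in bitmap:
        for a, b in zip(row, row[1:]):
            transitions += (a != b)
            total += 1
    return transitions / total if total else 0.0

def looks_dithered(bitmap, threshold=0.5):
    # Checkerboard-style dithering flips value at nearly every pixel;
    # text and solid fills change value far less often.
    return transition_ratio(bitmap) > threshold

checker = [[(x + y) % 2 for x in range(8)] for y in range(8)]  # dithered-like
solid = [[0] * 8 for _ in range(8)]                            # solid fill
```

On a real page you would compute this per tile rather than globally, since a document can mix clean text with dithered halftone regions.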

Achieving better recognition results via training Tesseract

I have a question regarding achieving better recognition results with Tesseract. I am using Tesseract to recognize serial numbers. The serial numbers consist of only one font type, the characters A-Z and 0-9, and occur in different sizes and lengths.
At the moment I am able to recognize about 40% of the serial number images correctly. Images are taken via a mobile phone camera, so the image quality isn't the best.
Particularly problematic characters are 8/B and 5/6. Since I am recognizing only serial numbers, I am not using any dictionary improvements, and every character is recognized independently.
My question is: does anyone already have experience with achieving better recognition results by training Tesseract? How many images would be needed to get good results?
For training Tesseract, should I use printed and then photographed serial numbers, or should I use the original digital serial numbers, without printing and photographing?
Maybe somebody already has experience in that kind of area.
Regarding training Tesseract: I have already trained Tesseract with some images. For that I printed all characters in different sizes, photographed them, and labeled them correctly. Example training photo of the character 5:
Is this a good or bad training example? Since I only want to recognize single characters without any dependency, I thought I don't have to use words for training.
So far I have only trained with 3 of these images for the characters B, 8, 6, and 5, which doesn't result in better recognition compared with the original English (eng) Tesseract database.
best regards,
Christoph
I am currently working on a Sikuli application that uses Tesseract to read text (strings and numbers) from screenshots. I found that the best way to achieve accuracy was to process the screenshot before performing OCR on it. However, most of the text I am reading is green text on a black background, making this my preferred solution. I used the Scalr library's resize method on the BufferedImage to increase the size of the image:
BufferedImage bufImg = Scalr.resize(...)
which instantly yielded more accurate results with black text on gray background. I then used BufferedImage's options BufferedImage.TYPE_BYTE_GRAY and BufferedImage.TYPE_BYTE_BINARY when creating a new BufferedImage to process the Image to grayscale and black/white, respectively.
Following these steps brought Tesseract's accuracy from about 30% to around 85% when dealing with green text on a black background, and really close to 100% when dealing with normal black text on a white background. (Sometimes letters within a word are mistaken for numbers, e.g. hel10.)
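The grayscale-then-binarize steps described above (done in Java with BufferedImage.TYPE_BYTE_GRAY and TYPE_BYTE_BINARY) can be sketched language-neutrally; this pure-Python version uses plain nested lists and a fixed threshold, with the standard BT.601 luminance weights as an assumption:

```python
def to_gray(rgb_image):
    # Luminance-weighted grayscale conversion (ITU-R BT.601 coefficients)
    return [[int(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

def binarize(gray_image, threshold=128):
    # Map every pixel to pure black (0) or white (255)
    return [[255 if px >= threshold else 0 for px in row]
            for row in gray_image]

# Bright green text (0, 255, 0) on black becomes clean white-on-black
img = [[(0, 0, 0), (0, 255, 0)],
       [(0, 255, 0), (0, 0, 0)]]
bw = binarize(to_gray(img))
```

A fixed threshold of 128 works for high-contrast cases like green-on-black; for uneven lighting an adaptive threshold (e.g. Otsu's method) is usually the better choice.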
I hope this helps!

Creating a training image for Tesseract OCR

I'm writing a generator for training images for Tesseract OCR.
When generating a training image for a new font for Tesseract OCR, what are the best values for:
The DPI
The font size in points
Should the font be anti-aliased or not
Should the bounding boxes fit snugly or not?
The 2nd question is somewhat answered here: http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Generate_Training_Images
There is no need to train with multiple sizes. 10 point will do. (An exception to this is very small text. If you want to recognize text with an x-height smaller than about 15 pixels, you should either train it specifically or scale your images before trying to recognize them.)
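The quoted rule can be turned into a small helper: if the x-height is below the ~15 px floor the wiki mentions, compute the upscale factor needed before recognition (the function name and interface are mine, for illustration):

```python
def scale_for_xheight(x_height_px, min_xheight_px=15):
    # Returns the upscale factor needed so text reaches the minimum
    # x-height Tesseract handles well; 1.0 means no scaling is needed.
    if x_height_px >= min_xheight_px:
        return 1.0
    return min_xheight_px / x_height_px

scale_for_xheight(10)  # 1.5
scale_for_xheight(20)  # 1.0
```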
Questions 1 and 3: from experience, I've successfully used 300 dpi images and non-anti-aliased fonts. More specifically, I have used the following convert parameters on a training PDF, which generated a satisfactory image:
convert -density 300 -depth 8 [input].pdf -background white -flatten +matte -compress none -monochrome [output].tif
But then I tried to add a dotted font to Tesseract and it only detected characters properly when I used a 150 dpi image. So, I don't think there's a general solution, it depends on the kind of fonts you're trying to add.
I found the answer to the 4th question - "Should the bounding boxes fit snugly".
It seems that fitting the rectangles as much as possible gives much better results.
For the others, 12 pt and 300 dpi will be good enough, as @Yaroslav suggested. I think anti-aliasing is better turned off.
A good tool for Tesseract training: http://vietocr.sourceforge.net/training.html
It is a good tool because it has a number of advantages:
bounding boxes on letters can be edited through a GUI-based interface
it automatically creates all the required files
it automatically combines all the files (freq-dawg, word-dawg, user-words (can be an empty file), inttemp, normproto, pffmtable, unicharset, DangAmbigs (can be an empty file), shapetable) into a single eng.traineddata file
the new training data can be used with the existing Tesseract file eng.traineddata