I need to write an OCR program for digits only. I will use the MNIST dataset. The problem is that I do not know where to start. There are a lot of papers that don't really explain the algorithm, and I don't have much knowledge of pattern recognition. So I have a few questions.
Q1 : Where can I find the algorithm (or a tutorial)?
Q2 : How do I classify digits? I don't need very advanced things. The first thing that comes to mind is finding the ratio of upper half/lower half and left side/right side. Are there more useful and easy classification methods?
Q3 : What are back propagation and the layers which are shown in most of the papers? Do I need them for my simple OCR?
Note: I know my OCR program won't be accurate. It isn't very important for now.
If the closest engineering library to you has a section on image processing, computer vision, or machine vision, then with luck that library will have a copy of a book I recommend for OCR:
Character Recognition Systems by Cheriet, Kharma, Liu, and Suen
This book provides a fairly comprehensive overview of OCR techniques and recent research. It does not go into great depth on any particular subject, but it does provide references to academic papers.
Make sure you have access to a good introductory textbook on image processing. The book by Gonzalez and Woods is a standard in many universities:
Digital Image Processing by Gonzalez and Woods
Even "simple" OCR gets tricky very quickly. It could be overwhelming if you jump into a class about neural networks, Bayes' theorem, etc., before you have a firm grasp of basic image processing principles.
If you can, try writing one or more OCR algorithms for machine-printed characters before you attempt to write an algorithm for handwritten characters.
Q1 : Where can I find the algorithm (or a tutorial)?
There are numerous algorithms for OCR. The Cheriet book will give you a good start.
Q2 : How do I classify digits? I don't need very advanced things. The first thing that comes to mind is finding the ratio of upper half/lower half and left side/right side. Are there more useful and easy classification methods?
Try implementing that technique and see how well it works. Even if the implementation doesn't work as well as you'd like, lessons learned while implementing it could help you later.
You can also subdivide a character into a 2 x 2 grid or 3 x 3 grid and check the relative densities of pixels in each cell. Keep in mind that, unlike machine-printed characters, handwritten characters won't line up nicely in rectilinear grids.
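The grid idea can be sketched in a few lines. This is only a minimal illustration, assuming the character has already been cropped and binarized into a list of rows of 0/1 values (1 = ink); the function name zone_densities and the tiny sample glyph are made up for the example.

```python
def zone_densities(image, grid=3):
    """Return the fraction of ink pixels in each cell of a grid x grid split.

    `image` is a list of rows of 0/1 values (1 = ink). Cells are listed
    row by row, so a 2 x 2 grid yields
    [top-left, top-right, bottom-left, bottom-right].
    """
    rows, cols = len(image), len(image[0])
    features = []
    for gr in range(grid):
        # Integer arithmetic splits the rows as evenly as possible.
        r0, r1 = gr * rows // grid, (gr + 1) * rows // grid
        for gc in range(grid):
            c0, c1 = gc * cols // grid, (gc + 1) * cols // grid
            cell = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features


# Hypothetical 6 x 6 glyph that looks roughly like a "7".
SEVEN = [
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
]

if __name__ == "__main__":
    print(zone_densities(SEVEN, grid=2))
```

The resulting feature vectors can then be compared with the average vector of each digit class, for example by picking the class with the smallest Euclidean distance.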
Template matching using normalized correlation can work reasonably well for machine-printed characters in a single, known font. It's relatively simple to implement and worth learning:
http://en.wikipedia.org/wiki/Cross-correlation#Normalized_cross-correlation
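As a rough sketch of template matching, the code below computes the zero-mean normalized cross-correlation between a sample and each template and picks the best match. It assumes the sample and the templates are grayscale images of identical size, given as lists of rows; the names ncc and classify, and the 3 x 3 templates, are invented for the example.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-sized images.

    Returns a value in [-1, 1]; 1 means the images match up to a change
    in brightness and contrast.
    """
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same size")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mean_x) ** 2 for x in xs)
                    * sum((y - mean_y) ** 2 for y in ys))
    # A perfectly flat image has no variance; treat it as "no match".
    return num / den if den else 0.0


def classify(sample, templates):
    """Return the label of the template with the highest correlation."""
    return max(templates, key=lambda label: ncc(sample, templates[label]))


# Hypothetical 3 x 3 templates, far smaller than real character images.
TEMPLATES = {
    "0": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
    "1": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
}
```

Because the score is normalized, a sample that is simply brighter or has more contrast than its template still scores close to 1, which is the main reason to prefer it over a plain pixel difference.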
For OCR it's common to thin the characters in your sample as an initial step. Thinning is a technique to reduce a character (or any other shape) to a representation that is 1 pixel wide. Once you have a thinned character it can be easier to identify lines and intersections. If you can identify lines (or curves) and intersections, then one technique is to look at the relative position and angle of each line with respect to the others.
Common thinning algorithms include Stentiford and Zhang-Suen. There's a freeware version of WinTopo that demonstrates both of these algorithms:
http://wintopo.com/
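For reference, here is a compact sketch of Zhang-Suen thinning as it is usually described: two sub-iterations that peel away boundary pixels until nothing changes. It assumes a binary image given as a list of rows of 0/1 values with at least a one-pixel background border; the function name is made up, and this is not the WinTopo implementation.

```python
def zhang_suen_thin(image):
    """Thin a binary image (list of rows of 0/1) and return a new image.

    Assumes the outermost rows and columns are background (0).
    """
    img = [list(row) for row in image]
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: the 8 neighbours, clockwise, starting directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    ink = sum(p)  # number of ink neighbours
                    # Number of 0 -> 1 transitions around the pixel.
                    transitions = sum(
                        1 for i in range(8)
                        if p[i] == 0 and p[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= ink <= 6 and transitions == 1 and ok:
                        to_clear.append((r, c))
            # Pixels are removed only after the whole pass, so every
            # decision within a pass sees the same image.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img
```

Pad the image with one row and column of zeros on each side before calling it if the character touches the border.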
You can look into academic papers about "stroke extraction", but those techniques tend to be more difficult to implement.
Q3 : What are back propagation and the layers which are shown in most of the papers? Do I need them for my simple OCR?
These terms refer to artificial neural networks. For a simple OCR algorithm you'll hard-code the recognition logic OR use simple training methods. Artificial neural networks can be trained to recognize characters that aren't hard-coded in your software.
http://en.wikipedia.org/wiki/Neural_network
Although you don't need to learn about artificial neural networks to write a simple OCR algorithm, a simple algorithm will have only limited success with handwritten characters.
Above all, keep in mind that OCR for handwritten characters is an extremely difficult problem. If you could achieve a handwritten character read rate of 20% with a simple technique, then consider that a success.
I am trying to recognize a barcode using a simple CNN, treating it as a multi-digit recognition problem.
The results are not very good, so I was looking for better deep learning models for the task. During my search, I did not find any OCR model being tried on barcodes. So my question is: can OCR models be trained to recognize barcodes? I find the task of barcode detection and recognition very similar to text recognition. Is there something I am missing?
While CNNs can be used to read the contents of the barcode, especially in the scenario where massive datasets of images are available for training, it is tough to match the performance of a classical barcode reading algorithm with standard AI approaches.
The difference between reading the text and reading the barcode is structural. Text is fundamentally unstructured, while barcodes are designed to be structured for readability using specifically engineered decoding algorithms.
All these algorithms for reading have rules which are, in many cases, not so hard to implement. On the other hand, CNNs would have a hard time and need vast amounts of data to learn those rules.
Also, many barcode symbologies (EAN included) use error detection or correction algorithms (like check-digits), which can be integrated into the error-recovery loop to increase the performance of the scanning further.
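To illustrate the check-digit idea, here is a small sketch of the standard EAN-13 checksum: the first twelve digits are weighted 1, 3, 1, 3, ... and the thirteenth digit brings the total up to a multiple of 10. The function names are made up for the example, and the candidate-filtering helper is only an assumption about how a reader might use the check, not a description of any particular product.

```python
def ean13_check_digit(first12):
    """Compute the EAN-13 check digit for a string of 12 digits."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(ch) * (3 if i % 2 else 1)
                for i, ch in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_ean13(code):
    """Return True if `code` is 13 digits with a correct check digit."""
    return (len(code) == 13 and code.isdigit()
            and ean13_check_digit(code[:12]) == int(code[12]))


def first_valid(candidates):
    """Pick the first candidate decode that passes the check, if any."""
    for code in candidates:
        if is_valid_ean13(code):
            return code
    return None
```

A decoder that produces several plausible readings for a blurry scan can use a check like this to reject most of the wrong ones, which a network trained only on pixels would have to learn implicitly.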
So, in theory, OCR and Barcode scanning are similar problems, while in practice, there are substantial differences.
Note: I'm working at Microblink, where we do R&D in the area of barcode scanning and text recognition. When it comes to barcode scanning, we've tried basically everything in the AI repertoire to get the most out of it, and ended up using both CNNs and classical algorithms working tightly together.
We have a history of conversations between humans (any language, any vocabulary), so with a lot of spelling errors:
"hellobb do u hav skip?" => "hello baby, do you have skype?"
Before running a deep learning task against this data set (finding synonyms, etc.), I would like to fix these errors.
Is it a good idea? I've never worked with such bad quality data. Wondering if there is a "magic solution" to achieve this.
Else I plan to use:
word embeddings (word2vec) to check if good and bad words are similar
distance function between words
if wordA is less frequent than wordB, then fix(wordA) = wordB
There is no magic solution at the moment that is guaranteed to fix all the spelling errors in your text, but here are some options you can consider:
Dictionary-based approach. I found Hunspell very handy in this case. It uses language modeling and Levenshtein distance to suggest the correct spelling. It is available for many natural and programming languages. Although it is a dictionary-based approach, it often holds up well against more sophisticated approaches. It is used in many widely deployed applications, including LibreOffice, Firefox, and Chrome.
Statistical and traditional approach. Another possible solution is to develop your own statistical models, such as a language model. A language model trained on a large corpus, at character level and word level, can find many misspellings in the text. Many speech recognition systems and search engines use language modeling at their heart to fix misspellings.
Deep learning approach. If you look at NLPProgress.com, most of the state-of-the-art research uses seq2seq models to tackle the grammatical error correction problem. The main intuition behind these models is to train a neural network on pairs of sentences (erroneous and corrected) so that the network learns how to fix the errors. These approaches require quite a lot of sentence pairs to give reliable results. If the available corpora do not fit your needs, you can generate your own misspellings, e.g. by replacing some tokens in your text.
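As a very small illustration of the dictionary-plus-distance idea, the sketch below replaces an unknown word with the closest known word by Levenshtein distance, breaking ties in favour of the more frequent word. It is not how Hunspell works internally; the function names, the max_dist cutoff, and the toy frequency table are all assumptions made for the example.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def correct(word, freq, max_dist=2):
    """Return the closest known word, or `word` itself if none is close.

    `freq` maps known words to their corpus counts; among equally close
    candidates the most frequent one wins.
    """
    if word in freq or not freq:
        return word
    dist, _, best = min((levenshtein(word, known), -count, known)
                        for known, count in freq.items())
    return best if dist <= max_dist else word


# Hypothetical counts; in practice they would come from a large corpus.
FREQ = {"hello": 50, "have": 40, "you": 60, "skype": 5}
```

On chat data like yours, a pure distance rule will also "fix" slang and names that should be left alone, so it is worth reviewing a sample of the corrections before applying them to the whole data set.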
I tried to improve the results of open-source OCR software. I'm using Tesseract, because I find it still produces better results than GOCR, but with bad-quality input it has huge problems. So I tried to preprocess the image with various tools I found on the internet:
unpaper
Fred's ImageMagick Scripts: TEXTCLEANER
manually using GIMP
But I was not able to get good results with this bad test document (really just for testing, I don't need the content of this file):
http://9gag.com/gag/aBrG8w2/employee-handbook
This online service works surprisingly well with this test document:
http://www.onlineocr.net/
I'm wondering if it is possible to get similar results with Tesseract using smart preprocessing. Are the open-source OCR engines really so bad compared to commercial ones? Even Google uses Tesseract to scan documents, so I was expecting more...
Tesseract's precision in recognition is a little bit lower than the precision of the best commercial one (Abbyy FineReader), but it's more flexible because of its nature.
This flexibility sometimes entails some preprocessing, because Tesseract cannot handle every situation on its own.
Actually, Google uses it because Google is its main sponsor!
The first thing you could do is enlarge the text so that the characters are at least 20 pixels wide. Because Tesseract uses the main segments of the characters' borders as features, it needs larger characters than other algorithms do.
Another thing that you could try, still referring to the test document you mentioned, is to binarize your image with an adaptive thresholding method (you can find some information about that here: https://dsp.stackexchange.com/a/2504), because the illumination varies across the page. Tesseract binarizes the image internally, but this could be a case where it fails to do so (it's similar to the example in "Improving the quality of the output with Tesseract", where you can also find other useful information).
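To make the adaptive thresholding suggestion concrete, here is a minimal sketch of one common variant: each pixel is compared with the mean of a window around it, computed with an integral image. It assumes a grayscale image given as a list of rows of integers with dark text on a lighter background; the function name and the default window and offset values are arbitrary choices for illustration, not settings taken from Tesseract or the linked answer.

```python
def adaptive_threshold(gray, window=15, offset=10):
    """Return a 0/1 mask: 1 where a pixel is darker than local mean - offset."""
    rows, cols = len(gray), len(gray[0])
    # integral[r][c] holds the sum of all pixels above and left of (r, c).
    integral = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += gray[r][c]
            integral[r + 1][c + 1] = integral[r][c + 1] + row_sum

    half = window // 2
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        r0, r1 = max(0, r - half), min(rows, r + half + 1)
        for c in range(cols):
            c0, c1 = max(0, c - half), min(cols, c + half + 1)
            area = (r1 - r0) * (c1 - c0)
            total = (integral[r1][c1] - integral[r0][c1]
                     - integral[r1][c0] + integral[r0][c0])
            # Same as gray < total / area - offset, kept in integers.
            if gray[r][c] * area < total - offset * area:
                mask[r][c] = 1
    return mask
```

Unlike a single global threshold, this keeps working when one side of the page is noticeably darker than the other, because every pixel is judged against its own neighbourhood. Invert the mask if the OCR engine expects black text on white.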
I've been reading (and trying) OCR programs suggested in previous answers but I'm still without a clear answer to my problem.
I need to recognize handwritten English text. The text would be multiple lines, but each line is only one or two words long. The text comes from a different person each time. I could ask that person to provide a training file (e.g. with the alphabet and the digits 0-9), but I cannot really ask for anything much more complicated than that.
I need to integrate the recognition as part of another (Java) application but the solution doesn't need to be Java. I can just execute it from Java and get the results from a text file.
Any recommendations?
I've already tested Tesseract (bad results without training, and training looks quite complex). Java OCR looked like the perfect solution (simple training, open source and Java) but it doesn't work well even with its own examples (has anybody had a better experience?). GOCR does not seem very active.
Of course I prefer free solutions but this is not a MUST (though the problem I see with a commercial option is that I must be able to integrate it in my own app which will be offered as SaaS)
From my experience ABBYY is one of the best for handwriting recognition, even without training. (It's possibly one of the most expensive too, though...) They have an SDK for Java.
http://www.abbyy.com
With a free trial, it's definitely worth a look!
I am on the lookout for handwritten text recognition software. So far the only one giving better results than even ABBYY 11, using the same text for both, has been SimpleOCR, which is freeware for OCR but only a 14-day trial for HCR!
I know I am answering after nearly 6 years. But if anyone's still looking, try using TensorFlow. Their website has a simple example for handwritten digit recognition (MNIST). You can take this example and adapt it for handwritten alphabet recognition (you need training data for this; I used NIST Special Database 19).
Has anybody seen a GP implemented with online learning rather than the standard offline learning? I've done some stuff with genetic programs and I simply can't figure out what would be a good way to make the learning process online.
Please let me know if you have any ideas, seen any implementations, or have any references that I can look at.
Per the Wikipedia link, online learning "learns one instance at a time." The online/offline labels usually refer to how training data is fed to a supervised regression or classification algorithm. Since genetic programming is a heuristic search that uses an evaluation function to evaluate the fitness of its solutions, and not a training set with labels, those terms don't really apply.
If what you're asking is if the output of the GP algorithm (i.e. the best phenotype), can be used while it's still "searching" for better solutions, I see no reason why not, assuming it makes sense for your domain/application. Once the fitness of your GA/GP's population reaches a certain threshold, you can apply that solution to your application, and continue to run the GP, switching to a new solution when a better one becomes available.
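A rough sketch of that "deploy the best so far and keep searching" loop is below. It uses a plain mutation-only evolutionary loop over bit lists rather than a real GP with program trees, purely to keep the example short; every name in it (evolve_online, onemax, new_bits, flip_one) and the toy fitness function are invented for illustration.

```python
import random


def evolve_online(fitness, new_individual, mutate, threshold,
                  generations, pop_size=20, seed=0):
    """Yield (generation, individual, score) whenever a solution that
    meets `threshold` and beats the currently deployed one appears."""
    rng = random.Random(seed)
    population = [new_individual(rng) for _ in range(pop_size)]
    deployed_score = None
    for gen in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best, score = ranked[0], fitness(ranked[0])
        if score >= threshold and (deployed_score is None
                                   or score > deployed_score):
            deployed_score = score
            # The caller can switch to this solution while the search
            # carries on in the background.
            yield gen, best, score
        # Keep the better half, refill with mutated copies of survivors.
        survivors = ranked[:pop_size // 2]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children


# Toy problem: maximise the number of 1 bits in a 16-bit list.
def onemax(bits):
    return sum(bits)


def new_bits(rng):
    return [rng.randint(0, 1) for _ in range(16)]


def flip_one(bits, rng):
    child = list(bits)
    child[rng.randrange(len(child))] ^= 1
    return child


if __name__ == "__main__":
    for gen, best, score in evolve_online(onemax, new_bits, flip_one,
                                          threshold=10, generations=50):
        print(gen, score, best)
```

Because the function is a generator, the application consumes improvements as they arrive. If the fitness function itself depends on fresh data from the running system, the same loop re-evaluates the population against that data on every generation.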
One approach along this line is an algorithm called rtNEAT, which attempts to use a genetic algorithm to generate and update a neural network in real time.
I found a few examples by doing a Google scholar search for online Genetic Programming.
An On-Line Method to Evolve Behavior and to Control a Miniature Robot in Real Time with Genetic Programming
It actually looks like they found a way to make GP modify the machine code of the robot's control system during actual activities - pretty cool!
Those same authors went on to produce more related work, such as this improvement:
Evolution of a world model for a miniature robot using genetic programming
Hopefully their work will be enough to get you started - I don't have enough experience with genetic programming to be able to give you any specific advice.
It actually looks like they found a way to make GP modify the machine code of the robot's control system during actual activities - pretty cool!
Yes, the department at Uni Dortmund was heavily into linear GP :-)
Direct execution of GP programs vs. interpreted code has some advantages, though these days you'd probably rather go with dynamic languages such as Java, C# or Obj-C, which allow you to create classes/methods at runtime while still benefiting from a runtime rather than running on the raw CPU.
The online-learning approach doesn't seem like anything absolutely novel or different from 'classic GP' to me.
From my understanding it's just a case of extending the set of training/fitness/test cases during runtime?
Cheers,
Jay