How are numbers detected in optical character recognition?

I am interested in understanding how optical character recognition works on numbers in particular, in an attempt to make my own. What is the logic behind deciding that a shape is a given number, i.e. that it meets certain requirements? Are there any resources out there that describe the algorithms used to detect numbers? I imagine it largely comes down to comparing the locations of non-white pixels?
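To make that naive pixel-comparison idea concrete, here is a toy template-matching sketch in Python (everything in it, including the threshold and the assumption of size-normalized inputs, is hypothetical; production OCR engines extract invariant features instead, as the related threads below discuss):

    import numpy as np

    def binarize(img, threshold=128):
        """Grayscale array -> 0/1 array; 'ink' (dark) pixels become 1."""
        return (np.asarray(img) < threshold).astype(np.uint8)

    def match_score(glyph, template):
        """Fraction of pixel positions on which two binary images agree."""
        return float(np.mean(glyph == template))

    def classify_digit(glyph, templates):
        """Return the digit whose template best matches the glyph.

        templates: dict mapping '0'..'9' to binary arrays with the same
        shape as glyph (both assumed size-normalized already).
        """
        return max(templates, key=lambda d: match_score(glyph, templates[d]))

Raw pixel comparison like this breaks down quickly with different fonts, rotation, and noise, which is why the pipelines discussed below insert pre-processing, representation, and feature extraction steps before any matching.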

Related

OCR of Chinese characters with probability

I have a collection of Chinese character images (with single characters, not text). I need a way to OCR them, i.e. to map them to Unicode.
However, a crucial fact is that many of the images are a bit blurred, or quite small. Thus, the algorithm (or library, or online service) should not just return one Unicode value, but some kind of probability vector that estimates, for each candidate character, the probability that the given image represents it.
For example, the image (not reproduced here) could have the following distribution:
咳 95%
骇 4%
该 1%
I'd rather not train a neural network myself; since I'm sure that all OCR models are probabilistic underneath, all I'm looking for is an OCR solution that exposes those per-character probabilities.
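One concrete option: Tesseract keeps per-symbol alternatives internally, and the tesserocr Python bindings expose them through a choice iterator. A minimal sketch (the file name, language pack, and lstm_choice_mode setting are assumptions on my part; the reported confidences are Tesseract's own scores, so you would need to normalize them into a distribution yourself):

    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    with PyTessBaseAPI(lang='chi_sim') as api:        # assumes Chinese traineddata is installed
        api.SetImageFile('character.png')             # hypothetical input image
        api.SetVariable('lstm_choice_mode', '2')      # keep per-symbol alternatives
        api.Recognize()
        for r in iterate_level(api.GetIterator(), RIL.SYMBOL):
            print('best:', r.GetUTF8Text(RIL.SYMBOL), r.Confidence(RIL.SYMBOL))
            for choice in r.GetChoiceIterator():      # alternative readings of the glyph
                print('  alt:', choice.GetUTF8Text(), choice.Confidence())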

What is representation in optical character recognition?

I am learning OCR and reading this book: https://www.amazon.com/Character-Recognition-Different-Languages-Computing/dp/3319502514
The authors define 8 processes to implement OCR, executed strictly one after another (2 after 1, 3 after 2, etc.):
1. Optical scanning
2. Location segmentation
3. Pre-processing
4. Segmentation
5. Representation
6. Feature extraction
7. Recognition
8. Post-processing
This is what they write about representation (#5):
The fifth OCR component is representation. The image representation
plays one of the most important roles in any recognition system. In
the simplest case, gray level or binary images are fed to a
recognizer. However, in most of the recognition systems in order to
avoid extra complexity and to increase the accuracy of the algorithms,
a more compact and characteristic representation is required. For this
purpose, a set of features is extracted for each class that helps
distinguish it from other classes while remaining invariant to
characteristic differences within the class. The character image
representation methods are generally categorized into three major
groups: (a) global transformation and series expansion (b) statistical
representation and (c) geometrical and topological representation.
This is what they write about feature extraction (#6):
The sixth OCR component is feature extraction. The objective of
feature extraction is to capture essential characteristics of symbols.
Feature extraction is accepted as one of the most difficult problems
of pattern recognition. The most straightforward way of describing a
character is by its actual raster image. Another approach is to extract
certain features that characterize symbols but leave out the unimportant
attributes. The techniques for extraction of such features are divided
into three groups, viz. (a) distribution of points, (b) transformations
and series expansions and (c) structural analysis.
I am totally confused. I don't understand what representation is. As I understand it, after segmentation we must extract some features from the image, for example a topological structure such as the Freeman chain code, and match it against some model saved at the learning stage, i.e. do recognition. In other words: segmentation - feature extraction - recognition. I don't understand what must be done at the representation stage. Please explain.
The representation component takes the raster image produced by segmentation and converts it into a simpler format (a "representation") that preserves the characteristic properties of classes. This is in order to reduce the complexity of the recognition process later on. The Freeman chain code you mention is one such representation.
Some (most?) authors conflate representation and feature extraction into a single step, but the authors of your book have chosen to treat them separately. Changing the representation isn't mandatory, but doing so reduces the complexity, and so increases the accuracy, of the training and recognition steps.
It is from this simpler representation that features are extracted in the feature extraction step. Which features are extracted will depend upon the representation chosen. This paper - Feature Extraction Methods for Character Recognition - A Survey - describes 11 different feature extraction methods that can be applied to 4 different representations.
The extracted features are what are passed to the trainer or recognizer.
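To make the chain-code representation concrete, here's a minimal sketch (it assumes the glyph's boundary pixels have already been extracted and ordered, e.g. by Moore neighbour tracing, which is omitted here):

    # Freeman directions, (row_delta, col_delta) -> code, with rows growing
    # downward: 0=E, 1=NE, 2=N, 3=NW, 4=W, 5=SW, 6=S, 7=SE.
    DIRS = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
            (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}

    def chain_code(boundary):
        """Turn an ordered list of 8-connected boundary pixels (row, col)
        into a Freeman chain code."""
        return [DIRS[(r1 - r0, c1 - c0)]
                for (r0, c0), (r1, c1) in zip(boundary, boundary[1:])]

    # A tiny L-shaped boundary walked from the top-left corner:
    print(chain_code([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]))
    # -> [6, 6, 0, 0]  (two steps south, then two steps east)

The 8-symbol alphabet is far more compact than the raster, and features (direction histograms, for instance) are then extracted from this string rather than from raw pixels.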

Generate Huffman tree on Nvidia GPU

Is there a way to recreate a Huffman tree on the GPU using only the frequencies and symbols? Also, is it fast, or even worth the trouble?
No. The Huffman code is not uniquely determined by the frequencies of the symbols. There are many different optimal codes for the same set of frequencies.
What you want is a canonical Huffman code, which defines exactly what code to generate given only the symbols and the number of bits for each symbol. You do not need to transmit frequencies or a tree.
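A minimal sketch of canonical code assignment in Python (the bit lengths would come from an ordinary Huffman construction done once; after that, only the (symbol, length) pairs need to be stored or transmitted):

    def canonical_codes(lengths):
        """Assign canonical Huffman codes from {symbol: bit_length}.

        Symbols are sorted by (length, symbol); codes are handed out in
        increasing numeric order, shifting left whenever the length grows.
        """
        code, prev_len, codes = 0, 0, {}
        for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
            code <<= length - prev_len
            codes[sym] = format(code, '0{}b'.format(length))
            code += 1
            prev_len = length
        return codes

    print(canonical_codes({'a': 1, 'b': 2, 'c': 3, 'd': 3}))
    # -> {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

Because the assignment is fully deterministic, both sides can regenerate identical codes independently; no tree ever needs to cross the bus.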

Using HMM for offline character recognition

I have extracted features from many images of isolated characters (such as gradients, neighbouring pixel weights, and geometric properties). How can I use HMMs as a classifier trained on this data? All the literature I have read about HMMs refers to states and state transitions, but I can't connect that to features and class labelling. The example on JAHMM's home page doesn't relate to my problem.
I need to use an HMM not because it will work better than other approaches for this problem, but because of constraints on the project topic.
There was an answer to this question for online recognition, but I want the same for offline recognition, and in a little more detail.
EDIT: I partitioned each character into a grid with a fixed number of squares. Now I am planning to perform feature extraction on each grid block, thus obtaining a sequence of features for each sample by moving from left to right and top to bottom.
Would this represent an adequate "sequence" for an HMM, i.e. would an HMM be able to capture the temporal variation of the data, even though the character is not drawn from left to right and top to bottom? If not, please suggest an alternative.
Should I feed in a lot of features or start with a few? How do I know if the HMM is underperforming or if the features are bad? I am using JAHMM.
Extracting stroke features is difficult, and they can't logically be combined with grid features (since an HMM expects a sequence generated by some random process), correct?
I've usually seen neural networks used for this sort of recognition task, and a simple google search turns up many hits for neural networks in OCR, so I'll assume you are set on using HMMs (a project limitation, correct?). Regardless, the neural-network OCR literature can still offer some insight into gridding the image and obtaining image features.
Your approach for turning a grid into a sequence of observations is reasonable. In this case, be sure you do not confuse observations and states. The features you extract from one block should be collected into one observation, i.e. a feature vector. (In comparison to speech recognition, your block's feature vector is analogous to the feature vector associated with a speech phoneme.) You don't really have much information regarding the underlying states. This is the hidden aspect of HMMs, and the training process should inform the model how likely one feature vector is to follow another for a character (i.e. transition probabilities).
Since this is an off-line process, don't be concerned with the temporal aspects of how characters are actually drawn. For the purposes of your task, you've imposed a temporal order on the sequence of observations with your left-to-right, top-to-bottom block sequence. This should work fine.
As for HMM performance: choose a reasonable vector of salient features. In speech recognition, the dimensionality of a feature vector can be high (>10). (This is also where the cited literature can assist.) Set aside a percentage of the training data so that you can properly test the model. First, train the model, and then evaluate it on the training dataset. How well does it classify your characters? If it does poorly, re-evaluate the feature vector. If it does well on the training data, test the generality of the classifier by running it on the reserved test data.
As for the number of states, I would start with a heuristically derived number. Assuming your character images are scaled and normalized, perhaps something like 40%(?) of the blocks are occupied? This is a crude guess on my part since a source image was not provided. For an 8x8 grid, this would imply that about 25 blocks are occupied. We could then start with 25 states, but that's probably naive: empty blocks can convey information (meaning the number of states might increase), but some feature sets may be observed in similar states (meaning the number of states might decrease). If it were me, I would probably pick something like 20 states. Having said that: be careful not to confuse features and states. Your feature vector is a representation of things observed in a particular state. If the tests described above show your model is performing poorly, tweak the number of states up or down and try again.
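To make the observation/state distinction concrete, here's a minimal sketch of this setup using Python's hmmlearn (swapped in for JAHMM only because I can sketch it compactly; the grid size, feature dimensionality, and 20 states are the assumptions discussed above):

    import numpy as np
    from hmmlearn import hmm

    def train_class_hmm(samples, n_states=20):
        """Train one HMM for one character class.

        samples: list of arrays of shape (64, d), i.e. one d-dimensional
        feature vector (observation) per block of an 8x8 grid, ordered
        left-to-right, top-to-bottom. The states stay hidden; hmmlearn
        estimates the transition probabilities during fit().
        """
        X = np.vstack(samples)               # all sequences, concatenated
        lengths = [len(s) for s in samples]  # so fit() knows the boundaries
        model = hmm.GaussianHMM(n_components=n_states,
                                covariance_type='diag', n_iter=50)
        model.fit(X, lengths)
        return model

    def classify(sample, models):
        """Pick the class whose HMM assigns the observation sequence
        the highest log-likelihood."""
        return max(models, key=lambda c: models[c].score(sample))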
Good luck.

Cosine in floating point

I am trying to implement the cosine and sine functions in floating point (but I have no floating point hardware).
Since my processor has no floating-point hardware, nor instructions, I have already implemented algorithms for floating point multiplication, division, addition, subtraction, and square root. So those are the tools I have available to me to implement cosine and sine.
I was considering using the CORDIC method.
However, I implemented division and square root using Newton's method, and I was hoping to use the most efficient method here as well.
Please don't tell me to just go look in a book or that "papers exist"; no kidding they exist. I am looking for the names of well-known algorithms that are known to be fast and efficient.
First off, depending on your accuracy requirements, this can be considerably fussier than your earlier questions.
Now that you've been warned: you'll first want to reduce the argument modulo pi/2 (or 2pi, or pi, or pi/4) to get the input into a manageable range. This is the subtle part. For a nice discussion of the issues involved, download a copy of K.C. Ng's "Argument Reduction for Huge Arguments: Good to the Last Bit" (a simple google search on the title will get you a PDF). It's very readable, and does a great job of describing why this is tricky.
After doing that, you only need to approximate the functions on a small range around zero, which is easily done via a polynomial approximation. A Taylor series will work, though it is inefficient. A truncated Chebyshev series is easy to compute and reasonably efficient; computing the minimax approximation is better still. This is the easy part.
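A sketch of that two-stage structure in plain Python (the float operators stand in for your soft-float routines; the coefficients are plain Taylor coefficients for readability, where a real implementation would substitute minimax coefficients, and the naive reduction below is only safe for modest |x|):

    import math

    def reduce_arg(x):
        """Reduce x to r in [-pi/4, pi/4] with x = r + k*(pi/2).
        Naive: fine for modest |x|; huge arguments need Ng-style reduction."""
        k = round(x / (math.pi / 2))
        return x - k * (math.pi / 2), k % 4

    def poly_sin(r):
        """Degree-7 polynomial for sin on [-pi/4, pi/4], via Horner's rule."""
        r2 = r * r
        return r * (1 + r2 * (-1/6 + r2 * (1/120 - r2 / 5040)))

    def poly_cos(r):
        """Degree-6 polynomial for cos on [-pi/4, pi/4]."""
        r2 = r * r
        return 1 + r2 * (-1/2 + r2 * (1/24 - r2 / 720))

    def cosine(x):
        r, quadrant = reduce_arg(x)
        # cos(r + k*pi/2) cycles through cos r, -sin r, -cos r, sin r:
        return (poly_cos(r), -poly_sin(r), -poly_cos(r), poly_sin(r))[quadrant]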
I have implemented sine and cosine exactly as described, entirely in integer arithmetic, in the past (sorry, no public sources). Using hand-tuned assembly, results in the neighborhood of 100 cycles are entirely reasonable on "typical" processors. I don't know what hardware you're dealing with, but the performance will mostly be gated on how quickly your hardware can produce the high part of an integer multiply.
For various levels of precision, you can find some good approximations here:
http://www.ganssle.com/approx.htm
These have the added advantage of being deterministic in runtime, unlike the various "converging series" options, whose cost can vary wildly depending on the input value. This matters if you are doing anything real-time (games, motion control, etc.).
Since you have the basic arithmetic operations implemented, you may as well implement sine and cosine using their Taylor series expansions.
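For instance, a term-by-term evaluation needs only the multiply, divide, and add you already have, since each term is derived from the previous one (a sketch; as the other answer notes, you'd still want to range-reduce first, because the series converges slowly for large |x|):

    def taylor_sin(x, terms=10):
        """sin(x) = x - x^3/3! + x^5/5! - ..."""
        term = x       # current term, starting with x^1/1!
        total = x
        for n in range(1, terms):
            term *= -x * x / ((2 * n) * (2 * n + 1))  # next odd power
            total += term
        return total

    def taylor_cos(x, terms=10):
        """cos(x) = 1 - x^2/2! + x^4/4! - ..."""
        term = 1.0     # current term, starting with x^0/0!
        total = 1.0
        for n in range(1, terms):
            term *= -x * x / ((2 * n - 1) * (2 * n))  # next even power
            total += term
        return total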