OCR: weighted Levenshtein distance - ocr

I'm trying to create an optical character recognition system with the dictionary.
In fact I don't have an implemented dictionary yet=)
I've heard that there are simple metrics based on Levenstein distance which take in account different distance between different symbols. E.g. 'N' and 'H' are very close to each other and d("THEATRE", "TNEATRE") should be less than d("THEATRE", "TOEATRE") which is impossible using basic Levenstein distance.
Could you help me locating such metric, please.

This might be what you are looking for: http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance (and kindly some working code is included in the link)
Update:
http://nlp.stanford.edu/IR-book/html/htmledition/edit-distance-1.html

Here is an example (C#) where weight of "replace character" operation depends on distance between character codes:
static double WeightedLevenshtein(string b1, string b2) {
b1 = b1.ToUpper();
b2 = b2.ToUpper();
double[,] matrix = new double[b1.Length + 1, b2.Length + 1];
for (int i = 1; i <= b1.Length; i++) {
matrix[i, 0] = i;
}
for (int i = 1; i <= b2.Length; i++) {
matrix[0, i] = i;
}
for (int i = 1; i <= b1.Length; i++) {
for (int j = 1; j <= b2.Length; j++) {
double distance_replace = matrix[(i - 1), (j - 1)];
if (b1[i - 1] != b2[j - 1]) {
// Cost of replace
distance_replace += Math.Abs((float)(b1[i - 1]) - b2[j - 1]) / ('Z'-'A');
}
// Cost of remove = 1
double distance_remove = matrix[(i - 1), j] + 1;
// Cost of add = 1
double distance_add = matrix[i, (j - 1)] + 1;
matrix[i, j] = Math.Min(distance_replace,
Math.Min(distance_add, distance_remove));
}
}
return matrix[b1.Length, b2.Length] ;
}
You see how it works here: http://ideone.com/RblFK

A few years too late but the following python package (with which I am NOT affiliated) allows for arbitrary weighting of all the Levenshtein edit operations and ASCII character mappings etc.
https://github.com/infoscout/weighted-levenshtein
pip install weighted-levenshtein
Also this one (also not affiliated):
https://github.com/luozhouyang/python-string-similarity

I've recently created a python package that does exactly that https://github.com/zas97/ocr_weighted_levenshtein.
In my Weigthed-Levenshtein implementation the distance between "THEATRE" and "TNEATRE" is 1.3 while the distance between "THEATRE" and "TOEATRE" is 1.42.
Other exemples are the d("O", "0") is 0.06 and d("e","c") is 0.57.
This distances have been calculated by running multiple ocrs in a synthetic dataset and doing statistics on the most common ocr errors. I hope it helps someone =)

Related

Interpolate function in Google Scripts

I am using google sheets to log my water usage and want to find out how much I have used in a given period by using linear interpolation. I would really like to use a function similar to forecast, but instead of using the entire range to interpolate, just use the nearest points above and below.
I am keen to try and code it myself (have done lots of VBA) but don't know really where to start with google scripts. Does anyone have a starting point for me?
The process I would take is:
Interpolate(x, data_y, data_x)
// Check value is within range of known values (could expand function to use closet two values and extrapolate...)
(is X within XMin and XMax)
// Find closet X value below (X1), corresponding Y1
// Find closet X value above (X2), corresponding Y2
Return Y = Y1+(X-X1)*((Y2-Y1)/(X2-X1))
function interpolation(x_range, y_range, x_value) {
var xValue, yValue, xDiff, yDiff, xInt, index, check = 0;
if(x_value > Math.max.apply(Math, x_range) || x_value < Math.min.apply(Math, x_range)) {
throw "value can't be interpolated !!";
return;
}
for(var i = 0, iLen = x_range.length; i < iLen-1; i++) {
if((x_range[i][0] <= x_value && x_range[i+1][0]> x_value) || (x_range[i][0] >= x_value && x_range[i+1][0] < x_value)){
yValue = y_range[i][0];
xDiff = x_range[i+1][0] - x_range[i][0];
yDiff = y_range[i+1][0] - yValue;
xInt = x_value - x_range[i][0];
return (xInt * (yDiff / xDiff)) + yValue;
}
}
return y_range[x_range.length-1][0];
}

Tackling imbalanced class members in Caffe: weight contribution of each instance to loss value

I have a highly imbalanced data, I know that some users suggesting using InfoGainLoss loss function, however, I am facing few errors when I tried to add this function to Caffe layers.
I have the following questions, I really appreciate if someone guides me:
How can I add this layer to Caffe? Does anyone know any sources/ codes of this layer?
I want to apply it for image segmentation and the proportion of some classes varies. How can I create the H matrix (a stack of weights) for my images? And how infoGainLoss layer can read a specific weight matrix (H) related to that specific image?
After adding the cpp and cu version of InforGainLoss layer to caffe, should I remake Caffe?
I am sorry for few question, but all are my concern and related to each other. I will be thankful to get some help and support.
Thanks
1.If you copy from current infogain_loss_layer.cpp you can easily adapt. For forward pass change line 59-66 like:
// assuming num = batch size, dim = label size, image_dim = image height * width
Dtype loss = 0;
for (int i = 0; i < num; ++i) {
for(int k = 0; k < image_dim; k++) {
int label = static_cast<int>(bottom_label[i*image_dim+k]);
for (int j = 0; j < dim; ++j) {
Dtype prob = std::max(bottom_data[i *image_dim *dim+ k * dim + j], Dtype(kLOG_THRESHOLD));
loss -= infogain_mat[label * dim + j] * log(prob);
}
}
}
Similarly for backward pass you could change line 95-101 like:
for (int i = 0; i < num; ++i) {
for(int k = 0; k < image_dim; k++) {
const int label = static_cast<int>(bottom_label[i*image_dim+k]);
for (int j = 0; j < dim; ++j) {
Dtype prob = std::max(bottom_data[i *image_dim *dim+ k * dim + j], Dtype(kLOG_THRESHOLD));
bottom_diff[i *image_dim *dim+ k * dim + j] = scale * infogain_mat[label * dim + j] / prob;
}
}
}
This is kind of naive. I don't seem to find any option for optimization. You will also need to change some setup code in reshape.
2.In this PR suggestion is that for diagonal entries in H put min_count/|i| where |i| is the number of samples has label i. Everything else as 0. Also see this . As for loading the weight matrix H is fixed for all input. You can load it as lmdb file or in other ways.
3.Yes you will need to rebuild.
Update:
As Shai pointed out the infogain pull for this has already been approved this week. So current version of caffe supports pixelwise infogain loss.

Need help understanding the Caffe code for SigmoidCrossEntropyLossLayer for multi-label loss

I need help in understanding the Caffe function, SigmoidCrossEntropyLossLayer, which is the cross-entropy error with logistic activation.
Basically, the cross-entropy error for a single example with N independent targets is denoted as:
- sum-over-N( t[i] * log(x[i]) + (1 - t[i]) * log(1 - x[i] )
where t is the target, 0 or 1, and x is the output, indexed by i. x, of course goes through a logistic activation.
An algebraic trick for quicker cross-entropy calculation reduces the computation to:
-t[i] * x[i] + log(1 + exp(x[i]))
and you can verify that from Section 3 here.
The question is, how is the above translated to the loss calculating code below:
loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
Thank you.
The function is reproduced below for convenience.
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Forward_cpu(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
// The forward pass computes the sigmoid outputs.
sigmoid_bottom_vec_[0] = bottom[0];
sigmoid_layer_->Forward(sigmoid_bottom_vec_, sigmoid_top_vec_);
// Compute the loss (negative log likelihood)
// Stable version of loss computation from input data
const Dtype* input_data = bottom[0]->cpu_data();
const Dtype* target = bottom[1]->cpu_data();
int valid_count = 0;
Dtype loss = 0;
for (int i = 0; i < bottom[0]->count(); ++i) {
const int target_value = static_cast<int>(target[i]);
if (has_ignore_label_ && target_value == ignore_label_) {
continue;
}
loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
++valid_count;
}
normalizer_ = get_normalizer(normalization_, valid_count);
top[0]->mutable_cpu_data()[0] = loss / normalizer_;
}
In the expression log(1 + exp(x[i])) you might encounter numerical instability in case x[i] is very large. To overcome this numerical instability, one scales the sigmoid function like this:
sig(x) = exp(x)/(1+exp(x))
= [exp(x)*exp(-x(x>=0))]/[(1+exp(x))*exp(-x(x>=0))]
Now, if you plug the new and stable expression for sig(x) into the loss you'll end up with the same expression as caffe is using.
Enjoy!

Number type and bitwise operations

I want to pack epoch milliseconds into 6 bytes but i have problem. Let me introduce it:
trace(t);
for (var i:int = 6; i > 0; i--) {
dataBuffer.writeByte(((t >>> 8*(i-1)) & 255));
trace(dataBuffer[dataBuffer.length - 1]);
}
Output:
1330454496254
131
254
197
68
131
254
What i'm doing wrong?
I'm just guessing but I think your t variable is getting automatically converted to an int before the bit operation takes effect. This, of course, destroys the value.
I don't think it's possible to use Number in bit operations - AS3 only supports those with int-s.
Depending on how you acquire the value in t, you may want to start with 2 int-s and then extrat the bytes from those.
The Number type is an IEEE 754 64-bit double-precision number, which is quite a different format to your normal int. The bits aren't lined up quite the same way. What you're looking for is a ByteArray representation of a normal 64-bit int type, which of course doesn't exist in ActionScript 3.
Here's a function that converts a Number object into its "int64" equivalent:
private function numberToInt64Bytes(n:Number):ByteArray
{
// Write your IEEE 754 64-bit double-precision number to a byte array.
var b:ByteArray = new ByteArray();
b.writeDouble(n);
// Get the exponent.
var e:int = ((b[0] & 0x7F) << 4) | (b[1] >> 4);
// Significant bits.
var s:int = e - 1023;
// Number of bits to shift towards the right.
var x:int = (52 - s) % 8;
// Read and write positions in the byte array.
var r:int = 8 - int((52 - s) / 8);
var w:int = 8;
// Clear the first two bytes of the sign bit and the exponent.
b[0] &= 0x80;
b[1] &= 0xF;
// Add the "hidden" fraction bit.
b[1] |= 0x10;
// Shift everything.
while (w > 1) {
if (--r > 0) {
if (w < 8)
b[w] |= b[r] << (8 - x);
b[--w] = b[r] >> x;
} else {
b[--w] = 0;
}
}
// Now you've got your 64-bit signed two's complement integer.
return b;
}
Note that it works only with integers within a certain range, and it doesn't handle values like "not a number" and infinity. It probably also fails in other cases.
Here's a usage example:
var n:Number = 1330454496254;
var bytes:ByteArray = numberToInt64Bytes(n);
trace("bytes:",
bytes[0].toString(16),
bytes[1].toString(16),
bytes[2].toString(16),
bytes[3].toString(16),
bytes[4].toString(16),
bytes[5].toString(16),
bytes[6].toString(16),
bytes[7].toString(16)
);
Output:
bytes: 0 0 1 35 c5 44 83 fe
It should be useful for serializing data in AS3 later to be read by a Java program.
Homework assignment: Write int64BytesToNumber()

1D multiple peak detection?

I am currently trying to implement basic speech recognition in AS3. I need this to be completely client side, as such I can't access powerful server-side speech recognition tools. The idea I had was to detect syllables in a word, and use that to determine the word spoken. I am aware that this will grealty limit the capacities for recognition, but I only need to recognize a few key words and I can make sure they all have a different number of syllables.
I am currently able to generate a 1D array of voice level for a spoken word, and I can clearly see, if I somehow draw it, that there are distinct peaks for the syllables in most of the cases. However, I am completely stuck as to how I would find out those peaks. I only really need the count, but I suppose that comes with finding them. At first I thought of grabbing a few maximum values and comparing them with the average of values but I had forgot about that peak that is bigger than the others and as such, all my "peaks" were located on one actual peak.
I stumbled onto some Matlab code that looks almost too short to be true, but I can't very that as I am unable to convert it to any language I know. I tried AS3 and C#. So I am wondering if you guys could start me on the right path or had any pseudo-code for peak detection?
The matlab code is pretty straightforward. I'll try to translate it to something more pseudocodeish.
It should be easy to translate to ActionScript/C#, you should try this and post follow-up questions with your code if you get stuck, this way you'll have the best learning effect.
Param: delta (defines kind of a tolerance and depends on your data, try out different values)
min = Inf (or some very high value)
max = -Inf (or some very low value)
lookformax = 1
for every datapoint d [0..maxdata] in array arr do
this = arr[d]
if this > max
max = this
maxpos = d
endif
if this < min
min = this
minpos = d
endif
if lookformax == 1
if this < max-delta
there's a maximum at position maxpos
min = this
minpos = d
lookformax = 0
endif
else
if this > min+delta
there's a minimum at position minpos
max = this
maxpos = d
lookformax = 1
endif
endif
Finding peaks and valleys of a curve is all about looking at the slope of the line. At such a location the slope is 0. As i am guessing a voice curve is very irregular, it must first be smoothed, until only significant peaks exist.
So as i see it the curve should be taken as a set of points. Groups of points should be averaged to produce a simple smooth curve. Then the difference of each point should be compared, and points not very different from each other found and those areas identified as a peak, valleys or plateau.
If anyone wants the final code in AS3, here it is:
function detectPeaks(values:Array, tolerance:int):void
{
var min:int = int.MIN_VALUE;
var max:int = int.MAX_VALUE;
var lookformax:int = 1;
var maxpos:int = 0;
var minpos:int = 0;
for(var i:int = 0; i < values.length; i++)
{
var v:int = values[i];
if (v > max)
{
max = v;
maxpos = i;
}
if (v < min)
{
min = v;
minpos = i;
}
if (lookformax == 1)
{
if (v < max - tolerance)
{
canvas.graphics.beginFill(0x00FF00);
canvas.graphics.drawCircle(maxpos % stage.stageWidth, (1 - (values[maxpos] / 100)) * stage.stageHeight, 5);
canvas.graphics.endFill();
min = v;
minpos = i;
lookformax = 0;
}
}
else
{
if (v > min + tolerance)
{
canvas.graphics.beginFill(0xFF0000);
canvas.graphics.drawCircle(minpos % stage.stageWidth, (1 - (values[minpos] / 100)) * stage.stageHeight, 5);
canvas.graphics.endFill();
max = v;
maxpos = i;
lookformax = 1;
}
}
}
}