Can I configure Tesseract to recognize texts from image with only specified length?

Can I configure Tesseract to recognize texts from image with only specified length? - ocr

I am working on some OCR experiments where I would like to improve the quality of Tesseract output. Basically the test subject is things like CAPTCHA, random characters on an obfuscated image. Now Tesseract isn't doing a very good job. Partially because sometimes it identifies certain character as several characters/digits separately.
I am wondering if telling Tesseract that, my specific image should always contain a text of length, say six, could improve the OCR recognition result a bit. But I am not sure if this is even supported in Tesseract.
I didn't find documentation on that point. Could someone help point out if such feature exists, and if does, what configuration parameter I can set. Thanks!

Try this example for specifying length of the text. Please set value in for loop, which length you need to recognise text.
Consider following code:
Pix *image = pixRead("/usr/src/tesseract-3.02/phototest.tif");
tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
api->Init(NULL, "eng");
api->SetImage(image);
Boxa* boxes = api->GetComponentImages(tesseract::RIL_TEXTLINE, true, NULL, NULL);
printf("Found %d textline image components.\n", boxes->n);
for (int i = 0; i < boxes->n; i++) {
BOX* box = boxaGetBox(boxes, i, L_CLONE);
api->SetRectangle(box->x, box->y, box->w, box->h);
char* ocrResult = api->GetUTF8Text();
int conf = api->MeanTextConf();
fprintf(stdout, "Box[%d]: x=%d, y=%d, w=%d, h=%d, confidence: %d, text: %s",
i, box->x, box->y, box->w, box->h, conf, ocrResult);
}
In for (int i = 0; i < boxes->n; i++), replace boxes->n by 20 if you want specified length of 20.

Related

Can Tesseract OCR recognize subscripts and superscripts?

I have problems with the general recognition of subscript and superscript in text fragments.
Example-image:
I used Tesseract 4.1.1 with the training data available under https://github.com/tesseract-ocr/tessdata_best. The numerous options had default values except:
tessedit_create_hocr = 1 (to get result as HOCR)
hocr_font_info = 1 (to get additional font infos like font size)
hocr_char_boxes = 1 (to get character-based result)
The language was set to eng. Neither with page segmentation mode 3 (PSM_AUTO_OSD) nor 11 (PSM_SPARSE_TEXT) nor 12 (PSM_SPARSE_TEXT_OSD) the subscript/superscript was recognized correctly.
In the output the sub/sup-fragments were all more or less wrong:
"SubtextSub" is recognized as "Subtextsu,"
"SuptextSub" is recognized as "Suptexts?"
"P0" is recognized as "Po"
"P100" is recognized as "P1go"
"a2+b2" is recognized as "a+b?"
Using Tesseract for OCR is there a way to ...?
optimize subscript/superscript handling
get infos about recognized subscript/superscript (in the hocr-output - ideally for each character)

Working on the quality of the image as suggested in other questions/answers to this topic didn't really change anything.
Following these 2 links from the tesseract-google-newsgroup at first it really seemed to be a question of training:
link1 and link2.
But after doing some experiments I found out, that the used OEM_DEFAULT-OCR engine mode just doesn't bring up the needed information. I found a partial solution to the problem. Partial, because I now get most infos about sub/sup and also the recognized characters are right in most cases, but not for all characters.
Using the OEM_TESSERACT_ONLY-OCR engine mode (=the legacy mode) and some API methods provided by Tess4J I came up with the following java test class:
public class SubSupEvaluator {
public void determineSubSupCharacters(BufferedImage image) {
//1. initialize Tesseract and set image infos
TessBaseAPI handle = TessAPI1.TessBaseAPICreate();
try {
int bpp = image.getColorModel().getPixelSize();
int bytespp = bpp / 8;
int bytespl = (int) Math.ceil(image.getWidth() * bpp / 8.0);
TessBaseAPIInit2(handle, new File("./tessdata/").getAbsolutePath(), "eng", TessOcrEngineMode.OEM_TESSERACT_ONLY);
TessBaseAPISetPageSegMode(handle, TessPageSegMode.PSM_AUTO_OSD);
TessBaseAPISetImage(handle, ImageIOHelper.convertImageData(image), image.getWidth(), image.getHeight(), bytespp, bytespl);
//2. start actual OCR run
TessBaseAPIRecognize(handle, null);
//3. iterate over the result character-wise
TessResultIterator ri = TessBaseAPIGetIterator(handle);
TessPageIterator pi = TessResultIteratorGetPageIterator(ri);
TessPageIteratorBegin(pi);
do {
//determine character
Pointer ptr = TessResultIteratorGetUTF8Text(ri, TessPageIteratorLevel.RIL_SYMBOL);
String character = ptr.getString(0);
TessDeleteText(ptr); //release memory
//determine position information
IntBuffer leftB = IntBuffer.allocate(1);
IntBuffer topB = IntBuffer.allocate(1);
IntBuffer rightB = IntBuffer.allocate(1);
IntBuffer bottomB = IntBuffer.allocate(1);
TessPageIteratorBoundingBox(pi, TessPageIteratorLevel.RIL_SYMBOL, leftB, topB, rightB, bottomB);
//write info to console
System.out.println(String.format("%s - position [%d %d %d %d], subscript: %b, superscript: %b", character, leftB.get(), topB.get(),
rightB.get(), bottomB.get(), TessAPI1.TessResultIteratorSymbolIsSubscript(ri) == TessAPI1.TRUE,
TessAPI1.TessResultIteratorSymbolIsSuperscript(ri) == TessAPI1.TRUE));
} while (TessPageIteratorNext(pi, TessPageIteratorLevel.RIL_SYMBOL) == TessAPI1.TRUE);
} finally {
TessBaseAPIDelete(handle); //release memory
}
}
}
The legacy mode only works with 'normal' training data. Using the '-best' training data is bringing an error.

There is very little information on this topic.
One option to enhance sub/superscript character recognition (even if not the position itself) is by preprocessing the image, with cv2 / pil (also pillow) e.g., and then tesseract it.
See
How to detect subscript numbers in an image using OCR?
Related (but otherwise not answering the question):
https://www.mail-archive.com/tesseract-ocr#googlegroups.com/msg19434.html
https://github.com/tesseract-ocr/tesseract/blob/master/src/ccmain/superscript.cpp

what do you guys think about getting tesseract to recognize single letters?
Tesseract does not recognize single characters
I tried it with the option --psm 10
tesseract imTstg.png out5 --psm 10
but it did not seem to work. I am thinking about just running yolo to detect the single letters.

Why do I get odd 0,0 point in Octave trisurf

I am trying to draw a surface from a file on disk (shown below). But I get an odd additional point at co-ords (0,0).
The file appears to be in correct shape to me.
I draw the chart from my C# application with a call to Octave .Net. Here is the Octave part of the script:
figure (1,'name','Map');
colormap('hot');
t = dlmread('C:\Map3D.csv');
# idx = find(t(:,4) == 4.0);t2 = t(idx,:);
tx =t(:,1);ty=t(:,2);tz=t(:,3);
tri = delaunay(tx,ty);
handle = trisurf(tri,tx,ty,tz);xlabel('Floor');ylabel('HurdleF');zlabel('Sharpe');
waitfor(handle);
This script is called from my C# app, with the following , very simple, code snippet:
using (var octave = new OctaveContext())
{
octave.Execute(script, int.MaxValue);
}
Can anyone explain if my Octave script is wrong, or the way I have structured the file.
Floor,HurdleF,Sharpe,Model
1.40000000,15.00000000,-0.44,xxx1.40_Hrd_15.00
1.40000000,14.00000000,-0.49,xxx1.40_Hrd_14.00
1.40000000,13.00000000,-0.19,xxx1.40_Hrd_13.00
1.40000000,12.00000000,-0.41,xxx1.40_Hrd_12.00
1.40000000,11.00000000,0.42,xxx1.40_Hrd_11.00
1.40000000,10.00000000,0.17,xxx1.40_Hrd_10.00
1.40000000,9.00000000,0.28,xxx1.40_Hrd_9.00
1.40000000,8.00000000,0.49,xxx1.40_Hrd_8.00
1.40000000,7.00000000,0.45,xxx1.40_Hrd_7.00
1.40000000,6.00000000,0.79,xxx1.40_Hrd_6.00
1.40000000,5.00000000,0.56,xxx1.40_Hrd_5.00
1.40000000,4.00000000,1.76,xxx1.40_Hrd_4.00
1.30000000,15.00000000,-0.46,xxx1.30_Hrd_15.00
1.30000000,14.00000000,-0.55,xxx1.30_Hrd_14.00
1.30000000,13.00000000,-0.24,xxx1.30_Hrd_13.00
1.30000000,12.00000000,0.35,xxx1.30_Hrd_12.00
1.30000000,11.00000000,0.08,xxx1.30_Hrd_11.00
1.30000000,10.00000000,0.63,xxx1.30_Hrd_10.00
1.30000000,9.00000000,0.83,xxx1.30_Hrd_9.00
1.30000000,8.00000000,0.21,xxx1.30_Hrd_8.00
1.30000000,7.00000000,0.55,xxx1.30_Hrd_7.00
1.30000000,6.00000000,0.63,xxx1.30_Hrd_6.00
1.30000000,5.00000000,0.93,xxx1.30_Hrd_5.00
1.30000000,4.00000000,2.50,xxx1.30_Hrd_4.00
1.20000000,15.00000000,-0.40,xxx1.20_Hrd_15.00
1.20000000,14.00000000,-0.69,xxx1.20_Hrd_14.00
1.20000000,13.00000000,0.23,xxx1.20_Hrd_13.00
1.20000000,12.00000000,0.56,xxx1.20_Hrd_12.00
1.20000000,11.00000000,0.22,xxx1.20_Hrd_11.00
1.20000000,10.00000000,0.56,xxx1.20_Hrd_10.00
1.20000000,9.00000000,0.79,xxx1.20_Hrd_9.00
1.20000000,8.00000000,0.20,xxx1.20_Hrd_8.00
1.20000000,7.00000000,1.09,xxx1.20_Hrd_7.00
1.20000000,6.00000000,0.99,xxx1.20_Hrd_6.00
1.20000000,5.00000000,1.66,xxx1.20_Hrd_5.00
1.20000000,4.00000000,2.23,xxx1.20_Hrd_4.00
1.10000000,15.00000000,-0.31,xxx1.10_Hrd_15.00
1.10000000,14.00000000,-0.18,xxx1.10_Hrd_14.00
1.10000000,13.00000000,0.24,xxx1.10_Hrd_13.00
1.10000000,12.00000000,0.70,xxx1.10_Hrd_12.00
1.10000000,11.00000000,0.31,xxx1.10_Hrd_11.00
1.10000000,10.00000000,0.76,xxx1.10_Hrd_10.00
1.10000000,9.00000000,1.24,xxx1.10_Hrd_9.00
1.10000000,8.00000000,0.94,xxx1.10_Hrd_8.00
1.10000000,7.00000000,1.09,xxx1.10_Hrd_7.00
1.10000000,6.00000000,1.53,xxx1.10_Hrd_6.00
1.10000000,5.00000000,2.41,xxx1.10_Hrd_5.00
1.10000000,4.00000000,2.16,xxx1.10_Hrd_4.00
1.00000000,15.00000000,-0.41,xxx1.00_Hrd_15.00
1.00000000,14.00000000,-0.24,xxx1.00_Hrd_14.00
1.00000000,13.00000000,0.33,xxx1.00_Hrd_13.00
1.00000000,12.00000000,0.18,xxx1.00_Hrd_12.00
1.00000000,11.00000000,0.61,xxx1.00_Hrd_11.00
1.00000000,10.00000000,0.96,xxx1.00_Hrd_10.00
1.00000000,9.00000000,1.75,xxx1.00_Hrd_9.00
1.00000000,8.00000000,0.74,xxx1.00_Hrd_8.00
1.00000000,7.00000000,1.63,xxx1.00_Hrd_7.00
1.00000000,6.00000000,2.12,xxx1.00_Hrd_6.00
1.00000000,5.00000000,2.73,xxx1.00_Hrd_5.00
1.00000000,4.00000000,2.03,xxx1.00_Hrd_4.00
0.90000000,15.00000000,-0.42,xxx0.90_Hrd_15.00
0.90000000,14.00000000,-0.37,xxx0.90_Hrd_14.00
0.90000000,13.00000000,0.58,xxx0.90_Hrd_13.00
0.90000000,12.00000000,0.03,xxx0.90_Hrd_12.00
0.90000000,11.00000000,0.68,xxx0.90_Hrd_11.00
0.90000000,10.00000000,0.79,xxx0.90_Hrd_10.00
0.90000000,9.00000000,1.54,xxx0.90_Hrd_9.00
0.90000000,8.00000000,0.82,xxx0.90_Hrd_8.00
0.90000000,7.00000000,1.81,xxx0.90_Hrd_7.00
0.90000000,6.00000000,2.33,xxx0.90_Hrd_6.00
0.90000000,5.00000000,2.99,xxx0.90_Hrd_5.00
0.90000000,4.00000000,1.71,xxx0.90_Hrd_4.00
0.80000000,15.00000000,-0.46,xxx0.80_Hrd_15.00
0.80000000,14.00000000,-0.26,xxx0.80_Hrd_14.00
0.80000000,13.00000000,0.55,xxx0.80_Hrd_13.00
0.80000000,12.00000000,0.07,xxx0.80_Hrd_12.00
0.80000000,11.00000000,0.65,xxx0.80_Hrd_11.00
0.80000000,10.00000000,1.08,xxx0.80_Hrd_10.00
0.80000000,9.00000000,1.27,xxx0.80_Hrd_9.00
0.80000000,8.00000000,1.12,xxx0.80_Hrd_8.00
0.80000000,7.00000000,1.98,xxx0.80_Hrd_7.00
0.80000000,6.00000000,2.62,xxx0.80_Hrd_6.00
0.80000000,5.00000000,3.35,xxx0.80_Hrd_5.00
0.80000000,4.00000000,1.27,xxx0.80_Hrd_4.00
0.70000000,15.00000000,-0.56,xxx0.70_Hrd_15.00
0.70000000,14.00000000,-0.33,xxx0.70_Hrd_14.00
0.70000000,13.00000000,0.24,xxx0.70_Hrd_13.00
0.70000000,12.00000000,-0.22,xxx0.70_Hrd_12.00
0.70000000,11.00000000,0.74,xxx0.70_Hrd_11.00
0.70000000,10.00000000,1.19,xxx0.70_Hrd_10.00
0.70000000,9.00000000,1.24,xxx0.70_Hrd_9.00
0.70000000,8.00000000,1.14,xxx0.70_Hrd_8.00
0.70000000,7.00000000,2.26,xxx0.70_Hrd_7.00
0.70000000,6.00000000,2.70,xxx0.70_Hrd_6.00
0.70000000,5.00000000,3.52,xxx0.70_Hrd_5.00
0.70000000,4.00000000,1.05,xxx0.70_Hrd_4.00
0.60000000,15.00000000,-0.50,xxx0.60_Hrd_15.00
0.60000000,14.00000000,-0.60,xxx0.60_Hrd_14.00
0.60000000,13.00000000,0.11,xxx0.60_Hrd_13.00
0.60000000,12.00000000,-0.16,xxx0.60_Hrd_12.00
0.60000000,11.00000000,0.73,xxx0.60_Hrd_11.00
0.60000000,10.00000000,1.08,xxx0.60_Hrd_10.00
0.60000000,9.00000000,1.31,xxx0.60_Hrd_9.00
0.60000000,8.00000000,1.38,xxx0.60_Hrd_8.00
0.60000000,7.00000000,2.24,xxx0.60_Hrd_7.00
0.60000000,6.00000000,2.89,xxx0.60_Hrd_6.00
0.60000000,5.00000000,3.50,xxx0.60_Hrd_5.00
0.60000000,4.00000000,1.11,xxx0.60_Hrd_4.00
0.50000000,15.00000000,-0.40,xxx0.50_Hrd_15.00
0.50000000,14.00000000,-0.37,xxx0.50_Hrd_14.00
0.50000000,13.00000000,0.13,xxx0.50_Hrd_13.00
0.50000000,12.00000000,-0.11,xxx0.50_Hrd_12.00
0.50000000,11.00000000,0.61,xxx0.50_Hrd_11.00
0.50000000,10.00000000,0.92,xxx0.50_Hrd_10.00
0.50000000,9.00000000,1.41,xxx0.50_Hrd_9.00
0.50000000,8.00000000,1.39,xxx0.50_Hrd_8.00
0.50000000,7.00000000,2.19,xxx0.50_Hrd_7.00
0.50000000,6.00000000,2.80,xxx0.50_Hrd_6.00
0.50000000,5.00000000,3.41,xxx0.50_Hrd_5.00
0.50000000,4.00000000,1.05,xxx0.50_Hrd_4.00
0.40000000,15.00000000,-0.25,xxx0.40_Hrd_15.00
0.40000000,14.00000000,-0.44,xxx0.40_Hrd_14.00
0.40000000,13.00000000,0.02,xxx0.40_Hrd_13.00
0.40000000,12.00000000,0.00,xxx0.40_Hrd_12.00
0.40000000,11.00000000,0.69,xxx0.40_Hrd_11.00
0.40000000,10.00000000,0.67,xxx0.40_Hrd_10.00
0.40000000,9.00000000,1.02,xxx0.40_Hrd_9.00
0.40000000,8.00000000,1.29,xxx0.40_Hrd_8.00
0.40000000,7.00000000,2.17,xxx0.40_Hrd_7.00
0.40000000,6.00000000,2.88,xxx0.40_Hrd_6.00
0.40000000,5.00000000,3.19,xxx0.40_Hrd_5.00
0.40000000,4.00000000,0.98,xxx0.40_Hrd_4.00
0.30000000,15.00000000,-0.02,xxx0.30_Hrd_15.00
0.30000000,14.00000000,-0.36,xxx0.30_Hrd_14.00
0.30000000,13.00000000,-0.26,xxx0.30_Hrd_13.00
0.30000000,12.00000000,-0.11,xxx0.30_Hrd_12.00
0.30000000,11.00000000,0.50,xxx0.30_Hrd_11.00
0.30000000,10.00000000,0.50,xxx0.30_Hrd_10.00
0.30000000,9.00000000,1.01,xxx0.30_Hrd_9.00
0.30000000,8.00000000,1.28,xxx0.30_Hrd_8.00
0.30000000,7.00000000,2.11,xxx0.30_Hrd_7.00
0.30000000,6.00000000,2.89,xxx0.30_Hrd_6.00
0.30000000,5.00000000,3.16,xxx0.30_Hrd_5.00
0.30000000,4.00000000,0.95,xxx0.30_Hrd_4.00

What's happening
dlmread() is reading the file in as numeric data and returning a numeric matrix. It doesn't recognize the text in your header line, so it silently converts that row to all zeros. (IMHO this is a design flaw in dlmread.) Remove the header line.
How to debug this
So, you've got some zeros in your plot that you didn't expect to be there? Check for zeros in your input data:
ixZerosX = find(tx == 0)
ixZerosY = find(ty == 0)
ixZerosZ = find(tz == 0)
The semicolons are omitted intentionally there to get Octave to automatically display the results.
Better yet, since doubles are an approximate type, and the values might be close to but not actually zero, do a "near zero" search:
threshold = 0.1;
ixZerosX = find(abs(tx) < threshold)
ixZerosY = find(abs(ty) < threshold)
ixZerosZ = find(abs(tz) < threshold)

RangeError: Error #2006: The supplied index is out of bounds AS3

I've loved using this site for little tips on code, and I try to solve all the errors I can by myself. However, this one has had me stumped for days. I just can't crack it.
RangeError: Error #2006: The supplied index is out of bounds.
at flash.text::TextField/setTextFormat()
at BibleProgram_fla::MainTimeline/checkAgainstBible()
at BibleProgram_fla::MainTimeline/compileInputString()
at BibleProgram_fla::MainTimeline/spaceBuild()
function spaceBuild(event:Event):void //This program runs every frame
{
compileInputString();
}
function compileInputString():void
{
inputVerse = inputText.text; // takes text from the input field
inputVerse = inputVerse.toLowerCase();
inputVerse = inputVerse.replace(rexWhiteSpace, ""); //Removes spaces and line breaks
inputVerse = inputVerse.replace(rexPunc, ""); // Removes punctuation
inputVerse = addSpaces(inputVerse); //adds spaces back in to match the BibleVerse
inputVerse = addCaps(inputVerse); //adds capitalization to match the BibleVerse
checkAgainstBible();
}
function checkAgainstBible()
{
outputText.text = inputVerse; // sets output text to be formatted to show which letters are wrong
for(var n:Number = 0; n < inputText.length; n++)
{
var specLetter:String = inputVerse.charAt(n);
if(specLetter != bibleVerse.charAt(n))
{
outputText.setTextFormat(red, n); // sets all of the wrong letters to red
}
}
}
Whenever I run the program and type a string longer than the BibleVerse, it returns the error, but I cannot figure out how to fix it.
I hope I provided enough information for you to help me. If you need more code or something, please ask!
Thanks in advance!!

Well, you would get that error if n is greater than the number of characters in the outputText when it sets its format color to red, and it looks like your outputText's characters are extended or shortened when you make it equal to your inputVerse because inputVerse had all the regex operations that i can't seen done to it. So most likely these operations are shortening the characters and so outputText.text is shorter than it should be and when it loops over the inputText.length, when it gets to the end of the outputText, n goes past its character length and so you get that error (That is what the error is - you are attempting to access something that is not there). So the way i see it is (using example made up strings);
// Pseudo code...
inputVerse=inputText.text; // (lets say its "Thee ")
// inputVerse and inputText.text now both have 5 characters
inputVerse=lotsOfOperations(inputVerse);
// inputVerse now only has 4 characters (got rid of the " " at the end)
outputText.text=inputVerse;
// outputText.text now has the new 4 character
for(var n:Number = 0; n < inputText.length; n++)
// loops through inputText.length (so it loops 5 times)
outputText.setTextFormat(red, n);
// if n=4 (since n starts at 0) from the inputText.length, then when it access
//outputText.setTextFormat(red,n) it is accessing a character of outputText.text
//that is at the end and not there. outputText.text is too short for the loop.
So, your problem is that your operations to inputVerse are making it too short to compare to the other strings, I don't know your other code so I can't say whats wrong, but this is why you are getting the error. Please comment if you have any questions or to notify me of what I am missing.

Character confidence for Tesseract 3.02 using config file

How would I get the % confidence per character detected?
By searching around I found that you should set save_blob_choices to T.
So I added that to as a line in the hocr config file in tessdata/configs and called tesseract with it.
This is all I'm getting in the generated html file:
<span class='ocr_line' id='line_1' title="bbox 0 0 50 17"><span class='ocrx_word' id='word_1' title="bbox 3 2 45 15"><strong>31,835</strong></span>
As you can see there isn't any confidence annotations not even per word.
I don't have visual studio so I'm not able to make any code changes. But I'm also open to answers describing code changes as well as how I would compile the code without VS.

Here is the sample code of getting confidence of each word.
You can even replace RIL_WORD with RIL_SYMBOL to get confidence of each character.
mTess.Recognize(0);
tesseract::ResultIterator* ri = mTess.GetIterator();
if(ri != 0)
{
do
{
const char* word = ri->GetUTF8Text(tesseract::RIL_WORD);
if(word != 0 )
{
float conf = ri->Confidence(tesseract::RIL_WORD);
printf(" word:%s, confidence: %f", word, conf );
}
delete[] word;
} while((ri->Next(tesseract::RIL_WORD)));
delete ri;
}

You will have to write a program to do this. Take a look at the ResultIterator API example at Tesseract site. For your case, be sure to set save_blob_choices variable and iterate at RIL_SYMBOL level.

Data from TLF TextLine

i have some problem with utilizing TLF, i need to parse through the text and get x and y for each character inside the textfield. This is what i have so far...
Getting every TextLine from the TextFlow:
if (textflow.flowComposer) {
for (var i:int = 0; i < textflow.flowComposer.numLines; i++) {
var flowLine:TextFlowLine = textflow.flowComposer.findLineAtPosition(i);
var textLine:TextLine = flowLine.getTextLine(true);
}
}
Getting every "atom" for the TextLine:
var charPosition:int = textLine.textBlockBeginIndex;
while (charPosition < textLine.textBlockBeginIndex + textLine.rawTextLength) {
var atomIndex:int = textLine.getAtomIndexAtCharIndex(charPosition);
textLine.getAtomBounds(atomIndex);
charPosition = textLine.getAtomTextBlockEndIndex(atomIndex);
}
This works for getting the bounding for each character but i still need some more data like what character is it and what font-size, font does it have? When doing a textLine.dump(); i think im getting this data but not the character, i get something called gid witch seems to point to the character in use but i don't know how to get exactly what character that is. Anny ideas?

Solved my problem with the help of Jin-Huang over at the adobe forum for TLF. I haven't tried it to full extent yet, but seems to work for now.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Can I configure Tesseract to recognize texts from image with only specified length? - ocr

Related

Can Tesseract OCR recognize subscripts and superscripts?

Why do I get odd 0,0 point in Octave trisurf

RangeError: Error #2006: The supplied index is out of bounds AS3

Character confidence for Tesseract 3.02 using config file

Data from TLF TextLine

Categories

Resources