I'm using tesseract-ocr-3.01 to scan many forms. The forms all follow a template, so I already know where the regions/rectangles of text are.
Is there a way to pass those regions to tesseract when using the command-line tool?
I found the answer, thanks to this thread.
It seems that tesseract supports the uzn format (used in the UNLV tests).
From the thread:
Calling tesseract with the parameter "-psm 4" and giving the uzn file the same base name as the image seems to work.
Example: If we have C:\input.tif and C:\input.uzn, we do this:
tesseract -psm 4 C:\input.tif C:\output
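For reference, the .uzn file is just a plain-text list of zones, one per line. As far as I can tell the fields are left, top, width, height (in pixels) followed by a label, so C:\input.uzn might look something like this (coordinates are made up):
10 50 400 80 Text
10 200 400 300 Text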
This may not be an optimal answer, but here goes:
I'm not sure whether the command-line tool has options to specify text-regions.
What you can do is use a Tesseract wrapper on another platform (EmguCV has Tesseract built in). You take the scanned image, crop out the text regions, and give them to Tesseract one at a time. This way you'll also avoid any inaccuracies in Tesseract's page-layout analysis.
e.g.
// load the scanned form as a grayscale image
Image<Gray, Byte> scannedImage = new Image<Gray, Byte>(path_to_scanned_image);

// assuming you know a text region, e.g. a 100x20 rectangle at the top-left corner
Image<Gray, Byte> textRegion = new Image<Gray, Byte>(100, 20);
scannedImage.ROI = new Rectangle(0, 0, 100, 20);
scannedImage.CopyTo(textRegion);   // copies only the current ROI
ocr.Recognize(textRegion);
After having tried all the solutions I could find on GitHub, I couldn't find a way to convert a custom-trained YOLOv3 model from Darknet to a TensorFlow format (Keras, TensorFlow, TFLite).
By custom I mean:
I changed the number of classes to 1
I set the image size to 576x576
I set the number of channels to 1 (grayscale images)
So far I am happy with the results on Darknet, but for my application I need TFLite and I can't find a working conversion method that suits my case.
Has anyone succeeded in doing something similar?
Thank you.
Do you have the resulting .weights file for your custom model?
If so, the following project by peace195 may help:
https://github.com/peace195/tensorflow-lite-YOLOv3
EDIT:
In the above link, use the convert_weights_pb.py file to convert your .weights file to a .pb file.
Then use the .pb file as a saved model and convert it to a .tflite model with the following command.
tflite_convert --saved_model_dir=saved_model/ --output_file yolo_v3.tflite --saved_model_signature_key='predict'
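If you prefer to do that last step from Python instead of the tflite_convert CLI, here is a minimal sketch (assuming a TensorFlow 2.x install; the paths and signature key simply mirror the command above):

import tensorflow as tf

# load the SavedModel produced in the previous step and convert it to TFLite
converter = tf.lite.TFLiteConverter.from_saved_model(
    "saved_model/", signature_keys=["predict"])
tflite_model = converter.convert()

# write the flatbuffer to disk
with open("yolo_v3.tflite", "wb") as f:
    f.write(tflite_model)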
Thanks Anton Menshov for your suggestion on improving the answer.
This is the simplest and easiest repo. The author has done a wonderful job and it works well with yolov3, yolov3-tiny and yolov4. Please don't forget to change coco.names under classes if you are training on custom classes.
Git link for the code
I'm new to Caffe and, after successfully running an example, I'm trying to use my own data. However, whether I try to write my data into the lmdb data format or use the solver directly, I get the following error:
E0201 14:26:00.450629 13235 io.cpp:80] Could not open or find file ~/Documents/ChessgameCNN/input/train/731_1.bmp 731
The path is right, but it's weird that the label 731 is part of this error message. That implies that it's reading it as part of the path instead of as a label. The text file looks like this:
~/Documents/ChessgameCNN/input/train/731_1.bmp 731
Is it because the labels are too high? Or maybe because the labels don't start at 0? I've searched for this error and all I found were examples with relatively few labels, about 1-5, but I have about 4096 classes, not all of which actually have examples in the training data. Maybe this is a problem too (certainly for learning, at least, but I didn't expect it to produce an actual error message). Usually, the label does not seem to be part of this error message.
For the creation of the lmdb file, I use the create_imagenet.sh from the caffe examples. For solving, I use:
~/caffe/build/tools/caffe train --solver ~/Documents/ChessgameCNN/caffe_models/caffe_model_1/solver_1.prototxt 2>&1 | tee ~/Documents/ChessgameCNN/caffe_models/caffe_model_1/model_1_train.log
I tried different image data types, too: PNG, JPEG and BMP. So this isn't the culprit, either.
If it is really because of my choice of labels, what would be a viable workaround for this problem?
Thanks a lot for your help!
I had the same issue. Check that the lines in your text file don't have spaces at the end.
I was facing a similar problem with convert_imageset. I solved it by just removing the trailing spaces in the text file that contains the labels.
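If the file is long, a quick way to strip the trailing whitespace is a small Python sketch like the following (train.txt is just a placeholder for your own label file):

# read the label file and strip trailing whitespace from every line
with open("train.txt") as f:
    lines = [line.rstrip() for line in f]

# write the cleaned lines back
with open("train.txt", "w") as f:
    f.write("\n".join(lines) + "\n")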
For a customer I want to teach Tesseract to recognize checkboxes as a word. It worked fine when Tesseract had to recognize an empty checkbox.
This command, in combination with this tutorial, worked like a charm and Tesseract was able to find empty checkboxes and interpret them as "[_]":
tesseract -psm 10 deu2.unchecked1.exp0.JPG deu2.unchecked1.exp0.box nobatch box.train
Here is my command to successful analyze a document:
tesseract test.png test -l deu1+deu2
Then I tried to train a checked checkbox, but got this error:
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
FAIL!
APPLY_BOXES: boxfile line 1/[X] ((60,30),(314,293)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
Boxes read from boxfile: 1
Boxes failed resegmentation: 1
Found 0 good blobs.
Generated training data for 0 words
Does anyone have an idea how to teach Tesseract to recognize checked checkboxes as well?
Thank you in advance!
After many more tries I figured out that it is of course possible to teach Tesseract different kinds of letters. But as far as I know today, there is no way to teach Tesseract a sign that does not conform to certain "visual rules" of a letter. For example: a letter is always one connected stroke of ink, at most a combination of ink and something detached from it (for example: i, ä, ö, ü). The problem here is that nothing about a checked checkbox follows these rules (one object inside another object). This confuses Tesseract and leads to crashes.
I'm thoroughly confused about how to read/write into igraph's Python module. What I'm trying right now is:
g = igraph.read("football.gml")
g.write_svg("football.svg", g.layout_circle() )
I have a football.gml file, and this code runs and writes a file called football.svg. But when I try to open it using InkScape, I get an error message saying the file cannot be loaded. Is this the correct way to write the code? What could be going wrong?
The write_svg function is sort of deprecated; it was meant only as a quick hack to allow SVG exports from igraph even if you don't have the Cairo module for Python. It has not been maintained for a while so it could be the case that you hit a bug.
If you have the Cairo module for Python (on most Linux systems, you can simply install it from an appropriate package), you can simply do this:
igraph.plot(g, "football.svg", layout="circle")
This would use Cairo's SVG renderer, which is likely to generate the correct result. If you cannot install the Cairo module for Python for some reason, please file a bug report on https://bugs.launchpad.net/igraph so we can look into this.
(Even better, please file a bug report even if you managed to make it work using igraph.plot).
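Putting the pieces together, a minimal script for your case (assuming the Cairo bindings are installed) would look like this:

import igraph

g = igraph.read("football.gml")
# use Cairo's SVG backend instead of write_svg
igraph.plot(g, "football.svg", layout=g.layout_circle())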
A couple of years late, but maybe this will be helpful to somebody.
The write_svg function seems not to escape ampersands correctly. Texas A&M has an ampersand in its label -- InkScape is probably confused because it sees a bare & rather than &amp;. Just open football.svg in a text editor to fix that, and you should be golden!
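If you'd rather not fix it by hand, here's a rough sketch of the same fix in Python (the regex and the hard-coded file name are my own assumptions):

import re

with open("football.svg") as f:
    svg = f.read()

# escape bare ampersands that are not already part of an XML entity
svg = re.sub(r"&(?!amp;|lt;|gt;|quot;|apos;|#\d+;)", "&amp;", svg)

with open("football.svg", "w") as f:
    f.write(svg)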
I was simply wondering how file watching algorithms are implemented. For instance, let's say I want to apply a filter (i.e., search/replace a string) to a file every time it is modified, what technique should I use? Obviously, I could run an infinite loop that would check every file in a directory for modifications, but it might not be very efficient. Is there any way to get notified directly by the OS instead? For the sake of demonstration, let's assume a *nix OS and whatever language (C/Ruby/Python/Java/etc.).
Linux has inotify, and judging from the Wikipedia links, Windows has something similar called 'Directory Management'. Without something like inotify, you can only poll.
In Linux there is the Inotify subsystem which will alert you to file modification.
Java SE 7 will have file change notification as part of the NIO.2 updates.
There are wrappers to inotify that make it easy to use from high-level languages. For example, in ruby you can do the following with rb-inotify:
notifier = INotify::Notifier.new

# tell it what to watch
notifier.watch("path/to/foo.txt", :modify) { puts "foo.txt was modified!" }
notifier.watch("path/to/bar", :moved_to, :create) do |event|
  puts "#{event.name} is now in path/to/bar!"
end

# start the blocking event loop
notifier.run
There's also pyinotify, but I was unable to come up with an example as concise as the one above.
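For what it's worth, a rough pyinotify sketch (untested, and noticeably more verbose than the rb-inotify version above) might look like this:

import pyinotify

class Handler(pyinotify.ProcessEvent):
    def process_IN_MODIFY(self, event):
        print("%s was modified!" % event.pathname)

wm = pyinotify.WatchManager()
notifier = pyinotify.Notifier(wm, Handler())
wm.add_watch("path/to/foo.txt", pyinotify.IN_MODIFY)
notifier.loop()  # blocks and dispatches events as they arrive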