Tesseract OCR on an image of URLs gives me 100% wrong output

When I run Tesseract on a PNG image containing only URLs, the output is 100% wrong, like this:
Jcâa\râcL7mpnmeVr
Jevuusdwvmceranr
pmmyhemnﬁr
nnnnnysaaan
ﬁmﬁasmunﬁr
Is there a way to get a better result? The image is clean and readable.
I tried GOCR, and the result is about 70% correct (which is still not good enough for me).
Is there a Linux command-line OCR tool that would give better results?

Related

pytesseract - Invalid resolution 0 dpi

I am using Tesseract v5.0 through pytesseract. I rotate the image with OpenCV and then pass it to pytesseract.image_to_osd(). Some images work with image_to_osd, but for others the program gives me the following error:
TesseractError: (1, 'Tesseract Open Source OCR Engine v5.0.0-alpha.20201127 with Leptonica Warning: Invalid resolution 0 dpi. Using 70 instead. Estimating resolution as 179 Warning. Invalid resolution 0 dpi. Using 70 instead. Too few characters. Skipping this page Error during processing.')
I am using Python 3.9.5.
Please share a solution or sample code to fix this issue.
I had been facing this error for quite a long time before I finally realized the reason.
Tesseract OSD seems to detect the correct rotation only when the image is rotated by 0, 90, 180, or 270 degrees.
If you read your image with OpenCV or Pillow before passing it in, you are likely to get the error above.
If you list Tesseract's parameters (for example with tesseract --print-parameters), you will notice one called min_characters_to_try, which is the minimum number of characters needed to run OSD. It's set to 50 by default, which might be too many for your image, so we have to reduce it.
What you can do is crop the background so that your object sits at one of the angles listed above. Then pass the image file path directly to Tesseract instead of an OpenCV/Pillow image, and reduce min_characters_to_try like this:
import pytesseract
osd = pytesseract.image_to_osd(r'D:\image.jpg', config='--psm 0 -c min_characters_to_try=5')
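pytesseract returns the OSD result as plain text made of "Key: value" lines. Below is a minimal Python sketch, assuming pytesseract and the tesseract binary are installed, that runs OSD on a file path with the lowered threshold and turns the text into a dict. The helper names run_osd and parse_osd are hypothetical, and the sample string only mimics the usual OSD layout rather than coming from a real run.

```python
from typing import Dict, Union


def run_osd(image_path: str, min_chars: int = 5) -> str:
    """Run Tesseract OSD on an image *file path* with a lowered character threshold.

    Assumes pytesseract and the tesseract binary are installed; the import is
    local so parse_osd() below can be used without them.
    """
    import pytesseract  # third-party wrapper around the tesseract CLI

    config = f"--psm 0 -c min_characters_to_try={min_chars}"
    return pytesseract.image_to_osd(image_path, config=config)


def parse_osd(osd_text: str) -> Dict[str, Union[int, float, str]]:
    """Turn the 'Key: value' lines printed by OSD into a dict."""
    result: Dict[str, Union[int, float, str]] = {}
    for line in osd_text.splitlines():
        if ":" not in line:
            continue  # ignore anything that isn't a key/value line
        key, value = (part.strip() for part in line.split(":", 1))
        # Try int first, then float; otherwise keep the string (e.g. the script name).
        for cast in (int, float):
            try:
                result[key] = cast(value)
                break
            except ValueError:
                continue
        else:
            result[key] = value
    return result


if __name__ == "__main__":
    # Hypothetical sample shaped like typical OSD output, not a real result.
    sample = (
        "Page number: 0\n"
        "Orientation in degrees: 270\n"
        "Rotate: 90\n"
        "Orientation confidence: 2.15\n"
        "Script: Latin\n"
        "Script confidence: 1.90"
    )
    print(parse_osd(sample))
```

On a real image you would call parse_osd(run_osd("page.png")), where "page.png" is a placeholder path.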

PhpStorm isn't doing code completion with a large file

I've just installed this library. PhpStorm does its usual code completion, except for the \XeroAPI\XeroPHP\Api\AccountingApi class. The \XeroAPI\XeroPHP\Api\IdentityApi class in the same folder works just fine.
The file is quite big: 2,560 KB. If I delete roughly half of the 65,000 lines from the class (it works whether it's the first half or the second half), I get my code completion back. In fact, I can delete just the last 3,000 or so lines (getting the file down to 2,499 KB) and it works.
I've also tried a quick regex find/replace to remove all the @throws PHPDoc comments. This got the file down to 2,491 KB and, hey presto, code completion works fine.
If I had to guess, I'd say PhpStorm doesn't do code completion for source files over 2.5 MB or so, but I can't find any setting for this.
Is there any way to get code completion working with this file, short of deleting parts of it (which would be restored the next time I run a Composer update anyway)?
Based on your info (especially the file size you mentioned and the fact that it starts to work after you reduce it), you have hit the maximum file size that the IDE is willing to parse and index.
Solution: configure the idea.max.intellisense.filesize option using the Help | Edit Custom Properties command. By default it has a value of 2500 (size in KB). Set it to 3000 or so (to cover your file size) and restart the IDE (it reads and applies settings from the idea.properties file only on startup).
idea.max.intellisense.filesize=3000
P.S. Do not set that value too high, as it may cause other performance issues.
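To pick a value that covers the files you actually have, you can list the ones over the current limit. Below is a minimal Python sketch, assuming the limit is counted in 1024-byte KB and that your Composer dependencies live in a vendor directory (both assumptions); oversized_files is a hypothetical helper, not part of any JetBrains tooling.

```python
import os
import sys
from typing import Iterator, Tuple

# idea.max.intellisense.filesize is expressed in KB; 2500 is the default mentioned above.
DEFAULT_LIMIT_KB = 2500


def oversized_files(root: str, limit_kb: int = DEFAULT_LIMIT_KB,
                    suffix: str = ".php") -> Iterator[Tuple[str, float]]:
    """Yield (path, size_in_kb) for files under root that exceed limit_kb.

    Assumes 1 KB = 1024 bytes.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            size_kb = os.path.getsize(path) / 1024
            if size_kb > limit_kb:
                yield path, size_kb


if __name__ == "__main__":
    # Hypothetical default: Composer's usual vendor directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "vendor"
    largest = 0.0
    for path, size_kb in oversized_files(root):
        print(f"{size_kb:,.0f} KB  {path}")
        largest = max(largest, size_kb)
    if largest:
        # Arbitrary 100 KB margin above the largest file found.
        print(f"Suggested: idea.max.intellisense.filesize={int(largest) + 100}")
```

Keeping the value just above your largest file, rather than much higher, is in line with the P.S. about performance.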

caffe could not open or find file

I'm new to Caffe, and after successfully running an example I'm trying to use my own data. However, when I try either to write my data into the LMDB format or to use the solver directly, I get the same error in both cases:
E0201 14:26:00.450629 13235 io.cpp:80] Could not open or find file ~/Documents/ChessgameCNN/input/train/731_1.bmp 731
The path is right, but it's weird that the label 731 is part of this error message. That implies that it's reading it as part of the path instead of as a label. The text file looks like this:
~/Documents/ChessgameCNN/input/train/731_1.bmp 731
Is it because the labels are too high? Or maybe because the labels don't start with 0? I've searched for this error, and all I found were examples with relatively few labels (about 1-5), but I have about 4096 classes, not all of which actually have examples in the training data. Maybe this is a problem, too (certainly for learning, but I didn't expect it to produce an actual error message). Usually, the label does not seem to be part of this error message.
For the creation of the lmdb file, I use the create_imagenet.sh from the caffe examples. For solving, I use:
~/caffe/build/tools/caffe train --solver ~/Documents/ChessgameCNN/caffe_models/caffe_model_1/solver_1.prototxt 2>&1 | tee ~/Documents/ChessgameCNN/caffe_models/caffe_model_1/model_1_train.log
I tried different image data types, too: PNG, JPEG and BMP. So this isn't the culprit, either.
If it is really because of my choice of labels, what would be a viable workaround for this problem?
Thanks a lot for your help!
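I had the same issue. Check that the lines in your text file don't end with trailing spaces. This would also explain why the label shows up in the error: the tools appear to split each line at its last space, so with a trailing space the label comes out empty and the whole "path label" string is treated as the file name.
I was facing a similar problem with convert_imageset. I solved it just by removing the trailing spaces from the text file that contains the labels.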
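If the listing file is generated by a script, it is easy to clean it before running create_imagenet.sh or the solver. Below is a minimal Python sketch that strips trailing whitespace from every line and reports lines that don't look like "path label". It splits at the last space, which is assumed to mirror how the Caffe tools read the file; clean_listing and clean_listing_file are hypothetical helpers.

```python
from typing import List, Tuple


def clean_listing(lines: List[str]) -> Tuple[List[str], List[int]]:
    """Strip trailing whitespace and report lines that don't end in an integer label."""
    cleaned: List[str] = []
    bad: List[int] = []  # 1-based line numbers of suspicious lines
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip()  # drops trailing spaces, tabs, and '\r' from Windows line endings
        if not line:
            continue  # skip blank lines entirely
        path, sep, label = line.rpartition(" ")  # split at the LAST space
        if not sep or not path or not label.isdigit():
            bad.append(lineno)
        cleaned.append(line)
    return cleaned, bad


def clean_listing_file(src: str, dst: str) -> List[int]:
    """Write a cleaned copy of src to dst and return the suspicious line numbers."""
    with open(src, encoding="utf-8") as f:
        cleaned, bad = clean_listing(f.readlines())
    with open(dst, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in cleaned)
    return bad
```

For example, clean_listing_file("train.txt", "train_clean.txt") (placeholder file names) writes the cleaned copy, and any returned line numbers are worth checking by hand.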

Octave contourf() Not Coloring in the Line

I'm having trouble filling in my curves using contourf() in Octave. I'm running Octave 3.6.4 on Mac OS X 10.8.5.
When I use contour(x,fp,data3) I get the following, which is correct:
[Contour plot of data, not filled]
However, when I try contourf(x,fp,data3) to fill in the gaps, I get this monstrosity:
[Contourf plot of data, filled, but not correct]
What can I do to fix this? I've read the contourf() documentation and can't see anything there that I'm missing. Any advice would be helpful.
Thanks!
P.S. Here's a link to a smaller version of the data file: https://www.dropbox.com/s/lmvdzi7l42tasr8/Ch942.csv?dl=0. The whole file is huge, so this represents the first few lines but still shows the problem when plotted in Octave.
Unfortunately I don't have enough reputation points to post more than two links, so I've deleted the contour() plot that looks right, but isn't filled in. Sorry.

Not seeing the entire output after the script runs

I'm not sure how to word this, but I'm not getting the complete history after I run a script. I have some print statements, and I see the results print out as the script runs. But when I go back and scroll up to take a closer look at the output, the earlier results are gone; in other words, the first lines are no longer displayed.
I have the following statement in my code, so I expected it to display all the lines:
pd.set_option('display.max_rows', 180000)
What do I need to do to see all the output, not just the last part?
You probably have to increase the size of the scrollback buffer; pd.set_option('display.max_rows', ...) only controls how many rows pandas prints for a DataFrame, not how many lines the console keeps. If you're using the Qt console, there's a Stack Overflow answer that shows how to do that. If you're running IPython in another console, you'll have to look up how to change the size of that console's scrollback buffer.
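If raising the scrollback limit isn't an option, one workaround (not part of the answer above) is to copy everything the script prints into a file as well as the console. Below is a minimal Python sketch using contextlib.redirect_stdout; the Tee class, run_with_log, and the log file name are hypothetical.

```python
import contextlib
import sys
from typing import Callable, TextIO


class Tee:
    """File-like object that writes everything to several streams at once."""

    def __init__(self, *streams: TextIO) -> None:
        self.streams = streams

    def write(self, text: str) -> int:
        for stream in self.streams:
            stream.write(text)
        return len(text)

    def flush(self) -> None:
        for stream in self.streams:
            stream.flush()


def run_with_log(main: Callable[[], None], log_path: str = "script_output.log") -> None:
    """Call main() while copying everything it prints to log_path."""
    # sys.stdout is captured before the redirect starts, so the console still gets the output.
    with open(log_path, "w", encoding="utf-8") as log, \
            contextlib.redirect_stdout(Tee(sys.stdout, log)):
        main()


if __name__ == "__main__":
    def demo() -> None:
        for i in range(3):
            print(f"row {i}")

    run_with_log(demo)  # prints to the console and writes script_output.log
```

The log file keeps every line regardless of the console's scrollback size, so you can open it in an editor afterwards.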