Arabic letter width problem in OpenALPR training sheet generation


I have created number and letter tiles and organized them with a maximum height of 40px. But when I run openalpr-utils-prepcharsfortraining, I get this result:
As you can see, the Arabic letters are not rendered properly. Here are the letters in their original form, for comparison with the numbers:
When I checked the source code, I found that the tool resizes the images to a 40px height by default. How can I fix this?
Please help, thanks a lot!

I found the solution: I just added --tile_width 70 to the command and, bam, job done!
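Presumably the fix works because each tile must be wide enough for the widest glyph once that glyph has been scaled to the tile height. Below is a minimal sketch for estimating a suitable `--tile_width`. It assumes the tool scales every character image to a fixed height (the 40px default mentioned in the question) while preserving aspect ratio. The helper name and the example glyph sizes are hypothetical.

```python
import math


def required_tile_width(glyph_sizes, tile_height=40):
    """Return the smallest integer tile width (px) that holds every glyph
    after an aspect-preserving resize to `tile_height`.

    glyph_sizes: iterable of (width, height) pairs in pixels.
    Assumption: each glyph is scaled to tile_height with its aspect ratio
    kept, so a glyph wider than the tile would otherwise be squeezed.
    """
    widest = 0.0
    for width, height in glyph_sizes:
        if height <= 0:
            raise ValueError("glyph height must be positive")
        # Width the glyph will have once its height equals tile_height.
        widest = max(widest, width * tile_height / height)
    return math.ceil(widest)


if __name__ == "__main__":
    # Hypothetical sizes: a narrow digit and a wide Arabic letter.
    glyphs = [(20, 40), (65, 38)]
    print(required_tile_width(glyphs))  # 69
```

With these made-up sizes, the result (69) lands close to the 70 px value used in the answer. Real values depend on your own glyph images.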

Related

How to reduce white space in octave legend function

I have built an algorithm, and now I need to make the plots presentable. The only problem I have is some strange legend behaviour in Octave. I don't know whether the problem comes from using the subplot function. I have tried several ideas I found on Google, but none of them really works.
In the uploaded picture, you can see that the names of the plotted lines are pushed all the way to the left (nothing wrong with that), but there is too much space to the right of them. The legend box is simply too big for its contents. I have already tried reducing the font size, but that is not the best solution.
Can somebody please suggest a solution? My current code is:
hleg1 = legend({"sample1", "sample2", "sample3"});
set(hleg1, "FontSize", 8);
I am currently using Octave 5.1.0 on Windows 10 x64.

The fix for me was to execute the plotting script twice.
The first time, resize the figure window to full screen.
Then execute the script again (without closing the figure window).
Now the legends should not take up half the space of your plot ;)

Can I denote a glyph as being two chars (NA) in a box file in Tesseract 3.05

I am using Tesseract 3.05 for reasons beyond my control. I am training the engine on source files so it can recognize this unusual font. Because I have a vast number of samples, I am using the samples themselves as training images rather than segmenting them into a font training image. This should give more variation and let the training cover the specific spacing issues this font has.
When generating the box files, I run into a problem: some letters touch at the corners (i.e. there is no clear break between glyphs), so they are detected as one glyph instead of two. For example, it sometimes struggles with NA, because the front serif of the A has bled into the serif of the N. The image pre-processing I have applied has improved this by leaps and bounds, but there are still some cases I cannot correct enough in the image.
My question is this: can I simply label the glyph as NA in the box file?
If I cannot, what would be the simplest solution? Introducing another glyph box seems like a bad idea. The only other solution I can see is to manually edit the image to make the separation between glyphs more obvious. That is antithetical, however, because this is exactly the kind of problem the font will have in the future documents I am trying to OCR.
Thank you in advance. The documentation isn't specific about whether I can correct a box glyph to represent two characters instead of one (or I just haven't found the section that explains this).

After scouring the documentation, I managed to find a lone paragraph that my scraping of the website had missed:
"If you didn't successfully space out the characters on the training image, some may have been joined into a single box. In this case, you can either remake the images with better spacing and start again, or if the pair is common, put both characters at the start of the line, leaving the bounding box to represent them both. (As of 3.00, there is a limit of 24 bytes for the description of a "character". This will allow you between 6 and 24 unicodes to describe the character, depending on where your codes sit in the unicode set. If anyone hits this limit, please file an issue describing your situation.)"
So you can do what I asked: represent a single glyph box with two or more characters in a Tesseract box file.
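In Tesseract 3.x, a box file holds one box per line in the form `<text> <left> <bottom> <right> <top> <page>`, with coordinates measured from the bottom-left corner of the image. The sketch below merges two touching boxes (N and A) into a single two-character entry, as the quoted paragraph allows, and checks the 24-byte description limit it mentions. The function names and coordinates are hypothetical and not part of any Tesseract tool.

```python
from typing import List, Tuple

# One box-file entry: (text, left, bottom, right, top, page).
Box = Tuple[str, int, int, int, int, int]

MAX_DESCRIPTION_BYTES = 24  # limit quoted from the Tesseract 3.x docs


def merge_boxes(boxes: List[Box]) -> Box:
    """Merge adjacent boxes on one page into a single multi-character box."""
    if not boxes:
        raise ValueError("need at least one box")
    pages = {b[5] for b in boxes}
    if len(pages) != 1:
        raise ValueError("boxes must be on the same page")
    # Characters are joined in the order given (left to right).
    text = "".join(b[0] for b in boxes)
    if len(text.encode("utf-8")) > MAX_DESCRIPTION_BYTES:
        raise ValueError("description exceeds %d bytes" % MAX_DESCRIPTION_BYTES)
    # The merged box is the union of the individual bounding boxes.
    return (
        text,
        min(b[1] for b in boxes),
        min(b[2] for b in boxes),
        max(b[3] for b in boxes),
        max(b[4] for b in boxes),
        pages.pop(),
    )


def format_box(box: Box) -> str:
    """Render a box as one line of a .box file."""
    return " ".join(str(field) for field in box)


if __name__ == "__main__":
    n_box = ("N", 10, 5, 28, 30, 0)
    a_box = ("A", 27, 5, 48, 31, 0)
    print(format_box(merge_boxes([n_box, a_box])))  # NA 10 5 48 31 0
```

In practice you would edit the existing line in the generated .box file by hand or with a box editor. The sketch only shows what the resulting line should look like.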

Tesseract handles similar pictures completely differently

I was just playing around with Tesseract when I noticed some strange behaviour that I can't explain. First I gave Tesseract this preprocessed picture (Picture1), but it didn't recognize a single letter.
Then I gave it this one, and guess what it returned?
Neuinitialisierung des automatischen
Karten-Updates erforderlich. Aktuellste
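Exactly the right letters and words; every single letter was correct! (The German text means roughly "Re-initialization of the automatic map update required. Latest".)
So can anybody tell me why it didn't get the text in the first picture?
(By the way, I preprocessed both pictures in exactly the same way.)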
Thanks in advance!

Tesseract OCR finds too few boxes / ignores small characters

I have a problem with the training and text recognition process in Tesseract. Here is my training data: http://s11.postimg.org/867aq10ur/dot_dotmatrixfont_exp0.png During training, Tesseract ignores the dashes (I've marked them with red boxes to make clear which ones I mean), and when I use the trained data for text recognition it ignores them too. Today I played around with the Tesseract parameters (SetVariable(name, value)), but unfortunately without success.
What can I do to teach Tesseract those dashes? Thank you in advance!

Tesseract training is pretty tricky.
Your best chance might be to handle each dash as a single character.
If your box editor (or whatever tools you are using) does not see the dashes at all, try running some image processing first, especially thresholding or inversion. Take a look at OpenCV; it has some excellent tools for this kind of image processing.

"L" characters showing up randomly in text in IE 8

I'm having a problem with stray "L" characters showing up in IE 8. It happens in the Healthcare Professionals block and in the bottom two blocks. Does anyone have experience with this, or a clue as to what's wrong? I'm going to start deconstructing the whole page soon and rebuilding it line by line, but it would be great to find out what the cause actually is.

Maybe this helps: https://webmasters.stackexchange.com/questions/15709/strange-characters-appearing-on-websites-ascii-unicode
There may be an encoding issue with the content.
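The following example illustrates the kind of encoding issue suggested above. It is not a diagnosis of the stray "L"s, which the thread does not pin down. It shows how text saved as UTF-8 but read as Windows-1252 turns into extra visible characters, and how to reverse the damage when that is the mismatch. The helper names are hypothetical.

```python
def misdecode(text: str) -> str:
    """Simulate a page saved as UTF-8 but interpreted as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo misdecode() by reversing the two steps."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    for original in ["café", "it\u2019s"]:
        garbled = misdecode(original)
        print(repr(garbled), "->", repr(repair(garbled)))
    # 'cafÃ©' -> 'café'
    # 'itâ€™s' -> 'it’s'
```

If this turns out to be the cause, the usual fix is to declare the character set consistently, for example with a `Content-Type` HTTP header or `<meta http-equiv="Content-Type" content="text/html; charset=utf-8">`, so the browser does not have to guess.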

We Keep Coding
