Letters missing parts with the OpenALPR PrepCharsForTraining utility

I ran openalpr-utils-prepcharsfortraining on digits and Arabic letters, but the dots are cropped out. You can see the result in my output TIFF (converted to PNG).
The images were resized to a maximum height of 40 px.
(image: characters TIFF)
Here is the original ب; the dot below it was removed in the TIFF output:
ب

I got it fixed by editing this line in this file:
float minHeightPercent = 0.0;
That's it!
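To see why setting minHeightPercent to 0.0 brings the dots back, here is a minimal Python sketch of a height-based blob filter. It assumes, without reproducing the actual OpenALPR source, that the utility discards connected ink regions shorter than minHeightPercent times the character image height; the Blob type, the keep_blobs function, and the 0.35 threshold are hypothetical. A detached dot, like the one under ب, is a short blob, so any noticeable threshold drops it, while 0.0 keeps everything.

```python
from dataclasses import dataclass


@dataclass
class Blob:
    """A connected region of ink inside one character image (hypothetical type)."""
    name: str
    height: int  # blob height in pixels


def keep_blobs(blobs, image_height, min_height_percent):
    """Keep only blobs at least min_height_percent of the image height tall.

    Hypothetical re-creation of a height filter, not OpenALPR's actual code.
    """
    min_height = min_height_percent * image_height
    return [b for b in blobs if b.height >= min_height]


# A 40 px tall image of "ب": the main stroke plus a small dot below it.
blobs = [Blob("stroke", 18), Blob("dot", 5)]

print([b.name for b in keep_blobs(blobs, 40, 0.35)])  # ['stroke']
print([b.name for b in keep_blobs(blobs, 40, 0.0)])   # ['stroke', 'dot']
```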

Related

Fixed-width font (usually Courier) in HTML

I'm new to learning HTML (for front-end work).
I came across the term "fixed-width font (usually Courier)" on some web pages such as w3schools. After searching a lot,
I couldn't find a good answer. Can anyone explain it with an example?
A "fixed width font" is a style of glyphs (a glyph is the human visible representation of a character as displayed on the computer's screen or as printed on paper) such that every character has the same horizontal length.
Fixed width fonts are needed for "ASCII art" displays such as this:
I want to highlight this word. So I use "^" characters on
the next line like this: ^^^^
If the width was not fixed, the "^" characters might not be aligned correctly.
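Here is a small Python sketch of the alignment arithmetic behind that example. The caret line is just spaces up to the word's character index followed by one "^" per character, which lines up only if every glyph has the same width. In HTML you would typically get that by putting such text inside a <pre> element or styling it with font-family: monospace. The caret_line helper is hypothetical, written only for this illustration.

```python
def caret_line(text, word):
    """Return a line of '^' characters placed under the first occurrence of word.

    The result counts characters, not pixels, so it only lines up visually
    when both lines are rendered in a fixed-width font.
    """
    start = text.index(word)  # raises ValueError if word is absent
    return " " * start + "^" * len(word)


line = "I want to highlight this word."
print(line)
print(caret_line(line, "this"))
```

Rendered in a proportional font, the same two lines drift apart because "i" and "W" no longer occupy the same width.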

Why does my autoencoder generate some weird pixels while reconstructing?

I have been working on deblurring images using a GAN. My generator is an autoencoder that takes a blurry image as input and outputs a deblurred image. The results are good, but the reconstructed images contain some weird pixels.
Examples are at https://imgur.com/a/xQEnFsI; the middle column in each image is my output. Can someone help me understand why I am getting these artifacts?
This is probably caused by overflow (or underflow): after decoding, some pixels' RGB values are negative or greater than 255 (assuming 8 bits per channel), and they can wrap around when converted to an 8-bit image. To fix it, clip the values to the range [0, 255] (set negatives to 0 and anything above 255 to 255) before displaying the images.
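A minimal sketch of the clipping step in plain Python, assuming the decoder output has already been scaled back to the 0 to 255 range (if your network outputs values in [0, 1], multiply by 255 first). The to_uint8_pixels helper is hypothetical.

```python
def to_uint8_pixels(values):
    """Clamp decoder outputs to the valid 8-bit range [0, 255] and round them.

    Without clamping, a value like -3.0 or 260.0 can wrap around when cast to
    an unsigned 8-bit type and show up as a bright or dark speckle.
    """
    return [int(round(min(255.0, max(0.0, v)))) for v in values]


decoded = [-3.2, 0.4, 127.6, 254.9, 260.0]
print(to_uint8_pixels(decoded))  # [0, 0, 128, 255, 255]
```

With NumPy arrays, the same clamp is usually written as np.clip(img, 0, 255).astype(np.uint8).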

How can I stop Tesseract from recognizing small lines as numbers or letters?

I'm using Tesseract to recognize large, clear text in 1bpp images. It works beautifully for the font and font size I selected. However, it also recognizes some small lines and speckles as letters or numbers. In the attached image, Tesseract recognizes not only "Ge", "1", "2", "J.", and "Sp", but also an additional "1" for each line, corresponding to the small vertical lines you can see there. How can I stop Tesseract from doing this?
Thanks in advance.
You should preprocess your image first. OpenCV offers morphological operations such as erosion and dilation, which can remove these speckles and lines (http://docs.opencv.org/doc/tutorials/imgproc/erosion_dilatation/erosion_dilatation.html).
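To illustrate what erosion does to thin lines, here is a pure-Python sketch of binary erosion with a 3x3 square structuring element, the kind of operation OpenCV's cv2.erode performs (OpenCV's border handling differs slightly). It assumes 1 marks ink; for black text on a white background you would invert the image first, or dilate instead. The 1-pixel-wide vertical line disappears, while the 3x3 block shrinks to its centre pixel.

```python
def erode(img):
    """Binary erosion of a 0/1 image with a 3x3 square structuring element.

    A pixel stays 1 only if it and all 8 of its neighbours are 1. Pixels
    outside the image are treated as background (0).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx] == 1
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ))
    return out


# 1 = ink. Column 1 holds a 1-pixel-wide vertical line (the kind Tesseract
# misreads as "1"); columns 3-5 hold a thicker 3x3 stroke.
img = [
    [0, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 1, 0],
    [0, 1, 0, 1, 1, 1, 0],
    [0, 1, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 0],
]

for row in erode(img):
    print(row)
```

Because erosion also thins the real characters, an opening (erosion followed by dilation with the same element) is commonly used so that the surviving strokes regain their thickness.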
Like the other answers suggested, some simple erosion will help remove the lines. However, if the lines always lie outside the area containing the real characters, you can use a simple trick to avoid degrading the real characters while eroding:
Use a strongly eroded image to find the bounding box of the real characters, then use this bounding box to crop the region of interest out of the original, uneroded image.
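The bounding-box trick can be sketched in Python as follows, assuming you already have a strongly eroded copy of the image as a 0/1 mask (for example from cv2.erode). The bounding_box and crop_with_margin helpers are hypothetical, and so is the margin parameter: since erosion shrinks the characters, the box is grown by roughly the erosion radius before cropping the original image. Here the eroded mask keeps only the centre of the character block, so the stray line in column 0 falls outside the crop.

```python
def bounding_box(mask):
    """Return (top, bottom, left, right) of the 1-pixels in mask, inclusive,
    or None if the mask contains no 1-pixels."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return rows[0], rows[-1], cols[0], cols[-1]


def crop_with_margin(original, mask, margin):
    """Crop original to mask's bounding box, grown by margin pixels per side
    (clamped to the image) to make up for what the erosion removed."""
    box = bounding_box(mask)
    if box is None:
        return []
    top, bottom, left, right = box
    top, left = max(0, top - margin), max(0, left - margin)
    bottom = min(len(original) - 1, bottom + margin)
    right = min(len(original[0]) - 1, right + margin)
    return [row[left:right + 1] for row in original[top:bottom + 1]]


# Original image: a stray vertical line in column 0 and a 3x3 character stroke.
original = [
    [1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 0],
    [1, 0, 0, 1, 1, 1, 0],
    [1, 0, 0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 0],
]
# Strongly eroded copy: the line is gone, only the stroke's centre survives.
mask = [[0] * 7 for _ in range(5)]
mask[2][4] = 1

print(crop_with_margin(original, mask, margin=1))  # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```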

Adjust the gap between lines in the R console

When I parse web sites in R (system: R on Debian), the HTML output printed in the console is uncomfortable to read.
The gap between lines is wide. How can I narrow it back to normal?
You should be able to reproduce the output with the following code:
options(encoding="gbk")
library(XML)
baseURL <- "http://www.jb51.net/article/27174.htm"
txt <- readLines(baseURL)
txt
Interesting: it seems that when printing a vector, the longest element determines how all elements are spaced.
Your longest string is txt[374]: on my screen it takes 19 lines, which means every element of txt is printed using 19 lines, possibly with a lot of white space.
You don't have that problem when printing a list, so a solution is to do:
print(as.list(txt))
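The arithmetic behind this can be illustrated with a rough Python model (not R's printing code). It assumes an 80-column console, ignores the [1] and [[i]] prefixes and the quotes, and treats every vector element as padded to the longest element's width; the helper names are hypothetical. With one 1500-character string, each of three elements takes 19 lines when printed as a vector, while the two short ones take one line each when printed as a list.

```python
import math


def rows_needed(width, console_width):
    """Terminal lines needed to show one item of the given width (at least 1)."""
    return max(1, math.ceil(width / console_width))


def lines_as_vector(lengths, console_width=80):
    # print(txt): every element is padded to the longest element's width,
    # so each one takes as many lines as the longest element does.
    return len(lengths) * rows_needed(max(lengths), console_width)


def lines_as_list(lengths, console_width=80):
    # print(as.list(txt)): each element is printed on its own, at its own width
    # (the extra [[i]] header lines R prints for lists are ignored here).
    return sum(rows_needed(n, console_width) for n in lengths)


lengths = [40, 60, 1500]  # two short lines and one very long one
print(lines_as_vector(lengths))  # 57
print(lines_as_list(lengths))    # 21
```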
Alternatively, try using gsub() to replace the spaces with nothing.

Why don't the images fully display when I convert HTML to PDF with Perl's HTML::HTMLDoc?

I need to create a PDF file from HTML I generated using rrdcgi. The page contains details and graphs in PNG format. I wrote the code below, using the Perl module HTML::HTMLDoc, to create a PDF file from the saved HTML file. The images are 1048 pixels wide and 266 pixels high, but in the generated PDF they are cut off on the right side.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::HTMLDoc;
my $filename = shift;
my $htmldoc = HTML::HTMLDoc->new();
$htmldoc->set_input_file($filename);
$htmldoc->no_links();
$htmldoc->landscape();
$htmldoc->set_jpeg_compression('50');
$htmldoc->best_image_quality();
$htmldoc->color_on();
$htmldoc->set_right_margin('1', 'mm');
$htmldoc->set_left_margin('1', 'mm');
$htmldoc->set_bodycolor('#FFFFFF');
$htmldoc->set_browserwidth('1000');
my $pdf = $htmldoc->generate_pdf();
$pdf->to_file('foo.pdf');
I need help with the following:
How do I display the complete image on the page?
Any help with the Perl code would be really appreciated.
Have a look at the HTML::HTMLDoc documentation:
set_browserwidth($width)
specifies the browser width in pixels. The browser width is used to scale images and pixel measurements when generating PostScript and PDF files. It does not affect the font size of text. The default browser width is 680 pixels which corresponds roughly to a 96 DPI display. Please note that your images and table sizes are equal to or smaller than the browser width, or your output will overlap or truncate in places.
You said your images are 1048 pixels wide, but you set the browser width to 1000:
$htmldoc->set_browserwidth('1000');
Try increasing the browserwidth parameter so it is at least as wide as your images (for example, 1048).
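The effect of browserwidth can be estimated with a quick Python calculation. It assumes, as the documentation quoted above suggests, that HTMLDoc scales browserwidth pixels to the full printable width, so any image wider than browserwidth runs past the right margin; visible_fraction is a hypothetical helper.

```python
def visible_fraction(image_px, browser_width_px):
    """Fraction of an image's width that fits between the page margins.

    Assumes browser_width_px pixels are scaled to the full printable width,
    so an image wider than that is truncated on the right.
    """
    return min(1.0, browser_width_px / image_px)


# The question's 1048 px graphs at a few browser widths (680 is the default).
for bw in (680, 1000, 1048, 1200):
    print(bw, f"{visible_fraction(1048, bw):.1%}")
```

Under this assumption, a browser width of 1000 leaves only about 95% of a 1048 px image on the page, while 1048 or more fits the whole image.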