Failed in generating Tesseract traineddata

I'm using Tesseract v5.0.1.20220118 on Windows 10 to train a font that contains only the letters "P" and "Q".
When I get to the step
mftraining -F font_properties.txt -U unicharset -O normal.unicharset pq.normal.exp0.tr
the pffmtable file is not generated.
And when I run
cntraining pq.normal.exp0.tr
it shows me
Reading pq.normal.exp0.tr ...
Clustering ...
N == sizeof(Cluster->Mean):Error:Assert failed:in file ../../../src/classify/cluster.cpp, line 2526
Why does it go wrong? How can I fix it?
Only inttemp and shapetable are generated, but the tutorial says there should be four files: shapetable, inttemp, pffmtable and normproto. I wonder whether this is because the font only has the letters "P" and "Q", but I have no idea how to solve it.

Please read the docs:
https://tesseract-ocr.github.io/tessdoc/#training-for-tesseract-5
Use the right tools:
https://github.com/tesseract-ocr/tesstrain

Related

Issue to train tesseract-OCR 4 - Empty shape table

I am trying to train Tesseract 4 on my own pictures (to read multimeters with 7-segment displays).
Please note that I am aware of the already trained data from Arthur Augusto at https://github.com/arturaugusto/display_ocr, but I need to train Tesseract on my own data.
To train Tesseract, I followed different tutorials (such as https://robipritrznik.medium.com/recognizing-vehicle-license-plates-on-images-using-tesseract-4-ocr-with-custom-trained-models-4ba9861595e7 or https://pretius.com/how-to-prepare-training-files-for-tesseract-ocr-and-improve-characters-recognition/),
but I always get a problem when running the shapeclustering command with my own data.
(With example data such as https://github.com/tesseract-ocr/tesseract/issues/1174#issuecomment-338448972, everything works fine.)
When I run the shapeclustering command, I get the output shown in the screenshot.
As a result, my shape_table is empty and the training cannot be effective.
With the example data it works fine and the shape_table is properly filled.
I am guessing that I have an issue with box file generation. Here is my process to create a box file:
I use the
tesseract imageFileName.tif imageFileName batch.nochop makebox
command to generate the box file, and then I edit it with jTessBoxEditor.
So I can't see where I am going wrong with my .box/.tif pair.
Have a good day & thanks for helping me
Adrien
Here is my full batch script for training after having generated and edited box files.
set name=sev7.exp0
set shortName=sev7
echo Run Tesseract for Training..
tesseract.exe %name%.tif %name% nobatch box.train
echo Compute the Character Set..
unicharset_extractor.exe %name%.box
shapeclustering -F font_properties -U unicharset -O %shortName%.unicharset %name%.tr
mftraining -F font_properties -U unicharset -O %shortName%.unicharset %name%.tr
echo Clustering..
cntraining.exe %name%.tr
echo Rename Files..
rename normproto %shortName%.normproto
rename inttemp %shortName%.inttemp
rename pffmtable %shortName%.pffmtable
rename shapetable %shortName%.shapetable
echo Create Tessdata..
combine_tessdata.exe %shortName%.
echo. & pause

OK, so I finally managed to train Tesseract.
The solution was to add a --psm parameter to the command
tesseract.exe %name%.tif %name% nobatch box.train
so that it becomes
tesseract.exe %name%.%typeFile% %name% --psm %psm% nobatch box.train
Note that the possible psm values are:
REM pagesegmode values are:
REM 0 = Orientation and script detection (OSD) only.
REM 1 = Automatic page segmentation with OSD.
REM 2 = Automatic page segmentation, but no OSD, or OCR
REM 3 = Fully automatic page segmentation, but no OSD. (Default)
REM 4 = Assume a single column of text of variable sizes.
REM 5 = Assume a single uniform block of vertically aligned text.
REM 6 = Assume a single uniform block of text.
REM 7 = Treat the image as a single text line.
REM 8 = Treat the image as a single word.
REM 9 = Treat the image as a single word in a circle.
REM 10 = Treat the image as a single character.
REM 11 = Sparse text. Find as much text as possible in no particular order.
REM 12 = Sparse text with OSD.
REM 13 = Raw line. Treat the image as a single text line bypassing hacks that are Tesseract-specific.
Found at https://github.com/tesseract-ocr/tesseract/issues/434
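As a rough illustration of the fix, here is a small shell sketch that assembles the box.train command line with a range check on the psm value. The original script is a Windows batch file, so this is only an analogue. box_train_cmd is a hypothetical helper, not part of Tesseract, and it only prints the command instead of running it. The value 7 is used purely for illustration, since the right page segmentation mode depends on the images.

```shell
# box_train_cmd NAME PSM
# Hypothetical helper: prints the box.train command for NAME.tif,
# refusing psm values outside the documented 0..13 range.
box_train_cmd() {
    name=$1
    psm=$2
    case $psm in
        [0-9]|1[0-3]) ;;   # 0..9 or 10..13
        *) echo "psm must be between 0 and 13, got: $psm" >&2
           return 1 ;;
    esac
    printf 'tesseract %s.tif %s --psm %s nobatch box.train\n' "$name" "$name" "$psm"
}

# Example: treat each training image as a single text line.
box_train_cmd sev7.exp0 7
```

Printing the command first makes it easy to try several psm values by hand before committing one to the training script.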

ESP32 cannot upload code write_flash error

I'm getting this error in the Arduino IDE (1.8.9).
usage: esptool write_flash [-h] [--erase-all]
[--flash_freq {keep,40m,26m,20m,80m}]
[--flash_mode {keep,qio,qout,dio,dout}]
[--flash_size FLASH_SIZE]
[--spi-connection SPI_CONNECTION] [--no-progress]
[--verify] [--compress | --no-compress]
<address> <filename> [<address> <filename> ...]
esptool write_flash: error: argument <address> <filename>: [Errno 2] No such file or directory: '/home/USER/.arduino15/packages/esp32/hardware/esp32/1.0.2/tools/partitions/boot_app0.bin'
However, the boot_app0.bin file is present at that location.

To fix this problem, you can try editing the platform.txt file, which you can find in your ESP package directory.
Replace the corresponding section with the code below:
## Combine gc-sections, archives, and objects
recipe.c.combine.pattern={recipe.hooks.linking.prelink.1.pattern} & "{compiler.path}{compiler.c.elf.cmd}" {build.exception_flags} -Wl,-Map "-Wl,{build.path}/{build.project_name}.map" {compiler.c.elf.flags} {compiler.c.elf.extra_flags} -o "{build.path}/{build.project_name}.elf" -Wl,--start-group {object_files} "{archive_file_path}" {compiler.c.elf.libs} -Wl,--end-group "-L{build.path}" & {recipe.objcopy.hex.1.pattern}
The key thing here is "{recipe.hooks.linking.prelink.1.pattern} &" at the start and "& {recipe.objcopy.hex.1.pattern}" at the end. The text in between is the part of the platform.txt file that you do not need to change.
The above is true for Windows. On Linux, use "{recipe.hooks.linking.prelink.1.pattern} ;" and "; {recipe.objcopy.hex.1.pattern}" instead.
This solved my problem on the ESP8266; I hope it will be helpful for you too.
Reference link: https://www.eclipse.org/forums/index.php/t/1095090/

If you are using esptool, make sure you actually download the files from GitHub using the download button instead of using the "Save as" option when you right-click the file name.

How to Create Traineddata file For Tesseract 4.1.0

I want to recognise the characters on number plates.
How can I train tesseract-ocr for number plates on Ubuntu 16.04?
Since I am not familiar with training, please help me create a 'traineddata' file for recognizing number plates.
I have 1000 images of number plates.
Please look into it.
Any help would be appreciated.
So far I have tried the following commands:
tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] batch.nochop makebox
tesseract eng.arial.plate3655.png eng.arial.plate3655 batch.nochop makebox
But it gives an error:
Tesseract Open Source OCR Engine v4.1.0-rc1-56-g7fbd with Leptonica
Error, cannot read input file eng.arial.plate3655.png: No such file or directory
Error during processing.
After that I tried
tesseract plate4.png eng.arial.plate4 batch.nochop makebox
It works, but only for some plates.
Now in Step 2 I am getting an error.
Screenshots are attached:
Plate 4 image for training
Step 1 and Step 2 output in the terminal
Files generated after Step 1 and Step 2
Content of the files generated after Step 1 and Step 2

Creating .traineddata for Tesseract 4
{*Note: After installing Tesseract, open cmd and do the following.}
Step 1:
Make box files for images that we want to train
Syntax:
tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] batch.nochop makebox
Eg:
tesseract own.arial.exp0.jpg own.arial.exp0 batch.nochop makebox
{*Note: After making the box files, we have to correct any wrongly identified characters in them.}
Step 2:
Create the .tr file (combining the image file and the box file)
Syntax:
tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] box.train
Eg:
tesseract own.arial.exp0.jpg own.arial.exp0 box.train
Step 3:
Extract the charset from the box files (Output for this command is unicharset file)
Syntax:
unicharset_extractor [langname].[fontname].[expN].box
Eg:
unicharset_extractor own.arial.exp0.box
Step 4:
Create a font_properties file based on our needs.
Syntax:
echo [fontname] [italic (0 or 1)] [bold (0 or 1)] [monospace (0 or 1)] [serif (0 or 1)] [fraktur (0 or 1)] > font_properties
Eg (without quotes, because cmd's echo would write the quote characters into the file):
echo arial 0 0 1 0 0 > font_properties
Step 5:
Training the data.
Syntax:
mftraining -F font_properties -U unicharset -O [langname].unicharset [langname].[fontname].[expN].tr
Eg:
mftraining -F font_properties -U unicharset -O own.unicharset own.arial.exp0.tr
Step 6:
Syntax:
cntraining [langname].[fontname].[expN].tr
Eg:
cntraining own.arial.exp0.tr
{*Note: After Step 5 and Step 6, four files have been created (shapetable, inttemp, pffmtable, normproto).}
Step 7:
Rename the four files (shapetable, inttemp, pffmtable, normproto) to [langname].shapetable, [langname].inttemp, [langname].pffmtable and [langname].normproto.
Syntax:
rename filename1 filename2
Eg:
rename shapetable own.shapetable
rename inttemp own.inttemp
rename pffmtable own.pffmtable
rename normproto own.normproto
Step 8:
Create .traineddata file
Syntax:
combine_tessdata [langname].
Eg:
combine_tessdata own.
{*Note: I used only one image (exp0) to create the traineddata. If you want to train on more than one image, repeat the steps for exp1, exp2, ..., expN.}
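Because a missing pffmtable or normproto only shows up as a problem at the very end, it can help to check for the four renamed files before running combine_tessdata. The following is a minimal shell sketch of such a check. missing_training_files is a hypothetical helper name, and the scratch directory only simulates a run in which mftraining produced two of the files.

```shell
# missing_training_files LANG [DIR]
# Hypothetical helper: prints the name of every expected training
# file that does not exist in DIR (default: current directory).
missing_training_files() {
    lang=$1
    dir=${2:-.}
    for part in shapetable inttemp pffmtable normproto; do
        [ -f "$dir/$lang.$part" ] || echo "$lang.$part"
    done
}

# Demo in a scratch directory where only two of the four files exist.
demo=$(mktemp -d)
touch "$demo/own.shapetable" "$demo/own.inttemp"
missing_training_files own "$demo"
rm -r "$demo"
```

An empty result means all four files are in place. Any name that is printed points back to the step that should have produced it (Step 5 for shapetable, inttemp and pffmtable, Step 6 for normproto).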

IF and ! = ns2 error

I have a problem with a path in a Tcl file. I tried to use
source " /tmp/mob.tcl "
and this path in a bash file:
/opt/ns-allinone-2.35/ns-2.35/indep-utils/cmu-scen-gen/setdest/setdest -v 1 -n $n -p 10 -M 64 -t 100 -x 250 -y 250 >> /tmp/mob.tcl
The terminal gives me this output:
..."
(procedure "source" line 8)
invoked from within
"source "/tmp/mob.tcl" "
(file "mobilita_source.tcl" line 125)
How can I do this?

Firstly, this:
source " /tmp/mob.tcl "
is very unlikely to be correct. The spaces around the filename inside the quotes will confuse the source command. (It could be correct, but only if you have a directory in your current directory whose name is a single space. That's really unlikely, unless you're a great deal more evil than I am.)
It really helps a lot if you stop making this error.
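The effect of the padding can also be seen from the shell side, where the file is generated. This is only a sketch: a throwaway temporary file stands in for /tmp/mob.tcl, and the shell's file test stands in for what Tcl's source would do when asked to open the padded name.

```shell
# Create a stand-in for the generated mobility file.
f=$(mktemp)

# The exact path names the file that was just created.
if [ -e "$f" ]; then
    echo "exact path exists"
fi

# The same path with a leading and trailing space is a different name:
# it is relative to the current directory and starts with a directory
# called " ", which normally does not exist.
if [ ! -e " $f " ]; then
    echo "padded path does not exist"
fi

rm -f "$f"
```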
Secondly, the error message is both incomplete, with just an ellipsis instead of a full error on the first line, and really worrying, with source claimed to be a procedure (second line of that short trace).
It's legal to make a procedure called source, and sometimes the right thing to do, but if you're doing it then you have to be ever so careful to duplicate the semantics of the standard Tcl command or odd things will happen.
Thirdly, you've got a file of what is apparently generated code, and you're hitting a problem in it, and you're not telling us what is on/around line 125 of the file (the error trace is pretty clear on that front) or in the contents of the source procedure (which is non-standard; the standard source is implemented in C) and you're expecting us to guess what's going wrong for you??? Seriously?
Tcl error traces are usually quite clear enough for you to figure out what went wrong and where. If there's an unclear error, and it didn't come from user code (by calling error or return -code error) then let us know; we'll help (or possibly even change Tcl to make things clearer in the future). But right now, there's a complete shortage of information.
Here's an example of what a normal source error looks like:
% source /tmp/foo/bar/boo
couldn't read file "/tmp/foo/bar/boo": no such file or directory
% puts $errorInfo
couldn't read file "/tmp/foo/bar/boo": no such file or directory
while executing
"source /tmp/foo/bar/boo"
If a script generates an error directly, it's encouraged to be as clear as that, but we cannot enforce it. Sometimes you have to be a bit of a detective yourself…

Replacing output text of a command with a string in a Shell Script

Hello and thank you for any help you can provide
I have my Apache2 web server set up so that when I go to a specific link, it will run and display the output of a shell script stored on my server. I need to output the results of an SVN command (svn log). If I simply run the command 'svn log -q' (-q for quiet), I get the log entries (shown in a screenshot, of course not blurred) with a line of exactly 72 dashes between them. I need to be able to take these dashes and turn them into an HTML line break.
Basically I need the shell script to take the output of the 'svn log -q' command, search and replace every chunk of 72 dashes with an html line break, and then echo the output.
Is this at all possible?
I'm somewhat a noob at shell scripting, so please excuse any mess-ups.
Thank you so much for your help.

svn log -q | sed -e 's,-\{72\},<br/>,'
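In sed's default (basic) regular expressions, the repeat count has to be written as \{72\}; a bare {72} is matched literally. A minimal sketch of the approach wrapped in a function, assuming the separator is always a line of exactly 72 dashes. dashes_to_br is a hypothetical name, and a fabricated log replaces the real svn output so the example runs without a repository.

```shell
# Replace each line that consists of exactly 72 dashes with <br/>.
# Reads stdin, writes stdout.
dashes_to_br() {
    sed -e 's,^-\{72\}$,<br/>,'
}

# Build a 72-dash separator like the one svn log prints.
sep=$(printf '%72s' '' | tr ' ' '-')

# On the server this would be: svn log -q | dashes_to_br
printf '%s\n' "$sep" 'r2 | alice | 2020-01-02' "$sep" | dashes_to_br
```

Anchoring with ^ and $ keeps longer runs of dashes inside a log line from being rewritten.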

If you want to write it in the script this might help:
${string//substring/replacement}
Replace all matches of $substring with $replacement.
stringZ=abcABC123ABCabc
echo ${stringZ/abc/xyz} # xyzABC123ABCabc
# Replaces first match of 'abc' with 'xyz'.
echo ${stringZ//abc/xyz} # xyzABC123ABCxyz
# Replaces all matches of 'abc' with 'xyz'.
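Applied to the svn question, the same expansion could look like the sketch below. It is bash-specific (the ${var//pattern/replacement} form is not part of POSIX sh), the variable names are illustrative, and a fabricated log stands in for the output of svn log -q.

```shell
#!/usr/bin/env bash
# Build the 72-dash separator that svn log prints between entries.
sep=$(printf '%72s' '' | tr ' ' '-')

# Stand-in for: log=$(svn log -q)
log="$sep
r1 | bob | 2020-01-01
$sep"

# Replace every occurrence of the separator with an HTML line break.
# Quoting "$sep" makes bash treat it as literal text, not a glob pattern.
br='<br/>'
html=${log//"$sep"/$br}
echo "$html"
```

This keeps everything inside the script without spawning sed, at the cost of holding the whole log in a variable.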
