i am attempting to extract OCR data of a 3-digit counter within a video via tesseract 4.1.1 on Kubuntu 21.04. (full tesseract version string below.) i am failing to add characters during the shapetable phase, and no other troubleshooting has worked for me -- i turn to you with humble heart. n.b.: the images are of a small pixel font, which takes up the entirety of my source image
image preparation and collation
from the source videos, i: crop to only the counter, invert, grayscale, dump at 1 fps, and then increase resolution by 1000% to 780x180 resolution. the results are individual frames such as this. i take a section of sequential numbers counting down from 500 (without any duplicates or blank images) and combine them into a .tif. (i can't upload the file here, but find the set of images mosaic'd together here)
i import this file into jTessBoxEditor as, for example, type_3.font.exp0.tif. i run tesseract --psm 6 --oem 3 font_name.font.exp0.tif font_name.font.exp0 makebox to create a .box file, with understandably nonsensical results.
with the hand-chosen source frames and the consistent positions, i'm able to edit the .box file with known box sizes, quantities, like so:
5 0 0 240 180 0
0 270 0 510 180 0
0 540 0 780 180 0
4 0 0 240 180 1
9 270 0 510 180 1
9 540 0 780 180 1
4 0 0 240 180 2
9 270 0 510 180 2
8 540 0 780 180 2
4 0 0 240 180 3
9 270 0 510 180 3
7 540 0 780 180 3
...
i load the edited .box into jTessBoxEditor to check that it indeed matches my data. this is a 131-page .tif, meaning roughly 40 training samples per digit.
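Because the layout is fixed, box lines in this shape could also be generated instead of typed by hand. The following is only a minimal Python sketch, under two assumptions: that the counter drops by exactly one per page starting at 500, and that the three digit boxes always sit at the coordinates shown in the listing above. The function name box_lines is made up for illustration.

```python
# Fixed digit boxes copied from the listing above: (left, bottom, right, top).
DIGIT_BOXES = [(0, 0, 240, 180), (270, 0, 510, 180), (540, 0, 780, 180)]


def box_lines(start=500, pages=131):
    """Return box-file lines for a counter that counts down by one per page.

    Assumption: page N shows the value start - N as exactly three digits.
    """
    lines = []
    for page in range(pages):
        digits = f"{start - page:03d}"  # always three glyphs per page
        for glyph, (left, bottom, right, top) in zip(digits, DIGIT_BOXES):
            lines.append(f"{glyph} {left} {bottom} {right} {top} {page}")
    return lines


# Show the first two pages; the full list can be written to the .box file.
print("\n".join(box_lines(pages=2)))
```

The output would still need the same visual check in jTessBoxEditor, since any skipped or duplicated frame breaks the countdown assumption.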
training steps (where the problems begin)
i create font_properties and load it with font 0 0 0 0 0. Please note that i've also tried type_3 0 0 0 0 0 and type_3.font.exp0 0 0 0 0 0, with no difference on the below results
i input tesseract type_3.font.exp0.tif type_3.font.exp0 nobatch box.train and a training file is created; however, each page is listed as blank (is this normal?). e.g.:
Page 108
Warning: Invalid resolution 1 dpi. Using 70 instead.
Estimating resolution as 2263
Empty page!!
i input unicharset_extractor font_name.font.exp0.box with success -- the resulting extraction contains the characters i've identified, with some extra lines
13
NULL 0 Common 0
Joined 7 0,255,0,255,0,0,0,0,0,0 Latin 1 0 1 Joined # Joined [4a 6f 69 6e 65 64 ]a
|Broken|0|1 15 0,255,0,255,0,0,0,0,0,0 Common 2 10 2 |Broken|0|1 # Broken
5 8 0,255,0,255,0,0,0,0,0,0 Common 3 2 3 5 # 5 [35 ]0
0 8 0,255,0,255,0,0,0,0,0,0 Common 4 2 4 0 # 0 [30 ]0
4 8 0,255,0,255,0,0,0,0,0,0 Common 5 2 5 4 # 4 [34 ]0
9 8 0,255,0,255,0,0,0,0,0,0 Common 6 2 6 9 # 9 [39 ]0
8 8 0,255,0,255,0,0,0,0,0,0 Common 7 2 7 8 # 8 [38 ]0
7 8 0,255,0,255,0,0,0,0,0,0 Common 8 2 8 7 # 7 [37 ]0
6 8 0,255,0,255,0,0,0,0,0,0 Common 9 2 9 6 # 6 [36 ]0
3 8 0,255,0,255,0,0,0,0,0,0 Common 10 2 10 3 # 3 [33 ]0
2 8 0,255,0,255,0,0,0,0,0,0 Common 11 2 11 2 # 2 [32 ]0
1 8 0,255,0,255,0,0,0,0,0,0 Common 12 2 12 1 # 1 [31 ]0
but i know that failure has come for me when shapeclustering -F font_properties -U unicharset -O type_3.unicharset type_3.font.exp0.tr
results in
Reading type_3.font.exp0.tr ...
Building master shape table
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
...
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Master shape_table:Number of shapes = 0 max unichars = 0 number with multiple unichars = 0
It has not recognized any shapes at all.
my plea:
what have i missed?? what can i do to pass these 10 humble characters to tesseract?
full version string (installed via apt)
tesseract 4.1.1
leptonica-1.79.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.2.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
Found AVX2
Found AVX
Found FMA
Found SSE
Found libarchive 3.4.3 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.5
I have a file with data in 2 columns, X and Y. The data is split into blocks separated by a blank line. I want to join the points (given by their x and y coordinates in the file) in each block using vectors. I'm trying to use these functions:
prev_x = NaN
prev_y = NaN
dx(x) = (x_delta = x-prev_x, prev_x = ($0 > 0 ? x : 1/0), x_delta)
dy(y) = (y_delta = y-prev_y, prev_y = ($0 > 0 ? y : 1/0), y_delta)
which I've taken from "Plot lines and vector in graphical gnuplot" (first answer). The command to plot is plot for [i=0:5] 'Field_lines.txt' every :::i::i u (prev_x):(prev_y):(dx($1)):(dy($2)) with vectors.
The problem is that the output includes the point (0,0) even though it's not in the file. I don't think I understand exactly what the functions dx and dy do, or how they are used with the option using (prev_x):(prev_y):(dx($1)):(dy($2)), so an explanation of this would help me a lot in trying to fix this.
This is the file:
#1
0 5
0 4
0 3
0.4 2
0.8 1
0.8 1
#2
2 5
2 4
2 3
2 2
2 1
2 0
#3
4 5
4.2 4
4.5 3
4.6 2
4.7 1
4.7 0
#4
7 5
7.2 4
7.5 3
7.9 2
7.9 1
7.9 0
#5
9 5
9 4
9.2 3
9.5 2
9.5 1
9.5 0
#6
11 7
12 6
13 5
13.3 4
13.5 3
13.5 2
13.6 1
14 0
Thanks!
I'm not completely sure what the real problem is, but I think you cannot rely on the columns in the using statement being evaluated from left to right, and your check $0 > 0 in dx and dy comes too late in my opinion.
I usually put all the assignments and conditionals in the first column, and that works fine also in your case:
set offsets 1,1,1,1
unset key
prev_x = prev_y = 1
plot for [i=0:5] 'Field_lines.txt' every :::i::i \
u (x_delta = prev_x-$1, prev_x=$1, y_delta=prev_y-$2, prev_y=$2, ($0 == 0 ? 1/0 : prev_x)):(prev_y):(x_delta):(y_delta) with vectors backhead
Also, to draw a vector from the j-th row to the point in the following row, you must invert the definitions of x_delta and y_delta and use backhead to draw the vectors in the correct direction.
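To make the bookkeeping explicit, here is roughly what that using expression computes, written out as a small Python sketch rather than gnuplot. Within each block, every row after the first yields one vector anchored at the current point and pointing back to the previous one; the first row yields nothing because it has no predecessor. The parser and the function names are illustrative only, and the sketch assumes blocks are separated by blank lines as described in the question.

```python
def parse_blocks(text):
    """Split two-column text into blocks of (x, y) points at blank lines."""
    blocks, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # comment lines are skipped
        if not stripped:
            if current:  # a blank line closes the current block
                blocks.append(current)
                current = []
            continue
        x, y = map(float, stripped.split()[:2])
        current.append((x, y))
    if current:
        blocks.append(current)
    return blocks


def vectors(block):
    """Yield (x, y, dx, dy): start at the current point, point to the previous."""
    prev = None
    for x, y in block:
        if prev is not None:  # row 0 is skipped, like ($0 == 0 ? 1/0 : prev_x)
            yield (x, y, prev[0] - x, prev[1] - y)
        prev = (x, y)


sample = "#1\n0 5\n0 4\n0 3\n\n#2\n2 5\n2 4\n"
for block in parse_blocks(sample):
    print(list(vectors(block)))
```

Since each vector starts at the current point and ends at the previous one, the arrowhead has to go on the tail, which is what backhead does.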
If I have some float values:
0.9 0.6 0.3 0.1 0.0
0.7 0.5 0.1 0.0 0.0
0.3 0.2 0.1 0.0 0.0
or int values:
22 15 10 7 0
44 35 20 10 0
12 8 6 4 1
How can I create a grayscale image from these values in ITK, in both cases?
You might find what you are looking for in this example: http://itk.org/ITKExamples/src/Filtering/ImageIntensity/ConvertRGBImageToGrayscaleImage/Documentation.html
Either way, you need to convert your data matrix to an itk::Image before you can do that. For that, see the section 'Importing Image Data from a Buffer' in the official guide (http://www.itk.org/ItkSoftwareGuide.pdf). Once that is done, take a look at the output image; it might already be exactly what you are looking for (there might be no need to apply the luminance filter from the first example).
Table pc -
code model speed ram hd cd price
1 1232 500 64 5.0 12x 600.0000
10 1260 500 32 10.0 12x 350.0000
11 1233 900 128 40.0 40x 980.0000
12 1233 800 128 20.0 50x 970.0000
2 1121 750 128 14.0 40x 850.0000
3 1233 500 64 5.0 12x 600.0000
4 1121 600 128 14.0 40x 850.0000
5 1121 600 128 8.0 40x 850.0000
6 1233 750 128 20.0 50x 950.0000
7 1232 500 32 10.0 12x 400.0000
8 1232 450 64 8.0 24x 350.0000
9 1232 450 32 10.0 24x 350.0000
Desired output -
model speed hd
1232 450 10.0
1232 450 8.0
1232 500 10.0
1260 500 10.0
Query 1 -
SELECT model, speed, hd
FROM pc
WHERE cd = '12x' AND price < 600
OR
cd = '24x' AND price < 600
Query 2 -
SELECT model, speed, hd
FROM pc
WHERE cd = '12x' OR cd = '24x'
AND price < 600
Query 1 is definitely working correctly. However, when I tried to reduce the query so that price is used only once, it does not show the correct result. Let me know what I am missing in the logic.
Find the model number, speed and hard drive capacity of the PCs having
12x CD and prices less than $600 or having 24x CD and prices less than
$600.
Since AND comes before OR, your query is being interpreted as:
WHERE (cd = '12x') OR ((cd = '24x') AND (price < 600))
Or, in words: All PCs having 12x CD, or PCs < $600 having 24x CD
You need to use parentheses to specify order of operations:
WHERE (cd = '12x' OR cd = '24x') AND price < 600
Or, you can use IN:
WHERE cd IN ('12x', '24x') AND price < 600
See Also: http://dev.mysql.com/doc/refman/5.5/en/operator-precedence.html
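The difference is easy to check against the table from the question. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained; both give AND higher precedence than OR), and the column types are guesses based on the sample data.

```python
import sqlite3

# Rows copied from the pc table in the question.
ROWS = [
    (1, 1232, 500, 64, 5.0, "12x", 600.0),
    (10, 1260, 500, 32, 10.0, "12x", 350.0),
    (11, 1233, 900, 128, 40.0, "40x", 980.0),
    (12, 1233, 800, 128, 20.0, "50x", 970.0),
    (2, 1121, 750, 128, 14.0, "40x", 850.0),
    (3, 1233, 500, 64, 5.0, "12x", 600.0),
    (4, 1121, 600, 128, 14.0, "40x", 850.0),
    (5, 1121, 600, 128, 8.0, "40x", 850.0),
    (6, 1233, 750, 128, 20.0, "50x", 950.0),
    (7, 1232, 500, 32, 10.0, "12x", 400.0),
    (8, 1232, 450, 64, 8.0, "24x", 350.0),
    (9, 1232, 450, 32, 10.0, "24x", 350.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pc (code INTEGER, model INTEGER, speed INTEGER, "
    "ram INTEGER, hd REAL, cd TEXT, price REAL)"
)
conn.executemany("INSERT INTO pc VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)


def count(where):
    """Count the rows matching a WHERE clause (illustrative helper)."""
    return conn.execute(f"SELECT COUNT(*) FROM pc WHERE {where}").fetchone()[0]


unparenthesized = count("cd = '12x' OR cd = '24x' AND price < 600")
explicit = count("cd = '12x' OR (cd = '24x' AND price < 600)")
intended = count("(cd = '12x' OR cd = '24x') AND price < 600")
print(unparenthesized, explicit, intended)  # prints: 6 6 4
```

The first two counts match, which shows how Query 2 is grouped; only the parenthesized form returns the four rows of the desired output.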
Your table may contain duplicate rows. Try using a GROUP BY clause as shown below to get the solution, and let me know the output after trying it. Thanks.
SELECT model, speed, hd
FROM PC
WHERE cd IN ('12x','24x') AND price < 600
GROUP BY model, speed, hd
Try using IN:
SELECT model, speed, hd
FROM PC
WHERE cd IN ('12x','24x') AND price < 600
Good luck.
I have several thousand small ASCII files containing 3D cartesian coordinates for atoms in molecules (among other information) that I need to store somewhere.
A simple calculation told me that we will require several terabytes of space, which may be reduced to several gigabytes at most, but is not manageable under current infrastructure constraints. Somebody told me some people have stored similar numbers of files (of the same format, but sometimes bzipped) in MySQL and Oracle as a BLOB field. My question is, does storing such files as BLOB offer some form of reduction in storage requirements? If yes, how much of a reduction can I expect?
This is example text from an ASCII file that needs to be stored:
#<TRIPOS>MOLECULE
****
5 4 1 1 0
SMALL
GAST_HUCK
#<TRIPOS>ATOM
1 C1 -9.7504 2.6683 0.0002 C.3 1 <1> -0.0776
2 H1 -8.6504 2.6685 0.0010 H 1 <1> 0.0194
3 H2 -10.1163 2.1494 -0.8981 H 1 <1> 0.0194
4 H3 -10.1173 3.7053 -0.0004 H 1 <1> 0.0194
5 H4 -10.1176 2.1500 0.8982 H 1 <1> 0.0194
#<TRIPOS>BOND
1 1 2 1
2 1 3 1
3 1 4 1
4 1 5 1
#<TRIPOS>SUBSTRUCTURE
1 **** 1 TEMP 0 **** **** 0 ROOT
#<TRIPOS>NORMAL
#<TRIPOS>FF_PBC
FORCE_FIELD_SETUP_FEATURE Force Field Setup information
v1.0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 NONE 0 0 0 0 1 0 0 0 0 0 0 0 0
Storing data in a BLOB column offers no form of reduction in storage requirements. The storage requirements for BLOB types are simple:
TINYBLOB L + 1 bytes, where L < 2^8
BLOB L + 2 bytes, where L < 2^16
MEDIUMBLOB L + 3 bytes, where L < 2^24
LONGBLOB L + 4 bytes, where L < 2^32
L represents the length of the string data in bytes.
See Storage Requirements for further details.
If there is no need to search the contents of the molecule files in your database, you can reduce the storage requirements by compressing the data prior to inserting it or using the MySQL COMPRESS() function on insert.
I think that addresses your main question. From those figures, the number of files you plan to store, and their average size, you can calculate how much storage space the BLOB columns will consume.
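As a rough way to run that calculation, here is a minimal Python sketch. It applies the L + prefix rule from the list above and uses zlib to approximate compressing a file before insert. The helper name smallest_blob is made up, and the sample text is one atom record from the question repeated many times, so the compression ratio it shows is far better than real files will achieve; measure on actual files before planning capacity.

```python
import zlib

# (type name, length-prefix bytes), following the storage rules listed above.
BLOB_TYPES = [("TINYBLOB", 1), ("BLOB", 2), ("MEDIUMBLOB", 3), ("LONGBLOB", 4)]


def smallest_blob(length):
    """Return (type, stored bytes) for the smallest BLOB type that fits."""
    for name, prefix in BLOB_TYPES:
        if length < 2 ** (8 * prefix):
            return name, length + prefix
    raise ValueError("value too large for any BLOB type")


# Stand-in data: one atom record repeated to mimic a larger file (assumption).
record = "      1 C1   -9.7504    2.6683    0.0002 C.3   1 <1>   -0.0776\n"
raw = (record * 200).encode("ascii")
packed = zlib.compress(raw, 9)  # compress client-side before the INSERT

print("uncompressed:", smallest_blob(len(raw)))
print("compressed:  ", smallest_blob(len(packed)))
```

Multiplying the stored size by the number of files gives an estimate for the BLOB column alone; row and index overhead are not included.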