i am attempting to extract OCR data of a 3-digit counter within a video via tesseract 4.1.1 on Kubuntu 21.04. (full tesseract version string below.) i am failing to add characters during the shapetable phase, and no other troubleshooting has worked for me -- i turn to you with humble heart. n.b.: the images are of a small pixel font, which takes up the entirety of my source image
image preparation and collation
from the source videos, i: crop to only the counter, invert, grayscale, dump at 1 fps, and then increase resolution by 1000% to 780x180 resolution. the results are individual frames such as this. i take a section of sequential numbers counting down from 500 (without any duplicates or blank images) and combine them into a .tif. (i can't upload the file here, but find the set of images mosaic'd together here)
i import this file into jTessBoxEditor as, for example, type_3.font.exp0.tif. i run tesseract --psm 6 --oem 3 font_name.font.exp0.tif font_name.font.exp0 makebox to create a .box file, with understandably nonsensical results.
with the hand-chosen source frames and the consistent positions, i'm able to edit the .box file with known box sizes, quantities, like so:
5 0 0 240 180 0
0 270 0 510 180 0
0 540 0 780 180 0
4 0 0 240 180 1
9 270 0 510 180 1
9 540 0 780 180 1
4 0 0 240 180 2
9 270 0 510 180 2
8 540 0 780 180 2
4 0 0 240 180 3
9 270 0 510 180 3
7 540 0 780 180 3
...
i load the edited .box into jTessBoxEditor to check that it indeed matches my data. this is a 131-page .tif, meaning roughly 40 training samples per digit.
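Because the boxes sit at fixed positions and the counter decreases by one per page, the .box contents shown above could also be generated rather than edited by hand. The following is only a sketch: the function name `box_lines` is hypothetical, and it assumes one frame per page, no gaps in the countdown, and numbers padded to three digits.

```python
def box_lines(start, pages, digit_boxes, height):
    """Yield box-file lines in the form '<char> <left> <bottom> <right> <top> <page>'."""
    for page in range(pages):
        # Assumption: the counter drops by exactly one on every page.
        text = str(start - page).zfill(len(digit_boxes))
        for char, (left, right) in zip(text, digit_boxes):
            yield f"{char} {left} 0 {right} {height} {page}"


# Horizontal extents of the three digit cells, taken from the listing above.
DIGIT_BOXES = [(0, 240), (270, 510), (540, 780)]

lines = list(box_lines(500, 131, DIGIT_BOXES, 180))
print("\n".join(lines[:6]))
```

The first lines printed match the hand-edited listing (5 0 0 240 180 0, and so on), so the output can be diffed against an existing .box file as a sanity check.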
training steps (where the problems begin)
i create font_properties and load it with font 0 0 0 0 0. Please note that i've also tried type_3 0 0 0 0 0 and type_3.font.exp0 0 0 0 0 0, with no difference on the below results
i input tesseract type_3.font.exp0.tif type_3.font.exp0 nobatch box.train and a training file is created; however, each page is listed as blank (is this normal?). e.g.:
Page 108
Warning: Invalid resolution 1 dpi. Using 70 instead.
Estimating resolution as 2263
Empty page!!
i input unicharset_extractor font_name.font.exp0.box with success -- the resulting extraction contains the characters i've identified, with some extra lines
13
NULL 0 Common 0
Joined 7 0,255,0,255,0,0,0,0,0,0 Latin 1 0 1 Joined # Joined [4a 6f 69 6e 65 64 ]a
|Broken|0|1 15 0,255,0,255,0,0,0,0,0,0 Common 2 10 2 |Broken|0|1 # Broken
5 8 0,255,0,255,0,0,0,0,0,0 Common 3 2 3 5 # 5 [35 ]0
0 8 0,255,0,255,0,0,0,0,0,0 Common 4 2 4 0 # 0 [30 ]0
4 8 0,255,0,255,0,0,0,0,0,0 Common 5 2 5 4 # 4 [34 ]0
9 8 0,255,0,255,0,0,0,0,0,0 Common 6 2 6 9 # 9 [39 ]0
8 8 0,255,0,255,0,0,0,0,0,0 Common 7 2 7 8 # 8 [38 ]0
7 8 0,255,0,255,0,0,0,0,0,0 Common 8 2 8 7 # 7 [37 ]0
6 8 0,255,0,255,0,0,0,0,0,0 Common 9 2 9 6 # 6 [36 ]0
3 8 0,255,0,255,0,0,0,0,0,0 Common 10 2 10 3 # 3 [33 ]0
2 8 0,255,0,255,0,0,0,0,0,0 Common 11 2 11 2 # 2 [32 ]0
1 8 0,255,0,255,0,0,0,0,0,0 Common 12 2 12 1 # 1 [31 ]0
but i know that failure has come for me when shapeclustering -F font_properties -U unicharset -O type_3.unicharset type_3.font.exp0.tr
results in
Reading type_3.font.exp0.tr ...
Building master shape table
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
...
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Master shape_table:Number of shapes = 0 max unichars = 0 number with multiple unichars = 0
It has not recognized any shapes at all.
my plea:
what have i missed?? what can i do to pass these 10 humble characters to tesseract?
full version string (installed via apt)
tesseract 4.1.1
leptonica-1.79.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.2.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
Found AVX2
Found AVX
Found FMA
Found SSE
Found libarchive 3.4.3 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.5
Following from my previous question, it seems that APL performs a scanl in O(n^2) but the compiler is smart enough to optimize for simple primitives. What then is the best strategy to apply \ to non-simple functions? Additionally, there are many cases when the right associativity does affect the result, for example:
{0⌈⍺+⍵} \ 3 ¯4 1 5 ¯1 ¯2 ¯3 2 0 4 ⍝ 3 0 3 5 4 5 5 5 5 5
Which is not the answer I would have expected 3 0 1 6 5 3 0 2 2 6
I stumbled on one possible solution, but don't know if it's the most idiomatic.
s←0 ⋄ ↑{s⊢←0⌈s+⍵}¨ 3 ¯4 1 5 ¯1 ¯2 ¯3 2 0 4
Correctly gives me 3 0 1 6 5 3 0 2 2 6
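To make the difference between the two results concrete, here is a small Python model of both evaluation orders. It is only an illustration of the semantics (each scan element is a right-to-left reduction of a prefix), not a description of how any APL interpreter implements \; the names `f` and `apl_scan` are made up for this sketch.

```python
from functools import reduce
from itertools import accumulate


def f(a, b):
    """Python stand-in for the dfn {0⌈⍺+⍵}."""
    return max(0, a + b)


def apl_scan(func, values):
    """Model of f\\: element i is the right-to-left reduction of the first i items."""
    out = []
    for i in range(1, len(values) + 1):
        prefix = values[:i]
        # Fold from the right: start with the last item and work backwards.
        out.append(reduce(lambda acc, x: func(x, acc), reversed(prefix[:-1]), prefix[-1]))
    return out


xs = [3, -4, 1, 5, -1, -2, -3, 2, 0, 4]
right_fold_scan = apl_scan(f, xs)       # what {0⌈⍺+⍵}\ produces
left_fold_scan = list(accumulate(xs, f))  # running left fold, like the s⊢← version

print(right_fold_scan)
print(left_fold_scan)
```

The model performs a fresh reduction for every prefix, which is where the O(n^2) cost comes from, while the left fold carries a single accumulator and touches each element once.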
In Octave, in general, '+' will only work when the two operands have the same dimension.
There seems to be an exception to this rule: if you '+' a row vector (1 x n) and a column vector (n x 1), Octave will produce a (reasonable) Matrix of dimensions (n x n):
>> a = [1, 2, 3, 4, 5]
a =
1 2 3 4 5
>> b = [1; 2; 3; 4; 5]
b =
1
2
3
4
5
>> a+b
ans =
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
6 7 8 9 10
Can this be prevented, or at least be made to generate a warning? It seems potentially unsafe; I was recently bitten by a bug that was being masked by this behaviour.
Thanks!
No, this cannot be prevented. You need to check the input of your functions. The exception you mention is not an exception, it's the language syntax.
Element-wise operations require that the operands have dimensions of equal length or of length one. The feature you are trying to prevent is also the feature that makes this work:
octave:1> a = 1:4
a =
1 2 3 4
octave:2> a+1
ans =
2 3 4 5
octave:3> a == 2
ans =
0 1 0 0
In the above examples, the value in the dimension with length 1 (1x1) is broadcast, or expanded. This feature is named Broadcasting in Octave and Python, and Implicit Expansion in Matlab. There are a number of operators and functions, such as == and max, which also broadcast.
For a while, in Octave 3.6 and 3.8, it was possible to disable this by turning the Octave:broadcast warning into an error. However, because of the way errors are handled in the language, that effectively made every Octave function that used broadcasting raise an error.
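The compatibility rule can be written down in a few lines. The sketch below is in plain Python rather than Octave and only works on shape tuples; `broadcast_shape` is a hypothetical helper, and it assumes missing trailing dimensions count as length 1, which is how the 1x5 plus 5x1 case in the question ends up as 5x5.

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Return the result shape of an element-wise operation, or raise ValueError."""
    result = []
    # Assumption: a missing trailing dimension behaves like a dimension of length 1.
    for a, b in zip_longest(shape_a, shape_b, fillvalue=1):
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            raise ValueError(f"nonconformant dimensions: {shape_a} vs {shape_b}")
    return tuple(result)


print(broadcast_shape((1, 5), (5, 1)))  # row vector + column vector
print(broadcast_shape((1, 4), (1, 1)))  # vector + scalar
```

A function that must not broadcast can apply the advice above about checking its inputs by comparing the two shapes for equality before doing any arithmetic.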
I have a txt file which contains a set of 3 Dimensional data points and I would like to create a vtkPolyData based on those points.
In the file, I have the number of points on the first line, in my case they are 6 x 6. And after that the actual coordinates of each point. The content of the file is like this.
6 6
1 1 3
2 1 3.4
3 1 3.6
4 1 3.6
5 1 3.4
6 1 3
1 2 3
2 2 3.8
3 2 4.2
4 2 4.2
5 2 3.8
6 2 3
1 3 3
2 3 3
3 3 3
4 3 3
5 3 3
6 3 3
1 4 3
2 4 3
3 4 3
4 4 3
5 4 3
6 4 3
1 5 3
2 5 3.8
3 5 4.2
4 5 4.2
5 5 3.8
6 5 3
1 6 3
2 6 3.4
3 6 3.6
4 6 3.6
5 6 3.4
6 6 3
How can I build a vtkPolyData structure with a txt file with this data?
It looks to me like you have a regularly gridded series of points, right? If so, vtkImageData might be a better choice. You can always use a geometry filter afterwards to convert to polydata if you really need it that way.
1. Create a vtkImageData instance.
2. Set its dimensions to (6, 6, 1) (the third dimension is ignored).
3. Set its data type to an appropriate type (float or double, I guess).
4. Call AllocateScalars();
5. If in C++, call GetScalarPointer() and cast it to the data type set in 3. This pointer will point to an array of size 36. You can just fill each point as you would normally.
6. If in another language (TCL/Python/Java), call SetScalarComponentFromFloat on the image data, with the arguments (x, y, 0, 0, value). The first 0 is the 3rd dimension and the second is for the first component.
This will give you a grid, and it'll be far more memory efficient than a polydata.
If you want to visualize only the points, use a vtkDataSetMapper, and setup the actor's property with SetRepresentationToPoints(), setting an appropriate point size. That will do a simple job of visualization.
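Whichever VTK structure is used, the file has to be read first. The sketch below covers only that parsing step in plain Python and makes no VTK calls; `parse_grid` is a hypothetical helper, and it assumes the layout shown in the question (two grid sizes on the first line, then one "x y z" triple per point).

```python
def parse_grid(text):
    """Parse 'nx ny' followed by nx*ny whitespace-separated 'x y z' triples."""
    tokens = text.split()
    nx, ny = int(tokens[0]), int(tokens[1])
    values = [float(t) for t in tokens[2:]]
    if len(values) != 3 * nx * ny:
        raise ValueError("point count does not match the grid size on the first line")
    points = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return nx, ny, points


# A 2 x 2 excerpt in the same layout as the file in the question.
sample = "2 2\n1 1 3\n2 1 3.4\n1 2 3\n2 2 3.8\n"
nx, ny, points = parse_grid(sample)
heights = [z for _, _, z in points]
print(nx, ny, heights)
```

The `heights` list is what would be written into the image scalars in the steps above, while the full `points` tuples are what a polydata-based approach would need.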
Are these examples useful? In particular, this does generation of points and polygons, so it should be possible to adapt. The core seems to be (with lots left out):
# ...
vtkPolyData shell
vtkFloatPoints points
vtkCellArray strips
# Generate points...
loop {
...
points InsertPoint $k $x0 $x1 $x2
}
shell SetPoints points
points Delete
# Generate triangles/polygons...
loop {
strips InsertNextCell $NP2
# ...
strips InsertCellPoint [expr $kb +$ke ]
# ...
strips InsertCellPoint [expr $kb +$ke ]
}
shell SetStrips strips
strips Delete
# ...
I don't really understand how modulus division works.
I was calculating 27 % 16 and wound up with 11 and I don't understand why.
I can't seem to find an explanation in layman's terms online.
Can someone elaborate on a very high level as to what's going on here?
Most explanations miss one important step, let's fill the gap using another example.
Given the following:
Dividend: 16
Divisor: 6
The modulus function looks like this:
16 % 6 = 4
Let's determine why this is.
First, perform integer division, which is similar to normal division, except any fractional number (a.k.a. remainder) is discarded:
16 / 6 = 2
Then, multiply the result of the above division (2) with our divisor (6):
2 * 6 = 12
Finally, subtract the result of the above multiplication (12) from our dividend (16):
16 - 12 = 4
The result of this subtraction, 4, is the remainder, and it is the same as the result of our modulus above!
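The three steps translate directly into code. This is a minimal Python sketch that assumes positive integers; `modulus_by_steps` is a name invented for the illustration.

```python
def modulus_by_steps(dividend, divisor):
    """Compute the remainder using the divide, multiply, subtract steps."""
    quotient = dividend // divisor   # integer division: 16 // 6 gives 2
    product = quotient * divisor     # 2 * 6 gives 12
    return dividend - product        # 16 - 12 gives 4


print(modulus_by_steps(16, 6), 16 % 6)
print(modulus_by_steps(27, 16), 27 % 16)
```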
The result of a modulo division is the remainder of an integer division of the given numbers.
That means:
27 / 16 = 1, remainder 11
=> 27 mod 16 = 11
Other examples:
30 / 3 = 10, remainder 0
=> 30 mod 3 = 0
35 / 3 = 11, remainder 2
=> 35 mod 3 = 2
The simple formula for calculating modulus is :-
[Dividend-{(Dividend/Divisor)*Divisor}]
So, 27 % 16 :-
27- {(27/16)*16}
27-{1*16}
Answer= 11
Note:
All calculations are with integers. In case of a decimal quotient, the part after the decimal is to be ignored/truncated.
eg: 27/16= 1.6875 is to be taken as just 1 in the above mentioned formula. 0.6875 is ignored.
Many programming languages handle integer division the same way, truncating whatever comes after the decimal point.
Maybe the example of a clock could help you understand the modulo.
A familiar use of modular arithmetic is the 12-hour clock, in which the day is divided into two 12-hour periods.
Let's say the current time is 15:00
But you could also say it is 3 pm
This is exactly what modulo does:
15 / 12 = 1, remainder 3
You find this example better explained on wikipedia: Wikipedia Modulo Article
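The clock conversion can be sketched in Python as follows. The function name `to_12_hour` and the am/pm handling are assumptions added for the illustration; the only part taken from the answer is the use of the remainder after dividing by 12.

```python
def to_12_hour(hour24):
    """Convert an hour in 0..23 to a (hour, 'am' or 'pm') pair."""
    hour = hour24 % 12                      # 15 % 12 gives 3
    suffix = "am" if hour24 < 12 else "pm"
    # A remainder of 0 is shown as 12 on a clock face.
    return (12 if hour == 0 else hour), suffix


print(to_12_hour(15))
```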
The modulus operator takes a division and returns whatever is left over from that calculation, the "remaining" data, so to speak. For example, 13 / 5 = 2 in integer division, which means there is 3 left over. Why? Because 2 * 5 = 10, and 13 - 10 = 3.
The modulus operator does all that calculation for you, 13 % 5 = 3.
modulus division is simply this : divide two numbers and return the remainder only
27 / 16 = 1 with 11 left over, therefore 27 % 16 = 11
ditto 43 / 16 = 2 with 11 left over so 43 % 16 = 11 too
Very simple: a % b is defined as the remainder of the division of a by b.
See the wikipedia article for more examples.
I would like to add one more thing:
it's easy to calculate modulo when dividend is greater/larger than divisor
dividend = 5
divisor = 3
5 % 3 = 2
3)5(1
3
-----
2
but what if dividend is smaller than divisor
dividend = 3
divisor = 5
3 % 5 = 3 ?? how
This is because 5 goes into 3 zero times, so nothing is subtracted and the modulo is the dividend itself
I hope these simple steps will help:
20 % 3 = 2
20 / 3 = 6; do not include the .6667 – just ignore it
3 * 6 = 18
20 - 18 = 2, which is the remainder of the modulo
This is easier when the part after the decimal point (0.xxx) is short. Then all you need to do is multiply that fractional part by the divisor.
Ex: 32 % 12 = 8
You do 32/12=2.666666667
Then you throw the 2 away, and focus on the 0.666666667
0.666666667*12=8 <-- That's your answer.
(again, only easy when the number after the decimal is short)
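The same shortcut in Python, as a rough sketch for positive integers. The name `modulus_from_fraction` is made up here, and the final rounding is an addition of this sketch: floating-point division rarely gives the fractional part exactly, so the product has to be rounded back to an integer.

```python
def modulus_from_fraction(dividend, divisor):
    """Recover the remainder from the fractional part of the quotient."""
    quotient = dividend / divisor           # 32 / 12 is about 2.6667
    fraction = quotient - int(quotient)     # throw the 2 away, keep about 0.6667
    return round(fraction * divisor)        # about 0.6667 * 12, rounded to 8


print(modulus_from_fraction(32, 12))
```

For large numbers the rounding can no longer hide the floating-point error, so the integer-based methods in the other answers are the safer choice in real code.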
27 % 16 = 11
You can interpret it this way:
16 goes 1 time into 27 before passing it.
16 * 2 = 32, which is already more than 27.
So you could say that 16 goes one time in 27 with a remainder of 11.
In fact,
16 + 11 = 27
Another example:
20 % 3 = 2
Well 3 goes 6 times into 20 before passing it.
3 * 6 = 18
To add-up to 20 we need 2 so the remainder of the modulus expression is 2.
The only important thing to understand is that modulus (denoted here by % like in C) is defined through the Euclidean division.
For any two integers d and q (with q nonzero), the following is always true:
d = ( d / q ) * q + ( d % q )
As you can see the value of d%q depends on the value of d/q. Generally for positive integers d/q is truncated toward zero, for instance 5/2 gives 2, hence:
5 = (5/2)*2 + (5%2) => 5 = 2*2 + (5%2) => 5%2 = 1
However, for negative integers the situation is less clear and depends on the language and/or the standard. For instance, -5/2 can return -2 (truncated toward zero as before) but can also return -3 (in another language).
In the first case:
-5 = (-5/2)*2 + (-5%2) => -5 = -2*2 + (-5%2) => -5%2 = -1
but in the second one:
-5 = (-5/2)*2 + (-5%2) => -5 = -3*2 + (-5%2) => -5%2 = +1
As said before, just remember the invariant, which is the Euclidean division.
Further details:
What is the behavior of integer division?
Division and Modulus for Computer Scientists
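Both conventions can be tried side by side in Python, whose own // and % operators round the quotient down (the second case above). In this sketch the truncating variant is emulated with math.trunc; the function names are invented for the example, and the float division inside `div_trunc` is only reliable for small integers.

```python
import math


def div_floor(d, q):
    """Quotient rounded toward negative infinity (Python's built-in behaviour)."""
    return d // q, d % q


def div_trunc(d, q):
    """Quotient truncated toward zero, emulated for small integers."""
    quotient = math.trunc(d / q)
    return quotient, d - quotient * q


for d, q in [(5, 2), (-5, 2)]:
    for divide in (div_trunc, div_floor):
        quotient, remainder = divide(d, q)
        # The invariant d = (d / q) * q + (d % q) holds under either convention.
        assert d == quotient * q + remainder
        print(divide.__name__, d, q, quotient, remainder)
```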
Modulus division gives you the remainder of a division, rather than the quotient.
It's simple: the modulus operator (%) returns the remainder after integer division. Take the example from your question: why is 27 % 16 = 11? When you divide 27 by 16, 16 goes in once and leaves a remainder of 11, and that is why your answer is 11.
Lets say you have 17 mod 6.
Which multiple of 6 gets you closest to 17 without going over? It is 12, because the next multiple, 18, is more than 17. Then subtract 12 from 17, which gives you your answer, in this case 5.
17 mod 6=5
Modulus division is pretty simple. It uses the remainder instead of the quotient.
     1.0833... <-- Quotient
   __
12|13
   12
    1      <-- Remainder
    1.00   <-- Remainder can be used to find decimal values
     .96
     .040
     .036
     .0040 <-- remainder of 4 starts repeating here, so the quotient is 1.083333...
13/12 = 1R1, ergo 13%12 = 1.
It helps to think of modulus as a "cycle".
In other words, for the expression n % 12, the result will always be < 12.
That means the sequence for the set 0..100 for n % 12 is:
{0,1,2,3,4,5,6,7,8,9,10,11,0,1,2,3,4,5,6,7,8,9,10,11,0,[...],4}
In that light, the modulus, as well as its uses, becomes much clearer.
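The cycle is easy to see by generating the sequence. A short Python sketch, using the same range 0..100 as above:

```python
# n % 12 for n = 0..100: the values run 0..11 and then start over.
cycle = [n % 12 for n in range(101)]

print(cycle[:14])
print(cycle[-1])
```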
Write out a table starting with 0.
{0,1,2,3,4}
Continue the table in rows.
{0,1,2,3,4}
{5,6,7,8,9}
{10,11,12,13,14}
Everything in column one is a multiple of 5. Everything in column 2 is a
multiple of 5 with 1 as a remainder. Now the abstract part: You can write
that (1) as 1/5 or as a decimal expansion. The modulus operator returns only
the column, or in another way of thinking, it returns the remainder on long
division. You are dealing in modulo(5). Different modulus, different table.
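In code, the row and column of a number in such a table are the quotient and the remainder. The Python sketch below uses the built-in divmod and counts rows and columns from 0; `table_position` is a name made up for the illustration.

```python
def table_position(n, modulus=5):
    """Return (row, column) of n in a table with `modulus` columns, both 0-based."""
    row, column = divmod(n, modulus)
    return row, column


# The same three rows as in the table above.
rows = [list(range(start, start + 5)) for start in (0, 5, 10)]

row, column = table_position(13)
print(row, column, rows[row][column])
```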
Think of a Hash Table.
When we divide two integers we will have an equation that looks like the following:
A/B =Q remainder R
A is the dividend; B is the divisor; Q is the quotient and R is the remainder
Sometimes, we are only interested in what the remainder is when we divide A by B.
For these cases there is an operator called the modulo operator (abbreviated as mod).
Examples
16/5= 3 Remainder 1 i.e 16 Mod 5 is 1.
0/5= 0 Remainder 0 i.e 0 Mod 5 is 0.
-14/5= -3 Remainder 1 i.e. -14 Mod 5 is 1.
See Khan Academy Article for more information.
In computer science, a hash table uses the mod operator to decide where to store an element: A is the value after hashing, B is the table size, and R is the index of the slot where the element is inserted.
See How does a hash table work for more information
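A toy version of that slot calculation in Python. This is a sketch, not a real hash table: `slot_for` is a hypothetical helper, the keys are small integers (which Python hashes to themselves), and collisions are handled by simply appending to a list.

```python
def slot_for(key, table_size):
    """R = A mod B, where A is the hashed key and B is the table size."""
    return hash(key) % table_size


TABLE_SIZE = 5
table = [[] for _ in range(TABLE_SIZE)]

for key in (16, 0, -14):
    table[slot_for(key, TABLE_SIZE)].append(key)

print(table)
```

Keys 16 and -14 land in the same slot because both leave a remainder of 1, which matches the examples above.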
This was the best approach for me for understanding modulus operator. I will just explain to you through examples.
16 % 3
When you divide these two numbers, the remainder is the result. This is how I do it:
16 % 3: 3 + 3 = 6; 6 + 3 = 9; 9 + 3 = 12; 12 + 3 = 15
So what is left to reach 16 is 1
16 % 3 = 1
Here is one more example: 16 % 7: 7 + 7 = 14. What is left to reach 16? 2, so 16 % 7 = 2
One more: 24 % 6: 6 + 6 = 12; 12 + 6 = 18; 18 + 6 = 24. Nothing is left, so the remainder is zero: 24 % 6 = 0
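The counting-up approach can be sketched as a loop in Python. It assumes positive integers, and `modulus_by_counting` is a name chosen for this illustration.

```python
def modulus_by_counting(dividend, divisor):
    """Keep adding the divisor while the total does not pass the dividend."""
    total = 0
    while total + divisor <= dividend:
        total += divisor
    # Whatever is still missing to reach the dividend is the remainder.
    return dividend - total


print(modulus_by_counting(16, 3))
print(modulus_by_counting(16, 7))
print(modulus_by_counting(24, 6))
```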