Tesseract thinks my 1's are 7's - ocr

It seems like this is probably a common issue with OCR. Is there a way to tell Tesseract that my 1's are actually 1's?
Hopefully without changing my 7's into 1's in the process.
Note: these are scanned documents and I have no idea what font was used.

If Tesseract is trainable, try training it on the font manually. That should solve the problem.
There is another possible solution: add a small validation module that runs after Tesseract. For every 1 and 7, double-check the character using an intensity-based method. For example, find corners (feature points) on it, apply KLT against a 1 template and a 7 template, and see which one gives more positive tracking results. This method is costly, but since you only compare against two small templates, I do not think the performance hit will be significant.
If neither solution is possible, try to solve it with post-processing. For example, if the field is a student's age, it would not be 78; it is 18, and so on. This method is poor and not a real solution, but when nothing else is possible you have to do something like it.
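As a rough illustration of the post-processing fallback, here is a minimal Python sketch. The function name plausible_readings and the age range 5 to 30 are made up for the example; it assumes the OCR output for the field is a string of digits and that a valid range for the field is known.

```python
from itertools import product

def plausible_readings(ocr_text, low, high):
    """Return every in-range integer obtained by reading each '1' or '7'
    in ocr_text as either digit. Assumes ocr_text contains only digits."""
    # Each ambiguous character contributes two options, every other digit one.
    options = [("1", "7") if ch in "17" else (ch,) for ch in ocr_text]
    candidates = {int("".join(combo)) for combo in product(*options)}
    return sorted(v for v in candidates if low <= v <= high)

# Hypothetical student-age field, assumed to lie between 5 and 30.
print(plausible_readings("78", 5, 30))
print(plausible_readings("17", 5, 30))
```

When more than one candidate survives the range check, as with "17" here, the rule cannot decide, which is one reason this approach is only a last resort.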

Related

Relay Calculator: How to show binary results in a display?

I'm trying to create a calculator (a real calculator) from scratch, with only simple parts (relays, diodes, etc.). But I've hit a persistent problem: how the hell do I convert binary numbers, e.g. 00101110, to decimal, so that I can make my display show 46, for example? Making the display translate 0000...1001 (0...9) is easy, but what about after that, when a number has two or more decimal digits (e.g. 10000000 = 128)? I know it can be tricky to explain, so is there somewhere I can find the answer, maybe a schematic?
This isn't about a programming language; it's literally a relay computer (remember the old IBM, Harvard Mark I?). I just want to make a relay calculator that does binary calculations (the calculation part is theoretically finished; for now it only does addition).
What I can't do is turn the binary result into something that can be shown on a 7-segment display.
An easy example is "0000 0111": I can make the display show the number 7, because it has only one decimal digit. With "0011 0100" the situation changes, because the number would be 52. Simply making "52" appear on a display is not the challenge; the problem is: how does a processor translate binary numbers from 0000 to infinity into a form that you can put on a display?
I don't necessarily need a definitive answer. Anything helps: a website, a book, a light at the end of the tunnel.
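One technique commonly used in hardware for this is the shift-and-add-3 algorithm, often called "double dabble", which turns a binary number into BCD digits using only shifts, comparisons, and additions of 3. The Python sketch below only models the logic so the steps can be followed; the function name double_dabble and the default 8-bit width are choices made for this example, and a relay implementation would unroll the loops into stages.

```python
def double_dabble(value, bits=8):
    """Convert an unsigned binary value to a list of BCD digits
    (most significant first) using shift-and-add-3."""
    digits = (bits + 2) // 3  # ceil(bits / 3) decimal digits always suffice
    bcd = 0
    for i in range(bits - 1, -1, -1):
        # Before each shift, add 3 to every BCD digit that is 5 or more,
        # so that doubling it carries correctly into the next digit.
        for d in range(digits):
            if (bcd >> (4 * d)) & 0xF >= 5:
                bcd += 3 << (4 * d)
        # Shift left by one and bring in the next input bit (MSB first).
        bcd = (bcd << 1) | ((value >> i) & 1)
    return [(bcd >> (4 * d)) & 0xF for d in range(digits - 1, -1, -1)]

print(double_dabble(0b00101110))  # digits of 46
print(double_dabble(0b10000000))  # digits of 128
```

Each 4-bit group in the result can then drive one ordinary 0 to 9 seven-segment decoder.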

How to do a base conversion with Little Man Computer?

I need to convert a decimal number to a base between 2 and 9 using Little Man Computer. How do I proceed?
I believe successive division is the best method. As I see it, I must write code that divides two numbers, then saves the integer quotient for the next division, and stores all of the remainders in an array of indefinite size, but I've been struggling with the division code for hours now. I tried searching for code that divides two numbers, but everything I tried has mistakes or doesn't work. I'm stuck at the easiest part of the problem, and I can't imagine how I'm ever going to write self-modifying code that manages an array of ever-increasing line positions and backtracks through it at the end to extract all the remainders. I'm at a loss here; any help would be appreciated.
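Since the Little Man Computer has no divide instruction, division is usually done by repeated subtraction. The following Python sketch is not LMC code; it only models the successive-division plan described above so the control flow is clear before translating it. The names divmod_by_subtraction and to_base are made up for the example.

```python
def divmod_by_subtraction(n, d):
    """Divide n by d using only subtraction and counting,
    the operations an LMC program has available."""
    quotient = 0
    while n >= d:
        n -= d
        quotient += 1
    return quotient, n  # n is now the remainder

def to_base(n, base):
    """Convert a non-negative integer to a digit string in base 2..9."""
    if not 2 <= base <= 9:
        raise ValueError("base must be between 2 and 9")
    remainders = []
    while True:
        n, r = divmod_by_subtraction(n, base)
        remainders.append(r)  # least significant digit comes out first
        if n == 0:
            break
    # The remainders are read back in reverse order.
    return "".join(str(r) for r in reversed(remainders))

print(to_base(46, 2))
print(to_base(52, 9))
```

In LMC the remainders list is the part that needs either self-modifying store instructions or a fixed number of reserved mailboxes, since inputs are limited to three decimal digits.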

reverse engineering a checksum algorithm

I have been trying for a while to work out how to reverse engineer this checksum. It should be a simple one; it is just a checksum for a firmware version of a device. Here are 5 hex strings:
01854000ff1131050600323132393031323430344d45363438304330363730313835ffffffffffffffffffffffffffffffffffffffffffffffffffffff30303232313239303035353138303031485534355f4543455f4456445f535f4e00443120302e3120ff7beff9fff36fff7ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff40b
01854000ff1029050600323132393031303830364d45373738304230373541303436ffffffffffffffffffffffffffffffffffffffffffffffffffffff30303132313239303036333133303031485534355f4543455f4456445f535f4e00443520302e3120ff7beff9fff36fff7fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffaa87
01854000ff1029050600323034393031323230334d45353135304238303830333132ffffffffffffffffffffffffffffffffffffffffffffffffffffff30303232303439303035393038303031485534355f4543455f4456445f535f4e00443520302e3120ff7beff9fff36fff7fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff9e6a
01854000ff1131050600323436393031303330324d45363831304330363632333636ffffffffffffffffffffffffffffffffffffffffffffffffffffff30303332343639303036393038303031485534355f4543455f4456445f535f4e00443120302e3120ff7beff9fff36fff7fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffe5d6
01854000ff1301050600323436393031343430344d45373435304430373132313239ffffffffffffffffffffffffffffffffffffffffffffffffffffff30303132343639303031333133303031485534355f4543455f4456445f535f4e00443520302e3120ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffeb57
It looks like the last 2 bytes (4 hex digits) are the checksum. I marked the differences in black.
Is anyone able to work out the algorithm used to create the checksum? I have already tried many things, but either I did them wrong or they didn't work.
My guess:
checksum(data) = CRC16-CCITT(data) XOR 0x6155
(which may be equivalent to another standard CRC16, I don't know)
See here for an online demo
Well, it can be just about anything. There are multiple implementations of CRC; check for example these. I would apply those CRCs to the data and compare their outputs to what you have.
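A minimal Python sketch of that kind of comparison follows. "CRC16-CCITT" is an ambiguous name, so the sketch tries two common variants that share polynomial 0x1021 and differ only in the initial value. It also assumes that the checksum covers everything before the last two bytes, that it is stored big-endian, and that the 0x6155 mask from the guess above applies; none of this is confirmed for this firmware. The names crc16, VARIANTS, and matching_variants are made up for the example.

```python
def crc16(data: bytes, init: int, poly: int = 0x1021) -> int:
    """Bitwise CRC-16, MSB first, no reflection, no final XOR."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

# Two common readings of "CRC16-CCITT" (same polynomial, different init).
VARIANTS = {
    "CCITT-FALSE (init 0xFFFF)": 0xFFFF,
    "XMODEM (init 0x0000)": 0x0000,
}

def matching_variants(record: bytes, xor_mask: int = 0x6155):
    """Return the variant names whose CRC, XORed with xor_mask,
    equals the last two bytes of record (assumed big-endian)."""
    payload = record[:-2]
    stored = int.from_bytes(record[-2:], "big")
    return [name for name, init in VARIANTS.items()
            if crc16(payload, init) ^ xor_mask == stored]

# Usage with a dump: matching_variants(bytes.fromhex(hex_string))
```

If no variant matches, the next things to vary are the byte range covered, the byte order of the stored value, and reflected (LSB-first) CRCs.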

solving Project Euler #305

Problem # 305
Let's call S the (infinite) string that is made by concatenating the consecutive positive integers (starting from 1) written down in base 10.
Thus, S = 1234567891011121314151617181920212223242...
It's easy to see that any number will show up an infinite number of times in S.
Let's call f(n) the starting position of the nth occurrence of n in S. For example, f(1)=1, f(5)=81, f(12)=271 and f(7780)=111111365.
Find Summation[f(3^k)] for 1 <= k <= 13.
How can I go about solving this?
Calculating S to an arbitrary size is deceptively easy, but as you have probably already found out, it is not practical: S simply becomes too big.
As is common for the newer Project Euler Problems, brute force simply does not work.
That said, you can still look at S for small values of k and maybe construct a formula that will solve the problem in parts (the first few values are easy to handle in memory). Also, look at Problem 40.
Note: remember the one-minute rule. (Most problems can be solved in a few milliseconds.)
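For exploring small values, a direct brute force is enough to reproduce the small examples from the problem statement. The Python sketch below is only an exploration aid, not a solution: it builds a prefix of S in memory and counts occurrences, which is far too slow and memory-hungry for the larger powers of 3. The function name f follows the problem; the chunk parameter is an arbitrary choice made for this example.

```python
def f(n, chunk=10_000):
    """Brute force: 1-based starting position of the nth occurrence
    of str(n) in S = "123456789101112...". Only usable for small n."""
    target = str(n)
    s = ""
    next_int = 1
    while True:
        # Extend the known prefix of S by another chunk of integers.
        s += "".join(str(i) for i in range(next_int, next_int + chunk))
        next_int += chunk
        # Count occurrences, including overlapping ones, from the start.
        pos = -1
        for _ in range(n):
            pos = s.find(target, pos + 1)
            if pos == -1:
                break  # prefix too short, extend and rescan
        else:
            return pos + 1  # convert 0-based index to 1-based position

print(f(1), f(5), f(12))
```

Looking at where the occurrences fall (inside a number, or straddling two neighbouring numbers) is a reasonable starting point for finding a counting formula.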
My estimate of the running time is O(n^2 log n), so this brute force approach is not feasible.
Note that you are supposed to solve Project Euler problems yourself, which IMHO applies in particular to newer problems.

Programmatically obtaining the number of colors used in an image

Question:
Given an image in PNG format, what is the simplest way to programmatically obtain the number of colors used in the image?
Constraints:
The solution will be integrated into a shell script running under Linux, so any solution that fits in such an environment will do.
Please note that the "color capacity of the image file" does not necessarily correspond to "colors used". Example: in an image file with a theoretical color capacity of 256 colors, only, say, 7 colors might be in actual use. I want to obtain the number of colors actually used.
Why write your own program?
If you're doing this with a shell script, you can use the netpbm utilities:
count=$(pngtopnm png_file | ppmhist -noheader | wc -l)
The Image.getcolors method in Python Imaging Library seems to do exactly what you want.
Fun. There doesn't appear to be any guaranteed method of doing this; in the worst case you'll need to scan the image and interpret every pixel, in the best possible case the PNG will be using a palette and you can just check there.
Even in the palette case, though, you're not guaranteed that every entry is used -- so you're (at best) getting an upper bound.
http://www.libpng.org/pub/png/spec/1.1/PNG-Contents.html
.. and the chunk info here:
http://www.libpng.org/pub/png/spec/1.1/PNG-Chunks.html
Alnitak's solution is nice :) I really should get to know netpbm and imagemagick etc. better some time.
Just FYI, as a simple and very general solution: loop through each pixel in the image, getting the r,g,b color values as a single integer. Look for that integer in a list. If it's not there, add it. When finished with all the pixels, print the number of colors in the list.
If you want to count occurrences, use a hashmap/dictionary instead of a simple list, incrementing the key's value (a counter) if the color is already in the dictionary. If it is not found, add it with a starting counter value of 1.
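The two variants above can be sketched in a few lines of Python. The sketch assumes the pixels have already been decoded into a sequence of (r, g, b) tuples; how they are read from the PNG is left out, and the function names count_colors and color_histogram are made up for the example. A set is used in place of a list so that the membership check stays fast on large images.

```python
from collections import Counter

def count_colors(pixels):
    """Number of distinct colors in an iterable of (r, g, b) tuples."""
    return len(set(pixels))

def color_histogram(pixels):
    """Map each color to the number of pixels that use it."""
    return Counter(pixels)

# Hypothetical 2x2 image, already decoded into pixel tuples.
pixels = [(255, 0, 0), (0, 0, 255), (255, 0, 0), (0, 0, 0)]
print(count_colors(pixels))
print(color_histogram(pixels).most_common(1))
```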