Tesseract running error - ocr

I have a problem running the tesseract-ocr engine on Linux. I downloaded the RUS language data and put it in the tessdata directory (/usr/local/share/tessdata). When I run tesseract with the command tesseract blob.jpg out -l rus, it displays an error:
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language eng
Tesseract couldn't load any languages!
Could not initialize tesseract.
Following the compilation guide, I ran export TESSDATA_PREFIX='/usr/local/share/'
to point to my tessdata directory.
Should I edit some config file? Tesseract tries to load the 'eng' data files instead of 'rus'.
Screenshot:
http://i.stack.imgur.com/I0Guc.png

You can grab eng.traineddata from GitHub:
wget https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata
Check https://github.com/tesseract-ocr/tessdata for a full list of trained language data.
When you grab the file(s), move them to the /usr/local/share/tessdata folder. Warning: some Linux distributions (such as openSUSE and Ubuntu) may be expecting it in /usr/share/tessdata instead.
# If you got the data from Google, unzip it first!
gunzip eng.traineddata.gz
# Move the data
sudo mv -v eng.traineddata /usr/local/share/tessdata/
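To see which of these directories actually holds a given language file, a short Python sketch along the following lines may help. The candidate list is an assumption built from the paths named in this thread, and find_traineddata is a hypothetical helper name, not part of Tesseract.

```python
from pathlib import Path

# Candidate tessdata locations mentioned in this thread (an assumption;
# your distribution may use a different directory).
CANDIDATE_DIRS = [
    "/usr/local/share/tessdata",
    "/usr/share/tessdata",
    "/usr/share/tesseract-ocr/tessdata",
]


def find_traineddata(lang, candidate_dirs=CANDIDATE_DIRS):
    """Return every existing <dir>/<lang>.traineddata path among the candidates."""
    found = []
    for directory in candidate_dirs:
        path = Path(directory) / f"{lang}.traineddata"
        if path.is_file():
            found.append(path)
    return found


if __name__ == "__main__":
    for lang in ("eng", "rus"):
        hits = find_traineddata(lang)
        print(lang, "->", [str(p) for p in hits] or "not found")
```

If a language shows up as "not found" in every candidate directory, the file is most likely somewhere Tesseract is not looking.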

The simplest way is to install the needed package:
sudo apt-get install tesseract-ocr-eng #for english
sudo apt-get install tesseract-ocr-tam #for tamil
sudo apt-get install tesseract-ocr-deu #for deutsch (German)
The same naming pattern works for other languages (e.g. tesseract-ocr-fra).

I had this error too on the Windows machine.
My solution.
1) Download your language files from
https://github.com/tesseract-ocr/tessdata/tree/3.04.00
For example, for eng, I downloaded all files with eng prefix.
2) Put them into a tessdata directory inside some folder. Add this folder to the system environment variables as TESSDATA_PREFIX.
The result will be:
System env var: TESSDATA_PREFIX=D:/Java/OCR
and the OCR folder contains a tessdata directory with the language files.

None of the previous solutions worked for me.
I installed the data both via apt-get and by downloading it manually, moved it around under /usr and so on, and nothing worked, no matter how many times I exported the variable.
Finally, as a last resort, I tried passing the path directly to the Tesseract() constructor.
In Python: tr = Tesseract("/usr/local/share/tesseract-ocr/") and now it works. To clarify, I'm using the tesserwrap module.

For Windows Users:
In Environment Variables, add a new system variable named "TESSDATA_PREFIX" with the value "C:\Program Files (x86)\Tesseract-OCR\tessdata".

tesseract --tessdata-dir <tessdata-folder> <image-path> stdout --oem 2 -l <lng>
In my case, these were the mistakes I made and the attempts I tried:
1) I cloned the GitHub repo and copied files from there to
/usr/local/share/tessdata/
/usr/share/tesseract-ocr/tessdata/
/usr/share/tessdata/
2) Used TESSDATA_PREFIX with the above paths
3) sudo apt-get install tesseract-ocr-eng
The first 2 attempts did not work because the files from git clone did not work, for reasons that I do not know. I am not sure why attempt #3 worked for me.
Finally,
I downloaded the eng.traineddata file using wget
Copied it to some directory
Used --tessdata-dir with the directory name
My takeaway is to learn the tool well and make use of its options, rather than relying on package manager installations and default directories.
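The same --tessdata-dir invocation can be driven from a script. The following is only a sketch: tesseract_command and run_ocr are hypothetical helper names, and the paths you pass in are your own. It uses only the command-line options shown above.

```python
import subprocess


def tesseract_command(image_path, tessdata_dir, lang, output_base="stdout"):
    """Build the argument list for the command-line form shown above."""
    return [
        "tesseract",
        "--tessdata-dir", str(tessdata_dir),
        str(image_path),
        output_base,
        "-l", lang,
    ]


def run_ocr(image_path, tessdata_dir, lang):
    """Run tesseract and return the recognized text.

    Raises subprocess.CalledProcessError if tesseract exits with an error,
    for example when <lang>.traineddata is missing from tessdata_dir.
    """
    result = subprocess.run(
        tesseract_command(image_path, tessdata_dir, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the directory explicitly on every call avoids depending on whichever default location the installed build was compiled with.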

For me the problem was in how I downloaded the trained data files. Make sure you use the raw link; the blob link returns the GitHub HTML page rather than the file itself.
Initially I was using:
wget https://github.com/tesseract-ocr/tessdata_best/blob/master/eng.traineddata
When I changed it to:
wget https://github.com/tesseract-ocr/tessdata_best/raw/master/eng.traineddata
It worked
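A small Python sketch of the same idea: rewrite a blob URL into its raw form, and check whether a file already on disk is actually an HTML page. Both function names are hypothetical, and the HTML check is only a heuristic.

```python
def to_raw_url(url):
    """Turn a GitHub 'blob' page URL into the corresponding 'raw' file URL."""
    return url.replace("/blob/", "/raw/", 1)


def looks_like_html(path):
    """Heuristic: a saved web page starts with HTML markup, while a real
    .traineddata file is binary and does not."""
    with open(path, "rb") as handle:
        head = handle.read(512).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


if __name__ == "__main__":
    blob = "https://github.com/tesseract-ocr/tessdata_best/blob/master/eng.traineddata"
    print(to_raw_url(blob))
```

If looks_like_html returns True for your eng.traineddata, the download fetched the web page and needs to be repeated with the raw link.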

For Ubuntu, just run the command below and the environment variable error will disappear.
Command:
export TESSDATA_PREFIX=Path_of_your_tessdata_folder
Command Example:
export TESSDATA_PREFIX=/home/amar/Desktop/OCR/tesseract-4.1.1/tessdata
This command will set the tessdata folder's path to the environment variable with name TESSDATA_PREFIX and the above error will be resolved.
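Note that the two error messages quoted in this thread word the requirement differently: the older one asks for the parent directory of "tessdata", the newer one for the "tessdata" directory itself. The sketch below assumes that this split falls at major version 4, which matches the 4.1.1 example above. The helper names are hypothetical.

```python
import os
from pathlib import Path


def tessdata_prefix(tessdata_dir, major_version):
    """Value to use for TESSDATA_PREFIX for a given tessdata directory.

    Assumption: versions before 4 expect the parent of tessdata,
    versions 4 and later expect the tessdata directory itself.
    """
    tessdata_dir = Path(tessdata_dir)
    if major_version < 4:
        return str(tessdata_dir.parent)
    return str(tessdata_dir)


def env_with_prefix(tessdata_dir, major_version, base_env=None):
    """Copy of the environment with TESSDATA_PREFIX set, for subprocess calls."""
    env = dict(os.environ if base_env is None else base_env)
    env["TESSDATA_PREFIX"] = tessdata_prefix(tessdata_dir, major_version)
    return env
```

The returned dictionary can be passed as env= to subprocess.run, which has the same effect as the export but only for that one child process.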

You can call the Tesseract API functions from C++ code:
#include <cstdio>
#include <tesseract/baseapi.h>
#include <tesseract/ocrclass.h> // ETEXT_DESC
using namespace tesseract;
class TessAPI : public TessBaseAPI {
public:
void PrintRects(int len);
};
...
TessAPI *api = new TessAPI();
int res = api->Init(NULL, "rus"); // returns 0 on success
api->SetAccuracyVSpeed(AVS_MOST_ACCURATE);
api->SetImage(data, w0, h0, bpp, stride);
api->SetRectangle(x0,y0,w0,h0);
char *text;
ETEXT_DESC monitor;
api->RecognizeForChopTest(&monitor);
text = api->GetUTF8Text();
printf("text: %s\n", text);
printf("m.count: %d\n", monitor.count); // integer fields, so %d rather than %s
printf("m.progress: %d\n", monitor.progress);
delete [] text; // GetUTF8Text() returns a buffer that the caller must free
api->RecognizeForChopTest(&monitor);
text = api->GetUTF8Text();
printf("text: %s\n", text);
delete [] text;
...
api->End();
And build this code:
g++ -g -I. -I/usr/local/include -o _test test.cpp -ltesseract_api -lfreeimageplus
(I need FreeImage for loading the picture)

I'm using Windows. I tried all the solutions above and none of them worked.
Finally, I installed Tesseract-OCR on the D drive (where I run my Python script from) instead of the C drive, and it works.
So, if you are using Windows, run your Python script from the same drive as your Tesseract-OCR installation.

In Google Colab I resolved the issue in this way:
!sudo apt-get install tesseract-ocr-*
If you use !sudo apt install tesseract-ocr instead, only 2 languages are installed, so when you intend to work with non-English languages the former command is the one that works.
Afterwards, run !pip install pytesseract
You can also check the installed languages with !tesseract --list-langs
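To use that language list from Python, the output of tesseract --list-langs can be parsed with a few lines. This is a sketch: parse_list_langs is a hypothetical helper, and it assumes the output is a "List of available languages ..." header line followed by one language code per line.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`.

    Assumes a header line starting with 'List of available languages'
    followed by one language code per line.
    """
    langs = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.append(line)
    return langs


def installed_languages():
    """Run tesseract and return the language codes it reports."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some versions print the list to stderr, so both streams are parsed.
    return parse_list_langs(result.stdout + "\n" + result.stderr)
```

Checking that the language you need is in this list before calling pytesseract gives a clearer failure than the "Failed loading language" error.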

I'm using Visual Studio 2017 Community Edition.
I solved this problem by making a directory called tessdata in the Debug directory of my project. Then I put the eng.traineddata file into said directory.

C# developer working on Windows here. What works for me is simply to download the file eng.traineddata from the following URL:
https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata
and copy it to the following directory in my Console Application project:
[Project Directory]\bin\Debug\tessdata
I created the tessdata folder above manually.

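import pytesseract
# Point pytesseract at the directory holding the .traineddata files (adjust the path to your install).
tessdata_dir_config = r'--tessdata-dir "/usr/local/Cellar/tesseract/4.1.1/share/tessdata"'
pytesseract.image_to_string(imgCrop,lang='eng',config=tessdata_dir_config)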

Add this to your code:
instance.setDatapath("C:\\somepath\\tessdata");
instance.setLanguage("eng");

How I solved the problem in my Manjaro Xfce:
The message was: "TesseractError: (1, 'Error opening data file /home/julio/snap/tesseract/common/eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language 'eng' Tesseract couldn't load any languages! Could not initialize tesseract.')"
So, on my Manjaro, I ran: sudo pacman -S tesseract
The system installed both "tesseract" and a package named "leptonica".
After this step I thought everything was OK and tried to run my simple script. However, the error message changed to something like this (the previous "/home" location became a "/usr"-like location):
"Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language 'eng' Tesseract couldn't load any languages! Could not initialize tesseract.')"
Then I remembered that this message had appeared when I installed "tesseract" with pacman: "You must install one of tesseract-data-* packages or whole tesseract-data group"
So I tried the command "sudo pacman -S tesseract-data", and the system presented lots of language options. I chose some languages, installed them as follows, and the module started to work like a charm:
sudo pacman -S tesseract-data-eng
sudo pacman -S tesseract-data-por
sudo pacman -S tesseract-data-fra
sudo pacman -S tesseract-data-spa
I then tried some Portuguese special characters (like "ão"), which were only recognized when I passed the argument lang='por', as in pytesseract.image_to_string(img,lang='por')

As of 2021, my solution for Ubuntu is to download the zip files from https://github.com/tesseract-ocr/tessdata_best/releases/tag/4.1.0, extract them, and copy the necessary .traineddata files into /usr/local/share/tessdata. This is the default folder where tesseract 4.1.1 searches for trained data.

I had the same problem with DEU language on macOS. I could solve it by installing all additional languages like so:
brew install tesseract-lang
as suggested on https://formulae.brew.sh/formula/tesseract

If you are on Windows, add your Tesseract-OCR installation to the system environment variables. For example:
1) Find the path where Tesseract is installed on your C drive (in my case r"C:\Program Files\Tesseract-OCR\tesseract.exe").
2) Make sure you have the required tessdata files; if not, download them from https://github.com/tesseract-ocr/tessdata https://github.com/tesseract-ocr/langdata (at least for the languages you want to convert).
3) Paste them into the main directory, in my case C:\Program Files\Tesseract-OCR
4) Add the path of the directory to your system environment variables. To do that:
search for "environment variable" in the Start bar
go to Environment Variables
click Path under the system variables (NOT the user variables)
paste the path of Tesseract-OCR
That's all.

Related

How to run html file on port 8080 through vs code

I am trying to run my html file on port 8080 through command 'http-server' but the terminal keeps saying 'command not found'.
I have tried solving this through 'npx http-server' and 'npm install -g http-server' but then I'm told that the 'npx' and 'npm' commands cannot be found as well.
Please make sure Node.js is installed correctly; the article linked at the end may help.
Run node -v to check whether Node is installed.
Run npm -v to check whether npm is installed.
NOTE: if you are using Windows, make sure Node is on your PATH.
How to add it to PATH on Windows:
Search for 'Environment Variables' in the Start menu search.
Choose “Edit system environment variables”.
Click “Environment Variables” in the “Advanced” tab.
In the “System Variables” box, search for Path and edit it to include the path C:\Program Files\nodejs. If you don’t see it there click “New” then add this path. (Note: Depending on your version you may just need to edit and append this path to what’s there by prefixing it with a semicolon. You’ll see the other paths there are also separated by semicolons).
If you are using Linux, it could be a permissions issue:
sudo chown -R $(whoami):admin /usr/local/lib/node_modules/
https://linuxhint.com/npm-command-not-found/
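A "command not found" message means the shell could not find the program in any directory listed in PATH. The check can be reproduced with Python's standard library, as in this sketch. check_tools is a hypothetical helper name, and the default command list simply mirrors the commands mentioned in the question.

```python
import shutil


def check_tools(names=("node", "npm", "npx", "http-server")):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, location in check_tools().items():
        print(f"{name}: {location or 'not found on PATH'}")
```

If node itself is reported as not found, fixing the Node.js installation or PATH comes first, since npm, npx and http-server all depend on it.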

How do I specify a path for pyinstaller

I am learning in a tutorial, how to create widgets. The tutorial, however requires you to use pyinstaller to send the program to anyone. The problem I am facing is specifying my path.
Here is a recent attempt on the terminal command:
C:\tkinter.idea>cd pyinstaller.exe --onefile --icon=sun_icon.ico book.py
The system cannot find the path specified.
Comment below if you need further clarification.
The cd command only changes the current directory; it takes a folder, not a program. Since pyinstaller.exe is a program, you don't need cd at all: remove it and just enter pyinstaller.exe --onefile --icon=sun_icon.ico book.py in your cmd.

How to connect to Mysql from Lua with local-infile=1

I am connecting to a mysql database from lua using :
mysql = require "luasql.mysql"
local env = mysql.mysql()
local conn = env:connect(database,userName,password)
but the option local-infile is not activated so my requests using LOAD DATA don't work.
I tried to put the line
local-infile = 1
in the my.cnf file, in the [client] section, but it still doesn't work.
FYI: I am using Linux and MySQL 5.1.
I went through the same situation last week. The query LOAD DATA INFILE worked on Mac OSX, but I could not make it work on Ubuntu. The only way I found to make it work was adding one line of code to the LuaSQL project and recompiling it.
I used the MySQL driver's function mysql_options (you can check its prototype in the mysql.h file, probably located at /usr/include/mysql) to enable the local-infile. You can check the code at the repository.
To compile and install this workaround, you should download the files:
$ wget https://github.com/rafaeldias/luasql/archive/master.zip
$ unzip master.zip
To compile and install :
$ cd luasql-master/
$ make
$ sudo make install
Note: Depending on where your Lua and MySQL folders are located, you may need to set the proper values for the LUA_LIBDIR, LUA_DIR , LUA_INC , DRIVER_LIBS and DRIVER_INCS in the config file within the LuaSQL folder.
Hope it helps.

How do I extend the $PATH that Sublime Text 2 uses?

I just installed Sublime-jshint (and the requisite node.js + jshint) but get this error when I try to invoke JSHint from within ST2:
[Errno 2] No such file or directory
[cmd: [u'jshint', u'PATH-TO-THE-JS-FILE-I-AM-LINTING', u'--reporter', u'/home/cmg/.config/sublime-text-2/Packages/JSHint/reporter.js']]
[dir: DIR-MY-JS-FILE-IS-IN]
[path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/home/cmg/bin]
[Finished]
The final item in the given path is in the home dir of my user (cmg), so it's been customized somehow... but I don't recall how, so I don't know how to add the dir I need (~/node_modules/.bin).
I've added it to $PATH in my shell (via both .bashrc and .bash_profile) but ST2 doesn't pick it up.
(I'm on Ubuntu 14.04. All the usable stuff I've found via Google on this subject has been either OS X specific or related to ST's build system).
Basically, the exec command, which the jshint package uses internally, allows you to set/extend the PATH of the spawned subprocess. (docs)
The package actually uses this path argument on OSX, but has it hardcoded (I am partly guilty of that as I rewrote the command because it was just horrible before). It should allow for a setting to specify the path to your jshint executable, so I suggest you create an issue for that.
I don't know why ST doesn't pick up your PATH from elsewhere, since I have very little experience with that.
Open /etc/profile in Sublime (using sudo) and add the following line at the very bottom:
export PATH=/home/cmg/node_modules/.bin:$PATH
and save the file. Restart completely, and your PATH should be updated.
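The idea from the first answer, extending PATH only for the spawned subprocess, can be sketched in plain Python (the language Sublime plugins are written in). This is not the jshint package's actual code and does not use the Sublime API. env_with_extra_path is a hypothetical helper.

```python
import os


def env_with_extra_path(extra_dir, base_env=None):
    """Return a copy of the environment with extra_dir prepended to PATH."""
    env = dict(os.environ if base_env is None else base_env)
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if extra_dir not in parts:
        parts.insert(0, extra_dir)
    env["PATH"] = os.pathsep.join(parts)
    return env


if __name__ == "__main__":
    # The directory from the question; adjust to where jshint is installed.
    env = env_with_extra_path(os.path.expanduser("~/node_modules/.bin"))
    print(env["PATH"])
```

The resulting dictionary can be passed as env= to subprocess.Popen, so only the child process sees the extended PATH.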

How to convert bash file to a binary executable

I created a binary executable from a bash script on a Linux server using SHC. The binary works fine on Linux machines, but throws an error on Mac. How can I convert my bash file to a binary executable that is able to run everywhere (Ubuntu, CentOS, Mac, Cygwin)?
shc -v -r -T -f ir16fetcher.sh
mv ir16fetcher.sh.x ir16fetcher
Shebang of my bash script
#!/bin/bash
On Linux machines
./ir16installer
USAGE : ir16fetcher <servername/ip address> [the n th latest build - optional. Default 1]
EXAMPLE: ir16fetcher jagger 2
EXAMPLE: ir16fetcher 167.116.6.155
REQUIRE: Please make sure conf file in installation folder ~/IRinstall/ir16 & ~/IRinstall/irmanager
On my Mac
./ir16installer
-bash: ./ir16installer: cannot execute binary file
I don't think it's going to work:
"The compiled binary will still be dependent on the shell
specified in the first line of the shell code (i.e.
#!/bin/sh), thus shc does not create completely independent
binaries."
From http://www.datsi.fi.upm.es/~frosal/sources/shc.html
You will have to do this for every architecture and operating system you need to support. In any case, there doesn't really seem to be any benefits of using this method for distribution. It adds dependencies and complicates delivery, and I'm pretty sure whatever obfuscation the "shc" compiler implements is easily reversed.
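The "cannot execute binary file" error comes from the executable format itself: Linux binaries are ELF files, while macOS uses Mach-O. A small sketch that inspects the first bytes of a file makes the mismatch visible. executable_format is a hypothetical helper, and it only recognizes the handful of magic numbers listed in the code.

```python
# Magic numbers at the start of common executable formats.
MACHO_MAGICS = (
    b"\xcf\xfa\xed\xfe",  # Mach-O 64-bit, little-endian
    b"\xce\xfa\xed\xfe",  # Mach-O 32-bit, little-endian
    b"\xfe\xed\xfa\xcf",  # Mach-O 64-bit, big-endian
    b"\xfe\xed\xfa\xce",  # Mach-O 32-bit, big-endian
    b"\xca\xfe\xba\xbe",  # universal (fat) binary; Java class files share this magic
)


def executable_format(header):
    """Classify a file by its first bytes: 'elf', 'mach-o', 'script' or 'unknown'."""
    if header.startswith(b"\x7fELF"):
        return "elf"
    if header[:4] in MACHO_MAGICS:
        return "mach-o"
    if header.startswith(b"#!"):
        return "script"
    return "unknown"


def file_format(path):
    """Read the first four bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return executable_format(handle.read(4))
```

A binary built by shc on a Linux server would be expected to report "elf", which macOS cannot run, whereas the original script reports "script" and runs wherever its shebang interpreter exists.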
If the goal here is to "hide" your source code, and then have the "hidden" copy of the code be executable on the Unix OSes you listed, then encryption is really your only option.
I say this because encryption tools are available on every base Unix install. For your purposes, this is a very good thing, as you won't have to download or configure anything additional. They're just there, as part of the standard installation of the OS. One such tool is openssl.
To Encrypt your file/script with openssl:
echo precious-content | openssl aes-128-cbc -a -salt -k mypassword
U2FsdGVkX1+K6tvItr9eEI4yC4nZPK8b6o4fc0DR/Vzh7HqpE96se8Fu/BhM314z
To Decrypt your file/script with openssl:
echo U2FsdGVkX1+K6tvItr9eEI4yC4nZPK8b6o4fc0DR/Vzh7HqpE96se8Fu/BhM314z | openssl aes-128-cbc -a -d -salt -k mypassword
precious-content
Now, to get openssl to do what you want it to do automatically without having to spend hours of your own time figuring out a way, you can paste your script to a site like www.EnScryption.com. This site will generate an "executable" version of your code for you, which you can then run on any Mac, Ubuntu, RedHat, CentOS box.