How to use Kraken OCR in your python script? - ocr

There is this OCR library called "Kraken". I would like to build a script in Python that uses this library, but when I see documentation, it doesn't tell me anything about using it in the way "from kraken import ...". How to use that library in that way? I want to build a script for reading the text with Kraken, but I cannot find anything in documentation.

Kraken is a turn-key OCR system forked from ocropus.
It requires some external libraries to run.
Installation
Using pip:
$ pip3 install kraken
If you are running Anaconda/miniconda, use:
$ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment.yml
$ conda env create -f environment.yml
Sample Code (Google Colab)
For more details see the official documentation page.

Related

Difficulty installing JSON module for Perl on Mac [duplicate]

I get this error:
Can't locate Foo.pm in #INC
Is there an easier way to install it than downloading, untarring, making, etc?
On Unix:
usually you start cpan in your shell:
$ cpan
and type
install Chocolate::Belgian
or in short form:
cpan Chocolate::Belgian
On Windows:
If you're using ActivePerl on Windows, the PPM (Perl Package Manager) has much of the same functionality as CPAN.pm.
Example:
$ ppm
ppm> search net-smtp
ppm> install Net-SMTP-Multipart
see How do I install Perl modules? in the CPAN FAQ
Many distributions ship a lot of perl modules as packages.
Debian/Ubuntu: apt-cache search 'perl$'
Arch Linux: pacman -Ss '^perl-'
Gentoo: category dev-perl
You should always prefer them as you benefit from automatic (security) updates and the ease of removal. This can be pretty tricky with the cpan tool itself.
For Gentoo there's a nice tool called g-cpan which builds/installs the module from CPAN and creates a Gentoo package (ebuild) for you.
Try App::cpanminus:
# cpanm Chocolate::Belgian
It's great for just getting stuff installed. It provides none of the more complex functionality of CPAN or CPANPLUS, so it's easy to use, provided you know which module you want to install. If you haven't already got cpanminus, just type:
# cpan App::cpanminus
to install it.
It is also possible to install it without using cpan at all. The basic bootstrap procedure is,
curl -L http://cpanmin.us | perl - --sudo App::cpanminus
For more information go to the App::cpanminus page and look at the section on installation.
I note some folks suggesting one run cpan under sudo. That used to be necessary to install into the system directory, but modern versions of the CPAN shell allow you to configure it to use sudo just for installing. This is much safer, since it means that tests don't run as root.
If you have an old CPAN shell, simply install the new cpan ("install CPAN") and when you reload the shell, it should prompt you to configure these new directives.
Nowadays, when I'm on a system with an old CPAN, the first thing I do is update the shell and set it up to do this so I can do most of my cpan work as a normal user.
Also, I'd strongly suggest that Windows users investigate strawberry Perl. This is a version of Perl that comes packaged with a pre-configured CPAN shell as well as a compiler. It also includes some hard-to-compile Perl modules with their external C library dependencies, notably XML::Parser. This means that you can do the same thing as every other Perl user when it comes to installing modules, and things tend to "just work" a lot more often.
If you're on Ubuntu and you want to install the pre-packaged perl module (for example, geo::ipfree) try this:
$ apt-cache search perl geo::ipfree
libgeo-ipfree-perl - A look up country of ip address Perl module
$ sudo apt-get install libgeo-ipfree-perl
A couple of people mentioned the cpan utility, but it's more than just starting a shell. Just give it the modules that you want to install and let it do it's work.
$prompt> cpan Foo::Bar
If you don't give it any arguments it starts the CPAN.pm shell. This works on Unix, Mac, and should be just fine on Windows (especially Strawberry Perl).
There are several other things that you can do with the cpan tool as well. Here's a summary of the current features (which might be newer than the one that comes with CPAN.pm and perl):
-a
Creates the CPAN.pm autobundle with CPAN::Shell->autobundle.
-A module [ module ... ]
Shows the primary maintainers for the specified modules
-C module [ module ... ]
Show the Changes files for the specified modules
-D module [ module ... ]
Show the module details. This prints one line for each out-of-date module (meaning,
modules locally installed but have newer versions on CPAN). Each line has three columns:
module name, local version, and CPAN version.
-L author [ author ... ]
List the modules by the specified authors.
-h
Prints a help message.
-O
Show the out-of-date modules.
-r
Recompiles dynamically loaded modules with CPAN::Shell->recompile.
-v
Print the script version and CPAN.pm version.
sudo perl -MCPAN -e 'install Foo'
Also see Yes, even you can use CPAN. It shows how you can use CPAN without having root or sudo access.
Otto made a good suggestion. This works for Debian too, as well as any other Debian derivative. The missing piece is what to do when apt-cache search doesn't find something.
$ sudo apt-get install dh-make-perl build-essential apt-file
$ sudo apt-file update
Then whenever you have a random module you wish to install:
$ cd ~/some/path
$ dh-make-perl --build --cpan Some::Random::Module
$ sudo dpkg -i libsome-random-module-perl-0.01-1_i386.deb
This will give you a deb package that you can install to get Some::Random::Module. One of the big benefits here is man pages and sample scripts in addition to the module itself will be placed in your distro's location of choice. If the distro ever comes out with an official package for a newer version of Some::Random::Module, it will automatically be installed when you apt-get upgrade.
Already answered and accepted answer - but anyway:
IMHO the easiest way installing CPAN modules (on unix like systems, and have no idea about the wondows) is:
curl -L http://cpanmin.us | perl - --sudo App::cpanminus
The above is installing the "zero configuration CPAN modules installer" called cpanm. (Can take several minutes to install - don't break the process)
and after - simply:
cpanm Foo
cpanm Module::One
cpanm Another::Module
Many times it does happen that cpan install command fails with the message like
"make test had returned bad status, won't install without force"
In that case following is the way to install the module:
perl -MCPAN -e "CPAN::Shell->force(qw(install Foo::Bar));"
Lots of recommendation for CPAN.pm, which is great, but if you're using Perl 5.10 then you've also got access to CPANPLUS.pm which is like CPAN.pm but better.
And, of course, it's available on CPAN for people still using older versions of Perl. Why not try:
$ cpan CPANPLUS
Use cpan command as cpan Modulename
$ cpan HTML::Parser
To install dependencies automatically follow the below
$ perl -MCPAN -e shell
cpan[1]> o conf prerequisites_policy follow
cpan[2]> o conf commit
exit
I prefer App::cpanminus, it installs dependencies automatically. Just do
$ cpanm HTML::Parser
On ubuntu most perl modules are already packaged, so installing is much faster than most other systems which have to compile.
To install Foo::Bar at a commmand prompt for example usually you just do:
sudo apt-get install libfoo-bar-perl
Sadly not all modules follow that naming convention.
On Fedora Linux or Enterprise Linux, yum also tracks perl library dependencies. So, if the perl module is available, and some rpm package exports that dependency, it will install the right package for you.
yum install 'perl(Chocolate::Belgian)'
(most likely perl-Chocolate-Belgian package, or even ChocolateFactory package)
Even it should work:
cpan -i module_name
2 ways that I know of :
USING PPM :
With Windows (ActivePerl) I've used ppm
from the command line type ppm. At the ppm prompt ...
ppm> install foo
or
ppm> search foo
to get a list of foo modules available. Type help for all the commands
USING CPAN :
you can also use CPAN like this (*nix systems) :
perl -MCPAN -e 'shell'
gets you a prompt
cpan>
at the prompt ...
cpan> install foo (again to install the foo module)
type h to get a list of commands for cpan
On Fedora you can use
# yum install foo
as long as Fedora has an existing package for the module.
Easiest way for me is this:
PERL_MM_USE_DEFAULT=1 perl -MCPAN -e 'install DateTime::TimeZone'
a) automatic recursive dependency detection/resolving/installing
b) it's a shell onliner, good for setup-scripts
If you want to put the new module into a custom location that your cpan shell isn't configured to use, then perhaps, the following will be handy.
#wget <URL to the module.tgz>
##unpack
perl Build.PL
./Build destdir=$HOME install_base=$HOME
./Build destdir=$HOME install_base=$HOME install
Sometimes you can use the yum search foo to search the relative perl module, then use yum install xxx to install.
Secure solution
Many answers mention the use of the cpan utility (which uses CPAN.pm) without a word on security. By default, CPAN 2.27 and earlier configures urllist to use a http URL (namely, http://www.cpan.org/), which allows MITM attacks, thus is insecure. This is what is used to download the CHECKSUMS files, so that it needs to be changed to a secure URL (e.g. https://www.cpan.org/).
So, after running cpan and accepting the default configuration, you need to modify the generated MyConfig.pm file (the full path is output) in the following way. Replace
'urllist' => [q[http://www.cpan.org/]],
by
'urllist' => [q[https://www.cpan.org/]],
Note: https is not sufficient; you also need a web site you can trust. So, be careful if you want to choose some arbitrary mirror.
Then you can use cpan in the usual way.
My bug report on rt.cpan.org about the insecure URL.
Simply executing cpan Foo::Bar on shell would serve the purpose.
Seems like you've already got your answer but I figured I'd chime in. This is what I do in some scripts on an Ubuntu (or debian server)
#!/usr/bin/perl
use warnings;
use strict;
#I've gotten into the habit of setting this on all my scripts, prevents weird path issues if the script is not being run by root
$ENV{'PATH'} = '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin';
#Fill this with the perl modules required for your project
my #perl = qw(LWP::Simple XML::LibXML MIME::Lite DBI DateTime Config::Tiny Proc::ProcessTable);
chomp(my $curl = `which curl`);
if(!$curl){ system('apt-get install curl -y > /dev/null'); }
chomp(my $cpanm = system('/bin/bash', '-c', 'which cpanm &>/dev/null'));
#installs cpanm if missing
if($cpanm){ system('curl -s -L http://cpanmin.us | perl - --sudo App::cpanminus'); }
#loops through required modules and installs them if missing
foreach my $x (#perl){
eval "use $x";
if($#){
system("cpanm $x");
eval "use $x";
}
}
This works well for me, maybe there is something here you can use.
On Windows with the ActiveState distribution of Perl, use the ppm command.

Trying to use multi-rake with Google Cloud Functions

I am trying to use this library here: multi-rake
However, as stated in the docs, we have to run this before installing multi-rake:
CFLAGS="-Wno-narrowing" pip install cld2-cffi
So I cannot simply put cld2-cffi and multi-rake in requirements.txt because cld2-cffi needs to be installed like this beforehand. How could I overcome this problem?
According to the official documentation you have to package as local dependencies.
You can also package and deploy dependencies alongside your function.
This approach is useful if your dependency is not available via the
pip package manager or if your Cloud Functions environment's internet
access is restricted. For example, you might use a directory structure
such as the following:
You can then use code as usual from the included local dependency,
localpackage. You can use this approach to bundle any Python packages
with your deployment.
Note: You can still use a requirements.txt file to specify additional
dependencies you haven't packaged alongside your function.
Specifying dependencies in Python

How to install Google Apis Drive v3 via command line on Ubuntu-18.04

I have been trying to install Install-Package Google.Apis.Drive.v3 using this source with the difference that I have Ubuntu-18.04 instead of Windows.
I know it may be a simple question but I have been trying research how to do that from this morning. I installed sudo apt install nuget on my machine and have been trying to add packages or as in this case the Google.Apis.Drive.v3 package but no luck.
I went through this source which was useful, but does not carry information I was able to replicate on my Linux machine.
Also this source, this one and this one too. But also this last one is for Windows and was not very useful.
How do I install Google Apis Drive V3 via command line easily as it is documented for windows but on Ubunbtu-18.04?
Thanks for pointing to the right direction for solving this problem.
Solution
The way you install your Drive API's library is depending on the programming language you are aiming to use. These are the following commands to run depending on the different languages to interact with the API (with their respective links to the source of the setup):
Python:
pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
C#/.NET:
Create a new Visual C# Console Application project in Visual Studio.
Open the NuGet Package Manager Console, select the package source nuget.org, and run the following command:
">Install-Package Google.Apis.Drive.v3
Java:
gradle init --type basic
mkdir -p src/main/java src/main/resources
Node.js:
npm install googleapis#39 --save
For the Browser check out the steps to follow here
I hope this has helped you. Let me know if you need anything else or if you did not understood something.
NOTE: For all Ubuntu-18.04 users that wish to install via command line the correct way is: sudo dotnet add package Google.Apis.Drive.v3

How to install models/download packages on Google Colab?

I am using text analytics library "Spacy". I've installed spacy on Google Colab notebook without any issue. But for using it I need to download "en" model.
Generally, that command should look like this:
python -m spacy download en
I tried few ways but I am not able to get it to install on the notebook. Looking for help.
Cheers
If you have a Python interpreter but not a teriminal, you could try:
import spacy.cli
spacy.cli.download("en_core_web_sm")
More manual alternatives can be found here: https://spacy.io/usage/models#download-pip
Fundamentally what needs to happen is the model file needs to be downloaded, unzipped, and placed into the appropriate site-packages directory. You should be able to find a convenient way to do that, e.g. by pip installing the model package directly. But if you get really stuck, you can get the path by looking at the __file__ variable of any module you have installed, e.g. print(spacy.__file__) . That should tell you where on your file-system the site-packages directory is.
or you can use:
import en_core_web_sm
nlp = en_core_web_sm.load()
!python -m spacy download fr_core_news_sm
Worked for me for the French model

MySQL include files from Cygwin gcc

How can I set up MySQL so that i can have header-files and libraries in my Cygwin gcc C++ builds?
I have seen descriptions on the web, but it seems to refer to stuff I don't have, Like "configure.
(I suspect MySQL has changed their build system).
Using an older version could be an option, but I would prefer to have the same versions as on Linux.
I have a full Cygwin install.
First, what's wrong with just using the Windows version? It works fine.
Then, I've wanted to do the same thing as you, and it can be done. Note that I haven't attempted to build the server; all I was interested in was the MySQL client library so I could do some simple client development in the Cygwin environment.
So what do you have to do in order to build the client library on Cygwin?
First, get the tarball. I used mysql-5.5.13.tar.gz. Unpack it in a suitable location, like /usr/local/src.
Then, install the CMake build system via the Cygwin installer. MySQL has switched from GNU Autotools to CMake. CMake is a meta-build system. It generates Makefiles and other build scripts for specific build environments.
Of course, you also need make and gcc.
I had to apply an innocuous little patch posted on the MySQL forum by one Hiroaki Kawai in order to get the stuff to compile:
Finally, I renamed all dtoa() to _dtoa() in mysql/strings/dtoa.c.
The function is static, and should be safe to be renamed.
You can patch using Perl:
perl -pi.orig -e 's/\bdtoa\b/_dtoa/g' strings/dtoa.c
Then, in the top source directory, type:
cmake .
make mysqlclient
You'll get two static libraries in libmysql/, libclientlib.a and libmysqlclient.a. I don't know that the former is (possibly just a build artefact), but the latter is the real thing.
cp /usr/local/src/mysql-5.5/libmysql/libmysqlclient.a /usr/local/lib/
But it's static, and you likely want a dynamic library. This is where the Cygwin docs come in handy. So:
module=mysqlclient
gcc -shared -o cyg${module}.dll \
-Wl,--out-implib=lib${module}.dll.a \
-Wl,--export-all-symbols \
-Wl,--enable-auto-import \
-Wl,--whole-archive lib${module}.a \
-Wl,--no-whole-archive -lz
That'll create the shared library cygmysqlclient.dll and the import library libmysqlclient.dll.a. Copy both to /usr/local/bin. And that's it.
Here's another question on building the MySQL client library on Cygwin.