Build tools for Tesseract Training on Windows 10 - ocr

The Tesseract GitHub page for training says to install these two additional libraries. Where can we find them for Windows 10?
libpango1.0-dev
libcairo2-dev

On Windows, the training tools require a lot of external libraries that are not common on that platform. The best way is to use the cppan build, or its successor sw, which takes care of them.

You can find it here:
https://debian.pkgs.org/10/debian-main-amd64/libpango1.0-dev_1.42.4-6_amd64.deb.html
or try this:
https://www.cairographics.org/download/

Related

Is there any OCR SDK for c++ builder?

I'd like to add character recognition to my application, so I'm asking what the best available and affordable OCR SDK is. I looked at ABBYY FineReader Engine 10.0, but I haven't yet received the trial version I requested from the official site.
I've downloaded Asprise OCR SDK, but it doesn't recognize Cyrillic symbols.
How can I implement character recognition in my application? Which libraries, SDKs, or APIs should I use?
There's CuneiForm and Google's Tesseract OCR, both of which are free. Personally I've used Tesseract. The SDK was giving me a lot of trouble, so I finally decided to simply call Tesseract's command-line interface with arguments from within my C program using the system() function.
Lots of people face difficulties with the Tesseract installation, so here's a short summary (version 2 works for me, insert appropriate version if necessary):
Download the following from the svn: tesseract-2.00.tar.gz, tesseract-2.00.exe6.tar.gz, tesseract-2.00.eng.tar.gz
Unzip tesseract-2.00.tar.gz to a folder
Unzip tesseract-2.00.exe6.tar.gz and move its contents to where tesseract-2.00.tar.gz was unzipped. A few files will be replaced this way.
Similarly, unzip tesseract-2.00.eng.tar.gz and move its contents to the same folder, where the tessdata folder will be replaced.
After all this is done, open the tesseract.dsw workspace, select All Files and do "Rebuild All." This'll take a while with loads of warnings but hopefully no errors.
The command using DOS shell is tesseract picture.tif textfile -l eng. So basically save your image as a TIFF file, run the command from within your program and then read in the OCR output strings from the text file.
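The same "run the command, then read the text file" approach can be sketched in Python instead of C's system(). This is only an illustration: it assumes a tesseract executable is on PATH and that the tool writes its result to the output base name plus ".txt". The function names are made up for the example.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, output_base, lang="eng"):
    """Run Tesseract on an image and return the recognized text.

    Assumption: `tesseract` is on PATH and writes <output_base>.txt.
    """
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling ocr_image("picture.tif", "textfile") would then correspond to the shell command shown above.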
I can recommend Crystal OCR if you don't need to recognize very complex documents; they sent me a C++ Builder sample on request. IMHO, Tesseract is still buggy, though it's the best free OCR, of course.
You can try KSAI-Toolkits. It has a complete OCR application, which includes a C++ API, an OCR model, a benchmark, and test data. It also supports different platforms.

Best portable development platform for small personal project

I'm looking for a development platform (language and set of libraries) that will allow me to develop a personal project. (In case anyone is curious, I'm looking at making a music library manager, similar to iTunes, that can work on multiple platforms and sync with Android devices).
I want the language to have the following characteristics:
Essential
The program must run flawlessly, with no (or very few) code changes, on Mac, Linux, and Windows. That means, notably, that I need a cross-platform GUI framework, a consistent API for accessing files and directories, and a consistent interface for talking to USB storage devices.
Important
A language that is easy to use, powerful, and expressive. Big standard libraries with a lot of built-in functionality. (I'd probably use C#/.NET but the portability isn't great)
Nice to have
Good tool support (on Linux if possible, but I'll do my development on Windows if needs be)
Not Java. (I have used it and just don't like it - I'm not interested in getting into a language war here).
Please help me choose a language!
Python
Cross-platform GUI: there is more than one option. I'd use wxPython, but Qt bindings are also available (comparison between wxWidgets and Qt).
File System API: this gets into the os package, but there are also convenience methods for just dealing with I/O.
USB I/O: I confess to not having any knowledge here, but suspect if you're talking storage that Python will be able to read and write using its IO package.
Libraries, ease of use, etc.: there's a lot built in, but also a huge number of add-ons (called "packages"). Some of the most notable are SciPy and NumPy, used for scientific and numerical analysis.
Tooling: there are a number of IDEs out there, I use PyDev (but it's Eclipse based so you probably won't like it if you don't like Java).
Finally, Python is supported on Android via its scripting environment.
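As a rough illustration of the file system point, here is a minimal sketch that walks a directory tree with the os package and picks out music files. The function name and the list of extensions are assumptions made for the example, not part of any library.

```python
import os


def find_music_files(root, extensions=(".mp3", ".flac", ".ogg")):
    """Yield paths of files under `root` whose extension is in `extensions`."""
    # os.walk behaves the same way on Windows, Mac, and Linux.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare extensions case-insensitively (".MP3" matches ".mp3").
            if os.path.splitext(name)[1].lower() in extensions:
                yield os.path.join(dirpath, name)
```

The same code should work unchanged on a mounted USB storage device, since that is just another directory to walk.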
For a cross-platform GUI, you can explore Qt. The back end can be in C.
Have you explored anything so far?
Qt Quick?

Is there an API or tool that can automate software updating?

Is there any API or tool that can automate software updating? It should take care of checking for updates from a URL for a provided list of files and downloading and replacing the ones that need updating. It would also be nice if it contained an authentication module so that only authorized parties could access the updates. It should be language-agnostic - takes a list of files without extra knowledge except their versions and replaces them with newly downloaded copies if on the site there are newer versions.
I'm specifically interested in something for the Windows platform that would run on Windows XP through Windows 7.
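The core of what is being asked for, comparing local file versions against the ones published at a URL, can be sketched as follows. This is a hypothetical outline: it assumes versions are plain dotted numbers and that the remote list has already been fetched and parsed into a dictionary. Downloading and authentication are left out.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.10.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def files_needing_update(local_versions, remote_versions):
    """Return the sorted names of files whose remote version is newer.

    Both arguments map file name -> dotted version string. Files that are
    missing locally are treated as needing a download.
    """
    outdated = []
    for name, remote in remote_versions.items():
        local = local_versions.get(name)
        # Tuples compare element by element, so 1.10.0 > 1.9.5 as expected.
        if local is None or parse_version(remote) > parse_version(local):
            outdated.append(name)
    return sorted(outdated)
```

Comparing tuples of integers rather than raw strings avoids treating "1.9.5" as newer than "1.10.0".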
This makes me think about apt-get ...
Take a look here as well: Is there an auto-update framework for C++/Win32/MFC (like Sparkle)?
I did see some articles a while back about embedding subversion into your application to manage version control.
Edit:
http://svnbook.red-bean.com/en/1.5/svn.developer.html
"Subversion has a modular design: it's implemented as a collection of libraries written in C. Each library has a well-defined purpose and application programming interface (API), and that interface is available not only for Subversion itself to use, but for any software that wishes to embed or otherwise programmatically control Subversion. Additionally, Subversion's API is available not only to other C programs, but also to programs written in higher-level languages such as Python, Perl, Java, and Ruby."
Just saw UpdateNode launching a pretty cool update and messaging system. It seems to be cross platform and free for Open Source.
UPDATE, did some further analysis on that, posted at: https://stackoverflow.com/a/22528011/3257300
For Windows, I'd use Google Update, also known as Omaha.
Since you didn't tag this question as Windows, I'd also mention UpdateEngine for Mac.
And (best of all) apt, which is available for free on all Debian-based Linux and BSD distributions, like Ubuntu.
There is open source project WIPT inspired by APT of Debian Linux.
Head over to Launchpad and use a PPA: it is a Debian/Ubuntu repository management platform. Of course this is not really platform independent but it is language wise :-)
You should take a look at ClickThrough, I don't know much about it but it sounds similar to what you're looking for. As for authorization, I would imagine this to be handled by your webserver based on the URL.
InstallShield has an offering. I've never used it, but I researched it a few years back; we decided on a roll-your-own solution instead.
InstallShield Update Manager
InstallShield Update Service
You didn't state what platform you needed this for. The easiest way I can think of doing this is with Subversion and rsync.
The concept is to write a post-commit hook for subversion. This script would update a "working folder" on the repository machine and then use rsync to update the differences to another machine.
Data protection and authentication would be set up using rsync over ssh.
If this is for Windows, you could try doing the same with Cygwin installs on the two machines.
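A post-commit hook along these lines could look roughly like the sketch below. It is written in Python for illustration, and the paths and host names in the usage note are placeholders. It assumes the svn and rsync command-line tools are installed and that ssh access to the other machine is already set up.

```python
import subprocess


def build_rsync_command(working_dir, remote_target):
    """Return an rsync invocation that mirrors working_dir to remote_target over ssh.

    -a preserves file attributes, -z compresses data in transit, and
    --delete removes files on the target that no longer exist in the source.
    """
    # A trailing slash makes rsync copy the directory's contents, not the directory itself.
    source = working_dir.rstrip("/") + "/"
    return ["rsync", "-az", "--delete", "-e", "ssh", source, remote_target]


def post_commit(working_dir, remote_target):
    """Refresh the working folder, then push the differences to the other machine."""
    subprocess.run(["svn", "update", working_dir], check=True)
    subprocess.run(build_rsync_command(working_dir, remote_target), check=True)
```

For example, post_commit("/srv/updates/wc", "user@host:/var/www/updates/") would update the working folder and then sync it, with both names being hypothetical.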
Good luck.
If you use .NET, I'm a happy customer of AppLife Update.
CRONw is a scheduled execution service for Windows. (Sorry, I can't link it, I'm apparently limited to 1 as a new user. It's hosted on Sourceforge.)
PowerShell is a Windows scripting language (Microsoft-official) that allows you to do most system administration operations you could conceivably want to do. It is very easy to pick up even if you haven't worked with it before.
I would say your best bet is to write a simple update script in PowerShell and, optionally, set it up as a cron task so you don't have to execute it manually.
IIRC, PowerShell is an optional install on XP, and CRONw requires that you be running a 32-bit system. You didn't say, so I'd guess you're on 32-bit, but the alternative bears mentioning.
And in all this, I'm assuming that the URLs you're describing are designed for this purpose - if they're not and you don't own them, it will rapidly become more suffering than you're willing to bear. (Making a computer navigate a human-readable website usually does.)

Defining a runtime environment

I need to define a runtime environment for my development. The first idea is of course not to reinvent the wheel. I downloaded macports, used easy_install, tried fink. I always had problems. Right now, for example, I am not able to compile scipy because the MacPorts installer wants to download and install gcc43, but this does not compile on Snow Leopard. A bug is open for this issue, but I am basically tied to them for my runtime to be usable.
A technique I learned some time ago was to write a makefile that downloads and builds the runtime/libs with clearly specified versions of libraries and utilities. This predates the MacPorts/fink/apt approach, but you have much more control over it, although you have to do everything by hand. Of course, this can become a nightmare of its own if the runtime grows, but if you find a problem, you can use patch to fix the issue in the downloaded package, then build it.
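The pinned-version idea can be sketched outside of make as well. The Python outline below is hypothetical: it assumes each dependency is an autotools-style tarball named name-version.tar.gz, and the function names are invented for the example. It checks a downloaded archive against a pinned checksum and lists the commands that would unpack and build it.

```python
import hashlib


def verify_sha256(data, expected_hex):
    """Return True if the SHA-256 digest of `data` matches the pinned value."""
    return hashlib.sha256(data).hexdigest() == expected_hex


def build_steps(name, version, prefix):
    """Return the commands that unpack and build one pinned dependency.

    Assumption: the package uses ./configure && make && make install.
    A patch step could be inserted between unpacking and configuring.
    """
    src = f"{name}-{version}"
    return [
        ["tar", "xzf", f"{src}.tar.gz"],
        ["sh", "-c", f"cd {src} && ./configure --prefix={prefix} && make && make install"],
    ]
```

Installing everything under one private prefix keeps the runtime separate from whatever the system package manager provides.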
I have multiple questions:
What is your technique to prepare a well-defined runtime/library collection for your development?
Does MacPorts/fink/whatever allow me the same flexibility to rehack things if something goes wrong?
Considering my makefile solution, when my software is finally out for download, what are your suggestions for handling potential mismatches between my development environment and the actual platform on my users' machines?
Edit: What I don't understand in particular is that other projects don't give me hints. For example, I just downloaded scipy, a complex library with lots of dependencies. Developers must have all the deps setup before working on it. Despite this, there's nothing in the svn that creates this environment.
Edit: Added a bounty to the question. I think this is an important issue and it deserves more answers. I will consider best those answers with real-world examples, with particular attention to any issues that arose and their solutions.
Additional questions to inspire for the Bounty:
Do you perform testing on your environment (to check proper installation, e.g. on an integration machine)?
How do you include your environment at shipping time? If it's C, do you statically link it, or ship the dynamic library and tinker with LD_LIBRARY_PATH before running the executable? What about the same issue for Python, Perl, and others?
Do you stick to the runtime, or update it as time passes? Do you download "trunk" packages of your dependency libraries or a fixed version?
How do you deal with situations like: library foo needs Python 2.5, but you need to develop in Python 2.4 because library bar does not work with Python 2.5?
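One of the questions above mentions adjusting LD_LIBRARY_PATH before running the executable. A minimal launcher sketch of that idea follows. LD_LIBRARY_PATH is the variable the question names and is what the Linux dynamic linker reads; other platforms use different variables. The function names and paths are assumptions for illustration only.

```python
import os
import subprocess


def env_with_library_path(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH")
    # Prepend so the shipped libraries are found before any system copies.
    env["LD_LIBRARY_PATH"] = lib_dir if not existing else lib_dir + os.pathsep + existing
    return env


def launch(executable, lib_dir, args=()):
    """Start the executable with the shipped library directory on the search path."""
    return subprocess.run([executable, *args], env=env_with_library_path(lib_dir), check=True)
```

The change applies only to the child process, so the user's own environment is left untouched.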
We use a CMake script that generates Makefiles that download (mainly through SVN)/configure/build all our dependencies. Why CMake? Multiplatform. This works quite well, and we support invocation of scons/autopain/cmake. As we build on several platforms (Windows, MacOSX, a bunch of Linux variants) we also support different compile flags etc based on the operating system. Typically a library has a default configuration, and if we encounter a system that needs special configuration the configuration is replaced with a specialized configuration. This works quite well. We did not really find any ready solution that would fit our purpose.
That being said, it is a PITA to get it up and running: there are a lot of knobs to turn when you need to support several operating systems. I don't think it will become a maintenance nightmare, as the dependencies are quite fixed (libraries are upgraded regularly, but we rarely introduce new ones).
virtualenv is good, but it can't do magic. For example, if you want to use a library that just MUST have Python 2.4 and another one that absolutely NEEDS 2.5 instead, you're out of luck. Nor can virtualenv (or any other tool) help when there's a brand-new release of an OS and half the tools etc. just don't support it yet, as you mentioned for Snow Leopard. Some problems are just impossible to solve (two libraries with absolutely conflicting needs within the same build); others just require patience (until all the tools you need are ported to the new OS release, you just need to stick with the previous one).

Buildfarms : Options

We use IncrediBuild here to compile our code in a distributed fashion. I was wondering if there are any open source (or free) alternatives to use on a home network?
Failing that, are there any other simple solutions with good integration with Visual Studio out there?
EDIT: I should say that I am quite happy to get my hands dirty and manually configure everything on each machine should that be required.
I can't look past TeamCity as a CI environment - among other features it allows multiple build agents to be linked together in one build grid.
Oh, and it also has excellent integration with VS and Subversion. And it's free to use, up to a maximum of 20 build configurations and 2 build agents.