Lemmatisation for Telugu words in Python - NLTK

I am new to translation concepts and NLTK. Are there any stemmers or lemmatizers implemented in Python for Telugu? Thanks in advance.

Related

MXNet C++ prediction vs Python

We mainly use C++ and want to use MXNet. I found some discussion suggesting that C++ prediction or feature extraction is slower than the Python version. Is that true?
Are there any experienced MXNet C++ engineers who could help with this subject, including a decent way to use a Python-generated MXNet model in C++?
prediction.cpp in MXNet is not very user-friendly.
MXNet was built with several frontend languages in mind, and I see no reason why predictions made using Python should be faster than predictions made using C++.
There is a gap in the documentation on how to use MXNet with C++ at the moment, mostly because the majority of the MXNet community uses Python (the same holds true for the deep learning/machine learning field in general). One relevant C++ example you could look into is here.
If you would like to contribute more examples of how to work with MXNet using C++, you are more than welcome to submit pull requests.

Approach to develop a package in R to convert PDF to HTML

I'm working on a project to convert PDF to HTML using R. I know there are no packages in R to do that.
I would highly appreciate it if any experts could suggest an approach. I have a way to do it with the help of Python, but I'm looking for something better.
Two suggestions:
Have a look at an existing (open source) tool that does this. Reading it will help you learn. https://github.com/itext/i7j-pdfhtml
Don't reinvent the wheel. Use language bindings to call an existing library from R.
Have a look at https://darrenjw.wordpress.com/2011/01/01/calling-java-code-from-r/, where the author explains how to call Java from R.
If you go for this approach, you could use iText pdfHTML.

Introduction to OCR

Someone gave me a trove full of amazing information. It consists of 200MB .tiff images of scanned announcements that go back to the 40's. I want to digitize this, but I have no knowledge whatsoever about OCR. Some of the early material is barely readable by a human, let alone a machine. It is also in Hebrew.
I'm looking for advice on how to approach this. Good suggestions for books, articles, code libraries or software are welcome (all of them should be freely available on the web). I'm proficient in C++ and Python and can pick up another language if needed.
Thank you.
This sounds like a great task for Python, using an OCR library. A quick Google search turned up pytesser:
PyTesser is an Optical Character Recognition module for Python. It takes as input an image or image file and outputs a string.
PyTesser uses the Tesseract OCR engine, converting images to an accepted format and calling the Tesseract executable as an external script. A Windows executable is provided along with the Python scripts. The scripts should work in other operating systems as well.
...
Usage Example
>>> from pytesser import *
>>> image = Image.open('fnord.tif') # Open image object using PIL
>>> print image_to_string(image) # Run tesseract.exe on image
fnord
>>> print image_file_to_string('fnord.tif')
fnord
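PyTesser targets Python 2 and, as described above, simply calls the Tesseract executable. A minimal sketch of the same idea in Python 3, using only the standard library, might look like the following. It assumes a `tesseract` binary is on the PATH and that the Hebrew language data (`heb`) is installed; the function names and the file name in the closing comment are hypothetical.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="heb"):
    """Return the argument list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, lang="heb"):
    """Run Tesseract on one image and return the recognised text.

    Assumes the tesseract executable and the requested language data
    are installed; raises CalledProcessError if Tesseract fails.
    """
    image_path = Path(image_path)
    # Tesseract appends ".txt" to the output base name itself.
    output_base = image_path.with_suffix("")
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang),
        check=True,
    )
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


# Example call (hypothetical file name):
#   text = ocr_image("scan_001.tif")
```

For barely legible scans, expect to experiment with preprocessing (cropping, contrast, binarisation) before passing images to the engine; the sketch above covers only the invocation step.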

Which has the most mature stable libraries for multiple languages available for it, YAML or JSON?

I'm not looking for a comparison of the relative merits of YAML and JSON; I'm purely looking for something that is supported by many languages and has stable implementations.
Another plus would be to know which has libraries that do not have huge dependency trees requiring other libraries.
Both YAML and JSON have stable libraries for many different languages, and all of the most popular languages have good support for both. JSON is the simpler of the two to implement, so it has slightly more support.
If you want to get a very rough idea of stable language support you can compare the lists of the libraries linked from the official homepages:
YAML: C/C++, Java, Python, Ruby, Perl Modules, C#/.NET, PHP, OCaml, Javascript, Actionscript, Haskell
JSON: ASP, ActionScript, C, C++, C#, ColdFusion, D, Delphi, E, Eiffel, Erlang, Fantom, Flex, Go, Haskell, Haxe, Java, JavaScript, Lasso, Lisp, LotusScript, Lua, Objective C, Objective CAML, OpenLaszlo, Perl, PHP, Pike, PL/SQL, PowerShell, Prolog, Python, R, REALbasic, Rebol, RPG, Ruby, Squeak, Tcl, Visual Basic, Visual FoxPro
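On the dependency question, one concrete data point: Python ships a JSON encoder and decoder in its standard library (the `json` module), so no third-party packages are needed there. A minimal sketch of a round trip, using a made-up record purely for illustration:

```python
import json

# A hypothetical configuration record, used only for illustration.
record = {"name": "example", "tags": ["a", "b"], "count": 3, "enabled": True}

# Serialise to a JSON string; sort_keys makes the key order deterministic.
text = json.dumps(record, sort_keys=True)

# Parse it back; for this data the round trip is lossless.
restored = json.loads(text)
```

Other languages differ in whether a parser is bundled or must be installed, so this says nothing about the ecosystem as a whole.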
YAML (as of version 1.2) is designed as a superset of JSON, so a library that fully supports YAML 1.2 can also parse JSON.
It'd have to be JSON... Because the situation in the world of YAML is a mess. There are edge cases where every library interprets something differently from the others. That's what you get with ad hoc "specifications"...
See, for example, this rant. And there are plenty more of those out there.

Any open source software like Matlab? [duplicate]

Closed 13 years ago.
Possible Duplicates:
free matlab environment
What’s the best MATLAB equivalent? (open source or otherwise free)
Please suggest an open source/free equivalent to Matlab for Windows that has the same syntax and is widely used.
GNU Octave is the closest replacement. It supports large parts of the Matlab syntax and incorporates several improvements (IMHO) to the language.
But if you are familiar with Python, I suggest you also take a look at SciPy: a powerful language, a lot of libraries, and active development in scientific libraries (plotting, calculus, etc.). Use the IPython interactive shell.
First the obvious choices for MATLAB alternatives:
Octave
Scilab
Python + NumPy + SciPy + matplotlib
R
FreeMat
And here's a number of similar environments:
RLab [discontinued]
Mathnium (Freeware)
Sysquake LE (Freeware)
O-Matrix (Commercial)
Octave is mostly compatible with Matlab. You can read more about the differences here.
Octave is a free Matlab-like program that a lot of people seem to like.
This site has a whole list of free alternatives:
http://www.math.tu-berlin.de/~ehrhardt/matlab_alternatives.html
While studying at university, I personally used Sage for all my lab and course paper calculations :-) What I love about it is that you don't need to learn a new language if you already know Python.
The most popular alternatives are Octave and Scilab (www.scilab.org).