How to install models/download packages on Google Colab?

I am using the text-analytics library spaCy. I've installed spaCy on a Google Colab notebook without any issue, but to use it I need to download the "en" model.
Generally, that command should look like this:
python -m spacy download en
I tried a few ways, but I am not able to get it to install on the notebook. Looking for help.
Cheers

If you have a Python interpreter but not a terminal, you could try:
import spacy.cli
spacy.cli.download("en_core_web_sm")
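After the download completes, the model should be loadable right away; a quick check (assuming the download succeeded):

import spacy

nlp = spacy.load("en_core_web_sm")  # raises OSError if the model is not installed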
More manual alternatives can be found here: https://spacy.io/usage/models#download-pip
Fundamentally, what needs to happen is that the model file gets downloaded, unzipped, and placed into the appropriate site-packages directory. You should be able to find a convenient way to do that, e.g. by pip installing the model package directly. But if you get really stuck, you can get the path by looking at the __file__ variable of any module you have installed, e.g. print(spacy.__file__). That should tell you where on your filesystem the site-packages directory is.
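For example, a quick way to see that path in a notebook cell (the exact path will vary by environment):

import spacy

# the parent directory of this file is the site-packages directory
print(spacy.__file__)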

Or, once the model package is installed, you can import and load it directly:
import en_core_web_sm
nlp = en_core_web_sm.load()

!python -m spacy download fr_core_news_sm
This worked for me for the French model.
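Assuming the download succeeded, the model can then be loaded as usual:

import spacy

nlp = spacy.load("fr_core_news_sm")  # load the downloaded French model
doc = nlp("Bonjour tout le monde")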

Related

Cannot do "import chisel3._"

I want to use Chisel 3.2 and have installed sbt on Mac OS X.
I wrote my project (a Scala file) and downloaded a project template.
I ran:
sbt
It compiled the Scala code but could not import the chisel3 object.
I believe this is caused by a path or dependency setting, but there is no information about it.
Can anyone suggest a solution?

How to add another .i swig file to an Extension?

I have created a Python interface to my library using SWIG. This Python interface uses numpy. All of this works correctly.
Now, I want to package this Python interface into a Python wheel. Packaging for Windows works correctly.
from setuptools import Extension

myext = Extension(
    "MyExt",
    sources=["MyExt.i"],
    swig_opts=["-py3", "-I/usr/include", "-includeall"],
    libraries=["mylib"],
)
On Windows, compilation happens directly in the directory that contains the sources and setup.py. That is not the case on Linux when building my bdist_deb (same for bdist_rpm), and this is where my problem lies.
The file MyExt.i includes numpy.i, so I should add numpy.i as a source file of the extension. However, if I do that, setuptools also tries to run SWIG on numpy.i, which is not what I want. I haven't found any other parameter of Extension that accepts such a file.
Does anyone know how to get around this issue?
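One parameter that may be worth a look: distutils/setuptools Extension accepts a depends list for files the build depends on, and those files are not fed to SWIG. This is only a sketch, not a verified fix; whether depends also gets numpy.i copied into the out-of-tree build directory is an open question.

from setuptools import Extension

myext = Extension(
    "MyExt",
    sources=["MyExt.i"],   # only this file is passed to SWIG
    depends=["numpy.i"],   # tracked as a build dependency, not SWIG-processed
    swig_opts=["-py3", "-I/usr/include", "-includeall"],
    libraries=["mylib"],
)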

Read the Docs mock not working with Matplotlib

Read the Docs will not build docs for my package because it includes matplotlib.
I used the code on their website to mock out matplotlib, but the build still fails because freetype and png are required to build matplotlib, and apparently these are not installed on their machine.
I tried with and without building in the virtualenv.
Here is my config.py.
Why is my mock not working?
If you have matplotlib in your requirements.txt, Read the Docs will still try to install it in the virtualenv. You have to take matplotlib (and anything else you want to mock) out of the requirements.
If you still want it in requirements.txt for setup but not for building the docs, I think you can specify a different requirements file (like docs/requirements.txt or something) in the ReadTheDocs Admin (under advanced settings).
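For reference, the mocking snippet from the Read the Docs documentation looks roughly like this in the Sphinx conf.py (the module list here is just an example):

import sys
from unittest.mock import MagicMock

class Mock(MagicMock):
    @classmethod
    def __getattr__(cls, name):
        return MagicMock()

MOCK_MODULES = ["matplotlib", "matplotlib.pyplot"]
sys.modules.update((mod_name, Mock()) for mod_name in MOCK_MODULES)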
I hope this solves your problem.

How to execute the mysqldiff utility in the command line?

I have two versions of my database, version n and version n+1, and I want to know the difference between the two. I downloaded the archive from this link: mysqldiff utility
I unzipped the archive and went into the bin directory, then typed mysqldiff -help. To my surprise, I got a message saying that mysqldiff is not recognized as a command. Is there any way to install it?
Thanks
Well, if you read the INSTALL file in the link you gave, it doesn't say to download the archive, it says to install via CPAN.
However, I'm not sure why you'd use a CPAN module or some random Github archive when MySQL distributes a mysqldiff.exe [1] tool itself.
[1] http://dev.mysql.com/downloads/utilities/
I'm the author of that ancient CPAN module and I don't really maintain it any more. It looks like another mysqldiff is offered in the MySQL Utilities suite which seems to be maintained (here's a github clone) and also a lot more sophisticated, so I'd recommend trying that.
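If you go with the MySQL Utilities version, a typical invocation for comparing two databases looks like this (user, password, and database names are placeholders):

mysqldiff --server1=root:secret@localhost db_v1:db_v2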

Python/Openshift application using NLTK resources

I hosted a Python web-service application on OpenShift which uses the RSLP stemmer module of NLTK, but the service log reported:
[...] Resource 'stemmers/rslp/step0.pt' not found. Please use the NLTK Downloader to obtain the resource: >>> nltk.download()
Searched in:
- '/var/lib/openshift/539a61ab5973caa2410000bf/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data' [...]
I concluded that the resource is not installed properly. Does anyone know how to install NLTK resources in an OpenShift/Python application?
PS: the Portuguese stopwords module fails with a similar error.
You can use the NLTK package on OpenShift. The reason it is not working for you is that NLTK by default expects its corpora in the user's home directory. On OpenShift you cannot write to the user home directory; you have to use $OPENSHIFT_DATA_DIR for storing data. To solve this problem, do the following:
1. Create an environment variable called NLTK_DATA with the value $OPENSHIFT_DATA_DIR. After creating the environment variable, restart the app using the rhc app-restart command.
2. SSH into your application gear using the rhc ssh command.
3. Activate the virtual environment and download the corpora using the commands shown below.
. $VIRTUAL_ENV/bin/activate
curl https://raw.github.com/sloria/TextBlob/master/download_corpora.py | python
I have written a blog post on the TextBlob package, which uses NLTK under the hood: https://www.openshift.com/blogs/day-9-textblob-finding-sentiments-in-text
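Alternatively, a minimal sketch of fetching just the resources from the question directly in Python, with an explicit download directory (the fallback path here is hypothetical):

import os
import nltk

# NLTK_DATA should point at $OPENSHIFT_DATA_DIR, per the steps above
data_dir = os.environ.get("NLTK_DATA", "/tmp/nltk_data")
nltk.download("rslp", download_dir=data_dir)       # RSLP stemmer resource
nltk.download("stopwords", download_dir=data_dir)  # includes Portuguese stopwords
nltk.data.path.append(data_dir)                    # make sure NLTK searches this path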