Python/Openshift application using NLTK resources - nltk

I host a Python web service on OpenShift that uses the RSLP Stemmer module of NLTK, but the service log reports:
[...] Resource 'stemmers/rslp/step0.pt' not found. Please use the NLTK Downloader to obtain the resource: >>> nltk.download()
Searched in:
- '/var/lib/openshift/539a61ab5973caa2410000bf/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data' [...]
I concluded that the resource is not installed properly. Does anyone know how to install NLTK resources in an OpenShift Python application?
PS: the Portuguese stopwords module fails with a similar error.

You can use the NLTK package on OpenShift. It is not working for you because NLTK by default expects its corpora in the user's home directory. On OpenShift you cannot write to the user home directory; you have to store data under $OPENSHIFT_DATA_DIR instead. To solve this problem, do the following:
Create an environment variable called NLTK_DATA with the value $OPENSHIFT_DATA_DIR. After creating the environment variable, restart the app using the rhc app-restart command.
SSH into your application gear using the rhc ssh command.
Activate the virtual environment and download the corpora using the commands shown below.
. $VIRTUAL_ENV/bin/activate
curl https://raw.github.com/sloria/TextBlob/master/download_corpora.py | python
I have written a blog post on the TextBlob package, which uses NLTK underneath: https://www.openshift.com/blogs/day-9-textblob-finding-sentiments-in-text
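The download step can also be driven from Python instead of the curl script. This is a minimal sketch, assuming NLTK_DATA or OPENSHIFT_DATA_DIR is set as described above and that the "rslp" and "stopwords" package identifiers cover the resources named in the question; the helper names are hypothetical:

```python
import os


def nltk_data_dir(environ=None):
    """Return the directory NLTK data should live in, or None if unset."""
    environ = os.environ if environ is None else environ
    return environ.get("NLTK_DATA") or environ.get("OPENSHIFT_DATA_DIR")


def download_resources(names=("rslp", "stopwords")):
    """Download NLTK resources into the writable data directory."""
    import nltk  # imported lazily so nltk_data_dir() works without NLTK

    target = nltk_data_dir()
    if target is None:
        raise RuntimeError("Set NLTK_DATA or OPENSHIFT_DATA_DIR first")
    if target not in nltk.data.path:
        nltk.data.path.append(target)  # make NLTK search this directory
    for name in names:
        nltk.download(name, download_dir=target)
```

Calling download_resources() once from the gear (for example in a Python shell after activating the virtual environment) should place the files where the running app can find them.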


SQLAlchemy NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:spanner

Problem
We're trying to connect to Cloud Spanner via SQLAlchemy version 1.3.23 and python-spanner-sqlalchemy. Using Poetry for dependency management, sqlalchemy-spanner has been added like so (this is how the project was set up):
sqlalchemy = "~1.3"
sqlalchemy-spanner = { git="https://github.com/cloudspannerecosystem/python-spanner-sqlalchemy.git", tag="v0.1.0" }
When create_engine is called with
create_engine("spanner:///projects/my-project/instances/my-instance/databases/my-db")
I get the following error
class 'sqlalchemy.exc.NoSuchModuleError'>", "NoSuchModuleError(\"Can't load plugin: sqlalchemy.dialects:spanner\")
Attempts
Registry
I've tried adding (as seen in the conftest.py file in the python-spanner-sqlalchemy test package)
from sqlalchemy.dialects import registry
registry.register("spanner", "google.cloud.sqlalchemy_spanner", "SpannerDialect")
before create_engine is called, which leads to the following error:
class 'ModuleNotFoundError'>", "ModuleNotFoundError(\"No module named 'google.cloud.sqlalchemy_spanner'\")
This makes me think that the plugin dialect has not been correctly added since, in line 49 of setup.py, the connection for the dialect is made:
entry_points={
"sqlalchemy.dialects": [
"spanner = google.cloud.sqlalchemy_spanner:SpannerDialect"
]
},
Installing via python setup.py install
In the README for the spanner project, it says to clone the repo and install via python setup.py install. I performed this step, but am unsure how to import this into my current project or make my project aware of this library. I've never manually installed python packages before so, if anyone can provide any help here, I'd appreciate it.
What I did try:
install the library as per above
try to add the dependency via poetry : poetry add sqlalchemy-spanner. Got Could not find a matching version of package sqlalchemy-spanner
try to locate the library via pip : pip install sqlalchemy-spanner== which usually lists available package versions.
I'm not sure that either of the last 2 bullets actually check a local installation of a package. Not even really sure what I'm talking about here.
Update
So I was able to install the local version of python-spanner-sqlalchemy by using pip install /path/to/project, which works, but still having the same issues with loading the dialect.
I added an import for SpannerDialect in the code (in the Registry section) above with from google.cloud.sqlalchemy_spanner import SpannerDialect. PyCharm auto-completed this for me which indicates to me that the package is successfully installed and available. But I receive the ModuleNotFoundError for google.cloud.sqlalchemy_spanner when running.
I ran python in my project root directory and, from the repl, imported SpannerDialect with no errors.
Solution
To clarify, the solution @larkee provided worked regarding the updated repository URL.
As a note, we recently moved the repo from cloudspannerecosystem/python-spanner-sqlalchemy to googleapis/python-spanner-sqlalchemy.
I clarified why that worked in the comments to their answer
I have not tested the answer from @neondot42, but I have seen this brought up as well, so take a look there if you're having the same issue.
Sorry for the late response. I was recently struggling with a similar problem: SQLAlchemy was unable to load the spanner dialect. I tried uninstalling and installing different versions with pip, but nothing seemed to work.
In the end, what did the trick was specifying the "driver" part of the database URL as "spanner" too. So the final URL looked like this:
spanner+spanner:///projects/project-id/instances/instance-id/databases/database-id
I am not entirely sure of why this appeared to be the solution for me, as all the documentation I could find on Google's end just showed that you could use only the "spanner:///..." to connect. Also I am not familiar with the inner workings of sqlalchemy and how it should detect other installed dialects. However, I hope this solution can help someone else.
I was able to replicate the error you are seeing by installing sqlalchemy but not having sqlalchemy-spanner installed at all:
pip install sqlalchemy==1.3
pip uninstall sqlalchemy-spanner
>>> from sqlalchemy import create_engine
>>> engine = create_engine(<url>)
sqlalchemy.exc.NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:spanner
After I install sqlalchemy-spanner I am able to use SQLAlchemy without issue:
pip install git+git://github.com/googleapis/python-spanner-sqlalchemy.git@v0.1.0
>>> from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, inspect
>>> engine = create_engine(<url>)
>>> metadata = MetaData(bind=engine)
>>> table = Table(
... "TestTable",
... metadata,
... Column("user_id", Integer, primary_key=True),
... Column("user_name", String(16), nullable=False),
... )
>>> table.create()
>>> inspect(engine).get_table_names()
['TestTable']
>>>
Based on this, I think the issue is that sqlalchemy-spanner is not being installed. Unfortunately, I'm not familiar with Poetry for dependency management so I'm not sure exactly what is going wrong. As a note, we recently moved the repo from cloudspannerecosystem/python-spanner-sqlalchemy to googleapis/python-spanner-sqlalchemy. I am able to use either for the pip command but perhaps Poetry requires the newer one?
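One way to check whether the package is really installed in the environment that runs the app is to list the dialect entry points that the interpreter can see. This is a minimal sketch using only the standard library; the function name is hypothetical, and the group name is taken from the setup.py excerpt quoted in the question:

```python
from importlib import metadata


def registered_dialects(group="sqlalchemy.dialects"):
    """Map entry-point names in `group` to their 'module:attr' targets."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions return a selectable collection.
        selected = eps.select(group=group)
    else:
        # Older versions (3.8, 3.9) return a dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


if __name__ == "__main__":
    dialects = registered_dialects()
    print("spanner registered:", "spanner" in dialects)
```

If "spanner" is missing when this runs with the same interpreter as the app (for example via poetry run python), the package is most likely not installed in that environment, which would match the NoSuchModuleError above.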

How to install Google Apis Drive v3 via command line on Ubuntu-18.04

I have been trying to install the Google.Apis.Drive.v3 package (Install-Package Google.Apis.Drive.v3) using this source, with the difference that I have Ubuntu-18.04 instead of Windows.
I know it may be a simple question, but I have been trying to research how to do this since this morning. I installed NuGet with sudo apt install nuget on my machine and have been trying to add packages, in this case the Google.Apis.Drive.v3 package, but with no luck.
I went through this source which was useful, but does not carry information I was able to replicate on my Linux machine.
Also this source, this one and this one too. But also this last one is for Windows and was not very useful.
How do I install Google Apis Drive V3 via the command line on Ubuntu-18.04 as easily as it is documented for Windows?
Thanks for pointing to the right direction for solving this problem.
Solution
How you install the Drive API library depends on the programming language you intend to use. These are the commands to run for the different languages that can interact with the API (with their respective links to the source of the setup):
Python:
pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
C#/.NET:
Create a new Visual C# Console Application project in Visual Studio.
Open the NuGet Package Manager Console, select the package source nuget.org, and run the following command:
Install-Package Google.Apis.Drive.v3
Java:
gradle init --type basic
mkdir -p src/main/java src/main/resources
Node.js:
npm install googleapis@39 --save
For the Browser check out the steps to follow here
I hope this has helped you. Let me know if you need anything else or if you did not understand something.
NOTE: For all Ubuntu-18.04 users that wish to install via command line the correct way is: sudo dotnet add package Google.Apis.Drive.v3
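For the Python route, a quick way to confirm that the pip command above took effect is to check that the modules are importable. This is a minimal sketch; the function name is hypothetical, and the module names are the import names that the three pip packages listed above provide:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names provided by google-api-python-client,
    # google-auth-httplib2 and google-auth-oauthlib respectively.
    required = ["googleapiclient", "google_auth_httplib2", "google_auth_oauthlib"]
    missing = missing_modules(required)
    print("missing:", missing if missing else "none")
```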

Services and env in manifest file?

I have a web application (an online calculator, for example) developed by my fellow team members. Now they want to deploy it to PCF using a manifest.
Languages used: Python, PHP and JavaScript.
I went through the PCF docs about manifest.yml.
I don't understand the services and env fields there.
What are services, how can I find the services for the above project, and how can I find the environment variables?
Also, are these fields mandatory to run the project in PCF?
To your original question:
What is that services and how can I find the services for the above project and also how can I find the environment variables? And tell whether these fields are mandatory to run the project in pcf.
Does your app require any services to run? Services would be things like a database or message queue. If it does not, then you do not need to specify any services in your manifest. They are optional.
Similarly, for environment variables, you would only need to set them if they are required to configure your application. Otherwise, just omit that section of your manifest.
At the end of the day, you should talk with whomever developed the application or read the documentation they produce as that's the only way to know what services or environment variables are required.
In regards to your additional questions:
1) I have one more question: our application uses Python with lots of packages, such as pandas, numpy, scipy and so on. How can I import all these libraries into PCF? Buildpacks only contain the language version, right?
Correct. The buildpack only includes Python itself. Your dependencies either need to be installed or vendored. To do this for Python, you need to include a requirements.txt file. The buildpack will see this and use pip to install your dependencies.
See the docs for the Python buildpack which explains this in more detail: https://docs.cloudfoundry.org/buildpacks/python/index.html#pushing_apps
2) Also, what will be the path for my app? For Java I can point to JAR files.
For Java apps, you need to push compiled code. That means you need to run something like mvn package or gradle assemble to build your executable JAR or WAR file. This should be a self-contained file that has everything necessary to run your app: compiled class files, config, and all dependent JARs.
You then run cf push -p path/to/my-app.jar (or WAR, whatever you build). The cf cli will take everything in the app and push it up to Cloud Foundry, where the Java buildpack will install things like the JVM and possibly Tomcat so your app can run.
What should I do for an application developed using Python, JavaScript and PHP?
You can use multiple buildpacks. See the instructions here.
https://docs.cloudfoundry.org/buildpacks/use-multiple-buildpacks.html
In short, you can have as many buildpacks as you want. The last buildpack in the list is special because that is the buildpack which will set the start command for your application (although you can override this with cf push -c if necessary). The non-final buildpacks will run and simply install dependencies.
3) we were using postgresql how can I use this in pcf with my app
Run cf marketplace and see if there are any Postgres providers in your Marketplace. If there is one, you can just do a cf create-service <provider> <plan> <service name> and the foundation will create a database for you to use. You would then run a cf bind-service <app> <service name> to bind the service you created to your app. This will generate credentials and pass them along to your app when it starts. Your app can then read the credentials out of VCAP_SERVICES and use them to make connections to the database.
See here for more details:
https://docs.cloudfoundry.org/devguide/services/application-binding.html
https://docs.cloudfoundry.org/devguide/deploy-apps/environment-variable.html#VCAP-SERVICES
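Reading the credentials out of VCAP_SERVICES from Python can be sketched as follows. This is a minimal illustration, assuming the usual layout in which VCAP_SERVICES is a JSON object mapping each service label to a list of bound instances; the function name, the service name "my-postgres" and the "uri" credential key are hypothetical, and the actual credential keys depend on the provider:

```python
import json
import os


def service_credentials(service_name, vcap_json=None):
    """Return the credentials dict of the bound service, or None."""
    if vcap_json is None:
        vcap_json = os.environ.get("VCAP_SERVICES", "{}")
    services = json.loads(vcap_json)
    # Each key is a service label; each value is a list of instances.
    for instances in services.values():
        for instance in instances:
            if instance.get("name") == service_name:
                return instance.get("credentials", {})
    return None


if __name__ == "__main__":
    creds = service_credentials("my-postgres")  # hypothetical service name
    if creds is None:
        print("service not bound")
    else:
        print("credential keys:", sorted(creds))
```

Matching on the instance name rather than the service label keeps the lookup independent of which Marketplace provider created the database.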

How to install models/download packages on Google Colab?

I am using the text analytics library spaCy. I've installed spaCy in a Google Colab notebook without any issue, but to use it I need to download the "en" model.
Generally, that command should look like this:
python -m spacy download en
I tried a few ways, but I am not able to get it to install in the notebook. Looking for help.
Cheers
If you have a Python interpreter but not a terminal, you could try:
import spacy.cli
spacy.cli.download("en_core_web_sm")
More manual alternatives can be found here: https://spacy.io/usage/models#download-pip
Fundamentally what needs to happen is the model file needs to be downloaded, unzipped, and placed into the appropriate site-packages directory. You should be able to find a convenient way to do that, e.g. by pip installing the model package directly. But if you get really stuck, you can get the path by looking at the __file__ variable of any module you have installed, e.g. print(spacy.__file__) . That should tell you where on your file-system the site-packages directory is.
or you can use:
import en_core_web_sm
nlp = en_core_web_sm.load()
!python -m spacy download fr_core_news_sm
Worked for me for the French model
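The approaches above can be combined into a helper that tries to load the model and downloads it only when loading fails. This is a minimal sketch; the function name and the loader/downloader parameters are hypothetical conveniences (they default to spacy.load and spacy.cli.download), and it assumes that a missing model surfaces as an OSError:

```python
def load_model(name="en_core_web_sm", loader=None, downloader=None):
    """Load a spaCy model, downloading it first if it is not installed."""
    if loader is None or downloader is None:
        import spacy
        import spacy.cli

        loader = loader or spacy.load
        downloader = downloader or spacy.cli.download
    try:
        return loader(name)
    except OSError:
        # Assumption: a model that is not installed raises OSError.
        downloader(name)
        return loader(name)
```

In a notebook, the second load may still fail until the runtime is restarted, since the model package was installed after the interpreter started.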

How to use the package written by another language in AWS Lambda?

I want to import and use the Python dataset package in AWS Lambda. The dataset package handles MySQL connections and query execution. But when I try to import it, I get an error:
"libmysqlclient.so.18: cannot open shared object file: No such file or directory"
I think the problem is that the MySQL client library is required, but it is not present on the AWS Lambda machine.
How can I add a third-party program and link against it?
You should install your packages in your lambda folder :
$ pip install YOUR_MODULE -t YOUR_LAMBDA_FOLDER
Then compress your whole directory into a zip file to upload to your Lambda.
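The compression step can also be scripted. This is a minimal sketch using the standard library; the function name and paths are hypothetical, and the archive should be written outside the folder being zipped so it does not end up containing itself:

```python
import shutil


def zip_lambda_folder(folder, archive_base):
    """Zip the contents of `folder` into `archive_base`.zip and return its path."""
    # root_dir makes the entries relative to the folder, so the handler
    # and the installed packages sit at the top level of the archive.
    return shutil.make_archive(archive_base, "zip", root_dir=folder)
```

For example, zip_lambda_folder("YOUR_LAMBDA_FOLDER", "build/lambda") would produce build/lambda.zip, provided the build directory already exists.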
What you have to do is to include the binaries needed with your lambda package.
You need to use pip and create an isolated environment. The zip you upload to Lambda needs to include the python2.7/site-packages contents (the packages installed with pip).
Now there are extreme cases of os-related dependencies.
This has a tricky solution.
In those cases you have to launch an Amazon Linux EC2 instance in order to build or obtain those dependencies and package them with your Lambda.
Once your Lambda is packaged, you can terminate the EC2 instance.
Check this guide if virtualenv is not enough.
This is an os dependent system file. I'm guessing that you successfully installed the Python mysql client, but you still need the system mysql client, which seems to be a different version on your system than the lambda one. While building your virtual environment on the official lambda image will definitely fix this problem, you might have some luck copying your own copy of this system file into your lambda zip file.
I found mine with
locate libmysqlclient.so.18
Note: depending on your system, the version number at the end might be different. Use the version in the error you receive.
Adding that file on the top level of my zip file with
cd /path/from/locate/to/libmysqlclient
followed by
zip -u /path/to/lambda/zip/file.zip libmysqlclient.so.18
worked for me.
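The same step can be done from Python. This is a minimal sketch, assuming the library path comes from locate as above; the function name is hypothetical. Unlike zip -u, it skips the file if an entry with that name already exists rather than updating it:

```python
import os
import zipfile


def add_library_to_zip(zip_path, library_path):
    """Add a shared library at the top level of the zip if not yet present."""
    arcname = os.path.basename(library_path)
    with zipfile.ZipFile(zip_path, "a") as archive:
        if arcname in archive.namelist():
            return False  # already packaged, leave the archive unchanged
        archive.write(library_path, arcname=arcname)
    return True
```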