nltk pos_tag usage

I am trying to use part-of-speech tagging in NLTK and have used this command:
>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
nltk.pos_tag(text)
File "C:\Python27\lib\site-packages\nltk\tag\__init__.py", line 99, in pos_tag
tagger = load(_POS_TAGGER)
File "C:\Python27\lib\site-packages\nltk\data.py", line 605, in load
resource_val = pickle.load(_open(resource_url))
File "C:\Python27\lib\site-packages\nltk\data.py", line 686, in _open
return find(path).open()
File "C:\Python27\lib\site-packages\nltk\data.py", line 467, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource 'taggers/maxent_treebank_pos_tagger/english.pickle' not
found. Please use the NLTK Downloader to obtain the resource:
As the traceback shows, I get an error message saying that english.pickle was not found.
I have downloaded the whole corpora, and the english.pickle file is there in maxent_treebank_pos_tagger.
What can I do to get this to work?

Your Python installation is not able to reach maxent or treebank.
First, check if the tagger is indeed there:
Start Python from the command line.
>>> import nltk
Then you can check using
>>> dir(nltk)
Look through the list to see if maxent and treebank are both there. Easier would be to type
>>> "maxent" in dir(nltk)
True
>>> "treebank" in dir(nltk)
True
Use nltk.download() --> Models tab and check to see if the treebank tagger shows as installed.
You should also try downloading the tagger again.

If you don't want to use the downloader GUI, you can just use the following commands in a Python or IPython shell:
import nltk
nltk.download('punkt')
nltk.download('maxent_treebank_pos_tagger')
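Once the tagger is installed, the example from the question should run:
>>> import nltk
>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')]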

The NLTK data page (http://www.nltk.org/nltk_data/) hosts over 50 corpora and lexical resources such as WordNet for free. If the downloader fails with "Google Code 401: Authorization Required", use http://nltk.github.com/nltk_data/ as the server index instead of the googlecode URL.
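If you want to point the downloader at that index from code rather than the GUI, a sketch like the following should work (assuming your NLTK version's Downloader accepts the server_index_url argument - check help(nltk.downloader.Downloader) if unsure):
>>> import nltk
>>> d = nltk.downloader.Downloader(server_index_url="http://nltk.github.com/nltk_data/")
>>> d.download('maxent_treebank_pos_tagger')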

Related

Transforming shapefiles to dataframes with the shapefile_to_dataframe() helper function - fiona-related error

I am trying to use the Palantir Foundry helper function shapefile_to_dataframe() in order to ingest shapefiles for later usage in geolocation features.
I have manually imported the shapefiles (.shp, .shx & .dbf) in a single dataset (no access issues through the filesystem API).
As per documentation, I have imported geospatial-tools and the GEOSPARK profiles + included dependencies in the transforms-python build.gradle.
Here is my transform code, which is mostly extracted from the documentation:
from transforms.api import transform, Input, Output, configure
from geospatial_tools import geospatial
from geospatial_tools.parsers import shapefile_to_dataframe

@geospatial()
@transform(
    raw=Input("ri.foundry.main.dataset.0d984138-23da-4bcf-ad86-39686a14ef21"),
    output=Output("/Indhu/InDhu/Vincent/geo_energy/datasets/extract_coord/raw_df"),
)
def compute(raw, output):
    return output.write_dataframe(shapefile_to_dataframe(raw))
Code Assist then becomes extremely slow to load, and I finally get the following error:
AttributeError: partially initialized module 'fiona' has no attribute '_loading' (most likely due to a circular import)
Traceback (most recent call last):
File "/myproject/datasets/shp_to_df.py", line 3, in <module>
from geospatial_tools.parsers import shapefile_to_dataframe
File "/scratch/standalone/3a553998-623b-48f5-9c3f-03de7e64f328/code-assist/contents/transforms-python/build/conda/env/lib/python3.8/site-packages/geospatial_tools/parsers.py", line 11, in <module>
from fiona.drvsupport import supported_drivers
File "/scratch/standalone/3a553998-623b-48f5-9c3f-03de7e64f328/code-assist/contents/transforms-python/build/conda/env/lib/python3.8/site-packages/fiona/__init__.py", line 85, in <module>
with fiona._loading.add_gdal_dll_directories():
AttributeError: partially initialized module 'fiona' has no attribute '_loading' (most likely due to a circular import)
Thanks a lot for your help,
Vincent
I was able to reproduce this error, and it seems to happen only in Preview - running the full build works fine. The simplest way to get around it is to move the import inside the function:
from transforms.api import transform, Input, Output, configure
from geospatial_tools import geospatial

@geospatial()
@transform(
    raw=Input("ri.foundry.main.dataset.0d984138-23da-4bcf-ad86-39686a14ef21"),
    output=Output("/Indhu/InDhu/Vincent/geo_energy/datasets/extract_coord/raw_df"),
)
def compute(raw, output):
    from geospatial_tools.parsers import shapefile_to_dataframe
    return output.write_dataframe(shapefile_to_dataframe(raw))
However, at the moment, the function shapefile_to_dataframe isn't going to work in Preview anyway, because the full transforms.api.FileSystem API isn't implemented there - specifically, the ls function doesn't implement the glob parameter that the full transforms API does.

Loading XGBoost Model: ModuleNotFoundError: No module named 'sklearn.preprocessing._label'

I'm having issues loading a pretrained xgboost model using the following code:
xgb_model = pickle.load(open('churnfinalunscaled.pickle.dat', 'rb'))
And when I do that, I get the following error:
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-29-31e7f426e19e> in <module>()
----> 1 xgb_model = pickle.load(open('churnfinalunscaled.pickle.dat', 'rb'))
ModuleNotFoundError: No module named 'sklearn.preprocessing._label'
I haven't seen anything online so any help would be much appreciated.
I was able to solve my issue. Simply updating scikit-learn from 0.21.3 to 0.22.0 seems to solve it. Along the way I had to update my pandas version to 0.25.2 as well.
The clue is provided in this link: https://www.gitmemory.com/vruusmann, where it states:
During Scikit-Learn version upgrade from 0.21.X to 0.22.X many modules were renamed (typically, by prepending an underscore character to the module name). For example, sklearn.preprocessing.label.LabelEncoder became sklearn.preprocessing._label.LabelEncoder.
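If you hit the same error, a minimal upgrade sketch (pip shown; the version pins are the ones from the answer above - adjust to your environment):
pip install --upgrade scikit-learn==0.22.0 pandas==0.25.2
After restarting the interpreter, the original load should work unchanged:
xgb_model = pickle.load(open('churnfinalunscaled.pickle.dat', 'rb'))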

Python error in MNIST TPU tutorial

I'm trying to get the MNIST example for TPUs in GCE running (as shown at https://cloud.google.com/tpu/docs/tutorials/mnist) but I've run into a couple of bumps. First, I had to set my PYTHONPATH to pick up the models directory, which isn't listed as a step in the walk-through (perhaps it's obvious to everyday Python programmers, but it should be stated if not; see the note at the end of this question). After that, I'm now hitting the following error that I'm not sure how to work around:
frival@tpu-demo-vm:~$ python /usr/share/models/official/mnist/mnist_tpu.py --tpu_name=$TPU_NAME --data_dir=${STORAGE_BUCKET}/data --model_dir=${STORAGE_BUCKET}/output --use_tpu=True --iterations=500 --train_steps=1000 --train_file=${STORAGE_BUCKET}/data/train.tfrecords
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Traceback (most recent call last):
File "/usr/share/models/official/mnist/mnist_tpu.py", line 163, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "/usr/share/models/official/mnist/mnist_tpu.py", line 135, in main
FLAGS.tpu, zone=FLAGS.tpu_zone, project=FLAGS.gcp_project)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/cluster_resolver/python/training/tpu_cluster_resolver.py", line 128, in __init__
self._tpu = compat.as_bytes(tpu) # self._tpu is always bytes
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/compat.py", line 68, in as_bytes
(bytes_or_text,))
TypeError: Expected binary or unicode string, got None
I've verified that TPU_NAME and STORAGE_BUCKET are set properly, and I've also verified that I see the TPU in the READY state from this VM although I don't think either of those would have caused this error. Does anyone know what I'm missing?
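Regarding the PYTHONPATH step mentioned above - a typical setup, assuming the models checkout lives at /usr/share/models as the traceback suggests, is:
export PYTHONPATH="$PYTHONPATH:/usr/share/models"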
Probably your TensorFlow version is newer than your mnist_tpu.py.
You may try the newer version of mnist_tpu.py below, using the --tpu flag as tayo mentions in the other answer:
https://github.com/tensorflow/models/blob/master/official/mnist/mnist_tpu.py
Please change the --tpu_name=$TPU_NAME flag to --tpu=$TPU_NAME.
Apologies for the error, as this was a recent internal change that did not make it to the walk-through documentation. It is being corrected.
Good luck in TPU land!
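With that flag change, the invocation from the question becomes:
python /usr/share/models/official/mnist/mnist_tpu.py --tpu=$TPU_NAME --data_dir=${STORAGE_BUCKET}/data --model_dir=${STORAGE_BUCKET}/output --use_tpu=True --iterations=500 --train_steps=1000 --train_file=${STORAGE_BUCKET}/data/train.tfrecords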

Still downloading even though Keras has the VGG16 pretrained model in ~/.keras/models

I tried running the VGG16 keras script.
I get this error:
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5
Traceback (most recent call last):
File "test_imagenet.py", line 40, in
model = VGG16(weights="imagenet")
File "/home/nvidia/deep-learning-models/imagenet-example/vgg16.py", line 143, in VGG16
cache_subdir='models')
File "build/bdist.linux-aarch64/egg/keras/utils/data_utils.py", line 222, in get_file
Exception: URL fetch failure on https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5:
I tried to download it manually from here and paste it into ~/.keras/models.
But still, I am getting the same error. Why? I don't understand the error, because the correct model is already in .keras/models.
The default value of the include_top parameter in the VGG16 function is True. This means that if you want to use the full pre-trained VGG network (with the fully connected part), you need to download the vgg16_weights_tf_dim_ordering_tf_kernels.h5 file, not vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5.
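For illustration, a minimal sketch of the distinction (the keras.applications import path is an assumption - adjust it to your Keras version):
from keras.applications.vgg16 import VGG16
# include_top=True (the default) loads vgg16_weights_tf_dim_ordering_tf_kernels.h5
model_full = VGG16(weights="imagenet", include_top=True)
# include_top=False loads the convolution-only *_notop.h5 weights instead
model_conv = VGG16(weights="imagenet", include_top=False)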

nltk.word_tokenize() giving AttributeError: 'module' object has no attribute 'defaultdict'

I am new to nltk.
I was trying some basics.
import nltk
nltk.word_tokenize("Tokenize me")
gives me this following error
Traceback (most recent call last):
File "<pyshell#27>", line 1, in <module>
nltk.word_tokenize("hi im no onee")
File "C:\Python27\lib\site-packages\nltk\tokenize\__init__.py", line 101, in word_tokenize
return [token for sent in sent_tokenize(text, language)
File "C:\Python27\lib\site-packages\nltk\tokenize\__init__.py", line 85, in sent_tokenize
tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
File "C:\Python27\lib\site-packages\nltk\data.py", line 786, in load
resource_val = pickle.load(opened_resource)
AttributeError: 'module' object has no attribute 'defaultdict'
Can someone please tell me how to fix this error?
I just checked it on my system.
Fix:
>>> import nltk
>>> nltk.download('all')
Then everything worked fine.
>>> import nltk
>>> nltk.word_tokenize("Tokenize me")
['Tokenize', 'me']
I had the same error, and then I realized that I had saved my file as tokenize.py; that's why I was getting this error. When I renamed my Python file, it worked fine. Hope this is helpful.
I found out later that I was using outdated nltk data. The program started to work fine as soon as I updated the data.
You need to update your NLTK version. If you are using Anaconda, run the following in a terminal:
conda update nltk
It will update NLTK. Then restart IPython and it should work!