Python error in MNIST TPU tutorial - google-compute-engine

I'm trying to get the MNIST example for TPUs in GCE running (as shown at https://cloud.google.com/tpu/docs/tutorials/mnist) but I've run into a couple of bumps. First, I had to set my PYTHONPATH to pick up the models directory, which isn't listed as a step in the walk-through (perhaps it's obvious to daily Python programmers, but it should still be stated; see the note at the end of this question for what I did). After that, I'm now hitting the following error that I'm not sure how to work around:
frival@tpu-demo-vm:~$ python /usr/share/models/official/mnist/mnist_tpu.py \
    --tpu_name=$TPU_NAME \
    --data_dir=${STORAGE_BUCKET}/data \
    --model_dir=${STORAGE_BUCKET}/output \
    --use_tpu=True \
    --iterations=500 \
    --train_steps=1000 \
    --train_file=${STORAGE_BUCKET}/data/train.tfrecords
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Traceback (most recent call last):
File "/usr/share/models/official/mnist/mnist_tpu.py", line 163, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "/usr/share/models/official/mnist/mnist_tpu.py", line 135, in main
FLAGS.tpu, zone=FLAGS.tpu_zone, project=FLAGS.gcp_project)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/cluster_resolver/python/training/tpu_cluster_resolver.py", line 128, in __init__
self._tpu = compat.as_bytes(tpu) # self._tpu is always bytes
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/compat.py", line 68, in as_bytes
(bytes_or_text,))
TypeError: Expected binary or unicode string, got None
I've verified that TPU_NAME and STORAGE_BUCKET are set properly, and I've also verified that I see the TPU in the READY state from this VM, although I don't think either of those would have caused this error. Does anyone know what I'm missing?
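For reference, here is a sketch of the workaround I used for the first bump, assuming the models repo lives at /usr/share/models (the path in the command above):

# Set before invoking the script:
#   export PYTHONPATH="${PYTHONPATH}:/usr/share/models"
# or, equivalently, extend the search path from inside Python:
import sys
sys.path.append('/usr/share/models')  # same effect as the export above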

Probably your TensorFlow version is newer than your mnist_tpu.py. Try the newer version of mnist_tpu.py from here, and use the --tpu flag as tayo mentioned in the other answer:
https://github.com/tensorflow/models/blob/master/official/mnist/mnist_tpu.py

Please change the --tpu_name=$TPU_NAME flag to --tpu=$TPU_NAME.
Apologies for the error, as this was a recent internal change that did not make it to the walk-through documentation. It is being corrected.
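For context on why the wrong flag produces this particular TypeError: with --tpu_name, the script's FLAGS.tpu keeps its default of None, and that None flows straight into the cluster resolver. A rough sketch of the failing call, with names taken from the traceback (not the tutorial's code verbatim):

import os
import tensorflow as tf  # TF 1.x, where tf.contrib still exists

# mnist_tpu.py builds the resolver roughly like this. With --tpu_name,
# FLAGS.tpu is None, so compat.as_bytes(None) raises the TypeError.
resolver = tf.contrib.cluster_resolver.TPUClusterResolver(
    os.environ['TPU_NAME'],  # what --tpu=$TPU_NAME ends up supplying
    zone=None,               # FLAGS.tpu_zone default
    project=None)            # FLAGS.gcp_project default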
Good luck in TPU land!

Related

Gensim Pickle Error: Unable to Load the Saved Topic Model

I am working on topic inference, which requires loading a previously saved model.
However, I get a pickle error that says:
Traceback (most recent call last):
File "topic_inference.py", line 35, in <module>
model_for_inference = gensim.models.LdaModel.load(model_name, mmap = 'r')
File "topic_modeling/env/lib/python3.8/site-packages/gensim/models/ldamodel.py", line 1663, in load
result = super(LdaModel, cls).load(fname, *args, **kwargs)
File "topic_modeling/env/lib/python3.8/site-packages/gensim/utils.py", line 486, in load
obj = unpickle(fname)
File "topic_modeling/env/lib/python3.8/site-packages/gensim/utils.py", line 1461, in unpickle
return _pickle.load(f, encoding='latin1') # needed because loading from S3 doesn't support readline()
TypeError: __randomstate_ctor() takes from 0 to 1 positional arguments but 2 were given
The code I use to load the model is simply
gensim.models.LdaModel.load(model_name, mmap='r')
Here is the code that I use to create and save the model
model = gensim.models.ldamulticore.LdaMulticore(
    corpus=comment_corpus,
    id2word=key_word_dict,  # now a gensim.corpora.Dictionary object; previously it was the .id2token attribute
    chunksize=chunksize,
    alpha='symmetric',
    eta='auto',
    iterations=iterations,
    num_topics=num_topics,
    passes=epochs,
    eval_every=eval_every,
    workers=15,
    minimum_probability=0.0)
model.save(output_model)
where output_model doesn't have an extension like .model or .pkl
In the past, I tried a similar approach, except that I passed the .id2token attribute of the gensim.corpora.Dictionary object, instead of the full gensim.corpora.Dictionary, to the id2word parameter when I created the model, and the model loaded fine back then. I wonder if passing in a corpora.Dictionary makes a difference in the loading output...? Back then, I was using regular Python, but now I am using Anaconda. However, all the package versions are the same.
Another report of an error about __randomstate_ctor (at https://github.com/numpy/numpy/issues/14210) suggests the problem may be related to numpy object pickling.
Is there a chance that the configuration where your load is failing is using a later version of numpy than when the save occurred? Could you try, at least temporarily, rolling back to some older numpy (that's still sufficient for whatever Gensim you're using) to see if it helps?
If you find any load that works, even in a suboptimal config, you might be able to null out whatever random-related objects are causing the problem and re-save, leaving you with a saved version that loads better in your truly desired configuration. Then, if the random-related objects are truly needed after reload, it may be possible to manually re-constitute them. (I haven't looked into this yet, but if you find a workaround that allows a load and then aren't sure what to manually null out or rebuild, I could take a closer look.)
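To make the null-and-resave idea concrete, here is a rough, unverified sketch. It assumes the offending object is the model's random_state (a numpy RandomState, which would match the __randomstate_ctor in your traceback), and that you have some environment where the load still succeeds:

import gensim
import numpy as np

model_name = "output_model"  # hypothetical path; use your saved model's

# In the environment where loading still works (e.g. with the older numpy):
model = gensim.models.LdaModel.load(model_name)
model.random_state = None              # drop the object that fails to unpickle
model.save(model_name + ".cleaned")

# Later, in the truly-desired (newer) environment:
model = gensim.models.LdaModel.load(model_name + ".cleaned")
model.random_state = np.random.RandomState(0)  # re-constitute if needed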

SQLAlchemy: "catching classes that do not inherit from BaseException is not allowed" when testing in Pytest

Recently, when running my tests in Pytest, I started getting a strange warning at the end of the test results: many, many iterations of the following:
Exception ignored in: <function _ConnectionRecord.checkout.<locals>.<lambda> at 0x10eea07a0>
Traceback (most recent call last):
File "/Users/username/appdev/scattr-api/venv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 503, in <lambda>
File "/Users/username/appdev/scattr-api/venv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 710, in _finalize_fairy
File "/Users/username/appdev/scattr-api/venv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 528, in checkin
File "/Users/username/appdev/scattr-api/venv/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 387, in _return_conn
File "/Users/username/appdev/scattr-api/venv/lib/python3.7/site-packages/sqlalchemy/pool/impl.py", line 106, in _do_return_conn
TypeError: catching classes that do not inherit from BaseException is not allowed
This started happening recently, regardless of whether the tests pass or not. I don't understand why this warning started showing up, or what it means that it's "ignored," so I'm not sure what I should be doing about it, if anything. I haven't noticed any problems when running my actual application; it's just something that happens when running the tests.
The fact that the last function in the traceback is _do_return_conn makes me think it's something about the connection pool, but I'm still not sure how to make sense of it...
I determined that the issue arises from not explicitly closing a connection in the pytest tests.
I'd been using a custom connection to look up values in the database in the tests (to make sure the correct values were inserted), and it was when this connection wasn't explicitly closed at the end that this cryptic error message appeared.
SQLAlchemy relies on garbage collection to eventually return connections to the pool even if they're not explicitly closed. In the context of a pytest run, however, this doesn't have time to happen before the tests finish, resulting in the cryptic error message.
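To illustrate the fix, here is a minimal sketch of the pattern: manage the lookup connection in a fixture that closes it explicitly, so it is returned to the pool before pytest finishes (the engine URL is a placeholder):

import pytest
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")  # placeholder; use your real URL

@pytest.fixture
def connection():
    conn = engine.connect()
    yield conn
    conn.close()  # explicit close returns the connection to the pool

def test_values_were_inserted(connection):
    # look up values through the fixture-managed connection
    assert connection.execute(sa.text("SELECT 1")).scalar() == 1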

Loading XGBoost Model: ModuleNotFoundError: No module named 'sklearn.preprocessing._label'

I'm having issues loading a pretrained xgboost model using the following code:
xgb_model = pickle.load(open('churnfinalunscaled.pickle.dat', 'rb'))
And when I do that, I get the following error:
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-29-31e7f426e19e> in <module>()
----> 1 xgb_model = pickle.load(open('churnfinalunscaled.pickle.dat', 'rb'))
ModuleNotFoundError: No module named 'sklearn.preprocessing._label'
I haven't seen anything about this online, so any help would be much appreciated.
I was able to solve my issue. Simply updating scikit-learn from 0.21.3 to 0.22.0 solved it. Along the way I had to update my pandas version to 0.25.2 as well.
The clue is provided in this link: https://www.gitmemory.com/vruusmann, where it states:
During Scikit-Learn version upgrade from 0.21.X to 0.22.X many modules were renamed (typically, by prepending an underscore character to the module name). For example, sklearn.preprocessing.label.LabelEncoder became sklearn.preprocessing._label.LabelEncoder.
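As a follow-up, once the pickle loads under the upgraded scikit-learn, one way to avoid this class of error in the future is to re-save the underlying booster in XGBoost's native format, which does not depend on scikit-learn's module layout. A sketch (note this keeps only the booster itself, not the sklearn wrapper state, so it suits cases where you only need raw predictions):

import pickle

# After upgrading scikit-learn so the pickle loads again:
with open('churnfinalunscaled.pickle.dat', 'rb') as f:
    xgb_model = pickle.load(f)

# Re-save with XGBoost's native serializer, independent of sklearn
# internals (the filename is illustrative):
xgb_model.get_booster().save_model('churnfinalunscaled.model')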

Still downloading even though Keras has the VGG16 pretrained model in ~/.keras/models

I tried running the VGG16 Keras script and got this error:
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5
Traceback (most recent call last):
File "test_imagenet.py", line 40, in
model = VGG16(weights="imagenet")
File "/home/nvidia/deep-learning-models/imagenet-example/vgg16.py", line 143, in VGG16
cache_subdir='models')
File "build/bdist.linux-aarch64/egg/keras/utils/data_utils.py", line 222, in get_file
Exception: URL fetch failure on https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5:
I tried to download it manually from here and copy it to ~/.keras/models.
But still, I am getting the same error. Why? I don't understand the error, because the correct model is already in ~/.keras/models.
The default value of the include_top parameter in the VGG16 function is True. This means that if you want to use the full pre-trained VGG network (including the fully connected layers), you need to download the vgg16_weights_tf_dim_ordering_tf_kernels.h5 file, not vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5.
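In code terms, a small sketch of the distinction (the filenames are the ones Keras looks for in ~/.keras/models):

from keras.applications.vgg16 import VGG16

# include_top=True (the default) loads the full network and needs
# vgg16_weights_tf_dim_ordering_tf_kernels.h5 in ~/.keras/models.
full_model = VGG16(weights="imagenet", include_top=True)

# include_top=False loads only the convolutional base and needs
# vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5 instead.
conv_base = VGG16(weights="imagenet", include_top=False)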

nltk.word_tokenize() giving AttributeError: 'module' object has no attribute 'defaultdict'

I am new to nltk.
I was trying some basics.
import nltk
nltk.word_tokenize("Tokenize me")
gives me the following error:
Traceback (most recent call last):
File "<pyshell#27>", line 1, in <module>
nltk.word_tokenize("hi im no onee")
File "C:\Python27\lib\site-packages\nltk\tokenize\__init__.py", line 101, in word_tokenize
return [token for sent in sent_tokenize(text, language)
File "C:\Python27\lib\site-packages\nltk\tokenize\__init__.py", line 85, in sent_tokenize
tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
File "C:\Python27\lib\site-packages\nltk\data.py", line 786, in load
resource_val = pickle.load(opened_resource)
AttributeError: 'module' object has no attribute 'defaultdict'
Could someone please tell me how to fix this error?
I just checked it on my system.
Fix:
>>> import nltk
>>> nltk.download('all')
Then everything worked fine.
>>> import nltk
>>> nltk.word_tokenize("Tokenize me")
['Tokenize', 'me']
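If downloading everything is more than you need, fetching just the punkt models should also work, since the traceback shows word_tokenize loading tokenizers/punkt:

import nltk
nltk.download('punkt')  # just the sentence/word tokenizer models
print(nltk.word_tokenize("Tokenize me"))  # ['Tokenize', 'me']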
I had the same error, and then I realized that I had saved my file as tokenize.py (presumably shadowing a module Python needed to import); that's why I was getting this error. When I renamed my Python file, it worked fine. Hope this is helpful.
I found out later that I was using outdated nltk data. The program started working fine as soon as I updated the data.
You need to update your nltk version. If you are using Anaconda, run the following in a terminal:
conda update nltk
This will update nltk. Then restart IPython and it should work!
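You can confirm the upgrade took effect after the restart:

import nltk
print(nltk.__version__)  # should show the freshly updated version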