Loading XGBoost Model: ModuleNotFoundError: No module named 'sklearn.preprocessing._label' - pickle

I'm having issues loading a pretrained xgboost model using the following code:
xgb_model = pickle.load(open('churnfinalunscaled.pickle.dat', 'rb'))
And when I do that, I get the following error:
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-29-31e7f426e19e> in <module>()
----> 1 xgb_model = pickle.load(open('churnfinalunscaled.pickle.dat', 'rb'))
ModuleNotFoundError: No module named 'sklearn.preprocessing._label'
I haven't seen anything online so any help would be much appreciated.

I was able to solve my issue. Simply updating scikit-learn from 0.21.3 to 0.22.0 seems to solve the issue. Along the way I had to update my pandas version to 0.25.2 as well.
The clue is provided in this link: https://www.gitmemory.com/vruusmann, where it states:
During Scikit-Learn version upgrade from 0.21.X to 0.22.X many modules were renamed (typically, by prepending an underscore character to the module name). For example, sklearn.preprocessing.label.LabelEncoder became sklearn.preprocessing._label.LabelEncoder.
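For anyone hitting the same error, a quick sanity check after upgrading might look like this (a minimal sketch; the file name comes from the question and the versions are simply the ones that worked for me):
import pickle
import sklearn
import pandas as pd

print(sklearn.__version__)  # 0.22.0 after the upgrade; 0.21.3 raised the ModuleNotFoundError
print(pd.__version__)       # 0.25.2 after the upgrade

with open('churnfinalunscaled.pickle.dat', 'rb') as f:
    xgb_model = pickle.load(f)  # loads once the module layout matches the pickle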

Related

Gensim Pickle Error: Unable to Load the Saved Topic Model

I am working on topic inference, which requires loading a previously saved model.
However, I got a pickle error that says
Traceback (most recent call last):
File "topic_inference.py", line 35, in <module>
model_for_inference = gensim.models.LdaModel.load(model_name, mmap = 'r')
File "topic_modeling/env/lib/python3.8/site-packages/gensim/models/ldamodel.py", line 1663, in load
result = super(LdaModel, cls).load(fname, *args, **kwargs)
File "topic_modeling/env/lib/python3.8/site-packages/gensim/utils.py", line 486, in load
obj = unpickle(fname)
File "topic_modeling/env/lib/python3.8/site-packages/gensim/utils.py", line 1461, in unpickle
return _pickle.load(f, encoding='latin1') # needed because loading from S3 doesn't support readline()
TypeError: __randomstate_ctor() takes from 0 to 1 positional arguments but 2 were given
The code I use to load the model is simply
gensim.models.LdaModel.load(model_name, mmap = 'r')
Here is the code that I use to create and save the model
model = gensim.models.ldamulticore.LdaMulticore(
    corpus=comment_corpus,
    id2word=key_word_dict,  # This is now a gensim.corpora.Dictionary object; previously it was the .id2token attribute
    chunksize=chunksize,
    alpha='symmetric',
    eta='auto',
    iterations=iterations,
    num_topics=num_topics,
    passes=epochs,
    eval_every=eval_every,
    workers=15,
    minimum_probability=0.0)
model.save(output_model)
where output_model doesn't have an extension like .model or .pkl
In the past, I tried a similar approach, except that I passed the .id2token attribute of the gensim.corpora.Dictionary object (instead of the full gensim.corpora.Dictionary) to the id2word parameter when I created the model, and the method loaded the model fine back then. I wonder if passing in a corpora.Dictionary makes a difference in the loading output? Back then I was using regular Python, but now I am using Anaconda. However, all the versions of the packages are the same.
Another report of an error about __randomstate_ctor (at https://github.com/numpy/numpy/issues/14210) suggests the problem may be related to numpy object pickling.
Is there a chance that the configuration where your load is failing is using a later version of numpy than when the save occurred? Could you try, at least temporarily, rolling back to some older numpy (that's still sufficient for whatever Gensim you're using) to see if it helps?
If you find any load that works, even in a suboptimal config, you might be able to null out whatever random-related objects are causing the problem and re-save, giving you a saved version that loads better in your truly-desired configuration. Then, if the random-related objects are truly needed after reload, it may be possible to manually re-constitute them. (I haven't looked into this yet, but if you find any workaround allowing a load, but then aren't sure what to manually null/rebuild, I could take a closer look.)
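For example, a rough sketch of that null-and-resave idea (the random_state attribute name is an assumption about where the offending numpy object lives; inspect your model before relying on it):
import gensim
import numpy as np

# In whatever environment still manages to load the model (e.g. the older numpy):
model = gensim.models.LdaModel.load(model_name, mmap='r')
model.random_state = None                      # drop the object that fails to unpickle
model.save(model_name + '.norng')

# Back in the truly-desired environment:
model = gensim.models.LdaModel.load(model_name + '.norng')
model.random_state = np.random.RandomState()   # manually re-constitute it if it's needed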

Transforming shapefiles to dataframes with shapefile_to_dataframe() helper function - fiona related error

I am trying to use the Palantir Foundry helper function shapefile_to_dataframe() in order to ingest shapefiles for later usage in geolocation features.
I have manually imported the shapefiles (.shp, .shx & .dbf) in a single dataset (no access issues through the filesystem API).
As per documentation, I have imported geospatial-tools and the GEOSPARK profiles + included dependencies in the transforms-python build.gradle.
Here is my transform code, which is mostly extracted from the documentation:
from transforms.api import transform, Input, Output, configure
from geospatial_tools import geospatial
from geospatial_tools.parsers import shapefile_to_dataframe

@geospatial()
@transform(
    raw=Input("ri.foundry.main.dataset.0d984138-23da-4bcf-ad86-39686a14ef21"),
    output=Output("/Indhu/InDhu/Vincent/geo_energy/datasets/extract_coord/raw_df")
)
def compute(raw, output):
    return output.write_dataframe(shapefile_to_dataframe(raw))
Code Assist then becomes extremely slow to load, and I finally get the following error:
AttributeError: partially initialized module 'fiona' has no attribute '_loading' (most likely due to a circular import)
Traceback (most recent call last):
File "/myproject/datasets/shp_to_df.py", line 3, in <module>
from geospatial_tools.parsers import shapefile_to_dataframe
File "/scratch/standalone/3a553998-623b-48f5-9c3f-03de7e64f328/code-assist/contents/transforms-python/build/conda/env/lib/python3.8/site-packages/geospatial_tools/parsers.py", line 11, in <module>
from fiona.drvsupport import supported_drivers
File "/scratch/standalone/3a553998-623b-48f5-9c3f-03de7e64f328/code-assist/contents/transforms-python/build/conda/env/lib/python3.8/site-packages/fiona/__init__.py", line 85, in <module>
with fiona._loading.add_gdal_dll_directories():
AttributeError: partially initialized module 'fiona' has no attribute '_loading' (most likely due to a circular import)
Thanks a lot for your help,
Vincent
I was able to reproduce this error and it seems like it happens only in previews - running the full build seems to be working fine. The simplest way to get around it is to move the import inside the function:
from transforms.api import transform, Input, Output, configure
from geospatial_tools import geospatial

@geospatial()
@transform(
    raw=Input("ri.foundry.main.dataset.0d984138-23da-4bcf-ad86-39686a14ef21"),
    output=Output("/Indhu/InDhu/Vincent/geo_energy/datasets/extract_coord/raw_df")
)
def compute(raw, output):
    from geospatial_tools.parsers import shapefile_to_dataframe
    return output.write_dataframe(shapefile_to_dataframe(raw))
However, at the moment, the function shapefile_to_dataframe isn't going to work in Preview anyway, because the full transforms.api.FileSystem API isn't implemented there - specifically, the function ls doesn't implement the glob parameter that the full transforms API does.

'Word2Vec' object has no attribute 'generate_training_data'

Code :
from gensim.models.word2vec import Word2Vec
w2v = Word2Vec()
training_data = w2v.generate_training_data(settings, corpus)
Error :
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-45-bae554564046> in <module>
1 w2v = Word2Vec()
2 # Numpy ndarray with one-hot representation for [target_word, context_words]
----> 3 training_data = w2v.generate_training_data(settings, corpus)
AttributeError: 'Word2Vec' object has no attribute 'generate_training_data'
I even tried importing gensim.models.word2vec directly and tried every possibility, but couldn't get it to work.
Can someone help me with it?
Thanks in advance !
Yes, the gensim Word2Vec class doesn't have that method – and as far as I know, it never has.
And from your example usage, I'm not even sure what it might purport to do: a Word2Vec model needs to be provided data in the right format – it doesn't "generate" it (even as a translation from some other corpus).
I suspect you are looking at docs or a code example from some other unrelated library.
For using gensim's Word2Vec, you should rely on the gensim documentation & examples. The class docs include some basic details of proper usage, and there's a Jupyter notebook word2vec.ipynb included with the library, in its docs/notebooks directory (and also viewable online).
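For reference, a minimal sketch of the intended usage (parameter names assume gensim 4.x; the tiny corpus is just a stand-in for your own tokenized sentences):
from gensim.models import Word2Vec

# Word2Vec consumes an iterable of already-tokenized sentences directly;
# it builds its vocabulary and training examples internally.
corpus = [
    ["the", "quick", "brown", "fox"],
    ["the", "lazy", "dog", "sleeps"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, epochs=10)
print(model.wv["fox"])               # learned vector for a word
print(model.wv.most_similar("fox"))  # nearest neighbours in the embedding space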

Python error in MNIST TPU tutorial

I'm trying to get the MNIST example for TPUs in GCE running (as shown at https://cloud.google.com/tpu/docs/tutorials/mnist) but I've run into a couple of bumps. First, I had to set my PYTHONPATH to pick up the models directory, which isn't listed as a step in the walk-through (perhaps it's obvious to everyday Python programmers, but it isn't stated). After that, I'm now hitting the following error that I'm not sure how to work around:
frival@tpu-demo-vm:~$ python /usr/share/models/official/mnist/mnist_tpu.py --tpu_name=$TPU_NAME --data_dir=${STORAGE_BUCKET}/data --model_dir=${STORAGE_BUCKET}/output --use_tpu=True --iterations=500 --train_steps=1000 --train_file=${STORAGE_BUCKET}/data/train.tfrecords
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Traceback (most recent call last):
File "/usr/share/models/official/mnist/mnist_tpu.py", line 163, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "/usr/share/models/official/mnist/mnist_tpu.py", line 135, in main
FLAGS.tpu, zone=FLAGS.tpu_zone, project=FLAGS.gcp_project)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/cluster_resolver/python/training/tpu_cluster_resolver.py", line 128, in __init__
self._tpu = compat.as_bytes(tpu) # self._tpu is always bytes
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/compat.py", line 68, in as_bytes
(bytes_or_text,))
TypeError: Expected binary or unicode string, got None
I've verified that TPU_NAME and STORAGE_BUCKET are set properly, and I've also verified that I see the TPU in the READY state from this VM although I don't think either of those would have caused this error. Does anyone know what I'm missing?
Your TensorFlow version is probably newer than your mnist_tpu.py.
You may try the newer version of mnist_tpu.py here, using the --tpu flag as tayo mentioned above:
https://github.com/tensorflow/models/blob/master/official/mnist/mnist_tpu.py
Please change the --tpu_name=$TPU_NAME flag to --tpu=$TPU_NAME.
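Assuming the rest of your flags stay the same, the command from the question would then become:
python /usr/share/models/official/mnist/mnist_tpu.py \
  --tpu=$TPU_NAME \
  --data_dir=${STORAGE_BUCKET}/data \
  --model_dir=${STORAGE_BUCKET}/output \
  --use_tpu=True \
  --iterations=500 \
  --train_steps=1000 \
  --train_file=${STORAGE_BUCKET}/data/train.tfrecords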
Apologies for the error, as this was a recent internal change that did not make it to the walk-through documentation. It is being corrected.
Good luck in TPU land!

SqlAlchemy: problem pickling pending instances (association_proxy)

I've successfully pickled persistent instances freshly loaded from the DB, but I can't seem to do the same for an instance that I've just created and that is in session.new.
I'm getting the following error (the Python pickle module had the more helpful version of the message):
*** PicklingError: Can't pickle <function <lambda> at 0xb08d3ac>:
it's not found as sqlalchemy.ext.associationproxy.<lambda>
If I clear the association_proxy by doing the following:
my_new_obj.my_proxy = []
del my_new_obj.my_proxy
my_new_obj pickles fine.
Any ideas how I can have my association_proxy and eat it too?
I might have found a solution here:
http://www.sqlalchemy.org/trac/ticket/1446
(the solution is to upgrade SQLAlchemy)