Is there a way to load a Python 3.6 pickle in Python 3.8?

I have a pickle file created in Python 3.6 with protocol=pickle.HIGHEST_PROTOCOL, which is 4.
with open(file_path, 'wb') as ff:
    pickle.dump(data, ff, protocol=pickle.HIGHEST_PROTOCOL)
I am trying to load the file in Python 3.8, where pickle.DEFAULT_PROTOCOL = 4.
with open(file_path, 'rb') as ff:
    data = pickle.load(ff)
I am getting TypeError: an integer is required (got type bytes).
I tried passing different encodings to the pickle.load call (pickle.load(ff, encoding=...)), but I have no idea what the problem with the file is.
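For reference, a minimal sketch of that attempt (the parameter is named encoding; 'latin1' and 'bytes' are the values most often tried when a pickle crosses Python versions):
import pickle

with open(file_path, 'rb') as ff:
    # default is encoding='ASCII'; 'latin1' keeps 8-bit strings readable, 'bytes' leaves them as bytes
    data = pickle.load(ff, encoding='latin1')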
Is this some backward-incompatibility with Python 3.8?

Related

Gensim Pickle Error: Unable to Load the Saved Topic Model

I am working on topic inference, which requires loading a previously saved model.
However, I got a pickle error that says:
Traceback (most recent call last):
File "topic_inference.py", line 35, in <module>
model_for_inference = gensim.models.LdaModel.load(model_name, mmap = 'r')
File "topic_modeling/env/lib/python3.8/site-packages/gensim/models/ldamodel.py", line 1663, in load
result = super(LdaModel, cls).load(fname, *args, **kwargs)
File "topic_modeling/env/lib/python3.8/site-packages/gensim/utils.py", line 486, in load
obj = unpickle(fname)
File "topic_modeling/env/lib/python3.8/site-packages/gensim/utils.py", line 1461, in unpickle
return _pickle.load(f, encoding='latin1') # needed because loading from S3 doesn't support readline()
TypeError: __randomstate_ctor() takes from 0 to 1 positional arguments but 2 were given
The code I use to load the model is simply
gensim.models.LdaModel.load(model_name, mmap = 'r')
Here is the code that I use to create and save the model
model = gensim.models.ldamulticore.LdaMulticore(
    corpus=comment_corpus,
    id2word=key_word_dict,  # This is now a gensim.corpora.Dictionary object; previously it was the .id2token attribute
    chunksize=chunksize,
    alpha='symmetric',
    eta='auto',
    iterations=iterations,
    num_topics=num_topics,
    passes=epochs,
    eval_every=eval_every,
    workers=15,
    minimum_probability=0.0)
model.save(output_model)
where output_model doesn't have an extension like .model or .pkl
In the past, I took a similar approach, except that I passed the .id2token attribute of the gensim.corpora.Dictionary object, rather than the full gensim.corpora.Dictionary, to the id2word parameter when I created the model, and the method loaded the model fine back then. I wonder whether passing in a corpora.Dictionary makes a difference to the loading output. Back then I was using regular Python, but now I am using Anaconda; however, all the package versions are the same.
Another report of an error about __randomstate_ctor (at https://github.com/numpy/numpy/issues/14210) suggests the problem may be related to numpy object pickling.
Is there a chance that the configuration where your load is failing is using a later version of numpy than when the save occurred? Could you try, at least temporarily, rolling back to some older numpy (that's still sufficient for whatever Gensim you're using) to see if it helps?
If you find any load that works, even in a suboptimal config, you might be able to null out whatever random-related objects are causing the problem and re-save, giving you a saved version that loads better in your truly desired configuration. Then, if the random-related objects are truly needed after reload, it may be possible to manually reconstitute them. (I haven't looked into this yet, but if you find any workaround allowing a load and then aren't sure what to manually null/rebuild, I could take a closer look.)
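A rough sketch of that null-and-resave workaround, assuming the offending object is the model's random_state attribute (the attribute name is a guess; inspect your model to find the actual culprit):
# in the configuration where loading still works
import gensim
model = gensim.models.LdaModel.load(model_name, mmap='r')
model.random_state = None                       # drop the suspect numpy RandomState (assumption)
model.save(model_name + '.no_rng')

# later, in the truly desired configuration
import numpy as np
model = gensim.models.LdaModel.load(model_name + '.no_rng')
model.random_state = np.random.RandomState(42)  # manually reconstitute it if it is actually needed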

Transforming shapefiles to dataframes with the shapefile_to_dataframe() helper function - fiona-related error

I am trying to use the Palantir Foundry helper function shapefile_to_dataframe() in order to ingest shapefiles for later use in geolocation features.
I have manually imported the shapefiles (.shp, .shx & .dbf) into a single dataset (no access issues through the filesystem API).
As per the documentation, I have imported the geospatial-tools and GEOSPARK profiles and included the dependencies in the transforms-python build.gradle.
Here is my transform code, which is mostly extracted from the documentation:
from transforms.api import transform, Input, Output, configure
from geospatial_tools import geospatial
from geospatial_tools.parsers import shapefile_to_dataframe

@geospatial()
@transform(
    raw=Input("ri.foundry.main.dataset.0d984138-23da-4bcf-ad86-39686a14ef21"),
    output=Output("/Indhu/InDhu/Vincent/geo_energy/datasets/extract_coord/raw_df")
)
def compute(raw, output):
    return output.write_dataframe(shapefile_to_dataframe(raw))
Code assist then becomes extremely slow to load, and I finally get the following error:
AttributeError: partially initialized module 'fiona' has no attribute '_loading' (most likely due to a circular import)
Traceback (most recent call last):
File "/myproject/datasets/shp_to_df.py", line 3, in <module>
from geospatial_tools.parsers import shapefile_to_dataframe
File "/scratch/standalone/3a553998-623b-48f5-9c3f-03de7e64f328/code-assist/contents/transforms-python/build/conda/env/lib/python3.8/site-packages/geospatial_tools/parsers.py", line 11, in <module>
from fiona.drvsupport import supported_drivers
File "/scratch/standalone/3a553998-623b-48f5-9c3f-03de7e64f328/code-assist/contents/transforms-python/build/conda/env/lib/python3.8/site-packages/fiona/__init__.py", line 85, in <module>
with fiona._loading.add_gdal_dll_directories():
AttributeError: partially initialized module 'fiona' has no attribute '_loading' (most likely due to a circular import)
Thanks a lot for your help,
Vincent
I was able to reproduce this error, and it seems like it happens only in previews - running the full build seems to work fine. The simplest way to get around it is to move the import inside the function:
from transforms.api import transform, Input, Output, configure
from geospatial_tools import geospatial

@geospatial()
@transform(
    raw=Input("ri.foundry.main.dataset.0d984138-23da-4bcf-ad86-39686a14ef21"),
    output=Output("/Indhu/InDhu/Vincent/geo_energy/datasets/extract_coord/raw_df")
)
def compute(raw, output):
    from geospatial_tools.parsers import shapefile_to_dataframe
    return output.write_dataframe(shapefile_to_dataframe(raw))
However, at the moment, the function shapefile_to_dataframe isn't going to work in the Preview anyway, because the full transforms.api.FileSystem API isn't implemented there - specifically, the preview's ls function doesn't implement the glob parameter that the full transforms API does.
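For illustration, this is the kind of listing call that trips up the Preview filesystem (a sketch assuming the standard Foundry transforms FileSystem API; the internals of shapefile_to_dataframe may differ):
def compute(raw, output):
    fs = raw.filesystem()
    # fine in a full build; in Preview, ls() does not support the glob parameter
    shp_files = [f.path for f in fs.ls(glob="*.shp")]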

Declare encoding in OpenAI Gym implementation on Python 3

I am learning reinforcement learning and following this tutorial. I am trying to run an instance of the CartPole-v0 environment and am getting this error.
import gym
env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample())
SyntaxError: Non-ASCII character '\xc2' in file /home/kshitizsahay26/gym/gym/envs/classic_control/cartpole.py on line 27, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
I read that the default encoding in Python 3 is UTF-8, but it doesn't seem so in this case. I looked at the URL mentioned in the error message, but it's applicable to Python 2.6. How should I change the encoding in this script?
I fixed this error by adding:
# -*- coding: utf-8 -*-
to the beginning of the cartpole.py file.
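Per PEP 263, the declaration only takes effect if it appears on the first or second line of the file, so the top of cartpole.py ends up looking roughly like this (the docstring is illustrative):
# -*- coding: utf-8 -*-
"""Classic cart-pole system."""   # the rest of cartpole.py follows unchanged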

nltk.word_tokenize() giving AttributeError: 'module' object has no attribute 'defaultdict'

I am new to nltk.
I was trying some basics.
import nltk
nltk.word_tokenize("Tokenize me")
gives me the following error:
Traceback (most recent call last):
File "<pyshell#27>", line 1, in <module>
nltk.word_tokenize("hi im no onee")
File "C:\Python27\lib\site-packages\nltk\tokenize\__init__.py", line 101, in word_tokenize
return [token for sent in sent_tokenize(text, language)
File "C:\Python27\lib\site-packages\nltk\tokenize\__init__.py", line 85, in sent_tokenize
tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
File "C:\Python27\lib\site-packages\nltk\data.py", line 786, in load
resource_val = pickle.load(opened_resource)
AttributeError: 'module' object has no attribute 'defaultdict'
Can someone please tell me how to fix this error?
I just checked it on my system.
Fix:
>>> import nltk
>>> nltk.download('all')
Then everything worked fine.
>>> import nltk
>>> nltk.word_tokenize("Tokenize me")
['Tokenize', 'me']
I had the same error, and then I realized that I had saved my file as tokenize.py; that's why I was getting this error. When I renamed my Python file to something else, it worked fine. Hope this is helpful.
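That is presumably the usual module-shadowing problem: a script named tokenize.py (or pickle.py, collections.py, nltk.py, ...) ends up first on sys.path and gets imported instead of the standard-library module of the same name, which then surfaces as an unrelated-looking AttributeError. A minimal illustration:
# tokenize.py  -- your own script, which shadows the stdlib 'tokenize' module
import nltk                          # imports triggered by nltk can pick up THIS file instead
nltk.word_tokenize("Tokenize me")    # fails with an obscure AttributeError rather than a clean message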
I found out later that I was using outdated NLTK data. The program started to work fine as soon as I updated the data.
You need to update your NLTK version. If you are using Anaconda, run the following in a terminal:
conda update nltk
This will update NLTK. Then restart IPython and it should work!

nltk pos_tag usage

I am trying to use part-of-speech tagging in NLTK and have used this command:
>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
nltk.pos_tag(text)
File "C:\Python27\lib\site-packages\nltk\tag\__init__.py", line 99, in pos_tag
tagger = load(_POS_TAGGER)
File "C:\Python27\lib\site-packages\nltk\data.py", line 605, in load
resource_val = pickle.load(_open(resource_url))
File "C:\Python27\lib\site-packages\nltk\data.py", line 686, in _open
return find(path).open()
File "C:\Python27\lib\site-packages\nltk\data.py", line 467, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource 'taggers/maxent_treebank_pos_tagger/english.pickle' not
found. Please use the NLTK Downloader to obtain the resource:
However, I get an error message which says that english.pickle was not found.
I have downloaded the whole corpora, and the english.pickle file is there in maxent_treebank_pos_tagger.
What can I do to get this to work?
Your Python installation is not able to reach maxent or treebank.
First, check if the tagger is indeed there:
Start Python from the command line.
>>> import nltk
Then you can check using
>>> dir(nltk)
Look through the list to see if maxent and treebank are both there.
Easier would be to type
>>> "maxent" in dir(nltk)
True
>>> "treebank" in dir(nltk)
True
Use nltk.download() --> Models tab and check to see if the treebank tagger shows as installed.
You should also try downloading the tagger again.
If you don't want to use the downloader GUI, you can just use the following commands in a Python or IPython shell:
import nltk
nltk.download('punkt')
nltk.download('maxent_treebank_pos_tagger')
NLTK offers over 50 corpora and lexical resources such as WordNet (http://www.nltk.org/nltk_data/) for free.
If the downloader fails with a Google Code error (401: Authorization Required), use http://nltk.github.com/nltk_data/ as the server index instead of the googlecode URL.
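A sketch of pointing the downloader at that index from Python (server_index_url is an assumption about the Downloader constructor; verify it against your NLTK version):
from nltk.downloader import Downloader

d = Downloader(server_index_url="http://nltk.github.com/nltk_data/")  # assumed constructor parameter
d.download('maxent_treebank_pos_tagger')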