Issue loading a saved ensemble model in Python (Can't get attribute 'Ensemble' on <module '__main__'>) - pickle

I have an ensemble of XGBoost models that I saved with both "pickle" and "joblib". If I load the model in the same script that saves it, it loads successfully. However, if I load the model in another script, I get an AttributeError:
"AttributeError: Can't get attribute 'Ensemble' on <module '__main__'>".
The issue seems to be general, because it happens with both "pickle" and "joblib".
Any suggestion is appreciated.
# loading library
import pickle
# load saved model
with open('/scratch/gpfs/sg6615/known_antibiotics/Code/test.pkl', 'rb') as f:
    lr = pickle.load(f)
AttributeError Traceback (most recent call last)
Input In [5], in <cell line: 8>()
7 # load saved model
8 with open('/scratch/gpfs/sg6615/known_antibiotics/Code/test.pkl', 'rb') as f:
----> 9 lr = pickle.load(f)
AttributeError: Can't get attribute 'Ensemble' on <module '__main__'>
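A likely cause, offered as a sketch rather than a confirmed diagnosis: pickle (which joblib builds on) does not store the class itself, only a reference to the module and name it was defined under. If Ensemble was defined in the script that did the saving, that reference is __main__.Ensemble, and a different script has no such attribute on its own __main__. Defining the class in an importable module and importing it on both the saving and the loading side usually avoids this. The module name ensemble_model and the toy Ensemble class below are hypothetical stand-ins, and the temporary directory only exists to keep the example self-contained.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical module that holds the class definition (a stand-in for the real Ensemble).
MODULE_SOURCE = '''
class Ensemble:
    def __init__(self, models):
        self.models = models

    def predict(self, x):
        # Toy "prediction": average of each model weight times x.
        return sum(m * x for m in self.models) / len(self.models)
'''

# Write the module to a temporary directory and make that directory importable.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "ensemble_model.py").write_text(MODULE_SOURCE)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
ensemble_model = importlib.import_module("ensemble_model")

# Saving side: the pickle records "ensemble_model.Ensemble", not "__main__.Ensemble".
model = ensemble_model.Ensemble(models=[1.0, 3.0])
payload = pickle.dumps(model)

# Loading side: works in any script that can import ensemble_model.
restored = pickle.loads(payload)
print(type(restored).__module__, restored.predict(2.0))
```

In a real project the loading script would simply contain something like "from ensemble_model import Ensemble" before calling pickle.load, and the model would need to be saved again after the class has been moved into that module.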

Related

Use JSON to override default argparse parameters

I have an argparse function containing a mix of internal and user-specified settings. I want to use a JSON file as a configuration file to store the user-specified parameters, so that the JSON is parsed back into this argparse function.
The parameters have a mix of data types; these are defined in argparse but not in the JSON.
My argparse function looks like this:
def parse_opt():
    parser = argparse.ArgumentParser()
    parser.add_argument('--name', nargs='+', type=str, default='experiment', help='project name') #specify by users
    parser.add_argument('--visualise', action='store_true', help='output contains graphs') #specify by users
    parser.add_argument('--imgsize', '--img', '--img-size', nargs='+', type=int, default=[640], help='image size h,w') #let users specify
    parser.add_argument('--data', type=str, default=ROOT / 'data/coco128.yaml', help='(optional) dataset.yaml path') #internal default setting
    parser.add_argument('--thres', type=float, default=0.3, help='threshold') #internal default setting
    opt = parser.parse_args()
    return opt
My json configuration config.json looks like this, and it allows users to specify their parameters
d = {"name": "trial_001",
     "visualise": true,
     "imgsize": 1280}
I tried to pass the new configuration with the script below and ran into the error TypeError: 'bool' object is not subscriptable. In the main() function, I want all default settings parsed as opt; then the three user-defined parameters in config.json should override opt.name, opt.visualise and opt.imgsize. Finally, detect(**vars(opt)) reads all user and default parameters and applies the detect() function to them (note: my detect() function isn't included in this post as it is quite long). I'd appreciate any pointers here. Thanks.
import argparse
import json
def main(opt):
    opt = parse_opt()
    with open('config.json') as config_file:
        d = json.loads(config_file.read())
    for item in d.items():
        args.extend(item)
    detect(**vars(opt)) #detect() is a function that reads all variables from opt
if __name__ == "__main__":
    main(opt)
EDIT: this is the full error message I encountered.
for item in d.items():
    args.extend(item)
parser.parse_args(args)
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-26-53a113868d66>", line 1, in <module>
parser.parse_args(args)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/argparse.py", line 1749, in parse_args
args, argv = self.parse_known_args(args, namespace)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/argparse.py", line 1781, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/argparse.py", line 1822, in _parse_known_args
option_tuple = self._parse_optional(arg_string)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/argparse.py", line 2108, in _parse_optional
if not arg_string[0] in self.prefix_chars:
TypeError: 'bool' object is not subscriptable
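The traceback suggests that parse_args() received a list containing the Python value True: argparse expects every element to be a string and fails when it tries to index into the bool. One possible way around this, sketched below under assumptions, is to skip the argument list entirely and feed the JSON values to the parser as defaults with set_defaults(). The helper names build_parser and parse_with_config are hypothetical, only a subset of the arguments from the question is reproduced, --name is simplified to a single string, and the JSON is passed as text instead of being read from config.json.

```python
import argparse
import json


def build_parser():
    # A reduced version of the parser from the question (assumption: subset of arguments).
    parser = argparse.ArgumentParser()
    parser.add_argument('--name', type=str, default='experiment', help='project name')
    parser.add_argument('--visualise', action='store_true', help='output contains graphs')
    parser.add_argument('--imgsize', nargs='+', type=int, default=[640], help='image size h,w')
    parser.add_argument('--thres', type=float, default=0.3, help='threshold')
    return parser


def parse_with_config(config_text, argv=None):
    # Hypothetical helper: JSON values replace the parser defaults,
    # and anything passed in argv still takes precedence over them.
    parser = build_parser()
    overrides = json.loads(config_text)
    # The parser stores imgsize as a list, so wrap a bare integer from the JSON.
    if isinstance(overrides.get('imgsize'), int):
        overrides['imgsize'] = [overrides['imgsize']]
    parser.set_defaults(**overrides)
    return parser.parse_args([] if argv is None else argv)


opt = parse_with_config('{"name": "trial_001", "visualise": true, "imgsize": 1280}')
print(vars(opt))
```

Because the values never pass through the command-line parsing step, JSON booleans and numbers keep their types, and the internal defaults such as thres are left untouched.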

Problem uploading an sklearn model to S3 bucket using s3fs

I am trying to upload an SVR model (created with sklearn) to an S3 bucket using s3fs, but I get an error saying "TypeError: a bytes-like object is required, not 'SVR'". Can anyone suggest how to transform the SVR model into the right format?
My code is
model = SVR_model
fs = s3fs.S3FileSystem()
with fs.open('s3://bucket/SVR_model', 'wb') as f:
    f.write(model)
Use pickle to turn model into a bytes object:
import pickle
import s3fs
model = pickle.dumps(SVR_model)  # serialise the fitted model to bytes
fs = s3fs.S3FileSystem()
with fs.open('s3://bucket/SVR_model', 'wb') as f:
    f.write(model)
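To read the model back, the same idea runs in reverse: read the bytes and pass them to pickle.loads(). The sketch below shows the round trip with an in-memory io.BytesIO buffer standing in for the S3 file object and a plain dictionary standing in for the fitted SVR model; both stand-ins are assumptions made so the example runs without s3fs or sklearn.

```python
import io
import pickle

# Stand-in for a fitted SVR model (assumption: any picklable object behaves the same way).
model = {"kernel": "rbf", "C": 1.0}

# Stand-in for the binary file object returned by fs.open(..., 'wb').
buffer = io.BytesIO()
serialized = pickle.dumps(model)   # bytes, which is what write() requires
buffer.write(serialized)

# Later: read the bytes back and rebuild the object.
buffer.seek(0)
restored = pickle.loads(buffer.read())
print(type(serialized).__name__, restored == model)
```

With s3fs the reading side would open the same key in 'rb' mode and call pickle.loads(f.read()), or pickle.load(f) directly on the file object.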

'NoneType' object has no attribute 'read' when reading from JSON file

I am making a script for a school project that needs to receive a JSON file telling me whether a license plate is visible in a picture. Right now the code sends a POST request with an image to an API, which returns JSON; that JSON data is written to the file "lastResponse.json".
The code that raises the error:
with open('lastResponse.json', 'r+') as fp:
    f = json.dump(r.json(), fp, sort_keys=True, indent=4) # Where the response data is sent to the JSON
    data = json.load(f) # Line that triggers the error
print(data["results"]) # Debug code
print("------------------") # Debug code
print(data) # Debug code
# This statement just checks if a license plate is visible
if data["results"]["plate"] is None:
    print("No car detected!")
else:
    print("Car with plate number '" + data["results"]["plate"] + "' has been detected")
The Error
Traceback (most recent call last):
File "DetectionFinished.py", line 19, in <module>
data = json.load(f)
File "/usr/lib/python3.7/json/__init__.py", line 293, in load
return loads(fp.read(),
AttributeError: 'NoneType' object has no attribute 'read'
I am not very experienced in Python so I would appreciate explanations!
It turns out that, after rereading the API's documentation and using their examples, I was able to fix my issue:
import requests
from pprint import pprint
regions = ['gb', 'it']
with open('/path/to/car.jpg', 'rb') as fp:
    response = requests.post(
        'https://api.platerecognizer.com/v1/plate-reader/',
        data=dict(regions=regions), # Optional
        files=dict(upload=fp),
        headers={'Authorization': 'Token API_TOKEN'})
pprint(response.json())
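As for the original error: json.dump() writes to the file and returns None, so f in the question is None and json.load(f) has nothing to read from. The sketch below reproduces that with a temporary file and shows one way to read the data back from the same handle. The response_data dictionary is a made-up stand-in for r.json(), and its shape is an assumption.

```python
import json
import tempfile

# Made-up stand-in for the API response returned by r.json().
response_data = {"results": {"plate": None}}

with tempfile.TemporaryFile("w+") as fp:
    # json.dump writes into fp; its return value is always None.
    result = json.dump(response_data, fp, sort_keys=True, indent=4)
    # Rewind to the start of the file, then load from the file object itself.
    fp.seek(0)
    data = json.load(fp)

print(result)
print(data)
```

If the data is only needed in the same script, it may be simpler to keep the dictionary returned by r.json() and write the file purely as a record.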

Python Json reference and validation

I'm starting to use Python to validate some JSON data. I'm using a JSON schema with references, but I'm having trouble referencing those files. This is the code:
from os.path import join, dirname
from jsonschema import validate
import jsonref
def assert_valid_schema(data, schema_file):
    """ Checks whether the given data matches the schema """
    schema = _load_json_schema(schema_file)
    return validate(data, schema)
def _load_json_schema(filename):
    """ Loads the given schema file """
    relative_path = join('schemas', filename).replace("\\", "/")
    absolute_path = join(dirname(__file__), relative_path).replace("\\", "/")
    base_path = dirname(absolute_path)
    base_uri = 'file://{}/'.format(base_path)
    with open(absolute_path) as schema_file:
        return jsonref.loads(schema_file.read(), base_uri=base_uri, jsonschema=True)
assert_valid_schema(data, 'grandpa.json')
The json data is :
data = {"id":1,"work":{"id":10,"name":"Miroirs","composer":{"id":100,"name":"Maurice Ravel","functions":["Composer"]}},"recording_artists":[{"id":101,"name":"Alexandre Tharaud","functions":["Piano"]},{"id":102,"name":"Jean-Martial Golaz","functions":["Engineer","Producer"]}]}
I'm saving the schema and the referenced file in a schemas folder:
recording.json :
{"$schema":"http://json-schema.org/draft-04/schema#","title":"Schema for a recording","type":"object","properties":{"id":{"type":"number"},"work":{"type":"object","properties":{"id":{"type":"number"},"name":{"type":"string"},"composer":{"$ref":"artist.json"}}},"recording_artists":{"type":"array","items":{"$ref":"artist.json"}}},"required":["id","work","recording_artists"]}
artist.json :
{"$schema":"http://json-schema.org/draft-04/schema#","title":"Schema for an artist","type":"object","properties":{"id":{"type":"number"},"name":{"type":"string"},"functions":{"type":"array","items":{"type":"string"}}},"required":["id","name","functions"]}
And this is my error :
Connected to pydev debugger (build 181.5281.24)
Traceback (most recent call last):
File "C:\Python\lib\site-packages\proxytypes.py", line 207, in __subject__
return self.cache
File "C:\Python\lib\site-packages\proxytypes.py", line 131, in __getattribute__
return _oga(self, attr)
AttributeError: 'JsonRef' object has no attribute 'cache'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python\lib\site-packages\jsonref.py", line 163, in callback
base_doc = self.loader(uri)
<MORE>
Python version: 3.6.5
OS: Windows 7
IDE: IntelliJ IDEA
Can somebody help me?
Thank you
I am not sure why, but on Windows the file:// prefix needs an extra /. So the following change should do the trick:
base_uri = 'file:///{}/'.format(base_path)
Arrived at this answer from a solution posted for a related issue in json schema
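A possible explanation for the extra slash: in a file URI the text right after file:// is read as a host name, so a Windows path that starts with a drive letter needs an empty host in front of it, which gives three slashes. Instead of formatting the string by hand, pathlib can build the URI for the current platform. The sketch below assumes a hypothetical helper name schema_base_uri and uses the system temp directory in place of the schemas folder; whether this resolves the jsonref error in the original setup has not been verified here.

```python
import tempfile
from pathlib import Path


def schema_base_uri(schema_dir):
    # Hypothetical helper: as_uri() needs an absolute path and produces
    # file:///C:/... on Windows and file:///home/... on POSIX systems.
    return Path(schema_dir).resolve().as_uri() + "/"


# The temp directory stands in for the real schemas folder.
base_uri = schema_base_uri(tempfile.gettempdir())
print(base_uri)
```

The trailing slash is kept so that relative references such as artist.json resolve inside the folder rather than next to it.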

How to get the actual value of a cell with openpyxl?

I'm a beginner with Python and I need help. I'm using Python 2.7 and I'm trying to retrieve the cell values of an Excel file and store them in a CSV file. My code is the following:
import os, openpyxl, csv
aggname = "deu"
wb_source = openpyxl.load_workbook(filename, data_only = True)
app_file = open(filename,'a')
dest_file = csv.writer(app_file, delimiter=',', lineterminator='\n')
calib_sheet = wb_source.get_sheet_by_name('Calibration')
data = calib_sheet['B78:C88']
data = list(data)
print(data)
for i in range(len(data)):
    dest_file.writerow(data[i])
app_file.close()
In my CSV file I get the following instead of the actual values (for example, in my case: SFCG, 99103):
<Cell Calibration.B78>,<Cell Calibration.C78>
<Cell Calibration.B79>,<Cell Calibration.C79>
<Cell Calibration.B80>,<Cell Calibration.C80>
<Cell Calibration.B81>,<Cell Calibration.C81>
<Cell Calibration.B82>,<Cell Calibration.C82>
<Cell Calibration.B83>,<Cell Calibration.C83>
<Cell Calibration.B84>,<Cell Calibration.C84>
<Cell Calibration.B85>,<Cell Calibration.C85>
<Cell Calibration.B86>,<Cell Calibration.C86>
<Cell Calibration.B87>,<Cell Calibration.C87>
<Cell Calibration.B88>,<Cell Calibration.C88>
I tried setting data_only = True when opening the Excel file, as suggested in answers to similar questions, but it doesn't solve my problem.
---------------EDIT-------------
Taking into account the first two answers I got (thank you!), I tried several things:
for i in range(len(data)):
    dest_file.writerows(data[i].value)
I get this error message:
for i in range(len(data)):
    dest_file.writerows(data[i].values)
Traceback (most recent call last):
File "<ipython-input-78-27828c989b39>", line 2, in <module>
dest_file.writerows(data[i].values)
AttributeError: 'tuple' object has no attribute 'values'
Then I tried this instead:
for i in range(len(data)):
    for j in range(2):
        dest_file.writerow(data[i][j].value)
and then I have the following error message:
for i in range(len(data)):
    for j in range(2):
        dest_file.writerow(data[i][j].value)
Traceback (most recent call last):
File "<ipython-input-80-c571abd7c3ec>", line 3, in <module>
dest_file.writerow(data[i][j].value)
Error: sequence expected
So then, I tried this:
import os, openpyxl, csv
wb_source = openpyxl.load_workbook(filename, data_only=True)
app_file = open(filename,'a')
dest_file = csv.writer(app_file, delimiter=',', lineterminator='\n')
calib_sheet = wb_source.get_sheet_by_name('Calibration')
list(calib_sheet.iter_rows('B78:C88'))
for row in calib_sheet.iter_rows('B78:C88'):
    for cell in row:
        dest_file.writerow(cell.value)
Only to get this error message:
Traceback (most recent call last):
File "<ipython-input-81-5bed62b45985>", line 12, in <module>
dest_file.writerow(cell.value)
Error: sequence expected
For the "sequence expected" error, I suppose Python expects a list rather than a single cell, so I did this:
import os, openpyxl, csv
wb_source = openpyxl.load_workbook(filename, data_only=True)
app_file = open(filename,'a')
dest_file = csv.writer(app_file, delimiter=',', lineterminator='\n')
calib_sheet = wb_source.get_sheet_by_name('Calibration')
list(calib_sheet.iter_rows('B78:C88'))
for row in calib_sheet.iter_rows('B78:C88'):
    dest_file.writerow(row)
There is no error message, but I only get the cell references in the CSV file, and changing it to dest_file.writerow(row.value) brings me back to the tuple error.
I obviously still need your help!
You forgot to get the cell's value! See the documentation.
I found a way around it using numpy, which allows me to store my values as a list of lists rather than a list of tuples.
import os, openpyxl, csv
import numpy as np
wb_source = openpyxl.load_workbook(filename, data_only=True)
app_file = open(filename,'a')
dest_file = csv.writer(app_file, delimiter=',', lineterminator='\n')
calib_sheet = wb_source.get_sheet_by_name('Calibration')
store = list(calib_sheet.iter_rows('B78:C88'))
print(store)
truc = np.array(store)
print(truc)
for i in range(11):
    for j in range(1):
        dest_file.writerow([truc[i][j].value, truc[i][j+1].value])
app_file.close()
This way the argument to writerow() is actually a sequence, and with the array I can use the double index and the value attribute to retrieve the value of each cell.
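The same result may be reachable without numpy: writerow() only needs a sequence of plain values, so each row of cells can be turned into a list of their value attributes. The sketch below illustrates that idea with a namedtuple standing in for openpyxl cells and an in-memory buffer standing in for the CSV file, so it runs without a workbook. The Cell stand-in and the second row of sample data are made up for illustration.

```python
import csv
import io
from collections import namedtuple

# Stand-in for openpyxl cells: the only thing used here is the .value attribute.
Cell = namedtuple("Cell", ["coordinate", "value"])

# Stand-in for the rows of a cell range such as B78:C88 (second row is made-up data).
rows = (
    (Cell("B78", "SFCG"), Cell("C78", 99103)),
    (Cell("B79", "ABCD"), Cell("C79", 12345)),
)

output = io.StringIO()
writer = csv.writer(output, delimiter=",", lineterminator="\n")
for row in rows:
    # One CSV line per worksheet row, built from the cell values.
    writer.writerow([cell.value for cell in row])

csv_text = output.getvalue()
print(csv_text)
```

With a real worksheet the loop body stays the same; only rows would come from the cell range instead of the hand-built tuples.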
Try using the cell's value attribute (for example data.value) instead of just data when you are printing it.
Hope it helps!
An example:
import openpyxl
import re
import os
wc=openpyxl.load_workbook('<path of the file>')
wcsheet=wc.get_sheet_by_name('test')
store=[]
for data in wcsheet.columns[0]:
    store=data
    print(store.value)