I am trying to upload an SVR model (created with sklearn) to an S3 bucket using s3fs, but I get an error saying "TypeError: a bytes-like object is required, not 'SVR'". Can anyone suggest how to convert the SVR model into the right format?
My code is:

model = SVR_model
fs = s3fs.S3FileSystem()
with fs.open('s3://bucket/SVR_model', 'wb') as f:
    f.write(model)
Use pickle to turn the model into a bytes object before writing it:

import pickle
import s3fs

model = pickle.dumps(SVR_model)
fs = s3fs.S3FileSystem()
with fs.open('s3://bucket/SVR_model', 'wb') as f:
    f.write(model)
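To load the model back later, read the bytes and unpickle them. A minimal sketch, reusing the same bucket/key as above (X_test is a placeholder for your own data):

import pickle
import s3fs

fs = s3fs.S3FileSystem()
with fs.open('s3://bucket/SVR_model', 'rb') as f:
    loaded_model = pickle.loads(f.read())    # restores the fitted SVR object

predictions = loaded_model.predict(X_test)   # X_test: placeholder feature matrix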
from transformers import AutoModel, AutoTokenizer, TrOCRProcessor, VisionEncoderDecoderModel

T5model = AutoModel.from_pretrained("t5-small")
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
processor.tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")
model.config.decoder = T5model.decoder
model.config.pad_token_id = processor.tokenizer.pad_token_id
After loading the IAM dataset, setting the config, and running training with the Hugging Face Trainer class, I get:

    output["decoder"] = self.decoder.to_dict()
  File "/usr/lib/python3/dist-packages/torch/nn/modules/module.py", line 1130, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'T5Stack' object has no attribute 'to_dict'
import csv

class Converter:
    def __init__(self, csv):
        with open(csv) as f:
            obj = csv.reader(f)
            conversions = list(obj)
            conversions.pop(0)

if __name__ == "__main__":
    with open("table.csv", "w") as f:
        f.write("type,ratio\ncm_to_inch,0.393701\ninch_to_cm,2.54")
    cvt = Converter("table.csv")
    r1 = cvt.convert(5.5, "cm", "inch")
    r2 = cvt.convert(100, "inch", "cm")
    print(f"{r1}\n{r2}")
So I am trying to use a class to convert values from cm to inches and inches to cm. The conversion rates are stored in a CSV file. However, when I run this code, "AttributeError: 'str' object has no attribute 'reader'" keeps popping up. Please help.
It's probably a name-resolution issue. Try changing the name of your constructor parameter to something other than csv, e.g. csv_name: inside __init__ that parameter shadows the imported csv module, so csv.reader(f) ends up being looked up on the string you passed in rather than on the module.
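A minimal sketch of the renamed parameter (the convert method here is only illustrative, since the original question does not include it):

import csv

class Converter:
    def __init__(self, csv_name):
        with open(csv_name) as f:
            obj = csv.reader(f)          # now resolves to the csv module
            conversions = list(obj)
            conversions.pop(0)           # drop the header row
        # keep the ratios keyed by conversion name, e.g. "cm_to_inch"
        self.ratios = {row[0]: float(row[1]) for row in conversions}

    def convert(self, value, src, dst):
        # hypothetical helper: looks up the "<src>_to_<dst>" ratio
        return value * self.ratios[f"{src}_to_{dst}"]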
For context, I have a json file that I am reading in with the following code:
with open("file.json", "w") as jsonFile:
data = json.load(jsonFile)
Then, I modify this dict based on some other information:
for key in data:
    if key == label_file[-len(key):]:
        print(data)
        del data[key]
        print(data)
I know that this del call is actually working properly because I can see the local json file being modified. Then I update the file in the bucket with these lines:
with open('file.json', 'w', encoding='utf-8') as updatedFile:
    json.dump(data, updatedFile, ensure_ascii=False, indent=4)

s3.Bucket('bucket-name').upload_file(os.path.abspath('file.json'), 'file.json', ExtraArgs={'ContentType': "application/json"})
Reading the json file before updating it works completely fine, and I can also read the json file when it is stored locally. However, when I read the json file immediately after downloading it from the bucket, the program crashes on data = json.load(file).
Additionally, I've tried replacing the text in the downloaded json file with exactly what's in the local one, and it still crashes, so I believe it is some file-type issue with the S3 bucket.
This is the code I use to read a JSON file from S3:

import json
import boto3

s3sr = boto3.resource('s3')          # resource handle (not actually needed for get_object)
bucket_obj = s3sr.Bucket(bucket)
s3sc = boto3.client('s3')            # client used for get_object below
obj = s3sc.get_object(Bucket=bucket, Key=key)
data = obj['Body'].read().decode('utf-8')
jsondata = json.loads(data)
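And if you want to push the modified dict back to the bucket without writing a local file first, a minimal sketch using put_object (bucket and key names are placeholders):

s3sc.put_object(
    Bucket='bucket-name',    # placeholder bucket
    Key='file.json',         # placeholder key
    Body=json.dumps(jsondata, ensure_ascii=False, indent=4).encode('utf-8'),
    ContentType='application/json'
)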
Ok, so I am a beginner to AWS in general. I am writing a Lambda function that triggers on a file-upload event in S3, removes some columns, and writes the result to a new bucket. I've been banging my head for the past two days and I get a different error each time. Can someone modify or fix my code? outputlv will be my target bucket. Currently I am getting a "'/outputlv/output.csv' path does not exist" error on the with open('/outputlv/output.csv', 'w') as output_file line. Thanks.
import json
import urllib.parse
import boto3
import csv

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    file_name = s3.get_object(Bucket=bucket, Key=key)
    csv_reader = csv.reader(file_name)
    with open('/outputlv/output.csv', 'w') as output_file:
        wtr = csv.writer(output_file)
        for i in csv_reader:
            wtr.writerow(i[0], i[2], i[3])
    target_bucket = 'outputlv'
    final_file = 'outputlv/output.csv'
    s3.put_object(Bucket=target_bucket, Key=final_file)
Why don't you just work with the object's content directly? Is it required to work with local files at all?

# Get the object from S3
response = s3.get_object(Bucket=bucket, Key=key)
# Get the file content and decode it to text
content = response['Body'].read().decode('utf-8')
# Pass the file content to the csv reader (it expects an iterable of text lines)
csv_reader = csv.reader(content.splitlines())
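Building on that, here is a minimal sketch of a complete handler that filters the columns in memory and uploads the result with put_object; the target bucket outputlv and the kept column indices come from the question, everything else is illustrative. Inside Lambda only /tmp is writable, which is why open('/outputlv/output.csv', 'w') fails.

import csv
import io
import urllib.parse
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')

    # Read and decode the uploaded CSV
    response = s3.get_object(Bucket=bucket, Key=key)
    csv_reader = csv.reader(response['Body'].read().decode('utf-8').splitlines())

    # Write the selected columns into an in-memory buffer instead of the local filesystem
    buffer = io.StringIO()
    wtr = csv.writer(buffer)
    for row in csv_reader:
        wtr.writerow([row[0], row[2], row[3]])   # writerow takes a single list of values

    # Upload the filtered CSV to the target bucket
    s3.put_object(Bucket='outputlv', Key='output.csv', Body=buffer.getvalue())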
I am trying to read a JSON file from Amazon S3 and its file size is about 2GB. When I use the method .read(), it gives me MemoryError.
Are there any solutions to this problem? Any help would do, thank you so much!
So, I found a way which worked for me efficiently. I had a 1.60 GB file and needed to load it for processing.

import io
import json
import boto3

# .Object() is part of the resource API, so use boto3.resource here
s3 = boto3.resource('s3', aws_access_key_id=<aws_access_key_id>, aws_secret_access_key=<aws_secret_access_key>)
# Collect the data as a bytes object.
data_in_bytes = s3.Object(bucket_name, filename).get()['Body'].read()
# Decode it as 'utf-8'
decoded_data = data_in_bytes.decode('utf-8')
# Use the io module to create a StringIO object.
stringio_data = io.StringIO(decoded_data)
# Now just read the StringIO obj line by line.
data = stringio_data.readlines()
# Time to use the json module.
json_data = list(map(json.loads, data))
So json_data is the content of the file. I know there are lots of variable manipulations, but it worked for me.
Just iterate over the object line by line.

import json
import boto3

s3 = boto3.client('s3', aws_access_key_id=<aws_access_key_id>, aws_secret_access_key=<aws_secret_access_key>)
fileObj = s3.get_object(Bucket='bucket_name', Key='key')
for row in fileObj['Body'].iter_lines():   # note: 'Body' is capitalised; iter_lines() streams the file
    line = row.decode('utf-8')
    print(json.loads(line))
I just solved the problem. Here's the code. Hope it helps for future use!

import json
import boto3

s3 = boto3.client('s3', aws_access_key_id=<aws_access_key_id>, aws_secret_access_key=<aws_secret_access_key>)
obj = s3.get_object(Bucket='bucket_name', Key='key')
file_content = (line.decode('utf-8') for line in obj['Body'].iter_lines())
for row in file_content:
    print(json.loads(row))
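If the file is JSON Lines (one object per line), you can also process it record by record without ever building a full list in memory, which is what triggers the MemoryError in the first place. A minimal sketch that just counts records (bucket name and key are placeholders):

import json
import boto3

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='bucket_name', Key='key')

count = 0
for raw_line in obj['Body'].iter_lines():
    if not raw_line:
        continue                              # skip blank lines
    record = json.loads(raw_line.decode('utf-8'))
    count += 1                                # replace with whatever per-record processing you need
print(count)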
import json
import boto3

def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    # list all s3 buckets
    for bucket in s3.buckets.all():
        print(bucket.name)
    # json_data = s3.Object("vkhan-s3-bucket", "config/sandbox/config.json").get()['Body'].read()
    json_data = json.loads(s3.Object("vkhan-s3-bucket", "config/sandbox/config.json").get()['Body'].read().decode())
    print(json_data)
    return {
        'statusCode': 200,
        'body': json.dumps(json_data)
    }