Problem uploading an sklearn model to S3 bucket using s3fs - boto

I am trying to upload an SVR model (created with sklearn) to an S3 bucket using s3fs, but I get the error "TypeError: a bytes-like object is required, not 'SVR'". Can anyone suggest how to transform the SVR object into the right format?
My code is:
model = SVR_model
fs = s3fs.S3FileSystem()
with fs.open('s3://bucket/SVR_model', 'wb') as f:
    f.write(model)

Use pickle to serialize the model into a bytes object:
import pickle
import s3fs

model = pickle.dumps(SVR_model)
fs = s3fs.S3FileSystem()
with fs.open('s3://bucket/SVR_model', 'wb') as f:
    f.write(model)
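For completeness, a minimal sketch of loading the model back later from the same (assumed) bucket path; only unpickle data from sources you trust:
import pickle
import s3fs

fs = s3fs.S3FileSystem()
with fs.open('s3://bucket/SVR_model', 'rb') as f:
    loaded_model = pickle.loads(f.read())
# loaded_model now behaves like the original SVR_model, e.g. loaded_model.predict(X)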

Related

How can I use the T5 transformer's decoder as the decoder for TrOCR, for a fine-tuning task?

# after importing the libraries
from transformers import AutoModel, AutoTokenizer, TrOCRProcessor, VisionEncoderDecoderModel

T5model = AutoModel.from_pretrained("t5-small")
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
processor.tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")
model.config.decoder = T5model.decoder
model.config.pad_token_id = processor.tokenizer.pad_token_id
After loading the IAM dataset, setting the config, and running training with the Hugging Face Trainer class, I get:
output["decoder"] = self.decoder.to_dict()
File "/usr/lib/python3/dist-packages/torch/nn/modules/module.py", line 1130, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'T5Stack' object has no attribute 'to_dict'
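No answer was posted here, but the traceback points at the likely cause: model.config.decoder is expected to hold a config object (something with a to_dict() method), and the code above assigns the T5Stack module to it, so serializing the config fails. A minimal, untested sketch of the usual separation, keeping modules on the model and configs on the config; whether the TrOCR encoder and the t5-small decoder actually line up dimensionally for fine-tuning is an assumption, not something verified here:
from transformers import AutoModel, VisionEncoderDecoderModel

T5model = AutoModel.from_pretrained("t5-small")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

# Assumption: swap the decoder *module* on the model ...
model.decoder = T5model.decoder        # nn.Module lives on the model
# ... and keep a *config* object (which has .to_dict()) on the config
model.config.decoder = T5model.config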

Using a class and a CSV file to convert values

import csv

class Converter:
    def __init__(self, csv):
        with open(csv) as f:
            obj = csv.reader(f)
            conversions = list(obj)
            conversions.pop(0)

if __name__ == "__main__":
    with open("table.csv", "w") as f:
        f.write("type,ratio\ncm_to_inch,0.393701\ninch_to_cm,2.54")
    cvt = Converter("table.csv")
    r1 = cvt.convert(5.5, "cm", "inch")
    r2 = cvt.convert(100, "inch", "cm")
    print(f"{r1}\n{r2}")
I am trying to use a class to convert values from cm to inches and inches to cm, with the conversion rates stored in a CSV file. However, when I run this code, AttributeError: 'str' object has no attribute 'reader' keeps popping up. Please help.
It's probably a name-resolution issue. The constructor parameter csv shadows the imported csv module, so inside __init__ the name csv refers to the string you passed in, not to the module, and the csv.reader(f) call fails. Rename the parameter to something like csv_name.
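A minimal sketch of a fixed version, assuming table.csv from the question already exists; the convert method is a hypothetical reconstruction of the intended behaviour, since the original class never defines it:
import csv

class Converter:
    def __init__(self, csv_name):           # renamed so it no longer shadows the csv module
        with open(csv_name) as f:
            rows = list(csv.reader(f))
        rows.pop(0)                          # drop the "type,ratio" header row
        self.ratios = {name: float(ratio) for name, ratio in rows}

    def convert(self, value, src, dst):      # hypothetical helper matching the calls in __main__
        return value * self.ratios[f"{src}_to_{dst}"]

cvt = Converter("table.csv")
print(cvt.convert(5.5, "cm", "inch"))        # ~2.165
print(cvt.convert(100, "inch", "cm"))        # 254.0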

Cannot read JSON file downloaded from S3 bucket

For context, I have a JSON file that I am reading in with the following code:
with open("file.json", "r") as jsonFile:
    data = json.load(jsonFile)
Then, I modify this dict based on some other information:
for key in data:
    if key == label_file[-len(key):]:
        print(data)
        del data[key]
        print(data)
I know that this del call is actually working properly because I can see the actual json file being modified. Then, I update this file in the bucket through these lines
with open('file.json', 'w', encoding='utf-8') as updatedFile:
    json.dump(data, updatedFile, ensure_ascii=False, indent=4)
s3.Bucket('bucket-name').upload_file(os.path.abspath('file.json'), 'file.json', ExtraArgs={'ContentType': "application/json"})
Reading the JSON file before updating it works completely fine, and I can also read the JSON file when it is stored locally. However, when I download the JSON file straight from the bucket, the program crashes on data = json.load(file). I've also tried replacing the text in the downloaded JSON file with exactly what's in the local one, and it still crashes, so I believe it is some file-type issue with the S3 bucket.
This is the code I use to read a JSON file from S3:
import json
import boto3

s3sc = boto3.client('s3')
obj = s3sc.get_object(Bucket=bucket, Key=key)
data = obj['Body'].read().decode('utf-8')
jsondata = json.loads(data)
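As a small aside, json.load can also consume the streaming body directly, since it only needs an object with a read() method; a minimal sketch reusing the bucket and key names from above:
import json
import boto3

s3sc = boto3.client('s3')
obj = s3sc.get_object(Bucket=bucket, Key=key)
jsondata = json.load(obj['Body'])    # StreamingBody exposes read(), which json.load calls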

How to read a CSV file from an S3 bucket using AWS Lambda and write it as a new CSV to another S3 bucket? Python boto3

OK, so I am a beginner to AWS in general. I am writing a Lambda function that triggers on a file-upload event in S3, removes some columns, and writes the result to a new bucket. I've been banging my head for the past two days and get a different error each time. Can someone modify/fix my code? outputlv will be my target bucket. Currently I am getting a '/outputlv/output.csv' path does not exist error on the with open('/outputlv/output.csv', 'w') as output_file line. Thanks.
import json
import urllib.parse
import boto3
import csv

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    file_name = s3.get_object(Bucket=bucket, Key=key)
    csv_reader = csv.reader(file_name)
    with open('/outputlv/output.csv', 'w') as output_file:
        wtr = csv.writer(output_file)
        for i in csv_reader:
            wtr.writerow(i[0], i[2], i[3])
    target_bucket = 'outputlv'
    final_file = 'outputlv/output.csv'
    s3.put_object(Bucket=target_bucket, Key=final_file)
Why don't you work with the object's content directly? Is it required to work with local files at all?
response = s3.get_object(Bucket=bucket, Key=key)
# Get the file content as text
content = response['Body'].read().decode('utf-8')
# Pass the lines to the csv reader (csv.reader expects strings, not bytes)
csv_reader = csv.reader(content.splitlines())
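Putting the pieces together, a minimal sketch of the whole handler that also writes the reduced CSV to the target bucket. The column indices and the outputlv bucket name come from the question; everything else (for example writing from an in-memory buffer instead of a local file) is an assumption:
import csv
import io
import urllib.parse
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')

    # Read the uploaded CSV straight from S3, no local file needed
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read().decode('utf-8')
    reader = csv.reader(body.splitlines())

    # Keep only the wanted columns (indices taken from the question's writerow call)
    out_buf = io.StringIO()
    writer = csv.writer(out_buf)
    for row in reader:
        writer.writerow([row[0], row[2], row[3]])   # writerow takes one sequence, not three arguments

    # Upload the result to the target bucket
    s3.put_object(Bucket='outputlv', Key='output.csv', Body=out_buf.getvalue())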

How to read large JSON file from Amazon S3 using Boto3

I am trying to read a JSON file from Amazon S3, and its file size is about 2 GB. When I use the method .read(), it gives me a MemoryError.
Are there any solutions to this problem? Any help would do, thank you so much!
So, I found a way which worked for me efficiently. I had a 1.60 GB file and needed to load it for processing.
import io
import json
import boto3

s3 = boto3.resource('s3', aws_access_key_id=<aws_access_key_id>, aws_secret_access_key=<aws_secret_access_key>)

# Collect the data as a bytes object (Object() is a resource-level call, so use boto3.resource here).
data_in_bytes = s3.Object(bucket_name, filename).get()['Body'].read()
# Decode it as 'utf-8'.
decoded_data = data_in_bytes.decode('utf-8')
# Wrap it in a StringIO object from the io module.
stringio_data = io.StringIO(decoded_data)
# Read the StringIO object line by line.
data = stringio_data.readlines()
# Now parse each line with the json module.
json_data = list(map(json.loads, data))
So json_data is the content of the file. I know there are lots of variable manipulations, but it worked for me.
Just iterate over the object.
import json
import boto3

s3 = boto3.client('s3', aws_access_key_id=<aws_access_key_id>, aws_secret_access_key=<aws_secret_access_key>)
fileObj = s3.get_object(Bucket='bucket_name', Key='key')
# Note the capital 'B' in 'Body'; iter_lines() assumes one JSON object per line.
for row in fileObj['Body'].iter_lines():
    line = row.decode('utf-8')
    print(json.loads(line))
I just solved the problem. Here's the code. Hope it helps for future use!
import json
import boto3

s3 = boto3.client('s3', aws_access_key_id=<aws_access_key_id>, aws_secret_access_key=<aws_secret_access_key>)
obj = s3.get_object(Bucket='bucket_name', Key='key')
data = (line.decode('utf-8') for line in obj['Body'].iter_lines())
for row in data:
    print(json.loads(row))
import json
import boto3

def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    # list all s3 buckets
    for bucket in s3.buckets.all():
        print(bucket.name)
    # json_data = s3.Object("vkhan-s3-bucket", "config/sandbox/config.json").get()['Body'].read()
    json_data = json.loads(s3.Object("vkhan-s3-bucket", "config/sandbox/config.json").get()['Body'].read().decode())
    print(json_data)
    return {
        'statusCode': 200,
        'body': json.dumps(json_data)
    }
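The answers above assume the file either fits in memory after decoding or is laid out one JSON object per line. If the 2 GB file is a single large JSON document, a streaming parser is another option. A minimal sketch using the third-party ijson package (an assumption; it is not used anywhere in the question), which can read directly from the StreamingBody without loading the whole file:
import boto3
import ijson    # streaming JSON parser: pip install ijson

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='bucket_name', Key='key')

# 'item' assumes the top-level structure is a JSON array; adjust the prefix otherwise.
for item in ijson.items(obj['Body'], 'item'):
    print(item)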