For context, I have a json file that I am reading in with the following code:
with open("file.json", "w") as jsonFile:
data = json.load(jsonFile)
Then, I modify this dict based on some other information:
for key in list(data):  # iterate over a copy so the dict can be modified safely
    if key == label_file[-len(key):]:
        print(data)
        del data[key]
        print(data)
I know that this del call is actually working properly, because I can see the json file on disk being modified. Then, I update the file in the bucket through these lines:
with open('file.json', 'w', encoding='utf-8') as updatedFile:
    json.dump(data, updatedFile, ensure_ascii=False, indent=4)
s3.Bucket('bucket-name').upload_file(os.path.abspath('file.json'), 'file.json', ExtraArgs={'ContentType': "application/json"})
Reading the json file before updating it works completely fine, and I can also read the json file when it is stored locally. However, when I download the json file from the bucket immediately after uploading it, the program crashes on data = json.load(file).
Additionally, I've tried replacing the text in the downloaded json file with exactly what's in the local one, and it still crashes, so I believe it is some file-type issue with the S3 bucket.
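For reference, this is roughly what that download-and-read step could look like; the bucket and file names are the ones used above, while 'downloaded.json' and the explicit encoding are illustrative additions, not part of the original code:
import json
import boto3

s3 = boto3.resource('s3')

# Fetch the object back from the bucket and load it locally.
s3.Bucket('bucket-name').download_file('file.json', 'downloaded.json')

with open('downloaded.json', 'r', encoding='utf-8') as f:
    data = json.load(f)  # the call that crashes in the scenario described above
print(data)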
This is the code I use to read a json file from S3:
import json
import boto3

s3sr = boto3.resource('s3')
bucket_obj = s3sr.Bucket(bucket)

s3sc = boto3.client('s3')  # a client is needed for get_object
obj = s3sc.get_object(Bucket=bucket, Key=key)
data = obj['Body'].read().decode('utf-8')
jsondata = json.loads(data)
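As a side note on the snippet above: on Python 3.6+, json.loads also accepts bytes directly, so the explicit decode step can be dropped if preferred (same bucket and key variables assumed):
obj = s3sc.get_object(Bucket=bucket, Key=key)
jsondata = json.loads(obj['Body'].read())  # bytes are accepted by json.loads on Python 3.6+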
I have a folder containing 100 XML files, and I need to convert each one into a JSON file with the same name (just with a .json extension instead of .xml). I have tried many approaches, but every attempt merges all the XMLs into a single JSON file.
With the code below, all 100 XML files end up in one JSON file, but I need 100 separate JSON files. Can anyone please suggest a good approach?
for path, dirs, files in os.walk(r"C:\exml\FND_GLS"):
    for f in files:
        clinical = os.path.join(path, f)
        print(clinical)
        tree = ET.parse(clinical)
        root = tree.getroot()
        xmlstr = ET.tostring(root, encoding='utf-8', method='xml')
        data_dict = dict(xmltodict.parse(xmlstr))
def write_json(target_path, target_file, data):
    if not os.path.exists(target_path):
        try:
            os.makedirs(target_path)
        except Exception as e:
            print(e)
            raise
    with open(os.path.join(target_path, target_file), 'w') as f:
        json.dump(data, f, sort_keys=False, indent=4)

write_json(r'C:\ejson\FND_GLS_Json', 'my_json.json', data_dict)
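A minimal sketch of one way to get a separate JSON file per XML, assuming the same xmltodict/ElementTree setup and the folder paths from the code above; the key changes are writing inside the loop and deriving each output name from the XML file name:
import json
import os
import xml.etree.ElementTree as ET
import xmltodict

source_dir = r"C:\exml\FND_GLS"        # input folder from the question
target_dir = r"C:\ejson\FND_GLS_Json"  # output folder from the question
os.makedirs(target_dir, exist_ok=True)

for path, dirs, files in os.walk(source_dir):
    for f in files:
        if not f.lower().endswith('.xml'):
            continue
        xml_path = os.path.join(path, f)
        root = ET.parse(xml_path).getroot()
        data_dict = xmltodict.parse(ET.tostring(root, encoding='utf-8'))

        # Same base name as the XML, but with a .json extension.
        json_name = os.path.splitext(f)[0] + '.json'
        with open(os.path.join(target_dir, json_name), 'w') as out:
            json.dump(data_dict, out, sort_keys=False, indent=4)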
How do I read data from a JSON file in a Python file?
json file:
{
    "bot-token": "BotTokenHere",
    "OwnerDiscordName": "UsernameAndTag"
}
I'm trying to get data from a file called config.json and put it into a Python variable; the Python file is called bot.py.
You have to load the file content through the json lib:
import json

with open("config.json") as f:
    data = json.load(f)
Then you can access the items by dict indexing:
token = data["bot-token"]
I'm trying to open a json file using the json library in Python 3.8 but I have not succeeded.
This is my MWE:
with open(pbit_path + file_name, 'r') as f:
    data = json.load(f)
print(data)
where pbit_path and file_name together form the absolute path of the .json file. As an example, this is a sample of the .json file that I'm trying to open:
https://github.com/pwnaoj/desktop-tutorial/blob/master/DataModelSchema.json
Error returned
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
I have also tried using the functions loads(), dump(), dumps().
I appreciate any suggestions
Thanks in advance.
I found a solution to my problem. It is an encoding problem: the file I am trying to read is encoded as UCS-2 (UTF-16), so in Python:
with open(file, mode='r', encoding='utf_16_le') as f:
    data = f.read()

data = json.loads(data)
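A slightly shorter variant of the same idea, assuming the file starts with a BOM (which lets the generic 'utf-16' codec pick the byte order); json.load can read straight from the file object:
import json

with open(pbit_path + file_name, mode='r', encoding='utf-16') as f:
    data = json.load(f)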
I have a gzipped JSON file that I exported from Auth0, and the content looks like this:
{"Id":"auth0|59bdb71ea714e32e8a6662fd","Nickname":"autoqa.krd0xj","Name":"autoqa.krd0xj#marketingg2.com","Email":"autoqa.krd0xj#marketingg2.com","Email Verified":false,"Connection":"Username-Password-Authentication","Created At":"2017-09-16T23:43:27.002Z","Updated At":"2017-09-16T23:43:27.490Z"},
{"Id":"auth0|18142559","Nickname":"moharvey","Name":"moharvey#ymail.com","Email":"moharvey#ymail.com","Connection":"Username-Password-Authentication","Created At":"2017-08-31T18:55:02.688Z","Updated At":"2017-08-31T19:01:36.994Z"}
I tried with this code:
import json
import gzip

with gzip.GzipFile("file.gz", 'r') as fin:
    json_bytes = fin.read()

json_str = json_bytes.decode('utf-8')
data = json.loads(json_str)
print(data)
But the code above cannot read this file.
How can I read all the data in this file? Could I have a suggestion?
I just found the solution. I intended to delete this question, but I didn't find another question like it. The file is not a single JSON document; it contains one JSON object per line, so each line has to be parsed separately:
table = []
with gzip.GzipFile("file.gz", 'r') as fin:
    for line in fin:
        table.append(json.loads(line))

for row in table:
    print(row)
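An equivalent sketch using gzip.open in text mode, which decompresses and decodes in one step (same file name as above; the blank-line check is just a precaution):
import gzip
import json

table = []
with gzip.open("file.gz", 'rt', encoding='utf-8') as fin:
    for line in fin:
        line = line.strip()
        if line:  # skip any blank lines
            table.append(json.loads(line))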
I am trying to read a JSON file from Amazon S3 and its file size is about 2GB. When I use the method .read(), it gives me MemoryError.
Are there any solutions to this problem? Any help would do, thank you so much!
So, I found a way that worked for me efficiently. I had a 1.60 GB file and needed to load it for processing.
import io
import json
import boto3

# .Object() belongs to the resource API, so use boto3.resource rather than boto3.client here.
s3 = boto3.resource('s3', aws_access_key_id=<aws_access_key_id>, aws_secret_access_key=<aws_secret_access_key>)

# Collect the data as a bytes object.
data_in_bytes = s3.Object(bucket_name, filename).get()['Body'].read()
# Decode it as 'utf-8'.
decoded_data = data_in_bytes.decode('utf-8')
# Use the io module to create a StringIO object.
stringio_data = io.StringIO(decoded_data)
# Read the StringIO object line by line.
data = stringio_data.readlines()
# Now it's time to use the json module.
json_data = list(map(json.loads, data))
So json_data is the content of the file. I know there are lots of variable manipulations, but it worked for me.
Just iterate over the object.
s3 = boto3.client('s3', aws_access_key_id=<aws_access_key_id>, aws_secret_access_key=<aws_secret_access_key>)
fileObj = s3.get_object(Bucket='bucket_name', Key='key')
for row in fileObj["Body"]:
    line = row.decode('utf-8')
    print(json.loads(line))
I just solved the problem. Here's the code. Hope it helps for future use!
s3 = boto3.client('s3', aws_access_key_id=<aws_access_key_id>, aws_secret_access_key=<aws_secret_access_key>)
obj = s3.get_object(Bucket='bucket_name', Key='key')
data = (line.decode('utf-8') for line in obj['Body'].iter_lines())
for row in data:
    print(json.loads(row))
import json
import boto3

def lambda_handler(event, context):
    s3 = boto3.resource('s3')

    # Print every bucket in the account.
    for bucket in s3.buckets.all():
        print(bucket.name)

    # json_data = s3.Object("vkhan-s3-bucket, "config/sandbox/config.json").get()['Body'].read()
    json_data = json.loads(s3.Object("vkhan-s3-bucket", "config/sandbox/config.json").get()['Body'].read().decode())
    print(json_data)

    return {
        'statusCode': 200,
        'body': json.dumps(json_data)
    }