Get information out of large JSON file - json

I am new to JSON file and i'm strugeling to get any information out of it.
The structure of the JSON file is as following:
Json file Structure
Now what I need is to access the "batches", to get the data from each variable.
I did try codes (shown below) i've found to reach deeper keys but somehow i still didnt get any results.
1.
def safeget(dct, *keys):
for key in keys:
try:
dct = dct[key]
except KeyError:
return None
return dct
safeget(mydata,"batches")
def dict_depth(mydata):
if isinstance(mydata, dict):
return 1 + (max(map(dict_depth, mydata.values()))
if mydata else 0)
return 0
print(dict_depth(mydata))
The final goal then would be to create a loop to extract all the information but thats something for the future.
Any help is highly appreciated, also any recommendations how i should ask things here in the future to get the best answers!

As far as I understood, you simply want to extract all the data without any ordering?
Then this should work out:
# Python program to read
# json file
import json
# Opening JSON file
f = open('data.json',)
# returns JSON object as
# a dictionary
data = json.load(f)
# Iterating through the json
# list
for i in data['emp_details']:
print(i)
# Closing file
f.close()

Related

Django FileField saving empty file to database

I have a view that should generate a temporary JSON file and save this TempFile to the database. The content to this file, a dictionary named assets, is created using DRF using serializers. This file should be written to the database in a model called CollectionSnapshot.
class CollectionSnapshotCreate(generics.CreateAPIView):
permission_classes = [MemberPermission, ]
def create(self, request, *args, **kwargs):
collection = get_collection(request.data['collection_id'])
items = Item.objects.filter(collection=collection)
assets = {
"collection": CollectionSerializer(collection, many=False).data,
"items": ItemSerializer(items, many=True).data,
}
fp = tempfile.TemporaryFile(mode="w+")
json.dump(assets, fp)
fp.flush()
CollectionSnapshot.objects.create(
final=False,
created_by=request.user,
collection_id=collection.id,
file=ContentFile(fp.read(), name="assets.json")
)
fp.close()
return JsonResponse({}, status=200)
Printing assets returns the dictionary correctly. So I am getting the dictionary normally.
Following the solution below I do get the file saved to the db, but without any content:
copy file from one model to another
Seems that json.dump(assets, fp) is failing silently, or I am missing something to actually save the content to the temp file prior to sending it to the database.
The question is: why is the files in the db empty?
I found out that fp.read() throws content based on the current pointer in the file. At least, that is my understanding. So after I dump assets dict as json the to temp file, I have to bring back the cursor to the beggining using fp.seek(0). This way, when I call fp.read() inside file=ContentFile(fp.read(), ...) it actually reads all the content. It was giving me empty because there was nothing to read since the cursor was at the end of the file.
fp = tempfile.TemporaryFile(mode="w+")
json.dump(assets, fp)
fp.flush()
fp.seek(0) // important
CollectionSnapshot.objects.create // stays the same

Micropython: bytearray in json-file

i'm using micropython in the newest version. I also us an DS18b20 temperature sensor. An adress of theses sensor e.g. is "b'(b\xe5V\xb5\x01<:'". These is the string representation of an an bytearray. If i use this to save the adress in a json file, i run in some problems:
If i store directly "b'(b\xe5V\xb5\x01<:'" after reading the json-file there are no single backslahes, and i get b'(bxe5Vxb5x01<:' inside python
If i escape the backslashes like "b'(b\xe5V\xb5\x01<:'" i get double backslashes in python: b'(b\xe5V\xb5\x01<:'
How do i get an single backslash?
Thank you
You can't save bytes in JSON with micropython. As far as JSON is concerned that's just some string. Even if you got it to give you what you think you want (ie. single backslashes) it still wouldn't be bytes. So, you are faced with making some form of conversion, no-matter-what.
One idea is that you could convert it to an int, and then convert it back when you open it. Below is a simple example. Of course you don't have to have a class and staticmethods to do this. It just seemed like a good way to wrap it all into one, and not even need an instance of it hanging around. You can dump the entire class in some other file, import it in the necessary file, and just call it's methods as you need them.
import math, ujson, utime
class JSON(object):
#staticmethod
def convert(data:dict, convert_keys=None) -> dict:
if isinstance(convert_keys, (tuple, list)):
for key in convert_keys:
if isinstance(data[key], (bytes, bytearray)):
data[key] = int.from_bytes(data[key], 'big')
elif isinstance(data[key], int):
data[key] = data[key].to_bytes(1 if not data[key]else int(math.log(data[key], 256)) + 1, 'big')
return data
#staticmethod
def save(filename:str, data:dict, convert_keys=None) -> None:
#dump doesn't seem to like working directly with open
with open(filename, 'w') as doc:
ujson.dump(JSON.convert(data, convert_keys), doc)
#staticmethod
def open(filename:str, convert_keys=None) -> dict:
return JSON.convert(ujson.load(open(filename, 'r')), convert_keys)
#example with both styles of bytes for the sake of being thorough
json_data = dict(address=bytearray(b'\xFF\xEE\xDD\xCC'), data=b'\x00\x01\02\x03', date=utime.mktime(utime.localtime()))
keys = ['address', 'data'] #list of keys to convert to int/bytes
JSON.save('test.json', json_data, keys)
json_data = JSON.open('test.json', keys)
print(json_data) #{'date': 1621035727, 'data': b'\x00\x01\x02\x03', 'address': b'\xff\xee\xdd\xcc'}
You may also want to note that with this method you never actually touch any JSON. You put in a dict, you get out a dict. All the JSON is managed "behind the scenes". Regardless of all of this, I would say using struct would be a better option. You said JSON though so, my answer is about JSON.

Combining and loading Json content as python dictionary

Ihave 100 json files (file1- file 100)in my directory.All these 100 have the same fields and my aim is to load allcontents in one dictionary or dataframe.Basically the content of each file (ie file1- file100) willbe a row for my dictionary or dataframe
To test the code first,I wrote a script to load contents from one json file
file2 = open(r"\Users\sbz\file1.txt","w+")
import json
import traceback
def read_json_file(file2):
with open(file2, "r") as f:
try:
return json.load(f)
for combining i wrote this
def combine_dictionaries(dictionary_list):
my_dictionary = {}
for key in dictionary_list:
my_dictionary.update(key)
return my_dictionary
I am unable to load the file or display contents of dictionary using print(file2)
Is there something I am missing? Or is there better wayto loop in all 100 files and load them as a single dictionary?
If json.load isn't working, my guess is that your JSON file is probably formatted incorrectly. Try getting it to work with a simple file like:
{
"test": 0
}
After that works, then try loading one of your 100 files. I copy-pasted your read_json_file function and I'm able to see the data in my file: print(read_json_file("data.json"))
For looping through the files and combining them:
It doesn't look like your combine_dictionaries function is 100% there yet for what you want to do. update doesn't merge the dictionaries into rows as you want; it will replace the keys of one dictionary with the keys of another, and since each file has the same fields the resulting dictionary will be the last one in the list. Technically, a list of dictionaries is already a list of rows which is what you want and you can index the list based on row number, for example, list_of_dictionaries[0] will get the dictionary created from file1 if you fill the list in order of file1 to file100. If you want to go further than file numbers, you can put all of these dictionaries into another dictionary if you can generate a unique key for each dictionary:
def combine_dictionaries(dictionary_list):
my_dictionary = {}
for dictionary in dictionary_list:
my_dictionary[generate_key(dictionary)] = dictionary
return my_dictionary
Where generate_key is a function that will return a key unique to that dictionary. Now combined_dictionary.get(0) will get file1's dictionary, and combined_dictionary.get(0).get("somefield") will get the "somefield" data from file1.

how to write r.headers from different urls into one json?

I would like to crawl several urls, while using the requests library in python. I am scrutinizing the GET requests as well as the response headers. However, when crawling and getting the data from different urls I am facing the problem, that I don't know all 'key:values', which are coming in. Thus writing those data to a valid csv file is not really possible, in my point of view. Therefore I want to write the data into a json file.
The problem is similar to the following thread from 2014, but not the same:
Get a header with Python and convert in JSON (requests - urllib2 - json)
import requests, json
urls = ['http://www.example.com/', 'http://github.com']
with open('test.json', 'w') as f:
for url in urls:
r = requests.get(url)
rh = r.headers
f.write(json.dumps(dict(rh), sort_keys=True, separators=(',', ':'), indent=4))
I expect a json file, with the headers for each URL. I get a Json file with those data, but my IDE (PyCHarm) is showing an Error, which states out that
JSON standard allows only one top-level value. I have read the documentation:https://docs.python.org/3/library/json.html#repeated-names-within-an-object; but did not get it. Any hint would be appreciated.
EDIT: The only thing which is missing in the outcome is another comma. But where do I enter it and what command do I need for this?
You need to add it to an array and then finally do the json dump to a file. This will work.
urls = ['http://www.example.com/', 'http://github.com']
headers = []
for url in urls:
r = requests.get(url)
header_dict = dict(r.headers)
header_dict['source_url'] = url
headers.append(header_dict)
with open('test.json', 'w', encoding='utf-8') as f:
json.dump(headers, f, sort_keys=True, separators=(',', ':'), indent=4)
You still can write it to a csv:
import pandas as pd
df = pd.DataFrame(headers)
df.to_csv('test.csv')

Parse Twitter JSON Content in Ptython3

I searched for all similar questions and yet couldn't resolve below issue.
Here's my json file content:
https://objectsoftconsultants.s3.amazonaws.com/tweets.json
Code to get a particular element is as below:
import json
testsite_array = []
with open('tweets.json') as json_file:
testsite_array = json_file.readlines()
for text in testsite_array:
json_text = json.dumps(text)
resp = json.loads(json_text)
print(resp["created_at"])
Keep getting below error:
print(resp["created_at"])
TypeError: string indices must be integers
Thanks much for your time and help, well in advance.
I have to guess what you're trying to do and can only hope that this will help you:
with open('tweets.json') as f:
tweets = json.load(f)
print(tweets['created_at'])
It doesn't make sense to read a json file with readlines, because it is unlikely that each line of the file represents a complete json document.
Also I don't get why you're dumping the string only to load it again immediately.
Update:
Try this to parse your file line by line:
with open('tweets.json') as f:
lines = f.readlines()
for line in lines:
try:
tweet = json.loads(line)
print(tweet['created_at'])
except json.decoder.JSONDecodeError:
print('Error')
I want to point out however, that I do not recommend this approach. A file should contain only one json document. If the file does not contain a valid json document, the source for the file should be fixed.