Grabbing JSON with Python - json

I have some JSON I am trying to grab and loop through; my initial code is below, accessing the following JSON: https://textdoc.co/lJAxGYXgcZ9V4NUB
from urllib.request import urlopen
import json

url = "<URL REMOVED>"
response = urlopen(url)
data_json = json.loads(response.read())

# loop and grab data for G
for i in data_json[0]["data"]:
    print(f'Date: {i[0]} , Value : {i[1]}')
# loop and grab data for I
for i in data_json[1]["data"]:
    print(f'Date: {i[0]} , Value : {i[1]}')
# loop and grab data for E
for i in data_json[2]["data"]:
    print(f'Date: {i[0]} , Value : {i[1]}')
# loop and grab data for C
for i in data_json[3]["data"]:
    print(f'Date: {i[0]} , Value : {i[1]}')
# loop and grab data for SC
for i in data_json[4]["data"]:
    print(f'Date: {i[0]} , Value : {i[1]}')
The issue I have is that I'm getting a TypeError on the for loop saying "list indices must be integers or slices, not str".
I'm a little confused by this, as I thought json.loads would deserialise it into an object that I could iterate over?

I can see the issue now: I was treating the 0 or 1 in the JSON as strings, when in fact I shouldn't have been. I removed the double quotes and it now works for the first line.
As I have answered this myself, I will update the question to reflect the updated problem.
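
For anyone hitting the same error: a list can only be indexed with integers, while a dict accepts string keys. A minimal sketch with made-up data (not the actual feed) showing where the TypeError comes from:

import json

# A top-level JSON array deserialises to a Python list: index with integers.
arr = json.loads('[{"data": [["2021-01-01", 5]]}]')
print(arr[0]["data"][0])   # fine: arr[0] is a dict, arr[0]["data"] is a list

# Indexing a *list* with a string is what raises the error in the question:
# arr[0]["data"]["date"]  ->  TypeError: list indices must be integers or slices, not str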

I need to parse through a dictionary and update its value

I have a problem statement in which I have to read a JSON file. The JSON file, when converted to a dictionary using json.loads(), has 12 keys.
One of the keys ('body') has a value of type string. Converting this string again with json.loads() results in a list of dictionaries. The length of this list of dictionaries is 1000, while each dictionary within it has a length of 24.
I need to increase the number of dictionaries so that my list of dictionaries has a new length of 2000. Each dictionary within has a key ('id') that needs to be unique.
Now, this is my code snippet where I'm trying to update the dictionary's value when the key is 'id':
val = 1
for each_dict in list_of_dictionary:
    for k, v in each_dict.items():
        if k == 'id':
            v = val
            print("value is ", v)
            val = val + 1
O/P
value is 1
value is 2
and so on...
Now, when I am trying to view the updated value again, I can see the previous values only.
This is the code snippet:
for each_dict in list_of_dictionary:
    for k, v in each_dict.items():
        if k == 'id':
            print("value is ", v)
O/P
value is 11123
value is 11128
and so on...
Whereas I want the output as above since I have updated the values already.
Got the answer. In the first for-in loop I had forgotten to update the dictionary, which is why I couldn't see the updated data in the second loop. The updated code for the first loop would be:
val = 1
for each_dict in list_of_dictionary:
    for k, v in each_dict.items():
        if k == 'id':
            temp = {k: val}  # note: {k, val} would build a set, not a dict
            each_dict.update(temp)
            val = val + 1
Now I'm able to see the updated data in the second loop.
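
A note for later readers: since the key is already known, the inner loop and the temporary dict aren't needed, and the doubling requirement from the question can be handled with deep copies. A minimal sketch, using stand-in data for the parsed 'body' list:

import copy

# Demo stand-in for the parsed 'body' list from the question.
list_of_dictionary = [{'id': 11123, 'x': 'a'}, {'id': 11128, 'x': 'b'}]

# Renumber the existing dicts directly; no inner key loop needed.
for val, each_dict in enumerate(list_of_dictionary, start=1):
    each_dict['id'] = val

# Double the list, giving every copy a fresh, unique id.
next_id = len(list_of_dictionary) + 1
for original in list_of_dictionary[:]:
    duplicate = copy.deepcopy(original)
    duplicate['id'] = next_id
    list_of_dictionary.append(duplicate)
    next_id += 1

print(list_of_dictionary)  # ids 1..4, length doubled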

How to check for specific field values based on some condition while converting csv file to json format

Below is the code to convert a csv file to json format in Python.
I have two fields, 'recommendation' and 'rating'. Based on the recommendation value I need to set the value of the rating field (e.g. if recommendation is 1, then rating = 1, and so on). With the answer I got, I'm getting output for only one record entry instead of all the records; I think it's being overwritten. Do I need to create a separate list and append each record entry to it to get the output for all records?
Here's the updated code:
import csv
import json
from collections import OrderedDict

def main(input_file, json_file):
    csv_rows = []
    with open(input_file, 'r') as csvfile:
        reader = csv.DictReader(csvfile, delimiter='|')
        title = reader.fieldnames
        for row in reader:
            entry = OrderedDict()
            for field in title:
                entry[field] = row[field]
            [c.update({'RATING': c['RECOMMENDATIONS']}) for c in reader]
            csv_rows.append(entry)
    with open(json_file, 'w') as f:
        json.dump(csv_rows, f, sort_keys=True, indent=4, ensure_ascii=False)
        f.write('\n')
I want to create the nested format like the below:
"rating": {
"user_rating": {
"rating": 1
},
"recommended": {
"rating": 1
}
After you've read the file in using csv.DictReader, you'll have rows as dicts. Since you want to set the values now, it's simple dict manipulation. There are several ways, one of which is:
[c.update({'rating': c['recommendation']}) for c in read_csvDictReader]
Hope that helps.
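
On the follow-up about only one record coming through: the comprehension iterates the reader a second time after the row loop has already consumed it, so it updates nothing useful. A sketch of one way to set the rating per row and build the nested structure from the question (the field names RECOMMENDATIONS/RATING are assumptions taken from the snippet):

import csv
import json

def convert(input_file, json_file):
    csv_rows = []
    with open(input_file, 'r') as csvfile:
        reader = csv.DictReader(csvfile, delimiter='|')
        for row in reader:
            entry = dict(row)
            # Set the rating from the recommendation for THIS row,
            # instead of re-iterating the already-exhausted reader.
            rating = entry.get('RECOMMENDATIONS')
            entry['RATING'] = rating
            # Nested shape from the question.
            entry['rating'] = {
                'user_rating': {'rating': rating},
                'recommended': {'rating': rating},
            }
            csv_rows.append(entry)
    with open(json_file, 'w') as f:
        json.dump(csv_rows, f, sort_keys=True, indent=4, ensure_ascii=False)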

Compare JSON List with JSON Dict; only by Keys

I have a list and a dict which need to be compared just by their keys.
The list is created by hand to define which vars will be used in the following process; it will also serve as the header row when writing the result to a CSV.
Some devices don't support all vars and won't send them back in the response.
base=["General.IpAddress", "General.ActualHostname", "General.UserLabel1", "General.UserLabel2"]
response_diff='{"general.actualhostname":"ST_38_217","general.ipaddress":"192.168.38.217"}'
As you see, General.UserLabel1 and General.UserLabel2 are missing in the response (more vars can be missing).
So I have to add the missing vars to the response with a NULL value.
import json
from pprint import pprint

def compare_ListWithDict(list_base, dict_ref):
    # temp dict
    dict_base_tmp = {}
    dict_ref = dict_ref.lower()
    # run through the list and generate a dict with value 0 for every key
    for item in list_base:
        dict_base_tmp[item.lower()] = 0
    # load dict_ref as JSON
    dict_ref_json = json.loads(dict_ref)
    # get lengths
    dict_base_len = len(dict_base_tmp)
    dict_ref_len = len(dict_ref_json)
    # if the lengths are equal, return dict_ref (response from device)
    if dict_base_len == dict_ref_len:
        return dict_ref_json
    else:
        # run through list_base and search for keys that AREN'T in dict_ref_json;
        # if a missing key is found, add it with the value 'null' to dict_ref_json
        for item in list_base:
            if not item.lower() in dict_ref_json.keys():
                item_lower = item.lower()
                dict_ref_json[item_lower] = 'null'
        return dict_ref_json
base=["General.IpAddress", "General.ActualHostname", "General.UserLabel1", "General.UserLabel2"]
response_diff='{"general.actualhostname":"ST_38_217","general.ipaddress":"192.168.38.217"}'
response_equal='{"general.actualhostname":"ST_38_217","general.ipaddress":"192.168.38.217","general.userlabel1":"First Label", "general.userlabel2":"Second Label"}'
Results:
pprint(compare_ListWithDict(base,response_equal))
#base and response are equal by the keys
{'general.actualhostname': 'st_38_217',
'general.ipaddress': '192.168.38.217',
'general.userlabel1': 'first label',
'general.userlabel2': 'second label'}
pprint(compare_ListWithDict(base,response_diff))
#base and response differ by the keys
{'general.actualhostname': 'st_38_217',
'general.ipaddress': '192.168.38.217',
'general.userlabel1': 'null',
'general.userlabel2': 'null'}
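
A side note: the whole function can be condensed to a dict comprehension over the expected keys. Using Python's None instead of the string 'null' also serialises to JSON null. A minimal sketch under those assumptions (lowercasing the whole response string, as the original does, which is why values come back lowercased too):

import json

def fill_missing_keys(list_base, response):
    """Return the parsed response with every expected key present."""
    parsed = json.loads(response.lower())
    # Keep the device's value where present, otherwise None (JSON null).
    return {key.lower(): parsed.get(key.lower()) for key in list_base}

base = ["General.IpAddress", "General.ActualHostname",
        "General.UserLabel1", "General.UserLabel2"]
response_diff = '{"general.actualhostname":"ST_38_217","general.ipaddress":"192.168.38.217"}'
print(fill_missing_keys(base, response_diff))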

Convert Multiple JSON Objects to JSON Array

I have generated a JSON file from a data source which is of the format
{}{}{}
I wish to convert this format to a comma-separated JSON array, i.e. [{},{},{}].
The end goal is to push the JSON data [{},{},{}] to MongoDB.
My Python solution (although naive) looks something like this:
def CreateJSONArrayFile(filename):
    print('Opening file with JSON data')
    with open(filename) as data_file:
        raw_data = data_file.read()
    tweaked_data = raw_data.replace('}{', '}^|{')
    split_data = tweaked_data.split('^|')
    outfile = open('split_data.json', 'w')
    outfile.write('[')
    for item in split_data:
        outfile.write("%s," % item)
    outfile.write(']')
    print('split_data.json Created with JSON Array')
The above code is giving me wrong results.
Can you please help me optimize the solution? Please let me know if you need more details from my end.
I'm with davedwards on this one, but if that's not an option, I think this gets you what you are after.
myJson = """{"This": "is", "a": "test"} {"Of": "The", "Emergency":"Broadcast"}"""
myJson = myJson.replace("} {", "}###{")
new_list = myJson.split('###')
print(new_list)
yields:
['{"This": "is", "a": "test"}', '{"Of": "The", "Emergency":"Broadcast"}']
Not saying it is the most elegant way : )
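
A more robust alternative, since splitting on '}{' breaks as soon as that sequence appears inside a string value: the standard library's json.JSONDecoder.raw_decode can consume one object at a time from concatenated JSON. A sketch along those lines, which also sidesteps the trailing comma the original loop writes before ']':

import json

def concatenated_json_to_array(in_path, out_path):
    decoder = json.JSONDecoder()
    objects = []
    with open(in_path) as f:
        text = f.read()
    idx = 0
    while idx < len(text):
        # Skip whitespace between objects, then decode the next one.
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        obj, idx = decoder.raw_decode(text, idx)
        objects.append(obj)
    with open(out_path, 'w') as f:
        json.dump(objects, f)  # writes a valid [{},{},{}] array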

How to use ijson/other to parse this large JSON file?

I have this massive JSON file (8 GB), and I run out of memory when trying to read it into Python. How would I implement a similar procedure using ijson or some other library that is more efficient with large JSON files?
import pandas as pd

# There are (say) 1m objects - each is its own json object - within this file.
with open('my_file.json') as json_file:
    data = json_file.readlines()

# So I take a list of these json objects
list_of_objs = [obj for obj in data]

# But I only want about 200 of the json objects
desired_data = [obj for obj in list_of_objs if obj['feature'] == "desired_feature"]
How would I implement this using ijson or something similar? Is there a way I can extract the objects I want without reading in the whole JSON file?
The file is a list of objects like:
{
    "review_id": "zdSx_SD6obEhz9VrW9uAWA",
    "user_id": "Ha3iJu77CxlrFm-vQRs_8g",
    "business_id": "tnhfDv5Il8EaGSXZGiuQGg",
    "stars": 4,
    "date": "2016-03-09",
    "text": "Great place to hang out after work: the prices are decent, and the ambience is fun. It's a bit loud, but very lively. The staff is friendly, and the food is good. They have a good selection of drinks.",
    "useful": 0,
    "funny": 0
}
The file is a list of objects
This is a little ambiguous. Looking at your code snippet, it looks like your file contains a separate JSON object on each line, which is not the same as an actual JSON array that starts with [, ends with ] and has , between items.
In the case of a json-per-line file it's as easy as:
import json
from itertools import islice

with open(filename) as f:
    objects = (json.loads(line) for line in f)
    objects = islice(objects, 200)
Note the differences:
you don't need .readlines(), the file object itself is an iterable that yields individual lines
parentheses (..) instead of brackets [..] in (... for line in f) create a lazy generator expression instead of a Python list in memory with all the lines
islice(objects, 200) will give you the first 200 items without iterating further. If objects were a list you could just do objects[:200]
Now, if your file is actually a JSON array then you indeed need ijson:
import ijson  # or choose a faster backend if needed
from itertools import islice

with open(filename) as f:
    objects = ijson.items(f, 'item')
    objects = islice(objects, 200)
ijson.items returns a lazy iterator over a parsed array. The 'item' in the second parameter means "each item in a top-level array".
The problem is that not all JSON comes nicely formatted and you cannot rely on line-by-line parsing to extract your objects.
I understood your "acceptance criteria" as "want to collect only those JSON objects whose specified keys contain specified values". For example, only collecting objects about a person if that person's name is "Bob". The following function will provide a list of all objects that fit your criteria. Parsing is done character by character (something that would be much more efficient in C, but Python is still pretty good). This should be more robust because it doesn't care about newlines, formatting etc. I tested this on both formatted and unformatted JSON with 1,000,000 objects.
import json

def parse_out_objects(file, feature, desired_value):
    with open(file) as f:
        compose_object_flag = False
        ignore_characters_flag = False
        object_string = ''
        selected_objects = []
        json_object = None
        while True:
            c = f.read(1)
            if c == '"':
                ignore_characters_flag = not ignore_characters_flag
            if c == '{' and ignore_characters_flag == False:
                compose_object_flag = True
            if c == '}' and compose_object_flag == True and ignore_characters_flag == False:
                compose_object_flag = False
                object_string = object_string + '}'
                json_object = json.loads(object_string)
                if json_object[feature] == desired_value:
                    selected_objects.append(json_object)
                object_string = ''
            if compose_object_flag == True:
                object_string = object_string + c
            if not c:
                break
    return selected_objects
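
For completeness, a hypothetical call against the review file shown above, with the field name and value taken from the sample object in the question:

# Collect only the reviews with exactly 4 stars.
matching = parse_out_objects('my_file.json', 'stars', 4)
print(len(matching))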