Assigning Variables from JSON in Python

I've searched across dozens of answers for the last week but I haven't been able to find an example of what I'm trying to do, happy to be pointed to something that I've missed, and I'm new to Python so I apologise if this is something trivial.
I'm trying to read in a configuration from a JSON file so that I can abstract the configuration from the script itself.
I want to be able to assign the configuration value to a variable and perform an action on it, before moving on to the next category in a nested list, of which the categories could change/expand over time (music, pictures, etc).
The JSON file (library.json) currently looks like this:
{"media":{
"tv": [{
"source": "/tmp/tv",
"dest": "/tmp/dest"
}],
"movies": [{
"source": "/tmp/movies",
"dest": "/tmp/dest"
}]
}}
The relevant script looks like this:
import json
with open(libfile) as data_file:
    data = json.load(data_file)
for k, v in (data['media']['tv']):
    print (k, v)
What I was hoping to see as output was:
dest /tmp/dest
source /tmp/tv
What I am seeing is:
dest source
It feels like I'm missing something simple.

This works,
import json
with open('data.json') as json_file:
    data = json.load(json_file)
    for p in data['media']['tv']:
        dst = p['dest']
        src = p['source']
        print(src, dst)

Something like this? Using f-strings and zip() that will aggregate elements.
import json
with open("dummy.json") as data_file:
    data = json.load(data_file)
for i, j in data["media"].items():
    print(i)
    print("\n".join(f'{str(k)} {str(l)}' for k,l in list(zip(j[0].keys(), j[0].values()))))
    print("\n")
Output:
tv
source /tmp/tv
dest /tmp/dest
movies
source /tmp/movies
dest /tmp/dest

The problem here is that data['media']['tv'] is actually a list of dictionaries.
You can tell because it looks like this: "movies": [{.. (Note the bracket [)
That means that instead of this:
for k, v in (data['media']['tv']):
    print (k, v)
You should be doing this:
for dct in (data['media']['tv']):
    for k, v in dct.items():
        print(k, v)
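To cover the original goal (assign each value to a variable, act on it, then move on to the next category), both levels can be looped over together. This is only a minimal sketch, assuming the library.json layout from the question; the inline string stands in for the real file, and process() is a hypothetical placeholder for whatever action is needed:

```python
import json

# Inline stand-in for library.json from the question.
LIBRARY_JSON = """
{"media": {
    "tv":     [{"source": "/tmp/tv",     "dest": "/tmp/dest"}],
    "movies": [{"source": "/tmp/movies", "dest": "/tmp/dest"}]
}}
"""

def process(category, source, dest):
    # Hypothetical action; replace with the real work (copy, sync, ...).
    return f"{category}: {source} -> {dest}"

def run(config_text):
    data = json.loads(config_text)
    results = []
    # Outer loop: category name -> list of dicts.
    # Adding "music" or "pictures" to the file needs no code change.
    for category, entries in data["media"].items():
        # Inner loop: each entry is one dict with "source" and "dest".
        for entry in entries:
            source = entry["source"]
            dest = entry["dest"]
            results.append(process(category, source, dest))
    return results

for line in run(LIBRARY_JSON):
    print(line)
```

Because the outer loop uses .items(), the category names never have to be hard-coded in the script.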

Related

dumping list to JSON file creates list within a list [["x", "y","z"]], why?

I want to append multiple list items to a JSON file, but it creates a list within a list, and therefore I cannot access the list from Python. Since the code is overwriting existing data in the JSON file, there should not be any list there. I also tried it with just text in the file, without brackets. It still creates a list within a list, so I get [["x", "y","z"]] instead of ["x", "y","z"].
import json

filename = 'vocabulary.json'
print("Reading %s" % filename)
try:
    with open(filename, "rt") as fp:
        data = json.load(fp)
    print("Data: %s" % data)  # check
except IOError:
    print("Could not read file, starting from scratch")
    data = []

# Add some data
TEMPORARY_LIST = []
new_word = input("give new word: ")
TEMPORARY_LIST.append(new_word.split())
print(TEMPORARY_LIST)  # check
data = TEMPORARY_LIST
print("Overwriting %s" % filename)
with open(filename, "wt") as fp:
    json.dump(data, fp)
Example output when appending a list of split words:
Reading vocabulary.json
Data: [['my', 'dads', 'house', 'is', 'nice']]
give new word: but my house is nicer
[['but', 'my', 'house', 'is', 'nicer']]
Overwriting vocabulary.json
So, if I understand what you are trying to accomplish correctly, it looks like you are trying to overwrite a list in a JSON file with a new list created from user input. For easiest data manipulation, set up your JSON file in dictionary form:
{
"words": [
"my",
"dad's",
"house",
"is",
"nice"
]
}
You should then set up functions to separate your functionality to make it more manageable:
def load_json(filename):
    with open(filename, "r") as f:
        return json.load(f)
Now, we can use those functions to load the JSON, access the words list, and overwrite it with the new word.
data = load_json("vocabulary.json")
new_word = input("Give new word: ").split()
data["words"] = new_word
write_json("vocabulary.json", data)
If the user inputs "but my house is nicer", the JSON file will look like this:
{
"words": [
"but",
"my",
"house",
"is",
"nicer"
]
}
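The snippet above calls write_json() without showing its definition. A minimal sketch of what it could look like, assuming it simply mirrors load_json(); the body of write_json() and the overwrite_words() helper are assumptions, not the answer's original code. The first lines also show where the nesting in the question comes from: append() adds the split list as a single element, while extend() (or plain assignment) keeps the list flat.

```python
import json
import os
import tempfile

# append() nests the list, extend() keeps it flat.
nested = []
nested.append("a b".split())
flat = []
flat.extend("a b".split())

def load_json(filename):
    with open(filename, "r") as f:
        return json.load(f)

def write_json(filename, data):
    # Assumed implementation: mode "w" truncates the file,
    # so the old content is fully replaced.
    with open(filename, "w") as f:
        json.dump(data, f, indent=4)

def overwrite_words(filename, sentence):
    # Hypothetical helper wrapping the load / replace / write steps.
    data = load_json(filename)
    data["words"] = sentence.split()  # flat list, no nesting
    write_json(filename, data)
    return load_json(filename)

# Demo on a temporary file so no real vocabulary.json is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocabulary.json")
    write_json(path, {"words": ["my", "dad's", "house", "is", "nice"]})
    result = overwrite_words(path, "but my house is nicer")
    print(result)
```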
Edit
Okay, I have a few suggestions to make before I get into solving the issue. Firstly, it's great that you have delegated much of the functionality of the program to separate functions. However, using global variables is generally discouraged because it makes things extremely difficult to debug: any function that uses the variable could have mutated it by accident. To fix this, use function parameters and pass the data around accordingly. With small programs like this, you can think of the main() function as the point that all data flows to and from. This means that main() will pass data to other functions and receive new or edited data back. One final recommendation: you should only use all capital letters for variable names if they are going to be constant. For example, PI = 3.14159 is a constant, so it is conventional to write "pi" in all caps.
Without using global, main() will look much cleaner:
def main():
    choice = input("Do you want to start or manage the list? (start/manage)")
    if choice == "start":
        # load_json() takes the filename as its argument
        data = load_json("vocabulary.json")
        words = data["words"]
        dictee(words)
    elif choice == "manage":
        manage_list()
You can use the load_json() function from earlier (notice that I deleted write_json(), more on that later) if the user chooses to start the game. If the user chooses to manage the file, we can write something like this:
def manage_list():
    choice = input("Do you want to add or clear the list? (add/clear)")
    if choice == "add":
        words_to_add = get_new_words()
        add_words("vocabulary.json", words_to_add)
    elif choice == "clear":
        clear_words("vocabulary.json")
We get the user input first and then we can call two other functions, add_words() and clear_words():
def add_words(filename, words):
    with open(filename, "r+") as f:
        data = json.load(f)
        data["words"].extend(words)
        f.seek(0)
        json.dump(data, f, indent=4)

def clear_words(filename):
    with open(filename, "w+") as f:
        data = {"words":[]}
        json.dump(data, f, indent=4)
I did not use the load_json() function in the two functions above, because that would open the file more times than needed, which would hurt performance. Furthermore, these two functions already need to open the file, so it is okay to load the JSON data here because it takes only one line: data = json.load(f). You may also notice that in add_words(), the file mode is "r+". This is the basic mode for reading and writing an existing file, and f.seek(0) moves back to the start of the file so the updated data is written over the old content rather than after it. "w+" is used in clear_words(), because "w+" not only opens the file for reading and writing, it also truncates the file if it exists (that is also why we don't need to load the JSON data in clear_words()). Because we have these two functions for writing and/or overwriting data, we don't need the write_json() function that I had initially suggested.
We can then add to the list like so:
>>> Do you want to start or manage the list? (start/manage)manage
>>> Do you want to add or clear the list? (add/clear)add
>>> Please enter the words you want to add, separated by spaces: these are new words
And the JSON file becomes:
{
"words": [
"but",
"my",
"house",
"is",
"nicer",
"these",
"are",
"new",
"words"
]
}
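manage_list() calls get_new_words(), which is not shown above. A rough sketch of how it might look, assuming it only prompts with the text from the session above and splits the answer on whitespace; the prompt_fn parameter is an addition of this sketch so the function can be tried without a console:

```python
def get_new_words(prompt_fn=input):
    # prompt_fn defaults to the built-in input(); passing another callable
    # is only a convenience for trying the function out (an assumption here).
    raw = prompt_fn("Please enter the words you want to add, separated by spaces: ")
    # split() with no argument ignores repeated or trailing whitespace
    # and returns a flat list of words.
    return raw.split()

# Simulated user input instead of typing at the prompt.
words = get_new_words(lambda _prompt: "these  are new words ")
print(words)
```

The returned list can be passed straight to add_words(), which uses extend() and therefore keeps the stored list flat.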
We can then clear the list like so:
>>> Do you want to start or manage the list? (start/manage)manage
>>> Do you want to add or clear the list? (add/clear)clear
And the JSON file becomes:
{
"words": []
}
Great! Now, we implemented the ability for the user to manage the list. Let's move on to creating the functionality for the game: dictee()
You mentioned that you want to randomly select an item from a list and remove it from that list so it doesn't get asked twice. There are a multitude of ways you can accomplish this. For example, you could use random.shuffle:
import random

def dictee(words):
    correct = 0
    incorrect = 0
    random.shuffle(words)
    for word in words:
        # ask word
        # evaluate response
        # increment correct/incorrect
        # ask if you want to play again
        pass
random.shuffle randomly shuffles the list in place. Then, you can iterate through the list using for word in words: and start the game. You don't necessarily need to use random.choice here because when using random.shuffle and iterating through the result, you are essentially selecting random values, and each word is asked only once.
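One thing to keep in mind is that random.shuffle reorders the caller's list. If the original order should stay intact, random.sample with k equal to the list length gives a shuffled copy instead. A small sketch; shuffled_copy() is a name made up for this example:

```python
import random

def shuffled_copy(words):
    # random.sample with k == len(words) returns a new list in random order
    # and leaves the input untouched; random.shuffle would reorder it in place.
    return random.sample(words, len(words))

original = ["but", "my", "house", "is", "nicer"]
order = shuffled_copy(original)
print(order)      # same words, random order
print(original)   # unchanged
```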
I hope this helped illustrate how powerful functions and function parameters are. They not only help you separate your code, but also make it easier to manage, understand, and write cleaner code.

Convert Multiple JSON Objects to JSON Array

I have generated a JSON file from data source which is of the format.
{}{}{}
I wish to convert this format to comma separated JSON Array as. [{},{},{}].
End goal is to push the JSON data [{},{},{}] to MongoDB.
My Python solution (although naive) looks something like this:
def CreateJSONArrayFile(filename):
    print('Opening file with JSON data')
    with open(filename) as data_file:
        raw_data = data_file.read()
        tweaked_data = raw_data.replace('}{', '}^|{')
        split_data = tweaked_data.split('^|')
        outfile = open('split_data.json', 'w')
        outfile.write('[')
        for item in split_data:
            outfile.write("%s," % item)
        outfile.write(']')
        print('split_data.json Created with JSON Array')
The above code is giving me wrong results.
Can you please help me optimize the solution? Please let me know if you need more details from my end.
I'm with davedwards on this one, but if not an option -- I think this gets you what you are after.
myJson = """{"This": "is", "a": "test"} {"Of": "The", "Emergency":"Broadcast"}"""
myJson = myJson.replace("} {", "}###{")
new_list = myJson.split('###')
print(new_list)
yields:
['{"This": "is", "a": "test"}', '{"Of": "The", "Emergency":"Broadcast"}']
Not saying it is the most elegant way : )
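String replacement breaks as soon as a value itself contains "}{" or the objects are separated differently. An alternative that may be more robust is to let the standard library's json.JSONDecoder.raw_decode parse one object at a time and report where it stopped. A sketch under the assumption that the whole text fits in memory; iter_concatenated_json() and the sample string are made up for illustration:

```python
import json

def iter_concatenated_json(text):
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # Skip whitespace between objects; raw_decode expects the
        # value to start exactly at the given index.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the parsed value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# The second object contains "}{" inside a string on purpose.
raw = '{"a": 1}{"b": "x}{y"} {"c": [1, 2]}'
array = list(iter_concatenated_json(raw))
print(json.dumps(array))
```

The resulting Python list can be written out with json.dump() as a proper JSON array, or its items passed on to the database client directly.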

How to use ijson/other to parse this large JSON file?

I have this massive json file (8gb), and I run out of memory when trying to read it in to Python. How would I implement a similar procedure using ijson or some other library that is more efficient with large json files?
import json
import pandas as pd

# There are (say) 1m objects - each is its own json object - within this file.
with open('my_file.json') as json_file:
    data = json_file.readlines()
# So I take a list of these json objects
list_of_objs = [json.loads(obj) for obj in data]
# But I only want about 200 of the json objects
desired_data = [obj for obj in list_of_objs if obj['feature'] == "desired_feature"]
How would I implement this using ijson or something similar? Is there a way I can extract the objects I want without reading in the whole JSON file?
The file is a list of objects like:
{
"review_id": "zdSx_SD6obEhz9VrW9uAWA",
"user_id": "Ha3iJu77CxlrFm-vQRs_8g",
"business_id": "tnhfDv5Il8EaGSXZGiuQGg",
"stars": 4,
"date": "2016-03-09",
"text": "Great place to hang out after work: the prices are decent, and the ambience is fun. It's a bit loud, but very lively. The staff is friendly, and the food is good. They have a good selection of drinks.",
"useful": 0,
"funny": 0,
}
"The file is a list of objects"
This is a little ambiguous. Looking at your code snippet, it looks like your file contains a separate JSON object on each line, which is not the same as an actual JSON array that starts with [, ends with ] and has , between items.
In the case of a json-per-line file it's as easy as:
import json
from itertools import islice

with open(filename) as f:
    objects = (json.loads(line) for line in f)
    objects = islice(objects, 200)
    # iterate over objects here, while the file is still open
Note the differences:
you don't need .readlines(), the file object itself is an iterable that yields individual lines
parentheses (..) instead of brackets [..] in (... for line in f) create a lazy generator expression instead of a Python list in memory with all the lines
islice(objects, 200) will give you the first 200 items without iterating further. If objects were a list, you could just do objects[:200]
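The question asks for the objects where a field has a particular value rather than simply the first 200. The same lazy approach extends to that. A sketch assuming one JSON object per line; matching_objects() and the sample data are made up for illustration, and io.StringIO stands in for the real file:

```python
import io
import json
from itertools import islice

def matching_objects(lines, feature, desired_value, limit=None):
    # Parse lazily, one line at a time; blank lines are skipped.
    parsed = (json.loads(line) for line in lines if line.strip())
    # Keep only objects whose field matches; .get avoids a KeyError
    # when an object lacks the field.
    wanted = (obj for obj in parsed if obj.get(feature) == desired_value)
    # limit=None means "no limit"; otherwise stop after that many matches.
    return islice(wanted, limit)

sample = io.StringIO(
    '{"review_id": "a", "stars": 4}\n'
    '{"review_id": "b", "stars": 2}\n'
    '{"review_id": "c", "stars": 4}\n'
)
found = list(matching_objects(sample, "stars", 4, limit=200))
print(found)
```

Only the matching objects are ever held in memory at once, so the size of the file matters much less than the number of matches.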
Now, if your file is actually a JSON array then you indeed need ijson:
import ijson # or choose a faster backend if needed
from itertools import islice

with open(filename) as f:
    objects = ijson.items(f, 'item')
    objects = islice(objects, 200)
    # iterate over objects here, while the file is still open
ijson.items returns a lazy iterator over a parsed array. The 'item' in the second parameter means "each item in a top-level array".
The problem is that not all JSON comes nicely formatted and you cannot rely on line-by-line parsing to extract your objects.
I understood your "acceptance criteria" as "want to collect only those JSON objects whose specified keys contain specified values". For example, only collecting objects about a person if that person's name is "Bob". The following function will provide a list of all objects that fit your criteria. Parsing is done character by character (something that would be much more efficient in C, but Python is still pretty good). This should be more robust because it doesn't care about newlines, formatting etc. I tested this on both formatted and unformatted JSON with 1,000,000 objects.
import json

def parse_out_objects(file, feature, desired_value):
    with open(file) as f:
        compose_object_flag = False
        ignore_characters_flag = False
        object_string = ''
        selected_objects = []
        json_object = None
        while True:
            c = f.read(1)
            if c == '"':
                ignore_characters_flag = not ignore_characters_flag
            if c == '{' and ignore_characters_flag == False:
                compose_object_flag = True
            if c == '}' and compose_object_flag == True and ignore_characters_flag == False:
                compose_object_flag = False
                object_string = object_string + '}'
                json_object = json.loads(object_string)
                if json_object[feature] == desired_value:
                    selected_objects.append(json_object)
                object_string = ''
            if compose_object_flag == True:
                object_string = object_string + c
            if not c:
                break
        return selected_objects

Python3: JSON to CSV

I have a JSON dict in Python which I would like to parse into a CSV, my data and code looks like this:
import csv
import json
x = {
"success": 1,
"return": {
"variable_id": {
"var1": "val1",
"var2": "val2"
}...
f = csv.writer(open("foo.csv", "w", newline=''))
for x in x:
    f.writerow([x["success"],
                '--variable value--',
                x["return"]["variable_id"]["var1"],
                x["return"]["variable_id"]["var2"]])
However, since variable_id's value is going to change, I don't know how to refer to it in the code. Apologies if this is trivial but I guess I lack the terminology to find the solution.
You can use the * (unpack) operator to do this, assuming only the values in your variable_id matter:
f.writerow([x["success"],
            '--variable value--',
            *[val for variable_id in x['return'].values() for val in variable_id.values()]])
The unpack operator essentially takes everything in x["return"]["variable_id"].values() and appends it in the list you're creating as input for writerow.
EDIT: this should now work even if you don't know how to reference variable_id. It works best if you have several variable_ids in x['return'].
If you only have one variable_id, then you can also try this:
f.writerow([x["success"],
            '--variable value--',
            *list(x['return'].values())[0].values()])
Or
f.writerow([x["success"],
            '--variable value--',
            *next(iter(x['return'].values())).values()])
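You can get the variable_id key itself using list(x['return'].keys())[0], or equivalently next(iter(x['return'])). Note that keys is a method and has to be called, and the key lives under x['return'], not under x['success'].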
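Putting the pieces together, one row per variable_id can be written without knowing the key in advance. A sketch only: the "abc123" key is a made-up stand-in for the changing variable_id, rows_for() is a hypothetical helper, and io.StringIO replaces the real foo.csv so the result can be inspected:

```python
import csv
import io

x = {
    "success": 1,
    "return": {
        # "abc123" is a hypothetical variable_id; the real key changes.
        "abc123": {"var1": "val1", "var2": "val2"},
    },
}

def rows_for(payload):
    # One row per variable_id: success flag, the id itself, then its values.
    for variable_id, variables in payload["return"].items():
        yield [payload["success"], variable_id, *variables.values()]

buffer = io.StringIO()  # stands in for open("foo.csv", "w", newline="")
csv.writer(buffer).writerows(rows_for(x))
print(buffer.getvalue())
```

Iterating over .items() yields the key and the nested dict together, so the '--variable value--' placeholder from the question can be filled with the actual id.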

Separate multiple JSON data in R

I am new to R and am working on the JSON file below (a snippet of the head and the relevant code example).
{"mdsDat":{"x":[0.098453,-0.19334,-0.23836,-0.28512,0.010195,0.14132,-0.026636,-0.17141,
0.082936,-0.030503,0.22893,0.097832,0.19978,0.048286,0.050141,0.026101,-0.10637,0.040702,
0.013531,0.013531],"y":[-0.21144,-0.25048,0.14525,-0.06405,0.16668,-0.066238,-0.23403,
0.17033,-0.037128,-0.019674,0.0089501,0.0069049,0.10143,-0.14445,0.052727,0.15911,0.049328,
0.074852,0.045969,0.045969],"topics":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
"Freq":[16.358,13.397,12.979,10.383,7.5134,7.16,6.1765,4.9584,4.6035,3.4624,3.4249,3.0709,
1.8512,1.8512,1.4977,0.90723,0.23895,0.16034,0.0031352,0.0031352],
"cluster":[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]},
"tinfo":{"Term":["equation","equations","mathematics","beauty","mathematical","people",
"beautiful","explain","world","time","understand","science","things","meaning","language",
"symbols","simple","life","nature","interesting","art","agree","movie","find","numbers",
"explore","mass","relationship","video","scientists","agree","scientists","amazing","learn",
"apply","artistic","common","fear","beautiful","mathematics","study","mathematical","science",
"meaning","physics","gravity","exchange","math","world","future","explained","sense",
"process","words","equations","experience","move","faster","eyes","fall","nature","power",
"human","exam","things","answer","people","world","ways","truth","equations","video",
"balance","painting","space"
...
"token.table":{"Term":["0","1","2","2","abstract","abstract","addition","admire","agree",
"amazing","answer","answer","apple","application","applied","applied","apply","art","artist",
"artist","artistic","arts","balance","balance","balance","beautiful","beautiful","beautiful",
"beautiful","beauty","beauty","bring","bring","bunch","bunch","calculate","calculation",
"collings","collings","collings","common","complex","complex","complex","contact","curiosity",
"curiosity","daily","difficult","discover","documentary","documentary","documentary","earth",
"earth","einstein","energy","energy","english","english","enjoy","enjoy","enjoy","equation",
"equation","equation","equation","equation","equations","equations","equations","equations",
"exam","exam","examination","examination","exchange","exchange","exchange","experience",
"experience","explain","explained","explained","explore","eyes","eyes","fact","fall","famous",
"famous","faster","faster","fear","feel","film","film","find","find","force","formula","formula","found",
...
"work","world","world","world","worlds","years","years"],"Topic":[8,5,11,13,8,10,5,4,1,1,2,
15,9,10,9,12,1,3,4,9,1,5,2,4,10,1,4,7,14,2,6,3,15,10,15,
...
,16,3,7,2,14,2,5,1,8,4,9,10,15,1,2,14,9,11,13],"Freq":[0.97036,0.9702,0.75081,0.25027,0.22141,
0.77494,0.97584,0.96609,0.99493,0.98083,0.73954,0.24651,0.99013,0.
...
In my project, I have created a variable getJSONfield as below:
getJSONfield <- json %>%
  spread_values(jsonList = jstring("token.table")) %>%
  select(jsonList)
It returns a JSON list, something like this:
jsonNodes
1 list(Term = list("0", "1", "1", "2", "Data of JSON"),
Topic = list(9, 1, 10,"Data of JSON"),
Freq = list(0.99834, "Data of JSON"))
I then have to separate the individual variables (i.e. Term, Topic and Freq) to use as the head and edges of a network diagram. This is roughly how I would like to use the JSON data:
jsonNode <-lapply(json$**topic**, header=T, as.is = T)
jsonTermsLinkS <- lapply(json$**term**, header=T, as.is=T)
jsonTermsLinkE <- lapply(json$**freq**, header=T, as.is=T)
But first I need to separate or access them successfully. Does anyone have any ideas or advice on this?
Many thanks to anyone who can help!