Convert Multiple JSON Objects to JSON Array

I have generated a JSON file from a data source, and it is in this format:
{}{}{}
I wish to convert this format to a comma-separated JSON array: [{},{},{}].
The end goal is to push the JSON data [{},{},{}] to MongoDB.
My Python solution (although naive) looks something like this:
def CreateJSONArrayFile(filename):
    print('Opening file with JSON data')
    with open(filename) as data_file:
        raw_data = data_file.read()
    tweaked_data = raw_data.replace('}{', '}^|{')
    split_data = tweaked_data.split('^|')
    outfile = open('split_data.json', 'w')
    outfile.write('[')
    for item in split_data:
        outfile.write("%s," % item)
    outfile.write(']')
    print('split_data.json Created with JSON Array')
The above code is giving me wrong results.
Can you please help me optimize the solution? Please let me know if you need more details from my end.

I'm with davedwards on this one, but if that's not an option -- I think this gets you what you are after.
myJson = """{"This": "is", "a": "test"} {"Of": "The", "Emergency":"Broadcast"}"""
myJson = myJson.replace("} {", "}###{")
new_list = myJson.split('###')
print(new_list)
yields:
['{"This": "is", "a": "test"}', '{"Of": "The", "Emergency":"Broadcast"}']
Not saying it is the most elegant way : )
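For what it's worth, the standard library can also do the splitting itself, which avoids breaking on values that happen to contain '}{' inside a string. Below is a rough sketch using json.JSONDecoder.raw_decode, assuming the whole file fits in memory (the split_data.json name is carried over from the question; the function name is mine):
import json

def create_json_array_file(filename):
    # raw_decode parses one object starting at a given offset and
    # returns (object, end_offset), so we can walk the concatenated file.
    decoder = json.JSONDecoder()
    with open(filename) as data_file:
        text = data_file.read()
    objects = []
    idx = 0
    while idx < len(text):
        # Skip any whitespace between objects.
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        obj, idx = decoder.raw_decode(text, idx)
        objects.append(obj)
    with open('split_data.json', 'w') as outfile:
        json.dump(objects, outfile)  # writes a proper [{},{},{}] array
json.dump takes care of the brackets and commas, so there is no trailing-comma problem like in the original write loop.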

Related

How to check for specific field values based on some condition while converting csv file to json format

Below is the code to convert a CSV file to JSON format in Python.
I have two fields, 'recommendation' and 'rating'. Based on the recommendation value I need to set the value of the rating field: if recommendation is 1 then rating = 1, and vice versa. With the answer I got, I'm getting output for only one record entry instead of all the records; I think it's being overridden. Do I need to create a separate list and append each record entry to it to get the output for all records?
Here's the updated code:
def main(input_file):
    csv_rows = []
    with open(input_file, 'r') as csvfile:
        reader = csv.DictReader(csvfile, delimiter='|')
        title = reader.fieldnames
        for row in reader:
            entry = OrderedDict()
            for field in title:
                entry[field] = row[field]
            [c.update({'RATING': c['RECOMMENDATIONS']}) for c in reader]
            csv_rows.append(entry)
    with open(json_file, 'w') as f:
        json.dump(csv_rows, f, sort_keys=True, indent=4, ensure_ascii=False)
        f.write('\n')
I want to create a nested format like the one below:
"rating": {
    "user_rating": {
        "rating": 1
    },
    "recommended": {
        "rating": 1
    }
}
After you've read the file in, using the csv.DictReader, you'll have a list of dicts. Since you want to set the values now, it's a simple dict manipulation. There are several ways, of which one is:
[c.update({'rating': c['recommendation']}) for c in read_csvDictReader]
Hope that helps.
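Note that the list comprehension inside your loop iterates over reader for its side effect, which drains the rest of the file on the first pass; that is why only one record survives. Here is a sketch of how the loop could be restructured (the RECOMMENDATIONS field name and the nested 'rating' layout are taken from the question; the output filename is an assumption):
import csv
import json
from collections import OrderedDict

def main(input_file, json_file='output.json'):  # output name is an assumption
    csv_rows = []
    with open(input_file, 'r') as csvfile:
        reader = csv.DictReader(csvfile, delimiter='|')
        for row in reader:
            entry = OrderedDict(row)
            # Set the rating per row instead of draining the reader
            # with a side-effect list comprehension.
            rating = row['RECOMMENDATIONS']
            entry['rating'] = {
                'user_rating': {'rating': rating},
                'recommended': {'rating': rating},
            }
            csv_rows.append(entry)
    with open(json_file, 'w') as f:
        json.dump(csv_rows, f, sort_keys=True, indent=4, ensure_ascii=False)
        f.write('\n')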

Assigning Variables from JSON in Python

I've searched across dozens of answers for the last week but haven't been able to find an example of what I'm trying to do. I'm happy to be pointed to something I've missed; I'm new to Python, so I apologise if this is something trivial.
I'm trying to read in a configuration from a JSON file so that I can abstract the configuration from the script itself.
I want to be able to assign the configuration value to a variable and perform an action on it, before moving on to the next category in a nested list, of which the categories could change/expand over time (music, pictures, etc).
The JSON file (library.json) currently looks like this:
{"media":{
"tv": [{
"source": "/tmp/tv",
"dest": "/tmp/dest"
}],
"movies": [{
"source": "/tmp/movies",
"dest": "/tmp/dest"
}]
}}
The relevant script looks like this:
import json

with open(libfile) as data_file:
    data = json.load(data_file)

for k, v in (data['media']['tv']):
    print(k, v)
What I was hoping to see as output was:
dest /tmp/dest
source /tmp/tv
What I am seeing is:
dest source
It feels like I'm missing something simple.
This works:
import json

with open('data.json') as json_file:
    data = json.load(json_file)
    for p in data['media']['tv']:
        dst = p['dest']
        src = p['source']
        print(src, dst)
Something like this? Using f-strings and zip(), which aggregates elements:
import json

with open("dummy.json") as data_file:
    data = json.load(data_file)

for i, j in data["media"].items():
    print(i)
    print("\n".join(f'{k} {l}' for k, l in zip(j[0].keys(), j[0].values())))
    print("\n")
Output:
tv
source /tmp/tv
dest /tmp/dest
movies
source /tmp/movies
dest /tmp/dest
The problem here is that data['media']['tv'] is actually a list of dictionaries.
You can tell because it looks like this: "movies": [{.. (Note the bracket [)
That means that instead of this:
for k, v in (data['media']['tv']):
    print(k, v)
You should be doing this:
for dct in data['media']['tv']:
    for k, v in dct.items():
        print(k, v)
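Since you mention that the categories under "media" may change or expand over time (music, pictures, etc.), you can also avoid hard-coding 'tv' at all. A minimal sketch, with a hypothetical process() placeholder standing in for whatever action you perform on each entry:
import json

def process(category, source, dest):
    # Hypothetical placeholder for the per-entry action.
    print(f'{category}: {source} -> {dest}')

with open('library.json') as data_file:
    data = json.load(data_file)

# Walk every category under "media" without naming tv/movies explicitly.
for category, entries in data['media'].items():
    for entry in entries:
        process(category, entry['source'], entry['dest'])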

How to use ijson/other to parse this large JSON file?

I have this massive JSON file (8 GB), and I run out of memory when trying to read it into Python. How would I implement a similar procedure using ijson or some other library that is more efficient with large JSON files?
import pandas as pd

# There are (say) 1m objects - each its own JSON object - within this file.
with open('my_file.json') as json_file:
    data = json_file.readlines()

# So I take a list of these JSON objects
list_of_objs = [obj for obj in data]

# But I only want about 200 of the JSON objects
desired_data = [obj for obj in list_of_objs if obj['feature'] == "desired_feature"]
How would I implement this using ijson or something similar? Is there a way I can extract the objects I want without reading in the whole JSON file?
The file is a list of objects like:
{
    "review_id": "zdSx_SD6obEhz9VrW9uAWA",
    "user_id": "Ha3iJu77CxlrFm-vQRs_8g",
    "business_id": "tnhfDv5Il8EaGSXZGiuQGg",
    "stars": 4,
    "date": "2016-03-09",
    "text": "Great place to hang out after work: the prices are decent, and the ambience is fun. It's a bit loud, but very lively. The staff is friendly, and the food is good. They have a good selection of drinks.",
    "useful": 0,
    "funny": 0
}
The file is a list of objects
This is a little ambiguous. Looking at your code snippet, it looks like your file contains a separate JSON object on each line, which is not the same as an actual JSON array that starts with [, ends with ] and has , between items.
In the case of a JSON-per-line file it's as easy as:
import json
from itertools import islice

with open(filename) as f:
    objects = (json.loads(line) for line in f)
    objects = islice(objects, 200)
Note the differences:
You don't need .readlines(); the file object itself is an iterable that yields individual lines.
Parentheses (..) instead of brackets [..] in (... for line in f) create a lazy generator expression instead of a Python list holding all the lines in memory.
islice(objects, 200) will give you the first 200 items without iterating further. If objects were a list, you could just do objects[:200].
Now, if your file is actually a JSON array then you indeed need ijson:
import ijson  # or choose a faster backend if needed
from itertools import islice

with open(filename) as f:
    objects = ijson.items(f, 'item')
    objects = islice(objects, 200)
ijson.items returns a lazy iterator over a parsed array. The 'item' in the second parameter means "each item in a top-level array".
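To also apply the filter from your snippet while streaming, a sketch for the JSON-per-line case (the 'feature' field name and value are taken from the question):
import json
from itertools import islice

with open('my_file.json') as f:
    objects = (json.loads(line) for line in f)
    # Keep only matching objects; stop after the first 200 matches.
    matching = (obj for obj in objects if obj.get('feature') == 'desired_feature')
    desired_data = list(islice(matching, 200))
Because both generator expressions are lazy, only one line's object is in memory at a time.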
The problem is that not all JSON comes nicely formatted and you cannot rely on line-by-line parsing to extract your objects.
I understood your "acceptance criteria" as "want to collect only those JSON objects whose specified keys contain specified values". For example, only collecting objects about a person if that person's name is "Bob". The following function will provide a list of all objects that fit your criteria. Parsing is done character by character (something that would be much more efficient in C, but Python is still pretty good). This should be more robust because it doesn't care about newlines, formatting etc. I tested this on both formatted and unformatted JSON with 1,000,000 objects.
import json

def parse_out_objects(file, feature, desired_value):
    with open(file) as f:
        compose_object_flag = False     # currently inside an object's braces
        ignore_characters_flag = False  # currently inside a quoted string
        object_string = ''
        selected_objects = []
        json_object = None
        while True:
            c = f.read(1)
            if c == '"':
                # Toggle so braces inside string values are ignored.
                ignore_characters_flag = not ignore_characters_flag
            if c == '{' and ignore_characters_flag == False:
                compose_object_flag = True
            if c == '}' and compose_object_flag == True and ignore_characters_flag == False:
                # Object complete: parse it and keep it if it matches.
                compose_object_flag = False
                object_string = object_string + '}'
                json_object = json.loads(object_string)
                if json_object[feature] == desired_value:
                    selected_objects.append(json_object)
                object_string = ''
            if compose_object_flag == True:
                object_string = object_string + c
            if not c:
                break  # end of file
    return selected_objects
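A usage example against the review objects shown above (the field and value here are just illustrative):
selected = parse_out_objects('my_file.json', 'stars', 4)
print(len(selected))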

Converting mongoengine objects to JSON

I tried to fetch data from MongoDB using MongoEngine with Flask. The query works perfectly; the problem is that when I convert the query result into JSON, it shows only the field names.
Here is my code:
view.py
from model import Users
result = Users.objects()
print(dumps(result))
model.py
class Users(DynamicDocument):
    meta = {'collection': 'users'}
    user_name = StringField()
    phone = StringField()
output
[["id", "user_name", "phone"], ["id", "user_name", "phone"]]
Why does it show only the field names?
Your query returns a queryset. Use the .to_json() method to convert it.
Depending on what you need from there, you may want to use something like json.loads() to get a python dictionary.
For example:
from model import Users
# This returns <class 'mongoengine.queryset.queryset.QuerySet'>
q_set = Users.objects()
json_data = q_set.to_json()
# You might also find it useful to create python dictionaries
import json
dicts = json.loads(json_data)
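Since Flask is in the mix, here is a minimal sketch of serving the serialised queryset from a view (the route and app setup are illustrative, not from the question):
from flask import Flask, Response
from model import Users

app = Flask(__name__)

@app.route('/users')  # hypothetical route
def list_users():
    # to_json() serialises the documents, field values included.
    return Response(Users.objects().to_json(), mimetype='application/json')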

How to convert Mnesia query results to a JSON'able list?

I am trying to use JSX to convert a list of tuples to a JSON object.
The list items are based on a record definition:
-record(player, {index, name, description}).
and looks like this:
[
{player,1,"John Doe","Hey there"},
{player,2,"Max Payne","I am here"}
]
The query function looks like this:
select_all() ->
    SelectAllFunction =
        fun() ->
            qlc:eval(qlc:q(
                [Player ||
                    Player <- mnesia:table(player)
                ]
            ))
        end,
    mnesia:transaction(SelectAllFunction).
What's the proper way to make it convertible to JSON, given that I have the schema of the record used and know the structure of the tuples?
You'll have to convert each record into a term that jsx can encode to JSON correctly. Assuming you want an array of objects in the JSON for the list of player records, you'll have to convert each player to either a map or a list of tuples. You'll also have to convert the strings to binaries, or else jsx will encode them as lists of integers. Here's some sample code:
-record(player, {index, name, description}).

player_to_json_encodable(#player{index = Index, name = Name, description = Description}) ->
    [{index, Index}, {name, list_to_binary(Name)}, {description, list_to_binary(Description)}].

go() ->
    Players = [
        {player, 1, "John Doe", "Hey there"},
        % the following is just some sugar for a tuple like above
        #player{index = 2, name = "Max Payne", description = "I am here"}
    ],
    JSON = jsx:encode(lists:map(fun player_to_json_encodable/1, Players)),
    io:format("~s~n", [JSON]).
Test:
1> r:go().
[{"index":1,"name":"John Doe","description":"Hey there"},{"index":2,"name":"Max Payne","description":"I am here"}]