Parsing nested JSON and collecting data in a list - json

I am trying to parse nested JSON and collect values into a list when a condition is met.
The input JSON is below:
[
{
"name": "Thomas",
"place": "USA",
"items": [
{"item_name":"This is a book shelf", "level":1},
{"item_name":"Introduction", "level":1},
{"item_name":"parts", "level":2},
{"item_name":"market place", "level":3},
{"item_name":"books", "level":1},
{"item_name":"pens", "level":1},
{"item_name":"pencils", "level":1}
],
"descriptions": [
{"item_name": "Books"}
]
},
{
"name": "Samy",
"place": "UK",
"items": [
{"item_name":"This is a cat house", "level":1},
{"item_name":"Introduction", "level":1},
{"item_name":"dog house", "level":3},
{"item_name":"cat house", "level":1},
{"item_name":"cat food", "level":2},
{"item_name":"cat name", "level":1},
{"item_name":"Samy", "level":2}
],
"descriptions": [
{"item_name": "cat"}
]
}
]
I am reading the JSON as below:
import json
with open('test.json', 'r', encoding='utf8') as fp:
    data = json.load(fp)
for i in data:
    if i['name'] == "Thomas":
        # collect the "item_name" values into a list (my_list) where "level" is 1
        # my_list = []
        pass
Expected output:
my_list = ["This is a book shelf", "Introduction", "books", "pens", "pencils"]
Because the JSON is nested, I have not been able to collect the data into a list as described above. Please let me know how to collect the data from the nested JSON.

Try:
import json
with open("test.json", "r", encoding="utf8") as fp:
    data = json.load(fp)
my_list = [
    i["item_name"]
    for d in data
    for i in d["items"]
    if d["name"] == "Thomas" and i["level"] == 1
]
print(my_list)
This prints:
['This is a book shelf', 'Introduction', 'books', 'pens', 'pencils']
Or without list comprehension:
my_list = []
for d in data:
    if d["name"] != "Thomas":
        continue
    for i in d["items"]:
        if i["level"] == 1:
            my_list.append(i["item_name"])
print(my_list)

Once the data is loaded, we iterate over the outermost list of objects.
For each object whose name equals "Thomas", we apply filter() with a lambda to its items list, keeping only the items where level == 1.
This gives us a list of item objects that have level 1.
To extract each item_name we then use a list comprehension, so final_list ends up with the values you expected:
["This is a book shelf", "Introduction", "books", "pens", "pencils"]
import json
def get_final_list():
    with open('test.json', 'r', encoding='utf8') as fp:
        data = json.load(fp)
    final_list = []
    for obj in data:
        if obj.get("name") == "Thomas":
            x = list(filter(lambda item: item['level'] == 1, obj.get("items")))
            final_list = final_list + x
    final_list = [i.get("item_name") for i in final_list]
    return final_list
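A variation on the answers above, as a minimal sketch: wrapping the filter in a function turns the name and the level into parameters, and .get() tolerates records that lack an "items" key. The function name collect_item_names and the inlined sample data are assumptions made for illustration; they are not part of the original question.

```python
import json

# Trimmed-down copy of the question's input, inlined so the sketch runs on its own.
SAMPLE = """
[
  {"name": "Thomas", "items": [
    {"item_name": "This is a book shelf", "level": 1},
    {"item_name": "parts", "level": 2},
    {"item_name": "books", "level": 1}
  ]},
  {"name": "Samy", "items": [
    {"item_name": "cat house", "level": 1}
  ]}
]
"""


def collect_item_names(records, name, level):
    """Return item_name values for records matching `name` whose level equals `level`."""
    return [
        item["item_name"]
        for record in records
        if record.get("name") == name
        for item in record.get("items", [])  # a missing "items" key is treated as empty
        if item.get("level") == level
    ]


data = json.loads(SAMPLE)
print(collect_item_names(data, "Thomas", 1))
```

Calling it with a name that does not occur in the data simply returns an empty list.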

Related

How to split a specific column data into a nested json data structure using python?

I began working with CSV data that has the following content:
I want to create a json data structure which looks like this:
{
"name": "Place",
"Details": [
{
"detail": "I",
"info": [
"Iran",
"Iraq"
]
},
{
"detail": "J",
"info": "Japan"
}
]
}
Below is my code, but I am unable to split the second column as required:
import pandas as pd
path = "/content/file.csv"
df = pd.read_csv(path)  # read_csv already returns a DataFrame
out = df.to_json(orient="records")
print(out)
Use a custom function that splits the values in the Details column:
def f(x):
    out = []
    splitted = x.split(',')
    for part in splitted:
        # "I-Iran/Iraq" -> a = "I", b = "Iran/Iraq"
        a, b = part.split('-')
        c = b.split('/')
        if len(c) == 1:
            d = {'detail': a, 'info': c[0]}
        else:
            d = {'detail': a, 'info': c}
        out.append(d)
    return out
df['Details'] = df['Details'].apply(f)
print(df)
Name Details Verfied
0 Alphabet [{'detail': 'A', 'info': 'Apple'}, {'detail': ... Yes
1 Place [{'detail': 'I', 'info': ['Iran', 'Iraq']}, {'... Yes
out=df[['Name','Details']].to_json(orient="records")
print(out)
[{"Name":"Alphabet","Details":[{"detail":"A","info":"Apple"},
{"detail":"B","info":"Ball"}]},
{"Name":"Place","Details":[{"detail":"I","info":["Iran","Iraq"]},
{"detail":"J","info":"Japan"}]}]
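The CSV itself is not shown in the question, so the following sketch assumes each Details cell looks like "I-Iran/Iraq,J-Japan" (pairs separated by commas, detail and info separated by a hyphen, alternatives separated by slashes), which is the format the function f above implies. The name parse_details is hypothetical, and the sketch uses only the standard library so the splitting logic can be tried without pandas.

```python
import json


def parse_details(cell):
    """Split a cell such as "I-Iran/Iraq,J-Japan" into a list of detail dicts."""
    parsed = []
    for pair in cell.split(","):
        detail, _, info = pair.strip().partition("-")
        values = info.split("/")
        # A single value stays a plain string; several values become a list.
        parsed.append({"detail": detail, "info": values[0] if len(values) == 1 else values})
    return parsed


row = {"name": "Place", "Details": parse_details("I-Iran/Iraq,J-Japan")}
print(json.dumps(row, indent=2))
```

With pandas, the same function could be passed to df['Details'].apply in place of f.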

Join nested JSON dataframe and another dataframe

I am trying to join dataframe1, generated from the JSON below, with dataframe2 on the field order_id, and then assign the "status" from dataframe2 to the "status" of dataframe1. Does anyone know how to do this? Many thanks for your help.
dataframe1
[{
"client_id": 1,
"name": "Test01",
"olist": [{
"order_id": 10000,
"order_dt_tm": "2012-12-01",
"status": "" <== use "status" from dataframe2 to populate this field
},
{
"order_id": 10000,
"order_dt_tm": "2012-12-01",
"status": ""
}
]
},
{
"client_id": 2,
"name": "Test02",
"olist": [{
"order_id": 10002,
"order_dt_tm": "2012-12-01",
"status": ""
},
{
"order_id": 10003,
"order_dt_tm": "2012-12-01",
"status": ""
}
]
}
]
dataframe2
order_id status
10002 "Delivered"
10001 "Ordered"
Here is your raw dataset as a JSON string:
d = """[{
"client_id": 1,
"name": "Test01",
"olist": [{
"order_id": 10000,
"order_dt_tm": "2012-12-01",
"status": ""
},
{
"order_id": 10000,
"order_dt_tm": "2012-12-01",
"status": ""
}
]
},
{
"client_id": 2,
"name": "Test02",
"olist": [{
"order_id": 10002,
"order_dt_tm": "2012-12-01",
"status": ""
},
{
"order_id": 10003,
"order_dt_tm": "2012-12-01",
"status": ""
}
]
}
]"""
First, I would load it as JSON:
import json
import pandas as pd
data = json.loads(d)
Then I would turn it into a pandas DataFrame. Note that I drop the status field, since it will be populated by the join step:
df1 = pd.json_normalize(data, 'olist')[['order_id', 'order_dt_tm']]
Then, using the second dataframe sample, I would do a left join with the merge function:
df2_data = {'order_id': [10002, 10001], 'status': ['Delivered', 'Ordered']}
df2 = pd.DataFrame(df2_data)
result = df1.merge(df2, on='order_id', how='left')
Good luck
UPDATE
# JSON to Dataframe
df1 = pd.json_normalize(data)
# Sub JSON to dataframe
df1['sub_df'] = df1['olist'].apply(lambda x: pd.json_normalize(x).drop('status', axis=1))
# Build second dataframe
data2 = {'order_id':[10002, 10001],'status':['Delivered', 'Ordered']}
df2 = pd.DataFrame(data2)
# Populates status in sub dataframes
df1['sub_df'] = df1['sub_df'].apply(lambda x: x.merge(df2, on='order_id', how='left').fillna(''))
# Sub dataframes back to JSON
def back_to_json_str(df):
    # turns a df back into a JSON string
    return str(df.to_json(orient="records", indent=4))
df1['olist'] = df1['sub_df'].apply(lambda x: back_to_json_str(x))
# Global DF back to JSON string
parsed = str(df1.drop('sub_df', axis=1).to_json(orient="records", indent=4))
parsed = parsed.replace(r'\n', '\n')
parsed = parsed.replace(r'\"', '\"')
# Print result
print(parsed)
UPDATE 2
Here is a way to add an index column to a dataframe:
df1['index'] = [e for e in range(df1.shape[0])]
This is my code for assigning title values from a dataframe back to the JSON object. The assignment takes quite a while when the JSON object has 100000 records. Does anyone know how to improve the performance of this code? Many thanks.
import json
import random
import pandas as pd
import pydash as _
data = [{"pid":1,"name":"Test1","title":""},{"pid":2,"name":"Test2","title":""}] # 5000 records
# dataframe1
df = pd.json_normalize(data)
# dataframe2
pid = [x for x in range(1, 5000)]
title_set = ["Boss", "CEO", "CFO", "PMO", "Team Lead"]
titles = [title_set[random.randrange(0, 5)] for x in range(1, 5000)]
df2 = pd.DataFrame({'pid': pid, 'title': titles})
#left join dataframe1 and dataframe2
df3 = df.merge(df2, on='pid', how='left')
#assign title values from dataframe back to the json object
for row in df3.iterrows():
    idx = _.find_index(data, lambda x: x['pid'] == row[1]['pid'])
    data[idx]['title'] = row[1]['title_y']
print(data)
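One possible way to speed up the assignment loop above, sketched under the assumption that pid is unique in the second data set: the find_index call scans the whole list for every row, so the loop as a whole is quadratic. Building a dictionary from pid to title once and then walking the records makes it a single pass. The name title_by_pid and the small sample lists are illustrative only, and the sketch avoids pandas so it runs on its own.

```python
# Records standing in for the JSON object from the question.
data = [
    {"pid": 1, "name": "Test1", "title": ""},
    {"pid": 2, "name": "Test2", "title": ""},
    {"pid": 3, "name": "Test3", "title": ""},
]

# Rows standing in for dataframe2; with pandas this mapping could be built
# with dict(zip(df2["pid"], df2["title"])).
lookup_rows = [(1, "Boss"), (2, "CEO")]
title_by_pid = dict(lookup_rows)

# One pass over the records; each dictionary lookup is constant time on average.
for record in data:
    # Keep the existing title when the pid has no match (left-join behaviour).
    record["title"] = title_by_pid.get(record["pid"], record["title"])

print(data)
```

This also removes the need for the merge and for pydash in the assignment step, since the dictionary plays the role of the join.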

How to loop different types of nested JSON objects multiple times in the same message

Python noob here, again. I'm trying to write a Python script that auto-generates a JSON message containing multiple item records, using a for loop to generate them. The message structure and cardinalities are as follows:
messageHeader[1]
-item [1-*]
--itemAttributesA [0-1]
--itemAttributesB [0-1]
--itemAttributesC [0-1]
--itemLocation [1]
--itemRelationships [0-1]
I've had some really good help before with looping over the same object for a single record type, for example just the itemRelationships record. However, as soon as I try to create one message with many items (e.g. 5), each with a single instance of an itemAttribute, itemLocation and itemRelationships, it does not work: I keep getting a KeyError. I've tried to work out what a KeyError means in relation to what I am doing, but I cannot link my mistake to the examples elsewhere.
Here's my code as it stands:
import json
import random
data = {'messageID': random.randint(0, 2147483647), 'messageType': 'messageType'}
data['item'] = list()
itemAttributeType = input("Please select what type of Attribute item has, either 'A', 'B' or 'C' :")
for x in range(0, 5):
    data['item'].append({
        'itemId': "I",
        'itemType': "T"})
if itemAttributeType == "A":
    # This lookup raises the KeyError: the key has not been created yet
    data['item'][0]['itemAttributesA']
    data['item'][0]['itemAttributesA'].append({
        'attributeA': "ITA"})
elif itemAttributeType == "B":
    data['item'][0]['itemAttributesB']
    data['item'][0]['itemAttributesB'].append({
        'attributeC': "ITB"})
else:
    data['item'][0]['itemAttributesC']
    data['item'][0]['itemAttributesC'].append({
        'attributeC': "ITC"})
    pass
data['item'][0]['itemLocation'] = {
    'itemDetail': "ITC"}
itemRelation = input("Does the item have a relation: ")
if itemRelation > '':
    data['item'][0]['itemRelations'] = {
        'itemDetail': "relation"}
else:
    pass
print(json.dumps(data, indent=4))
I have also tried this code, which gives me better results:
import json
import random
data = {'messageID': random.randint(0, 2147483647), 'messageType': 'messageType'}
data['item'] = list()
itemAttributeType = input("Please select what type of Attribute item has, either 'A', 'B' or 'C' :")
for x in range(0, 5):
    data['item'].append({
        'itemId': "I",
        'itemType': "T"})
if itemAttributeType == "A":
    data['item'][0]['itemAttributesA'] = {
        'attributeA': "ITA"}
elif itemAttributeType == "B":
    data['item'][0]['itemAttributesB'] = {
        'attributeB': "ITB"}
else:
    data['item'][0]['itemAttributesC'] = {
        'attributeC': "ITC"}
    pass
data['item'][0]['itemLocation'] = {
    'itemDetail': "ITC"}
itemRelation = input("Does the item have a relation: ")
if itemRelation > '':
    data['item'][0]['itemRelations'] = {
        'itemDetail': "relation"}
else:
    pass
print(json.dumps(data, indent=4))
This actually gives me a result, but it puts itemAttributesA, itemLocation and itemRelations on the first item only, followed by four bare item records, as follows:
{
"messageID": 1926708779,
"messageType": "messageType",
"item": [
{
"itemId": "I",
"itemType": "T",
"itemAttributesA": {
"itemLocationType": "ITA"
},
"itemLocation": {
"itemDetail": "location"
},
"itemRelations": {
"itemDetail": "relation"
}
},
{
"itemId": "I",
"itemType": "T"
},
{
"itemId": "I",
"itemType": "T"
},
{
"itemId": "I",
"itemType": "T"
},
{
"itemId": "I",
"itemType": "T"
}
]
}
What I am trying to achieve is this output:
{
"messageID": 2018369867,
"messageType": "messageType",
"item": [{
"itemId": "I",
"itemType": "T",
"itemAttributesA": {
"attributeA": "ITA"
},
"itemLocation": {
"itemDetail": "Location"
},
"itemRelation": [{
"itemDetail": "D"
}]
}, {
"item": [{
"itemId": "I",
"itemType": "T",
"itemAttributesB": {
"attributeA": "ITB"
},
"itemLocation": {
"itemDetail": "Location"
},
"itemRelation": [{
"itemDetail": "D"
}]
}, {
"item": [{
"itemId": "I",
"itemType": "T",
"itemAttributesC": {
"attributeA": "ITC"
},
"itemLocation": {
"itemDetail": "Location"
},
"itemRelation": [{
"itemDetail": "D"
}]
}, {
"item": [{
"itemId": "I",
"itemType": "T",
"itemAttributesA": {
"attributeA": "ITA"
},
"itemLocation": {
"itemDetail": "Location"
},
"itemRelation": [{
"itemDetail": "D"
}]
},
{
"item": [{
"itemId": "I",
"itemType": "T",
"itemAttributesB": {
"attributeA": "ITB"
},
"itemLocation": {
"itemDetail": "Location"
},
"itemRelation": [{
"itemDetail": "D"
}]
}]
}
]
}]
}]
}]
}
I've been at this for the best part of a whole day trying to get it to work, butchering away at the code. Where am I going wrong? Any help would be greatly appreciated.
You're close. I think the parts you are missing are adding the nested dict to the current item and the indentation of your for loop.
import json
import random
data = {'messageID': random.randint(0, 2147483647), 'messageType': 'messageType'}
data['item'] = list()
itemAttributeType = input("Please select what type of Attribute item has, either 'A', 'B' or 'C' :")
for x in range(0, 5):
    data['item'].append({
        'itemId': "I",
        'itemType': "T"})
    if itemAttributeType == "A":
        # First you need to add `itemAttributesA` to your dict:
        data['item'][x]['itemAttributesA'] = dict()
        # You could also do data['item'][x]['itemAttributesA'] = {'attributeA': "ITA"} in one step
        data['item'][x]['itemAttributesA']['attributeA'] = "ITA"
    elif itemAttributeType == "B":
        data['item'][x]['itemAttributesB'] = dict()
        data['item'][x]['itemAttributesB']['attributeB'] = "ITB"
    else:
        data['item'][x]['itemAttributesC'] = dict()
        data['item'][x]['itemAttributesC']['attributeC'] = "ITC"
    data['item'][x]['itemLocation'] = {'itemDetail': "ITC"}
    itemRelation = input("Does the item have a relation: ")
    if itemRelation > '':
        data['item'][x]['itemRelations'] = {'itemDetail': "relation"}
    else:
        pass
print(json.dumps(data, indent=4))
This code can also be shortened considerably if your example is close to what you truly desire:
import json
import random
data = {'messageID': random.randint(0, 2147483647), 'messageType': 'messageType'}
data['item'] = list()
itemAttributeType = input("Please select what type of Attribute item has, either 'A', 'B' or 'C' :")
for x in range(0, 5):
    new_item = {
        'itemId': "I",
        'itemType': "T",
        'itemAttributes' + str(itemAttributeType): {
            'attribute' + str(itemAttributeType): "IT" + str(itemAttributeType)
        },
        'itemLocation': {'itemDetail': "ITC"}
    }
    itemRelation = input("Does the item have a relation: ")
    if itemRelation > '':
        new_item['itemRelations'] = {'itemDetail': itemRelation}
    data['item'].append(new_item)
print(json.dumps(data, indent=4))
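The desired output in the question alternates the attribute types A, B, C across the items rather than using one type for all five. Here is a minimal sketch of that variation, assuming the types should simply repeat in order. build_item is a hypothetical helper, and the interactive prompts are replaced by fixed values so the example runs unattended.

```python
import itertools
import json
import random


def build_item(attribute_type, relation=None):
    """Build one item dict with a single itemAttributes<type> record."""
    item = {
        "itemId": "I",
        "itemType": "T",
        "itemAttributes" + attribute_type: {
            "attribute" + attribute_type: "IT" + attribute_type
        },
        "itemLocation": {"itemDetail": "Location"},
    }
    if relation:  # itemRelations is optional (cardinality 0-1)
        item["itemRelations"] = {"itemDetail": relation}
    return item


data = {"messageID": random.randint(0, 2147483647), "messageType": "messageType"}
# Take five types in order: A, B, C, A, B.
types = itertools.islice(itertools.cycle("ABC"), 5)
data["item"] = [build_item(t, relation="D") for t in types]

print(json.dumps(data, indent=4))
```

Building each item completely before appending it avoids indexing into data['item'] at all, which is where the original KeyError came from.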
Another note: if you want messageID to be truly unique, then you should probably look into a UUID; otherwise you may end up with message IDs that collide.
import uuid
unique_id = str(uuid.uuid4())
print(unique_id)

Metadata in jsonlite - R

I have a question regarding metadata in a JSON file using R. I have a data frame in R and I am using the function jsonlite::toJSON to convert it to a JSON file.
However, I would like to add some metadata to the JSON file, so that my JSON output looks like this:
{
"metadata" :{
"status": "active",
"msg": "my_message"
},
"data" :{
"id": 1001,
"name": "Bob"
}
}
Let me know how I can make it happen !
Thanks.
You could do something like
df <- data.frame(id = 1001, name = 'Bob')
meta <- data.frame(status = 'active', msg = 'my_msg')
jsonlite::toJSON(list('metadata'=meta, 'data'=df), pretty = T)
which yields
{
"metadata": [
{
"status": "active",
"msg": "my_msg"
}
],
"data": [
{
"id": 1001,
"name": "Bob"
}
]
}
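The key idea is to build a list containing the metadata and the data.
Update due to comment: to get metadata as a plain object rather than a one-element array, make it a list and set auto_unbox = T: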
df <- data.frame(id = 1001, name = 'Bob')
meta <- list(status = 'active', msg = 'my_msg')
jsonlite::toJSON(list('metadata'=meta, 'data'= df), pretty = F, auto_unbox = T)

JSON to dimensional array

I'm trying to convert this JSON:
{
"labels": ["time", "free", "used", "cached", "buffers"],
"data": [
[1478635365, 26.91797, 460.9844, 479.3906, 5.80859]
]
}
to var['time'] = 1478635365, var['free'] = 26.91797 ...
Can you help me?
Here is a dict comprehension that builds the var dict using zip():
>>> my_json = { "labels": ["time", "free", "used", "cached", "buffers"], "data": [ [ 1478635365, 26.91797, 460.9844, 479.3906, 5.80859] ] }
>>> var = {k: v for k, v in zip(my_json["labels"], my_json["data"][0])}
Now you can access the values from the var dict as:
>>> var["time"]
1478635365
>>> var["free"]
26.91797
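Since "data" is a list of rows, the same idea extends to more than one row. A short sketch, assuming every row has the same length as "labels"; the second row here is made-up sample data, and rows is a name chosen for illustration.

```python
my_json = {
    "labels": ["time", "free", "used", "cached", "buffers"],
    "data": [
        [1478635365, 26.91797, 460.9844, 479.3906, 5.80859],
        [1478635366, 26.5, 461.0, 479.0, 5.8],  # invented second row for illustration
    ],
}

# dict(zip(...)) is equivalent to the dict comprehension above.
rows = [dict(zip(my_json["labels"], values)) for values in my_json["data"]]

print(rows[0]["time"])
print(rows[1]["free"])
```

Each element of rows is then a dict keyed by label, so rows[0] matches the var dict built above.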