How to split a specific column data into a nested json data structure using python? - json

I began working with a csv data which has the following data:
I want to create a json data structure which looks like this:
{
"name": "Place",
"Details": [
{
"detail": "I",
"info": [
"Iran",
"Iraq"
]
},
{
"detail": "J",
"info": "Japan"
}
]
}
Below is my code but I m unable to split the second column as required:
import pandas as pd
path="/content/file.csv/"
data=pd.read_csv(path)
df=pd.DataFrame(data)
out=df.to_json(orient="records")
print(out)

Use custom function with splitting values in Details column:
def f(x):
out = []
splitted = x.split(',')
for x in splitted:
a, b = x.split('-')
c = b.split('/')
if len(c) == 1:
d = {'detail': a, 'info':c[0]}
else:
d = {'detail': a, 'info':c}
out.append(d)
return out
df['Details'] = df['Details'].apply(f)
print (df)
Name Details Verfied
0 Alphabet [{'detail': 'A', 'info': 'Apple'}, {'detail': ... Yes
1 Place [{'detail': 'I', 'info': ['Iran', 'Iraq']}, {'... Yes
out=df[['Name','Details']].to_json(orient="records")
print(out)
[{"Name":"Alphabet","Details":[{"detail":"A","info":"Apple"},
{"detail":"B","info":"Ball"}]},
{"Name":"Place","Details":[{"detail":"I","info":["Iran","Iraq"]},
{"detail":"J","info":"Japan"}]}]

Related

Combine pandas dataframe and additional value into a JSON file

I have created a function to output a JSON file into which I input a pandas data frame and a variable.
Dataframe (df) has 3 columns: ' a', 'b', and 'c'.
df = pd.DataFrame([[1,2,3], [0.1, 0.2, 0.3]], columns=('a','b','c'))
Error is a variable with a float value.
Error = 45
The output format of the JSON file should look like this:
{
"Error": 45,
"Data": [
{
"a": [1, 0.1]
},
{
"b": [2, 0.2]
},
{
"c": [3, 0.3]
},
]
}
I can convert the dataframe into a JSON using the below code. But how can I obtain the desired format of the JSON file?
def OutputFunction(df, Error):
#json_output = df_ViolationSummary.to_json(orient = 'records')
df.to_json(r'C:\Users\Administrator\Downloads\export_dataframe.json', orient = 'records')
## Calling the Function
OutputFunction(df, Error)
calling to_dict(orient='list') will return a dictionary object with each key representing the column and value as the column values in a list. Then you can achieve your desired json object like this: output = {"Error":Error, "Data": df.to_dict(orient='list')}.
Running this line will return:
{'Error': 45, 'Data': {'a': [1.0, 0.1], 'b': [2.0, 0.2], 'c': [3.0, 0.3]}}
Note that the integer values will be in a float format since some of the values in the dataframe are floats, so the the columns' data types become float. If you really wish to have mixed types, you could use some form of mapping/dictionary comprehension as the following, although it should not be necessary for most cases:
output = {
"Error":Error,
"Data": {
col: [
int(v) if v%1 == 0 else v
for v in vals
]
for col,vals in df.to_dict(orient='list').items()
}
}

Parsing nested JSON and collecting data in a list

I am trying to parse a nested JSON and trying to collect data into a list under some condition.
Input JSON as below:
[
{
"name": "Thomas",
"place": "USA",
"items": [
{"item_name":"This is a book shelf", "level":1},
{"item_name":"Introduction", "level":1},
{"item_name":"parts", "level":2},
{"item_name":"market place", "level":3},
{"item_name":"books", "level":1},
{"item_name":"pens", "level":1},
{"item_name":"pencils", "level":1}
],
"descriptions": [
{"item_name": "Books"}
]
},
{
"name": "Samy",
"place": "UK",
"items": [
{"item_name":"This is a cat house", "level":1},
{"item_name":"Introduction", "level":1},
{"item_name":"dog house", "level":3},
{"item_name":"cat house", "level":1},
{"item_name":"cat food", "level":2},
{"item_name":"cat name", "level":1},
{"item_name":"Samy", "level":2}
],
"descriptions": [
{"item_name": "cat"}
]
}
]
I am reading json as below:
with open('test.json', 'r', encoding='utf8') as fp:
data = json.load(fp)
for i in data:
if i['name'] == "Thomas":
#collect "item_name", values in a list (my_list) if "level":1
#my_list = []
Expected output:
my_list = ["This is a book shelf", "Introduction", "books", "pens", "pencils"]
Since it's a nested complex JSON, I am not able to collect the data into a list as I mentioned above. Please let me know no how to collect the data from the nested JSON.
Try:
import json
with open("test.json", "r", encoding="utf8") as fp:
data = json.load(fp)
my_list = [
i["item_name"]
for d in data
for i in d["items"]
if d["name"] == "Thomas" and i["level"] == 1
]
print(my_list)
This prints:
['This is a book shelf', 'Introduction', 'books', 'pens', 'pencils']
Or without list comprehension:
my_list = []
for d in data:
if d["name"] != "Thomas":
continue
for i in d["items"]:
if i["level"] == 1:
my_list.append(i["item_name"])
print(my_list)
Once we have the data we iterate over the outermost list of objects.
We check if the object has the name equals to "Thomas" if true then we apply filter method with a lambda function on items list with a condition of level == 1
This gives us a list of item objects who have level = 1
In order to extract the item_name we use a comprehension so the final value in the final_list will be as you have expected.
["This is a book shelf", "Introduction", "books", "pens", "pencils"]
import json
def get_final_list():
with open('test.json', 'r', encoding='utf8') as fp:
data = json.load(fp)
final_list = []
for obj in data:
if obj.get("name") == "Thomas":
x = list(filter(lambda item: item['level'] == 1, obj.get("items")))
final_list = final_list + x
final_list = [i.get("item_name") for i in final_list]
return final_list

Join nested JSON dataframe and another dataframe

I am trying to join a dataframe1 generated by the JSON with dataframe2 using the field order_id, then assign the "status" from dataframe2 to the "status" of dataframe1. Anyone knows how to do this. Many thanks for your help.
dataframe1
[{
"client_id": 1,
"name": "Test01",
"olist": [{
"order_id": 10000,
"order_dt_tm": "2012-12-01",
"status": "" <== use "status" from dataframe2 to populate this field
},
{
"order_id": 10000,
"order_dt_tm": "2012-12-01",
"status": ""
}
]
},
{
"client_id": 2,
"name": "Test02",
"olist": [{
"order_id": 10002,
"order_dt_tm": "2012-12-01",
"status": ""
},
{
"order_id": 10003,
"order_dt_tm": "2012-12-01",
"status": ""
}
]
}
]
dataframe2
order_id status
10002 "Delivered"
10001 "Ordered"
Here is your raw dataset as a json string:
d = """[{
"client_id": 1,
"name": "Test01",
"olist": [{
"order_id": 10000,
"order_dt_tm": "2012-12-01",
"status": ""
},
{
"order_id": 10000,
"order_dt_tm": "2012-12-01",
"status": ""
}
]
},
{
"client_id": 2,
"name": "Test02",
"olist": [{
"order_id": 10002,
"order_dt_tm": "2012-12-01",
"status": ""
},
{
"order_id": 10003,
"order_dt_tm": "2012-12-01",
"status": ""
}
]
}
]"""
Firstly, I would load it as json:
import json
data = json.loads(d)
Then, I would turn it into a Pandas dataframe, notice that I remove status field as it will be populated by the join step :
df1 = pd.json_normalize(data, 'olist')[['order_id', 'order_dt_tm']]
Then, from the second dataframe sample, I would do a left join using merge function:
data = {'order_id':[10002, 10001],'status':['Delivered', 'Ordered']}
df2 = pd.DataFrame(data)
result = df1.merge(df2, on='order_id', how='left')
Good luck
UPDATE
# JSON to Dataframe
df1 = pd.json_normalize(data)
# Sub JSON to dataframe
df1['sub_df'] = df1['olist'].apply(lambda x: pd.json_normalize(x).drop('status', axis=1))
# Build second dataframe
data2 = {'order_id':[10002, 10001],'status':['Delivered', 'Ordered']}
df2 = pd.DataFrame(data2)
# Populates status in sub dataframes
df1['sub_df'] = df1['sub_df'].apply(lambda x: x.merge(df2, on='order_id', how='left').fillna(''))
# Sub dataframes back to JSON
def back_to_json_str(df):
# turns a df back to string json
return str(df.to_json(orient="records", indent=4))
df1['olist'] = df1['sub_df'].apply(lambda x: back_to_json_str(x))
# Global DF back to JSON string
parsed = str(df1.drop('sub_df', axis=1).to_json(orient="records", indent=4))
parsed = parsed.replace(r'\n', '\n')
parsed = parsed.replace(r'\"', '\"')
# Print result
print(parsed)
UPDATE 2
here is a way to add index colum to a dataframe:
df1['index'] = [e for e in range(df1.shape[0])]
This is my code assigning title values from a dataframe back to the JSON object. The assignment operation takes a bit time if the number records in the JSON object is 100000. Anyone knows how to improve the performance of this code. Many thanks.
import json
import random
import pandas as pd
import pydash as _
data = [{"pid":1,"name":"Test1","title":""},{"pid":2,"name":"Test2","title":""}] # 5000 records
# dataframe1
df = pd.json_normalize(data)
# dataframe2
pid = [x for x in range(1, 5000)]
title_set = ["Boss", "CEO", "CFO", "PMO", "Team Lead"]
titles = [title_set[random.randrange(0, 5)] for x in range(1, 5000)]
df2 = pd.DataFrame({'pid': pid, 'title': titles})
#left join dataframe1 and dataframe2
df3 = df.merge(df2, on='pid', how='left')
#assign title values from dataframe back to the json object
for row in df3.iterrows():
idx = _.find_index(data, lambda x: x['pid'] == row[1]['pid'])
data[idx]['title'] = row[1]['title_y']
print(data)

how to parse CSV to JSON from 2 CSV Files in Groovy

Please help with parse CSV to JSON from 2 CSV Files in groovy
For example :
CSV1:
testKey,status
Name001,PASS
Name002,PASS
Name003,FAIL
CSV2:
Kt,Pd
PT-01,Name001
PT-02,Name002
PT-03,Name003
PT-04,Name004
I want to input in "testlist" data from CSV2.val[1..-1],CSV1.val[1..-1]
Result should be like :
{
"testExecutionKey": "DEMO-303",
"info": {
"user": "admin"
},
"tests": [
{
"TestKey": "PT-01",
"status": "PASS"
},
{
"TestKey": "PT-02",
"status": "PASS"
},
{
"TestKey": "PT-03",
"status": "FAIL"
}
]
code without this modification (from only 1 csv):
import groovy.json.*
def kindaFile = '''
TestKey;Finished;user;status
Name001;PASS;
Name002;PASS;
'''.trim()
def keys
def testList = []
//parse CSV
kindaFile.splitEachLine( /;/ ){ parts ->
if( !keys )
keys = parts
else{
def test = [:]
parts.eachWithIndex{ val, ix -> test[ keys[ ix ] ] = val }
testList << test
}
}
def builder = new JsonBuilder()
def root = builder {
testExecutionKey 'DEMO-303'
info user: 'admin'
tests testList
}
println JsonOutput.prettyPrint(JsonOutput.toJson(root))
Your sample JSON doesn't match the CSV definition. It looks lile you're using fields [1..-1] from CSV 1, as you stated, but fields [0..-2] from CSV 2. As you only have 2 fields in each CSV that's the equivalent of csv1[1] and csv2[0]. The example below uses [0..-2]. Note that if you always have exactly two fields in your input files then the following code could be simplified a little. I've given a more generic solution that can cope with more fields.
Load both CSV files into lists
File csv1 = new File( 'one.csv')
File csv2 = new File( 'two.csv')
def lines1 = csv1.readLines()
def lines2 = csv2.readLines()
assert lines1.size() <= lines2.size()
Note the assert. That's there as I noticed you have 4 tests in CSV2 but only 3 in CSV1. To allow the code to work with your sample data, it iterates through through CSV1 and adds the matching data from CSV2.
Get the field names
fieldSep = /,[ ]*/
def fieldNames1 = lines1[0].split( fieldSep )
def fieldNames2 = lines1[0].split( fieldSep )
Build the testList collection
def testList = []
lines1[1..-1].eachWithIndex { csv1Line, lineNo ->
def mappedLine = [:]
def fieldsCsv1 = csv1Line.split( fieldSep )
fieldsCsv1[1..-1].eachWithIndex { value, fldNo ->
String name = fieldNames1[ fldNo + 1 ]
mappedLine[ name ] = value
}
def fieldsCsv2 = lines2[lineNo + 1].split( fieldSep )
fieldsCsv2[0..-2].eachWithIndex { value, fldNo ->
String name = fieldNames2[ fldNo ]
mappedLine[ name ] = value
}
testList << mappedLine
}
Parsing
You can now parse the list of maps with your existing code. I've made a change to the way the JSON string is displayed though.
def builder = new JsonBuilder()
def root = builder {
testExecutionKey 'DEMO-303'
info user: 'admin'
tests testList
}
println builder.toPrettyString()
JSON Output
Running the above code, using your CSV1 and CSV 2 data, gives the JSON that you desire.
for CSV1:
testKey,status
Name001,PASS
Name002,PASS
Name003,FAIL
and CSV2:
Kt,Pd
PT-01,Name007
PT-02,Name001
PT-03,Name003
PT-05,Name002
PT-06,Name004
PT-07,Name006
result is:
{
"testExecutionKey": "DEMO-303",
"info": {
"user": "admin"
},
"tests": [
{
"status": "PASS",
"testKey": "PT-01"
},
{
"status": "PASS",
"testKey": "PT-02"
},
{
"status": "FAIL",
"testKey": "PT-03"
}
]
}
but I need exactly the same values for testKey (testKey from CSV1=Kt from CSV2)
{
"testExecutionKey": "DEMO-303",
"info": {
"user": "admin"
},
"tests": [
{
"testKey": "PT-02",
"status": "PASS"
},
{
"testKey": "PT-05",
"status": "PASS"
},
{
"testKey": "PT-03",
"status": "FAIL"
}
]
}

Metadata in jsonlite - R

I have a questions regarding metadata in a JSON file using R. I have a dataframe in R and I am using the function jsonlite::toJSON to convert it to a JSON file.
However, I would like to add some metadata to the JSON file. Basically to have my JSON output looking like that?
{
"metadata" :{
"status": "active",
"msg": "my_message"
},
"data" :{
"id": 1001,
"name": "Bob"
}
}
Let me know how I can make it happen !
Thanks.
You could do something like
df <- data.frame(id = 1001, name = 'Bob')
meta <- data.frame(status = 'active', msg = 'my_msg')
jsonlite::toJSON(list('metadata'=meta, 'data'=df), pretty = T)
which yields
{
"metadata": [
{
"status": "active",
"msg": "my_msg"
}
],
"data": [
{
"id": 1001,
"name": "Bob"
}
]
}
The key idea is to make up a list of metadata and data.
Update due to comment:
df <- data.frame(id = 1001, name = 'Bob')
meta <- list(status = 'active', msg = 'my_msg')
jsonlite::toJSON(list('metadata'=meta, 'data'= df), pretty = F, auto_unbox = T)