How to flatten 3-layer nested JSON into a DataFrame? - json

For example, the JSON is:
{
  "samples": [
    {
      "sample_id": "A2434",
      "start": "1664729482",
      "end": "1664729482",
      "parts": [
        {
          "name": "123",
          "start": "1664736682",
          "end": "1618688700",
          "fail": ""
        }
      ]
    }
  ]
}
I want the DataFrame columns to look like this:
sample_id,start,end,parts.name,parts.start,parts.end,parts.fail

Using pd.json_normalize (note there is no need to drop parts.name, since the desired output includes it):
df = pd.json_normalize(
    data=data["samples"],
    record_path="parts",
    record_prefix="parts.",
    meta=["sample_id", "start", "end"],
)
print(df)
  parts.name parts.start   parts.end parts.fail sample_id       start         end
0        123  1664736682  1618688700                 A2434  1664729482  1664729482

Use pd.read_json() to unpack the JSON into a DataFrame. You can then use pd.json_normalize() as required on the generated columns to pull the more deeply nested data out.
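A minimal sketch of that approach, assuming the JSON above is saved in a file named sample.json (a hypothetical filename):

import pandas as pd

# read_json yields one row per entry in "samples",
# each row holding a still-nested dict
raw = pd.read_json("sample.json")

# normalize the nested dicts out of the generated column
df = pd.json_normalize(
    raw["samples"].tolist(),
    record_path="parts",
    record_prefix="parts.",
    meta=["sample_id", "start", "end"],
)
print(df)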

You can use df.explode to put each list item on its own row, and apply(pd.Series) to expand the key-value pairs of each dictionary into columns:
df = pd.DataFrame(data['samples'])
df[['parts.name', 'parts.start', 'parts.end', 'parts.fail']] = df['parts'].explode().apply(pd.Series)
df.drop('parts', axis=1, inplace=True)
Output:
  sample_id       start         end parts.name parts.start   parts.end parts.fail
0     A2434  1664729482  1664729482        123  1664736682  1618688700

Related

Extracting multiple values having the same path in JSON using a JSON map in SAS

Can anyone help me get multiple values that share the same path in a JSON using a JSON map? Any help is appreciated. Thank you.
JSON
{
  "totalCount": 2,
  "facets": {},
  "content": [
    [
      {
        "name": "customer_ID",
        "value": "1"
      },
      {
        "name": "customer_name",
        "value": "John"
      }
    ]
  ]
}
JSON MAP
{
  "DATASETS": [
    {
      "DSNAME": "customers",
      "TABLEPATH": "/root/content",
      "VARIABLES": [
        {
          "NAME": "name",
          "TYPE": "CHARACTER",
          "PATH": "/root/content/name" /* output as customer_ID */
        },
        {
          "NAME": "name",
          "TYPE": "CHARACTER",
          "PATH": "/root/content/name" /* output as customer_name */
        },
        {
          "NAME": "value",
          "TYPE": "CHARACTER",
          "PATH": "/root/content/value" /* output as 1 */
        },
        {
          "NAME": "value",
          "TYPE": "CHARACTER",
          "PATH": "/root/content/value" /* output as John */
        }
      ]
    }
  ]
}
When I use the above JSON map I get the output for name as only "customer_name", but I need both "customer_ID" and "customer_name" in the output.
Similarly, I need both values of "value".
JSON is a hierarchy of name-value pairs. The JSON engine in SAS will take the "name" and assign as a variable name, and then populate with the value. In your JSON, there are two sets of name-values, one being the name of an intended variable, and another being its value. This is a common output scheme we find in GraphQL responses -- and these require a little manipulation to turn into 2-D data sets.
For your example, you could use PROC TRANSPOSE:
libname j json fileref=test;

proc transpose data=j.content out=want;
  id name;
  var value;
run;
Output:
Obs    _NAME_    customer_ID    customer_name
  1    value               1    John
You can also do more manual seek/assignment by using a DATA step to process what you see in the ALLDATA member of the JSON libname. In your example, SAS sees that as:
Obs  P  P1          P2     V  Value
  1  1  totalCount         1  2
  2  1  facets             0
  3  1  content            0
  4  1  content            0
  5  2  content     name   1  customer_ID
  6  2  content     value  1  1
  7  1  content            0
  8  2  content     name   1  customer_name
  9  2  content     value  1  John
Processing the ALLDATA member is not as friendly as using the relational data that the JSON engine can create, but I find with GraphQL responses that's what you need to do to get more control over the name, length, and type/format for output variables.
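For readers coming from the pandas questions above, the same name/value pivot that PROC TRANSPOSE performs can be sketched in Python; this is only an analogue, not the SAS method, and the content table below is a hypothetical recreation of what the JSON engine exposes as j.content:

import pandas as pd

# the CONTENT member roughly as the SAS JSON engine exposes it
content = pd.DataFrame({
    "name": ["customer_ID", "customer_name"],
    "value": ["1", "John"],
})

# pivot: values in "name" become columns, as with `id name;` above
want = content.set_index("name").T
print(want)
# name  customer_ID customer_name
# value           1          John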

Groovy JSON - get array element size and parse with element count

My JSON file:
{
  "Apps": [
    {
      "ClientName": "abc",
      "Version": "1.0"
    },
    {
      "ClientName": "b",
      "Version": "1.0"
    }
  ]
}
The JSON file above contains a single array whose elements are name-value pairs.
I want to count how many elements are in the Apps array using Groovy.
Based on the count I'm going to loop over the elements and parse them one by one.
In order to process elements one by one you don't need a count. It can be accomplished using standard iteration methods:
def txt = '{ "Apps": [ { "ClientName": "abc", "Version": "1.0" }, { "ClientName": "b", "Version": "1.0" } ] }'
def json = new groovy.json.JsonSlurper().parseText txt
println "count = ${json.Apps.size()}"
json.Apps.each this.&println
prints:
count = 2
[ClientName:abc, Version:1.0]
[ClientName:b, Version:1.0]

Exclude column header when writing DataFrame to json

I have the following DataFrame, df1:
SomeJson
=================
[{
  "Number": "1234",
  "Color": "blue",
  "size": "Medium"
}, {
  "Number": "2222",
  "Color": "red",
  "size": "Small"
}]
and I am trying to write just the contents of this column to blob storage as JSON.
df1.select("SomeJson")
.write
.option("header", false)
.mode("append")
.json(blobStorageOutput)
This code works, but it creates the following JSON in blob storage:
{
  "SomeJson": [{
    "Number": "1234",
    "Color": "blue",
    "size": "Medium"
  }, {
    "Number": "2222",
    "Color": "red",
    "size": "Small"
  }]
}
But I just want the contents of the column, not the column header; I don't want the "SomeJson" key in my final JSON. Any suggestions?
If you don't want the DataFrame column name to be included, write your DataFrame as text rather than JSON. It will only write the contents of your column.
df1.select("SomeJson")
.write
.option("header", false)
.mode("append")
.text(blobStorageOutput)
An additional scenario for this question: when we derive the JSON structure itself from the dataset, we run into the same header situation as here. We can follow the approach below.
spark.sql("SELECT COLLECT_SET(STRUCT(<field_name>)) AS `` FROM <table_name> LIMIT 1").coalesce(1).write.format("org.apache.spark.sql.json").mode("overwrite").save(<Blob Path1/ ADLS Path1>)
The output will look like:
{"":[{<field_name>:<field_value>}]}
Here the empty header can be removed with the following three lines (assuming no tilde character appears in the data):
jsonToCsvDF=spark.read.format("com.databricks.spark.csv").option("delimiter", "~").load(<Blob Path1/ ADLS Path1>)
jsonToCsvDF.createOrReplaceTempView("json_to_csv")
spark.sql("SELECT SUBSTR(`_c0`,5,length(`_c0`)-5) FROM json_to_csv").coalesce(1).write.option("header",false).mode("overwrite").text(<Blob Path2/ ADLS Path2>)

How to retrieve all "name" & "id" values from a JSON file in Python 3

How can I retrieve all name and id values from a JSON file? This is a short version of my JSON file. I want to retrieve all names and ids so that I can match them against my variable and then trigger some work on it. So please help me retrieve all the ids and names. I searched on Google but couldn't find anything; every example was about a single JSON object.
[
  {
    "id": 707860,
    "name": "Hurzuf",
    "country": "UA",
    "coord": {
      "lon": 34.283333,
      "lat": 44.549999
    }
  },
  {
    "id": 519188,
    "name": "Novinki",
    "country": "RU",
    "coord": {
      "lon": 37.666668,
      "lat": 55.683334
    }
  },
  {
    "id": 1283378,
    "name": "Gorkhā",
    "country": "NP",
    "coord": {
      "lon": 84.633331,
      "lat": 28
    }
  }
]
Here's My Code:
import json

with open('city.list.json') as f:
    data = json.load(f)

for p_id in data:
    hay = p_id.get('name')
Suppose I got the word "delhi" and I am comparing it with the names in the dictionaries above. When it matches, I want to retrieve its id:
    if hay == 'delhi':
        ga = p_id.get('id')  # retrieve delhi's id
You need to check for name and apply a condition:
for p_id in data:
    u_id = p_id.get('id')
    u_name = p_id.get('name')
    if u_id == 1283378 and u_name == "Gorkhā":
        pass  # do something
Not sure exactly what output you want, but this extracts each id and name into a new variable.
ids = []
for p_id in data:
    ids.append((p_id['id'], p_id['name']))
print(ids)
Output:
[(707860, 'Hurzuf'), (519188, 'Novinki'), (1283378, 'Gorkhā')]
I would suggest a different approach: process the JSON data into a dict and get the information you want from that. For example:
import json

with open('city.list.json') as f:
    data = json.load(f)

name_by_id = dict([(str(p['id']), p['name']) for p in data])
id_by_name = dict([(p['name'], p['id']) for p in data])
And the results:
>>> print(id_by_name['Hurzuf'])
707860
>>> print(name_by_id['519188'])
Novinki
import json

with open('citylist.json') as f:
    data = json.load(f)

list1 = list(p_id.get('id') for p_id in data if p_id.get('name') == "Novinki")
# you could put this in a print statement, but since the goal is to save
# and not just print, you can store it in a variable
print(*list1, sep="\n")
gives
519188
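Tying this back to the original goal, a minimal sketch that combines the lookup-dict idea above with the asker's "delhi" example (hypothetical here, since Delhi does not appear in the sample data):

import json

with open('city.list.json') as f:
    data = json.load(f)

# build a name -> id lookup once, then query it as needed
id_by_name = {p['name']: p['id'] for p in data}

ga = id_by_name.get('Delhi')  # None when 'Delhi' is not in the file
print(ga)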

Parsing and cleaning text file in Python?

I have a text file which contains raw data. I want to parse that data and clean it so that it can be used further. The following is the raw data:
"{\x0A \x22identifier\x22: {\x0A \x22company_code\x22: \x22TSC\x22,\x0A \x22product_type\x22: \x22airtime-ctg\x22,\x0A \x22host_type\x22: \x22android\x22\x0A },\x0A \x22id\x22: {\x0A \x22type\x22: \x22guest\x22,\x0A \x22group\x22: \x22guest\x22,\x0A \x22uuid\x22: \x221a0d4d6e-0c00-11e7-a16f-0242ac110002\x22,\x0A \x22device_id\x22: \x22423e49efa4b8b013\x22\x0A },\x0A \x22stats\x22: [\x0A {\x0A \x22timestamp\x22: \x222017-03-22T03:21:11+0000\x22,\x0A \x22software_id\x22: \x22A-ACTG\x22,\x0A \x22action_id\x22: \x22open_app\x22,\x0A \x22values\x22: {\x0A \x22device_id\x22: \x22423e49efa4b8b013\x22,\x0A \x22language\x22: \x22en\x22\x0A }\x0A }\x0A ]\x0A}"
I want to remove all the hexadecimal escape sequences. I tried parsing the data, storing it in an array, and cleaning it using re.sub(), but it gives back the same data.
for line in f:
    new_data = re.sub(r'[^\x00-\x7f],\x22', r'', line)
    data.append(new_data)
\x0A is the hex code for newline. After s = <your json string>, print(s) gives
>>> print(s)
{
  "identifier": {
    "company_code": "TSC",
    "product_type": "airtime-ctg",
    "host_type": "android"
  },
  "id": {
    "type": "guest",
    "group": "guest",
    "uuid": "1a0d4d6e-0c00-11e7-a16f-0242ac110002",
    "device_id": "423e49efa4b8b013"
  },
  "stats": [
    {
      "timestamp": "2017-03-22T03:21:11+0000",
      "software_id": "A-ACTG",
      "action_id": "open_app",
      "values": {
        "device_id": "423e49efa4b8b013",
        "language": "en"
      }
    }
  ]
}
You should parse this with the json module's load (from file) or loads (from string) functions. You will get a dict containing two dicts and a list with a dict.
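A minimal sketch, assuming s holds the decoded text shown above:

import json

# the \x0A / \x22 escapes are just newlines and double quotes,
# so the decoded string is already valid JSON
parsed = json.loads(s)

print(parsed["identifier"]["company_code"])  # TSC
print(parsed["stats"][0]["action_id"])       # open_app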