Python - Converted Nested JSON with Nested Columns - json

I am new to Python and JSON data structures and was looking for some assistance
I have been able to create some Python code that calls a Web API and converts the returning JSON data (report_rows) into a dataframe successfully using json_normalize()
I am having some issues converting and sorting the JSON column names into the dataframe column names and was wondering if I could get some help on the following...
Get Column Names from JSON data - In the dataframe I would like to convert the column names: c1, c2, c3, etc to RECORD_NO, REF_RECORD_NO, SOV_LINEITEM_NO. The column names are in the JSON data [data][report_header][cXX][name] where cXX is the column number
Sort Column Names - I would like to order the dataframe columns so instead of c1, c10, c11, c12, c2, c3, etc it is c1, c2, c3 ... c10, c11,c12
If someone is able to provide some help, it would be greatly appreciated
Thanks in advance
Python Code
json_data = json.loads(res.read())
data = pd.json_normalize(json_data['data'], record_path=['report_row'])
print(data)
which outputs the following
c1 c10 c11 ... c7 c8 c9
0 CON-0000001 71 VEN-0000001 ... Build IT System Contract 123 Pending
1 CON-0000002 72 VEN-0000002 ... Build IT System Contract XYZ Approved
JSON Data
"data": [
{
"report_header": {
"c11": {
"name": "VENDOR_RECORD",
"type": "java.lang.String"
},
"c10": {
"name": "VENDOR_ID",
"type": "java.lang.Integer"
},
"c12": {
"name": "VENDOR_NAME",
"type": "java.lang.String"
},
"c1": {
"name": "RECORD_NO",
"type": "java.lang.String"
},
"c2": {
"name": "REF_RECORD_NO",
"type": "java.lang.String"
},
"c3": {
"name": "SOV_LINEITEM_NO",
"type": "java.lang.String"
},
"c4": {
"name": "REF_ITEM",
"type": "java.lang.String"
},
"c5": {
"name": "PROJECTNUMBER",
"type": "java.lang.String"
},
"c6": {
"name": "PROJECTNAME",
"type": "java.lang.String"
},
"c7": {
"name": "TITLE",
"type": "java.lang.String"
},
"c8": {
"name": "CONTRACT_NO",
"type": "java.lang.String"
},
"c9": {
"name": "STATUS",
"type": "java.lang.String"
}
},
"report_row": [
{
"c1": "CON-0000001",
"c10": "71 ",
"c11": "VEN-0000001",
"c12": "Microsoft",
"c2": "",
"c3": "1",
"c4": "",
"c5": "P-0037",
"c6": "Project ABC",
"c7": "Build IT System",
"c8": "Contract 123",
"c9": "Pending"
},
{
"c1": "CON-0000002",
"c10": "72 ",
"c11": "VEN-0000002",
"c12": "Google",
"c2": "",
"c3": "1.1",
"c4": "",
"c5": "P-0037",
"c6": "Project ABC",
"c7": "Build IT System",
"c8": "Contract XYZ",
"c9": "Approved"
}
]
}
],
"message": [
"OK"
],
"status": 200
}

i was able to resolve the issue by adding the following code...
# Get the number of fields/columns in the JSON data
number_of_fields = len((json_data['data'][0]['report_header']))
reorder_columns = []
new_column_names = []
field_index = 0
# Loop through the Columns and do the following...
# reorder_columns - this is the column order that i want: c1, c2, c3 ... c10, c11, c12
# new_column_name - this will retrieve the column names from the header: c1.name, c2.name, etc
while field_index < number_of_fields:
field_index += 1
new_column = "c" + str(field_index)
reorder_columns.append(new_column)
column_header = new_column + '.name'
new_column_name = header.iloc[0][new_column + '.name']
new_column_names.append(new_column_name)
data = pd.json_normalize(json_data['data'], record_path=['report_row'])
data = data.reindex(columns=reorder_columns)
data.columns = new_column_names

Related

How to get required json output from complex nested json format.?

My original file is in CSV format which I have converted to python JSON array to JSON Sring.
jsonfile
<class 'list'>
<class 'dict'>
[
{
"key": "timestamp",
"source": "eia007",
"turnover": "65million",
"url": "abc.com",
"record": "",
"loc.reg": "nord000",
"loc.count": "abs39i5",
"loc.town": "cold54",
"co.gdp": "nscrt77",
"co.pop.min": "min50",
"co.pop.max": "max75",
"co.rev": "",
"chain.system": "5t5t5",
"chain.type": "765ef",
"chain.strat": "",
}
]
I would like to get the output as below:
{
"timestamp001": {
"key": "timestamp001",
"phNo": "ner007",
"turnover": "65million",
"url": "abc.com",
"record": "",
"loc": {
"reg": "nord000",
"count": "abs39i5",
"town": "cold54"
},
"co": {
"form": "nscrt77",
"pop": {
"min": "min50",
"max": "max75"
},
"rev: ""
},
"chain":{
"system": "5t5t5",
"type": "765ef",
"strat": ""
}
...
}
...
}
]
I have tried different options; tried to enumerate, but cannot get the required output. Please help me with this. Thanks in advance.
You can use something like this to create the nested dict:
import json
def unflatten(somedict):
unflattened = {}
for key, value in somedict.items():
splitkey = key.split(".")
print(f"doing {key} {value} {splitkey}")
# subdict is the dict that goes deeper in the nested structure
subdict = unflattened
for subkey in splitkey[:-1]:
# if this is the first time we see this key, add it
if subkey not in subdict:
subdict[subkey] = {}
# shift the subdict a level deeper
subdict = subdict[subkey]
# add the value
subdict[splitkey[-1]] = value
return unflattened
data = {
"key": "timestamp",
"source": "eia007",
"turnover": "65million",
"url": "abc.com",
"record": "",
"loc.reg": "nord000",
"loc.count": "abs39i5",
"loc.town": "cold54",
"co.gdp": "nscrt77",
"co.pop.min": "min50",
"co.pop.max": "max75",
"co.rev": "",
"chain.system": "5t5t5",
"chain.type": "765ef",
"chain.strat": "",
}
unflattened = unflatten(data)
print(json.dumps(unflattened, indent=4))
Which produces:
{
"key": "timestamp",
"source": "eia007",
"turnover": "65million",
"url": "abc.com",
"record": "",
"loc": {
"reg": "nord000",
"count": "abs39i5",
"town": "cold54"
},
"co": {
"gdp": "nscrt77",
"pop": {
"min": "min50",
"max": "max75"
},
"rev": ""
},
"chain": {
"system": "5t5t5",
"type": "765ef",
"strat": ""
}
}
Cheers!

Snowflake - Querying Nested JSON

I need some help querying this JSON file I've ingested into a temp table in Snowflake. So, I've created a JSON_DATA variant column and plan to query and do a COPY INTO another table, but my query isn't working yet... I feel I'm close (possibly?)
JSON layout:
{
"nextPage": "01",
"page": "0",
"status": "ok",
"transactions": [
{
"id": "65985",
"recordTp": "vendorbill",
"values": {
"account": [
{
"text": "14500 Deferred Expenses",
"value": "249"
}
],
"account.number": "1450",
"account.type": [
{
"text": "Deferred Expense",
"value": "DeferExpense"
}
],
"amount": "51733",
"classnohierarchy": [
{
"text": "901 Corporate",
"value": "139"
}
],
"currency": [
{
"text": "Canadian Dollar",
"value": "3"
}
],
"customer.altname": "V Sties expenses (Tor)",
"customer.custate": "12/31/2019",
"customer.custentient": "ada Inc.",
"customer.custendate": "1/1/2019",
"customer.entyid": "PR781",
"departmentnohierarchy": [
{
"text": "8rity",
"value": "37"
}
],
"fxamount": "689",
"location": [
{
"text": "Othad Projects",
"value": "48"
}
],
"postingperiod": [
{
"text": "Jan 2020",
"value": "1"
}
],
"subsidiary.custrecord_region": [
{
"text": "CANADA",
"value": "3"
}
],
"subsidiarynohierarchy": [
{
"text": "ada Inc.",
"value": "25"
}
]
}
},
I've been able to query the values that are not (deeply) nested but I need help getting, for example, the values from 'classnohierarchy', to get both the 'text' and 'value' I tried:
transactions.value:"values".classnohierarchy.text::string as class_txt,
transactions.value:"values".classnohierarchy.value::string as class_val,
but it's returning NULL values.
Below is my entire query:
SELECT
JSON_DATA:status::string as connection_status,
transactions.value:id::string as id,
transactions.value:recordType::string as record_type,
transactions.value:"values"::variant as trans_val,
transactions.value:"values".account as acc,
transactions.value:"values".account.text as text,
transactions.value:"values".account.value as val,
transactions.value:"values"."account.number"::string as acc_num,
transactions.value:"values"."account.type".text::string as acc_type_txt,
transactions.value:"values"."account.type".value::string as acc_type_val,
transactions.value:"values".amount::string as amount,
**transactions.value:"values".classnohierarchy.text::string as class_txt,
transactions.value:"values".classnohierarchy.value::string as class_val,**
transactions.value:"values".currency.text::string as currency_text,
transactions.value:"values".currency.value::string as currency_val,
transactions.value:"values"."customer.altname"::string as customer_project_name,
transactions.value:"values"."customer.custate"::string as customer_end_date,
transactions.value:"values"."customer.custentient"::string as customer_end_client,
transactions.value:"values"."customer.custendate"::string as customer_start_date,
transactions.value:"values"."customer.entyid"::string as customer_project_id,
transactions.value:"values".departmentnohierarchy.text::string as department_name,
transactions.value:"values".departmentnohierarchy.value::string as department_value,
transactions.value:"values".fxamount::string as fx_amount,
transactions.value:"values".location.text::string as product_name,
transactions.value:"values".postingperiod.text::string as postingperiod,
transactions.value:"values".postingperiod.value::string as postingperiod,
transactions.value:"values"."subsidiary.custrecord_region".text::string as region_name,
transactions.value:"values"."subsidiary.custrecord_region".value::string as region_value,
transactions.value:"values".subsidiarynohierarchy.text::string as entity_name,
transactions.value:"values".subsidiarynohierarchy.value::string as entity_value,
FROM MY_TABLE,
LATERAL FLATTEN (JSON_DATA:transactions) as transactions
and here's a picture of whats showing in Snowflake:
SNOWFLAKE_SCREENSHOT
departmentnohierarchy is an array. you need to mention the index as below.
select *,transactions.VALUE:"values".departmentnohierarchy[0].value::text as department_name
FROM jsont1,
LATERAL FLATTEN (JSON_DATA:transactions) as transactions

Postgres JSON Query on Unknown Keys

I have a jsonb column where the structure always remains the same, but the keys within the json may change. For example,
{
"key-12345":
{
"values-12345": [
{"type": 5200,
"source": "somesource",
"messageid": 707643203507,
"timestamp": "2018-07-26T21:25:42.612Z",
"destination": "somedestination",
"previouslyRouted": false
},
{"type": 5200,
"source": "anothersource",
"messageid": 707643203507,
"timestamp": "2018-07-26T21:26:01.542Z",
"destination": "anotherdestination",
"previouslyRouted": false
}
]
},
"key-6789":
{
"values-34512": [
{"type": 5200,
"source": "yetantohersomesource",
"messageid": 707643203507,
"timestamp": "2018-07-26T21:25:42.612Z",
"destination": "yetanothersomedestination",
"previouslyRouted": false
},
{"type": 5200,
"source": "anothersource",
"messageid": 707643203507,
"timestamp": "2018-07-26T21:26:01.542Z",
"destination": "anotherdestination",
"previouslyRouted": false
}
]
}
}
I know that the structure of that document will be the same, but the keys could be anything. I can pull the keys themselves out with
select jsonb_object_keys(column) from table;
easily enough but I don't know how to pull the object assigned to that key and deal with it. How do I select from a jsonb object based on the value of a key, rather than the values.
select object from document where json_key = 'key-12345';
This is how it's done:
SELECT object->>'key-12345' FROM document;
and with a where clause:
SELECT * FROM document WHERE object->>'key-12345' IS NOT NULL;

AWS Athena - Querying JSON - Searching for Values

I have nested JSON files on S3 and am trying to query them with Athena.
However, I am having problems to query the nested JSON values.
My JSON file looks like this:
{
"id": "17842007980192959",
"acount_id": "17841401243773780",
"stats": [
{
"name": "engagement",
"period": "lifetime",
"values": [
{
"value": 374
}
],
"title": "Engagement",
"description": "Total number of likes and comments on the media object",
"id": "17842007980192959/insights/engagement/lifetime"
},
{
"name": "impressions",
"period": "lifetime",
"values": [
{
"value": 11125
}
],
"title": "Impressions",
"description": "Total number of times the media object has been seen",
"id": "17842007980192959/insights/impressions/lifetime"
},
{
"name": "reach",
"period": "lifetime",
"values": [
{
"value": 8223
}
],
"title": "Reach",
"description": "Total number of unique accounts that have seen the media object",
"id": "17842007980192959/insights/reach/lifetime"
},
{
"name": "saved",
"period": "lifetime",
"values": [
{
"value": 0
}
],
"title": "Saved",
"description": "Total number of unique accounts that have saved the media object",
"id": "17842007980192959/insights/saved/lifetime"
}
],
"import_date": "2017-12-04"
}
What I'm trying to do is to query the "stats" field value where name=impressions.
So ideally something like:
SELECT id, account_id, stats.values.value WHERE stats.name='engagement'
AWS example: https://docs.aws.amazon.com/athena/latest/ug/searching-for-values.html
Any help would be appreciated.
You can query the JSON with the following table definition:
CREATE EXTERNAL TABLE test(
id string,
acount_id string,
stats array<
struct<
name:string,
period:string,
values:array<
struct<value:string>>,
title:string
>
>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://bucket/';
Now, the value column is available through the following unnesting:
select id, acount_id, stat.name,x.value
from test
cross join UNNEST(test.stats) as st(stat)
cross join UNNEST(stat."values") as valx(x)
WHERE stat.name='engagement';

Search JSON for multiple values, not using a library

I'd like to be able to search the following JSON object for objects containing the key 'location' then get in return an array or json object with the 'name' of the person plus the value of location for that person.
Sample return:
var matchesFound = [{Tom Brady, New York}, {Donald Steven,Los Angeles}];
var fbData0 = {
"data": [
{
"id": "X999_Y999",
"location": "New York",
"from": {
"name": "Tom Brady", "id": "X12"
},
"message": "Looking forward to 2010!",
"actions": [
{
"name": "Comment",
"link": "http://www.facebook.com/X999/posts/Y999"
},
{
"name": "Like",
"link": "http://www.facebook.com/X999/posts/Y999"
}
],
"type": "status",
"created_time": "2010-08-02T21:27:44+0000",
"updated_time": "2010-08-02T21:27:44+0000"
},
{
"id": "X998_Y998",
"location": "Los Angeles",
"from": {
"name": "Donald Steven", "id": "X18"
},
"message": "Where's my contract?",
"actions": [
{
"name": "Comment",
"link": "http://www.facebook.com/X998/posts/Y998"
},
{
"name": "Like",
"link": "http://www.facebook.com/X998/posts/Y998"
}
],
"type": "status",
"created_time": "2010-08-02T21:27:44+0000",
"updated_time": "2010-08-02T21:27:44+0000"
}
]
};
#vsiege - you can use this javascript lib (http://www.defiantjs.com/) to search your JSON structure.
var fbData0 = {
...
},
res = JSON.search( fbData0, '//*[./location and ./from/name]' ),
str = '';
for (var i=0; i<res.length; i++) {
str += res[i].location +': '+ res[i].from.name +'<br/>';
}
document.getElementById('output').innerHTML = str;
Here is a working fiddle;
http://jsfiddle.net/hbi99/XhRLP/
DefiantJS extends the global object JSON with the method "search" and makes it possible to query JSON with XPath expressions (XPath is standardised query language). The method returns an array with the matches (empty array if none were found).
You can test XPath expressions by pasting your JSON here:
http://www.defiantjs.com/#xpath_evaluator