Flattening JSON in Snowflake using a regexp in the JSON path?

I have a problem flattening JSON into a relational table.
For example, I have a JSON file like the one below.
How can I flatten the table content in both sheets:'sheet:1':'section 1':table
and sheets:'sheet:2':'section 1':table?
The number of sheets and sections changes in each JSON file.
Is there any way to use a regular expression in the JSON path?
The JSON paths in each file follow the same pattern, but the keys are not the same.
{
"extraction date": {
"month": "OCTOBER",
"monthValue": 10,
"year": 2020
},
"fileName": "test_1.xls",
"number of sheets": 2,
"sheets": {
"sheet:1": {
"content": {
"conversion state": "Success",
"section 1": {
"meta": {
"Remark": "This is the remark",
"Row: 4": "this is the title"
},
"table": [
{
"col1": null,
"col2": "2020-07-14",
"Row": 9
},
{
"col1": null,
"col2": "2020-07-14",
"Row": 10
}
]
}
},
"name": "Sheet1",
"sections": 1
},
"sheet:2": {
"content": {
"conversion state": "Success",
"section 1": {
"meta": {
"Remark": " null",
"Row: 4": "title a"
},
"table": [
{
"col1": null,
"col2": "2020-07-14",
"Row": 8
},
{
"col1": null,
"col2": "2020-07-14",
"Row": 9
}
]
}
},
"name": "mySheetName",
"sections": 1
}
}
}
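One way to think about this, sketched here in Python rather than Snowflake SQL: instead of putting a pattern into the path itself, iterate over the keys at each level and keep only those that match a regular expression. The sample document below is a trimmed-down, hypothetical version of the one above, and the names SHEET_KEY, SECTION_KEY and flatten_tables are made up for illustration.

```python
import json
import re

# Hypothetical, trimmed-down version of the document in the question.
doc = json.loads("""
{
  "fileName": "test_1.xls",
  "sheets": {
    "sheet:1": {
      "content": {
        "conversion state": "Success",
        "section 1": {
          "table": [
            {"col1": null, "col2": "2020-07-14", "Row": 9},
            {"col1": null, "col2": "2020-07-14", "Row": 10}
          ]
        }
      },
      "name": "Sheet1"
    },
    "sheet:2": {
      "content": {
        "conversion state": "Success",
        "section 1": {
          "table": [
            {"col1": null, "col2": "2020-07-14", "Row": 8}
          ]
        }
      },
      "name": "mySheetName"
    }
  }
}
""")

# Key patterns assumed from the sample: "sheet:<n>" and "section <n>".
SHEET_KEY = re.compile(r"sheet:\d+")
SECTION_KEY = re.compile(r"section \d+")


def flatten_tables(document):
    """Return one flat dict per table row, tagged with its sheet and section key."""
    rows = []
    for sheet_key, sheet in document["sheets"].items():
        if not SHEET_KEY.fullmatch(sheet_key):
            continue
        for section_key, section in sheet["content"].items():
            # "conversion state" and similar keys do not match and are skipped.
            if not SECTION_KEY.fullmatch(section_key):
                continue
            for row in section.get("table", []):
                rows.append({"sheet": sheet_key, "section": section_key, **row})
    return rows


flat_rows = flatten_tables(doc)
for r in flat_rows:
    print(r)
```

Each resulting dictionary corresponds to one row of the relational table, with the sheet and section keys carried along as extra columns.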

Related

find on id and append value to json parameter

I have the following data frame, df1:
A B C
123 B1 C1
456 B2 C2
And data frame df2:
A
[
{
"id": "123",
"details": {
"id": "123",
"color": null,
"param_1": {
"name": "mike"
},
"location": "US",
"items": [
{
"item_1": "#227858",
"offer_id": null,
"item_details": {
"detials_1": [{ "notes": "other:", "quantity": 1 }]
}
}
],
"version": 1,
}
}
]
[
{
"id": "456",
"details": {
"id": "456",
"color": null,
"param_1": {
"name": "james"
},
"location": "KR",
"items": [
{
"item_1": "#2221",
"offer_id": null,
"item_details": {
"detials_1": [{ "notes": "other", "quantity": 1 }]
}
}
],
"version": 2,
}
}
]
I want to look up each value of df1[A] in the JSON stored in df2[A], matching on the first occurrence of the id parameter. Once a match is found, I want to replace the null value of the color parameter with the value from df1[B], and the null value of offer_id with the value from df1[C].
The result should go into a new column containing the updated values:
df2[B]:
[
{
"id": "123",
"details": {
"id": "123",
"color": "B1",
"param_1": {
"name": "mike"
},
"location": "US",
"items": [
{
"item_1": "#227858",
"offer_id": "C1",
"item_details": {
"detials_1": [{ "notes": "other:", "quantity": 1 }]
}
}
],
"version": 1,
}
}
]
[
{
"id": "456",
"details": {
"id": "456",
"color": "B2",
"param_1": {
"name": "james"
},
"location": "KR",
"items": [
{
"item_1": "#2221",
"offer_id": "C2",
"item_details": {
"detials_1": [{ "notes": "other", "quantity": 1 }]
}
}
],
"version": 2,
}
}
]
I just started researching how to approach this, but I need guidance on the most efficient way. Any insight would be greatly appreciated.
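A minimal sketch of one possible approach, using only Python's standard library. It assumes the JSON strings parse cleanly (no trailing commas) and that the ids in df1[A] are strings; the two data frames are represented here by plain lists, and the names lookup and fill_nulls are made up for illustration.

```python
import json

# Hypothetical stand-ins for the two data frames: df1 as a list of rows,
# df2 as a list of JSON strings (one per row of column A).
df1 = [
    {"A": "123", "B": "B1", "C": "C1"},
    {"A": "456", "B": "B2", "C": "C2"},
]
df2_a = [
    '[{"id": "123", "details": {"id": "123", "color": null, "location": "US",'
    ' "items": [{"item_1": "#227858", "offer_id": null}], "version": 1}}]',
    '[{"id": "456", "details": {"id": "456", "color": null, "location": "KR",'
    ' "items": [{"item_1": "#2221", "offer_id": null}], "version": 2}}]',
]

# Index df1 by id once, so each JSON record needs only a dictionary lookup.
lookup = {row["A"]: row for row in df1}


def fill_nulls(json_text):
    """Fill null color / offer_id values from df1, matched on the top-level id."""
    records = json.loads(json_text)
    for record in records:
        match = lookup.get(record["id"])
        if match is None:
            continue  # id not present in df1: leave the record unchanged
        details = record["details"]
        if details.get("color") is None:
            details["color"] = match["B"]
        for item in details.get("items", []):
            if item.get("offer_id") is None:
                item["offer_id"] = match["C"]
    return json.dumps(records)


# The new column df2[B], again as a plain list of JSON strings.
df2_b = [fill_nulls(text) for text in df2_a]
print(df2_b[0])
```

If the frames happen to be pandas DataFrames (the question does not say), the same function could presumably be applied to the column row by row. Building the lookup dictionary first avoids scanning df1 once per JSON document.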

Cannot get jq to query json object [duplicate]

I have a JSON file that I am trying to query with jq, but I am unable to retrieve the observations. I am trying to retrieve each of the "observations" using the following command, without success:
cat sample3.json | jq .dataSets[0].series.0:0:0:0:0.observations.0[0]
I am able to retrieve everything up to the series using:
cat sample3.json | jq .dataSets[0].series
But once I try to drill down further I am getting a compile error:
$ cat sample3.json | jq .dataSets[0].series.0:0:0:0:0
jq: error: syntax error, unexpected LITERAL, expecting end of file (Unix shell quoting issues?) at <top-level>, line 1:
.dataSets[0].series.0:0:0:0:0
jq: 1 compile error
I am not sure what I am doing wrong here.
The input file is:
{
"header": {
"id": "b8be2cd5-33bf-4687-9e81-eb032f6f8a71",
"test": false,
"prepared": "2022-09-01T13:30:57.013+02:00",
"sender": {
"id": "ECB"
}
},
"dataSets": [
{
"action": "Replace",
"validFrom": "2022-09-01T13:30:57.013+02:00",
"series": {
"0:0:0:0:0": {
"attributes": [
0,
null,
0,
null,
null,
null,
null,
null,
null,
null,
null,
null,
0,
null,
0,
null,
0,
0,
0,
0
],
"observations": {
"0": [
1.4529,
0,
0,
null,
null
],
"1": [
1.4472,
0,
0,
null,
null
],
"2": [
1.4591,
0,
0,
null,
null
]
}
}
}
}
],
"structure": {
"links": [
{
"title": "Exchange Rates",
"rel": "dataflow",
"href": "https://sdw-wsrest.ecb.europa.eu:443/service/dataflow/ECB/EXR/1.0"
}
],
"name": "Exchange Rates",
"dimensions": {
"series": [
{
"id": "FREQ",
"name": "Frequency",
"values": [
{
"id": "D",
"name": "Daily"
}
]
},
{
"id": "CURRENCY",
"name": "Currency",
"values": [
{
"id": "AUD",
"name": "Australian dollar"
}
]
},
{
"id": "CURRENCY_DENOM",
"name": "Currency denominator",
"values": [
{
"id": "EUR",
"name": "Euro"
}
]
},
{
"id": "EXR_TYPE",
"name": "Exchange rate type",
"values": [
{
"id": "SP00",
"name": "Spot"
}
]
},
{
"id": "EXR_SUFFIX",
"name": "Series variation - EXR context",
"values": [
{
"id": "A",
"name": "Average"
}
]
}
],
"observation": [
{
"id": "TIME_PERIOD",
"name": "Time period or range",
"role": "time",
"values": [
{
"id": "2022-08-29",
"name": "2022-08-29",
"start": "2022-08-29T00:00:00.000+02:00",
"end": "2022-08-29T23:59:59.999+02:00"
},
{
"id": "2022-08-30",
"name": "2022-08-30",
"start": "2022-08-30T00:00:00.000+02:00",
"end": "2022-08-30T23:59:59.999+02:00"
},
{
"id": "2022-08-31",
"name": "2022-08-31",
"start": "2022-08-31T00:00:00.000+02:00",
"end": "2022-08-31T23:59:59.999+02:00"
}
]
}
]
},
"attributes": {
"series": [
{
"id": "TIME_FORMAT",
"name": "Time format code",
"values": [
{
"name": "P1D"
}
]
},
{
"id": "BREAKS",
"name": "Breaks",
"values": []
},
{
"id": "COLLECTION",
"name": "Collection indicator",
"values": [
{
"id": "A",
"name": "Average of observations through period"
}
]
},
{
"id": "COMPILING_ORG",
"name": "Compiling organisation",
"values": []
},
{
"id": "DISS_ORG",
"name": "Data dissemination organisation",
"values": []
},
{
"id": "DOM_SER_IDS",
"name": "Domestic series ids",
"values": []
},
{
"id": "PUBL_ECB",
"name": "Source publication (ECB only)",
"values": []
},
{
"id": "PUBL_MU",
"name": "Source publication (Euro area only)",
"values": []
},
{
"id": "PUBL_PUBLIC",
"name": "Source publication (public)",
"values": []
},
{
"id": "UNIT_INDEX_BASE",
"name": "Unit index base",
"values": []
},
{
"id": "COMPILATION",
"name": "Compilation",
"values": []
},
{
"id": "COVERAGE",
"name": "Coverage",
"values": []
},
{
"id": "DECIMALS",
"name": "Decimals",
"values": [
{
"id": "4",
"name": "Four"
}
]
},
{
"id": "NAT_TITLE",
"name": "National language title",
"values": []
},
{
"id": "SOURCE_AGENCY",
"name": "Source agency",
"values": [
{
"id": "4F0",
"name": "European Central Bank (ECB)"
}
]
},
{
"id": "SOURCE_PUB",
"name": "Publication source",
"values": []
},
{
"id": "TITLE",
"name": "Title",
"values": [
{
"name": "Australian dollar/Euro"
}
]
},
{
"id": "TITLE_COMPL",
"name": "Title complement",
"values": [
{
"name": "ECB reference exchange rate, Australian dollar/Euro, 2:15 pm (C.E.T.)"
}
]
},
{
"id": "UNIT",
"name": "Unit",
"values": [
{
"id": "AUD",
"name": "Australian dollar"
}
]
},
{
"id": "UNIT_MULT",
"name": "Unit multiplier",
"values": [
{
"id": "0",
"name": "Units"
}
]
}
],
"observation": [
{
"id": "OBS_STATUS",
"name": "Observation status",
"values": [
{
"id": "A",
"name": "Normal value"
}
]
},
{
"id": "OBS_CONF",
"name": "Observation confidentiality",
"values": [
{
"id": "F",
"name": "Free"
}
]
},
{
"id": "OBS_PRE_BREAK",
"name": "Pre-break observation value",
"values": []
},
{
"id": "OBS_COM",
"name": "Observation comment",
"values": []
}
]
}
}
}
The .foo syntax cannot be used if the key name contains anything other than alphanumeric characters or the underscore, or if the first character of the key name is numeric.
Assuming you are using a sufficiently recent version of jq, you can always use the form ."foo", which is an abbreviation of the basic form .["foo"].
Your query could therefore begin with:
.dataSets[0].series."0:0:0:0:0"
If you are passing the jq query on a command line, you may have to escape the double quotes appropriately, e.g. in a bash shell by enclosing the jq query in single quotes.
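For comparison, here is the same lookup as a short Python sketch. It shows the underlying point from another angle: "0:0:0:0:0" and "0" are ordinary string keys of JSON objects, so they have to be addressed as quoted strings, while [0] addresses a position in an array. The sample below is a trimmed-down, hypothetical copy of sample3.json.

```python
import json

# Hypothetical, trimmed-down copy of sample3.json from the question.
sample = json.loads("""
{
  "dataSets": [
    {
      "series": {
        "0:0:0:0:0": {
          "observations": {
            "0": [1.4529, 0, 0, null, null],
            "1": [1.4472, 0, 0, null, null],
            "2": [1.4591, 0, 0, null, null]
          }
        }
      }
    }
  ]
}
""")

observations = sample["dataSets"][0]["series"]["0:0:0:0:0"]["observations"]

# "0" is an object key (a string); the trailing [0] is an array index.
first_value = observations["0"][0]

# First element of every observation, ordered by the numeric value of the key.
values = [obs[0] for _, obs in sorted(observations.items(), key=lambda kv: int(kv[0]))]
print(first_value, values)
```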

How to solve JQ processing as multiple dictionaries if document has an array and JQ filter uses group_by?

I have the following JSON document
[
{
"id": 6,
"description": "Component 1",
"due": "20211122T183000Z",
"entry": "20211119T181735Z",
"modified": "20211119T181735Z",
"project": "product1",
"status": "pending",
"uuid": "55bf0497-208c-492a-8f76-bb692d48afaa",
"tags": [
"abc",
"123"
],
"urgency": 13.9699
},
{
"id": 10,
"description": "Component 2",
"due": "20211129T183000Z",
"entry": "20211130T045620Z",
"modified": "20211130T045620Z",
"project": "product2",
"status": "pending",
"uuid": "d57eb8f7-e5ec-497c-ac47-f1cf34b005db",
"tags": [
"foo",
"bar"
],
"urgency": 14.0151
},
{
"id": 11,
"description": "Component 3",
"due": "20211202T183000Z",
"entry": "20211130T121529Z",
"completed": "20211130T123915Z",
"project": "product3",
"status": "pending",
"uuid": "9f15e6a4-5cef-4b0f-915b-fc916ab152c7",
"tags": [
"xyz",
"676"
],
"urgency": 14.0096
},
{
"id": 12,
"description": "Component 4",
"due": "20211202T183000Z",
"entry": "20211130T122537Z",
"pending": "20211130T122537Z",
"project": "product1",
"status": "pending",
"uuid": "91c9ec76-42a7-4ebc-9649-b3a12027feb1",
"tags": [
"def"
],
"urgency": 13.9096
}
]
I have written the jq filter below to process the JSON. I expect it not to generate multiple dictionaries per component.
group_by(.project,.status)
| .[]
| { project: .[0].project , status: .[0].status ,
description: [{"\(.[].description)" : (.[].tags | join(";"))}] }
After applying the filter, I get the output below, with multiple dictionaries because of the tags array:
{
"project": "product1",
"status": "pending",
"description": [
{
"Component 1": "abc;123"
},
{
"Component 1": "def"
},
{
"Component 4": "abc;123"
},
{
"Component 4": "def"
}
]
}
{
"project": "product2",
"status": "completed",
"description": [
{
"Component 2": "foo;bar"
}
]
}
{
"project": "product3",
"status": "completed",
"description": [
{
"Component 3": "xyz;676"
}
]
}
The output I am expecting is without multiple dictionaries as below
{
"project": "product1",
"status": "pending",
"description": [
{
"Component 1": "abc;123"
},
{
"Component 4": "def"
}
]
}
{
"project": "product2",
"status": "completed",
"description": [
{
"Component 2": "foo;bar"
}
]
}
{
"project": "product3",
"status": "completed",
"description": [
{
"Component 3": "xyz;676"
}
]
}
How can I generate the above-expected output using JQ?
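In your filter, .[] appears twice inside the same object construction, so jq emits every combination of a description and a tags value. Iterating only once avoids this. One option similar to yours would be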
jq 'group_by(.project)[]
| { project: .[0].project, status:.[0].status, "description": [.[]
| { (.description) : .tags|join(";") } ] }'
To just bring together .description and .tags use
jq '.[] | del(.description, .tags) + ({(.description): .tags | join(";")})'
To also group by .project, keeping only .project, .status and an array built from .description and .tags as above, use
jq '
group_by(.project)[]
| (first | {project, status})
+ {description: map({(.description): .tags | join(";")})}
'
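The same grouping can be sketched in Python, which may make the "iterate once per task" idea easier to see. This is only an illustration on a hypothetical subset of the fields from the question; group_key and grouped are made-up names.

```python
from itertools import groupby

# Hypothetical subset of the task list from the question.
tasks = [
    {"description": "Component 1", "project": "product1", "status": "pending", "tags": ["abc", "123"]},
    {"description": "Component 2", "project": "product2", "status": "pending", "tags": ["foo", "bar"]},
    {"description": "Component 3", "project": "product3", "status": "pending", "tags": ["xyz", "676"]},
    {"description": "Component 4", "project": "product1", "status": "pending", "tags": ["def"]},
]


def group_key(task):
    return (task["project"], task["status"])


grouped = []
# groupby only merges adjacent items, so sort by the same key first.
for (project, status), members in groupby(sorted(tasks, key=group_key), key=group_key):
    grouped.append({
        "project": project,
        "status": status,
        # One dictionary per task: each task is visited exactly once.
        "description": [{t["description"]: ";".join(t["tags"])} for t in members],
    })

for g in grouped:
    print(g)
```

Because each task is visited once inside the list comprehension, a description is only ever paired with its own tags.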

Nested json - store values in csv

I am trying to convert a nested JSON file into CSV. It's data from a darts API and the structure is always the same. Nevertheless, I have had problems flattening it and storing the values in a CSV file because of the nested structure.
json:
{
"summaries": [{
"sport_event": {
"id": "sr:sport_event:12967512",
"start_time": "2017-11-11T13:15:00+00:00",
"start_time_confirmed": true,
"sport_event_context": {
"sport": {
"id": "sr:sport:22",
"name": "Darts"
},
"category": {
"id": "sr:category:104",
"name": "International"
},
"competition": {
"id": "sr:competition:597",
"name": "Grand Slam of Darts"
},
"season": {
"id": "sr:season:47332",
"name": "Grand Slam of Darts 2017",
"start_date": "2017-11-11",
"end_date": "2017-11-20",
"year": "2017",
"competition_id": "sr:competition:597"
},
"stage": {
"order": 1,
"type": "league",
"phase": "stage_1",
"start_date": "2017-11-11",
"end_date": "2017-11-15",
"year": "2017"
},
"round": {
"number": 1
},
"groups": [{
"id": "sr:league:29766",
"name": "Grand Slam of Darts 2017, Group G",
"group_name": "G"
}]
},
"coverage": {
"live": true
},
"competitors": [{
"id": "sr:competitor:35936",
"name": "Smith, Michael",
"abbreviation": "SMI",
"qualifier": "home"
}, {
"id": "sr:competitor:83895",
"name": "Wilson, James",
"abbreviation": "WIL",
"qualifier": "away"
}]
},
"sport_event_status": {
"status": "closed",
"match_status": "ended",
"home_score": 5,
"away_score": 3,
"winner_id": "sr:competitor:35936"
}
}, {
"sport_event": {
"id": "sr:sport_event:12967508",
"start_time": "2017-11-11T13:40:00+00:00",
"start_time_confirmed": true,
"sport_event_context": {
"sport": {
"id": "sr:sport:22",
"name": "Darts"
},
"category": {
"id": "sr:category:104",
"name": "International"
},
"competition": {
"id": "sr:competition:597",
"name": "Grand Slam of Darts"
},
"season": {
"id": "sr:season:47332",
"name": "Grand Slam of Darts 2017",
"start_date": "2017-11-11",
"end_date": "2017-11-20",
"year": "2017",
"competition_id": "sr:competition:597"
},
"stage": {
"order": 1,
"type": "league",
"phase": "stage_1",
"start_date": "2017-11-11",
"end_date": "2017-11-15",
"year": "2017"
},
"round": {
"number": 1
},
"groups": [{
"id": "sr:league:29764",
"name": "Grand Slam of Darts 2017, Group F",
"group_name": "F"
}]
},
"coverage": {
"live": true
},
"competitors": [{
"id": "sr:competitor:70916",
"name": "Bunting, Stephen",
"abbreviation": "BUN",
"qualifier": "home"
}, {
"id": "sr:competitor:191262",
"name": "de Zwaan, Jeffrey",
"abbreviation": "DEZ",
"qualifier": "away"
}]
},
"sport_event_status": {
"status": "closed",
"match_status": "ended",
"home_score": 5,
"away_score": 4,
"winner_id": "sr:competitor:70916"
}
}]
}
So for each sport_event I would like to store the variables:
"start_time"
from "season" the variable "name"
from "competitors" both "id" and "name"
from "sport_event_status" the "winner_id"
I have already tried to flatten the json file with this code:
import json

with open(r'path of file.json') as f:
    data = json.load(f)

def flatten(data):
    # Recursively print every key/value pair of the nested dictionaries.
    for key, value in data.items():
        print(str(key) + '->' + str(value))
        if isinstance(value, dict):
            flatten(value)
        elif isinstance(value, list):
            for val in value:
                # Strings and nested lists inside a list are skipped.
                if isinstance(val, (str, list)):
                    pass
                else:
                    flatten(val)

flatten(data)
print(data)
This actually prints out the following:
id->sr:season:47332
name->Grand Slam of Darts 2017
start_date->2017-11-11
end_date->2017-11-20
year->2017
competition_id->sr:competition:597
Now my question is how to store the values I mentioned above in a csv file.
Thanks in advance for your support.
Using jq, you basically just have to transcribe your specification, adding a bit of context and taking care of an embedded array:
.summaries[]
| (.sport_event                          # Your specification:
   | [.start_time,                       # start_time
      .sport_event_context.season.name]  # from "season" the variable "name"
     + [.competitors[] | .id, .name])    # from "competitors" both "id" and "name"
  + [.sport_event_status.winner_id]      # "sport_event_status" is a sibling of "sport_event"
| @csv
Invocation
E.g.
jq -rf program.jq my.json
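Since the question started out in Python, here is a rough equivalent using the standard json and csv modules. It is a sketch on a trimmed-down, hypothetical sample; the column names in the header row and the function name summary_to_row are made up, and it assumes every event has exactly two competitors.

```python
import csv
import io
import json

# Hypothetical, trimmed-down version of the API response in the question.
data = json.loads("""
{
  "summaries": [
    {
      "sport_event": {
        "start_time": "2017-11-11T13:15:00+00:00",
        "sport_event_context": {"season": {"name": "Grand Slam of Darts 2017"}},
        "competitors": [
          {"id": "sr:competitor:35936", "name": "Smith, Michael"},
          {"id": "sr:competitor:83895", "name": "Wilson, James"}
        ]
      },
      "sport_event_status": {"winner_id": "sr:competitor:35936"}
    }
  ]
}
""")


def summary_to_row(summary):
    """Build one CSV row from one entry of the "summaries" array."""
    event = summary["sport_event"]
    row = [event["start_time"], event["sport_event_context"]["season"]["name"]]
    for competitor in event["competitors"]:
        row += [competitor["id"], competitor["name"]]
    # sport_event_status sits next to sport_event, not inside it.
    row.append(summary["sport_event_status"].get("winner_id"))
    return row


rows = [summary_to_row(s) for s in data["summaries"]]

# Written to an in-memory buffer here; passing a file opened with
# open(<path>, "w", newline="") to csv.writer would write to disk instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["start_time", "season_name", "competitor_1_id", "competitor_1_name",
                 "competitor_2_id", "competitor_2_name", "winner_id"])
writer.writerows(rows)
print(buffer.getvalue())
```

The csv module quotes fields that contain commas, which matters here because the competitor names are of the form "Smith, Michael".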

How to merge two different JSON (using python)

I'm totally new to Python. I want to merge two JSON files that have the same objects but different keys.
Here is a basic example of the result I would love to get:
JSON1 :
{
"json1" : {
"1" : {
"id": 1,
"name": "first_artist",
"imageUrl": "https://1.jpg",
"genre": "Rap "
},
"2" : {
"id": 2,
"name": "second_artist",
"imageUrl": "https://2.jpg",
"genre": "Hip-Hop"
}
}
}
JSON2:
{
"json2" : {
"1" : {
"date": 17/07/19,
"venue": "venue1"
},
"2" : {
"date": 19/07/19,
"venue": "venue2"
}
}
}
Expected JSON:
{
"expected_json" : {
"1" : {
"id": 1,
"name": "first_artist",
"imageUrl": "https://1.jpg",
"genre": "Rap ",
"date": 17/07/19,
"venue": "venue1"
},
"2" : {
"id": 2,
"name": "second_artist",
"imageUrl": "https://2.jpg",
"genre": "Hip-Hop",
"date": 19/07/19,
"venue": "venue2"
}
}
}
Can someone give me tips and direction to make this possible? Thanks
You can simplify your input to:
A.json:
{
"1" : {
"id": 1,
"name": "first_artist",
"imageUrl": "https://1.jpg",
"genre": "Rap "
},
"2" : {
"id": 2,
"name": "second_artist",
"imageUrl": "https://2.jpg",
"genre": "Hip-Hop"
}
}
B.json:
{
"1" : {
"date": "17/07/19",
"venue": "venue1"
},
"2" : {
"date": "19/07/19",
"venue": "venue2"
}
}
You also have to quote the dates, e.g. change 19/07/19 to "19/07/19", for the input to be valid JSON.
Now you can use the json module:
import json
#from pprint import pprint

# load json from files
with open('A.json') as A_file:
    A = json.load(A_file)  # returns a dict()
#print('A:')
#pprint(A)
with open('B.json') as B_file:
    B = json.load(B_file)
#print('\nB:')
#pprint(B)

# get a set of unique keys -> {'1', '2'}
keys = set()
keys.update(A.keys())
keys.update(B.keys())
#print(f'\nkeys: {keys}')

# for each key merge values from dicts A and B
result = {}
for key in keys:
    #print(f'\n{key}:')
    merge = {}
    if key in A:
        merge.update(A[key])
    if key in B:
        merge.update(B[key])
    #pprint(merge)
    result[key] = merge
#print('\nresult:')
#pprint(result)

# write the result to expected.json
with open('expected.json', 'w') as expected_file:
    expected_file.write(json.dumps(result, sort_keys=True, indent='\t'))
This writes:
expected.json:
{
"1": {
"date": "17/07/19",
"genre": "Rap ",
"id": 1,
"imageUrl": "https://1.jpg",
"name": "first_artist",
"venue": "venue1"
},
"2": {
"date": "19/07/19",
"genre": "Hip-Hop",
"id": 2,
"imageUrl": "https://2.jpg",
"name": "second_artist",
"venue": "venue2"
}
}
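If you prefer something more compact, the same merge can probably be written as a single dictionary comprehension. This is a sketch with hypothetical in-memory stand-ins for the two files, so it runs without A.json and B.json; the name merged is made up.

```python
import json

# Hypothetical in-memory stand-ins for the contents of A.json and B.json.
A = {
    "1": {"id": 1, "name": "first_artist", "imageUrl": "https://1.jpg", "genre": "Rap "},
    "2": {"id": 2, "name": "second_artist", "imageUrl": "https://2.jpg", "genre": "Hip-Hop"},
}
B = {
    "1": {"date": "17/07/19", "venue": "venue1"},
    "2": {"date": "19/07/19", "venue": "venue2"},
}

# For every key in either input, merge the two inner dicts; a missing key
# contributes an empty dict. On a clash, the value from B wins.
merged = {key: {**A.get(key, {}), **B.get(key, {})} for key in sorted(A.keys() | B.keys())}

print(json.dumps(merged, sort_keys=True, indent="\t"))
```

The behaviour on clashing inner keys is the same as with the two update() calls above: whatever is merged last overwrites the earlier value.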