reformat output of hierarchical data, based on an element


Given the following JSON, which contains hierarchical data, I need to convert the flat structure into a parent-child JSON output format:
[{
"ID": 1042,
"NameID": "200",
"Name": "related",
"path": "1042"
}, {
"ID": 1561,
"NameID": " 230",
"Name": "Patr",
"FatherID": 1042,
"path": "1042\/1561"
}, {
"ID": 1370,
"NameID": " 230",
"Name": "Dog",
"FatherID": 1561,
"path": "1042\/1561\/1370"
}, {
"ID": 1560,
"NameID": " 230.1",
"Name": "Ort",
"FatherID": 1561,
"path": "1042\/1561\/1560"
}, {
"ID": 213,
"NameID": " 232",
"Name": "Jim",
"FatherID": 1561,
"path": "1042\/1561\/213"
}]
How could I get an output like the one below, based on the path hierarchy?
I have filled in only the first values, to show that the depth may go on and on:
[
{
"200": "related",
"Children": [
{
" 230": "Patr",
"Children": [
{
"230.1": "Ort",
"Children": [
{
"NameID": "Name",
"Children": [
{
"NameID": "Name",
"children": [
{
"NameID": "Name"
},
{
"NameID": "Name"
}
]
},
{
"NameID": "Name",
"children": [
{
"NameID": "Name"
}
]
}
]
}
]
}
]
}
]
}
]

The key to the following solution is to convert the flat array into a hierarchical structure. We use setpath to do this as follows:
reduce .[] as $element ({};
setpath($element | .path | split("\/");
$element | {NameID, Name}))
With your input, this produces the following:
{
"1042": {
"NameID": "200",
"Name": "related",
"1561": {
"NameID": " 230",
"Name": "Patr",
"1370": {
"NameID": " 230",
"Name": "Dog"
},
"1560": {
"NameID": " 230.1",
"Name": "Ort"
},
"213": {
"NameID": " 232",
"Name": "Jim"
}
}
}
}
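For readers who want to check the nesting step outside jq, here is a minimal Python sketch of the same idea. The helper name nest_by_path is made up for illustration, and the records are the question's data trimmed to the fields used. One difference worth noting: setpath replaces whatever is already stored at the path, so the jq version relies on parents appearing before their children, while this sketch merges into the existing node instead.

```python
import json

# Flat records from the question, trimmed to the fields used here.
records = [
    {"NameID": "200", "Name": "related", "path": "1042"},
    {"NameID": " 230", "Name": "Patr", "path": "1042/1561"},
    {"NameID": " 230", "Name": "Dog", "path": "1042/1561/1370"},
    {"NameID": " 230.1", "Name": "Ort", "path": "1042/1561/1560"},
    {"NameID": " 232", "Name": "Jim", "path": "1042/1561/213"},
]


def nest_by_path(rows):
    """Hypothetical helper: store each row at the location named by its path."""
    tree = {}
    for row in rows:
        node = tree
        # Walk (and create, if needed) one dict level per path segment.
        for key in row["path"].split("/"):
            node = node.setdefault(key, {})
        # Merge rather than replace, so children added earlier are kept.
        node.update({"NameID": row["NameID"], "Name": row["Name"]})
    return tree


tree = nest_by_path(records)
print(json.dumps(tree, indent=2))
```

The ID segments become dictionary keys, which is why the next step has to tell data keys (NameID, Name) apart from child keys.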
Now it's just a question of munging, which can be done using the following helper function:
def promote:
  . as $in
  | (if .NameID then {(.NameID): .Name} else {} end) as $base
  | del(.NameID) | del(.Name)
  | if length == 0 then $base
    else $base + {Children: (reduce keys_unsorted[] as $k ([]; . + [$in[$k] | promote]))}
    end;
With this def, the solution becomes:
reduce .[] as $element ({};
setpath($element | .path | split("\/");
$element | {NameID, Name}))
| promote
| .Children
Output
[
{
"200": "related",
"Children": [
{
" 230": "Patr",
"Children": [
{
" 230": "Dog"
},
{
" 230.1": "Ort"
},
{
" 232": "Jim"
}
]
}
]
}
]
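The promote step can be sketched in Python as well, which may help when reading the recursion. This is only a rough counterpart, not a translation: it tests for the presence of the NameID key, whereas the jq version tests whether .NameID is truthy. The tree literal is the intermediate result shown earlier.

```python
def promote(node):
    """Rough Python counterpart of the jq `promote` helper (a sketch)."""
    # Turn the NameID/Name pair into a single {NameID: Name} entry.
    base = {node["NameID"]: node["Name"]} if "NameID" in node else {}
    # Every remaining key is a child node; recurse in insertion order.
    children = [promote(v) for k, v in node.items() if k not in ("NameID", "Name")]
    if children:
        base["Children"] = children
    return base


# Intermediate structure produced by the nesting step.
tree = {
    "1042": {
        "NameID": "200", "Name": "related",
        "1561": {
            "NameID": " 230", "Name": "Patr",
            "1370": {"NameID": " 230", "Name": "Dog"},
            "1560": {"NameID": " 230.1", "Name": "Ort"},
            "213": {"NameID": " 232", "Name": "Jim"},
        },
    }
}

# The synthetic root has no NameID, so only its Children list is kept.
result = promote(tree)["Children"]
```

Leaf nodes come out without a Children key, matching the output above.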

Related

Pyspark transform json into multiple dataframes

I have multiple JSON files with this structure (Association can contain one or multiple objects, and Characteristic doesn't always have the same number of key-value pairs):
{
"vl:VNETList": {
"Template": {
"ID": "SomeId",
"Object": [
{
"ID": "my_first_id",
"Context": {
"ID": "Avngate"
},
"Name": "Model Description",
"ClassID": "PID",
"Association": [
{
"Object": {
"ID": "test.svg",
"Context": {
"ID": "Avngate"
}
},
"#type": "is fulfilled by"
},
{
"Object": {
"ID": "Project Description",
"Context": {
"ID": "Avngate"
}
},
"#type": "is an element of"
}
],
"Characteristic": [
{
"Name": "InfoType",
"Value": "image/svg+xml"
},
{
"Name": "LOCK",
"Value": false
},
{
"Name": "EXFI",
"Value": 10000
}
]
},
{
"ID": "my_second_id",
"Context": {
"ID": "Avngate2"
},
"Name": "Model Description2",
"ClassID": "PID2",
"Association": [
{
"Object": {
"ID": "test2.svg",
"Context": {
"ID": "Avngate"
}
},
"#type": "is fulfilled by"
}
],
"Characteristic": [
{
"Name": "Dbtencoding",
"Value": "unicode"
}
]
}
]
}
}
I would like to build two dataframes: the first with one row per object and its characteristics as columns, and the second with one row per association, referencing the object ID.
What's the best approach? If that is too complex, I could also save the characteristics as a separate table referencing the object ID, as with the associations.

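Read the JSON and explode the Object array. For the first dataframe, explode Characteristic and pivot it with groupBy; for the second, explode Association and select the needed fields.
from pyspark.sql import SparkSession, functions as f
spark = SparkSession.builder.getOrCreate()
# Read the multi-line JSON file, then produce one row per entry of the Object array.
df1 = spark.read.json('test.json', multiLine=True)
df2 = df1.select(f.explode('vl:VNETList.Template.Object').alias('value')) \
    .select('value.*')
# First dataframe: one row per object, one column per characteristic name.
df_f1 = df2.withColumn('Characteristic', f.explode('Characteristic')) \
    .groupBy('ID', 'Name', 'ClassId') \
    .pivot('Characteristic.Name') \
    .agg(f.first('Characteristic.Value'))
# Second dataframe: one row per association, keyed by the object ID.
df_f2 = df2.withColumn('Association', f.explode('Association')) \
    .select('ID', 'Association.Object.ID', 'Association.#Type') \
    .toDF('ID', 'AssociationId', 'AssociationType')
df_f1.show()
df_f2.show()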
+------------+------------------+-------+-----------+-----+-------------+-----+
| ID| Name|ClassId|Dbtencoding| EXFI| InfoType| LOCK|
+------------+------------------+-------+-----------+-----+-------------+-----+
| my_first_id| Model Description| PID| null|10000|image/svg+xml|false|
|my_second_id|Model Description2| PID2| unicode| null| null| null|
+------------+------------------+-------+-----------+-----+-------------+-----+
+------------+-------------------+----------------+
| ID| AssociationId| AssociationType|
+------------+-------------------+----------------+
| my_first_id| test.svg| is fulfilled by|
| my_first_id|Project Description|is an element of|
|my_second_id| test2.svg| is fulfilled by|
+------------+-------------------+----------------+
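To make the two reshaping steps easier to follow, here is a plain-Python sketch (no Spark involved) of what explode plus pivot and explode plus select amount to on this data. The objects list is an abbreviated copy of the question's input with the Context fields left out, and the variable names are illustrative only. Where a characteristic name repeated within one object, this sketch would keep the last value rather than the first.

```python
# Abbreviated copy of the Object array from the question (Context omitted).
objects = [
    {"ID": "my_first_id", "Name": "Model Description", "ClassID": "PID",
     "Association": [
         {"Object": {"ID": "test.svg"}, "#type": "is fulfilled by"},
         {"Object": {"ID": "Project Description"}, "#type": "is an element of"},
     ],
     "Characteristic": [
         {"Name": "InfoType", "Value": "image/svg+xml"},
         {"Name": "LOCK", "Value": False},
         {"Name": "EXFI", "Value": 10000},
     ]},
    {"ID": "my_second_id", "Name": "Model Description2", "ClassID": "PID2",
     "Association": [
         {"Object": {"ID": "test2.svg"}, "#type": "is fulfilled by"},
     ],
     "Characteristic": [
         {"Name": "Dbtencoding", "Value": "unicode"},
     ]},
]

# Pivot: collect every characteristic name, then emit one row per object
# with one column per name (None where the object lacks that name).
columns = sorted({c["Name"] for o in objects for c in o["Characteristic"]})
object_rows = []
for o in objects:
    values = {c["Name"]: c["Value"] for c in o["Characteristic"]}
    row = {"ID": o["ID"], "Name": o["Name"], "ClassID": o["ClassID"]}
    row.update({col: values.get(col) for col in columns})
    object_rows.append(row)

# Explode: one row per (object, association) pair.
association_rows = [
    {"ID": o["ID"], "AssociationId": a["Object"]["ID"], "AssociationType": a["#type"]}
    for o in objects
    for a in o["Association"]
]
```

The missing characteristics show up as None here, which corresponds to the null cells in the first table above.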

need to extract specific string with JQ

I have a JSON file (see below), and with jq I need to extract the resourceName of the connection whose email value is mail#mail1.com.
So in my case, the result should be name_1.
Any idea how to do that?
This does not work:
jq '.connections[] | select(.emailAddresses.value | test("mail#mail1.com"; "i")) | .resourceName' file.json
{
"connections": [
{
"resourceName": "name_1",
"etag": "123456789",
"emailAddresses": [
{
"metadata": {
"primary": true,
"source": {
"type": "CONTACT",
"id": "123456"
}
},
"value": "mail#mail1.com"
}
]
},
{
"resourceName": "name_2",
"etag": "987654321",
"emailAddresses": [
{
"metadata": {
"primary": true,
"source": {
"type": "CONTACT",
"id": "654321"
},
"sourcePrimary": true
},
"value": "mail#mail2.com"
}
]
}
],
"totalPeople": 187,
"totalItems": 187
}

One solution is to store the parent object while selecting on the child array:
jq '.connections[] | . as $parent | .emailAddresses // empty | .[] | select(.value == "mail#mail1.com") | $parent.resourceName' file.json

emailAddresses is an array, so .emailAddresses.value cannot be used on it directly; the array has to be iterated first. Use any if finding one matching element will suffice.
.connections[] | select(any(.emailAddresses[];.value == "mail#mail1.com")).resourceName
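The same "select the parent if any child matches" logic, sketched in Python for comparison. The function name resource_names is made up, the data is trimmed to the relevant fields, and the name_3 entry is a hypothetical addition to show a connection without an emailAddresses array.

```python
data = {"connections": [
    {"resourceName": "name_1", "emailAddresses": [{"value": "mail#mail1.com"}]},
    {"resourceName": "name_2", "emailAddresses": [{"value": "mail#mail2.com"}]},
    {"resourceName": "name_3"},  # hypothetical entry without emailAddresses
]}


def resource_names(doc, address):
    """Return the resourceName of every connection that has the given address."""
    return [
        conn["resourceName"]
        for conn in doc["connections"]
        # any() stops at the first matching entry; a missing array counts as empty.
        if any(e.get("value") == address for e in conn.get("emailAddresses", []))
    ]


names = resource_names(data, "mail#mail1.com")
```

Because the test runs per connection, a connection with several matching addresses is still reported once.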

Map conditional child elements

I am working with a JSON file that contains a lot of data that can be removed before sending it to an API.
I found that jq can be used to achieve this, but I am not sure how to map the data to get the desired result.
Input JSON
{
"name": "Sample name",
"id": "123",
"userStory": {
"id": "234",
"storyName": "Story Name",
"narrative": "Narrative",
"type": "feature"
},
"testSteps": [
{
"number": 1,
"description": "Step 1",
"level": 0,
"children": [
{
"number": 2,
"description": "Description",
"children": [
{
"number": 3,
"description": "Description"
}
]
},
{
"number": 4,
"anotherfield": "another field"
}
]
}
]
}
Desired Output
{
"name": "Sample name",
"userStory": {
"storyName": "Story Name"
},
"testSteps": [
{
"description": "Step 1",
"children": [
{
"description": "Description",
"children": [
{
"description": "Description"
}
]
},
{
"anotherfield": "another field"
}
]
}
]
}
Tried to do it with the following jq command
map_values(..|{name, id, userStory})
but not sure how to filter only the userStory.storyName.
Thanks in advance.
Note: The actual JSON has different child elements that are repeated in some cases.

To delete .id from root object:
del(.id)
To leave only .storyName in .userStory:
.userStory |= {storyName}
To delete .number and .level from every object on any level in .testSteps:
.testSteps |= walk(if type == "object" then del(.number, .level) else . end)
Putting it all together:
del(.id) | (.userStory |= {storyName}) | (.testSteps |=
walk(if type == "object" then del(.number, .level) else . end))
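As a cross-check of the three steps, here is a small Python sketch that does the same pruning. The recursive helper drop_keys is a made-up name standing in for the walk call, and doc is the question's input.

```python
def drop_keys(value, unwanted):
    """Recursively remove the given keys from every dict, at any depth."""
    if isinstance(value, dict):
        return {k: drop_keys(v, unwanted) for k, v in value.items() if k not in unwanted}
    if isinstance(value, list):
        return [drop_keys(v, unwanted) for v in value]
    return value


doc = {
    "name": "Sample name",
    "id": "123",
    "userStory": {"id": "234", "storyName": "Story Name",
                  "narrative": "Narrative", "type": "feature"},
    "testSteps": [{
        "number": 1, "description": "Step 1", "level": 0,
        "children": [
            {"number": 2, "description": "Description",
             "children": [{"number": 3, "description": "Description"}]},
            {"number": 4, "anotherfield": "another field"},
        ],
    }],
}

# Step 1: drop the root id. Step 2: keep only storyName in userStory.
# Step 3: strip number and level from every object under testSteps.
result = {k: v for k, v in doc.items() if k != "id"}
result["userStory"] = {"storyName": doc["userStory"]["storyName"]}
result["testSteps"] = drop_keys(doc["testSteps"], {"number", "level"})
```

Only testSteps is processed recursively, so an id or number key elsewhere in the document would be left alone unless handled explicitly.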

Processing JSON with jq - handling array index/name into output

I'm trying to use jq to parse a JSON file for me. I want to get a value from a definition header into the output data in place of an index. A simplified example:
{
"header": {
"type": {
"0": {
"name": "Cats"
},
"3": {
"name": "Dogs"
}
}
},
"data": [
{
"time": "2019-01-01T02:00:00Z",
"reading": {
"0": {"value": 90, "note": "start" },
"3": {"value": 100 }
}
}
]
}
Using a jq command like jq '.data[] | {time: .time, data: .reading[]}' gives me:
{
"time": "2019-01-01T02:00:00Z",
"data": {
"value": 90,
"note": "start"
}
}
{
"time": "2019-01-01T02:00:00Z",
"data": {
"value": 100
}
}
I need to get "Cats" or "Dogs" into the result, heading towards an SQL insert.
Something like:
{
"time": "2019-01-01T02:00:00Z",
"data": {
"type": "Cats", <- line added
"value": 90,
"note": "start"
}
}
...
Or better yet:
{
"time": "2019-01-01T02:00:00Z",
"Cats": { <- label set to "Cats" instead of "data"
"value": 90,
"note": "start"
}
}
...
Is there a way I can get - what I see as the array index "0" or "3" - to be added as "Cats" or "Dogs"?

Using the built-in function, INDEX, for creating a dictionary allows a straightforward solution as follows:
(.header.type
| INDEX(to_entries[]; .key)
| map_values(.value.name)) as $dict
| .data[]
| (.reading | keys_unsorted[]) as $k
| {time} + { ($dict[$k]) : .reading[$k] }
Output
{
"time": "2019-01-01T02:00:00Z",
"Cats": {
"value": 90,
"note": "start"
}
}
{
"time": "2019-01-01T02:00:00Z",
"Dogs": {
"value": 100
}
}
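The lookup-table idea carries over directly to other languages. A minimal Python sketch, using the question's sample data and illustrative variable names: build a dictionary from reading key to type name once, then use it while flattening the readings.

```python
doc = {
    "header": {"type": {"0": {"name": "Cats"}, "3": {"name": "Dogs"}}},
    "data": [{
        "time": "2019-01-01T02:00:00Z",
        "reading": {"0": {"value": 90, "note": "start"}, "3": {"value": 100}},
    }],
}

# Lookup table: reading key -> type name, built once from the header.
names = {key: entry["name"] for key, entry in doc["header"]["type"].items()}

# One output object per reading, labelled with the looked-up name.
rows = [
    {"time": item["time"], names[key]: reading}
    for item in doc["data"]
    for key, reading in item["reading"].items()
]
```

A reading key that is absent from the header would raise a KeyError here; whether to skip or report such readings is a choice the original question leaves open.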

jq select objects that have specific value in nested json bash

This is my JSON:
[
{
"time": "2017-06-10 00:00:48-0400,317",
"UserInfo": {
"AppId": "ONE_SEARCH",
"UsageGroupId": "92600"
},
"Items": [
{
"PublicationCode": "",
"OpenUrlRefId": "",
"ReferringUrl": "N",
"OpenAccess": "0",
"ItmId": "1328515516"
}
]
},
{
"time": "2017-06-10 00:00:48-0400,548",
"UserInfo": {
"AppId": "DIALOG",
"UsageGroupId": "1195735"
},
"Items": [
{
"Origin": "Alert",
"PublicationCode": "",
"NumberOfCopies": 1,
"ItmId": "1907446549"
},
{
"Origin": "Alert",
"PublicationCode": "",
"NumberOfCopies": 1,
"ItmId": "1907446950"
}
]
}
]
I want to use jq to extract the objects that have "Origin": "Alert" in their "Items" element. The result should look like this:
{
"time": "2017-06-10 00:00:48-0400,548",
"UserInfo": {
"AppId": "DIALOG",
"UsageGroupId": "1195735"
},
"Items": [
{
"Origin": "Alert",
"PublicationCode": "",
"NumberOfCopies": 1,
"ItmId": "1907446549"
},
{
"Origin": "Alert",
"PublicationCode": "",
"NumberOfCopies": 1,
"ItmId": "1907446950"
}
]
}
Or this:
{
"Items": [
{
"Origin": "Alert",
"PublicationCode": "",
"NumberOfCopies": 1,
"ItmId": "1907446549",
"ReasonCode": ""
},
{
"Origin": "Alert",
"PublicationCode": "",
"NumberOfCopies": 1,
"ItmId": "1907446950"
}
]
}
How can I do this with jq? I have tried several ways, but most of them just return an array of all the child objects that include "Origin": "Alert". I need these child objects to keep their structure, because I need to know which of them happened together and which happened separately.
BTW, the only value of "Origin" is "Alert". So if you have any method to select an object with a given key name, it should also work.
Thank you! :)

The filter:
.[] | select( any(.Items[]; .Origin == "Alert"))
produces the first-mentioned admissible result. If your jq does not have any/2 then I'd suggest upgrading. If that's not an option, then you could use the following simple but rather inefficient filter instead:
.[] | select( .Items | map(.Origin) | index("Alert"))
Or:
.[] | select(reduce .Items[] as $item (false; . or ($item | .Origin == "Alert")))
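For comparison, a Python sketch of the same selection that also shows the second admissible result from the question (only the Items of each matching record). The records are trimmed to a few fields, and has_alert is a made-up helper name.

```python
records = [
    {"time": "2017-06-10 00:00:48-0400,317",
     "UserInfo": {"AppId": "ONE_SEARCH"},
     "Items": [{"PublicationCode": "", "ItmId": "1328515516"}]},
    {"time": "2017-06-10 00:00:48-0400,548",
     "UserInfo": {"AppId": "DIALOG"},
     "Items": [
         {"Origin": "Alert", "ItmId": "1907446549"},
         {"Origin": "Alert", "ItmId": "1907446950"},
     ]},
]


def has_alert(record):
    """True if at least one entry in Items has Origin equal to "Alert"."""
    return any(item.get("Origin") == "Alert" for item in record.get("Items", []))


# First admissible result: the whole matching records, structure intact.
matches = [r for r in records if has_alert(r)]

# Second admissible result: only the Items array of each matching record.
items_only = [{"Items": r["Items"]} for r in matches]
```

Since the items stay grouped under their record, it remains visible which alerts happened together.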

