Extract keys and data and write an array - json

Given the following JSON source:
{
  "pages": {
    "yomama/first key": {
      "data": {
        "fieldset": "lesson-video-overview",
        "title": "5th Grade Math - Interpreting Fractions",
      },
      "order": 4
    },
    "yomama/second key": {
      "data": {
        "fieldset": "lesson-video-clip-single",
        "title": "Post-Lesson Debrief Part 5",
      },
      "order": 14
    },
    "yopapa/Third key": {
      "data": {
        "fieldset": "lesson-video-clip-single",
        "title": "Lesson Part 2B",
      },
      "order": 6
    }
  }
}
How can I produce an array like the one below? The main challenge for me is extracting the key, e.g. "yomama/first key". Ideally, I could also filter the result so that I get only the entries whose keys start with "yomama" (but not "yopapa").
[
  {
    "url": "yomama/first key",
    "data": {
      "fieldset": "lesson-video-overview",
      "title": "5th Grade Math - Interpreting Fractions",
    },
    "order": 4
  },
  {
    "url": "yomama/second key",
    "data": {
      "fieldset": "lesson-video-clip-single",
      "title": "Post-Lesson Debrief Part 5",
    },
    "order": 14
  },
  {
    "url": "yopapa/Third key",
    "data": {
      "fieldset": "lesson-video-clip-single",
      "title": "Lesson Part 2B",
    },
    "order": 6
  }
]

Assuming the input is in so.json and has been corrected to well-formed JSON, you can use:
jq '[.pages | to_entries[] | {"url": .key, "data": .value.data, "order": .value.order}]' < so.json
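If jq isn't available, the same to_entries idea can be sketched in Python. Iterating over dict.items() yields the key/value pairs that to_entries produces, in input order. This is a minimal sketch that assumes the input has already been fixed to valid JSON. The helper name pages_to_array and the trimmed two-page sample are illustrative only.

```python
import json

# Trimmed, valid-JSON version of the question's input (two of the three pages)
SOURCE = """
{"pages": {
  "yomama/first key": {"data": {"fieldset": "lesson-video-overview",
                                "title": "5th Grade Math - Interpreting Fractions"},
                       "order": 4},
  "yopapa/Third key": {"data": {"fieldset": "lesson-video-clip-single",
                                "title": "Lesson Part 2B"},
                       "order": 6}
}}
"""


def pages_to_array(doc):
    """Rough equivalent of
    [.pages | to_entries[] | {"url": .key, "data": .value.data, "order": .value.order}]
    """
    # dict.items() plays the role of to_entries: one (key, value) pair per page.
    # .get() returns None for a missing field, much as jq yields null.
    return [
        {"url": key, "data": value.get("data"), "order": value.get("order")}
        for key, value in doc["pages"].items()
    ]


print(json.dumps(pages_to_array(json.loads(SOURCE)), indent=2))
```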

Here's a solution that does not require listing all the other keys explicitly, and that also keeps only the keys starting with "yomama":
.pages
| [ to_entries[]
    | select(.key | startswith("yomama"))
    | {url: .key} + .value ]
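The object addition {url: .key} + .value merges the page's own fields into the new object, and keys from the right-hand side win on a clash. A comparable Python sketch uses dict unpacking, which follows the same later-wins rule. The helper name pages_with_prefix and the trimmed sample data are illustrative assumptions.

```python
import json

SOURCE = """
{"pages": {
  "yomama/first key":  {"data": {"title": "5th Grade Math - Interpreting Fractions"}, "order": 4},
  "yomama/second key": {"data": {"title": "Post-Lesson Debrief Part 5"}, "order": 14},
  "yopapa/Third key":  {"data": {"title": "Lesson Part 2B"}, "order": 6}
}}
"""


def pages_with_prefix(doc, prefix):
    """Rough equivalent of
    .pages | [to_entries[] | select(.key | startswith(prefix)) | {url: .key} + .value]
    """
    return [
        # {"url": key, **value}: like jq's object addition, later keys win on a clash
        {"url": key, **value}
        for key, value in doc["pages"].items()
        if key.startswith(prefix)  # select(.key | startswith(prefix))
    ]


for page in pages_with_prefix(json.loads(SOURCE), "yomama"):
    print(page["url"], page["order"])
```

Note that startswith("yomama") would also match a key such as "yomamas/x". Using "yomama/" as the prefix is stricter if the slash is always present.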


PySpark: transform JSON into multiple DataFrames

I have multiple JSON files with this structure. Association can contain one or multiple objects, and Characteristic doesn't always have the same number of key-value pairs:
{
  "vl:VNETList": {
    "Template": {
      "ID": "SomeId",
      "Object": [
        {
          "ID": "my_first_id",
          "Context": {
            "ID": "Avngate"
          },
          "Name": "Model Description",
          "ClassID": "PID",
          "Association": [
            {
              "Object": {
                "ID": "test.svg",
                "Context": {
                  "ID": "Avngate"
                }
              },
              "#type": "is fulfilled by"
            },
            {
              "Object": {
                "ID": "Project Description",
                "Context": {
                  "ID": "Avngate"
                }
              },
              "#type": "is an element of"
            }
          ],
          "Characteristic": [
            {
              "Name": "InfoType",
              "Value": "image/svg+xml"
            },
            {
              "Name": "LOCK",
              "Value": false
            },
            {
              "Name": "EXFI",
              "Value": 10000
            }
          ]
        },
        {
          "ID": "my_second_id",
          "Context": {
            "ID": "Avngate2"
          },
          "Name": "Model Description2",
          "ClassID": "PID2",
          "Association": [
            {
              "Object": {
                "ID": "test2.svg",
                "Context": {
                  "ID": "Avngate"
                }
              },
              "#type": "is fulfilled by"
            }
          ],
          "Characteristic": [
            {
              "Name": "Dbtencoding",
              "Value": "unicode"
            }
          ]
        }
      ]
    }
  }
}
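I would like to build two dataframes. The first should have one row per object (ID, Name, ClassID) and one column per characteristic name. The second should have one row per association (object ID, associated object ID, association type).
What's the best approach? If that is too complex, I could also save the characteristics as a separate table that references the object ID, as with the associations.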
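Read the JSON and explode the Object array. For the first dataframe, explode Characteristic and use groupBy with pivot. For the second one, explode Association and use a plain select.
from pyspark.sql import SparkSession
from pyspark.sql import functions as f

spark = SparkSession.builder.getOrCreate()

# Read the multi-line JSON file and explode the Object array: one row per object
df1 = spark.read.json('test.json', multiLine=True)
df2 = df1.select(f.explode('vl:VNETList.Template.Object').alias('value')) \
    .select('value.*')

# First dataframe: one row per object, one column per Characteristic name
df_f1 = df2.withColumn('Characteristic', f.explode('Characteristic')) \
    .groupBy('ID', 'Name', 'ClassId') \
    .pivot('Characteristic.Name') \
    .agg(f.first('Characteristic.Value'))

# Second dataframe: one row per association
df_f2 = df2.withColumn('Association', f.explode('Association')) \
    .select('ID', 'Association.Object.ID', 'Association.#type') \
    .toDF('ID', 'AssociationId', 'AssociationType')

df_f1.show()
df_f2.show()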
+------------+------------------+-------+-----------+-----+-------------+-----+
| ID| Name|ClassId|Dbtencoding| EXFI| InfoType| LOCK|
+------------+------------------+-------+-----------+-----+-------------+-----+
| my_first_id| Model Description| PID| null|10000|image/svg+xml|false|
|my_second_id|Model Description2| PID2| unicode| null| null| null|
+------------+------------------+-------+-----------+-----+-------------+-----+
+------------+-------------------+----------------+
| ID| AssociationId| AssociationType|
+------------+-------------------+----------------+
| my_first_id| test.svg| is fulfilled by|
| my_first_id|Project Description|is an element of|
|my_second_id| test2.svg| is fulfilled by|
+------------+-------------------+----------------+
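To make the explode and pivot steps concrete, here is a plain-Python sketch of the same reshaping without Spark. It only illustrates the logic. The helper names split_objects and as_list are hypothetical, and the sample is a trimmed copy of the question's data with the Context fields removed. The as_list wrapper encodes an assumption that a lone Association may appear as a bare object instead of a one-element array.

```python
import json

SOURCE = """
{"vl:VNETList": {"Template": {"ID": "SomeId", "Object": [
  {"ID": "my_first_id", "Name": "Model Description", "ClassID": "PID",
   "Association": [
     {"Object": {"ID": "test.svg"}, "#type": "is fulfilled by"},
     {"Object": {"ID": "Project Description"}, "#type": "is an element of"}],
   "Characteristic": [
     {"Name": "InfoType", "Value": "image/svg+xml"},
     {"Name": "LOCK", "Value": false},
     {"Name": "EXFI", "Value": 10000}]},
  {"ID": "my_second_id", "Name": "Model Description2", "ClassID": "PID2",
   "Association": [{"Object": {"ID": "test2.svg"}, "#type": "is fulfilled by"}],
   "Characteristic": [{"Name": "Dbtencoding", "Value": "unicode"}]}
]}}}
"""


def as_list(value):
    # Assumption: a single element may appear as a bare object instead of a
    # one-element array; wrap it so both shapes are handled the same way
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def split_objects(doc):
    objects = doc["vl:VNETList"]["Template"]["Object"]
    characteristics, associations = [], []
    for obj in objects:
        # "pivot": one key per Characteristic Name, holding its Value
        row = {"ID": obj["ID"], "Name": obj["Name"], "ClassID": obj["ClassID"]}
        for ch in as_list(obj.get("Characteristic")):
            row[ch["Name"]] = ch["Value"]
        characteristics.append(row)
        # "explode": one output row per Association element
        for assoc in as_list(obj.get("Association")):
            associations.append({
                "ID": obj["ID"],
                "AssociationId": assoc["Object"]["ID"],
                "AssociationType": assoc["#type"],
            })
    # Give every row the same pivot columns, using None where Spark shows null
    pivot_cols = sorted({k for r in characteristics for k in r} - {"ID", "Name", "ClassID"})
    for r in characteristics:
        for col in pivot_cols:
            r.setdefault(col, None)
    return characteristics, associations


chars, assocs = split_objects(json.loads(SOURCE))
for row in chars:
    print(row)
for row in assocs:
    print(row)
```

The Spark version is still the one to use on many files. This sketch just makes visible what each explode and pivot step does to a single document.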

How to extract a particular key from the JSON

I am trying to extract values from JSON that I obtained with curl for API testing. My JSON looks like this. How can I extract the value 20456 from it?
{
  "meta": {
    "status": "OK",
    "timestamp": "2022-09-16T14:45:55.076+0000"
  },
  "links": {},
  "data": {
    "id": 24843,
    "username": "abcd",
    "firstName": "abc",
    "lastName": "xyz",
    "email": "abc#abc.com",
    "phone": "",
    "title": "",
    "location": "",
    "licenseType": "FLOATING",
    "active": true,
    "uid": "u24843",
    "type": "users"
  }
}
{
  "meta": {
    "status": "OK",
    "timestamp": "2022-09-16T14:45:55.282+0000",
    "pageInfo": {
      "startIndex": 0,
      "resultCount": 1,
      "totalResults": 1
    }
  },
  "links": {
    "data.createdBy": {
      "type": "users",
      "href": "https://abc#abc.com/rest/v1/users/{data.createdBy}"
    },
    "data.fields.user1": {
      "type": "users",
      "href": "https://abc#abc.com/rest/v1/users/{data.fields.user1}"
    },
    "data.modifiedBy": {
      "type": "users",
      "href": "https://abc#abc.com/rest/v1/users/{data.modifiedBy}"
    },
    "data.fields.projectManager": {
      "type": "users",
      "href": "https://abc#abc.com/rest/v1/users/{data.fields.projectManager}"
    },
    "data.parent": {
      "type": "projects",
      "href": "https://abc#abc.com/rest/v1/projects/{data.parent}"
    }
  },
  "data": [
    {
      "id": 20456,
      "projectKey": "Stratus",
      "parent": 20303,
      "isFolder": false,
      "createdDate": "2018-03-12T23:46:59.000+0000",
      "modifiedDate": "2020-04-28T22:14:35.000+0000",
      "createdBy": 18994,
      "modifiedBy": 18865,
      "fields": {
        "projectManager": 18373,
        "user1": 18628,
        "projectKey": "Stratus",
        "text1": "",
        "name": "Stratus",
        "description": "",
        "date2": "2019-03-12",
        "date1": "2018-03-12"
      },
      "type": "projects"
    }
  ]
}
I have tried the following, but I get an error:
cat jqTrial.txt | jq '.data[].id'
jq: error (at <stdin>:21): Cannot index number with string "id"
20456
I also tried this, but it prints loose values (the fields of the first data object) that I'm not sure how to remove:
cat jqTrial.txt | jq '.data[]'
Assuming you want the project ID, not the user ID:
jq '
  .data
  | if type == "object" then . else .[] end
  | select(.type == "projects")
  | .id
' file.json
There's probably a better way to write the second step (the if/else line).
Indeed there is. Thanks to @pmf, it can be shortened to:
.data | objects // arrays[] | select(.type == "projects").id
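The objects // arrays[] trick treats .data the same way whether it holds a single object or an array of objects. Below is a hedged Python equivalent. It assumes the two documents have already been parsed into a list, and the function name project_ids is made up for illustration.

```python
def project_ids(documents):
    """Rough equivalent of: .data | objects // arrays[] | select(.type == "projects").id"""
    ids = []
    for doc in documents:
        data = doc.get("data")
        # objects // arrays[]: use an object as-is, iterate an array, skip anything else
        if isinstance(data, dict):
            items = [data]
        elif isinstance(data, list):
            items = data
        else:
            items = []
        # select(.type == "projects").id, ignoring any non-object array items
        ids.extend(item["id"] for item in items
                   if isinstance(item, dict) and item.get("type") == "projects")
    return ids


# Trimmed versions of the two documents from the question
documents = [
    {"data": {"id": 24843, "username": "abcd", "type": "users"}},
    {"data": [{"id": 20456, "projectKey": "Stratus", "type": "projects"}]},
]
print(project_ids(documents))  # [20456]
```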
Your input consists of two JSON documents, and both have a data field at the top level. In the first one, .data is an object with an .id field. In the second one, it is an array with one object item, which also has an .id field.
To retrieve both, you could use the --slurp (or -s) option, which wraps both top-level objects into an array. You can then address them separately by index:
jq --slurp '.[0].data.id, .[1].data[].id' jqTrial.txt
24843
20456
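The --slurp option collects every top-level JSON value in the input into one array. You can imitate that in Python with json.JSONDecoder.raw_decode, which parses one document and reports the index where it ended. This is a sketch only. The slurp helper is hypothetical, and the input text is a trimmed stand-in for jqTrial.txt.

```python
import json


def slurp(text):
    """Parse concatenated JSON documents into a list, similar to jq --slurp."""
    decoder = json.JSONDecoder()
    docs, pos = [], 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return docs
        doc, pos = decoder.raw_decode(text, pos)
        docs.append(doc)


TEXT = """
{"meta": {"status": "OK"}, "data": {"id": 24843, "type": "users"}}
{"meta": {"status": "OK"}, "data": [{"id": 20456, "type": "projects"}]}
"""

docs = slurp(TEXT)
# Same addressing as: .[0].data.id, .[1].data[].id
print(docs[0]["data"]["id"])
for item in docs[1]["data"]:
    print(item["id"])
```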

Using JQ to filter JSON data at different levels

I have this JSON data in a file, some-json-file, which contains the following:
{
  "result": [
    {
      "id": "1234567812345678",
      "name": "somewebsite.com",
      "status": "active",
      "type": "secondary",
      "activated_on": "2021-12-12T15:44:40.444433Z",
      "plan": {
        "id": "77777777777777777777777777",
        "name": "Enterprise Website",
        "is_subscribed": true,
        "legacy_id": "enterprise",
        "externally_managed": true
      }
    }
  ],
  "result_info": {
    "page": 1,
    "total_pages": 1
  },
  "success": true,
  "messages": []
}
And I am trying to get this filtered output from it using jq:
{
  "name": "somewebsite.com",
  "type": "secondary",
  "plan": {
    "name": "Enterprise Website",
    "id": "77777777777777777777777777"
  }
}
But I can't figure out how to do that.
I can filter the first level of keys like this:
cat some-json-file | jq '.result[] | {name,type,plan}'
which gives me this output:
{
  "name": "somewebsite.com",
  "type": "secondary",
  "plan": {
    "id": "77777777777777777777777777",
    "name": "Enterprise Website",
    "is_subscribed": true,
    "legacy_id": "enterprise",
    "externally_managed": true
  }
}
That gets me close, but I can't filter the child keys under .plan any further, so that I see only .name and .id.
Any ideas? Thanks!
You were almost there. Just set the new context and use the same technique again:
jq '.result[] | {name,type,plan: .plan | {name,id}}' some-json-file
{
  "name": "somewebsite.com",
  "type": "secondary",
  "plan": {
    "name": "Enterprise Website",
    "id": "77777777777777777777777777"
  }
}
Note: You don't need to cat the input; jq accepts the filename as a parameter.
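The nested construction plan: .plan | {name, id} just runs a second object projection on the child. Here is a hedged Python sketch that builds the same shape. The helper name summarize and the trimmed data are illustrative. Missing keys become None, much as jq turns them into null.

```python
import json

SOURCE = """
{"result": [{"id": "1234567812345678", "name": "somewebsite.com",
             "status": "active", "type": "secondary",
             "plan": {"id": "77777777777777777777777777",
                      "name": "Enterprise Website", "is_subscribed": true}}],
 "success": true}
"""


def summarize(doc):
    """Rough equivalent of: .result[] | {name, type, plan: .plan | {name, id}}"""
    out = []
    for r in doc["result"]:
        plan = r.get("plan") or {}
        out.append({
            "name": r.get("name"),
            "type": r.get("type"),
            # second projection, applied to the child object only
            "plan": {"name": plan.get("name"), "id": plan.get("id")},
        })
    return out


print(json.dumps(summarize(json.loads(SOURCE)), indent=2))
```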

Need to extract a specific string with jq

I have a JSON file (see below), and with jq I need to extract the resourceName of the connection whose email value is mail#mail1.com.
So in my case, the result should be name_1.
Any idea how to do that?
This does not work:
jq '.connections[] | select(.emailAddresses.value | test("mail#mail1.com"; "i")) | .resourceName' file.json
{
  "connections": [
    {
      "resourceName": "name_1",
      "etag": "123456789",
      "emailAddresses": [
        {
          "metadata": {
            "primary": true,
            "source": {
              "type": "CONTACT",
              "id": "123456"
            }
          },
          "value": "mail#mail1.com"
        }
      ]
    },
    {
      "resourceName": "name_2",
      "etag": "987654321",
      "emailAddresses": [
        {
          "metadata": {
            "primary": true,
            "source": {
              "type": "CONTACT",
              "id": "654321"
            },
            "sourcePrimary": true
          },
          "value": "mail#mail2.com"
        }
      ]
    }
  ],
  "totalPeople": 187,
  "totalItems": 187
}
One solution is to store the parent object while selecting on the child array:
jq '.connections[] | . as $parent | .emailAddresses // empty | .[] | select(.value == "mail#mail1.com") | $parent.resourceName' file.json
emailAddresses is an array, so .emailAddresses.value fails (an array cannot be indexed with a string). Use any if finding one matching element is enough:
.connections[] | select(any(.emailAddresses[];.value == "mail#mail1.com")).resourceName
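Both answers keep a connection when at least one of its email addresses matches, and that test maps directly onto Python's built-in any(). This sketch assumes the file has been fixed to valid JSON. Like the first answer's // empty, it treats a missing emailAddresses as an empty list. The helper name resource_names_for is illustrative.

```python
import json

SOURCE = """
{"connections": [
  {"resourceName": "name_1",
   "emailAddresses": [{"metadata": {"primary": true}, "value": "mail#mail1.com"}]},
  {"resourceName": "name_2",
   "emailAddresses": [{"metadata": {"primary": true}, "value": "mail#mail2.com"}]}
 ],
 "totalPeople": 187}
"""


def resource_names_for(doc, email):
    """Rough equivalent of:
    .connections[] | select(any(.emailAddresses[]; .value == email)).resourceName
    """
    return [
        conn["resourceName"]
        for conn in doc["connections"]
        # any(): keep the connection if at least one address has the wanted value
        if any(addr.get("value") == email for addr in conn.get("emailAddresses") or [])
    ]


print(resource_names_for(json.loads(SOURCE), "mail#mail1.com"))  # ['name_1']
```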

Map conditional child elements

I am working with a JSON file that contains a lot of data that can be removed before sending it to an API.
I found that jq can be used for this, but I'm not sure how to map the input to get the desired result.
Input JSON:
{
  "name": "Sample name",
  "id": "123",
  "userStory": {
    "id": "234",
    "storyName": "Story Name",
    "narrative": "Narrative",
    "type": "feature"
  },
  "testSteps": [
    {
      "number": 1,
      "description": "Step 1",
      "level": 0,
      "children": [
        {
          "number": 2,
          "description": "Description",
          "children": [
            {
              "number": 3,
              "description": "Description"
            }
          ]
        },
        {
          "number": 4,
          "anotherfield": "another field"
        }
      ]
    }
  ]
}
Desired output:
{
  "name": "Sample name",
  "userStory": {
    "storyName": "Story Name"
  },
  "testSteps": [
    {
      "description": "Step 1",
      "children": [
        {
          "description": "Description",
          "children": [
            {
              "description": "Description"
            }
          ]
        },
        {
          "anotherfield": "another field"
        }
      ]
    }
  ]
}
I tried the following jq command:
map_values(..|{name, id, userStory})
but I'm not sure how to keep only userStory.storyName.
Thanks in advance.
Note: The actual JSON has different child elements that are repeated in some cases.
To delete .id from root object:
del(.id)
To leave only .storyName in .userStory:
.userStory |= {storyName}
To delete .number and .level from every object on any level in .testSteps:
.testSteps |= walk(if type == "object" then del(.number, .level) else . end)
Putting it all together:
del(.id) | (.userStory |= {storyName}) | (.testSteps |=
  walk(if type == "object" then del(.number, .level) else . end))
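walk applies its filter to every value bottom-up, so del(.number, .level) reaches objects at any depth under .testSteps. Below is a hedged Python sketch of the same clean-up. It uses a small recursive helper, drop_keys (a hypothetical name), in place of walk, and it leaves the parsed input unmodified.

```python
import json

SOURCE = """
{"name": "Sample name", "id": "123",
 "userStory": {"id": "234", "storyName": "Story Name",
               "narrative": "Narrative", "type": "feature"},
 "testSteps": [{"number": 1, "description": "Step 1", "level": 0,
                "children": [{"number": 2, "description": "Description",
                              "children": [{"number": 3, "description": "Description"}]},
                             {"number": 4, "anotherfield": "another field"}]}]}
"""


def drop_keys(node, keys):
    """Recursively rebuild every object without `keys` (stands in for walk + del)."""
    if isinstance(node, dict):
        return {k: drop_keys(v, keys) for k, v in node.items() if k not in keys}
    if isinstance(node, list):
        return [drop_keys(v, keys) for v in node]
    return node


def clean(doc):
    out = dict(doc)                      # shallow copy; the input is not modified
    out.pop("id", None)                  # del(.id)
    story = out.get("userStory") or {}
    out["userStory"] = {"storyName": story.get("storyName")}  # .userStory |= {storyName}
    out["testSteps"] = drop_keys(out.get("testSteps"), {"number", "level"})
    return out


print(json.dumps(clean(json.loads(SOURCE)), indent=2))
```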