jq: select objects that have a specific value in nested JSON (bash)

This is my JSON:
[
{
"time": "2017-06-10 00:00:48-0400,317",
"UserInfo": {
"AppId": "ONE_SEARCH",
"UsageGroupId": "92600"
},
"Items": [
{
"PublicationCode": "",
"OpenUrlRefId": "",
"ReferringUrl": "N",
"OpenAccess": "0",
"ItmId": "1328515516"
}
]
},
{
"time": "2017-06-10 00:00:48-0400,548",
"UserInfo": {
"AppId": "DIALOG",
"UsageGroupId": "1195735"
},
"Items": [
{
"Origin": "Alert",
"PublicationCode": "",
"NumberOfCopies": 1,
"ItmId": "1907446549"
},
{
"Origin": "Alert",
"PublicationCode": "",
"NumberOfCopies": 1,
"ItmId": "1907446950"
}
]
}
]
I want to use jq to extract the objects that have "Origin": "Alert" in their "Items" element. The result should look like this:
{
"time": "2017-06-10 00:00:48-0400,548",
"UserInfo": {
"AppId": "DIALOG",
"UsageGroupId": "1195735"
},
"Items": [
{
"Origin": "Alert",
"PublicationCode": "",
"NumberOfCopies": 1,
"ItmId": "1907446549"
},
{
"Origin": "Alert",
"PublicationCode": "",
"NumberOfCopies": 1,
"ItmId": "1907446950"
}
]
}
Or this:
{
"Items": [
{
"Origin": "Alert",
"PublicationCode": "",
"NumberOfCopies": 1,
"ItmId": "1907446549",
"ReasonCode": ""
},
{
"Origin": "Alert",
"PublicationCode": "",
"NumberOfCopies": 1,
"ItmId": "1907446950"
}
]
}
How can I do this with jq? I have tried several ways, but most of them just return an array of all the child objects that include "Origin": "Alert". I need these child objects to keep their structure, because I need to know which of them happened together and which happened separately.
BTW, the only value of "Origin" is "Alert", so a method that selects an object by a given key name should also work.
Thank you! :)

The filter:
.[] | select( any(.Items[]; .Origin == "Alert"))
produces the first-mentioned admissible result. If your jq does not have any/2 then I'd suggest upgrading. If that's not an option, then you could use the following simple but rather inefficient filter instead:
.[] | select( .Items | map(.Origin) | index("Alert"))
Or:
.[] | select(reduce .Items[] as $item (false; . or ($item | .Origin == "Alert")))
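For anyone who wants to cross-check the selection outside jq, the logic of the first filter can be sketched in Python. This is only an illustration: the function name select_with_alert and the trimmed sample data are made up here, not part of the original question.

```python
import json


def select_with_alert(records, key="Origin", value="Alert"):
    """Keep each top-level object that has at least one item where key == value."""
    return [
        rec for rec in records
        if any(item.get(key) == value for item in rec.get("Items", []))
    ]


# Trimmed-down, hypothetical version of the input from the question.
records = json.loads("""
[
  {"UserInfo": {"AppId": "ONE_SEARCH"},
   "Items": [{"ItmId": "1328515516"}]},
  {"UserInfo": {"AppId": "DIALOG"},
   "Items": [{"Origin": "Alert", "ItmId": "1907446549"},
             {"Origin": "Alert", "ItmId": "1907446950"}]}
]
""")

selected = select_with_alert(records)
print(json.dumps(selected, indent=2))
```

As with the jq filter, the whole parent object is kept, so items that occurred together stay together.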

Related

Pyspark transform json into multiple dataframes

I have multiple JSON files with this structure (Association can contain one or more objects, and Characteristic doesn't always have the same number of key-value pairs):
{
"vl:VNETList": {
"Template": {
"ID": "SomeId",
"Object": [
{
"ID": "my_first_id",
"Context": {
"ID": "Avngate"
},
"Name": "Model Description",
"ClassID": "PID",
"Association": [
{
"Object": {
"ID": "test.svg",
"Context": {
"ID": "Avngate"
}
},
"#type": "is fulfilled by"
},
{
"Object": {
"ID": "Project Description",
"Context": {
"ID": "Avngate"
}
},
"#type": "is an element of"
}
],
"Characteristic": [
{
"Name": "InfoType",
"Value": "image/svg+xml"
},
{
"Name": "LOCK",
"Value": false
},
{
"Name": "EXFI",
"Value": 10000
}
]
},
{
"ID": "my_second_id",
"Context": {
"ID": "Avngate2"
},
"Name": "Model Description2",
"ClassID": "PID2",
"Association": [
{
"Object": {
"ID": "test2.svg",
"Context": {
"ID": "Avngate"
}
},
"#type": "is fulfilled by"
}
],
"Characteristic": [
{
"Name": "Dbtencoding",
"Value": "unicode"
}
]
}
]
}
}
}
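I would like to build two dataframes: the first with one row per object, including its characteristics, and the second with the associations, referencing the object ID. (The example tables were images and are not reproduced here.)
What's the best approach? If that is too complex, I could also save the characteristics as a separate table referencing the object ID, as with the associations.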
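Read the JSON, then use explode with groupBy and pivot for the first dataframe, and explode with select for the second one.
from pyspark.sql import functions as f
# Assumes an existing SparkSession named spark.
df1 = spark.read.json('test.json', multiLine=True)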
df2 = df1.select(f.explode('vl:VNETList.Template.Object').alias('value')) \
.select('value.*')
df_f1 = df2.withColumn('Characteristic', f.explode('Characteristic')) \
.groupBy('ID', 'Name', 'ClassId') \
.pivot('Characteristic.Name') \
.agg(f.first('Characteristic.Value'))
df_f2 = df2.withColumn('Association', f.explode('Association')) \
.select('ID', 'Association.Object.ID', 'Association.#Type') \
.toDF('ID', 'AssociationId', 'AssociationType')
df_f1.show()
df_f2.show()
+------------+------------------+-------+-----------+-----+-------------+-----+
| ID| Name|ClassId|Dbtencoding| EXFI| InfoType| LOCK|
+------------+------------------+-------+-----------+-----+-------------+-----+
| my_first_id| Model Description| PID| null|10000|image/svg+xml|false|
|my_second_id|Model Description2| PID2| unicode| null| null| null|
+------------+------------------+-------+-----------+-----+-------------+-----+
+------------+-------------------+----------------+
| ID| AssociationId| AssociationType|
+------------+-------------------+----------------+
| my_first_id| test.svg| is fulfilled by|
| my_first_id|Project Description|is an element of|
|my_second_id| test2.svg| is fulfilled by|
+------------+-------------------+----------------+
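To make the reshaping concrete without a Spark session, here is a rough plain-Python sketch of the same two tables. It is not the PySpark answer itself: build_tables is a hypothetical helper, the sample document is trimmed, and missing characteristics are filled with None to mimic the nulls that pivot produces.

```python
def build_tables(doc):
    """Return (object_rows, association_rows) as lists of dicts."""
    objects = doc["vl:VNETList"]["Template"]["Object"]
    # Every characteristic name seen anywhere becomes a column (like pivot).
    names = sorted({c["Name"] for o in objects for c in o.get("Characteristic", [])})
    object_rows, association_rows = [], []
    for obj in objects:
        row = {"ID": obj["ID"], "Name": obj["Name"], "ClassID": obj["ClassID"]}
        row.update({name: None for name in names})
        row.update({c["Name"]: c["Value"] for c in obj.get("Characteristic", [])})
        object_rows.append(row)
        # One output row per association (like explode).
        for assoc in obj.get("Association", []):
            association_rows.append({
                "ID": obj["ID"],
                "AssociationId": assoc["Object"]["ID"],
                "AssociationType": assoc["#type"],
            })
    return object_rows, association_rows


# Trimmed, hypothetical sample based on the question's structure.
doc = {"vl:VNETList": {"Template": {"ID": "SomeId", "Object": [
    {"ID": "my_first_id", "Name": "Model Description", "ClassID": "PID",
     "Association": [
         {"Object": {"ID": "test.svg"}, "#type": "is fulfilled by"},
         {"Object": {"ID": "Project Description"}, "#type": "is an element of"}],
     "Characteristic": [
         {"Name": "InfoType", "Value": "image/svg+xml"},
         {"Name": "LOCK", "Value": False}]},
    {"ID": "my_second_id", "Name": "Model Description2", "ClassID": "PID2",
     "Association": [
         {"Object": {"ID": "test2.svg"}, "#type": "is fulfilled by"}],
     "Characteristic": [
         {"Name": "Dbtencoding", "Value": "unicode"}]},
]}}}

object_rows, association_rows = build_tables(doc)
for row in object_rows + association_rows:
    print(row)
```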

How to extract a particular key from the JSON

I am trying to extract values from JSON that I obtained using the curl command for API testing. My JSON looks as below. How can I extract the value 20456 from it?
{
"meta": {
"status": "OK",
"timestamp": "2022-09-16T14:45:55.076+0000"
},
"links": {},
"data": {
"id": 24843,
"username": "abcd",
"firstName": "abc",
"lastName": "xyz",
"email": "abc#abc.com",
"phone": "",
"title": "",
"location": "",
"licenseType": "FLOATING",
"active": true,
"uid": "u24843",
"type": "users"
}
}
{
"meta": {
"status": "OK",
"timestamp": "2022-09-16T14:45:55.282+0000",
"pageInfo": {
"startIndex": 0,
"resultCount": 1,
"totalResults": 1
}
},
"links": {
"data.createdBy": {
"type": "users",
"href": "https://abc#abc.com/rest/v1/users/{data.createdBy}"
},
"data.fields.user1": {
"type": "users",
"href": "https://abc#abc.com/rest/v1/users/{data.fields.user1}"
},
"data.modifiedBy": {
"type": "users",
"href": "https://abc#abc.com/rest/v1/users/{data.modifiedBy}"
},
"data.fields.projectManager": {
"type": "users",
"href": "https://abc#abc.com/rest/v1/users/{data.fields.projectManager}"
},
"data.parent": {
"type": "projects",
"href": "https://abc#abc.com/rest/v1/projects/{data.parent}"
}
},
"data": [
{
"id": 20456,
"projectKey": "Stratus",
"parent": 20303,
"isFolder": false,
"createdDate": "2018-03-12T23:46:59.000+0000",
"modifiedDate": "2020-04-28T22:14:35.000+0000",
"createdBy": 18994,
"modifiedBy": 18865,
"fields": {
"projectManager": 18373,
"user1": 18628,
"projectKey": "Stratus",
"text1": "",
"name": "Stratus",
"description": "",
"date2": "2019-03-12",
"date1": "2018-03-12"
},
"type": "projects"
}
]
}
I have tried the following, but end up getting an error:
▶ cat jqTrial.txt | jq '.data[].id'
jq: error (at <stdin>:21): Cannot index number with string "id"
20456
I also tried this, but I get strings outside the object that I am not sure how to remove:
cat jqTrial.txt | jq '.data[]'
Assuming you want the project id not the user id:
jq '
.data
| if type == "object" then . else .[] end
| select(.type == "projects")
| .id
' file.json
There's probably a better way to write the second expression.
Indeed, thanks to @pmf:
.data | objects // arrays[] | select(.type == "projects").id
Your input consists of two JSON documents, and both have a top-level data field. In the first document, .data is itself an object with an .id field; in the second, .data is an array containing one object, which also has an .id field.
To retrieve both, you could use the --slurp (or -s) option, which wraps both top-level objects into an array so that you can address them separately by index:
jq --slurp '.[0].data.id, .[1].data[].id' jqTrial.txt
24843
20456
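The "two documents in one file" situation can also be sketched in Python, for comparison. The helper names iter_json_documents and project_ids and the two-line sample text are assumptions made for this example; the approach relies on json.JSONDecoder.raw_decode, which parses one document and reports where it ended.

```python
import json


def iter_json_documents(text):
    """Yield each JSON document from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace between documents; raw_decode does not accept it.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        doc, pos = decoder.raw_decode(text, pos)
        yield doc


def project_ids(text):
    """Collect the ids of entries whose type is "projects"."""
    ids = []
    for doc in iter_json_documents(text):
        data = doc.get("data")
        # .data may be a single object or an array of objects.
        entries = [data] if isinstance(data, dict) else (data or [])
        ids.extend(e["id"] for e in entries if e.get("type") == "projects")
    return ids


# Hypothetical, heavily trimmed version of the two documents in the question.
text = """
{"data": {"id": 24843, "type": "users"}}
{"data": [{"id": 20456, "type": "projects"}]}
"""

print(project_ids(text))
```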

JQ - unique count of each value in an array

I need to solve this with jq. I have large lists of arrays in my JSON file and need to do some sort | uniq -c type of processing on them. Specifically, I have a relatively nasty-looking fruit array and need to break down what is inside it. I'm aware of unique and the like, and I imagine there is a simple way to do this. I've been trying to assign things to variables, append to them and so on, but I can't get the most basic part working: counting the unique values in each fruit array, and especially not without breaking the rest of the content (hence the variable ideas). Please tell me I'm overthinking this.
I'd like to turn this;
[
{
"uid": "123abc",
"tID": [
"T19"
],
"fruit": [
"Kiwi",
"Apple",
"",
"",
"",
"Kiwi",
"",
"Kiwi",
"",
"",
"Mango",
"Kiwi"
]
},
{
"uid": "456xyz",
"tID": [
"T15"
],
"fruit": [
"",
"Orange"
]
}
]
Into this;
[
{
"uid": "123abc",
"tID": [
"T19"
],
"metadata": [
{
"name": "fruit",
"value": "Kiwi - 3"
},
{
"name": "fruit",
"value": "Mango - 1"
},
{
"name": "fruit",
"value": "Apple - 1"
}
]
},
{
"uid": "456xyz",
"tID": [
"T15"
],
"metadata": [
{
"name": "fruit",
"value": "Orange - 1"
}
]
}
]
Using group_by and length would be one way:
jq '
map(with_entries(select(.key == "fruit") |= (
.value |= (group_by(.) | map(
{name: "fruit", value: "\(.[0] | select(. != "")) - \(length)"}
))
| .key = "metadata"
)))
'
[
{
"uid": "123abc",
"tID": [
"T19"
],
"metadata": [
{
"name": "fruit",
"value": "Apple - 1"
},
{
"name": "fruit",
"value": "Kiwi - 4"
},
{
"name": "fruit",
"value": "Mango - 1"
}
]
},
{
"uid": "456xyz",
"tID": [
"T15"
],
"metadata": [
{
"name": "fruit",
"value": "Orange - 1"
}
]
}
]
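The counting step is essentially what sort | uniq -c does. As a cross-check, here is a small Python sketch of the same transformation using collections.Counter; the function name fruit_metadata is made up for this example, and the output is sorted by name to match the order that group_by produces.

```python
from collections import Counter


def fruit_metadata(record):
    """Replace the "fruit" array with per-value counts under "metadata"."""
    counts = Counter(f for f in record["fruit"] if f != "")  # ignore empty strings
    result = {k: v for k, v in record.items() if k != "fruit"}
    result["metadata"] = [
        {"name": "fruit", "value": f"{name} - {n}"}
        for name, n in sorted(counts.items())
    ]
    return result


# First record from the question.
record = {
    "uid": "123abc",
    "tID": ["T19"],
    "fruit": ["Kiwi", "Apple", "", "", "", "Kiwi", "", "Kiwi", "", "", "Mango", "Kiwi"],
}

converted = fruit_metadata(record)
print(converted)
```

Note that Kiwi occurs four times in the sample input, which is why the count is 4 rather than the 3 shown in the question's desired output.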

Using jq to convert object to key with values

I have been playing around with jq to format a json file but I am having some issues trying to solve a particular transformation. Given a test.json file in this format:
[
{
"name": "A", // This would be the first key
"number": 1,
"type": "apple",
"city": "NYC" // This would be the second key
},
{
"name": "A",
"number": "5",
"type": "apple",
"city": "LA"
},
{
"name": "A",
"number": 2,
"type": "apple",
"city": "NYC"
},
{
"name": "B",
"number": 3,
"type": "apple",
"city": "NYC"
}
]
I was wondering, how can I format it this way using jq?
[
{
"key": "A",
"values": [
{
"key": "NYC",
"values": [
{
"number": 1,
"type": "a"
},
{
"number": 2,
"type": "b"
}
]
},
{
"key": "LA",
"values": [
{
"number": 5,
"type": "b"
}
]
}
]
},
{
"key": "B",
"values": [
{
"key": "NYC",
"values": [
{
"number": 3,
"type": "apple"
}
]
}
]
}
]
I have followed the thread "Using jq, convert array of name/value pairs to object with named keys" and tried to group the JSON using this expression:
jq '. | group_by(.name) | group_by(.city) ' ./test.json
but I have not been able to add the keys in the output.
You'll want to group the items at each level and build out your result objects as you go.
group_by(.name) | map({
key: .[0].name,
values: (group_by(.city) | map({
key: .[0].city,
values: map({number,type})
}))
})
Just keep in mind that group_by/1 yields groups in sorted order. If you need the groups in their original input order, you'll want an implementation that preserves it.
def group_by_unsorted(key_selector):
reduce .[] as $i ({};
.["\($i|key_selector)"] += [$i]
)|[.[]];
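The order-preserving idea can be illustrated in Python, where dicts keep insertion order. This is a sketch under assumptions: group_by_unsorted and nest are hypothetical names, and unlike the jq definition above, the grouping key is used as-is rather than converted to a string.

```python
def group_by_unsorted(items, key):
    """Group items by key(item), keeping groups in first-seen order."""
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return list(groups.items())


def nest(items):
    """Build the two-level key/values structure: by name, then by city."""
    return [
        {"key": name, "values": [
            {"key": city,
             "values": [{"number": r["number"], "type": r["type"]} for r in rows]}
            for city, rows in group_by_unsorted(by_name, lambda r: r["city"])
        ]}
        for name, by_name in group_by_unsorted(items, lambda r: r["name"])
    ]


# Sample rows modelled on the question's input.
items = [
    {"name": "A", "number": 1, "type": "apple", "city": "NYC"},
    {"name": "A", "number": 5, "type": "apple", "city": "LA"},
    {"name": "A", "number": 2, "type": "apple", "city": "NYC"},
    {"name": "B", "number": 3, "type": "apple", "city": "NYC"},
]

nested = nest(items)
print(nested)
```

With this input, NYC comes before LA under "A" because it is seen first, whereas a sorted grouping would put LA first.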

Rename JSON key field with value in object

I have the following JSON output:
[
{
"id": "47",
"canUpdate": true,
"canDelete": true,
"canArchive": true,
"info": [
{
"key": "problem_type",
"value": "PAN",
"valueCaption": "PAN",
"keyCaption": "Category"
},
{
"key": "status",
"value": 3,
"valueCaption": "Closed",
"keyCaption": "Status"
},
{
"key": "insert_time",
"value": 1466446314000,
"valueCaption": "2016-06-20 14:11:54.0",
"keyCaption": "Request time"
}
As you can see under "info" they actually label the key:value pair as "key": "problem_type" and "value": "PAN" and then "valueCaption": "PAN" "keyCaption": "Category". What I need to do is remap the file so that, in this example, it shows as "problem_type": "PAN" and "Category": "PAN". What would be the best method to iterate through the output to remap the key:value pairs in this manner?
How it needs to be:
[
{
"id": "47",
"canUpdate": true,
"canDelete": true,
"canArchive": true,
"info": [
{
"problem_type": "PAN",
"Category": "PAN"
},
{
"status": 3,
"Status": "Closed"
},
{
"insert_time": 1466446314000,
"Request time": "2016-06-20 14:11:54.0"
}
Here is a jq solution that uses update assignment (|=):
.[].info[] |= {(.key):.value, (.keyCaption):.valueCaption}
Sample Run (assumes data in data.json)
$ jq -M '.[].info[] |= {(.key):.value, (.keyCaption):.valueCaption}' data.json
[
{
"id": "47",
"canUpdate": true,
"canDelete": true,
"canArchive": true,
"info": [
{
"problem_type": "PAN",
"Category": "PAN"
},
{
"status": 3,
"Status": "Closed"
},
{
"insert_time": 1466446314000,
"Request time": "2016-06-20 14:11:54.0"
}
]
}
]
Try it online at jqplay.org
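For comparison, the same remapping can be sketched in Python. The function name remap_info is hypothetical, and the sample record is shortened from the question; each info entry is rebuilt with its key and keyCaption values as the new field names.

```python
def remap_info(records):
    """Rebuild every info entry as {key: value, keyCaption: valueCaption}."""
    return [
        {**rec, "info": [
            {entry["key"]: entry["value"],
             entry["keyCaption"]: entry["valueCaption"]}
            for entry in rec["info"]
        ]}
        for rec in records
    ]


# Shortened sample based on the question's data.
records = [{
    "id": "47",
    "canUpdate": True,
    "info": [
        {"key": "problem_type", "value": "PAN",
         "valueCaption": "PAN", "keyCaption": "Category"},
        {"key": "status", "value": 3,
         "valueCaption": "Closed", "keyCaption": "Status"},
    ],
}]

remapped = remap_info(records)
print(remapped)
```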