Query parent data (multi-level) based on a child value, on a json file, using jq - json

I have a ksh script that retrives (using curl) a json file similar to the one bellow:
{
"Type1": {
"dev": {
"server": [
{ "group": "APP1", "name": "DAPP1002", "ip": "10.1.1.1" },
{ "group": "APP2", "name": "DAPP2001", "ip": "10.1.1.2" }
]
},
"qa": {
"server": [
{ "group": "APP1", "name": "QAPP1002", "ip": "10.1.2.1" },
{ "group": "APP2", "name": "QAPP2001", "ip": "10.1.2.2" }
]
},
"prod": {
"proxy": "type1.prod.proxy.mydomain.com",
"server": [
{ "group": "APP1", "name": "PAPP1001", "ip": "10.1.3.1" },
{ "group": "APP1", "name": "PAPP1002", "ip": "10.1.3.2" },
{ "group": "APP2", "name": "PAPP2001", "ip": "10.1.3.3" }
]
}
},
"Type2": {
"dev": {
"server": [
{ "group": "APP8", "name": "DAPP8002", "ip": "10.2.1.1" },
{ "group": "APP9", "name": "DAPP9001", "ip": "10.2.1.2" }
]
},
"qa": {
"server": [
{ "group": "APP8", "name": "QAPP8002", "ip": "10.2.2.1" },
{ "group": "APP9", "name": "QAPP9001", "ip": "10.2.2.2" }
]
},
"prod": {
"proxy": "type2.prod.proxy.mydomain.com",
"server": [
{ "group": "APP8", "name": "PAPP8001", "ip": "10.2.3.1" },
{ "group": "APP9", "name": "PAPP9001", "ip": "10.2.3.2" },
{ "group": "APP9", "name": "PAPP9002", "ip": "10.2.3.3" }
]
}
}
}
... based on a server name (field "name") I would have to collect the following info, to pass to a function:
"Type", "name", "ip", "proxy"
(Note that the "proxy" info is optional)
I am new to json, and I am trying to get this filtered with jq but so far, I am out of lucky.
What I acomplished so far is the following jq query, when searching for "PAPP9001" :
jq '.[] | .[] | select(.server[].name=="PAPP9001") | .proxy as $proxy | .server[] | {proxy: $proxy, name: .name, ip: .ip} | select(.name=="PAPP9001")' curlreturn.json
which returns me:
{
"proxy": "type2.prod.proxy.mydomain.com",
"name": "PAPP9001",
"ip": "10.2.3.2"
}
but:
I could not get the "Type" info, at the top level
Considering the number of pipes and the 2 selects, I doubt that this is the most efficient way.

One way to retrieve the key names programmatically is using to_entries. For example, given your input, this jq filter:
to_entries[]
| .key as $type
| .value[]
| .proxy as $proxy
| .server[]
| select(.name == "PAPP9001")
| { Type: $type, name, ip, proxy: $proxy }
yields:
{
"Type": "Type2",
"name": "PAPP9001",
"ip": "10.2.3.2",
"proxy": "type2.prod.proxy.mydomain.com"
}
Variations
If, for example, you wanted these four fields as a CSV row, then you could replace the last line of the filter above with:
| [$type, .name, .ip, $proxy] | #csv
See the jq manual for how to use string interpolation.

Related

Pyspark transform json into multiple dataframes

I have multiple json with this structure (association can have one or multiple objects & Charasteritics doesn't always has the same number of kv pairs:
{
"vl:VNETList": {
"Template": {
"ID": "SomeId",
"Object": [
{
"ID": "my_first_id",
"Context": {
"ID": "Avngate"
},
"Name": "Model Description",
"ClassID": "PID",
"Association": [
{
"Object": {
"ID": "test.svg",
"Context": {
"ID": "Avngate"
}
},
"#type": "is fulfilled by"
},
{
"Object": {
"ID": "Project Description",
"Context": {
"ID": "Avngate"
}
},
"#type": "is an element of"
}
],
"Characteristic": [
{
"Name": "InfoType",
"Value": "image/svg+xml"
},
{
"Name": "LOCK",
"Value": false
},
{
"Name": "EXFI",
"Value": 10000
}
]
},
{
"ID": "my_second_id",
"Context": {
"ID": "Avngate2"
},
"Name": "Model Description2",
"ClassID": "PID2",
"Association": [
{
"Object": {
"ID": "test2.svg",
"Context": {
"ID": "Avngate"
}
},
"#type": "is fulfilled by"
}
],
"Characteristic": [
{
"Name": "Dbtencoding",
"Value": "unicode"
}
]
}
]
}
}
I would like to build two dataframes like this:
and the second dataframe like this:
What's the best approach? If too complex, I would be able also to save the characteristics as a separate table referencing the objectId like with the association.
Read json and groupBy for the first one, just select for the second one with explode.
df1 = spark.read.json('test.json', multiLine=True)
df2 = df1.select(f.explode('vl:VNETList.Template.Object').alias('value')) \
.select('value.*')
df_f1 = df2.withColumn('Characteristic', f.explode('Characteristic')) \
.groupBy('ID', 'Name', 'ClassId') \
.pivot('Characteristic.Name') \
.agg(f.first('Characteristic.Value'))
df_f2 = df2.withColumn('Association', f.explode('Association')) \
.select('ID', 'Association.Object.ID', 'Association.#Type') \
.toDF('ID', 'AssociationId', 'AssociationType')
df_f1.show()
df_f2.show()
+------------+------------------+-------+-----------+-----+-------------+-----+
| ID| Name|ClassId|Dbtencoding| EXFI| InfoType| LOCK|
+------------+------------------+-------+-----------+-----+-------------+-----+
| my_first_id| Model Description| PID| null|10000|image/svg+xml|false|
|my_second_id|Model Description2| PID2| unicode| null| null| null|
+------------+------------------+-------+-----------+-----+-------------+-----+
+------------+-------------------+----------------+
| ID| AssociationId| AssociationType|
+------------+-------------------+----------------+
| my_first_id| test.svg| is fulfilled by|
| my_first_id|Project Description|is an element of|
|my_second_id| test2.svg| is fulfilled by|
+------------+-------------------+----------------+

How to extract a paticular key from the json

I am trying to extract values from a json that I obtained using the curl command for api testing. My json looks as below. I need some help extracting the value "20456" from here?
{
"meta": {
"status": "OK",
"timestamp": "2022-09-16T14:45:55.076+0000"
},
"links": {},
"data": {
"id": 24843,
"username": "abcd",
"firstName": "abc",
"lastName": "xyz",
"email": "abc#abc.com",
"phone": "",
"title": "",
"location": "",
"licenseType": "FLOATING",
"active": true,
"uid": "u24843",
"type": "users"
}
}
{
"meta": {
"status": "OK",
"timestamp": "2022-09-16T14:45:55.282+0000",
"pageInfo": {
"startIndex": 0,
"resultCount": 1,
"totalResults": 1
}
},
"links": {
"data.createdBy": {
"type": "users",
"href": "https://abc#abc.com/rest/v1/users/{data.createdBy}"
},
"data.fields.user1": {
"type": "users",
"href": "https://abc#abc.com/rest/v1/users/{data.fields.user1}"
},
"data.modifiedBy": {
"type": "users",
"href": "https://abc#abc.com/rest/v1/users/{data.modifiedBy}"
},
"data.fields.projectManager": {
"type": "users",
"href": "https://abc#abc.com/rest/v1/users/{data.fields.projectManager}"
},
"data.parent": {
"type": "projects",
"href": "https://abc#abc.com/rest/v1/projects/{data.parent}"
}
},
"data": [
{
"id": 20456,
"projectKey": "Stratus",
"parent": 20303,
"isFolder": false,
"createdDate": "2018-03-12T23:46:59.000+0000",
"modifiedDate": "2020-04-28T22:14:35.000+0000",
"createdBy": 18994,
"modifiedBy": 18865,
"fields": {
"projectManager": 18373,
"user1": 18628,
"projectKey": "Stratus",
"text1": "",
"name": "Stratus",
"description": "",
"date2": "2019-03-12",
"date1": "2018-03-12"
},
"type": "projects"
}
]
}
I have tried the following, but end up getting error:
▶ cat jqTrial.txt | jq '.data[].id'
jq: error (at <stdin>:21): Cannot index number with string "id"
20456
Also tried this but I get strings outside the object that I am not sure how to remove:
cat jqTrial.txt | jq '.data[]'
Assuming you want the project id not the user id:
jq '
.data
| if type == "object" then . else .[] end
| select(.type == "projects")
| .id
' file.json
There's probably a better way to write the 2nd expression
Indeed, thanks to #pmf
.data | objects // arrays[] | select(.type == "projects").id
Your input consists of two JSON documents; both have a data field on top level. But while the first one is itself an object which has an .id field, the second one is an array with one object item, which also has an .id field.
To retrieve both, you could use the --slurp (or -s) option which wraps both top-level objects into an array, then you can address them separately by index:
jq --slurp '.[0].data.id, .[1].data[].id' jqTrial.txt
24843
20456
Demo

Using JQ to filter JSON data at different levels

I have this JSON data : some-json-file which contains the following
{
"result": [
{
"id": "1234567812345678",
"name": "somewebsite.com",
"status": "active",
"type": "secondary",
"activated_on": "2021-12-12T15:44:40.444433Z",
"plan": {
"id": "77777777777777777777777777",
"name": "Enterprise Website",
"is_subscribed": true,
"legacy_id": "enterprise",
"externally_managed": true
}
}
],
"result_info": {
"page": 1,
"total_pages": 1
},
"success": true,
"messages": []
}
And I am trying to get this filtered output from it using jq
{
"name": "somewebsite.com",
"type": "secondary",
"plan": {
"name": "Enterprise Website",
"id": "77777777777777777777777777"
}
}
But I can't figure out how to do that.
I can filter the first layer of labels like this
cat some-json-file | jq '.result[] | {name,type,plan}'
Which gets me this output
{
"name": "somewebsite.com",
"type": "secondary",
"plan": {
"id": "77777777777777777777777777",
"name": "Enterprise Website",
"is_subscribed": true,
"legacy_id": "enterprise",
"externally_managed": true
}
}
That gets me close, but I can't further filter the child labels under .plan so that I see just the .name and .id.
Any ideas? Thanks!
You were almost there. Just set the new context and use the same technique again:
jq '.result[] | {name,type,plan: .plan | {name,id}}' some-json-file
{
"name": "somewebsite.com",
"type": "secondary",
"plan": {
"name": "Enterprise Website",
"id": "77777777777777777777777777"
}
}
Demo
Note: You don't need to cat the input, jq accepts the filename as parameter.

storing json output in bash from cloudfromation

I am using aws ecs query to get list of properties being used by the current running task.
command -
cft = "aws ecs describe-tasks --cluster arn:aws:ecs:us-west-2:4984314772:cluster/secrets --tasks arn:aws:ecs:us-west-2:4984314772:task/secrets/86855757eec4487f9d4475a1f7c4cb0b
I am storing this in an output variable
output= $( eval $cft)
Output:
"tasks": [
{
"attachments": [
{
"id": "da8a1312-8278-46d5-8e3b-6b6a1d96f820",
"type": "ElasticNetworkInterface",
"status": "ATTACHED",
"details": [
{
"name": "subnetId",
"value": "subnet-0a151f2eb959ad4"
},
{
"name": "networkInterfaceId",
"value": "eni-081948e3666253f"
},
{
"name": "macAddress",
"value": "02:2a:9i:5c:4a:77"
},
{
"name": "privateDnsName",
"value": "ip-172-56-17-177.us-west-2.compute.internal"
},
{
"name": "privateIPv4Address",
"value": "172.56.17.177"
}
]
}
],
"availabilityZone": "us-west-2a",
"clusterArn": "arn:aws:ecs:us-west-2:4984314772:cluster/secrets",
"containers": [
{
"taskArn": "arn:aws:ecs:us-west-2:4984314772:task/secrets/86855757eec4487f9d4475a1f7c4cb0b",
"name": "nginx",
"image": "nginx",
"lastStatus": "PENDING",
"networkInterfaces": [
{
"attachmentId": "da8a1312-8278-46d5-6b6a1d96f820",
"privateIpv4Address": "172.31.17.176"
}
],
"healthStatus": "UNKNOWN",
"cpu": "0"
}
],
"cpu": "256",
"createdAt": "2020-12-10T18:00:16.320000+05:30",
"desiredStatus": "RUNNING",
"group": "family:nginx",
"healthStatus": "UNKNOWN",
"lastStatus": "PENDING",
"launchType": "FARGATE",
"memory": "512",
"overrides": {
"containerOverrides": [
{
"name": "nginx"
}
],
"inferenceAcceleratorOverrides": []
},
"platformVersion": "1.4.0",
"tags": [],
"taskArn": "arn:aws:ecs:us-west-2:4984314772:task/secrets/86855757eec4487f9d4475a1f7c4cb0b",
"taskDefinitionArn": "arn:aws:ecs:us-west-2:4984314772:task-definition/nginx:17",
"version": 2
}
],
"failures": []
}
now if do an echo of $output.tasks[0].containers[0] nothing happens it prints the entire thing again, i want to store the result in output variable and refer different parameter like we do in json format.
You will need to use a json parser such as jq and so:
eval $cft | jq '.tasks[].containers[]'
To avoid using eval you could simple pipe the aws command into jq and so:
aws ecs describe-tasks --cluster arn:aws:ecs:us-west-2:4984314772:cluster/secrets --tasks arn:aws:ecs:us-west-2:4984314772:task/secrets/86855757eec4487f9d4475a1f7c4cb0b | jq '.tasks[].containers[]'
or:
cft=$(aws ecs describe-tasks --cluster arn:aws:ecs:us-west-2:4984314772:cluster/secrets --tasks arn:aws:ecs:us-west-2:4984314772:task/secrets/86855757eec4487f9d4475a1f7c4cb0b | jq '.tasks[].containers[]')
echo $cft | jq '.tasks[].containers[]'

Selecting multiple conditionals in JQ

I've just started using jq json parser, is there anyway to choose multiple select?
I have this:
cat file | jq -r '.Instances[] | {ip: .PrivateIpAddress, name: .Tags[]}
| select(.name.Key == "Name")'
And I need to also include the .name.Key == "Type"
This is the JSON:
{
"Instances": [
{
"PrivateIpAddress": "1.1.1.1",
"Tags": [
{
"Value": "Daily",
"Key": "Backup"
},
{
"Value": "System",
"Key": "Name"
},
{
"Value": "YES",
"Key": "Is_in_Domain"
},
{
"Value": "PROD",
"Key": "Type"
}
]
}
]
}
And this is the current output:
{
"ip": "1.1.1.1",
"name": "System"
}
{
"ip": "2.2.2.2",
"name": "host"
}
{
"ip": "3.3.3.3",
"name": "slog"
}
Desired output:
{
"ip": "1.1.1.1",
"name": "System",
"type": "PROD"
}
{
"ip": "2.2.2.2",
"name": "host",
"type": "PROD"
}
{
"ip": "3.3.3.3",
"name": "slog",
"type": "PROD"
}
What is the right way to do it? Thanks.
There's no "right" way to do it, but there are approaches to take that can make things easier for you.
The tags are already in a format that makes converting to objects simple (they're object entries). Convert the tags to an object for easy access to the properties.
$ jq '.Instances[]
| .Tags |= from_entries
| {
ip: .PrivateIpAddress,
name: .Tags.Name,
type: .Tags.Type
}' file