How to print a value using json several levels above?

Given a json such as:
{
"clusters": [
{
"domain": "crap1",
"name": "BB1",
"nodes": [
{
"gpu": null,
"node": "bb1-1",
"role": "worker"
},
{
"gpu": {
"P40": 2
},
"node": "bb1-2",
"role": "master"
}
],
"site": "B-place",
"hardware": "prod-2",
"timezone": "US/Eastern",
"type": "CCE",
"subtype": null
}
]
}
where there are actually many more clusters, I want to parse the JSON, search for a node (bb1-2, for example), and print the name of the cluster it belongs to (BB1).
I know I can search for that node with:
.clusters[] | .nodes[] | select(.node == "bb1-2")
but I can't figure out how to print a value from a higher level.

In addition to the other approaches, a very general way to hold on to higher level context is to bind it to a variable.
jq '
.clusters[] |
. as $cluster |
.nodes[] |
select(.node == "bb1-2") |
{cluster_name:$cluster.name, node:.}
'
{
"cluster_name": "BB1",
"node": {
"gpu": {
"P40": 2
},
"node": "bb1-2",
"role": "master"
}
}
This makes sure you know both the cluster and the matching node itself, and avoids the confusion that arises if your select condition matches the same cluster more than once.
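For readers who want to check the logic outside jq, here is a minimal Python sketch of the same idea: keep a reference to the enclosing cluster while iterating over its nodes. The names DOC and find_node are hypothetical, and the sample is trimmed from the question.

```python
import json

# Sample trimmed from the question; a real document would hold many clusters.
DOC = json.loads("""
{"clusters": [
  {"name": "BB1", "nodes": [
    {"gpu": null, "node": "bb1-1", "role": "worker"},
    {"gpu": {"P40": 2}, "node": "bb1-2", "role": "master"}]}
]}
""")


def find_node(doc, node_name):
    """Yield {cluster_name, node} for every node whose "node" field matches."""
    for cluster in doc["clusters"]:        # like: .clusters[] | . as $cluster
        for node in cluster["nodes"]:      # like: .nodes[]
            if node["node"] == node_name:  # like: select(.node == "bb1-2")
                yield {"cluster_name": cluster["name"], "node": node}


matches = list(find_node(DOC, "bb1-2"))
print(json.dumps(matches, indent=2))
```

The outer loop variable plays the role of $cluster: it stays in scope while the inner loop descends, so both levels are available when a match is found.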

How about
.clusters[] | select(.nodes[].node == "bb1-2").name

Related

how to denormalise this json structure

I have a json formatted overview of backups, generated using pgbackrest. For simplicity I removed a lot of clutter so the main structures remain. The list can contain multiple backup structures, I reduced here to just 1 for simplicity.
[
{
"backup": [
{
"archive": {
"start": "000000090000000200000075",
"stop": "000000090000000200000075"
},
"info": {
"size": 1200934840
},
"label": "20220103-122051F",
"type": "full"
},
{
"archive": {
"start": "00000009000000020000007D",
"stop": "00000009000000020000007D"
},
"info": {
"size": 1168586300
},
"label": "20220103-153304F_20220104-081304I",
"type": "incr"
}
],
"name": "dbname1"
}
]
Using jq I tried to generate a simpler format out of this, so far without any luck.
What I would like to see is the backup.archive, backup.info, backup.label, backup.type, name combined in one simple structure, without getting into a cartesian product. I would be very happy to get the following output:
[
{
"backup": [
{
"archive": {
"start": "000000090000000200000075",
"stop": "000000090000000200000075"
},
"name": "dbname1",
"info": {
"size": 1200934840
},
"label": "20220103-122051F",
"type": "full"
},
{
"archive": {
"start": "00000009000000020000007D",
"stop": "00000009000000020000007D"
},
"name": "dbname1",
"info": {
"size": 1168586300
},
"label": "20220103-153304F_20220104-081304I",
"type": "incr"
}
]
}
]
where name is redundantly added to the list. How can I use jq to convert the shown input to the requested output? In the end I just want to generate a simple csv from the data. Even with the simplified structure using
'.[].backup[].name + ":" + .[].backup[].type'
I get a cartesian product:
"dbname1:full"
"dbname1:full"
"dbname1:incr"
"dbname1:incr"
how to solve that?
So, for each object in the top-level array you want to pull in .name into each of its .backup array's elements, right? Then try
jq 'map(.backup[] += {name} | del(.name))'
Then, generating CSV output with jq is easy: there is a builtin called @csv which transforms an array into a string of its values, quoted if they are strings and separated by commas. So all you need to do is compose your desired values into one array per row. At this point, removing .name is no longer necessary, as we are piecing together the array for CSV output anyway. We also pass the -r flag to jq so that the output is raw text rather than JSON.
jq -r '.[]
| .backup[] + {name}
| [(.archive | .start, .stop), .name, .info.size, .label, .type]
| @csv
'
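As a cross-check of the row layout, the same flattening can be sketched in Python. This is only an illustration under assumptions: BACKUPS is the sample from the question, backup_rows and to_csv are hypothetical helper names, and csv.QUOTE_NONNUMERIC is used to approximate quoting strings while leaving numbers bare.

```python
import csv
import io

# Sample from the question (one top-level entry, two backups).
BACKUPS = [
    {
        "name": "dbname1",
        "backup": [
            {"archive": {"start": "000000090000000200000075",
                         "stop": "000000090000000200000075"},
             "info": {"size": 1200934840},
             "label": "20220103-122051F", "type": "full"},
            {"archive": {"start": "00000009000000020000007D",
                         "stop": "00000009000000020000007D"},
             "info": {"size": 1168586300},
             "label": "20220103-153304F_20220104-081304I", "type": "incr"},
        ],
    }
]


def backup_rows(entries):
    """Yield one flat row per backup, carrying the parent's name along."""
    for entry in entries:
        for b in entry["backup"]:
            yield [b["archive"]["start"], b["archive"]["stop"],
                   entry["name"], b["info"]["size"], b["label"], b["type"]]


def to_csv(entries):
    """Render the rows as CSV text, quoting every non-numeric field."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(backup_rows(entries))
    return buf.getvalue()


print(to_csv(BACKUPS), end="")
```

Each backup produces exactly one row because the name is read from the enclosing entry inside a single nested iteration, rather than from a second, independent iteration.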
First navigate to backup, and only then “print” the fields you’re interested in.
.[].backup[] | .name + ":" + .type
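The difference between the two filters can be illustrated with a small Python sketch, assuming the simplified structure in which every backup already carries its name. The variable names are hypothetical. Iterating twice and combining every result of one iteration with every result of the other gives the cartesian product; iterating once and reading both fields from the same element does not.

```python
# Simplified backups, each already carrying its name.
backups = [
    {"name": "dbname1", "type": "full"},
    {"name": "dbname1", "type": "incr"},
]

names = [b["name"] for b in backups]
types = [b["type"] for b in backups]

# Two independent iterations: every name is paired with every type.
crossed = [f"{n}:{t}" for t in types for n in names]

# One iteration: name and type are read from the same backup.
paired = [f'{b["name"]}:{b["type"]}' for b in backups]

print(crossed)
print(paired)
```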

Selecting objects from a list with jq based on a value, accessing the top-level object

So I have a structure like this:
[
{
"name": "aaaaa",
"type": "A",
"class": "IN",
"status": "NOERROR",
"data": {
"answers": [
{
"ttl": 30,
"type": "CNAME",
"class": "IN",
"name": "aaaaa",
"data": "bbbbb"
},
{
"ttl": 1800,
"type": "CNAME",
"class": "IN",
"name": "bbbbb",
"data": "ccccc"
},
{
"ttl": 60,
"type": "A",
"class": "IN",
"name": "ccccc",
"data": "1.2.3.4"
}
],
},
{
"name": ...
...
It's basically a list of DNS resolution data.
For each question, there may be more than one answer. All I care about is the type: "A" record. After a little work and reading, I came up with this:
. | select(
(.status = "NOERROR") and
(.class = "IN") and
(.data.answers? != null)) | select(.data.answers | map(select(.type == "A")))
This works to give me only the objects where the type is "A". However, because of the way that I built it, I lose the top level "name" value. The only thing returned are the actual answer objects themselves, the objects in the answers[]
The problem is, I need to access the original (top-level) name value. I'm missing something simple here, can someone give me a hand?
Thanks
EDIT: What I want to print out at the end is basically the top-level .name value and each .data value where the type == "A". There's probably a much simpler way to do this, so if there is a completely different approach, I'm happy to hear that as well!
EDIT2: I originally thought this would be simpler and did:
. | select(
(.status = "NOERROR") and
(.class = "IN") and
(.data.answers? != null) and
(.data.answers[].type=="A")) | .
... but this returns the entire list of A, CNAME, and other types of values, as long as there is a single A present, it seems. So no luck there
You wrote: "However, because of the way that I built it, I lose the top level name value."
Stay in the top-level then.
.[]
| select(.class == "IN" and .status == "NOERROR")
| .name + ": " + (.data.answers[] | select(.type == "A") .data)?
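The same "stay at the top level" idea, sketched in Python for comparison. RESULTS reuses the first entry from the question; the second and third entries are hypothetical and only exercise the filtering and the missing-answers case. The helper name a_records is also hypothetical.

```python
RESULTS = [
    {"name": "aaaaa", "type": "A", "class": "IN", "status": "NOERROR",
     "data": {"answers": [
         {"ttl": 30, "type": "CNAME", "class": "IN", "name": "aaaaa", "data": "bbbbb"},
         {"ttl": 1800, "type": "CNAME", "class": "IN", "name": "bbbbb", "data": "ccccc"},
         {"ttl": 60, "type": "A", "class": "IN", "name": "ccccc", "data": "1.2.3.4"},
     ]}},
    # Hypothetical entries: a failed lookup and a lookup without answers.
    {"name": "zzzzz", "type": "A", "class": "IN", "status": "NXDOMAIN", "data": {}},
    {"name": "yyyyy", "type": "A", "class": "IN", "status": "NOERROR", "data": {}},
]


def a_records(results):
    """Yield "name: address" for each A answer, keeping the top-level name."""
    for entry in results:
        if entry.get("class") != "IN" or entry.get("status") != "NOERROR":
            continue
        # A missing or null answers list simply produces no output.
        for answer in entry.get("data", {}).get("answers") or []:
            if answer.get("type") == "A":
                yield f'{entry["name"]}: {answer["data"]}'


for line in a_records(RESULTS):
    print(line)
```

Because the loop over answers is nested inside the loop over entries, the top-level name never goes out of scope.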

jq - Find a JSON object based on one of its values and get another value from it

I've started using jq just very recently and I would like to know if something like this is even possible.
Example:
{
"name": "device",
"version": "1.0.0",
"address": [
{
"address": "10.1.2.3",
"interface": "wlan1_wifi"
},
{
"address": "10.1.2.5",
"interface": "wlan2_link"
},
{
"address": "10.1.2.4",
"interface": "ether1"
}
],
"wireless": [
{
"name": "wlan1_wifi",
"type": "5Ghz",
"ssid": "wifi"
},
{
"name": "wlan2_link",
"type": "2Ghz",
"ssid": "link"
}
]
}
Firstly let's transform the example to this json object:
cat json | jq '. | {"name": ."name", "version": ."version", "wireless": [."wireless"[] | {"name": ."name", "type": ."type", "ssid": ."ssid"}]}'
{
"name": "device",
"version": "1.0.0",
"wireless": [
{
"name": "wlan1_wifi",
"type": "5Ghz",
"ssid": "wifi"
},
{
"name": "wlan2_link",
"type": "2Ghz",
"ssid": "link"
}
]
}
Now there's a problem. I need to assign an address to the "wireless" array. The address is stored in "address" array.
So the question: is there a way of finding the right json object in "address" based on "name" (in wireless array) and "interface" (in address array) for every json object in "wireless" array and then assigning "address" to it?
The final result should look like this:
{
"name": "device",
"version": "1.0.0",
"wireless": [
{
"name": "wlan1_wifi",
"type": "5Ghz",
"ssid": "wifi",
"address": "10.1.2.3"
},
{
"name": "wlan2_link",
"type": "2Ghz",
"ssid": "link",
"address": "10.1.2.5"
}
]
}
Answer:
Here's my answer based on the answer from @peak. Instead of copying the content of .wireless and then using map, I'm cherry-picking only the keys that I want to include. This also allows me to position "address" however I want.
(INDEX(.address[]; .interface)) as $dict
| {name: .name, version: .version,
wireless: [.wireless[] | {name, address: ($dict[.name]|.address), type, ssid}]}
The following produces the output as originally requested:
(.wireless[].name) as $name
| .address[]
| select(.interface == $name)
| { wireless: {name: $name, address}}
However the above filter could potentially produce more than one result, so you might want to make modifications accordingly.
Revised revised requirements
If your jq has INDEX/2 (which was only made available AFTER jq 1.5 was released), you can simply use it to create a lookup table:
(INDEX(.address[]; .interface)) as $dict
| {name,
version,
wireless: (.wireless
| map(. + {address: ($dict[.name]|.address) }) ) }
Or (depending perhaps on the exact requirements):
(INDEX(.address[]; .interface)) as $dict
| del(.address)
| .wireless |= map(. + {address: ($dict[.name]|.address) })
If your jq does not have INDEX/2, then you could easily adapt the above (using reduce), or even more easily snarf the def of INDEX/2 from https://github.com/stedolan/jq/blob/master/src/builtin.jq
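The lookup-table approach can be sketched in Python as a plain dictionary keyed by interface name. This is an illustration only; DEVICE is the sample from the question and attach_addresses is a hypothetical helper name.

```python
DEVICE = {
    "name": "device",
    "version": "1.0.0",
    "address": [
        {"address": "10.1.2.3", "interface": "wlan1_wifi"},
        {"address": "10.1.2.5", "interface": "wlan2_link"},
        {"address": "10.1.2.4", "interface": "ether1"},
    ],
    "wireless": [
        {"name": "wlan1_wifi", "type": "5Ghz", "ssid": "wifi"},
        {"name": "wlan2_link", "type": "2Ghz", "ssid": "link"},
    ],
}


def attach_addresses(device):
    """Return name, version and wireless entries with their address added."""
    # Build the lookup table once: interface name -> address.
    by_interface = {a["interface"]: a["address"] for a in device["address"]}
    return {
        "name": device["name"],
        "version": device["version"],
        "wireless": [
            # .get returns None when no address entry matches the interface.
            {**w, "address": by_interface.get(w["name"])}
            for w in device["wireless"]
        ],
    }


result = attach_addresses(DEVICE)
print(result)
```

Building the table once avoids scanning the address array again for every wireless entry.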

Elegant way to select nested objects with the associated key based on a specific criteria

Given an example document in JSON similar to this:
{
"id": "post-1",
"type": "blog-post",
"tags": [
{
"id": "tag-1",
"name": "Tag 1"
},
{
"id": "tag-2",
"name": "Tag 2"
}
],
"heading": "Post 1",
"body": "this is my first blog post",
"links": [
{
"id": "post-2",
"heading": "Post 2",
"tags": [
{
"id": "tag-1",
"name": "Tag 1"
},
{
"id": "tag-3",
"name": "Tag 3"
}
]
}
],
"metadata": {
"user": {
"social": [
{
"id": "twitter",
"handle": "#user"
},
{
"id": "facebook",
"handle": "123456"
},
{
"id": "youtube",
"handle": "ABC123xyz"
}
]
},
"categories": [
{
"name": "Category 1"
},
{
"name": "Category 2"
}
]
}
}
I would like to select any object (regardless of depth) that has an attribute "id", as well as the attribute name of the parent object. The above example should be taken as just that, an example. The actual data, that I'm not at liberty to share, can have any depth and just about any structure. Attributes can be introduced and removed at any time. Using the Blog Post style is just because it is quite popular for examples and I have very limited imagination.
The attribute signifies a particular type within the domain, that might also be (but is not necessarily) coded into the value of the attribute.
If an object does not have the "id" attribute it is not interesting and should not be selected.
A very important special case is when the value of an attribute is an array of objects, in that case I need to keep the attribute name and associate it with each element in the array.
An example of the desired output would be:
[
{
"type": "tags",
"node": {
"id": "tag-1",
"name": "Tag 1"
}
},
{
"type": "tags",
"node": {
"id": "tag-2",
"name": "Tag 2"
}
},
{
"type": "links",
"node": {
"id": "post-2",
"heading": "Post 2",
"tags": [
{
"id": "tag-1",
"name": "Tag 1"
},
{
"id": "tag-3",
"name": "Tag 3"
}
]
}
},
{
"type": "tags",
"node": {
"id": "tag-1",
"name": "Tag 1"
}
},
{
"type": "tags",
"node": {
"id": "tag-3",
"name": "Tag 3"
}
},
{
"type": "social",
"node": {
"id": "twitter",
"handle": "#user"
}
},
{
"type": "social",
"node": {
"id": "facebook",
"handle": "123456"
}
},
{
"type": "social",
"node": {
"id": "youtube",
"handle": "ABC123xyz"
}
}
]
It isn't strictly necessary that the output be identical: order, for instance, is irrelevant for my use case, and the results could be grouped as well. Since the top-level object has an attribute "id", it could be included under a special name, but I'd prefer that it not be included at all.
I've tried to use walk, reduce and recurse to no avail, I'm afraid my jq skills are too limited. But I imagine that a good solution would make use of at least one of them.
I would like an expression something like
to_entries[] | .value | .. | select(has("id")?)
which would select the correct objects, but with .. I'm no longer able to keep the associated attribute name.
The best I've come up with is
. as $document
| [paths | if length > 1 and .[-1] == "id" then .[0:-1] else empty end]
| map(. as $path
| $document
| { "type": [$path[] | if type == "string" then . else empty end][-1],
"node": getpath($path) })
This works, but it feels quite complicated. It first extracts all paths and ignores any path that does not have "id" as its last element. It then removes the "id" segment to get the path to the actual object, and keeps the last remaining segment that is a string, which corresponds to the attribute of the parent object that contains the interesting object. Finally, the actual object is selected through getpath.
Is there a more elegant, or at the least shorter way to express this?
I should note that I'd like to use jq for the convenience of having bindings to other languages as well as being able to run the program on the command line.
For the scope of this question, I'm not really interested in alternatives to jq as I can imagine how to solve this differently using other tooling, but I would really like to "just" use jq.
Since the actual requirements aren’t clear to me, I’ll assume that the given implementation defines the functional requirements, and propose a shorter and hopefully sleeker version:
. as $document
| paths
| select(length > 1 and .[-1] == "id")
| .[0:-1] as $path
| { "type": last($path[] | strings),
"node": $document | getpath($path) }
This produces a stream, so if you want an array, you could simply enclose the above in square brackets.
last(stream) emits null if the stream is empty, which accords with the behavior of .[-1].
This works:
[
foreach (paths | select(.[-1] == "id" and length > 1)[:-1]) as $path ({i:.};
.o = {
type: last($path[] | strings),
node: (.i | getpath($path))
};
.o
)
]
The trick is to know that any number in the path indicates that the value is part of an array, so you have to adjust the path to get the parent name. Using last/1 with a strings filter makes that simpler.
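The rule "use the nearest enclosing string key as the type" can also be sketched as a recursive walk in Python. This is a sketch under assumptions, not a translation of either jq program: id_nodes is a hypothetical name, DOC is a trimmed copy of the example, and the output order follows Python's dictionary order.

```python
DOC = {
    "id": "post-1",
    "type": "blog-post",
    "tags": [{"id": "tag-1", "name": "Tag 1"}, {"id": "tag-2", "name": "Tag 2"}],
    "heading": "Post 1",
    "links": [
        {"id": "post-2", "heading": "Post 2",
         "tags": [{"id": "tag-1", "name": "Tag 1"}, {"id": "tag-3", "name": "Tag 3"}]}
    ],
    "metadata": {
        "user": {"social": [{"id": "twitter", "handle": "#user"},
                            {"id": "facebook", "handle": "123456"},
                            {"id": "youtube", "handle": "ABC123xyz"}]},
        "categories": [{"name": "Category 1"}, {"name": "Category 2"}],
    },
}


def id_nodes(value, parent_key=None, is_root=True):
    """Yield {"type", "node"} for each nested object that has an "id" key."""
    if isinstance(value, dict):
        if "id" in value and not is_root:
            yield {"type": parent_key, "node": value}
        for key, child in value.items():
            # Descending through an object key updates the parent name.
            yield from id_nodes(child, key, False)
    elif isinstance(value, list):
        for child in value:
            # Array elements inherit the name of the attribute holding the array.
            yield from id_nodes(child, parent_key, False)


found = list(id_nodes(DOC))
print([item["type"] for item in found])
```

Passing parent_key through unchanged for list elements corresponds to skipping the numeric path segments.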

Parsing aws ec2 describe-instances to get all private ip addresses

Two of my EC2 instances have 3 IPs each. I managed to successfully grab a list of JSON objects:
aws ec2 describe-instances | jq '.Reservations[] | .Instances[] | (.Tags | { "iname": ( map ( select(.Value | contains("my-vm")))[] | .Value ) } ) + ( { "ip": ( .NetworkInterfaces[].PrivateIpAddress) } )' | jq -s .
Giving me the following result:
[
{
"iname": "my-vm-b",
"ip": "10.11.2.145"
},
{
"iname": "my-vm-b",
"ip": "10.11.1.146"
},
{
"iname": "my-vm-b",
"ip": "10.11.10.144"
},
{
"iname": "my-vm-a",
"ip": "10.11.1.9"
},
{
"iname": "my-vm-a",
"ip": "10.11.10.125"
},
{
"iname": "my-vm-a",
"ip": "10.11.2.85"
}
]
and then I added to the command the following:
... | jq ' group_by(.iname)[] | {(.[0].iname): [.[] | .ip]}' | jq -s .
To finally get the list of objects the way I wanted:
[
{
"my-vm-a": [
"10.11.1.9",
"10.11.10.125",
"10.11.2.85"
]
},
{
"my-vm-b": [
"10.11.2.145",
"10.11.1.146",
"10.11.10.144"
]
}
]
Notice I had to call jq like 4 times. I know I must be doing something wrong so I was wondering if I could do it with a single jq call.
Thanks!
You can easily eliminate the calls to jq -s by wrapping expressions in square brackets where appropriate or, maybe better, by using map.
For example, your last pair of jq calls can be simplified to:
jq 'group_by(.iname) | map({(.[0].iname): [.[] | .ip]})'
The following should allow you to reduce the four calls to one:
[.Reservations[]
| .Instances[]
| (.Tags | { "iname": ( map ( select(.Value | contains("my-vm")))[] | .Value ) } )
+ ( { "ip": ( .NetworkInterfaces[].PrivateIpAddress) } ) ]
| group_by(.iname) | map({(.[0].iname): [.[] | .ip]})
However, I would advise against using contains here, unless you fully understand the complications.
Before you go into trying to simplify your jq calls, I think it would be more beneficial to first look at the source data and how it relates to the result you want.
Ignoring a lot of the other details in the data, I think we can agree that your data looks sorta like this:
{
"Reservations": [
{
"Instances": [
{
"NetworkInterfaces": [
{ "PrivateIpAddress": "10.11.2.145" },
{ "PrivateIpAddress": "10.11.1.146" },
{ "PrivateIpAddress": "10.11.10.144" }
],
"Tags": [
{ "Key": "Name", "Value": "my-vm-b" }
]
},
{
"NetworkInterfaces": [
{ "PrivateIpAddress": "10.11.1.9" },
{ "PrivateIpAddress": "10.11.10.125" },
{ "PrivateIpAddress": "10.11.2.85" }
],
"Tags": [
{ "Key": "Name", "Value": "my-vm-a" }
]
}
]
}
]
}
With something that looks like this, your jq query can simply be:
[.Reservations[].Instances[] |
{
(.Tags|from_entries.Name): [.NetworkInterfaces[].PrivateIpAddress]
}
]
No intermediate results needed. Just a few things of note here.
The tags are already an array of key/value pairs, so you can read values from them easily by first converting them to an object with from_entries.
You are selecting instances based on the existence of a tag value containing "my-vm". I'm not sure you even need to do this; I don't know what your data looks like, but the value is likely stored under a fixed tag name, so you should just use that name.
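For comparison, the same reshaping can be sketched in Python against the trimmed sample above. DESCRIBE and private_ips_by_name are hypothetical names, and the sketch assumes every instance has a Name tag.

```python
DESCRIBE = {
    "Reservations": [
        {"Instances": [
            {"NetworkInterfaces": [
                {"PrivateIpAddress": "10.11.2.145"},
                {"PrivateIpAddress": "10.11.1.146"},
                {"PrivateIpAddress": "10.11.10.144"}],
             "Tags": [{"Key": "Name", "Value": "my-vm-b"}]},
            {"NetworkInterfaces": [
                {"PrivateIpAddress": "10.11.1.9"},
                {"PrivateIpAddress": "10.11.10.125"},
                {"PrivateIpAddress": "10.11.2.85"}],
             "Tags": [{"Key": "Name", "Value": "my-vm-a"}]},
        ]}
    ]
}


def private_ips_by_name(describe_output):
    """Return one {instance name: [private IPs]} object per instance."""
    result = []
    for reservation in describe_output["Reservations"]:
        for instance in reservation["Instances"]:
            # Turn the Key/Value tag list into a dict, like from_entries.
            tags = {t["Key"]: t["Value"] for t in instance["Tags"]}
            ips = [ni["PrivateIpAddress"] for ni in instance["NetworkInterfaces"]]
            # Assumes a Name tag exists; a KeyError is raised otherwise.
            result.append({tags["Name"]: ips})
    return result


ips = private_ips_by_name(DESCRIBE)
print(ips)
```

Since each instance already groups its own interfaces, no separate grouping step is needed.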