I want to get the SnapshotIdentifier of the snapshot with the maximum SnapshotCreateTime, and filter it by ClusterIdentifier. Here is the command I'm using:
aws redshift describe-cluster-snapshots --region us-west-2 |
jq -r '.Snapshots[]
| select(.ClusterIdentifier == "dev-cluster")
| max_by(.SnapshotCreateTime)
| .SnapshotIdentifier '
Here is the JSON:
{
"Snapshots": [
{
"EstimatedSecondsToCompletion": 0,
"OwnerAccount": "45645641155",
"CurrentBackupRateInMegaBytesPerSecond": 6.2857,
"ActualIncrementalBackupSizeInMegaBytes": 22.0,
"NumberOfNodes": 3,
"Status": "available",
"VpcId": "myvpc",
"ClusterVersion": "1.0",
"Tags": [],
"MasterUsername": "ayxbizops",
"TotalBackupSizeInMegaBytes": 192959.0,
"DBName": "dev",
"BackupProgressInMegaBytes": 22.0,
"ClusterCreateTime": "2016-09-06T15:56:08.170Z",
"RestorableNodeTypes": [
"dc1.large"
],
"EncryptedWithHSM": false,
"ClusterIdentifier": "dev-cluster",
"SnapshotCreateTime": "2016-09-06T16:00:25.595Z",
"AvailabilityZone": "us-west-2c",
"NodeType": "dc1.large",
"Encrypted": false,
"ElapsedTimeInSeconds": 3,
"SnapshotType": "manual",
"Port": 5439,
"SnapshotIdentifier": "thismorning"
}
]
}
max_by expects an array as input. Thus the following variant of your filter would work:
[.Snapshots[] | select(.ClusterIdentifier == "dev-cluster")]
| max_by(.SnapshotCreateTime)
| .SnapshotIdentifier
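For reference, the end-to-end command would then look something like this (a sketch reusing the region and cluster name from your command); against the sample JSON it prints thismorning:
aws redshift describe-cluster-snapshots --region us-west-2 |
jq -r '[.Snapshots[] | select(.ClusterIdentifier == "dev-cluster")]
       | max_by(.SnapshotCreateTime)
       | .SnapshotIdentifier'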
Based on your verbal description, it would seem you want to run max_by before select:
.Snapshots
| max_by(.SnapshotCreateTime)
| select(.ClusterIdentifier == "dev-cluster")
| .SnapshotIdentifier
If there is possibly more than one maximal object, you might want to use maximal_by rather than max_by:
def maximal_by(f):
(map(f) | max) as $mx
| .[] | select(f == $mx);
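For example, plugged into the array-based variant above (with the def prepended), maximal_by would emit the identifier of every snapshot that shares the latest creation time:
[.Snapshots[] | select(.ClusterIdentifier == "dev-cluster")]
| maximal_by(.SnapshotCreateTime)
| .SnapshotIdentifier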
Related
I am trying to convert JSON to CSV for selected keys using jq.
file.json
{
"_ref": "ipv4address/Li5pcHY0X2FkZHJlc3yMDIuMS8w:10.202.202.1",
"discovered_data": {
"bgp_as": 64638,
"device_model": "catalyst37xxStack",
"device_port_name": "Vl2002",
"device_port_type": "propVirtual",
"device_type": "Switch-Router",
"device_vendor": "Cisco",
"discovered_name": "Test_Device.network.local",
"discoverer": "Network Insight",
"first_discovered": 1580161888,
"last_discovered": 1630773758,
"mac_address": "aa:bb:cc:dd:ee:ff",
"mgmt_ip_address": "10.202.202.1",
"os": "15.2(4)E10",
"port_speed": "Unknown",
"port_vlan_name": "TEST-DATA",
"port_vlan_number": 2002
},
"ip_address": "10.202.202.1",
"is_conflict": false,
"mac_address": "",
"names": ["Test_Device"],
"network": "10.202.202.0/23",
"network_view": "TEST VIEW",
"objects": [],
"status": "USED",
"types": [
"UNMANAGED"
],
"usage": []
}
My desired output is:
names,ip_address,discovered_data.mac_address,discovered_data.discovered_name
Test_Device,10.202.202.1,aa:bb:cc:dd:ee:ff,Test_Device.network.local
So far, I have tried the following command but am getting a syntax error:
jq -r 'map({names,ip_address,discovered_data.mac_address,discovered_data.discovered_name}) | (first | keys_unsorted) as $keys | map([to_entries[] | .value]) as $rows | $keys,$rows[] | @csv' < file.json
Assuming the JSON input is valid, consider the output of:
(null
| {names,
ip_address,
"discovered_data.mac_address",
"discovered_data.discovered_name"} | keys_unsorted) as $keys
| $keys,
({names: .names[],
ip_address,
"discovered_data.mac_address": .discovered_data.mac_address,
"discovered_data.discovered_name": .discovered_data.discovered_name }
| [.[]])
| @csv
Assuming jq is invoked with the -r command-line option, this has the advantage of producing valid CSV. If you prefer to have all the key names and values unquoted, you might wish to consider using join(",") instead of @csv, or some more sophisticated variation if you want to have your cake and eat it.
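For reference, if the program above is saved in (say) tocsv.jq (the program filename is just for illustration), a run against the sample file would look like this:
$ jq -r -f tocsv.jq file.json
"names","ip_address","discovered_data.mac_address","discovered_data.discovered_name"
"Test_Device","10.202.202.1","aa:bb:cc:dd:ee:ff","Test_Device.network.local"
Switching the final @csv to join(",") would instead produce the unquoted lines shown in the desired output.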
I am trying to output all the data from my JSON file that matches the value "data10=true". It does that, but it only grabs the names. How can I make it output everything in my JSON file that matches "data10=true"?
This is what I've got:
data=$(jq -c 'to_entries[] | select(.value.data10 == "true") | [.key, .value.name]' data.json)
This is in my YAML template, by the way, running it as a pipeline in DevOps.
The detailed requirements are unclear, but hopefully you'll be able to use the following jq program as a guide:
..
| objects
| select( .data10 == "true" )
| to_entries[]
| select(.key != "data10")
| [.key, .value]
This will recursively (thanks to the initial ..) examine all the JSON objects in the input.
p.s.
If you want to make the selection based on whether .data10 is "true" or true, you could change the criterion to .data10 | . == true or . == "true".
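For instance, the recursive variant with that looser criterion would read:
..
| objects
| select( .data10 | . == true or . == "true" )
| to_entries[]
| select(.key != "data10")
| [.key, .value]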
jq 'to_entries | map(select(.value.data10=="true")) | from_entries' data.json
Input data.json, including an entry with a false value:
{
"FOO": {
"data10": "false",
"name": "Donald",
"location": "Stockholm"
},
"BAR": {
"data10": "true",
"name": "Walt",
"location": "Stockholm"
},
"BAZ": {
"data10": "true",
"name": "Jack",
"location": "Whereever"
}
}
Output:
{
"BAR": {
"data10": "true",
"name": "Walt",
"location": "Stockholm"
},
"BAZ": {
"data10": "true",
"name": "Jack",
"location": "Whereever"
}
}
based on: https://stackoverflow.com/a/37843822/983325
I have to aggregate a few JSON results from a site. Because the site has a query concurrency limit and the queries time out, the time frame for the queries has to be divided. So I am left with JSON like the following:
{
"results": [
[
{
"field": "AccountId",
"value": "11352"
},
{
"field": "number_of_requests",
"value": "241398"
}
],
[
{
"field": "AccountId",
"value": "74923"
},
{
"field": "number_of_requests",
"value": "238566"
}
]
],
"statistics": {
"recordsMatched": 502870.0,
"recordsScanned": 165908292.0,
"bytesScanned": 744173091162.0
},
"status": "Complete"
}
{
"results": [
[
{
"field": "AccountId",
"value": "11352"
},
{
"field": "number_of_requests",
"value": "185096"
}
]
],
"statistics": {
"recordsMatched": 502870.0,
"recordsScanned": 165908292.0,
"bytesScanned": 744173091162.0
},
"status": "Complete"
}
I need to aggregate the results, match the values to the number of requests, and print out the result in descending order.
Desired Output:
AccountID : Number of Requests
11352 : 426494
74923 : 238566
Current Output:
AccountID : Number of Requests
11352 : 241398
11352 : 185096
74923 : 238566
The jq query I am currently running takes the file name as $ResultsDir:
list=$(jq -S '.results[] | map( { (.field) : .value} ) | add ' $ResultsDir |
jq -s -c 'sort_by(.number_of_requests|tonumber) | reverse[] ' |
jq -r '"\(.AccountId) : \(.number_of_requests)"')
How do I combine the results of the same accounts before printing it out? The results also need to be in descending order of number of requests.
When possible, it's generally advisable to minimize the number of calls to jq. In this case, it's easy enough to achieve the desired output with just one call to jq.
Assuming the input is a valid stream of JSON objects along the lines shown in the Q, the following produces the desired output:
jq -nr '
[inputs | .results[] | map( { (.field) : .value} ) | add]
| group_by(.AccountId)
| map([.[0].AccountId, (map(.number_of_requests|tonumber) | add)])
| sort_by(.[1]) | reverse
| .[]
| join(" : ")
'
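For reference, if the two sample documents are concatenated into (say) results.json and the program is saved as aggregate.jq (both names assumed), the run would look like:
$ jq -nr -f aggregate.jq results.json
11352 : 426494
74923 : 238566
If the AccountID : Number of Requests header line is also wanted, it can simply be emitted as a literal string ahead of the rows.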
I am writing a script to query the Bitbucket API and delete SNAPSHOT artifacts that have never been downloaded. This script is failing because it gets ALL snapshot artifacts; the select on the number of downloads does not appear to be working.
What is wrong with my select statement to filter objects by the number of downloads?
Of course the more direct solution here would be if I could just query the Bitbucket API with a filter. To the best of my knowledge the API does not support filtering by downloads.
My script is:
#!/usr/bin/env bash
curl -X GET --user "me:mykey" "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads?pagelen=100" > downloads.json
# get all values | reduce the set to just be name and downloads | select entries where downloads is zero | select entries where name contains SNAPSHOT | just get the name
#TODO I screwed up the selection somewhere; it's returning files that contain SNAPSHOT regardless of number of downloads
jq '.values | {name: .[].name, downloads: .[].downloads} | select(.downloads==0) | select(.name | contains("SNAPSHOT")) | .name' downloads.json > snapshots_without_any_downloads.js
#unique sort, not sure why jq gives me multiple values
sort -u snapshots_without_any_downloads.js | tr -d '"' > unique_snapshots_without_downloads.js
cat unique_snapshots_without_downloads.js | xargs -t -I % curl -Ss -X DELETE --user "me:mykey" "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/%" > deleted_files.txt
A deidentified sample of the raw input from the API is:
{
"pagelen": 10,
"size": 40,
"values": [
{
"name": "myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip"
}
},
"downloads": 2,
"created_on": "2018-03-15T17:50:00.157310+00:00",
"user": {
"username": "me",
"display_name": "me",
"type": "user",
"uuid": "{3051ec5f-cc92-4bc3-b291-38189a490a89}",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/users/me"
},
"html": {
"href": "https://bitbucket.org/me/"
},
"avatar": {
"href": "https://bitbucket.org/account/me/avatar/32/"
}
}
},
"type": "download",
"size": 430894
},
{
"name": "myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip"
}
},
"downloads": 0,
"created_on": "2018-03-15T17:50:00.157310+00:00",
"user": {
"username": "me",
"display_name": "me",
"type": "user",
"uuid": "{3051ec5f-cc92-4bc3-b291-38189a490a89}",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/users/me"
},
"html": {
"href": "https://bitbucket.org/me/"
},
"avatar": {
"href": "https://bitbucket.org/account/me/avatar/32/"
}
}
},
"type": "download",
"size": 430894
},
{
"name": "myproject_1.0_mc_3.5.1.zip",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/myproject_1.1-SNAPSHOT_0210f77_mc_3.5.1.zip"
}
},
"downloads": 5,
"created_on": "2018-03-15T17:49:14.885544+00:00",
"user": {
"username": "me",
"display_name": "me",
"type": "user",
"uuid": "{3051ec5f-cc92-4bc3-b291-38189a490a89}",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/users/me"
},
"html": {
"href": "https://bitbucket.org/me/"
},
"avatar": {
"href": "https://bitbucket.org/account/me/avatar/32/"
}
}
},
"type": "download",
"size": 430934
}
],
"page": 1,
"next": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads?pagelen=10&page=2"
}
The output I want from this snippet is myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip - that artifact is a SNAPSHOT and has zero downloads.
I have used this intermediate step to do some debugging:
jq '.values | {name: .[].name, downloads: .[].downloads} | select(.downloads>0) | select(.name | contains("SNAPSHOT")) | unique' downloads.json > snapshots_with_downloads.js
jq '.values | {name: .[].name, downloads: .[].downloads} | select(.downloads==0) | select(.name | contains("SNAPSHOT")) | .name' downloads.json > snapshots_without_any_downloads.js
#this returns the same values for each list!
diff unique_snapshots_with_downloads.js unique_snapshots_without_downloads.js
This adjustment gives a cleaner, unique structure; it suggests that there's some sort of splitting or streaming aspect of jq that I do not fully understand:
#this returns a "unique" array like I expect, adding select to this still does not produce the desired outcome
jq '.values | [{name: .[].name, downloads: .[].downloads}] | unique' downloads.json
The data after this step looks like this. It just removed the cruft I didn't need from the raw API response:
[
{
"name": "myproject_1.0_2400a51_mc_3.4.0.zip",
"downloads": 0
},
{
"name": "myproject_1.0_2400a51_mc_3.4.1.zip",
"downloads": 2
},
{
"name": "myproject_1.1-SNAPSHOT_391f4d5_mc_3.5.0.zip",
"downloads": 0
},
{
"name": "myproject_1.1-SNAPSHOT_391f4d5_mc_3.5.1.zip",
"downloads": 2
}
]
As I understand it:
You want globally unique outputs
You want only items with downloads==0
You want only items whose name contains "SNAPSHOT"
The following will accomplish that:
jq -r '
[.values[] | {(.name): .downloads}]
| add
| to_entries[]
| select(.value == 0)
| .key | select(contains("SNAPSHOT"))'
Rather than making unique an explicit step, this version builds a single map from names to download counts (in case of duplicate names, the last entry wins), and thereby ensures that the outputs are unique.
Given your test JSON, output is:
myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip
Applied to the broader problem, this strategy can be used to simplify the overall process:
jq -r '[.values[] | {(.links.self.href): .downloads}] | add | to_entries[] | select(.value == 0) | .key | select(contains("SNAPSHOT"))'
It acts on the URL of the file rather than only the name, which simplifies the subsequent DELETE call; the sort and tr calls can also be dropped.
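A sketch of how the trimmed-down pipeline might look, reusing the endpoint and credentials from the question (the intermediate files become optional):
curl -s --user "me:mykey" "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads?pagelen=100" |
  jq -r '[.values[] | {(.links.self.href): .downloads}] | add | to_entries[]
         | select(.value == 0) | .key | select(contains("SNAPSHOT"))' |
  xargs -t -I % curl -Ss -X DELETE --user "me:mykey" "%" > deleted_files.txt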
Here's a solution which sums up the .download values per .name before making the selection based on the total number of downloads:
reduce (.values[] | select(.name | contains("SNAPSHOT"))) as $v
({}; .[$v.name] += $v.downloads)
| with_entries(select(.value == 0))
| keys_unsorted[]
Example:
$ jq -r -f program.jq input.json
myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip
p.s.
What is wrong with my select statement ...?
The problem that jumps out is the bit of the pipeline just before the "select" filter:
.values | {name: .[].name, downloads: .[].downloads}
The use of .[] in this manner results in a Cartesian product being formed -- that is, the above expression will emit n*n JSON objects, where n is the length of .values. You evidently intended to write:
.values[] | {name: .name, downloads: .downloads}
which can be abbreviated to:
.values[] | {name, downloads}
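For what it's worth, with that fix in place the rest of your original selection then behaves as expected (a sketch):
.values[]
| {name, downloads}
| select(.downloads == 0)
| select(.name | contains("SNAPSHOT"))
| .name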
Using jq, I'd like to cherry-pick key/value pairs from the following json:
{
"project": "Project X",
"description": "This is a description of Project X",
"nodes": [
{
"name": "server001",
"detail001": "foo",
"detail002": "bar",
"networks": [
{
"net_tier": "network_tier_001",
"ip_address": "10.1.1.10",
"gateway": "10.1.1.1",
"subnet_mask": "255.255.255.0",
"mac_address": "00:11:22:aa:bb:cc"
}
],
"hardware": {
"vcpu": 1,
"mem": 1024,
"disks": [
{
"disk001": 40,
"detail001": "foo"
},
{
"disk002": 20,
"detail001": "bar"
}
]
},
"os": "debian8",
"geo": {
"region": "001",
"country": "Sweden",
"datacentre": "Malmo"
},
"detail003": "baz"
}
],
"detail001": "foo"
}
For the sake of an example, I'd like to parse the following keys and their values: "Project", "name", "net_tier", "vcpu", "mem", "disk001", "disk002".
I'm able to parse individual elements without much issue, but due to the hierarchical nature of the full parse, I've not had much luck parsing down different branches (i.e. both networks and hardware > disks).
Any help appreciated.
Edit:
For clarity, the output I'm going for is a comma-separated CSV. In terms of parsing all combinations, covering the sample data in the example will do for now. I will hopefully be able to expand on any suggestions.
Here is a different filter which computes the unique set of network tier and disk names and then generates a result with columns appropriate to the data.
{
tiers: [ .nodes[].networks[].net_tier ] | unique
, disks: [ .nodes[].hardware.disks[] | keys[] | select(startswith("disk")) ] | unique
} as $n
| def column_names($n): [ "project", "name" ] + $n.tiers + ["vcpu", "mem"] + $n.disks ;
def tiers($n): [ $n.tiers[] as $t | .networks[] | if .net_tier==$t then $t else null end ] ;
def disks($n): [ $n.disks[] as $d | map(select(.[$d]!=null)|.[$d])[0] ] ;
def rows($n):
.project as $project
| .nodes[]
| .name as $name
| tiers($n) as $tier_values
| .hardware
| .vcpu as $vcpu
| .mem as $mem
| .disks
| disks($n) as $disk_values
| [$project, $name] + $tier_values + [$vcpu, $mem] + $disk_values
;
column_names($n), rows($n)
| @csv
The benefit of this approach becomes apparent if we add another node to the sample data:
{
"name": "server002",
"networks": [
{
"net_tier": "network_tier_002"
}
],
"hardware": {
"vcpu": 1,
"mem": 1024,
"disks": [
{
"disk002": 40,
"detail001": "foo"
}
]
}
}
Sample Run (assuming filter in filter.jq and amended data in data.json)
$ jq -Mr -f filter.jq data.json
"project","name","network_tier_001","network_tier_002","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001","",1,1024,40,20
"Project X","server002",,"network_tier_002",1,1024,,40
Here's one way you could achieve the desired output.
program.jq:
["project","name","net_tier","vcpu","mem","disk001","disk002"],
[.project]
+ (.nodes[] | .networks[] as $n |
[
.name,
$n.net_tier,
(.hardware |
.vcpu,
.mem,
(.disks | add["disk001","disk002"])
)
]
)
| @csv
$ jq -r -f program.jq input.json
"project","name","net_tier","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001",1,1024,40,20
Basically, you'll want to project the fields you want into arrays so you can convert those arrays to CSV rows. Your input makes it seem like there could potentially be multiple networks for a given node, so if you wanted to output all combinations, that would have to be flattened out.
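To illustrate the multiple-networks point: if server001 hypothetically had a second entry in its networks array with "net_tier": "network_tier_002" (and the same hardware), the filter above would emit one row per network:
"project","name","net_tier","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001",1,1024,40,20
"Project X","server001","network_tier_002",1,1024,40,20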
Here's another approach that is short enough to speak for itself:
def s(f): first(.. | f? // empty) // null;
[s(.project), s(.name), s(.net_tier), s(.vcpu), s(.mem), s(.disk001), s(.disk002)]
| @csv
Invocation:
$ jq -r -f value-pairs.jq input.json
Result:
"Project X","server001","network_tier_001",1,1024,40,20
With headers
Using the same s/1 as above:
. as $d
| ["project", "name", "net_tier", "vcpu", "mem", "disk001","disk002"]
| (., map( . as $v | $d | s(.[$v])))
| @csv
With multiple nodes
Again with s/1 as above:
.project as $p
| ["project", "name", "net_tier", "vcpu", "mem", "disk001","disk002"] as $h
| ($h,
(.nodes[] as $d
| $h
| map( . as $v | $d | s(.[$v]) )
| .[0] = $p)
) | @csv
Output with the illustrative multi-node data:
"project","name","net_tier","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001",1,1024,40,20
"Project X","server002","network_tier_002",1,1024,,40