Combine JSON Field and add values using jq - json

I have to aggregate a few JSON results from a site. Because the site has a query concurrency limit and the queries timeout, the time frame for the queries have to be divided. So I am left with a JSON as follows:
{
"results": [
[
{
"field": "AccountId",
"value": "11352"
},
{
"field": "number_of_requests",
"value": "241398"
}
],
[
{
"field": "AccountId",
"value": "74923"
},
{
"field": "number_of_requests",
"value": "238566"
}
]
],
"statistics": {
"recordsMatched": 502870.0,
"recordsScanned": 165908292.0,
"bytesScanned": 744173091162.0
},
"status": "Complete"
}
{
"results": [
[
{
"field": "AccountId",
"value": "11352"
},
{
"field": "number_of_requests",
"value": "185096"
}
]
],
"statistics": {
"recordsMatched": 502870.0,
"recordsScanned": 165908292.0,
"bytesScanned": 744173091162.0
},
"status": "Complete"
}
I need to aggregate the results, match the values to the number of requests and print out the result in descending Order.
Desired Output:
AccountID : Number of Requests
11352 : 426494
74923 : 238566
Current Output:
AccountID : Number of Requests
11352 : 241398
11352 : 185096
74923 : 238566
The jq query I am running currently takes the file name as ResultDir:
list=$(jq -S '.results[] | map( { (.field) : .value} ) | add ' $ResultsDir |
jq -s -c 'sort_by(.number_of_requests|tonumber) | reverse[] ' |
jq -r '"\(.AccountId) : \(.number_of_requests)"')
How do I combine the results of the same accounts before printing it out? The results also need to be in descending order of number of requests.

When possible, it's generally advisable to minimize the number of calls to jq. In this case, it's easy enough to achieve the desired output with just one call to jq.
Assuming the input is a valid stream of JSON objects along the lines shown in the Q, the following produces the desired output:
jq -nr '
[inputs | .results[] | map( { (.field) : .value} ) | add]
| group_by(.AccountId)
| map([.[0].AccountId, (map(.number_of_requests|tonumber) | add)])
| sort_by(.[1]) | reverse
| .[]
| join(" : ")
'

Related

How to parse nested json to csv using command line

I want to parse a nested json to csv. The data looks similar to this.
{"tables":[{"name":"PrimaryResult","columns":[{"name":"name","type":"string"},{"name":"id","type":"string"},{"name":"custom","type":"dynamic"}]"rows":[["Alpha","1","{\"age\":\"23\",\"number\":\"xyz\"}]]]}
I want csv file as:
name id age number
alpha 1 23 xyz
I tried:
jq -r ".tables | .[] | .columns | map(.name)|#csv" demo.json > demo.csv
jq -r ".tables | .[] | .rows |.[]|#csv" demo.json >> demo.csv
But I am not getting expected result.
Output:
name id custom
alpha 1 {"age":"23","number":"xyz}
Expected:
name id age number
alpha 1 23 xyz
Assuming valid JSON input:
{
"tables": [
{
"name": "PrimaryResult",
"columns": [
{ "name": "name", "type": "string" },
{ "name": "id", "type": "string" },
{ "name": "custom", "type": "dynamic" }
],
"rows": [
"Alpha",
"1",
"{\"age\":\"23\",\"number\":\"xyz\"}"
]
}
]
}
And assuming fixed headers:
jq -r '["name", "id", "age", "number"],
(.tables[].rows | [.[0,1], (.[2] | fromjson | .age, .number)])
| #csv' input.json
Output:
"name","id","age","number"
"Alpha","1","23","xyz"
If any of the assumptions is wrong, you need to clarify your requirements, e.g.
How are column names determined?
What happens if the input contains multiple tables?
As the "dynamic" object always of the same shape? Or can it sometimes contain fewer, more, or different columns?
Assuming that the .rows array is a 2D array of rows and fields, and that a column of type "dynamic" always expects a JSON-encoded object whose fields represent further columns but may or may not always be present in every row.
Then you could go with transposing the headers array and the rows array in order to integratively process each column by their type, especially collecting all keys from the "dynamic" type on the fly, and then transpose it back to get the row-based CSV output.
Input (I have added another row for illustration):
{
"tables": [
{
"name": "PrimaryResult",
"columns": [
{
"name": "name",
"type": "string"
},
{
"name": "id",
"type": "string"
},
{
"name": "custom",
"type": "dynamic"
}
],
"rows": [
[
"Alpha",
"1",
"{\"age\":\"23\",\"number\":\"123\"}"
],
[
"Beta",
"2",
"{\"age\":\"45\",\"word\":\"xyz\"}"
]
]
}
]
}
Filter:
jq -r '
.tables[] | [.columns, .rows[]] | transpose | map(
if first.type == "string" then first |= .name
elif first.type == "dynamic" then
.[1:] | map(fromjson)
| (map(keys[]) | unique) as $keys
| [$keys, (.[] | [.[$keys[]]])] | transpose[]
else empty end
)
| transpose[] | #csv
'
Output:
"name","id","age","number","word"
"Alpha","1","23","123",
"Beta","2","45",,"xyz"
Demo

use jq to format json data into csv data

{
"Users": [
{
"Attributes": [
{
"Name": "sub",
"Value": "1"
},
{
"Name": "phone_number",
"Value": "1234"
},
{
"Name": "referral_code",
"Value": "abc"
}
]
},
{
"Attributes": [
{
"Name": "sub",
"Value": "2"
},
{
"Name": "phone_number",
"Value": "5678"
},
{
"Name": "referral_code",
"Value": "def"
}
]
}
]
}
How can I produce output like below ?
1,1234,abc
2,5678,def
jq '.Users[] .Attributes[] .Value' test.json
produces
1
1234
abc
2
5678
def
Not sure this is the cleanest way to handle this, but the following will get the desired output:
.Users[].Attributes | map(.Value) | #csv
Loop through all the deep Attributes .Users[].Attributes
map() to get all the Value's
Convert to #csv
jqPlay demo
If you don't need the output to be guaranteed to be CSV, and if you're sure the "Name" values are presented in the same order, you could go with:
.Users[].Attributes
| from_entries
| [.[]]
| join(",")
To be safe though it would be better to ensure consistency of ordering:
(.Users[0] | [.Attributes[] | .Name]) as $keys
| .Users[]
| .Attributes
| from_entries
| [.[ $keys[] ]]
| join(",")
Using join(",") will produce the comma-separated values as shown in the Q (without the quotation marks), but is not guaranteed to produce the expected CSV for all valid values of the input. If you don't mind the pesky quotation marks, you could use #csv, or if you want to skip the quotation marks around all numeric values:
map(tonumber? // .) | #csv

using jq : how can i use the same search in other field without duplicate code?

I have the following json file for exemple:
{
"FOO": {
"name": "Donald",
"location": "Stockholm"
},
"BAR": {
"name": "Walt",
"location": "Stockholm"
},
"BAZ": {
"name": "Jack",
"location": "Whereever"
}
}
and i have this jq command :
cat json | jq .[] | {newname : select(.location=="Stockholm") | .name , contains_w : select(.location=="Stockholm") | .name | startswith("W")}
so i get the result :
{
"newname": "Donald",
"contains_w": false
}
{
"newname": "Walt",
"contains_w": true
}
my question is : is there any way to DRY my command ?
i mean how can i get the same result without duplicate the part :
select(.location=="Stockholm") | .name
how can i reuse the result of newname feild ?
i have a really big file to work with so i don't want to waste time and resources.
You are filtering multiple times during object construction. You could filter first and then do the construction on the filtered list eg.
map(select(.location=="Stockholm"))
| map({newname: .name, contains_w: (.name | startswith("W"))})
https://jqplay.org/s/aXjlgOEDnb

How can I filter by a numeric field using jq?

I am writing a script to query the Bitbucket API and delete SNAPSHOT artifacts that have never been downloaded. This script is failing because it gets ALL snapshot artifacts, the select for the number of downloads does not appear to be working.
What is wrong with my select statement to filter objects by the number of downloads?
Of course the more direct solution here would be if I could just query the Bitbucket API with a filter. To the best of my knowledge the API does not support filtering by downloads.
My script is:
#!/usr/bin/env bash
curl -X GET --user "me:mykey" "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads?pagelen=100" > downloads.json
# get all values | reduce the set to just be name and downloads | select entries where downloads is zero | select entries where name contains SNAPSHOT | just get the name
#TODO i screwed up the selection somewhere its returning files that contain SNAPSHOT regardless of number of downloads
jq '.values | {name: .[].name, downloads: .[].downloads} | select(.downloads==0) | select(.name | contains("SNAPSHOT")) | .name' downloads.json > snapshots_without_any_downloads.js
#unique sort, not sure why jq gives me multiple values
sort -u snapshots_without_any_downloads.js | tr -d '"' > unique_snapshots_without_downloads.js
cat unique_snapshots_without_downloads.js | xargs -t -I % curl -Ss -X DELETE --user "me:mykey" "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/%" > deleted_files.txt
A deidentified sample of the raw input from the API is:
{
"pagelen": 10,
"size": 40,
"values": [
{
"name": "myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip"
}
},
"downloads": 2,
"created_on": "2018-03-15T17:50:00.157310+00:00",
"user": {
"username": "me",
"display_name": "me",
"type": "user",
"uuid": "{3051ec5f-cc92-4bc3-b291-38189a490a89}",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/users/me"
},
"html": {
"href": "https://bitbucket.org/me/"
},
"avatar": {
"href": "https://bitbucket.org/account/me/avatar/32/"
}
}
},
"type": "download",
"size": 430894
},
{
"name": "myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip"
}
},
"downloads": 0,
"created_on": "2018-03-15T17:50:00.157310+00:00",
"user": {
"username": "me",
"display_name": "me",
"type": "user",
"uuid": "{3051ec5f-cc92-4bc3-b291-38189a490a89}",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/users/me"
},
"html": {
"href": "https://bitbucket.org/me/"
},
"avatar": {
"href": "https://bitbucket.org/account/me/avatar/32/"
}
}
},
"type": "download",
"size": 430894
},
{
"name": "myproject_1.0_mc_3.5.1.zip",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/myproject_1.1-SNAPSHOT_0210f77_mc_3.5.1.zip"
}
},
"downloads": 5,
"created_on": "2018-03-15T17:49:14.885544+00:00",
"user": {
"username": "me",
"display_name": "me",
"type": "user",
"uuid": "{3051ec5f-cc92-4bc3-b291-38189a490a89}",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/users/me"
},
"html": {
"href": "https://bitbucket.org/me/"
},
"avatar": {
"href": "https://bitbucket.org/account/me/avatar/32/"
}
}
},
"type": "download",
"size": 430934
}
],
"page": 1,
"next": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads?pagelen=10&page=2"
}
The output I want from this snippet is myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip - that artifact is a SNAPSHOT and has zero downloads.
I have used this intermediate step to do some debugging:
jq '.values | {name: .[].name, downloads: .[].downloads} | select(.downloads>0) | select(.name | contains("SNAPSHOT")) | unique' downloads.json > snapshots_with_downloads.js
jq '.values | {name: .[].name, downloads: .[].downloads} | select(.downloads==0) | select(.name | contains("SNAPSHOT")) | .name' downloads.json > snapshots_without_any_downloads.js
#this returns the same values for each list!
diff unique_snapshots_with_downloads.js unique_snapshots_without_downloads.js
This adjustment gives a cleaner and unique structure, it suggests that theres some sort of splitting or streaming aspect of jq that I do not fully understand:
#this returns a "unique" array like I expect, adding select to this still does not produce the desired outcome
jq '.values | [{name: .[].name, downloads: .[].downloads}] | unique' downloads.json
The data after this step looks like this. It just removed the cruft I didn't need from the raw API response:
[
{
"name": "myproject_1.0_2400a51_mc_3.4.0.zip",
"downloads": 0
},
{
"name": "myproject_1.0_2400a51_mc_3.4.1.zip",
"downloads": 2
},
{
"name": "myproject_1.1-SNAPSHOT_391f4d5_mc_3.5.0.zip",
"downloads": 0
},
{
"name": "myproject_1.1-SNAPSHOT_391f4d5_mc_3.5.1.zip",
"downloads": 2
}
]
As I understand it:
You want globally unique outputs
You want only items with downloads==0
You want only items whose name contains "SNAPSHOT"
The following will accomplish that:
jq -r '
[.values[] | {(.name): .downloads}]
| add
| to_entries[]
| select(.value == 0)
| .key | select(contains("SNAPSHOT"))'
Rather than making unique an explicit step, this version generates a map from names to download counters (adding the values together -- which means that in case of conflicts, the last one wins), and thereby both ensures that the outputs are unique.
Given your test JSON, output is:
myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip
Applied to the overall problem context, this strategy can be used to simplify the overall process:
jq -r '[.values[] | {(.links.self.href): .downloads}] | add | to_entries[] | select(.value == 0) | .key | select(contains("SNAPSHOT"))'
It simplifies the overall process by acting on the URL to the file rather than the name only. This simplifies the subsequent DELETE call. The sort and tr calls can also be removed.
Here's a solution which sums up the .download values per .name before making the selection based on the total number of downloads:
reduce (.values[] | select(.name | contains("SNAPSHOT"))) as $v
({}; .[$v.name] += $v.downloads)
| with_entries(select(.value == 0))
| keys_unsorted[]
Example:
$ jq -r -f program.jq input.json
myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip
p.s.
What is wrong with my select statement ...?
The problem that jumps out is the bit of the pipeline just before the "select" filter:
.values | {name: .[].name, downloads: .[].downloads}
The use of .[] in this manner results in the Cartesian product being formed -- that is, the above expression will emit n*n JSON sets, where n is the length of .values. You evidently intended to write:
.values[] | {name: .name, downloads: .downloads}
which can be abbreviated to:
.values[] | {name, downloads}

Parsing aws ec2 describe-instances to get all private ip addresses

Two of my EC2 instances have 3 IPs each. I managed to successfully grab a list of JSON objects:
aws ec2 describe-instances | jq '.Reservations[] | .Instances[] | (.Tags | { "iname": ( map ( select(.Value | contains("my-vm")))[] | .Value ) } ) + ( { "ip": ( .NetworkInterfaces[].PrivateIpAddress) } )' | jq -s .
Giving me the following result:
[
{
"iname": "my-vm-b",
"ip": "10.11.2.145"
},
{
"iname": "my-vm-b",
"ip": "10.11.1.146"
},
{
"iname": "my-vm-b",
"ip": "10.11.10.144"
},
{
"iname": "my-vm-a",
"ip": "10.11.1.9"
},
{
"iname": "my-vm-a",
"ip": "10.11.10.125"
},
{
"iname": "my-vm-a",
"ip": "10.11.2.85"
}
]
and then I added to the command the following:
... | jq ' group_by(.iname)[] | {(.[0].iname): [.[] | .ip]}' | jq -s .
To finally get the list of objects the way I wanted:
[
{
"my-vm-a": [
"10.11.1.9",
"10.11.10.125",
"10.11.2.85"
]
},
{
"my-vm-b": [
"10.11.2.145",
"10.11.1.146",
"10.11.10.144"
]
}
]
Notice I had to call jq like 4 times. I know I must be doing something wrong so I was wondering if I could do it with a single jq call.
Thanks!
You can easily eliminate the calls to jq -s by wrapping expressions as appropriately in square brackets, or maybe better, using map.
For example, your last pair of jq calls can be simplified to:
jq 'group_by(.iname) | map({(.[0].iname): [.[] | .ip]})'
The following should allow you to reduce the four calls to one:
[.Reservations[]
| .Instances[]
| (.Tags | { "iname": ( map ( select(.Value | contains("my-vm")))[] | .Value ) } )
+ ( { "ip": ( .NetworkInterfaces[].PrivateIpAddress) } ) ]
| group_by(.iname) | map({(.[0].iname): [.[] | .ip]})
However, I would advise against using contains here, unless you fully understand the complications.
Before you go into trying to simplify your jq calls, I think it would be more beneficial to first look at the source data and how it relates to the result you want.
Ignoring a lot of the other details in the data, I think we can agree that your data looks sorta like this:
{
"Reservations": [
{
"Instances": [
{
"NetworkInterfaces": [
{ "PrivateIpAddress": "10.11.2.145" },
{ "PrivateIpAddress": "10.11.1.146" },
{ "PrivateIpAddress": "10.11.10.144" }
],
"Tags": [
{ "Key": "Name", "Value": "my-vm-b" }
]
},
{
"NetworkInterfaces": [
{ "PrivateIpAddress": "10.11.1.9" },
{ "PrivateIpAddress": "10.11.10.125" },
{ "PrivateIpAddress": "10.11.2.85" }
],
"Tags": [
{ "Key": "Name", "Value": "my-vm-a" }
]
}
]
}
]
}
With something that looks like this, your jq query can simply be:
[.Reservations[].Instances[] |
{
(.Tags|from_entries.Name): [.NetworkInterfaces[].PrivateIpAddress]
}
]
No intermediate results needed. Just a few things of note here.
The tags are already an array of key/value pairs, you can easily read values from here converting them to an object first using from_entries
You are selecting instances based on an existence of a tag value containing "my-vm". I'm not sure you even need to do this, I don't know what your data looks like but they are likely in a fixed name so you should just use that name.