jq summarize similar elements - json

I want to use jq to summarize elements that share the same description field value, so that each unique description value gets its own element with the amount fields summed.
I'm using jq 1.5.
Before:
{ "frames":
[
{ "description": "Stuff", "amount": 8 },
{ "description": "Stuff", "amount": 4 },
{ "description": "other_stuff", "amount": 2 },
{ "description": "more_stuff", "amount": 20 }
]
}
After:
{ "frames":
[
{ "description": "Stuff": 12 },
{ "description": "other_stuff", "amount": 2 },
{ "description": "more_stuff", "amount": 20 }
]
}

With your input, the following program produces valid JSON in the spirit of the question:
def sum_by(f;g): reduce .[] as $x ({}; .[$x|f] += ($x|g));
.frames |= sum_by(.description; .amount)
Using sum_by here is more efficient than a group_by-based approach, mainly because group_by sorts the array first.
Output
{
"frames": {
"Stuff": 12,
"other_stuff": 2,
"more_stuff": 20
}
}
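For reference, the program runs as-is from the shell (jq 1.5 or later); the heredoc below is just the "Before" document, compacted:

```shell
jq 'def sum_by(f;g): reduce .[] as $x ({}; .[$x|f] += ($x|g));
    .frames |= sum_by(.description; .amount)' <<'EOF'
{"frames":[{"description":"Stuff","amount":8},{"description":"Stuff","amount":4},
           {"description":"other_stuff","amount":2},{"description":"more_stuff","amount":20}]}
EOF
```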
Variant
If you want description and amount in the output, you could tweak the above as follows:
.frames |=
(sum_by(.description; .amount)
| to_entries
| map( {description: .key, amount: .value} ))
The output would then be:
{
"frames": [
{
"description": "Stuff",
"amount": 12
},
{
"description": "other_stuff",
"amount": 2
},
{
"description": "more_stuff",
"amount": 20
}
]
}
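The variant can be checked the same way from the shell:

```shell
jq 'def sum_by(f;g): reduce .[] as $x ({}; .[$x|f] += ($x|g));
    .frames |= (sum_by(.description; .amount)
                | to_entries
                | map( {description: .key, amount: .value} ))' <<'EOF'
{"frames":[{"description":"Stuff","amount":8},{"description":"Stuff","amount":4},
           {"description":"other_stuff","amount":2},{"description":"more_stuff","amount":20}]}
EOF
```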

How to select max value by condition and then compare it with others?

I have a backup set that is described by JSON; a sample is below.
I want to count how many incremental backups have been added since the last full backup.
I'm trying to select the max timestamp of the records with type "full", so that afterwards I can count how many records with type "incr" have a bigger timestamp.
{
"archive": [
{
"database": {
"id": 1
},
"id": "11-1",
"max": "0000000A000018B90000006A",
"min": "0000000A0000167D000000C7"
}
],
"backup": [
{
"archive": {
"start": "0000000A0000181600000030",
"stop": "0000000A0000181C00000083"
},
"backrest": {
"format": 5,
"version": "2.28"
},
"database": {
"id": 1
},
"info": {
"delta": 417875448942,
"repository": {
"delta": 67466720725,
"size": 67466720725
},
"size": 417875448942
},
"label": "20201213-200009F",
"prior": null,
"reference": null,
"timestamp": {
"start": 1607878809,
"stop": 1607896232
},
"type": "full"
},
{
"archive": {
"start": "0000000A0000182900000065",
"stop": "0000000A0000182F00000069"
},
"backrest": {
"format": 5,
"version": "2.28"
},
"database": {
"id": 1
},
"info": {
"delta": 122520170241,
"repository": {
"delta": 19316550760,
"size": 67786280115
},
"size": 416998156028
},
"label": "20201213-200009F_20201214-200009I",
"prior": "20201213-200009F",
"reference": [
"20201213-200009F"
],
"timestamp": {
"start": 1607965209,
"stop": 1607974161
},
"type": "incr"
},
{
"archive": {
"start": "0000000A0000185B000000DD",
"stop": "0000000A0000185B000000F4"
},
"backrest": {
"format": 5,
"version": "2.28"
},
"database": {
"id": 1
},
"info": {
"delta": 126982395984,
"repository": {
"delta": 19541379733,
"size": 67993072945
},
"size": 421395153101
},
"label": "20201213-200009F_20201217-200105I",
"prior": "20201213-200009F_20201214-200009I",
"reference": [
"20201213-200009F",
"20201213-200009F_20201214-200009I"
],
"timestamp": {
"start": 1608224465,
"stop": 1608233408
},
"type": "incr"
}
]
}
I tried to complete the first part with this command, but it says that "number (1607896232) and number (1607896232) cannot be iterated over":
.[0] |.backup[] | select(.type=="full").timestamp.stop|max
I also tried sort_by but had no luck. So what am I doing wrong here?
With the aid of a generic helper-function for counting, here's a complete solution, assuming you want to count based on .timestamp.start:
def count(s): reduce s as $x (0; .+1);
.backup
| (map( select( .type == "full" ).timestamp.stop) | max) as $max
| count(.[] | select( .type == "incr" and .timestamp.start > $max))
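Since the filter only touches .type and .timestamp, it can be sanity-checked on a trimmed-down version of the input:

```shell
# Only the fields the filter reads are kept here
jq 'def count(s): reduce s as $x (0; .+1);
    .backup
    | (map( select( .type == "full" ).timestamp.stop) | max) as $max
    | count(.[] | select( .type == "incr" and .timestamp.start > $max))' <<'EOF'
{"backup":[{"type":"full","timestamp":{"start":1607878809,"stop":1607896232}},
           {"type":"incr","timestamp":{"start":1607965209,"stop":1607974161}},
           {"type":"incr","timestamp":{"start":1608224465,"stop":1608233408}}]}
EOF
# → 2
```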
Using max/1
For large arrays, it would probably be more efficient to use a streaming version of max:
def count(s): reduce s as $x (0; .+1);
# Note: max(empty) #=> null
def max(s):
reduce s as $s (null; if $s > . then $s else . end);
.backup
| max(.[] | select( .type == "full" ).timestamp.stop) as $max
| count(.[] | select( .type == "incr" and .timestamp.start > $max))
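A streaming max of this kind can be exercised in isolation, e.g.:

```shell
# Reduces a stream to its maximum; max(empty) yields null
jq -cn 'def max(s): reduce s as $s (null; if $s > . then $s else . end);
        [ max(1,5,3), max(empty) ]'
# → [5,null]
```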
Note that the built-in max expects an array:
[ .backup[] | select( .type == "full" ).timestamp.stop ] | max
or
.backup | map( select( .type == "full" ).timestamp.stop ) | max
So, after solving the problem of getting the concrete value (thanks @ikegami), I solved my entire question this way:
jq '(.[0] |[.backup[] | select(.type=="full").timestamp.stop]|max) as $i| [.[0] |.backup[] | select(.type=="incr" and .timestamp.stop>$i)]|length'
Not sure whether it's optimal, but it works.
Here's also an alternative (non-jq) solution achieving the same JSON query with the jtc tool:
bash $ <input.json jtc -jw'[timestamp]:<>G:[-1][type]' / -w'<full><>k'
2
PS. I'm the developer of jtc, a unix JSON processor.
PPS. The above disclaimer is required by SO.

Storing top level JSON key for use after reshaping JSON with jq

I'm playing around with some MTGJSON data and I'm trying to convert data from a file called AllPrintings.json that looks like:
{
"10E": {
"cards": [
{
"name": "Abundance",
"prices": {
"paper": {
"2020-06-11": 1.4
},
"paperFoil": {
"2020-06-11": 31.12
}
},
"uuid": "1669af17-d287-5094-b005-4b143441442f"
},
{
"name": "Academy Researchers",
"prices": {
"paper": {
"2020-06-11": 0.36
},
"paperFoil": {
"2020-06-11": 1.22
}
},
"uuid": "047d5499-a21c-5f5c-9679-1599fcaf9815"
}
]
},
"BFZ": {
"cards": [
{
"name": "Adverse Conditions",
"prices": {
"paper": {
"2020-06-11": 0.23
},
"paperFoil": {
"2020-06-11": 1.86
}
},
"uuid": "1669af17-d287-5094-b005-4b143441123"
},
{
"name": "Akoum Firebird",
"prices": {
"paper": {
"2020-06-11": 0.51
},
"paperFoil": {
"2020-06-11": 3.85
}
},
"uuid": "047d5499-a21c-5f5c-9679-1599fcafad567"
}
]
}
}
Into:
[
{
"name": "Abundance",
"price": 1.4,
"uuid": "1669af17-d287-5094-b005-4b143441442f",
"set": "10E"
},
{
"name": "Academy Researchers",
"price": 0.36,
"uuid": "047d5499-a21c-5f5c-9679-1599fcaf9815",
"set": "10E"
},
{
"name": "Adverse Conditions",
"price": 0.23,
"uuid": "1669af17-d287-5094-b005-4b143441123",
"set": "BFZ"
},
{
"name": "Akoum Firebird",
"price": 0.51,
"uuid": "047d5499-a21c-5f5c-9679-1599fcafad567",
"set": "BFZ"
}
]
I'm able to get everything except the set by running
cat AllPrintings.json | jq '[.[] | .cards | .[] | {uuid: .uuid, name: .name, price: .prices.paper | .[]? }]'
which returns
[
{
"name": "Abundance",
"price": 1.4,
"uuid": "1669af17-d287-5094-b005-4b143441442f"
},
{
"name": "Academy Researchers",
"price": 0.36,
"uuid": "047d5499-a21c-5f5c-9679-1599fcaf9815"
},
{
"name": "Adverse Conditions",
"price": 0.23,
"uuid": "1669af17-d287-5094-b005-4b143441123"
},
{
"name": "Akoum Firebird",
"price": 0.51,
"uuid": "047d5499-a21c-5f5c-9679-1599fcafad567"
}
]
I've tried storing the top-level keys as $k; I can get an array of the keys in a separate command, but then I'm unable to keep iterating over the original data. I've tried the comma separator, but I get errors or the query hangs. For instance:
cat AllPrintings.json | jq '. | keys as $k, [.[] | .cards | .[] | {uuid: .uuid, name: .name, price: .prices.paper, key: $k | .[]? }]'
I've searched here and have read through the jq documentation, but I'm not sure exactly what I'm looking for. I'm also likely overthinking this or missing an obvious solution. Any help would be appreciated. If this is a duplicate question, please link me to the original and I'll delete my post.
Thanks.
You were close. The missing part is to_entries, which converts an object into {key, value} pairs:
to_entries | map(.key as $set | .value.cards[] | {uuid, name, price: .prices.paper | .[]?, set: $set })
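With the sample input reduced to the first set, the filter behaves as expected:

```shell
jq 'to_entries
    | map(.key as $set | .value.cards[] | {uuid, name, price: .prices.paper | .[]?, set: $set })' <<'EOF'
{"10E":{"cards":[
  {"name":"Abundance","prices":{"paper":{"2020-06-11":1.4}},
   "uuid":"1669af17-d287-5094-b005-4b143441442f"},
  {"name":"Academy Researchers","prices":{"paper":{"2020-06-11":0.36}},
   "uuid":"047d5499-a21c-5f5c-9679-1599fcaf9815"}]}}
EOF
```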

Processing JSON with jq - handling array index/name into output

I'm trying to use jq to parse a JSON file for me. I want to get a value from a definition header into the output data in place of an index. A simplified example:
{
"header": {
"type": {
"0": {
"name": "Cats"
},
"3": {
"name": "Dogs"
}
}
},
"data": [
{
"time": "2019-01-01T02:00:00Z",
"reading": {
"0": {"value": 90, "note": "start" },
"3": {"value": 100 }
}
}
]
}
Using a jq command like jq '.data[] | {time: .time, data: .reading[]}' gives me:
{
"time": "2019-01-01T02:00:00Z",
"data": {
"value": 90,
"note": "start"
}
}
{
"time": "2019-01-01T02:00:00Z",
"data": {
"value": 100
}
}
I need to get "Cats" or "Dogs" into the result, heading towards an SQL insert.
Something like:
{
"time": "2019-01-01T02:00:00Z",
"data": {
"type": "Cats", <- line added
"value": 90,
"note": "start"
}
}
...
Or better yet:
{
"time": "2019-01-01T02:00:00Z",
"Cats": { <- label set to "Cats" instead of "data"
"value": 90,
"note": "start"
}
}
...
Is there a way to get what I see as the array index ("0" or "3") added as "Cats" or "Dogs"?
Using the built-in function, INDEX, for creating a dictionary allows a straightforward solution as follows:
(.header.type
| INDEX(to_entries[]; .key)
| map_values(.value.name)) as $dict
| .data[]
| (.reading | keys_unsorted[]) as $k
| {time} + { ($dict[$k]) : .reading[$k] }
Output
{
"time": "2019-01-01T02:00:00Z",
"Cats": {
"value": 90,
"note": "start"
}
}
{
"time": "2019-01-01T02:00:00Z",
"Dogs": {
"value": 100
}
}
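For reference, INDEX/2 is a builtin introduced in jq 1.6; it collects a stream into a dictionary keyed by the given expression. A small standalone illustration (with made-up ids):

```shell
jq -cn '[{id:"a",v:1},{id:"b",v:2}] | INDEX(.[]; .id)'
# → {"a":{"id":"a","v":1},"b":{"id":"b","v":2}}
```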

Extract keys and data and write an array

Given the following json source
{
"pages":{
"yomama/first key": {
"data": {
"fieldset": "lesson-video-overview",
"title": "5th Grade Math - Interpreting Fractions",
},
"order": 4
},
"yomama/second key": {
"data": {
"fieldset": "lesson-video-clip-single",
"title": "Post-Lesson Debrief Part 5",
},
"order": 14
},
"yopapa/Third key": {
"data": {
"fieldset": "lesson-video-clip-single",
"title": "Lesson Part 2B",
},
"order": 6
}
}
}
How could I produce an array-type output like the one below? The main challenge for me is extracting the key, e.g. "yomama/first key". Ideally, I could also filter, e.g. just give me an array of those entries whose keys start with "yomama" (but not "yopapa").
[
{
"url" : "yomama/first key",
"data": {
"fieldset": "lesson-video-overview",
"title": "5th Grade Math - Interpreting Fractions",
},
"order": 4
},
{
"url" : "yomama/second key",
"data": {
"fieldset": "lesson-video-clip-single",
"title": "Post-Lesson Debrief Part 5",
},
"order": 14
},
{
"url" : "yopapa/Third key",
"data": {
"fieldset": "lesson-video-clip-single",
"title": "Lesson Part 2B",
},
"order": 6
}
]
Assuming the input is in so.json and has been corrected to well-formed JSON, you may use:
jq '[.pages | to_entries[] | {"url": .key, "data": .value.data, "order": .value.order}]' < so.json
Here's a solution that does not require being explicit about including all the other keys:
.pages
| [ to_entries[]
| select(.key | startswith("yomama"))
| {url: .key} + .value ]
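Restricting the (corrected) sample input to one "yomama" entry and the "yopapa" entry shows the filtering in action:

```shell
jq '.pages
    | [ to_entries[]
        | select(.key | startswith("yomama"))
        | {url: .key} + .value ]' <<'EOF'
{"pages":{
  "yomama/first key":{"data":{"fieldset":"lesson-video-overview",
    "title":"5th Grade Math - Interpreting Fractions"},"order":4},
  "yopapa/Third key":{"data":{"fieldset":"lesson-video-clip-single",
    "title":"Lesson Part 2B"},"order":6}}}
EOF
```

Only the "yomama" entry survives the select.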

reformat output of hierarchical data, based on an element

Given the following JSON, which contains hierarchical data, I need to convert this flat structure into a parent-child JSON output format:
[{
"ID": 1042,
"NameID": "200",
"Name": "related",
"path": "1042"
}, {
"ID": 1561,
"NameID": " 230",
"Name": "Patr",
"FatherID": 1042,
"path": "1042\/1561"
}, {
"ID": 1370,
"NameID": " 230",
"Name": "Dog",
"FatherID": 1561,
"path": "1042\/1561\/1370"
}, {
"ID": 1560,
"NameID": " 230.1",
"Name": "Ort",
"FatherID": 1561,
"path": "1042\/1561\/1560"
}, {
"ID": 213,
"NameID": " 232",
"Name": "Jim",
"FatherID": 1561,
"path": "1042\/1561\/213"
}]
How could I get an output like the one below, based on the path hierarchy?
(I have replaced only the first few values, since I need to show that the depth may go on and on...)
[
{
"200": "related",
"Children": [
{
" 230": "Patr",
"Children": [
{
"230.1": "Ort",
"Children": [
{
"NameID": "Name",
"Children": [
{
"NameID": "Name",
"children": [
{
"NameID": "Name"
},
{
"NameID": "Name"
}
]
},
{
"NameID": "Name",
"children": [
{
"NameID": "Name"
}
]
}
]
}
]
}
]
}
]
}
The key to the following solution is to convert the flat array into a hierarchical structure. We use setpath to do this as follows:
reduce .[] as $element ({};
setpath($element | .path | split("\/");
$element | {NameID, Name}))
With your input, this produces the following:
{
"1042": {
"NameID": "200",
"Name": "related",
"1561": {
"NameID": " 230",
"Name": "Patr",
"1370": {
"NameID": " 230",
"Name": "Dog"
},
"1560": {
"NameID": " 230.1",
"Name": "Ort"
},
"213": {
"NameID": " 232",
"Name": "Jim"
}
}
}
}
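The setpath step can be checked in isolation on a two-element slice of the input (note that "\/" and "/" are equivalent inside a jq string):

```shell
jq 'reduce .[] as $element ({};
      setpath($element | .path | split("/");
              $element | {NameID, Name}))' <<'EOF'
[{"ID":1042,"NameID":"200","Name":"related","path":"1042"},
 {"ID":1561,"NameID":" 230","Name":"Patr","FatherID":1042,"path":"1042/1561"}]
EOF
```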
Now it's just a question of munging, which can be done using the following helper function:
def promote:
. as $in
| (if .NameID then {(.NameID): .Name } else {} end) as $base
| del(.NameID) | del(.Name)
| if length == 0 then $base
else $base + {Children: (reduce keys_unsorted[] as $k ([]; . + [$in[$k] | promote] ))}
end;
With this def, the solution becomes:
reduce .[] as $element ({};
setpath($element | .path | split("\/");
$element | {NameID, Name}))
| promote
| .Children
Output
[
{
"200": "related",
"Children": [
{
" 230": "Patr",
"Children": [
{
" 230": "Dog"
},
{
" 230.1": "Ort"
},
{
" 232": "Jim"
}
]
}
]
}
]
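Putting the two steps together as a single shell invocation over the question's input:

```shell
jq 'def promote:
      . as $in
      | (if .NameID then {(.NameID): .Name} else {} end) as $base
      | del(.NameID) | del(.Name)
      | if length == 0 then $base
        else $base + {Children: (reduce keys_unsorted[] as $k ([]; . + [$in[$k] | promote]))}
        end;
    reduce .[] as $element ({};
      setpath($element | .path | split("/");
              $element | {NameID, Name}))
    | promote
    | .Children' <<'EOF'
[{"ID":1042,"NameID":"200","Name":"related","path":"1042"},
 {"ID":1561,"NameID":" 230","Name":"Patr","FatherID":1042,"path":"1042/1561"},
 {"ID":1370,"NameID":" 230","Name":"Dog","FatherID":1561,"path":"1042/1561/1370"},
 {"ID":1560,"NameID":" 230.1","Name":"Ort","FatherID":1561,"path":"1042/1561/1560"},
 {"ID":213,"NameID":" 232","Name":"Jim","FatherID":1561,"path":"1042/1561/213"}]
EOF
```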