How to merge and aggregate values in 2 JSON files using jq? - json

I am using jq in a shell script to manipulate JSON files.
I have 2 files and I'd like to merge them into one file while also aggregating (sum) the values when names in the name/value pairs are the same.
As an example:
Input1.json
[
{
"A": "Name 1",
"B": "1.1",
"C": "2"
},
{
"A": "Name 2",
"B": "3.2",
"C": "4"
}
]
Input2.json
[
{
"A": "Name 2",
"B": "5",
"C": "6"
},
{
"A": "Name 3",
"B": "7",
"C": "8"
}
]
Expected result:
Output.json
[
{
"A": "Name 1",
"B": "1.1",
"C": "2"
},
{
"A": "Name 2",
"B": "8.2",
"C": "10"
},
{
"A": "Name 3",
"B": "7",
"C": "8"
}
]
I can use other tools other than jq but prefer to ultimately keep the solution contained into a shell script I can call from the Terminal.
Any help is appreciated. Thank you.

I can use other tools other than jq but prefer to ultimately keep the solution contained into a shell script I can call from the Terminal.
You could give the JSON parser xidel a try:
$ xidel -se '
array{
let $src:=(json-doc("Input1.json")(),json-doc("Input2.json")())
for $name in distinct-values($src/A)
let $obj:=$src[A=$name]
return
if (count($obj) gt 1) then
map:merge(
$obj[1]() ! {
.:if ($obj[1](.) castable as decimal) then
string($obj[1](.) + $obj[2](.))
else
$obj[1](.)
}
)
else
$obj
}
'
Intermediate steps.

jq is beautiful for problems like this:
$ jq -n '
reduce inputs[] as {$A,$B,$C} ({};
.[$A] |= {
$A,
B: (.B + ($B|tonumber)),
C: (.C + ($C|tonumber))
}
)
| map({
A,
B: (.B|tostring),
C: (.C|tostring)
})
' input1.json input2.json
The first reduce creates a map from the different "A" values to the aggregated result object. Then given the mapping, converts back to an array of the result objects adjusting the types of the results.
jqplay

Here's one way, but there are others:
jq -s '
def to_n: tonumber? // null;
def merge_values($x;$y):
if $x == $y then $x
elif $x == null then $y
elif $y == null then $x
else ($x|to_n) as $xn
| if $xn then ($y|to_n) as $yn | ($xn+$yn)|tostring
else [$x, $y]
end
end;
def merge($x;$y):
reduce ($x + $y |keys_unsorted)[] as $k (null;
.[$k] = merge_values($x[$k]; $y[$k]) );
INDEX(.[0][]; .A) as $in1
| INDEX(.[1][]; .A) as $in2
| ($in1 + $in2|keys_unsorted) as $keys
| reduce $keys[] as $k ([];
. + [merge($in1[$k]; $in2[$k]) ])
' input1.json inut2.json

Related

Reduce nested json (PowerDNS stats)

I'm trying to improve on a jq reduce, but finding that some of the returned data is nested and the code I'm using breaks on that.
This is where I've got the jq code from: https://github.com/influxdata/telegraf/tree/master/plugins/inputs/powerdns_recursor
Taking tonumber off I get the following clipped output:
[...]
"x-ourtime8-16": "0",
"zone-disallowed-notify": "0",
"response-by-qtype": [
{
"name": "A",
"value": "8958"
},
{
"name": "NS",
"value": "6"
},
[...]
The original code, with tonumber left in:
curl -s -H 'X-API-Key: <key>' http://127.0.0.1:8082/api/v1/servers/localhost/statistics | jq 'reduce .[] as $item ({}; . + { ($item.name): ($item.value|tonumber)})'
The output I'm after:
[...]
"x-ourtime8-16": 0,
"zone-disallowed-notify": 0,
"response-by-qtype.A": 8958,
"response-by-qtype.NS": 6,
[...]
I've spent some time Googling jq and nested input, but I don't want the index numbers this gave me in the names. I'm hoping a small tweak will do the trick.
To transform this input :
{
"x-ourtime8-16": "0",
"zone-disallowed-notify": "0",
"response-by-qtype": [
{
"name": "A",
"value": "8958"
},
{
"name": "NS",
"value": "6"
}
]
}
You can run :
jq ' to_entries |
map(if (.value | type) == "string"
then .value |= tonumber
else .key as $key | .value[] |
.name |= $key+"."+. |
.value |= tonumber
end
) | from_entries
' input.json
to get :
{
"x-ourtime8-16": 0,
"zone-disallowed-notify": 0,
"response-by-qtype.A": 8958,
"response-by-qtype.NS": 6
}
You can convert numeric strings to numbers using:
if type == "string" then . as $in | try tonumber catch $in else . end
As a post-processing step, you could use walk as a wrapper:
walk(if type == "string" then . as $in | try tonumber catch $in else . end)

Reverse flatten nested arrays with jq

Suppose I have the following nested data structure
cat nested.json
[
{
"a": "a",
"b": [
{"c": "c"}
]
},
{
"a": "a",
"b": [
{"c": "c"}
]
}
]
I can flatten it like this
cat nested.json | jq '
[. as $in | reduce paths(scalars) as $path ({};
. + { ($path | map(tostring) | join(".")): $in | getpath($path) }
)]
' > flat.json
cat flat.json
[
{
"0.a": "a",
"0.b.0.c": "c",
"1.a": "a",
"1.b.0.c": "c"
}
]
To reverse the flatten operation with jq I tried this
cat flat.json | jq '
.[0] | reduce to_entries[] as $kv ({};
setpath($kv.key|split("."); $kv.value)
)
'
{
"0": {
"a": "a",
"b": {
"0": {
"c": "c"
}
}
},
"1": {
"a": "a",
"b": {
"0": {
"c": "c"
}
}
}
}
However, I want to convert numbers in the setpath param to create arrays. This doesn't quite work, but I think it's close?
cat flat.json | jq '
def makePath($s): [split(".")[] | if (test("\\d+")) then tonumber else . end];
.[0] | reduce to_entries[] as $kv ({}; setpath(makePath($kv.key); $kv.value))
'
jq: error (at <stdin>:8): split input and separator must be strings
The desired output is the same as the original data in nested.json
Wouldn't it be simpler to do it this way:
Encode your input with
jq '[path(.. | scalars) as $path | {($path | join(".")): getpath($path)}] | add' nested.json
{
"0.a": "a",
"0.b.0.c": "c",
"1.a": "a",
"1.b.0.c": "c"
}
And decode it with
jq 'reduce to_entries[] as $item (null; setpath($item.key / "." | map(tonumber? // .); $item.value))' flat.json
[
{
"a": "a",
"b": [
{
"c": "c"
}
]
},
{
"a": "a",
"b": [
{
"c": "c"
}
]
}
]
However, if you don't care about your special dot notation (e.g. "0.b.0.c") for the encoded keys, you can simply convert the path array into a JSON string instead, having albeit uglier virtually the same effect. Moreover, it would automatically enable the handling of input object field names that include dots (e.g. {"a.b":3}) or look like numbers (e.g. {"42":"Panic!"}).
Using JSON keys, encode your input with
jq '[path(.. | scalars) as $path | {($path | tojson): getpath($path)}] | add' nested.json
{
"[0,\"a\"]": "a",
"[0,\"b\",0,\"c\"]": "c",
"[1,\"a\"]": "a",
"[1,\"b\",0,\"c\"]": "c"
}
And decode it with
jq 'reduce to_entries[] as $item (null; setpath($item.key | fromjson; $item.value))' flat.json
[
{
"a": "a",
"b": [
{
"c": "c"
}
]
},
{
"a": "a",
"b": [
{
"c": "c"
}
]
}
]

jq: sort object values

I want to sort this data structure by the object keys (easy with -S and sort the object values (the arrays) by the 'foo' property.
I can sort them with
jq -S '
. as $in
| keys[]
| . as $k
| $in[$k] | sort_by(.foo)
' < test.json
... but that loses the keys.
I've tried variations of adding | { "\($k)": . }, but then I end up with a list of objects instead of one object. I also tried variations of adding to $in (same problem) or using $in = $in * { ... }, but that gives me syntax errors.
The one solution I did find was to just have the separate objects and then pipe it into jq -s add, but ... I really wanted it to work the other way. :-)
Test data below:
{
"": [
{ "foo": "d" },
{ "foo": "g" },
{ "foo": "f" }
],
"c": [
{ "foo": "abc" },
{ "foo": "def" }
],
"e": [
{ "foo": "xyz" },
{ "foo": "def" }
],
"ab": [
{ "foo": "def" },
{ "foo": "abc" }
]
}
Maybe this?
jq -S '.[] |= sort_by(.foo)'
Output
{
"": [
{
"foo": "d"
},
{
"foo": "f"
},
{
"foo": "g"
}
],
"ab": [
{
"foo": "abc"
},
{
"foo": "def"
}
],
"c": [
{
"foo": "abc"
},
{
"foo": "def"
}
],
"e": [
{
"foo": "def"
},
{
"foo": "xyz"
}
]
}
#user197693 had a great answer. A suggestion I got in a private message elsewhere was to use
jq -S 'with_entries(.value |= sort_by(.foo))'
If for some reason using the -S command-line option is not a satisfactory option, you can also perform the by-key sort using the to_entries | sort_by(.key) | from_entries idiom. So a complete solution to the problem would be:
.[] |= sort_by(.foo)
| to_entries | sort_by(.key) | from_entries

How to substitute strings in JSON with jq based on the input

Given the input in this form
[
{
"DIR" : "/foo/bar/a/b/c",
"OUT" : "/foo/bar/x/y/z",
"ARG" : [ "aaa", "bbb", "/foo/bar/a", "BASE=/foo/bar" ]
},
{
"DIR" : "/foo/baz/d/e/f",
"OUT" : "/foo/baz/x/y/z",
"ARG" : [ "ccc", "ddd", "/foo/baz/b", "BASE=/foo/baz" ]
},
{
"foo" : "bar"
}
]
I'm trying to find out how to make jq transform that into this:
[
{
"DIR" : "BASE/a/b/c",
"OUT" : "BASE/x/y/z",
"ARG" : [ "aaa", "bbb", "BASE/a", "BASE=/foo/bar" ]
},
{
"DIR" : "BASE/d/e/f",
"OUT" : "BASE/x/y/z",
"ARG" : [ "ccc", "ddd", "BASE/b", "BASE=/foo/baz" ]
},
{
"foo" : "bar"
}
]
In other words, objects having an "ARG" array, containing a string that starts with "BASE=" should use the string after "BASE=", e.g. "/foo" to substitute other string values that start with "/foo" (except the "BASE=/foo" which should remain unchanged")
I'm not even close to finding a solution myself, and at this point I'm unsure that jq alone will do the job.
With jq:
#!/usr/bin/jq -f
# fix-base.jq
def fix_base:
(.ARG[] | select(startswith("BASE=")) | split("=")[1]) as $base
| .DIR?|="BASE"+ltrimstr($base)
| .OUT?|="BASE"+ltrimstr($base)
| .ARG|=map(if startswith($base) then "BASE"+ltrimstr($base) else . end)
;
map(if .ARG? then fix_base else . end)
You can run it like this:
jq -f fix-base.jq input.json
or make it an executable like this:
chmod +x fix-base.jq
./fix-base.jq input.json
Don't worry, jq alone will do the job:
jq 'def sub_base($base): if (startswith("BASE") | not) then sub($base; "BASE") else . end;
map(if .["ARG"] then ((.ARG[] | select(startswith("BASE=")) | split("=")[1]) as $base
| to_entries
| map(if (.value | type == "string") then .value |= sub_base($base)
else .value |= map(sub_base($base)) end)
| from_entries)
else . end)' input.json
The output:
[
{
"DIR": "BASE/a/b/c",
"OUT": "BASE/x/y/z",
"ARG": [
"aaa",
"bbb",
"BASE/a",
"BASE=/foo/bar"
]
},
{
"DIR": "BASE/d/e/f",
"OUT": "BASE/x/y/z",
"ARG": [
"ccc",
"ddd",
"BASE/b",
"BASE=/foo/baz"
]
},
{
"foo": "bar"
}
]
Some helper functions make the going much easier. The first is generic and worthy perhaps of your standard library:
# Returns the integer index, $i, corresponding to the first element
# at which f is truthy, else null
def indexof(f):
label $out
| foreach .[] as $x (null; .+1;
if ($x|f) then (.-1, break $out) else empty end) // null;
# Change the string $base to BASE using gsub
def munge($base):
if type == "string" and (test("^BASE=")|not) then gsub($base; "BASE")
elif type=="array" then map(munge($base))
elif type=="object" then map_values(munge($base))
else .
end;
And now the easy part:
map(if has("ARG")
then (.ARG|indexof(test("^BASE="))) as $ix
| if $ix
then (.ARG[$ix]|sub("^BASE=";"")) as $base | munge($base)
else . end
else . end )
Some points to note:
You may wish to use sub rather than gsub in munge;
The above solution assumes you want to make the change in all keys, not just "DIR", "OUT", and "ARG"
The above solution allows specifications of BASE that include one or more occurrences of "=".

how to use jq to merge/join/concat values with the same key

I need to aggregate values by key. Example JSON input is:
$ cat json | jq
[
{
"key": "john",
"value": "ontario"
},
{
"key": "ryan",
"value": "chicago"
},
{
"key": "ryan",
"value": "illinois"
},
{
"key": "john",
"value": "toronto"
},
]
Is it possible and if so how to merge/join/concat values with the same key so that the result is:
[
{
"key": "john",
"value": "toronto ontario"
},
{
"key": "ryan",
"value": "illinois chicago"
},
]
I am targetting JQ specifically because of its ease of use from cfengine.
Group the pairs by key, then combine the values.
group_by(.key) | map({key:.[0].key,value:(map(.value) | join(" "))})
For this type of problem, I prefer to avoid the overhead of sorting, and to guarantee that the ordering of the objects in the input is respected.
Here's one approach that assumes the values associated with "key" and "value" are all strings (as is the case in the example). This assumption makes it easy to avoid an inefficient lookup:
def merge_by_key(separator):
reduce .[] as $o
({}; $o["key"] as $k
| if .[$k] then .[$k] += (separator + $o["value"])
else .[$k] = $o["value"] end);
merge_by_key(" ") | to_entries
Output:
[{"key":"john","value":"ontario toronto"},
{"key":"ryan","value":"chicago illinois"}]
Generic solution
def merge_at_key(separator):
reduce .[] as $o
([];
$o["key"] as $k
| (map(.key) | index($k)) as $i
| if $i then (.[$i] | .value) += (separator + $o["value"])
else . + [$o] end);