Merging property values of objects with common key with jq

I have an array of objects with 2 properties, say "key" and "value":
[
{key: 1, value: a},
{key: 2, value: b},
{key: 1, value: c}
]
Now, I would like to merge the values of the "value" properties of objects with the same "key" property value. That is, the previous array is transformed into:
[
{key: 1, value: [a, c]},
{key: 2, value: [b]}
]
I tried something like:
$ echo '[{"key": "1", "val": "a"}, {"key": "2", "val": "b"}, {"key": "1", "val": "c"}]' | jq '. | group_by(.["key"]) | .[] | reduce .[] as $in ({"val": []}; {"key": $in.key, "val": [$in.val] + .["val"]})'
But it triggers a jq syntax error and I have no idea why. I am stuck.
Any idea?
Thanks
B

The syntax error, incidentally, comes from the object construction: jq does not accept an unparenthesized operator expression such as [$in.val] + .["val"] as an object value; it would have to be wrapped in parentheses. Your approach using reduce could be sanitized to
jq 'group_by(.["key"]) | .[] |= reduce .[] as $in (
      {value: []}; .key = $in.key | .value += [$in.value]
    )'
[
{
"value": [
"a",
"c"
],
"key": 1
},
{
"value": [
"b"
],
"key": 2
}
]
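For a self-contained run (a sketch: the input is adapted from the question's echo, with val renamed to value and numeric keys so the result matches the output above):
echo '[{"key": 1, "value": "a"}, {"key": 2, "value": "b"}, {"key": 1, "value": "c"}]' |
jq 'group_by(.["key"]) | .[] |= reduce .[] as $in (
      {value: []}; .key = $in.key | .value += [$in.value]
    )'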
Another approach using map would be
jq 'group_by(.key) | map({key: .[0].key, value: map(.value)})'
[
{
"key": 1,
"value": [
"a",
"c"
]
},
{
"key": 2,
"value": [
"b"
]
}
]
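The same adapted input works here as a one-liner. Since group_by collects all elements with equal .key into one group (and sorts the groups by key), taking .[0].key is representative of the whole group:
echo '[{"key": 1, "value": "a"}, {"key": 2, "value": "b"}, {"key": 1, "value": "c"}]' |
jq 'group_by(.key) | map({key: .[0].key, value: map(.value)})'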

Related

How to combine key values from list of dictionaries using jq [duplicate]


jq ~ collapse specific single object arrays?

Following up on jq ~ is there a better way to collapse single object arrays? and R: Nested data.table to JSON, how do I collapse only specific elements?
I want to get rid of the "group" arrays in
[
{
"id2": "A",
"group": [
{
"data": [
{
"id1": 1,
"group": [
{
"data": [
{
"a": 1,
"b": 1
},
{
"a": 2,
"b": 2
}
],
"type": "test"
}
],
"type": "B"
}
],
"type": "C"
}
]
},
{
"id2": "C",
"group": [
{
"data": [
{
"id1": 3,
"group": [
{
"data": [
{
"a": 1,
"b": 1
}
],
"type": "test"
}
],
"type": "B"
}
],
"type": "C"
}
]
}
]
desired output
[{
"id2": "A",
"group": {
"data": [{
"id1": 1,
"group": {
"data": [{
"a": 1,
"b": 1
},
{
"a": 2,
"b": 2
}
],
"type": "test"
},
"type": "B"
}],
"type": "C"
}
},
{
"id2": "C",
"group": {
"data": [{
"id1": 3,
"group": {
"data": [{
"a": 1,
"b": 1
}],
"type": "test"
},
"type": "B"
}],
"type": "C"
}
}
]
The filter walk(if type=="array" and length==1 then .[0] else . end) additionally removes the array around the single "data" object.
Unfortunately, we are not able to install jq 1.6 on our RStudio Server, and therefore I'm not able to use the walk function. (It is working perfectly fine on my local system, though.)
Can anybody help me out with an alternative solution without walk? It would be highly appreciated.
Edit
OK, I got it. I can manually add the walk function like so:
'def walk(f):
   . as $in
   | if type == "object" then
       reduce keys_unsorted[] as $key
         ({}; . + {($key): ($in[$key] | walk(f))}) | f
     elif type == "array" then map(walk(f)) | f
     else f
     end;
 walk(if type == "object"
        and has("group")
        and (.group | type) == "array"
        and (.group | length) == 1
      then .group = .group[0]
      else . end)'
We could operate one level higher in the nesting hierarchy: test for "group" being a key, then update accordingly with .group = .group[0] instead of . = .[0]:
jq 'walk(if type=="object"
and has("group")
and (.group | type)=="array"
and (.group | length)==1
then .group = .group[0]
else . end)'
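If adding the walk def by hand is undesirable, a self-recursive filter can replace walk entirely. This is a sketch of an alternative, not taken from the answers above (collapse is a hypothetical helper name); it collapses the "group" member at each object, then descends into all children:
jq 'def collapse:
      (if type == "object" and has("group")
          and (.group | type) == "array"
          and (.group | length) == 1
       then .group = .group[0]
       else . end)
      | if type == "object" or type == "array"
        then .[] |= collapse
        else . end;
    collapse'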

Parse 2 files based on key value and recreate another json file [JQ]

I am new to jq.
I need to make a JSON file based on two other files.
I have been working on it the whole day and am stuck here. I badly need this.
Here is file 1
{
"name": "foo",
"key": "1",
"id": "x"
}
{
"name": "bar",
"key": "2",
"id": "x"
}
{
"name": "baz",
"key": "3",
"id": "y"
}
file 2
{
"name": "a",
"key": "1"
}
{
"name": "b",
"key": "1"
}
{
"name": "c",
"key": "2"
}
{
"name": "d",
"key": "2"
}
{
"name": "e",
"key": "3"
}
Expected Result:
{
"x": {
"foo": [
"a",
"b"
],
"bar": [
"c",
"d"
]
},
"y": {
"baz": [
"e"
]
}
}
I can do it with a Python script, but I need it in jq.
Thanks in advance.
Use reduce on the first file's items ($i) to successively build up the result object using setpath, taking the path fields from the item and the values from a matching map over the second file (slurped into $d).
jq -s --slurpfile d file2 '
  reduce .[] as $i ({}; setpath(
    [$i.id, $i.name];
    [$d[] | select(.key == $i.key).name]
  ))
' file1
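setpath creates intermediate objects along the given path as needed; a minimal illustration, independent of the files above:
jq -n '{} | setpath(["x", "foo"]; ["a", "b"])'
# yields {"x": {"foo": ["a", "b"]}}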
For efficiency, the following solution first constructs a "dictionary" based on file2; furthermore, it does so without having to "slurp" it. Building the lookup once means each file1 record needs only a single dictionary access instead of a scan over all of file2.
< file2 jq -nc --slurpfile file1 file1 '
(reduce inputs as {$name, $key} ({};
.[$key] += [$name])) as $dict
| reduce $file1[] as {$name, $key, $id} ({};
.[$id] += [ {($name): $dict[$key]} ] )
'
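With the sample file2 above, the intermediate $dict works out to:
{
  "1": ["a", "b"],
  "2": ["c", "d"],
  "3": ["e"]
}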

Streaming without truncating

I have JSON data of the form below. I want to transform it, making the key of each record into a field of that record, in a streaming fashion. My problem: I don't know how to do that without truncating the key and losing it. I have inferred the required structure of the stream; see the bottom of this question.
Question: how do I transform the input data into a stream without losing the key?
Data:
{
"foo" : {
"a" : 1,
"b" : 2
},
"bar" : {
"a" : 1,
"b" : 2
}
}
A non-streaming transformation uses:
jq 'with_entries(.value += {key}) | .[]'
yielding:
{
"a": 1,
"b": 2,
"key": "foo"
}
{
"a": 1,
"b": 2,
"key": "bar"
}
Now, if my data file is very very large, I'd prefer to stream:
jq -ncr --stream 'fromstream(1|truncate_stream(inputs))'
The problem: this truncates the keys "foo" and "bar". On the other hand, not truncating the stream and just calling fromstream(inputs) is pretty meaningless: this makes the whole --stream part a no-op and jq reads everything into memory.
The structure of the stream is the following, using . | tostream:
[
[
"foo",
"a"
],
1
]
[
[
"foo",
"b"
],
2
]
[
[
"foo",
"b"
]
]
[
[
"bar",
"a"
],
1
]
[
[
"bar",
"b"
],
2
]
[
[
"bar",
"b"
]
]
[
[
"bar"
]
]
while with truncation, . as $dot | (1|truncate_stream($dot | tostream)), the structure is:
[
[
"a"
],
1
]
[
[
"b"
],
2
]
[
[
"b"
]
]
[
[
"a"
],
1
]
[
[
"b"
],
2
]
[
[
"b"
]
]
So it looks like, in order to construct a stream the way I need it, I will have to generate the following structure (I have inserted a [["foo"]] after the first record is finished):
[
[
"foo",
"a"
],
1
]
[
[
"foo",
"b"
],
2
]
[
[
"foo",
"b"
]
]
[
[
"foo"
]
]
[
[
"bar",
"a"
],
1
]
[
[
"bar",
"b"
],
2
]
[
[
"bar",
"b"
]
]
[
[
"bar"
]
]
Making this into a string jq can consume, I indeed get what I need (see also the snippet here: https://jqplay.org/s/iEkMfm_u92):
fromstream([ [ "foo", "a" ], 1 ],[ [ "foo", "b" ], 2 ],[ [ "foo", "b" ] ],[["foo"]],[ [ "bar", "a" ], 1 ],[ [ "bar", "b" ], 2 ],[ [ "bar", "b" ] ],[ [ "bar" ] ])
yielding:
{
"foo": {
"a": 1,
"b": 2
}
}
{
"bar": {
"a": 1,
"b": 2
}
}
The final result (see https://jqplay.org/s/-UgbEC4BN8) would be:
fromstream([ [ "foo", "a" ], 1 ],[ [ "foo", "b" ], 2 ],[ [ "foo", "b" ] ],[["foo"]],[ [ "bar", "a" ], 1 ],[ [ "bar", "b" ], 2 ],[ [ "bar", "b" ] ],[ [ "bar" ] ]) | with_entries(.value += {key}) | .[]
yielding
{
"a": 1,
"b": 2,
"key": "foo"
}
{
"a": 1,
"b": 2,
"key": "bar"
}
A generic function, atomize(s), for converting objects to key-value objects is provided in the jq Cookbook. Using it, the solution to the problem here is simply:
atomize(inputs) | to_entries[] | .value + {key}
({key} is shorthand for {key: .key}.)
For reference, here is the def:
# atomize(s)
# Convert an object (presented in streaming form as the stream s) into
# a stream of single-key objects.
# Example: atomize(inputs) (used in conjunction with "jq -n --stream")
def atomize(s):
  fromstream(foreach s as $in ({previous: null, emit: null};
    if ($in | length == 2) and ($in|.[0][0]) != .previous and .previous != null
    then {emit: [[.previous]], previous: ($in|.[0][0])}
    else {previous: ($in|.[0][0]), emit: null}
    end;
    (.emit // empty), $in
  ));
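A complete invocation could then look like this (a sketch; input.json is a hypothetical file holding the large object, and the def from above is inlined):
jq -nc --stream '
  def atomize(s):
    fromstream(foreach s as $in ({previous: null, emit: null};
      if ($in | length == 2) and ($in|.[0][0]) != .previous and .previous != null
      then {emit: [[.previous]], previous: ($in|.[0][0])}
      else {previous: ($in|.[0][0]), emit: null}
      end;
      (.emit // empty), $in));
  atomize(inputs) | to_entries[] | .value + {key}
' input.json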

Merge two complex JSON objects with arrays

I have the following two JSON documents as input:
{
"one": {
"vars": [
{
"name": "a",
"value": "a"
},
{
"name": "b",
"value": "b"
}
]
},
"two": {
"vars": [
{
"name": "c",
"value": "c"
},
{
"name": "d",
"value": "d"
}
]
},
"extras": "whatever"
}
{
"one": {
"vars": [
{
"name": "e",
"value": "e"
},
{
"name": "f",
"value": "f"
}
]
},
"two": {
"vars": [
{
"name": "g",
"value": "g"
},
{
"name": "h",
"value": "h"
}
]
}
}
And I'd like to merge them to obtain the following result, where the vars arrays of each section are merged together:
{
"one": {
"vars": [
{
"name": "a",
"value": "a"
},
{
"name": "b",
"value": "b"
},
{
"name": "e",
"value": "e"
},
{
"name": "f",
"value": "f"
}
]
},
"two": {
"vars": [
{
"name": "c",
"value": "c"
},
{
"name": "d",
"value": "d"
},
{
"name": "g",
"value": "g"
},
{
"name": "h",
"value": "h"
}
]
},
"extras": "whatever"
}
Ideally, but not mandatorily:
the keys (here one and two) would be arbitrary, and an undefined number of them could be present;
the vars array would not contain duplicates (based on name), and right precedence would be applied to override values from the first array.
I managed to merge the two objects and only one array with the following command, but the key is hardcoded and I'm a bit stuck from there:
jq -s '.[0].one.vars=([.[].one.vars]|flatten)|.[0]' file1.json file2.json
First, here is a solution which is oblivious to the top-level key names, but which does not attempt to avoid duplicates:
$A
| reduce keys_unsorted[] as $k (.;
    if .[$k] | (type == "object") and has("vars")
    then (.[$k]|.vars) += ($B[$k]|.vars)
    else . end)
Here of course $A and $B refer to the two objects. You can set $A and $B in several ways.
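For instance, using --slurpfile (a sketch; each file's contents arrive wrapped in a one-element array, hence the [0]):
jq -n --slurpfile a file1.json --slurpfile b file2.json '
  $a[0] as $A | $b[0] as $B
  | $A
  | reduce keys_unsorted[] as $k (.;
      if .[$k] | (type == "object") and has("vars")
      then (.[$k]|.vars) += ($B[$k]|.vars)
      else . end)
'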
If you want to reorder the top-level keys, you can simply extend the above with a filter specifying the order, e.g.: {extras, two, one}.
To avoid duplicates, I'd suggest writing a helper function to do just that, as illustrated in the following section.
Avoiding duplicates
def extend(stream):
  reduce stream as $s (.;
    (map(.name) | index($s|.name)) as $i
    | if $i then .[$i] += $s
      else . + [$s]
      end);
$A
| reduce keys_unsorted[] as $k (.;
    if .[$k] | (type == "object") and has("vars")
    then (.[$k].vars) = (.[$k].vars | extend($B[$k].vars[]))
    else . end)
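To see extend's right-precedence deduplication in isolation (a minimal sketch with made-up data):
jq -n 'def extend(stream):
         reduce stream as $s (.;
           (map(.name) | index($s|.name)) as $i
           | if $i then .[$i] += $s
             else . + [$s]
             end);
       [{name: "a", value: 1}]
       | extend({name: "a", value: 2}, {name: "b", value: 3})'
# yields [{"name": "a", "value": 2}, {"name": "b", "value": 3}]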
jq -n 'input as $b | input
| .one.vars |= . + $b.one.vars
| .two.vars |= . + $b.two.vars' file2.json file1.json
file1.json must come after file2.json in order to preserve extras.