I have JSON data of the form below. I want to transform it in a streaming fashion, making the key of each record into a field of that record. My problem: I don't know how to do that without truncating the key and losing it. I have inferred the required structure of the stream; see the bottom of this question.
Question: how do I transform the input data into a stream without losing the key?
Data:
{
"foo" : {
"a" : 1,
"b" : 2
},
"bar" : {
"a" : 1,
"b" : 2
}
}
A non-streaming transformation uses:
jq 'with_entries(.value += {key}) | .[]'
yielding:
{
"a": 1,
"b": 2,
"key": "foo"
}
{
"a": 1,
"b": 2,
"key": "bar"
}
Now, if my data file is very very large, I'd prefer to stream:
jq -ncr --stream 'fromstream(1|truncate_stream(inputs))'
The problem: this truncates the keys "foo" and "bar". On the other hand, not truncating the stream and just calling fromstream(inputs) is pretty meaningless: this makes the whole --stream part a no-op and jq reads everything into memory.
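For reference, that non-truncating variant would be:
jq -n --stream 'fromstream(inputs) | with_entries(.value += {key}) | .[]'
which does yield the right records, but only after reconstructing the entire top-level object.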
The structure of the stream is the following, using . | tostream:
[["foo","a"],1]
[["foo","b"],2]
[["foo","b"]]
[["bar","a"],1]
[["bar","b"],2]
[["bar","b"]]
[["bar"]]
while with truncation, . as $dot | (1|truncate_stream($dot | tostream)), the structure is:
[["a"],1]
[["b"],2]
[["b"]]
[["a"],1]
[["b"],2]
[["b"]]
So it looks like, in order to construct a stream the way I need it, I will have to generate the following structure (I have inserted a [["foo"]] event after the first record is finished):
[["foo","a"],1]
[["foo","b"],2]
[["foo","b"]]
[["foo"]]
[["bar","a"],1]
[["bar","b"],2]
[["bar","b"]]
[["bar"]]
Making this into a string jq can consume, I indeed get what I need (see also the snippet here: https://jqplay.org/s/iEkMfm_u92):
fromstream([ [ "foo", "a" ], 1 ],[ [ "foo", "b" ], 2 ],[ [ "foo", "b" ] ],[["foo"]],[ [ "bar", "a" ], 1 ],[ [ "bar", "b" ], 2 ],[ [ "bar", "b" ] ],[ [ "bar" ] ])
yielding:
{
"foo": {
"a": 1,
"b": 2
}
}
{
"bar": {
"a": 1,
"b": 2
}
}
The final result (see https://jqplay.org/s/-UgbEC4BN8) would be:
fromstream([ [ "foo", "a" ], 1 ],[ [ "foo", "b" ], 2 ],[ [ "foo", "b" ] ],[["foo"]],[ [ "bar", "a" ], 1 ],[ [ "bar", "b" ], 2 ],[ [ "bar", "b" ] ],[ [ "bar" ] ]) | with_entries(.value += {key}) | .[]
yielding
{
"a": 1,
"b": 2,
"key": "foo"
}
{
"a": 1,
"b": 2,
"key": "bar"
}
A generic function, atomize(s), for converting objects to key-value objects is provided in the jq Cookbook. Using it, the solution to the problem here is simply:
atomize(inputs) | to_entries[] | .value + {key}
({key} is shorthand for {key: .key}.)
For reference, here is the def:
# atomize(s)
# Convert an object (presented in streaming form as the stream s) into
# a stream of single-key objects
# Example:
# atomize(inputs) (used in conjunction with "jq -n --stream")
def atomize(s):
fromstream(foreach s as $in ( {previous:null, emit: null};
if ($in | length == 2) and ($in|.[0][0]) != .previous and .previous != null
then {emit: [[.previous]], previous: ($in|.[0][0])}
else { previous: ($in|.[0][0]), emit: null}
end;
(.emit // empty), $in
) ) ;
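Putting the def and the filter together, the whole pipeline can be run in a single invocation (a sketch, assuming the data is in input.json):
jq -n --stream '
  def atomize(s):
    fromstream(foreach s as $in ( {previous:null, emit: null};
        if ($in | length == 2) and ($in|.[0][0]) != .previous and .previous != null
        then {emit: [[.previous]], previous: ($in|.[0][0])}
        else { previous: ($in|.[0][0]), emit: null}
        end;
        (.emit // empty), $in
        ) ) ;
  # emit one single-key object per top-level key, then move the key into the record
  atomize(inputs) | to_entries[] | .value + {key}
' input.json
This prints {"a":1,"b":2,"key":"foo"} and {"a":1,"b":2,"key":"bar"} while only holding one record at a time.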
Related
I have an array of objects with 2 properties, say "key" and "value":
[
{key: 1, value: a},
{key: 2, value: b},
{key: 1, value: c}
]
Now, I would like to merge the values of the "value" properties of objects with the same "key" property value. That is, the previous array is transformed into:
[
{key: 1, value: [a, c]},
{key: 2, value: [b]}
]
I tried something like:
$ echo '[{"key": "1", "val": "a"}, {"key": "2", "val": "b"}, {"key": "1", "val": "c"}]' | jq '. | group_by(.["key"]) | .[] | reduce .[] as $in ({"val": []}; {"key": $in.key, "val": [$in.val] + .["val"]})'
But it triggers a jq syntax error and I have no idea why. I am stuck.
Any idea?
Thanks
B
Your approach using reduce could be sanitized to
jq 'group_by(.["key"]) | .[] |= reduce .[] as $in (
{value: []}; .key = $in.key | .value += [$in.value]
)'
[
{
"value": [
"a",
"c"
],
"key": 1
},
{
"value": [
"b"
],
"key": 2
}
]
Demo
Another approach using map would be
jq 'group_by(.key) | map({key: .[0].key, value: map(.value)})'
[
{
"key": 1,
"value": [
"a",
"c"
]
},
{
"key": 2,
"value": [
"b"
]
}
]
Demo
Corresponding to "jq ~ is there a better way to collapse single object arrays?" and "R: Nested data.table to JSON": how do I collapse only specific elements?
I want to get rid of the "group" arrays in
[
{
"id2": "A",
"group": [
{
"data": [
{
"id1": 1,
"group": [
{
"data": [
{
"a": 1,
"b": 1
},
{
"a": 2,
"b": 2
}
],
"type": "test"
}
],
"type": "B"
}
],
"type": "C"
}
]
},
{
"id2": "C",
"group": [
{
"data": [
{
"id1": 3,
"group": [
{
"data": [
{
"a": 1,
"b": 1
}
],
"type": "test"
}
],
"type": "B"
}
],
"type": "C"
}
]
}
]
desired output
[{
"id2": "A",
"group": {
"data": [{
"id1": 1,
"group": {
"data": [{
"a": 1,
"b": 1
},
{
"a": 2,
"b": 2
}
],
"type": "test"
},
"type": "B"
}],
"type": "C"
}
},
{
"id2": "C",
"group": {
"data": [{
"id1": 3,
"group": {
"data": [{
"a": 1,
"b": 1
}],
"type": "test"
},
"type": "B"
}],
"type": "C"
}
}
]
The filter 'walk(if type=="array" and length==1 then .[0] else . end)' additionally removes the array around the single "data" object.
Unfortunately, we are not able to install jq 1.6 on our RStudio Server, and therefore I'm not able to use the walk function. (Although it works perfectly fine on my local system.)
Can anybody help me out with an alternative solution without walk? It would be highly appreciated.
Edit:
OK, I got it. I can manually add the walk function like so:
'def walk(f):
. as $in
| if type == "object" then
reduce keys_unsorted[] as $key
( {}; . + { ($key): ($in[$key] | walk(f)) } ) | f
elif type == "array" then map( walk(f) ) | f
else f
end; walk(if type=="object"
and has("group")
and (.group | type)=="array"
and (.group | length)==1
then .group = .group[0]
else . end)'
We could operate one level higher in the nesting hierarchy, test whether "group" is a key, and then update with .group = .group[0] instead of . = .[0]:
jq 'walk(if type=="object"
and has("group")
and (.group | type)=="array"
and (.group | length)==1
then .group = .group[0]
else . end)'
I have the following JSON snippet:
{
"a": [ 1, "a:111" ],
"b": [ 2, "a:111", "irrelevant" ],
"c": [ 1, "a:222" ],
"d": [ 1, "b:222" ],
"e": [ 2, "b:222", "irrelevant"]
}
and I would like to swap the key with the second value of the array and accumulate keys with the same value, discarding possible values that come after the second one:
{ "a:111": [ [ 1, "a" ], [ 2, "b" ] ],
"a:222": [ [ 1, "c" ] ],
"b:222": [ [ 1, "d" ], [ 2, "e" ] ]
}
My initial solution is the following:
echo '{
"a": [ 1, "a:111" ],
"b": [ 2, "a:111", "irrelevant" ],
"c": [ 1, "a:222" ],
"d": [ 1, "b:222" ],
"e": [ 2, "b:222", "irrelevant"]
}' \
| jq 'to_entries
| map({(.value[1]|tostring) : [[.value[0], .key]]})
| reduce .[] as $o ({}; reduce ($o|keys)[] as $key (.; .[$key] += $o[$key]))'
This produces the needed result but is probably not very robust, hard to read and excessively long. I guess there is a much more readable solution using with_entries but it has eluded me for now.
Short jq approach:
jq 'reduce to_entries[] as $o ({};
.[$o.value[1]] += [[$o.value[0], $o.key]])' input.json
The output:
{
"a:111": [
[
1,
"a"
],
[
2,
"b"
]
],
"a:222": [
[
1,
"c"
]
],
"b:222": [
[
1,
"d"
],
[
2,
"e"
]
]
}
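For comparison, a group_by-based variant would be (a sketch; note that group_by sorts the result by the grouping key):
jq 'to_entries
    | group_by(.value[1])
    | map({key: .[0].value[1], value: map([.value[0], .key])})
    | from_entries' input.json
Here group_by collects the entries that share the same second array element, and from_entries turns the regrouped entries back into an object.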
I have a JSON file, example.json:
[
[
"126",
1522767000
],
[
"122",
1522859400
],
[
"126",
1523348520
]
]
...and would like to add multiple parent items, with the desired output being:
{
"target": "Systolic",
"datapoints": [
[
"126",
1522767000
],
[
"122",
1522859400
],
[
"126",
1523348520
]
]
}
I'm having trouble; I've attempted things like:
cat example.json | jq -s '{target:.[]}', which adds the one key, but I don't understand how to add a value for target and another key, datapoints.
With a straightforward jq expression:
jq '{target: "Systolic", datapoints: .}' example.json
The output:
{
"target": "Systolic",
"datapoints": [
[
"126",
1522767000
],
[
"122",
1522859400
],
[
"126",
1523348520
]
]
}
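If the target name should not be hard-coded, the same expression works with --arg (a sketch):
jq --arg target "Systolic" '{target: $target, datapoints: .}' example.json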