Reverse flatten nested arrays with jq - json

Suppose I have the following nested data structure
cat nested.json
[
  {
    "a": "a",
    "b": [
      {"c": "c"}
    ]
  },
  {
    "a": "a",
    "b": [
      {"c": "c"}
    ]
  }
]
I can flatten it like this
cat nested.json | jq '
  [. as $in | reduce paths(scalars) as $path ({};
    . + { ($path | map(tostring) | join(".")): $in | getpath($path) }
  )]
' > flat.json
cat flat.json
[
  {
    "0.a": "a",
    "0.b.0.c": "c",
    "1.a": "a",
    "1.b.0.c": "c"
  }
]
To reverse the flatten operation with jq, I tried this:
cat flat.json | jq '
  .[0] | reduce to_entries[] as $kv ({};
    setpath($kv.key|split("."); $kv.value)
  )
'
{
  "0": {
    "a": "a",
    "b": {
      "0": {
        "c": "c"
      }
    }
  },
  "1": {
    "a": "a",
    "b": {
      "0": {
        "c": "c"
      }
    }
  }
}
However, I want the numeric components in the setpath parameter to be converted to numbers so that arrays are created. This doesn't quite work, but I think it's close:
cat flat.json | jq '
  def makePath($s): [split(".")[] | if (test("\\d+")) then tonumber else . end];
  .[0] | reduce to_entries[] as $kv ({}; setpath(makePath($kv.key); $kv.value))
'
jq: error (at <stdin>:8): split input and separator must be strings
The desired output is the same as the original data in nested.json
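As an aside on why the attempt above fails: inside makePath, split is applied to the reduce state (the object being built) rather than to $s; the digit test should be anchored (^\d+$) so keys like "a1" aren't treated as numbers; and the initial state must be null rather than {}, since setpath cannot create array slots inside an object. A minimally corrected sketch (with the flat document inlined so it runs standalone):

```shell
# Hypothetical fix: pipe $s into split, anchor the regex, start from null.
echo '[{"0.a":"a","0.b.0.c":"c","1.a":"a","1.b.0.c":"c"}]' | jq -c '
  def makePath($s): [$s | split(".")[] | if test("^\\d+$") then tonumber else . end];
  .[0] | reduce to_entries[] as $kv (null; setpath(makePath($kv.key); $kv.value))
'
# → [{"a":"a","b":[{"c":"c"}]},{"a":"a","b":[{"c":"c"}]}]
```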

Wouldn't it be simpler to do it this way:
Encode your input with
jq '[path(.. | scalars) as $path | {($path | join(".")): getpath($path)}] | add' nested.json
{
  "0.a": "a",
  "0.b.0.c": "c",
  "1.a": "a",
  "1.b.0.c": "c"
}
And decode it with
jq 'reduce to_entries[] as $item (null; setpath($item.key / "." | map(tonumber? // .); $item.value))' flat.json
[
  {
    "a": "a",
    "b": [
      {
        "c": "c"
      }
    ]
  },
  {
    "a": "a",
    "b": [
      {
        "c": "c"
      }
    ]
  }
]
However, if you don't care about your special dot notation (e.g. "0.b.0.c") for the encoded keys, you can simply convert the path array into a JSON string instead, which is admittedly uglier but has virtually the same effect. Moreover, it automatically enables the handling of input object field names that include dots (e.g. {"a.b":3}) or look like numbers (e.g. {"42":"Panic!"}).
Using JSON keys, encode your input with
jq '[path(.. | scalars) as $path | {($path | tojson): getpath($path)}] | add' nested.json
{
  "[0,\"a\"]": "a",
  "[0,\"b\",0,\"c\"]": "c",
  "[1,\"a\"]": "a",
  "[1,\"b\",0,\"c\"]": "c"
}
And decode it with
jq 'reduce to_entries[] as $item (null; setpath($item.key | fromjson; $item.value))' flat.json
[
  {
    "a": "a",
    "b": [
      {
        "c": "c"
      }
    ]
  },
  {
    "a": "a",
    "b": [
      {
        "c": "c"
      }
    ]
  }
]
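To illustrate the robustness point, consider a toy input whose field name contains a dot (an assumed example, not from the original data). The dotted notation would conflate it with nesting, while the JSON-string key round-trips cleanly:

```shell
# Encode a field name that contains a dot; the path array is kept intact as JSON.
echo '{"a.b":3}' |
jq -c '[path(.. | scalars) as $path | {($path | tojson): getpath($path)}] | add'
# → {"[\"a.b\"]":3}

# Decoding with fromjson restores the original object exactly,
# whereas splitting on "." would wrongly produce {"a":{"b":3}}.
echo '{"[\"a.b\"]":3}' |
jq -c 'reduce to_entries[] as $item (null; setpath($item.key | fromjson; $item.value))'
# → {"a.b":3}
```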

Related

How to merge and aggregate values in 2 JSON files using jq?

I am using jq in a shell script to manipulate JSON files.
I have 2 files and I'd like to merge them into one file while also aggregating (sum) the values when names in the name/value pairs are the same.
As an example:
Input1.json
[
  {
    "A": "Name 1",
    "B": "1.1",
    "C": "2"
  },
  {
    "A": "Name 2",
    "B": "3.2",
    "C": "4"
  }
]
Input2.json
[
  {
    "A": "Name 2",
    "B": "5",
    "C": "6"
  },
  {
    "A": "Name 3",
    "B": "7",
    "C": "8"
  }
]
Expected result:
Output.json
[
  {
    "A": "Name 1",
    "B": "1.1",
    "C": "2"
  },
  {
    "A": "Name 2",
    "B": "8.2",
    "C": "10"
  },
  {
    "A": "Name 3",
    "B": "7",
    "C": "8"
  }
]
I can use other tools other than jq but prefer to ultimately keep the solution contained into a shell script I can call from the Terminal.
Any help is appreciated. Thank you.
"I can use other tools other than jq but prefer to ultimately keep the solution contained into a shell script I can call from the Terminal."
You could give the JSON parser xidel a try:
$ xidel -se '
  array{
    let $src:=(json-doc("Input1.json")(),json-doc("Input2.json")())
    for $name in distinct-values($src/A)
    let $obj:=$src[A=$name]
    return
      if (count($obj) gt 1) then
        map:merge(
          $obj[1]() ! {
            .:if ($obj[1](.) castable as decimal) then
                string($obj[1](.) + $obj[2](.))
              else
                $obj[1](.)
          }
        )
      else
        $obj
  }
'
jq is beautiful for problems like this:
$ jq -n '
  reduce inputs[] as {$A,$B,$C} ({};
    .[$A] |= {
      $A,
      B: (.B + ($B|tonumber)),
      C: (.C + ($C|tonumber))
    }
  )
  | map({
      A,
      B: (.B|tostring),
      C: (.C|tostring)
    })
' input1.json input2.json
The reduce builds a map from each distinct "A" value to its aggregated result object. The map filter then converts that mapping back into an array of result objects, adjusting the types of the aggregated values back to strings.
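For intuition, here is the intermediate state after the reduce and before the final map, run against the sample inputs (the heredocs just recreate the question's data under the answer's file names):

```shell
# Recreate the sample inputs.
cat > input1.json <<'EOF'
[{"A":"Name 1","B":"1.1","C":"2"},{"A":"Name 2","B":"3.2","C":"4"}]
EOF
cat > input2.json <<'EOF'
[{"A":"Name 2","B":"5","C":"6"},{"A":"Name 3","B":"7","C":"8"}]
EOF

# State after the reduce: a map keyed by "A", with numeric sums
# (null + n is n in jq, so the first occurrence seeds the sum).
jq -nc '
  reduce inputs[] as {$A,$B,$C} ({};
    .[$A] |= {$A, B: (.B + ($B|tonumber)), C: (.C + ($C|tonumber))}
  )
' input1.json input2.json
# → {"Name 1":{"A":"Name 1","B":1.1,"C":2},"Name 2":{"A":"Name 2","B":8.2,"C":10},"Name 3":{"A":"Name 3","B":7,"C":8}}
```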
Here's one way, but there are others:
jq -s '
  def to_n: tonumber? // null;
  def merge_values($x;$y):
    if $x == $y then $x
    elif $x == null then $y
    elif $y == null then $x
    else ($x|to_n) as $xn
    | if $xn then ($y|to_n) as $yn | ($xn+$yn)|tostring
      else [$x, $y]
      end
    end;
  def merge($x;$y):
    reduce ($x + $y |keys_unsorted)[] as $k (null;
      .[$k] = merge_values($x[$k]; $y[$k]) );
  INDEX(.[0][]; .A) as $in1
  | INDEX(.[1][]; .A) as $in2
  | ($in1 + $in2|keys_unsorted) as $keys
  | reduce $keys[] as $k ([];
      . + [merge($in1[$k]; $in2[$k]) ])
' input1.json input2.json

Transforming high-redundancy CSV data into nested JSON using jq (or awk)?

Say I have the following CSV data in input.txt:
broker,client,contract_id,task_type,doc_names
alice#company.com,John Doe,33333,prove-employment,important-doc-pdf
alice#company.com,John Doe,33333,prove-employment,paperwork-pdf
alice#company.com,John Doe,33333,submit-application,blah-pdf
alice#company.com,John Doe,00000,prove-employment,test-pdf
alice#company.com,John Doe,00000,submit-application,test-pdf
alice#company.com,Jane Smith,11111,prove-employment,important-doc-pdf
alice#company.com,Jane Smith,11111,submit-application,paperwork-pdf
alice#company.com,Jane Smith,11111,submit-application,unimportant-pdf
bob#company.com,John Doe,66666,submit-application,pdf-I-pdf
bob#company.com,John Doe,77777,submit-application,pdf-J-pdf
And I'd like to transform it into the following JSON:
[
  {
    "broker": "alice#company.com",
    "clients": [
      {
        "client": "John Doe",
        "contracts": [
          {
            "contract_id": 33333,
            "documents": [
              {
                "task_type": "prove-employment",
                "doc_names": ["important-doc-pdf", "paperwork-pdf"]
              },
              {
                "task_type": "submit-application",
                "doc_names": ["blah-pdf"]
              }
            ]
          },
          {
            "contract_id": 00000,
            "documents": [
              {
                "task_type": "prove-employment",
                "doc_names": ["test-pdf"]
              },
              {
                "task_type": "submit-application",
                "doc_names": ["test-pdf"]
              }
            ]
          }
        ]
      },
      {
        "client": "Jane Smith",
        "contracts": [
          {
            "contract_id": 11111,
            "documents": [
              {
                "task_type": "prove-employment",
                "doc_names": ["important-doc-pdf"]
              },
              {
                "task_type": "submit-application",
                "doc_names": ["paperwork-pdf", "unimportant-pdf"]
              }
            ]
          }
        ]
      }
    ]
  },
  {
    "broker": "bob#company.com",
    "clients": [
      {
        "client": "John Doe",
        "contracts": [
          {
            "contract_id": 66666,
            "documents": [
              {
                "task_type": "submit-application",
                "doc_names": ["pdf-I-pdf"]
              }
            ]
          },
          {
            "contract_id": 77777,
            "documents": [
              {
                "task_type": "submit-application",
                "doc_names": ["pdf-J-pdf"]
              }
            ]
          }
        ]
      }
    ]
  }
]
Based on a quick search, it seems like people recommend jq for this type of task. I read some of the manual and played around with it for a bit, and I understand that it's meant to be used by composing its filters together to produce the desired output.
So far, I've been able to transform each line of the CSV into a list of strings for example with jq -Rs '. / "\n" | .[] | . / ","'.
But I'm having trouble with something even a bit more complex, like assigning a key to each value on a line (not even the final JSON form I'm looking to get). This is what I tried: jq -Rs '[inputs | . / "\n" | .[] | . / "," as $line | {"broker": $line[0], "client": $line[1], "contract_id": $line[2], "task_type": $line[3], "doc_name": $line[4]}]', and it gives back [].
Maybe jq isn't the best tool for the job here? Perhaps I should be using awk? If all else fails, I'd probably just parse this using Python.
Any help is appreciated.
Here's a jq solution that assumes the CSV input is very simple (e.g., no field has embedded commas), followed by a brief explanation.
To handle arbitrary CSV, you could use a CSV-to-TSV conversion tool in conjunction with the jq program given below with trivial modifications.
A Solution
The following jq program assumes jq is invoked with the -R option.
(The -n option should not be used, as the header row is read implicitly rather than via input.)
# sort-free drop-in replacement for the built-in group_by/1
def GROUP_BY(f):
  reduce .[] as $x ({};
    ($x|f) as $s
    | ($s|type) as $t
    | (if $t == "string" then $s else ($s|tojson) end) as $y
    | .[$t][$y] += [$x] )
  | [.[][]]
;

# input: an array
def obj($keys):
  . as $in | reduce range(0; $keys|length) as $i ({}; .[$keys[$i]] = $in[$i]);

# input: an array to be grouped by $keyname
# output: an object
def gather_by($keyname; $newkey):
  ($keyname + "s") as $plural
  | GROUP_BY(.[$keyname])
  | {($plural): map({($keyname): .[0][$keyname],
                     ($newkey) : map(del(.[$keyname])) } ) }
;

split(",") as $headers
| [inputs
   | split(",")
   | obj($headers)
  ]
| gather_by("broker"; "clients")
| .brokers[].clients |= (gather_by("client"; "contracts") | .clients)
| .brokers[].clients[].contracts |= (gather_by("contract_id"; "documents") | .contract_ids)
| .brokers[].clients[].contracts[].documents |= (gather_by("task_type"; "doc_names") | .task_types)
| .brokers[].clients[].contracts[].documents[].doc_names |= map(.doc_names)
| .brokers
Explanation
The expected output as shown respects the ordering of the input lines, so jq's built-in group_by (which sorts its groups) may not be appropriate; hence GROUP_BY is defined above as a drop-in replacement for group_by. It's a bit complicated because it is completely generic, in the same way that group_by is.
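To see the difference the ordering makes, here's a small side-by-side of the built-in group_by and GROUP_BY on a toy input (not taken from the question):

```shell
# group_by sorts groups by the grouping key;
# GROUP_BY keeps them in first-appearance order.
jq -nc '
  def GROUP_BY(f):
    reduce .[] as $x ({};
      ($x|f) as $s
      | ($s|type) as $t
      | (if $t == "string" then $s else ($s|tojson) end) as $y
      | .[$t][$y] += [$x] )
    | [.[][]];
  [{"k":"b"},{"k":"a"},{"k":"b"}]
  | {sorted: group_by(.k), order_preserving: GROUP_BY(.k)}
'
# → {"sorted":[[{"k":"a"}],[{"k":"b"},{"k":"b"}]],"order_preserving":[[{"k":"b"},{"k":"b"}],[{"k":"a"}]]}
```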
The obj filter converts an array into an object with keys $keys.
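For example, obj can be exercised on its own like this (toy values):

```shell
# Zip an array of values with an array of keys into an object.
jq -nc '
  def obj($keys):
    . as $in | reduce range(0; $keys|length) as $i ({}; .[$keys[$i]] = $in[$i]);
  ["alice#company.com","John Doe"] | obj(["broker","client"])
'
# → {"broker":"alice#company.com","client":"John Doe"}
```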
The gather_by filter groups together items in the input array as appropriate for the present problem.
gather_by/2 example
To get a feel for what gather_by does, here's an example:
[ {a:1,b:1}, {a:2, b:2}, {a:1,b:0}] | gather_by("a"; "objects")
produces:
{
  "as": [
    {
      "a": 1,
      "objects": [
        {
          "b": 1
        },
        {
          "b": 0
        }
      ]
    },
    {
      "a": 2,
      "objects": [
        {
          "b": 2
        }
      ]
    }
  ]
}
Output
[
  {
    "broker": "alice#company.com",
    "clients": [
      {
        "client": "John Doe",
        "contracts": [
          {
            "contract_id": "33333",
            "documents": [
              {
                "task_type": "prove-employment",
                "doc_names": [
                  "important-doc-pdf",
                  "paperwork-pdf"
                ]
              },
              {
                "task_type": "submit-application",
                "doc_names": [
                  "blah-pdf"
                ]
              }
            ]
          },
          {
            "contract_id": "00000",
            "documents": [
              {
                "task_type": "prove-employment",
                "doc_names": [
                  "test-pdf"
                ]
              },
              {
                "task_type": "submit-application",
                "doc_names": [
                  "test-pdf"
                ]
              }
            ]
          }
        ]
      },
      {
        "client": "Jane Smith",
        "contracts": [
          {
            "contract_id": "11111",
            "documents": [
              {
                "task_type": "prove-employment",
                "doc_names": [
                  "important-doc-pdf"
                ]
              },
              {
                "task_type": "submit-application",
                "doc_names": [
                  "paperwork-pdf",
                  "unimportant-pdf"
                ]
              }
            ]
          }
        ]
      }
    ]
  },
  {
    "broker": "bob#company.com",
    "clients": [
      {
        "client": "John Doe",
        "contracts": [
          {
            "contract_id": "66666",
            "documents": [
              {
                "task_type": "submit-application",
                "doc_names": [
                  "pdf-I-pdf"
                ]
              }
            ]
          },
          {
            "contract_id": "77777",
            "documents": [
              {
                "task_type": "submit-application",
                "doc_names": [
                  "pdf-J-pdf"
                ]
              }
            ]
          }
        ]
      }
    ]
  }
]
Here's a jq solution which uses a generic approach that makes no reference to specific header names except for the specification of certain plural forms.
The generic approach is encapsulated in the recursively defined filter nested_group_by($headers; $plural).
The main assumptions are:
The CSV input can be parsed by splitting on commas;
jq is invoked with the -R command-line option.
# Emit an array of groups (arrays), each group being defined by a value of f,
# where f can be any jq filter that produces exactly one value
# for each item in the input array.
def GROUP_BY(f):
  reduce .[] as $x ({};
    ($x|f) as $s
    | ($s|type) as $t
    | (if $t == "string" then $s else ($s|tojson) end) as $y
    | .[$t][$y] += [$x] )
  | [.[][]]
;
def obj($headers):
  . as $in | reduce range(0; $headers|length) as $i ({}; .[$headers[$i]] = $in[$i]);

def nested_group_by($array; $plural):
  def plural: $plural[.] // (. + "s");
  if $array == [] then .
  elif $array|length == 1 then GROUP_BY(.[$array[0]]) | map(map(.[])[])
  else ($array[1] | plural) as $groupkey
    | $array[0] as $a0
    | GROUP_BY(.[$a0])
    | map( { ($a0): .[0][$a0], ($groupkey): map(del( .[$a0] )) } )
    | map( .[$groupkey] |= nested_group_by($array[1:]; $plural) )
  end
;

split(",") as $headers
| {contract_id: "contracts",
   task_type: "documents",
   doc_names: "doc_names" } as $plural
| [inputs
   | split(",")
   | obj($headers)
  ]
| nested_group_by($headers; $plural)
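Here's a minimal self-contained run of nested_group_by on a hypothetical two-column dataset, using an empty $plural map so the default pluralization ("v" becomes "vs") applies; the last header's values are collapsed into a plain array:

```shell
jq -nc '
  def GROUP_BY(f):
    reduce .[] as $x ({};
      ($x|f) as $s
      | ($s|type) as $t
      | (if $t == "string" then $s else ($s|tojson) end) as $y
      | .[$t][$y] += [$x] )
    | [.[][]];
  def nested_group_by($array; $plural):
    def plural: $plural[.] // (. + "s");
    if $array == [] then .
    elif $array|length == 1 then GROUP_BY(.[$array[0]]) | map(map(.[])[])
    else ($array[1] | plural) as $groupkey
      | $array[0] as $a0
      | GROUP_BY(.[$a0])
      | map( { ($a0): .[0][$a0], ($groupkey): map(del( .[$a0] )) } )
      | map( .[$groupkey] |= nested_group_by($array[1:]; $plural) )
    end;
  # Toy rows with headers ["g","v"]:
  [{"g":"x","v":"1"},{"g":"x","v":"2"},{"g":"y","v":"3"}]
  | nested_group_by(["g","v"]; {})
'
# → [{"g":"x","vs":["1","2"]},{"g":"y","vs":["3"]}]
```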

Using jq to parse Array and map to string

I have the following JSON Data-Structure:
{
  "data": [
    [
      {
        "a": "1",
        "b": "i"
      },
      {
        "a": "2",
        "b": "ii"
      },
      {
        "a": "3",
        "b": "iii"
      }
    ],
    [
      {
        "a": "4",
        "b": "iv"
      },
      {
        "a": "5",
        "b": "v"
      },
      {
        "a": "6",
        "b": "vi"
      }
    ]
  ]
}
And I need to get the following output:
1+2+3 i|ii|iii
4+5+6 iv|v|vi
I tried the following without success:
$ cat data.json | jq -r '.data[] | .[].a | join("+")'
jq: error (at <stdin>:1642): Cannot iterate over string ("1")
And also this, but I don't even got an idea how to solve this:
$ cat data.json | jq -r '.data[] | to_entries | .[]'
Looks like an endless journey for me at this time. If you can help me, I would be very happy. :-)
Should be pretty simple. Get both fields into an array, join them with the required delimiter character, and put the result in tabular format:
jq -r '.data[] | [ ( map(.a) | join("+") ), ( map(.b) | join("|") ) ] | @tsv'
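A self-contained run with the sample data inlined; note that @tsv separates the two joined strings with a literal tab:

```shell
echo '{"data":[[{"a":"1","b":"i"},{"a":"2","b":"ii"},{"a":"3","b":"iii"}],[{"a":"4","b":"iv"},{"a":"5","b":"v"},{"a":"6","b":"vi"}]]}' |
jq -r '.data[] | [ (map(.a) | join("+")), (map(.b) | join("|")) ] | @tsv'
# prints:
# 1+2+3	i|ii|iii
# 4+5+6	iv|v|vi
```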

jq: sort object values

I want to sort this data structure by the object keys (easy with -S) and sort the object values (the arrays) by the 'foo' property.
I can sort them with
jq -S '
  . as $in
  | keys[]
  | . as $k
  | $in[$k] | sort_by(.foo)
' < test.json
... but that loses the keys.
I've tried variations of adding | { "\($k)": . }, but then I end up with a list of objects instead of one object. I also tried variations of adding to $in (same problem) or using $in = $in * { ... }, but that gives me syntax errors.
The one solution I did find was to just have the separate objects and then pipe it into jq -s add, but ... I really wanted it to work the other way. :-)
Test data below:
{
  "": [
    { "foo": "d" },
    { "foo": "g" },
    { "foo": "f" }
  ],
  "c": [
    { "foo": "abc" },
    { "foo": "def" }
  ],
  "e": [
    { "foo": "xyz" },
    { "foo": "def" }
  ],
  "ab": [
    { "foo": "def" },
    { "foo": "abc" }
  ]
}
Maybe this?
jq -S '.[] |= sort_by(.foo)'
Output
{
  "": [
    {
      "foo": "d"
    },
    {
      "foo": "f"
    },
    {
      "foo": "g"
    }
  ],
  "ab": [
    {
      "foo": "abc"
    },
    {
      "foo": "def"
    }
  ],
  "c": [
    {
      "foo": "abc"
    },
    {
      "foo": "def"
    }
  ],
  "e": [
    {
      "foo": "def"
    },
    {
      "foo": "xyz"
    }
  ]
}
@user197693 had a great answer. A suggestion I got in a private message elsewhere was to use
jq -S 'with_entries(.value |= sort_by(.foo))'
If for some reason using the -S command-line option is not satisfactory, you can also perform the by-key sort with the to_entries | sort_by(.key) | from_entries idiom. A complete solution to the problem would then be:
.[] |= sort_by(.foo)
| to_entries | sort_by(.key) | from_entries
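A quick self-contained check of the combined filter, using a trimmed-down version of the test data (including the empty-string key, which sorts first):

```shell
echo '{"c":[{"foo":"def"},{"foo":"abc"}],"ab":[{"foo":"def"},{"foo":"abc"}],"":[{"foo":"g"},{"foo":"d"}]}' | jq -c '
  .[] |= sort_by(.foo)
  | to_entries | sort_by(.key) | from_entries
'
# → {"":[{"foo":"d"},{"foo":"g"}],"ab":[{"foo":"abc"},{"foo":"def"}],"c":[{"foo":"abc"},{"foo":"def"}]}
```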

jq: split one resultset to multiple json objects

I'm trying to transform one big json resultset to multiple objects.
Input:
{
  "results": {
    "2019-11-27 00:00:00": [
      {
        "e": "10814",
        "s": "153330",
        "t": "164144"
      }
    ],
    "2019-11-27 00:15:00": [
      {
        "e": "11052",
        "s": "148692",
        "t": "159744"
      }
    ],
    "2019-11-27 00:30:00": [
      {
        "e": "11550",
        "s": "152379",
        "t": "163929"
      }
    ],
    "2019-11-27 00:45:00": [
      {
        "e": "12640",
        "s": "154984",
        "t": "167624"
      }
    ]
  }
}
This is the output I'm trying to reach:
{"timestamp":"2019-11-27 00:00:00","e":"10814","s":"153330","t":"164144"}
{"timestamp":"2019-11-27 00:15:00","e":"11052","s":"148692","t":"159744"}
{"timestamp":"2019-11-27 00:30:00","e":"11550","s":"152379","t":"163929"}
{"timestamp":"2019-11-27 00:45:00","e":"12640","s":"154984","t":"167624"}
I tried so far:
$ cat input.json | jq -cr '.[] | keys[] as $k | { "timestamp": "\($k)"}'
{"timestamp":"2019-11-27 00:00:00"}
{"timestamp":"2019-11-27 00:15:00"}
{"timestamp":"2019-11-27 00:30:00"}
{"timestamp":"2019-11-27 00:45:00"}
and
$ cat input.json | jq -c '.[] | .[] | .[]'
{"e":"10814","s":"153330","t":"164144"}
{"e":"11052","s":"148692","t":"159744"}
{"e":"11550","s":"152379","t":"163929"}
{"e":"12640","s":"154984","t":"167624"}
I just need a hint to combine these two filters to obtain the result as described above. I'm not sure how to do it. Any ideas?
You were almost there. Just add the objects in those arrays to the objects you created from the keys.
.results | keys_unsorted[] as $k | { timestamp: $k } + .[$k][]
Online demo with your sample
Online demo with a slightly different input to show what + .[$k][] does clearly
Or using to_entries:
.results
| to_entries[]
| { timestamp: .key } + .value[]
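A self-contained check of the to_entries variant, with the input trimmed to two timestamps:

```shell
echo '{"results":{"2019-11-27 00:00:00":[{"e":"10814","s":"153330","t":"164144"}],"2019-11-27 00:15:00":[{"e":"11052","s":"148692","t":"159744"}]}}' |
jq -c '.results | to_entries[] | { timestamp: .key } + .value[]'
# prints:
# {"timestamp":"2019-11-27 00:00:00","e":"10814","s":"153330","t":"164144"}
# {"timestamp":"2019-11-27 00:15:00","e":"11052","s":"148692","t":"159744"}
```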