Deep JSON merge - json

I have multiple JSON files that I'd like to merge into one.
Some have the same root element but different children. I don't want to overwrite the children but too extend them if they have the same parent element.
I've tried this answer, but it doesn't work:
jq: error (at file2.json:0): array ([{"title":"...) and array ([{"title":"...) cannot be multiplied
Sample files and wanted result (Gist)
Thank you in advance.

Here is a recursive solution which uses group_by(.key) to decide
which objects to combine. This could be a little simpler if .children
were more uniform. Sometimes it's absent in the sample data and sometimes it's the unusual value [{}].
def merge:
def kids:
map(
.children
| if length<1 then empty else .[] end
)
| if length<1 then {} else {children:merge} end
;
def mergegroup:
{
title: .[0].title
, key: .[0].key
} + kids
;
if .==[{}] then .
else group_by(.key) | map(mergegroup)
end
;
[ .[] | .[] ] | merge
When run with the -s option as follows
jq -M -s -f filter.jq file1.json file2.json
It produces the following output.
[
{
"title": "Title1",
"key": "12345678",
"children": [
{
"title": "SubTitle2",
"key": "123456713",
"children": [
{}
]
},
{
"title": "SubTitle1",
"key": "12345679",
"children": [
{
"title": "SubSubTitle1",
"key": "12345610"
},
{
"title": "SubSubTitle2",
"key": "12345611"
},
{
"title": "DifferentSubSubTitle1",
"key": "12345612"
}
]
}
]
}
]
If the ordering of the objects within the .children matters
then an a sort_by can be added to the {children:merge} expression,
e.g. {children:merge|sort_by(.key)}

Here is something that will reproduce your desired result. It's by no means automatic, It's really a proof of concept at this stage.
One liner:
jq -s '. as $in | ($in[0][].children[].children + $in[1][].children[0].children | unique) as $a1 | $in[1][].children[1] as $s1 | $in[0] | .[0].children[0].children = ($a1) | .[0].children += [$s1]' file1.json file2.json
Multi line breakdown (Copy/Paste):
jq -s '. as $in
| ($in[0][].children[].children + $in[1][].children[0].children
| unique) as $a1
| $in[1][].children[1] as $s1
| $in[0]
| .[0].children[0].children = ($a1)
| .[0].children += [$s1]' file1.json file2.json
Where:
$in : file1.json and file2.json combined input
$a1: merged "SubSubTitle" array
$s1: second subtitle object
I suspect the reason this didn't work was because your schema is different and has nested arrays.
I find it quite hypnotic looking at this, it would be good if you could elaborate a bit on how fixed the structure is and what the requirements are.

Related

Export json via jq to csv

I have this output as test.json ( Its an AWS extract, but I have changed the names )
[
{
"InstanceId": "I-1234",
"Vol": "vol-5678",
"Delete": false,
"State": "in-use",
"Tags": [
{
"Key": "Size",
"Value": "large"
},
{
"Key": "Colour",
"Value": "red"
},
{
"Key": "Shape",
"Value": "square"
},
{
"Key": "Weight",
"Value": "light"
}
]
}
]
I want to export specific fields, including all tags to a csv, so it looks like this:
id,vol,state,size,colour,shape,weight
value,value,value,value,value,value,value
I have run this:
cat test.json | jq -c ' { id: .[].InstanceId, vol: .[].Vol, tags: .[].Tags | map ( [ .Key, .Value] | join (":")) | #csv } ' >> test.csv
And it looks like this:
cat test.csv
{"id":"I-1234","vol":"vol-5678","tags":"\"Size:large\",\"Colour:red\",\"Shape:square\",\"Weight:25kg\""}
if I open in Excel, looks like:
{"id":"I-1234" vol:"vol-5678" tags:"\"Size:large\" \"Colour:red\" \"Shape:square\" \"Weight:25kg\""}
I will be looping this over many aws resources, and would like to keep appending to csv.
I want to remove
{ } at beginning and end.
the key description I would like at top as a header, rather than to the left of the value..
so for: "id":"I-1234" vol:"vol-5678"
I would like
id, vol
I-1234, vol-5678
and the same with the Tags
remove the Array Name: "tags:" ( think its the array name, I'm not a developer, infrastructure dude! ) and just leave
Size,Colour,Shape,Weight, ...
large,red,square,25kg, ...
Can anyone help, point me in the right direction ..
thanks .. :)
jq -r '
["Size","Colour","Shape","Weight"] as $Keys
| (["id", "vol"] + ($Keys|map(ascii_downcase))),
( .[]
| (.Tags|from_entries) as $dict
| [.InstanceId, .Vol, $dict[$Keys[]]] )
| #csv
'
This will produce valid CSV, with the columns in the desired order, irrespective of the ordering of the items in the .Tags array.
If you don't want the strings in the rows to be quoted, then (at the risk of not having valid CSV) one option to consider would be replacing #csv above by join(","). Alternatively, you might wish to consider using #tsv and then replacing the tabs by commas (e.g. using sed or tror even jq :-).

jq denormalize a nested field

disclaimer: indeed, there are already different answers (like JQ Join JSON files by key or denormalizing JSON with jq) for but none of them helped me yet or did have different circumstances I was unable to derive a solution from ;/
I have 2 files, both are lists of objects where one of them ha field references to object ids of the other one
given
[
{
"id": "5b9f50ccdcdf200283f29052",
"reference": {
"id": "5de82d5072f4a72ad5d5dcc1"
}
}
]
and
[
{
"id": "5de82d5072f4a72ad5d5dcc1",
"name": "FooBar"
}
]
my goal would be to get a denormalized object list:
expected
[
{
"id": "5b9f50ccdcdf200283f29052",
"reference": {
"id": "5de82d5072f4a72ad5d5dcc1",
"name": "FooBar"
}
}
]
while I'm able to do the main parts, I didn't challenged to bring both together yet:
with
example 1
jq -s '(.[1][] | select(.id == "5de82d5072f4a72ad5d5dcc1"))' objects.json referredObjects.json
I get
{
"id": "5de82d5072f4a72ad5d5dcc1",
"name": "FooBar"
}
and with
example 2
jq -s '.[0][] | .reference = {}' objects.json referredObjects.json
I can manipulate any .reference getting
{
"id": "5b9f50ccdcdf200283f29052",
"reference": {}
}
(even I loose the list structure)
But: I can't do s.th. like
execpted "join"
jq -s '.[0][] as $obj | $obj.reference = (.[1][] | select(.id == $obj.reference.id))' objects.json referredObjects.json
even approaches with foreach or reduce looks promising
jq -s '[foreach .[0][] as $obj ({}; .reference.id = ""; . + $obj )]' objects.json referredObjects.json
=>
[
{
"reference": {
"id": "5de82d5072f4a72ad5d5dcc1"
},
"id": "5b9f50ccdcdf200283f29052"
}
]
where I expected to get the same as in second example
I end up in headaches and looking forward to write a ineffective while routine in any language ... hopefully I would appreciate any help on this
~Marcel
Transform the second file into an object where ids and names are paired and use it as a reference while updating the first file.
$ jq '(map({(.id): .}) | add) as $idx
| input
| map_values(.reference = $idx[.reference.id])' file2 file1
[
{
"id": "5b9f50ccdcdf200283f29052",
"reference": {
"id": "5de82d5072f4a72ad5d5dcc1",
"name": "FooBar"
}
}
]
The following solution uses the same strategy as used in the solution by #OguzIsmail but uses the built-in function INDEX/2 to construct the dictionary from the second file.
The important point is that this strategy allows the arrays in both files to be of arbitrary size.
Invocation
jq --argfile file2 file2.json -f program.jq file1.json
program.jq
INDEX($file2[]; .id) as $dict
| map(.reference.id as $id | .reference = $dict[$id])

Append multiple dummy objects to json array with jq

Lets say this is my array :
[
{
"name": "Matias",
"age": "33"
}
]
I can do this :
echo "$response" | jq '[ .[] | select(.name | test("M.*"))] | . += [.[]]'
And it will output :
[
{
"name": "Matias",
"age": "33"
},
{
"name": "Matias",
"age": "33"
}
]
But I cant do this :
echo "$response" | jq '[ .[] | select(.name | test("M.*"))] | . += [.[] * 3]'
jq: error (at <stdin>:7): object ({"name":"Ma...) and number (3) cannot be multiplied
I need to extend an array to create a dummy array with 100 values. And I cant do it. Also, I would like to have a random age on the objects. ( So later on I can filter the file to measure performance of an app .
Currently jq does not have a built-in randomization function, but it's easy enough to generate random numbers that jq can use. The following solution uses awk but in a way that some other PRNG can easily be used.
#!/bin/bash
function template {
cat<<EOF
[
{
"name": "Matias",
"age": "33"
}
]
EOF
}
function randoms {
awk -v n=$1 'BEGIN { for(i=0;i<n;i++) {print int(100*rand())} }'
}
randoms 100 | jq -n --argfile template <(template) '
first($template[] | select(.name | test("M.*"))) as $t
| [ $t | .age = inputs]
'
Note on performance
Even though the above uses awk and jq together, this combination is about 10 times faster than the posted jtc solution using -eu:
jq+awk: u+s = 0.012s
jtc with -eu: u+s = 0.192s
Using jtc in conjunction with awk as above, however, gives u+s == 0.008s on the same machine.

JQ: key selection from numeric objects

I use jq 1.6 in a Windows 10 PowerShell enviroment and trying to select keys from coincidentally numeric json objects.
Json exampel:
{
"alliances_info":{
"744085325458334213":{
"emblem":3,
"name":"wellwell",
"member_count":1,
"level":1,
"military_might":1035,
"public":false,
"tag":"MELL",
"slogan":"",
"id":744085325458334213
},
"744128593839677958":{
"emblem":0,
"name":"Brave",
"member_count":1,
"level":1,
"military_might":1035,
"public":false,
"tag":"GABA",
"slogan":"",
"id":744128593839677958
},
"746034084459209223":{
"emblem":0,
"name":"Queen",
"member_count":1,
"level":1,
"military_might":1035,
"public":false,
"tag":"QUE",
"slogan":"",
"id":746034084459209223
},
"750446471312466445":{
"emblem":0,
"name":"Phoenix Inc",
"member_count":35,
"level":6,
"military_might":453369,
"public":true,
"tag":"PHOI",
"slogan":"",
"id":750446471312466445
},
"750446518934594062":{
"emblem":11,
"name":"Australia",
"member_count":44,
"level":8,
"military_might":957211,
"public":true,
"tag":"AUST",
"slogan":"Go Australia",
"id":750446518934594062
}
},
"server_version":"v7.190.4-master.000000006"
}
I tried several jq commands:
.alliances_info | .[] | [{alliance_name: .name, alliance_count: .member_count, alliance_level: .level, alliance_power: .military_might, alliance_tag: .tag, alliance_slogan: .slogan, alliance_id: .id}]
or
.alliances_info | .. | objects | [{alliance_name: .name, alliance_c
ount: .member_count, alliance_level: .level, alliance_power: .military_might, alliance_tag: .tag, alliance_slogan: .slog
an, alliance_id: .id}]
But Always get a jq error: parse error: Invalid numeric literal at line 1, column 3
I renounce on the object Building in the first command (and built only a Array) it works. But i need that objects. Any tips?
BR
Timo
Your first query works perfectly well with the given JSON sample. Perhaps you're invoking jq incorrectly. If you have the jq program in a file, say select.jq, you'd invoke jq like so:
jq -f select.jq sample.json
If that doesn't help, then try:
jq empty sample.json
If that fails, there might be something wrong with the encoding of the JSON.
I'm not sure I understand what you want.
Your first attempt works for me, but generates one output for JSON value in the input. That is, I created a file named so.json and put in it your JSON from above:
{
"alliances_info": {
"744085325458334213": {
"emblem": 3,
⋮
}
When I run your program , I get:
$ jq '.alliances_info | .[] | [{alliance_name: .name, alliance_count: .member_count, alliance_level: .level, alliance_power: .military_might, alliance_tag: .tag, alliance_slogan: .slogan, alliance_id: .id}]' so.json
[
{
"alliance_name": "wellwell",
"alliance_count": 1,
"alliance_level": 1,
"alliance_power": 1035,
"alliance_tag": "MELL",
"alliance_slogan": "",
"alliance_id": 744085325458334200
}
]
[
{
"alliance_name": "Brave",
⋮
]
If you want an array at all, you probably want one array containing all the alliances like this:
$ jq '.alliances_info | [ .[] | { alliance_name: .name, alliance_id: .id } ]' so.json
[
{
"alliance_name": "wellwell",
"alliance_id": 744085325458334200
},
{
"alliance_name": "Brave",
"alliance_id": 744128593839678000
},
{
"alliance_name": "Queen",
"alliance_id": 746034084459209200
},
{
"alliance_name": "Phoenix Inc",
"alliance_id": 750446471312466400
},
{
"alliance_name": "Australia",
"alliance_id": 750446518934594000
}
]
Starting from the left,
- .alliances_info looks in its input object for the field named "alliances_info" and outputs its value
- the | next says take the output from the left-hand side and pass those as inputs to the right-hand side.
- right after that first |, I have a [ «jq expressions» ] which tells jq to create one JSON array output for each input; the elements of that array are the outputs of that inner «jq expressions»
- that inner expression starts with .[] which means to produce one output for each JSON value (ignoring the keys) in the input object. For us, that will be the objects named "744085325458334213", "744128593839677958", …
- The next | uses those objects as input and for each, generates a JSON object { alliance_name: .name, alliance_id: .id }
That's why I end up with one JSON array containing 5 JSON objects.
As far as I can tell, you are mostly just renaming a bunch of the fields. For that, you could just do something like this:
$ jq --argjson renameMap '{ "name": "alliance_name", "member_count": "alliance_count", "level": "alliance_level", "military_might": "alliance_power", "tag": "alliance_tag", "slog": "alliance_slogan"}' '.alliances_info |= ( . | [ to_entries[] | ( .value |= ( . | [ to_entries[] | ( .key |= ( if $renameMap[.] then $renameMap[.] else . end ) ) ] | from_entries ) ) ] | from_entries )' so.json
{
"alliances_info": {
"744085325458334213": {
"emblem": 3,
"alliance_name": "wellwell",
"alliance_count": 1,
"alliance_level": 1,
"alliance_power": 1035,
"public": false,
"alliance_tag": "MELL",
"slogan": "",
"id": 744085325458334200
},
"744128593839677958": {
"emblem": 0,
"alliance_name": "Brave",
"alliance_count": 1,
"alliance_level": 1,
"alliance_power": 1035,
"public": false,
"alliance_tag": "GABA",
"slogan": "",
"id": 744128593839678000
},
⋮
},
"server_version": "v7.190.4-master.000000006"
}
well i am a idiot (to be here totally clear). I found the reason (and this is normally a nobrainer...). I read the input from a file and the funny thing is that the file is Unicode but no UTF8. after recoding the command is working fine. Thanks for the help.
BR
Timo

Parsing multiple key/values in json tree with jq

Using jq, I'd like to cherry-pick key/value pairs from the following json:
{
"project": "Project X",
"description": "This is a description of Project X",
"nodes": [
{
"name": "server001",
"detail001": "foo",
"detail002": "bar",
"networks": [
{
"net_tier": "network_tier_001",
"ip_address": "10.1.1.10",
"gateway": "10.1.1.1",
"subnet_mask": "255.255.255.0",
"mac_address": "00:11:22:aa:bb:cc"
}
],
"hardware": {
"vcpu": 1,
"mem": 1024,
"disks": [
{
"disk001": 40,
"detail001": "foo"
},
{
"disk002": 20,
"detail001": "bar"
}
]
},
"os": "debian8",
"geo": {
"region": "001",
"country": "Sweden",
"datacentre": "Malmo"
},
"detail003": "baz"
}
],
"detail001": "foo"
}
For the sake of an example, I'd like to parse the following keys and their values: "Project", "name", "net_tier", "vcpu", "mem", "disk001", "disk002".
I'm able to parse individual elements without much issue, but due to the hierarchical nature of the full parse, I've not had much luck parsing down different branches (i.e. both networks and hardware > disks).
Any help appreciated.
Edit:
For clarity, the output I'm going for is a comma-separated CSV. In terms of parsing all combinations, covering the sample data in the example will do for now. I will hopefully be able to expand on any suggestions.
Here is a different filter which computes the unique set of network tier and disk names and then generates a result with columns appropriate to the data.
{
tiers: [ .nodes[].networks[].net_tier ] | unique
, disks: [ .nodes[].hardware.disks[] | keys[] | select(startswith("disk")) ] | unique
} as $n
| def column_names($n): [ "project", "name" ] + $n.tiers + ["vcpu", "mem"] + $n.disks ;
def tiers($n): [ $n.tiers[] as $t | .networks[] | if .net_tier==$t then $t else null end ] ;
def disks($n): [ $n.disks[] as $d | map(select(.[$d]!=null)|.[$d])[0] ] ;
def rows($n):
.project as $project
| .nodes[]
| .name as $name
| tiers($n) as $tier_values
| .hardware
| .vcpu as $vcpu
| .mem as $mem
| .disks
| disks($n) as $disk_values
| [$project, $name] + $tier_values + [$vcpu, $mem] + $disk_values
;
column_names($n), rows($n)
| #csv
The benfit of this approach becomes apparent if we add another node to the sample data:
{
"name": "server002",
"networks": [
{
"net_tier": "network_tier_002"
}
],
"hardware": {
"vcpu": 1,
"mem": 1024,
"disks": [
{
"disk002": 40,
"detail001": "foo"
}
]
}
}
Sample Run (assuming filter in filter.jq and amended data in data.json)
$ jq -Mr -f filter.jq data.json
"project","name","network_tier_001","network_tier_002","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001","",1,1024,40,20
"Project X","server002",,"network_tier_002",1,1024,,40
Try it online!
Here's one way you could achieve the desired output.
program.jq:
["project","name","net_tier","vcpu","mem","disk001","disk002"],
[.project]
+ (.nodes[] | .networks[] as $n |
[
.name,
$n.net_tier,
(.hardware |
.vcpu,
.mem,
(.disks | add["disk001","disk002"])
)
]
)
| #csv
$ jq -r -f program.jq input.json
"project","name","net_tier","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001",1,1024,40,20
Basically, you'll want to project the fields that you want into arrays so you may convert those arrays to csv rows. Your input makes it seem like there could potentially be multiple networks for a given node. So if you wanted to output all combinations, that would have to be flattened out.
Here's another approach, that is short enough to speak for itself:
def s(f): first(.. | f? // empty) // null;
[s(.project), s(.name), s(.net_tier), s(.vcpu), s(.mem), s(.disk001), s(.disk002)]
| #csv
Invocation:
$ jq -r -f value-pairs.jq input.json
Result:
"Project X","server001","network_tier_001",1,1024,40,20
With headers
Using the same s/1 as above:
. as $d
| ["project", "name", "net_tier", "vcpu", "mem", "disk001","disk002"]
| (., map( . as $v | $d | s(.[$v])))
| #csv
With multiple nodes
Again with s/1 as above:
.project as $p
| ["project", "name", "net_tier", "vcpu", "mem", "disk001","disk002"] as $h
| ($h,
(.nodes[] as $d
| $h
| map( . as $v | $d | s(.[$v]) )
| .[0] = $p)
) | #csv
Output with the illustrative multi-node data:
"project","name","net_tier","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001",1,1024,40,20
"Project X","server002","network_tier_002",1,1024,,40