Combining json files using jq objects with array - json

I've tried using
jq "reduce inputs.skins as $s (.; .skins += $s)" file1.json file2.json > combined.json
but it just creates two boots.name and fun.name from each file
any way I can use jq and combine the objects and arrays without having duplicates?
I apologize for any confusion, jq is kind of complicated to find an easy tutorial for me to understand
file1.json
{
"skins": [
{
"Item Shortname": "boots.name",
"skins": [
2,
25,
41,
]
},
{
"Item Shortname": "fun.name",
"skins": [
12,
8,
]
}
]
}
file2.json
{
"skins": [
{
"Item Shortname": "boots.name",
"skins": [
2,
20,
]
},
{
"Item Shortname": "fun.name",
"skins": [
90,
6,
82,
]
}
]
}
combined.json
{
"skins": [
{
"Item Shortname": "boots.name",
"skins": [
2,
20,
25,
41,
]
},
{
"Item Shortname": "fun.name",
"skins": [
90,
6,
82,
12,
8,
]
}
]
}

The tricky part here is meeting the apparent uniqueness requirements, for which the following generic filter can be used:
# emit a stream of the distinct items in `stream`
def uniques(stream):
foreach stream as $s ({};
($s|type) as $t
| (if $t == "string" then $s else ($s|tostring) end) as $y
| if .[$t][$y] then .emit = false else .emit = true | (.item = $s) | (.[$t][$y] = true) end;
if .emit then .item else empty end );
This ensures the ordering is preserved. It's a bit tricky because it is completely generic -- it allows both 1 and "1" and distinguishes between them, just as unique does.
(If the ordering did not matter, then you could use unique.)
So, assuming an invocation along the lines of
jq -s -f program.jq file1.json file2.json
you would place the above def followed by the following “main” program in program.jq:
.[0] as $file1 | .[1] as $file2
| (INDEX($file1.skins[]; .["Item Shortname"]) | map_values(.skins)) as $dict
| $file2
| .skins |= map( .["Item Shortname"] as $name
| .skins += $dict[$name]
| .skins |= [uniques(.[])] )
A better solution would avoid the -s option (e.g. as shown below), but the above method of feeding the two files to jq is at least straightforward, and will work regardless of which version of jq you are using.
Solution using input
One way to avoid slurping the two files would be to use input in conjunction with the -n command line option instead of -s. The "main" part of the jq program would then be as follows:
(INDEX(input.skins[]; .["Item Shortname"]) | map_values(.skins)) as $dict
| input
| .skins |= map( .["Item Shortname"] as $name
| .skins += $dict[$name]
| .skins |= [uniques(.[])] )

Related

Append multiple dummy objects to json array with jq

Lets say this is my array :
[
{
"name": "Matias",
"age": "33"
}
]
I can do this :
echo "$response" | jq '[ .[] | select(.name | test("M.*"))] | . += [.[]]'
And it will output :
[
{
"name": "Matias",
"age": "33"
},
{
"name": "Matias",
"age": "33"
}
]
But I cant do this :
echo "$response" | jq '[ .[] | select(.name | test("M.*"))] | . += [.[] * 3]'
jq: error (at <stdin>:7): object ({"name":"Ma...) and number (3) cannot be multiplied
I need to extend an array to create a dummy array with 100 values. And I cant do it. Also, I would like to have a random age on the objects. ( So later on I can filter the file to measure performance of an app .
Currently jq does not have a built-in randomization function, but it's easy enough to generate random numbers that jq can use. The following solution uses awk but in a way that some other PRNG can easily be used.
#!/bin/bash
function template {
cat<<EOF
[
{
"name": "Matias",
"age": "33"
}
]
EOF
}
function randoms {
awk -v n=$1 'BEGIN { for(i=0;i<n;i++) {print int(100*rand())} }'
}
randoms 100 | jq -n --argfile template <(template) '
first($template[] | select(.name | test("M.*"))) as $t
| [ $t | .age = inputs]
'
Note on performance
Even though the above uses awk and jq together, this combination is about 10 times faster than the posted jtc solution using -eu:
jq+awk: u+s = 0.012s
jtc with -eu: u+s = 0.192s
Using jtc in conjunction with awk as above, however, gives u+s == 0.008s on the same machine.

Use jq to merge keys with common id

consider a file 'b.json':
[
{
"id": 3,
"foo": "cannot be replaced, id isn't in a.json, stay untouched",
"baz": "do not touch3"
},
{
"id": 2,
"foo": "should be replaced with 'foo new2'",
"baz": "do not touch2"
}
]
and 'a.json':
[
{
"id": 2,
"foo": "foo new2",
"baz": "don't care"
}
]
I want to update the key "foo" in b.json using jq with the matching value from a.json. It should also work with more than one entry in a.json.
Thus the desired output is:
[
{
"id": 3,
"foo": "cannot be replaced, id isn't in a.json, stay untouched",
"baz": "do not touch3"
},
{
"id": 2,
"foo": "foo new2",
"baz": "do not touch2"
}
]
Here's one of several possibilities that use INDEX/2. If your jq does not have this as a built-in, see below.
jq --argfile a a.json '
INDEX($a[]; .id) as $dict
| map( (.id|tostring) as $id
| if ($dict|has($id)) then .foo = $dict[$id].foo
else . end)' b.json
There are other ways to pass in the contents of a.json and b.json.
Caveat
The above use of INDEX assumes there are no "collisions", which would happen if, for example, one of the objects has .id equal to 1 and another has .id equal to "1". If there is a possibility of such a collision, then a more complex definition of INDEX could be used.
INDEX/2
Straight from builtin.jq:
def INDEX(stream; idx_expr):
reduce stream as $row ({}; .[$row|idx_expr|tostring] = $row);
Here's a generic answer that makes no assumptions about the values of the .id keys except that they are distinct JSON values.
Generalization of INDEX/2
def type2: [type, if type == "string" then . else tojson end];
def dictionary(stream; f):
reduce stream as $s ({}; setpath($s|f|type2; $s));
def lookup(value):
getpath(value|type2);
def indictionary(value):
(value|type2) as $t
| has($t[0]) and (.[$t[0]] | has($t[1]));
Invocation
jq --argfile a a.json -f program.jq b.json
main
dictionary($a[]; .id) as $dict
| b
| map( .id as $id
| if ($dict|indictionary($id))
then .foo = ($dict|lookup($id).foo)
else . end)

Parsing multiple key/values in json tree with jq

Using jq, I'd like to cherry-pick key/value pairs from the following json:
{
"project": "Project X",
"description": "This is a description of Project X",
"nodes": [
{
"name": "server001",
"detail001": "foo",
"detail002": "bar",
"networks": [
{
"net_tier": "network_tier_001",
"ip_address": "10.1.1.10",
"gateway": "10.1.1.1",
"subnet_mask": "255.255.255.0",
"mac_address": "00:11:22:aa:bb:cc"
}
],
"hardware": {
"vcpu": 1,
"mem": 1024,
"disks": [
{
"disk001": 40,
"detail001": "foo"
},
{
"disk002": 20,
"detail001": "bar"
}
]
},
"os": "debian8",
"geo": {
"region": "001",
"country": "Sweden",
"datacentre": "Malmo"
},
"detail003": "baz"
}
],
"detail001": "foo"
}
For the sake of an example, I'd like to parse the following keys and their values: "Project", "name", "net_tier", "vcpu", "mem", "disk001", "disk002".
I'm able to parse individual elements without much issue, but due to the hierarchical nature of the full parse, I've not had much luck parsing down different branches (i.e. both networks and hardware > disks).
Any help appreciated.
Edit:
For clarity, the output I'm going for is a comma-separated CSV. In terms of parsing all combinations, covering the sample data in the example will do for now. I will hopefully be able to expand on any suggestions.
Here is a different filter which computes the unique set of network tier and disk names and then generates a result with columns appropriate to the data.
{
tiers: [ .nodes[].networks[].net_tier ] | unique
, disks: [ .nodes[].hardware.disks[] | keys[] | select(startswith("disk")) ] | unique
} as $n
| def column_names($n): [ "project", "name" ] + $n.tiers + ["vcpu", "mem"] + $n.disks ;
def tiers($n): [ $n.tiers[] as $t | .networks[] | if .net_tier==$t then $t else null end ] ;
def disks($n): [ $n.disks[] as $d | map(select(.[$d]!=null)|.[$d])[0] ] ;
def rows($n):
.project as $project
| .nodes[]
| .name as $name
| tiers($n) as $tier_values
| .hardware
| .vcpu as $vcpu
| .mem as $mem
| .disks
| disks($n) as $disk_values
| [$project, $name] + $tier_values + [$vcpu, $mem] + $disk_values
;
column_names($n), rows($n)
| #csv
The benfit of this approach becomes apparent if we add another node to the sample data:
{
"name": "server002",
"networks": [
{
"net_tier": "network_tier_002"
}
],
"hardware": {
"vcpu": 1,
"mem": 1024,
"disks": [
{
"disk002": 40,
"detail001": "foo"
}
]
}
}
Sample Run (assuming filter in filter.jq and amended data in data.json)
$ jq -Mr -f filter.jq data.json
"project","name","network_tier_001","network_tier_002","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001","",1,1024,40,20
"Project X","server002",,"network_tier_002",1,1024,,40
Try it online!
Here's one way you could achieve the desired output.
program.jq:
["project","name","net_tier","vcpu","mem","disk001","disk002"],
[.project]
+ (.nodes[] | .networks[] as $n |
[
.name,
$n.net_tier,
(.hardware |
.vcpu,
.mem,
(.disks | add["disk001","disk002"])
)
]
)
| #csv
$ jq -r -f program.jq input.json
"project","name","net_tier","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001",1,1024,40,20
Basically, you'll want to project the fields that you want into arrays so you may convert those arrays to csv rows. Your input makes it seem like there could potentially be multiple networks for a given node. So if you wanted to output all combinations, that would have to be flattened out.
Here's another approach, that is short enough to speak for itself:
def s(f): first(.. | f? // empty) // null;
[s(.project), s(.name), s(.net_tier), s(.vcpu), s(.mem), s(.disk001), s(.disk002)]
| #csv
Invocation:
$ jq -r -f value-pairs.jq input.json
Result:
"Project X","server001","network_tier_001",1,1024,40,20
With headers
Using the same s/1 as above:
. as $d
| ["project", "name", "net_tier", "vcpu", "mem", "disk001","disk002"]
| (., map( . as $v | $d | s(.[$v])))
| #csv
With multiple nodes
Again with s/1 as above:
.project as $p
| ["project", "name", "net_tier", "vcpu", "mem", "disk001","disk002"] as $h
| ($h,
(.nodes[] as $d
| $h
| map( . as $v | $d | s(.[$v]) )
| .[0] = $p)
) | #csv
Output with the illustrative multi-node data:
"project","name","net_tier","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001",1,1024,40,20
"Project X","server002","network_tier_002",1,1024,,40

Deep JSON merge

I have multiple JSON files that I'd like to merge into one.
Some have the same root element but different children. I don't want to overwrite the children but too extend them if they have the same parent element.
I've tried this answer, but it doesn't work:
jq: error (at file2.json:0): array ([{"title":"...) and array ([{"title":"...) cannot be multiplied
Sample files and wanted result (Gist)
Thank you in advance.
Here is a recursive solution which uses group_by(.key) to decide
which objects to combine. This could be a little simpler if .children
were more uniform. Sometimes it's absent in the sample data and sometimes it's the unusual value [{}].
def merge:
def kids:
map(
.children
| if length<1 then empty else .[] end
)
| if length<1 then {} else {children:merge} end
;
def mergegroup:
{
title: .[0].title
, key: .[0].key
} + kids
;
if .==[{}] then .
else group_by(.key) | map(mergegroup)
end
;
[ .[] | .[] ] | merge
When run with the -s option as follows
jq -M -s -f filter.jq file1.json file2.json
It produces the following output.
[
{
"title": "Title1",
"key": "12345678",
"children": [
{
"title": "SubTitle2",
"key": "123456713",
"children": [
{}
]
},
{
"title": "SubTitle1",
"key": "12345679",
"children": [
{
"title": "SubSubTitle1",
"key": "12345610"
},
{
"title": "SubSubTitle2",
"key": "12345611"
},
{
"title": "DifferentSubSubTitle1",
"key": "12345612"
}
]
}
]
}
]
If the ordering of the objects within the .children matters
then an a sort_by can be added to the {children:merge} expression,
e.g. {children:merge|sort_by(.key)}
Here is something that will reproduce your desired result. It's by no means automatic, It's really a proof of concept at this stage.
One liner:
jq -s '. as $in | ($in[0][].children[].children + $in[1][].children[0].children | unique) as $a1 | $in[1][].children[1] as $s1 | $in[0] | .[0].children[0].children = ($a1) | .[0].children += [$s1]' file1.json file2.json
Multi line breakdown (Copy/Paste):
jq -s '. as $in
| ($in[0][].children[].children + $in[1][].children[0].children
| unique) as $a1
| $in[1][].children[1] as $s1
| $in[0]
| .[0].children[0].children = ($a1)
| .[0].children += [$s1]' file1.json file2.json
Where:
$in : file1.json and file2.json combined input
$a1: merged "SubSubTitle" array
$s1: second subtitle object
I suspect the reason this didn't work was because your schema is different and has nested arrays.
I find it quite hypnotic looking at this, it would be good if you could elaborate a bit on how fixed the structure is and what the requirements are.

Get the index of the array element in JSON with jq

I have the following type of json:
{
"foo": "hello",
"bar": [
{
"key": "k1",
"val": "v1"
},
{
"key": "k2",
"val": "v2"
},
{
"key": "k3",
"val": "v3"
}
]
}
I want to output the following:
"hello", 1, "k1", "v1"
"hello", 2, "k2", "v2"
"hello", 3, "k3", "v3"
I am using jq to tranform this and the answer should also be with a jq transformation.
I am currently at:
echo '{"foo": "hello","bar": [{"key": "k1","val": "v1"},{"key": "k2","val": "v2"},{"key": "k3","val": "v3"} ]}' | jq -c -r '.bar[] as $b | [.foo, ($b | .key, .val)] | #csv'
Which gives me:
"hello","k1","v1"
"hello","k2","v2"
"hello","k3","v3"
How can I also get the index to show of the array element being parsed?
You could convert the array to entries to access the index and the value. Then you can build out the CSV rows.
$ jq -r '[.foo] + (.bar | to_entries[] | [.key+1,.value.key,.value.val]) | #csv' input.json
"hello",1,"k1","v1"
"hello",2,"k2","v2"
"hello",3,"k3","v3"
Assuming you have access to jq 1.5 and that the key/val keys are presented in that order:
jq -r '.foo as $foo
| foreach .bar[] as $i (0; .+1; [$foo, .] + [$i[]])
| #csv'
would produce:
"hello",1,"k1","v1"
"hello",2,"k2","v2"
"hello",3,"k3","v3"
The -r option is often used with #csv to convert the JSON string that would otherwise be produced by #csv into a comma-separated list of values.
If you really want to join with ", ", then it's a bit messier, but if you're not worried about the functionality that #csv provides, here's one way:
$ jq -r '"\"\(.foo)\"" as $foo
| foreach .bar[] as $i
(0; .+1; "\($foo), \(.), \($i | map("\"\(.)\"")|join(", "))")'
This produces:
"hello", 1, "k1", "v1"
"hello", 2, "k2", "v2"
"hello", 3, "k3", "v3"
If your jq does not have foreach then you could similarly use reduce, but it might be easier to upgrade.