How to recursively merge inherited json array elements?

I have the following JSON file named CMakePresets.json, which is a CMake presets file:
{
"configurePresets": [
{
"name": "default",
"hidden": true,
"generator": "Ninja",
"binaryDir": "${sourceDir}/_build/${presetName}",
"cacheVariables": {
"YIO_DEV": "1",
"BUILD_TESTING": "1"
}
},
{
"name": "debug",
"inherits": "default",
"cacheVariables": {
"CMAKE_BUILD_TYPE": "Debug"
}
},
{
"name": "release",
"inherits": "default",
"binaryDir": "${sourceDir}/_build/Debug",
"cacheVariables": {
"CMAKE_BUILD_TYPE": "Release"
}
},
{
"name": "arm",
"inherits": "debug",
"cacheVariables": {
"CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/Toolchain/arm-none-eabi-gcc.cmake"
}
}
]
}
I want to recursively merge (with *) the configurePresets elements that inherit from each other, starting from a specific entry name. For example, given the node named arm, I want the resulting JSON object with its inheritance resolved. Each element stores its parent's name in .inherits: arm inherits from debug, which inherits from default.
I was able to write a bash loop that I believe works, with the help of "Remove a key:value from an JSON object using jq" and this answer:
input=arm
# extract one element
g() { jq --arg name "$1" '.configurePresets[] | select(.name == $name)' CMakePresets.json; };
# get arm element
acc=$(g "$input");
# If .inherits field exists
while i=$(<<<"$acc" jq -r .inherits) && [[ -n "$i" && "$i" != "null" ]]; do
# remove it from input
a=$(<<<"$acc" jq 'del(.inherits)');
# get parent element
b=$(g "$i");
# merge parent with current
acc=$(printf "%s\n" "$b" "$a" | jq -s 'reduce .[] as $item ({}; . * $item)');
done;
echo "$acc"
This outputs what I believe is the expected result for arm:
{
"name": "arm",
"hidden": true,
"generator": "Ninja",
"binaryDir": "${sourceDir}/_build/${presetName}",
"cacheVariables": {
"YIO_DEV": "1",
"BUILD_TESTING": "1",
"CMAKE_BUILD_TYPE": "Debug",
"CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/Toolchain/arm-none-eabi-gcc.cmake"
}
}
But I want to write it in jq. I tried, but the jq language is not intuitive for me. I can do it for a fixed number of elements, for example two:
< CMakePresets.json jq --arg name "arm" '
def g(n): .configurePresets[] | select(.name == n);
g($name) * (g($name) | .inherits) as $name2 | g($name2)
'
But I do not know how to do reduce .[] as $item ({}; . * $item) when each $item is really g($name), where $name comes from the previous item's .inherits (g($name) | .inherits). I tried reading the jq manual and learning about variables and loops, but jq has a very different syntax. I tried to use while, but I get a syntax error that I do not understand and do not know how to fix. I guess while and until might not be right here, because they operate on the previous iteration's output, whereas the elements always come from the root.
$ < CMakePresets.json jq --arg name "arm" 'def g(n): .configurePresets[] | select(.name == n);
while(g($name) | .inherits as $name; g($name))
'
jq: error: syntax error, unexpected ';', expecting '|' (Unix shell quoting issues?) at <top-level>, line 2:
while(g($name) | .inherits as $name; g($name))
jq: 1 compile error
How to write such loop in jq language?

Assuming the inheritance hierarchy contains no loops, as is the case with the example, we can break the problem down into the pieces shown below:
# Use an inner function of arity 0 to take advantage of jq's TCO
def inherits_from($dict):
def from:
if .name == "default" then .
else $dict[.inherits] as $next
| ., ($next | from)
end;
from;
def chain($start):
INDEX(.configurePresets[]; .name) as $dict
| $dict[$start] | inherits_from($dict);
reduce chain("arm") as $x (null;
($x.cacheVariables + .cacheVariables) as $cv
| $x + .
| .cacheVariables = $cv)
| del(.inherits)
This produces the desired output efficiently.
One advantage of the above formulation of a solution is that it can easily be modified to handle circular dependencies.
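As a sketch of that modification (an illustration, not the answer's own code): the hypothetical helper up/2 below carries the names already visited and stops when a parent repeats or is missing, and the chain is then combined with * as in the last variant of this answer. It assumes jq 1.6 or later (for INDEX/2) and that each .inherits is a single preset name, as in the question; resolve_preset is a made-up wrapper name.

```shell
#!/bin/sh
# Sketch: resolve a preset's inheritance chain, stopping on cycles.
# Assumes jq >= 1.6 (INDEX/2) and that each .inherits is a single name.
# resolve_preset NAME FILE is a hypothetical wrapper, not a CMake or jq command.
resolve_preset() {
  jq --arg name "$1" '
    # Emit the current preset, then its ancestors, nearest first.
    # $seen holds the names visited so far; a repeated or unknown parent ends the chain.
    def up($dict; $seen):
      ., ( .inherits
           | select(. != null and (. as $p | any($seen[]; . == $p) | not))
           | . as $p
           | ($dict[$p] // empty)
           | up($dict; $seen + [$p]) );
    INDEX(.configurePresets[]; .name) as $dict
    | reduce ($dict[$name] // empty | up($dict; [$name])) as $x
        ({}; $x * .)    # nearer presets override their ancestors
    | del(.inherits)
  ' "$2"
}

# Usage: resolve_preset arm CMakePresets.json
```

With presets that inherit from each other in a loop, this stops at the first repeated name instead of recursing forever.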
Using recurse/1
inherits_from/1 could also be defined using the built-in function recurse/1:
def inherits_from($dict):
recurse( select(.name != "default") | $dict[.inherits]) ;
or perhaps more interestingly:
def inherits_from($dict):
recurse( select(.inherits) | $dict[.inherits]) ;
Using *
Using * to combine objects has a high overhead because of its recursive semantics, which are often either not required or not wanted. However, if it is acceptable here to use * for combining the objects, the above can be simplified to:
def inherits_from($dict):
recurse( select(.inherits) | $dict[.inherits]) ;
INDEX(.configurePresets[]; .name) as $dict
| $dict["arm"]
| reduce inherits_from($dict) as $x ({}; $x * .)
| del(.inherits)
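To see the difference between the two ways of combining objects, here is a small comparison on made-up objects: + lets the right-hand value replace a nested object wholesale, which is why the first version rebuilds .cacheVariables by hand, while * merges nested objects key by key.

```shell
#!/bin/sh
# Shallow vs recursive object merge in jq (sample objects are made up).
a='{"cacheVariables": {"YIO_DEV": "1"}, "generator": "Ninja"}'
b='{"cacheVariables": {"CMAKE_BUILD_TYPE": "Debug"}}'

# + : keys from $b win outright, so the nested YIO_DEV entry is lost.
shallow=$(jq -nc --argjson a "$a" --argjson b "$b" '$a + $b')
# * : nested objects are merged key by key, so both cache variables survive.
deep=$(jq -nc --argjson a "$a" --argjson b "$b" '$a * $b')

echo "$shallow"
echo "$deep"
```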

Writing a recursive function is actually simple, once you get the hang of it:
jq --arg name "$1" '
def _get_in(input; n):
(input[] | select(.name == n)) |
(if .inherits then .inherits as $n | _get_in(input; $n) else {} end) * .;
def get(name):
.configurePresets as $input | _get_in($input; name);
get($name)
' "$presetfile"
First, get selects .configurePresets and passes it to _get_in. Inside that function, input[] | select(.name == n) picks out only the element I am interested in. If that element has .inherits, then .inherits as $n | _get_in(input; $n) takes the parent's name and calls the function again; otherwise the else branch yields the empty object {}. That result is then merged with the current element via * ., so the element's own keys override those of its parents. Recursively, this computes {} * (input[]|select(...)) * (input[]|select(...)) * (input[]|select(...)), from the top-most parent down to the requested element.
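A self-contained way to try this approach, assuming jq 1.5+ on the PATH. Two small changes are not from the original answer: the array is passed as a $-parameter named $presets rather than a filter parameter called input (which shadows jq's builtin input), and a final del(.inherits) removes the arm entry's own inherits key, which otherwise survives the merge and is absent from the expected output in the question. get_preset is a hypothetical name.

```shell
#!/bin/sh
# get_preset NAME FILE: a hypothetical wrapper around the recursive approach above.
# Assumes jq >= 1.5, single-name .inherits values and no inheritance cycles.
get_preset() {
  jq --arg name "$1" '
    def resolve($presets; $n):
      ($presets[] | select(.name == $n))
      # Resolve the parent first ({} at the root), then merge this preset
      # on top with *, so keys set here override inherited ones.
      | (if .inherits then resolve($presets; .inherits) else {} end) * .;
    .configurePresets as $presets
    | resolve($presets; $name)
    | del(.inherits)    # drop the inherits key left over from the requested preset
  ' "$2"
}

# Usage: get_preset arm CMakePresets.json
```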

Related

How to conditionally select array index to update based on value

I have a JSON file as below. I need to append a new name to .root.application.names, but if the name passed in has a prefix (everything before -; in the example below it's jr), then I need to find the list that already contains names with the same prefix and update that one. If there is only one names list, or if there is no matching list, then the first list should be updated.
In the example below:
$application == 'application1' and $name == <whatever>: just update the first list under application1; there is only one list under application1, so there is nothing to choose from.
$application == 'application2' and $name has no "-" prefix delimiter or an unmatched prefix (say sr-allen): update the first list under application2.names, because the name has no prefix or an unmatched one.
$application == 'application2' and, say, $name == jr-allen: update the second list under application2, because $name has the prefix "jr-" and there is a list with items matching this prefix.
{
"root": {
"application1": [
{
"names": [
"john"
],
"project": "generic"
}
],
"application2": [
{
"names": [
"peter",
"jack"
],
"project": "generic"
},
{
"names": [
"jr-sam",
"jr-mike",
"jr-rita"
],
"project": "junior-project"
}
]
}
}
I found out how to update the list, but I'm not sure how to add these conditions. Any help, please?
jq '."root"."application2"[1].names[."root"."application2"[1].names| length] |= . + "jr-allen"' foo.json
Update:
It would be good if I could do this with jq's walk. I am still trying, as below, but couldn't get anywhere close.
prefix=$(echo ${name} | cut -d"-" -f1) # this gives the prefix, e.g: "jr"
jq -r --arg app "${application}" name "${name}" prefix "${prefix}"'
def walk(f):
. as $in
| if type == "object" then
reduce keys[] as $key
( {}; . + { ($key): ($in[$key] | walk(f)) } ) | f
elif type == "array" then map( walk(f) ) | f
else f
end;
walk( if type=="object" and ."$app" and (.names[]|startswith("$prefix")) ) then .names[]="$name" else . end )
' foo.json
It took me a while to have any confidence that I have understood the requirements, but I believe the following at least captures the essence of what you have in mind.
To make the solution easier to understand, we begin with a helper function, the name of which makes its purpose clear enough, at least given the context:
def updateFirstNamesArrayWithMatchingPrefix($prefix; $value):
(first( range(0; length) as $i
| if any(.[$i].names[]; startswith($prefix))
then $i else empty end) // 0) as $i
| .[$i].names += [$value] ;
.root |=
if .[$app] | length == 1
then .[$app][0].names += [ $name ]
elif .[$app] | length > 1
then
( $name | split("-")) as $components
| if $components|length==1 # no prefix
then .[$app][0].names += [ $name ]
else ($components[0] + "-" ) as $prefix
| .[$app] |= updateFirstNamesArrayWithMatchingPrefix($prefix; $name)
end
else .
end
Testing
The above passes the four tests originally proposed by @Inian:
jq --arg app "application1" --arg name "foo" -f script.jq jsonFile
jq --arg app "application2" --arg name "jr-foo" -f script.jq jsonFile
jq --arg app "application2" --arg name "sr-foo" -f script.jq jsonFile
jq --arg app "application2" --arg name "foo" -f script.jq jsonFile
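A sketch of those four checks in one script, assuming jq 1.5+ and a POSIX shell: the program above is inlined into a hypothetical add_name function (instead of being read from script.jq) so the script is self-contained, and each case can then be checked against the sample foo.json from the question.

```shell
#!/bin/sh
# add_name APP NAME FILE: hypothetical wrapper around the program above.
add_name() {
  jq --arg app "$1" --arg name "$2" '
    def updateFirstNamesArrayWithMatchingPrefix($prefix; $value):
      (first( range(0; length) as $i
              | if any(.[$i].names[]; startswith($prefix))
                then $i else empty end) // 0) as $i
      | .[$i].names += [$value] ;
    .root |=
      if .[$app] | length == 1
      then .[$app][0].names += [ $name ]
      elif .[$app] | length > 1
      then
        ( $name | split("-")) as $components
        | if $components|length==1 # no prefix
          then .[$app][0].names += [ $name ]
          else ($components[0] + "-" ) as $prefix
               | .[$app] |= updateFirstNamesArrayWithMatchingPrefix($prefix; $name)
          end
      else .
      end
  ' "$3"
}

# Usage: add_name application2 jr-allen foo.json
```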
Herrings?
Based on my understanding of the problem, it seems to me that walk may be a bit of a red herring, but if not, I hope you'll be able to adapt the above to meet your actual requirements.
Your requirement is: if a prefix exists in the name, update the last array element, otherwise the first one.
When the array has one element, as for application1, the last element is also the first.
#!/bin/bash
application="$1"
name="$2"
json_file="file.json"
ind=0
# if name matches "-", set index to the last array element
[[ "$name" == *"-"* ]] && ind=-1
jq --arg app "$application" \
--arg name "$name" \
--argjson ind "$ind" \
'.root[$app][$ind].names += [$name]' "$json_file"
I believe the above script is self-explanatory: --argjson is used so that the index is passed as a number rather than a quoted string, and += is shorthand for |= . + (here, appending [$name] to the existing names).
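A quick way to see why --argjson matters here (a small illustration with a made-up array): --arg always binds a string, whereas --argjson parses the value as JSON, so -1 arrives as a number that can index an array from the end.

```shell
#!/bin/sh
# --arg always binds a JSON string; --argjson parses its value as JSON.
as_arg=$(jq -nr --arg ind -1 '$ind | type')           # prints: string
as_argjson=$(jq -nr --argjson ind -1 '$ind | type')   # prints: number
# A numeric -1 indexes an array from the end (made-up sample array):
last=$(jq -nc --argjson ind -1 '[["a"], ["b"]] | .[$ind]')
echo "$as_arg $as_argjson $last"
```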
Testing
Commands below produce the expected result.
bash test.sh application1 jr-John
bash test.sh application2 jr-John
bash test.sh application1 Mary
bash test.sh application2 Mary

Map arrays to objects with no common fields

How might one use jq-1.5-1-a5b5cbe to join a filtered set of arrays from STDIN to a set of objects that contain no common fields, assuming that all elements will be in a predictable order?
Standard Input (pre-slurpfile; generated by multiple GETs):
{"ref":"objA","arr":["alpha"]}
{"ref":"objB","arr":["bravo"]}
Existing File:
[{"name":"foo"},{"name":"bar"}]
Desired Output:
[{"name":"foo","arr":["alpha"]},{"name":"bar","arr":["bravo"]}]
Current Bash:
$ multiGET | jq --slurpfile stdin /dev/stdin '.[].arr = $stdin[].arr' file
[
{
"name": "foo",
"arr": [
"alpha"
]
},
{
"name": "bar",
"arr": [
"alpha"
]
}
]
[
{
"name": "foo",
"arr": [
"bravo"
]
},
{
"name": "bar",
"arr": [
"bravo"
]
}
]
Sidenote: I wasn't sure when to use pretty/compact JSON in this question; please comment with your opinion on best practice.
Get jq to read the file before stdin, so that the first entity in the file will be . and you can get everything else using inputs.
$ multiGET | jq -c '. as $objects
| [ foreach (inputs | {arr}) as $x (-1; .+1;
. as $i | $objects[$i] + $x
) ]' file -
[{"name":"foo","arr":["alpha"]},{"name":"bar","arr":["bravo"]}]
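To try the file-then-stdin approach without the real endpoints, multiGET can be replaced by a hypothetical stand-in that prints the two sample responses; the sketch below assumes jq 1.5+ and writes the existing array to a temporary file.

```shell
#!/bin/sh
# multiGET is a hypothetical stand-in that replays the two sample GET responses.
multiGET() {
  printf '%s\n' '{"ref":"objA","arr":["alpha"]}' '{"ref":"objB","arr":["bravo"]}'
}
file=$(mktemp)
printf '%s\n' '[{"name":"foo"},{"name":"bar"}]' > "$file"
# jq reads "$file" first (it becomes .), then "-" (stdin), consumed via inputs.
result=$(multiGET | jq -c '. as $objects
  | [ foreach (inputs | {arr}) as $x (-1; .+1;
        . as $i | $objects[$i] + $x
    ) ]' "$file" -)
rm -f "$file"
echo "$result"
```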
"Slurping" (whether using -s or --slurpfile) is sometimes necessary but rarely desirable, because of the memory requirements. So here's a solution that takes advantage of the fact that your multiGET produces a stream:
multiGET | jq -n --argjson objects '[{"name":"foo"},{"name":"bar"}]' '
$objects
| [foreach inputs as $in (-1; .+1;
. as $ix
| $objects[$ix] + ($in | del(.ref)))]
'
Here's a functional approach that might be appropriate if your stream was in fact already packaged as an array:
multiGET | jq -s --argjson objects '[{"name":"foo"},{"name":"bar"}]' '
[$objects, map(del(.ref))]
| transpose
| map(add)
'
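To see what the transpose step does, here it is applied to the two arrays that [$objects, map(del(.ref))] builds for the sample data: each row after transposition pairs one existing object with the matching GET result, and map(add) merges each pair.

```shell
#!/bin/sh
# The two arrays that [$objects, map(del(.ref))] builds for the sample data:
rows='[[{"name":"foo"},{"name":"bar"}], [{"arr":["alpha"]},{"arr":["bravo"]}]]'
# transpose turns them into one [object, GET result] pair per position ...
pairs=$(jq -nc --argjson rows "$rows" '$rows | transpose')
# ... and map(add) merges each pair into a single object.
merged=$(printf '%s\n' "$pairs" | jq -c 'map(add)')
echo "$pairs"
echo "$merged"
```

If the two lists differ in length, transpose pads the shorter one with null, and add ignores a null, so an item without a partner passes through on its own.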
If the $objects array is in a file or too big for the command line, I'd suggest using --argfile, even though it is technically deprecated.
If the $objects array is in a file, and if you want to avoid --argfile, you could still avoid slurping, e.g. by using the fact that unless -n is used, jq will automatically read one JSON entity from stdin:
(echo '[{"name":"foo"},{"name":"bar"}]';
multiGET) | jq '
. as $objects
| [foreach inputs as $in (-1; .+1;
. as $ix | $objects[$ix] + $in | del(.ref))]
'

Convert even odd index in array to key value pairs in json using jq

I'm trying to use jq to parse Solr 6.5 metrics into key value pairs:
{
"responseHeader": {
"status": 0,
"QTime": 7962
},
"metrics": [
"solr.core.shard1",
"QUERY./select",
"solr.core.shard2",
"QUERY./update"
...
]
}
I'd like to take the entries at even and odd indices of the metrics array and put them together into a single object as key-value pairs, like this:
{
"solr.core.shard1": "QUERY./select",
"solr.core.shard2": "QUERY./update",
...
}
Till now, I am only able to come up with:
.metrics | to_entries | .[] | {(select(.key % 2 == 0).value): select(.key % 2 == 1).value}
But this returns an error or no results.
I'd be grateful if someone could point me in the right direction. I feel like the answer is probably in the map operator, but I haven't been able to figure it out.
jq solution:
jq '[ .metrics as $m | range(0; $m | length; 2)
| {($m[.]): $m[(. + 1)]} ] | add' jsonfile
The output:
{
"solr.core.shard1": "QUERY./select",
"solr.core.shard2": "QUERY./update"
}
https://stedolan.github.io/jq/manual/v1.5/#range(upto),range(from;upto)range(from;upto;by)
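To unpack the range-based answer (an illustration using the sample metrics without the ... entries): range/3 steps through the even indices, each even index supplies a key while the next index supplies its value, and add then merges the one-key objects.

```shell
#!/bin/sh
# range(from; upto; by): here it yields the even indices below 5.
evens=$(jq -nc '[range(0; 5; 2)]')
# Each even index . supplies a key ($m[.]) and the next index its value ($m[. + 1]).
obj=$(printf '%s\n' '{"metrics":["solr.core.shard1","QUERY./select","solr.core.shard2","QUERY./update"]}' |
  jq -c '[ .metrics as $m | range(0; $m | length; 2)
           | {($m[.]): $m[(. + 1)]} ] | add')
echo "$evens"
echo "$obj"
```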
Here's a helper function which makes the solution trivial:
# Emit a stream consisting of pairs of items taken from `stream`
def pairwise(stream):
foreach stream as $i ([];
if length == 1 then . + [$i] else [$i] end;
select(length == 2));
From here there are several good options, e.g. we could start with:
.metrics
| [pairwise(.[]) | {(.[0]): .[1]}]
| add
With your input, this produces:
{
"solr.core.shard1": "QUERY./select",
"solr.core.shard2": "QUERY./update"
}
So you might want to write:
.metrics |= ([pairwise(.[]) | {(.[0]): .[1]}] | add)
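One behavioural difference between the two answers appears when metrics has an odd number of entries. In this small sketch with made-up inputs, pairwise/1 silently drops the trailing item, whereas the range-based version keeps it as a key with a null value.

```shell
#!/bin/sh
# pairwise/1 as defined above: a trailing unpaired item never reaches length 2.
pw=$(jq -nc '
  def pairwise(stream):
    foreach stream as $i ([];
      if length == 1 then . + [$i] else [$i] end;
      select(length == 2));
  [pairwise(1, 2, 3, 4, 5)]')
# The range-based answer instead pairs a trailing key with null ($m[3] is null).
rg=$(jq -nc '["a", "b", "c"] as $m
  | [range(0; $m | length; 2) | {($m[.]): $m[(. + 1)]}] | add')
echo "$pw"   # [[1,2],[3,4]]
echo "$rg"   # {"a":"b","c":null}
```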

Using jq or alternative command line tools to compare JSON files

Are there any command line utilities that can be used to find if two JSON files are identical with invariance to within-dictionary-key and within-list-element ordering?
Could this be done with jq or some other equivalent tool?
Examples:
These two JSON files are identical
A:
{
"People": ["John", "Bryan"],
"City": "Boston",
"State": "MA"
}
B:
{
"People": ["Bryan", "John"],
"State": "MA",
"City": "Boston"
}
but these two JSON files are different:
A:
{
"People": ["John", "Bryan", "Carla"],
"City": "Boston",
"State": "MA"
}
C:
{
"People": ["Bryan", "John"],
"State": "MA",
"City": "Boston"
}
That would be:
$ some_diff_command A.json B.json
$ some_diff_command A.json C.json
The files are not structurally identical
If your shell supports process substitution (Bash-style follows, see docs):
diff <(jq --sort-keys . A.json) <(jq --sort-keys . B.json)
Object key order will be ignored, but array order will still matter. It is possible to work around that, if desired, by sorting array values in some other way, or by making them set-like (e.g. ["foo", "bar"] → {"foo": null, "bar": null}; this will also remove duplicates).
Alternatively, replace diff with some other comparator, e.g. cmp, colordiff, or vimdiff, depending on your needs. If all you want is a yes or no answer, consider using cmp and passing --compact-output to jq so that it does not pretty-print the output, for a potential small performance increase.
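A sketch of the yes/no variant for shells without process substitution: the hypothetical same_json helper normalizes both files with --sort-keys and --compact-output into temporary files and lets cmp -s set the exit status. Array order still matters, so files differing only in array order are reported as different.

```shell
#!/bin/sh
# same_json FILE1 FILE2: exit 0 if the files match after sorting object keys.
# Hypothetical helper; array order still matters.
same_json() {
  t1=$(mktemp) t2=$(mktemp)
  jq --sort-keys --compact-output . "$1" > "$t1" &&
  jq --sort-keys --compact-output . "$2" > "$t2" &&
  cmp -s "$t1" "$t2"
  rc=$?
  rm -f "$t1" "$t2"
  return "$rc"
}

# Usage: same_json A.json B.json && echo same || echo different
```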
Use jd with the -set option:
No output means no difference.
$ jd -set A.json B.json
Differences are shown as an @ path line followed by + or - lines.
$ jd -set A.json C.json
@ ["People",{}]
+ "Carla"
The output diffs can also be used as patch files with the -p option.
$ jd -set -o patch A.json C.json; jd -set -p patch B.json
{"City":"Boston","People":["John","Carla","Bryan"],"State":"MA"}
https://github.com/josephburnett/jd#command-line-usage
Since jq's comparison already compares objects without taking into account key ordering, all that's left is to sort all lists inside the object before comparing them. Assuming your two files are named a.json and b.json, on the latest jq nightly:
jq --argfile a a.json --argfile b b.json -n '($a | (.. | arrays) |= sort) as $a | ($b | (.. | arrays) |= sort) as $b | $a == $b'
This program should return "true" or "false" depending on whether or not the objects are equal using the definition of equality you ask for.
EDIT: The (.. | arrays) |= sort construct doesn't actually work as expected on some edge cases. This GitHub issue explains why and provides some alternatives, such as:
def post_recurse(f): def r: (f | select(. != null) | r), .; r; def post_recurse: post_recurse(.[]?); (post_recurse | arrays) |= sort
Applied to the jq invocation above:
jq --argfile a a.json --argfile b b.json -n 'def post_recurse(f): def r: (f | select(. != null) | r), .; r; def post_recurse: post_recurse(.[]?); ($a | (post_recurse | arrays) |= sort) as $a | ($b | (post_recurse | arrays) |= sort) as $b | $a == $b'
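Since --argfile is deprecated in favour of --slurpfile (as another answer here also notes), here is a sketch of the same comparison with --slurpfile; each file is assumed to hold exactly one JSON text, which becomes $a[0] and $b[0]. json_equiv is a hypothetical helper name, and normalize just names the post_recurse update from above.

```shell
#!/bin/sh
# json_equiv A B: prints true if the two single-document files are equal once
# object key order and (recursively) array order are ignored, else false.
# Hypothetical helper name; uses --slurpfile, so each document is $a[0] / $b[0].
json_equiv() {
  jq -n --slurpfile a "$1" --slurpfile b "$2" '
    def post_recurse(f): def r: (f | select(. != null) | r), .; r;
    def post_recurse: post_recurse(.[]?);
    # Sort innermost arrays first, then the arrays that contain them.
    def normalize: (post_recurse | arrays) |= sort;
    ($a[0] | normalize) == ($b[0] | normalize)'
}

# Usage: json_equiv A.json B.json
```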
Pulling in the best from the top two answers to get a jq based json diff:
diff \
<(jq -S 'def post_recurse(f): def r: (f | select(. != null) | r), .; r; def post_recurse: post_recurse(.[]?); (. | (post_recurse | arrays) |= sort)' "$original_json") \
<(jq -S 'def post_recurse(f): def r: (f | select(. != null) | r), .; r; def post_recurse: post_recurse(.[]?); (. | (post_recurse | arrays) |= sort)' "$changed_json")
This takes the elegant array sorting solution from https://stackoverflow.com/a/31933234/538507 (which allows us to treat arrays as sets) and the clean bash process substitution into diff from https://stackoverflow.com/a/37175540/538507. This addresses the case where you want a diff of two JSON files and the order of the array contents is not relevant.
Here is a solution using the generic function walk/1:
# Apply f to composite entities recursively, and to atoms
def walk(f):
. as $in
| if type == "object" then
reduce keys[] as $key
( {}; . + { ($key): ($in[$key] | walk(f)) } ) | f
elif type == "array" then map( walk(f) ) | f
else f
end;
def normalize: walk(if type == "array" then sort else . end);
# Test whether the input and argument are equivalent
# in the sense that ordering within lists is immaterial:
def equiv(x): normalize == (x | normalize);
Example:
{"a":[1,2,[3,4]]} | equiv( {"a": [[4,3], 2,1]} )
produces:
true
And wrapped up as a bash script:
#!/bin/bash
JQ=/usr/local/bin/jq
BN=$(basename $0)
function help {
cat <<EOF
Syntax: $0 file1 file2
The two files are assumed each to contain one JSON entity. This
script reports whether the two entities are equivalent in the sense
that their normalized values are equal, where normalization of all
component arrays is achieved by recursively sorting them, innermost first.
This script assumes that the jq of interest is $JQ if it exists and
otherwise that it is on the PATH.
EOF
exit
}
if [ ! -x "$JQ" ] ; then JQ=jq ; fi
function die { echo "$BN: $*" >&2 ; exit 1 ; }
if [ $# != 2 -o "$1" = -h -o "$1" = --help ] ; then help ; exit ; fi
test -f "$1" || die "unable to find $1"
test -f "$2" || die "unable to find $2"
$JQ -r -n --argfile A "$1" --argfile B "$2" -f <(cat<<"EOF"
# Apply f to composite entities recursively, and to atoms
def walk(f):
. as $in
| if type == "object" then
reduce keys[] as $key
( {}; . + { ($key): ($in[$key] | walk(f)) } ) | f
elif type == "array" then map( walk(f) ) | f
else f
end;
def normalize: walk(if type == "array" then sort else . end);
# Test whether the input and argument are equivalent
# in the sense that ordering within lists is immaterial:
def equiv(x): normalize == (x | normalize);
if $A | equiv($B) then empty else "\($A) is not equivalent to \($B)" end
EOF
)
POSTSCRIPT: walk/1 is a built-in in versions of jq > 1.5, and can therefore be omitted if your jq includes it, but there is no harm in including it redundantly in a jq script.
POST-POSTSCRIPT: The builtin version of walk has recently been changed so that it no longer sorts the keys within an object. Specifically, it uses keys_unsorted. For the task at hand, the version using keys should be used.
There's an answer for this here that would be useful.
Essentially you can use the Git diff functionality (even for non-Git tracked files) which also includes colour in the output:
git diff --no-index payload_1.json payload_2.json
One more tool, for those for whom the previous answers are not a good fit: you can try jdd.
It's HTML based so you can either use it online at www.jsondiff.com or, if you prefer running it locally, just download the project and open the index.html.
Perhaps you could use this sort-and-diff tool: http://novicelab.org/jsonsortdiff/, which first sorts the objects semantically and then compares them. It is based on https://www.npmjs.com/package/jsonabc
If you also want to see the differences, using @Erik's answer as inspiration and js-beautify:
$ echo '[{"name": "John", "age": 56}, {"name": "Mary", "age": 67}]' > file1.json
$ echo '[{"age": 56, "name": "John"}, {"name": "Mary", "age": 61}]' > file2.json
$ diff -u --color \
<(jq -cS . file1.json | js-beautify -f -) \
<(jq -cS . file2.json | js-beautify -f -)
--- /dev/fd/63 2016-10-18 13:03:59.397451598 +0200
+++ /dev/fd/62 2016-10-18 13:03:59.397451598 +0200
@@ -2,6 +2,6 @@
"age": 56,
"name": "John"
}, {
- "age": 67,
+ "age": 61,
"name": "Mary"
}]
In JSONiq, you can simply use the deep-equal function:
deep-equal(
{
"People": ["John", "Bryan", "Carla"],
"City": "Boston",
"State": "MA"
},
{
"People": ["Bryan", "John"],
"State": "MA",
"City": "Boston"
}
)
which returns
false
You can also read from files (locally or an HTTP URL also works) like so:
deep-equal(
json-doc("path to doc A.json"),
json-doc("path to doc B.json")
)
A possible implementation is RumbleDB.
However, you need to be aware that it is not quite correct that the first two documents are the same: JSON defines arrays as ordered lists of values.
["Bryan", "John"]
is not the same as:
["John", "Bryan"]

Transforming the name of key deeper in the JSON structure with jq

I have the following JSON:
{
"vertices": [
{
"__cp": "foo",
"__type": "metric",
"__eid": "foobar",
"name": "Undertow Metrics~Sessions Created",
"_id": 45056,
"_type": "vertex"
},
...
]
"edges": [
...
and I would like to achieve this format:
{
"nodes": [
{
"cp": "foo",
"type": "metric",
"label": "metric: Undertow Metrics~Sessions Created",
"name": "Undertow Metrics~Sessions Created",
"id": 45056
},
...
]
"edges": [
...
So far I was able to create this expression:
jq '{nodes: .vertices} | del(.nodes[]."_type", .nodes[]."__eid")'
That is, it renames 'vertices' to 'nodes' and removes '_type' and '__eid'. How can I rename a key nested deeper in the JSON?
You can change the names of an object's properties with with_entries(filter). It converts an object to an array of key/value pairs, applies the filter to each pair, and converts the result back to an object. So you would just want to update the key of those pairs to your new names.
Depending on which version of jq you're using, the next part can be tricky. String replacement (sub/2) wasn't introduced until jq 1.5. If that is available, you could do this:
{
nodes: .vertices | map(with_entries(
.key |= sub("^_+"; "")
)),
edges
}
Otherwise if you're using jq 1.4, then you'll have to remove them manually. A recursive function can help with that since the number of underscores varies.
def ltrimall(str): str as $str |
if startswith($str)
then ltrimstr($str) | ltrimall(str)
else .
end;
{
nodes: .vertices | map(with_entries(
.key |= ltrimall("_")
)),
edges
}
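Applied to the key names from the sample vertex (an illustration, not part of the answer), ltrimall/1 strips any number of leading underscores. One caveat worth knowing: calling it with an empty string would recurse forever, since every string starts with "".

```shell
#!/bin/sh
# ltrimall/1 from above, applied to the key names of the sample vertex.
keys=$(jq -nc '
  def ltrimall(str): str as $str |
    if startswith($str)
    then ltrimstr($str) | ltrimall(str)
    else .
    end;
  ["__cp", "__type", "__eid", "_id", "name"] | map(ltrimall("_"))')
echo "$keys"   # ["cp","type","eid","id","name"]
```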
The following program works with jq 1.4 or jq 1.5.
It uses walk/1 to remove leading underscores from any key, no matter where it occurs in the input JSON.
The version of ltrim provided here uses recurse/1 for efficiency and portability, but any suitable substitute may be used.
def ltrim(c):
reduce recurse( if .[0:1] == c then .[1:] else null end) as $x
(null; $x);
# Apply f to composite entities recursively, and to atoms
def walk(f):
. as $in
| if type == "object" then
reduce keys[] as $key
( {}; . + { ($key): ($in[$key] | walk(f)) } ) | f
elif type == "array" then map( walk(f) ) | f
else f
end;
.nodes = .vertices
| del(.vertices)
| (.nodes |= walk(
if type == "object"
then with_entries( .key |= ltrim("_") )
else .
end ))
From your example data it looks like you intend lots of little manipulations so I'd break things out into stages like this:
.nodes = .vertices # \ first take care of renaming
| del(.vertices) # / .vertices to .nodes
| .nodes = [
.nodes[] # then scan each node
| del(._type, .__eid) # \ whatever key-specific tweaks like
| .label = "metric: \(.name)" # / calculating .label you want can go here
| . as $n # remember the tweaked node
| reduce keys[] as $k ( # final reduce to handle renaming
{}; # any keys that start with _
.[$k | sub("^_+";"")] = $n[$k]
)
]
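Putting the renaming, deletions and label together, here is a sketch for the whole document, assuming jq 1.5+ (for sub/2). It also assumes, since the question does not say, that the "metric" in the desired label comes from each vertex's __type; use a literal "metric: \(.name)" instead if it is fixed. to_nodes is a hypothetical helper name, and .edges passes through unchanged.

```shell
#!/bin/sh
# to_nodes: rename .vertices to .nodes, drop _type/__eid, add a label,
# and strip leading underscores from the remaining keys (reads stdin).
# Other top-level keys, such as .edges, pass through untouched.
to_nodes() {
  jq '
    .nodes = [ .vertices[]
               | del(._type, .__eid)
               | .label = "\(.__type): \(.name)"   # assumption: label prefix is __type
               | with_entries(.key |= sub("^_+"; "")) ]
    | del(.vertices)'
}

# Usage: to_nodes < graph.json
```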