How to flatten a tree and build a path from properties?

My goal is to flatten a filesystem-like structure (nested directories) with history information for individual files into a CSV file for further processing. Here is what I have tried so far.
The simplified input looks like this:
{ "dirs": [
{
"name": "documents",
"files": [
{
"name": "foo.bar",
"history": [
{ "hash": "123", "timestamp": "..."},
{ "hash": "234", "timestamp": "..."}
]
}
],
"subDirs": [
{ "name": "tmp", "files": [...], "subDirs": [...]
}
]
}
]}
The tricky part is that the csv file should contain full directory paths, not only the directory name. The desired output looks like this:
"documents","foo.bar","123","..."
"documents","foo.bar","234","..."
"documents","bar.baz","345","..."
"documents","bar.baz","456","..."
"documents/tmp","deleteme","567","..."
"documents/tmp","deleteme","678","..."
Flattening most of the data with recurse works, using this query:
.dirs[] | recurse(.subDirs[]?) | . as $d | $d.files[]? as $f | $f.history[]? as $h | [$d.name, $f.name, $h.hash, $h.timestamp] | @csv
...but I cannot wrap my head around how to build the full directory path. Any suggestions would be much appreciated.

Here's an approach that neither uses recursion explicitly (*) nor relies on a recursive structure:
def names($path):
reduce getpath($path[0:range(0; $path|length)]) as $v ("";
if $v | type == "object" and has("name") then . + "/" + $v["name"] else . end) ;
paths as $p
| getpath($p) as $v
| select($v | objects | has("history"))
| [names($p), getpath($p + ["name"])]
+ ($v["history"][] | [.hash, .timestamp] )
| @csv
This produces "absolute" paths (e.g. "/documents"); omitting the leading "/" can be accomplished easily enough.
(*) paths is defined recursively but in a way that takes advantage of jq's tail-call optimization (TCO), which is only applied to arity-0 recursive functions.

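I think you need to define a custom recursive function for this, like the one below. It assumes that all files have a non-empty history and that every directory has both a files and a subDirs array (use .files[]? and .subDirs[]? if either key may be missing).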
def f(pfix):
( [ pfix, .name ] | join("/") ) as $path |
( .files[] | .history[] as $hist | [ $path, .name, $hist[] ] ),
( .subDirs[] | f($path) );
.dirs[] | f(empty) | @csv
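For readers who want to cross-check the expected rows outside jq, the same prefix-passing recursion can be sketched in Python. This is only an illustration, not part of either jq answer: the function name flatten, the sample tree, and the placeholder timestamps t1..t3 are assumptions made up for the example.

```python
import csv
import io

def flatten(directory, prefix=""):
    """Yield one [dir_path, file_name, hash, timestamp] row per history entry."""
    # Extend the path with this directory's name, as f(pfix) does in the jq answer.
    path = f"{prefix}/{directory['name']}" if prefix else directory["name"]
    for file in directory.get("files", []):
        for entry in file.get("history", []):
            yield [path, file["name"], entry["hash"], entry["timestamp"]]
    # Recurse into subdirectories, handing the accumulated path down.
    for sub in directory.get("subDirs", []):
        yield from flatten(sub, path)

# Hypothetical sample data in the shape shown in the question.
tree = {"dirs": [{
    "name": "documents",
    "files": [{"name": "foo.bar", "history": [
        {"hash": "123", "timestamp": "t1"},
        {"hash": "234", "timestamp": "t2"}]}],
    "subDirs": [{"name": "tmp", "files": [{"name": "deleteme", "history": [
        {"hash": "567", "timestamp": "t3"}]}], "subDirs": []}],
}]}

rows = [row for d in tree["dirs"] for row in flatten(d)]

# Quote every field, similar in spirit to jq's @csv for string values.
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_ALL).writerows(rows)
csv_text = buffer.getvalue()
```

The key design point is the same as in the jq function: the path is a parameter of the recursion, so each level only has to append its own name.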

Related

"Transpose"/"Rotate"/"Flip" JSON elements

I would like to "transpose" (not sure that's the right word) JSON elements.
For example, I have a JSON file like this:
{
"name": {
"0": "fred",
"1": "barney"
},
"loudness": {
"0": "extreme",
"1": "not so loud"
}
}
... and I would like to generate a JSON array like this:
[
{
"name": "fred",
"loudness": "extreme"
},
{
"name": "barney",
"loudness": "not so loud"
}
]
My original JSON has many more first level elements than just "name" and "loudness", and many more names, features, etc.
For this simple example I could fully specify the transformation like this:
$ echo '{"name":{"0":"fred","1":"barney"},"loudness":{"0":"extreme","1":"not so loud"}}'| \
> jq '[{"name":.name."0", "loudness":.loudness."0"},{"name":.name."1", "loudness":.loudness."1"}]'
[
{
"name": "fred",
"loudness": "extreme"
},
{
"name": "barney",
"loudness": "not so loud"
}
]
... but this isn't feasible for the original JSON.
How can jq create the desired output while being key-agnostic for my much larger JSON file?
Yes, transpose is an appropriate word, as the following makes explicit.
The following generic helper function makes for a simple solution that is completely agnostic about the key names, both of the enclosing object and the inner objects:
# Input: an array of values
def objectify($keys):
. as $in | reduce range(0;length) as $i ({}; .[$keys[$i]] = $in[$i]);
Assuming consistency of the ordering of the inner keys
Assuming the key names in the inner objects are given in a consistent order, a solution can now be obtained as follows:
keys_unsorted as $keys
| [.[] | [.[]]] | transpose
| map(objectify($keys))
Without assuming consistency of the ordering of the inner keys
If the ordering of the inner keys cannot be assumed to be consistent, then one approach would be to order them, e.g. using this generic helper function:
def reorder($keys):
. as $in | reduce $keys[] as $k ({}; .[$k] = $in[$k]);
or if you prefer a reduce-free def:
def reorder($keys): [$keys[] as $k | {($k): .[$k]}] | add;
The "main" program above can then be modified as follows:
keys_unsorted as $keys
| (.[$keys[0]]|keys_unsorted) as $inner
| map_values(reorder($inner))
| [.[] | [.[]]] | transpose
| map(objectify($keys))
Caveat
The preceding solution only considers the key names in the first inner object.
Building upon Peak's solution, here is an alternative based on group_by to deal with arbitrary orders of inner keys.
keys_unsorted as $keys
| map(to_entries[])
| group_by(.key)
| map(with_entries(.key = $keys[.key] | .value |= .value))
Using paths is a good idea, as pointed out by Hobbs. You could also do something like this:
[ path(.[][]) as $p | { key: $p[0], value: getpath($p), id: $p[1] } ]
| group_by(.id)
| map(from_entries)
This is a bit hairy, but it works:
. as $data |
reduce paths(scalars) as $p (
[];
setpath(
[ $p[1] | tonumber, $p[0] ];
( $data | getpath($p) )
)
)
First, capture the top level as $data because . is about to get a new value in the reduce block.
Then, call paths(scalars) which gives a key path to all of the leaf nodes in the input. e.g. for your sample it would give ["name", "0"] then ["name", "1"], then ["loudness", "0"], then ["loudness", "1"].
Run a reduce on each of those paths, starting the reduction with an empty array.
For each path, construct a new path, in the opposite order, with numbers-in-strings turned into real numbers that can be used as array indices, e.g. ["name", "0"] becomes [0, "name"].
Then use getpath to get the value at the old path in $data and setpath to set a value at the new path in . and return it as the next . for the reduce.
At the end, the result will be
[
{
"name": "fred",
"loudness": "extreme"
},
{
"name": "barney",
"loudness": "not so loud"
}
]
If your real data structure might be more than two levels deep, then you would need to replace [ $p[1] | tonumber, $p[0] ] with a more appropriate expression to transform the path. Or maybe some of your "values" are objects/arrays that you want to leave alone, in which case you probably need to replace paths(scalars) with something like paths | select(length == 2).
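The path-reversal idea (a value at [field, row] moves to [row, field]) can be checked independently of jq with a small Python sketch. The function name transpose is an assumption for illustration, and the sketch assumes exactly two levels with row ids that are numeric strings, as in the sample.

```python
def transpose(columns):
    """Turn {field: {row_id: value}} into a list of {field: value} rows."""
    rows = {}
    for field, cells in columns.items():
        for row_id, value in cells.items():
            # Swap the two path components: [field, row_id] -> [row_id, field].
            rows.setdefault(row_id, {})[field] = value
    # Row ids are numeric strings in the sample, so order them as integers.
    return [rows[row_id] for row_id in sorted(rows, key=int)]

data = {
    "name": {"0": "fred", "1": "barney"},
    "loudness": {"0": "extreme", "1": "not so loud"},
}
result = transpose(data)
```

Because values are placed by key rather than by position, this variant does not depend on the inner keys appearing in a consistent order.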

JQ Recursive Tree expansion

I am attempting to parse a JSON structure to extract a dependency path, for use in an automation script.
The structure of this JSON is extracted to a format like this:
[
{
"Id": "abc",
"Dependencies": [
]
},
{
"Id": "def",
"Dependencies": [
"abc"
]
},
{
"Id": "ghi",
"Dependencies": [
"def"
]
}
]
Note: Lots of other irrelevant fields removed.
The plan is to be able to pass into my JQ command the Id of one of these and get back out a list.
Eg:
Input: abc
Expected Output: []
Input: def
Expected Output: ["abc"]
Input: ghi
Expected Output: ["abc", "def"]
Currently have a jq script like this (https://jqplay.org/s/NAhuXNYXXO):
jq
'. as $original | .[] |
select(.Id == "INPUTVARIABLE") |
[.Dependencies[]] as $level1Dep | [$original[] | select( [ .Id == $level1Dep[] ] | any )] as $level1Full | $level1Full[] |
[.Dependencies[]] as $level2Dep | [$original[] | select ( [ .Id == $level2Dep[] ] | any )] as $level2Full |
[$level1Dep[], $level2Dep[]]'
Input: abc
Output: empty
Input: def
Output: ["abc"]
Input: ghi
Output: ["def","abc"]
Great! However, as you can see, this is not particularly scalable: it only handles two dependency levels (https://jqplay.org/s/Zs0xIvJ2Zn), and it falls apart horribly when an item has multiple dependencies (https://jqplay.org/s/eB9zHQSH2r).
Is there a way of constructing this within JQ or do I need to move out to a different language?
I know that the data cannot have circular dependencies, it is pulled from a database that enforces this.
It's trivial then. Reduce your input JSON down to an object where each Id and corresponding Dependencies array are paired, and walk through it aggregating dependencies using a recursive function.
def deps($depdb; $id):
def _deps($id): $depdb[$id] // empty
| . + map(_deps(.)[]);
_deps($id);
deps(map({(.Id): .Dependencies}) | add; $fid)
Invocation:
jq -c --arg fid 'ghi' -f prog.jq file
Online demo - arbitrary dependency levels
Online demo - multiple dependencies per Id
Here's a short program that handles circular dependencies efficiently and illustrates how a subfunction can be defined after the creation of a local variable (here, $next) for efficiency:
def dependents($x):
(map( {(.Id): .Dependencies}) | add) as $next
# Input: array of dependents computed so far
# Output: array of all dependents
| def tc($x):
($next[$x] - .) as $new
| if $new == [] then .
else (. + $new | unique)
# avoid calling unique again:
| . + ([tc($new[])[]] - .)
end ;
[] | tc($x);
dependents($start)
Usage
With the given input and an invocation such as
jq --arg start START -f program.jq input.json
the output for various values of START is:
START output
abc []
def ["abc"]
ghi ["def", "abc"]
If the output must be sorted, then simply add a call to sort.
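As a cross-check of the expected outputs, the transitive-closure idea can also be sketched in Python. This is not a translation of either jq program: it is a plain breadth-first walk with a visited list, and the function name dependencies is an assumption for illustration.

```python
def dependencies(records, start):
    """Return every direct and transitive dependency of `start`, nearest first."""
    # Same lookup table the jq answers build: Id -> list of dependency ids.
    graph = {r["Id"]: r["Dependencies"] for r in records}
    seen = []
    pending = list(graph.get(start, []))
    while pending:
        current = pending.pop(0)
        if current in seen:
            # Already visited: skipping it keeps the walk finite even on cycles.
            continue
        seen.append(current)
        pending.extend(graph.get(current, []))
    return seen

records = [
    {"Id": "abc", "Dependencies": []},
    {"Id": "def", "Dependencies": ["abc"]},
    {"Id": "ghi", "Dependencies": ["def"]},
]
result_ghi = dependencies(records, "ghi")
```

Calling sorted(result_ghi) gives the alphabetical order shown in the question's expected output.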

apply jq filter on sub level only, keeping the rest of the JSON intact

Given a nested object, is it possible to only apply a filter to a sub level of the json and then have as output the original json which includes the sub level to which one has applied the filter?
So for instance if I have a json like this:
{"k1": "v1",
"k2": "v2",
"k3": "v3",
"k4": [],
"records": [{
"kk1": "vv1",
"kk2": ["vv2"],
"releases": [{"kkk1":"vvv1"},
{"parties":[{"name":"value",
"kkkk1":"value"},
{"name":"value",
"kkkk1":"value"}]}],
"kk4": "vv4",
"kk5": "vv5"},
{
"kk1": "o_vv1",
"kk2": ["o_vv2"],
"releases": [{"kkk1":"o_vvv1"},
{"parties":[{"name":"o_value",
"kkkk1":"o_value"},
{"name":"o_value",
"kkkk1":"o_value"}]}],
"kk4": "o_vv4",
"kk5": "o_vv5"}],
"k6":"v6"}
I would like as output this same json, but with the nested objects in .[].records[].releases[].parties flattened, so the desired output should be:
previous keys
...
"records": [{
"kk1": "vv1",
"kk2": ["vv2"],
"releases": [{"kkk1":"vvv1"},
{"parties.0.name":"value",
"parties.0.kkkk1":"value",
"parties.1.name":"value",
"parties.1.kkkk1": "value"}]
"kk4": "vv4",
"kk5": "vv5"},
{
"kk1": "other_vv1",
"kk2": ["other_vv2"],
"releases": [{"kkk1":"vvv1"},
{"parties.0.name":"value",
"parties.0.kkkk1":"value",
"parties.1.name":"value",
"parties.1.kkkk1": "value"}]
"kk4": "other_vv4",
"kk5": "other_vv5"},
...
following keys
Edit:
The current jq command:
jq '.[].records[].releases[].parties| . as $ho | reduce paths(scalars) as $path ({}; . + {($path | map(tostring) | join(".")): $ho | getpath($path)})'
If you want to replace only a sub-part in place and keep the rest of the document, use |= instead of | just after .parties.
So I think this will achieve your goal:
.[].records[].releases[].parties |=
(. as $ho
| reduce paths(scalars) as $path ({};
. + {($path | map(tostring) | join(".")):
$ho | getpath($path)}))
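The "update one sub-level, keep everything else" behaviour can be illustrated outside jq with a short Python sketch. The names flatten and flatten_parties and the trimmed sample document are assumptions for illustration. One deliberate difference from the jq filter: releases without a parties key are left untouched here rather than gaining an empty parties object.

```python
def flatten(value, prefix=()):
    """Map each scalar leaf to a dotted key built from its path."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {".".join(map(str, prefix)): value}
    flat = {}
    for key, child in items:
        flat.update(flatten(child, prefix + (key,)))
    return flat

def flatten_parties(document):
    """Replace each release's parties value in place; leave all other keys alone."""
    for record in document["records"]:
        for release in record["releases"]:
            if "parties" in release:
                release["parties"] = flatten(release["parties"])
    return document

# Trimmed, hypothetical version of the question's sample document.
doc = {
    "k1": "v1",
    "records": [{
        "kk1": "vv1",
        "releases": [
            {"kkk1": "vvv1"},
            {"parties": [{"name": "value", "kkkk1": "value"},
                         {"name": "value", "kkkk1": "value"}]},
        ],
    }],
    "k6": "v6",
}
out = flatten_parties(doc)
```

As with |=, the flattened keys end up under parties (for example parties -> "0.name"), not merged into the enclosing release object.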

How to recurse with jq on nested JSON where each object has a name property?

I have a nested JSON object where each level has the same property key and what distinguishes each level is a property called name. If I want to traverse down to a level which has a particular "path" of name properties, how would I formulate the jq filter?
Here is some sample JSON data that represents a file system's directory structure:
{
"subs": [
{
"name": "aaa",
"subs": [
{
"name": "bbb",
"subs": [
{
"name": "ccc",
"subs": [
{
"name": "ddd",
"payload": "xyz"
}
]
}
]
}
]
}
]
}
What's a jq filter for obtaining the value of the payload in the "path" aaa/bbb/ccc/ddd?
Prior research:
jq - select objects with given key name - helpful but looks for any element in the JSON which contains the specified name whereas I'm looking for an element that's nested under a set of objects who also have specific names.
http://arjanvandergaag.nl/blog/wrestling-json-with-jq.html - helpful in section 4, where it shows how to extract an object whose property has a particular value. However, the recursion performed there is based on a specific, known set of property names ("values[].links.clone[]"). In my case, the equivalent is just "subs[].subs[].subs[]".
Here is the basis for a generic solution:
def descend(name): .subs[] | select(.name == name);
So your particular query could be formulated as follows:
descend( "aaa") | descend( "bbb") | descend( "ccc") | descend( "ddd") | .payload
Or slightly better, still using the above definition of descend:
def path(array):
if (array|length)==0 then .
else descend(array[0]) | path(array[1:])
end;
path( ["aaa", "bbb", "ccc", "ddd"] ) | .payload
TCO
The above recursive definition of path/1 is simple enough but would be unsuitable for very deeply nested data structures, e.g. if the depth is greater than 1000. Here is an alternative definition that takes advantage of jq's tail-call optimization, and that therefore runs very quickly:
def atpath(array):
[array, .]
| until( .[0] == []; .[0] as $a | .[1] | descend($a[0]) | [$a[1:], . ] )
| .[1];
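The iterative descent performed by atpath can be mirrored in Python to sanity-check the expected result. This is a sketch under the assumption that children always live in a "subs" list; the function name at_path is made up for the example.

```python
def at_path(node, names):
    """Follow `names` through nested "subs" lists and return the matching nodes."""
    current = [node]
    for name in names:
        # One step of descend(name): keep children whose name matches.
        current = [
            sub
            for parent in current
            for sub in parent.get("subs", [])
            if sub.get("name") == name
        ]
    return current

tree = {"subs": [{"name": "aaa", "subs": [{"name": "bbb", "subs": [
    {"name": "ccc", "subs": [{"name": "ddd", "payload": "xyz"}]}]}]}]}

payloads = [n.get("payload") for n in at_path(tree, ["aaa", "bbb", "ccc", "ddd"])]
```

Like the until-based jq version, the loop keeps only the current frontier, so depth is not limited by the call stack.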
.aaa.bbb.ccc.ddd
If you want to be able to use the .aaa.bbb.ccc.ddd notation, one approach would be to begin by "flattening" the data:
def flat:
{ (.name): (if .subs then (.subs[] | flat) else .payload end) };
Since the top-level element does not have a "name" tag, the query would then be:
.subs[] | flat | .aaa.bbb.ccc.ddd
Here is a more efficient approach, once again using descend defined above:
def payload(p):
def get($array):
if $array == []
then .payload
else descend($array[0]) | get($array[1:]) end;
get( null | path(p) );
payload( .aaa.bbb.ccc.ddd )
The filter in the following jq command descends through objects whose name properties correspond to the "path" aaa/bbb/ccc/ddd, comparing each name with ==:
jq '.subs[] | select(.name == "aaa") | .subs[] | select(.name == "bbb") | .subs[] | select(.name == "ccc") | .subs[] | select(.name == "ddd") | .payload'
Here it is live on jqplay.org:
https://jqplay.org/s/tblW7UX0Si

How to use jq to find all paths to a certain key

In a very large nested json structure I'm trying to find all of the paths that end in a key.
ex:
{
"A": {
"A1": {
"foo": {
"_": "_"
}
},
"A2": {
"_": "_"
}
},
"B": {
"B1": {}
},
"foo": {
"_": "_"
}
}
would print something along the lines of:
["A","A1","foo"], ["foo"]
Unfortunately I don't know at what level of nesting the keys will appear, so I haven't been able to figure it out with a simple select. I've gotten close with jq '[paths] | .[] | select(contains(["foo"]))', but the output contains all the permutations of any tree that contains foo.
output: ["A", "A1", "foo"]["A", "A1", "foo", "_"]["foo"][ "foo", "_"]
Bonus points if I could keep the original data structure format but simply filter out all paths that don't contain the key (in this case the sub trees under "foo" wouldn't need to be hidden).
With your input:
$ jq -c 'paths | select(.[-1] == "foo")'
["A","A1","foo"]
["foo"]
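The selection rule (keep a path only if its last component equals the key) can be reproduced in Python for comparison. The generator name paths_ending_in is an assumption for illustration; the data is the sample from the question.

```python
def paths_ending_in(value, key, prefix=()):
    """Yield every path (as a list) whose final component equals `key`."""
    if isinstance(value, dict):
        for k, child in value.items():
            path = prefix + (k,)
            if k == key:
                yield list(path)
            # Keep descending: the key may occur again further down.
            yield from paths_ending_in(child, key, path)
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from paths_ending_in(child, key, prefix + (i,))

data = {
    "A": {"A1": {"foo": {"_": "_"}}, "A2": {"_": "_"}},
    "B": {"B1": {}},
    "foo": {"_": "_"},
}
found = list(paths_ending_in(data, "foo"))
```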
Bonus points:
(1) If your jq has tostream:
$ jq 'fromstream(tostream| select(.[0]|index("foo")))'
Or better yet, since your input is large, you can use the streaming parser (jq -n --stream) with this filter:
fromstream( inputs|select( (.[0]|index("foo"))))
(2) Whether or not your jq has tostream:
. as $in
| reduce (paths(scalars) | select(index("foo"))) as $p
(null; setpath($p; $in|getpath($p)))
In all three cases, the output is:
{
"A": {
"A1": {
"foo": {
"_": "_"
}
}
},
"foo": {
"_": "_"
}
}
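The pruning step can likewise be sketched in Python. This mirrors the paths(scalars) variant only loosely, and the function name prune is an assumption. Two simplifications are made for brevity: lists are treated as leaves, and a JSON null leaf (Python None) would be dropped because None doubles as the "nothing kept" marker.

```python
def prune(value, key, inside=False):
    """Keep only leaves whose path passes through `key`; return None if nothing survives."""
    if isinstance(value, dict):
        kept = {}
        for k, child in value.items():
            result = prune(child, key, inside or k == key)
            if result is not None:
                kept[k] = result
        # An object with no surviving leaves is dropped entirely.
        return kept or None
    return value if inside else None

data = {
    "A": {"A1": {"foo": {"_": "_"}}, "A2": {"_": "_"}},
    "B": {"B1": {}},
    "foo": {"_": "_"},
}
pruned = prune(data, "foo")
```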
I had the same fundamental problem.
With (yaml) input like:
developer:
android:
members:
- alice
- bob
oncall:
- bob
hr:
members:
- charlie
- doug
this:
is:
really:
deep:
nesting:
members:
- example deep nesting
I wanted to find all arbitrarily nested groups and get their members.
Using this:
yq . | # convert yaml to json using python-yq
jq '
. as $input | # Save the input for later
. | paths | # Get the list of paths
select(.[-1] | tostring | test("^(members|oncall|priv)$"; "ix")) | # Only find paths which end with members, oncall, and priv
. as $path | # save each path in the $path variable
( $input | getpath($path) ) as $members | # Get the value of each path from the original input
{
"key": ( $path | join("-") ), # The key is the join of all path keys
"value": $members # The value is the list of members
}
' |
jq -s 'from_entries' | # collect kv pairs into a full object using slurp
yq --sort-keys -y . # Convert back to yaml using python-yq
I get output like this:
developer-android-members:
- alice
- bob
developer-android-oncall:
- bob
hr-members:
- charlie
- doug
this-is-really-deep-nesting-members:
- example deep nesting