Use jq to parse the key path for each leaf - json

(I'm not sure of the right technical terms for this, but I can update the question if someone can clarify the terminology I'm lacking for what I'm trying to do. It might help someone find this answer in the future.)
Given the input JSON, how would I use jq to produce the expected output?
Input:
{
"items": {
"item1": {
"part1": {
"a": {
"key1": "value",
"key2": "value"
},
"b": {
"key1": "value",
"key2": "value"
}
},
"part2": {
"c": {
"key1": "value",
"key2": "value"
},
"d": {
"key1": "value",
"key2": "value"
}
}
},
"item2": {
"part3": {
"e": {
"key1": "value",
"key2": "value"
},
"f": {
"key1": "value",
"key2": "value"
}
},
"part4": {
"g": {
"key1": "value",
"key2": "value"
},
"h": {
"key1": "value",
"key2": "value"
}
}
}
}
}
Expected output:
{
"item1": [
"part1.a",
"part1.b",
"part2.c",
"part2.d"
],
"item2": [
"part3.e",
"part3.f",
"part4.g",
"part4.h"
]
}

Try this:
.items | map_values([path(.[][]) | join(".")])
Each output path will contain as many path components as the number of []s in the .[][] part; in other words, if you change .[][] to .[][][], for example, you'll see part1.a.key1, part1.a.key2, etc.
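A quick way to see the depth rule in action, assuming jq is on your PATH. The input is a hypothetical, cut-down copy of the question's data (fewer leaves, same shape), so the outputs stay short:

```shell
# Hypothetical cut-down version of the question's input (same shape, fewer leaves).
input='{"items":{"item1":{"part1":{"a":{"key1":"value","key2":"value"}}},"item2":{"part3":{"e":{"key1":"value"}}}}}'

# Two []s: every path has two components, e.g. "part1.a".
two=$(printf '%s\n' "$input" | jq -c '.items | map_values([path(.[][]) | join(".")])')

# Three []s: one more component per path, e.g. "part1.a.key1".
three=$(printf '%s\n' "$input" | jq -c '.items | map_values([path(.[][][]) | join(".")])')

printf '%s\n%s\n' "$two" "$three"
```

The first line prints {"item1":["part1.a"],"item2":["part3.e"]}; the second adds the key level, e.g. "part1.a.key1" and "part1.a.key2".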

This would do it:
# Output: a stream
def keyKey:
keys_unsorted[] as $k | $k + "." + (.[$k] | keys_unsorted[]);
.items | map_values([keyKey])
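Because keyKey is built on keys_unsorted, it keeps keys in document order instead of sorting them. A minimal check, assuming jq is installed; the deliberately out-of-order input below is hypothetical:

```shell
# Hypothetical input whose keys are NOT in alphabetical order.
input='{"items":{"item1":{"part2":{"d":{},"c":{}},"part1":{"b":{},"a":{}}}}}'

result=$(printf '%s\n' "$input" | jq -c '
  # Emits "key.childKey" for every key and each of its child keys, in document order.
  def keyKey:
    keys_unsorted[] as $k | $k + "." + (.[$k] | keys_unsorted[]);
  .items | map_values([keyKey])
')
printf '%s\n' "$result"
```

This prints {"item1":["part2.d","part2.c","part1.b","part1.a"]}. Swapping both keys_unsorted calls for keys would give part1.a, part1.b, part2.c, part2.d instead.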

Some aspects are underspecified. For instance, you don't specify how deep the aggregation should go for the array items. Is it always two levels deep, or is it the whole tree except the last level?
Here's one way to go two levels deep, with the keys sorted alphabetically:
jq '.items | .[] |= [keys[] as $k | $k + "." + (.[$k] | keys[])]'
Here's another way, which goes down to the second-to-last level:
jq '.items | .[] |= ([path(.. | scalars)[:-1] | join(".")] | unique)'
Output:
{
"item1": [
"part1.a",
"part1.b",
"part2.c",
"part2.d"
],
"item2": [
"part3.e",
"part3.f",
"part4.g",
"part4.h"
]
}
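The two filters only give different results when the tree isn't uniformly deep. A sketch that makes the difference visible, assuming jq is installed and using a hypothetical input in which part2 has one extra level:

```shell
# Hypothetical input: part2.c has one more level of nesting than part1.a.
input='{"items":{"item1":{"part1":{"a":{"key1":"value"}},"part2":{"c":{"x":{"key1":"value"}}}}}}'

# Fixed depth: always exactly two levels, keys sorted.
fixed=$(printf '%s\n' "$input" | jq -c '.items | .[] |= [keys[] as $k | $k + "." + (.[$k] | keys[])]')

# Variable depth: each leaf path minus its last component, deduplicated (unique also sorts).
parents=$(printf '%s\n' "$input" | jq -c '.items | .[] |= ([path(.. | scalars)[:-1] | join(".")] | unique)')

printf '%s\n%s\n' "$fixed" "$parents"
```

The fixed-depth filter stops at "part2.c", while the path-based one follows the extra level to "part2.c.x".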

json2jqpath.jq returns the unique sequence of jq paths (as chains of keys) to every node, down to each and every leaf:
json2jqpath.jq dat.json
.
.items
.items|.item1
.items|.item1|.part1
.items|.item1|.part1|.a
.items|.item1|.part1|.a|.key1
.items|.item1|.part1|.a|.key2
.items|.item1|.part1|.b
.items|.item1|.part1|.b|.key1
.items|.item1|.part1|.b|.key2
.items|.item1|.part2
.items|.item1|.part2|.c
.items|.item1|.part2|.c|.key1
.items|.item1|.part2|.c|.key2
.items|.item1|.part2|.d
.items|.item1|.part2|.d|.key1
.items|.item1|.part2|.d|.key2
.items|.item2
.items|.item2|.part3
.items|.item2|.part3|.e
.items|.item2|.part3|.e|.key1
.items|.item2|.part3|.e|.key2
.items|.item2|.part3|.f
.items|.item2|.part3|.f|.key1
.items|.item2|.part3|.f|.key2
.items|.item2|.part4
.items|.item2|.part4|.g
.items|.item2|.part4|.g|.key1
.items|.item2|.part4|.g|.key2
.items|.item2|.part4|.h
.items|.item2|.part4|.h|.key1
.items|.item2|.part4|.h|.key2
It is not the output you asked for, but as another answer noted, your question may be somewhat underspecified. Starting from a preprocessed structure such as this has the advantage of reducing every JSON file to its set of paths, which you can then start fiddling with.
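The json2jqpath.jq script itself isn't shown here, so the following is not that script. It is a hypothetical stand-in built on jq's own path(..), which visits every node (root included) in document order. How array indices and keys that aren't plain identifiers are rendered (.[0], .["key-2"]) is an assumption of this sketch:

```shell
# Small hypothetical input: one non-identifier key ("key-2") and one array.
input='{"items":{"item1":{"key1":"value","key-2":"value"},"tags":["x"]}}'

# path(..) emits the path of every node, root first, in document order.
paths=$(printf '%s\n' "$input" | jq -r '
  path(..)
  | if length == 0 then "."
    else map(
           if type == "number" then ".[\(.)]"                  # array index
           elif test("^[A-Za-z_][A-Za-z0-9_]*$") then "." + .  # plain key
           else ".[\(tojson)]"                                 # key that needs quoting
           end)
         | join("|")
    end')
printf '%s\n' "$paths"
```

For the question's dat.json (no arrays, only identifier-like keys) this should print the same 32 lines as above.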

Related

jq - get dict element based on key regex

I'm working with a JSON object having the following structure:
{
"key-foo-1.0": [
{
"key1": "foo",
"key2": "bar",
"id": "01"
},
{
"key1": "foo",
"key2": "bar",
"id": "23"
}
],
"key-bar-1.0": [
{
"key1": "foo",
"key2": "bar",
"id": "45"
},
{
"key1": "foo",
"key2": "bar",
"id": "67"
}
],
"key-baz-1.0": [
{
"key1": "foo",
"key2": "bar",
"id": "89"
}
]
}
I want to get all the id values where the "parent" key name matches the pattern .*foo.* or .*bar.*.
So in my example something like this:
cat json | jq <some filter>
01
23
45
67
Based on https://unix.stackexchange.com/questions/443884/match-keys-with-regex-in-jq I tried:
$ cat json | jq 'with_entries(if (.key|test(".*foo.*$")) then ( {key: .key, value: .value } ) else empty end )'
{
"key-foo-1.0": [
{
"key1": "foo",
"key2": "bar",
"id": "01"
},
{
"key1": "foo",
"key2": "bar",
"id": "23"
}
]
}
But I don't really know how to continue.
I also think there is a better/simpler solution.
You could go with:
jq -r '.[keys_unsorted[] | select(test(".*foo.*|.*bar.*"))][].id'
01
23
45
67
This gathers all keys using keys_unsorted, then selects those matching the regular expression in test. The wrapping .[…] descends into them, the following [] iterates over the children, and .id extracts each id, which the -r flag prints as raw text (without quotes).
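test() looks for a match anywhere in the string (it isn't anchored), so the leading and trailing .* in the pattern aren't needed; "foo|bar" selects the same keys. A sketch assuming jq is installed, with the question's data trimmed to the id fields (the trimming is hypothetical):

```shell
# The question's data, trimmed to the id fields.
input='{"key-foo-1.0":[{"id":"01"},{"id":"23"}],"key-bar-1.0":[{"id":"45"},{"id":"67"}],"key-baz-1.0":[{"id":"89"}]}'

# test() matches anywhere in the key, so no leading/trailing .* is needed.
by_keys=$(printf '%s\n' "$input" | jq -r '.[keys_unsorted[] | select(test("foo|bar"))][].id')

# The same selection via to_entries, for comparison.
by_entries=$(printf '%s\n' "$input" | jq -r 'to_entries[] | select(.key | test("foo|bar")) | .value[].id')

printf '%s\n' "$by_keys"
```

Both print 01, 23, 45 and 67 on separate lines; key-baz-1.0 is skipped.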
You can use the following jq expression:
jq -r 'to_entries[] | select(.key | test(".*foo.*|.*bar.*")) | .value[] | .id'

Select objects from file A where the value of a path appears in file B in jq

I want to use jq to filter the objects in the JSON content of this fileA:
[
{
"id": "bird",
"content": {
"key1": "a"
}
},
{
"id": "dog",
"content": {
"key1": "b"
}
},
{
"id": "cat",
"content": {
"key1": "c"
}
}
]
Only the objects whose id appears in the JSON content of this fileB, under the key theId, should be kept (the sort order doesn't matter):
[
{
"theId": "cat"
},
{
"theId": "bird"
}
]
Expected result (the sort order doesn't matter):
[
{
"id": "cat",
"content": {
"key1": "c"
}
},
{
"id": "bird",
"content": {
"key1": "a"
}
}
]
I think I could do this in a bash loop:
loop over the ids from fileB
run jq to extract each given id, such as
jq -c "map(select(.id | contains(\"$id\")))"
but then I have to join the results with commas myself, which seems dirty.
I don't know how to tell jq that the filter is made up of the values of the array stored in fileB.
Is it possible?
Here is one way:
$ jq 'map(.theId) as $ids | input | map(select(.id | IN($ids[])))' fileB fileA
[
{
"id": "bird",
"content": {
"key1": "a"
}
},
{
"id": "cat",
"content": {
"key1": "c"
}
}
]
A simple solution using --slurpfile:
jq --slurpfile b fileB 'map(select(.id|IN($b[][].theId)))' fileA
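Both answers can be checked side by side. This sketch writes the question's two files into a temporary directory (the file locations are hypothetical) and assumes jq 1.6 or later, since IN is not available in older releases:

```shell
dir=$(mktemp -d)
cat > "$dir/fileA" <<'EOF'
[{"id":"bird","content":{"key1":"a"}},
 {"id":"dog","content":{"key1":"b"}},
 {"id":"cat","content":{"key1":"c"}}]
EOF
cat > "$dir/fileB" <<'EOF'
[{"theId":"cat"},{"theId":"bird"}]
EOF

# Approach 1: fileB is the main input; `input` then reads the next file, fileA.
r1=$(jq -c 'map(.theId) as $ids | input | map(select(.id | IN($ids[])))' "$dir/fileB" "$dir/fileA")

# Approach 2: --slurpfile binds $b to an array holding every JSON value in fileB.
r2=$(jq -c --slurpfile b "$dir/fileB" 'map(select(.id | IN($b[][].theId)))' "$dir/fileA")

printf '%s\n%s\n' "$r1" "$r2"
rm -r "$dir"
```

Both keep bird and cat in fileA's order and drop dog.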

Transforming json file using jq so data ends up under a common object

I have some JSON data from an API that I need to transform with jq into another JSON format, for later use with Ansible as part of an inventory script.
What I have is something like:
{
"results": [
{
"name": "hostname1",
"key1": "somevalue1",
"key2": "somevalue2",
"key3": "somevalue3"
},
{
"name": "hostname2",
"key1": "somevalue12",
"key2": "somevalue22",
"key3": "somevalue32"
},
{
"name": "hostname3",
"key1": "somevalue13",
"key2": "somevalue23",
"key3": "somevalue33"
}
]
}
and I need to transform it to look like this:
{
"_meta": {
"hostvars": {
"hostname1": {
"name": "hostname1",
"key1": "somevalue1",
"key2": "somevalue2",
"key3": "somevalue3"
},
"hostname2": {
"name": "hostname2",
"key1": "somevalue12",
"key2": "somevalue22",
"key3": "somevalue32"
},
"hostname3": {
"name": "hostname3",
"key1": "somevalue13",
"key2": "somevalue23",
"key3": "somevalue33"
}
}
}
}
Things I have not been able to figure out:
If I use something like:
cat input.json | jq '{_meta: { hostvars: { (.results[].name): (.results[]) } }}'
then _meta and hostvars are repeated for each object in the input, which is not at all what I want; I need a common "header" with the data under it.
Ideally I would also like to exclude the "name" field from the output, since it is already used as the key and would be duplicated, but this is just a bonus.
Any advice on how to do this? Or is the filter in jq always run against one object at a time?
I experimented a bit with --slurp but didn't get anywhere.
The crucial part is: take an array of objects and transform it into one (outer) object whose attribute names are given by some attribute value of the inner objects. That's exactly the job description of INDEX(filter).
So:
{ _meta : { hostvars: ( .results | INDEX(.name) ) } }
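A minimal run of this filter, assuming jq 1.6 or later (INDEX is not available in older releases). The input is a hypothetical, trimmed version of the API response with two hosts and one extra key each:

```shell
# Hypothetical, trimmed version of the question's API response.
input='{"results":[{"name":"hostname1","key1":"somevalue1"},{"name":"hostname2","key1":"somevalue12"}]}'

# INDEX(.name) folds the array into one object keyed by each element's .name.
inventory=$(printf '%s\n' "$input" | jq -c '{ _meta : { hostvars: ( .results | INDEX(.name) ) } }')
printf '%s\n' "$inventory"
```

Note that INDEX keeps the whole element as the value, so "name" still appears inside each host object.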
Using .results[] twice iterates over the same list twice, giving you the Cartesian product of every host with every object (3 x 3 = 9 outputs). You need to reference it only once!
You can do something like the following. The key to the logic is forming an array of objects, each one having .name as its key and the whole sub-object as its value.
Once you have the array, you can merge its objects into a single object using add.
{ _meta : { hostvars: ( ( .results | map( { ( .name ) : . } ) | add ) ) } }
Yet another approach using from_entries could be:
jq '.results | map({key: .name, value: .}) | {_meta: {hostvars: from_entries}}'
{
"_meta": {
"hostvars": {
"hostname1": {
"name": "hostname1",
"key1": "somevalue1",
"key2": "somevalue2",
"key3": "somevalue3"
},
"hostname2": {
"name": "hostname2",
"key1": "somevalue12",
"key2": "somevalue22",
"key3": "somevalue32"
},
"hostname3": {
"name": "hostname3",
"key1": "somevalue13",
"key2": "somevalue23",
"key3": "somevalue33"
}
}
}
}
You'll probably want to take advantage of jq pipes.
cat input.json | jq '{ _meta : { hostvars : (.results | map({key : .name, value : del(.name) }) | from_entries) }}'
Map the results to a list of entries where the key is .name and the value is the object itself with .name removed:
.results | map({key : .name, value : del(.name) })
Then, form an object from the entries:
... | from_entries
NOTE: See Inian's answer for why the array syntax used by the OP does not work.
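This is the variant that also covers the bonus from the question (no duplicated "name" inside each host). A sketch assuming jq is installed, run against a hypothetical, trimmed version of the input:

```shell
# Hypothetical, trimmed version of the question's API response.
input='{"results":[{"name":"hostname1","key1":"somevalue1"},{"name":"hostname2","key1":"somevalue12"}]}'

# Each result becomes a {key, value} entry; del(.name) drops the field that became the key.
inventory=$(printf '%s\n' "$input" | jq -c '{ _meta : { hostvars : (.results | map({key: .name, value: del(.name)}) | from_entries) } }')
printf '%s\n' "$inventory"
```

Both .name and del(.name) are evaluated against the same result object, so the key is read before the field is removed from the value.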

jq: sort object values

I want to sort this data structure by the object keys (easy with -S) and sort the object values (the arrays) by the 'foo' property.
I can sort them with
jq -S '
. as $in
| keys[]
| . as $k
| $in[$k] | sort_by(.foo)
' < test.json
... but that loses the keys.
I've tried variations of adding | { "\($k)": . }, but then I end up with a list of objects instead of one object. I also tried variations of adding to $in (same problem) or using $in = $in * { ... }, but that gives me syntax errors.
The one solution I did find was to just have the separate objects and then pipe it into jq -s add, but ... I really wanted it to work the other way. :-)
Test data below:
{
"": [
{ "foo": "d" },
{ "foo": "g" },
{ "foo": "f" }
],
"c": [
{ "foo": "abc" },
{ "foo": "def" }
],
"e": [
{ "foo": "xyz" },
{ "foo": "def" }
],
"ab": [
{ "foo": "def" },
{ "foo": "abc" }
]
}
Maybe this?
jq -S '.[] |= sort_by(.foo)'
Output
{
"": [
{
"foo": "d"
},
{
"foo": "f"
},
{
"foo": "g"
}
],
"ab": [
{
"foo": "abc"
},
{
"foo": "def"
}
],
"c": [
{
"foo": "abc"
},
{
"foo": "def"
}
],
"e": [
{
"foo": "def"
},
{
"foo": "xyz"
}
]
}
user197693 had a great answer. Another suggestion, which I got in a private message elsewhere, was to use:
jq -S 'with_entries(.value |= sort_by(.foo))'
If for some reason the -S command-line option is not satisfactory, you can also perform the by-key sort with the to_entries | sort_by(.key) | from_entries idiom. So a complete solution to the problem would be:
.[] |= sort_by(.foo)
| to_entries | sort_by(.key) | from_entries
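To confirm that the idiom really replaces -S, this sketch (assuming jq is installed) runs the complete solution without -S and compares it with the -S version on the question's test data, compacted to one line:

```shell
# The question's test data on one line.
input='{"":[{"foo":"d"},{"foo":"g"},{"foo":"f"}],"c":[{"foo":"abc"},{"foo":"def"}],"e":[{"foo":"xyz"},{"foo":"def"}],"ab":[{"foo":"def"},{"foo":"abc"}]}'

# No -S: key order comes from sort_by(.key) on the entries.
# (|= binds tighter than |, so the arrays are sorted first.)
by_entries=$(printf '%s\n' "$input" | jq -c '.[] |= sort_by(.foo) | to_entries | sort_by(.key) | from_entries')

# With -S: jq sorts the keys when printing instead.
by_flag=$(printf '%s\n' "$input" | jq -c -S '.[] |= sort_by(.foo)')

printf '%s\n%s\n' "$by_entries" "$by_flag"
```

Both lines come out identical, with the keys in the order "", "ab", "c", "e".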

jq map object key value to array of objects containing both

I would like to put each object's parent key inside the object itself and convert the key-value pairs into an array.
Given:
{
"field1": {
"key1": 11,
"key2": 10
},
"field2": {
"key1": 11,
"key2": 10
}
}
Desired output
[
{"name": "field1", "key1": 11, "key2": 10},
{"name": "field2", "key1": 11, "key2": 10}
]
I know that jq keys would give me ["field1", "field2"] and jq '[.[]]' would give
[
{ "key1": 11, "key2": 10 },
{ "key1": 11, "key2": 10 }
]
I can't figure out a way to combine them. How should I go about it?
Generate an object in {"name": <key>} form for each key, and merge that with the key's value.
to_entries | map({name: .key} + .value)
or:
[keys_unsorted[] as $k | {name: $k} + .[$k]]
Something like the following: get the list of keys with keys[] and, for each key $k, add a name field to the object found at .[$k].
jq '[ keys[] as $k | { name: $k } + .[$k] ]'
If you want the ordering of keys maintained, use keys_unsorted[].
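The ordering difference only shows up when the input keys aren't already alphabetical. A sketch assuming jq is installed, with a hypothetical input whose keys are in reverse order:

```shell
# Hypothetical input whose keys are in reverse alphabetical order.
input='{"zeta":{"key1":1},"alpha":{"key1":2}}'

# keys sorts the key names; keys_unsorted keeps document order.
with_keys=$(printf '%s\n' "$input" | jq -c '[ keys[] as $k | { name: $k } + .[$k] ]')
with_unsorted=$(printf '%s\n' "$input" | jq -c '[ keys_unsorted[] as $k | { name: $k } + .[$k] ]')

# to_entries also preserves document order.
with_entries=$(printf '%s\n' "$input" | jq -c 'to_entries | map({name: .key} + .value)')

printf '%s\n%s\n%s\n' "$with_keys" "$with_unsorted" "$with_entries"
```

Because {name: $k} is on the left of +, name comes first in each resulting object.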