"Transposing" objects in jq - json

I'm unsure if "transpose" is the correct term here, but I'm looking to use jq to transpose a 2-dimensional object such as this:
[
{
"name": "A",
"keys": ["k1", "k2", "k3"]
},
{
"name": "B",
"keys": ["k2", "k3", "k4"]
}
]
I'd like to transform it to:
{
"k1": ["A"],
"k2": ["A", "B"],
"k3": ["A", "B"],
"k4": ["A"],
}
I can split out the object with .[] | {key: .keys[], name} to get a list of keys and names, or I could use .[] | {(.keys[]): [.name]} to get a collection of key–value pairs {"k1": ["A"]} and so on, but I'm unsure of the final concatenation step for either approach.
Are either of these approaches heading in the right direction? Is there a better way?

This should work:
map({ name, key: .keys[] })
| group_by(.key)
| map({ key: .[0].key, value: map(.name) })
| from_entries
The basic approach is to convert each object to name/key pairs, regroup them by key, then map them out to entries of an object.
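To see what the intermediate stages look like, here is (roughly) the data after the first two steps for the sample input; note that group_by also sorts the groups by key:
map({ name, key: .keys[] })
# => [{"name":"A","key":"k1"}, {"name":"A","key":"k2"}, {"name":"A","key":"k3"},
#     {"name":"B","key":"k2"}, {"name":"B","key":"k3"}, {"name":"B","key":"k4"}]
group_by(.key)
# => [[{"name":"A","key":"k1"}],
#     [{"name":"A","key":"k2"},{"name":"B","key":"k2"}],
#     [{"name":"A","key":"k3"},{"name":"B","key":"k3"}],
#     [{"name":"B","key":"k4"}]]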
This produces the following output:
{
"k1": [ "A" ],
"k2": [ "A", "B" ],
"k3": [ "A", "B" ],
"k4": [ "B" ]
}

Here is a simple solution that may also be easier to understand. It is based on the idea that a dictionary (a JSON object) can be extended by adding details about additional (key -> value) pairs:
# input: a dictionary to be extended by key -> value
# for each key in keys
def extend_dictionary(keys; value):
reduce keys[] as $key (.; .[$key] += [value]);
reduce .[] as $o ({}; extend_dictionary($o.keys; $o.name) )
$ jq -c -f transpose-object.jq input.json
{"k1":["A"],"k2":["A","B"],"k3":["A","B"],"k4":["B"]}

Here is a better solution for the case that all the values of "name"
are distinct. It is better because it uses a completely generic
filter, invertMapping; that is, invertMapping could be a built-in or
library function. With the help of this function, the solution
becomes a simple three-liner.
Furthermore, if the values of "name" are not all unique, then the solution
below can easily be tweaked by modifying the initial reduction of the input
(i.e. the line immediately above the invocation of invertMapping).
# input: a JSON object of (key, values) pairs, in which "values" is an array of strings;
# output: a JSON object representing the inverse relation
def invertMapping:
reduce to_entries[] as $pair
({}; reduce $pair.value[] as $v (.; .[$v] += [$pair.key] ));
map( { (.name) : .keys} )
| add
| invertMapping
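For the sample input, the last three lines first build the forward mapping, which invertMapping then turns inside out:
map( { (.name) : .keys} ) | add
# => {"A": ["k1","k2","k3"], "B": ["k2","k3","k4"]}
invertMapping
# => {"k1":["A"], "k2":["A","B"], "k3":["A","B"], "k4":["B"]}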


semantics of map on a sequence of objects in jq

Suppose I have a file fruit.json containing the following lines:
[
{
"name": "apple",
"color": "red",
"price": 20
},
{
"name": "banana",
"color": "yellow",
"price": 15
},
{
"name": "pineapple",
"color": "orange",
"price": 53
}
]
If I do jq '. | map(.)' fruit.json then I get the original data. That's expected. The second . refers to an element in the entire array.
However if I do jq '.[] | map(.)' fruit.json then I get this:
[
"apple",
"red",
20
]
[
"banana",
"yellow",
15
]
[
"pineapple",
"orange",
53
]
Can someone please explain what's going on? Specifically:

1. The [] after . strips away the brackets from the input array. Do we have a name for the [] operator? The manual seems to treat it as something very basic without definition.

2. Do we have a name for the resulting thing by appending [] to .? Obviously it's not an object. If we do jq '.[]' fruit.json we can see that it looks very similar to an array. But apparently it behaves quite differently.

3. Why is it the case that the map function seems to go two levels inside instead of one? This is more obvious if we do jq '.[] | map(. | length)' fruit.json and see that the . inside the map function refers to the value part of an (object) element of the input array.

Thank you all in advance!
.[] produces the values of the array or object given to it.
For example,
[ "a", "b", "c" ] | .[]
is equivalent to
[ "a", "b", "c" ] | .[0], .[1], .[2]
and produces three strings: a, b and c.
map( ... )
is equivalent to
[ .[] | ... ]
This means that
map( . ) ≡ [ .[] | . ] ≡ [ .[] ]
For an array, that means
map( . ) ≡ [ .[0], .[1], ... ] ≡ .
For an object, that means
map( . ) ≡ [ .["key1"], .["key2"], ... ]
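For instance, a quick illustration of the object case:
$ echo '{"a": 1, "b": 2}' | jq -c 'map(.)'
[1,2]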
The [] after . strips away the brackets from the input array.
There are no brackets. jq programs don't operate on JSON text, but on the data structure it represents.
When given an array or object, .[] produces the values of the elements of that array or object.
Do we have a name for the [] operator?
The docs call it the Array/Object Value Iterator, but it's really just a specific usage of the indexing operator.
The docs ascribe that name to .[] specifically, but that's not quite accurate: it doesn't have to be . before it, only some expression must precede it. That preceding expression is what distinguishes it from the array construction operator.
In technical terms,
[] as a circumfix operator ([ EXPR ]) is the array construction operator, and
[] as a postfix operator (EXPR [ EXPR? ]) is the indexing operator, and it's specifically called the array/object value iterator when there's nothing in the brackets.
Do we have a name for the resulting thing by appending [] to .? Obviously it's not an object. If we do jq '.[]' fruit.json we can see that it looks very similar to an array. But apparently it behaves quite differently.
We call that a stream.
I'm not sure what to call the components of the stream. I usually use "value".
For example,
"a", "b", "c" // Produces a stream of three values.
"abc" / "" | .[] // Same
When serialized to a file with one value per line (as you would get using -c), it's called "JSON lines" with a suggested naming convention of .jsonl.
Why is it the case that the map function seems to go two levels inside instead of one? This is more obvious if we do jq '.[] | map(. | length)' fruit.json and see that the . inside the map function refers to the value part of an (object) element of the input array.
No, just one.
In that example,
The .[] iterates over the values of the array.
The map iterates over the values of the objects.
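To make the "one level" point concrete, compare the two with the fruit.json above (the length of a string is its character count, and the length of a number is its absolute value):
$ jq -c 'map(length)' fruit.json        # map alone goes one level down: lengths of the three objects
[3,3,3]
$ jq -c '.[] | map(length)' fruit.json  # .[] goes one level down, then map goes one more
[5,3,20]
[6,6,15]
[9,6,53]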

jq with multiple select statements and an array

I've got some JSON like the following (I've filtered the output here):
[
{
"Tags": [
{
"Key": "Name",
"Value": "example1"
},
{
"Key": "Irrelevant",
"Value": "irrelevant"
}
],
"c7n:MatchedFilters": [
"tag: example_tag_rule"
],
"another_key": "another_value_I_dont_want"
},
{
"Tags": [
{
"Key": "Name",
"Value": "example2"
}
],
"c7n:MatchedFilters": [
"tag:example_tag_rule",
"tag: example_tag_rule2"
]
}
]
I'd like to create a csv file with the value within the Name key and all of the "c7n:MatchedFilters" in the array. I've made a few attempts but still can't get quite the output I expect. There's some example code and the output below:
#Prints the key that I'm after.
cat new.jq | jq '.[] | [.Tags[], {"c7n:MatchedFilters"}] | .[] | select(.Key=="Name")|.Value'
"example1"
"example2"
#Prints all the filters in an array I'm after.
cat new.jq | jq -r '.[] | [.Tags[], {"c7n:MatchedFilters"}] | .[] | select(."c7n:MatchedFilters") | .[]'
[
"tag: example_tag_rule"
]
[
"tag:example_tag_rule",
"tag: example_tag_rule2"
]
#Prints *all* the tags (including ones I don't want) and all the filters in the array I'm after.
cat new.jq | jq '.[] | [.Tags[], {"c7n:MatchedFilters"}] | select((.[].Key=="Name") and (.[]."c7n:MatchedFilters"))'
[
{
"Key": "Name",
"Value": "example1"
},
{
"Key": "Irrelevant",
"Value": "irrelevant"
},
{
"c7n:MatchedFilters": [
"tag: example_tag_rule"
]
}
]
[
{
"Key": "Name",
"Value": "example2"
},
{
"c7n:MatchedFilters": [
"tag:example_tag_rule",
"tag: example_tag_rule2"
]
}
]
I hope this makes sense, let me know if I've missed anything.
Your attempts are not working because you start out with [.Tags[], {"c7n:MatchedFilters"}] to construct one array containing all the tags and an object containing the filters. You are then struggling to find a way to process this entire array at once because it jumbles together these unrelated things without any distinction. You will find it much easier if you don't combine them in the first place!
You want to find the single tag with a Key of "Name". Here's one way to find that:
first(
.Tags[]|
select(.Key=="Name")
).Value as $name
By using a variable binding we can save it for later and worry about constructing the array separately.
You say (in the comments) that you just want to concatenate the filters with spaces. You can do that easily enough:
(
."c7n:MatchedFilters"|
join(" ")
) as $filters
You can combine all this together as follows. Note that each variable binding leaves the input stream unchanged, so it's easy to compose everything.
jq --raw-output '
.[]|
first(
.Tags[]|
select(.Key=="Name")
).Value as $name|
(
."c7n:MatchedFilters"|
join(" ")
) as $filters|
[$name, $filters]|
@csv
'
Hopefully that's easy enough to read and separates out each concept. We break up the array into a stream of objects. For each object, we find the name and bind it to $name, we concatenate the filters and bind them to $filters, then we construct an array containing both, then we convert the array to a CSV string.
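With the sample input above, the output should look something like this (@csv quotes each string field):
"example1","tag: example_tag_rule"
"example2","tag:example_tag_rule tag: example_tag_rule2"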
We don't need to use variables. We could just have a big array constructor wrapped around the expression to find the name and the expression to find the filters. But I hope you can see the variables make things a bit flatter and easier to understand.

Use JQ to select specific, arbitrarily nested objects from JSON

I'm looking for efficient means to search through an large JSON object for "sub-objects" that match a filter (via select(), I imagine). However, the top-level JSON is an object with arbitrary nesting contained within, including more simple values, objects and arrays of objects. For example:
{
"name": "foo",
"class": "system",
"description": "top-level-thing",
"configuration": {
"status": "normal",
"uuid": "id"
},
"children": [
{
"id": "c1",
"class": "c1",
"children": [
{
"id": "c1.1",
"class": "c1.1"
},
{
"id": "c1.1",
"class": "FINDME"
}
]
},
{
"id": "c2",
"class": "FINDME"
}
],
"thing": {
"id": "c3",
"class": "FINDME"
}
}
I have a solution which does part of what I want (and is understandable):
jq -r '.. | arrays | .[] | select(.class=="FINDME"?) | .id'
which returns:
c2
c1.1
... however, it misses c3, plus it changes the order of the items in the output. Additionally, since I expect this to operate on potentially very large JSON structures, I would like to make sure I find an efficient solution. Bonus points for something that remains readable by jq neophytes (myself included).
FWIW, references I was using to help me on the way, in case they help others:
Select objects based on value of variable in object using jq
How to use jq to find all paths to a certain key
Recursive search values by key
For small to modest-sized JSON input, you're on the right track with ..
but it seems you want to select objects, like so:
.. | objects | select(.class=="FINDME"?) | .id
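For the sample document (assuming it is saved as input.json), this should print the matches in document order:
$ jq -r '.. | objects | select(.class=="FINDME"?) | .id' input.json
c1.1
c2
c3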
For JSON documents that are very large, this might require too much memory, so it may be worth knowing about jq's streaming parser. Unfortunately it's much more difficult to use, so I'd suggest trying the above, and if you're interested, look in the usual places for documentation about the --stream option.
Here's a streaming-parser solution. To make sense of it, you'll need to read up on the --stream option, but the key is that the output includes lines of the form: [PATH, VALUE]
program.jq
foreach inputs as $in ({};                       # state: the partial {id, class} object seen so far
  if has("id") and has("class") then {}          # previous match already emitted; start afresh
  else . as $x
  | $in
  | if length != 2 then {}                       # a length-1 (closing) event: reset the state
    elif .[0][-1] == "id" then ($x + {id: .[-1]})
    elif .[0][-1] == "class"
         and .[-1] == "FINDME" then ($x + {class: .[-1]})
    else $x
    end
  end;
  select(has("id") and has("class")) | .id )
Invocation
jq -n --stream -f program.jq input.json
Output with sample input
"c1.1"
"c2"
"c3"

Getting only desired properties from nested array values with jq

The structure I ultimately want would be:
{
"catalog": [
{
"name": "X",
"catalog": [
{ "name": "Y", "uniqueId": "Z" },
{ "name": "Q", "uniqueId": "B" }
]
}
]
}
This is what the existing structure looks like except there are many other properties at each level (https://gist.github.com/ajcrites/e0e0ca4ca3a08ff2dc401ec872e6094c). I just want to filter those out and get a JSON format that looks specifically like this.
I have started out with: jq '.catalog', but this returns only the array. I still want the catalog property name there. I can do this with jq '{catalog: .catalog[]}', but this prints out each catalog object individually, which makes the whole output invalid JSON. I still want the properties to be in the array. Is there a way to filter specific property key-values within arrays using jq?
The following transforms the given input to the desired output and may well be what you want:
{catalog}
| .catalog |= map( {name, catalog} )
| .catalog[].catalog |= map( {name, uniqueId} )
| .catalog |= .[0:1]
However, it's not clear to me that this is really what you want, as you don't discuss the duplication in the given JSON input. So maybe you don't really want the last line in the above, or maybe you want duplicates to be handled in some other way, or ....
Anyway, the trick to keeping things simple here is to use |=.
An alternative approach would be to use del to delete the unwanted properties (rather than selecting the ones you want), but in the present case, that would be (at best) tedious.
You could start by using tostream to convert your sample.json
into a stream of [path, value] arrays as you can see by running
jq -c tostream sample.json
This will generate
[["catalog",0,"catalog",0,"name"],"Y"]
[["catalog",0,"catalog",0,"prop11"],""]
[["catalog",0,"catalog",0,"uniqueId"],"Z"]
[["catalog",0,"catalog",0,"uniqueId"]]
[["catalog",0,"catalog",1,"name"],"Y"]
[["catalog",0,"catalog",1,"prop11"],""]
...
reduce and setpath can be used to convert back into the
original form with a filter such as:
reduce (tostream|select(length==2)) as [$p,$v] (
{};
setpath($p;$v)
)
Adding conditionals makes it easy to omit properties at any level.
For example the following removes leaf attributes starting with "prop":
reduce (tostream|select(length==2)) as [$p,$v] (
{};
if $p[-1]|startswith("prop")
then .
else setpath($p;$v)
end
)
With your sample.json this produces
{
"catalog": [
{
"catalog": [
{
"name": "Y",
"uniqueId": "Z"
},
{
"name": "Y",
"uniqueId": "Z"
}
],
"name": "X"
},
{
"catalog": [
{
"name": "Y",
"uniqueId": "Z"
},
{
"name": "Y",
"uniqueId": "Z"
}
],
"name": "X"
}
]
}
If the goal is to remove certain properties, then one could do so using walk/1. For example, to remove properties whose names start with "prop":
walk(if type == "object"
then with_entries(select(.key|startswith("prop") | not))
else . end)
The same approach would also be applicable if the focus is on retaining certain properties, e.g.:
walk(if type == "object"
then with_entries(select(.key == "name" or .key == "uniqueId" or .key == "catalog"))
else . end)
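One caveat: walk/1 only became a built-in in jq 1.6. If you are on jq 1.5, you can prepend its standard definition (reproduced here) to either of the filters above:
def walk(f):
  . as $in
  | if type == "object" then
      reduce keys[] as $key
        ( {}; . + { ($key): ($in[$key] | walk(f)) } ) | f
    elif type == "array" then map( walk(f) ) | f
    else f
    end;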
You could build up a file that contains paths into the json (expressed as arrays) that you want to keep. Then filter out values that do not fit in those paths.
paths.json:
["catalog","name"]
["catalog","catalog","name"]
["catalog","catalog","uniqueId"]
Then filter values based on their paths. Using streams is a great way to go for this since it gives you access to these paths directly:
$ jq --slurpfile paths paths.json '
def keep_path($path): any($paths[]; . == [$path[] | select(strings)]);
fromstream(tostream | select(length == 1 or keep_path(.[0])))
' input.json

JSON fields have the same name

In practice, keys have to be unique within a JSON object (e.g. Does JSON syntax allow duplicate keys in an object?). However, suppose I have a file with the following contents:
{
"a" : "1",
"b" : "2",
"a" : "3"
}
Is there a simple way of converting the repeated keys to an array? So that the file becomes:
{
"a" : [ {"key": "1"}, {"key": "3"}],
"b" : "2"
}
Or something similar, but which combines the repeated keys into an array (or finds and alternative way to extract the repeated key values).
Here's a solution in Java: Convert JSON object with duplicate keys to JSON array
Is there any way to do it with awk/bash/python?
If your input is really a flat JSON object with primitives as values, this should work:
jq -s --stream 'map(select(length==2)) | group_by(.[0]) | map({"key": .[0][0][0], "value": map(.[1])}) | from_entries'
{
"a": [
"1",
"3"
],
"b": [
"2"
]
}
For more complex outputs, that would require actually understanding how --stream is supposed to be used, which is beyond me.
Building on Santiago's answer using -s --stream, the following filter builds up the object one step at a time, thus preserving the order of the keys and of the values for a specific key:
reduce (.[] | select(length==2)) as $kv ({};
  $kv[0][0] as $k
  | $kv[1] as $v
  | (.[$k]|type) as $t
  | if $t == "null" then .[$k] = $v
    elif $t == "array" then .[$k] += [$v]
    else .[$k] = [ .[$k], $v ]
    end)
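The invocation isn't shown above; assuming the filter is saved as merge.jq and the data as input.json (illustrative names), it would be along the lines of:
jq -s --stream -f merge.jq input.json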
For the given input, the result is:
{
"a": [
"1",
"3"
],
"b": "2"
}
To illustrate that the ordering of values for each key is preserved, consider the following input:
{
"c" : "C",
"a" : "1",
"b" : "2",
"a" : "3",
"b" : "1"
}
The output produced by the filter above is:
{
"c": "C",
"a": [
"1",
"3"
],
"b": [
"2",
"1"
]
}
Building on peak's answer, the following filter also works on multi-object input, with nested objects, and without the slurp option (-s).
This is not an answer to the initial question, but since the jq FAQ links here, it might be useful for some visitors.
File jqmergekeys.txt
def consumestream($arr): # Reads stream elements from stdin until we have enough elements to build one object and returns them as array
input as $inp
| if $inp|has(1) then consumestream($arr+[$inp]) # input=keyvalue pair => Add to array and consume more
elif ($inp[0]|has(1)) then consumestream($arr) # input=closing subkey => Skip and consume more
else $arr end; # input=closing root object => return array
def convert2obj($stream): # Converts an object in stream notation into an object, and merges the values of duplicate keys into arrays
reduce ($stream[]) as $kv ({}; # This function is based on http://stackoverflow.com/a/36974355/2606757
$kv[0] as $k
| $kv[1] as $v
| (getpath($k)|type) as $t # type of existing value under the given key
| if $t == "null" then setpath($k;$v) # value not existing => set value
elif $t == "array" then setpath($k; getpath($k) + [$v] ) # value is already an array => add value to array
else setpath($k; [getpath($k), $v ]) # single value => put existing and new value into an array
end);
def mainloop(f): (convert2obj(consumestream([input]))|f),mainloop(f); # Consumes streams forever, converts them into an object and applies the user provided filter
def mergeduplicates(f): try mainloop(f) catch if .=="break" then empty else error end; # Catches the "break" thrown by jq if there's no more input
#---------------- User code below --------------------------
mergeduplicates(.) # merge duplicate keys in input, without any additional filters
#mergeduplicates(select(.layers)|.layers.frame) # merge duplicate keys in input and apply some filter afterwards
Example:
tshark -T ek | jq -nc --stream -f ./jqmergekeys.txt
Here's a simple alternative that generalizes well:
reshape.jq
def augmentpath($path; $value):
getpath($path) as $v
| setpath($path; $v + [$value]);
reduce (inputs | select(length==2)) as $pv
({}; augmentpath($pv[0]; $pv[1]) )
Usage
jq -n --stream -f reshape.jq input.json
Output
With the given input:
{
"a": [
"1",
"3"
],
"b": [
"2"
]
}
Postscript
If it's important to avoid arrays of singletons, either the def of augmentpath could be modified, or a postprocessing step could be added.
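For instance, one way to do the postprocessing (a sketch that only looks at top-level keys, which is all the flat example needs) is to append a with_entries step to reshape.jq:
reduce (inputs | select(length==2)) as $pv
  ({}; augmentpath($pv[0]; $pv[1]) )
# collapse single-element arrays back to scalars
| with_entries(if (.value | length) == 1 then .value |= .[0] else . end)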