Merge arrays in object - json

I have an object that is just a bunch of arbitrary keys, each holding an array:
{
"foo": [
"hello",
"world"
],
"bar": [
"foobar"
]
}
How can I return the merged arrays in this object? The expected output would be:
[
"hello",
"world",
"foobar"
]

Create a list of the values and concatenate the elements in that list:
[.[]] | add
Create a list of each element in each array:
[.[][]]
I'd prefer the first one since it parses easier in my mind.
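If it helps to compare with a general-purpose language, here is a rough Python analogy for the two filters. It is only a sketch of the idea, not how jq works internally, and the names data, via_add and via_iteration are made up for illustration.

```python
# Python analogy for the two jq filters; all names here are illustrative.
data = {"foo": ["hello", "world"], "bar": ["foobar"]}

# [.[]] | add  -> collect the values into a list, then concatenate them.
values = list(data.values())   # like [.[]]
via_add = sum(values, [])      # like add (list concatenation)

# [.[][]]      -> iterate the values, then iterate each value's elements.
via_iteration = [item for array in data.values() for item in array]

print(via_add)
print(via_iteration)
```

Both variables end up holding the same flat list, which is why the choice between the two filters is mostly a matter of taste.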

Generalizing a bit:
jq '[..|scalars]' input.json
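What this filter does can be sketched in Python as a depth-first walk that keeps only the non-container values. The function name scalars is borrowed from jq for readability; the rest is an illustrative assumption, not jq's implementation.

```python
def scalars(value):
    """Yield every non-container leaf, depth first (roughly `.. | scalars`)."""
    if isinstance(value, dict):
        for child in value.values():
            yield from scalars(child)
    elif isinstance(value, list):
        for child in value:
            yield from scalars(child)
    else:
        # Strings, numbers, booleans and None (null) all count as scalars.
        yield value

data = {"foo": ["hello", "world"], "bar": ["foobar"]}
flattened = list(scalars(data))
print(flattened)
```

Unlike the two filters in the first answer, this one keeps descending, so arrays nested at any depth are flattened as well.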

Related

semantics of map on a sequence of objects in jq

Suppose I have a file fruit.json containing the following lines:
[
{
"name": "apple",
"color": "red",
"price": 20
},
{
"name": "banana",
"color": "yellow",
"price": 15
},
{
"name": "pineapple",
"color": "orange",
"price": 53
}
]
If I do jq '. | map(.)' fruit.json then I get the original data. That's expected. The second . refers to an element in the entire array.
However if I do jq '.[] | map(.)' fruit.json then I get this:
[
"apple",
"red",
20
]
[
"banana",
"yellow",
15
]
[
"pineapple",
"orange",
53
]
Can someone please explain what's going on? Specifically,
The [] after . strips away the brackets from the input array. Do we have a name for the [] operator? The manual seems to treat it as something very basic without definition.
Do we have a name for the resulting thing by appending [] to .? Obviously it's not an object. If we do jq '.[]' fruit.json we can see that it looks very similar to an array. But apparently it behaves quite differently.
Why is it the case that the map function seems to go two levels inside instead of one? This is more obvious if we do jq '.[] | map(. | length)' fruit.json and see that the . inside the map function refers to the value part of an (object) element of the input array.
Thank you all in advance!
.[] produces the values of the array or object given to it.
For example,
[ "a", "b", "c" ] | .[]
is equivalent to
[ "a", "b", "c" ] | .[0], .[1], .[2]
and produces three strings: a, b and c.
map( ... )
is equivalent to
[ .[] | ... ]
This means that
map( . ) ≡ [ .[] | . ] ≡ [ .[] ]
For an array, that means
map( . ) ≡ [ .[0], .[1], ... ] ≡ .
For an object, that means
map( . ) ≡ [ .["key1"], .["key2"], ... ]
The [] after . strips away the brackets from the input array.
There are no brackets. jq programs don't deal with JSON text, but with the data structure it represents.
When given an array or object, .[] produces the values of the elements of that array or object.
Do we have a name for the [] operator?
The docs call it the Array/Object Value Iterator, but it's really just a specific usage of the indexing operator.
The Array/Object Value Iterator is ascribed to .[] in the docs, but that's not accurate. It doesn't have to be . before it, but an expression must precede it. This distinguishes it from the array construction operator.
In technical terms,
[] as a circumfix operator ([ EXPR ]) is the array construction operator, and
[] as a postfix operator (EXPR [ EXPR? ]) is the indexing operator, and it's specifically called the array/object value iterator when there's nothing in the brackets.
Do we have a name for the resulting thing by appending [] to .? Obviously it's not an object. If we do jq '.[]' fruit.json we can see that it looks very similar to an array. But apparently it behaves quite differently.
We call that a stream.
I'm not sure what to call the components of the stream. I usually use "value".
For example,
"a", "b", "c" # Produces a stream of three values.
"abc" / "" | .[] # Same
When serialized to a file with one value per line (as you would get using -c), it's called "JSON lines" with a suggested naming convention of .jsonl.
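To make the "one value per line" idea concrete, here is a small Python sketch that writes a stream of values as JSON Lines and reads it back. The sample values are invented for illustration.

```python
import json

# A "stream" here is just a sequence of separate JSON values (illustrative data).
stream = ["a", {"name": "apple", "price": 20}, [1, 2]]

# One compact JSON document per line, similar to what `jq -c` prints.
jsonl = "\n".join(json.dumps(value, separators=(",", ":")) for value in stream)
print(jsonl)

# Reading it back: every line is parsed independently of the others.
parsed = [json.loads(line) for line in jsonl.splitlines()]
```

Note that the text as a whole is not a single JSON document; only each individual line is.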
Why is it the case that the map function seems to go two levels inside instead of one? This is more obvious if we do jq '.[] | map(. | length)' fruit.json and see that the . inside the map function refers to the value part of an (object) element of the input array.
No, just one.
In that example,
The .[] iterates over the values of the array.
The map iterates over the values of the objects.
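The whole explanation can be modeled in a few lines of Python. This is a sketch under the assumption that the filter passed to map produces exactly one result per input; iterate and jq_map are made-up helper names, not jq functions.

```python
def iterate(value):
    """Model of `.[]`: yield the values of a list or of a dict."""
    if isinstance(value, dict):
        yield from value.values()
    else:
        yield from value

def jq_map(f, value):
    """Model of `map(f)`, i.e. `[ .[] | f ]` (for an f with one result)."""
    return [f(x) for x in iterate(value)]

fruit = [
    {"name": "apple", "color": "red", "price": 20},
    {"name": "banana", "color": "yellow", "price": 15},
]

# . | map(.)   -> map runs once on the whole array, so the array comes back.
whole = jq_map(lambda x: x, fruit)

# .[] | map(.) -> .[] emits each object; map then runs once per object
# and iterates over that object's values.
per_object = [jq_map(lambda x: x, obj) for obj in iterate(fruit)]
print(per_object)
```

Each of the two constructs descends exactly one level; chaining them is what makes it look like map goes two levels deep.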

jq with multiple select statements and an array

I've got some JSON like the following (I've filtered the output here):
[
{
"Tags": [
{
"Key": "Name",
"Value": "example1"
},
{
"Key": "Irrelevant",
"Value": "irrelevant"
}
],
"c7n:MatchedFilters": [
"tag: example_tag_rule"
],
"another_key": "another_value_I_dont_want"
},
{
"Tags": [
{
"Key": "Name",
"Value": "example2"
}
],
"c7n:MatchedFilters": [
"tag:example_tag_rule",
"tag: example_tag_rule2"
]
}
]
I'd like to create a csv file with the value within the Name key and all of the "c7n:MatchedFilters" in the array. I've made a few attempts but still can't get quite the output I expect. There's some example code and the output below:
#Prints the key that I'm after.
cat new.jq | jq '.[] | [.Tags[], {"c7n:MatchedFilters"}] | .[] | select(.Key=="Name")|.Value'
"example1"
"example2"
#Prints all the filters in an array I'm after.
cat new.jq | jq -r '.[] | [.Tags[], {"c7n:MatchedFilters"}] | .[] | select(."c7n:MatchedFilters") | .[]'
[
"tag: example_tag_rule"
]
[
"tag:example_tag_rule",
"tag: example_tag_rule2"
]
#Prints *all* the tags (including ones I don't want) and all the filters in the array I'm after.
cat new.jq | jq '.[] | [.Tags[], {"c7n:MatchedFilters"}] | select((.[].Key=="Name") and (.[]."c7n:MatchedFilters"))'
[
{
"Key": "Name",
"Value": "example1"
},
{
"Key": "Irrelevant",
"Value": "irrelevant"
},
{
"c7n:MatchedFilters": [
"tag: example_tag_rule"
]
}
]
[
{
"Key": "Name",
"Value": "example2"
},
{
"c7n:MatchedFilters": [
"tag:example_tag_rule",
"tag: example_tag_rule2"
]
}
]
I hope this makes sense, let me know if I've missed anything.
Your attempts are not working because you start out with [.Tags[], {"c7n:MatchedFilters"}] to construct one array containing all the tags and an object containing the filters. You are then struggling to find a way to process this entire array at once because it jumbles together these unrelated things without any distinction. You will find it much easier if you don't combine them in the first place!
You want to find the single tag with a Key of "Name". Here's one way to find that:
first(
.Tags[]|
select(.Key=="Name")
).Value as $name
By using a variable binding we can save it for later and worry about constructing the array separately.
You say (in the comments) that you just want to concatenate the filters with spaces. You can do that easily enough:
(
."c7n:MatchedFilters"|
join(" ")
) as $filters
You can combine all this together as follows. Note that each variable binding leaves the input stream unchanged, so it's easy to compose everything.
jq --raw-output '
.[]|
first(
.Tags[]|
select(.Key=="Name")
).Value as $name|
(
."c7n:MatchedFilters"|
join(" ")
) as $filters|
[$name, $filters]|
@csv
'
Hopefully that's easy enough to read and separates out each concept. We break up the array into a stream of objects. For each object, we find the name and bind it to $name, we concatenate the filters and bind them to $filters, then we construct an array containing both, then we convert the array to a CSV string.
We don't need to use variables. We could just have a big array constructor wrapped around the expression to find the name and the expression to find the filters. But I hope you can see the variables make things a bit flatter and easier to understand.
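For comparison, the same "find the name, join the filters, emit a CSV row" logic can be sketched in Python. The helper to_row and the choice of csv.QUOTE_ALL are assumptions made for this illustration; the quoting is only meant to resemble jq's CSV output for string fields.

```python
import csv
import io

resources = [
    {
        "Tags": [
            {"Key": "Name", "Value": "example1"},
            {"Key": "Irrelevant", "Value": "irrelevant"},
        ],
        "c7n:MatchedFilters": ["tag: example_tag_rule"],
    },
    {
        "Tags": [{"Key": "Name", "Value": "example2"}],
        "c7n:MatchedFilters": ["tag:example_tag_rule", "tag: example_tag_rule2"],
    },
]

def to_row(resource):
    # first(.Tags[] | select(.Key == "Name")).Value
    # (next() raises StopIteration if there is no Name tag)
    name = next(tag["Value"] for tag in resource["Tags"] if tag["Key"] == "Name")
    # ."c7n:MatchedFilters" | join(" ")
    filters = " ".join(resource["c7n:MatchedFilters"])
    return [name, filters]

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
for resource in resources:
    writer.writerow(to_row(resource))
csv_text = buffer.getvalue()
print(csv_text, end="")
```

As in the jq version, the name and the filters are computed separately and only combined at the very end.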

Create a new json string from jq output elements

My jq command returns objects in brackets but without comma separators. But I would like to create a new json string from it.
This call finds all elements of arr that have a FooItem in them and then returns texts from the nested array at index 3:
jq '.arr[] | select(index("FooItem")) | .[3].texts'
on this JSON (the original has more elements):
{
"arr": [
[
"create",
"w199",
"FooItem",
{
"index": 0,
"texts": [
"aBarfoo",
"avalue"
]
}
],
[
"create",
"w200",
"NoItem",
{
"index": 1,
"val": 5,
"hearts": 5
}
],
[
"create",
"w200",
"FooItem",
{
"index": 1,
"texts": [
"mybarfoo",
"bValue"
]
}
]
]
}
returns this output:
[
"aBarfoo",
"avalue"
]
[
"mybarfoo",
"bValue"
]
But I'd like to create a new json from these objects that looks like this:
{
"arr": [
[
"aBarfoo",
"avalue"
],
[
"mybarfoo",
"bValue"
]
]
}
Can jq do this?
EDIT
One more addition: Considering that texts also has strings of zero length, how would you delete those/not have them in the result?
"texts": ["",
"mybarfoo",
"bValue",
""
]
You can always embed a stream of (zero or more) JSON entities within some other JSON structure by decorating the stream, that is, in the present case, by wrapping the STREAM as follows:
{ arr: [ STREAM ] }
In the present case, however, we can also take the view that we are simply editing the original document, and accordingly use a variation of the map(select(...)) idiom:
.arr |= map( select(index("FooItem")) | .[3].texts)
This latter approach ensures that the context of the "arr" key is preserved.
Addendum
To filter out the empty strings, simply add another map(select(...)):
.arr |= map( select(index("FooItem"))
| .[3].texts | map(select(length>0)))
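The same edit can be sketched in Python, assuming the document has been loaded into a dict. The variable names are illustrative, and the third row uses the texts with empty strings from the question's EDIT.

```python
document = {
    "arr": [
        ["create", "w199", "FooItem", {"index": 0, "texts": ["aBarfoo", "avalue"]}],
        ["create", "w200", "NoItem", {"index": 1, "val": 5, "hearts": 5}],
        ["create", "w200", "FooItem", {"index": 1, "texts": ["", "mybarfoo", "bValue", ""]}],
    ]
}

result = dict(document)  # shallow copy, so any other top-level keys are kept
result["arr"] = [
    [text for text in row[3]["texts"] if len(text) > 0]  # map(select(length>0))
    for row in document["arr"]
    if "FooItem" in row                                  # select(index("FooItem"))
]
print(result)
```

Only the value under "arr" is replaced, which mirrors what the |= update does in the jq filter.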

Flatten JSON with jq retaining key names

I'm trying to flatten a JSON consisting of nested objects. The top layer contains several key/value pairs, where each value is itself an array of a number of objects (the bottom layer).
What I would like to get, using jq, is simply an array of objects containing all the objects of the bottom layer, each of which with an additional key/value pair identifying the top-layer key it originally belonged to.
In other words, I would like to turn a JSON
{
"key1": [obj1, obj2],
"key2": [obj3]
}
into a plain array
[OBJ1, OBJ2, OBJ3]
where each OBJi is simply the original object with an extra key/value pair
"parent-key-name": keyx
where keyx would be the top-layer key obji belonged to, i.e. "key1" for obj1 and obj2, and "key2" for obj3.
I'm struggling with the fact that when referencing the objects in the bottom layer, e.g. via .[], jq does not seem to have inbuilt functionality to access associated top-layer information. However, I'm new to jq, and hope there is an easy solution after all.
Given the following input :
{
"key1": [{"name":"Emma"},{"name":"Bob"}],
"key2": [{"name":"Jean"}]
}
You can convert the object to entries, store each key in a variable, and add that key to every item in the corresponding value array:
jq '[ to_entries[] | .key as $parent | .value[] |
.["parent-key-name"] |= (.+ $parent) ] ' test.json
which gives the following output :
[
{
"name": "Emma",
"parent-key-name": "key1"
},
{
"name": "Bob",
"parent-key-name": "key1"
},
{
"name": "Jean",
"parent-key-name": "key2"
}
]
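The same idea reads naturally as a nested comprehension in Python. This is only an analogy for the filter above, with made-up variable names.

```python
data = {
    "key1": [{"name": "Emma"}, {"name": "Bob"}],
    "key2": [{"name": "Jean"}],
}

flattened = [
    {**obj, "parent-key-name": parent}   # copy obj and add the parent key
    for parent, objects in data.items()  # to_entries[] with .key as $parent
    for obj in objects                   # .value[]
]
print(flattened)
```

The outer loop plays the role of the variable binding: the key stays available while the inner loop walks the objects.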
The solution presented below consists of two steps, each of which might be helpful separately, e.g. if someone wants to "flatten" the JSON in a slightly different way.
First, let's make the changes to obj[i] "in-place":
with_entries( .key as $k | .value[] |= ( . + {"parent-key-name": $k} ) )
Example:
$ jq -n -c -f program.jq
Input:
{
"key1": [{a:1}, {a:2}],
"key2": [{b:3}]
}
Output:
{
"key1": [
{
"a": 1,
"parent-key-name": "key1"
},
{
"a": 2,
"parent-key-name": "key1"
}
],
"key2": [
{
"b": 3,
"parent-key-name": "key2"
}
]
}
To flatten, simply append | [.[][]] to the above filter. This produces:
[{"a":1,"parent-key-name":"key1"},{"a":2,"parent-key-name":"key1"},{"b":3,"parent-key-name":"key2"}]
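A Python sketch of the two separate steps, assuming the goal is the flat array of objects asked for in the question. The names annotated and flat are illustrative.

```python
data = {"key1": [{"a": 1}, {"a": 2}], "key2": [{"b": 3}]}

# Step 1: annotate every object but keep the original key -> array layout.
annotated = {
    key: [{**obj, "parent-key-name": key} for obj in objects]
    for key, objects in data.items()
}

# Step 2: drop the top-level keys and flatten the arrays by one level.
flat = [obj for objects in annotated.values() for obj in objects]
print(flat)
```

Keeping the steps apart means the annotated structure from step 1 can be reused if a different final shape is wanted.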

How to make a cartesian product in jq?

Let's say I have input
[
{
"a":1,
"b":2
},
{
"a":3,
"b":4
}
]
and I tried,
echo '[{"a": 1, "b": 2}, {"a": 3, "b": 4}]' | jq '[{x: .[].a, y: .[].b}]'
and I would like to get
[
{
"x":1,
"b":2,
"language":"en"
},
{
"x":1,
"b":2,
"language":"fr"
}...
]
Meaning that for each item in the array I need to output two items: one with an added "language": "en" key/value pair and one with "language": "fr".
EDIT. In case it's not clear enough: I need a Cartesian product of the input array of objects with another array xs, which would give me pairs (i, x). For each pair I want to output an object that has all (key, value) pairs of i plus some key (language in my case) with the value of x.
In general, any expression that generates multiple values combined with another expression that generates multiple values will create a cartesian product.
i.e.,
"\(1,2) \(3,4)"
generates strings "1 3", "2 3", "1 4", and "2 4".
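In Python terms this behaves like a nested loop over both generators. The loop order below was chosen to reproduce the order of the strings quoted above; it is an analogy, not a description of jq's evaluation strategy.

```python
# Every combination of the two interpolated generators yields one string.
pairs = [f"{a} {b}" for b in (3, 4) for a in (1, 2)]
print(pairs)  # ['1 3', '2 3', '1 4', '2 4']
```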
You can do the same given an array of values. [] will take the array and generate a result for each of the items. So combining these concepts, you could do something like this:
$ jq --argjson langs '["en","fr"]' '[(.[]|{x:.a,b}) + {language:$langs[]}]' input.json
But this could further be reduced to simply:
$ jq --argjson langs '["en","fr"]' '[.[]|{x:.a,b,language:$langs[]}]' input.json
or
$ jq --argjson langs '["en","fr"]' 'map({x:.a,b,language:$langs[]})' input.json
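For comparison, the same product can be written in Python with itertools.product. The variable names items and langs mirror the jq example but are otherwise illustrative.

```python
from itertools import product

items = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
langs = ["en", "fr"]

# One output object per (item, language) pair.
result = [
    {"x": item["a"], "b": item["b"], "language": lang}
    for item, lang in product(items, langs)
]
print(result)
```

With two items and two languages this yields four objects, each item paired once with every language.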