How to make a cartesian product in jq? - json

Let's say I have this input:
[
{
"a":1,
"b":2
},
{
"a":3,
"b":4
}
]
and I tried,
echo '[{"a": 1, "b": 2}, {"a": 3, "b": 4}]' | jq '[{x: .[].a, y: .[].b}]'
and I would like to get
[
{
"x":1,
"b":2,
"language":"en"
},
{
"x":1,
"b":2,
"language":"fr"
}...
]
Meaning that for every item in the array I need to output two items, one with an added "language": "en" key-value pair and one with "language": "fr".
EDIT: In case it's not clear enough: I need the cartesian product of the input array of objects with another array xs, which gives me pairs (i, x). For each pair I want to output an object that has all (key, value) pairs of i plus one extra key (language in my case) with the value x.

In general, any expression that generates multiple values combined with another expression that generates multiple values will create a cartesian product.
i.e.,
"\(1,2) \(3,4)"
generates strings "1 3", "2 3", "1 4", and "2 4".
You can do the same given an array of values. [] will take the array and generate a result for each of the items. So combining these concepts, you could do something like this:
$ jq --argjson langs '["en","fr"]' '[(.[]|{x:.a,b}) + {language:$langs[]}]' input.json
But this could further be reduced to simply:
$ jq --argjson langs '["en","fr"]' '[.[]|{x:.a,b,language:$langs[]}]' input.json
or
$ jq --argjson langs '["en","fr"]' 'map({x:.a,b,language:$langs[]})' input.json
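For readers more at home in Python, the same cartesian product can be sketched with itertools.product. This is only a model of what the [.[] | {...}] form computes, not of how jq evaluates it internally; the function name add_languages is made up for illustration.

```python
from itertools import product

def add_languages(items, langs):
    """Pair every input object with every language, in the same order as
    jq's [.[] | {x: .a, b, language: $langs[]}] (items outer, langs inner)."""
    return [
        {"x": item["a"], "b": item["b"], "language": lang}
        for item, lang in product(items, langs)
    ]

items = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
result = add_languages(items, ["en", "fr"])
for row in result:
    print(row)
```

Each input object appears once per language, so two objects and two languages give four output objects.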

Related

semantics of map on a sequence of objects in jq

Suppose I have a file fruit.json containing the following lines:
[
{
"name": "apple",
"color": "red",
"price": 20
},
{
"name": "banana",
"color": "yellow",
"price": 15
},
{
"name": "pineapple",
"color": "orange",
"price": 53
}
]
If I do jq '. | map(.)' fruit.json then I get the original data. That's expected. The second . refers to an element in the entire array.
However if I do jq '.[] | map(.)' fruit.json then I get this:
[
"apple",
"red",
20
]
[
"banana",
"yellow",
15
]
[
"pineapple",
"orange",
53
]
Can someone please explain what's going on? Specifically,
The [] after . strips away the brackets from the input array. Do we have a name for the [] operator? The manual seems to treat it as something very basic without definition.
Do we have a name for the resulting thing by appending [] to .? Obviously it's not an object. If we do jq '.[]' fruit.json we can see that it looks very similar to an array. But apparently it behaves quite differently.
Why is it the case that the map function seems to go two levels inside instead of one? This is more obvious if we do jq '.[] | map(. | length)' fruit.json and see that the . inside the map function refers to the value part of an (object) element of the input array.
Thank you all in advance!
.[] produces the values of the array or object given to it.
For example,
[ "a", "b", "c" ] | .[]
is equivalent to
[ "a", "b", "c" ] | .[0], .[1], .[2]
and produces three strings: a, b and c.
map( ... )
is equivalent to
[ .[] | ... ]
This means that
map( . ) ≡ [ .[] | . ] ≡ [ .[] ]
For an array, that means
map( . ) ≡ [ .[0], .[1], ... ] ≡ .
For an object, that means
map( . ) ≡ [ .["key1"], .["key2"], ... ]
The [] after . strips away the brackets from the input array.
There are no brackets. jq programs don't deal with JSON text, but the data structure it represents.
When given an array or object, .[] produces the values of the elements of that array or object.
Do we have a name for the [] operator?
The docs call it the Array/Object Value Iterator, but it's really just a specific usage of the indexing operator.
The docs ascribe the name Array/Object Value Iterator to .[], but that's not accurate. It doesn't have to be . before it, but some expression must precede it. This distinguishes it from the array construction operator.
In technical terms,
[] as a circumfix operator ([ EXPR ]) is the array construction operator, and
[] as a postfix operator (EXPR [ EXPR? ]) is the indexing operator, and it's specifically called the array/object value iterator when there's nothing in the brackets.
Do we have a name for the resulting thing by appending [] to .? Obviously it's not an object. If we do jq '.[]' fruit.json we can see that it looks very similar to an array. But apparently it behaves quite differently.
We call that a stream.
I'm not sure what to call the components of the stream. I usually use "value".
For example,
"a", "b", "c" // Produces a stream of three values.
"abc" / "" | .[] // Same
When serialized to a file with one value per line (as you would get using -c), it's called "JSON lines" with a suggested naming convention of .jsonl.
Why is it the case that the map function seems to go two levels inside instead of one? This is more obvious if we do jq '.[] | map(. | length)' fruit.json and see that the . inside the map function refers to the value part of an (object) element of the input array.
No, just one.
In that example,
The .[] iterates over the values of the array.
The map iterates over the values of the objects.
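The "one level each" point can be modelled in Python. This is a rough sketch, assuming a filter that yields exactly one result per input; iterate and jq_map are hypothetical helper names, not part of jq.

```python
def iterate(value):
    """Model of jq's .[]: yield the values of an array or object."""
    if isinstance(value, dict):
        yield from value.values()
    elif isinstance(value, list):
        yield from value
    else:
        raise TypeError(f"cannot iterate over {type(value).__name__}")

def jq_map(f, value):
    """Model of map(f), i.e. [ .[] | f ], for an f that returns one result."""
    return [f(v) for v in iterate(value)]

fruit = [
    {"name": "apple", "color": "red", "price": 20},
    {"name": "banana", "color": "yellow", "price": 15},
]

# jq '. | map(.)': map iterates over the array and rebuilds it.
whole = jq_map(lambda v: v, fruit)

# jq '.[] | map(.)': .[] streams the objects (one level),
# then map iterates over the values of each object (one more level).
per_object = [jq_map(lambda v: v, obj) for obj in iterate(fruit)]

print(whole == fruit)
print(per_object)
```

The two levels come from two separate iterations, one by .[] and one by map, not from map descending twice.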

Find length of each array field within a JSON object using jq

I have a process that generates a JSON object containing some "header" values as scalars and a number of payload values as arrays:
{
"header 1": 42,
"header 2": "2020-01-27",
"payload 1": [
{
"foo": 1
},
{
"foo": 2
}
],
"another payload": [
10,
9,
8,
7
]
}
I have been able to isolate the names of the array fields with the following command:
$ jq '[to_entries | .[] | select(.value | type == "array")] | from_entries | keys_unsorted' results.json
[
"payload 1",
"another payload"
]
But I don't know how to use this to get the lengths of the arrays. The output I'm looking for would be something like:
{
"payload 1": 2,
"another payload": 4
}
Or anything that lists the keys of fields that are arrays and the length of the arrays.
What is a jq command to list the lengths of all array fields in the top-level object?
You don't need *_entries functions here.
map_values(arrays | length)
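As a cross-check of what this filter is expected to return, here is a small Python sketch of the same idea: keep only the fields whose value is an array and replace each with its length. The function name array_lengths is an assumption for illustration.

```python
def array_lengths(obj):
    """Keep only list-valued fields and map each one to its length."""
    return {key: len(value) for key, value in obj.items() if isinstance(value, list)}

results = {
    "header 1": 42,
    "header 2": "2020-01-27",
    "payload 1": [{"foo": 1}, {"foo": 2}],
    "another payload": [10, 9, 8, 7],
}

lengths = array_lengths(results)
print(lengths)
```

The scalar header fields are dropped because they fail the list check, which mirrors the role of arrays in the jq filter.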

How to sort/unique output using jq

I have json like below:
% cat example.json
{
"values" : [
{
"title": "B",
"url": "https://B"
},
{
"title": "A",
"url": "https://A"
}
]
}
I want to sort the values by title, i.e. the expected output is:
{
"title": "A",
"url": "https://A"
}
{
"title": "B",
"url": "https://B"
}
I tried the commands below, but they do not work:
% jq '.values[] | sort' example.json
jq: error (at example.json:12): object ({"title":"B...) cannot be sorted, as it is not an array
% jq '.values[] | sort_by(.title)' example.json
jq: error (at example.json:12): Cannot index string with string "title"
If you want to preserve the overall structure, you would use the jq filter:
.values |= sort_by(.title)
If you want to extract .values and sort the array, leave out the "=":
.values | sort_by(.title)
To produce the output as shown in the Q:
.values | sort_by(.title)[]
Uniqueness
There are several ways in which "uniqueness" can be defined, and also several ways in which uniqueness can be achieved.
One option would simply be to use unique_by instead of sort_by; another (with different semantics) would be to use (sort_by(.title)|unique) instead of sort_by(.title).
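To make the difference between sorting and "unique by title" concrete, here is a hedged Python model. It assumes unique_by keeps the first element of each group of equal keys after a stable sort; sort_by_title and unique_by_title are illustrative names only, and string ordering may differ from jq's in edge cases.

```python
def sort_by_title(values):
    """Model of sort_by(.title): stable sort on the title field."""
    return sorted(values, key=lambda v: v["title"])

def unique_by_title(values):
    """Model of unique_by(.title): sorted by title, one element per title."""
    first_seen = {}
    for v in sort_by_title(values):
        first_seen.setdefault(v["title"], v)  # keep the first of each group
    return list(first_seen.values())

values = [
    {"title": "B", "url": "https://B"},
    {"title": "A", "url": "https://A"},
    {"title": "A", "url": "https://A2"},
]

print(sort_by_title(values))
print(unique_by_title(values))
```

Sorting keeps both "A" entries, while the unique-by variant keeps a single entry per title even though the two objects differ in url.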

Merge arrays in object

I have an object that is just a bunch of arbitrary keys, each holding an array:
{
"foo": [
"hello",
"world"
],
"bar": [
"foobar"
]
}
How can I return the merged arrays in this object? The expected output would be:
[
"hello",
"world",
"foobar"
]
Create a list of the values and concatenate the elements in that list:
[.[]] | add
Create a list of each element in each array:
[.[][]]
I'd prefer the first one since it parses easier in my mind.
Generalizing a bit:
jq '[..|scalars]' input.json
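The first two filters both amount to concatenating the object's values in key order. A minimal Python sketch of that behaviour, with merge_arrays as a made-up name:

```python
from itertools import chain

def merge_arrays(obj):
    """Concatenate all list values of an object, in key order."""
    return list(chain.from_iterable(obj.values()))

data = {"foo": ["hello", "world"], "bar": ["foobar"]}
merged = merge_arrays(data)
print(merged)
```

Note that this models only the one-level merge; the ..|scalars variant also descends into deeper nesting, which this sketch does not do.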

Flatten JSON with jq retaining key names

I'm trying to flatten a JSON consisting of nested objects. The top layer contains several key/value pairs, where each value is itself an array of a number of objects (the bottom layer).
What I would like to get, using jq, is simply an array of objects containing all the objects of the bottom layer, each of which with an additional key/value pair identifying the top-layer key it originally belonged to.
In other words, I would like to turn a JSON
{
"key1": [obj1, obj2],
"key2": [obj3]
}
into a plain array
[OBJ1, OBJ2, OBJ3]
where each OBJi is simply the original object with an extra key/value pair
"parent-key-name": keyx
where keyx would be the top-layer key obji belonged to, i.e. "key1" for obj1 and obj2, and "key2" for obj3.
I'm struggling with the fact that when referencing the objects in the bottom layer, e.g. via .[], jq does not seem to have inbuilt functionality to access associated top-layer information. However, I'm new to jq, and hope there is an easy solution after all.
Given the following input :
{
"key1": [{"name":"Emma"},{"name":"Bob"}],
"key2": [{"name":"Jean"}]
}
You can split the object into entries, store each key in a variable, and add it to every object in the corresponding value array:
jq '[ to_entries[] | .key as $parent | .value[] |
.["parent-key-name"] |= (.+ $parent) ] ' test.json
which gives the following output :
[
{
"name": "Emma",
"parent-key-name": "key1"
},
{
"name": "Bob",
"parent-key-name": "key1"
},
{
"name": "Jean",
"parent-key-name": "key2"
}
]
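The same one-pass idea can be sketched in Python: walk the top-level entries, remember the key, and tag every child object with it. The function name flatten_with_parent and its key_name parameter are assumptions for illustration.

```python
def flatten_with_parent(obj, key_name="parent-key-name"):
    """Flatten {key: [objects]} into one list, tagging each object with its key."""
    return [
        {**child, key_name: parent}      # copy the child and add the parent key
        for parent, children in obj.items()
        for child in children
    ]

data = {
    "key1": [{"name": "Emma"}, {"name": "Bob"}],
    "key2": [{"name": "Jean"}],
}

flat = flatten_with_parent(data)
for row in flat:
    print(row)
```

The outer loop plays the role of to_entries[] with .key as $parent, and the inner loop plays the role of .value[].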
The solution presented below consists of two steps, each of which might be helpful separately, e.g. if someone wants to "flatten" the JSON in a slightly different way.
First, let's make the changes to obj[i] "in-place":
with_entries( .key as $k | .value[] |= ( . + {"parent-key-name": $k} ) )
Example:
$ jq -n -c -f program.jq
Input:
{
"key1": [{a:1}, {a:2}],
"key2": [{b:3}]
}
Output:
{
"key1": [
{
"a": 1,
"parent-key-name": "key1"
},
{
"a": 2,
"parent-key-name": "key1"
}
],
"key2": [
{
"b": 3,
"parent-key-name": "key2"
}
]
}
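To collect the per-key arrays into one array, append | [.[]] to the above filter. This produces an array of arrays:
[[{"a":1,"parent-key-name":"key1"},{"a":2,"parent-key-name":"key1"}],[{"b":3,"parent-key-name":"key2"}]]
For a single flat array of objects, as asked in the question, append | [.[][]] instead.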