Count elements in nested JSON with jq

I am trying to count all elements in a nested JSON document with jq.
Given the following JSON-document
{"a": true, "b": [1, 2], "c": {"a": {"aa":1, "bb": 2}, "b": "blue"}}
I want to calculate the result 6.
In order to do this, I tried the following:
echo '{"a": true, "b": [1, 2], "c": {"a": {"aa":1, "bb": 2}, "b": "blue"}}' \
| jq 'reduce (.. | if (type == "object" or type == "array")
then length else 0 end) as $counts
(1; . + $counts)'
# Actual output: 10
# Desired output: 6
However, this counts the encountered objects and arrays as well and therefore yields 10, as opposed to the desired output of 6.
So, how can I only count the document's elements/leaf-nodes?
Thanks in advance for your help!
Edit: What would be an efficient approach to count empty arrays and objects as well?

You can use the scalars filter to find leaf nodes. Scalars are all "simple" JSON values, i.e. null, true, false, numbers and strings. Alternatively you can compare the type of each item and use length to determine if an object or array has children.
I've expanded your input data a little to distinguish a few more corner cases:
Input:
{
"a": true,
"b": [1, 2],
"c": {
"a": {
"aa": 1,
"bb": 2
},
"b": "blue"
},
"d": [],
"e": [[], []],
"f": {}
}
This has 15 JSON entities:
5 of them are arrays or objects with children.
4 of them are empty arrays or objects.
6 of them are scalars.
Depending on what you're trying to do, you might consider only scalars to be "leaf nodes", or you might consider both scalars and empty arrays and objects to be leaf nodes.
Here's a filter that counts scalars:
[..|scalars]|length
Output:
6
And here's a filter that counts all entities which have no children. It just checks for all the scalar types explicitly (there are only six possible types for a JSON value) and if it's not one of those it must be an array or object, where we can check how many children it has with length.
[
..|
select(
(type|IN("boolean","number","string","null")) or
length==0
)
]|
length
Output:
10
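Both filters above can be tried from the shell against the expanded input. This is a minimal sketch, assuming jq 1.6 or later (for IN) is on the PATH; the variable names doc, scalar_count, leaf_count and streamed_count are only illustrative. The last command is a variation that is not part of the answer: it counts scalars with reduce rather than collecting them into an array first, which might be preferable for large documents.

```shell
# Expanded input from the answer, kept in a shell variable (illustrative name).
doc='{"a":true,"b":[1,2],"c":{"a":{"aa":1,"bb":2},"b":"blue"},"d":[],"e":[[],[]],"f":{}}'

# Count scalars only.
scalar_count=$(printf '%s' "$doc" | jq '[..|scalars]|length')

# Count scalars plus empty arrays and objects.
leaf_count=$(printf '%s' "$doc" | jq '[..|select((type|IN("boolean","number","string","null")) or length==0)]|length')

# Assumption (not from the answer): same scalar count, accumulated with
# reduce so that no intermediate array is built.
streamed_count=$(printf '%s' "$doc" | jq 'reduce (..|scalars) as $x (0; . + 1)')

echo "$scalar_count $leaf_count $streamed_count"   # prints: 6 10 6
```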

Related

semantics of map on a sequence of objects in jq

Suppose I have a file fruit.json containing the following lines:
[
{
"name": "apple",
"color": "red",
"price": 20
},
{
"name": "banana",
"color": "yellow",
"price": 15
},
{
"name": "pineapple",
"color": "orange",
"price": 53
}
]
If I do jq '. | map(.)' fruit.json then I get the original data. That's expected. The second . refers to an element in the entire array.
However if I do jq '.[] | map(.)' fruit.json then I get this:
[
"apple",
"red",
20
]
[
"banana",
"yellow",
15
]
[
"pineapple",
"orange",
53
]
Can someone please explain what's going on? Specifically,
The [] after . strips away the brackets from the input array. Do we have a name for the [] operator? The manual seems to treat it as something very basic without definition.
Do we have a name for the resulting thing by appending [] to .? Obviously it's not an object. If we do jq '.[]' fruit.json we can see that it looks very similar to an array. But apparently it behaves quite differently.
Why is it the case that the map function seems to go two levels inside instead of one? This is more obvious if we do jq '.[] | map(. | length)' fruit.json and see that the . inside the map function refers to the value part of an (object) element of the input array.
Thank you all in advance!
.[] produces the values of the array or object given to it.
For example,
[ "a", "b", "c" ] | .[]
is equivalent to
[ "a", "b", "c" ] | .[0], .[1], .[2]
and produces three strings: a, b and c.
map( ... )
is equivalent to
[ .[] | ... ]
This means that
map( . ) ≡ [ .[] | . ] ≡ [ .[] ]
For an array, that means
map( . ) ≡ [ .[0], .[1], ... ] ≡ .
For an object, that means
map( . ) ≡ [ .["key1"], .["key2"], ... ]
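The equivalences above can be checked from the shell. This is a small sketch, assuming jq is on the PATH; the object is the first entry of fruit.json and the variable names are only illustrative.

```shell
# One object from fruit.json (illustrative variable name).
obj='{"name":"apple","color":"red","price":20}'

# map(.) on an object collects its values into an array...
mapped=$(printf '%s' "$obj" | jq -c 'map(.)')

# ...which is the same as writing [ .[] ] by hand.
iterated=$(printf '%s' "$obj" | jq -c '[ .[] ]')

# On an array, map(.) returns the array unchanged.
same=$(printf '%s' '["a","b","c"]' | jq -c 'map(.)')

echo "$mapped"    # ["apple","red",20]
echo "$iterated"  # ["apple","red",20]
echo "$same"      # ["a","b","c"]
```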
The [] after . strips away the brackets from the input array.
There are no brackets. jq programs don't deal with JSON text, but the data structure it represents.
When given an array or object, .[] produces the values of the elements of that array or object.
Do we have a name for the [] operator?
The docs call it the Array/Object Value Iterator, but it's really just a specific usage of the indexing operator.
The docs ascribe the name Array/Object Value Iterator to .[], but that's not quite accurate. It doesn't have to be . before the brackets; any expression can precede them. Having an expression in front is what distinguishes it from the array construction operator.
In technical terms,
[] as a circumfix operator ([ EXPR ]) is the array construction operator, and
[] as a postfix operator (EXPR [ EXPR? ]) is the indexing operator, and it's specifically called the array/object value iterator when there's nothing in the brackets.
Do we have a name for the resulting thing by appending [] to .? Obviously it's not an object. If we do jq '.[]' fruit.json we can see that it looks very similar to an array. But apparently it behaves quite differently.
We call that a stream.
I'm not sure what to call the components of the stream. I usually use "value".
For example,
"a", "b", "c" # Produces a stream of three values.
"abc" / "" | .[] # Same
When serialized to a file with one value per line (as you would get using -c), it's called "JSON lines" with a suggested naming convention of .jsonl.
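To make the difference between a stream and an array concrete, here is a short sketch, assuming jq is on the PATH. The -s (slurp) step is an addition for illustration and is not part of the answer; the variable names are hypothetical.

```shell
# .[] turns one array into a stream of three separate JSON values;
# with -c each value is printed on its own line.
stream=$(printf '%s' '["a","b","c"]' | jq -c '.[]')
echo "$stream"

# The same stream, produced by splitting a string and iterating over the pieces.
split_stream=$(jq -n -c '"abc" / "" | .[]')

# -s (slurp) reads a whole stream back into a single array.
slurped=$(printf '%s\n' "$stream" | jq -c -s '.')
echo "$slurped"   # ["a","b","c"]
```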
Why is it the case that the map function seems to go two levels inside instead of one? This is more obvious if we do jq '.[] | map(. | length)' fruit.json and see that the . inside the map function refers to the value part of an (object) element of the input array.
No, just one.
In that example,
The .[] iterates over the values of the array.
The map iterates over the values of the objects.

accumulate an array of key-value pairs into a single object

How can I use jq to transform this:
[
{
"k": "a",
"v": 123
},
{
"k": "b",
"v": 456
}
]
into this:
{
"a": 123,
"b": 456
}
Reconstruct each object, and add them all to get a big, single one.
map({(.k): .v}) | add
If your input is a large dataset, reduce might be a better choice in terms of performance.
reduce .[] as {$k,$v} ({}; . + {($k): $v})
Another option, since your objects are similar to how entries are structured, you could map them as those key/value pairs and convert to an object that way.
map({key: .k, value: .v}) | from_entries
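The three variants can be compared side by side on the sample input. A minimal sketch, assuming jq 1.5 or later is on the PATH; the variable names are only illustrative.

```shell
# Sample input from the question (illustrative variable name).
pairs='[{"k":"a","v":123},{"k":"b","v":456}]'

# Build one single-key object per element, then merge them with add.
via_add=$(printf '%s' "$pairs" | jq -c 'map({(.k): .v}) | add')

# Accumulate into one object, destructuring k and v from each element.
via_reduce=$(printf '%s' "$pairs" | jq -c 'reduce .[] as {$k,$v} ({}; . + {($k): $v})')

# Rename the fields to key/value and let from_entries build the object.
via_entries=$(printf '%s' "$pairs" | jq -c 'map({key: .k, value: .v}) | from_entries')

echo "$via_add"      # {"a":123,"b":456}
echo "$via_reduce"   # {"a":123,"b":456}
echo "$via_entries"  # {"a":123,"b":456}
```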

JQ: how can I remove keys based on regex?

I would like to remove all keys that start with "hide". Important to note that the keys may be nested at many levels. I'd like to see the answer using a regex, although I recognise that in my example a simple contains would suffice. (I don't know how to do this with contains, either, BTW.)
Input JSON 1:
{
"a": 1,
"b": 2,
"hideA": 3,
"c": {
"d": 4,
"hide4": 5
}
}
Desired output JSON:
{
"a": 1,
"b": 2,
"c": {
"d": 4
}
}
Input JSON 2:
{
"a": 1,
"b": 2,
"hideA": 3,
"c": {
"d": 4,
"hide4": 5
},
"e": null,
"f": "hiya",
"g": false,
"h": [{
"i": 343.232,
"hide9": "private",
"so_smart": true
}]
}
Thanks!
Since you're just checking the start of the keys, you could use startswith/1 instead in this case, otherwise you could use test/1 or test/2. Then you could pass those paths to be removed to delpaths/1.
You might want to filter the key by strings (or convert to strings) beforehand to account for arrays in your tree.
delpaths([paths | select(.[-1] | strings | startswith("hide"))])
delpaths([paths | select(.[-1] | strings | test("^hide"; "i"))])
A straightforward approach to the problem is to use walk in conjunction with with_entries, e.g.
walk(if type == "object"
then with_entries(select(.key | test("^hide") | not))
else . end)
If your jq does not have walk/1 simply include its def (available e.g. from https://raw.githubusercontent.com/stedolan/jq/master/src/builtin.jq) before invoking it.
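As a quick check that the delpaths and walk approaches agree, here is a sketch run on a trimmed-down variant of Input JSON 2. It assumes a jq build with a built-in walk/1 and regex support (jq 1.6 or later); the trimmed document and the variable names are made up for illustration. -S is used only so that both results print with the same key order.

```shell
# Trimmed-down variant of Input JSON 2, including an array (illustrative).
doc='{"a":1,"hideA":3,"c":{"d":4,"hide4":5},"h":[{"hide9":"private","so_smart":true}]}'

# Collect every path whose last component is a string starting with "hide",
# then delete all of those paths at once.
via_delpaths=$(printf '%s' "$doc" | jq -S -c 'delpaths([paths | select(.[-1] | strings | startswith("hide"))])')

# Visit every object and keep only the entries whose key does not match ^hide.
via_walk=$(printf '%s' "$doc" | jq -S -c 'walk(if type == "object" then with_entries(select(.key | test("^hide") | not)) else . end)')

echo "$via_delpaths"  # {"a":1,"c":{"d":4},"h":[{"so_smart":true}]}
echo "$via_walk"      # {"a":1,"c":{"d":4},"h":[{"so_smart":true}]}
```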

Almost automatically sorting keys with `jq`, but keep "id" key, if present, on top

Is there a way to sort the keys of a JSON using jq but keeping keys named "id" as first descendants on all trees? It's nice to have a way to easily compare JSON files to one another and normalizing key order and formatting is a great way to ensure they are easy to match, but sometimes the "id" key is the one we are looking for and it's not always easy to find if it's buried in the middle of the tree.
As an example, this:
{
"z-displacement": 3,
"absorption": 0.4,
"collections": [
{
"b": 12,
"a": 18,
"id": 190
},
{
"m": 22,
"id": 169,
"n": 3
}
],
"id": 256767
}
Would become something like:
{
"id": 256767,
"absorption": 0.4,
"collections": [
{
"id": 190,
"a": 18,
"b": 12
},
{
"id": 169,
"m": 22,
"n": 3
}
],
"z-displacement": 3
}
Assuming you are using jq 1.4 or later, the following will do what is requested for all the JSON objects in the input, not only those at the top level:
def reorder:
(if has("id") then {id} else null end) + (to_entries | sort | from_entries );
walk(if type == "object" then reorder else . end)
If your jq does not have walk/1, you can snarf its def from the jq FAQ https://github.com/stedolan/jq/wiki/FAQ or from the "master" version of builtin.jq
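The reorder filter can be tried on a valid-JSON rendering of the sample document. A minimal sketch, assuming a jq with a built-in walk/1 (jq 1.6 or later); the variable names are only illustrative, and keys_unsorted is used just to display the resulting key order.

```shell
# Sample document from the question as valid JSON (illustrative variable name).
doc='{"z-displacement":3,"absorption":0.4,"collections":[{"b":12,"a":18,"id":190},{"m":22,"id":169,"n":3}],"id":256767}'

# Sort the keys of every object, but put "id" first wherever it is present.
sorted=$(printf '%s' "$doc" | jq -c '
  def reorder:
    (if has("id") then {id} else null end) + (to_entries | sort | from_entries);
  walk(if type == "object" then reorder else . end)')

# Show the key order at the top level and in the first nested object.
top_keys=$(printf '%s' "$sorted" | jq -c 'keys_unsorted')
first_keys=$(printf '%s' "$sorted" | jq -c '.collections[0] | keys_unsorted')

echo "$top_keys"    # ["id","absorption","collections","z-displacement"]
echo "$first_keys"  # ["id","a","b"]
```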
I have no idea how robust this is, but it gets the desired result in this case.
jq -S '.' | jq '{id} + .'
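On the robustness question, two limits of this two-step pipeline are easy to see with small inputs. The sketch below assumes jq is on the PATH; the test documents and variable names are made up for illustration.

```shell
# Only the top-level "id" moves to the front; the nested "id" stays where
# -S sorted it.
nested=$(printf '%s' '{"z":1,"id":7,"a":{"b":2,"id":8,"c":3}}' | jq -S -c '.' | jq -c '{id} + .')
echo "$nested"   # {"id":7,"a":{"b":2,"c":3,"id":8},"z":1}

# A document without a top-level "id" gains "id": null.
no_id=$(printf '%s' '{"b":1,"a":2}' | jq -S -c '.' | jq -c '{id} + .')
echo "$no_id"    # {"id":null,"a":2,"b":1}
```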

How to make a cartesian product in jq?

Let's say I have this input
[
{
"a":1,
"b":2
},
{
"a":3,
"b":4
}
]
and I tried,
echo '[{"a": 1, "b": 2}, {"a": 3, "b": 4}]' | jq '[{x: .[].a, y: .[].b}]'
and I would like to get
[
{
"x":1,
"b":2,
"language":"en"
},
{
"x":1,
"b":2,
"language":"fr"
}...
]
Meaning that for every item in the array I need to output two items: one with an added "language": "en" key/value pair and one with "language": "fr".
EDIT: In case it's not clear enough, I need the cartesian product of the input array of objects with another array xs, which would give me pairs (i, x). For each pair I want to output an object that has all the (key, value) pairs of i plus some key (language in my case) with the value x.
In general, any expression that generates multiple values combined with another expression that generates multiple values will create a cartesian product.
i.e.,
"\(1,2) \(3,4)"
generates strings "1 3", "2 3", "1 4", and "2 4".
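The interpolation example can be run directly with jq -n, which needs no input. A small sketch, assuming jq is on the PATH; the results are collected into an array and sorted so that the check does not depend on the order in which jq emits the combinations.

```shell
# Print each generated string on its own line (-r drops the JSON quotes).
jq -n -r '"\(1,2) \(3,4)"'

# Collect all combinations into an array and count them.
combo_count=$(jq -n '[ "\(1,2) \(3,4)" ] | length')

# The same combinations, sorted for an order-independent comparison.
combos_sorted=$(jq -n -c '[ "\(1,2) \(3,4)" ] | sort')

echo "$combo_count"     # 4
echo "$combos_sorted"   # ["1 3","1 4","2 3","2 4"]
```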
You can do the same given an array of values. [] will take the array and generate a result for each of the items. So combining these concepts, you could do something like this:
$ jq --argjson langs '["en","fr"]' '[(.[]|{x:.a,b}) + {language:$langs[]}]' input.json
But this could further be reduced to simply:
$ jq --argjson langs '["en","fr"]' '[.[]|{x:.a,b,language:$langs[]}]' input.json
or
$ jq --argjson langs '["en","fr"]' 'map({x:.a,b,language:$langs[]})' input.json
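For reference, here is the map variant run on the question's input, fed through stdin rather than input.json. A minimal sketch, assuming jq 1.5 or later (for --argjson); the variable names are only illustrative.

```shell
# Input array from the question (illustrative variable name).
input='[{"a":1,"b":2},{"a":3,"b":4}]'

# $langs[] yields one value per language, so each input object produces
# one output object per language.
result=$(printf '%s' "$input" | jq -c --argjson langs '["en","fr"]' 'map({x:.a,b,language:$langs[]})')

echo "$result"
# [{"x":1,"b":2,"language":"en"},{"x":1,"b":2,"language":"fr"},{"x":3,"b":4,"language":"en"},{"x":3,"b":4,"language":"fr"}]
```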