Find length of each array field within a JSON object using jq

I have a process that generates a JSON object containing some "header" values as scalars and a number of payload values as arrays:
{
"header 1": 42,
"header 2": "2020-01-27",
"payload 1": [
{
"foo": 1
},
{
"foo": 2
}
],
"another payload": [
10,
9,
8,
7
]
}
I have been able to isolate the names of the array fields with the following command:
$ jq '[to_entries | .[] | select(.value | type == "array")] | from_entries | keys_unsorted' results.json
[
"payload 1",
"another payload"
]
But I don't know how to use this to get the lengths of the arrays. The output I'm looking for would be something like:
{
"payload 1": 2,
"another payload": 4
}
Or anything that lists the keys of fields that are arrays and the length of the arrays.
What is a jq command to list the lengths of all array fields in the top-level object?

You don't need *_entries functions here.
map_values(arrays | length)
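As a quick check, here is a sketch that runs the filter on an inline copy of the sample from the question (the shell variable names are only illustrative). It relies on arrays emitting nothing for non-array values so that map_values drops those keys, which assumes jq 1.6 or later. The second filter is an alternative spelling with with_entries that selects the array fields explicitly.

```shell
# Inline copy of the sample object from the question.
json='{"header 1": 42, "header 2": "2020-01-27", "payload 1": [{"foo": 1}, {"foo": 2}], "another payload": [10, 9, 8, 7]}'

# arrays passes arrays through and emits nothing otherwise,
# so map_values keeps only the array fields (assumes jq 1.6+).
lengths=$(printf '%s' "$json" | jq -c 'map_values(arrays | length)')
echo "$lengths"   # {"payload 1":2,"another payload":4}

# Alternative: filter the key/value entries explicitly, then replace each value by its length.
lengths_alt=$(printf '%s' "$json" | jq -c 'with_entries(select(.value | type == "array") | .value |= length)')
echo "$lengths_alt"
```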

Related

Convert json to csv using jq with different key

Given an array of JSON objects, I'd like to output a CSV where one of the columns contains each object key and the others are based on each object value.
The input json is:
{
"PCID000": {
"OSmodle": "LINUX",
"IEversion": "2.15.0",
"hardwareUSB": [
"Card reader",
"keyboard"
],
"OrderStatus": "01"
},
"PCID999": {
"OSmodle": "LINUX",
"OSversion": "4.0",
"hardwareUSB": [],
"OrderStatus": "01"
}
}
The output would look something like this. The header can be hardcoded.
PCID,OSmodle,OSversion,IEversion,hardwareUSB,OrderStatus
"PCID000","LINUX",,"2.15.0","Card reader&keyboard","01"
"PCID999","LINUX","4.0",,"","01"
You can use the to_entries function to convert an object such as {"a": 1, "b": 2} to an array of key-value objects such as [{"key": "a", "value": 1}, {"key": "b", "value": 2}]. Then map over this to pick the key and the parts of the value of interest.
The jq script would look like this:
to_entries | map([
.key,
.value.OSmodle,
.value.OSversion,
.value.IEversion,
(.value.hardwareUSB | join("&")),
.value.OrderStatus])
| ["PCID", "OSmodle", "OSversion", "IEversion", "hardwareUSB", "OrderStatus"], .[]
| @csv
Output (with -r):
"PCID","OSmodle","OSversion","IEversion","hardwareUSB","OrderStatus"
"PCID000","LINUX",,"2.15.0","Card reader&keyboard","01"
"PCID999","LINUX","4.0",,"","01"
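One caveat with this script: join raises an error when hardwareUSB is missing altogether, because it then receives null instead of an array. Below is a minimal sketch of a more tolerant variant, assuming a missing list should be rendered as an empty string. The PCID777 record is made up purely to exercise that case.

```shell
# Hypothetical input: PCID777 has no hardwareUSB key at all.
json='{"PCID000": {"OSmodle": "LINUX", "IEversion": "2.15.0", "hardwareUSB": ["Card reader", "keyboard"], "OrderStatus": "01"}, "PCID777": {"OSmodle": "LINUX", "OrderStatus": "02"}}'

csv=$(printf '%s' "$json" | jq -r '
  to_entries
  | map([
      .key,
      .value.OSmodle,
      .value.OSversion,
      .value.IEversion,
      # Fall back to an empty array so join never sees null.
      (.value.hardwareUSB // [] | join("&")),
      .value.OrderStatus
    ])
  | ["PCID", "OSmodle", "OSversion", "IEversion", "hardwareUSB", "OrderStatus"], .[]
  | @csv
')
echo "$csv"
```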

Why is adding parentheses to a filter in 'jq' producing valid JSON and without parentheses, multiple outputs of objects?

With jq, I would like to set a property within JSON data and let jq output the original JSON with the updated value. I found, more or less due to trial and error, a solution, and want to understand why and how it works.
I have the following JSON data:
{
"notifications": [
{
"source": "observer01",
"channel": "error",
"time": "2021-01-01 01:01:01"
},
{
"source": "observer01",
"channel": "info",
"time": "2021-02-02 02:02:02"
}
]
}
My goal is to update the time property of an object with a specific source and channel (the original JSON is way longer with lots of objects in the notifications array of the same format).
(In the following example, I want to update the time property of observer01 with channel info, so the second object in the example data above.)
My first try, not producing the desired output, was the following jq command:
jq '.notifications[] | select(.source == "observer01" and .channel == "info").time = "NEWTIME"' data.json
That produces the following output:
{
"source": "observer01",
"channel": "error",
"time": "2021-01-01 01:01:01"
},
{
"source": "observer01",
"channel": "info",
"time": "NEWTIME"
}
Which is just a list of the JSON objects within the notifications array. I understand that this can be useful, for example piping the objects to other command line tools.
Now let's try the following jq command, which is the same as above plus one pair of parentheses:
jq '(.notifications[] | select(.source == "observer01" and .channel == "info").time) = "NEWTIME"' data.json
This produces the desired output, the original valid JSON with the updated time property:
{
"notifications": [
{
"source": "observer01",
"channel": "error",
"time": "2021-01-01 01:01:01"
},
{
"source": "observer01",
"channel": "info",
"time": "NEWTIME"
}
]
}
Why is adding the parentheses to the jq filter in the case above producing a different output?
The parentheses just change the precedence. It's documented in man jq:
Parenthesis work as a grouping operator just as in any typical programming language.
jq '(. + 2) * 5'
1
=> 15
Let's have a simpler example:
echo '[{"a":1}, {"a":2}]' | jq '.[] | .a |= .+1'
It outputs
{
"a": 2
}
{
"a": 3
}
because it's interpreted as
                                      ↓         ↓
echo '[{"a":1}, {"a":2}]' | jq '.[] | (.a |= .+1)'
The first filter .[] outputs the elements as separated objects, they are then modified by the second filter.
Placing the parentheses after the first two elements changes the precedence:
                                ↓        ↓
echo '[{"a":1}, {"a":2}]' | jq '(.[] | .a) |= .+1'
and produces a different output:
[
{
"a": 2
},
{
"a": 3
}
]
BTW, this is the same output as from
echo '[{"a":1}, {"a":2}]' | jq '.[].a |= .+1'
It changes the value associated with the "a" key in the array.
Let's compare the two.
.notifications[] | select(...).time = "NEWTIME"
(.notifications[] | select(...).time) = "NEWTIME"
In the first one, the top-level filter is defined by |. The input is an object, and the output is the result of applying select(...).time = "NEWTIME" to each value produced by .notifications[]. In essence, the original object is "lost".
In the second one, the top-level filter is defined by =. x = y returns its input as output, but with a side effect produced by:
1. Determining what the path expression x refers to in the input.
2. Evaluating the filter y on the input. (Even an expression like "NEWTIME" is just a filter: one that ignores its input and returns the string "NEWTIME".)
3. Assigning the result of y to the thing addressed by x.
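To make the first step tangible, jq's path builtin prints the path that an expression addresses. Here is a small sketch on an inline copy of the data from the question (the shell variable names are only illustrative):

```shell
json='{"notifications": [{"source": "observer01", "channel": "error", "time": "2021-01-01 01:01:01"}, {"source": "observer01", "channel": "info", "time": "2021-02-02 02:02:02"}]}'

# Step 1: the left-hand side, viewed as a path into the input.
target=$(printf '%s' "$json" | jq -c 'path(.notifications[] | select(.source == "observer01" and .channel == "info").time)')
echo "$target"   # ["notifications",1,"time"]

# Steps 2 and 3: "NEWTIME" is written at that path and the whole input is returned.
updated=$(printf '%s' "$json" | jq -c '(.notifications[] | select(.source == "observer01" and .channel == "info").time) = "NEWTIME"')
echo "$updated"
```

Without the parentheses there is no single path expression on the left of =; the assignment is applied separately to each object that .notifications[] emits, which is why only those objects are printed.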

How to sort/unique output using jq

I have json like below:
% cat example.json
{
"values" : [
{
"title": "B",
"url": "https://B"
},
{
"title": "A",
"url": "https://A"
}
]
}
I want to sort the values based on title. i.e. expected output
{
"title": "A",
"url": "https://A"
}
{
"title": "B",
"url": "https://B"
}
Tried the below. It does not work:
% jq '.values[] | sort' example.json
jq: error (at example.json:12): object ({"title":"B...) cannot be sorted, as it is not an array
% jq '.values[] | sort_by(.title)' example.json
jq: error (at example.json:12): Cannot index string with string "title"
If you want to preserve the overall structure, you would use the jq filter:
.values |= sort_by(.title)
If you want to extract .values and sort the array, leave out the "=":
.values | sort_by(.title)
To produce the output as shown in the Q:
.values | sort_by(.title)[]
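The two attempts in the question fail because .values[] hands each object to sort or sort_by separately, whereas both expect the whole array as input. Here is a sketch that runs two of the filters above on the sample, inlined as a shell variable for illustration:

```shell
json='{"values": [{"title": "B", "url": "https://B"}, {"title": "A", "url": "https://A"}]}'

# Sort in place, keeping the enclosing object.
kept=$(printf '%s' "$json" | jq -c '.values |= sort_by(.title)')
echo "$kept"       # {"values":[{"title":"A","url":"https://A"},{"title":"B","url":"https://B"}]}

# Sort the array first, then stream its elements one per line.
streamed=$(printf '%s' "$json" | jq -c '.values | sort_by(.title)[]')
echo "$streamed"
```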
Uniqueness
There are several ways in which "uniqueness" can be defined, and also several ways in which uniqueness can be achieved.
One option would simply be to use unique_by instead of sort_by; another (with different semantics) would be to use (sort_by(.title)|unique) instead of sort_by(.title).
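The difference between the two options shows up when entries share a title but differ elsewhere. Here is a sketch with made-up data (the second "A" entry and the repeated "B" entry are not from the question); only the URLs are printed to keep the output short:

```shell
json='{"values": [{"title": "B", "url": "https://B"}, {"title": "A", "url": "https://A"}, {"title": "A", "url": "https://A2"}, {"title": "B", "url": "https://B"}]}'

# unique_by(.title): one entry per distinct title.
by_title=$(printf '%s' "$json" | jq -c '.values | unique_by(.title) | map(.url)')
echo "$by_title"   # ["https://A","https://B"]

# unique: only exact duplicates are dropped, so both "A" entries survive.
whole=$(printf '%s' "$json" | jq -c '.values | sort_by(.title) | unique | map(.url)')
echo "$whole"      # ["https://A","https://A2","https://B"]
```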

JQ: key selection from numeric objects

I use jq 1.6 in a Windows 10 PowerShell environment and am trying to select keys from JSON objects whose keys happen to be numeric.
JSON example:
{
"alliances_info":{
"744085325458334213":{
"emblem":3,
"name":"wellwell",
"member_count":1,
"level":1,
"military_might":1035,
"public":false,
"tag":"MELL",
"slogan":"",
"id":744085325458334213
},
"744128593839677958":{
"emblem":0,
"name":"Brave",
"member_count":1,
"level":1,
"military_might":1035,
"public":false,
"tag":"GABA",
"slogan":"",
"id":744128593839677958
},
"746034084459209223":{
"emblem":0,
"name":"Queen",
"member_count":1,
"level":1,
"military_might":1035,
"public":false,
"tag":"QUE",
"slogan":"",
"id":746034084459209223
},
"750446471312466445":{
"emblem":0,
"name":"Phoenix Inc",
"member_count":35,
"level":6,
"military_might":453369,
"public":true,
"tag":"PHOI",
"slogan":"",
"id":750446471312466445
},
"750446518934594062":{
"emblem":11,
"name":"Australia",
"member_count":44,
"level":8,
"military_might":957211,
"public":true,
"tag":"AUST",
"slogan":"Go Australia",
"id":750446518934594062
}
},
"server_version":"v7.190.4-master.000000006"
}
I tried several jq commands:
.alliances_info | .[] | [{alliance_name: .name, alliance_count: .member_count, alliance_level: .level, alliance_power: .military_might, alliance_tag: .tag, alliance_slogan: .slogan, alliance_id: .id}]
or
.alliances_info | .. | objects | [{alliance_name: .name, alliance_count: .member_count, alliance_level: .level, alliance_power: .military_might, alliance_tag: .tag, alliance_slogan: .slogan, alliance_id: .id}]
But I always get a jq error: parse error: Invalid numeric literal at line 1, column 3
If I drop the object construction in the first command (and build only an array), it works. But I need the objects. Any tips?
BR
Timo
Your first query works perfectly well with the given JSON sample. Perhaps you're invoking jq incorrectly. If you have the jq program in a file, say select.jq, you'd invoke jq like so:
jq -f select.jq sample.json
If that doesn't help, then try:
jq empty sample.json
If that fails, there might be something wrong with the encoding of the JSON.
I'm not sure I understand what you want.
Your first attempt works for me, but generates one output for each JSON value in the input. That is, I created a file named so.json and put in it your JSON from above:
{
"alliances_info": {
"744085325458334213": {
"emblem": 3,
⋮
}
When I run your program, I get:
$ jq '.alliances_info | .[] | [{alliance_name: .name, alliance_count: .member_count, alliance_level: .level, alliance_power: .military_might, alliance_tag: .tag, alliance_slogan: .slogan, alliance_id: .id}]' so.json
[
{
"alliance_name": "wellwell",
"alliance_count": 1,
"alliance_level": 1,
"alliance_power": 1035,
"alliance_tag": "MELL",
"alliance_slogan": "",
"alliance_id": 744085325458334200
}
]
[
{
"alliance_name": "Brave",
⋮
]
If you want an array at all, you probably want one array containing all the alliances like this:
$ jq '.alliances_info | [ .[] | { alliance_name: .name, alliance_id: .id } ]' so.json
[
{
"alliance_name": "wellwell",
"alliance_id": 744085325458334200
},
{
"alliance_name": "Brave",
"alliance_id": 744128593839678000
},
{
"alliance_name": "Queen",
"alliance_id": 746034084459209200
},
{
"alliance_name": "Phoenix Inc",
"alliance_id": 750446471312466400
},
{
"alliance_name": "Australia",
"alliance_id": 750446518934594000
}
]
Starting from the left,
- .alliances_info looks in its input object for the field named "alliances_info" and outputs its value
- the | next says take the output from the left-hand side and pass those as inputs to the right-hand side.
- right after that first |, I have a [ «jq expressions» ] which tells jq to create one JSON array output for each input; the elements of that array are the outputs of that inner «jq expressions»
- that inner expression starts with .[] which means to produce one output for each JSON value (ignoring the keys) in the input object. For us, that will be the objects named "744085325458334213", "744128593839677958", …
- The next | uses those objects as input and for each, generates a JSON object { alliance_name: .name, alliance_id: .id }
That's why I end up with one JSON array containing 5 JSON objects.
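One thing visible in the output above is that the numeric id values come out rounded (744085325458334213 is printed as 744085325458334200), because this jq version handles numbers as double-precision floats. If the exact digits matter, a possible workaround is to take the ID from the object key, which is a string. Here is a sketch on a trimmed two-alliance sample, assuming a string-valued alliance_id is acceptable downstream:

```shell
json='{"alliances_info": {"744085325458334213": {"name": "wellwell", "id": 744085325458334213}, "744128593839677958": {"name": "Brave", "id": 744128593839677958}}}'

# to_entries exposes each key as .key (a string) next to its .value.
exact=$(printf '%s' "$json" | jq -c '.alliances_info | to_entries | map({alliance_name: .value.name, alliance_id: .key})')
echo "$exact"
```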
As far as I can tell, you are mostly just renaming a bunch of the fields. For that, you could just do something like this:
$ jq --argjson renameMap '{ "name": "alliance_name", "member_count": "alliance_count", "level": "alliance_level", "military_might": "alliance_power", "tag": "alliance_tag", "slogan": "alliance_slogan"}' '.alliances_info |= ( . | [ to_entries[] | ( .value |= ( . | [ to_entries[] | ( .key |= ( if $renameMap[.] then $renameMap[.] else . end ) ) ] | from_entries ) ) ] | from_entries )' so.json
{
"alliances_info": {
"744085325458334213": {
"emblem": 3,
"alliance_name": "wellwell",
"alliance_count": 1,
"alliance_level": 1,
"alliance_power": 1035,
"public": false,
"alliance_tag": "MELL",
"alliance_slogan": "",
"id": 744085325458334200
},
"744128593839677958": {
"emblem": 0,
"alliance_name": "Brave",
"alliance_count": 1,
"alliance_level": 1,
"alliance_power": 1035,
"public": false,
"alliance_tag": "GABA",
"alliance_slogan": "",
"id": 744128593839678000
},
⋮
},
"server_version": "v7.190.4-master.000000006"
}
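The same renaming can be written more compactly with map_values and with_entries. This is only a sketch of an equivalent filter, run on a trimmed one-alliance sample with a shortened rename map. The // operator falls back to the original key when the map has no entry for it.

```shell
json='{"alliances_info": {"744085325458334213": {"emblem": 3, "name": "wellwell", "tag": "MELL"}}, "server_version": "v7.190.4-master.000000006"}'

# For every alliance, rewrite each key through the map, keeping unmapped keys as they are.
renamed=$(printf '%s' "$json" | jq -c \
  --argjson renameMap '{"name": "alliance_name", "tag": "alliance_tag"}' \
  '.alliances_info |= map_values(with_entries(.key |= ($renameMap[.] // .)))')
echo "$renamed"
```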
Well, I am an idiot (to be totally clear here). I found the reason (and it is normally a no-brainer...): I read the input from a file, and the funny thing is that the file is Unicode but not UTF-8. After re-encoding it, the command works fine. Thanks for the help.
BR
Timo
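For anyone hitting the same parse error: files that Windows PowerShell labels "Unicode" are typically UTF-16, while jq expects UTF-8 input. Below is a sketch of re-encoding with iconv before calling jq. It assumes iconv is available and that the file is UTF-16LE, and the file names are placeholders. The first command only fabricates a UTF-16LE file to work with.

```shell
workdir=$(mktemp -d)

# Fabricate a UTF-16LE encoded file to stand in for the problematic input.
printf '%s' '{"alliances_info": {"744085325458334213": {"name": "wellwell"}}}' \
  | iconv -f UTF-8 -t UTF-16LE > "$workdir/sample-utf16.json"

# Re-encode to UTF-8, which is what jq expects.
iconv -f UTF-16LE -t UTF-8 "$workdir/sample-utf16.json" > "$workdir/sample-utf8.json"

name=$(jq -r '.alliances_info[].name' "$workdir/sample-utf8.json")
echo "$name"
```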

How do I sum the values in an array of maps in jq?

Given a JSON stream of the following form:
{ "a": 10, "b": 11 } { "a": 20, "b": 21 } { "a": 30, "b": 31 }
I would like to sum the values in each of the objects and output a single object, namely:
{ "a": 60, "b": 63 }
I'm guessing this will probably require flattening the above list of objects into an array of [name, value] pairs and then summing the values using reduce, but the documentation of the syntax for using reduce is woeful.
Unless your jq has inputs, you will have to slurp the objects up using the -s flag. Then you'll have to do a fair amount of manipulation:
1. Map each of the objects out to key/value pairs.
2. Flatten the pairs into a single array.
3. Group the pairs by key.
4. Map each group to a single key/value pair, accumulating the values.
5. Map the pairs back to an object.
map(to_entries)
| add
| group_by(.key)
| map({
key: .[0].key,
value: map(.value) | add
})
| from_entries
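Put together with the -s flag, the pipeline above can be run on the stream from the question roughly like this (a sketch; the stream is inlined as a shell variable for illustration):

```shell
stream='{ "a": 10, "b": 11 } { "a": 20, "b": 21 } { "a": 30, "b": 31 }'

totals=$(printf '%s' "$stream" | jq -s -c '
  map(to_entries)      # one array of key/value entries per object
  | add                # concatenate them into a single array
  | group_by(.key)     # one group per distinct key
  | map({key: .[0].key, value: (map(.value) | add)})
  | from_entries
')
echo "$totals"   # {"a":60,"b":63}
```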
With jq 1.5, this could be greatly improved: You can do away with slurping and just read the inputs directly.
$ jq -n '
reduce (inputs | to_entries[]) as {$key,$value} ({}; .[$key] += $value)
' input.json
Since we're simply accumulating all the values in each of the objects, it'll be easier to just run through the key/value pairs of all the inputs, and add them all up.
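Because the reduce only touches the keys it actually encounters, this version also copes with objects whose key sets differ. Here is a sketch with a made-up stream in which "c" appears only once:

```shell
stream='{ "a": 10, "b": 11 } { "a": 20, "c": 5 } { "a": 30, "b": 31 }'

# -n keeps jq from consuming the first object itself, so inputs sees all of them.
totals=$(printf '%s' "$stream" | jq -n -c 'reduce (inputs | to_entries[]) as {$key,$value} ({}; .[$key] += $value)')
echo "$totals"   # {"a":60,"b":42,"c":5}
```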
I faced the same question when listing all artifacts from GitHub and wanting to sum their sizes.
curl https://api.github.com/repos/:owner/:repo/actions/artifacts \
-H "Accept: application/vnd.github.v3+json" \
-H "Authorization: token <your_pat_here>" \
| jq '.artifacts | map(.size_in_bytes) | add'
Input:
{
"total_count": 3,
"artifacts": [
{
"id": 1,
"node_id": "MDg6QXJ0aWZhY3QyNzUxNjI1",
"name": "artifact-1",
"size_in_bytes": 1,
"url": "https://api.github.com/repos/:owner/:repo/actions/artifacts/2751625",
"archive_download_url": "https://api.github.com/repos/:owner/:repo/actions/artifacts/2751625/zip",
"expired": false,
"created_at": "2020-03-10T18:21:23Z",
"updated_at": "2020-03-10T18:21:24Z"
},
{
"id": 2,
"node_id": "MDg6QXJ0aWZhY3QyNzUxNjI0",
"name": "artifact-2",
"size_in_bytes": 2,
"url": "https://api.github.com/repos/:owner/:repo/actions/artifacts/2751624",
"archive_download_url": "https://api.github.com/repos/:owner/:repo/actions/artifacts/2751624/zip",
"expired": false,
"created_at": "2020-03-10T18:21:23Z",
"updated_at": "2020-03-10T18:21:24Z"
},
{
"id": 3,
"node_id": "MDg6QXJ0aWZhY3QyNzI3NTk1",
"name": "artifact-3",
"size_in_bytes": 3,
"url": "https://api.github.com/repos/docker/mercury-ui/actions/artifacts/2727595",
"archive_download_url": "https://api.github.com/repos/:owner/:repo/actions/artifacts/2727595/zip",
"expired": false,
"created_at": "2020-03-10T08:46:08Z",
"updated_at": "2020-03-10T08:46:09Z"
}
]
}
Output:
6
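One edge case worth knowing: add returns null for an empty array, so a repository without artifacts would print null rather than 0. Here is a sketch with a // 0 fallback, using trimmed stand-ins for the API response instead of a live curl call (both JSON strings are made up for illustration):

```shell
with_artifacts='{"total_count": 3, "artifacts": [{"size_in_bytes": 1}, {"size_in_bytes": 2}, {"size_in_bytes": 3}]}'
no_artifacts='{"total_count": 0, "artifacts": []}'

# Same filter as above, with a fallback for the empty case.
filter='(.artifacts | map(.size_in_bytes) | add) // 0'

total=$(printf '%s' "$with_artifacts" | jq "$filter")
empty_total=$(printf '%s' "$no_artifacts" | jq "$filter")
echo "$total $empty_total"   # 6 0
```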
Another approach, which illustrates the power of jq quite nicely, is to use a filter named "sum" defined as follows:
def sum(f): reduce .[] as $row (0; . + ($row|f) );
To solve the particular problem at hand, one could then use the -s (--slurp) option as mentioned above, together with the expression:
{"a": sum(.a), "b": sum(.b) } # (2)
The expression labeled (2) only computes the two specified sums, but it is easy to generalize, e.g. as follows:
# Produce an object with the same keys as the first object in the
# input array, but with values equal to the sum of the corresponding
# values in all the objects.
def sumByKey:
. as $in
| reduce (.[0] | keys)[] as $key
( {}; . + {($key): ($in | sum(.[$key]))})
;
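Both definitions can be combined in a single invocation with -s. Here is a sketch on the stream from the question, inlined as a shell variable:

```shell
stream='{ "a": 10, "b": 11 } { "a": 20, "b": 21 } { "a": 30, "b": 31 }'

totals=$(printf '%s' "$stream" | jq -s -c '
  def sum(f): reduce .[] as $row (0; . + ($row | f));
  def sumByKey:
    . as $in
    | reduce (.[0] | keys)[] as $key
        ({}; . + {($key): ($in | sum(.[$key]))});
  sumByKey
')
echo "$totals"   # {"a":60,"b":63}
```

Note that sumByKey takes its keys from the first object only, so keys that appear solely in later objects are not summed.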