accumulate an array of key-value pairs into a single object - json

How can I use jq to transform this:
[
  {
    "k": "a",
    "v": 123
  },
  {
    "k": "b",
    "v": 456
  }
]
into this:
{
  "a": 123,
  "b": 456
}

Reconstruct each item as a single-key object, then add them all together to get one big object.
map({(.k): .v}) | add
If your input is a large dataset, reduce might be a better choice in terms of performance.
reduce .[] as {$k,$v} ({}; . + {($k): $v})
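If the destructuring syntax is unfamiliar, an equivalent spelling without it would be:
reduce .[] as $e ({}; . + {($e.k): $e.v})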

Another option: since your objects already resemble the {key, value} entries used by to_entries/from_entries, you could map them into that shape and convert them to an object with from_entries.
map({key: .k, value: .v}) | from_entries
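For illustration, assuming the input array is saved in a file named input.json (the filename is only an assumption), any of these filters can be run the same way:
$ jq 'map({(.k): .v}) | add' input.json
{
  "a": 123,
  "b": 456
}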


semantics of map on a sequence of objects in jq

Suppose I have a file fruit.json containing the following lines:
[
  {
    "name": "apple",
    "color": "red",
    "price": 20
  },
  {
    "name": "banana",
    "color": "yellow",
    "price": 15
  },
  {
    "name": "pineapple",
    "color": "orange",
    "price": 53
  }
]
If I do jq '. | map(.)' fruit.json then I get the original data. That's expected. The second . refers to each element of the array.
However if I do jq '.[] | map(.)' fruit.json then I get this:
[
  "apple",
  "red",
  20
]
[
  "banana",
  "yellow",
  15
]
[
  "pineapple",
  "orange",
  53
]
Can someone please explain what's going on? Specifically:
1. The [] after . strips away the brackets from the input array. Do we have a name for the [] operator? The manual seems to treat it as something very basic without definition.
2. Do we have a name for the resulting thing by appending [] to .? Obviously it's not an object. If we do jq '.[]' fruit.json we can see that it looks very similar to an array. But apparently it behaves quite differently.
3. Why is it the case that the map function seems to go two levels inside instead of one? This is more obvious if we do jq '.[] | map(. | length)' fruit.json and see that the . inside the map function refers to the value part of an (object) element of the input array.
Thank you all in advance!
.[] produces the values of the array or object given to it.
For example,
[ "a", "b", "c" ] | .[]
is equivalent to
[ "a", "b", "c" ] | .[0], .[1], .[2]
and produces three strings: a, b and c.
map( ... )
is equivalent to
[ .[] | ... ]
This means that
map( . ) ≡ [ .[] | . ] ≡ [ .[] ]
For an array, that means
map( . ) ≡ [ .[0], .[1], ... ] ≡ .
For an object, that means
map( . ) ≡ [ .["key1"], .["key2"], ... ]
The [] after . strips away the brackets from the input array.
There are no brackets. jq programs don't deal with JSON text, but with the data structures it represents.
When given an array or object, .[] produces the values of the elements of that array or object.
Do we have a name for the [] operator?
The docs call it the Array/Object Value Iterator, but it's really just a specific usage of the indexing operator.
The docs ascribe the name Array/Object Value Iterator to .[] specifically, but that's not quite accurate: it doesn't have to be . before the brackets, only some expression that produces the array or object. Having a preceding expression is what distinguishes it from the array construction operator.
In technical terms,
[] as a circumfix operator ([ EXPR ]) is the array construction operator, and
[] as a postfix operator (EXPR [ EXPR? ]) is the indexing operator, and it's specifically called the array/object value iterator when there's nothing in the brackets.
Do we have a name for the resulting thing by appending [] to .? Obviously it's not an object. If we do jq '.[]' fruit.json we can see that it looks very similar to an array. But apparently it behaves quite differently.
We call that a stream.
I'm not sure what to call the components of the stream. I usually use "value".
For example,
"a", "b", "c" // Produces a stream of three values.
"abc" / "" | .[] // Same
When serialized to a file with one value per line (as you would get using -c), it's called "JSON lines" with a suggested naming convention of .jsonl.
Why is it the case that the map function seems to go two levels inside instead of one? This is more obvious if we do jq '.[] | map(. | length)' fruit.json and see that the . inside the map function refers to the value part of an (object) element of the input array.
No, just one.
In that example,
The .[] iterates over the values of the array.
The map iterates over the values of the objects.
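To make that concrete with the first object from fruit.json:
{ "name": "apple", "color": "red", "price": 20 } | map(. | length)
# produces [5, 3, 20]: the string lengths of "apple" and "red", and the length of the number 20 (its absolute value)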

Convert json to csv using jq with different key

Given an array of JSON objects, I'd like to output a CSV where one of the rows contains each object key and the others are based on each object value.
The input json is:
{
  "PCID000": {
    "OSmodle": "LINUX",
    "IEversion": "2.15.0",
    "hardwareUSB": [
      "Card reader",
      "keyboard"
    ],
    "OrderStatus": "01"
  },
  "PCID999": {
    "OSmodle": "LINUX",
    "OSversion": "4.0",
    "hardwareUSB": [],
    "OrderStatus": "01"
  }
}
The output would look something like this. The header can be hardcoded.
PCID,OSmodle,OSversion,IEversion,hardwareUSB,OrderStatus
"PCID000","LINUX",,"2.15.0","Card reader&keyboard","01"
"PCID999","LINUX","4.0",,"","01"
You can use the to_entries function to convert an object such as {"a": 1, "b": 2} to an array of key-value objects such as [{"key": "a", "value": 1}, {"key": "b", "value": 2}]. Then map over this to pick the key and the parts of the value of interest.
The jq script would look like this:
to_entries | map([
.key,
.value.OSmodle,
.value.OSversion,
.value.IEversion,
(.value.hardwareUSB | join("&")),
.value.OrderStatus])
| ["PCID", "OSmodle", "OSversion", "IEversion", "hardwareUSB", "OrderStatus"], .[]
| @csv
Output (with -r):
"PCID","OSmodle","OSversion","IEversion","hardwareUSB","OrderStatus"
"PCID000","LINUX",,"2.15.0","Card reader&keyboard","01"
"PCID999","LINUX","4.0",,"","01"

JQ: how can I remove keys based on regex?

I would like to remove all keys that start with "hide". Important to note that the keys may be nested at many levels. I'd like to see the answer using a regex, although I recognise that in my example a simple contains would suffice. (I don't know how to do this with contains, either, BTW.)
Input JSON 1:
{
  "a": 1,
  "b": 2,
  "hideA": 3,
  "c": {
    "d": 4,
    "hide4": 5
  }
}
Desired output JSON:
{
  "a": 1,
  "b": 2,
  "c": {
    "d": 4
  }
}
Input JSON 2:
{
  "a": 1,
  "b": 2,
  "hideA": 3,
  "c": {
    "d": 4,
    "hide4": 5
  },
  "e": null,
  "f": "hiya",
  "g": false,
  "h": [{
    "i": 343.232,
    "hide9": "private",
    "so_smart": true
  }]
}
Thanks!
Since you're just checking the start of the keys, you could use startswith/1 instead of a regex in this case; otherwise you could use test/1 or test/2. Then you could pass the paths to be removed to delpaths/1.
You might want to filter the last path component with strings (or convert it to a string) beforehand to account for arrays in your tree, since paths that go through arrays end in numeric indices.
delpaths([paths | select(.[-1] | strings | startswith("hide"))])
delpaths([paths | select(.[-1] | strings | test("^hide"; "i"))])
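As a quick sanity check, assuming Input JSON 1 is saved as input1.json (the filename is only an assumption), the path expression on its own selects exactly the paths ending in a key that starts with "hide":
$ jq -c '[paths | select(.[-1] | strings | startswith("hide"))]' input1.json
[["hideA"],["c","hide4"]]
delpaths/1 then deletes those paths from the input.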
A straightforward approach to the problem is to use walk in conjunction with with_entries, e.g.
walk(if type == "object"
then with_entries(select(.key | test("^hide") | not))
else . end)
If your jq does not have walk/1 simply include its def (available e.g. from https://raw.githubusercontent.com/stedolan/jq/master/src/builtin.jq) before invoking it.
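For example, assuming the filter above is saved as remove-hidden.jq and the second sample as input2.json (both filenames are only assumptions), you would get:
$ jq -f remove-hidden.jq input2.json
{
  "a": 1,
  "b": 2,
  "c": {
    "d": 4
  },
  "e": null,
  "f": "hiya",
  "g": false,
  "h": [
    {
      "i": 343.232,
      "so_smart": true
    }
  ]
}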

Almost automatically sorting keys with `jq`, but keep "id" key, if present, on top

Is there a way to sort the keys of a JSON document using jq, but keep any key named "id" as the first key of its object at every level of the tree? It's nice to have an easy way to compare JSON files to one another, and normalizing key order and formatting is a great way to ensure they are easy to match, but sometimes the "id" key is the one we are looking for, and it's not always easy to find if it's buried in the middle of the tree.
As an example, this:
{
  "z-displacement": 3,
  "absorption": 0.4,
  "collections": [
    {
      "b": 12,
      "a": 18,
      "id": 190
    },
    {
      "m": 22,
      "id": 169,
      "n": 3
    }
  ],
  "id": 256767
}
Would become something like:
{
  "id": 256767,
  "absorption": 0.4,
  "collections": [
    {
      "id": 190,
      "a": 18,
      "b": 12
    },
    {
      "id": 169,
      "m": 22,
      "n": 3
    }
  ],
  "z-displacement": 3
}
Assuming you are using jq 1.4 or later, the following will do what is requested for all the JSON objects in the input, not only those at the top level:
def reorder:
(if has("id") then {id} else null end) + (to_entries | sort | from_entries );
walk(if type == "object" then reorder else . end)
If your jq does not have walk/1, you can snarf its def from the jq FAQ https://github.com/stedolan/jq/wiki/FAQ or from the "master" version of builtin.jq
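Assuming the filter is saved as reorder.jq and the sample input as input.json (both filenames are only assumptions), it would be run as:
$ jq -f reorder.jq input.json
which prints the reordered object shown in the question.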
I have no idea how robust this is, but it gets the desired result in this case.
jq -S '.' | jq '{id} + .'

"Transposing" objects in jq

I'm unsure if "transpose" is the correct term here, but I'm looking to use jq to transpose a 2-dimensional object such as this:
[
  {
    "name": "A",
    "keys": ["k1", "k2", "k3"]
  },
  {
    "name": "B",
    "keys": ["k2", "k3", "k4"]
  }
]
I'd like to transform it to:
{
  "k1": ["A"],
  "k2": ["A", "B"],
  "k3": ["A", "B"],
  "k4": ["B"]
}
I can split out the object with .[] | {key: .keys[], name} to get a list of keys and names, or I could use .[] | {(.keys[]): [.name]} to get a collection of key–value pairs {"k1": ["A"]} and so on, but I'm unsure of the final concatenation step for either approach.
Are either of these approaches heading in the right direction? Is there a better way?
This should work:
map({ name, key: .keys[] })
| group_by(.key)
| map({ key: .[0].key, value: map(.name) })
| from_entries
The basic approach is to convert each object to name/key pairs, regroup them by key, then map them out to entries of an object.
This produces the following output:
{
  "k1": [ "A" ],
  "k2": [ "A", "B" ],
  "k3": [ "A", "B" ],
  "k4": [ "B" ]
}
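To see what the first two steps produce on their own, assuming the input is saved as input.json (the filename is only an assumption):
$ jq -c 'map({ name, key: .keys[] }) | group_by(.key)' input.json
[[{"name":"A","key":"k1"}],[{"name":"A","key":"k2"},{"name":"B","key":"k2"}],[{"name":"A","key":"k3"},{"name":"B","key":"k3"}],[{"name":"B","key":"k4"}]]
The final map/from_entries then turns each group into one entry of the result.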
Here is a simple solution that may also be easier to understand. It is based on the idea that a dictionary (a JSON object) can be extended by adding details about additional (key -> value) pairs:
# input: a dictionary to be extended by key -> value
# for each key in keys
def extend_dictionary(keys; value):
reduce keys[] as $key (.; .[$key] += [value]);
reduce .[] as $o ({}; extend_dictionary($o.keys; $o.name) )
$ jq -c -f transpose-object.jq input.json
{"k1":["A"],"k2":["A","B"],"k3":["A","B"],"k4":["B"]}
Here is a better solution for the case that all the values of "name" are distinct. It is better because it uses a completely generic filter, invertMapping; that is, invertMapping could be a built-in or library function. With the help of this function, the solution becomes a simple three-liner.

Furthermore, if the values of "name" are not all unique, then the solution below can easily be tweaked by modifying the initial reduction of the input (i.e. the line immediately above the invocation of invertMapping).
# input: a JSON object of (key, values) pairs, in which "values" is an array of strings;
# output: a JSON object representing the inverse relation
def invertMapping:
reduce to_entries[] as $pair
({}; reduce $pair.value[] as $v (.; .[$v] += [$pair.key] ));
map( { (.name) : .keys} )
| add
| invertMapping
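Assuming this script is saved as invert-mapping.jq (the filename is only an assumption), the invocation and result mirror the previous solution:
$ jq -c -f invert-mapping.jq input.json
{"k1":["A"],"k2":["A","B"],"k3":["A","B"],"k4":["B"]}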