Use jq to merge keys with common id

Consider a file 'b.json':
[
  {
    "id": 3,
    "foo": "cannot be replaced, id isn't in a.json, stay untouched",
    "baz": "do not touch3"
  },
  {
    "id": 2,
    "foo": "should be replaced with 'foo new2'",
    "baz": "do not touch2"
  }
]
and 'a.json':
[
  {
    "id": 2,
    "foo": "foo new2",
    "baz": "don't care"
  }
]
I want to update the key "foo" in b.json using jq with the matching value from a.json. It should also work with more than one entry in a.json.
Thus the desired output is:
[
  {
    "id": 3,
    "foo": "cannot be replaced, id isn't in a.json, stay untouched",
    "baz": "do not touch3"
  },
  {
    "id": 2,
    "foo": "foo new2",
    "baz": "do not touch2"
  }
]

Here's one of several possibilities that use INDEX/2. If your jq does not have this as a built-in, see below.
jq --argfile a a.json '
  INDEX($a[]; .id) as $dict
  | map( (.id|tostring) as $id
         | if ($dict|has($id)) then .foo = $dict[$id].foo
           else . end)' b.json
There are other ways to pass in the contents of a.json and b.json.
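For example, since --argfile has been deprecated, one alternative (assuming jq 1.5 or later) is --slurpfile. Note that $a is then an array containing the parsed contents of a.json, hence $a[0][] below:
jq --slurpfile a a.json '
  INDEX($a[0][]; .id) as $dict
  | map( (.id|tostring) as $id
         | if ($dict|has($id)) then .foo = $dict[$id].foo
           else . end)' b.json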
Caveat
The above use of INDEX assumes there are no "collisions", which would happen if, for example, one of the objects has .id equal to 1 and another has .id equal to "1". If there is a possibility of such a collision, then a more complex definition of INDEX could be used.
INDEX/2
Straight from builtin.jq:
def INDEX(stream; idx_expr):
  reduce stream as $row ({}; .[$row|idx_expr|tostring] = $row);

Here's a generic answer that makes no assumptions about the values of the .id keys except that they are distinct JSON values.
Generalization of INDEX/2
def type2: [type, if type == "string" then . else tojson end];

def dictionary(stream; f):
  reduce stream as $s ({}; setpath($s|f|type2; $s));

def lookup(value):
  getpath(value|type2);

def indictionary(value):
  (value|type2) as $t
  | has($t[0]) and (.[$t[0]] | has($t[1]));
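To see how type2 sidesteps the collision problem, note that 1 and "1" are mapped to distinct paths:
jq -nc '1, "1" | [type, if type == "string" then . else tojson end]'
["number","1"]
["string","1"]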
Invocation
jq --argfile a a.json -f program.jq b.json
main
dictionary($a[]; .id) as $dict
| map( .id as $id
       | if ($dict|indictionary($id))
         then .foo = ($dict|lookup($id).foo)
         else . end)

Combining json files using jq objects with array

I've tried using
jq "reduce inputs.skins as $s (.; .skins += $s)" file1.json file2.json > combined.json
but it just creates two boots.name and two fun.name entries, one from each file.
Is there any way I can use jq to combine the objects and arrays without having duplicates?
I apologize for any confusion; jq is kind of complicated, and it's hard to find an easy tutorial I can understand.
file1.json
{
  "skins": [
    {
      "Item Shortname": "boots.name",
      "skins": [
        2,
        25,
        41
      ]
    },
    {
      "Item Shortname": "fun.name",
      "skins": [
        12,
        8
      ]
    }
  ]
}
file2.json
{
  "skins": [
    {
      "Item Shortname": "boots.name",
      "skins": [
        2,
        20
      ]
    },
    {
      "Item Shortname": "fun.name",
      "skins": [
        90,
        6,
        82
      ]
    }
  ]
}
combined.json
{
  "skins": [
    {
      "Item Shortname": "boots.name",
      "skins": [
        2,
        20,
        25,
        41
      ]
    },
    {
      "Item Shortname": "fun.name",
      "skins": [
        90,
        6,
        82,
        12,
        8
      ]
    }
  ]
}
The tricky part here is meeting the apparent uniqueness requirements, for which the following generic filter can be used:
# emit a stream of the distinct items in `stream`
def uniques(stream):
  foreach stream as $s ({};
    ($s|type) as $t
    | (if $t == "string" then $s else ($s|tostring) end) as $y
    | if .[$t][$y] then .emit = false
      else .emit = true | (.item = $s) | (.[$t][$y] = true)
      end;
    if .emit then .item else empty end );
This ensures the ordering is preserved. It's a bit tricky because it is completely generic -- it allows both 1 and "1" and distinguishes between them, just as unique does.
(If the ordering did not matter, then you could use unique.)
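For example, with the above def in scope, [uniques(2, 1, 2, "1")] evaluates to [2,1,"1"], preserving the order of first occurrence, whereas [2, 1, 2, "1"] | unique would produce the sorted array [1,2,"1"].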
So, assuming an invocation along the lines of
jq -s -f program.jq file1.json file2.json
you would place the above def followed by the following “main” program in program.jq:
.[0] as $file1 | .[1] as $file2
| (INDEX($file1.skins[]; .["Item Shortname"]) | map_values(.skins)) as $dict
| $file2
| .skins |= map( .["Item Shortname"] as $name
                 | .skins += $dict[$name]
                 | .skins |= [uniques(.[])] )
A better solution would avoid the -s option (e.g. as shown below), but the above method of feeding the two files to jq is at least straightforward, and will work so long as INDEX/2 is available, whether as a builtin or via the def given earlier.
Solution using input
One way to avoid slurping the two files would be to use input in conjunction with the -n command line option instead of -s. The "main" part of the jq program would then be as follows:
(INDEX(input.skins[]; .["Item Shortname"]) | map_values(.skins)) as $dict
| input
| .skins |= map( .["Item Shortname"] as $name
                 | .skins += $dict[$name]
                 | .skins |= [uniques(.[])] )
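The corresponding invocation would then be along the lines of:
jq -n -f program.jq file1.json file2.json
Here the first call to input reads file1.json and the second reads file2.json.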

jq conditional processing on multiple files

I have multiple json files:
a.json
{
  "status": "passed",
  "id": "id1"
}
{
  "status": "passed",
  "id": "id2"
}
b.json
{
  "status": "passed",
  "id": "id1"
}
{
  "status": "failed",
  "id": "id2"
}
I want to know which id was passed in a.json and is now failed in b.json.
expected.json
{
  "status": "failed",
  "id": "id2"
}
I tried something like:
jq --slurpfile a a.json --slurpfile b b.json -n '$a[] | reduce select(.status == "passed") as $passed (.; $b | select($a.id == .id and .status == "failed"))'
$passed is supposed to contain the list of passed entries in a.json, and reduce is supposed to merge all the objects whose ids match and whose status is now failed.
However, it does not produce the expected result, and the documentation is kind of limited.
How can I produce expected.json from a.json and b.json?
For me your filter produces the error
jq: error (at <unknown>): Cannot index array with string "id"
I suspect this is because you wrote $b instead of $b[] and $a.id instead of $passed.id. Here is my guess at what you intended to write:
$a[]
| reduce select(.status == "passed") as $passed (.;
    $b[] | select( $passed.id == .id and .status == "failed")
  )
which produces the output
null
{
  "status": "failed",
  "id": "id2"
}
You can filter away the null by adding | values e.g.
$a[]
| reduce select(.status == "passed") as $passed (.;
    $b[] | select( $passed.id == .id and .status == "failed")
  )
| values
However you don't really need reduce here. A simpler way is just:
$a[]
| select(.status == "passed") as $passed
| $b[]
| select( $passed.id == .id and .status == "failed")
If you intend to go further with this I would recommend a different approach: first construct an object combining $a and $b and then project what you want from it. e.g.
reduce (($a[] | {(.id): {a: .status}}), ($b[] | {(.id): {b: .status}})) as $v ({}; . * $v)
will give you
{
  "id1": {
    "a": "passed",
    "b": "passed"
  },
  "id2": {
    "a": "passed",
    "b": "failed"
  }
}
To convert that back to the output you requested add
| keys[] as $id
| .[$id]
| select(.a == "passed" and .b == "failed")
| {$id, status:.b}
to obtain
{
  "id": "id2",
  "status": "failed"
}
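Putting the pieces together, one complete invocation might look like this (a sketch; note that a.json and b.json each hold a stream of objects, so --slurpfile yields arrays):
jq -n --slurpfile a a.json --slurpfile b b.json '
  reduce (($a[] | {(.id): {a: .status}}), ($b[] | {(.id): {b: .status}})) as $v ({}; . * $v)
  | keys[] as $id
  | .[$id]
  | select(.a == "passed" and .b == "failed")
  | {$id, status: .b}'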
The following solutions to the problem are oriented primarily towards efficiency, but it turns out that they are quite straightforward and concise.
For efficiency, we will construct a "dictionary" of ids of those who have passed in a.json to make the required lookup very fast.
Also, if you have a version of jq with inputs, it is easy to avoid "slurping" the contents of b.json.
Solution using the -s option
Here is a generic solution which, however, slurps both files:
Invocation (note the use of the -s option):
jq -s --slurpfile a a.json -f passed-and-failed.jq b.json
Program:
([$a[] | select(.status=="passed") | {(.id): true}] | add) as $passed
| .[] | select(.status == "failed" and $passed[.id])
That is, first construct the dictionary, and then emit the objects in b.json that satisfy the condition.
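With the sample a.json, for instance, $passed would be the dictionary:
{
  "id1": true,
  "id2": true
}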
Solution for jq 1.5 or higher
Invocation (note the use of the -n option):
jq -n --slurpfile a a.json -f passed-and-failed.jq b.json
INDEX/2 is a builtin in sufficiently recent versions of jq, but its definition is provided here in case your jq does not have it, in which case you might want to add it to ~/.jq:
def INDEX(stream; idx_expr):
  reduce stream as $row ({};
    .[$row|idx_expr|
      if type != "string" then tojson
      else .
      end] |= $row);
The solution now becomes a simple two-liner:
INDEX($a[] | select(.status == "passed") | .id; .) as $passed
| inputs | select(.status == "failed" and $passed[.id])
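For the sample data, $passed here would be {"id1":"id1","id2":"id2"}; the values are immaterial, all that matters is that $passed[.id] is truthy for the ids that passed.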

Get the index of the array element in JSON with jq

I have the following type of json:
{
  "foo": "hello",
  "bar": [
    {
      "key": "k1",
      "val": "v1"
    },
    {
      "key": "k2",
      "val": "v2"
    },
    {
      "key": "k3",
      "val": "v3"
    }
  ]
}
I want to output the following:
"hello", 1, "k1", "v1"
"hello", 2, "k2", "v2"
"hello", 3, "k3", "v3"
I am using jq to transform this, and the answer should also be a jq transformation.
I am currently at:
echo '{"foo": "hello","bar": [{"key": "k1","val": "v1"},{"key": "k2","val": "v2"},{"key": "k3","val": "v3"} ]}' | jq -c -r '.bar[] as $b | [.foo, ($b | .key, .val)] | #csv'
Which gives me:
"hello","k1","v1"
"hello","k2","v2"
"hello","k3","v3"
How can I also get the index to show of the array element being parsed?
You could convert the array to entries to access the index and the value. Then you can build out the CSV rows.
$ jq -r '[.foo] + (.bar | to_entries[] | [.key+1,.value.key,.value.val]) | @csv' input.json
"hello",1,"k1","v1"
"hello",2,"k2","v2"
"hello",3,"k3","v3"
Assuming you have access to jq 1.5 and that the key/val keys are presented in that order:
jq -r '.foo as $foo
  | foreach .bar[] as $i (0; .+1; [$foo, .] + [$i[]])
  | @csv'
would produce:
"hello",1,"k1","v1"
"hello",2,"k2","v2"
"hello",3,"k3","v3"
The -r option is often used with @csv to convert the JSON string that would otherwise be produced by @csv into a comma-separated list of values.
If you really want to join with ", ", then it's a bit messier, but if you're not worried about the functionality that @csv provides, here's one way:
$ jq -r '"\"\(.foo)\"" as $foo
  | foreach .bar[] as $i
      (0; .+1; "\($foo), \(.), \($i | map("\"\(.)\"") | join(", "))")'
This produces:
"hello", 1, "k1", "v1"
"hello", 2, "k2", "v2"
"hello", 3, "k3", "v3"
If your jq does not have foreach then you could similarly use reduce, but it might be easier to upgrade.
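For what it's worth, here is a sketch of such a reduce-based variant; it accumulates the rows in an array, using the array's current length to derive the 1-based index:
jq -r '.foo as $foo
  | reduce .bar[] as $b ([]; . + [[$foo, length + 1, $b.key, $b.val]])
  | .[]
  | @csv'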

JSON fields have the same name

In practice, keys have to be unique within a JSON object (e.g. Does JSON syntax allow duplicate keys in an object?). However, suppose I have a file with the following contents:
{
  "a" : "1",
  "b" : "2",
  "a" : "3"
}
Is there a simple way of converting the repeated keys to an array? So that the file becomes:
{
  "a" : [ {"key": "1"}, {"key": "3"}],
  "b" : "2"
}
Or something similar, but which combines the repeated keys into an array (or finds an alternative way to extract the repeated key values).
Here's a solution in Java: Convert JSON object with duplicate keys to JSON array
Is there any way to do it with awk/bash/python?
If your input is really a flat JSON object with primitives as values, this should work:
jq -s --stream 'map(select(length==2)) | group_by(.[0]) | map({"key": .[0][0][0], "value": map(.[1])}) | from_entries'
{
  "a": [
    "1",
    "3"
  ],
  "b": [
    "2"
  ]
}
For more complex outputs, that would require actually understanding how --stream is supposed to be used, which is beyond me.
Building on Santiago's answer using -s --stream, the following filter builds up the object one step at a time, thus preserving the order of the keys and of the values for a specific key:
reduce (.[] | select(length==2)) as $kv ({};
  $kv[0][0] as $k
  | $kv[1] as $v
  | (.[$k]|type) as $t
  | if $t == "null" then .[$k] = $v
    elif $t == "array" then .[$k] += [$v]
    else .[$k] = [ .[$k], $v ]
    end)
For the given input, the result is:
{
  "a": [
    "1",
    "3"
  ],
  "b": "2"
}
To illustrate that the ordering of values for each key is preserved, consider the following input:
{
  "c" : "C",
  "a" : "1",
  "b" : "2",
  "a" : "3",
  "b" : "1"
}
The output produced by the filter above is:
{
  "c": "C",
  "a": [
    "1",
    "3"
  ],
  "b": [
    "2",
    "1"
  ]
}
Building up on peak's answer, the following filter also works on multi-object input, with nested objects, and without the slurp option (-s).
This is not an answer to the initial question, but since the jq FAQ links here, it might be useful for some visitors.
File jqmergekeys.txt
def consumestream($arr):
  # Reads stream elements from stdin until there are enough elements to build one object, and returns them as an array
  input as $inp
  | if $inp|has(1) then consumestream($arr+[$inp])  # input = key-value pair => add to array and consume more
    elif ($inp[0]|has(1)) then consumestream($arr)  # input = closing subkey => skip and consume more
    else $arr end;                                  # input = closing root object => return array

def convert2obj($stream):
  # Converts an object in stream notation into an object, merging the values of duplicate keys into arrays.
  # This function is based on http://stackoverflow.com/a/36974355/2606757
  reduce ($stream[]) as $kv ({};
    $kv[0] as $k
    | $kv[1] as $v
    | (getpath($k)|type) as $t                                # type of the existing value under the given key
    | if $t == "null" then setpath($k;$v)                     # value not existing => set value
      elif $t == "array" then setpath($k; getpath($k) + [$v]) # value is already an array => add value to array
      else setpath($k; [getpath($k), $v])                     # single value => put existing and new value into an array
      end);

# Consumes streams forever, converts them into objects, and applies the user-provided filter
def mainloop(f): (convert2obj(consumestream([input]))|f), mainloop(f);

# Catches the "break" thrown by jq when there is no more input
def mergeduplicates(f): try mainloop(f) catch if .=="break" then empty else error end;

#---------------- User code below --------------------------
mergeduplicates(.)  # merge duplicate keys in the input, without any additional filters
#mergeduplicates(select(.layers)|.layers.frame)  # merge duplicate keys in the input and apply some filter afterwards
Example:
tshark -T ek | jq -nc --stream -f ./jqmergekeys.txt
Here's a simple alternative that generalizes well:
reshape.jq
def augmentpath($path; $value):
  getpath($path) as $v
  | setpath($path; $v + [$value]);

reduce (inputs | select(length==2)) as $pv
  ({}; augmentpath($pv[0]; $pv[1]) )
Usage
jq -n --stream -f reshape.jq input.json
Output
With the given input:
{
  "a": [
    "1",
    "3"
  ],
  "b": [
    "2"
  ]
}
Postscript
If it's important to avoid arrays of singletons, either the def of augmentpath could be modified, or a postprocessing step could be added.
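For example, for this flat input, one possible postprocessing step (a sketch) would be to unwrap singleton arrays:
reduce (inputs | select(length==2)) as $pv
  ({}; augmentpath($pv[0]; $pv[1]) )
| map_values(if type == "array" and length == 1 then .[0] else . end)
This would yield {"a":["1","3"],"b":"2"} for the given input.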

"Transposing" objects in jq

I'm unsure if "transpose" is the correct term here, but I'm looking to use jq to transpose a 2-dimensional object such as this:
[
  {
    "name": "A",
    "keys": ["k1", "k2", "k3"]
  },
  {
    "name": "B",
    "keys": ["k2", "k3", "k4"]
  }
]
I'd like to transform it to:
{
  "k1": ["A"],
  "k2": ["A", "B"],
  "k3": ["A", "B"],
  "k4": ["B"]
}
I can split out the object with .[] | {key: .keys[], name} to get a list of keys and names, or I could use .[] | {(.keys[]): [.name]} to get a collection of key–value pairs {"k1": ["A"]} and so on, but I'm unsure of the final concatenation step for either approach.
Are either of these approaches heading in the right direction? Is there a better way?
This should work:
map({ name, key: .keys[] })
| group_by(.key)
| map({ key: .[0].key, value: map(.name) })
| from_entries
The basic approach is to convert each object to name/key pairs, regroup them by key, then map them out to entries of an object.
This produces the following output:
{
  "k1": [ "A" ],
  "k2": [ "A", "B" ],
  "k3": [ "A", "B" ],
  "k4": [ "B" ]
}
Here is a simple solution that may also be easier to understand. It is based on the idea that a dictionary (a JSON object) can be extended by adding details about additional (key -> value) pairs:
# input: a dictionary to be extended by key -> value
# for each key in keys
def extend_dictionary(keys; value):
  reduce keys[] as $key (.; .[$key] += [value]);

reduce .[] as $o ({}; extend_dictionary($o.keys; $o.name) )
$ jq -c -f transpose-object.jq input.json
{"k1":["A"],"k2":["A","B"],"k3":["A","B"],"k4":["B"]}
Here is a better solution for the case that all the values of "name" are distinct. It is better because it uses a completely generic filter, invertMapping; that is, invertMapping could be a built-in or library function. With the help of this function, the solution becomes a simple three-liner.
Furthermore, if the values of "name" are not all unique, then the solution below can easily be tweaked by modifying the initial reduction of the input (i.e. the lines immediately above the invocation of invertMapping); a sketch is given at the end.
# input: a JSON object of (key, values) pairs, in which "values" is an array of strings;
# output: a JSON object representing the inverse relation
def invertMapping:
  reduce to_entries[] as $pair
    ({}; reduce $pair.value[] as $v (.; .[$v] += [$pair.key] ));
map( { (.name) : .keys} )
| add
| invertMapping
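For instance, if the values of "name" may repeat, the initial reduction could be replaced along these lines (a sketch; note that a repeated name/key pair would then produce repeated entries in the inverted arrays):
reduce .[] as $o ({}; .[$o.name] += $o.keys)
| invertMapping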