Find common keys in JSON objects using jq

I'm trying to find all common keys in a JSON file, given that we don't know the names of the keys in advance.
The JSON file looks like:
{
"DynamicKey1" : {
"foo" : 1,
"bar" : 2
},
"DynamicKey2" : {
"bar" : 3
},
"DynamicKey3" : {
"foo" : 5,
"zyx" : 5
}
}
Expected result:
{
"foo"
}
I was trying to apply reduce/foreach logic here but I am not sure how to write it in jq. I appreciate any help!!
jq '. as $ss | reduce range(1; $ss|length) as $i ([]; . + reduce ($ss[i] | keys) as $key ([]; if $ss[$i - 1] | has($key) then . +$key else . end))' file.json

There are some inconsistencies in the Q as posted: there are no keys common to all the objects, and if one looks at the pair-wise intersection of keys, the result would include both "foo" and "bar".
In the following, I'll present solutions for both these problems.
Keys in more than one object
[.[] | keys_unsorted[]] | group_by(.)[] | select(length>1)[0]
Keys in all the objects
Here's a solution using a similar approach:
length as $length
| [.[] | keys_unsorted[]] | group_by(.)[]
| select(length==$length)
| .[0]
This involves group_by/1, which is implemented using a sort.
Here is an alternative approach that relies on the built-in function keys to do the sorting (the point being that ((nk ln(nk)) - n(k ln(k))) = nk ln(n), i.e. having n small sorts of k items is better than one large sort of n*k items):
# The intersection of an arbitrary number of sorted arrays
def intersection_of_sorted_arrays:
# intersecting/2 returns a stream
def intersecting($A;$B):
def pop:
.[0] as $i
| .[1] as $j
| if $i == ($A|length) or $j == ($B|length) then empty
elif $A[$i] == $B[$j] then $A[$i], ([$i+1, $j+1] | pop)
elif $A[$i] < $B[$j] then [$i+1, $j] | pop
else [$i, $j+1] | pop
end;
[0,0] | pop;
reduce .[1:][] as $x (.[0]; [intersecting(.; $x)]);
To compute the keys common to all the objects:
[.[] | keys] | intersection_of_sorted_arrays
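For readers who want to cross-check the merge logic outside jq, here is a minimal Python sketch of the same two-pointer intersection. The function names are hypothetical, and it assumes each input list is already sorted and that at least one list is supplied:

```python
from functools import reduce


def intersect_sorted(a, b):
    """Two-pointer intersection of two sorted lists (mirrors the jq `pop` helper)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def intersection_of_sorted_arrays(arrays):
    """Fold the pairwise intersection over all lists; assumes `arrays` is non-empty."""
    return reduce(intersect_sorted, arrays)


# Sorted key lists of the three objects in the question.
key_lists = [["bar", "foo"], ["bar"], ["foo", "zyx"]]
print(intersection_of_sorted_arrays(key_lists))
```

With the sample data the result is empty, which agrees with the observation that no key is common to all three objects.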

Here is a sort-free and time-efficient answer that relies on the efficiency of jq's implementation of lookups in a JSON dictionary. Since keys are strings, we can simply use the concept of a "bag of words" (bow):
def bow(stream):
reduce stream as $word ({}; .[$word|tostring] += 1);
We can now solve the "Keys common to all objects" problem as follows:
length as $length
| bow(.[] | keys_unsorted[])
| to_entries[]
| select(.value==$length).key
And similarly for the "Keys in more than one object" problem.
Of course, to achieve the time-efficiency, there is the usual space-time tradeoff.
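The counting idea is easy to verify in another language. The following is a rough Python sketch (the helper name key_counts is made up for illustration) that applies it to the sample input from the question and answers both variants of the problem:

```python
from collections import Counter


def key_counts(doc):
    """Count in how many of the top-level objects each inner key occurs."""
    counts = Counter()
    for obj in doc.values():
        counts.update(obj.keys())
    return counts


doc = {
    "DynamicKey1": {"foo": 1, "bar": 2},
    "DynamicKey2": {"bar": 3},
    "DynamicKey3": {"foo": 5, "zyx": 5},
}

counts = key_counts(doc)
# Keys present in every object: the count equals the number of objects.
common_to_all = sorted(k for k, c in counts.items() if c == len(doc))
# Keys present in more than one object.
in_more_than_one = sorted(k for k, c in counts.items() if c > 1)
print(common_to_all, in_more_than_one)
```

As with the jq version, the dictionary of counts is the extra space paid for avoiding a sort.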

Related

In JQ is there a better way to process an array using a sliding window than using indexes?

In my specific case, I'm looking to convert input like ["a", 1, "b", 2, "c", 3] into an object like {"a": 1, "b": 2, "c": 3}, but the general technique is processing an array using a sliding window (in this case, of size 2).
I can make this work using indexes, but it's rather ugly, and it suffers from having to load the entire array into memory, so it's not great for streaming:
# Just creates input to play with, in this case, all the letters from 'a' to 'z'
function input () {
printf '"%s" ' {a..z} | jq --slurp --compact-output '.'
}
input |
jq '. as $i | $i
| keys
| map(select (. % 2 == 0))
| map({key:($i[.]|tostring), value:$i[. + 1]})
| from_entries'
In a perfect world, this could look something like this:
input |
jq 'sliding(2;2)
| map({key: (.[0]|tostring), value: .[1]})
| from_entries'
I don't see anything like that in the docs, but I'd like to know if there's any techniques that could get me to a cleaner solution.
Tangent on sliding
I used sliding(2;2) as a placeholder for "something that does this in one go", but for the curious, the semantics come from Scala's sliding(size: Int, step: Int) collection method.
Because jq returns null if you're out of range, the size would mostly serve to make life easier when you're looking at an intermediate result. Borrowing the while implementation from @pmf's answer, the second invocation below has a much easier to understand intermediate output because the size argument is applied:
$ input | jq --compact-output 'while(. != []; .[2:])'
["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
["c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
["e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
["g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
["i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
["k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
["m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
["o","p","q","r","s","t","u","v","w","x","y","z"]
["q","r","s","t","u","v","w","x","y","z"]
["s","t","u","v","w","x","y","z"]
["u","v","w","x","y","z"]
["w","x","y","z"]
["y","z"]
$ input | jq --compact-output 'while(. != []; .[2:])[:3]'
["a","b","c"]
["c","d","e"]
["e","f","g"]
["g","h","i"]
["i","j","k"]
["k","l","m"]
["m","n","o"]
["o","p","q"]
["q","r","s"]
["s","t","u"]
["u","v","w"]
["w","x","y"]
["y","z"]
I am confused by the meaning of 2 and 2 in sliding(2;2), but here's a definition for sliding that does what (I think) you are looking for (with maybe different parameter values). It generates an array of arrays using a step size (first parameter) and a window length (second parameter):
def sliding($a;$b): [while(. != []; .[$a:])[:$b]];
Examples:
sliding(2;2) | map({key: (.[0]|tostring), value: .[1]}) | from_entries
{"a":"b","c":"d","e":"f","g":"h","i":"j","k":"l","m":"n","o":"p","q":"r","s":"t","u":"v","w":"x","y":"z"}
Skipping:
sliding(3;2) | map({key: (.[0]|tostring), value: .[1]}) | from_entries
{"a":"b","d":"e","g":"h","j":"k","m":"n","p":"q","s":"t","v":"w","y":"z"}
Overlapping:
sliding(1;2) | map({key: (.[0]|tostring), value: .[1]}) | from_entries
{"a":"b","b":"c","c":"d","d":"e","e":"f","f":"g","g":"h","h":"i","i":"j","j":"k","k":"l","l":"m","m":"n","n":"o","o":"p","p":"q","q":"r","r":"s","s":"t","t":"u","u":"v","v":"w","w":"x","x":"y","y":"z","z":null}
Note: the second parameter is not really used, as you always take two items from the current window, so you could actually omit it entirely, or hard-code it to 2.
You could use reduce to go through the array, and take two items at a time:
jq 'reduce while(. != []; .[2:]) as [$key, $val] ({}; .[$key] = $val)'
Yes, that's what _nwise is for.
reduce _nwise(2) as [$k, $v] ({}; .[$k] = $v)
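To make the window semantics concrete, here is a small Python sketch of the same idea. The names sliding and pairs are hypothetical, and the parameter order (size, then step) follows the Scala method mentioned in the question rather than the jq definition above:

```python
def sliding(seq, size, step):
    """Yield windows of up to `size` items, starting every `step` items.

    Like the jq `while`-based version, trailing windows may be shorter than `size`.
    """
    for start in range(0, len(seq), step):
        yield seq[start:start + size]


def pairs(iterable):
    """Pair up consecutive items lazily, without loading the whole input.

    A trailing unpaired item is silently dropped by zip.
    """
    it = iter(iterable)
    return zip(it, it)


flat = ["a", 1, "b", 2, "c", 3]
print(dict(sliding(flat, size=2, step=2)))  # only safe when every window has 2 items
print(dict(pairs(flat)))
```

The pairs variant consumes any iterator, so it addresses the streaming concern raised in the question; the sliding variant needs a sequence it can slice.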

Can I output boolean based on values in a list?

Edit: I used the solution provided by @peak to do the following:
$ jq -r --argjson whitelist '["role1", "role2"]' '
select(has("roles") and any(.roles[]; . == "role1" or . == "role2"))
| (reduce ."roles"[] as $r ({}; .[$r]=true)) as $roles
| [.email, .username, .given_name, .family_name, ($roles[$whitelist[]]
| . != null)]
| @csv
' users.json
Added the select() to filter out users who haven't onboarded yet and don't have any roles, and to ensure the users included in the output have at least one of the target roles.
Scenario: user profiles as JSON docs, where each profile has a list object with their assigned roles. Example:
{
"username": "janedoe",
"roles": [
"role1",
"role4",
"role5"
]
}
The actual data file is an ndjson file, one user object as above per line.
I am only interested in specific roles, say role1, role3, and role4. I want to produce a CSV formatted as:
username,role1?,role3?,role4?
e.g.,
janedoe,true,false,true
The part I haven't figured out is how to output booleans or Y / N in response to the values in the list object. Is this something I can do in jq itself?
With your input, the invocation:
jq -r --argjson whitelist '["role1", "role3", "role4"]' '
(["username"] + $whitelist),
[.username, ($whitelist[] as $w | .roles | index([$w]) != null)]
| @csv
'
produces:
"username","role1","role3","role4"
"janedoe",true,false,true
Notes:
The second last line of the jq filter above could be shortened to:
[.username, (.roles | index($whitelist[]) != null)]
Presumably if there were more than one user, you'd only want
the header row once, in which case the above solution
would need to be tweaked.
Using IN/1
Because index/1 is not as efficient as it might be,
you might like to consider this alternative:
(["username"] + $whitelist),
(.roles as $roles | [.username, ($whitelist[] | IN($roles[]) )])
| @csv
Using a JSON dictionary
If the number of roles was very large, then it would probably be more
efficient to construct a JSON dictionary to avoid repeated linear lookups:
(reduce .roles[] as $r ({}; .[$r]=true)) as $roles
| (["username"] + $whitelist),
[.username, ($roles[$whitelist[]] != null)]
| @csv
With ndjson as input
For efficiency, and to ensure there's just one header, you could use inputs with the -n command-line option. Adding the extra fields mentioned in the revised Q, you might end up with:
jq -nr --argjson whitelist '["role1", "role2"]' '
["email", "username", "given_name", "family_name"] as $greenlist
| ($greenlist + $whitelist),
(inputs
| select(has("roles") and any(.roles[] == $whitelist[]; .))
| (reduce ."roles"[] as $r ({}; .[$r]=true)) as $roles
| [ .[$greenlist[]], ($roles[$whitelist[]] != null) ])
| @csv
' users.json
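For comparison, the dictionary-lookup approach can be sketched in Python as below. This is an illustration under assumptions: roles_csv is a hypothetical helper, only the username column from the original question is emitted, and Python's csv module quotes fields only when necessary, so the output is not byte-identical to jq's @csv:

```python
import csv
import io
import json


def roles_csv(ndjson_text, whitelist):
    """Emit one header row, then one row per user with a true/false flag per role."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["username"] + whitelist)
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # skip blank lines in the ndjson input
        user = json.loads(line)
        roles = set(user.get("roles", []))  # constant-time membership tests
        flags = ["true" if role in roles else "false" for role in whitelist]
        writer.writerow([user["username"]] + flags)
    return out.getvalue()


sample = '{"username": "janedoe", "roles": ["role1", "role4", "role5"]}\n'
print(roles_csv(sample, ["role1", "role3", "role4"]))
```

Building the set once per user plays the same role as the reduce that builds $roles in the jq filter.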

Zip lists in jq's object construction with {} instead of taking their Cartesian product (the default)

A JSON object like this:
{"user":"stedolan","titles":["JQ Primer", "More JQ"],"years":[2013, 2016]}
I want to convert it so that the lists (assume all lists have equal length N) are zipped, with output like this:
{"user":"stedolan","title":"JQ Primer","year":2013}
{"user":"stedolan","title":"More JQ","year":2016}
I followed Object - {} example and tried:
tmp='{"user":"stedolan","titles":["JQ Primer", "More JQ"],"years":[2013, 2016]}'
echo $tmp | jq '{user, title: .titles[], year: .years[]}'
then it output:
{"user":"stedolan","title":"JQ Primer","year":2013}
{"user":"stedolan","title":"JQ Primer","year":2016}
{"user":"stedolan","title":"More JQ","year":2013}
{"user":"stedolan","title":"More JQ","year":2016}
It produces N*N lines of output instead of N lines.
Any suggestion is appreciated!
transpose/0 can be used to effectively zip the values together. And the nice thing about the way assignments work is that it can be assigned simultaneously over multiple variables.
([.titles,.years]|transpose[]) as [$title,$year] | {user,$title,$year}
If you want the results in an array rather than a stream, just wrap it all in [].
https://jqplay.org/s/ZIFU5gBnZ7
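The difference between zipping and the Cartesian product is perhaps easiest to see in a language with both built in. A minimal Python sketch follows; the helper names are hypothetical, and transpose here pads short rows with None, which I believe matches how jq's transpose pads with null:

```python
from itertools import product, zip_longest


def zip_records(obj):
    """One record per index: pairs titles[i] with years[i]."""
    return [
        {"user": obj["user"], "title": title, "year": year}
        for title, year in zip(obj["titles"], obj["years"])
    ]


def product_records(obj):
    """What `{user, title: .titles[], year: .years[]}` effectively does: all combinations."""
    return [
        {"user": obj["user"], "title": title, "year": year}
        for title, year in product(obj["titles"], obj["years"])
    ]


def transpose(rows):
    """Turn rows into columns, padding short rows with None."""
    return [list(col) for col in zip_longest(*rows)]


obj = {"user": "stedolan", "titles": ["JQ Primer", "More JQ"], "years": [2013, 2016]}
print(len(zip_records(obj)), len(product_records(obj)))
```

Note that Python's plain zip truncates to the shortest list, whereas the transpose-based jq solution would pad, so the two only agree under the question's equal-length assumption.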
For a jq 1.4 compatible version, you'll have to rewrite it to not use destructuring but you could use the same transpose/0 implementation from the builtins.
transpose/0:
def transpose:
if . == [] then []
else . as $in
| (map(length) | max) as $max
| length as $length
| reduce range(0; $max) as $j
([]; . + [reduce range(0;$length) as $i ([]; . + [ $in[$i][$j] ] )] )
end;
Here's an alternative implementation that I cooked up that should also be compatible. :)
def transpose2:
length as $cols
| (map(length) | max) as $rows
| [range(0;$rows) as $r | [.[range(0;$cols)][$r]]];
([.titles,.years]|transpose[]) as $p | {user,title:$p[0],year:$p[1]}
If you want the output to have the keys in the order indicated in the Q, then the solution is a bit trickier than would otherwise be the case.
Here's one way to retain the order:
with_entries( .key |= (if . == "titles" then "title" elif . == "years" then "year" else . end) )
| range(0; .title|length) as $i
| .title |= .[$i]
| .year |= .[$i]
The (potential) advantage of this approach is that one does not have to mention any of the other keys.

jq 1.5 print items from array that is inside another array

The incoming JSON file contains one JSON array per row, e.g.:
["a100","a101","a102","a103","a104","a105","a106","a107","a108"]
["a100","a102","a103","a106","a107","a108"]
["a100","a99"]
["a107","a108"]
A "filter array" would be ["a99","a101","a108"], which I can load with --slurpfile.
I'm trying to figure out how to print only the values that are in the "filter array", e.g. the output:
["a101","a108"]
["a108"]
["a99"]
["a108"]
You can port the IN function from jq 1.6 to 1.5 and use:
def IN(s): any(s == .; .);
map(select(IN($filter_array[])))
Or even shorter:
map(select(any($filter_array[]==.;.)))
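As a sanity check on the expected output, the same per-row membership filter can be written in a few lines of Python. This is only a sketch with a hypothetical function name; it uses a set for the filter values, which avoids the repeated linear scan that any(...) performs:

```python
def keep_allowed(rows, filter_array):
    """For each row, keep only the items that occur in filter_array (row order preserved)."""
    allowed = set(filter_array)
    return [[item for item in row if item in allowed] for row in rows]


rows = [
    ["a100", "a101", "a102", "a103", "a104", "a105", "a106", "a107", "a108"],
    ["a100", "a102", "a103", "a106", "a107", "a108"],
    ["a100", "a99"],
    ["a107", "a108"],
]
for filtered in keep_allowed(rows, ["a99", "a101", "a108"]):
    print(filtered)
```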
I might be missing some simpler solution, but the following works:
map(select(. as $in | ["a99","a101","a108"] | contains([$in])))
Replace the ["a99","a101","a108"] hardcoded array by your slurped variable.
In the example, the arrays in the input stream are sorted (in jq's sort order), so it is worth noting that in such cases, a more efficient solution is possible using the bsearch built-in, or perhaps even better, the definition of intersection/2 given at https://rosettacode.org/wiki/Set#Finite_Sets_of_JSON_Entities
For ease of reference, here it is:
def intersection($A;$B):
def pop:
.[0] as $i
| .[1] as $j
| if $i == ($A|length) or $j == ($B|length) then empty
elif $A[$i] == $B[$j] then $A[$i], ([$i+1, $j+1] | pop)
elif $A[$i] < $B[$j] then [$i+1, $j] | pop
else [$i, $j+1] | pop
end;
[[0,0] | pop];
Assuming a jq invocation such as:
jq -c --argjson filter '["a99","a101","a108"]' -f intersections.jq input.json
an appropriate filter would be:
($filter | sort) as $sorted
| intersection(.; $sorted)
(Of course if $filter is already presented in jq's sort order, then the initial sort can be skipped, or replaced by a check.)
Output
["a101","a108"]
["a108"]
["a99"]
["a108"]
Unsorted arrays
In practice, jq's builtin sort filter is usually so fast that it might be worthwhile simply sorting the arrays in order to use intersection as defined above.

Convert even odd index in array to key value pairs in json using jq

I'm trying to use jq to parse Solr 6.5 metrics into key value pairs:
{
"responseHeader": {
"status": 0,
"QTime": 7962
},
"metrics": [
"solr.core.shard1",
"QUERY./select",
"solr.core.shard2",
"QUERY./update"
...
]
}
I'd like to pair up the even- and odd-indexed entries in the metrics array and put them together into a single object as key-value pairs like this:
{
"solr.core.shard1": "QUERY./select",
"solr.core.shard2": "QUERY./update",
...
}
So far, I have only been able to come up with:
.metrics | to_entries | .[] | {(select(.key % 2 == 0).value): select(.key % 2 == 1).value}
But this returns an error or no results.
I'd be grateful if someone could point me in the right direction. I feel like the answer is probably in the map operator, but I haven't been able to figure it out.
jq solution:
jq '[ .metrics as $m | range(0; $m | length; 2)
| {($m[.]): $m[(. + 1)]} ] | add' jsonfile
The output:
{
"solr.core.shard1": "QUERY./select",
"solr.core.shard2": "QUERY./update"
}
https://stedolan.github.io/jq/manual/v1.5/#range(upto),range(from;upto)range(from;upto;by)
Here's a helper function which makes the solution trivial:
# Emit a stream consisting of pairs of items taken from `stream`
def pairwise(stream):
foreach stream as $i ([];
if length == 1 then . + [$i] else [$i] end;
select(length == 2));
From here there are several good options, e.g. we could start with:
.metrics
| [pairwise(.[]) | {(.[0]): .[1]}]
| add
With your input, this produces:
{
"solr.core.shard1": "QUERY./select",
"solr.core.shard2": "QUERY./update"
}
So you might want to write:
.metrics |= ([pairwise(.[]) | {(.[0]): .[1]}] | add)
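One detail worth knowing is how the index-based and the pairwise approaches differ when the array has an odd number of items. The Python sketch below (function names are hypothetical) mirrors both: the range-with-step version keeps a trailing key with a null value, while the streaming version drops the unpaired item:

```python
def pairs_by_index(metrics):
    """Mirror of range(0; length; 2): a missing partner becomes None (null in jq)."""
    return {
        metrics[i]: (metrics[i + 1] if i + 1 < len(metrics) else None)
        for i in range(0, len(metrics), 2)
    }


def pairs_streaming(stream):
    """Mirror of the foreach-based pairwise: only complete pairs are emitted."""
    result = {}
    buffer = []
    for item in stream:
        buffer.append(item)
        if len(buffer) == 2:
            result[buffer[0]] = buffer[1]
            buffer = []
    return result


metrics = ["solr.core.shard1", "QUERY./select", "solr.core.shard2", "QUERY./update"]
print(pairs_by_index(metrics) == pairs_streaming(metrics))
```

For well-formed input with an even number of entries the two agree, so the choice mainly matters if the metrics array might be truncated.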