In my specific case, I'm looking to convert input like ["a", 1, "b", 2, "c", 3] into an object like {"a": 1, "b": 2, "c": 3}, but the general technique is processing an array using a sliding window (in this case, of size 2).
I can make this work using indexes, but it's rather ugly, and it suffers from having to load the entire array into memory, so it's not great for streaming:
# Just creates input to play with, in this case, all the letters from 'a' to 'z'
function input () {
printf '"%s" ' {a..z} | jq --slurp --compact-output '.'
}
input |
jq '. as $i | $i
| keys
| map(select (. % 2 == 0))
| map({key:($i[.]|tostring), value:$i[. + 1]})
| from_entries'
In a perfect world, this could look something like this:
input |
jq 'sliding(2;2)
| map({key: (.[0]|tostring), value: .[1])
| from_entries'
I don't see anything like that in the docs, but I'd like to know if there's any techniques that could get me to a cleaner solution.
Tangent on sliding
I used sliding(2;2) a placeholder for "something that does this in one go", but for the curious, the semantics come from Scala's sliding(size: Int, step: Int) collection method.
Because jq returns null if you're out of range, the size would be mostly to make life easier when you're looking at an intermediate result. Borrowing the while implementation from #pmf's answer, the second has a much easier to understand intermediate output when the size argument is applied:
$ input | jq --compact-output 'while(. != []; .[2:])'
["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
["c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
["e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
["g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
["i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
["k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
["m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
["o","p","q","r","s","t","u","v","w","x","y","z"]
["q","r","s","t","u","v","w","x","y","z"]
["s","t","u","v","w","x","y","z"]
["u","v","w","x","y","z"]
["w","x","y","z"]
["y","z"]
$ input | jq --compact-output 'while(. != []; .[2:])[:3]'
["a","b","c"]
["c","d","e"]
["e","f","g"]
["g","h","i"]
["i","j","k"]
["k","l","m"]
["m","n","o"]
["o","p","q"]
["q","r","s"]
["s","t","u"]
["u","v","w"]
["w","x","y"]
["y","z"]
I am confused with the meaning of 2 and 2 in sliding(2;2), but here's a definition for sliding that can master what (I think) you are looking for (with maybe different parameter values). It generates an array of arrays using a step size and a length parameter:
def sliding($a;$b): [while(. != []; .[$a:])[:$b]];
Examples:
sliding(2;2) | map({key: (.[0]|tostring), value: .[1]}) | from_entries
{"a":"b","c":"d","e":"f","g":"h","i":"j","k":"l","m":"n","o":"p","q":"r","s":"t","u":"v","w":"x","y":"z"}
Skipping:
sliding(3;2) | map({key: (.[0]|tostring), value: .[1]}) | from_entries
{"a":"b","d":"e","g":"h","j":"k","m":"n","p":"q","s":"t","v":"w","y":"z"}
Overlapping:
sliding(1;2) | map({key: (.[0]|tostring), value: .[1]}) | from_entries
{"a":"b","b":"c","c":"d","d":"e","e":"f","f":"g","g":"h","h":"i","i":"j","j":"k","k":"l","l":"m","m":"n","n":"o","o":"p","p":"q","q":"r","r":"s","s":"t","t":"u","u":"v","v":"w","w":"x","x":"y","y":"z","z":null}
Note: the second parameter is not really used, as you always take two items from the current window, so you could actually omit it entirely, or hard-code it to 2.
You could use reduce to go through the array, and take two items at a time:
jq 'reduce while(. != []; .[2:]) as [$key, $val] ({}; .[$key] = $val)'
Yes, that's what _nwise is for.
reduce _nwise(2) as [$k, $v] ({}; .[$k] = $v)
Online demo
Edit: I used the solution provided by #peak to do the following:
$ jq -r --argjson whitelist '["role1", "role2"]' '
select(has("roles") and any(.roles[]; . == "role1" or . == "role2"))
| (reduce ."roles"[] as $r ({}; .[$r]=true)) as $roles
| [.email, .username, .given_name, .family_name, ($roles[$whitelist[]]
| . != null)]
| #csv
' users.json
Added the select() to filter out users who haven't onboarded yet and don't have any roles, and to ensure the users included in the output have at least one of the target roles.
Scenario: user profiles as JSON docs, where each profile has a list object with their assigned roles. Example:
{
"username": "janedoe",
"roles": [
"role1",
"role4",
"role5"
]
}
The actual data file is an ndjson file, one user object as above per line.
I am only interested in specific roles, say role1, role3, and role4. I want to produce a CSV formatted as:
username,role1?,role3?,role4?
e.g.,
janedoe,true,false,true
The part I haven't figured out is how to output booleans or Y / N in response to the values in the list object. Is this something I can do in jq itself?
With your input, the invocation:
jq -r --argjson whitelist '["role1", "role3", "role4"]' '
(["username"] + $whitelist),
[.username, ($whitelist[] as $w | .roles | index([$w]) != null)]
| #csv
'
produces:
"username","role1","role3","role4"
"janedoe",true,false,true
Notes:
The second last line of the jq filter above could be shortened to:
[.username, (.roles | index($whitelist[]) != null)]
Presumably if there were more than one user, you'd only want
the header row once, in which case the above solution
would need to be tweaked.
Using IN/1
Because index/1 is not as efficient as it might be,
you might like to consider this alternative:
(["username"] + $whitelist),
(.roles as $roles | [.username, ($whitelist[] | IN($roles[]) )])
| #csv
Using a JSON dictionary
If the number of roles was very large, then it would probably be more
efficient to construct a JSON dictionary to avoid repeated linear lookups:
(reduce .roles[] as $r ({}; .[$r]=true)) as $roles
| (["username"] + $whitelist),
[.username, ($roles[$whitelist[]] != null)]
| #csv
With ndjson as input
For efficiency, and to ensure there's just one header, you could use inputs with the -n command-line option. Adding the extra fields mentioned in the revised Q, you might end up with:
jq -nr --argjson whitelist '["role1", "role2"]' '
["email", "username", "given_name", "family_name"] as $greenlist
| ($greenlist + $whitelist),
(inputs
| select(has("roles") and any(.roles[] == $whitelist[]; true))
| (reduce ."roles"[] as $r ({}; .[$r]=true)) as $roles
| [ .[$greenlist[]], ($roles[$whitelist[]] != null) ])
| #csv
' users.json
A JSON object like this:
{"user":"stedolan","titles":["JQ Primer", "More JQ"],"years":[2013, 2016]}
And, I want to convert it with lists(assume all lists have equal length N) zipped and output like this:
{"user":"stedolan","title":"JQ Primer","year":2013}
{"user":"stedolan","title":"More JQ","year":2016}
I followed Object - {} example and tried:
tmp='{"user":"stedolan","titles":["JQ Primer", "More JQ"],"years":[2013, 2016]}'
echo $tmp | jq '{user, title: .titles[], year: .years[]}'
then it output:
{"user":"stedolan","title":"JQ Primer","year":2013}
{"user":"stedolan","title":"JQ Primer","year":2016}
{"user":"stedolan","title":"More JQ","year":2013}
{"user":"stedolan","title":"More JQ","year":2016}
It produces N*N ... lines result, instead of N lines result.
Any suggestion is appreciated!
transpose/0 can be used to effectively zip the values together. And the nice thing about the way assignments work is that it can be assigned simultaneously over multiple variables.
([.titles,.years]|transpose[]) as [$title,$year] | {user,$title,$year}
If you want the results in an array rather than a stream, just wrap it all in [].
https://jqplay.org/s/ZIFU5gBnZ7
For a jq 1.4 compatible version, you'll have to rewrite it to not use destructuring but you could use the same transpose/0 implementation from the builtins.
transpose/0:
def transpose:
if . == [] then []
else . as $in
| (map(length) | max) as $max
| length as $length
| reduce range(0; $max) as $j
([]; . + [reduce range(0;$length) as $i ([]; . + [ $in[$i][$j] ] )] )
end;
Here's an alternative implementation that I cooked up that should also be compatible. :)
def transpose2:
length as $cols
| (map(length) | max) as $rows
| [range(0;$rows) as $r | [.[range(0;$cols)][$r]]];
([.titles,.years]|transpose[]) as $p | {user,title:$p[0],year:$p[1]}
If you want the output to have the keys in the order indicated in the Q, then the solution is a bit trickier than would otherwise be the case.
Here's one way to retain the order:
with_entries( .key |= (if . == "titles" then "title" elif . == "years" then "year" else . end) )
| range(0; .title|length) as $i
| .title |= .[$i]
| .year |= .[$i]
The (potential) advantage of this approach is that one does not have to mention any of the other keys.
Incoming json file contains json array per row eg:
["a100","a101","a102","a103","a104","a105","a106","a107","a108"]
["a100","a102","a103","a106","a107","a108"]
["a100","a99"]
["a107","a108"]
a "filter array" would be ["a99","a101","a108"] so I can slurpfile it
Trying to figure out how to print only values that are inside "filter array", eg the output:
["a101","a108"]
["a108"]
["a99"]
["a108"]
You can port IN function from jq 1.6 to 1.5 and use:
def IN(s): any(s == .; .);
map(select(IN($filter_array[])))
Or even shorter:
map(select(any($filter_array[]==.;.)))
I might be missing some simpler solution, but the following works :
map(select(. as $in | ["a99","a101","a108"] | contains([$in])))
Replace the ["a99","a101","a108"] hardcoded array by your slurped variable.
You can try it here !
In the example, the arrays in the input stream are sorted (in jq's sort order), so it is worth noting that in such cases, a more efficient solution is possible using the bsearch built-in, or perhaps even better, the definition of intersection/2 given at https://rosettacode.org/wiki/Set#Finite_Sets_of_JSON_Entities
For ease of reference, here it is:
def intersection($A;$B):
def pop:
.[0] as $i
| .[1] as $j
| if $i == ($A|length) or $j == ($B|length) then empty
elif $A[$i] == $B[$j] then $A[$i], ([$i+1, $j+1] | pop)
elif $A[$i] < $B[$j] then [$i+1, $j] | pop
else [$i, $j+1] | pop
end;
[[0,0] | pop];
Assuming a jq invocation such as:
jq -c --argjson filter '["a99","a101","a108"]' -f intersections.jq input.json
an appropriate filter would be:
($filter | sort) as $sorted
| intersection(.; $sorted)
(Of course if $filter is already presented in jq's sort order, then the initial sort can be skipped, or replaced by a check.)
Output
["a101","a108"]
["a108"]
["a99"]
["a108"]
Unsorted arrays
In practice, jq's builtin sort filter is usually so fast that it might be worthwhile simply sorting the arrays in order to use intersection as defined above.