I'm looking at jq's "builtin.jq" file, and found:
def _assign(paths; $value): reduce path(paths) as $p (.; setpath($p; $value));
and am trying to figure out the semantics of "$value" as a formal parameter. It could mean that the parameter is expected to provide only one value, not a list of them. Or it could be the same as
def _assign(paths; vv): vv as $value | reduce path(paths) as $p (.; setpath($p; $value));
or maybe it's something else?
I can't find anything in the documentation about this kind of function formal-parameter.
You're right about def _assign(paths; vv): vv as $value ...
In essence, a formal parameter $x is equivalent to having x in the formal parameter list, with the body prefixed by x as $x |.
This is briefly mentioned in the jq manual:
Or use the short-hand:
def addvalue($f): ...;
What is not mentioned is that, using this example, f can also be used in the body of addvalue, though doing so might easily be the source of confusion. For example, what result would you expect the following to produce?
echo 1 2 3 | jq -n 'def f($x): x+x; f(input)'
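A minimal sketch of the difference between the two parameter forms, assuming jq 1.5 or later is on the PATH. The function names f and g are made up for illustration. With $x the argument is bound one value at a time, whereas a plain x is a filter that is re-evaluated wherever it is mentioned:

```shell
# $x form: the body runs once for each value produced by the argument.
dollar=$(jq -nc 'def f($x): [$x, $x]; f(1,2)')
# Plain form: x is a filter, so each mention re-runs the whole stream 1,2.
plain=$(jq -nc 'def g(x): [x, x]; g(1,2)')
printf '%s\n' "$dollar"   # [1,1] then [2,2]
printf '%s\n' "$plain"    # [1,2,1,2]
```

Note that f(1,2) passes a single argument, the stream 1,2, since jq separates arguments with a semicolon.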
I have a JSON object like this:
{"user":"stedolan","titles":["JQ Primer", "More JQ"],"years":[2013, 2016]}
I want to zip the lists together (assume all lists have equal length N) and output this:
{"user":"stedolan","title":"JQ Primer","year":2013}
{"user":"stedolan","title":"More JQ","year":2016}
I followed the "Object - {}" example and tried:
tmp='{"user":"stedolan","titles":["JQ Primer", "More JQ"],"years":[2013, 2016]}'
echo "$tmp" | jq '{user, title: .titles[], year: .years[]}'
and it output:
{"user":"stedolan","title":"JQ Primer","year":2013}
{"user":"stedolan","title":"JQ Primer","year":2016}
{"user":"stedolan","title":"More JQ","year":2013}
{"user":"stedolan","title":"More JQ","year":2016}
It produces N*N lines of output instead of N.
Any suggestion is appreciated!
transpose/0 can be used to effectively zip the values together. A nice feature of jq's destructuring is that it can bind several variables at once.
([.titles,.years]|transpose[]) as [$title,$year] | {user,$title,$year}
If you want the results in an array rather than a stream, just wrap it all in [].
https://jqplay.org/s/ZIFU5gBnZ7
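As a rough illustration of the two steps, assuming jq 1.5 or later (needed for transpose and destructuring), the following first shows what transpose produces and then the array-wrapped variant mentioned above. The shell variable names are mine:

```shell
tmp='{"user":"stedolan","titles":["JQ Primer", "More JQ"],"years":[2013, 2016]}'
# Step 1: transpose turns [titles, years] into a list of [title, year] pairs.
transposed=$(printf '%s' "$tmp" | jq -c '[.titles,.years] | transpose')
# Step 2: destructure each pair, and wrap everything in [] to get one array.
zipped=$(printf '%s' "$tmp" |
  jq -c '[([.titles,.years]|transpose[]) as [$title,$year] | {user,$title,$year}]')
printf '%s\n' "$transposed"   # [["JQ Primer",2013],["More JQ",2016]]
printf '%s\n' "$zipped"
```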
For a jq 1.4 compatible version, you'll have to rewrite it to not use destructuring but you could use the same transpose/0 implementation from the builtins.
transpose/0:
def transpose:
if . == [] then []
else . as $in
| (map(length) | max) as $max
| length as $length
| reduce range(0; $max) as $j
([]; . + [reduce range(0;$length) as $i ([]; . + [ $in[$i][$j] ] )] )
end;
Here's an alternative implementation that I cooked up that should also be compatible. :)
def transpose2:
length as $cols
| (map(length) | max) as $rows
| [range(0;$rows) as $r | [.[range(0;$cols)][$r]]];
([.titles,.years]|transpose[]) as $p | {user,title:$p[0],year:$p[1]}
If you want the output to have the keys in the order indicated in the question, then the solution is a bit trickier than would otherwise be the case.
Here's one way to retain the order:
with_entries( .key |= (if . == "titles" then "title" elif . == "years" then "year" else . end) )
| range(0; .title|length) as $i
| .title |= .[$i]
| .year |= .[$i]
The (potential) advantage of this approach is that one does not have to mention any of the other keys.
I need to convert JSON to CSV where JSON has arrays of variable length, for example:
JSON objects:
{"labels": ["label1"]}
{"labels": ["label2", "label3"]}
{"labels": ["label1", "label4", "label5"]}
Resulting CSV:
labels,labels,labels
"label1",,
"label2","label3",
"label1","label4","label5"
There are many other properties in the source JSON, this is just an excerpt for the sake of simplicity.
Also, I need to say that the process has to work with JSON as a stream because source JSON could be very large (>1GB).
I wanted to use jq with two passes, the first pass would collect the maximum length of the 'labels' array, the second pass would create CSV as the number of the resulting columns is known by this time. But jq doesn't have a concept of global variables, so I don't know where I can store the running total.
I'd like to be able to do that on Windows via CLI.
Thank you in advance.
The question shows a stream of JSON objects, so the following solutions assume that the input file is already a sequence as shown. These solutions can also easily be adapted to cover the case where the input file contains a huge array of objects, e.g. as discussed in the epilog.
A two-invocation solution
Here's a two-pass solution using two invocations of jq. The presentation assumes a bash-like environment, such as WSL if you have it:
n=$(jq -n 'reduce (inputs|.labels|length) as $i (-1;
if $i > . then $i else . end)' stream.json)
jq -nr --argjson n $n '
def fill($n): . + [range(length;$n)|null];
[range(0;$n)|"labels"],
(inputs | .labels | fill($n))
| @csv' stream.json
Assuming the input is as described, this is guaranteed to produce valid CSV. Hopefully you can adapt the above to your shell as necessary -- maybe this link will help:
Assign output of a program to a variable using a MS batch file
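To see the padding step in isolation, here is a small sketch that applies the fill helper from the answer to two made-up rows with a fixed width of 3. It relies on @csv rendering null as an empty field:

```shell
# fill/1 is copied from the answer above; the two sample rows are made up.
rows=$(printf '%s\n' '["label1"]' '["label2","label3"]' |
  jq -r 'def fill($n): . + [range(length;$n)|null]; fill(3) | @csv')
printf '%s\n' "$rows"
# "label1",,
# "label2","label3",
```

An array that already has $n or more items is left unchanged, because range(length;$n) is then empty.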
Using input_filename and a single invocation of jq
Unfortunately, jq does not have a "rewind" facility, but
there is an alternative: read the file twice within a single invocation of jq. This is more cumbersome than the two-invocation solution above but avoids any difficulties associated with the latter.
cat sample.json | jq -nr '
def fill($n): . + [range(length;$n)|null];
def max($x): if . < $x then $x else . end;
foreach (inputs|.labels) as $in ( {n:0};
if input_filename == "<stdin>"
then .n |= max($in|length)
else .printed+=1
end;
if .printed == null then empty
else .n as $n
| (if .printed == 1 then [range(0;$n)|"labels"] else empty end),
($in | fill($n))
end)
| @csv' - sample.json
Another single-invocation solution
The following solution uses a special value (here null) to delineate the two streams:
(cat stream.json; echo null; cat stream.json) | jq -nr '
def fill($n): . + [range(length; $n) | null];
def max($x): if . < $x then $x else . end;
(label $loop | foreach inputs as $in (0;
if $in == null then . else max($in|.labels|length) end;
if $in == null then ., break $loop else empty end)) as $n
| [range(0;$n)|"labels"],
(inputs | .labels | fill($n))
| @csv '
Epilog
A file with a top-level JSON array that is too large to fit into memory can be converted into a stream of the array's items by invoking jq with the --stream option, e.g. as follows:
jq -cn --stream 'fromstream(1|truncate_stream(inputs))'
For such a large file, you will probably want to do this in two separate invocations: one to get the column count, then another to output the CSV. You could do it in one invocation by reading the whole file into memory, but for a file this size it is better to stream it where possible.
Storing the result of a command in a variable gets a little ugly in a batch file, and writing to a file might be simpler, but I'd rather not use temp files if we don't have to.
REM assuming in a batch file
for /f "usebackq delims=" %%i in (`jq -n --stream "reduce (inputs | .[0][1] + 1) as $l (0; if $l > . then $l else . end)" input.json`) do set cols=%%i
jq -rn --stream --argjson cols "%cols%" "[range($cols)|\"labels\"],(fromstream(1|truncate_stream(inputs))|[.[],(range($cols-length)|null)])|@csv" input.json
> jq -n --stream "reduce (inputs | .[0][1] + 1) as $l (0; if $l > . then $l else . end)" input.json
For the first invocation to get the count of columns, we're just taking advantage of the fact that the paths to the array values could be used to indicate the lengths of the arrays. We'll just want to take the max across all items.
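A sketch of what that looks like in a bash-like shell, using one made-up object. The first command prints the raw --stream events, and the second runs the same reduce as above over them:

```shell
sample='{"labels":["x","y"]}'   # hypothetical input, not from the question
# --stream emits [path, leaf] events, followed by closing events without a leaf.
events=$(printf '%s' "$sample" | jq -c --stream '.')
printf '%s\n' "$events"
# [["labels",0],"x"]
# [["labels",1],"y"]
# [["labels",1]]
# [["labels"]]
# .[0][1] is the array index in each path; the largest index + 1 is the length.
cols=$(printf '%s' "$sample" |
  jq -n --stream 'reduce (inputs | .[0][1] + 1) as $l (0; if $l > . then $l else . end)')
printf '%s\n' "$cols"   # 2
```

For the closing event [["labels"]], .[0][1] is null and null + 1 is 1, so the result is at least 1 whenever there is any input.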
> jq -rn --stream --argjson cols "%cols%" ^
"[range($cols)|\"labels\"],(fromstream(1|truncate_stream(inputs))|[.[],(range($cols-length)|null)])|@csv" input.json
Then to output the rest, we're just taking the labels array (assuming it's the only property on the objects) and padding it out with null up to the $cols count, then outputting it as CSV.
If the labels are in a different, deeply nested path than what's in your example here, you'll need to select based on the appropriate paths.
set labelspath=foo.bar.labels
jq -rn --stream --argjson cols "%cols%" --arg labelspath "%labelspath%" ^
"($labelspath|split(\".\")|[.,length]) as [$path,$depth] | [range($cols)|\"labels\"],(fromstream($depth|truncate_stream(inputs|select(.[0][:$depth] == $path)))|[.[],(range($cols-length)|null)])|@csv" input.json
The incoming JSON file contains one JSON array per row, e.g.:
["a100","a101","a102","a103","a104","a105","a106","a107","a108"]
["a100","a102","a103","a106","a107","a108"]
["a100","a99"]
["a107","a108"]
The "filter array" would be ["a99","a101","a108"], which I can load with --slurpfile.
I'm trying to figure out how to print only the values that are in the "filter array", e.g. the output:
["a101","a108"]
["a108"]
["a99"]
["a108"]
You can port the IN function from jq 1.6 to 1.5 and use:
def IN(s): any(s == .; .);
map(select(IN($filter_array[])))
Or even shorter:
map(select(any($filter_array[]==.;.)))
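For a quick check, here is a sketch that runs the ported IN on two of the rows from the question. It passes the filter with --argjson to stay self-contained, which is my choice rather than the asker's setup. With --slurpfile the variable would instead hold an array wrapping the file's contents, so the generator would be something like $filter_array[0][]:

```shell
# Two rows taken from the question; the filter is passed inline via --argjson.
matches=$(printf '%s\n' '["a100","a101","a108"]' '["a100","a99"]' |
  jq -c --argjson filter_array '["a99","a101","a108"]' \
     'def IN(s): any(s == .; .); map(select(IN($filter_array[])))')
printf '%s\n' "$matches"
# ["a101","a108"]
# ["a99"]
```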
I might be missing some simpler solution, but the following works:
map(select(. as $in | ["a99","a101","a108"] | contains([$in])))
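Replace the hardcoded ["a99","a101","a108"] array with your slurped variable. Note that contains does substring matching on strings, so a filter entry such as "a1010" would also match the input value "a101".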
In the example, the arrays in the input stream are sorted (in jq's sort order), so it is worth noting that in such cases, a more efficient solution is possible using the bsearch built-in, or perhaps even better, the definition of intersection/2 given at https://rosettacode.org/wiki/Set#Finite_Sets_of_JSON_Entities
For ease of reference, here it is:
def intersection($A;$B):
def pop:
.[0] as $i
| .[1] as $j
| if $i == ($A|length) or $j == ($B|length) then empty
elif $A[$i] == $B[$j] then $A[$i], ([$i+1, $j+1] | pop)
elif $A[$i] < $B[$j] then [$i+1, $j] | pop
else [$i, $j+1] | pop
end;
[[0,0] | pop];
Assuming a jq invocation such as:
jq -c --argjson filter '["a99","a101","a108"]' -f intersections.jq input.json
an appropriate filter would be:
($filter | sort) as $sorted
| intersection(.; $sorted)
(Of course if $filter is already presented in jq's sort order, then the initial sort can be skipped, or replaced by a check.)
Output
["a101","a108"]
["a108"]
["a99"]
["a108"]
Unsorted arrays
In practice, jq's builtin sort filter is usually so fast that it might be worthwhile simply sorting the arrays in order to use intersection as defined above.
Is there a way to refactor jq into functions?
Prior to refactor:
jq ' .them ."keyName" ' ./some.json
After refactor:
def getThese(x): .them .$x;
in ~/.jq
and then call it with...
jq ' getThese("keyName") as $i | $i ' ./some.json
The above refactor does not appear to work (is there a way?)
The abbreviation '.x.y' will not work if y is a variable. Use the syntax '.x | .[ y ]' instead.
'E as $i| $i' can be written as 'E' in this case.
Your definition should be either:
def getThese(x): .them | .[x];
or with different semantics (and requiring a sufficiently recent version of jq):
def getThese($x): .them | .[$x];
One alternative would be to define getThem as:
def getThem(f): .them | f;
This would allow you to write: getThem(.keyName) for keys with unexceptional names.
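Here is a small sketch comparing the two styles on a made-up input document. The definitions are inlined here for the sake of a self-contained example, but they could equally live in ~/.jq:

```shell
doc='{"them":{"keyName":42}}'   # hypothetical sample input
# Key passed as a string value.
by_value=$(printf '%s' "$doc" | jq 'def getThese($x): .them | .[$x]; getThese("keyName")')
# Key passed as a path filter.
by_filter=$(printf '%s' "$doc" | jq 'def getThem(f): .them | f; getThem(.keyName)')
printf '%s\n' "$by_value" "$by_filter"   # 42 and 42
```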
I have a case where I need to parse quoted JSON in JSON.
I know which optional attributes will contain quoted JSON and which not.
Therefore, I want to check if the attribute keys are in a list of possible keys. I already have the following:
# attributes "a" and "b" contain quoted JSON
echo '{"a":"{\"x\":1}","y":2}' |
jq -c '
def is_json($o): ["a","b"] | (map(select(. == $o)) | length) > 0;
with_entries(if is_json(.key) then .value = (.value|fromjson) else . end)
'
This already produces the desired output: {"a":{"x":1},"y":2}. However, the checking of the attribute name looks clumsy, given that jq provides a lot of built-in functions such as has, in, contains, inside, etc.
Question: Is there a better way of checking if an attribute key is in a given list?
Edit: Here is the current solution, based on peak's answer.
#!/bin/bash
to_array_string() { echo "$*" | awk -v OFS='","' 'NF > 0 {$1=$1; print "\""$0"\""}'; }
to_json_array_string() { echo "["`to_array_string "$@"`"]"; }
parse_json_jq() { jq -c "
reduce keys[] as \$key
(.; if any((`to_array_string "$@"`); . == \$key) and .[\$key] != null then .[\$key] |= fromjson else . end)
";}
There are three ways in which your program can be improved:
(efficiency) avoiding the creation of an unnecessary array (in is_json);
(efficiency) using "short-circuit" semantics to avoid iterating unnecessarily;
(efficiency) avoiding the construction/deconstruction involved with with_entries.
For the most part, I think you will agree that the alternatives offered here are simpler, more concise, or more readable.
If you have version 1.5 of jq or later, the main improvements
can be had using any/2:
def is_json($o): any( ("a","b"); . == $o );
with_entries(if is_json(.key) then .value |= fromjson else . end)
Notice also the use of '|=' where you had used '='.
If your jq does not have any/2, then you could use the following
definition, though it lacks short-circuit semantics:
def any(s): reduce s as $i (false; . == true or $i);
Finally, to avoid using with_entries, you could use reduce and eliminate is_json entirely:
reduce keys[] as $key
(.; if any(("a","b"); . == $key) then .[$key] |= fromjson else . end)
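Run against the sample input from the question, the reduce version can be sketched like this, assuming jq 1.5 or later for any/2:

```shell
# Sample input from the question: "a" holds quoted JSON, "y" does not.
result=$(printf '%s' '{"a":"{\"x\":1}","y":2}' |
  jq -c 'reduce keys[] as $key
    (.; if any(("a","b"); . == $key) then .[$key] |= fromjson else . end)')
printf '%s\n' "$result"   # {"a":{"x":1},"y":2}
```

Because the reduce only iterates over keys that are actually present, the missing "b" attribute needs no special handling.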