How does a `select (. == ["a","b"][])` predicate work in jq?

I'm looking for ways to select JSON entries based on an array that I provide as a literal:
$ echo '["a","b","c","d"]' | jq '.[] | select (. == ["a","b"][] )'
"a"
"b"
In the code above, all entries are selected that are in the ["a","b"] array. However, I don't understand how the . == ["a","b"][] predicate works in detail and would be grateful for an explanation. The tricky part is the right-hand side of ==.
Related:
jq - How to select objects based on a 'whitelist' of property values

The key to understanding this is that jq is stream-oriented. ["a","b"][] produces a stream of two values, "a" and "b", so . == ["a","b"][] produces a stream of two booleans, one per comparison. select emits its input once for each truthy value in that stream.
To gain an understanding of how jq works, it often helps to pull things apart. In the present case, you could begin by trying:
echo '["a","b","c","d"]' | jq '.[] | (. == ["a","b"][])'
debug is also helpful, e.g.
echo '["a","b","c","d"]' | jq '.[] | select(debug == ["a","b"][])'
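To make the stream behaviour concrete, here is a small sketch (the input arrays are made up for illustration, and jq is assumed to be on PATH). It first collects the booleans produced for each input, then shows that a duplicate in the literal array yields a duplicate in the output:

```shell
#!/bin/bash
# Assumes jq is on PATH; the input arrays are illustrative only.

# One boolean per element of the right-hand array, for each input item.
bools=$(echo '["a","c"]' | jq -c '.[] | [. == ["a","b"][]]')
echo "$bools"

# select emits its input once per truthy result, so a duplicate
# in the literal array produces a duplicate in the output.
dups=$(echo '["a","c"]' | jq -c '[.[] | select(. == ["a","a"][])]')
echo "$dups"
```

If duplicates are not wanted, wrapping the comparison in any(...) or using IN (jq 1.6 and later) gives at most one output per input.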

Related

jq add list to object list until condition

Background
I have an object with each value being a nested list of only strings. For each string value within the nested list, look up the string value within the object and add all of its values into the current value.
Here's what I have so far:
#!/bin/bash
in=$(jq -n '{
"bar": [["re", "de"]],
"do": [["bar","baz"]],
"baz": [["re"]],
"re": [["zoo"]]
}')
echo "expected:"
jq -n '{
"bar": [["re", "de"], ["zoo"]],
"do": [["bar","baz"], ["re", "de"], ["re"], ["zoo"]],
"baz": [["re"], ["zoo"]],
"re": [["zoo"]]
}'
echo "actual:"
echo ${in} | jq '. as $origin
| map_values( . +
until(
length == 0;
(. | flatten | map($origin[.]) | map(select( . != [[]] )) | add )
)
)'
Problem:
The output is exactly the same as the input $in. If the until() call is removed from the statement, the output correctly shows one iteration. However, I want to recursively look up the output strings within the object and add each lookup value until the lookup value is empty or non-existent.
For example, the key do has a value of [["bar","baz"]]. If we iterate through the values of do we come across baz. The value of baz within the object is [["re"]]. Add baz's value ["re"] to do so that do equals: [["bar","baz"], ["re"]]. Since re IS a key within the object, add the value of ["re"] which is ["zoo"]. Since ["zoo"] is NOT a key within the object finish baz and continue to the next key within the object.
The following solves the problem as originally stated, but the "expected" output as shown does not quite match the stated problem.
echo ${in} | jq -c '
. as $dict
| map_values(reduce (..|strings) as $v (.;
. + $dict[$v] ))
'
produces (after some manual reformatting for clarity):
{"bar":[["re","de"],["zoo"]],
"do":[["bar","baz"],["re","de"],["re"]],
"baz":[["re"],["zoo"]],"re":[["zoo"]]}
If some kind of recursive lookup is needed, then please reformulate the problem statement, being sure to avoid infinite loops.
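For what it's worth, one possible reformulation is a transitive lookup in which each key is expanded at most once. The following is only a sketch under that assumption, not necessarily what the question intended; the state fields acc, todo and seen are names invented here. The seen set is what prevents an infinite loop if the dictionary contains a cycle:

```shell
#!/bin/bash
# Sketch only: assumes "recursive lookup" means a transitive expansion in which
# each key contributes its value at most once (the "seen" set prevents loops).
dict='{"bar":[["re","de"]],"do":[["bar","baz"]],"baz":[["re"]],"re":[["zoo"]]}'

closure=$(printf '%s' "$dict" | jq -cS '
  . as $dict
  | map_values(
      # acc: result so far; todo: strings still to look up; seen: keys already expanded
      { acc: ., todo: [.. | strings], seen: {} }
      | until(.todo | length == 0;
          .todo[0] as $k
          | .todo |= .[1:]
          | if .seen[$k] or ($dict[$k] == null) then .
            else .seen[$k] = true
                 | .acc += $dict[$k]
                 | .todo += [$dict[$k] | .. | strings]
            end)
      | .acc)
')
echo "$closure"
```

With the sample data this yields [["bar","baz"],["re","de"],["re"],["zoo"]] for the key do, which agrees with the "expected" output in the question.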

create json from bash variable and associative array [duplicate]

Let's say I have the following declared in bash:
mcD="had_a_farm"
eei="eeieeio"
declare -A animals=( ["duck"]="quack_quack" ["cow"]="moo_moo" ["pig"]="oink_oink" )
and I want the following json:
{
"oldMcD": "had a farm",
"eei": "eeieeio",
"onThisFarm":[
{
"duck": "quack_quack",
"cow": "moo_moo",
"pig": "oink_oink"
}
]
}
Now I know I could do this with an echo, printf, or by assigning text to a variable, but let's assume animals is actually very large and it would be onerous to do so. I could also loop through my variables and associative array and build up a string as I go. I could write either of these solutions, but both seem like the "wrong way". Not to mention it's obnoxious to deal with the last item in animals, after which I do not want a ",".
I'm thinking the right solution uses jq, but I'm having a hard time finding much documentation or many examples on how to use this tool to write JSON (especially nested JSON) rather than parse it.
Here is what I came up with:
jq -n --arg mcD "$mcD" --arg eei "$eei" --arg duck "${animals['duck']}" --arg cow "${animals['cow']}" --arg pig "${animals['pig']}" '{onThisFarm:[ { pig: $pig, cow: $cow, duck: $duck } ], eei: $eei, oldMcD: $mcD }'
This produces the desired result. In reality, I don't really care about the order of the keys in the JSON, but it's still annoying that the input for jq has to go backwards to get it in the desired order. Regardless, this solution is clunky and was not any easier to write than simply declaring a string variable that looks like JSON (and it would be impossible with larger associative arrays). How can I build JSON like this in an efficient, logical manner?
Thanks!
Assuming that none of the keys or values in the "animals" array contains newline characters:
for i in "${!animals[@]}"
do
printf "%s\n%s\n" "${i}" "${animals[$i]}"
done | jq -nR --arg oldMcD "$mcD" --arg eei "$eei" '
def to_o:
. as $in
| reduce range(0;length;2) as $i ({};
.[$in[$i]]= $in[$i+1]);
{$oldMcD,
$eei,
onthisfarm: [inputs] | to_o}
'
Notice the trick whereby {$x} is in effect shorthand for {x: $x}, i.e. {"x": $x}.
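The object-construction shorthand can be checked in isolation. This is a minimal sketch assuming jq 1.5 or later; the variable name and value are just the ones from the question:

```shell
#!/bin/bash
# Assumes jq 1.5 or later; the variable name and value are illustrative.

# {$eei} builds an object whose key is the variable's name.
obj=$(jq -nc --arg eei "eeieeio" '{$eei}')
echo "$obj"

# It is equal to the longhand form with an explicit key.
same=$(jq -n --arg eei "eeieeio" '{$eei} == {eei: $eei}')
echo "$same"
```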
Using "\u0000" as the separator
If any of the keys or values contains a newline character, you could tweak the above so that "\u0000" is used as the separator:
for i in "${!animals[@]}"
do
printf "%s\0%s\0" "${i}" "${animals[$i]}"
done | jq -sR --arg oldMcD "$mcD" --arg eei "$eei" '
def to_o:
. as $in
| reduce range(0;length;2) as $i ({};
.[$in[$i]]= $in[$i+1]);
{$oldMcD,
$eei,
onthisfarm: split("\u0000") | to_o }
'
Note: The above assumes jq version 1.5 or later.
You can loop over the associative array with a for loop and pipe the key and value lines to jq:
for i in "${!animals[@]}"; do
echo "$i"
echo "${animals[$i]}"
done |
jq -n -R --arg mcD "$mcD" --arg eei "$eei" 'reduce inputs as $i ({onThisFarm: [], mcD: $mcD, eei: $eei}; .onThisFarm[0] += {($i): (input | tonumber ? // .)})'

Parse JSON output for particular key fields

I have the following JSON content in a file.json.
I need only one particular key field from all this overwhelming information.
Let us assume that I need web_url.
The problem is that there are multiple key fields named "web_url".
How do I get only the web_url field I am after?
[{"id":196,"iid":1,"project_id":233,"title":"DEV to Master","description":"","state":"merged","created_at":"2019-12-04T14:14:35.424-06:00","updated_at":"2019-12-04T14:14:47.310-06:00","merged_by":{"id":122,"name":"Sengoku","username":"sengk","state":"active","avatar_url":"https://secure.gravatar.com/avatar/7cvffgfgfgfgf9eb1348d0ba7795a076?s=80\u0026d=identicon","web_url":"https://gitlaboo.tests.com/sengk"},"merged_at":"2019-12-04T14:14:47.468-06:00","closed_by":null,"closed_at":null,"target_branch":"master","source_branch":"DEV","upvotes":0,"downvotes":0,"author":{"id":122,"name":"Sengoku","username":"sengk","state":"active","avatar_url":"https://secure.gravatar.com/avatar/7fgdfdgdfgdvfg9eb1348d0ba7795a076?s=80\u0026d=identicon","web_url":"https://gitlaboo.tests.com/sengk"},"assignee":{"id":122,"name":"Sengoku","username":"sengk","state":"active","avatar_url":"https://secure.gravatar.com/avatar/7afsdfdvdfvfde24f89eb1348d0ba7795a076?s=80\u0026d=identicon","web_url":"https://gitlaboo.tests.com/sengk"},"source_project_id":233,"target_project_id":233,"labels":[],"work_in_progress":false,"milestone":null,"merge_when_pipeline_succeeds":false,"merge_status":"can_be_merged","sha":"6318e51ea8czfdfsdvdfvdfbc02988ba62c71e5774107e","merge_commit_sha":"6dc5vdfvdfgdfg5bf14e97dea949b8584c0c68d6","user_notes_count":0,"discussion_locked":null,"should_remove_source_branch":null,"force_remove_source_branch":false,"web_url":"https://gitlaboo.tests.com/demo/frog/merge_requests/1","time_stats":{"time_estimate":0,"total_time_spent":0,"human_time_estimate":null,"human_total_time_spent":null},"squash":false}]
You have four web_url fields in your JSON.
You can check the result of each of the filters below:
.[] | .web_url
.[] | .merged_by.web_url
.[] | .author.web_url
.[] | .assignee.web_url
If the question is essentially how to find the needle in the haystack, the answer is: use paths; more specifically, in your case:
jq -c 'paths(. == "https://gitlaboo.tests.com/demo/frog/merge_requests/1")
| select(.[-1] == "web_url")
' file.json
The output gives the path as a JSON array:
[0,"web_url"]
This can be used directly in jq (using getpath/1), or as the basis for a direct query:
.[0].web_url
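As a rough illustration of the getpath/1 route, here is a sketch that uses a trimmed-down, hypothetical stand-in for file.json (the example.test URLs are made up). It fetches the value at a known path and counts how many paths end in web_url:

```shell
#!/bin/bash
# Hypothetical, trimmed-down stand-in for file.json.
json='[{"web_url":"https://example.test/mr/1","author":{"web_url":"https://example.test/u"}}]'

# Fetch the value at a known path.
val=$(printf '%s' "$json" | jq -r 'getpath([0,"web_url"])')
echo "$val"

# Count every path whose last component is "web_url".
count=$(printf '%s' "$json" | jq '[paths | select(.[-1] == "web_url")] | length')
echo "$count"
```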

JSON to CSV: variable number of columns per row

I need to convert JSON to CSV where JSON has arrays of variable length, for example:
JSON objects:
{"labels": ["label1"]}
{"labels": ["label2", "label3"]}
{"labels": ["label1", "label4", "label5"]}
Resulting CSV:
labels,labels,labels
"label1",,
"label2","label3",
"label1","label4","label5"
There are many other properties in the source JSON; this is just an excerpt for the sake of simplicity.
Also, I need to say that the process has to work with JSON as a stream because source JSON could be very large (>1GB).
I wanted to use jq with two passes, the first pass would collect the maximum length of the 'labels' array, the second pass would create CSV as the number of the resulting columns is known by this time. But jq doesn't have a concept of global variables, so I don't know where I can store the running total.
I'd like to be able to do that on Windows via CLI.
Thank you in advance.
The question shows a stream of JSON objects, so the following solutions assume that the input file is already a sequence as shown. These solutions can also easily be adapted to cover the case where the input file contains a huge array of objects, e.g. as discussed in the epilog.
A two-invocation solution
Here's a two-pass solution using two invocations of jq. The presentation assumes a bash-like environment, in case you have WSL:
n=$(jq -n 'reduce (inputs|.labels|length) as $i (-1;
if $i > . then $i else . end)' stream.json)
jq -nr --argjson n $n '
def fill($n): . + [range(length;$n)|null];
[range(0;$n)|"labels"],
(inputs | .labels | fill($n))
| @csv' stream.json
Assuming the input is as described, this is guaranteed to produce valid CSV. Hopefully you can adapt the above to your shell as necessary -- maybe this link will help:
Assign output of a program to a variable using a MS batch file
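To see what the padding helper does on its own, here is a minimal sketch that applies fill to a one-element array and then renders the result with @csv. The sample label and the width of 3 are arbitrary choices for illustration:

```shell
#!/bin/bash
# fill($n) pads the input array with nulls up to length $n.
padded=$(jq -nc 'def fill($n): . + [range(length;$n)|null]; ["label1"] | fill(3)')
echo "$padded"

# @csv renders null as an empty field, which gives the trailing commas.
row=$(jq -nr 'def fill($n): . + [range(length;$n)|null]; ["label1"] | fill(3) | @csv')
echo "$row"
```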
Using input_filename and a single invocation of jq
Unfortunately, jq does not have a "rewind" facility, but there is an alternative: read the file twice within a single invocation of jq. This is more cumbersome than the two-invocation solution above but avoids any difficulties associated with the latter.
cat sample.json | jq -nr '
def fill($n): . + [range(length;$n)|null];
def max($x): if . < $x then $x else . end;
foreach (inputs|.labels) as $in ( {n:0};
if input_filename == "<stdin>"
then .n |= max($in|length)
else .printed+=1
end;
if .printed == null then empty
else .n as $n
| (if .printed == 1 then [range(0;$n)|"labels"] else empty end),
($in | fill($n))
end)
| @csv' - sample.json
Another single-invocation solution
The following solution uses a special value (here null) to delineate the two streams:
(cat stream.json; echo null; cat stream.json) | jq -nr '
def fill($n): . + [range(length; $n) | null];
def max($x): if . < $x then $x else . end;
(label $loop | foreach inputs as $in (0;
if $in == null then . else max($in|.labels|length) end;
if $in == null then ., break $loop else empty end)) as $n
| [range(0;$n)|"labels"],
(inputs | .labels | fill($n))
| @csv '
Epilog
A file with a top-level JSON array that is too large to fit into memory can be converted into a stream of the array's items by invoking jq with the --stream option, e.g. as follows:
jq -cn --stream 'fromstream(1|truncate_stream(inputs))'
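As a small-scale check of that conversion, the following sketch feeds a tiny top-level array (made-up data) through the same filter and captures the resulting stream of items:

```shell
#!/bin/bash
# A tiny top-level array standing in for a file too large to slurp.
items=$(echo '[{"labels":["a"]},{"labels":["b","c"]}]' |
  jq -cn --stream 'fromstream(1|truncate_stream(inputs))')

# Each array item is emitted as its own JSON document, one per line.
echo "$items"
```

The output can then be piped into either of the solutions above in place of stream.json.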
For such a large file, you will probably want to do this in two separate invocations: one to get the count, then another to actually output the CSV. You could do it in one invocation by reading the whole file into memory, but with a file this size it is better to stream it where possible.
Storing the result of a command in a variable gets a little ugly in a batch file, and writing to a file might be simpler, but I'd rather not use temp files if we don't have to.
REM assuming in a batch file
for /f "usebackq delims=" %%i in (`jq -n --stream "reduce (inputs | .[0][1] + 1) as $l (0; if $l > . then $l else . end)" input.json`) do set cols=%%i
jq -rn --stream --argjson cols "%cols%" "[range($cols)|\"labels\"],(fromstream(1|truncate_stream(inputs))|[.[],(range($cols-length)|null)])|@csv" input.json
> jq -n --stream "reduce (inputs | .[0][1] + 1) as $l (0; if $l > . then $l else . end)" input.json
For the first invocation to get the count of columns, we're just taking advantage of the fact that the paths to the array values could be used to indicate the lengths of the arrays. We'll just want to take the max across all items.
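To illustrate why the paths reveal the array lengths, here is a sketch in bash rather than batch (sample data made up). It prints the raw --stream events for one object, then runs the same reduce over two objects. Note that this relies on labels being the only array in each object, as assumed in this answer:

```shell
#!/bin/bash
# Raw streaming events: each leaf comes with its path, so the array
# index of the last element is visible as the second path component.
events=$(echo '{"labels":["x","y"]}' | jq -c --stream '.')
echo "$events"

# Max of (index + 1) across all events gives the widest labels array.
cols=$(echo '{"labels":["x","y"]} {"labels":["z"]}' |
  jq -n --stream 'reduce (inputs | .[0][1] + 1) as $l (0; if $l > . then $l else . end)')
echo "$cols"
```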
> jq -rn --stream --argjson cols "%cols%" ^
"[range($cols)|\"labels\"],(fromstream(1|truncate_stream(inputs))|[.[],(range($cols-length)|null)])|@csv" input.json
Then to output the rest, we're just taking the labels array (assuming it's the only property on the objects) and padding them out with null up to the $cols count. Then output as csv.
If the labels are in a different, deeply nested path than what's in your example here, you'll need to select based on the appropriate paths.
set labelspath=foo.bar.labels
jq -rn --stream --argjson cols "%cols%" --arg labelspath "%labelspath%" ^
"($labelspath|split(\".\")|[.,length]) as [$path,$depth] | [range($cols)|\"labels\"],(fromstream($depth|truncate_stream(inputs|select(.[0][:$depth] == $path)))|[.[],(range($cols-length)|null)])|@csv" input.json

jq 1.5 print items from array that is inside another array

The incoming JSON file contains one JSON array per row, e.g.:
["a100","a101","a102","a103","a104","a105","a106","a107","a108"]
["a100","a102","a103","a106","a107","a108"]
["a100","a99"]
["a107","a108"]
The "filter array" would be ["a99","a101","a108"], which I can load with --slurpfile.
I'm trying to figure out how to print only the values that are in the "filter array", e.g. the output:
["a101","a108"]
["a108"]
["a99"]
["a108"]
You can port the IN function from jq 1.6 to 1.5 and use:
def IN(s): any(s == .; .);
map(select(IN($filter_array[])))
Or even shorter:
map(select(any($filter_array[]==.;.)))
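Put together as a complete command, this might look like the sketch below. It passes the filter array with --argjson purely for illustration; with --slurpfile the variable holds an array of the file's documents, so the reference would presumably be $filter_array[0][] instead:

```shell
#!/bin/bash
# Two of the sample rows from the question; the filter array is passed
# with --argjson here (an assumption, the question mentions --slurpfile).
result=$(printf '%s\n' '["a100","a101","a108"]' '["a100","a99"]' |
  jq -c --argjson filter_array '["a99","a101","a108"]' \
     'map(select(any($filter_array[] == .; .)))')
echo "$result"
```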
I might be missing some simpler solution, but the following works:
map(select(. as $in | ["a99","a101","a108"] | contains([$in])))
Replace the hardcoded ["a99","a101","a108"] array with your slurped variable.
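One caveat worth checking before relying on contains: for strings it tests for substrings, not equality, so a value such as "a10" would match because it is a substring of "a101". The sketch below (with a made-up probe value) contrasts it with an exact comparison:

```shell
#!/bin/bash
# contains compares strings by substring, so "a10" matches "a101" and "a108".
loose=$(jq -n '"a10" as $in | ["a99","a101","a108"] | contains([$in])')
echo "$loose"

# An exact comparison against each element does not match.
strict=$(jq -n '"a10" as $in | ["a99","a101","a108"] | any(.[]; . == $in)')
echo "$strict"
```

For the sample data in the question the two approaches agree, since no value there is a substring of another value in the filter array.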
In the example, the arrays in the input stream are sorted (in jq's sort order), so it is worth noting that in such cases, a more efficient solution is possible using the bsearch built-in, or perhaps even better, the definition of intersection/2 given at https://rosettacode.org/wiki/Set#Finite_Sets_of_JSON_Entities
For ease of reference, here it is:
def intersection($A;$B):
def pop:
.[0] as $i
| .[1] as $j
| if $i == ($A|length) or $j == ($B|length) then empty
elif $A[$i] == $B[$j] then $A[$i], ([$i+1, $j+1] | pop)
elif $A[$i] < $B[$j] then [$i+1, $j] | pop
else [$i, $j+1] | pop
end;
[[0,0] | pop];
Assuming a jq invocation such as:
jq -c --argjson filter '["a99","a101","a108"]' -f intersections.jq input.json
an appropriate filter would be:
($filter | sort) as $sorted
| intersection(.; $sorted)
(Of course if $filter is already presented in jq's sort order, then the initial sort can be skipped, or replaced by a check.)
Output
["a101","a108"]
["a108"]
["a99"]
["a108"]
Unsorted arrays
In practice, jq's builtin sort filter is usually so fast that it might be worthwhile simply sorting the arrays in order to use intersection as defined above.
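A minimal sketch of that approach, reusing the intersection definition given above and sorting both sides first (the unsorted input row is made up for illustration):

```shell
#!/bin/bash
# Sort both the input array and the filter, then intersect them.
result=$(echo '["a108","a100","a101"]' |
  jq -c --argjson filter '["a99","a101","a108"]' '
    def intersection($A;$B):
      def pop:
        .[0] as $i
        | .[1] as $j
        | if $i == ($A|length) or $j == ($B|length) then empty
          elif $A[$i] == $B[$j] then $A[$i], ([$i+1, $j+1] | pop)
          elif $A[$i] < $B[$j] then [$i+1, $j] | pop
          else [$i, $j+1] | pop
          end;
      [[0,0] | pop];
    intersection(sort; $filter | sort)
')
echo "$result"
```

Note that the result comes back in jq's sort order rather than in the original order of the input row.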