Converting object to array or keeping an array with jq - json

I have an input file whose top-level type may be either an array or an object. One of the following inputs is expected:
[{"version": "1.0"}]
{"version": "1.0"}
How can I construct a jq expression so that the output is always an array? I came up with the following:
jq 'if (select(has("version")?)) then [.] else . end'
When the version key is matched, the object is wrapped in an array. But when it is not matched, which would mean the input is already an array, nothing is printed, and I would expect it to be printed as it is. Please suggest the right way to achieve this.

You can check the input's type directly:
jq 'if type == "object" then [.] else . end'
Or use the destructuring alternative operator ?// (available since jq 1.6):
jq '. as [$v] ?// $v | [$v]'
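As a quick check of the type-based approach, here is a minimal sketch that feeds both input shapes through the same filter. The sample data and the variable names (samples, wrapped) are made up for illustration.

```shell
# Two sample inputs: one already an array, one a bare object.
samples='[{"version": "1.0"}]
{"version": "2.0"}'

# Wrap objects in an array; pass arrays through unchanged.
wrapped=$(printf '%s\n' "$samples" | jq -c 'if type == "object" then [.] else . end')
printf '%s\n' "$wrapped"
```

Both lines should come out as one-element arrays.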

Auxiliary to pmf's answer, why doesn't this work?
input.jsonl:
[{"version": "1.0"}]
{"version": "2.0"}
Command:
jq -c 'if (select(has("version")?)) then [.] else . end'
Actual output:
[{"version": "2.0"}]
Desired output:
[{"version": "1.0"}]
[{"version": "2.0"}]
First of all, select filters out values. For each value input to select, it evaluates the filter you give it on its input, and if that filter evaluates to a true value, the whole input value is emitted unchanged. Otherwise there is no output. Using select here is unhelpful, because you want a true or false value for the if condition. If the condition part of your if-then-else expression emits no output then the whole if-then-else expression emits no output.
A more thorough way to understand select(expr) is that every time expr emits a true value then the select(expr) emits its input value. A more thorough way to understand if cond then a else b end is that every time cond emits a true value the whole if-then-else emits a and every time cond emits a false value the whole if-then-else emits b.
Okay, so forget the select... why doesn't this work?
Command:
jq -c 'if (has("version")?) then [.] else . end'
Actual output:
[{"version": "2.0"}]
In this case, the ? is the error suppression operator and is equivalent to try has("version") which is itself equivalent to try has("version") catch empty. This means that when an error occurs the expression returns empty which means no output. An error does indeed occur when the input is a list instead of an object. When the condition part of the if-then-else expression emits no output, you guessed it, the whole expression emits no output.
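To see the "no output" behaviour directly, one option is to collect the results into an array and count them. This is only a sketch; the variable names from_array and from_object are made up for illustration.

```shell
# On an array, has("version") raises an error; the trailing ? turns that
# error into empty, so the whole if expression emits nothing.
from_array=$(jq -n -c '[ [{"version":"1.0"}] | if (has("version")?) then [.] else . end ] | length')

# On an object, the condition emits true and the if expression emits one value.
from_object=$(jq -n -c '[ {"version":"2.0"} | if (has("version")?) then [.] else . end ] | length')

echo "array input: $from_array output(s), object input: $from_object output(s)"
```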
You could make this work by doing this instead:
Command:
jq -c 'if (try has("version") catch false) then [.] else . end'
Actual output:
[{"version":"1.0"}]
[{"version":"2.0"}]
Of course that's a bit roundabout. You should follow pmf's answer. But this perhaps helps you understand why your attempt didn't go as you expected.
As a rule of thumb, try to make sure your select and if conditions always emit exactly one output for each input - that way you will not be surprised. Expressions that can emit zero outputs (e.g. expr? or select(...)) or multiple outputs (e.g. .[]) will make for a confusing time when used as conditions in select or if.
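The multiple-output case can be illustrated in the same way. In this sketch (the variable names two_conditions and iterated are hypothetical), a condition that emits two values makes the if expression emit two results:

```shell
# A condition that emits two values produces one result per value.
two_conditions=$(jq -n -c '[ if (true, false) then "a" else "b" end ]')

# The same happens when the condition iterates over an array with .[]
iterated=$(jq -n -c '[1, null] | [ if .[] then "truthy" else "falsy" end ]')

printf '%s\n' "$two_conditions" "$iterated"
```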

The arrays function selects an input if it is an array. Combined with // you can handle the case where it isn't:
$ jq -c 'arrays // [.]' versions.json
[{"version":"1.0"}]
[{"version":"2.0"}]
Where versions.json is:
[{"version":"1.0"}]
{"version":"2.0"}

Related

Print from jq using a wild card (or coalesce to first non null)

I have the following command:
kubectl get pod -A -o=json | jq -r '.items[]|select(any( .status.containerStatuses[]; .state.waiting or .state.terminated))|[.metadata.namespace, .metadata.name]|@csv'
This command works great. It outputs both the namespace and name of my failing pods.
But now I want to add one more column to the results. The column I want is located in one (and only one) of two places:
.status.containerStatuses[].state.waiting.reason
.status.containerStatuses[].state.terminated.reason
I first tried adding .status.containerStatuses[].state.*.reason to the results fields array. But that gave me an unexpected '*' compile error.
I then got to thinking about how I would do this with SQL or another programming language. They frequently have a function that will return the first non-null value of its parameters. (This is usually called coalesce). However I could not find any such command for jq.
How can I return the reason as a result of my query?
jq has a counterpart to "coalesce" in the form of //.
For example, null // 0 evaluates to 0, and chances are that it will suffice in your case, perhaps:
.status.containerStatuses[].state | (.waiting // .terminated) | .reason
or
.status.containerStatuses[].state | (.waiting.reason // .terminated.reason )
or similar.
However, // should only be used with some understanding of what it does, as explained in detail on the jq FAQ at https://github.com/stedolan/jq/wiki/FAQ#or-versus-
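A small sketch of what that caveat amounts to in practice: // falls through on false as well as on null, so it is not a strict "first non-null" operator. The variable name alternatives is made up for illustration.

```shell
# null and false both fall through to the right-hand side; any other value is kept.
alternatives=$(jq -n -c '[ (null // 0), (false // 0), ("x" // 0) ]')
printf '%s\n' "$alternatives"
```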
If // is inapplicable for some reason, then the obvious alternative would be an if ... then ... else ... end statement, which is quite like C's _ ? _ : _ in that it can be used to produce a value, e.g. something along the lines of:
.status.containerStatuses[].state
| if has("waiting") then .waiting.reason
else .terminated.reason
end
However, if containerStatuses is an array, then some care may be required.
In case you want to go with coalesce:
# Stream-oriented version
def coalesce(s):
first(s | select(. != null)) // null;
or if you prefer to work with arrays:
# Input: an array
# Output: the first non-null element if any, else null
def coalesce: coalesce(.[]);
Using the stream-oriented version, you could write something along the lines you had in mind with the wildcard, e.g.
coalesce(.status.containerStatuses[].state[].reason?)
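Here is a minimal sketch of the stream-oriented coalesce applied to a single pod-like object. The sample JSON is invented for illustration and is far smaller than real kubectl output; it only assumes the .status.containerStatuses[].state layout described in the question.

```shell
# Hypothetical pod fragment: the first container is running, the second is waiting.
pod='{"status":{"containerStatuses":[{"state":{"running":{}}},{"state":{"waiting":{"reason":"CrashLoopBackOff"}}}]}}'

# .state[] visits every state object regardless of its key (running, waiting, terminated);
# coalesce keeps the first non-null reason.
reason=$(printf '%s\n' "$pod" | jq -r '
  def coalesce(s): first(s | select(. != null)) // null;
  coalesce(.status.containerStatuses[].state[].reason?)')

printf '%s\n' "$reason"
```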

Bash case statement doesn't work with json string value using jq

I am working on a function which extracts the chosen track from a media container (mkv, mp4, etc.). One of its major features will be the "auto output file extension assigner".
The process will be the following:
Step 1) When I give the script the number of the track I want to extract, it automatically inspects the source file with mediainfo and outputs the results in JSON format.
Step 2) With jq, I query the value of the "Format" key from the selected track, and save it to the "mediaFormat" variable.
Step 3) This variable is put in a switch statement and compared with a predefined list of switches. If there is a match, the "mediaExtension" variable is initialized with the appropriate value, which will be used as the extension of the output file.
For now I just want to echo the "mediaExtension" variable to see if it works. And it didn't work.
The problem is that steps 1-2 work as expected, but somehow the switch statement (step 3) doesn't. Only the (*) switch is executed, which means it doesn't recognize the "AVC" switch.
#!/bin/bash
# INCLUDES
# mediainfo binary
PATH=/cygdrive/c/build_suite/local64/bin-video:$PATH;
# jq binary
PATH=/cygdrive/c/build_suite/local64/bin-global:$PATH;
# BASH SETTINGS
set -x;
# FUNCTION PARAMETER
function inspectExtension () {
    mediaFormat=$(mediainfo "$1" --Output=JSON | jq ".media.track[$2].Format");
    case $mediaFormat in
        "AVC") mediaExtension="264";;
        *) echo "ERROR";;
    esac
    set "$mediaExtension";
    echo "$mediaExtension";
}
inspectExtension "test.mp4" "1";
read -p "Press enter to continue...";
As you can see, I activated tracing (set -x) in this script, and this is what I see in the console window (I use Cygwin on Windows 10):
+ inspectExtension test.mp4 1
++ mediainfo test.mp4 --Output=JSON
++ jq '.media.track[1].Format'
' mediaFormat='"AVC"
+ case $mediaFormat in
+ echo ERROR
ERROR
+ set ''
+ echo ''
+ read -p 'Press enter to continue...'
Press enter to continue...
Any ideas? Or is there something I am missing here?
Thanks for the help!
Maybe the only thing you miss is using the --raw-output option of jq like so:
mediaFormat=$(mediainfo "$1" --Output=JSON | jq --raw-output ".media.track[$2].Format");
Whenever you use jq to extract string values for use in the shell, it is usually best to use the --raw-output option because it gets rid of the enclosing quotes.
Assuming you want mediaFormat to be a JSON value (i.e., assuming the invocation of jq is the way you have it), "AVC" in the case statement should be quoted:
'"AVC"' ) ...
In addition, it would probably be safer to quote the argument of case.
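The difference between the two forms can be reproduced without mediainfo or jq at all. The sketch below uses a hypothetical helper named classify and hard-coded strings standing in for the two kinds of jq output.

```shell
# Hypothetical helper: maps a format name to a file extension.
classify() {
  case "$1" in
    AVC) echo "264" ;;
    *)   echo "ERROR" ;;
  esac
}

raw=$(classify 'AVC')       # what jq --raw-output would hand over
quoted=$(classify '"AVC"')  # what jq prints by default: the quotes are part of the value

echo "raw: $raw, quoted: $quoted"
```

The quoted form only matches a pattern that itself contains the double quotes, which is why '"AVC"' works as a pattern when --raw-output is not used.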

jq - order by value

I have the following structure:
{"ID":"XX","guid":1}
{"ID":"YY","guid":2}
...
I have tried running:
jq 'sort_by(.guid)' conn.json
I however get an error:
Cannot index string with string "guid"
Please can you advise how I'd sort the file by guid and/or find the record where guid is the largest?
UPDATE
What I am actually looking for is the record where the GUID is the largest in the dataset. Thought sorting it would help me but it's proving to be very slow
Thanks
sort_by assumes its input is iterable, and expands it by applying .[] before sorting its members. You're providing a stream of objects to it, and each object expands to a stream of non-indexable values ("XX", 1 etc.) in this case, thus .guid fails.
Slurp them to make it work, e.g:
jq -s 'sort_by(.guid)[]' conn.json
To extract the object with the largest GUID, you wouldn't sort the slurped input manually; for such tasks, jq has max_by, e.g:
jq -s 'max_by(.guid)' conn.json
and reduce, which is a more convenient construct for large inputs and eliminates the need for slurping.
jq -n 'reduce inputs as $in (input; if $in.guid > .guid then $in else . end)' conn.json
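A small sketch comparing the slurping and streaming approaches on made-up data (the records and variable names are invented for illustration). The streaming form is run with -n so that input reads the first record instead of jq consuming it implicitly.

```shell
# Three sample records, with the largest guid in the middle.
data='{"ID":"XX","guid":1}
{"ID":"ZZ","guid":3}
{"ID":"YY","guid":2}'

# Slurp everything into one array, then pick the maximum.
slurped=$(printf '%s\n' "$data" | jq -c -s 'max_by(.guid)')

# Stream the records one at a time, keeping only the current maximum.
streamed=$(printf '%s\n' "$data" | jq -c -n 'reduce inputs as $in (input; if $in.guid > .guid then $in else . end)')

printf '%s\n' "$slurped" "$streamed"
```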

Find whether length of a string in json array exceeds certain limit

I have a file which contains many JSON arrays. I need to find out whether the length of any value in any of the arrays exceeds a limit, say 1000. If it does, I have to trim that particular value to the limit. After that, the file will be fed to a downstream application. What is the best way to implement this in a shell script? I tried jq and sed, but neither seemed to work; maybe I haven't explored them completely. Any suggestions for this use case would be highly appreciated!
Unfortunately the originally posted question is rather vague on a number of points, so I'll first focus on determining whether an arbitrary JSON document has a string value (excluding key names) that exceeds a certain given size.
To find the maximum of a stream of numbers, we can write:
def max(stream): reduce stream as $s (null;
if $s > . then $s else . end);
Let us suppose the above def, together with the following line, is in a file named max.jq:
max( .. | strings | length) > $mx
Then we could find the answer by running a command such as:
jq --argjson mx 4 -f max.jq INPUT.json
A shorter but possibly less space-efficient answer:
jq --argjson mx 4 '[..|strings|length]|max > $mx' INPUT.json
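To try the shorter form without a file, a sketch like the following can be used. The sample documents and the variable names has_long and all_short are invented for illustration.

```shell
# One document with strings longer than 4 characters, one without.
has_long=$(printf '%s\n' '{"a":"abcdefgh","b":["xy","0123456789"],"c":5}' \
  | jq --argjson mx 4 '[..|strings|length]|max > $mx')
all_short=$(printf '%s\n' '{"a":"abc","b":["xy"]}' \
  | jq --argjson mx 4 '[..|strings|length]|max > $mx')

echo "first: $has_long, second: $all_short"
```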
Variants
There are many possible variants, e.g. you might want to arrange things so that jq returns a suitable return code rather than emitting a boolean value.
Truncating long strings
To truncate strings longer than a given length, say $mx, you could use walk/1, like so:
walk(if type == "string" and length > $mx
then .[:$mx] else . end)
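As a sketch of how the truncation could be run from the shell, assuming a jq version that ships walk/1 (the sample document is invented for illustration):

```shell
# Strings longer than $mx are cut to $mx characters; everything else is left alone.
truncated=$(printf '%s\n' '{"a":"abcdefgh","b":["xy","0123456789"],"c":5}' \
  | jq -c --argjson mx 4 'walk(if type == "string" and length > $mx then .[:$mx] else . end)')

printf '%s\n' "$truncated"
```

Note that key names are left untouched here; only values are shortened.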

Conditional variables in JQ json depending on argument value?

I am trying to build a JSON object with jq using --arg arguments, but I'd like a key to be left out of the JSON when its variable is empty.
An example, if I run the following command
jq -n --arg myvar "${SOMEVAR}" '{ $myvar}'
I'd like the JSON in that case to be {} if myvar happens to be empty (because the variable ${SOMEVAR} does not exist), and not { "myvar": "" }, which is what I get by just running the command above.
Is there any way to achieve this through some sort of condition?
UPDATE:
Some more details about the use case
I want to build a json based on several environment variables but only include the variables that have a value.
Something like
{"varA": "value", "varB": "value"}
But varA should only be included if its value is defined, and so on. The issue now is that if the value is not defined, the property varA will still exist with an empty value. Because there are multiple arguments/variables, using an if/else to build the entire JSON as suggested would lead to a huge number of conditions to cover every possible combination of missing variables.
Suppose you have a template of variable names, in the form of an object as you have suggested you want:
{a, b, c}
Suppose also (for the sake of illustration) that you want to pull in the corresponding values from *ix environment variables. Then you just need to adjust the template, which can be done using this filter:
def adjust: with_entries( env[.key] as $v | select($v != null) | .value = $v );
Example:
Assuming the above filter, together with the following line, is in a file named adjust.jq:
{a,b,c} | adjust
then:
$ export a=123
$ jq -n -c -f adjust.jq
{"a":"123"}
You can use an if/else construct:
jq -n --arg myvar "${SOMEVAR}" 'if ($myvar|length > 0) then {$myvar} else {} end'
It's still not clear where the variable-value pairs are coming from, so maybe it would be simplest to construct the object containing the mapping before invoking jq, and then passing it in using the --argjson or --argfile option?
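For several --arg variables at once, one possible alternative (not taken from the answers above) is to filter $ARGS.named, which jq 1.6 and later populate with all named arguments. A minimal sketch, using the hypothetical names varA and varB from the question:

```shell
# varB is passed as an empty string, as it would be for an unset shell variable.
obj=$(jq -n -c --arg varA "value" --arg varB "" \
  '$ARGS.named | with_entries(select(.value != ""))')

printf '%s\n' "$obj"
```

This avoids writing one condition per variable, at the cost of including every named argument that was passed in.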