jq - order by value - json

I have the following structure:
{"ID":"XX","guid":1}
{"ID":"YY","guid":2}
...
I have tried running:
jq 'sort_by(.guid)' conn.json
I however get an error:
Cannot index string with string "guid"
Please can you advise how I'd sort the file by guid and/or find the record where guid is the largest?
UPDATE
What I am actually looking for is the record with the largest GUID in the dataset. I thought sorting would help, but it's proving to be very slow.
Thanks

sort_by assumes its input is iterable, and expands it by applying .[] before sorting its members. You're providing a stream of objects, and each object expands to a stream of non-indexable values ("XX", 1, etc.), so .guid fails.
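You can see that expansion directly; applied to the two sample records above in conn.json, .[] yields bare scalars, which is what .guid is then applied to:
$ jq '.[]' conn.json
"XX"
1
"YY"
2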
Slurp them to make it work, e.g.:
jq -s 'sort_by(.guid)[]' conn.json
To extract the object with the largest GUID, you needn't sort the slurped input manually; for such tasks, jq has max_by, e.g.:
jq -s 'max_by(.guid)' conn.json
and reduce, a more convenient construct for large inputs that eliminates the need for slurping:
jq -n 'reduce inputs as $in (input; if $in.guid > .guid then $in else . end)' conn.json
(Note the -n option: without it, the first object would be consumed as . and silently left out of the comparison.)
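With the two sample records above in conn.json, this keeps only the running maximum in memory and prints:
{
  "ID": "YY",
  "guid": 2
}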

Related

Using jq to get the nth element of a stream of JSON entities

I have the following json file and I would like to access the nth element in it.
$ cat a.json
{"one":1}
{"two":2}
{"three":3}
Here, the json file is in disassembled form. When I try
$ jq '.[0]' a.json
I get the error jq: error (at a.json:1): Cannot index object with number. The expected output is {"one":1}.
jq has a built-in filter, nth($n; s), for solving this type of problem very efficiently. For example, to emit the second item, you would simply invoke:
jq -n 'nth(1; inputs)' a.json
Note that counting of the items starts from 0.
Efficiency is achieved by avoiding any "slurping" (whether by using the -s option, or by using [inputs]), and by stopping once the requested item has been read.
Your input is not an array of objects, so you can't access it with the .[0] notation. You could slurp the whole input with -s, but you can also use inputs with -n (null input mode) and get the nth element by specifying the index, e.g., to get the element at index 2:
# doesn't support negative indices
jq -n '[inputs] | nth(2)' a.json
# supports negative indices, i.e. [-1] will get the last element
jq -n '[inputs][2]' a.json
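For instance, with a.json as above (adding -c for one-line output), both forms pick out the element at index 2, which here is also the last:
$ jq -cn '[inputs] | nth(2)' a.json
{"three":3}
$ jq -cn '[inputs][-1]' a.json
{"three":3}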

Providing a very large argument to a jq command to filter on keys

I am trying to parse a very large file which consists of JSON objects like this:
{"id": "100000002", "title": "some_title", "year": 1988}
Now I also have a very big list of ID's that I want to extract from the file, if they are there.
Now I know that I can do this:
jq '[ .[map(.id)|indices("1", "2")[]] ]' 0.txt > p0.json
Which produces the result I want, namely it fills p0.json with only the objects that have "id" "1" and "2". Now comes the problem: my list of id's is very long too (100k or so). So I have a Python program that outputs the relevant id's. My line of thought was to first assign that to a variable:
REL_IDS=$(python3 rel_ids.py)
And then do:
jq --arg ids "$REL_IDS" '[ .[map(.id)|indices($ids)[]] ]' 0.txt > p0.json
I tried both with brackets [$ids] and without brackets, but no luck so far.
My question is, given a big amount of arguments for the filter, how would I proceed with putting them into my jq command?
Thanks a lot in advance!
Since the list of ids is long, the trick is NOT to use --arg. However, the details will depend on the details regarding the "long list of ids".
In general, though, you'd want to present the list of ids to jq as a file so that you could use --rawfile or --slurpfile or some such.
If for some reason you don't want to bother with an actual file, then provided your shell allows it, you could use these file-oriented options with process substitution: <( ... )
Example
Assuming ids.json contains a listing of the ids as JSON strings:
"1"
"2"
"3"
then one could write:
< objects.json jq -c -n --slurpfile ids ids.json '
inputs | . as $in | select( $ids | index($in.id))'
Notice the use of the -n command-line option.
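Combining this with process substitution, and assuming the asker's rel_ids.py prints one JSON string per id (that output format is an assumption), the whole thing becomes:
< objects.json jq -c -n --slurpfile ids <(python3 rel_ids.py) '
inputs | . as $in | select( $ids | index($in.id))'
Note that index performs a linear scan of $ids for every object; with ~100k ids, building an object keyed by id once and testing membership with has would scale much better.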

jq to remove one of the duplicated objects

I have a json file like this:
{"caller_id":"123321","cust_name":"abc"}
{"caller_id":"123443","cust_name":"def"}
{"caller_id":"123321","cust_name":"abc"}
{"caller_id":"234432","cust_name":"ghi"}
{"caller_id":"123321","cust_name":"abc"}
....
I tried:
jq -s 'unique_by(.field1)'
but this removes all of the duplicated items; I'm looking to keep just one of each, to get a file like this:
{"caller_id":"123321","cust_name":"abc"}
{"caller_id":"123443","cust_name":"def"}
{"caller_id":"234432","cust_name":"ghi"}
....
With .field1, every object evaluates to null since there is no key/field with that name, so unique_by collapses the whole input into a single object. If you simply change your command to jq -s 'unique_by(.caller_id)', it will give you the desired result, containing unique and sorted objects based on the caller_id key. It ensures the result has at least, and at most, one object for each caller_id.
Note: this is the same as what Jeff Mercado explained in the comments.
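For example, to reproduce the desired stream-of-objects output shown above (assuming the records are in file.json; [] unnests the resulting array, and -c prints one object per line):
$ jq -sc 'unique_by(.caller_id)[]' file.json
{"caller_id":"123321","cust_name":"abc"}
{"caller_id":"123443","cust_name":"def"}
{"caller_id":"234432","cust_name":"ghi"}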
If the file consists of a sequence (stream) of JSON objects, then a very simple way to produce a stream of the distinct objects would be to use the invocation:
jq -s 'unique[]'
A similar alternative would be:
jq -n '[inputs] | unique[]'
For large files, however, the above will probably be too inefficient, both with respect to RAM and run-time. Note that both unique and unique_by entail a sort.
A far better alternative would be to take advantage of the fact that the input is a stream, and to avoid the built-in unique and unique_by filters. This can be done with the assistance of the following filters, which are not yet built-in but likely to become so:
# emit a dictionary
def set(s): reduce s as $x ({}; .[$x | (type[0:1] + tostring)] = $x);
# distinct entities in the stream s
def distinct(s): set(s)[];
We now have only to add:
distinct(inputs)
to achieve the objective, provided jq is invoked with the -n command-line option.
This approach will also preserve the original ordering.
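Putting it together as one invocation (a sketch, assuming the stream of objects from the question is in file.json):
$ jq -cn '
  def set(s): reduce s as $x ({}; .[$x | (type[0:1] + tostring)] = $x);
  def distinct(s): set(s)[];
  distinct(inputs)' file.json
{"caller_id":"123321","cust_name":"abc"}
{"caller_id":"123443","cust_name":"def"}
{"caller_id":"234432","cust_name":"ghi"}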
If the input is an array ...
If the input is an array, then using distinct as defined above still has the advantage of not requiring a sort. For arrays that are too large to fit comfortably in memory, it would be advisable to use jq's streaming parser to create a stream.
One possibility would be to proceed in two steps (jq --stream .... | jq -n ...), but it might be better to do everything in one step (jq -cn --stream ...), using the following "main" program:
distinct(fromstream(inputs
| (.[0] |= .[1:] )
| select(. != [[]])))
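Assembled into a single command (a sketch, assuming the large array is in a hypothetical big.json and reusing the set/distinct definitions above):
jq -cn --stream '
  def set(s): reduce s as $x ({}; .[$x | (type[0:1] + tostring)] = $x);
  def distinct(s): set(s)[];
  distinct(fromstream(inputs
    | (.[0] |= .[1:] )
    | select(. != [[]])))' big.json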

I cannot get jq to give me the value I'm looking for.

I'm trying to use jq to get a value from the JSON that cURL returns.
This is the JSON cURL passes to jq (and, FTR, I want jq to return "VALUE-I-WANT" without the quotation marks):
[
  {
    "success": {
      "username": "VALUE-I-WANT"
    }
  }
]
I initially tried this:
jq ' . | .success | .username'
and got
jq: error (at <stdin>:0): Cannot index array with string "success"
I then tried a bunch of variations, with no luck.
After a bunch of searching the web, I found this SE entry, and thought it might have been my saviour (spoiler: it wasn't). But it led me to try these:
jq -r '.[].success.username'
jq -r '.[].success'
They didn't return an error; they returned "null", which may or may not be an improvement.
Can anybody tell me what I'm doing wrong here? And why it's wrong?
You need to pipe the output of .[] into the next filter.
jq -r '.[] | .success.username' tmp.json
tl;dr
# Extract .success.username from ALL array elements.
# .[] enumerates all array elements
# -r produces raw (unquoted) output
jq -r '.[].success.username' file.json
# Extract .success.username only from the 1st array element.
jq -r '.[0].success.username' file.json
Your input is an array, so in order to access its elements you need .[], the array/object-value iterator (as the name suggests, it can also enumerate the properties of an object).
Just . | sends the input (.) array as a whole through the pipeline, and an array only has numerical indices, so the attempt to index (access) it with .success.username fails.
Thus, simply replacing . | with .[] | in your original attempt, combined with -r to get raw (unquoted) output, should solve your problem, as shown in chepner's helpful answer.
However, peak points out that since at least jq 1.3 (current as of this writing is jq 1.5) you don't strictly need a pipeline, as demonstrated in the commands at the top.
So the 2nd command in your question should work with your sample input, unless you're using an older version.
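For instance, saving the sample JSON as file.json:
$ jq -r '.[0].success.username' file.json
VALUE-I-WANT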

Conditional variables in JQ json depending on argument value?

I am trying to build a json with jq using --arg arguments; however, I'd like the json to omit the key when the variable is empty.
An example, if I run the following command
jq -n --arg myvar "${SOMEVAR}" '{ $myvar}'
I'd like the json in that case to be {} if myvar happens to be empty (because the variable ${SOMEVAR} does not exist), and not { "myvar": "" }, which is what I get by just running the command above.
Is there any way to achieve this through some sort of condition?
UPDATE:
Some more details about the use case
I want to build a json based on several environment variables but only include the variables that have a value.
Something like
{"varA": "value", "varB": "value"}
But only include varA if its value is defined, and so on. The issue now is that if a value is not defined, the property varA will still exist with an empty value, and because multiple arguments/variables are involved, using an if/else to build the entire json as suggested would lead to a huge number of conditions to cover every possible combination of variables not existing.
Suppose you have a template of variable names, in the form of an object as you have suggested you want:
{a, b, c}
Suppose also (for the sake of illustration) that you want to pull in the corresponding values from *ix environment variables. Then you just need to adjust the template, which can be done using this filter:
def adjust: with_entries( env[.key] as $v | select($v != null) | .value = $v );
Example:
Assuming the above filter, together with the following line, is in a file named adjust.jq:
{a,b,c} | adjust
then:
$ export a=123
$ jq -n -c -f adjust.jq
{"a":"123"}
You can use an if/else construct:
jq -n --arg myvar "${SOMEVAR}" 'if ($myvar|length > 0) then {$myvar} else {} end'
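For instance, exercising both branches:
$ jq -n --arg myvar "" 'if ($myvar|length > 0) then {$myvar} else {} end'
{}
$ jq -n --arg myvar "hello" 'if ($myvar|length > 0) then {$myvar} else {} end'
{
  "myvar": "hello"
}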
It's still not clear where the variable-value pairs are coming from, so maybe it would be simplest to construct the object containing the mapping before invoking jq, and then pass it in using the --argjson or --argfile option?
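For instance (a sketch; vars.json is hypothetical and assumed to be produced upstream with only the non-empty variables in it):
$ cat vars.json
{"varA":"value"}
$ jq -n --argjson vars "$(cat vars.json)" '$vars'
{
  "varA": "value"
}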