Filter keys in JSON using jq

I have a complex nested JSON document:
{
...
"key1": {
"key2" : [
{ ...
"base_score" :4.5
}
],
"key3": {
"key4": [
{ ...
"base_score" : 0.5
...
}
]
}
...
}
}
There may be multiple "base_score" keys in the JSON (the path to "base_score" is unknown), and the corresponding value is always a number. I have to check whether at least one such value is greater than a known value, 7.0, and if so, do "exit 1". I have to write this query in a shell script.

Assuming the input is valid JSON in a file named input.json, and based on my understanding of the requirements, you could go with:
jq --argjson limit 7.0 '
any(.. | select(type=="object" and (.base_score|type=="number")) | .base_score; . > $limit)
| halt_error(if . then 1 else 0 end)
' input.json
You can modify the argument to halt_error to set the exit code as you wish.
Note that halt_error writes its input to stderr, so you might want to append 2> /dev/null (or the equivalent expression appropriate for your shell) to the above invocation.
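The following is a minimal sketch of how this command might sit in a shell script. It assumes the JSON arrives on stdin and the threshold is passed as the first argument. check_base_score is a hypothetical helper name, and its exit status is simply the one chosen by halt_error:

```shell
#!/bin/sh
# Hypothetical helper: exits 1 if any numeric base_score (at any depth)
# is greater than the limit given as $1 (default 7.0), and 0 otherwise.
check_base_score() {
  jq --argjson limit "${1:-7.0}" '
    any(.. | select(type == "object" and (.base_score | type == "number"))
           | .base_score; . > $limit)
    | halt_error(if . then 1 else 0 end)
  ' 2>/dev/null  # halt_error writes true/false to stderr; discard it
}

# Example use (file name input.json assumed):
#   check_base_score 7.0 < input.json || exit 1
```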

You can easily get a stream of base_score values at any level and use that with any:
any(..|.base_score?; . > 7)
The stream will contain null values for objects without the property, but null is not greater than any number, so that shouldn't be a stopper.
You could then compare the output, or specify -e/--exit-status so that the result can be used directly as a condition:
jq -e 'any(..|.base_score?; . > 7)' complexnestedfile.json >/dev/null && exit 1
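Below is a minimal sketch of the -e variant as a reusable check. It assumes the document is read from stdin and passes the threshold with --argjson instead of hard-coding it. has_high_score is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeeds (status 0) when at least one base_score,
# at any depth, is greater than the limit in $1 (default 7).
# With -e, jq exits 1 when its last output is false or null.
has_high_score() {
  jq -e --argjson limit "${1:-7}" 'any(..|.base_score?; . > $limit)' >/dev/null
}

# Example use:
#   has_high_score 7 < complexnestedfile.json && exit 1
```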

Related

Conditionally add json array using jq and bash

I am trying to add a JSON array with just one element to a JSON document, and it has to be added only if it does not already exist.
An example JSON document is below:
{
"lorem": "2.0",
"ipsum": {
"key1": "value1",
"key2": "value2"
},
"schemes": ["https"],
"dorum" : "value3"
}
In the JSON above, "schemes": ["https"] already exists. I am trying to add schemes only if it does not exist, using the code below:
scheme=$( cat rendertest.json | jq -r '. "schemes" ')
echo $scheme
schem='["https"]'
echo "Scheme is"
echo $schem
if [ -z $scheme ]
then
echo "all good"
else
jq --argjson argval "$schem" '. += { "schemes" : $schem }' rendertest.json > test.json
fi
I get the error below for a file in which the JSON array 'schemes' does not exist: the lookup returns null and then jq errors out. Any idea where I am going wrong?
null
Scheme is
["https"]
jq: error: schem/0 is not defined at <top-level>, line 1:
. += { "schemes" : $schem }
jq: 1 compile error
Edit: the question is not about how to pass on bash variables to jq.
Just use an explicit if condition that checks for the attribute schemes at the root level of the JSON structure.
schem='["https"]'
After setting the above variable in your shell, run the following filter:
jq --argjson argval "$schem" 'if has("schemes")|not then .+= { schemes: $argval } else . end' json
The name immediately after --argjson (here, argval) is what becomes available inside the jq program, as $argval. You were trying to use $schem there, which is a shell variable and is not defined in jq's context.
You can go one step further: even if schemes is present, overwrite it when it does not contain the value you expect. Change the condition of the if in the filter to
(has("schemes")|not) or .schemes != $argval
which can be run as
jq --argjson argval "$schem" 'if (has("schemes")|not) or .schemes != $argval then .schemes = $argval else . end' json
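Both filters can be checked quickly on small inputs. This sketch assumes the document is read from stdin instead of the file json, and it uses -c only to get one-line output. add_schemes and ensure_schemes are hypothetical helper names:

```shell
#!/bin/sh
schem='["https"]'

# Hypothetical helper: add "schemes" only when the key is missing.
add_schemes() {
  jq -c --argjson argval "$schem" \
    'if has("schemes") | not then . += {schemes: $argval} else . end'
}

# Hypothetical helper: also overwrite "schemes" when it differs from $argval.
ensure_schemes() {
  jq -c --argjson argval "$schem" \
    'if (has("schemes") | not) or .schemes != $argval then .schemes = $argval else . end'
}

# Example use (file names from the question):
#   add_schemes < rendertest.json > test.json
```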

How to conditionally do a recursive merge?

I'd like to conditionally do a recursive merge. That is, if a key exists in the second object, I'd like to use it to override values in the first. For example, this does what I want:
$ echo '{"a":"value"}{"bar": {"a":"override"}}' | jq -sS '.[0] * if (.[1].foo|length) > 0 then .[1].foo else {} end'
{
"a": "value"
}
$ echo '{"a":"value"}{"foo": {"a":"override"}}' | jq -sS '.[0] * if (.[1].foo|length) > 0 then .[1].foo else {} end'
{
"a": "override"
}
In the first example, the second object does not contain a "foo" key, so the override does not happen. In the second example, the second object does contain "foo", so the value is changed. (In my actual use, I always have 3 objects in the input and sometimes a 4th, which may override some of the previous values.)
Although the above works, it seems absurdly ugly. Is there a cleaner way to do this? I imagine something like jq -sS '.[0] * (.[1].foo ? .[1].foo : {})' or similar.
With the -n flag specified on the command line, this should do the trick:
reduce inputs as $in (input; . * ($in.foo // {}))
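Here is a sketch that applies the reduce filter to the question's inputs. It assumes the objects arrive on stdin. The -S and -c options only sort the keys and compact the output, and merge_overrides is a hypothetical helper name. Because * merges recursively, a nested object under foo overrides only the keys it contains:

```shell
#!/bin/sh
# Hypothetical helper: the first input is the starting value; each later
# input's .foo (or {} when absent) is recursively merged over it with *.
merge_overrides() {
  jq -n -c -S 'reduce inputs as $in (input; . * ($in.foo // {}))'
}

# Example use:
#   echo '{"a":"value"}{"foo": {"a":"override"}}' | merge_overrides
```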

"Ternary logic" for returned value: foo, bar or error

I've got two different JSON structures to retrieve a specific object value from, basically something like this:
{
"one": {
"foo": {
"bar": "baz"
}
}
}
and another like this:
{
"two": {
"foo": {
"bar": "qux"
}
}
}
I'd like to return the bar value in both cases, plus an additional return value, error, in case neither case 1 (baz) nor case 2 (qux) matches anything (i.e. the result is null).
Is there a simple way to do that with just jq 1.6?
Update:
Here are snippets of actual JSON files:
/* manifest.json, variant A */
{
"browser_specific_settings": {
"gecko": {
"id": "{95ad7b39-5d3e-1029-7285-9455bcf665c0}",
"strict_min_version": "68.0"
}
}
}
/* manifest.json, variant B */
{
"applications": {
"gecko": {
"id": "j30D-3YFPUvj9u9izFoPSjlNYZfF22xS#foobar",
"strict_min_version": "53.0"
}
}
}
I need the id values (*gecko.id so to speak) or error if there is none:
{95ad7b39-5d3e-1029-7285-9455bcf665c0}
j30D-3YFPUvj9u9izFoPSjlNYZfF22xS#foobar
error
You can use a filter like the one below, which works with both of the sample JSON documents you provided:
jq '.. | if type == "object" and has("id") then .id else empty end'
Note: this only gets the value of .id when it is present; see the other answers (e.g. oguz ismail's) for outputting error when the object does not contain the required field.
(.. | objects | select(has("id"))).id // "error"
This will work with multiple files and files containing multiple separate entities.
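The sketch below checks this filter against both manifest variants and against a manifest without any id. It assumes the JSON is read from stdin, and -r prints the id without quotes. gecko_id is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: the .id of every object that has one, at any depth,
# or the string "error" when no such object exists.
gecko_id() {
  jq -r '(.. | objects | select(has("id"))).id // "error"'
}

# Example use:
#   gecko_id < manifest.json
```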
You can use a combination of the ? (error suppression) and // (alternative) operators:
jq -n --slurpfile variantA yourFirstFile --slurpfile variantB yourSecondFile \
'(
($variantA[0].browser_specific_settings.gecko.id)?,
($variantB[0].applications.gecko.id)?
) // "error"'
This outputs the id from the first file and/or the id from the second file if either exists, without raising errors when they don't, and outputs error instead if neither can be found.
The command can be shortened as follows if it makes sense in your context:
jq -n --slurpfile variantA yourFirstFile --slurpfile variantB yourSecondFile \
'(($variantA[0].browser_specific_settings, $variantB[0].applications) | .gecko.id)? // "error"'
I think you are looking for hasOwnProperty(). For example:
var value;
if(applications.gecko.hasOwnProperty('id'))
value = applications.gecko.id;
else
value = 'error';
I need the id values (*gecko.id so to speak) or error if there is none:
In accordance with your notation "*gecko.id", here are two solutions, the first interpreting the "*" as a single unknown key (or index), the second interpreting it (more or less) as any number of keys or indices:
.[] | .gecko.id // "error"
.. | objects | select(has("gecko")) | (.gecko.id // "error")
If you don't really care about whether there's a "gecko" key, you might wish to consider:
first(.. | objects | select(has("id")).id ) // "error"
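The two notations behave differently when several objects carry an id. .[] | .gecko.id looks exactly one level below the root, while first(...) stops at the first id found in a depth-first walk. Here is a sketch under the same stdin assumption, with hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helper: "*" read as a single unknown top-level key.
star_gecko_id() {
  jq -r '.[] | .gecko.id // "error"'
}

# Hypothetical helper: the first object with an "id" key, at any depth.
first_id() {
  jq -r 'first(.. | objects | select(has("id")).id) // "error"'
}
```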

How to skip first n objects in jq input

I have a VERY large stream of objects, which I am trying to import into MongoDB. I keep getting a broken pipe after about 10k objects, so I would like to be able to update my import script to skip the already imported objects and begin with the first one that was missed.
It seems to me that the tool for this would be jq. What I need is a way to skip (yield empty) all items before the nth, and then output the rest as-is.
I've tried using foreach to maintain an object counter, but I keep ending up with 1 as the value of the counter, for all objects in my small test sample (using a bash here document):
$ jq 'foreach . as $item (0; (.+1); [ . , if . < 2 then empty else $item end ])' <<"end"
> { "item": "first" }
> { "item": "second" }
> { "item": "third" }
> { "item": "fourth" }
> end
The output from this is:
[
1
]
[
1
]
[
1
]
[
1
]
Any suggestions would be most welcome.
def skip(n; stream):
foreach stream as $s (0; .+1; select(. > n) | $s);
Example:
skip(1000; inputs)
(When using inputs and/or input, don't forget you'll probably want to use the -n command-line option.)
Sledgehammer Approach
try (range(0; 1000) | input | empty), inputs
In this case, the try is necessary to avoid an error should there be fewer than the requested number of items.
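The question's attempt prints 1 every time because, without -n, jq runs the whole program once per input, so the foreach counter starts again at 0 for each object. With -n and inputs, a single foreach sees the entire stream. The sketch below applies skip/2 to the question's four test objects. skip_objects is a hypothetical helper name, and the count is passed with --argjson:

```shell
#!/bin/sh
# Hypothetical helper: drop the first $1 objects read from stdin and
# print the rest unchanged, one compact object per line.
skip_objects() {
  jq -n -c --argjson n "$1" '
    def skip(n; stream):
      foreach stream as $s (0; .+1; select(. > n) | $s);
    skip($n; inputs)'
}

# Example use: resume an import after the first 10000 objects
#   skip_objects 10000 < objects.json | your_import_command
```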

Flatten nested JSON using jq

I'd like to flatten a nested JSON object, e.g. {"a":{"b":1}} to {"a.b":1}, in order to ingest it into Solr.
I have 11 TB of JSON files which are both nested and contain dots in field names, meaning neither Elasticsearch (dots) nor Solr (nested without the _childDocument_ notation) can ingest them as is.
The other solution would be to replace dots in the field names with underscores and push the data to Elasticsearch, but I have far better experience with Solr, so I prefer the flattening solution (unless Solr can ingest those nested JSON documents as is?).
I would prefer Elasticsearch only if ingestion took far less time than with Solr, because my priority is ingesting as fast as I can (which is why I chose jq instead of scripting it in Python).
Kindly help.
EDIT:
I think the pair of examples 3&4 solves this for me:
https://lucidworks.com/blog/2014/08/12/indexing-custom-json-data/
I'll try soon.
You can also use the following jq command to flatten nested JSON objects in this manner:
[leaf_paths as $path | {"key": $path | join("."), "value": getpath($path)}] | from_entries
The way it works is: leaf_paths returns a stream of arrays which represent the paths in the given JSON document at which "leaf elements" appear, that is, elements which do not have child elements, such as numbers, strings and booleans (note, however, that leaf_paths skips null and false values). We pipe that stream into objects with key and value properties, where key contains the elements of the path array joined by dots into a string, and value contains the element at that path. Finally, we put everything into an array and run from_entries on it, which transforms an array of {key, value} objects into an object containing those key-value pairs.
This is just a variant of Santiago's jq:
. as $in
| reduce leaf_paths as $path ({};
. + { ($path | map(tostring) | join(".")): $in | getpath($path) })
It avoids the overhead of the key/value construction and destruction.
(If you have access to a version of jq later than jq 1.5, you can omit the "map(tostring)".)
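The sketch below runs this reduce-based filter on the question's example and on an input containing an array. It assumes the input is read from stdin. flatten_leaf_paths is a hypothetical helper name, and map(tostring) is kept so that numeric array indices can be joined on older jq versions:

```shell
#!/bin/sh
# Hypothetical helper: flatten nested objects/arrays into "a.b.0"-style keys.
flatten_leaf_paths() {
  jq -c '. as $in
    | reduce leaf_paths as $path ({};
        . + { ($path | map(tostring) | join(".")): $in | getpath($path) })'
}
```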
Two important points about both of these jq solutions:
Arrays are also flattened.
E.g. given {"a": {"b": [0,1,2]}} as input, the output would be:
{
"a.b.0": 0,
"a.b.1": 1,
"a.b.2": 2
}
If any of the keys in the original JSON contain periods, then key collisions are possible; such collisions will generally result in the loss of a value. This would happen, for example, with the following input:
{"a.b":0, "a": {"b": 1}}
Here is a solution that uses tostream, select, join, reduce and setpath:
reduce ( tostream | select(length==2) | .[0] |= [join(".")] ) as [$p,$v] (
{}
; setpath($p; $v)
)
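Here is a sketch of the tostream/setpath filter. Unlike the leaf_paths variants, tostream also emits null and false leaves, so those keys survive. The input is read from stdin, and flatten_stream is a hypothetical helper name. Joining numeric path elements (array indices) without map(tostring) assumes jq 1.6 or later:

```shell
#!/bin/sh
# Hypothetical helper: keep only [path, leaf] events from tostream,
# turn each path into a single dotted key, and rebuild an object.
flatten_stream() {
  jq -c 'reduce ( tostream | select(length==2) | .[0] |= [join(".")] ) as [$p,$v] (
    {}
    ; setpath($p; $v)
  )'
}
```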
I've recently written a script called jqg that flattens arbitrarily complex JSON and searches the results using a regex; to simply flatten the JSON, your regex would be '.', which matches everything. Unlike the answers above, the script will handle embedded arrays, false and null values, and can optionally treat empty arrays and objects ([] & {}) as leaf nodes.
$ jq . test/odd-values.json
{
"one": {
"start-string": "foo",
"null-value": null,
"integer-number": 101
},
"two": [
{
"two-a": {
"non-integer-number": 101.75,
"number-zero": 0
},
"true-boolean": true,
"two-b": {
"false-boolean": false
}
}
],
"three": {
"empty-string": "",
"empty-object": {},
"empty-array": []
},
"end-string": "bar"
}
$ jqg . test/odd-values.json
{
"one.start-string": "foo",
"one.null-value": null,
"one.integer-number": 101,
"two.0.two-a.non-integer-number": 101.75,
"two.0.two-a.number-zero": 0,
"two.0.true-boolean": true,
"two.0.two-b.false-boolean": false,
"three.empty-string": "",
"three.empty-object": {},
"three.empty-array": [],
"end-string": "bar"
}
jqg was tested using jq 1.6.
Note: I am the author of the jqg script.
As it turns out, curl -XPOST 'http://localhost:8983/solr/flat/update/json/docs' -d @json_file does just this:
{
"a.b":[1],
"id":"24e3e780-3a9e-4fa7-9159-fc5294e803cd",
"_version_":1535841499921514496
}
EDIT 1: Solr 6.0.1 with bin/solr -e cloud. The collection name is flat; everything else is default (including the data-driven schema, which is also the default).
EDIT 2: The final script I used: find . -name '*.json' -exec curl -XPOST 'http://localhost:8983/solr/collection1/update/json/docs' -d @{} \;.
EDIT 3: It is also possible to parallelize with xargs and to add the id field with jq: find . -name '*.json' -print0 | xargs -0 -n 1 -P 8 -I {} sh -c "cat {} | jq '. + {id: .a.b}' | curl -XPOST 'http://localhost:8983/solr/collection/update/json/docs' -d @-" where -P is the parallelism factor. I used jq to set an id so that multiple uploads of the same document won't create duplicates in the collection (when I was searching for the optimal value of -P, it created duplicates in the collection).
As @hraban mentioned, leaf_paths does not work as expected (furthermore, it is deprecated). leaf_paths is equivalent to paths(scalars): it returns the paths of any values for which scalars produces a truthy value. scalars outputs its input value if it is a scalar and produces no output otherwise. The problem is that null and false are not truthy values, so they are removed from the output. The following code does work, because it checks the type of the values directly:
. as $in
| reduce paths(type != "object" and type != "array") as $path ({};
. + { ($path | map(tostring) | join(".")): $in | getpath($path) })
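The sketch below shows the difference described above by comparing the leaf_paths filter with the type-based one on an input containing null and false. The input is read from stdin, and both helper names are hypothetical. With this predicate, empty objects and arrays are not leaves either, so they disappear from the output:

```shell
#!/bin/sh
# Hypothetical helper: the leaf_paths-based version (drops null and false).
flatten_leaf_paths() {
  jq -c '. as $in
    | reduce leaf_paths as $path ({};
        . + { ($path | map(tostring) | join(".")): $in | getpath($path) })'
}

# Hypothetical helper: the type-based version from this answer.
flatten_typed() {
  jq -c '. as $in
    | reduce paths(type != "object" and type != "array") as $path ({};
        . + { ($path | map(tostring) | join(".")): $in | getpath($path) })'
}
```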