I am having trouble accessing a bash variable inside 'jq'.
The snippet below shows my bash loop that checks for missing keys in a JSON file.
#!/bin/sh
for key in "key1" "key2.key3"; do
  echo "$key"
  if ! cat ${JSON_FILE} | jq --arg KEY "$key" -e '.[$KEY]'; then
    missingKeys+=${key}
  fi
done
JSON_FILE:
{
  "key1": "val1",
  "key2": {
    "key3": "val3"
  }
}
The script works correctly for top-level keys such as "key1", but it does not work (it returns null) for "key2.key3".
'jq' on the command line does return the correct value:
cat input.json | jq '.key2.key3'
"val3"
I followed answers from other posts to arrive at this solution, but I can't seem to figure out why it does not work for nested JSON keys.
Using --arg prevents your data from being incorrectly parsed as syntax. Usually, a shell variable you're passing into jq contains literal data, so this is the correct thing to do.
In this case, your variable contains syntax, not literal data: the . isn't part of the string you want to look up, but is instead an instruction to jq to do two separate lookups, one after the other.
So, in this case, you should do the more obvious thing, instead of using --arg:
jq -e ".$KEY"
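If you'd rather keep --arg (for instance, because the key names come from elsewhere and shouldn't be interpolated into program text), a middle ground is to split the dotted string inside jq and look the path up with getpath. A minimal sketch, assuming the individual key names themselves contain no literal dots:
#!/bin/sh
# Sketch: split the dotted key inside jq and look it up with getpath.
# As with the original, a value of false or null counts as "missing".
for key in "key1" "key2.key3"; do
  if ! jq --arg KEY "$key" -e 'getpath($KEY | split("."))' "$JSON_FILE" >/dev/null; then
    missingKeys="$missingKeys $key"
  fi
done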
Related
I have a JSON file test.json with the content:
[
  {
    "name": "Akshay",
    "id": "234"
  },
  {
    "name": "Amit",
    "id": "28"
  }
]
I have a shell script with content:
#!/bin/bash
function display
{
echo "name is $1 and id is $2"
}
cat test.json | jq '.[].name,.[].id' | while read line; do display $line; done
I want the name and id of a single item to be passed together as arguments to the function display, but the output is something like this:
name is "Akshay" and id is
name is "Amit" and id is
name is "234" and id is
name is "28" and id is
What should be the correct way to implement the code?
PS: I specifically want to use jq, so please base the answer in terms of jq.
Two major issues, and some additional items that may not matter for your current example but can be important when you're dealing with real-world data from untrusted sources:
Your current code iterates over all names before writing any ids.
Your current code uses newline separators, but doesn't make any effort to read multiple lines into each while loop iteration.
Your code uses newline separators, but newlines can be present inside strings; consequently, this constrains the input domain.
When you pipe into a while loop, that loop is run in a subshell; when the pipeline exits, the subshell does too, so any variables set by the loop are lost.
Starting up a copy of /bin/cat and making jq read a pipe from its output is silly and inefficient compared to letting jq read from test.json directly.
We can fix all of those:
To write names and ids in pairs, you'd want something more like jq '.[] | (.name, .id)'
To read both a name and an id for each element of the loop, you'd want while IFS= read -r name && IFS= read -r id; do ... to iterate over those pairs.
To switch from newlines to NULs (the NUL being the only character that can't exist in a C string, and thus in a bash string), you'd want to use the -j argument to jq, and then add explicit "\u0000" elements to the content being written. To read this NUL-delimited content on the bash side, you'd need to add the -d '' argument to each read.
To move the while read loop out of the subshell, we can use process substitution, as described in BashFAQ #24.
To let jq read directly from test.json, use either <test.json to have the shell connect the file directly to jq's stdin, or pass the filename on jq's command line.
Doing everything described above in a manner robust against input data containing JSON-encoded NULs would look like the following:
#!/bin/bash
display() {
echo "name is $1 and id is $2"
}
cat >test.json <<'EOF'
[
  { "name": "Akshay", "id": "234" },
  { "name": "Amit", "id": "28" }
]
EOF
while IFS= read -r -d '' name && IFS= read -r -d '' id; do
  display "$name" "$id"
done < <(jq -j '
  def stripnuls: sub("\u0000"; "<NUL>");
  .[] | ((.name | stripnuls), "\u0000", (.id | stripnuls), "\u0000")
' <test.json)
You can see the above running at https://replit.com/@CharlesDuffy2/BelovedForestgreenUnits#main.sh
You can use string interpolation.
jq '.[] | "The name is \(.name) and id \(.id)"'
Result:
"The name is Akshay and id 234"
"The name is Amit and id 28"
If you want to get rid of the double-quotes around each result, then:
jq --raw-output '.[] | "The name is \(.name) and id \(.id)"'
https://jqplay.org/s/-lkpHROTBk0
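If you need the values back in separate shell variables, as in the question, but NUL delimiters feel heavier than your data requires, another common pattern (a sketch, not taken from the answers above) is to emit one tab-separated record per element with jq's @tsv filter and split it back apart in the shell:
#!/bin/bash
# Sketch: one tab-separated record per array element. Note that @tsv
# escapes embedded tabs/newlines (as \t and \n) rather than letting
# them break the framing, so such values arrive escaped.
while IFS=$'\t' read -r name id; do
  echo "name is $name and id is $id"
done < <(jq -r '.[] | [.name, .id] | @tsv' test.json)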
Assume the following JSON file
{
  "foo": "hello",
  "bar": "world"
}
I want to get the foo field from the JSON object into a standalone object, and I do this:
<file jq '{foo}'
{
  "foo": "hello"
}
Now the field I actually want is coming from the shell and is given to jq as an argument like this:
<file jq --arg myarg "foo" '{$myarg}'
{
  "myarg": "foo"
}
Unfortunately this doesn't give the expected result {"foo":"hello"}.
Any idea why the name of the variable gets into the object?
A workaround is to define the object explicitly:
<file jq --arg myarg "foo" '{($myarg): .[$myarg]}'
Fine, but is there a way to use the shortcut syntax, as explained in the man page, but with a variable?
You can use this to select particular fields of an object: if the input is an object with “user”, “title”, “id”, and “content” fields and you just want “user” and “title”, you can write
{user: .user, title: .title}
Because that is so common, there’s a shortcut syntax for it: {user, title}.
If that matters, I'm using jq version 1.5
In short, no. The shortcut syntax can only be used under very special conditions. For example, it cannot be used with key names that are jq keywords.
Alternatives
The method described in the Q is the preferred one, but for the record, here are two alternatives:
jq --arg myarg "foo" '
  .[$myarg] as $v | {} | .[$myarg] = $v'
And of course there's the alternative that comes with numerous caveats:
myarg=foo ; jq "{ $myarg }"
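For reference, here is the preferred form from the question run against the sample file:
<file jq --arg myarg "foo" '{($myarg): .[$myarg]}'
{
  "foo": "hello"
}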
I have a JSON object named version6json as follows:
{
  "20007.098": {
    "os_version": "6.9",
    "kernel": "2.6.32-696",
    "sfdc-release": "2017.08"
  },
  "200907.09678”: {
    "os_version": "6.9",
    "kernel": "2.6.32-696",
    "sfdc-release": "201.7909"
  },
  "206727.1078”: {
    "os_version": "6.9",
    "kernel": "2.6.32-696.10.2.el6.x86_64",
    "sfdc-release": "20097.109”
  }
}
I want to add one more key-value pair, where both the key and the value are variables: bundle_release="2019.78" and value={"release":"2018.1006","kernel":"2.6.32-754.3.5.el6.x86_64","os":"6.10","current":true}
Now I want bundle_release as the key and value as its value, so the new entry would be "2018.1006": {"release":"2018.1006","kernel":"2.6.32-754.3.5.el6.x86_64","os":"6.10","current":true}
To achieve this, I am doing the following:
echo "$version6json" | jq --arg "$bundle_release" "$value" '. + {$bundle_release: "${value}"}'
Any help will be appreciated.
P.S. The question was edited as suggested by peak.
First, when specifying a key name using a variable in the way you are doing, the variable must be parenthesized, so you would have:
{($bundle_release): ...}
Next, jq variables are not the same as shell variables and should be specified without quoting them, and without using bash-isms.
Third, when setting the value of the shell variable named value, you would have to quote the expression appropriately.
Fourth, to simplify things, use --argjson for $value.
Fifth, your sample JSON is not quite right. Once it's fixed, the following will work in a bash or bash-like environment (assuming you're using a version of jq that supports --argjson):
bundle_release="1034,567"
value='{"release":"2018.1006","kernel":"2.6.32-754.3.5.el6.x86_64","os":"6.10","current":true}'
jq --arg b "$bundle_release" --argjson v "$value" '
. + {($b): $v}' <<< "$version6json"
You're not giving the --arg option enough parameters: from the manual:
--arg name value:
This option passes a value to the jq program as a predefined variable. If you run jq with --arg foo bar, then $foo is available in the program and has the value "bar". Note that value will be treated as a string, so --arg foo 123 will bind $foo to "123".
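To see the difference in isolation, here is a quick check (the variable names $a and $b are arbitrary):
jq -n --arg a '{"x":1}' --argjson b '{"x":1}' '{as_string: $a, as_json: $b}'
{
  "as_string": "{\"x\":1}",
  "as_json": {
    "x": 1
  }
}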
I want to modify a JSON file by using the Linux command line.
I tried these steps:
[root@localhost]# INPUT="dsa"
[root@localhost]# echo $INPUT
dsa
[root@localhost]# CONF_FILE=test.json
[root@localhost]# echo $CONF_FILE
test.json
[root@localhost]# cat $CONF_FILE
{
  "global" : {
    "name" : "asd",
    "id" : 1
  }
}
[root@localhost]# jq -r '.global.name |= '""$INPUT"" $CONF_FILE > tmp.$$.json && mv tmp.$$.json $CONF_FILE
jq: error: dsa/0 is not defined at <top-level>, line 1:
.global.name |= dsa
jq: 1 compile error
Desired output:
[root@localhost]# cat $CONF_FILE
{
  "global" : {
    "name" : "dsa",
    "id" : 1
  }
}
Your only problem was that the script passed to jq was quoted incorrectly.
In your particular case, using a single double-quoted string with embedded \-escaped " instances is probably simplest:
jq -r ".global.name = \"$INPUT\"" "$CONF_FILE" > tmp.$$.json && mv tmp.$$.json "$CONF_FILE"
Generally, however, chepner's helpful answer shows a more robust alternative to embedding the shell variable reference directly in the script: Using the --arg option to pass a value as a jq variable allows single-quoting the script, which is preferable, because it avoids confusion over what elements are expanded by the shell up front and obviates the need for escaping $ instances that should be passed through to jq.
Also:
Just = is sufficient to assign the value; while |=, the so-called update operator, works too, it behaves the same as = in this instance, because the RHS is a literal, not an expression referencing the LHS - see the manual.
You should routinely double-quote your shell-variable references and you should avoid use of all-uppercase variable names in order to avoid conflicts with environment variables and special shell variables.
As for why your quoting didn't work:
'.global.name |= '""$INPUT"" is composed of the following tokens:
String literal .global.name |= (due to single-quoting)
String literal "" - i.e., the empty string - the quotes will be removed by the shell before jq sees the script
An unquoted reference to variable $INPUT (which makes its value subject to word-splitting and globbing).
Another instance of literal "".
With your sample value, jq ended up seeing the following string as its script:
.global.name |= dsa
As you can see, the double quotes are missing, causing jq to interpret dsa as a function name rather than a string literal, and since no argument was passed to (non-existent) function dsa, jq's error message referenced it as dsa/0 - a function with no (0) arguments.
It's much simpler and safer to pass the value using the --arg option:
jq -r --arg newname "$INPUT" '.global.name |= $newname' "$CONF_FILE"
This ensures that the exact value of $INPUT is used and quoted as a JSON value.
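Combined with the temp-file pattern from the question, the full replacement would look like:
jq -r --arg newname "$INPUT" '.global.name = $newname' "$CONF_FILE" > "tmp.$$.json" && mv "tmp.$$.json" "$CONF_FILE"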
Using jq with a straightforward filter should do it for you.
.global.name = "dsa"
i.e.
jq '.global.name = "dsa"' json-file
{
  "global": {
    "name": "dsa",
    "id": 1
  }
}
You can play around with your jq filters here.
I'd like to flatten a nested JSON object, e.g. {"a":{"b":1}} to {"a.b":1}, in order to digest it in Solr.
I have 11 TB of JSON files which are both nested and contain dots in field names, meaning neither Elasticsearch (dots) nor Solr (nested without the _childDocument_ notation) can digest them as is.
The other solution would be to replace the dots in the field names with underscores and push the data to Elasticsearch, but I have far better experience with Solr, so I prefer the flattening solution (unless Solr can digest those nested JSONs as is?).
I will prefer Elasticsearch only if its digestion process takes far less time than Solr's, because my priority is digesting as fast as I can (thus I chose jq instead of scripting it in Python).
Kindly help.
EDIT:
I think the pair of examples 3&4 solves this for me:
https://lucidworks.com/blog/2014/08/12/indexing-custom-json-data/
I'll try soon.
You can also use the following jq command to flatten nested JSON objects in this manner:
[leaf_paths as $path | {"key": $path | join("."), "value": getpath($path)}] | from_entries
The way it works: leaf_paths returns a stream of arrays which represent the paths, in the given JSON document, at which "leaf elements" appear, that is, elements which do not have child elements, such as numbers, strings and booleans. We pipe that stream into objects with key and value properties, where key contains the elements of the path array joined by dots into a string, and value contains the element at that path. Finally, we put the entire thing in an array and run from_entries on it, which transforms an array of {key, value} objects into an object containing those key-value pairs.
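For instance, with the question's own example as input (a quick check, assuming jq 1.5+):
echo '{"a":{"b":1}}' | jq '[leaf_paths as $path | {"key": $path | join("."), "value": getpath($path)}] | from_entries'
{
  "a.b": 1
}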
This is just a variant of Santiago's jq:
. as $in
| reduce leaf_paths as $path ({};
    . + { ($path | map(tostring) | join(".")): $in | getpath($path) })
It avoids the overhead of the key/value construction and destruction.
(If you have access to a version of jq later than jq 1.5, you can omit the "map(tostring)".)
Two important points about both these jq solutions:
Arrays are also flattened.
E.g. given {"a": {"b": [0,1,2]}} as input, the output would be:
{
  "a.b.0": 0,
  "a.b.1": 1,
  "a.b.2": 2
}
If any of the keys in the original JSON contain periods, then key collisions are possible; such collisions will generally result in the loss of a value. This would happen, for example, with the following input:
{"a.b":0, "a": {"b": 1}}
Here is a solution that uses tostream, select, join, reduce and setpath:
reduce ( tostream | select(length==2) | .[0] |= [join(".")] ) as [$p, $v] (
  {};
  setpath($p; $v)
)
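A nice property of tostream is that null leaves come through as well; a quick check with a hypothetical input:
echo '{"a":{"b":[0,null]}}' | jq 'reduce ( tostream | select(length==2) | .[0] |= [join(".")] ) as [$p, $v] ({}; setpath($p; $v))'
{
  "a.b.0": 0,
  "a.b.1": null
}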
I've recently written a script called jqg that flattens arbitrarily complex JSON and searches the results using a regex; to simply flatten the JSON, your regex would be '.', which matches everything. Unlike the answers above, the script will handle embedded arrays, false and null values, and can optionally treat empty arrays and objects ([] & {}) as leaf nodes.
$ jq . test/odd-values.json
{
  "one": {
    "start-string": "foo",
    "null-value": null,
    "integer-number": 101
  },
  "two": [
    {
      "two-a": {
        "non-integer-number": 101.75,
        "number-zero": 0
      },
      "true-boolean": true,
      "two-b": {
        "false-boolean": false
      }
    }
  ],
  "three": {
    "empty-string": "",
    "empty-object": {},
    "empty-array": []
  },
  "end-string": "bar"
}
$ jqg . test/odd-values.json
{
  "one.start-string": "foo",
  "one.null-value": null,
  "one.integer-number": 101,
  "two.0.two-a.non-integer-number": 101.75,
  "two.0.two-a.number-zero": 0,
  "two.0.true-boolean": true,
  "two.0.two-b.false-boolean": false,
  "three.empty-string": "",
  "three.empty-object": {},
  "three.empty-array": [],
  "end-string": "bar"
}
jqg was tested using jq 1.6
Note: I am the author of the jqg script.
As it turns out, curl -XPOST 'http://localhost:8983/solr/flat/update/json/docs' -d @json_file does just this:
{
  "a.b": [1],
  "id": "24e3e780-3a9e-4fa7-9159-fc5294e803cd",
  "_version_": 1535841499921514496
}
EDIT 1: Solr 6.0.1 with bin/solr -e cloud. The collection name is flat; all the rest are defaults (including the data-driven schema).
EDIT 2: The final script I used: find . -name '*.json' -exec curl -XPOST 'http://localhost:8983/solr/collection1/update/json/docs' -d @{} \;.
EDIT 3: It is also possible to parallelize with xargs and to add the id field with jq: find . -name '*.json' -print0 | xargs -0 -n 1 -P 8 -I {} sh -c "cat {} | jq '. + {id: .a.b}' | curl -XPOST 'http://localhost:8983/solr/collection/update/json/docs' -d @-", where -P is the parallelism factor. I used jq to set an id so multiple uploads of the same document won't create duplicates in the collection (when I was searching for the optimal value of -P, it did create duplicates in the collection).
As @hraban mentioned, leaf_paths does not work as expected (furthermore, it is deprecated). leaf_paths is equivalent to paths(scalars); it returns the paths of any values for which scalars returns a truthy value. scalars returns its input value if it is a scalar, or null otherwise. The problem with that is that null and false are not truthy values, so they will be removed from the output. The following code does work, by checking the type of the values directly:
. as $in
| reduce paths(type != "object" and type != "array") as $path ({};
    . + { ($path | map(tostring) | join(".")): $in | getpath($path) })
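A quick check that false and null now survive, using a hypothetical input (assuming jq 1.5+):
echo '{"a":{"b":null,"c":false}}' | jq '. as $in | reduce paths(type != "object" and type != "array") as $path ({}; . + { ($path | map(tostring) | join(".")): $in | getpath($path) })'
{
  "a.b": null,
  "a.c": false
}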