Numeric argument passed with jq --arg not matching data with ==

Here is a sample JSON response from my curl:
{
"success": true,
"message": "jobStatus",
"jobStatus": [
{
"ID": 9,
"status": "Successful"
},
{
"ID": 2,
"status": "Successful"
},
{
"ID": 99,
"status": "Failed"
}
]
}
I want to check the status of ID=2. Here is the command I tried:
cat test.txt|jq --arg v "2" '.jobStatus[]|select(.ID == $v)|.status'
response: nothing (the command produces no output)
I tried value 2 without quotes and still no result.
By contrast, if I try the command with a literal 2, it works:
cat test.txt | jq '.jobStatus[]|select(.ID == 2)|.status'
response:
"Successful"
I'm stuck. Can anyone help me identify the problem?

jq is data-type-aware:
.ID, as defined in the JSON input, is a number,
but any command-line argument passed with --arg (such as v here) is invariably a string (whether you quote the value or not),
so, in order to compare them, you must use an explicit type conversion, such as with tonumber/0:
jq --arg v '2' '.jobStatus[] | select(.ID == ($v | tonumber)) | .status' test.txt
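To see the type mismatch in isolation (a minimal check, not from the original post):
jq -n '2 == "2"'
false
jq -n '2 == ("2" | tonumber)'
true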
Given that you're only passing a scalar argument here, the following solution, using --argjson (jq v1.5+), is a bit of overkill, but it is an alternative to explicit type conversion in that passing a JSON argument in effect passes typed data:
jq --argjson v '{ "ID": 2 }' '.jobStatus[] | select(.ID == $v.ID) | .status' test.txt
peak's answer demonstrates that even --argjson v 2 works (in which case comparing to $v works directly), which is certainly the most concise solution, but may require an explanation:
Even though 2 may not look like JSON, it is: it is a valid JSON text containing a single value of type number (see json.org).
Specifically, it is the fact that 2 is an unquoted token that starts with a digit that makes it a number in the context of JSON (the JSON string-value equivalent is "2", which from the shell would have to be passed as '"2"' - note the embedded double quotes).
Therefore jq interprets --argjson v 2 as a number, and the comparison .ID == $v works as intended (note that the same applies to --argjson v '2' / --argjson v "2", where the shell removes the quotes before jq sees the value).
By contrast, anything you pass with --arg is always a string value that is used as-is.
In other words: --argjson, whose purpose is to accept arbitrary JSON text (such as '{ "ID": 2 }' in the example above), can also be used to pass numeric scalars so as to force their interpretation as numbers.
The same technique also works with the Boolean literals true and false.
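For example, a quick check (not from the original post) confirming the types each option produces:
jq -n --arg b true '$b | type'
"string"
jq -n --argjson b true '$b | type'
"boolean"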
Tip of the hat to peak for his help.

Assuming you want to check for the JSON value 2, you have a choice to make - either convert the argument of --arg to a number, or use --argjson with a numeric argument. These alternatives are illustrated by the following:
jq --arg v 2 '.jobStatus[] | select(.ID == ($v|tonumber)) | .status'
jq --argjson v 2 '.jobStatus[] | select(.ID == $v) | .status'
Note that --argjson requires a relatively recent version of jq.
Of course, if you want to "normalize" .ID so that it's always treated as a string, you could write:
jq --arg v 2 '.jobStatus[] | select((.ID|tostring) == $v) | .status'
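Applied to the sample data, each of these commands prints:
"Successful"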


jq get value or default if nested key not present

I would like to make a small jq-like function that does get-or-default-if-not-present similar to Python's dict.get(key, default). This is the desired behavior:
% echo '{"nested": {"key": "value", "tricky": null}}' > file.json
% my-jq nested.key \"default\" file.json
"value"
% my-jq nested.tricky \"default\" file.json
null
% my-jq nested.dne \"default\" file.json
"default"
I have tried playing with this answer to a similar question but it doesn't work for nested keys. Does anyone have a suggestion?
function my-jq () {
jq --arg key "$1" --arg default "$2" \
'if has($key) then .[$key] else $default | fromjson end' "$3"
}
Given your goal of making a "jq-like" function, it's fair to assume that your parameter should be treated not just as a path expression but as a general jq filter, i.e. code. With current versions of jq, there is no option or other shorthand that works for code the way --arg and --argjson do for data.
You can, however, import code fragments by means of jq's library/module system, but with the task at hand you'd need to store the code from your function's parameter in a (temporary) file, reference it in the static part of the actual jq filter, and delete the file afterwards. Not only is this cumbersome, it also needs some static overhead in the module file, and it inevitably opens up the Pandora's box labeled "code injection". So you could just as well submit to the unleashed evil and compose the actual jq filter on the fly using the literal (and potentially malicious) content of the parameter. (Note that this assumes a valid jq expression, thus using .nested.key etc. with a dot up front):
function my-jq() { jq "($1) // ($2)" "$3"; }
% my-jq .nested.key \"default\" file.json
"value"
% my-jq .nested.tricky \"default\" file.json # fails
"default"
% my-jq .nested.dne \"default\" file.json
"default"
This minimal approach uses the alternative operator //, which cannot tell an actual but falsy value (null or false) apart from an empty stream (a missing value). To counteract that, you could check for the existence of the input path among all the paths of the base document. This drastically reduces the kinds of filters the function trivially accepts (which, with your use case in mind, may even be considered a good thing, yet malicious injection is still possible), and the comparison against all paths may incur a performance penalty for base documents with complex structure, but it meets your three test cases:
function my-jq() { jq "if any(path($1) == paths; .) then ($1) else ($2) end" "$3"; }
% my-jq .nested.key \"default\" file.json
"value"
% my-jq .nested.tricky \"default\" file.json
null
% my-jq .nested.dne \"default\" file.json
"default"
Eventually, the potential performance penalty could be mitigated by combining both approaches, i.e. starting off with the faster first one for the general case, but reverting to the possibly slower second one if the first one produced an ambiguous falsy value:
function my-jq() { jq "($1) // if any(path($1) == paths; .) then ($1) else ($2) end" "$3"; }
% my-jq .nested.key \"default\" file.json
"value"
% my-jq .nested.tricky \"default\" file.json
null
% my-jq .nested.dne \"default\" file.json
"default"
You can't use a dotted path as a sequence of keys. You can turn such a path into a valid path for use with the getpath function, however. (There may be a cleaner, more robust way to do this.) The // operator provides for an alternate value should the left-hand side produce false or null.
$ jq 'getpath($key | ltrimstr(".") | split(".")) // $default' file.json --arg key .nested.key --arg default foobar
"value"
$ jq 'getpath($key | ltrimstr(".") | split(".")) // $default' file.json --arg key .nested.dne --arg default foobar
"foobar"
$key | ltrimstr(".") | split(".") first gets rid of the leading ., then splits the remaining string on the remaining dots to produce an array of separate keys. getpath then looks up the value at that list of keys; getpath(["nested", "key"]) is equivalent (AFAIK) to .nested.key.
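To illustrate those two steps in isolation (a small sketch using the sample document):
jq -cn '".nested.key" | ltrimstr(".") | split(".")'
["nested","key"]
jq -n '{"nested": {"key": "value"}} | getpath(["nested","key"])'
"value"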
Your requirements go against JSON's grain a bit, but here is one possible solution, or the basis of a family of solutions.
This particular solution assumes that the path is given in array form:
function my-jq () {
jq -c --argjson key "$1" --arg default "$2" '
first(tostream | select(length == 2 and .[0] == $key)) // null
| if . then .[1] else $default end
'
}
Examples:
echo '{"nested": {"key": "value", "tricky": null}}' | my-jq2 '["nested","key"]' haha
"value"
echo '{"nested": {"key": "value", "tricky": null}}' | my-jq2 '["nested","nokey"]' haha
"haha"
echo '{"nested": {"key": "value", "tricky": null}}' | my-jq2 '["nested","tricky"]' haha
null
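For reference, here is what tostream emits for the sample document; select(length == 2) isolates the [path, value] leaf events, while the shorter events are container-closing markers:
echo '{"nested": {"key": "value", "tricky": null}}' | jq -c 'tostream'
[["nested","key"],"value"]
[["nested","tricky"],null]
[["nested","tricky"]]
[["nested"]]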

Extracting values using jq streaming

I am trying to extract the values from a top-level JSON object using streaming with jq. For the sake of illustration, this is what the data look like (the actual data are rather large, hence needing to use streaming):
{
"empty": null,
"name": "John Smith",
"sex": "male",
"age": 51,
"hobbies": [
"running",
"kayaking",
"camping",
"foraging"
]
}
Without streaming it's easy to get what I need:
$ jq ".name" sample.json
"John Smith"
$ jq ".age" sample.json
51
$ jq ".hobbies" sample.json
[
"running",
"kayaking",
"camping",
"foraging"
]
When I use streaming I can get the value for the "hobbies" key:
$ jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "hobbies")))' <sample.json
["running","kayaking","camping","foraging"]
But using the analogous command for the "name" or "age" keys gives an empty result:
$ jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "name")))' <sample.json
$ jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "age")))' <sample.json
I suspect that this is because the value is a scalar. But I'm not sure that this is the reason and, even if I was, I'm not sure how to use that information.
I discovered the debug operation which seems to yield some light on the situation.
$ jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "hobbies") | debug))' <sample.json
["DEBUG:",[["hobbies",0],"running"]]
["DEBUG:",[["hobbies",1],"kayaking"]]
["DEBUG:",[["hobbies",2],"camping"]]
["DEBUG:",[["hobbies",3],"foraging"]]
["DEBUG:",[["hobbies",3]]]
["running","kayaking","camping","foraging"]
["DEBUG:",[["hobbies"]]]
$ jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "name") | debug))' <sample.json
["DEBUG:",[["name"],"John Smith"]]
$ jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "age") | debug))' <sample.json
["DEBUG:",[["age"],51]]
So it looks like these values are being selected, but they are just not making it through to the output.
Any suggestions would be appreciated! Thank you.
You need to understand how 1 | truncate_stream() works before applying other filter expressions downstream. truncate_stream(), prefixed with an integer, removes that many leading elements from each path in the streamed result.
e.g. if your original input produced the following [path, value] pairs
jq -cn --stream 'inputs' json
[["empty"],null]
[["name"],"John Smith"]
[["sex"],"male"]
[["age"],51]
[["hobbies",0],"running"]
[["hobbies",1],"kayaking"]
[["hobbies",2],"camping"]
[["hobbies",3],"foraging"]
[["hobbies",3]]
[["hobbies"]]
Truncating with 1 removes the first element of each path. Events whose paths are thereby emptied are discarded from the output entirely:
jq -cn --stream '1|truncate_stream(inputs)' json
[[0],"running"]
[[1],"kayaking"]
[[2],"camping"]
[[3],"foraging"]
[[3]]
Your original attempt worked because the select expression captured the paths under hobbies; removing the parent key hobbies still left non-empty paths (the array indices), retaining a list of elements.
But the same doesn't work for age, because its path cannot be trimmed any further: removing the ["age"] entry would leave [[],51], i.e. only the value with an empty path.
jq -cn --stream 'inputs|select(.[0][0] == "age")' json
[["age"],51]
If a truncation level is applied to the above expression, i.e. 1 | truncate_stream(..), the age path is removed entirely, so fromstream cannot reconstruct the value.
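This is easy to verify; after truncation the lone [["age"],51] event is dropped entirely, so the following produces no output at all:
jq -cn --stream '1|truncate_stream(inputs | select(.[0][0] == "age"))' json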
So for simple scalars, simply extract the value (the second element of the [path, value] pair) directly, without using truncate_stream at all:
jq -cn --stream 'inputs|select(.[0][0] == "age")[1]' json
51

Create JSON using jq from pipe-separated keys and values in bash

I am trying to create a json object from a string in bash. The string is as follows.
CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0
The output is from the docker stats command, and my end goal is to publish custom metrics to AWS CloudWatch. I would like to format this string as JSON:
{
"CONTAINER":"nginx_container",
"CPU%":"0.02%",
....
}
I have used the jq command before and it seems like it should work well in this case, but I have not been able to come up with a good solution yet, other than hardcoding variable names, indexing with sed or awk, and then creating the JSON from scratch. Any suggestions would be appreciated. Thanks.
Prerequisite
For all of the below, it's assumed that your content is in a shell variable named s:
s='CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0'
What (modern jq)
# thanks to @JeffMercado and @chepner for refinements, see comments
jq -Rn '
( input | split("|") ) as $keys |
( inputs | split("|") ) as $vals |
[[$keys, $vals] | transpose[] | {key:.[0],value:.[1]}] | from_entries
' <<<"$s"
How (modern jq)
This requires a fairly new jq (1.5 or later, for input/inputs) to work, and is a dense chunk of code. To break it down:
Using -n prevents jq from reading stdin on its own, leaving the entirety of the input stream available to be read by input and inputs -- the former to read a single line, and the latter to read all remaining lines. (-R, for raw input, causes textual lines rather than JSON objects to be read).
With [$keys, $vals] | transpose[], we're generating [key, value] pairs (in Python terms, zipping the two lists).
With {key:.[0],value:.[1]}, we're making each [key, value] pair into an object of the form {"key": key, "value": value}
With from_entries, we're combining those pairs into objects containing those keys and values.
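To see those building blocks in isolation (illustrative values only):
jq -cn '[["CONTAINER","CPU%"],["nginx_container","0.02%"]] | transpose'
[["CONTAINER","nginx_container"],["CPU%","0.02%"]]
jq -cn '[{key:"CONTAINER",value:"nginx_container"}] | from_entries'
{"CONTAINER":"nginx_container"}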
What (shell-assisted)
This will work with a significantly older jq than the above, and is an easily adopted approach for scenarios where a native-jq solution can be harder to wrangle:
{
IFS='|' read -r -a keys # read first line into an array of strings
## read each subsequent line into an array named "values"
while IFS='|' read -r -a values; do
# setup: positional arguments to pass in literal variables, query with code
jq_args=( )
jq_query='.'
# copy values into the arguments, reference them from the generated code
for idx in "${!values[@]}"; do
[[ ${keys[$idx]} ]] || continue # skip values with no corresponding key
jq_args+=( --arg "key$idx" "${keys[$idx]}" )
jq_args+=( --arg "value$idx" "${values[$idx]}" )
jq_query+=" | .[\$key${idx}]=\$value${idx}"
done
# run the generated command
jq "${jq_args[#]}" "$jq_query" <<<'{}'
done
} <<<"$s"
How (shell-assisted)
The invoked jq command from the above is similar to:
jq --arg key0 'CONTAINER' \
--arg value0 'nginx_container' \
--arg key1 'CPU%' \
--arg value1 '0.02%' \
--arg key2 'MEMUSAGE/LIMIT' \
--arg value2 '25.09MiB/15.26GiB' \
'. | .[$key0]=$value0 | .[$key1]=$value1 | .[$key2]=$value2' \
<<<'{}'
...passing each key and value out-of-band (such that it's treated as a literal string rather than parsed as JSON), then referring to them individually.
Result
Either of the above will emit:
{
"CONTAINER": "nginx_container",
"CPU%": "0.02%",
"MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
"MEM%": "0.16%",
"NETI/O": "0B/0B",
"BLOCKI/O": "22.09MB/4.096kB",
"PIDS": "0"
}
Why
In short: Because it's guaranteed to generate valid JSON as output.
Consider the following as an example that would break more naive approaches:
s='key ending in a backslash\
value "with quotes"'
Sure, these are unexpected scenarios, but jq knows how to deal with them:
{
"key ending in a backslash\\": "value \"with quotes\""
}
...whereas an implementation that didn't understand JSON strings could easily end up emitting:
{
"key ending in a backslash\": "value "with quotes""
}
I know this is an old post, but the tool you seek is called jo: https://github.com/jpmens/jo
A quick and easy example:
$ jo my_variable="simple"
{"my_variable":"simple"}
A little more complex
$ jo -p name=jo n=17 parser=false
{
"name": "jo",
"n": 17,
"parser": false
}
Add an array
$ jo -p name=jo n=17 parser=false my_array=$(jo -a {1..5})
{
"name": "jo",
"n": 17,
"parser": false,
"my_array": [
1,
2,
3,
4,
5
]
}
I've made some pretty complex stuff with jo, and the nice thing is that you don't have to roll your own solution while worrying about the possibility of producing invalid JSON.
You can ask docker to give you JSON data in the first place
docker stats --format "{{json .}}"
For more on this, see: https://docs.docker.com/config/formatting/
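For example (pretty-printed via jq; --no-stream makes docker stats print a single sample and exit):
docker stats --no-stream --format '{{json .}}' | jq .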
JSONSTR=""
declare -a JSONNAMES=()
declare -A JSONARRAY=()
LOOPNUM=0
cat ~/newfile | while IFS='|' read -r CONTAINER CPU MEMUSE MEMPC NETIO BLKIO PIDS; do
if [[ "$LOOPNUM" = 0 ]]; then
JSONNAMES=("$CONTAINER" "$CPU" "$MEMUSE" "$MEMPC" "$NETIO" "$BLKIO" "$PIDS")
LOOPNUM=$(( LOOPNUM+1 ))
else
echo "{ \"${JSONNAMES[0]}\": \"${CONTAINER}\", \"${JSONNAMES[1]}\": \"${CPU}\", \"${JSONNAMES[2]}\": \"${MEMUSE}\", \"${JSONNAMES[3]}\": \"${MEMPC}\", \"${JSONNAMES[4]}\": \"${NETIO}\", \"${JSONNAMES[5]}\": \"${BLKIO}\", \"${JSONNAMES[6]}\": \"${PIDS}\" }"
fi
done
Returns:
{ "CONTAINER": "nginx_container", "CPU%": "0.02%", "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB", "MEM%": "0.16%", "NETI/O": "0B/0B", "BLOCKI/O": "22.09MB/4.096kB", "PIDS": "0" }
Here is a solution which uses the -R and -s options along with transpose:
split("\n") # [ "CONTAINER...", "nginx_container|0.02%...", ...]
| (.[0] | split("|")) as $keys # [ "CONTAINER", "CPU%", "MEMUSAGE/LIMIT", ... ]
| (.[1:][] | split("|")) # [ "nginx_container", "0.02%", ... ] [ ... ] ...
| select(length > 0) # (remove empty [] caused by trailing newline)
| [$keys, .] # [ ["CONTAINER", ...], ["nginx_container", ...] ] ...
| [ transpose[] | {(.[0]):.[1]} ] # [ {"CONTAINER": "nginx_container"}, ... ] ...
| add # {"CONTAINER": "nginx_container", "CPU%": "0.02%" ...
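Assuming the input is in a variable s as in the earlier answer, and the filter above is saved to a file (filter.jq is just a name chosen here for illustration), it can be invoked as:
jq -R -s -f filter.jq <<<"$s"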
json_template='{"CONTAINER":"%s","CPU%%":"%s","MEMUSAGE/LIMIT":"%s","MEM%%":"%s","NETI/O":"%s","BLOCKI/O":"%s","PIDS":"%s"}'
json_string=$(printf "$json_template" "nginx_container" "0.02%" "25.09MiB/15.26GiB" "0.16%" "0B/0B" "22.09MB/4.096kB" "0")
echo "$json_string"
Not using jq, but it's possible to use arguments and environment variables in the values (note that literal % signs must be doubled in a printf format string):
CONTAINER=nginx_container
json_template='{"CONTAINER":"%s","CPU%%":"%s","MEMUSAGE/LIMIT":"%s","MEM%%":"%s","NETI/O":"%s","BLOCKI/O":"%s","PIDS":"%s"}'
json_string=$(printf "$json_template" "$CONTAINER" "$1" "25.09MiB/15.26GiB" "0.16%" "0B/0B" "22.09MB/4.096kB" "0")
echo "$json_string"
If you're starting with tabular data, I think it makes more sense to use something that handles tabular data natively, like sqawk, to turn it into JSON, and then use jq to work with it further.
echo 'CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0' \
| sqawk -FS '[|]' -RS '\n' -output json 'select * from a' header=1 \
| jq '.[] | with_entries(select(.key|test("^a.*")|not))'
{
"CONTAINER": "nginx_container",
"CPU%": "0.02%",
"MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
"MEM%": "0.16%",
"NETI/O": "0B/0B",
"BLOCKI/O": "22.09MB/4.096kB",
"PIDS": "0"
}
Without jq, sqawk gives a bit too much:
[
{
"anr": "1",
"anf": "7",
"a0": "nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0",
"CONTAINER": "nginx_container",
"CPU%": "0.02%",
"MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
"MEM%": "0.16%",
"NETI/O": "0B/0B",
"BLOCKI/O": "22.09MB/4.096kB",
"PIDS": "0",
"a8": "",
"a9": "",
"a10": ""
}
]

Flatten nested JSON using jq

I'd like to flatten a nested json object, e.g. {"a":{"b":1}} to {"a.b":1} in order to digest it in solr.
I have 11 TB of JSON files which are both nested and contain dots in field names, meaning neither Elasticsearch (dots) nor Solr (nested without the _childDocument_ notation) can digest them as is.
The other solution would be to replace the dots in the field names with underscores and push the data to Elasticsearch, but I have far better experience with Solr, therefore I prefer the flattening solution (unless Solr can digest those nested JSONs as is??).
I would prefer Elasticsearch only if its ingestion process takes far less time than Solr's, because my priority is ingesting as fast as I can (thus I chose jq instead of scripting it in Python).
Kindly help.
EDIT:
I think the pair of examples 3&4 solves this for me:
https://lucidworks.com/blog/2014/08/12/indexing-custom-json-data/
I'll try soon.
You can also use the following jq command to flatten nested JSON objects in this manner:
[leaf_paths as $path | {"key": $path | join("."), "value": getpath($path)}] | from_entries
The way it works is: leaf_paths returns a stream of arrays which represent the paths on the given JSON document at which "leaf elements" appear, that is, elements which do not have child elements, such as numbers, strings and booleans. We pipe that stream into objects with key and value properties, where key contains the elements of the path array as a string joined by dots and value contains the element at that path. Finally, we put the entire thing in an array and run from_entries on it, which transforms an array of {key, value} objects into an object containing those key-value pairs.
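Applied to the example from the question:
echo '{"a":{"b":1}}' | jq -c '[leaf_paths as $path | {"key": $path | join("."), "value": getpath($path)}] | from_entries'
{"a.b":1}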
This is just a variant of Santiago's jq:
. as $in
| reduce leaf_paths as $path ({};
. + { ($path | map(tostring) | join(".")): $in | getpath($path) })
It avoids the overhead of the key/value construction and destruction.
(If you have access to a version of jq later than jq 1.5, you can omit the "map(tostring)".)
Two important points about both these jq solutions:
Arrays are also flattened.
E.g. given {"a": {"b": [0,1,2]}} as input, the output would be:
{
"a.b.0": 0,
"a.b.1": 1,
"a.b.2": 2
}
If any of the keys in the original JSON contain periods, then key collisions are possible; such collisions will generally result in the loss of a value. This would happen, for example, with the following input:
{"a.b":0, "a": {"b": 1}}
Here is a solution that uses tostream, select, join, reduce and setpath
reduce ( tostream | select(length==2) | .[0] |= [join(".")] ) as [$p,$v] (
{}
; setpath($p; $v)
)
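For instance, with a sufficiently recent jq (where join stringifies the numeric array indices):
echo '{"a":{"b":[0,1]}}' | jq -c 'reduce ( tostream | select(length==2) | .[0] |= [join(".")] ) as [$p,$v] ({}; setpath($p; $v))'
{"a.b.0":0,"a.b.1":1}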
I've recently written a script called jqg that flattens arbitrarily complex JSON and searches the results using a regex; to simply flatten the JSON, your regex would be '.', which matches everything. Unlike the answers above, the script will handle embedded arrays, false and null values, and can optionally treat empty arrays and objects ([] & {}) as leaf nodes.
$ jq . test/odd-values.json
{
"one": {
"start-string": "foo",
"null-value": null,
"integer-number": 101
},
"two": [
{
"two-a": {
"non-integer-number": 101.75,
"number-zero": 0
},
"true-boolean": true,
"two-b": {
"false-boolean": false
}
}
],
"three": {
"empty-string": "",
"empty-object": {},
"empty-array": []
},
"end-string": "bar"
}
$ jqg . test/odd-values.json
{
"one.start-string": "foo",
"one.null-value": null,
"one.integer-number": 101,
"two.0.two-a.non-integer-number": 101.75,
"two.0.two-a.number-zero": 0,
"two.0.true-boolean": true,
"two.0.two-b.false-boolean": false,
"three.empty-string": "",
"three.empty-object": {},
"three.empty-array": [],
"end-string": "bar"
}
jqg was tested using jq 1.6
Note: I am the author of the jqg script.
As it turns out, curl -XPOST 'http://localhost:8983/solr/flat/update/json/docs' -d @json_file does just this:
{
"a.b":[1],
"id":"24e3e780-3a9e-4fa7-9159-fc5294e803cd",
"_version_":1535841499921514496
}
EDIT 1: Solr 6.0.1 with bin/solr -e cloud. The collection name is flat; all the rest are defaults (including the data-driven schema).
EDIT 2: The final script I used: find . -name '*.json' -exec curl -XPOST 'http://localhost:8983/solr/collection1/update/json/docs' -d @{} \;.
EDIT 3: It is also possible to parallelize with xargs and to add the id field with jq: find . -name '*.json' -print0 | xargs -0 -n 1 -P 8 -I {} sh -c "cat {} | jq '. + {id: .a.b}' | curl -XPOST 'http://localhost:8983/solr/collection/update/json/docs' -d @-" where -P is the parallelism factor. I used jq to set an id so multiple uploads of the same document won't create duplicates in the collection (when I searched for the optimal value of -P, it created duplicates in the collection).
As @hraban mentioned, leaf_paths does not work as expected (furthermore, it is deprecated). leaf_paths is equivalent to paths(scalars): it returns the paths of any values for which scalars returns a truthy value. scalars returns its input value if it is a scalar, or null otherwise. The problem is that null and false are not truthy values, so they will be removed from the output. The following code does work, by checking the type of the values directly:
. as $in
| reduce paths(type != "object" and type != "array") as $path ({};
. + { ($path | map(tostring) | join(".")): $in | getpath($path) })
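A quick check that falsy leaves now survive (using a minimal document):
echo '{"a": {"b": null, "c": false}}' | jq -c '. as $in | reduce paths(type != "object" and type != "array") as $path ({}; . + { ($path | map(tostring) | join(".")): $in | getpath($path) })'
{"a.b":null,"a.c":false}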

jq: selecting a subset of keys from an object

Given an input JSON array of keys (passed as a string), return an object containing only the entries whose keys appear both in the original object and in the input array.
I have a solution but I think that it isn't elegant ({($k):$input[$k]} feels especially clunky...) and that this is a chance for me to learn.
jq -n '{"1":"a","2":"b","3":"c"}' \
| jq --arg keys '["1","3","4"]' \
'. as $input
| ( $keys | fromjson )
| map( . as $k
| $input
| select(has($k))
| {($k):$input[$k]}
)
| add'
Any ideas how to clean this up?
I feel like "Extracting selected properties from a nested JSON object with jq" is a good starting place, but I cannot get it to work.
solution with inside check:
jq 'with_entries(select([.key] | inside(["key1", "key2"])))'
the inside operator works most of the time; however, it has a pitfall: it can select keys that are not desired. Suppose the input is { "key1": val1, "key2": val2, "key12": val12 }; selecting by inside(["key12"]) will select both "key1" and "key12".
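The pitfall is easy to reproduce; inside/contains compares strings by substring containment, so "key1" is considered to be inside "key12":
jq -cn '{"key1": 1, "key12": 12} | with_entries(select([.key] | inside(["key12"])))'
{"key1":1,"key12":12}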
use the in operator if you need an exact match; the following will select only .key2 and .key12:
jq 'with_entries(select(.key | in({"key2":1, "key12":1})))'
because the in operator checks for a key in an object (or an index in an array), the desired keys here have to be written in object syntax, with the desired names as keys and arbitrary values; so the in operator is not a perfect fit for this purpose. I would like to see the reverse of the JavaScript ES6 includes API implemented as a jq builtin:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/includes
jq 'with_entries(select(.key | included(["key2", "key12"])))'
to check whether an item (.key) is included in an array (note: included is hypothetical, not an existing jq builtin).
You can use this filter:
with_entries(
select(
.key as $k | any($keys | fromjson[]; . == $k)
)
)
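For example, with the question's input (any/2 requires jq 1.5+):
jq -c --arg keys '["1","3","4"]' 'with_entries(select(.key as $k | any($keys | fromjson[]; . == $k)))' <<<'{"1":"a","2":"b","3":"c"}'
{"1":"a","3":"c"}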
Here is some additional clarification
For the input object {"key1":1, "key2":2, "key3":3} I would like to drop all keys that are not in the set of desired keys ["key1","key3","key4"]
jq -n --argjson desired_keys '["key1","key3","key4"]' \
--argjson input '{"key1":1, "key2":2, "key3":3}' \
' $input
| with_entries(
select(
.key == ($desired_keys[])
)
)'
with_entries converts {"key1":1, "key2":2, "key3":3} into the following array of key-value pairs, maps the select statement over the array, and then turns the resulting array back into an object.
Here is the intermediate array inside the with_entries call:
[
{
"key": "key1",
"value": 1
},
{
"key": "key2",
"value": 2
},
{
"key": "key3",
"value": 3
}
]
we can then select the keys from this array that meet our criteria.
This is where the magic happens... here is a look at what's going on in the middle of this command. The following command takes the expanded array of values and turns them into a list of objects that we can select from.
jq -cn '{"key":"key1","value":1}, {"key":"key2","value":2}, {"key":"key3","value":3}
| select(.key == ("key1", "key3", "key4"))'
This will yield the following result
{"key":"key1","value":1}
{"key":"key3","value":3}
The with_entries filter can be a little tricky, but it's easy to remember that it takes a filter and is defined as follows:
def with_entries(f): to_entries|map(f)|from_entries;
This is the same as
def with_entries(f): [to_entries[] | f] | from_entries;
The other part of the question that confuses people is the multiple matches on the right-hand side of the ==.
Consider the following command. The output is the Cartesian product of the left-hand and right-hand streams.
jq -cn '1,2,3| . == (1,1,3)'
true
true
false
false
false
false
false
false
true
If that predicate is in a select statement, we keep the input whenever the predicate is true. Note that duplicates on the right-hand side can duplicate the outputs, too.
jq -cn '1,2,3| select(. == (1,1,3))'
1
1
3
Jeff's answer has a couple of unnecessary inefficiencies, both of which are addressed by the following, on the assumption that --argjson keys is used instead of --arg keys:
with_entries( select( .key as $k | $keys | index($k) ) )
Even better, if your jq has IN:
with_entries(select(.key | IN($keys[])))
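For instance:
jq -c --argjson keys '["1","3","4"]' 'with_entries(select(.key | IN($keys[])))' <<<'{"1":"a","2":"b","3":"c"}'
{"1":"a","3":"c"}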
If you are sure that all keys in the input array are present in the original object, you can use the object construction shortcut.
$ echo '{"1":"a","2":"b","3":"c"}' | jq '{"1", "3"}'
{
"1": "a",
"3": "c"
}
Numbers should be quoted to force jq to interpret them as keys instead of literals. In the case of keys not resembling a number, quotes are not needed:
$ echo '{"key1":"a","key2":"b","key3":"c"}' | jq '{key1, key3}'
{
"key1": "a",
"key3": "c"
}
Adding a non-existent key will yield a null value, which is likely not what the OP wanted:
$ echo '{"1":"a","2":"b","3":"c"}' | jq '{"1", "3", "4"}'
{
"1": "a",
"3": "c",
"4": null
}
but those can be filtered out:
$ echo '{"1":"a","2":"b","3":"c"}' | jq '{"1", "3", "4"} | with_entries(select(.value != null))'
{
"1": "a",
"3": "c"
}
Although this approach doesn't take the keys as a JSON array as the OP asked, I find it useful for just filtering some keys you know are present.
An example use case: get aud and iss from a JWT. The following is very succinct:
echo "jwt-as-json" | jq '{aud, iss}'