jq: selecting a subset of keys from an object - json

Given a JSON string that contains an array of keys, return an object with only the entries whose keys appear both in the original object and in that array.
I have a solution but I think that it isn't elegant ({($k):$input[$k]} feels especially clunky...) and that this is a chance for me to learn.
jq -n '{"1":"a","2":"b","3":"c"}' \
| jq --arg keys '["1","3","4"]' \
'. as $input
| ( $keys | fromjson )
| map( . as $k
| $input
| select(has($k))
| {($k):$input[$k]}
)
| add'
Any ideas how to clean this up?
I feel like "Extracting selected properties from a nested JSON object with jq" is a good starting place, but I cannot get it to work.

solution with inside check:
jq 'with_entries(select([.key] | inside(["key1", "key2"])))'

The inside check works most of the time, but it has a pitfall: inside compares strings by substring, so it can select keys you did not ask for. Suppose the input is { "key1": val1, "key2": val2, "key12": val12 }; selecting with inside(["key12"]) returns both "key1" and "key12", because "key1" is a substring of "key12".
Use the in operator if you need an exact match. For example, this selects .key2 and .key12 only:
jq 'with_entries(select(.key | in({"key2":1, "key12":1})))'
The in operator only checks whether its input is a key of an object (or a valid index of an array), so the desired keys have to be written as the keys of an object; the values do not matter. This is not a perfect fit for the purpose. I would like to see the reverse of JavaScript ES6's includes API implemented as a jq builtin:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/includes
jq 'with_entries(select(.key | included(["key2", "key12"])))'
That would check whether the item .key is included in the given array.
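The substring pitfall described in this answer can be reproduced with a small sketch. It assumes jq 1.5 or later is on the PATH; the variable names input, loose and exact are illustrative only, and the any(...) comparison is one possible exact-match alternative rather than the answer's own method.

```shell
# Sample object from the answer above, with numeric placeholder values.
input='{"key1":1,"key2":2,"key12":12}'

# inside() compares strings by substring, so "key1" also matches "key12".
loose=$(printf '%s' "$input" \
  | jq -c 'with_entries(select([.key] | inside(["key12"])))')

# Comparing with == against each wanted key gives an exact match.
exact=$(printf '%s' "$input" \
  | jq -c 'with_entries(select(.key as $k | any("key2", "key12"; . == $k)))')

echo "$loose"   # {"key1":1,"key12":12}
echo "$exact"   # {"key2":2,"key12":12}
```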

You can use this filter:
with_entries(
select(
.key as $k | any($keys | fromjson[]; . == $k)
)
)

Here is some additional clarification
For the input object {"key1":1, "key2":2, "key3":3} I would like to drop all keys that are not in the set of desired keys ["key1","key3","key4"]
jq -n --argjson desired_keys '["key1","key3","key4"]' \
--argjson input '{"key1":1, "key2":2, "key3":3}' \
' $input
| with_entries(
select(
.key == ($desired_keys[])
)
)'
with_entries converts {"key1":1, "key2":2, "key3":3} into the following array of key-value pairs, maps the select filter over that array, and then turns the resulting array back into an object.
Here is the intermediate array that with_entries works on:
[
{
"key": "key1",
"value": 1
},
{
"key": "key2",
"value": 2
},
{
"key": "key3",
"value": 3
}
]
We can then select the entries from this array that meet our criteria.
This is where the magic happens. Here is a look at what is going on in the middle of this command: the following command feeds the entries of the expanded array in as a stream of objects and selects from them.
jq -cn '{"key":"key1","value":1}, {"key":"key2","value":2}, {"key":"key3","value":3}
| select(.key == ("key1", "key3", "key4"))'
This will yield the following result
{"key":"key1","value":1}
{"key":"key3","value":3}
The with_entries function can be a little tricky, but it is easy to remember that it takes a filter and is defined as follows:
def with_entries(f): to_entries|map(f)|from_entries;
This is the same as
def with_entries(f): [to_entries[] | f] | from_entries;
The other part of the question that confuses people is the multiple matches on the right-hand side of the ==.
Consider the following command. The output is the Cartesian product of the left-hand values and the right-hand values: each input is compared against every value on the right.
jq -cn '1,2,3| . == (1,1,3)'
true
true
false
false
false
false
false
false
true
If that predicate is inside a select, the input is kept each time the predicate is true. Note that this can emit the same input more than once.
jq -cn '1,2,3| select(. == (1,1,3))'
1
1
3
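The duplication noted above is harmless inside with_entries, because from_entries collapses repeated keys into one. A minimal sketch, assuming jq 1.5 or later; the variable names dupes and merged are illustrative only.

```shell
# A repeated value on the right-hand side makes select emit the entry twice...
dupes=$(jq -cn '{"key":"a","value":1} | select(.key == ("a", "a"))')

# ...but from_entries (the last step of with_entries) keeps a single key.
merged=$(jq -cn '{"a":1,"b":2} | with_entries(select(.key == ("a", "a")))')

echo "$dupes"    # two identical lines
echo "$merged"   # {"a":1}
```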

Jeff's answer has a couple of unnecessary inefficiencies, both of which are addressed by the following, on the assumption that --argjson keys is used instead of --arg keys:
with_entries( select( .key as $k | $keys | index($k) ) )
Even better, if your jq has IN:
with_entries(select(.key | IN($keys[])))
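Both filters can be tried against the data from the question. This is a sketch only: it assumes a jq build that provides IN (1.6 or later, as far as I know), and the variable names are made up for the example.

```shell
input='{"1":"a","2":"b","3":"c"}'
keys='["1","3","4"]'

# --argjson hands the filter a parsed array, so no fromjson call is needed.
by_index=$(printf '%s' "$input" \
  | jq -c --argjson keys "$keys" 'with_entries(select(.key as $k | $keys | index($k)))')

# Same selection with IN, for jq versions that have it.
by_in=$(printf '%s' "$input" \
  | jq -c --argjson keys "$keys" 'with_entries(select(.key | IN($keys[])))')

echo "$by_index"   # {"1":"a","3":"c"}
echo "$by_in"      # {"1":"a","3":"c"}
```

Note that index returns 0 for the first element, which still counts as true in jq, since only null and false are falsy.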

If you are sure that all keys in the input array are present in the original object, you can use the object construction shortcut.
$ echo '{"1":"a","2":"b","3":"c"}' | jq '{"1", "3"}'
{
"1": "a",
"3": "c"
}
Numbers should be quoted to force jq to interpret them as keys instead of literals. In the case of keys not resembling a number, quotes are not needed:
$ echo '{"key1":"a","key2":"b","key3":"c"}' | jq '{key1, key3}'
{
"key1": "a",
"key3": "c"
}
Adding a non-existent key will yield a null value, which is unlikely to be what the OP wanted:
$ echo '{"1":"a","2":"b","3":"c"}' | jq '{"1", "3", "4"}'
{
"1": "a",
"3": "c",
"4": null
}
but those can be filtered out:
$ echo '{"1":"a","2":"b","3":"c"}' | jq '{"1", "3", "4"} | with_entries(select(.value != null))'
{
"1": "a",
"3": "c"
}
Although this answer doesn't take a JSON array of keys as input, as the OP asked, I find it useful for just filtering some keys you know are present.
An example use case: get aud and iss from a JWT. The following is very succinct:
echo "jwt-as-json" | jq '{aud, iss}'
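One caveat with filtering on .value != null is that it also removes keys whose value really is null in the input. The sketch below contrasts it with a has-based variant; the sample input and the reduce formulation are assumptions made for illustration, not part of the answer above.

```shell
# Hypothetical input in which key "2" exists but holds null.
input='{"1":"a","2":null,"3":"c"}'

# Filtering on the value drops "2" together with the missing key "4".
by_value=$(printf '%s' "$input" \
  | jq -c '{"1", "2", "4"} | with_entries(select(.value != null))')

# Checking has($k) on the original object keeps "2" and still skips "4".
by_has=$(printf '%s' "$input" \
  | jq -c '. as $in
           | reduce ("1", "2", "4") as $k ({};
               if ($in | has($k)) then .[$k] = $in[$k] else . end)')

echo "$by_value"   # {"1":"a"}
echo "$by_has"     # {"1":"a","2":null}
```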

Related

Get value if object or string if string in jq array

I have a JSON array that looks like this:
[{"name":"NAME_1"},"NAME_2"]
I would like an output of
["NAME_1", "NAME_2"]
Some of the entries in the array are an object with a key "name" and some are just a string of the name. I am trying to extract an array of the names. Using
jq -cr '.[].name // []'
throws an error as it is trying to index .name of the string object. Is there a way to check if it is a string, and if so just use its value instead of .name?
echo '[{"name":"NAME_1"},"NAME_2"]' \
| jq '[ .[] | if (.|type) == "object" then .name else . end ]'
[
"NAME_1",
"NAME_2"
]
Ref:
https://stedolan.github.io/jq/manual/#ConditionalsandComparisons
https://stedolan.github.io/jq/manual/#type
As @LéaGris comments, a simpler version:
jq '[ .[] | .name? // . ]' file
https://stedolan.github.io/jq/manual/#ErrorSuppression/OptionalOperator:%3f
https://stedolan.github.io/jq/manual/#Alternativeoperator://
You can use the type function which returns "object" for objects.
jq '.[] | if type == "object" then .name else . end' file.json
To get the output as array, just wrap the whole expression into [ ... ].
Just use the error suppression operator ? together with map and scalars:
jq 'map( .name?, scalars )'
Note that by using scalars, it is assumed that other than objects with name, all others are names of form NAME_*. If there are other strings as well, and you need to exclude some of them you might need to add some additional logic to do that. e.g. using startswith(..) with a string of your choice.
map( .name?, select( scalars | startswith("NAME") ) )
With your shown samples only, please try the following jq code. It uses the tostream function to pick out the leaf values:
jq -c '[.[] | tostream | if .[1] != null then .[1] else empty end]' Input_file
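The variants suggested in these answers can be compared side by side on the sample from the question. A sketch, assuming jq 1.5 or later; the variable names are illustrative only.

```shell
input='[{"name":"NAME_1"},"NAME_2"]'

# Explicit type check.
with_type=$(printf '%s' "$input" \
  | jq -c '[ .[] | if type == "object" then .name else . end ]')

# Error suppression plus the alternative operator.
with_alt=$(printf '%s' "$input" | jq -c '[ .[] | .name? // . ]')

# Error suppression plus scalars.
with_scalars=$(printf '%s' "$input" | jq -c 'map( .name?, scalars )')

echo "$with_type"      # ["NAME_1","NAME_2"]
echo "$with_alt"       # ["NAME_1","NAME_2"]
echo "$with_scalars"   # ["NAME_1","NAME_2"]
```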

Why doesn't fromjson work as a map function on an array of objects?

I'm trying to iterate through an object and convert any value (top level only for now, no recursion) that is a valid json string to json.
I think the answer lies in using the correct incantation perhaps something like with_entries(.value |= try fromjson), but I'm having trouble getting it working. So I broke it down a bit to try something simpler.
How about the following list of objects - I just want to parse the value key of each of them if it is a string that yields valid json (let's ignore the invalid cases for now, they can return null).
So I tried this:
$ jq -n '[{key: "one", value: 1},{key: "two", value: "{\"object\":true}"}] | map(.value |= try fromjson)'
[
{
"key": "one"
},
{
"key": "two"
}
]
Both values are missing, even though the value of the second object (key two) is a valid JSON string.
But if I try the same with a simple array, it works as expected:
$ jq -n '[1, "two", "{\"three\":3}"] | .[] | fromjson?'
{
"three": 3
}
So my question is what I am doing wrong here?
Thanks in advance for any pointers.
You have come across one of the (somewhat well-known) deficiencies of jq, namely that the trio of map, |=, try (and therefore postfix ?) do not mix well.
The good news is that the following will work in jq 1.5 and later:
map(.value = (.value | . as $v | try fromjson catch $v))
or equivalently:
map(.value as $v | .value = try ($v|fromjson) catch $v)
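Applied to the sample from the question, the second form can be checked as follows. The object passed to with_entries is a hypothetical input added to illustrate the original top-level goal; jq 1.5 or later is assumed.

```shell
# Sample from the question: only the second value is a JSON string.
parsed=$(jq -cn '[{key: "one", value: 1}, {key: "two", value: "{\"object\":true}"}]
  | map(.value as $v | .value = try ($v | fromjson) catch $v)')

# Hypothetical object input; values that cannot be parsed are kept as they are.
obj=$(printf '%s' '{"a":"{\"x\":1}","b":"plain text","c":2}' \
  | jq -c 'with_entries(.value as $v | .value = try ($v | fromjson) catch $v)')

echo "$parsed"   # [{"key":"one","value":1},{"key":"two","value":{"object":true}}]
echo "$obj"      # {"a":{"x":1},"b":"plain text","c":2}
```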

Find out parent key when a certain child value is met with jq

Here's the json:
{
"vendors": {
"vendor1": {
"vendor_version": "LS TT1706-POL",
"vendor_name": "toyota"
},
"vendor2": {
"vendor_version": "LSGS-2002-RC",
"vendor_name": "honda"
},
"vendor3": {
"vendor_version": "LS1903",
"vendor_name": "suzuki"
}
}
}
I basically need the jq expression to get "vendor2" when I am given LSGS-2002-RC. I've tried using select, map, variables, and every combination thereof.
Here is something that didn't work:
jq -r '.vendors|to_entries[]|.value|select(.vendor_version=="LSGS-2002-RC")'
Basically I always end up with the keys vendor1, vendor2, etc. stripped.
I am a little stumped. Note that the json structure or values cannot be altered. Thanks
You almost had it. The filter should apply select() to .value.vendor_version while still on the whole entry, rather than after descending into .value, and then pick out the key name:
jq -r '.vendors | to_entries[] | select(.value.vendor_version=="LSGS-2002-RC").key'
Also, don't embed dynamic strings in the filter itself; pass them in as variables:
jq -r --arg vendor "LSGS-2002-RC" '.vendors | to_entries[] | select(.value.vendor_version == $vendor).key'
An alternate, less readable version than select() would be to use keys[]
.vendors | keys[] as $k | if .[$k].vendor_version == "LSGS-2002-RC" then $k else empty end
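A runnable sketch of the --arg form against the JSON from the question. The helper name find_vendor and the vendors variable are hypothetical, introduced only to make the example self-contained.

```shell
# JSON from the question, compacted onto one line.
vendors='{"vendors":{"vendor1":{"vendor_version":"LS TT1706-POL","vendor_name":"toyota"},"vendor2":{"vendor_version":"LSGS-2002-RC","vendor_name":"honda"},"vendor3":{"vendor_version":"LS1903","vendor_name":"suzuki"}}}'

# Print the parent key whose vendor_version equals the first argument.
find_vendor() {
  printf '%s' "$vendors" \
    | jq -r --arg vendor "$1" \
        '.vendors | to_entries[] | select(.value.vendor_version == $vendor).key'
}

find_vendor "LSGS-2002-RC"   # vendor2
find_vendor "no-such-value"  # prints nothing
```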

Numeric argument passed with jq --arg not matching data with ==

Here is a sample JSON response from my curl:
{
"success": true,
"message": "jobStatus",
"jobStatus": [
{
"ID": 9,
"status": "Successful"
},
{
"ID": 2,
"status": "Successful"
},
{
"ID": 99,
"status": "Failed"
}
]
}
I want to check the status of ID=2. Here is the command I tried:
cat test.txt|jq --arg v "2" '.jobStatus[]|select(.ID == $v)|.status'
response: nothing is printed
I tried value 2 without quotes and still no result.
By contrast, if I try the command with a literal 2, it works:
cat test.txt | jq '.jobStatus[]|select(.ID == 2)|.status'
response:
"Successful"
I'm stuck. Can anyone help me identify the problem?
jq is data-type-aware:
.ID, as defined in the JSON input, is a number,
but any command-line argument passed with --arg (such as v here) is invariably a string (whether you quote the value or not),
so, in order to compare them, you must use an explicit type conversion, such as with tonumber/1:
jq --arg v '2' '.jobStatus[] | select(.ID == ($v | tonumber)) | .status' test.txt
Given that you're only passing a scalar argument here, the following solution, using --argjson (jq v1.5+) is a bit of an overkill, but it is an alternative to explicit type conversion in that passing a JSON argument in effect passes typed data:
jq --argjson v '{ "ID": 2 }' '.jobStatus[] | select(.ID == $v.ID) | .status' test.txt
peak's answer demonstrates that even --argjson v 2 works (in which case comparing to $v works directly), which is certainly the most concise solution, but may require an explanation:
Even though 2 may not look like JSON, it is: it is a valid JSON text containing a single value of type number (see json.org).
Specifically, it is the fact that 2 is an unquoted token that starts with a digit that makes it a number in the context of JSON (the JSON string-value equivalent is "2", which from the shell would have to be passed as '"2"' - note the embedded double quotes).
Therefore jq interprets --argjson v 2 as a number, and the comparison .ID == $v works as intended (note that the same applies to --argjson v '2' / --argjson v "2", where the shell removes the quotes before jq sees the value).
By contrast, anything you pass with --arg is always a string value that is used as-is.
In other words: --argjson, whose purpose is to accept arbitrary JSON texts as strings (such as '{ "ID": 2 }' in the example above), can also be used to pass number-string scalars to force their interpretation as numbers.
The same technique also works with Boolean strings true and false.
Tip of the hat to peak for his help.
Assuming you want to check for the JSON value 2, you have a choice to make - either convert the argument of --arg to a number, or use --argjson with a numeric argument. These alternatives are illustrated by the following:
jq --arg v 2 '.jobStatus[] | select(.ID == ($v|tonumber)) | .status'
jq --argjson v 2 '.jobStatus[] | select(.ID == $v) | .status'
Note that --argjson requires jq 1.5 or later.
Of course, if you want to "normalize" .ID so that it's always treated as a string, you could write:
jq --arg v 2 '.jobStatus[] | select((.ID|tostring) == $v) | .status'
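The type difference between the two options is easy to check directly. A small sketch, assuming jq 1.5 or later; the variable names and the shortened sample data are illustrative only.

```shell
# --arg always yields a string; --argjson keeps the JSON type.
arg_type=$(jq -rn --arg v 2 '$v | type')
argjson_type=$(jq -rn --argjson v 2 '$v | type')
bool_type=$(jq -rn --argjson v true '$v | type')

echo "$arg_type"       # string
echo "$argjson_type"   # number
echo "$bool_type"      # boolean

# Shortened version of the sample response from the question.
jobs='{"jobStatus":[{"ID":9,"status":"Successful"},{"ID":2,"status":"Successful"},{"ID":99,"status":"Failed"}]}'
status=$(printf '%s' "$jobs" \
  | jq -r --argjson v 99 '.jobStatus[] | select(.ID == $v) | .status')
echo "$status"         # Failed
```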

Flatten nested JSON using jq

I'd like to flatten a nested json object, e.g. {"a":{"b":1}} to {"a.b":1} in order to digest it in solr.
I have 11 TB of JSON files which are both nested and contain dots in field names, meaning that neither elasticsearch (dots) nor solr (nested without the _childDocument_ notation) can digest them as is.
The other solution would be to replace dots in the field names with underscores and push the data to elasticsearch, but I have far better experience with solr, so I prefer the flatten solution (unless solr can digest those nested JSON documents as is?).
I will prefer elasticsearch only if the digestion process takes far less time than with solr, because my priority is digesting as fast as I can (thus I chose jq instead of scripting it in Python).
Kindly help.
EDIT:
I think the pair of examples 3&4 solves this for me:
https://lucidworks.com/blog/2014/08/12/indexing-custom-json-data/
I'll try soon.
You can also use the following jq command to flatten nested JSON objects in this manner:
[leaf_paths as $path | {"key": $path | join("."), "value": getpath($path)}] | from_entries
The way it works is: leaf_paths returns a stream of arrays which represent the paths on the given JSON document at which "leaf elements" appear, that is, elements which do not have child elements, such as numbers, strings and booleans. We pipe that stream into objects with key and value properties, where key contains the elements of the path array as a string joined by dots and value contains the element at that path. Finally, we put the entire thing in an array and run from_entries on it, which transforms an array of {key, value} objects into an object containing those key-value pairs.
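The steps described above can be tried on a small document. This sketch uses paths(scalars) in place of leaf_paths (a later answer in this thread describes the two as equivalent) and adds map(tostring) so that numeric array indexes can be joined on older jq versions; the function name flatten_leaves is hypothetical.

```shell
# Flatten the JSON read from stdin into an object with dotted keys.
flatten_leaves() {
  jq -c '[paths(scalars) as $path
          | {"key": ($path | map(tostring) | join(".")),
             "value": getpath($path)}]
         | from_entries'
}

flat=$(printf '%s' '{"a":{"b":1},"c":"x"}' | flatten_leaves)
echo "$flat"   # {"a.b":1,"c":"x"}
```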
This is just a variant of Santiago's jq:
. as $in
| reduce leaf_paths as $path ({};
. + { ($path | map(tostring) | join(".")): $in | getpath($path) })
It avoids the overhead of the key/value construction and destruction.
(If you have access to a version of jq later than jq 1.5, you can omit the "map(tostring)".)
Two important points about both these jq solutions:
1. Arrays are also flattened. E.g. given {"a": {"b": [0,1,2]}} as input, the output would be:
{
"a.b.0": 0,
"a.b.1": 1,
"a.b.2": 2
}
2. If any of the keys in the original JSON contain periods, then key collisions are possible; such collisions will generally result in the loss of a value. This would happen, for example, with the following input:
{"a.b":0, "a": {"b": 1}}
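The collision can be observed by listing the keys of the flattened result. A sketch using the reduce variant with paths(scalars) standing in for leaf_paths; the variable name collided is illustrative only.

```shell
# Both "a.b" and a -> b map to the same flattened key, so only one survives.
collided=$(printf '%s' '{"a.b":0, "a": {"b": 1}}' | jq -c '
  . as $in
  | reduce paths(scalars) as $path ({};
      . + { ($path | map(tostring) | join(".")): ($in | getpath($path)) })
  | keys')

echo "$collided"   # ["a.b"]
```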
Here is a solution that uses tostream, select, join, reduce and setpath
reduce ( tostream | select(length==2) | .[0] |= [join(".")] ) as [$p,$v] (
{}
; setpath($p; $v)
)
I've recently written a script called jqg that flattens arbitrarily complex JSON and searches the results using a regex; to simply flatten the JSON, your regex would be '.', which matches everything. Unlike the answers above, the script will handle embedded arrays, false and null values, and can optionally treat empty arrays and objects ([] & {}) as leaf nodes.
$ jq . test/odd-values.json
{
"one": {
"start-string": "foo",
"null-value": null,
"integer-number": 101
},
"two": [
{
"two-a": {
"non-integer-number": 101.75,
"number-zero": 0
},
"true-boolean": true,
"two-b": {
"false-boolean": false
}
}
],
"three": {
"empty-string": "",
"empty-object": {},
"empty-array": []
},
"end-string": "bar"
}
$ jqg . test/odd-values.json
{
"one.start-string": "foo",
"one.null-value": null,
"one.integer-number": 101,
"two.0.two-a.non-integer-number": 101.75,
"two.0.two-a.number-zero": 0,
"two.0.true-boolean": true,
"two.0.two-b.false-boolean": false,
"three.empty-string": "",
"three.empty-object": {},
"three.empty-array": [],
"end-string": "bar"
}
jqg was tested using jq 1.6
Note: I am the author of the jqg script.
As it turns out, curl -XPOST 'http://localhost:8983/solr/flat/update/json/docs' -d @json_file does just this:
{
"a.b":[1],
"id":"24e3e780-3a9e-4fa7-9159-fc5294e803cd",
"_version_":1535841499921514496
}
EDIT 1: solr 6.0.1 with bin/solr -e cloud. collection name is flat, all the rest are default (with data-driven-schema which is also default).
EDIT 2: The final script I used: find . -name '*.json' -exec curl -XPOST 'http://localhost:8983/solr/collection1/update/json/docs' -d @{} \;.
EDIT 3: It is also possible to parallelize with xargs and to add the id field with jq: find . -name '*.json' -print0 | xargs -0 -n 1 -P 8 -I {} sh -c "cat {} | jq '. + {id: .a.b}' | curl -XPOST 'http://localhost:8983/solr/collection/update/json/docs' -d @-" where -P is the parallelism factor. I used jq to set an id so that multiple uploads of the same document won't create duplicates in the collection (when I searched for the optimal value of -P, it created duplicates in the collection).
As @hraban mentioned, leaf_paths does not work as expected (furthermore, it is deprecated). leaf_paths is equivalent to paths(scalars): it returns the paths of any values for which scalars returns a truthy value. scalars returns its input value if it is a scalar, and nothing otherwise. The problem is that null and false are scalars but not truthy values, so they are removed from the output. The following code does work, by checking the type of the values directly:
. as $in
| reduce paths(type != "object" and type != "array") as $path ({};
. + { ($path | map(tostring) | join(".")): $in | getpath($path) })
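The difference between the two path filters shows up on an input that contains null and false. A sketch, assuming jq 1.5 or later; the sample input and variable names are made up for the example.

```shell
input='{"a":{"b":[0,null,false]}}'

# paths(scalars) skips null and false because they are not truthy.
with_scalars=$(printf '%s' "$input" | jq -c '
  . as $in
  | reduce paths(scalars) as $path ({};
      . + { ($path | map(tostring) | join(".")): ($in | getpath($path)) })')

# Testing the type directly keeps every non-container value.
with_type=$(printf '%s' "$input" | jq -c '
  . as $in
  | reduce paths(type != "object" and type != "array") as $path ({};
      . + { ($path | map(tostring) | join(".")): ($in | getpath($path)) })')

echo "$with_scalars"   # {"a.b.0":0}
echo "$with_type"      # {"a.b.0":0,"a.b.1":null,"a.b.2":false}
```

Empty arrays and objects are containers, so neither filter emits them.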