Does anyone know how to use jq to sort keys and their array values in JSON?
For example:
Before sort:
{
  "z": ["c", "b", "a"],
  "y": ["e", "d", "f"],
  "x": ["g", "i", "h"]
}
After sort:
{
  "x": ["g", "h", "i"],
  "y": ["d", "e", "f"],
  "z": ["a", "b", "c"]
}
I am trying to use
jq --sort-keys
but it only sorts the keys, not their array values.
Thanks!
If you are willing to rely on the --sort-keys command-line option to sort the keys, then you can ensure all arrays are sorted by writing:
walk(if type=="array" then sort else . end)
If you want the object keys to be sorted internally (i.e. before the final output is generated), then you could augment the above by using the following filter:
walk(if type == "array" then sort
     elif type == "object" then to_entries | sort | from_entries
     else . end)
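A quick check of the augmented filter against the example at the top (walk requires jq 1.5 or later; -c yields compact output):
$ echo '{"z":["c","b","a"],"y":["e","d","f"],"x":["g","i","h"]}' \
    | jq -c 'walk(if type=="array" then sort
                  elif type == "object" then to_entries | sort | from_entries
                  else . end)'
{"x":["g","h","i"],"y":["d","e","f"],"z":["a","b","c"]}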
Alternatives
If for some reason you wish not to use walk, then you can roll your own solution using some combination of sort (for JSON arrays) and to_entries|sort|from_entries (for JSON objects).
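For instance, here is a sketch of one such hand-rolled recursive filter (the name sorted is my own, not a builtin):
def sorted:
  if type == "array" then map(sorted) | sort
  elif type == "object" then to_entries | sort | map(.value |= sorted) | from_entries
  else . end;
sorted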
Related
I'm trying to iterate through an object and convert any value (top level only for now, no recursion) that is a valid JSON string to JSON.
I think the answer lies in using the correct incantation, perhaps something like with_entries(.value |= try fromjson), but I'm having trouble getting it to work. So I broke it down a bit to try something simpler.
How about the following list of objects? I just want to parse the value key of each of them if it is a string that yields valid JSON (let's ignore the invalid cases for now; they can return null).
So I tried this:
$ jq -n '[{key: "one", value: 1},{key: "two", value: "{\"object\":true}"}] | map(.value |= try fromjson)'
[
{
"key": "one"
},
{
"key": "two"
}
]
Both values are missing, even though the value under the "two" key is a valid JSON string.
But if I try the same with a simple array, it works as expected:
$ jq -n '[1, "two", "{\"three\":3}"] | .[] | fromjson?'
{
"three": 3
}
So my question is: what am I doing wrong here?
Thanks in advance for any pointers.
You have come across one of the (somewhat well-known) deficiencies of jq, namely that the combination of map, |=, and try (and therefore the postfix ?) does not mix well.
The good news is that the following will work in jq 1.5 and later:
map(.value = (.value | . as $v | try fromjson catch $v))
or equivalently:
map(.value as $v | .value = try ($v|fromjson) catch $v)
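A quick check of the latter form against the example from the question (expected output, assuming jq 1.5+ for try/catch):
$ jq -n '[{key: "one", value: 1},{key: "two", value: "{\"object\":true}"}]
         | map(.value as $v | .value = try ($v|fromjson) catch $v)'
[
  {
    "key": "one",
    "value": 1
  },
  {
    "key": "two",
    "value": {
      "object": true
    }
  }
]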
I'm trying to define some custom filter functions and one thing I need to be able to do is pass a list of strings to a filter and get the corresponding values of the input object. For example:
jq -n '{name: "Chris", age: 25, city: "Chicago"} | myFilter(["name", "age"])'
should return:
{"name": "Chris", "age": 25}.
I know I can use .[some_string] to dynamically get a value on an object for a specific string key, but I don't know how to apply this to multiple string keys. The problem I'm running into seems to be that jq iterates over the objects streamed into a filter, but doesn't give a way to iterate over an argument to that filter, even when I use the def myFilter($var) syntax that the manual recommends for value-argument behavior.
You could easily define your myFilter using reduce:
def myFilter($keys):
  . as $in
  | reduce $keys[] as $k (null; . + {($k): $in[$k]});
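For example (a quick sanity check of this definition; expected output shown):
$ jq -n 'def myFilter($keys):
           . as $in
           | reduce $keys[] as $k (null; . + {($k): $in[$k]});
         {name: "Chris", age: 25, city: "Chicago"} | myFilter(["name", "age"])'
{
  "name": "Chris",
  "age": 25
}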
More interestingly, if you're willing to modify your "Query By Example" requirements slightly, you can simply specify the keys of interest in braces, as illustrated by this example:
jq -n '{name: "Chris", age: 25, city: "Chicago"} | {name, age}'
If any of the keys cannot be specified in this abbreviated format, simply double-quote them.
I'd like to flatten a nested JSON object, e.g. {"a":{"b":1}} to {"a.b":1}, in order to digest it in Solr.
I have 11 TB of JSON files which are both nested and contain dots in field names, meaning neither Elasticsearch (because of the dots) nor Solr (because of the nesting, absent the _childDocument_ notation) can digest them as is.
The other solution would be to replace the dots in the field names with underscores and push the data to Elasticsearch, but I have far better experience with Solr, so I prefer the flattening solution (unless Solr can digest those nested JSONs as is??).
I will prefer Elasticsearch only if its digestion process takes far less time than Solr's, because my priority is digesting as fast as I can (which is why I chose jq instead of scripting it in Python).
Kindly help.
EDIT:
I think the pair of examples 3&4 solves this for me:
https://lucidworks.com/blog/2014/08/12/indexing-custom-json-data/
I'll try soon.
You can also use the following jq command to flatten nested JSON objects in this manner:
[leaf_paths as $path | {"key": $path | join("."), "value": getpath($path)}] | from_entries
The way it works is: leaf_paths returns a stream of arrays which represent the paths on the given JSON document at which "leaf elements" appear, that is, elements which do not have child elements, such as numbers, strings and booleans. We pipe that stream into objects with key and value properties, where key contains the elements of the path array as a string joined by dots and value contains the element at that path. Finally, we put the entire thing in an array and run from_entries on it, which transforms an array of {key, value} objects into an object containing those key-value pairs.
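For example, with the {"a":{"b":1}} input from the question (-c for compact output):
$ echo '{"a":{"b":1}}' \
    | jq -c '[leaf_paths as $path | {"key": $path | join("."), "value": getpath($path)}] | from_entries'
{"a.b":1}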
This is just a variant of Santiago's jq:
. as $in
| reduce leaf_paths as $path ({};
. + { ($path | map(tostring) | join(".")): $in | getpath($path) })
It avoids the overhead of the key/value construction and destruction.
(If you have access to a version of jq later than jq 1.5, you can omit the "map(tostring)".)
Two important points about both these jq solutions:
Arrays are also flattened.
E.g. given {"a": {"b": [0,1,2]}} as input, the output would be:
{
"a.b.0": 0,
"a.b.1": 1,
"a.b.2": 2
}
If any of the keys in the original JSON contain periods, then key collisions are possible; such collisions will generally result in the loss of a value. This would happen, for example, with the following input:
{"a.b":0, "a": {"b": 1}}
Here is a solution that uses tostream, select, join, reduce and setpath:
reduce ( tostream | select(length==2) | .[0] |= [join(".")] ) as [$p,$v] (
  {}
  ; setpath($p; $v)
)
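For example, applied to the array input discussed above (note: joining numeric path components directly requires a sufficiently recent jq; with older versions, insert map(tostring) before the join):
$ echo '{"a":{"b":[0,1,2]}}' \
    | jq -c 'reduce (tostream | select(length==2) | .[0] |= [join(".")]) as [$p,$v] ({}; setpath($p; $v))'
{"a.b.0":0,"a.b.1":1,"a.b.2":2}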
I've recently written a script called jqg that flattens arbitrarily complex JSON and searches the results using a regex; to simply flatten the JSON, your regex would be '.', which matches everything. Unlike the answers above, the script will handle embedded arrays, false and null values, and can optionally treat empty arrays and objects ([] & {}) as leaf nodes.
$ jq . test/odd-values.json
{
"one": {
"start-string": "foo",
"null-value": null,
"integer-number": 101
},
"two": [
{
"two-a": {
"non-integer-number": 101.75,
"number-zero": 0
},
"true-boolean": true,
"two-b": {
"false-boolean": false
}
}
],
"three": {
"empty-string": "",
"empty-object": {},
"empty-array": []
},
"end-string": "bar"
}
$ jqg . test/odd-values.json
{
"one.start-string": "foo",
"one.null-value": null,
"one.integer-number": 101,
"two.0.two-a.non-integer-number": 101.75,
"two.0.two-a.number-zero": 0,
"two.0.true-boolean": true,
"two.0.two-b.false-boolean": false,
"three.empty-string": "",
"three.empty-object": {},
"three.empty-array": [],
"end-string": "bar"
}
jqg was tested using jq 1.6
Note: I am the author of the jqg script.
As it turns out, curl -XPOST 'http://localhost:8983/solr/flat/update/json/docs' -d @json_file does just this:
{
"a.b":[1],
"id":"24e3e780-3a9e-4fa7-9159-fc5294e803cd",
"_version_":1535841499921514496
}
EDIT 1: Solr 6.0.1 with bin/solr -e cloud. The collection name is flat; all the rest are defaults (with the data-driven schema, which is also the default).
EDIT 2: The final script I used: find . -name '*.json' -exec curl -XPOST 'http://localhost:8983/solr/collection1/update/json/docs' -d @{} \;.
EDIT 3: It is also possible to parallelize with xargs and to add the id field with jq: find . -name '*.json' -print0 | xargs -0 -n 1 -P 8 -I {} sh -c "cat {} | jq '. + {id: .a.b}' | curl -XPOST 'http://localhost:8983/solr/collection/update/json/docs' -d @-", where -P is the parallelism factor. I used jq to set an id so that multiple uploads of the same document won't create duplicates in the collection (when I searched for the optimal value of -P, duplicates were created in the collection).
As @hraban mentioned, leaf_paths does not work as expected (furthermore, it is deprecated). leaf_paths is equivalent to paths(scalars); it returns the paths of any values for which scalars returns a truthy value. scalars returns its input value if it is a scalar, or null otherwise. The problem with that is that null and false are not truthy values, so their paths will be removed from the output. The following code does work, by checking the type of the values directly:
. as $in
| reduce paths(type != "object" and type != "array") as $path ({};
. + { ($path | map(tostring) | join(".")): $in | getpath($path) })
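A quick check that null and false values now survive (illustrative input):
$ echo '{"a": {"b": null, "c": false}}' \
    | jq -c '. as $in
             | reduce paths(type != "object" and type != "array") as $path ({};
                 . + { ($path | map(tostring) | join(".")): $in | getpath($path) })'
{"a.b":null,"a.c":false}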
I've a large JSON file where I'd like to transform some values based on some kind of mapping.
The data I have looks like:
[
{"id":1, "value":"yes"},
{"id":2, "value":"no"},
{"id":3, "value":"maybe"}
]
And I'd like to transform that into a list like this:
[
{"id":1, "value":"10"},
{"id":2, "value":"0"},
{"id":3, "value":"5"}
]
With the fixed mapping:
yes => 10
no => 0
maybe => 5
My current solution is based on a simple if-elif-else combination, like this:
jq 'map(.value = (if .value == "yes" then "10" elif .value == "maybe" then "5" else "0" end))' data.json
But this is really ugly, and I'd love to have a more direct way to express the mapping.
Thanks for your help
If one wants to avoid having to specify the mapping on the command line, then the following two variants may be of interest.
The first variant can be used with jq 1.3, jq 1.4 and jq 1.5:
def mapping: {"yes":"10","no":"0","maybe":"5"};
map(.value |= mapping[.])
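Applied to the data from the question (-c for compact output):
$ jq -c 'def mapping: {"yes":"10","no":"0","maybe":"5"};
         map(.value |= mapping[.])' data.json
[{"id":1,"value":"10"},{"id":2,"value":"0"},{"id":3,"value":"5"}]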
The next variant uses the --argfile option (available since jq 1.4), and is of interest if the mapping object is available in a file:
jq --argfile mapping mapping.jq 'map(.value |= $mapping[.])' data.json
Finally, in jq 1.5, other alternatives based on import are also available (!).
Here is a solution which uses an "inline" object since the mapping is small:
map(.value = {"yes":"10","no":"0","maybe":"5"}[.value])
which can be shortened with |=, as in peak's solution, to:
map(.value |= {"yes":"10","no":"0","maybe":"5"}[.])
Since you're translating string values, you should be able to use a JSON object to hold the mappings. Then the mapping is trivial:
$ jq --arg mapping '{"yes":"10","no":"0","maybe":"5"}' \
'map(.value |= ($mapping | fromjson)[.])' data.json
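If your jq has --argjson (jq 1.5 and later), you can pass the mapping as JSON directly and skip the fromjson step; this variant should be equivalent:
$ jq --argjson mapping '{"yes":"10","no":"0","maybe":"5"}' \
     'map(.value |= $mapping[.])' data.json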
Given an input JSON string encoding an array of keys, return an object containing only the entries whose keys are both in the original object and in the input array.
I have a solution, but I think it isn't elegant ({($k):$input[$k]} feels especially clunky...), and this is a chance for me to learn.
jq -n '{"1":"a","2":"b","3":"c"}' \
| jq --arg keys '["1","3","4"]' \
     '. as $input
      | ($keys | fromjson)
      | map(. as $k
            | $input
            | select(has($k))
            | {($k): $input[$k]})
      | add'
Any ideas how to clean this up?
I feel like Extracting selected properties from a nested JSON object with jq is a good starting place, but I cannot get it to work.
Solution with an inside check:
jq 'with_entries(select([.key] | inside(["key1", "key2"])))'
The inside operator works most of the time; however, it has a side effect: sometimes it selects keys that were not desired. Suppose the input is { "key1": val1, "key2": val2, "key12": val12 } and we select by inside(["key12"]); it will select both "key1" and "key12".
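For example (with illustrative values; -c for compact output):
$ echo '{"key1":1, "key2":2, "key12":12}' \
    | jq -c 'with_entries(select([.key] | inside(["key12"])))'
{"key1":1,"key12":12}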
Use the in operator if you need an exact match; the following will select .key2 and .key12 only:
jq 'with_entries(select(.key | in({"key2":1, "key12":1})))'
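With the same illustrative input, only the exact matches survive:
$ echo '{"key1":1, "key2":2, "key12":12}' \
    | jq -c 'with_entries(select(.key | in({"key2":1, "key12":1})))'
{"key2":2,"key12":12}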
Because the in operator only checks whether a key exists in an object (or whether an index exists in an array), the desired keys have to be written in object syntax, with the desired keys as the object's keys; the values do not matter. The in operator is not a perfect fit for this purpose; I would like to see the reverse of the JavaScript ES6 includes API implemented as a jq builtin:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/includes
jq 'with_entries(select(.key | included(["key2", "key12"])))'
to check whether an item .key is included in an array.
You can use this filter:
with_entries(
  select(
    .key as $k | any($keys | fromjson[]; . == $k)
  )
)
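Using the invocation style from the question, this gives (expected output):
$ jq -n '{"1":"a","2":"b","3":"c"}' \
    | jq --arg keys '["1","3","4"]' \
         'with_entries(select(.key as $k | any($keys | fromjson[]; . == $k)))'
{
  "1": "a",
  "3": "c"
}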
Here is some additional clarification.
For the input object {"key1":1, "key2":2, "key3":3} I would like to drop all keys that are not in the set of desired keys ["key1","key3","key4"]
jq -n --argjson desired_keys '["key1","key3","key4"]' \
      --argjson input '{"key1":1, "key2":2, "key3":3}' \
   '$input
    | with_entries(
        select(
          .key == ($desired_keys[])
        )
      )'
with_entries converts {"key1":1, "key2":2, "key3":3} into the following array of key-value pairs, maps the select statement over the array, and then turns the resulting array back into an object.
Here is the intermediate array inside the with_entries call.
[
{
"key": "key1",
"value": 1
},
{
"key": "key2",
"value": 2
},
{
"key": "key3",
"value": 3
}
]
We can then select the entries from this array that meet our criteria.
This is where the magic happens... here is a look at what's going on in the middle of this command. The following command takes the expanded array of entries and turns it into a stream of objects that we can select from.
jq -cn '{"key":"key1","value":1}, {"key":"key2","value":2}, {"key":"key3","value":3}
| select(.key == ("key1", "key3", "key4"))'
This will yield the following result:
{"key":"key1","value":1}
{"key":"key3","value":3}
The with_entries filter can be a little tricky, but it's easy to remember that it takes a filter and is defined as follows:
def with_entries(f): to_entries|map(f)|from_entries;
This is the same as
def with_entries(f): [to_entries[] | f] | from_entries;
The other part of the question that confuses people is the multiple matches on the right-hand side of the ==.
Consider the following command. The output is the Cartesian product of the left-hand stream and the right-hand stream.
jq -cn '1,2,3| . == (1,1,3)'
true
true
false
false
false
false
false
false
true
If that predicate is used in a select, we keep the input whenever the predicate is true. Note that this can also duplicate inputs:
jq -cn '1,2,3| select(. == (1,1,3))'
1
1
3
Jeff's answer has a couple of unnecessary inefficiencies, both of which are addressed by the following, on the assumption that --argjson keys is used instead of --arg keys:
with_entries( select( .key as $k | $keys | index($k) ) )
Even better, if your jq has IN:
with_entries(select(.key | IN($keys[])))
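For example (assuming a jq with IN; the here-string is a bash feature):
$ jq --argjson keys '["1","3","4"]' \
     'with_entries(select(.key | IN($keys[])))' \
     <<< '{"1":"a","2":"b","3":"c"}'
{
  "1": "a",
  "3": "c"
}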
If you are sure that all keys in the input array are present in the original object, you can use the object construction shortcut.
$ echo '{"1":"a","2":"b","3":"c"}' | jq '{"1", "3"}'
{
"1": "a",
"3": "c"
}
Numbers should be quoted to force jq to interpret them as keys instead of literals. For keys not resembling a number, quotes are not needed:
$ echo '{"key1":"a","key2":"b","key3":"c"}' | jq '{key1, key3}'
{
"key1": "a",
"key3": "c"
}
Adding a non-existent key will yield a null value, which is likely not what the OP wanted:
$ echo '{"1":"a","2":"b","3":"c"}' | jq '{"1", "3", "4"}'
{
"1": "a",
"3": "c",
"4": null
}
but those can be filtered out:
$ echo '{"1":"a","2":"b","3":"c"}' | jq '{"1", "3", "4"} | with_entries(select(.value != null))'
{
"1": "a",
"3": "c"
}
Although this answer doesn't take a valid input JSON array as the OP asked, I find it useful for just filtering some keys you know are present.
An example use case: get aud and iss from a JWT. The following is very succinct:
echo "jwt-as-json" | jq '{aud, iss}'