Use jq to recursively select key names of an object - json

I have a JSON document that looks like:
simple: 42
normal:
description: "NORMAL"
combo:
one:
description: "ONE"
two:
description: "TWO"
arbitrary:
foo: 42
I want to use a jq expression to generate the following:
["normal", "one", "two"]
The condition to select the key is that its corresponding value is an object type that has a key description. In this case, keys simple and arbitrary don't qualify.
I'm having a hard time crafting the filter. I looked into with_entries and recurse/2 but can't solve it myself.
TIA.

It's not clear to me whether the YAML that you gave is just a "view" of your JSON or whether you actually want to start with YAML. If your document really is YAML, then one approach would be to use a tool (such as yaml2json or yq) to convert the YAML to JSON, and then run jq as shown below; another would be to use jq as a text processor, but in that case you could just as well use awk.
yaml2json input.yaml |
jq -c '[.. | objects | to_entries[]
| select(.value | has("description")?) | .key]'
Output
["normal","one","two"]
Streaming parser
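This type of problem is also well suited to jq's streaming parser, which is especially handy when dealing with very large JSON texts. Because --stream turns the input into a sequence of separate events, combine it with -n and inputs so that all matches are collected into a single array. Using jq -n --stream, a suitable jq filter would be:
[inputs | select(length==2) | .[0] | select(.[-1] == "description") | .[-2]]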
The ordering of the results will depend on the ordering of the keys produced by the YAML-to-JSON conversion tool.
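As a minimal sketch, both approaches can be tried on a JSON rendering of the sample document. The variable names input, regular and streamed are only illustrative, and the JSON text is an assumed conversion of the YAML shown in the question:

```shell
# Assumed JSON equivalent of the YAML sample from the question.
input='{"simple":42,"normal":{"description":"NORMAL"},"combo":{"one":{"description":"ONE"},"two":{"description":"TWO"}},"arbitrary":{"foo":42}}'

# Recursive descent over the fully parsed document.
regular=$(printf '%s\n' "$input" | jq -c '
  [.. | objects | to_entries[] | select(.value | has("description")?) | .key]')

# Streaming parser: -n plus inputs gathers all events into a single array.
streamed=$(printf '%s\n' "$input" | jq -nc --stream '
  [inputs | select(length == 2) | .[0] | select(.[-1] == "description") | .[-2]]')

printf '%s\n%s\n' "$regular" "$streamed"
```

Both commands should print ["normal","one","two"] for this input.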


How to select multiple values in an array in json using jq

I am using jq to get multiple values from a JSON file using the command below.
.components| to_entries[]| "\(.key)- \(.value.status)"
which gives me the following:
Server2- UP
server1 - UP
Splunk- UP
Datameer - UP
Platfora - UP
diskSpace- Good
But I want to select only a few of them. I tried putting them inside the brackets of to_entries[], but it didn't work.
Expected output:
Server1 - UP
Splunk -UP
Platfora - UP
Is there any way to pick only a few values?
Appreciate your help. Thank you.
With the -r command-line option, the following transforms the given input to the desired output, and is perhaps close to what you're looking for:
.components
| to_entries[]
| select(.key == ("server1", "Splunk", "Platfora"))
| "\(.key)- \(.value.status)"
If the list of components is available as a JSON list, then you could modify the selection criterion accordingly, e.g. using IN (uppercase) or index.
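A minimal sketch of the IN variant, assuming the wanted names are passed as a JSON array with --argjson. The variable name $wanted and the sample input are made up for illustration, and IN requires jq 1.6 or later:

```shell
# Hypothetical input, shaped like the data in the question.
input='{"components":{"Server2":{"status":"UP"},"server1":{"status":"UP"},"Splunk":{"status":"UP"},"diskSpace":{"status":"Good"}}}'

# $wanted is an illustrative name; IN(stream) is true if .key equals any item of the stream.
result=$(printf '%s\n' "$input" | jq -r --argjson wanted '["server1","Splunk"]' '
  .components
  | to_entries[]
  | select(.key | IN($wanted[]))
  | "\(.key)- \(.value.status)"
')
printf '%s\n' "$result"
```

For this input the command should print the server1 and Splunk lines only. The comparison is case-sensitive, so "Server1" would not match "server1".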

How to extract a json value substring with jq

I have this json:
{"temperature":"21", "humidity":"12.3", "message":"Today ID 342 is running"}
I want to use jq to obtain this json:
{"temp":"21", "hum":"12.3", "id":"342"}
As you can see, what I want to do is extract the ID number 342 and put it in the new JSON under a different key name. I think I should use a regex, but I don't know how to write it in jq syntax.
I can create another json using the basic command:
cat old.json | jq '{temp:.temperature,hum:.humidity, id:.message}' > new.json
I know I can select a substring using square brackets, but I don't want to use them because they don't take into account strings with different lengths and structure. I want to use a regex because I know that the ID number always comes after the "ID" part.
You're right that a regex is the way to go here. Fortunately, the jq manual has a large section on using them.
jq '
{
temp: .temperature,
hum: .humidity,
id: (.message | capture("ID (?<id>[[:digit:]]+)").id)
}' <old.json >new.json
You can see this running with your sample data at https://jqplay.org/s/k-ZylbOC6W
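One thing to be aware of: when the message contains no "ID <digits>" part, capture produces no output, and the object construction then produces nothing at all. A minimal sketch of a fallback, assuming a null id is acceptable in that case (the sample message is invented):

```shell
# No "ID <digits>" in this message, so capture produces no output;
# the alternative operator // then supplies null instead.
result=$(printf '%s\n' '{"temperature":"21","humidity":"12.3","message":"nothing to report"}' | jq -c '
  {
    temp: .temperature,
    hum: .humidity,
    id: ((.message | capture("ID (?<id>[[:digit:]]+)").id) // null)
  }
')
printf '%s\n' "$result"
```

This should print {"temp":"21","hum":"12.3","id":null} instead of printing nothing.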

Can jq check each element of a comma separated array of values to check if the value exists in JSON?

I have a JSON file and I am extracting data from it using jq. One simple use case is pulling out any JSON Object that contains an Id which is provided as an argument.
I use the following simple script to do so:
[.[] | select(.id == $ID)]
The script is stored in a separate file (by_id.jq) which I pass in using the -f argument.
The full command looks something like this:
cat ./my_json_file.json | jq -sf --arg ID "8df993c1-57d5-46b3-a8a3-d95066934e5b" ./by_id.jq
Is there a way, using only jq, to pass a comma-separated list of values as an argument to the jq script, iterate through the ids, and check them against the value of .id in the JSON file, with the result being the objects that have one of those ids?
For example if I wanted to pull out three objects by their ids I would want to structure the command in this way:
cat ./my_json_file.json | jq -sf --arg ID "8df993c1-57d5-46b3-a8a3-d95066934e5b,1d5441ca-5758-474d-a9fc-40d0f68aa538,23cc618a-8ad4-4141-bc1c-0251y0663963" ./by_id.jq
Sure, though you'll need to parse (split) that list of ids into something that jq can work with, such as an array of ids. The problem then becomes: given an array of ids, select the objects whose id is among them. One way to do that:
$ jq --arg ID '8df993c1-57d5-46b3-a8a3-d95066934e5b,1d5441ca-5758-474d-a9fc-40d0f68aa538,23cc618a-8ad4-4141-bc1c-0251y0663963' '
select(.id | IN($ID|split(",")[]))
' ./my_json_file.json
I'm not sure what your input looks like but judging by your use of slurping then filtering the slurped input, it's a stream of objects. The slurping is not necessary here.
Here is an approach that focuses on efficiency.
Your Q indicates that in fact you have a stream of objects, so the first step towards efficiency is to avoid the -s option, and use -n with inputs instead.
The second step is to avoid splitting your comma-separated string of values more than once.
So your script might look like this:
INDEX($ids | splits(","); .) as $dict
| inputs
| select($dict[.id])
And the invocation would look like this:
jq -n --arg ids a,b,c -f by_id.jq my_json_file.json
This of course assumes that simply splitting the string of ids on "," will suffice. You might need to trim the values and take care of other potential anomalies.
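A minimal sketch of that kind of clean-up, assuming the only anomalies are surrounding whitespace and empty items. The sample objects and the id list are invented for illustration:

```shell
# Hypothetical stream of objects and an id list with stray spaces and a trailing comma.
result=$(printf '%s\n' '{"id":"a","n":1}' '{"id":"b","n":2}' '{"id":"c","n":3}' |
  jq -nc --arg ids ' a , c ,' '
    # Split once, strip leading and trailing whitespace, drop empty items, then index.
    INDEX($ids | splits(",") | sub("^\\s+"; "") | sub("\\s+$"; "") | select(length > 0); .) as $dict
    | inputs
    | select($dict[.id])
  ')
printf '%s\n' "$result"
```

For this input the objects with id "a" and "c" should be printed, in input order.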
For efficiency, it would be better to split $ID just once.
So if you have to use the -s option, you could use the following jq program:
INDEX($ID | splits(","); .) as $dict
| .[]
| select($dict[.id])

How to pass a key to a jq file

I would like to write a simple jq file that allows me to count items grouped by a specified key.
I expect the script contents to be something similar to:
group_by($group) | map({group: $group, cnt: length})
and to invoke it something like
cat my.json | jq --from-file count_by.jq --args group .header.messageType
Whatever I've tried the argument always ends up as a string and is not usable as a key.
Since you have not followed the minimal complete verifiable example guidelines, it's a bit difficult to know what the best approach to your problem will be, but whatever approach you take, it is important to bear in mind that --arg always passes in a JSON string. It cannot be used to pass in a jq program fragment unless the fragment is a JSON string.
So let's consider one option: passing in a JSON array representing a path that you can use in your program.
So the invocation could be:
jq -f count_by.jq --argjson group '["header", "messageType"]'
and the program would begin with:
group_by(getpath($group)) | ...
Having your cake ...
If you really want to pass in arguments such as .header.messageType, there is a way: convert the string $group into a jq path:
($group|split(".")|map(select(length>0))) as $path
So your jq filter would look like this:
($group|split(".")|map(select(length>0))) as $path
| group_by(getpath($path)) | map({group: $group, cnt: length})
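A minimal sketch of the path-based filter in action, assuming the input is an array of objects with a nested header.messageType. The sample data is invented, and the value field is an addition to the filter above that reports which key value each group shares:

```shell
# Hypothetical input: an array of messages with a nested messageType.
input='[{"header":{"messageType":"A"}},{"header":{"messageType":"B"}},{"header":{"messageType":"A"}}]'

result=$(printf '%s\n' "$input" | jq -c --arg group '.header.messageType' '
  ($group | split(".") | map(select(length > 0))) as $path
  | group_by(getpath($path))
  # value is the key value shared by the members of each group
  | map({group: $group, value: (.[0] | getpath($path)), cnt: length})
')
printf '%s\n' "$result"
```

For this input the result should contain two groups, "A" with cnt 2 and "B" with cnt 1. Splitting on "." only handles simple key paths, so keys that themselves contain dots, or array indices, would need a different encoding.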
Shell string interpolation
If you want a quick bash solution that comes with many caveats:
group=".header.messageType"
jq 'group_by('"$group"') | map({group: "'"$group"'", cnt: length})'

jq to remove one of the duplicated objects

I have a json file like this:
{"caller_id":"123321","cust_name":"abc"}
{"caller_id":"123443","cust_name":"def"}
{"caller_id":"123321","cust_name":"abc"}
{"caller_id":"234432","cust_name":"ghi"}
{"caller_id":"123321","cust_name":"abc"}
....
I tried:
jq -s 'unique_by(.field1)'
but this removes all the duplicated items. I'm looking to keep just one of each set of duplicated items, to get a file like this:
{"caller_id":"123321","cust_name":"abc"}
{"caller_id":"123443","cust_name":"def"}
{"caller_id":"234432","cust_name":"ghi"}
....
With field1, I doubt you are getting the output you expect, since there is no key with that name: .field1 is null for every object, so unique_by keeps only one object in total. If you simply change your command to jq -s 'unique_by(.caller_id)', it will give you the desired result: an array of objects, sorted by caller_id, containing exactly one object for each caller_id.
NOTE: This is the same as what @Jeff Mercado has explained in the comments.
If the file consists of a sequence (stream) of JSON objects, then a very simple way to produce a stream of the distinct objects would be to use the invocation:
jq -s 'unique[]'
A similar alternative would be:
jq -n '[inputs] | unique[]'
For large files, however, the above will probably be too inefficient, both with respect to RAM and run-time. Note that both unique and unique_by entail a sort.
A far better alternative would be to take advantage of the fact that the input is a stream, and to avoid the built-in unique and unique_by filters. This can be done with the assistance of the following filters, which are not yet built-in but likely to become so:
# emit a dictionary
def set(s): reduce s as $x ({}; .[$x | (type[0:1] + tostring)] = $x);
# distinct entities in the stream s
def distinct(s): set(s)[];
We now have only to add:
distinct(inputs)
to achieve the objective, provided jq is invoked with the -n command-line option.
This approach will also preserve the original ordering.
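Putting the pieces together, a minimal sketch run against a few of the sample lines from the question (here piped in with printf rather than read from a file):

```shell
result=$(printf '%s\n' \
  '{"caller_id":"123321","cust_name":"abc"}' \
  '{"caller_id":"123443","cust_name":"def"}' \
  '{"caller_id":"123321","cust_name":"abc"}' \
  '{"caller_id":"234432","cust_name":"ghi"}' |
  jq -nc '
    # emit a dictionary keyed by type tag plus JSON text
    def set(s): reduce s as $x ({}; .[$x | (type[0:1] + tostring)] = $x);
    # distinct entities in the stream s
    def distinct(s): set(s)[];
    distinct(inputs)
  ')
printf '%s\n' "$result"
```

This should print three objects, one per distinct caller. Because the dictionary key is derived from tostring, two objects with the same content but a different key order would be treated as distinct.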
If the input is an array ...
If the input is an array, then using distinct as defined above still has the advantage of not requiring a sort. For arrays that are too large to fit comfortably in memory, it would be advisable to use jq's streaming parser to create a stream.
One possibility would be to proceed in two steps (jq --stream .... | jq -n ...), but it might be better to do everything in one step (jq -cn --stream ...), using the following "main" program:
distinct(fromstream(inputs
| (.[0] |= .[1:] )
| select(. != [[]])))