How to extract a JSON value substring with jq

I have this json:
{"temperature":"21", "humidity":"12.3", "message":"Today ID 342 is running"}
I want to use jq to obtain this json:
{"temp":"21", "hum":"12.3", "id":"342"}
As you can see, what I want to do is extract the ID number 342 and put it in the new JSON with a different key name. I think I should use a regex, but I don't know how to fit it into jq syntax.
I can create another json using the basic command:
cat old.json | jq '{temp:.temperature,hum:.humidity, id:.message}' > new.json
I know I can select a substring using square brackets, but I don't want to use them because they don't take into account strings with different lengths and structure. I want to use a regex because I know that the ID number always comes after the "ID" part.

You're right that a regex is the way to go here. Fortunately, the jq manual has a large section on using them.
jq '
{
  temp: .temperature,
  hum: .humidity,
  id: (.message | capture("ID (?<id>[[:digit:]]+)").id)
}' <old.json >new.json
You can see this running with your sample data at https://jqplay.org/s/k-ZylbOC6W
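For a quick check, here is the same filter run inline on the sample object (this assumes a jq build with regex support, i.e. compiled with oniguruma, which most packaged builds are):

```shell
echo '{"temperature":"21", "humidity":"12.3", "message":"Today ID 342 is running"}' \
  | jq -c '{temp: .temperature, hum: .humidity, id: (.message | capture("ID (?<id>[[:digit:]]+)").id)}'
# {"temp":"21","hum":"12.3","id":"342"}
```

Note that capture yields the named group as a string, so the id stays quoted like the other values.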

Related

Finding the location (line, column) of a field value in a JSON file

Consider the following JSON file example.json:
{
  "key1": ["arr value 1", "arr value 2", "arr value 3"],
  "key2": {
    "key2_1": ["a1", "a2"],
    "key2_2": {
      "key2_2_1": 1.43123123,
      "key2_2_2": 456.3123,
      "key2_2_3": "string1"
    }
  }
}
The following jq command extracts a value from the above file:
jq ".key2.key2_2.key2_2_1" example.json
Output:
1.43123123
Is there an option in jq that, instead of printing the value itself, prints the location (line and column, start and end position) of the value within a (valid) JSON file, given an Object Identifier-Index (.key2.key2_2.key2_2_1 in the example)?
The output could be something like:
some_utility ".key2.key2_2.key2_2_1" example.json
Output:
(6,25) (6,35)
Given JSON data and a query, there is no option in jq that, instead of printing the value itself, prints the location of possible matches.
This is because JSON parsers providing an interface to developers usually focus on processing the logical structure of a JSON input, not the textual stream conveying it. You would have to instruct it to explicitly treat its input as raw text, while properly parsing it at the same time in order to extract the queried value. In the case of jq, the former can be achieved using the --raw-input (or -R) option, the latter then by parsing the read-in JSON-encoded string using fromjson.
The -R option alone would read the input linewise into an array of strings, which would have to be concatenated (e.g. using add) in order to provide the whole input at once to fromjson. The other way round, you could also provide the --slurp (or -s) option which (in combination with -R) already concatenates the input to a single string which then, after having parsed it with fromjson, would have to be split again into lines (e.g. using /"\n") in order to provide row numbers. I found the latter to be more convenient.
That said, this could give you a starting point (the --raw-output (or -r) option outputs raw text instead of JSON):
jq -Rrs '
"\(fromjson.key2.key2_2.key2_2_1)" as $query # save the query value as string
| ($query | length) as $length # save its length by counting its characters
| ./"\n" | to_entries[] # split into lines and provide 0-based line numbers
| {row: .key, col: .value | indices($query)[]} # find occurrences of the query
| "(\(.row),\(.col)) (\(.row),\(.col + $length))" # format the output
'
(5,24) (5,34)
Demo
Now, this works for the sample query, but how about the general case? Your example queried a number (1.43123123), which is an easy target as it has the same textual representation when encoded as JSON. Therefore, a simple string search and length count did a fairly good job (not a perfect one, because it would still find any occurrence of that character stream, not just "values"). Thus, for more precision, and especially with more complex JSON datatypes being queried, you would need to develop a more sophisticated searching approach, probably involving more JSON conversions, whitespace stripping, and other normalizing shenanigans. So, unless your goal is to rebuild a full JSON parser within another one, you should narrow it down to the kind of queries you expect, and compose an appropriately tailored searching approach. This solution provides you with concepts to simultaneously process the input textually and structurally, and with a simple search and output integration.
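To see the row/column arithmetic in isolation, here is a stripped-down sketch on a made-up two-line input, searching for the literal substring "42" (the `+ 2` is simply the length of the needle, hard-coded for brevity):

```shell
# split raw input into 0-based numbered lines, report where "42" occurs
printf '{"a": 1,\n "b": 42}' | jq -Rrs '
  ./"\n" | to_entries[]
  | {row: .key, col: (.value | indices("42")[])}
  | "(\(.row),\(.col)) (\(.row),\(.col + 2))"'
# (1,6) (1,8)
```

Lines without a match simply produce no output, because `indices` yields an empty array there.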

How do you conditionally change a string value to a number in JQ?

I am pulling a secret from SecretsManager in AWS and using the resulting JSON to build a parameters JSON file that can pass this on to the cloud formation engine. Unfortunately, SecretsManager stores all values as strings, so when I try to pass these values to my cloud formation template it will fail because it is passing a string instead of a number and some cloud formation parameters need to be numbers (e.g. not a string).
In the example below, I want to tell JQ that "HEALTH_CHECK_UNHEALTHY_THRESHOLD_COUNT" and "AUTOSCALING_MAX_CAPACITY" are numbers. So, I prefix the key with "NUMBER::".
This serves two purposes. First, it tells the person viewing this secret that it will be converted to a number, second, it will tell JQ to convert the string value of "2" to 2. This needs to scale so that I can have 1..n keys that need to be converted in the JSON.
Consider this JSON:
{
  "NUMBER::AUTOSCALING_MAX_CAPACITY": "12",
  "SERVICE_PLATFORM_VERSION": "1.3.0",
  "HEALTH_CHECK_PROTOCOL": "HTTPS",
  "NUMBER::HEALTH_CHECK_UNHEALTHY_THRESHOLD_COUNT": "2"
}
Here is what I'd like to do with JQ:
JQ will copy over the key/value pairs for the majority of elements in the JSON "as is". If there is no "NUMBER::" prefix, they are copied over "as is".
However, if a key is prefixed with "NUMBER::" I'd like the following to happen:
a. JQ will remove the "NUMBER::" prefix from the key name.
b. JQ will convert the value from a string to a number.
The end result is a JSON that looks like this:
{
  "AUTOSCALING_MAX_CAPACITY": 12,
  "SERVICE_PLATFORM_VERSION": "1.3.0",
  "HEALTH_CHECK_PROTOCOL": "HTTPS",
  "HEALTH_CHECK_UNHEALTHY_THRESHOLD_COUNT": 2
}
What I've tried
I have tried using map to do this, with limited success. In this example I am looking for a specific key, mainly as a test. I don't want to have to call out specific keys by name, but rather use any key that begins with "NUMBER::" to do the conversions.
NOTE: The SECRET_STRING variable in the examples below contains the source JSON.
echo $SECRET_STRING | jq 'to_entries | map(if .key == "NUMBER::AUTOSCALING_MAX_CAPACITY" then . + {"value":.value} else . end ) | from_entries'
I've also tried to use "tonumber" across the entire JSON, letting JQ examine all the values and convert them to numbers where it can. The problem is that it fails when it hits the "SERVICE_PLATFORM_VERSION" key: it tries to treat "1.3.0" as a number, which of course is bogus.
Example: echo $SECRET_STRING | jq -r '.[] | tonumber'
Recap
I'd like to use JQ to convert JSON string values to number by use a prefix of "NUMBER::" in the key name.
Note: This problem does not exist when pulling entries from the Systems Manager Parameter Store, because AWS allows you to resolve entries as strings or numbers. The same feature does not exist in SecretsManager. I'd also like to use SecretsManager to provide a list of some 30 or more configuration items to set up my stack. With the Parameter Store you have to set up each config item as a separate entry, which would be a maintenance nightmare.
Select each entry with a key starting with NUMBER:: and update it to remove that prefix and convert the value to a number.
with_entries(
  select(.key | startswith("NUMBER::")) |= (
    (.key |= ltrimstr("NUMBER::")) |
    (.value |= tonumber)
  )
)
Online demo
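For reference, here is that filter applied to a two-key subset of the sample above (this relies on jq accepting `select` on the left-hand side of `|=`, which recent jq versions do):

```shell
echo '{"NUMBER::AUTOSCALING_MAX_CAPACITY":"12","SERVICE_PLATFORM_VERSION":"1.3.0"}' \
  | jq -c 'with_entries(
      select(.key | startswith("NUMBER::")) |= (
        (.key |= ltrimstr("NUMBER::")) | (.value |= tonumber)
      )
    )'
# {"AUTOSCALING_MAX_CAPACITY":12,"SERVICE_PLATFORM_VERSION":"1.3.0"}
```

Entries without the prefix pass through untouched, so this scales to any number of NUMBER:: keys.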

Providing a very large argument to a jq command to filter on keys

I am trying to parse a very large file which consists of JSON objects like this:
{"id": "100000002", "title": "some_title", "year": 1988}
Now I also have a very big list of ID's that I want to extract from the file, if they are there.
Now I know that I can do this:
jq '[ .[map(.id)|indices("1", "2")[]] ]' 0.txt > p0.json
Which produces the result I want, namely fills p0.json with only the objects that have "id" 1 and "2". Now comes the problem: my list of ids is very long too (100k or so). So I have a Python program that outputs the relevant ids. My line of thought was to first assign that to a variable:
REL_IDS=`echo python3 rel_ids.py`
And then do:
jq --arg ids "$REL_IDS" '[ .[map(.id)|indices($ids)[]] ]' 0.txt > p0.json
I tried both with brackets [$ids] and without brackets, but no luck so far.
My question is, given a big amount of arguments for the filter, how would I proceed with putting them into my jq command?
Thanks a lot in advance!
Since the list of ids is long, the trick is NOT to use --arg. However, the details will depend on the details regarding the "long list of ids".
In general, though, you'd want to present the list of ids to jq as a file so that you could use --rawfile or --slurpfile or some such.
If for some reason you don't want to bother with an actual file, then provided your shell allows it, you could use these file-oriented options with process substitution: <( ... )
Example
Assuming ids.json contains a listing of the ids as JSON strings:
"1"
"2"
"3"
then one could write:
< objects.json jq -c -n --slurpfile ids ids.json '
inputs | . as $in | select( $ids | index($in.id))'
Notice the use of the -n command-line option.
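A complete, self-contained run of that example might look like this (the file names and contents are made up for the sketch):

```shell
# sample inputs, mirroring the layout described above
printf '"1"\n"3"\n' > ids.json
printf '{"id":"1","title":"a"}\n{"id":"2","title":"b"}\n{"id":"3","title":"c"}\n' > objects.json

< objects.json jq -c -n --slurpfile ids ids.json '
  inputs | . as $in | select($ids | index($in.id))'
# {"id":"1","title":"a"}
# {"id":"3","title":"c"}
```

With process substitution (bash/zsh), ids.json could instead be replaced by something like <(python3 rel_ids.py), avoiding the intermediate file.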

Can jq check each element of a comma-separated array of values to check if the value exists in JSON?

I have a JSON file and I am extracting data from it using jq. One simple use case is pulling out any JSON Object that contains an Id which is provided as an argument.
I use the following simple script to do so:
[.[] | select(.id == $ID)]
The script is stored in a separate file (by_id.jq) which I pass in using the -f argument.
The full command looks something like this:
cat ./my_json_file.json | jq -sf --arg ID "8df993c1-57d5-46b3-a8a3-d95066934e5b" ./by_id.jq
Is there a way, using only jq, to pass a comma-separated list of values as an argument to the jq script, iterate through the ids, and check them against the value of .id in the JSON file, with the result being the objects that have those ids?
For example if I wanted to pull out three objects by their ids I would want to structure the command in this way:
cat ./my_json_file.json | jq -sf --arg ID "8df993c1-57d5-46b3-a8a3-d95066934e5b,1d5441ca-5758-474d-a9fc-40d0f68aa538,23cc618a-8ad4-4141-bc1c-0251y0663963" ./by_id.jq
Sure. Though you'll need to parse (split) that list of ids into something that jq can work with, such as an array of ids. Then your problem becomes: given an array of keys, select objects that have any of these ids. There are several known approaches for that; one follows.
$ jq --arg ID '8df993c1-57d5-46b3-a8a3-d95066934e5b,1d5441ca-5758-474d-a9fc-40d0f68aa538,23cc618a-8ad4-4141-bc1c-0251y0663963' '
select(.id | IN($ID|split(",")[]))
' ./my_json_file.json
I'm not sure what your input looks like but judging by your use of slurping then filtering the slurped input, it's a stream of objects. The slurping is not necessary here.
Here is an approach that focuses on efficiency.
Your Q indicates that in fact you have a stream of objects, so the first step towards efficiency is to avoid the -s option, and use -n with inputs instead.
The second step is to avoid splitting your comma-separated string of values more than once.
So your script might look like this:
INDEX($ids | splits(","); .) as $dict
| inputs
| select($dict[.id])
And the invocation would look like this:
jq -n --arg ids a,b,c -f by_id.jq
This of course assumes that simply splitting the string of ids on "," will suffice. You might need to trim the values and take care of other potential anomalies.
For efficiency, it would be better to split $ID just once.
So if you have to use the -s option, you could use the following jq program:
INDEX($ID | splits(","); .) as $dict
| .[]
| select($dict[.id])
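Here is a minimal run of the streaming variant on invented data, with the program inlined rather than read from by_id.jq (INDEX requires jq 1.6 or later):

```shell
# made-up stream of objects on stdin; ids "a" and "c" requested
printf '{"id":"a","v":1}\n{"id":"b","v":2}\n{"id":"c","v":3}\n' \
  | jq -c -n --arg ids 'a,c' '
      INDEX($ids | splits(","); .) as $dict
      | inputs
      | select($dict[.id])'
# {"id":"a","v":1}
# {"id":"c","v":3}
```

INDEX builds a lookup object ({"a":"a","c":"c"} here) once, so each incoming object costs a single hash lookup instead of a scan over the id list.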

Cannot match integer value using regex - error (at <stdin>:6): number not a string or array

I have multiple JSON files with a person's age, and I want to match specific ages using regex; however, I cannot match even a single integer in a file.
I can select age using following jq,
jq -r .details.Age
I can match Name using following jq,
jq -r 'select(.details.Name | match("r.*"))'
But when I try to use test or match with Age I get following error,
jq -r 'select(.details.Age | match(32))'
jq: error (at <stdin>:6): number not a string or array
Here is code,
{
  "details": {
    "Age": 32,
    "Name": "reverent"
  }
}
I want to be able to match Age using jq something like this,
jq -r 'select(.details.Age | match(\d))'
Your .Age value is a number, but regexes work on strings, so if you really want to use regexes, you would have to transform the number to a string. This can be done using tostring, but please remember that the tostring representation of a JSON number might not always be what you think it will be.
p.s. That should be match("\\d")
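Putting both points together, a sketch using the sample object from the question (the anchored pattern ^3\d$ is just an example age test):

```shell
echo '{"details":{"Age":32,"Name":"reverent"}}' \
  | jq -c 'select(.details.Age | tostring | test("^3\\d$"))'
# {"details":{"Age":32,"Name":"reverent"}}
```

An age that does not match the pattern simply produces no output, since select filters the object out.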