regex-like search in JSON with jq

I have this JSON, and I want to get the id of the subnet that matches the variable subnet.
subnet="192.168.112"
json='{
"subnets": [
{
"cidr": "192.168.112.0/24",
"id": "123"
},
{
"cidr": "10.120.47.0/24",
"id": "456"
}
]
}'
Since regex is not supported by jq, the only way I found to get the right id is to mix grep, sed and jq, like this:
tabNum=$((`echo ${json} | jq ".subnets[].cidr" | grep -n "$subnet" | sed "s/^\([0-9]\+\):.*$/\1/"` - 1))
NET_ID=`echo ${json} | jq -r ".subnets[${tabNum}].id"`
Is there a way to get the id only using jq ?

It's not completely clear to me what your provided script does, but it seems to only look for a string that contains the provided subnet. I would suggest using contains or startswith. A sample script would look like:
echo "${json}" | jq --arg subnet "$subnet" '.subnets[] | select(.cidr | startswith($subnet)).id'
Since you mention regex: the latest release of jq, 1.5, includes regex support (thanks to Jeff Mercado for pointing this out!) and, if you have to deal with string manipulation problems frequently, I'd recommend checking it out.
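To make the regex point concrete, here is a sketch using jq 1.5's test() with an anchored pattern. The dot-escaping via gsub is my own addition, so that a prefix like 192.168.11 cannot accidentally match 192.168.112.0/24:

```shell
# Anchored regex lookup with jq >= 1.5 (test/capture were added in 1.5).
# gsub escapes the dots in $subnet so they match literally in the pattern.
subnet="192.168.112"
json='{"subnets":[{"cidr":"192.168.112.0/24","id":"123"},{"cidr":"10.120.47.0/24","id":"456"}]}'
NET_ID=$(echo "$json" | jq -r --arg subnet "$subnet" '
  ($subnet | gsub("\\."; "\\.")) as $pat
  | .subnets[]
  | select(.cidr | test("^" + $pat + "\\."))
  | .id')
echo "$NET_ID"
```

The trailing `\.` in the pattern requires a dot right after the prefix, which is what makes this stricter than startswith.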

Related

Output paths to all keys named "id" where the type of value is "string"

Given a huge (15GB), deeply nested (12+ object layers) JSON file, how can I find the paths to all the keys named id whose values are of type string?
A massively simplified example file:
{
"a": [
{
"id": 3,
"foo": "red"
}
],
"b": [
{
"id": "7",
"bar": "orange",
"baz": {
"id": 13
},
"bax": {
"id": "12"
}
}
]
}
Looking for a less ugly solution where I don't run out of RAM and have to punt to grep at the end (sigh). (I failed to figure out how to chain to_entries into this usefully. If that's even something I should be trying to do.)
Ugly solution 1:
$ cat huge.json | jq 'path(..|select(type=="string")) | join(".")' | grep -E '\.id"$'
"b.0.id"
"b.0.bax.id"
Ugly solution 2:
$ cat huge.json | jq --stream -c | grep -E '"id"],"'
[["b",0,"id"],"7"]
[["b",0,"bax","id"],"12"]
Something like this should do that.
jq --stream 'select(.[0][-1] == "id" and (.[1] | strings)) | .[0]' file
And by the way, your first ugly solution can be simplified to this:
jq 'path(.. .id? | strings)' file
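As a sanity check, the simplified filter can be run against the question's toy document (inlined here for brevity, obviously not the 15GB file):

```shell
# .id? suppresses errors on values that have no .id field, and `strings`
# keeps only string-valued results, so path() reports exactly those paths.
toy='{"a":[{"id":3,"foo":"red"}],"b":[{"id":"7","baz":{"id":13},"bax":{"id":"12"}}]}'
paths=$(echo "$toy" | jq -c 'path(.. .id? | strings)')
echo "$paths"
```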
Stream the input in, as you started to do with your second solution, but add some filtering. You do not want to read the entire contents into memory. And also... UUOC (useless use of cat).
$ jq --stream '
select(.[0][-1] == "id" and (.[1]|type) == "string")[0]
| join(".")
' huge.json
Thank you both oguz and Jeff! Beautiful! This runs in 6.5 minutes (on my old laptop), never uses more than 21MB of RAM, and gives me exactly what I need. <3
$ jq --stream -c 'select(.[0][-1] == "id" and (.[1]|type) == "string")' huge.json
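For reference, the accepted streaming filter behaves the same way on a toy document (inlined here; --stream requires jq 1.5 or later):

```shell
# Each --stream event is a [path, value] pair (or a one-element [path] for
# closing events), so .[0][-1] is the last path component and .[1] the leaf.
toy='{"a":[{"id":3}],"b":[{"id":"7","bax":{"id":"12"}}]}'
ids=$(echo "$toy" | jq --stream -c 'select(.[0][-1] == "id" and (.[1]|type) == "string")')
echo "$ids"
```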

jq: filter result by value (contains) is very slow

I am trying to use jq to filter a large number of JSON files and extract the ids of each object who belong to a specific domain, as well as the full URL within that domain. Here's a sample of the data:
{
"items": [
{
"completeness": 5,
"dcLanguageLangAware": {
"def": [
"de"
]
},
"edmIsShownBy": [
"https://gallica.example/image/2IC6BQAEGWUEG4OP7AYBDGIGYAX62KZ6H366KXP2IKVAF4LKY37Q/presentation_images/5591be60-01fc-11e6-8e10-fa163e091926/node-3/image/SBB/Berliner_Börsenzeitung/1920/02/27/F_065_098_0/F_SBB_00007_19200227_065_098_0_001/full/full/0/default.jpg"
],
"id": "/9200355/BibliographicResource_3000117730632",
"type": "TEXT",
"ugc": [
false
]
}
]
}
Bigger sample here: https://www.dropbox.com/s/0s0zjtxe01mecjc/AoQhRn%2B56KDm5AJJPwEvOTIwMDUyMC9hcmtfXzEyMTQ4X2JwdDZrMTAyNzY2Nw%3D%3D.json?dl=0
I can extract both ids and URL which contains the string "gallica" using the following command:
jq '[ .items[] | select(.edmIsShownBy[] | contains ("gallica")) | {id: .id, link: .edmIsShownBy[] }]'
However, I have more than 28000 JSON files to process and it is taking a large amount of time (around 1 file per minute). I am processing the files using bash with the command:
find . -name "*.json" -exec cat '{}' ';' | jq '[ .items[] | select(.edmIsShownBy[] | contains ("gallica")) | {id: .id, link: .edmIsShownBy[] }]'
I was wondering if the slowness is due to the instructions given to jq, and if that is the case, is there a faster way to filter on a string contained in a chosen value? Any ideas?
It would probably be wise not to attempt to cat all the files at once; indeed, it would probably be best to avoid cat altogether.
For example, assuming program.jq contains whichever jq program you decide on (and there is nothing wrong with using contains here), you could try:
find . -name "*.json" -exec jq -f program.jq '{}' +
Using + instead of ';' minimizes the number of times jq must be called, though the overhead of invoking jq is actually quite small. If your find does not support + and you wish to avoid calling jq once per file, then consider using xargs, or GNU parallel with the --xargs option.
If you know the JSON files of interest are in the pwd, you could also speed up find by specifying -maxdepth 1.
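To see the batching in action, here is a self-contained sketch; the demo directory, file names, and inlined records are made up for illustration:

```shell
# Build a tiny corpus, store the filter in program.jq, and let find batch
# the files into as few jq invocations as possible with `-exec ... +`.
mkdir -p demo_json
printf '%s' '{"items":[{"id":"/9200355/A","edmIsShownBy":["https://gallica.example/a.jpg"]}]}' > demo_json/one.json
printf '%s' '{"items":[{"id":"/9200355/B","edmIsShownBy":["https://other.example/b.jpg"]}]}' > demo_json/two.json
cat > program.jq <<'EOF'
[ .items[] | select(.edmIsShownBy[] | contains("gallica")) | {id: .id, link: .edmIsShownBy[]} ]
EOF
result=$(find demo_json -name '*.json' -exec jq -c -f program.jq '{}' +)
echo "$result"
```

Only the gallica record survives the filter; the other file yields an empty array.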

Linux CLI - How to get substring from JSON jq + grep?

I need to pull a substring from JSON. In the JSON doc below, I need the end of the value of jq '.[].networkProfile.networkInterfaces[].id' In other words, I need just A10NICvw4konls2vfbw-data to pass to another command. I can't seem to figure out how to pull a substring using grep. I've seen regex examples out there but haven't been successful with them.
[
{
"id": "/subscriptions/blah/resourceGroups/IPv6v2/providers/Microsoft.Compute/virtualMachines/A10VNAvw4konls2vfbw",
"instanceView": null,
"licenseType": null,
"location": "centralus",
"name": "A10VNAvw4konls2vfbw",
"networkProfile": {
"networkInterfaces": [
{
"id": "/subscriptions/blah/resourceGroups/IPv6v2/providers/Microsoft.Network/networkInterfaces/A10NICvw4konls2vfbw-data",
"resourceGroup": "IPv6v2"
}
]
}
}
]
In your case, sub(".*/";"") will do the trick, since .* is greedy:
.[].networkProfile.networkInterfaces[].id | sub(".*/";"")
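A runnable sketch of this approach (sub with a regex needs jq 1.5+; the JSON here is trimmed to the relevant field):

```shell
# sub(".*/";"") deletes everything up to and including the last slash,
# because the greedy .* consumes as many characters as it can.
json='[{"networkProfile":{"networkInterfaces":[{"id":"/subscriptions/blah/resourceGroups/IPv6v2/providers/Microsoft.Network/networkInterfaces/A10NICvw4konls2vfbw-data"}]}}]'
name=$(echo "$json" | jq -r '.[].networkProfile.networkInterfaces[].id | sub(".*/";"")')
echo "$name"
```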
Try this:
jq -r '.[]|.networkProfile.networkInterfaces[].id | split("/") | last'
The -r tells jq to print the output in "raw" form - in this case, that means no double-quotes around the string value.
As for the jq expression, after you access the id you want, piping it (still inside jq) through split("/") turns it into an array of the parts between slashes. Piping that through the last function (thanks, @Thor) returns just the last element of the array.
If you want to do it with grep here is one way:
jq -r '.[].networkProfile.networkInterfaces[].id' | grep -o '[^/]*$'
Output:
A10NICvw4konls2vfbw-data

Replace tags in text file using key-value pairs from JSON file

I am trying to write a shell script that can read a json string, decode it to an array and foreach through the array and use the key/value for replacing strings in another file.
If this were PHP, then I would write something like this.
$array = json_decode($jsonString, true);
foreach($array as $key => $value)
{
str_replace($key, $value, $rawString);
}
I need this to be converted to Bash script.
Here is the example JSON string.
{
"login": "lambda",
"id": 37398,
"avatar_url": "https://avatars.githubusercontent.com/u/37398?v=3",
"gravatar_id": "",
"url": "https://api.github.com/users/lambda",
"html_url": "https://github.com/lambda",
"followers_url": "https://api.github.com/users/lambda/followers",
"following_url": "https://api.github.com/users/lambda/following{/other_user}",
"gists_url": "https://api.github.com/users/lambda/gists{/gist_id}",
"starred_url": "https://api.github.com/users/lambda/starred{/owner}{/repo}",
"subscriptions_url": "https://api.github.com/users/lambda/subscriptions",
"organizations_url": "https://api.github.com/users/lambda/orgs",
"repos_url": "https://api.github.com/users/lambda/repos",
"events_url": "https://api.github.com/users/lambda/events{/privacy}",
"received_events_url": "https://api.github.com/users/lambda/received_events",
"type": "User",
"site_admin": false,
"name": "Brian Campbell",
"company": null,
"blog": null,
"location": null,
"email": null,
"hireable": null,
"bio": null,
"public_repos": 27,
"public_gists": 23,
"followers": 8,
"following": 2,
"created_at": "2008-11-30T21:03:27Z",
"updated_at": "2016-12-21T23:53:11Z"
}
I've this file,
Lamba login name is %login%, and avatar url is %avatar_url%
I am using jq
jq -c '.[]' /tmp/json | while read i; do
echo $i
done
This outputs only the value part. How do I loop through key and also get value?
Also, I've found that the keys of the json string can be returned using
jq 'keys' /tmp/params
However, I am still trying to figure out how to loop through the key and return the data.
The whole thing can be done quite simply (and very efficiently) in jq.
For the sake of illustration, suppose we have defined dictionary to be the dictionary object given in the question, and template to be the template string:
def dictionary: { ...... };
def template:
"Lamba login name is %login%, and avatar url is %avatar_url%";
Then the required interpolation can be performed as follows:
dictionary
| reduce to_entries[] as $pair (template; gsub("%\($pair.key)%"; $pair.value))
The above produces:
"Lamba login name is lambda, and avatar url is https://avatars.githubusercontent.com/u/37398?v=3"
There are of course many other ways in which the dictionary and template string can be presented.
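Here is the same idea as a runnable command. Note that I added tostring, which the answer above does not show: gsub's replacement must be a string, and the question's dictionary also contains numbers and nulls:

```shell
# Interpolate %key% tags from a JSON dictionary into a template string.
# tostring guards against non-string values such as "id": 37398.
dict='{"login":"lambda","id":37398,"avatar_url":"https://avatars.githubusercontent.com/u/37398?v=3"}'
template='Lamba login name is %login%, and avatar url is %avatar_url%'
out=$(echo "$dict" | jq -r --arg template "$template" '
  reduce to_entries[] as $pair ($template;
    gsub("%\($pair.key)%"; $pair.value | tostring))')
echo "$out"
```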
I'm assuming your JSON is in infile.json and the text with the tags to be replaced in infile.txt.
Here is an entirely unreadable one-liner that does it:
$ sed -f <(jq -r 'to_entries[] | [.key, .value] | @tsv' < infile.json | sed 's~^~s|%~;s~\t~%|~;s~$~|g~') infile.txt
Lamba login name is lambda, and avatar url is https://avatars.githubusercontent.com/u/37398?v=3
Now, to decipher what this does. First, a few linebreaks for readability:
sed -f <(
jq -r '
to_entries[] |
[.key, .value] |
@tsv
' < infile.json |
sed '
s~^~s|%~
s~\t~%|~
s~$~|g~
'
) infile.txt
We're basically using a sed command that takes its instructions from a file; instead of an actual file, we use process substitution to generate the sed commands:
jq -r 'to_entries[] | [.key, .value] | @tsv' < infile.json |
sed 's~^~s|%~;s~\t~%|~;s~$~|g~'
Some processing with jq, followed by some sed substitutions.
This is what the jq command does:
Generate raw output (no quotes, actual tabs instead of \t) with the -r option
Turn the input JSON object into an array of key-value pairs with the to_entries function, resulting in
[
{
"key": "login",
"value": "lambda"
},
{
"key": "id",
"value": 37398
},
...
]
Get all elements of the array with []:
{
"key": "login",
"value": "lambda"
}
{
"key": "id",
"value": 37398
}
...
Get a list of arrays with key/value in each using [.key, .value], resulting in
[
"login",
"lambda"
]
[
"id",
37398
]
...
Finally, use the @tsv filter to get the key-value pairs as a tab separated list:
login lambda
id 37398
...
Now, we pipe this to sed, which performs three substitutions:
s~^~s|%~ – add s|% to the beginning of each line
s~\t~%|~ – replace the tab with %|
s~$~|g~ – add |g to the end of each line
This gives us a sed file that looks as follows:
s|%login%|lambda|g
s|%id%|37398|g
s|%avatar_url%|https://avatars.githubusercontent.com/u/37398?v=3|g
Notice that for these substitutions, we used ~ as the delimiter, and for the substitution commands we generated, we used | – mostly to avoid running into problems with strings containing /.
If this sed file were stored as commands.sed, the overall command would correspond to
sed -f commands.sed infile.txt
Remarks
If your shell doesn't support process substitution, you could make sed read from standard input instead, using sed -f -:
jq -r 'to_entries[] | [.key, .value] | @tsv' < infile.json |
sed 's~^~s|%~;s~\t~%|~;s~$~|g~' |
sed -f - infile.txt
If infile.json contained | or ~, you would have to choose different delimiters for the sed substitutions (see for example this answer about using a non-printable character as a delimiter) or even perform additional substitutions to get rid of the delimiting characters first and put them back in at the end (see this and this Q&A).
Some seds (such as BSD sed found in MacOS) have trouble with \t used in the pattern to substitute. If that is the case, the command s~\t~%|~ has to be replaced by s~'$'\t''~%|~ to "splice in" the tab character, or (if the shell doesn't support ANSI-C quoting) even with s~'"$(printf '\t')"'~%|~.
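For reference, the whole pipeline can be exercised end to end using the `sed -f -` variant from the remarks (the input files are recreated inline here; this sketch assumes GNU sed, for the \t in the pattern):

```shell
# Recreate the two input files, generate the sed script from the JSON,
# and apply it to the template via standard input (sed -f -).
printf '%s' '{"login":"lambda","avatar_url":"https://avatars.githubusercontent.com/u/37398?v=3"}' > infile.json
printf 'Lamba login name is %%login%%, and avatar url is %%avatar_url%%\n' > infile.txt
out=$(jq -r 'to_entries[] | [.key, .value] | @tsv' < infile.json \
      | sed 's~^~s|%~;s~\t~%|~;s~$~|g~' \
      | sed -f - infile.txt)
echo "$out"
```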
Here's a simple sed solution. Assume that the json Object is in x.json and the file where the replacements should be done in f.txt.
The following x.sed - Programm called as
sed -n -f x.sed x.json <(echo FILE_DELIM) f.txt
does the job.
x.sed:
1,$H
$ {
x
:b
s/\("\([^"]\+\)" *: *\(\("\([^"]*\)"\)\|\(\(\w\|\.\)\+\)\).*FILE_DELIM.*\)%\2%\(.*\)/\1\3\8/
tb
s/.*FILE_DELIM\n//
p
}
The trick is to save the two files (separated by the string FILE_DELIM) in one line in sed's hold space and then recursively replace the keys (e.g. %login%) by their values behind the FILE_DELIM.
The crucial point is to define the pattern which matches a key value pair in the json object. Here I used:
" followed by non-" characters, followed by ", followed by blanks, followed by a colon (*1), followed by blanks, followed by (again a quoted string, or a string consisting of word characters or .) (*2)
The backreference \2 in the search pattern matches the key and is replaced with \3 which matches the value.
*1): Up to here this matches a key like "login"
*2): The values are allowed to be "xyz", "", abc, 0.1, ...

Exclude column from jq json output

I would like to get rid of the timestamp field here using jq JSON processor.
[
{
"timestamp": 1448369447295,
"group": "employees",
"uid": "elgalu"
},
{
"timestamp": 1448369447296,
"group": "employees",
"uid": "mike"
},
{
"timestamp": 1448369786667,
"group": "services",
"uid": "pacts"
}
]
Whitelisting would also work for me, i.e. selecting uid and group.
Ultimately what I would really like is a list with unique values like this:
employees,elgalu
employees,mike
services,pacts
If you just want to delete the timestamps you can use the del() function:
jq 'del(.[].timestamp)' input.json
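For instance, on a trimmed version of the input (inlined here for brevity), del() drops the field and leaves the rest of each object intact:

```shell
# del() removes the listed paths; the remaining keys keep their order.
json='[{"timestamp":1448369447295,"group":"employees","uid":"elgalu"},{"timestamp":1448369786667,"group":"services","uid":"pacts"}]'
stripped=$(echo "$json" | jq -c 'del(.[].timestamp)')
echo "$stripped"
```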
However to achieve the desired output, I would not use the del() function. Since you know which fields should appear in output, you can simply populate an array with group and id and then use the join() function:
jq -r '.[]|[.group,.uid]|join(",")' input.json
-r stands for raw output. jq will not print quotes around the values.
Output:
employees,elgalu
employees,mike
services,pacts
For the record, an alternative would be:
$ jq -r '.[] | "\(.uid),\(.group)"' input.json
(The white-listing approach makes it easy to rearrange the order, and this variant makes it easy to modify the spacing, etc.)
The following example may be of interest to anyone who wants safe CSV (i.e. even if the values have embedded commas or newline characters):
$ jq -r '.[] | [.uid, .group] | @csv' input.json
"elgalu","employees"
"mike","employees"
"pacts","services"
sed is your best friend - I can't think of anything simpler. I got here with the same problem as the question's author, but maybe this is a simpler answer to the same problem:
< file sed -e '/timestamp/d'
Note that this is purely line-based: it relies on pretty-printed input with one field per line, and it deletes every line containing "timestamp" anywhere, so it is a quick hack rather than a general solution.