Get Substring from Json Value using Grep - json

I have a JSON file like this:
{
"fruits": [
{
"name": "apple",
"type": "gala",
"product_id": "PA:app-1d39gsg",
"in_stock": "Y"
}
]
}
From this file, I want to extract only app-1d39gsg using grep.
I tried
grep -o 'app-*' test.json
but it doesn't work.
How do I get that substring from json data?
I would appreciate your advice.

I assume that your string always starts with PA:.
jq -r '.fruits[].product_id | sub("^PA:";"")' file.json
Output:
app-1d39gsg

Do you need to use only grep or can you use other common Linux tools as well?
Here's a suggestion using grep + sed:
grep -o 'app-\S*' test.json | sed 's/..$//'
Output:
app-1d39gsg
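Explanation:
grep -o prints only the matched part: "app-" followed by any run of non-whitespace characters, which here includes the trailing ",
sed removes the last two characters (the closing quote and the comma)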
Note: This assumes you're using an implementation of grep which supports flag -o, like GNU grep does. You should be able to check your version with grep --version.
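As for why the original attempt fails: in a regular expression, * repeats the preceding character, so 'app-*' means "app" followed by zero or more hyphens, and grep -o prints only those fragments. Below is a minimal grep-only sketch, assuming the value never contains an embedded double quote. The temporary file is a stand-in for test.json.

```shell
# Recreate the sample from the question in a temporary file (stand-in for test.json).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
{
  "fruits": [
    {
      "name": "apple",
      "type": "gala",
      "product_id": "PA:app-1d39gsg",
      "in_stock": "Y"
    }
  ]
}
EOF

# 'app-*' matches "app" plus zero or more "-" characters,
# so it prints "app" (from "apple") and "app-" (from the product id).
attempt=$(grep -o 'app-*' "$tmp")

# 'app-[^"]*' matches "app-" plus everything up to, but not including, the closing quote.
result=$(grep -o 'app-[^"]*' "$tmp")

printf 'attempt:\n%s\nresult:\n%s\n' "$attempt" "$result"
rm -f "$tmp"
```

With the bracket expression, no second sed step is needed to trim the trailing quote and comma.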

Related

Parsing json output using jq -jr

I am running a Puppet Bolt command to query certain information from a set of servers in JSON format, and I am piping it to jq. Below is what I get:
$ bolt command run "cat /blah/blah" -n @hname.txt -u uid --no-host-key-check --format json |jq -jr '.items[]|[.node],[.result.stdout]'
[
"node-name"
][
"stdout data\n"
]
What do I need to do to make it appear like below?
["nodename":"stdout data"]
If you really want output that is not valid JSON, you will have to construct the output string, which can easily be done using string interpolation, e.g.:
jq -r '.items[] | "[\"\(.node)\",\"\(.result.stdout)\"]"'
@peak thank you, that helped. Below is how it looks:
$ bolt command run "cat /blah/blah" -n @hname.txt -u UID --no-host-key-check --format json |jq -r '.items[] | "[\"\(.node)\",\"\(.result.stdout)\"]"'
["node name","stdout data
"]
I used a workaround to get the data I needed by appending @csv to the jq filter itself. Sharing below what worked:
$ bolt command run "cat /blah/blah" -n @hname.txt -u uid --no-host-key-check --format json |jq -jr '.items[]|[.node],[.result.stdout]|@csv'
""node-name""stdout.data
"

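The stray line break in the outputs above comes from the trailing newline inside .result.stdout. Below is a minimal sketch that strips it with jq's rtrimstr before building either the custom string or a CSV row. The JSON is a made-up stand-in for the bolt output, reduced to the two fields used in the question.

```shell
# Hypothetical sample shaped like the bolt output in the question.
sample='{"items":[{"node":"node-name","result":{"stdout":"stdout data\n"}}]}'

# Strip the trailing newline, then build the custom (non-JSON) string.
custom=$(printf '%s\n' "$sample" |
  jq -r '.items[] | (.result.stdout | rtrimstr("\n")) as $out | "[\"\(.node)\":\"\($out)\"]"')

# Same idea, but emit one CSV row per item with @csv.
csv=$(printf '%s\n' "$sample" |
  jq -r '.items[] | [.node, (.result.stdout | rtrimstr("\n"))] | @csv')

printf '%s\n%s\n' "$custom" "$csv"
```

Putting both values into a single array before @csv is what yields one comma-separated row per node.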
Grep single value after match

I have a file containing:
{"id":1,"jsonrpc":"2.0","result":{"speed":0}}
How can I grep the 0 that follows "speed":?
I have tried 'grep -o -P "speed":{1}', but that is not what I am looking for.
You should use jq (sudo apt-get install jq on raspbian) for this task.
echo '{"id":1,"jsonrpc":"2.0","result":{"speed":0}}' | jq .result.speed
Result: 0
Since you said in your question that you have a file "containing" this line, you might want to use grep first to get only the line you're interested in, otherwise jq might throw an error.
Example file:
abc
{"id":1,"jsonrpc":"2.0","result":{"speed":0}}
123
Running grep "speed" yourfile.txt | jq .result.speed would output 0.
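If jq is not available, a grep-only sketch along the lines the question asks for could look like this. It assumes the value is a non-negative integer and that "speed" appears only once in the file. The temporary file mirrors the example file above.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
abc
{"id":1,"jsonrpc":"2.0","result":{"speed":0}}
123
EOF

# grep -o keeps only the key plus its digits, i.e. "speed":0
# cut then keeps what follows the colon.
speed=$(grep -o '"speed":[0-9]*' "$tmp" | cut -d: -f2)

printf '%s\n' "$speed"
rm -f "$tmp"
```

Unlike jq, this does not validate the JSON, so it is only suitable when the format is known and stable.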

Using awk to extract a token from a larger JSON string

I have a string assigned to a variable:
#!/bin/bash
fullToken='{"type":"APP","token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs"}'
I need to extract only l0ng_Str1ng.of.d1fF3erent_charAct3rs without quotes and assign that to another variable.
I understand I can use awk, sed, or cut but I am having trouble getting around the special characters in the original string.
Thanks in advance!
EDIT: I was not awake. I should have specified that this is JSON. Thanks for the replies so far.
EDIT2: I am using BSD (macOS)
It looks like you have a JSON string there. Keep in mind that the keys of a JSON object are unordered, so most sed, awk, and cut solutions will fail if your string arrives with the keys in a different order next time.
It is most robust to use a JSON parser.
You could use ruby with its JSON parser library:
$ echo "$fullToken" | ruby -r json -e 'p JSON.parse($<.read)["token"];'
"l0ng_Str1ng.of.d1fF3erent_charAct3rs"
Or, if you don't want the quoted string (which is useful for Bash):
$ echo "$fullToken" | ruby -r json -e 'puts JSON.parse($<.read)["token"];'
l0ng_Str1ng.of.d1fF3erent_charAct3rs
Or with jq:
$ echo "$fullToken" | jq '.token'
"l0ng_Str1ng.of.d1fF3erent_charAct3rs"
All these solutions will work even if the JSON string is in a different order:
$ echo '{"type":"APP","token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs"}' | jq '.token'
"l0ng_Str1ng.of.d1fF3erent_charAct3rs"
$ echo '{"token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs", "type":"APP"}' | jq '.token'
"l0ng_Str1ng.of.d1fF3erent_charAct3rs"
But KNOWING that you SHOULD use a JSON parser, you can also use a PCRE with a lookbehind in GNU grep (the opening quote goes inside the lookbehind so it is not printed):
$ echo "$fullToken" | grep -oP '(?<="token":")[^"]*'
Or in Perl:
$ echo "$fullToken" | perl -lane 'print $1 if /(?<="token":)"([^"]*)/'
Both of those also work if the string is in a different order.
Or, with POSIX awk:
$ echo "$fullToken" | awk -F"[,:}]" '{for(i=1;i<=NF;i++){if($i~/"token"/){v=$(i+1); gsub(/"/,"",v); print v}}}'
Or, with POSIX sed, you can do:
$ echo "$fullToken" | sed -E 's/.*"token":"([^"]*).*/\1/'
These solutions are presented from the most robust (use a JSON parser) to the most fragile (sed). Even so, the sed solution above is better than most, because it still works when the key-value pairs in the JSON string appear in a different order.
PS: If you want to remove the quotes from a line, that is a great job for sed:
$ echo '"quoted string"'
"quoted string"
$ echo '"quoted string"' | sed -E 's/^"(.*)"$/UN\1/'
UNquoted string
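The question also asks to store the value in another variable. Here is a sketch that wraps the sed command above in command substitution. The function name extract_token and the variables token, token_reordered, and reordered are illustrative, not from the original.

```shell
fullToken='{"type":"APP","token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs"}'
# Same object with the keys swapped, to check that key order does not matter here.
reordered='{"token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs","type":"APP"}'

extract_token() {
  # Capture everything between "token":" and the next double quote.
  printf '%s\n' "$1" | sed -E 's/.*"token":"([^"]*).*/\1/'
}

token=$(extract_token "$fullToken")
token_reordered=$(extract_token "$reordered")

printf '%s\n%s\n' "$token" "$token_reordered"
```

Both variables end up holding the unquoted token, since the pattern is anchored on the key name rather than on position.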
In awk:
$ awk -v f="$fullToken" '
BEGIN {
    while (match(f, /[^:{},]+:[^:{},]+/)) {  # search key:value pairs
        p = substr(f, RSTART, RLENGTH)       # set pair to p
        f = substr(f, RSTART + RLENGTH)      # remove p from f
        split(p, a, ":")                     # split to get key and value
        for (i in a)                         # remove leading and trailing "
            gsub(/^"|"$/, "", a[i])
        if (a[1] == "token") {               # if key is token
            print a[2]                       # output value
            exit                             # no need to process further
        }
    }
}'
l0ng_Str1ng.of.d1fF3erent_charAct3rs
Note that for this to work, the token value can't contain any of the characters : { } or a comma.
GNU sed:
fullToken='{"type":"APP","token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs"}'
echo "$fullToken"|sed -r 's/.*"(.*)".*/\1/'
A grep method would be:
$ grep -oP '[^"]+(?="[^"]+$)' <<< "$fullToken"
l0ng_Str1ng.of.d1fF3erent_charAct3rs
Brief explanation:
[^"]+ : matches a run of characters that are not "
(?="[^"]+$) : lookahead that requires the match to be followed by the last " on the line
You may also use sed to do that:
$ sed -E 's/.*"([^"]+)"[^"]+$/\1/' <<< "$fullToken"
l0ng_Str1ng.of.d1fF3erent_charAct3rs
If the source of your string is JSON, then you should use JSON-specific tools. If not, then consider:
Using awk
$ fullToken='{"type":"APP","token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs"}'
$ echo "$fullToken" | awk -F'"' '{print $8}'
l0ng_Str1ng.of.d1fF3erent_charAct3rs
Using cut
$ echo "$fullToken" | cut -d'"' -f8
l0ng_Str1ng.of.d1fF3erent_charAct3rs
Using sed
$ echo "$fullToken" | sed -E 's/.*"([^"]*)"[^"]*$/\1/'
l0ng_Str1ng.of.d1fF3erent_charAct3rs
Using bash and one of the above
The above all work with POSIX shells. If the shell is bash, then we can use a here-string and eliminate the pipeline. Taking cut as the example:
$ cut -d'"' -f8 <<<"$fullToken"
l0ng_Str1ng.of.d1fF3erent_charAct3rs
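To see why field 8 is the token, and why this approach depends on key order, it helps to print every field produced by splitting on the double quote. This is a small sketch; the reordered string is an illustrative variant, not from the question.

```shell
fullToken='{"type":"APP","token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs"}'
reordered='{"token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs","type":"APP"}'

# Print each quote-delimited field with its index.
printf '%s\n' "$fullToken" |
  awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }'

# Field 8 is the token only while "token" is the second key.
first=$(printf '%s\n' "$fullToken" | awk -F'"' '{ print $8 }')
second=$(printf '%s\n' "$reordered" | awk -F'"' '{ print $8 }')

printf 'original order: %s\nreordered keys: %s\n' "$first" "$second"
```

With the keys swapped, field 8 becomes APP, which is why the positional awk and cut variants are only safe for non-JSON input with a fixed layout.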

What is the correct syntax for jq?

Is there command-line documentation for jq? I am currently running this command:
%jq% -f JSON.txt -r ".sm_api_content"
It is supposed to read from JSON.txt and to output the value of sm_api_content (which is a string).
But I am getting this error:
jq: error: Could not open file .sm_api_content: No such file or directory
Can anyone help me out here?
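-f specifies a file to read your "filter" from. With -f JSON.txt, jq reads the filter from JSON.txt and then treats .sm_api_content as an input file, hence the error. The filter in this case is .sm_api_content.
It sounds as if you just want to run jq without -f, e.g.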
jq -r .sm_api_content JSON.txt
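For comparison, here is a sketch of both invocations: the filter given inline, and the same filter read from a file with -f. The file name filter.jq and the sample content are made up for illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' '{"sm_api_content":"some text"}' > "$workdir/JSON.txt"

# Filter passed on the command line; the input file comes last.
inline=$(jq -r '.sm_api_content' "$workdir/JSON.txt")

# Filter stored in a file and loaded with -f; the input file still comes last.
printf '%s\n' '.sm_api_content' > "$workdir/filter.jq"
from_file=$(jq -r -f "$workdir/filter.jq" "$workdir/JSON.txt")

printf '%s\n%s\n' "$inline" "$from_file"
rm -rf "$workdir"
```

Both commands print the same string; -f only changes where jq looks for the filter, not which file is the input.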

Store Codeship build ID in variable using GREP

I am trying to use grep to search JSON output. I used a curl command to return the data for a particular Codeship build, and I want to use grep to store the build ID in a variable. However, after I run the command and echo the variable, it is blank.
Below are the commands:
export API_KEY=abc123
export PROJECT_ID=123456
export LAST_BUILD_ID=$(curl -s https://codeship.com/api/v1/projects/$PROJECT_ID.json?api_key=$API_KEY | grep -Eo '"builds":\[{"id":\d+' | grep -Eo --color=never '\d+' | tail -1)
export LAST_BUILD_URL=$(echo "https://codeship.com/api/v1/builds/$LAST_BUILD_ID/restart.json?api_key=$API_KEY")
My response: never use grep or regular expressions to parse JSON.
Instead, use a proper JSON parser.
In the shell, take a look at jq.
Example (adapt it a bit):
#!/bin/bash
API_KEY=abc123
PROJECT_ID=123456
html=$(curl -s https://codeship.com/api/v1/projects/$PROJECT_ID.json?api_key=$API_KEY)
LAST_BUILD_ID=$(jq '.builds | .[] | .id' <<< "$html") # just guessing at the field name
LAST_BUILD_URL=$(echo "https://codeship.com/api/v1/builds/$LAST_BUILD_ID/restart.json?api_key=$API_KEY")
Note
If you provide the JSON, I will be able to be more specific with the jq command
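Independently of the parser advice, the likely reason the original variable came out empty is that \d is a Perl-style shorthand, which grep -E (POSIX extended regular expressions) does not interpret as a digit class. Below is a sketch using [0-9] instead, run against a made-up response. The real Codeship JSON is not shown in the question, so the sample structure is an assumption.

```shell
# Hypothetical response; only the "builds":[{"id":... shape is taken from the question's regex.
response='{"id":123456,"builds":[{"id":987654,"status":"success"}]}'

# First grep keeps the key prefix plus the digits, second grep keeps only
# the digits, and tail -1 picks the last match, as in the original pipeline.
LAST_BUILD_ID=$(printf '%s\n' "$response" |
  grep -Eo '"builds":\[\{"id":[0-9]+' |
  grep -Eo '[0-9]+' |
  tail -1)

printf '%s\n' "$LAST_BUILD_ID"
```

This still breaks if the API adds whitespace or reorders keys, so the jq approach above remains the safer choice.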