I am new to bash scripting.
I am getting a JSON response and need only one property from it. I want to save that property to a variable, but it is not working:
token=$result |sed -n -e 's/^.*access_token":"//p' | cut -d'"' -f1
echo $token
It returns a blank line.
I cannot use jq or any third party tools.
Please let me know what I am missing.
Your command should be:
token=$(echo "$result" | sed -n -e 's/^.*access_token":"//p' | cut -d'"' -f1)
You need to use echo to print the contents of the variable over standard output, and you need to use a command substitution $( ) to assign the output of the pipeline to token.
Quoting your variables is always encouraged, to avoid problems with white space and glob characters like *.
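As a quick check, here is a minimal sketch of the corrected assignment. The sample response in result is made up for illustration; your real value comes from whatever call produced the JSON.

```shell
# Hypothetical sample response; the real $result comes from your own request.
result='{"access_token":"abc123","token_type":"bearer"}'

# echo writes the variable to standard output; $( ) captures what the pipeline prints.
token=$(echo "$result" | sed -n -e 's/^.*access_token":"//p' | cut -d'"' -f1)

echo "$token"
```

With the quotes around $result and $token, spaces or a * inside the response are passed through untouched.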
As an aside, note that you can probably obtain the output using something like:
token=$(jq -r .access_token <<<"$result")
I know you've said that you can't use jq but it's a standalone binary (no need to install it) and treats your JSON in the correct way, not as arbitrary text.
Give this a try:
token="$(sed -E -n -e 's/^.*access_token": ?"//p' <<<"$result" | cut -d'"' -f1)"
Explanation:
token="$( script here )" means that $token is set to the output of the commands run inside the subshell, through a process known as command substitution.
-E in sed allows Extended Regular Expressions. We want this because JSON generally contains a space after the : and before the next ". We use the ? after the space to tell sed that the space may or may not be present.
<<<"$result" is a herestring that feeds the data into sed as stdin in place of a file.
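To see the optional space at work, here is a small sketch that runs the same command against two made-up responses, one compact and one with a space after the colon. The variable names are only for illustration, and the herestring needs bash.

```shell
#!/usr/bin/env bash
# Hypothetical responses: one compact, one with a space after the colon.
compact='{"access_token":"abc123"}'
spaced='{"access_token": "abc123"}'

# The " ?" in the pattern lets the same expression handle both layouts.
token_compact="$(sed -E -n -e 's/^.*access_token": ?"//p' <<<"$compact" | cut -d'"' -f1)"
token_spaced="$(sed -E -n -e 's/^.*access_token": ?"//p' <<<"$spaced" | cut -d'"' -f1)"

echo "$token_compact"
echo "$token_spaced"
```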
Related
I have a JSON string and need to extract the values inside the square brackets with a bash script and validate them against the expected values. If an expected value exists, leave it as it is; otherwise add the missing value inside the square brackets.
"hosts": ["unix://","tcp://0.0.0.0:2376"]
I cannot use jq.
Expected:
Verify that the values "unix://" and "tcp://0.0.0.0:2376" exist for the key "hosts". Add them if they do not exist.
I tried using like below,
$echo "\"hosts\":[\"unix://\",\"tcp://0.0.0.0:2376\"]" | cut -d: -f2
["unix
$echo "\"hosts\":[\"unix://\",\"tcp://0.0.0.0:2376\"]" | sed 's/:.*//'
"hosts"
I have tried multiple possibilities with sed & cut but cannot achieve what I expect. I'm a shell script beginner.
How can I achieve this with sed or cut ?
You need to detect the presence of "unix://" and "tcp://0.0.0.0:2376" in your string. You can do it like this:
#!/bin/bash
#
string='"hosts": ["unix://","tcp://0.0.0.0:2376"]'
check1=$(echo "$string" | grep -c "unix://")
check2=$(echo "$string" | grep -c "tcp://0.0.0.0:2376")
(( total = check1 + check2 ))
if [[ "$total" -eq 2 ]]
then
echo "they are both in, nothing to do"
else
echo "they are NOT both there, fix variable string"
string='"hosts": ["unix://","tcp://0.0.0.0:2376"]'
fi
grep -c counts how many lines contain the given string. Here the input is a single line, so each check returns 0 or 1, and adding them together produces 0, 1 or 2. Only when the total equals 2 is the string correct.
cut will extract some string based on a certain delimiter. But it is not typically used to verify if a string is in there, grep does that.
sed has many uses, such as replacing text (with 's///'). But again, grep is the tool that was built to detect strings in other strings (or files).
Now when it comes to adding text, you say that if one of "unix://" or "tcp://0.0.0.0:2376" is missing, add it. Well that comes back to redefining the whole string with the correct values, so just assign it.
Finally, if you think about it, you want to ensure that string is "hosts": ["unix://","tcp://0.0.0.0:2376"]. So there is no need to verify anything: just hardcode it at the start of your script. The end result will be the same.
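If you do want to know which value is missing, the grep check can be wrapped in a small function. This is only a sketch: has_host is a made-up helper name, and it uses grep -F so that the dots in the address are matched literally rather than as regex wildcards.

```shell
# has_host is a hypothetical helper: $1 is the JSON text, $2 the value to look for.
# -F matches the text literally, -q suppresses output and only sets the exit status.
has_host() {
    printf '%s\n' "$1" | grep -qF "\"$2\""
}

string='"hosts": ["unix://"]'
for wanted in 'unix://' 'tcp://0.0.0.0:2376'; do
    if has_host "$string" "$wanted"; then
        echo "$wanted present"
    else
        echo "$wanted missing"
    fi
done
```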
Part 2
If you MUST use cut, you could:
#!/bin/bash
#
string='"hosts": ["unix://","tcp://0.0.0.0:2376"]'
firstelement=$(echo "$string" | cut -d',' -f1 | cut -d'"' -f4)
echo "$firstelement"
# will display unix://
secondelement=$(echo "$string" | cut -d',' -f2 | cut -d'"' -f2)
echo "$secondelement"
# will display tcp://0.0.0.0:2376
Then you can use if statements to compare them to your desired values. But note that this approach fails if there are fewer than 2 elements between the [ ]. For example, ["unix://"] contains no ',' character, so cut -d',' returns the whole line unchanged and the second cut picks up the wrong field.
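Here is a short sketch of that failure mode with a single-element list. The variable names naive and strict are just for illustration; the -s option tells cut to drop lines that do not contain the delimiter.

```shell
string='"hosts": ["unix://"]'   # only one element, so there is no comma

# Without -s, cut passes a line that lacks the delimiter through unchanged.
naive=$(echo "$string" | cut -d',' -f2 | cut -d'"' -f2)
# With -s, cut suppresses lines that do not contain the delimiter.
strict=$(echo "$string" | cut -s -d',' -f2 | cut -d'"' -f2)

echo "naive:  '$naive'"
echo "strict: '$strict'"
```

The naive version ends up with the key name instead of a second element, while the -s version gives an empty string that is easy to test for.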
Part 3
If you MUST use sed:
#!/bin/bash
#
string='"hosts": ["unix://","tcp://0.0.0.0:2376"]'
firstelement=$(echo "$string" | sed 's/.*\["\(.*\)",".*/\1/')
echo "$firstelement"
# will output unix://
secondelement=$(echo "$string" | sed 's/.*","\(.*\)"\]/\1/')
echo "$secondelement"
# will output tcp://0.0.0.0:2376
Again here, the main character to work with is the ,.
firstelement explanation
sed 's/.*\["\(.*\)",".*/\1/'
.* anything...
\[" followed by [ and ". Since [ means something to sed, you have to \ it
\(.*\) followed by anything at all (. matches any character, * matches any number of these characters).
"," followed by ",". This only happens for the first element.
.* followed by anything
\1 keep only the characters enclosed between \( and \)
Similarly, for the second element the s/// is modified to keep only what follows ",", up to the last "] at the end of the string.
Again like with cut above, use if statements to verify if the extracted values are what you wanted.
Again, read my last comments in the first approach, you might not need all this...
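For completeness, the comparison step could look something like this. It is a sketch only; the status variable is made up for illustration, and the fallback simply reassigns the whole string as suggested in the first approach.

```shell
string='"hosts": ["unix://","tcp://0.0.0.0:2376"]'
firstelement=$(echo "$string" | sed 's/.*\["\(.*\)",".*/\1/')
secondelement=$(echo "$string" | sed 's/.*","\(.*\)"\]/\1/')

# Compare both extracted values against the expected ones.
if [ "$firstelement" = "unix://" ] && [ "$secondelement" = "tcp://0.0.0.0:2376" ]; then
    status="ok"
else
    status="needs fixing"
    # Fall back to the known-good value.
    string='"hosts": ["unix://","tcp://0.0.0.0:2376"]'
fi
echo "$status"
```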
Given the following string:
arn:aws:secretsmanager:us-east-1:3264873466873:secret:foo/bar 1564681234.974 foo/bar {"username":"admin","password":"admin123","secret_key":"KASJDFJHAKHFKAHASDF"} 4e397333-3797-4f0b-ad7e-8c1cc0ed041c VERSIONSTAGES AWSCURRENT
Within a shell script, how do you extract just the JSON portion to end up like this:
{"username":"admin","password":"admin123","secret_key":"KASJDFJHAKHFKAHASDF"}
I was able to do it using two sed commands:
echo $longString | sed 's/^.*{/{/' | sed 's/}.*$/}/'
but was wondering if there is a way to do it using only one command.
To extract a contiguous part of the input, you can use grep with its -o option (if supported on your system). It tells grep to output only the matching part.
grep -o '{.*}'
For extracting columns, use awk:
echo $longString | awk '{print $4}'
Or cut:
echo $longString | cut -f 4 -d ' '
Beware if you have spaces in your JSON data. You might be better off using jq to process the results of aws secretsmanager list-secrets and similar.
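To illustrate the warning about spaces, here is a sketch with a made-up record whose JSON contains a value with a space in it. The ARN, ID and field values are invented for the example.

```shell
# Hypothetical record: the "note" value contains a space.
longString='arn:aws:secretsmanager:us-east-1:1:secret:foo/bar 1564681234.974 foo/bar {"username":"admin","note":"two words"} 4e39 VERSIONSTAGES AWSCURRENT'

# Column-based extraction splits on every space, including the one inside the JSON.
by_field=$(echo "$longString" | awk '{print $4}')
# Pattern-based extraction keeps everything from the first { to the last }.
by_match=$(echo "$longString" | grep -o '{.*}')

echo "$by_field"
echo "$by_match"
```

The awk result is cut off at the space, while the grep -o result is the complete object.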
You can use
echo $longString | sed -n 's|.*\({.*}\).*|\1|p'
to match and print the desired pattern
you can just join the sed commands to a single command
sed 's/^.*{/{/;s/}.*$/}/'
This awk command should do it. It will handle any spaces in the string, but note that the braces themselves are not part of the output:
echo $string | awk -F"[{}]" '{print $2}'
"username":"admin","password":"admin123","secret_key":"KASJDFJHAKHFKAHASDF"
I have a json result and I would like to extract a string without double quotes
{"value1":5.0,"value2":2.5,"value3":"2019-10-24T15:26:00.000Z","modifier":[]}
With this regex I can extract value3 (2019-10-24T15:26:00.000Z) correctly:
sed -e 's/^.*"value3":"\([^"]*\)".*$/\1/'
How can I extract the "value2" result, a string without double quotes?
I need to do this with sed because I can't install jq.
With GNU sed for -E to enable EREs:
$ sed -E 's/.*"value3":"?([^,"]*)"?.*/\1/' file
2019-10-24T15:26:00.000Z
$ sed -E 's/.*"value2":"?([^,"]*)"?.*/\1/' file
2.5
With any POSIX sed:
$ sed 's/.*"value3":"\{0,1\}\([^,"]*\)"\{0,1\}.*/\1/' file
2019-10-24T15:26:00.000Z
$ sed 's/.*"value2":"\{0,1\}\([^,"]*\)"\{0,1\}.*/\1/' file
2.5
The above assumes you never have commas inside quoted strings.
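If you need this for several keys, the POSIX version can be wrapped in a function. This is a sketch: json_value is a made-up name, it assumes one-line JSON, and the last call shows what the comma assumption means in practice.

```shell
# json_value is a hypothetical helper: $1 is the key, $2 the one-line JSON text.
# -n with the p flag prints nothing at all when the key is not found.
json_value() {
    printf '%s\n' "$2" | sed -n 's/.*"'"$1"'":"\{0,1\}\([^,"]*\)"\{0,1\}.*/\1/p'
}

json='{"value1":5.0,"value2":2.5,"value3":"2019-10-24T15:26:00.000Z","modifier":[]}'
json_value value2 "$json"
json_value value3 "$json"

# A comma inside a quoted string cuts the value short: this prints only "a".
json_value note '{"note":"a,b"}'
```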
Just run jq, a command-line JSON processor:
$ json_data='{"value1":5.0,"value2":2.5,"value3":"2019-10-24T15:26:00.000Z","modifier":[]}'
$ jq '.value2' <(echo "$json_data")
2.5
with the key .value2 to access the value you are interested in.
This link summarizes why you should NOT use regex for parsing JSON (the same goes for XML/HTML and other data structures that can, in theory, be infinitely nested):
Regex for parsing single key: values out of JSON in Javascript
If you do not have jq available:
you can use the following GNU grep command:
$ echo '{"value1":5.0,"value2":2.5,"value3":"2019-10-24T15:26:00.000Z","modifier":[]}' | grep -zoP '"value2":\s*\K[^\s,]*(?=\s*,)'
2.5
using the regex detailed here:
"value2":\s*\K[^\s,]*(?=\s*,)
demo: https://regex101.com/r/82J6Cb/1/
This will work even if the JSON is not on a single line.
With Python it is also pretty direct. Python is usually installed by default on your machine, and the script should work even if it is not Python 3:
$ cat data.json
{"value1":5.0,"value2":2.5,"value3":"2019-10-24T15:26:00.000Z","modifier":[]}
$ cat extract_value2.py
import json
with open('data.json') as f:
    data = json.load(f)
print(data["value2"])
$ python extract_value2.py
2.5
You can try this :
creds=$(aws secretsmanager get-secret-value --region us-east-1 --secret-id dpi/dev/hivemetastore --query SecretString --output text)
passwd=$(/bin/echo "${creds}" | /bin/sed -n 's/.*"password":"\(.*\)",/\1/p' | awk -F"\"" '{print $1}')
it is definitely possible to remove the AWK part though ...
To extract all values in proper list form to a file using sed (Linux):
sed 's/["{}\]//g' <your_file.json> | sed 's/,/\n/g' >> <your_new_file_to_save>
sed 's/regexp/replacement/g' inputFileName > outputFileName
In some versions of sed, the expression must be preceded by -e to indicate that an expression follows.
The s stands for substitute, while the g stands for global, which means that all matching occurrences in the line would be replaced.
The characters listed inside the [ ] are the ones you want to remove from the .json file.
The pipe character | is used to connect the output from one command to the input of another.
Then, the last thing I did is substitute each , with \n, a line break.
If you want to show a single value, use the command below:
sed 's/["{}\]//g' <your_file.json> | sed 's/,/\n/g' | sed -n '/<ur_value>/p'
Here -n suppresses sed's automatic printing and p prints only the lines that match <ur_value>.
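Here is a small sketch of the flattening step on a made-up one-line sample. It swaps in tr for the line break, because \n in a sed replacement is not portable to every sed; the sample JSON and variable names are assumptions for the example.

```shell
# Hypothetical one-line sample; with a real file you would pass the file name to sed.
json='{"value1":5.0,"value2":2.5,"value3":"abc"}'

# Strip quotes and braces with sed, then put each key:value pair on its own line.
pairs=$(printf '%s\n' "$json" | sed 's/["{}]//g' | tr ',' '\n')
printf '%s\n' "$pairs"

# Show a single value by matching its key at the start of the line.
value2=$(printf '%s\n' "$pairs" | sed -n 's/^value2://p')
echo "$value2"
```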
If your data is in a file named d, try GNU sed:
sed -E 's/[{,]"\w+":([^,"]+)/\1\n/g ;s/(.*\n).*".*\n/\1/' d
I am trying to run the following script:
sed -E -n '/"data"/,/}/{/[{}]/d;s/^[[:space:]]*"([^"]+)":[[:space:]]*"([^"]+)".*$/\1|\2/g;p}' /tmp/data.json | while IFS="|" read -r item val;do item="${item^^}"; item="${val}"; export "${item}"; echo ${item}; done
This basically exports data from inside a JSON as environment variables.
That is,
Here, the key data will have a list (of different lengths) of key-value pairs within itself wherein the key is not fixed. Now, I want to read every key in the list and export its value. For example, I want these commands to be executed as part of the shell script.
export HELLO1
export SAMPLEKEY
However, when I run this, it gives the error: sed: 1: "/"data"/,/}/{/[{}]/d;s/ ...": extra characters at the end of p command. What might be the reason for this?
Rather than trying to use sed to parse .json files (which can rapidly grow beyond reasonable sed parsing), instead use a tool made for parsing json (like jq -- json query). You can easily obtain the keys for values under data, and then parse with your shell tools.
(note: your questions should be tagged bash since you use the parameter expansion for character-case which is a bashism, e.g. ${item^^})
Using jq, you could do something like the following:
jq '.data' /tmp/data.json | tail -n+2 | head -n-1 |
while read -r line; do line=${line#*\"}; line=${line%%\"*}; \
printf "export %s " ${line^^}; done; echo ""
Which results in the output:
export HELLO1 export SAMPLEKEY
(there is probably an even cleaner way to do this with jq, and indeed there is)
You can have jq output the keys for data one per line with:
jq -r '.data | to_entries[] | (.key|ascii_upcase)' /tmp/data.json
This allows you to shorten your command to generate export in front of the keys with:
while read -r key; do \
printf "export %s " $key; \
done < <(jq -r '.data | to_entries[] | (.key|ascii_upcase)' /tmp/data.json); \
echo ""
(note: to affect your actual environment, you would need to export the values as part of your shell startup)
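If jq really is not an option, the sed approach from the question can be made to work. The sketch below assumes a sample file shaped like the one described (the values world1 and sampleValue are invented, since the question's file is not shown). BSD sed wants a semicolon between p and the closing brace, which is what the "extra characters at the end of p command" message is about; "p;}" is accepted by GNU sed as well.

```shell
# Hypothetical sample file; the question's /tmp/data.json is not shown in full.
data_file=$(mktemp)
cat > "$data_file" <<'EOF'
{
  "data": {
    "hello1": "world1",
    "sampleKey": "sampleValue"
  }
}
EOF

# Turn each "key": "value" line inside data into key|value.
pairs=$(sed -E -n '/"data"/,/}/{/[{}]/d;s/^[[:space:]]*"([^"]+)":[[:space:]]*"([^"]+)".*$/\1|\2/;p;}' "$data_file")

# Reading from a here-document keeps the loop in the current shell,
# so the exported variables still exist after the loop ends.
while IFS='|' read -r key val; do
    key=$(printf '%s' "$key" | tr '[:lower:]' '[:upper:]')
    export "$key=$val"
done <<EOF
$pairs
EOF

rm -f "$data_file"
echo "$HELLO1 $SAMPLEKEY"
```

Piping sed straight into while would run the loop in a subshell, and the exports would be lost when it exits.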
I want to extract the URL from within the anchor tags of an html file.
This needs to be done in BASH using SED/AWK. No perl please.
What is the easiest way to do this?
You could also do something like this (provided you have lynx installed)...
Lynx versions < 2.8.8
lynx -dump -listonly my.html
Lynx versions >= 2.8.8 (courtesy of @condit)
lynx -dump -hiddenlinks=listonly my.html
You asked for it:
$ wget -O - http://stackoverflow.com | \
grep -io '<a href=['"'"'"][^"'"'"']*['"'"'"]' | \
sed -e 's/^<a href=["'"'"']//i' -e 's/["'"'"']$//i'
This is a crude tool, so all the usual warnings about attempting to parse HTML with regular expressions apply.
grep "<a href=" sourcepage.html |
sed "s/<a href/\\n<a href/g" |
sed 's/\"/\"><\/a>\n/2' |
grep href |
sort | uniq
The first grep looks for lines containing URLs. You can add more filters
after it if you want only local pages, that is, relative paths rather
than http links.
The first sed will add a newline in front of each a href url tag with the \n
The second sed will shorten each url after the 2nd " in the line by replacing it with the /a tag with a newline
Both seds will give you each url on a single line, but there is garbage, so
The 2nd grep href cleans the mess up
The sort and uniq will give you one instance of each existing url present in the sourcepage.html
With the Xidel - HTML/XML data extraction tool, this can be done via:
$ xidel --extract "//a/@href" http://example.com/
With conversion to absolute URLs:
$ xidel --extract "//a/resolve-uri(@href, base-uri())" http://example.com/
I made a few changes to Greg Bacon's solution:
cat index.html | grep -o '<a .*href=.*>' | sed -e 's/<a /\n<a /g' | sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d'
This fixes two problems:
We are matching cases where the anchor doesn't start with href as first attribute
We are covering the possibility of having several anchors in the same line
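A quick sketch of both cases on a made-up one-line page, with two anchors on the same line and href not being the first attribute of the second one. It assumes GNU sed, since it relies on \n in the replacement; the file is read directly instead of going through cat.

```shell
# Hypothetical sample page: two anchors on one line, href not always first.
page=$(mktemp)
cat > "$page" <<'EOF'
<p><a href="/about">About</a> and <a class="x" href="http://example.com/">Home</a></p>
EOF

# Split before each "<a ", strip everything around the quoted href value,
# and delete the empty line left by the split.
urls=$(grep -o '<a .*href=.*>' "$page" |
    sed -e 's/<a /\n<a /g' |
    sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d')
rm -f "$page"

printf '%s\n' "$urls"
```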
An example, since you didn't provide any sample
awk 'BEGIN{
    RS="</a>"
    IGNORECASE=1
}
{
    for(o=1;o<=NF;o++){
        if ( $o ~ /href/){
            gsub(/.*href=\042/,"",$o)
            gsub(/\042.*/,"",$o)
            print $(o)
        }
    }
}' index.html
You can do it quite easily with the following regex, which is quite good at finding URLs:
\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))
I took it from John Gruber's article on how to find URLs in text.
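That lets you find all URLs in a file f.html as follows (the pattern uses Perl-style syntax such as (?: ) and \s inside brackets, so it needs GNU grep's -P option):
cat f.html | grep -o \
-P '\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))'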
I am assuming you want to extract a URL from some HTML text, and not parse HTML (as one of the comments suggests). Believe it or not, someone has already done this.
OT: The sed website has a lot of good information and many interesting/crazy sed scripts. You can even play Sokoban in sed!
This is my first post, so I will do my best to explain why I am posting this answer:
Of the 7 most-voted answers, 4 use grep even though the
post explicitly says "using sed or awk only".
The post also asks for "No perl please", yet given the previous
point I use Perl regex inside grep anyway.
And because this is the simplest way (as far as I know, and as was
requested) to do it in bash.
So here is the simplest script, using GNU grep 2.28:
grep -Po 'href="\K.*?(?=")'
About the \K switch: I found no information on it in the man and info pages, so I came here for the answer.
The \K switch discards the characters matched before it (including the key itself) from the output.
Bear in mind following the advice from man pages:
"This is highly experimental and grep -P may warn of unimplemented features."
Of course, you can modify the script to suit your tastes or needs, but I found it pretty straightforward for what was requested in the post, and also for many of us.
I hope you find it useful.
Thanks!
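As a small sketch of what \K and the lookahead do, here is the same command run on a made-up line of HTML with two links. It assumes a GNU grep built with -P support.

```shell
# Hypothetical input line with two anchors.
html='<a href="http://example.com/a">A</a> <a href="/b">B</a>'

# href=" must match but is dropped by \K; (?=") requires a closing quote without printing it.
links=$(printf '%s\n' "$html" | grep -Po 'href="\K.*?(?=")')

printf '%s\n' "$links"
```

Each link comes out on its own line because -o prints every match separately.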
In bash, the following should work. Note that it doesn't use sed or awk, but uses tr and grep, both very standard and not perl ;-)
$ cat source_file.html | tr '"' '\n' | tr "'" '\n' | grep -e '^https://' -e '^http://' -e'^//' | sort | uniq
for example:
$ curl "https://www.cnn.com" | tr '"' '\n' | tr "'" '\n' | grep -e '^https://' -e '^http://' -e'^//' | sort | uniq
generates
//s3.amazonaws.com/cnn-sponsored-content
//twitter.com/cnn
https://us.cnn.com
https://www.cnn.com
https://www.cnn.com/2018/10/27/us/new-york-hudson-river-bodies-identified/index.html\
https://www.cnn.com/2018/11/01/tech/google-employee-walkout-andy-rubin/index.html\
https://www.cnn.com/election/2016/results/exit-polls\
https://www.cnn.com/profiles/frederik-pleitgen\
https://www.facebook.com/cnn
etc...
Expanding on kerkael's answer:
grep "<a href=" sourcepage.html |
sed "s/<a href/\\n<a href/g" |
sed 's/\"/\"><\/a>\n/2' |
grep href |
sort | uniq |
# now adding some more
grep -v "<a href=\"#" |
grep -v "<a href=\"../" |
grep -v "<a href=\"http"
The first grep I added removes links to local bookmarks.
The second removes relative links to upper levels.
The third removes links that start with http, leaving only relative links.
Pick and choose which one of these you use as per your specific requirements.
Go over the input with a first pass, replacing the start of each URL (http) with a newline followed by it (\nhttp). That guarantees that each link starts at the beginning of a line and is the only URL on that line. The rest should be easy; here is an example:
sed "s/http/\nhttp/g" <(curl "http://www.cnn.com") | sed -n "s/\(^http[s]*:[a-zA-Z0-9/.=?_-]*\)\(.*\)/\1/p"
alias lsurls='_(){ sed "s/http/\nhttp/g" "${1}" | sed -n "s/\(^http[s]*:[a-zA-Z0-9/.=?_-]*\)\(.*\)/\1/p"; }; _'
You can try:
curl --silent -u "<username>:<password>" http://<NAGIOS_HOST>/nagios/cgi-bin/status.cgi|grep 'extinfo.cgi?type=1&host='|grep "status"|awk -F'</A>' '{print $1}'|awk -F"'>" '{print $3"\t"$1}'|sed 's/<\/a> <\/td>//g'| column -c2 -t|awk '{print $1}'
This is how I did it for a cleaner result: create a shell file and pass the link as a parameter; it will create a temp2.txt file.
a=$1
lynx -listonly -dump "$a" > temp
awk 'FNR > 2 {print$2}' temp > temp2.txt
rm temp
>sh test.sh http://link.com
Eschewing the awk/sed requirement:
urlextract is made just for such a task (documentation).
urlview is an interactive CLI solution (github repo).
I scrape websites using Bash exclusively to verify the http status of client links and report back to them on errors found. I've found awk and sed to be the fastest and easiest to understand. Props to the OP.
curl -Lk https://example.com/ | sed -r 's~(href="|src=")([^"]+).*~\n\1\2~g' | awk '/^(href|src)/,//'
Because sed works on a single line at a time, this ensures that each URL ends up on its own line, including any relative URLs. The first sed finds the href and src attributes and puts each on a new line while simultaneously removing the rest of the line, including the closing double quote (") at the end of the link.
Notice I'm using a tilde (~) in sed as the defining separator for substitution. This is preferred over a forward slash (/). The forward slash can confuse the sed substitution when working with html.
The awk finds any line that begins with href or src and outputs it.
Once the content is properly formatted, awk or sed can be used to collect any subset of these links. For example, you may not want base64 images, instead you want all the other images. Our new code would look like:
curl -Lk https://example.com/ | sed -r 's~(href="|src=")([^"]+).*~\n\1\2~g' | awk '/^(href|src)/,//' | awk '/^src="[^d]/,//'
Once the subset is extracted, just remove the href=" or src="
sed -r 's~(href="|src=")~~g'
This method is extremely fast and I use these in Bash functions to format the results across thousands of scraped pages for clients that want someone to review their entire site in one scrape.
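Here is a sketch of the whole chain on a made-up page instead of a live curl, so it can be tried offline. It assumes GNU sed (for -r and \n in the replacement), and it uses the plain awk pattern /^(href|src)/, which selects the same lines as the range form above because the empty end pattern matches every line. The file contents and variable names are invented for the example.

```shell
# Hypothetical sample page with one attribute per line.
page=$(mktemp)
cat > "$page" <<'EOF'
<a href="/about">About</a>
<img src="data:image/png;base64,AAAA">
<img src="/logo.png">
EOF

# Put each href/src on its own line, then keep only those lines.
tagged=$(sed -r 's~(href="|src=")([^"]+).*~\n\1\2~g' "$page" | awk '/^(href|src)/')

# Keep src values that do not start with "d" (skips data: URIs), then strip the prefix.
images=$(printf '%s\n' "$tagged" | awk '/^src="[^d]/' | sed -r 's~(href="|src=")~~g')
rm -f "$page"

printf '%s\n' "$images"
```

One thing to keep in mind: because the .* in the first sed consumes the remainder of the line, only the first href or src on each input line is kept, so pages with several links per line need to be split up first.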