bash: Extract a specific string from file and remove characters

bash: Extract a specific string from file and remove characters - json

I'm trying to extract a specific string value from a text file and then remove the backslash from him. the name of the value is "display_url"
My script:
url=$(cat /var/scripts/string.txt | grep -oP '(?<=display_url":")[^"]+')
for link in $url; do
echo 'https://'$link
done
output:
https://pastebin.com\/WRv5ir4Y
https://reddit.com\/r\/IBO\/comments\u2026
The desired output:
https://pastebin.com/WRv5ir4Y
https://reddit.com/r/IBO/comments/u2026
text file:
{"created_at":"Thu Dec 13 08:43:38 +0000 2018","id":1073136349845303297,"id_str":"1073136349845303297","text":"https:\/\/t.co\/aPu5ln7yjO\nhttps:\/\/t.co\/pBvevjSCc9\n\n#osectraining","source":"\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":961508561217052675,"id_str":"961508561217052675","name":"Online Security","screen_name":"osectraining","location":"Israel","url":"https:\/\/www.onlinesecurity.co.il","description":"OnlineSecurity provides online cyber-security training courses and certification, from beginner to advanced with the most advanced virtual labs in the field.","translator_type":"none","protected":false,"verified":false,"followers_count":2,"friends_count":51,"listed_count":0,"favourites_count":0,"statuses_count":1,"created_at":"Thu Feb 08 07:54:39 +0000 2018","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"1B95E0","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/961510231346958336\/d_KhBeTD_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/961510231346958336\/d_KhBeTD_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/961508561217052675\/1518076913","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"osectraining","indices":[49,62]}],"urls":[{"url":"https:\/\/t.co\/aPu5ln7yjO","expanded_url":"https:\/\/pastebin.com\/WRv5ir4Y","display_url":"pastebin.com\/WRv5ir4Y","indices":[0,23]},{"url":"https:\/\/t.co\/pBvevjSCc9","expanded_url":"https:\/\/www.reddit.com\/r\/IBO\/comments\/9ragj7\/ioc_in_10_hours\/","display_url":"reddit.com\/r\/IBO\/comments\u2026","indices":[24,47]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"und","timestamp_ms":"1544690618369"}
any ideas?

Change your script into:
url=$(grep -oP '(?<=display_url":")[^"]+' /var/scripts/string.txt )
sed 's/\\//g;s#^#https://#' <<< "$url"
should help.
removed useless cat
make use of sed to do the substitution

The reason you are needing both that grep and a separate sed to parse it is because the grep is able to use Perl regexes (power...) but sed can't. You probably have Perl available - use it if you do.
perl -pe '
s/\\//g;
s{.*?display_url":"}{https://};
s{",".*display_url":"}{\nhttps://} while /display_url/;
s/",".*/\n/;
' /var/scripts/string.txt
https://pastebin.com/WRv5ir4Y
https://reddit.com/r/IBO/commentsu2026
And there's virtually always awk.
awk '{
gsub("\\\\","",$0);
split($0, chnk, "display_url.:.");
for (x=2; x<=length(chnk); x++) {
gsub("\".*","", chnk[x]);
printf "https://%s\n", chnk[x];
}
}' /var/scripts/string.txt
https://pastebin.com/WRv5ir4Y
https://reddit.com/r/IBO/commentsu2026
But if you can't use either of those, then one sed to strip the backslashes and some basic shell string processing in a loop, because it's fun. :D
$: txt=$(sed 's/\\//g' i)
$: while [[ "$txt" =~ display_url ]]
do txt=${txt#*display_url?:?}
echo https://${txt%%?,*}
done
https://pastebin.com/WRv5ir4Y
https://reddit.com/r/IBO/commentsu2026
Trickiest part is getting around the double-quotes in the shell parsing, but I'm sure someone can suggest a better way.

I would use the JSON command line parser jq:
jq -r '"https://" + .entities.urls[].display_url' /var/scripts/string.txt
-r stands for "return raw inputs" (strings without quotes)
"https://" + concat ...
.entities.urls[] ... for each item of the array .entities.urls ...
.display_url ... the value of display_url member"
Result:
https://pastebin.com/WRv5ir4Y
https://reddit.com/r/IBO/comments

Related

What does this specific sed command do exactly? Using sed to parse full HTML pages in a bash script

subreddit=$(curl -sL "https://www.reddit.com/search/?q=${query}&type=sr"|tr "<" "\n"|
sed -nE 's#.*class="_2torGbn_fNOMbGw3UAasPl">r/([^<]*)#\1#p'|gum filter)
I've been learning bash and have been making pretty good progress. One thing that just seems far too daunting is these complex sed commands. It's unfortunate because I really want to use them to do things like parse HTML but it quickly becomes a mess. this is a little snippet of a script that queries Reddit, pipes it through sed and returns just the names of the subreddits that were a result of the search on a new line.
My main question is.. What is it that this is actually cutting/replacing and what does the beginning part mean 's#.'?
What I tried:
I used curl to search for a subreddit name so that I could see the raw output from that command and then I tried to pipe it into sed using little snippets of the full command to see if I could reconstruct the logic behind the command and all I really figured out was that I am lacking in my knowledge of sed beyond basic replacements.
I'm trying to re-write this script (for learning purposes only, the script works just fine) that allows you to search reddit and view the image posts in your terminal using Kitty. Mostly everything is pretty readable but the sed commands just trip me up.
I'll attach the full script below in case anyone is interested and I welcome any advice or explanations that could help me fully understand and re-construct it.
I'm really curious about this. I'm also wondering if it would just be better to call a Python script from bash that could return the images using beautiful soup... or maybe using "htmlq" would be a better idea?
Thanks!
#!/bin/sh
get_input() {
[ -z "$*" ] && query=$(gum input --placeholder "Search for a subreddit") || query=$*
query=$(printf "%s" "$query"|tr ' ' '+')
subreddit=$(curl -sL "https://www.reddit.com/search/?q=${query}&type=sr"|tr "<" "\n"|
sed -nE 's#.*class="_2torGbn_fNOMbGw3UAasPl">r/([^<]*)#\1#p'|gum filter)
xml=$(curl -s "https://www.reddit.com/r/$subreddit.rss" -A "uwu"|tr "<|>" "\n")
post_href=$(printf "%s" "$xml"|sed -nE '/media:thumbnail/,/title/{p;n;p;}'|
sed -nE 's_.*href="([^"]+)".*_\1_p;s_.*media:thumbnail[^>]+url="([^"]+)".*_\1_p; /title/{n;p;}'|
sed -e 'N;N;s/\n/\t/g' -e 's/&/\&/g'|grep -vE '.*\.gif.*')
[ -z "$post_href" ] && printf "No results found for \"%s\"\n" "$query" && exit 1
}
readc() {
if [ -t 0 ]; then
saved_tty_settings=$(stty -g)
stty -echo -icanon min 1 time 0
fi
eval "$1="
while
c=$(dd bs=1 count=1 2> /dev/null; echo .)
c=${c%.}
[ -n "$c" ] &&
eval "$1=\${$1}"'$c
[ "$(($(printf %s "${'"$1"'}" | wc -m)))" -eq 0 ]'; do
continue
done
[ -t 0 ] && stty "$saved_tty_settings"
}
download_image() {
downloadable_link=$(curl -s -A "uwu" "$1"|sed -nE 's#.*class="_3Oa0THmZ3f5iZXAQ0hBJ0k".*<a href="([^"]+)".*#\1#p')
curl -s -A "uwu" "$downloadable_link" -o "$(basename "$downloadable_link")"
[ -z "$downloadable_link" ] && printf "No image found\n" && exit 1
tput clear && gum style \
--foreground 212 --border-foreground 212 --border double \
--align center --width 50 --margin "1 2" --padding "2 4" \
'Your image has been downloaded!' "Image saved to $(basename "$downloadable_link")"
# shellcheck disable=SC2034
printf "Press Enter to continue..." && read -r useless
}
cleanup() {
tput cnorm && exit 0
}
trap cleanup EXIT INT HUP
get_input "$#"
i=1 && tput clear
while true; do
tput civis
[ "$i" -lt 1 ] && i=$(printf "%s" "$post_href"|wc -l)
[ "$i" -gt "$(printf "%s" "$post_href"|wc -l)" ] && i=1
link=$(printf "%s" "$post_href"|sed -n "$i"p|cut -f1)
post_link=$(printf "%s" "$post_href"|sed -n "$i"p|cut -f2)
gum style \
--foreground 212 --border-foreground 212 --border double \
--align left --width 50 --margin "20 1" --padding "2 4" \
'Press (j) to go to next' 'Press (k) to go to previous' 'Press (d) to download' \
'Press (o) to open in browser' 'Press (s) to search for another subreddit' 'Press (q) to quit'
kitty +kitten icat --scale-up --place 60x40#69x3 --transfer-mode file "$link"
readc key
# shellcheck disable=SC2154
case "$key" in
j) i=$((i+1)) && tput clear ;;
k) i=$((i-1)) && tput clear ;;
d) download_image "$post_link" ;;
o) xdg-open "$post_link" || open "$post_link" ;;
s) get_input ;;
q) exit 0 && tput clear ;;
*) ;;
esac
done
"gum filter" is essentially a fuzzy finder like fzf and "gum style" draws pretty text and nice boxes that work kind of like css.

What does this specific sed command do exactly?
sed -nE 's#.*class="_2torGbn_fNOMbGw3UAasPl">r/([^<]*)#\1#p'
It does two things:
Select all lines that contain the literal string class="_2torGbn_fNOMbGw3UAasPl">r/.
For those lines, print only the part after ...>r/.
Basically, it translates to ... (written inefficiently on purpose)
grep 'class="_2torGbn_fNOMbGw3UAasPl">r/' |
sed 's/.*>r\///'
what does the beginning part mean 's#.'?
You are looking at the (beginning of the) substitution command. Normally, it is written as s/search/replace/ but the delimiter / can be chosen (mostly) freely. s/…/…/ and s#…#…# are equivalent.
Here, # has the benefit of not having to escape the / in …>r/.
The . belongs to the search pattern. The .* in the beginning selects everything from the start of the line, so that it can be deleted when doing the substitution. Here we delete the beginning of the line up to (and including) …>r/.
The \1 in the replacement pattern is a placeholder for the string that was matched by the group ([^<]*) (longest <-free substring directly after …>r/).
That part is unnecessarily complicated. Because sed is preceded by tr "<" "\n" there is no point in dealing with the < inside sed. It could be simplified to
sed -n 's#.*class="_2torGbn_fNOMbGw3UAasPl">r/##p'
Speaking about simplifications:
I really want to use them [sed commands] to do things like parse HTML
Advice: Don't. For one-off jobs where you know the exact formatting (!) of your html files, regexes are ok. But even then, they make only sense if you are quicker to write them than using a proper tool.
I'm also wondering if it would just be better to call a Python script from bash that could return the images using beautiful soup... or maybe using "htmlq" would be a better idea?
You are right! In general, regexes are not powerful enough to reliably parse html.
Whether you use python or bash is up to you. Personally, I find easier to use a "proper" language for bigger projects. But then I use only that. Writing half in python and half in bash only increases complexity in my opinion.
If you stick with bash, I'd recommend something a bit more mature and widespread than htmlq (first released in Sep 2021, currently at version 0.4). E.g. install libxml and use an XPath expression with post-processing:
curl -sL "https://www.reddit.com/search/?q=QUERY&type=sr" |
xmllint --html --xpath '//h6[#class="_2torGbn_fNOMbGw3UAasPl"]/text()' - 2>/dev/null |
sed 's#^r/##'
But then again, parsing HTML isn't necessary in the first place, since reddit has an API that can return JSON, which you can process using jq.
curl -sL "https://www.reddit.com/subreddits/search.json?q=QUERY" |
jq -r '.data.children | map(.data.display_name) | .[]'

Replace apostrophe in JSON with apostrophe suitable for curl via sed

I need to prepare JSON which contains apostrophe to be sent via CURL.
Example of JSON:
{"myField":"Apos'test"}
Example of JSON I need as an output:
{"myField":"Apos'\''test"}
What I have tried:
sed -e "s/'/'\\\''/g" <<< {"myField":"Apos'test"}
which outputs:
{myField:Apos'\''test}
And I do not understand why it removes double quotes.
P.S. it is not obligatory to use sed, any other standard linux tool would work.

try this:
#/bin/bash
replacement=$((cat << EOT
{"myField":"Apos'test"}
EOT
) | sed "s|'|'\\\''|")
echo $replacement
output:
{"myField":"Apos'\''test"}

It doesn't
If it was because you used <<< , here, of which "" pair was parsed, expanded and dropped by the shell you're in
$ cat d
{"myField":"Apos'test"}
$ sed -E "s/'/'\\\''/g" d
{"myField":"Apos'\''test"}

Exporting JSON to environment variables

If I have a JSON like this,
{
"hello1": "world1",
"testk": "testv"
}
And I want to export each of these key-value pairs as environment variables, how to do it via shell script? So for example, when I write on the terminal, echo $hello1, world1 should be printed and similarly for other key-value pairs?
Note: The above JSON is present in a variable called $values and not in a file.
I know it will be done via jq and written a shell script for this, but it doesn't work.
for row in $(echo "${values}" | jq -r '.[]'); do
-jq() {
echo ${row} | jq -r ${1}
}
echo $(_jq '.samplekey')
done

Borrowing from this answer which does all of the hard work of turning the JSON into key=value pairs, you could get these into the environment by looping over the jq output and exporting them:
for s in $(echo $values | jq -r "to_entries|map(\"\(.key)=\(.value|tostring)\")|.[]" ); do
export $s
done
If the variables being loaded contain embedded whitespace, this is also reasonable, if slightly more complex:
while read -rd $'' line
do
export "$line"
done < <(jq -r <<<"$values" \
'to_entries|map("\(.key)=\(.value)\u0000")[]')

Using command substitution $() :
# $(jq -r 'keys[] as $k | "export \($k)=\(.[$k])"' file.json)
# echo $testk
testv
Edit : Responding to this comment
You should do
$( echo "$values" | jq -r 'keys[] as $k | "export \($k)=\(.[$k])"' )
Just mind the double quotes around $values
Note: Couldn't confirm if there is security implication to this approach, that is if the user could manipulate the json to wreak havoc.

Another way, without using jq, is to parse the json with grep & sed:
for keyval in $(grep -E '": [^\{]' my.json | sed -e 's/: /=/' -e "s/\(\,\)$//"); do
echo "export $keyval"
eval export $keyval
done
Explanation:
First, grep will filter all "key" : value pairs (value can be
"string", number, or boolean).
Then, sed will replace : with =, and remove trailing ,.
Lastly, exporting the "key"=value with eval
Here's an output example, exporting json keys, from an AWS record-set:
export "Name"="\052.apps.nmanos-cluster-a.devcluster.openshift.com."
export "Type"="A"
export "HostedZoneId"="Z67SXBLZRQ7X7T"
export "DNSName"="a24070461d50270e-1391692.us-east-1.elb.amazonaws.com."
export "EvaluateTargetHealth"=false

None of the existing answers preserve whitespace in the values in a POSIX shell. The following line will use jq to take each key:value of some JSON and export them as environment variables, properly escaping whitespace and special characters.
2023-01-28: BUGFIX UPDATE:
My previous answer did not work for all possible values and could cause errors. Please instead use the following line, which uses jq's #sh format string to properly escape values for the shell. You must also enclose everything after eval in quotes to preserve newlines. I've updated the sample JSON file to include more characters to test with.
This answer now appears to be the only one that handles all cases. There are no loops and it's one line to export all values. The downside is that it uses eval, which is theoretically dangerous... but because the entire key=value is now being escaped for the shell, this should be safe to use.
New answer (use this one):
eval "export $(echo "$values" \
| jq -r 'to_entries | map("\(.key)=\(.value)") | #sh')"
Old answer (don't use this one):
eval export $(echo "$values" \
| jq -r 'to_entries|map("\"\(.key)=\(.value|tostring)\"")|.[]' )
edit thanks #Delthas for pointing out a missing 'export'
Sample JSON file:
bash-5.2$ cat <<'EOJSON' > foo.json
{
"foo_1": "bar 1",
"foo_2": "This ! is ' some # weird $text { to ( escape \" here",
"foo_3": "this is some \nsample new line\n text to\ntry and escape"
}
EOJSON
Sample script:
bash-5.2$ cat <<'EOSH' > foo.sh
values="`cat foo.json`"
eval "export $(echo "$values" | jq -r 'to_entries | map("\(.key)=\(.value)") | #sh')"
export
echo "foo_2: $foo_2"
echo "foo_3: $foo_3"
EOSH
Running the sample script:
bash-5.2$ env -i sh foo.sh
export PWD='/path/to/my/home'
export SHLVL='1'
export foo_1='bar 1'
export foo_2='This ! is '"'"' some # weird $text { to ( escape " here'
export foo_3='this is some
sample new line
text to
try and escape'
foo_2: This ! is ' some # weird $text { to ( escape " here
foo_3: this is some
sample new line
text to
try and escape
Pros:
no need for Bash
preserves whitespace in values
no loops
(update) properly escapes all values for use in the shell
Cons:
uses eval, which is considered "unsafe". however, because jq is escaping all input, this is unlikely to cause a security issue (unless jq is found to have a bug which does not properly escape data using the #sh filter).

The approach illustrated by the following shell script avoids most (but not all) problems with special characters:
#!/bin/bash
function json2keyvalue {
cat<<EOF | jq -r 'to_entries|map("\(.key)\t\(.value|tostring)")[]'
{
"hello1": "world1",
"testk": "testv"
}
EOF
}
while IFS=$'\t' read -r key value
do
export "$key"="$value"
done < <(json2keyvalue)
echo hello1="$hello1"
echo testk="$testk"
Note that the above assumes that there are no tabs in the keys themselves.

jtc solution:
export $(<file.json jtc -w'[:]<>a:<L>k' -qqT'"{L}={}"')

I've come up with a solution (here in bash):
function source_json_as_environ() {
eval "$(jq -r '
def replace_dot:
. | gsub("\\."; "_");
def trim_spaces:
. | gsub("^[ \t]+|[ \t]+$"; "");
to_entries|map(
"export \(.key|trim_spaces|replace_dot)="
+ "\(.value|tostring|trim_spaces|#sh)"
)|.[]' $#)"
}
And you can use it like this:
$ source_json_as_environ values.json

Build a JSON string with Bash variables

I need to read these bash variables into my JSON string and I am not familiar with bash. any help is appreciated.
#!/bin/sh
BUCKET_NAME=testbucket
OBJECT_NAME=testworkflow-2.0.1.jar
TARGET_LOCATION=/opt/test/testworkflow-2.0.1.jar
JSON_STRING='{"bucketname":"$BUCKET_NAME"","objectname":"$OBJECT_NAME","targetlocation":"$TARGET_LOCATION"}'
echo $JSON_STRING

You are better off using a program like jq to generate the JSON, if you don't know ahead of time if the contents of the variables are properly escaped for inclusion in JSON. Otherwise, you will just end up with invalid JSON for your trouble.
BUCKET_NAME=testbucket
OBJECT_NAME=testworkflow-2.0.1.jar
TARGET_LOCATION=/opt/test/testworkflow-2.0.1.jar
JSON_STRING=$( jq -n \
--arg bn "$BUCKET_NAME" \
--arg on "$OBJECT_NAME" \
--arg tl "$TARGET_LOCATION" \
'{bucketname: $bn, objectname: $on, targetlocation: $tl}' )

You can use printf:
JSON_FMT='{"bucketname":"%s","objectname":"%s","targetlocation":"%s"}\n'
printf "$JSON_FMT" "$BUCKET_NAME" "$OBJECT_NAME" "$TARGET_LOCATION"
much clear and simpler

A possibility:
#!/bin/bash
BUCKET_NAME="testbucket"
OBJECT_NAME="testworkflow-2.0.1.jar"
TARGET_LOCATION="/opt/test/testworkflow-2.0.1.jar
# one line
JSON_STRING='{"bucketname":"'"$BUCKET_NAME"'","objectname":"'"$OBJECT_NAME"'","targetlocation":"'"$TARGET_LOCATION"'"}'
# multi-line
JSON_STRING="{
\"bucketname\":\"${BUCKET_NAME}\",
\"objectname\":\"${OBJECT_NAME}\",
\"targetlocation\":\"${TARGET_LOCATION}\"
}"
# [optional] validate the string is valid json
echo "${JSON_STRING}" | jq

In addition to chepner's answer, it's also possible to construct the object completely from args with this simple recipe:
BUCKET_NAME=testbucket
OBJECT_NAME=testworkflow-2.0.1.jar
TARGET_LOCATION=/opt/test/testworkflow-2.0.1.jar
JSON_STRING=$(jq -n \
--arg bucketname "$BUCKET_NAME" \
--arg objectname "$OBJECT_NAME" \
--arg targetlocation "$TARGET_LOCATION" \
'$ARGS.named')
Explanation:
--null-input | -n disabled reading input. From the man page: Don't read any input at all! Instead, the filter is run once using null as the input. This is useful when using jq as a simple calculator or to construct JSON data from scratch.
--arg name value passes values to the program as predefined variables: value is available as $name. All named arguments are also available as $ARGS.named
Because the format of $ARGS.named is already an object, jq can output it as is.

First, don't use ALL_CAPS_VARNAMES: it's too easy to accidentally overwrite a crucial shell variable (like PATH)
Mixing single and double quotes in shell strings can be a hassle. In this case, I'd use printf:
bucket_name=testbucket
object_name=testworkflow-2.0.1.jar
target_location=/opt/test/testworkflow-2.0.1.jar
template='{"bucketname":"%s","objectname":"%s","targetlocation":"%s"}'
json_string=$(printf "$template" "$BUCKET_NAME" "$OBJECT_NAME" "$TARGET_LOCATION")
echo "$json_string"
For homework, read this page carefully: Security implications of forgetting to quote a variable in bash/POSIX shells
A note on creating JSON with string concatenation: there are edge cases. For example, if any of your strings contain double quotes, you can broken JSON:
$ bucket_name='a "string with quotes"'
$ printf '{"bucket":"%s"}\n' "$bucket_name"
{"bucket":"a "string with quotes""}
Do do this more safely with bash, we need to escape that string's double quotes:
$ printf '{"bucket":"%s"}\n' "${bucket_name//\"/\\\"}"
{"bucket":"a \"string with quotes\""}

I had to work out all possible ways to deal json strings in a command request, Please look at the following code to see why using single quotes can fail if used incorrectly.
# Create Release and Tag commit in Github repository
# returns string with in-place substituted variables
json=$(cat <<-END
{
"tag_name": "${version}",
"target_commitish": "${branch}",
"name": "${title}",
"body": "${notes}",
"draft": ${is_draft},
"prerelease": ${is_prerelease}
}
END
)
# returns raw string without any substitutions
# single or double quoted delimiter - check HEREDOC specs
json=$(cat <<-!"END" # or 'END'
{
"tag_name": "${version}",
"target_commitish": "${branch}",
"name": "${title}",
"body": "${notes}",
"draft": ${is_draft},
"prerelease": ${is_prerelease}
}
END
)
# prints fully formatted string with substituted variables as follows:
echo "${json}"
{
"tag_name" : "My_tag",
"target_commitish":"My_branch"
....
}
Note 1: Use of single vs double quotes
# enclosing in single quotes means no variable substitution
# (treats everything as raw char literals)
echo '${json}'
${json}
echo '"${json}"'
"${json}"
# enclosing in single quotes and outer double quotes causes
# variable expansion surrounded by single quotes(treated as raw char literals).
echo "'${json}'"
'{
"tag_name" : "My_tag",
"target_commitish":"My_branch"
....
}'
Note 2: Caution with Line terminators
Note the json string is formatted with line terminators such as LF \n
or carriage return \r(if its encoded on windows it contains CRLF \r\n)
using (translate) tr utility from shell we can remove the line terminators if any
# following code serializes json and removes any line terminators
# in substituted value/object variables too
json=$(echo "$json" | tr -d '\n' | tr -d '\r' )
# string enclosed in single quotes are still raw literals
echo '${json}'
${json}
echo '"${json}"'
"${json}"
# After CRLF/LF are removed
echo "'${json}'"
'{ "tag_name" : "My_tag", "target_commitish":"My_branch" .... }'
Note 3: Formatting
while manipulating json string with variables, we can use combination of ' and " such as following, if we want to protect some raw literals using outer double quotes to have in place substirution/string interpolation:
# mixing ' and "
username=admin
password=pass
echo "$username:$password"
admin:pass
echo "$username"':'"$password"
admin:pass
echo "$username"'[${delimiter}]'"$password"
admin[${delimiter}]pass
Note 4: Using in a command
Following curl request already removes existing \n (ie serializes json)
response=$(curl -i \
--user ${username}:${api_token} \
-X POST \
-H 'Accept: application/vnd.github.v3+json' \
-d "$json" \
"https://api.github.com/repos/${username}/${repository}/releases" \
--output /dev/null \
--write-out "%{http_code}" \
--silent
)
So when using it for command variables, validate if it is properly formatted before using it :)

If you need to build a JSON representation where members mapped to undefined or empty variables should be ommited, then jo can help.
#!/bin/bash
BUCKET_NAME=testbucket
OBJECT_NAME=""
JO_OPTS=()
if [[ ! "${BUCKET_NAME}x" = "x" ]] ; then
JO_OPTS+=("bucketname=${BUCKET_NAME}")
fi
if [[ ! "${OBJECT_NAME}x" = "x" ]] ; then
JO_OPTS+=("objectname=${OBJECT_NAME}")
fi
if [[ ! "${TARGET_LOCATION}x" = "x" ]] ; then
JO_OPTS+=("targetlocation=${TARGET_LOCATION}")
fi
jo "${JO_OPTS[#]}"
The output of the commands above would be just (note the absence of objectname and targetlocation members):
{"bucketname":"testbucket"}

can be done following way:
JSON_STRING='{"bucketname":"'$BUCKET_NAME'","objectname":"'$OBJECT_NAME'","targetlocation":"'$TARGET_LOCATION'"}'

For Node.js Developer, or if you have node environment installed, you can try this:
JSON_STRING=$(node -e "console.log(JSON.stringify({bucketname: $BUCKET_NAME, objectname: $OBJECT_NAME, targetlocation: $TARGET_LOCATION}))")
Advantage of this method is you can easily convert very complicated JSON Object (like object contains array, or if you need int value instead of string) to JSON String without worrying about invalid json error.
Disadvantage is it's relying on Node.js environment.

These solutions come a little late but I think they are inherently simpler that previous suggestions (avoiding the complications of quoting and escaping).
BUCKET_NAME=testbucket
OBJECT_NAME=testworkflow-2.0.1.jar
TARGET_LOCATION=/opt/test/testworkflow-2.0.1.jar
# Initial unsuccessful solution
JSON_STRING='{"bucketname":"$BUCKET_NAME","objectname":"$OBJECT_NAME","targetlocation":"$TARGET_LOCATION"}'
echo $JSON_STRING
# If your substitution variables have NO whitespace this is sufficient
JSON_STRING=$(tr -d [:space:] <<JSON
{"bucketname":"$BUCKET_NAME","objectname":"$OBJECT_NAME","targetlocation":"$TARGET_LOCATION"}
JSON
)
echo $JSON_STRING
# If your substitution variables are more general and maybe have whitespace this works
JSON_STRING=$(jq -c . <<JSON
{"bucketname":"$BUCKET_NAME","objectname":"$OBJECT_NAME","targetlocation":"$TARGET_LOCATION"}
JSON
)
echo $JSON_STRING
#... A change in layout could also make it more maintainable
JSON_STRING=$(jq -c . <<JSON
{
"bucketname" : "$BUCKET_NAME",
"objectname" : "$OBJECT_NAME",
"targetlocation" : "$TARGET_LOCATION"
}
JSON
)
echo $JSON_STRING

To build upon Hao's answer using NodeJS: you can split up the lines, and use the -p option which saves having to use console.log.
JSON_STRING=$(node -pe "
JSON.stringify({
bucketname: process.env.BUCKET_NAME,
objectname: process.env.OBJECT_NAME,
targetlocation: process.env.TARGET_LOCATION
});
")
An inconvenience is that you need to export the variables beforehand, i.e.
export BUCKET_NAME=testbucket
# etc.
Note: You might be thinking, why use process.env? Why not just use single quotes and have bucketname: '$BUCKET_NAME', etc so bash inserts the variables? The reason is that using process.env is safer - if you don't have control over the contents of $TARGET_LOCATION it could inject JavaScript into your node command and do malicious things (by closing the single quote, e.g. the $TARGET_LOCATION string contents could be '}); /* Here I can run commands to delete files! */; console.log({'a': 'b. On the other hand, process.env takes care of sanitising the input.

You could use envsubst:
export VAR="some_value_here"
echo '{"test":"$VAR"}' | envsubst > json.json
also it might be a "template" file:
//json.template
{"var": "$VALUE", "another_var":"$ANOTHER_VALUE"}
So after you could do:
export VALUE="some_value_here"
export ANOTHER_VALUE="something_else"
cat json.template | envsubst > misha.json

For a general case of building JSON from bash with arbitrary inputs, many of the previous responses (even the high voted ones with jq) omit cases when the variables contain " double quote, or \n newline escape string, and you need complex string concatenation of the inputs.
When using jq you need to printf %b the input first to get the \n converted to real newlines, so that once you pass through jq you get \n back and not \\n.
I found this with version with nodejs to be quite easy to reason about if you know javascript/nodejs well:
TITLE='Title'
AUTHOR='Bob'
JSON=$( TITLE="$TITLE" AUTHOR="$AUTHOR" node -p 'JSON.stringify( {"message": `Title: ${process.env.TITLE}\n\nAuthor: ${process.env.AUTHOR}`} )' )
It's a bit verbose due to process.env. but allows to properly pass the variables from shell, and then format things inside (nodejs) backticks in a safe way.
This outputs:
printf "%s\n" "$JSON"
{"message":"Title: Title\n\nAuthor: Bob"}
(Note: when having a variable with \n always use printf "%s\n" "$VAR" and not echo "$VAR", whose output is platform-dependent! See here for details)
Similar thing with jq would be
TITLE='Title'
AUTHOR='Bob'
MESSAGE="Title: ${TITLE}\n\nAuthor: ${AUTHOR}"
MESSAGE_ESCAPED_FOR_JQ=$(printf %b "${MESSAGE}")
JSON=$( jq '{"message": $jq_msg}' --arg jq_msg "$MESSAGE_ESCAPED_FOR_JQ" --null-input --compact-output --raw-output --monochrome-output )
(the last two params are not necessary when running in a subshell, but I just added them so that the output is then same when you run the jq command in a top-level shell).

Bash will not insert variables into a single-quote string. In order to get the variables bash needs a double-quote string.
You need to use double-quote string for the JSON and just escape double-quote characters inside JSON string.
Example:
#!/bin/sh
BUCKET_NAME=testbucket
OBJECT_NAME=testworkflow-2.0.1.jar
TARGET_LOCATION=/opt/test/testworkflow-2.0.1.jar
JSON_STRING="{\"bucketname\":\"$BUCKET_NAME\",\"objectname\":\"$OBJECT_NAME\",\"targetlocation\":\"$TARGET_LOCATION\"}"
echo $JSON_STRING

if you have node.js and get minimist installed in global:
jc() {
node -p "JSON.stringify(require('minimist')(process.argv), (k,v) => k=='_'?undefined:v)" -- "$#"
}
jc --key1 foo --number 12 --boolean \
--under_score 'abc def' --'white space' ' '
# {"key1":"foo","number":12,"boolean":true,"under_score":"abc def","white space":" "}
you can post it with curl or what:
curl --data "$(jc --type message --value 'hello world!')" \
--header 'content-type: application/json' \
http://server.ip/api/endpoint
be careful that minimist will parse dot:
jc --m.room.member #gholk:ccns.io
# {"m":{"room":{"member":"#gholk:ccns.io"}}}

Used this for AWS Macie configuration:
JSON_CONFIG=$( jq -n \
--arg bucket_name "$BUCKET_NAME" \
--arg kms_key_arn "$KMS_KEY_ARN" \
'{"s3Destination":{"bucketName":$bucket_name,"kmsKeyArn":$kms_key_arn}}'
)
aws macie2 put-classification-export-configuration --configuration "$JSON_CONFIG"

You can simply make a call like this to print the JSON.
#!/bin/sh
BUCKET_NAME=testbucket
OBJECT_NAME=testworkflow-2.0.1.jar
TARGET_LOCATION=/opt/test/testworkflow-2.0.1.jar
echo '{ "bucketName": "'"$BUCKET_NAME"'", "objectName": "'"$OBJECT_NAME"'", "targetLocation": "'"$TARGET_LOCATION"'" }'
or
JSON_STRING='{ "bucketName": "'"$BUCKET_NAME"'", "objectName": "'"$OBJECT_NAME"'", "targetLocation": "'"$TARGET_LOCATION"'" }'
echo $JOSN_STRING

How delete last comma in json file using Bash?

I wrote some script that takes all user data of aws ec2 instance, and echo to local.json. All this happens when I install my node.js modules.
I don't know how to delete last comma in the json file. Here is the bash script:
#!/bin/bash
export DATA_DIR=/data
export PATH=$PATH:/usr/local/bin
#install package from git repository
sudo -- sh -c "export PATH=$PATH:/usr/local/bin; export DATA_DIR=/data; npm install git+https://reader:secret#bitbucket.org/somebranch/$1.git#$2"
#update config files from instance user-data
InstanceConfig=`cat /instance-config`
echo '{' >> node_modules/$1/config/local.json
while read line
do
if [ ! -z "$line" -a "$line" != " " ]; then
Key=`echo $line | cut -f1 -d=`
Value=`echo $line | cut -f2 -d=`
if [ "$Key" = "Env" ]; then
Env="$Value"
fi
printf '"%s" : "%s",\n' "$Key" "$Value" >> node_modules/*/config/local.json
fi
done <<< "$InstanceConfig"
sed -i '$ s/.$//' node_modules/$1/config/local.json
echo '}' >> node_modules/$1/config/local.json
To run him im doing that way: ./script
I get json(OUTPUT), but with comma in all lines. Here is local.json that I get:
{
"Env" : "dev",
"EngineUrl" : "engine.url.net",
}
All I trying to do, is delete in last line of the json file - comma(",").
I try many ways, that I found in internet. I know that it should be after last "fi"(end of the loop). And I know that it should be something like this line:
sed -i "s?${Key} : ${Value},?${Key} : ${Value}?g" node_modules/$1/config/local.json
Or this:
sed '$ s/,$//' node_modules/$1/config/local.json
But they not work for me.
Can someone help me with that? Who knows Bash scripting well?
Thanks!

If you know that it is the last comma that needs to be replaced, a reasonably robust way is to use GNU sed in "slurp" mode like this:
sed -zr 's/,([^,]*$)/\1/' local.json
Output:
{
"Env" : "dev",
"EngineUrl" : "engine.url.net"
}

If you'd just post some sample input/output it'd remove the guess-work but IF this is your input file:
$ cat file
Env=dev
EngineUrl=engine.url.net
Then IF you're trying to do what I think you are then all you need is:
$ cat tst.awk
BEGIN { FS="="; sep="{\n" }
{
printf "%s \"%s\" : \"%s\"", sep, $1, $2
sep = ",\n"
}
END { print "\n}" }
which you'd execute as:
$ awk -f tst.awk file
{
"Env" : "dev",
"EngineUrl" : "engine.url.net"
}
Or you can execute the awk script inline within a shell script if you prefer:
awk '
BEGIN { FS="="; sep="{\n" }
{
printf "%s \"%s\" : \"%s\"", sep, $1, $2
sep = ",\n"
}
END { print "\n}" }
' file
{
"Env" : "dev",
"EngineUrl" : "engine.url.net"
}
The above is far more robust, portable, efficient and better in every other way than the shell script you posted because it's using the right tool for the job. A UNIX shell is an environment from which to call tools with a language to sequence those calls. It is NOT a language to process text which is why it's so difficult to get it right. The UNIX tool for general text processing is awk so when you need to process text in UNIX, you just have shell call awk, that's all.

Here a jq version if it's available:
jq --raw-input 'split("=") | {(.[0]):.[1]}' /instance-config | jq --slurp 'add'
There might be a way to do it with one jqpass, but I couldn't see it.

You an remove all trailing commas from invalid json with:
sed -i.bak ':begin;$!N;s/,\n}/\n}/g;tbegin;P;D' FILE
sed -i.bak = creates a backup of the original file, then applies changes to the file
':begin;$!N;s/,\n}/\n}/g;tbegin;P;D' = anything ending with , followed by
"new line and }". Remove the , on the previous line
FILE = the file you want to make the change to

If you're willing to use it, xidel is rather forgiving for trailing commas:
xidel -s local.json -e '$json'
{
"Env": "dev",
"EngineUrl": "engine.url.net"
}
xidel - -se '$json' <<< '{"Env":"dev","EngineUrl":"engine.url.net",}'
#or
xidel - -se 'parse-json($raw,{"liberal":true()})' <<< '{"Env":"dev","EngineUrl":"engine.url.net",}'
{
"Env": "dev",
"EngineUrl": "engine.url.net"
}

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

bash: Extract a specific string from file and remove characters - json

Change your script into: url=$(grep -oP '(?<=display_url":")[^"]+' /var/scripts/string.txt ) sed 's/\\//g;s#^#https://#' <<< "$url" should help. removed useless cat make use of sed to do the substitution

Related

What does this specific sed command do exactly? Using sed to parse full HTML pages in a bash script

Replace apostrophe in JSON with apostrophe suitable for curl via sed

Exporting JSON to environment variables

Build a JSON string with Bash variables

How delete last comma in json file using Bash?

Categories

Resources