I need a script in any language to capitalize the first letter of every word in a file.
In Python, open('file.txt').read().title() should suffice.
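If the result should go back to disk, a minimal sketch from the shell (file_titled.txt is just an example output name; note that title() also lowercases the rest of each word):
python3 -c "print(open('file.txt').read().title(), end='')" > file_titled.txt  # file_titled.txt: example name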
Using the non-standard (Gnu extension) sed utility from the command line:
sed -i -r 's/\b(.)/\U\1/g' file.txt
Get rid of the "-i" if you don't want it to modify the file in-place.
Note that you should not use this in portable scripts: -r and \U are GNU extensions.
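For example, to write the result to a new file instead of editing in place (still GNU sed; capitalized.txt is an arbitrary output name):
sed -r 's/\b(.)/\U\1/g' file.txt > capitalized.txt  # capitalized.txt: example name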
C#:
string foo = "bar baz";
foo = System.Globalization.CultureInfo.CurrentCulture.TextInfo.ToTitleCase(foo);
//foo = Bar Baz
From the shell, using Ruby, this works assuming your input file is called FILENAME. It should preserve all existing file formatting; it doesn't collapse the spacing as some other solutions might:
cat FILENAME | ruby -n -e 'puts $_.gsub(/^[a-z]|\s+[a-z]/) { |a| a.upcase }'
Scala:
scala> "hello world" split(" ") map(_.capitalize) mkString(" ")
res0: String = Hello World
or well, given that the input should be a file:
import scala.io.Source
Source.fromFile("filename").getLines.map(_ split(" ") map(_.capitalize) mkString(" ")).foreach(println)
bash:
$ X=(hello world)
$ echo ${X[@]^}
Hello World
You can do it with Ruby too, using the command line format in a terminal:
cat FILENAME | ruby -n -e 'puts gsub(/\b\w/, &:upcase)'
Or:
ruby -e 'puts File.read("FILENAME").gsub(/\b\w/, &:upcase)'
A simple perl script that does this: (via http://www.go4expert.com/forums/showthread.php?t=2138)
sub ucwords {
$str = shift;
$str = lc($str);
$str =~ s/\b(\w)/\u$1/g;
return $str;
}
while (<STDIN>) {
print ucwords $_;
}
Then you call it with
perl ucfile.pl < srcfile.txt > outfile.txt
This is done in PHP.
$string = "I need a script in any language to capitalize the first letter of every word in a file.";
$cap = ucwords($string);
VB.Net:
Dim sr As System.IO.StreamReader = New System.IO.StreamReader("c:\lowercase.txt")
Dim str As String = sr.ReadToEnd()
sr.Close()
str = System.Threading.Thread.CurrentThread.CurrentCulture.TextInfo.ToTitleCase(str)
Dim sw As System.IO.StreamWriter = New System.IO.StreamWriter("c:\TitleCase.txt")
sw.Write(str)
sw.Close()
PHP uses ucwords($string), e.g. ucwords('all of this will start with capitals'), to do the trick. So you can just open up a file, get the data, and then use this function:
<?php
$file = "test.txt";
$data = fopen($file, 'r');
$allData = fread($data, filesize($file));
fclose($data);
echo ucwords($allData);
?>
Here's another Ruby solution, using Ruby's nice little one-line scripting helpers (automatic reading of input files etc.)
ruby -ni~ -e 'puts $_.gsub(/\b\w+\b/) { |word| word.capitalize }' foo.txt
(Assuming your text is stored in a file named foo.txt.)
Best used with Ruby 1.9 and its awesome multi-language support if your text contains non-ASCII characters.
ruby:
irb> foo = ""; "foo bar".split.each { |x| foo += x.capitalize + " " }
=> ["foo", "bar"]
irb> foo
=> "Foo Bar "
In ruby:
str.gsub(/^[a-z]|\s+[a-z]/) { |a| a.upcase }
hrm, actually this is nicer:
str.each_line(' ') { |word| puts word.capitalize }
perl:
$ perl -e '$foo = "foo bar"; $foo =~ s/\b(\w)/uc($1)/ge; print $foo;'
Foo Bar
Although it was mentioned in the comments, nobody ever posted the awk approach to this problem:
$ cat output.txt
this is my first sentence. and this is the second sentence. that one is the third.
$
$ awk '{for (i=1;i<=NF;i++) {sub(".", substr(toupper($i),1,1), $i)}} {print}' output.txt
This Is My First Sentence. And This Is The Second Sentence. That One Is The Third.
Explanation
We loop through the fields and capitalize the first letter of each word. If the field separator were not a space, we could define it with -F "\t", -F "_", or whatever.
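For example, a sketch with underscores as the separator (made-up sample; OFS is set to match so the rebuilt line keeps the underscores):
$ echo 'some_snake_case_words' | awk -F "_" -v OFS="_" '{for (i=1;i<=NF;i++) {sub(".", substr(toupper($i),1,1), $i)}} {print}'
Some_Snake_Case_Words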
zsh solution
#!/bin/zsh
mystring="vb.net lOOKS very unsexy"
echo "${(C)mystring}"
Vb.Net Looks Very Unsexy
Note that it capitalizes after every non-alphabetic character; see "Vb.Net" in the output.
Very basic version for AMOS Basic on the Amiga — only treats spaces as word separators though. I'm sure there is a better way using PEEK and POKE, but my memory is rather rusty with anything beyond 15 years.
FILE$=Fsel$("*.txt")
Open In 1,FILE$
Input #1,STR$
STR$=Lower$(STR$)
L=Len(STR$)
LAST$=" "
NEW$=""
For I=1 To L
CUR$=Mid$(STR$,I,1)
If LAST$=" "
NEW$=NEW$+Upper$(CUR$)
Else
NEW$=NEW$+CUR$
Endif
LAST$=CUR$
Next
Close 1
Print NEW$
I miss good old AMOS, a great language to learn with... pretty ugly though, heh.
Another solution with awk, aiming to be simpler rather than shorter ;)
$ cat > file
thanks for all the fish
^D
$ awk 'function tocapital(str) {
if(length(str) > 1)
return toupper(substr(str, 1, 1)) substr(str,2)
else
return toupper(str)
}
{
for (i=1;i<=NF;i++)
printf("%s%s", tocapital($i), OFS);
printf ORS
}
' < file
Thanks For All The Fish
If using pipes and Python:
$ echo "HELLO WORLD" | python3 -c "import sys; print(sys.stdin.read().title())"
Hello World
For example:
$ lorem | python3 -c "import sys; print(sys.stdin.read().title())"
Officia Est Omnis Quia. Nihil Et Voluptatem Dolor Blanditiis Sit Harum. Dolore Minima Suscipit Quaerat. Soluta Autem Explicabo Saepe. Recusandae Molestias Et Et Est Impedit Consequuntur. Voluptatum Architecto Enim Nostrum Ut Corrupti Nobis.
You can also use things like strip() to remove spaces, or capitalize():
$ echo " This iS mY USER ${USER} " | python3 -c "import sys; print(sys.stdin.read().strip().lower().capitalize())"
This is my user jenkins
subreddit=$(curl -sL "https://www.reddit.com/search/?q=${query}&type=sr"|tr "<" "\n"|
sed -nE 's#.*class="_2torGbn_fNOMbGw3UAasPl">r/([^<]*)#\1#p'|gum filter)
I've been learning bash and have been making pretty good progress. One thing that just seems far too daunting is these complex sed commands. It's unfortunate, because I really want to use them to do things like parse HTML, but it quickly becomes a mess. This is a little snippet of a script that queries Reddit, pipes the result through sed, and returns just the names of the subreddits found by the search, one per line.
My main question is: what is this actually cutting/replacing, and what does the beginning part 's#.' mean?
What I tried:
I used curl to search for a subreddit name so that I could see the raw output from that command, and then I tried to pipe it into sed using little snippets of the full command to see if I could reconstruct the logic behind it. All I really figured out was that my knowledge of sed doesn't go much beyond basic replacements.
I'm trying to re-write this script (for learning purposes only, the script works just fine) that allows you to search reddit and view the image posts in your terminal using Kitty. Mostly everything is pretty readable but the sed commands just trip me up.
I'll attach the full script below in case anyone is interested and I welcome any advice or explanations that could help me fully understand and re-construct it.
I'm really curious about this. I'm also wondering if it would just be better to call a Python script from bash that could return the images using beautiful soup... or maybe using "htmlq" would be a better idea?
Thanks!
#!/bin/sh
get_input() {
[ -z "$*" ] && query=$(gum input --placeholder "Search for a subreddit") || query=$*
query=$(printf "%s" "$query"|tr ' ' '+')
subreddit=$(curl -sL "https://www.reddit.com/search/?q=${query}&type=sr"|tr "<" "\n"|
sed -nE 's#.*class="_2torGbn_fNOMbGw3UAasPl">r/([^<]*)#\1#p'|gum filter)
xml=$(curl -s "https://www.reddit.com/r/$subreddit.rss" -A "uwu"|tr "<|>" "\n")
post_href=$(printf "%s" "$xml"|sed -nE '/media:thumbnail/,/title/{p;n;p;}'|
sed -nE 's_.*href="([^"]+)".*_\1_p;s_.*media:thumbnail[^>]+url="([^"]+)".*_\1_p; /title/{n;p;}'|
sed -e 'N;N;s/\n/\t/g' -e 's/&amp;/\&/g'|grep -vE '.*\.gif.*')
[ -z "$post_href" ] && printf "No results found for \"%s\"\n" "$query" && exit 1
}
readc() {
if [ -t 0 ]; then
saved_tty_settings=$(stty -g)
stty -echo -icanon min 1 time 0
fi
eval "$1="
while
c=$(dd bs=1 count=1 2> /dev/null; echo .)
c=${c%.}
[ -n "$c" ] &&
eval "$1=\${$1}"'$c
[ "$(($(printf %s "${'"$1"'}" | wc -m)))" -eq 0 ]'; do
continue
done
[ -t 0 ] && stty "$saved_tty_settings"
}
download_image() {
downloadable_link=$(curl -s -A "uwu" "$1"|sed -nE 's#.*class="_3Oa0THmZ3f5iZXAQ0hBJ0k".*<a href="([^"]+)".*#\1#p')
curl -s -A "uwu" "$downloadable_link" -o "$(basename "$downloadable_link")"
[ -z "$downloadable_link" ] && printf "No image found\n" && exit 1
tput clear && gum style \
--foreground 212 --border-foreground 212 --border double \
--align center --width 50 --margin "1 2" --padding "2 4" \
'Your image has been downloaded!' "Image saved to $(basename "$downloadable_link")"
# shellcheck disable=SC2034
printf "Press Enter to continue..." && read -r useless
}
cleanup() {
tput cnorm && exit 0
}
trap cleanup EXIT INT HUP
get_input "$@"
i=1 && tput clear
while true; do
tput civis
[ "$i" -lt 1 ] && i=$(printf "%s" "$post_href"|wc -l)
[ "$i" -gt "$(printf "%s" "$post_href"|wc -l)" ] && i=1
link=$(printf "%s" "$post_href"|sed -n "$i"p|cut -f1)
post_link=$(printf "%s" "$post_href"|sed -n "$i"p|cut -f2)
gum style \
--foreground 212 --border-foreground 212 --border double \
--align left --width 50 --margin "20 1" --padding "2 4" \
'Press (j) to go to next' 'Press (k) to go to previous' 'Press (d) to download' \
'Press (o) to open in browser' 'Press (s) to search for another subreddit' 'Press (q) to quit'
kitty +kitten icat --scale-up --place 60x40@69x3 --transfer-mode file "$link"
readc key
# shellcheck disable=SC2154
case "$key" in
j) i=$((i+1)) && tput clear ;;
k) i=$((i-1)) && tput clear ;;
d) download_image "$post_link" ;;
o) xdg-open "$post_link" || open "$post_link" ;;
s) get_input ;;
q) exit 0 && tput clear ;;
*) ;;
esac
done
"gum filter" is essentially a fuzzy finder like fzf and "gum style" draws pretty text and nice boxes that work kind of like css.
What does this specific sed command do exactly?
sed -nE 's#.*class="_2torGbn_fNOMbGw3UAasPl">r/([^<]*)#\1#p'
It does two things:
Select all lines that contain the literal string class="_2torGbn_fNOMbGw3UAasPl">r/.
For those lines, print only the part after ...>r/.
Basically, it translates to ... (written inefficiently on purpose)
grep 'class="_2torGbn_fNOMbGw3UAasPl">r/' |
sed 's/.*>r\///'
what does the beginning part mean 's#.'?
You are looking at the (beginning of the) substitution command. Normally, it is written as s/search/replace/ but the delimiter / can be chosen (mostly) freely. s/…/…/ and s#…#…# are equivalent.
Here, # has the benefit of not having to escape the / in …>r/.
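A quick illustration with a made-up input; both commands are equivalent, but the second needs no escaped slashes:
$ echo 'r/foo' | sed 's/^r\///'
foo
$ echo 'r/foo' | sed 's#^r/##'
foo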
The . belongs to the search pattern. The .* in the beginning selects everything from the start of the line, so that it can be deleted when doing the substitution. Here we delete the beginning of the line up to (and including) …>r/.
The \1 in the replacement pattern is a placeholder for the string that was matched by the group ([^<]*) (longest <-free substring directly after …>r/).
That part is unnecessarily complicated. Because sed is preceded by tr "<" "\n" there is no point in dealing with the < inside sed. It could be simplified to
sed -n 's#.*class="_2torGbn_fNOMbGw3UAasPl">r/##p'
Speaking about simplifications:
I really want to use them [sed commands] to do things like parse HTML
Advice: Don't. For one-off jobs where you know the exact formatting (!) of your html files, regexes are ok. But even then, they make only sense if you are quicker to write them than using a proper tool.
I'm also wondering if it would just be better to call a Python script from bash that could return the images using beautiful soup... or maybe using "htmlq" would be a better idea?
You are right! In general, regexes are not powerful enough to reliably parse html.
Whether you use Python or bash is up to you. Personally, I find it easier to use a "proper" language for bigger projects, but then I use only that: writing half in Python and half in bash only increases complexity, in my opinion.
If you stick with bash, I'd recommend something a bit more mature and widespread than htmlq (first released in Sep 2021, currently at version 0.4). E.g. install libxml2 (which provides xmllint) and use an XPath expression with post-processing:
curl -sL "https://www.reddit.com/search/?q=QUERY&type=sr" |
xmllint --html --xpath '//h6[@class="_2torGbn_fNOMbGw3UAasPl"]/text()' - 2>/dev/null |
sed 's#^r/##'
But then again, parsing HTML isn't necessary in the first place, since reddit has an API that can return JSON, which you can process using jq.
curl -sL "https://www.reddit.com/subreddits/search.json?q=QUERY" |
jq -r '.data.children | map(.data.display_name) | .[]'
I have a set of JSON files in a local folder. What I want to do is change a particular string value in it, permanently. That means, deleting or modifying the old entry, writing a new one, and saving it.
Below is the format of the file:
{
"name": "ABC #1",
"description": "This is the description",
"image": "ipfs://NewUriToReplace/1.png",
"dna": "a56c520f57ba2a861de8c78099b4691f9dad6e87",
"edition": 1,
"date": 1641634646966,
"creator": "Team Dreamlabs",
"attributes": [
{
I want to change ABC #1 to ABC #9501 in this file, ABC #2 to ABC #9502 in the next file, and so on. How do I do that on macOS in one go?
As I understand from the example, you are adding a value of 9500 to your integers after the symbol #.
Because this kind of replacement is a string operation, a loop with the sed command can be used:
for f in *.json; do sed -i.bak 's/\("name": "ABC #\)\([0-9]\)",/\1950\2",/' "$f"; done
This just prepends 950 to a single digit, so although it handles the example, it would obviously not work beyond number #9.
Then we need to use a bash function:
function add_number() { old_number=$(cat $1 | sed -n 's/[ ]*"name": "ABC #\([0-9]*\)",/\1/p'); new_number=$(($old_number+9500)); sed -i.bak "s/\(\"name\": \"ABC #\)\([0-9]*\)\",/\1${new_number}\",/" $1; }; for f in *.json; do add_number $f ; done
The add_number function extracts the integer value, adds the desired number to it, and then replaces the content of the file.
sed is used again for both the extraction and the replacement.
For the extraction, the -n flag suppresses sed's default output and the p flag prints the result of the substitution, so only the captured number is emitted; the leading spaces are matched away so they do not end up in the assignment.
For the replacement, double quotes are used so that bash can expand the variable inside the sed expression; the literal double quotes are escaped accordingly.
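A minimal demonstration of the -n/p combination used above, on a toy input:
$ printf 'keep 1\nskip\nkeep 2\n' | sed -n 's/keep/KEPT/p'
KEPT 1
KEPT 2
Only the lines where the substitution succeeded are printed.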
Regarding the addition from the comment below: to make the replacement on the line with the edition tag as well (using the same number), just add another sed substitution with the regular expression amended to fit that line.
Finally, the overall code, formatted more readably:
function add_number() {
old_number=$(cat $1 | sed -n 's/[ ]*"name": "ABC #\([0-9]*\)",/\1/p')
new_number=$(($old_number+9500))
sed -i.bak "s/\(\"name\": \"ABC #\)[0-9]*\",/\1${new_number}\",/" $1
sed -i.bak "s/\(\"edition\": \)[0-9]*,/\1${new_number},/" $1
}
for f in *.json
do add_number $f
done
Those previous answers helped me to write this code:
using variables inside of sed
assigning the variable
If you are going to manipulate your JSON files on more than just this one occasion, then you might want to consider using tools that are designed to accomplish such tasks with ease.
One popular choice could be jq which is a "lightweight and flexible command-line JSON processor" that "has zero runtime dependencies" and is also available for OS X. By using jq within your shell, the following would be one way to accomplish what you have asked for.
Adding the numeric value 9500 to the number sitting in the field called edition:
jq '.edition += 9500' file.json
Interpreting part of the string as a number, again adding 9500 to it, and recomposing the string:
jq '.name |= ((./"#" | .[1] |= "\(tonumber + 9500)") | join("#"))' file.json
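The less obvious part is ./"#": in jq, dividing one string by another splits on it, so the name can be taken apart, edited, and rejoined. A quick check with the name from the question:
$ jq -n '"ABC #1" / "#"'
[
  "ABC ",
  "1"
]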
On the whole, iterating over your files, making both changes at once, writing to a temporary file and replacing the original on success, while having the value to be added as external variable:
v=9500
for f in *.json; do jq --argjson v $v '
.edition += $v | .name |= ((./"#" | .[1] |= "\(tonumber + $v)") | join("#"))
' "$f" > "$f.new" && mv "$f.new" "$f"
done
Here is an online "playground for jq", set up to simulate the application of my code from above to three imaginary files of yours. Feel free to edit the jq filter and/or the input JSON in order to see what could be possible using jq.
I need to prepare JSON which contains apostrophe to be sent via CURL.
Example of JSON:
{"myField":"Apos'test"}
Example of JSON I need as an output:
{"myField":"Apos'\''test"}
What I have tried:
sed -e "s/'/'\\\''/g" <<< {"myField":"Apos'test"}
which outputs:
{myField:Apos'\''test}
And I do not understand why it removes double quotes.
P.S. it is not obligatory to use sed, any other standard linux tool would work.
try this:
#!/bin/bash
replacement=$( (cat << EOT
{"myField":"Apos'test"}
EOT
) | sed "s|'|'\\\''|g")
echo "$replacement"
output:
{"myField":"Apos'\''test"}
It doesn't.
It's because you used <<< with an unquoted word: the double-quote pairs were parsed, expanded, and dropped by the shell you're in before sed ever saw the string.
$ cat d
{"myField":"Apos'test"}
$ sed -E "s/'/'\\\''/g" d
{"myField":"Apos'\''test"}
I'm trying to extract a specific string value from a text file and then remove the backslashes from it. The name of the value is "display_url".
My script:
url=$(cat /var/scripts/string.txt | grep -oP '(?<=display_url":")[^"]+')
for link in $url; do
echo 'https://'$link
done
output:
https://pastebin.com\/WRv5ir4Y
https://reddit.com\/r\/IBO\/comments\u2026
The desired output:
https://pastebin.com/WRv5ir4Y
https://reddit.com/r/IBO/comments/u2026
text file:
{"created_at":"Thu Dec 13 08:43:38 +0000 2018","id":1073136349845303297,"id_str":"1073136349845303297","text":"https:\/\/t.co\/aPu5ln7yjO\nhttps:\/\/t.co\/pBvevjSCc9\n\n#osectraining","source":"\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":961508561217052675,"id_str":"961508561217052675","name":"Online Security","screen_name":"osectraining","location":"Israel","url":"https:\/\/www.onlinesecurity.co.il","description":"OnlineSecurity provides online cyber-security training courses and certification, from beginner to advanced with the most advanced virtual labs in the field.","translator_type":"none","protected":false,"verified":false,"followers_count":2,"friends_count":51,"listed_count":0,"favourites_count":0,"statuses_count":1,"created_at":"Thu Feb 08 07:54:39 +0000 2018","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"1B95E0","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/961510231346958336\/d_KhBeTD_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/961510231346958336\/d_KhBeTD_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/961508561217052675\/1518076913","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"osectraining","indices":[49,62]}],"urls":[{"url":"https:\/\/t.co\/aPu5ln7yjO","expanded_url":"https:\/\/pastebin.com\/WRv5ir4Y","display_url":"pastebin.com\/WRv5ir4Y","indices":[0,23]},{"url":"https:\/\/t.co\/pBvevjSCc9","expanded_url":"https:\/\/www.reddit.com\/r\/IBO\/comments\/9ragj7\/ioc_in_10_hours\/","display_url":"reddit.com\/r\/IBO\/comments\u2026","indices":[24,47]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"und","timestamp_ms":"1544690618369"}
any ideas?
Change your script to:
url=$(grep -oP '(?<=display_url":")[^"]+' /var/scripts/string.txt )
sed 's/\\//g;s#^#https://#' <<< "$url"
should help. Changes:
removed the useless cat
made use of sed to do the substitution
The reason you need both that grep and a separate sed to parse it is that grep can use Perl regexes (power...) but sed can't. You probably have Perl available - use it if you do.
perl -pe '
s/\\//g;
s{.*?display_url":"}{https://};
s{",".*display_url":"}{\nhttps://} while /display_url/;
s/",".*/\n/;
' /var/scripts/string.txt
https://pastebin.com/WRv5ir4Y
https://reddit.com/r/IBO/commentsu2026
And there's virtually always awk.
awk '{
gsub("\\\\","",$0);
split($0, chnk, "display_url.:.");
for (x=2; x<=length(chnk); x++) {
gsub("\".*","", chnk[x]);
printf "https://%s\n", chnk[x];
}
}' /var/scripts/string.txt
https://pastebin.com/WRv5ir4Y
https://reddit.com/r/IBO/commentsu2026
But if you can't use either of those, then one sed to strip the backslashes and some basic shell string processing in a loop, because it's fun. :D
$: txt=$(sed 's/\\//g' /var/scripts/string.txt)
$: while [[ "$txt" =~ display_url ]]
do txt=${txt#*display_url?:?}
echo https://${txt%%?,*}
done
https://pastebin.com/WRv5ir4Y
https://reddit.com/r/IBO/commentsu2026
Trickiest part is getting around the double-quotes in the shell parsing, but I'm sure someone can suggest a better way.
I would use the JSON command line parser jq:
jq -r '"https://" + .entities.urls[].display_url' /var/scripts/string.txt
-r stands for "raw output" (strings are printed without quotes)
"https://" + concat ...
.entities.urls[] ... for each item of the array .entities.urls ...
.display_url ... the value of the display_url member
Result:
https://pastebin.com/WRv5ir4Y
https://reddit.com/r/IBO/comments
In shell I have a requirement wherein I have to read the JSON response which is in the following format:
{ "Messages": [ { "Body": "172.16.1.42|/home/480/1234/5-12-2013/1234.toSort", "ReceiptHandle": "uUk89DYFzt1VAHtMW2iz0VSiDcGHY+H6WtTgcTSgBiFbpFUg5lythf+wQdWluzCoBziie8BiS2GFQVoRjQQfOx3R5jUASxDz7SmoCI5bNPJkWqU8ola+OYBIYNuCP1fYweKl1BOFUF+o2g7xLSIEkrdvLDAhYvHzfPb4QNgOSuN1JGG1GcZehvW3Q/9jq3vjYVIFz3Ho7blCUuWYhGFrpsBn5HWoRYE5VF5Bxc/zO6dPT0n4wRAd3hUEqF3WWeTMlWyTJp1KoMyX7Z8IXH4hKURGjdBQ0PwlSDF2cBYkBUA=", "MD5OfBody": "53e90dc3fa8afa3452c671080569642e", "MessageId": "e93e9238-f9f8-4bf4-bf5b-9a0cae8a0ebc" } ] }
Here I am only concerned with the "Body" property value. I made some unsuccessful attempts like:
jsawk -a 'return this.Body'
or
awk -v k="Body" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}'
But that did not suffice. Can anyone help me with this?
There is jq for parsing json on the command line:
jq '.Messages[].Body'
Visit this for jq: https://stedolan.github.io/jq/
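Applied to the response in the question, the Body values live inside the Messages array, so (assuming the response is saved as response.json, an example name):
$ jq -r '.Messages[].Body' response.json   # response.json holds the JSON shown above
172.16.1.42|/home/480/1234/5-12-2013/1234.toSort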
tl;dr
$ cat /tmp/so.json | underscore select '.Messages .Body'
["172.16.1.42|/home/480/1234/5-12-2013/1234.toSort"]
Javascript CLI tools
You can use Javascript CLI tools like
underscore-cli:
json:select(): CSS-like selectors for JSON.
Example
Select all name children of addons:
underscore select ".addons > .name"
underscore-cli provides other real-world examples as well as the json:select() docs.
Similarly, using Bash regex matching, you should be able to snatch any key/value pair.
key="Body"
re="\"($key)\": \"([^\"]*)\""
while read -r l; do
if [[ $l =~ $re ]]; then
name="${BASH_REMATCH[1]}"
value="${BASH_REMATCH[2]}"
echo "$name=$value"
else
echo "No match"
fi
done
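The loop reads from standard input, so if you save it as, say, extract_body.sh (a hypothetical name), you can feed it the response file used above:
$ bash extract_body.sh < /tmp/so.json   # extract_body.sh: the loop above saved to a file
Body=172.16.1.42|/home/480/1234/5-12-2013/1234.toSort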
The regular expression can be tuned to match multiple spaces/tabs or newline(s). It wouldn't work if the value has an embedded ". This is an illustration; better to use some "industrial" parser :)
Here is a crude way to do it: Transform JSON into bash variables to eval them.
This only works for:
JSON which does not contain nested arrays, and
JSON from trustworthy sources (else it may confuse your shell script, and perhaps even harm your system; you have been warned)
Well, yes, it uses Perl to do this job, thanks to CPAN, but it is small enough for inclusion directly into a script and hence is quick and easy to debug:
json2bash() {
perl -MJSON -0777 -n -E 'sub J {
my ($p,$v) = @_; my $r = ref $v;
if ($r eq "HASH") { J("${p}_$_", $v->{$_}) for keys %$v; }
elsif ($r eq "ARRAY") { $n = 0; J("$p"."[".$n++."]", $_) foreach @$v; }
else { $v =~ '"s/'/'\\\\''/g"'; $p =~ s/^([^[]*)\[([0-9]*)\](.+)$/$1$3\[$2\]/;
$p =~ tr/-/_/; $p =~ tr/A-Za-z0-9_[]//cd; say "$p='\''$v'\'';"; }
}; J("json", decode_json($_));'
}
use it like eval "$(json2bash <<<'{"a":["b","c"]}')"
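For that example, the eval should leave the values in indexed shell variables; the json_ prefix comes from the J("json", ...) call in the function (a sketch):
eval "$(json2bash <<<'{"a":["b","c"]}')"
echo "${json_a[0]}" "${json_a[1]}"   # prints: b c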
Not heavily tested, though. For updates, warnings, and more examples, see my GIST.
Update
(Unfortunately, the following is a link-only solution, as the C code is far too long to duplicate here.)
For all those, who do not like the above solution,
there now is a C program json2sh
which (hopefully safely) converts JSON into shell variables.
In contrast to the perl snippet, it is able to process any JSON,
as long as it is well formed.
Caveats:
json2sh was not tested much.
json2sh may create variables which start with the shellshock pattern () {
I wrote json2sh to be able to post-process .bson with Shell:
bson2json()
{
printf '[';
{ bsondump "$1"; echo "\"END$?\""; } | sed '/^{/s/$/,/';
echo ']';
};
bsons2json()
{
printf '{';
c='';
for a;
do
printf '%s"%q":' "$c" "$a";
c=',';
bson2json "$a";
done;
echo '}';
};
bsons2json */*.bson | json2sh | ..
Explained:
bson2json dumps a .bson file such that the records become a JSON array.
If everything works OK, an END0 marker is appended; otherwise you will see something like END1.
The END marker is needed, as empty .bson files would otherwise not show up.
bsons2json dumps a bunch of .bson files as one object, where the output of bson2json is indexed by the filename.
This is then post-processed by json2sh, such that you can use grep/source/eval/etc., whatever you need, to bring the values into the shell.
This way you can quickly process the contents of a MongoDB dump on shell level, without need to import it into MongoDB first.