Find / Replace / Append JSON String in bash script without using jq

I have a JSON string and need to extract the values in the square brackets with a bash script and validate them against the expected values. If an expected value exists, leave it as it is; otherwise add the new values into the square brackets as expected.
"hosts": [“unix://“,”tcp://0.0.0.0:2376"]
I cannot use jq.
Expected:
Verify that the values "unix://" and "tcp://0.0.0.0:2376" exist for the key "hosts"; add them if they don't exist.
I tried something like the below,
$echo "\"hosts\":[\"unix://\",\"tcp://0.0.0.0:2376\"]" | cut -d: -f2
["unix
$echo "\"hosts\":[\"unix://\",\"tcp://0.0.0.0:2376\"]" | sed 's/:.*//'
"hosts"
I have tried multiple possibilities with sed & cut but cannot achieve what I expect. I'm a shell script beginner.
How can I achieve this with sed or cut?

You need to detect the presence of "unix://" and "tcp://0.0.0.0:2376" in your string. You can do it like this:
#!/bin/bash
#
string='"hosts": ["unix://","tcp://0.0.0.0:2376"]'
check1=$(echo "$string" | grep -c "unix://")
check2=$(echo "$string" | grep -c "tcp://0.0.0.0:2376")
(( total = check1 + check2 ))
if [[ "$total" -eq 2 ]]
then
    echo "they are both in, nothing to do"
else
    echo "they are NOT both there, fix variable string"
    string='"hosts": ["unix://","tcp://0.0.0.0:2376"]'
fi
grep -c counts the number of lines that contain a match for a specific string. In your case, both strings have to be found, so adding the two counts together will produce 0, 1 or 2. Only when it is equal to 2 is the string correct.
cut will extract some string based on a certain delimiter. But it is not typically used to verify if a string is in there; grep does that.
sed has many uses, such as replacing text (with 's///'). But again, grep is the tool that was built to detect strings in other strings (or files).
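For example, a quick check you can run in a terminal against the string from the question:
$ echo '"hosts": ["unix://","tcp://0.0.0.0:2376"]' | grep -c "unix://"
1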
Now when it comes to adding text, you say that if one of "unix://" or "tcp://0.0.0.0:2376" is missing, add it. Well that comes back to redefining the whole string with the correct values, so just assign it.
Finally, if you think about it, you want to ensure that string is "hosts": ["unix://","tcp://0.0.0.0:2376"]. So no need to verify anything: just hard-code it at the start of your script. The end result will be the same.
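That said, if you would rather append only a missing value instead of reassigning the whole string, a minimal sketch could look like this (assuming the single-line layout from the question, and that the array already holds at least one entry; for real JSON you would want a real parser):
if ! grep -q "tcp://0.0.0.0:2376" <<<"$string"
then
    # insert the missing value just before the closing bracket
    string=$(sed 's|]$|,"tcp://0.0.0.0:2376"]|' <<<"$string")
fi
Repeat the same test for "unix://" if you need to guarantee both values.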
Part 2
If you MUST use cut, you could:
#!/bin/bash
#
string='"hosts": ["unix://","tcp://0.0.0.0:2376"]'
firstelement=$(echo "$string" | cut -d',' -f1 | cut -d'"' -f4)
echo "$firstelement"
# will display unix://
secondelement=$(echo "$string" | cut -d',' -f2 | cut -d'"' -f2)
echo "$secondelement"
# will display tcp://0.0.0.0:2376
Then you can use if statements to compare them to your desired values, as sketched below. But note that this approach will fail if you do not have at least 2 elements in your text between the [ ]. E.g. ["unix://"] will break cut -d',' since there is no ',' character in the string.
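A minimal sketch of that comparison, reusing the two variables from above:
if [[ "$firstelement" == "unix://" && "$secondelement" == "tcp://0.0.0.0:2376" ]]
then
    echo "both values are correct, nothing to do"
else
    echo "unexpected values, fix variable string"
    string='"hosts": ["unix://","tcp://0.0.0.0:2376"]'
fi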
Part 3
If you MUST use sed:
#!/bin/bash
#
string='"hosts": ["unix://","tcp://0.0.0.0:2376"]'
firstelement=$(echo "$string" | sed 's/.*\["\(.*\)",".*/\1/')
echo "$firstelement"
# will output unix://
secondelement=$(echo "$string" | sed 's/.*","\(.*\)"\]/\1/')
echo "$secondelement"
# will output tcp://0.0.0.0:2376
Again here, the main character to work with is the ,.
firstelement explanation
sed 's/.*\["\(.*\)",".*/\1/'
.* anything...
\[" followed by [ and ". Since [ means something to sed, you have to \ it
\(.*\) followed by anything at all (. matches any character, * matches any number of these characters).
"," followed by ",". This only happens for the first element.
.* followed by anything
\1 keep only the characters enclosed between \( and \)
Similarly, for the second element the s/// is modified to keep only what follows ",", up to the last "] at the end of the string.
Again like with cut above, use if statements to verify if the extracted values are what you wanted.
Again, read my last comments in the first approach; you might not need all this...


not able to store sed output to variable

I am new to bash scripting.
I am getting some JSON response and I need only one property from the response. I want to save it to a variable, but it is not working:
token=$result |sed -n -e 's/^.*access_token":"//p' | cut -d'"' -f1
echo $token
It returns a blank line.
I cannot use jq or any third party tools.
Please let me know what I am missing.
Your command should be:
token=$(echo "$result" | sed -n -e 's/^.*access_token":"//p' | cut -d'"' -f1)
You need to use echo to print the contents of the variable over standard output, and you need to use a command substitution $( ) to assign the output of the pipeline to token.
Quoting your variables is always encouraged, to avoid problems with white space and glob characters like *.
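For instance, with a made-up response (the token value is purely illustrative):
result='{"token_type":"bearer","access_token":"abc123"}'
token=$(echo "$result" | sed -n -e 's/^.*access_token":"//p' | cut -d'"' -f1)
echo "$token"
# prints abc123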
As an aside, note that you can probably obtain the output using something like:
token=$(jq -r .access_token <<<"$result")
I know you've said that you can't use jq but it's a standalone binary (no need to install it) and treats your JSON in the correct way, not as arbitrary text.
Give this a try:
token="$(sed -E -n -e 's/^.*access_token": ?"//p' <<<"$result" | cut -d'"' -f1)"
Explanation:
token="$( script here )" means that $token is set to the output/result of the script run inside the subshell through a process known as command substituion
-E in sed allows Extended Regular Expressions. We want this because JSON generally contains a space after the : and before the next ". We use the ? after the space to tell sed that the space may or may not be present.
<<<"$result" is a herestring that feeds the data into sed as stdin in place of a file.

Getting JSON value from JSON String using Shell Script

I have this JSON String:
{"name":"http://someUrl/ws/someId","id":"someId"}
I just want to get the value for the "id" key and store it in some variable. I successfully tried using jq. But due to some constraints, I need to achieve this just by using grep and string matching.
I tried this so far: grep -Po '"id":.*?[^\\]"', but that is giving "id":"ws-4c906698-03a2-49c3-8b3e-dea829c7fdbe" as output. I just need the id value. Please help.
With a PCRE regex, you may use lookarounds. Thus, you need to put "id":" into the positive lookbehind construct, and then match 1 or more chars other than ":
grep -Po '(?<="id":")[^"]+'
where
(?<="id":") - requires a "id":" to appear immediately to the left of the current position (but the matched text is not added to the match value) and
[^"]+ - matches and adds to the match 1 or more chars other than ".
To get the values with escaped quotes:
grep -Po '(?<="id":")[^"\\]*(?:\\.[^"\\]*)*'
Here, (?<="id":") will still match the position right after "id":" and then the following will get matched:
[^"\\]* - zero or more chars other than " and \
(?:\\.[^"\\]*)* - zero or more consequent sequences of:
\\. - a \ and any char (any escape sequence)
[^"\\]* - zero or more chars other than " and \
See Jshon; it is a command-line JSON parser for shell script usage.
echo '{"name":"http://someUrl/ws/someId","id":"someId"}' | jshon -e id
"someId"
Just noticed I read past the section stating you needed to use the standard tools available. If your admin doesn't allow Jshon, it is very likely that the system will have Python available, which you could use:
echo '{"name":"http://someUrl/ws/someId","id":"someId"}' | python -c 'import sys, json; print json.load(sys.stdin)["id"]'
someId
Using grep for this is just asking for trouble; I would avoid it and opt for a proper JSON parser, as above.

How to get ksh to read null fields

I have a tab delimited file with some fields potentially containing no data. In ksh, though, read treats multiple tabs as a single delimiter. Is there any way to change that behavior so I could have blank data too? I.e. when encountering 2 tabs, it would take them as delimiting a null field? Or do I have to use awk?
# where <TAB> would be a real tab:
while IFS="<TAB>" read a b c d; do echo $c; done < file.txt
cf.
awk -F"\t" '{print $3}' file.txt
The shell version will output the wrong field if the 1st or 2nd field is blank, as demonstrated below.
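A quick demonstration of the problem (the behavior is the same in bash and ksh93), using $'\t' for a literal tab and a line whose first field is empty:
$ printf '\tB\tC\tD\n' | while IFS=$'\t' read a b c d; do echo "c=$c"; done
c=D
The third column is C, but c ends up holding D because the leading tab was swallowed.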
It is indeed possible to use modern Korn Shell natively to treat each tab char as a column delimiter such that multiple consecutive tabs will delimit null fields without sed, awk, or perl.
The trick is to set the IFS variable to 2 consecutive tab chars, like so:
IFS=$'\t\t'
The while loop in the following code will read a tab-separated-values file, putting the fields of each line into a simple indexed array.
The inner for loop simply prints out what it has read, one field per line of output:
typeset -a Cols
while IFS=$'\t\t' read -A Cols
do
    for (( i=0 ; i < ${#Cols[@]} ; i++ ))
    do
        print "Cols[$i] '${Cols[$i]}'"
    done
done
And yes, this will also properly treat a line beginning with a tab char as having a null value for column 1, i.e. in the above Cols[0] would be set to null.
I have tested this on /bin/ksh 'Version AJM 93u+ 2012-08-01' on macOS High Sierra, but it should work with AT&T AST open-sourced ksh versions going back 10 years or more. See also https://github.com/att/ast
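For example, feeding the loop a single line with an empty middle column (run in ksh93):
$ printf 'A\t\tC\n' | while IFS=$'\t\t' read -A Cols; do for ((i=0; i < ${#Cols[@]}; i++)); do print "Cols[$i] '${Cols[$i]}'"; done; done
Cols[0] 'A'
Cols[1] ''
Cols[2] 'C'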
read skips leading IFS whitespace when looking for the first field, so leading empty fields are lost. Another demonstration of this problem is
echo " b c d e" | while read a b c d e; do echo c=$c; done
I'll keep on using a space as the IFS, just a bit easier to test.
Avoiding awk is possible with cut:
echo c=$(echo " b c d e" | cut -d" " -f3)
When you want to assign all fields in one run, cut falls short: you would need one invocation per field.
Sed accepts different -e options and works on them in the order given.
You can get the fields by
eval $(echo " b c d e" |
sed -e 's/^/a=/' -e 's/ /;b=/' -e 's/ /;c=/' -e 's/ /;d=/' -e 's/ /;e=/')
echo check:
set | grep "^[a-e]="
Do you trust your input (eval can be dangerous), or do you prefer awk over sed?

removing commas from numbers in CSV file

I have a file that has many columns and I only need two of those columns. I am getting the columns I need using
cut -f 2-3 -d, file1.csv > file2.csv
The issue I am having is that the first column is ID and once it gets past 999 it becomes 1,000 and so it is treated as an extra column now. I can't get rid of all commas because I need them to separate the data. Is there a way to use sed to remove commas that only show up between digits 0-9?
I'd use a real CSV parser, and count backwards from the end of the line:
ruby -rcsv -ne '
row = $_.parse_csv
puts row[-5..-4].to_csv :force_quotes => true
' <<END
999,"someone#example.com","Doe, John","Doe","555-1212","address"
1,234,"email#email.com","name","lastname","phone","address"
END
"someone#example.com","Doe, John"
"email#email.com","name"
This works for the example in the comments:
awk -F'"?,"' '{print $2, $3}' file
The field separator is zero or one " followed by ,". This means that the comma in the first number doesn't count.
To separate the two fields with a comma instead of a space, you can change the OFS variable like this:
awk -F'"?,"' -v OFS=',' '{print $2, $3}' file
Or like this:
awk -F'"?,"' 'BEGIN{OFS=","}{print $2, $3}' file
Alternatively, if you want the quotes as well, you can use printf:
awk -F'"?,"' '{printf "\"%s\",\"%s\"\n", $2, $3}' file
From your comments, it sounds like there is a comma-and-space (', ') pattern between tokens.
If this is the case, you can do this easily with sed. The strategy is to first replace all occurrences of ', ' (comma followed by a space) with some unique character sequence (like maybe ||):
's:, :||:g'
From there you can remove all commas:
's:,::g'
Finally, replace the double pipes with comma-space again.
's:||:, :g'
Putting it into one statement:
sed -i -e 's:, :||:g;s:,::g;s:||:, :g' your_odd_file.csv
And a command-line example to try before you buy:
bash$ sed -e 's:, :||:g;s:,::g;s:||:, :g' <<< "1,200,000, hello world, 123,456"
1200000, hello world, 123456
If you are in the unfortunate situation where there is no space between fields in the CSV, you can attempt to 'fake it' by detecting changes in data type, such as where a numeric field is followed by a text field.
's:,\([^0-9]\):, \1:g' # numeric followed by non-numeric
's:\([^0-9]\),:\1, :g' # non-numeric field followed by something (anything)
You can put this all together into one statement, but you are venturing into dangerous waters here - this will definitely be a one-off solution and should be taken with a large grain of salt.
sed -e 's:,\([^0-9]\):, \1:g;s:\([^0-9]\),:\1, :g' \
-e 's:, :||:g;s:,::g;s:||:, :g' file1.csv > file2.csv
And another example:
bash$ sed -e 's:,\([^0-9]\):, \1:g;s:\([^0-9]\),:\1, :g' \
-e 's:, :||:g;s:,::g;s:||:, :g' <<< "1,200,000,hello world,123,456"
1200000, hello world, 123456

Can aspell output line number and not offset in pipe mode?

Can aspell output the line number and not the offset in pipe mode for html and xml files? I can't read the file line by line, because in that case aspell can't identify a closing tag (if the tag is situated on the next line).
This will output all occurrences of misspelt words with line numbers:
# Get aspell output...
<my_document.txt aspell pipe list -d en_GB --personal=./aspell.ignore.txt |
# Process the aspell output...
grep '[a-zA-Z]\+ [0-9]\+ [0-9]\+' -oh | \
grep '[a-zA-Z]\+' -o | \
while read word; do grep -on "\<$word\>" my_document.txt; done
Where:
my_document.txt is your original document
en_GB is your primary dictionary choice (e.g. try en_US)
aspell.ignore.txt is an aspell personal dictionary (example below)
aspell_output.txt is the output of aspell in pipe mode (ispell style)
results.txt is a final results file
aspell.ignore.txt example:
personal_ws-1.1 en 500
foo
bar
example results.txt output (for an en_GB dictionary):
238:color
302:writeable
355:backends
433:dataonly
You can also print the whole line by changing the last grep -on into grep -n.
This is just an idea; I haven't really tried it yet (I'm on a Windows machine :(). But maybe you could pipe the html file through head (with a byte limit) and count newlines using grep to find your line number. It's neither efficient nor pretty, but it might just work.
cat icantspell.html | head -c <offset from aspell> | egrep -Uc "$"
I use the following script to perform spell-checking and to work around the awkward output of aspell -a / ispell. At the same time, the script also works around the problem that ordinals like 2nd aren't recognized by aspell, by simply ignoring everything that aspell reports which is not a word on its own.
#!/bin/bash
set +o pipefail
if [ -t 1 ] ; then
    color="--color=always"
fi
! for file in "$@" ; do
    <"$file" aspell pipe list -p ./dict --mode=html |
    grep '[[:alpha:]]\+ [0-9]\+ [0-9]\+' -oh |
    grep '[[:alpha:]]\+' -o |
    while read word ; do
        grep $color -n "\<$word\>" "$file"
    done
done | grep .
You even get colored output if the stdout of the script is a terminal, and you get an exit status of 1 in case the script found spelling mistakes, otherwise the exit status of the script is 0.
Also, the script protects itself from pipefail, which is a somewhat popular option to set, e.g. in a Makefile, but doesn't work for this script. Last but not least, this script explicitly uses [[:alpha:]] instead of [a-zA-Z], which is less confusing when it's also matching non-ASCII characters like German äöüÄÖÜß and others. Depending on the locale, [a-zA-Z] also matches them, but that tends to come as a surprise.
aspell pipe / aspell -a / ispell output one empty line for each input line (after reporting the errors of the line).
Demonstration printing the line number with awk:
$ aspell pipe < testFile.txt |
awk '/^$/ { countedLine=countedLine+1; print "#L=" countedLine; next; } //'
produces this output:
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.7-20110707)
& iinternational 7 0: international, Internationale, internationally, internationals, intentional, international's, Internationale's
#L=1
*
*
*
& reelly 22 11: Reilly, really, reel, rely, rally, relay, resell, retell, Riley, rel, regally, Riel, freely, real, rill, roll, reels, reply, Greeley, cruelly, reel's, Reilly's
#L=2
*
#L=3
*
*
& sometypo 18 8: some typo, some-typo, setup, sometime, someday, smote, meetup, smarty, stupor, Smetana, somatic, symmetry, mistype, smutty, smite, Sumter, smut, steppe
#L=4
with testFile.txt
iinternational
I say this reelly.
hello
here is sometypo.
(Still not as nice as hunspell -u (https://stackoverflow.com/a/10778071/4124767). But hunspell misses some command line options I like.)
For others using aspell with one of the filter modes (tex, html, etc), here's a way to only print line numbers for misspelled words in the filtered text. So for example, it won't print misspellings in the comments.
ASPELL_ARGS="--mode=html --personal=./.aspell.en.pws"
for file in "$#"; do
for word in $(aspell $ASPELL_ARGS list < "$file" | sort -u); do
grep -no "\<$word\>" <(aspell $ASPELL_ARGS filter < "$file")
done | sort -n
done
This works because aspell filter does not delete empty lines. I realize this isn't using aspell pipe as requested by OP, but it's in the same spirit of making aspell print line numbers.