shell parsing json contains spaces in string - json

I need to parse JSON whose string values contain spaces, but the output values are truncated at the first space.
My Initial.json file is:
{
"WorkspaceName":"aaa bbb ccc ddd eee",
"ReportFileName":"xxx yyy zzz",
"StageName":"sit uat prod"
}
My current shell code is:
InitialFile=$JsonPath/deployment/configuration/Initial.json
data=$(cat $InitialFile | sed -r 's/",/"/' | egrep -v '^[{}]' | sed 's/"//g' | sed 's/:/=/1')
declare $data
echo WorkspaceName is_$WorkspaceName
echo ReportFileName is_$ReportFileName
echo StageName is_$StageName
The result is:
WorkspaceName is_aaa
ReportFileName is_xxx
StageName is_sit
The expected result is aaa bbb ccc ddd eee, xxx yyy zzz, sit uat prod instead of aaa, xxx, sit.
How can I achieve this? I'm not very familiar with shell scripting, so any advice would be greatly appreciated.
Update:
I am using the following code to resolve this issue:
WorkspaceName=$(grep -o '"WorkspaceName": "[^"]*' configuration/Initial.json | grep -o '[^"]*$')
ReportFileName=$(grep -o '"ReportFileName": "[^"]*' configuration/Initial.json | grep -o '[^"]*$')
StageName=$(grep -o '"StageName": "[^"]*' configuration/Initial.json | grep -o '[^"]*$')
As you can see, this solves the problem, but it doesn't seem ideal: I need to extract each variable from the JSON separately, which means a lot of repeated statements. When the JSON contains many variables, this becomes very tedious. Is there a way to simplify it?

1st solution (GNU awk): With GNU awk you can try the following solution, written and tested with your shown samples only.
awk -v RS='"[^"]*":"[^"]*",?' '
RT{
sub(/":"/,OFS,RT)
gsub(/^"|",?$/,"",RT)
print RT
}
' Input_file
2nd solution: If jq is allowed, you can simply run the following command. The OP says jq is not available on their system, but I am adding it here as a variant.
jq -r 'to_entries[] | "\(.key) \(.value)"' Input_file
With shown samples output will be as follows:
WorkspaceName aaa bbb ccc ddd eee
ReportFileName xxx yyy zzz
StageName sit uat prod

Using sed
$ InitialFile="${JsonPath}/deployment/configuration/Initial.json"
$ data=$(sed -En 's/^[^"]*"([^"]*)":"([^"]*).*$/\1 is_\2/p' "$InitialFile")
$ echo "$data"
WorkspaceName is_aaa bbb ccc ddd eee
ReportFileName is_xxx yyy zzz
StageName is_sit uat prod

You could use sed and regexp:
eval $(sed -n -e 's/^.*"\(.*\)":\(".*"\).*$/\1=\2/p' $InitialFile)
sed will take the filename as an argument
-n will make sed not print per default
-e 's/<match pattern>/<output>/ a sed command to search and replace (live test).
p in case of matching pattern the output is printed
eval will evaluate the output as if you would have written it at the prompt. In this case assigning values to some vars.
I think the code above is shorter and better in several ways, but it could of course be done your way with some adjustment, or in several other ways. The main issue with your code is that the assignment needs to be one string, not several. So your code produces this:
WorkspaceName=aaa bbb ccc ddd eee
while it should be:
WorkspaceName="aaa bbb ccc ddd eee"
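The point about the assignment being one string can also be handled without eval. The following is a minimal bash sketch, not a general JSON parser: it assumes one "key":"value" pair per line, no escaped quotes inside values, and keys that are valid shell variable names. The temporary file is a hypothetical stand-in for Initial.json.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for $JsonPath/deployment/configuration/Initial.json
InitialFile=$(mktemp)
cat > "$InitialFile" <<'EOF'
{
"WorkspaceName":"aaa bbb ccc ddd eee",
"ReportFileName":"xxx yyy zzz",
"StageName":"sit uat prod"
}
EOF

# sed prints one key=value line per matching JSON line.
# read splits only on the first "=", so spaces stay inside $value.
while IFS='=' read -r key value; do
    # Quoting makes "key=value with spaces" a single word for declare.
    declare "$key=$value"
done < <(sed -En 's/^[^"]*"([^"]*)":"([^"]*)".*$/\1=\2/p' "$InitialFile")

echo "WorkspaceName is_$WorkspaceName"
echo "ReportFileName is_$ReportFileName"
echo "StageName is_$StageName"

rm -f "$InitialFile"
```

The loop reads from a process substitution rather than a pipe, so the variables are set in the current shell and remain available after the loop ends.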

Related

Data extraction for specific string

I have a long list of JSON data, with repeated content similar to the following.
Because the original JSON file is too long, I will just share the hyperlink here. This is a result generated from a database called RegulomeDB.
Direct link to the JSON file
I would like to extract specific data (eQTLs) from "method": "eQTLs" and "value": "xxxx", and put them into 2 columns (tab delimited) exactly like below.
Note: "value": "xxxx" is extracted right after "method": "eQTLs" is detected.
eQTLs firstResult, secondResult, thirdResult, ...
In this example, the desired output is:
eQTLs EIF3S8, EIF3CL
I've tried using a python script but was unsuccessful.
import json
with open('file.json') as f:
f_json = json.load(f)
print 'f_json[0]['"method": "eQTLs"'] + "\t" + f_json[0]["value"]
Thank you for your kind help.
Maybe you'll find the JSON-parser xidel useful. It can open urls and can manipulate strings any way you want:
$ xidel -s "https://regulomedb.org/regulome-search/?regions=chr16:28539847-28539848&genome=GRCh37&format=json" \
-e '"eQTLs "||join($json("@graph")()[method="eQTLs"]/value,", ")'
eQTLs EIF3S8, EIF3CL
Or with the XPath/XQuery 3.1 syntax:
-e '"eQTLs "||join($json?"@graph"?*[method="eQTLs"]?value,", ")'
Try this:
cat file.json | grep -iE '"method":\s*"eQTLs"[^}]*' -o | cut -d ',' -f 1,5 | sed -r 's/"|:|method|value//gi' | sed 's/\s*eqtls,\s*//gi' | tr '\n' ',' | sed 's/,$/\n/g' | sed 's/,/, /g' | xargs echo -e 'eQTLs\x09'

Arithmetic in web scraping in a shell

so, I have the example code here:
#!/bin/bash
clear
curl -s https://www.cnbcindonesia.com/market-data/currencies/IDR=/USD-IDR |
html2text |
sed -n '/USD\/IDR/,$p' |
sed -n '/Last updated/q;p' |
tail -n-1 |
head -c+6 && printf "\n"
exit 0
This should print a number in the range 14000~15000.
Let's start with the basics: what do I have to do to print the result + 1? For example, if the output is 14000, I want to increment it by 1 to get 14001. I suspect the html2text output cannot be used in calculations directly, since it is a string rather than an integer.
The more advanced thing I want to know is how to do arithmetic on the results of two curl calls.
What I would do, bash + xidel:
$ num=$(xidel -se '//div[@class="mark_val"]/span[1]/text()' 'https://url')
$ num=$((${num//,/}+1)) # num was 14050
$ echo $num
Output
14051
Explanations
$((...))
is an arithmetic substitution. After doing the arithmetic, the whole thing is replaced by the value of the expression. See http://mywiki.wooledge.org/ArithmeticExpression
Command Substitution: "$(cmd "foo bar")" causes the command 'cmd' to be executed with the argument 'foo bar' and "$(..)" will be replaced by the output. See http://mywiki.wooledge.org/BashFAQ/002 and http://mywiki.wooledge.org/CommandSubstitution
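As a minimal sketch of the arithmetic step on its own, assuming the scraped values arrive as strings with a thousands separator: the two values below are hypothetical stand-ins for the results of two curl or xidel calls, so the example runs without network access.

```shell
#!/usr/bin/env bash
# Hypothetical scraped values; in practice these would come from
# command substitutions such as first=$(xidel ...).
first="14,050"
second="14,075"

# Strip the commas so the shell can treat the strings as integers.
first_num=${first//,/}
second_num=${second//,/}

# Arithmetic expansion: increment one value, then add the two values.
incremented=$((first_num + 1))
total=$((first_num + second_num))

echo "$incremented"   # 14051
echo "$total"         # 28125
```

Bash arithmetic works on integers only, so a value with a decimal part would need a tool such as bc or awk instead.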
Bonus
You can compute directly in xidel, thanks Reino using xquery syntax :
$ xidel -s <url> -e 'replace(//div[@class="mark_val"]/span[1],",","") + 1'
And to do addition arithmetic of 2 values :
$ xidel -s <url> -e '
let $num:=replace(//div[@class="mark_val"]/span[1],",","")
return $num + $num
'

How to find value of a key in a json response trace file using shell script

I have a response trace file containing below response:
#RESPONSE BODY
#--------------------
{"totalItems":1,"member":[{"name":"name","title":"PatchedT","description":"My des_","id":"70EA96FB313349279EB089BA9DE2EC3B","type":"Product","modified":"2019 Jul 23 10:22:15","created":"2019 Jul 23 10:21:54",}]}
I need to fetch the value of the "id" key in a variable which I can put in my further code.
Expected result is
echo $id - should give me 70EA96FB313349279EB089BA9DE2EC3B value
With valid JSON (remove first to second row with sed and parse with jq):
id=$(sed '1,2d' file | jq -r '.member[]|.id')
Output to variable id:
70EA96FB313349279EB089BA9DE2EC3B
I would strongly suggest using jq to parse json.
But given that json is mostly compatible with python dictionaries and arrays, this HACK would work too:
$ cat resp
#RESPONSE BODY
#--------------------
{"totalItems":1,"member":[{"name":"name","title":"PatchedT","description":"My des_","id":"70EA96FB313349279EB089BA9DE2EC3B","type":"Product","modified":"2019 Jul 23 10:22:15","created":"2019 Jul 23 10:21:54",}]}
$ awk 'NR==3{print "a="$0;print "print a[\"member\"][0][\"id\"]"}' resp | python
70EA96FB313349279EB089BA9DE2EC3B
$ sed -n '3s|.*|a=\0\nprint a["member"][0]["id"]|p' resp | python
70EA96FB313349279EB089BA9DE2EC3B
Note that this code is
1. dirty hack, because your system does not have the right tool - jq
2. susceptible to shell injection attacks. Hence use it ONLY IF you trust the response received from your service.
Quick and dirty (don't use eval):
eval $(cat response_file | tail -1 | awk -F , '{ print $5 }' | sed -e 's/"//g' -e 's/:/=/')
It is based on the exact structure you gave, and hoping there is no , in any value before "id".
Or assign it yourself:
id=$(cat response_file | tail -1 | awk -F , '{ print $5 }' | cut -d: -f2 | sed -e 's/"//g')
Note that you can't access the name field with that trick, as it is the first item of the member array and will be "swallowed" by the { print $2 }. You can use an even-uglier hack to retrieve it though:
id=$(cat response_file | tail -1 | sed -e 's/:\[/,/g' -e 's/}\]//g' | awk -F , '{ print $5 }' | cut -d: -f2 | sed -e 's/"//g')
But, if you can, jq is the right tool for that work instead of ugly hacks like that (but if it works...).
When you can't use jq, you can consider
id=$(grep -Eo "[0-9A-F]{32}" file)
This is only working when the file looks like what I expect, so you might need to add extra checks like
id=$(grep "My des_" file | grep -Eo "[0-9A-F]{32}" | head -1)
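Another option when jq is unavailable is to match on the key name instead of the shape of the value. This is only a sketch under the same caveats as the other text-based hacks: it assumes the response contains a single "id" key and that the value holds no escaped quotes. The temporary file is a hypothetical stand-in for the trace file.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the response trace file
resp=$(mktemp)
cat > "$resp" <<'EOF'
#RESPONSE BODY
#--------------------
{"totalItems":1,"member":[{"name":"name","title":"PatchedT","description":"My des_","id":"70EA96FB313349279EB089BA9DE2EC3B","type":"Product","modified":"2019 Jul 23 10:22:15","created":"2019 Jul 23 10:21:54",}]}
EOF

# Capture whatever sits between the quotes that follow "id": and
# print only lines where the substitution succeeded.
id=$(sed -n 's/.*"id":"\([^"]*\)".*/\1/p' "$resp")

echo "$id"

rm -f "$resp"
```

Because the pattern includes the opening quote of the key, a key such as "_id" would not match.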

Nested while loop with input from main while in shell script

I have a JSON file and I am trying to parse it using the script below:
#!/bin/ksh
while read rec
do
    while read line
    do
        firstname=`echo $line | sed -n -e 's/^.*\(full-name\)/\1/p' | cut -f3 -d'"'`
        id=`echo $line | sed -n -e 's/^.*\(id\)/\1/p' | cut -f3 -d'"'`
        echo "${firstname}'|'${id}"
    done < `echo $rec | nawk 'gsub("}}}}", "\n")' | sed 's/{"results"//g'`
done < /var/tmp/Cloud_test.txt
My sample file is :
{"results":[{"general-info":{"full-name":"TELOS MANAGEMENT","body":{"party":{"xrefs":{"xref":[{"id":"66666"}]}}}},"_id":"91002551"},{"_id":"222222","body":{"party":{"general-info":{"full-name":"DO REUSE"},"xrefs":{"xref":[{"id":"777777"}]}}}}]}
Expected Result:
TELOS MANAGEMENT|66666
DO REUSE|777777
I am facing a problem in the inner while loop when passing the parameter. The input is not passed line by line; instead the complete line is passed at once, and the result is not as expected. Please help me fix it.
As pointed out by @l0b0 this kind of problem is best solved using a JSON-aware tool such as jq. Here, then, is a jq solution.
It must be pointed out, however, that the sample input is strangely irregular, so the requirements are not so clear. If the JSON were more regular, the jq solution would be simpler.
In any case, the following jq filter does produce the result as described:
.results[]
| ..
| objects
| select(has("general-info"))
| [(.["general-info"]|.["full-name"]), (.. | .id? // empty)]
| join("|")
Simplification
The second last line above could be simplified to:
[."general-info"."full-name", (.. | .id? // empty)]
This is a more complicated (double) case of this question.
The following works for me:
cat sample.json |
sed -e 's/"full-name"/\n&/g' |
tail -n+2 |
sed -e 's/"full-name":"\([^"]*\).*{"id":"\([^"]*\).*/\1\|\2/'
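If jq is not available, a different text-based route is to pull out just the two keys of interest and pair them up. The sketch below is an alternative to the sed approach above, not a JSON parser: it assumes each "full-name" appears before the "id" that belongs to it (true for the sample) and that values contain no escaped quotes. The temporary file is a hypothetical stand-in for /var/tmp/Cloud_test.txt.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for /var/tmp/Cloud_test.txt
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"results":[{"general-info":{"full-name":"TELOS MANAGEMENT","body":{"party":{"xrefs":{"xref":[{"id":"66666"}]}}}},"_id":"91002551"},{"_id":"222222","body":{"party":{"general-info":{"full-name":"DO REUSE"},"xrefs":{"xref":[{"id":"777777"}]}}}}]}
EOF

# grep -o emits each "full-name":"..." and "id":"..." pair on its own
# line; "_id" is skipped because the pattern needs a quote right before id.
# awk remembers the latest name and prints it when the next id arrives.
result=$(
    grep -Eo '"(full-name|id)":"[^"]*"' "$sample" |
    awk -F'"' '
        $2 == "full-name" { name = $4 }
        $2 == "id"        { print name "|" $4 }
    '
)

printf '%s\n' "$result"

rm -f "$sample"
```

Feeding the extracted lines through a pipe also sidesteps the issue in the original script, where the backquoted command after "done <" is treated as a file name rather than as input data.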

How to iterate through json in bash script

I have the JSON below, and I need to get only the mail values from it in a bash script:
value={"count":5,"users":[{"username":"asa","name":"asa
Tran","mail":"asa@xyz.com"},{"username":"qq","name":"qq
Morris","mail":"qq@xyz.com"},{"username":"qwe","name":"qwe
Org","mail":"qwe@xyz.com"}]}
Output can be as
mail=asa@xyz.com,qq@xyz.com,qwe@xyz.com
All the above need to be done in the bash script (.sh)
I have already tried iterating over it as an array, but to no avail:
for key in "${!value[@]}"
do
#echo "key = $key"
echo "value = ${value[$key]}"
done
I have even tried converting it to an array with:
alias json-decode="php -r
'print_r(json_decode(file_get_contents(\"php://stdin\"),1));'"
value=$(curl --user $credentials -k $endPoint | json-decode)
Still i was not able to get the specific output.
jq is the tool to iterate through a json. In your case:
while read -r user; do
    jq -r '.mail' <<< "$user"
done < <(jq -c '.users[]' users.json)
would give:
asa@xyz.com
qq@xyz.com
qwe@xyz.com
NOTE: I removed "value=" because that is not valid json. Users.json contains:
{"count":5,"users":[{"username":"asa","name":"asa Tran","mail":"asa@xyz.com"},{"username":"qq","name":"qq Morris","mail":"qq@xyz.com"},{"username":"qwe","name":"qwe Org","mail":"qwe@xyz.com"}]}
If this is valid JSON and the email field is the only one containing an @ character, you can do something like this:
echo $value | tr '"' '\n' | grep @
It replaces double quotes with newline characters and keeps only the lines containing @. It is not really JSON parsing, but it works.
You can store the result in a bash array
emails=($(echo $value | tr '"' '\n' | grep @))
and iterate on them
for email in "${emails[@]}"
do
echo $email
done
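To get the single comma-separated line the question asks for, the same split-and-filter idea can be followed by paste. This is a sketch under the same assumption as above, namely that only the mail values contain an @ character; the value string is copied from the question with the invalid "value=" prefix removed.

```shell
#!/usr/bin/env bash
# Sample JSON from the question, held in a shell variable
value='{"count":5,"users":[{"username":"asa","name":"asa Tran","mail":"asa@xyz.com"},{"username":"qq","name":"qq Morris","mail":"qq@xyz.com"},{"username":"qwe","name":"qwe Org","mail":"qwe@xyz.com"}]}'

# Split on double quotes, keep the lines containing @,
# then join those lines with commas.
mail=$(printf '%s\n' "$value" | tr '"' '\n' | grep '@' | paste -sd, -)

echo "mail=$mail"
```

Quoting "$value" matters here: without the quotes the shell would split the string on the spaces inside the name fields before tr ever sees it.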
You should use json_pp tool (in debian, it is part of the libjson-pp-perl package)
One would use it like this :
cat file.json | json_pp
And get a pretty print for your json.
So in your case, you could do :
#!/bin/bash
MAILS=""
LINES=`cat test.json | json_pp | grep '"mail"' | sed 's/.* : "\(.*\)".*/\1/'`
for LINE in $LINES ; do
MAILS="$LINE,$MAILS"
done
echo $MAILS | sed 's/.$//'
Output :
qwe@xyz.com,qq@xyz.com,asa@xyz.com
Using standard unix toolbox : sed command
cat so.json | sed "s/},/\n/g" | sed 's/.*"mail":"\([^"]*\)".*/\1/'
With R you could do this as follows:
$ value={"count":5,"users":[{"username":"asa","name":"asa Tran","mail":"asa@xyz.com"},{"username":"qq","name":"qq Morris","mail":"qq@xyz.com"},{"username":"qwe","name":"qwe Org","mail":"qwe@xyz.com"}]}
$ echo $value | R path users | R map path mail
["asa@xyz.com", "qq@xyz.com", "qwe@xyz.com"]