Data extraction for specific string - json

I have a long JSON file that repeats content similar to the following.
Because the original JSON file is too long, I will just share the hyperlink here. It is a result generated from a database called RegulomeDB.
Direct link to the JSON file
I would like to extract specific data (eQTLs) from "method": "eQTLs" and "value": "xxxx", and put them into 2 tab-delimited columns, exactly like below.
Note: "value": "xxxx" is extracted right after "method": "eQTLs" is detected.
eQTLs firstResult, secondResult, thirdResult, ...
In this example, the desired output is:
eQTLs EIF3S8, EIF3CL
I've tried using a python script but was unsuccessful.
import json
with open('file.json') as f:
    f_json = json.load(f)
print 'f_json[0]['"method": "eQTLs"'] + "\t" + f_json[0]["value"]
Thank you for your kind help.

Maybe you'll find the JSON-parser xidel useful. It can open urls and can manipulate strings any way you want:
$ xidel -s "https://regulomedb.org/regulome-search/?regions=chr16:28539847-28539848&genome=GRCh37&format=json" \
-e '"eQTLs "||join($json("@graph")()[method="eQTLs"]/value,", ")'
eQTLs EIF3S8, EIF3CL
Or with the XPath/XQuery 3.1 syntax:
-e '"eQTLs "||join($json?"@graph"?*[method="eQTLs"]?value,", ")'

Try this:
cat file.json | grep -iE '"method":\s*"eQTLs"[^}]*' -o | cut -d ',' -f 1,5 | sed -r 's/"|:|method|value//gi' | sed 's/\s*eqtls,\s*//gi' | tr '\n' ',' | sed 's/,$/\n/g' | sed 's/,/, /g' | xargs echo -e 'eQTLs\x09'
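For comparison, here is a minimal Python sketch of the same extraction. It assumes the response is an object whose "@graph" key holds a list of records with "method" and "value" fields, which is what the xidel expression above implies. The function name and the inline sample data are made up for illustration and are not the real RegulomeDB response.

```python
import json

def eqtl_line(doc):
    """Return 'eQTLs<TAB>v1, v2, ...' for every record whose method is eQTLs."""
    # "@graph" as the top-level list is an assumption based on the answers above.
    values = [rec["value"] for rec in doc.get("@graph", [])
              if rec.get("method") == "eQTLs"]
    return "eQTLs\t" + ", ".join(values)

# Hypothetical, trimmed-down sample standing in for the real response.
sample = json.loads('''
{"@graph": [
  {"method": "eQTLs", "value": "EIF3S8"},
  {"method": "ChIP-seq", "value": "POL2"},
  {"method": "eQTLs", "value": "EIF3CL"}
]}
''')

print(eqtl_line(sample))
```

With the real data you would load the file first, for example with json.load(open('file.json')), and pass the result to eqtl_line.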


Get specific string line from file bash

I have a file with this kind of text with pattern
[{"foo":"bar:baz:foo*","bar*":"baz*","etc":"etc"},
{"foo2":"bar2:baz2:foo2*","bar2*":"baz2*","etc":"etc"},
{"foo3":"bar3:baz3:foo3*","bar3*":"baz3*","etc":"etc"},
{"foo4":"bar4:baz4:foo4*","bar4*":"baz4*","etc":"etc"}]
I need to take every string like this
{"foo":"bar:baz:foo*","bar*":"baz*","etc":"etc"} and send each of them to some url via curl
for i in text.txt
do (awk,sed,grep etc)
then curl $string
I can't figure out how to get the desired lines properly from the file without unnecessary symbols
I suggest that you can use jq to process your json file. jq is capable of reading json, and formatting output. Here's an example jq script to process your json file (which I unimaginatively call 'jsonfile'):
jq -r '.[] | "curl -d '\'' \(.) '\'' http://restful.com/api " ' jsonfile
Here's the output:
curl -d ' {"foo":"bar:baz:foo*","bar*":"baz*","etc":"etc"} ' http://restful.com/api
curl -d ' {"foo2":"bar2:baz2:foo2*","bar2*":"baz2*","etc":"etc"} ' http://restful.com/api
curl -d ' {"foo3":"bar3:baz3:foo3*","bar3*":"baz3*","etc":"etc"} ' http://restful.com/api
curl -d ' {"foo4":"bar4:baz4:foo4*","bar4*":"baz4*","etc":"etc"} ' http://restful.com/api
Here's what's going on:
We pass three arguments to the jq program: jq -r <script> <inputfile>.
The -r tells jq to output the results in raw format (that is, please don't escape quotes and stuff).
The script looks like this:
.[] | "some string \(.)"
The first . means take the whole json structure and the [] means iterate through each array element in the structure. The | is a pipe: it feeds each array element into the filter on its right. That filter outputs a string, and \(.) interpolates the whole element it receives into that string.
Wow... I've never really explained a jq script before (and it shows). But the crux of it is, we are using jq to find each element in the json array and insert it into a string. Our string is this:
curl -d '<the json dictionary array element>' http://restful.com/api
Ok. And you see the output. It works. But wait a second, we only have output. Let's tell the shell to run each line like this:
jq -r '.[] | "curl -d '\'' \(.) '\'' http://restful.com/api " ' jsonfile | bash
By piping the output to bash, we execute each line that we output. Essentially, we are writing a bash script with jq to curl http://restful.com/api passing the json element as the -d data parameter to POST the json element.
Revisiting for single quote issue
@oguz ismail pointed out that bash will explode if there is a single quote in the json input file. This is true. We can avoid the problem by escaping the quote, but that adds complexity, making this a non-ideal approach.
Here's the problem input (I just inserted a single quote):
[{"foo":"bar:'baz:foo*","bar*":"baz*","etc":"etc"},
{"foo2":"bar2:baz2:foo2*","bar2*":"baz2*","etc":"etc"},
{"foo3":"bar3:baz3:foo3*","bar3*":"baz3*","etc":"etc"},
{"foo4":"bar4:baz4:foo4*","bar4*":"baz4*","etc":"etc"}]
Notice above that baz is now 'baz. The problem is that a single single quote makes the bash shell complain about unmatched quotes:
$ jq -r '.[] | "curl -d '\'' \(.) '\'' http://restful.com/api " ' jsonfile | bash
bash: line 4: unexpected EOF while looking for matching `"'
bash: line 5: syntax error: unexpected end of file
Here's the solution:
$ jq -r $'.[] | "\(.)" | gsub( "\'" ; "\\\\\'" ) | "echo $\'\(.)\'" ' jsonfile | bash
{"foo":"bar:'baz:foo*","bar*":"baz*","etc":"etc"}
{"foo2":"bar2:baz2:foo2*","bar2*":"baz2*","etc":"etc"}
{"foo3":"bar3:baz3:foo3*","bar3*":"baz3*","etc":"etc"}
{"foo4":"bar4:baz4:foo4*","bar4*":"baz4*","etc":"etc"}
Above I am using $'' to quote the jq script. This allows me to escape single quotes using \'. I've also changed the curl command to echo so I can test the bash script without bothering the folks at http://restful.com/api.
The 'trick' is to make sure that the bash script we generate also escapes all single quotes with a backslash (\). So, we have to change ' to \'. That's what gsub is doing.
gsub( "\'" ; "\\\\\'" )
After making that substitution ( ' --> \' ) we pipe the entire string to this:
"echo $\'\(.)\'"
which surrounds the output of gsub with echo $''. Now we are using $' again so the \' is properly understood by bash.
So we wind up with this when we put the curl back in:
jq -r $'.[] | "\(.)" | gsub( "\'" ; "\\\\\'" ) | "curl -d $\'\(.)\' http://restful.com/api " ' jsonfile | bash
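If generating shell text feels fragile, one way to sidestep the quoting problem is to never build a shell command string at all. The Python sketch below is an alternative to the jq approach, not a reproduction of it. It builds one argument list per array element, so a single quote in the data needs no escaping. The function name and the sample input are illustrative.

```python
import json
import shlex

API_URL = "http://restful.com/api"  # endpoint taken from the answer above

def curl_commands(json_text, url=API_URL):
    """Build one curl argument list per element of a top-level JSON array."""
    commands = []
    for element in json.loads(json_text):
        # Compact separators mirror the output of `jq -c`.
        body = json.dumps(element, separators=(",", ":"))
        commands.append(["curl", "-d", body, url])
    return commands

# Illustrative input containing the problematic single quote.
sample = '[{"foo":"bar:\'baz:foo*","etc":"etc"},{"foo2":"bar2","etc":"etc"}]'

for argv in curl_commands(sample):
    # shlex.join quotes each argument for display; to actually send the
    # request, pass argv to subprocess.run(argv, check=True) instead.
    print(shlex.join(argv))
```

Because each argument list goes to the program directly, no shell ever parses the JSON body.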
Use jq command. This is just example parsing.
jq -c '.[]' a.txt | while IFS= read -r k; do
    echo "hello-" "$k"
done
Output:
hello- {"foo":"bar:baz:foo*","bar*":"baz*","etc":"etc"}
hello- {"foo2":"bar2:baz2:foo2*","bar2*":"baz2*","etc":"etc"}
hello- {"foo3":"bar3:baz3:foo3*","bar3*":"baz3*","etc":"etc"}
hello- {"foo4":"bar4:baz4:foo4*","bar4*":"baz4*","etc":"etc"}
You can use the $k anywhere inside the loop you want.
jq -c '.[]' a.txt | while IFS= read -r k; do
    curl -d "$k" <url>
done

How to find value of a key in a json response trace file using shell script

I have a response trace file containing below response:
#RESPONSE BODY
#--------------------
{"totalItems":1,"member":[{"name":"name","title":"PatchedT","description":"My des_","id":"70EA96FB313349279EB089BA9DE2EC3B","type":"Product","modified":"2019 Jul 23 10:22:15","created":"2019 Jul 23 10:21:54",}]}
I need to fetch the value of the "id" key in a variable which I can put in my further code.
Expected result is
echo $id - should give me 70EA96FB313349279EB089BA9DE2EC3B value
With valid JSON (remove first to second row with sed and parse with jq):
id=$(sed '1,2d' file | jq -r '.member[]|.id')
Output to variable id:
70EA96FB313349279EB089BA9DE2EC3B
I would strongly suggest using jq to parse json.
But given that json is mostly compatible with python dictionaries and arrays, this HACK would work too:
$ cat resp
#RESPONSE BODY
#--------------------
{"totalItems":1,"member":[{"name":"name","title":"PatchedT","description":"My des_","id":"70EA96FB313349279EB089BA9DE2EC3B","type":"Product","modified":"2019 Jul 23 10:22:15","created":"2019 Jul 23 10:21:54",}]}
$ awk 'NR==3{print "a="$0;print "print a[\"member\"][0][\"id\"]"}' resp | python
70EA96FB313349279EB089BA9DE2EC3B
$ sed -n '3s|.*|a=\0\nprint a["member"][0]["id"]|p' resp | python
70EA96FB313349279EB089BA9DE2EC3B
Note that this code is
1. dirty hack, because your system does not have the right tool - jq
2. susceptible to shell injection attacks. Hence use it ONLY IF you trust the response received from your service.
Quick and dirty (don't use eval):
eval $(cat response_file | tail -1 | awk -F , '{ print $5 }' | sed -e 's/"//g' -e 's/:/=/')
It is based on the exact structure you gave, and hoping there is no , in any value before "id".
Or assign it yourself:
id=$(cat response_file | tail -1 | awk -F , '{ print $5 }' | cut -d: -f2 | sed -e 's/"//g')
Note that you can't access the name field with that trick, as it is the first item of the member array and shares a comma-separated field with "member":[{ (it is "swallowed" by { print $2 }). You can use an even-uglier hack to retrieve it though:
name=$(cat response_file | tail -1 | sed -e 's/:\[/,/g' -e 's/}\]//g' | awk -F , '{ print $3 }' | cut -d: -f2 | sed -e 's/"//g')
But, if you can, jq is the right tool for that work instead of ugly hacks like that (but if it works...).
When you can't use jq, you can consider
id=$(grep -Eo "[0-9A-F]{32}" file)
This only works when the file looks like what I expect, so you might need to add extra checks like
id=$(grep "My des_" file | grep -Eo "[0-9A-F]{32}" | head -1)
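When jq is unavailable but Python is, a real JSON parser is still within reach. The sketch below assumes the header lines start with # and that the body is valid JSON, so the trailing comma before } in the question's sample has been removed here. The function name is hypothetical.

```python
import json

def extract_id(trace_text):
    """Return the id of the first member in a trace with '#'-prefixed header lines."""
    # Drop the '#RESPONSE BODY' style header lines and keep the JSON body.
    body = "\n".join(line for line in trace_text.splitlines()
                     if not line.startswith("#"))
    return json.loads(body)["member"][0]["id"]

# Sample trace with the trailing comma before '}' removed so it is valid JSON.
trace = """#RESPONSE BODY
#--------------------
{"totalItems":1,"member":[{"name":"name","title":"PatchedT","id":"70EA96FB313349279EB089BA9DE2EC3B","type":"Product"}]}
"""

print(extract_id(trace))
```

Unlike the eval-based hacks, this never executes the response content, so a malicious response cannot inject commands.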

Shell Script CURL JSON value to variable

I was wondering how to parse the CURL JSON output from the server into variables.
Currently, I have -
curl -X POST -H "Content: agent-type: application/x-www-form-urlencoded" https://www.toontownrewritten.com/api/login?format=json -d username="$USERNAME" -d password="$PASSWORD" | python -m json.tool
But that only pretty-prints the JSON returned by the server, like so:
{
"eta": "0",
"position": "0",
"queueToken": "6bee9e85-343f-41c7-a4d3-156f901da615",
"success": "delayed"
}
But how do I put, for example, the success value returned from the server into a variable $SUCCESS (with the value delayed), and queueToken into a variable $queueToken (with the value 6bee9e85-343f-41c7-a4d3-156f901da615)?
Then when I use-
echo "$SUCCESS"
it shows this as the output -
delayed
And when I use
echo "$queueToken"
the output is
6bee9e85-343f-41c7-a4d3-156f901da615
Thanks!
Find and install jq (https://stedolan.github.io/jq/). jq is a JSON parser. JSON is not reliably parsed by line-oriented tools like sed because, like XML, JSON is not a line-oriented data format.
In terms of your question:
source <(
    curl -X POST -H "$content_type" "$url" -d username="$USERNAME" -d password="$PASSWORD" |
    jq -r '. as $h | keys | map(. + "=\"" + $h[.] + "\"") | .[]'
)
The jq syntax is a bit weird, I'm still working on it. It's basically a series of filters, each pipe taking the previous input and transforming it. In this case, the end result is some lines that look like variable="value"
This answer uses bash's "process substitution" to take the results of the jq command, treat it like a file, and source it into the current shell. The variables will then be available to use.
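The idea of turning a flat JSON object into variable="value" lines can also be sketched in Python. This is a variant of the jq filter rather than a translation of it: it uses shlex.quote so values containing quotes or spaces stay safe to source, and it skips keys that are not plain shell variable names. The function name is hypothetical, and the sample response is copied from the question.

```python
import json
import re
import shlex

def shell_assignments(json_text):
    """Turn a flat JSON object into NAME=value lines that are safe to source."""
    data = json.loads(json_text)
    lines = []
    for key, value in data.items():
        # Skip keys that are not plain shell variable names.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", key):
            continue
        lines.append(f"{key}={shlex.quote(str(value))}")
    return lines

response = '{"eta": "0", "position": "0", "queueToken": "6bee9e85-343f-41c7-a4d3-156f901da615", "success": "delayed"}'

print("\n".join(shell_assignments(response)))
```

The printed lines can be sourced through process substitution in the same way as the jq output above.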
Here's an example of Extract a JSON value from a BASH script
#!/bin/bash
function jsonval {
temp=`echo $json | sed 's/\\\\\//\//g' | sed 's/[{}]//g' | awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}' | sed 's/\"\:\"/\|/g' | sed 's/[\,]/ /g' | sed 's/\"//g' | grep -w $prop`
echo ${temp##*|}
}
json=`curl -s -X GET http://twitter.com/users/show/$1.json`
prop='profile_image_url'
picurl=`jsonval`
`curl -s -X GET $picurl -o $1.png`
A bash script which demonstrates parsing a JSON string to extract a
property value. The script contains a jsonval function which operates
on two variables, json and prop. When the script is passed the name of
a twitter user it attempts to download the user's profile picture.
You could use a Perl module on the command line:
First, ensure it is installed. On Debian-based systems, you could run
sudo apt-get install libjson-xs-perl
But for other OS, you could install perl modules via CPAN (the Comprehensive Perl Archive Network):
cpan App::cpanminus
cpan JSON::XS
Note: You may have to run this with superuser privileges.
then:
curlopts=(-X POST -H
    "Content: agent-type: application/x-www-form-urlencoded"
    -d username="$USERNAME" -d password="$PASSWORD")
curlurl=https://www.toontownrewritten.com/api/login?format=json
. <(
perl -MJSON::XS -e '
$/=undef;my $a=JSON::XS::decode_json <> ;
printf "declare -A Json=\047(%s)\047\n", join " ",map {
"[".$_."]=\"".$a->{$_}."\""
} qw|queueToken success eta position|;
' < <(
curl "${curlopts[@]}" "$curlurl"
)
)
The qw|...| list lets you specify which keys you want exported. It could be replaced by keys %$a, but that may need debugging, as some characters are forbidden in associative array key names.
echo ${Json[queueToken]}
6bee9e85-343f-41c7-a4d3-156f901da615
echo ${Json[eta]}
0

Parsing JSON array: 'paste' for bash variables?

At first, I parsed an array JSON file with a loop using jshon, but it takes too long.
To speed things up, I thought I could return every value of id from every index, repeat with word another type, put these into variables, and finally join them together before echo-ing. I've done something similar with files using paste, but I get an error complaining that the input is too long.
If there is a more efficient way of doing this in bash without too many dependencies, let me know.
I forgot to mention that I want to have the possibility of colorizing the different parts independently (red id). Also, I don't store the json; it's piped:
URL="http://somewebsitewithanapi.tld?foo=no&bar=yes"
API=`curl -s "$URL"`
id=`echo $API | jshon -a -e id -u`
word=`echo $API | jshon -a -e word -u | sed 's/bar/foo/'`
red='\e[0;31m' blue='\e[0;34m' #bash colors
echo "${red}$id${x}. ${blue}$word${x}" #SOMEHOW CONCATENATED SIDE-BY-SIDE,
# PRESERVING THE ABILITY TO COLORIZE THEM INDEPENDENTLY.
My input (piped; not a file):
[
{
"id": 1,
"word": "wordA"
},
{
"id": 2,
"word": "wordB"
},
{
"id": 3,
"word": "wordC"
}
]
Tried:
jshon -a -e id -u :
That yields:
1
2
3
And:
jshon -a -e word -u :
That yields:
wordA
wordB
wordC
Expected result after joining:
1 wordA
2 wordB
3 wordC
4 wordD
You can use the JSON parser jq:
jq '.[] | "\(.id) \(.word)"' jsonfile
It yields:
"1 wordA"
"2 wordB"
"3 wordC"
If you want to get rid of double quotes, pipe the output to sed:
jq '.[] | "\(.id) \(.word)"' jsonfile | sed -e 's/^.\(.*\).$/\1/'
That yields:
1 wordA
2 wordB
3 wordC
UPDATE: See Martin Neal's comment for a solution to remove quotes without an additional sed command (jq's -r option prints raw strings).
The paste solution you're thinking of is this:
paste <(jshon -a -e id -u < foo.json) <(jshon -a -e word -u < foo.json)
Of course, you're processing the file twice.
You could also use a language with a JSON library, for example ruby:
ruby -rjson -le '
JSON.parse(File.read(ARGV.shift)).each {|h| print h["id"], " ", h["word"]}
' foo.json
1 wordA
2 wordB
3 wordC
API=$(curl -s "$URL")
# store ids and words in arrays
id=( $(jshon -a -e id -u <<< "$API") )
word=( $(jshon -a -e word -u <<< "$API" | sed 's/bar/foo/') )
red=$'\e[0;31m'
blue=$'\e[0;34m'
x=$'\e[0m'
for (( i=0; i<${#id[@]}; i++ )); do
    printf "%s%s%s %s%s%s\n" "$red" "${id[i]}" "$x" \
        "$blue" "${word[i]}" "$x"
done
I would go with Birei's solution but if your output is constrained along the lines of your sample, the following may work (with GNU grep)
paste -d ' ' <(grep -oP '(?<=id": ).*(?=,)' file.txt) \
<(grep -oP '(?<=word": ").*(?=",)' file.txt)
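If a scripting language is acceptable as a dependency, the join and the colorizing can both happen in one pass over the parsed array. The Python sketch below is one possible way to do it. The function name is hypothetical, and the input string stands in for the piped API output from the question.

```python
import json

RED, BLUE, RESET = "\033[0;31m", "\033[0;34m", "\033[0m"

def colored_lines(json_text):
    """Yield '<id> <word>' lines with each part wrapped in its own ANSI color."""
    for item in json.loads(json_text):
        yield f"{RED}{item['id']}{RESET} {BLUE}{item['word']}{RESET}"

# Stand-in for the piped API output shown in the question.
api_output = '[{"id": 1, "word": "wordA"}, {"id": 2, "word": "wordB"}, {"id": 3, "word": "wordC"}]'

for line in colored_lines(api_output):
    print(line)
```

Each record is parsed once, so there is no second pass over the input and nothing to paste back together.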

How to iterate through json in bash script

I have the JSON below, and I need to get only the mail values from it in a bash script.
value={"count":5,"users":[{"username":"asa","name":"asa Tran","mail":"asa@xyz.com"},{"username":"qq","name":"qq Morris","mail":"qq@xyz.com"},{"username":"qwe","name":"qwe Org","mail":"qwe@xyz.com"}]}
Output can be as
mail=asa@xyz.com,qq@xyz.com,qwe@xyz.com
All of the above needs to be done in a bash script (.sh)
I have already tried array iteration, but to no avail:
for key in "${!value[@]}"
do
    #echo "key = $key"
    echo "value = ${value[$key]}"
done
I have even tried converting it to an array:
alias json-decode="php -r
'print_r(json_decode(file_get_contents(\"php://stdin\"),1));'"
value=$(curl --user $credentials -k $endPoint | json-decode)
Still, I was not able to get the specific output.
jq is the tool to iterate through a json. In your case:
while IFS= read -r user; do
    jq -r '.mail' <<< "$user"
done < <(jq -c '.users[]' users.json)
would give:
asa@xyz.com
qq@xyz.com
qwe@xyz.com
NOTE: I removed "value=" because that is not valid json. users.json contains:
{"count":5,"users":[{"username":"asa","name":"asa Tran","mail":"asa@xyz.com"},{"username":"qq","name":"qq Morris","mail":"qq@xyz.com"},{"username":"qwe","name":"qwe Org","mail":"qwe@xyz.com"}]}
If this is valid json and the email field is the only one containing an @ character, you can do something like this:
echo $value | tr '"' '\n' | grep @
It replaces double quotes with newline characters and keeps only the lines containing @. It is not really json parsing, but it works.
You can store the result in a bash array
emails=($(echo $value | tr '"' '\n' | grep @))
and iterate on them
for email in "${emails[@]}"
do
    echo "$email"
done
You should use json_pp tool (in debian, it is part of the libjson-pp-perl package)
One would use it like this :
cat file.json | json_pp
And get a pretty print for your json.
So in your case, you could do :
#!/bin/bash
MAILS=""
LINES=`cat test.json | json_pp | grep '"mail"' | sed 's/.* : "\(.*\)".*/\1/'`
for LINE in $LINES ; do
MAILS="$LINE,$MAILS"
done
echo $MAILS | sed 's/.$//'
Output :
qwe@xyz.com,qq@xyz.com,asa@xyz.com
Using standard unix toolbox : sed command
cat so.json | sed "s/},/\n/g" | sed 's/.*"mail":"\([^"]*\)".*/\1/'
With R you could do this as follows:
$ value='{"count":5,"users":[{"username":"asa","name":"asa Tran","mail":"asa@xyz.com"},{"username":"qq","name":"qq Morris","mail":"qq@xyz.com"},{"username":"qwe","name":"qwe Org","mail":"qwe@xyz.com"}]}'
$ echo "$value" | R path users | R map path mail
["asa@xyz.com", "qq@xyz.com", "qwe@xyz.com"]
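To get exactly the mail=... line the question asks for, a small Python sketch could look like the following. It assumes the JSON text is available without the value= prefix and that the addresses use the usual @ separator. The function name is made up for illustration.

```python
import json

def mail_line(json_text):
    """Return 'mail=a,b,c' built from the mail field of every user."""
    users = json.loads(json_text)["users"]
    return "mail=" + ",".join(user["mail"] for user in users)

value = '{"count":5,"users":[{"username":"asa","name":"asa Tran","mail":"asa@xyz.com"},{"username":"qq","name":"qq Morris","mail":"qq@xyz.com"},{"username":"qwe","name":"qwe Org","mail":"qwe@xyz.com"}]}'

print(mail_line(value))
```

This keeps the addresses in their original order, unlike the json_pp loop above, which prepends each one and so reverses them.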