I have a JSON file and I am trying to parse it using the script below:
#!/bin/ksh
while read rec
do
while read line
do
firstname=`echo $line | sed -n -e 's/^.*\(full-name\)/\1/p' | cut -f3 -d'"'`
id=`echo $line | sed -n -e 's/^.*\(id\)/\1/p' | cut -f3 -d'"'`
echo "${firstname}'|'${id}"
done < `echo $rec | nawk 'gsub("}}}}", "\n")' | sed 's/{"results"//g'`
done < /var/tmp/Cloud_test.txt
My sample file is :
{"results":[{"general-info":{"full-name":"TELOS MANAGEMENT","body":{"party":{"xrefs":{"xref":[{"id":"66666"}]}}}},"_id":"91002551"},{"_id":"222222","body":{"party":{"general-info":{"full-name":"DO REUSE"},"xrefs":{"xref":[{"id":"777777"}]}}}}]}
Expected Result:
TELOS MANAGEMENT|66666
DO REUSE|777777
My problem is with the input to the inner while loop. The data is not passed to it line by line; instead the complete line is passed at once, so the result is not what I expect. Please help me fix it.
As pointed out by @l0b0, this kind of problem is best solved using a JSON-aware tool such as jq. Here, then, is a jq solution.
It must be pointed out, however, that the sample input is strangely irregular, so the requirements are not so clear. If the JSON were more regular, the jq solution would be simpler.
In any case, the following jq filter does produce the result as described:
.results[]
| ..
| objects
| select(has("general-info"))
| [(.["general-info"]|.["full-name"]), (.. | .id? // empty)]
| join("|")
Simplification
The second-to-last line above could be simplified to:
[."general-info"."full-name", (.. | .id? // empty)]
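For readers without jq, the same recursive idea can be sketched in Python. This is only an illustration, assuming that every object holding "general-info" also holds a "full-name" inside it and that the wanted ids sit somewhere beneath that object; walk and name_and_ids are hypothetical helper names, not part of any library.

```python
import json

def walk(node):
    """Yield every dict nested anywhere inside node, including node itself."""
    if isinstance(node, dict):
        yield node
        for child in node.values():
            yield from walk(child)
    elif isinstance(node, list):
        for child in node:
            yield from walk(child)

def name_and_ids(doc):
    """Build one 'full-name|id' string per object that has a "general-info" key."""
    rows = []
    for obj in walk(doc["results"]):
        if "general-info" in obj:
            name = obj["general-info"]["full-name"]
            # Collect every "id" found at or below this object ("_id" is a different key).
            ids = [d["id"] for d in walk(obj) if "id" in d]
            rows.append("|".join([name, *ids]))
    return rows

# The sample input from the question, unchanged.
sample = '{"results":[{"general-info":{"full-name":"TELOS MANAGEMENT","body":{"party":{"xrefs":{"xref":[{"id":"66666"}]}}}},"_id":"91002551"},{"_id":"222222","body":{"party":{"general-info":{"full-name":"DO REUSE"},"xrefs":{"xref":[{"id":"777777"}]}}}}]}'

rows = name_and_ids(json.loads(sample))
for row in rows:
    print(row)
```

Like the jq filter, this searches the whole tree rather than following fixed paths, which is what lets it cope with the two differently shaped records.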
This is a more complicated (double) case of this question.
The following works for me:
cat sample.json |
sed -e 's/"full-name"/\n&/g' |
tail -n+2 |
sed -e 's/"full-name":"\([^"]*\).*{"id":"\([^"]*\).*/\1\|\2/'
Related
I have a long list of JSON data, with repeated entries similar to the following.
Because the original JSON file is too long, I will just share the hyperlink here. This is a result generated from a database called RegulomeDB.
Direct link to the JSON file
I would like to extract specific data (eQTLs) from "method": "eQTLs" and "value": "xxxx", and put them into 2 columns (tab delimited) exactly like below.
Note: "value": "xxxx" is extracted right after "method": "eQTLs" is detected.
eQTLs firstResult, secondResult, thirdResult, ...
In this example, the desired output is:
eQTLs EIF3S8, EIF3CL
I've tried using a python script but was unsuccessful.
import json
with open('file.json') as f:
f_json = json.load(f)
print 'f_json[0]['"method": "eQTLs"'] + "\t" + f_json[0]["value"]
Thank you for your kind help.
Maybe you'll find the JSON parser xidel useful. It can open URLs and can manipulate strings any way you want:
$ xidel -s "https://regulomedb.org/regulome-search/?regions=chr16:28539847-28539848&genome=GRCh37&format=json" \
-e '"eQTLs "||join($json("@graph")()[method="eQTLs"]/value,", ")'
eQTLs EIF3S8, EIF3CL
Or with the XPath/XQuery 3.1 syntax:
-e '"eQTLs "||join($json?"@graph"?*[method="eQTLs"]?value,", ")'
Try this:
cat file.json | grep -iE '"method":\s*"eQTLs"[^}]*' -o | cut -d ',' -f 1,5 | sed -r 's/"|:|method|value//gi' | sed 's/\s*eqtls,\s*//gi' | tr '\n' ',' | sed 's/,$/\n/g' | sed 's/,/, /g' | xargs echo -e 'eQTLs\x09'
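Since the question started from Python, here is a minimal sketch of the same selection using only the standard json module. It assumes, as the xidel answer suggests, that the hits live in a top-level "@graph" array of objects with "method" and "value" keys; the tiny sample document and the name eqtl_line are made up for illustration and are not the real RegulomeDB response.

```python
import json

def eqtl_line(doc):
    """Return 'eQTLs<TAB>v1, v2, ...' for all @graph entries whose method is eQTLs."""
    values = [
        hit["value"]
        for hit in doc.get("@graph", [])
        if hit.get("method") == "eQTLs"
    ]
    return "eQTLs\t" + ", ".join(values)

# Hypothetical, heavily trimmed stand-in for the downloaded file.
sample = json.loads("""
{"@graph": [
  {"method": "eQTLs", "value": "EIF3S8"},
  {"method": "other", "value": "ignored"},
  {"method": "eQTLs", "value": "EIF3CL"}
]}
""")

line = eqtl_line(sample)
print(line)
```

With a real file, json.load(open("file.json")) would take the place of the inline sample.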
Piggybacking off of this question, I have a command (running in a Docker container) where I am trying to use sed to replace an expression with a JSON string generated by jq.
Tiny backstory:
I have a whitelist of env vars in a file tmp.txt:
ENV_VAR_A
ENV_VAR_B
ENV_VAR_C
I use jq using the answer in the previous thread to generate a JSON string like this:
jq -Rn '[inputs | {(.): env[.]}] | add' ./tmp.txt
# GENERATES { "ENV_VAR_A": "a val", "ENV_VAR_B": "a val", "ENV_VAR_C": "a val"}
Amazing! Now I am trying to use sed (as a Docker CMD) to replace something:
# CMD sed -i 's#{{SOME_PATTERN}}#'$( jq -Rn '[inputs | {(.): env[.]}] | add' ./etc/nginx/conf.d/env)'#' ./somefile
But I am getting:
sed: -e expression #1, char 22: unterminated `s' command
So something went wrong with the substitution, but I am not nearly knowledgeable enough in shell to figure out how to fix it. I feel like I have to move some quotes or delimiters around, or maybe pipe my jq output to something to "clean up" the JSON string before I substitute, but I'm not sure what.
Looking for some sed-fu, can anyone help?
This is a bit tricky since the replacement string spans multiple lines. You can try this sed with a process substitution:
sed -i -e '/{{SOME_PATTERN}}/r '<( jq -Rn '[inputs | {(.): env[.]}] | add' /etc/nginx/conf.d/env) -e '//d' somefile
Make sure you're using bash.
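With a slightly modified jq command that produces single-line output (the -c option), you can just do the following (note that it will break if a value contains / or &, which are special in a sed replacement):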
sed -i 's/{{SOME_PATTERN}}/'"$(jq -nRc '[inputs | {(.): env[.]}] | add' /etc/nginx/conf.d/env)"'/' somefile
@Adam asked:
what would that look like?
If your jq has the --rawfile option, there should be no need to juggle jq and sed:
< somefile jq -R --rawfile text tmp.txt '
($text
| split("\n")
| map(select(length>0)
| {(.): env[.]}) | add) as $json
| sub("{{SOME_PATTERN}}"; $json|tostring)'
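For comparison, the whole whitelist-to-JSON-to-substitution step can be sketched in Python, where a literal string replacement avoids the quoting and delimiter problems that sed runs into. This is a sketch under assumptions: render is a hypothetical helper, the template text is invented, and a fake environment dictionary stands in for the container's real variables.

```python
import json
import os

def render(template, names, environ=os.environ, placeholder="{{SOME_PATTERN}}"):
    """Replace placeholder with a JSON object built from the whitelisted variables."""
    # Unset variables map to None (JSON null), mirroring what env[.] yields in jq.
    mapping = {name: environ.get(name) for name in names if name}
    # str.replace is literal, so characters such as /, & or # need no escaping.
    return template.replace(placeholder, json.dumps(mapping))

# Whitelist as it would be read from tmp.txt.
names = "ENV_VAR_A\nENV_VAR_B\nENV_VAR_C\n".splitlines()

# Hypothetical environment and template, for demonstration only.
fake_env = {"ENV_VAR_A": "a val", "ENV_VAR_B": "a val", "ENV_VAR_C": "a val", "SECRET": "x"}
result = render("window.env = {{SOME_PATTERN}};", names, environ=fake_env)
print(result)
```

Only the whitelisted names end up in the output; anything else in the environment is ignored.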
I have a response trace file containing the response below:
#RESPONSE BODY
#--------------------
{"totalItems":1,"member":[{"name":"name","title":"PatchedT","description":"My des_","id":"70EA96FB313349279EB089BA9DE2EC3B","type":"Product","modified":"2019 Jul 23 10:22:15","created":"2019 Jul 23 10:21:54",}]}
I need to store the value of the "id" key in a variable that I can use later in my script.
Expected result is
echo $id - should give me 70EA96FB313349279EB089BA9DE2EC3B value
With valid JSON (remove the first two lines with sed and parse the rest with jq):
id=$(sed '1,2d' file | jq -r '.member[]|.id')
Output to variable id:
70EA96FB313349279EB089BA9DE2EC3B
I would strongly suggest using jq to parse json.
But given that json is mostly compatible with python dictionaries and arrays, this HACK would work too:
$ cat resp
#RESPONSE BODY
#--------------------
{"totalItems":1,"member":[{"name":"name","title":"PatchedT","description":"My des_","id":"70EA96FB313349279EB089BA9DE2EC3B","type":"Product","modified":"2019 Jul 23 10:22:15","created":"2019 Jul 23 10:21:54",}]}
$ awk 'NR==3{print "a="$0;print "print a[\"member\"][0][\"id\"]"}' resp | python
70EA96FB313349279EB089BA9DE2EC3B
$ sed -n '3s|.*|a=\0\nprint a["member"][0]["id"]|p' resp | python
70EA96FB313349279EB089BA9DE2EC3B
Note that this code is
1. a dirty hack, needed only because your system does not have the right tool, jq
2. susceptible to injection attacks, since the response is executed as Python code. Hence use it ONLY IF you trust the response received from your service.
Quick and dirty (don't use eval):
eval $(cat response_file | tail -1 | awk -F , '{ print $5 }' | sed -e 's/"//g' -e 's/:/=/')
It is based on the exact structure you gave, and hoping there is no , in any value before "id".
Or assign it yourself:
id=$(cat response_file | tail -1 | awk -F , '{ print $5 }' | cut -d: -f2 | sed -e 's/"//g')
Note that you can't access the name field with that trick, as it is the first item of the member array and gets "swallowed" into field $2. An even uglier hack can retrieve it, though (after the extra sed, name is field 3):
name=$(cat response_file | tail -1 | sed -e 's/:\[/,/g' -e 's/}\]//g' | awk -F , '{ print $3 }' | cut -d: -f2 | sed -e 's/"//g')
But, if you can, jq is the right tool for that work instead of ugly hacks like that (but if it works...).
When you can't use jq, you can consider
id=$(grep -Eo "[0-9A-F]{32}" file)
This only works when the file looks the way I expect, so you might need to add extra checks like
id=$(grep "My des_" file | grep -Eo "[0-9A-F]{32}" | head -1)
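If Python is available but jq is not, a somewhat safer alternative to piping the response into the interpreter is to parse it as data. The sketch below is illustrative only: it assumes the comment lines start with # and that the only defect in the body is a trailing comma before a closing bracket, as in the sample; extract_ids is a hypothetical name.

```python
import json
import re

def extract_ids(text):
    """Return the "id" of every entry in "member" from a trace file's text."""
    # Drop the header lines that start with '#'.
    body = "\n".join(line for line in text.splitlines() if not line.startswith("#"))
    # Assumption: remove a comma that directly precedes } or ], since strict JSON
    # forbids it. This would also alter such a sequence inside a string value.
    body = re.sub(r",\s*([}\]])", r"\1", body)
    return [member["id"] for member in json.loads(body)["member"]]

# The trace content from the question, including its trailing comma.
trace = '''#RESPONSE BODY
#--------------------
{"totalItems":1,"member":[{"name":"name","title":"PatchedT","description":"My des_","id":"70EA96FB313349279EB089BA9DE2EC3B","type":"Product","modified":"2019 Jul 23 10:22:15","created":"2019 Jul 23 10:21:54",}]}'''

ids = extract_ids(trace)
print(ids[0])
```

Because the response is parsed rather than executed, a malicious body can at worst raise an exception.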
I have a JSON database change log, output of wal2json. It looks like this:
{"xid":1190,"timestamp":"2018-07-19 17:18:02.905354+02","change":[
{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}},
{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}]}
...
Each top level entry (xid) is a transaction, each item in change is, well, a change. One row may change multiple times.
To import to an OLAP system with limited feature set, I need to have the order explicitly stated. So I need to add a sn for each change in a transaction.
Also, each change must be a top level entry - the OLAP can't iterate sub-items within one entry.
{"xid":1190, "sn":1, "kind":"update", "data":{"id":401,"name":"Update AA","age":20} }
{"xid":1190, "sn":2, "kind":"update", "data":{"id":401,"name":"Update BB","age":20} }
{"xid":1191, "sn":1, "kind":"insert", "data":{"id":625,"name":"Inserted","age":20} }
{"xid":1191, "sn":2, "kind":"delete", "data":{"id":625} }
(The reason is that the OLAP has limited ability to transform the data during import, and also doesn't have the order as a parameter.)
So, I do this using jq:
function transformJsonDataStructure {
## First let's reformat it to XML, then transform using XPATH, then back to JSON.
## Example input:
# {"xid":1074,"timestamp":"2018-07-18 17:49:54.719475+02","change":[
# {"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}},
# {"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}]}
cat "$1" | while read -r LINE ; do
XID=`echo "$LINE" | jq -c '.xid'`;
export SN=0;
#serr "{xid: $XID, changes: $CHANGES}";
echo "$LINE" | jq -c '.change[]' | while read -r CHANGE ; do
SN=$((SN+=1))
KIND=`echo "$CHANGE" | jq -c --raw-output .kind`;
TABLE=`echo "$CHANGE" | jq -c --raw-output .table`;
DEST_FILE="$TARGET_PATH-$TABLE.json";
case "$KIND" in
update|insert)
MAP=$(convertTwoArraysToMap "$(echo "$CHANGE" | jq -c ".columnnames")" "$(echo "$CHANGE" | jq -c ".columnvalues")") ;;
delete)
MAP=$(convertTwoArraysToMap "$(echo "$CHANGE" | jq -c ".oldkeys.keynames")" "$(echo "$CHANGE" | jq -c ".oldkeys.keyvalues")") ;;
esac
#echo "{\"xid\":$XID, \"table\":\"$TABLE\", \"kind\":\"$KIND\", \"data\":$MAP }" >> "$DEST_FILE"; ;;
echo "{\"xid\":$XID, \"sn\":$SN, \"kind\":\"$KIND\", \"data\":$MAP }" | tee --append "$DEST_FILE";
done;
done;
return;
}
The problem is performance. I am calling jq a few times per entry, which is quite slow: around 1000 times slower than without the transformation.
How can I perform the transformation above in just one pass? (jq is not a must; another tool can be used too, but it should be available in CentOS packages. I want to avoid coding an extra tool for that.)
From man jq it seems that it could be capable of processing the whole file (JSON entry per row) in one go. I could do it in XSLT but I can't wrap my head around jq. Especially the iteration of the change array and combining columnnames and columnvalues to a map.
For the iteration, I think map or map_values could be used.
For converting the two arrays to a map, I see the from_entries and with_entries functions, but can't get them to work.
Any jq master around to advise?
The following helper function converts the incoming array into an object using headers as the keys:
def objectify(headers):
[headers, .] | transpose | map({(.[0]): .[1]}) | add;
The trick now is to use range(0;length) to generate .sn:
{xid} +
(.change
| range(0;length) as $i
| .[$i]
| .columnnames as $header
| {sn: ($i + 1),
kind,
data: (.columnvalues|objectify($header)) } )
Output
For the given log entry, the output would be:
{"xid":1190,"sn":1,"kind":"update","data":{"id":401,"name":"Update AA","age":20}}
{"xid":1190,"sn":2,"kind":"update","data":{"id":401,"name":"Update BB","age":20}}
Moral
If a solution looks too complicated, it probably is.
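The same single-pass transformation can be sketched in Python, which may help when reading the jq version: enumerate supplies sn and zip pairs names with values. This is a sketch, not a drop-in replacement; flatten is a hypothetical name, and the delete branch (taking the data from oldkeys) follows the shell script in the question rather than the jq filter above, which only reads columnnames.

```python
import json

def flatten(transaction_text):
    """Yield one flat record per change in a single wal2json transaction."""
    tx = json.loads(transaction_text)
    for sn, change in enumerate(tx["change"], start=1):
        if change["kind"] == "delete":
            # Deletes carry only the old key columns.
            keys = change["oldkeys"]["keynames"]
            values = change["oldkeys"]["keyvalues"]
        else:
            keys = change["columnnames"]
            values = change["columnvalues"]
        yield {"xid": tx["xid"], "sn": sn, "kind": change["kind"],
               "data": dict(zip(keys, values))}

# The transaction from the question.
sample = '''{"xid":1190,"timestamp":"2018-07-19 17:18:02.905354+02","change":[
{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}},
{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}]}'''

rows = list(flatten(sample))
for row in rows:
    # Compact separators give one JSON record per line.
    print(json.dumps(row, separators=(",", ":")))
```

Each transaction is parsed once, so the cost per change is a dictionary construction rather than several process launches.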
I have the JSON below, and I need to get only the mail values from it in a bash script:
value={"count":5,"users":[{"username":"asa","name":"asa Tran","mail":"asa@xyz.com"},{"username":"qq","name":"qq Morris","mail":"qq@xyz.com"},{"username":"qwe","name":"qwe Org","mail":"qwe@xyz.com"}]}
Output can be as
mail=asa@xyz.com,qq@xyz.com,qwe@xyz.com
All the above need to be done in the bash script (.sh)
I have already tried iterating over it as an array, but to no avail:
for key in "${!value[@]}"
do
#echo "key = $key"
echo "value = ${value[$key]}"
done
I have even tried converting it to an array with:
alias json-decode="php -r
'print_r(json_decode(file_get_contents(\"php://stdin\"),1));'"
value=$(curl --user $credentials -k $endPoint | json-decode)
Still, I was not able to get the specific output.
jq is the tool to iterate through a json. In your case:
while read user; do
jq -r '.mail' <<< $user
done <<< $(jq -c '.users[]' users.json)
would give:
asa@xyz.com
qq@xyz.com
qwe@xyz.com
NOTE: I removed "value=" because that is not valid JSON. users.json contains:
{"count":5,"users":[{"username":"asa","name":"asa Tran","mail":"asa@xyz.com"},{"username":"qq","name":"qq Morris","mail":"qq@xyz.com"},{"username":"qwe","name":"qwe Org","mail":"qwe@xyz.com"}]}
If this is valid JSON and the email field is the only one containing an @ character, you can do something like this:
echo $value | tr '"' '\n' | grep @
It replaces each double quote with a newline character and keeps only the lines containing @. It is not really JSON parsing, but it works.
You can store the result in a bash array
emails=($(echo $value | tr '"' '\n' | grep @))
and iterate over it
for email in ${emails[@]}
do
echo $email
done
You could use the json_pp tool (on Debian, it is part of the libjson-pp-perl package).
One would use it like this:
cat file.json | json_pp
And get a pretty print for your json.
So in your case, you could do :
#!/bin/bash
MAILS=""
LINES=`cat test.json | json_pp | grep '"mail"' | sed 's/.* : "\(.*\)".*/\1/'`
for LINE in $LINES ; do
MAILS="$LINE,$MAILS"
done
echo $MAILS | sed 's/.$//'
Output :
qwe@xyz.com,qq@xyz.com,asa@xyz.com
Using standard unix toolbox : sed command
cat so.json | sed "s/},/\n/g" | sed 's/.*"mail":"\([^"]*\)".*/\1/'
With R you could do this as follows:
$ value='{"count":5,"users":[{"username":"asa","name":"asa Tran","mail":"asa@xyz.com"},{"username":"qq","name":"qq Morris","mail":"qq@xyz.com"},{"username":"qwe","name":"qwe Org","mail":"qwe@xyz.com"}]}'
$ echo "$value" | R path users | R map path mail
["asa@xyz.com", "qq@xyz.com", "qwe@xyz.com"]
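As a further option, the comma-separated list asked for in the question can be sketched with Python's standard json module. This assumes the JSON text (without the value= prefix) is available as a string and that every user object has a "mail" key; mail_list is a hypothetical name.

```python
import json

def mail_list(doc):
    """Join the "mail" field of every entry in "users" with commas."""
    return ",".join(user["mail"] for user in doc["users"])

# The JSON from the question, without the shell assignment prefix.
value = '{"count":5,"users":[{"username":"asa","name":"asa Tran","mail":"asa@xyz.com"},{"username":"qq","name":"qq Morris","mail":"qq@xyz.com"},{"username":"qwe","name":"qwe Org","mail":"qwe@xyz.com"}]}'

mails = mail_list(json.loads(value))
print("mail=" + mails)
```

Unlike the text-based approaches, this keeps the original order and does not depend on which characters appear in other fields.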