Parsing JSON array: 'paste' for bash variables?

At first, I parsed a JSON array file with a loop using jshon, but it takes too long.
To speed things up, I thought I could return every value of id from every index, repeat with word (another key), put these into variables, and finally join them together before echo-ing. I've done something similar with files using paste, but I get an error complaining that the input is too long.
If there is a more efficient way of doing this in bash without too many dependencies, let me know.
I forgot to mention that I want to have the possibility of colorizing the different parts independently (red id). Also, I don't store the json; it's piped:
URL="http://somewebsitewithanapi.tld?foo=no&bar=yes"
API=`curl -s "$URL"`
id=`echo $API | jshon -a -e id -u`
word=`echo $API | jshon -a -e word -u | sed 's/bar/foo/'`
red='\e[0;31m' blue='\e[0;34m' x='\e[0m' # bash colors; x resets the color
echo "${red}$id${x}. ${blue}$word${x}" #SOMEHOW CONCATENATED SIDE-BY-SIDE,
# PRESERVING THE ABILITY TO COLORIZE THEM INDEPENDENTLY.
My input (piped; not a file):
[
{
"id": 1,
"word": "wordA"
},
{
"id": 2,
"word": "wordB"
},
{
"id": 3,
"word": "wordC"
}
]
Tried:
jshon -a -e id -u :
That yields:
1
2
3
And:
jshon -a -e word -u :
That yields:
wordA
wordB
wordC
Expected result after joining:
1 wordA
2 wordB
3 wordC

You can use the JSON parser jq:
jq '.[] | "\(.id) \(.word)"' jsonfile
It yields:
"1 wordA"
"2 wordB"
"3 wordC"
If you want to get rid of double quotes, pipe the output to sed:
jq '.[] | "\(.id) \(.word)"' jsonfile | sed -e 's/^.\(.*\).$/\1/'
That yields:
1 wordA
2 wordB
3 wordC
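UPDATE: See Martin Neal's comment for a solution to remove quotes without an additional sed command. One way to do this is jq's -r (--raw-output) option, which prints strings without the surrounding quotes.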

The paste solution you're thinking of is this:
paste <(jshon -a -e id -u < foo.json) <(jshon -a -e word -u < foo.json)
Of course, you're processing the file twice.
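If the values are already held in bash variables rather than files, the same paste approach appears to work by feeding each variable through process substitution. This is a minimal sketch, assuming bash; the variables `ids` and `words` and the helper `join_columns` are hypothetical stand-ins for the two jshon outputs:

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the output of the two jshon calls.
ids=$'1\n2\n3'
words=$'wordA\nwordB\nwordC'

# join_columns A B: print the lines of A and B side by side,
# separated by a single space. Each variable is fed to paste
# through process substitution, so no temporary files are needed.
join_columns() {
  paste -d ' ' <(printf '%s\n' "$1") <(printf '%s\n' "$2")
}

join_columns "$ids" "$words"
```

Because the data is passed on paste's input rather than as command-line arguments, this should avoid "argument list too long" style errors.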
You could also use a language with a JSON library, for example ruby:
ruby -rjson -le '
JSON.parse(File.read(ARGV.shift)).each {|h| print h["id"], " ", h["word"]}
' foo.json
1 wordA
2 wordB
3 wordC
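To colorize the fields independently, store the values in arrays and loop over the indices:
API=$(curl -s "$URL")
# store ids and words in arrays
id=( $(jshon -a -e id -u <<< "$API") )
word=( $(jshon -a -e word -u <<< "$API" | sed 's/bar/foo/') )
# $'...' stores the actual escape character, so printf %s can print it
red=$'\e[0;31m'
blue=$'\e[0;34m'
x=$'\e[0m'
for (( i=0; i<${#id[@]}; i++ )); do
  printf "%s%s%s %s%s%s\n" "$red" "${id[i]}" "$x" \
    "$blue" "${word[i]}" "$x"
done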
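The array assignments above rely on word splitting, so a value containing spaces would be spread over several elements. A possible variant, assuming bash 4 or later for `mapfile`, reads one array element per line instead. The sample data and the `print_colored` helper are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for the jshon output.
ids=$'1\n2\n3'
words=$'word A\nword B\nword C'

# $'...' stores the real escape character, so printf '%s' can emit it.
red=$'\e[0;31m'
blue=$'\e[0;34m'
reset=$'\e[0m'

# print_colored IDS WORDS: pair up the lines of both arguments and
# print each id in red and each word in blue.
print_colored() {
  local -a id word
  local i
  mapfile -t id <<< "$1"      # one array element per input line
  mapfile -t word <<< "$2"
  for (( i = 0; i < ${#id[@]}; i++ )); do
    printf '%s%s%s %s%s%s\n' "$red" "${id[i]}" "$reset" \
                             "$blue" "${word[i]}" "$reset"
  done
}

print_colored "$ids" "$words"
```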

I would go with Birei's solution, but if your output is constrained along the lines of your sample, the following may work (with GNU grep):
paste -d ' ' <(grep -oP '(?<=id": ).*(?=,)' file.txt) \
<(grep -oP '(?<=word": ").*(?=")' file.txt)

Related

Data extraction for specific string

I have a long list of JSON data, with repeated content similar to the following.
Because the original JSON file is too long, I will just share the link here. It is a result generated by a database called RegulomeDB.
Direct link to the JSON file
I would like to extract specific data (eQTLs) from "method": "eQTLs" and "value": "xxxx", and put them into 2 tab-delimited columns, exactly like below.
Note: "value": "xxxx" is extracted right after "method": "eQTLs" is detected.
eQTLs firstResult, secondResult, thirdResult, ...
In this example, the desired output is:
eQTLs EIF3S8, EIF3CL
I've tried using a python script but was unsuccessful.
import json
with open('file.json') as f:
    f_json = json.load(f)
print 'f_json[0]['"method": "eQTLs"'] + "\t" + f_json[0]["value"]
Thank you for your kind help.
Maybe you'll find the JSON parser xidel useful. It can open URLs and can manipulate strings any way you want:
$ xidel -s "https://regulomedb.org/regulome-search/?regions=chr16:28539847-28539848&genome=GRCh37&format=json" \
-e '"eQTLs "||join($json("@graph")()[method="eQTLs"]/value,", ")'
eQTLs EIF3S8, EIF3CL
Or with the XPath/XQuery 3.1 syntax:
-e '"eQTLs "||join($json?"@graph"?*[method="eQTLs"]?value,", ")'
Try this:
cat file.json | grep -iE '"method":\s*"eQTLs"[^}]*' -o | cut -d ',' -f 1,5 | sed -r 's/"|:|method|value//gi' | sed 's/\s*eqtls,\s*//gi' | tr '\n' ',' | sed 's/,$/\n/g' | sed 's/,/, /g' | xargs echo -e 'eQTLs\x09'

Create a json from given list of filenames in unix script

Hello, I am trying to write a unix script/command that lists all filenames in a given directory matching the format string-{number}.txt (e.g. filename-1.txt, filename-2.txt), from which I have to form a JSON object. Any pointers would be helpful.
[{
"filenumber": "1",
"name": "filename-1.txt"
},
{
"filenumber": "2",
"name": "filename-2.txt"
}
]
In the above JSON, filenumber should be read from the {number} part of each filename.
A single call to jq should suffice:
shopt -s extglob
printf "%s\0" *-+([0-9]).txt | \
jq -sR 'split("\u0000") |
map({filenumber:capture(".*-(?<n>.*)\\.txt").n,
name:.})'
Very easy for the command-line tool xidel and its integrated EXPath File Module:
$ xidel -se '
array{
for $x in file:list(.,false(),"*.txt")
return {
"filenumber":extract($x,"(\d+)\.txt",1),
"name":$x
}
}
'
Intuitively, I'd say you can do this with jq. However, in practice I've rarely been able to achieve what I wanted with jq :-)
With some lunch break puzzling, I've come up with this beauty:
ls | jq -R '{filenumber:input_line_number, name:.}' | jq -s .
Instead of ls you could use any other command that produces a newline separated list of strings.
I tried multiple examples to fit my exact use case and finally found that this works exactly how I wanted. Thanks.
for file in $(ls *.txt); do file_version=$(echo $file | sed 's/\(^.*-\)\(.*\)\(.txt.*$\)/\2/'); jq -n --arg name "$file_version" --arg path "$file" '{name: $name, name: $path}'; done | jq -n '.urls |= [inputs]'
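For comparison, when the file names are known to be simple, the array requested in the question could also be built with bash parameter expansion alone. This is only a sketch: the helper `files_to_json` is hypothetical, and it assumes names without double quotes, backslashes or control characters, because it does no JSON escaping (jq remains the safer choice otherwise):

```shell
#!/usr/bin/env bash
# files_to_json NAME...: print a JSON array with one object per file name.
# The number is the text between the last '-' and the '.txt' suffix.
# Assumption: names contain no double quotes, backslashes or control
# characters, since no JSON escaping is done here.
files_to_json() {
  local name number sep=''
  printf '['
  for name in "$@"; do
    number=${name##*-}      # drop everything up to the last '-'
    number=${number%.txt}   # drop the '.txt' suffix
    printf '%s{"filenumber":"%s","name":"%s"}' "$sep" "$number" "$name"
    sep=','
  done
  printf ']\n'
}

files_to_json filename-1.txt filename-2.txt
```

In a real directory the call would look something like `files_to_json *-[0-9]*.txt`.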

Arithmetic in web scraping in a shell

So, I have the example code here:
#!/bin/bash
clear
curl -s https://www.cnbcindonesia.com/market-data/currencies/IDR=/USD-IDR |
html2text |
sed -n '/USD\/IDR/,$p' |
sed -n '/Last updated/q;p' |
tail -n-1 |
head -c+6 && printf "\n"
exit 0
This should print a number in the range 14000~15000.
Let's start with the very basics: what do I have to do to print the result + 1? So if the printout is 14000, incrementing it by 1 gives 14001. I suppose the result of html2text cannot be used in calculations, since it is probably a string rather than an integer.
The more advanced thing I want to know is how to do a calculation on the results of two curl calls.
What I would do, bash + xidel:
$ num=$(xidel -se '//div[@class="mark_val"]/span[1]/text()' 'https://url')
$ num=$((${num//,/}+1)) # num was 14050
$ echo $num
Output
14051
Explanations
$((...))
is an arithmetic substitution. After doing the arithmetic, the whole thing is replaced by the value of the expression. See http://mywiki.wooledge.org/ArithmeticExpression
Command Substitution: "$(cmd "foo bar")" causes the command 'cmd' to be executed with the argument 'foo bar' and "$(..)" will be replaced by the output. See http://mywiki.wooledge.org/BashFAQ/002 and http://mywiki.wooledge.org/CommandSubstitution
Bonus
You can also compute directly in xidel (thanks to Reino), using XQuery syntax:
$ xidel -s <url> -e 'replace(//div[@class="mark_val"]/span[1],",","") + 1'
And to add two values:
$ xidel -s <url> -e '
let $num:=replace(//div[@class="mark_val"]/span[1],",","")
return $num + $num
'
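The same arithmetic can be done in plain bash once each scraped value is in a variable. This is a small sketch assuming the values are integers that use a comma as the thousands separator, as in the 14,050 example above. The helper `to_int` and the two sample values are hypothetical; in practice each value would come from its own command substitution:

```shell
#!/usr/bin/env bash
# to_int TEXT: strip thousands separators so that bash arithmetic,
# which only handles plain integers, can use the value.
to_int() {
  printf '%s\n' "${1//,/}"
}

# Hypothetical scraped values; in practice each would be filled by
# its own command substitution, one per curl or xidel call.
first='14,050'
second='14,125'

echo $(( $(to_int "$first") + 1 ))                    # increment one value
echo $(( $(to_int "$first") + $(to_int "$second") ))  # add two values
```

Bash arithmetic is integer-only, so values with a decimal part would need a tool such as awk or bc instead.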

How to grep two row values from a same json file and display them as columnwise side by side in another file?

example_json_file:
{
"href": "/users/115",
"id": 115,
"username": "test",
"locked": false,
"type": "local",
"effective_groups":
}
Assume we have a lot of these in a file. I want to grep the id and username values from the JSON file and put them side by side as comma-separated columns in another file.
I tried the following grep:
sed -E 's/},\s*{/},\n{/g' user_id_json.txt | grep '"id"'
It helps to grep individual field values. However, I need help merging two field values, like id and username, and putting them in a separate file as comma-separated values using a single command.
Expected output:
test,115
With valid JSON and jq:
jq -r '"\(.id),\(.username)"' file
Output:
115,test
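If jq is not available and the file is pretty-printed with one key per line, as in the question, an awk sketch along the following lines may be enough. It is not a JSON parser: it assumes the values contain no colons or commas and that objects are not nested. The helper `json_to_csv` and the sample data are hypothetical, and the fields are printed as username,id to match the expected output in the question:

```shell
#!/usr/bin/env bash
# json_to_csv: read pretty-printed JSON objects (one key per line) on
# stdin and print "username,id" for each object.
# Assumptions: values contain no ':' or ',' and objects are not nested.
json_to_csv() {
  awk -F'[:,]' '
    /"id"/       { gsub(/[ \t]/, "", $2);  id = $2 }    # numeric id
    /"username"/ { gsub(/[ \t"]/, "", $2); name = $2 }  # unquoted name
    /[}]/        { if (id != "" && name != "") print name "," id
                   id = ""; name = "" }                  # end of object
  '
}

# Hypothetical sample input modelled on the question.
sample='{
  "href": "/users/115",
  "id": 115,
  "username": "test"
}
{
  "href": "/users/116",
  "id": 116,
  "username": "demo"
}'

printf '%s\n' "$sample" | json_to_csv
```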
This requires ed. If we hypothetically assume that each JSON object has the same number of lines (which I know we probably shouldn't), then:
#!/bin/sh
cat >> edchop+.txt << EOF
1,8w temp
1,8d
wq
EOF
while [ $(wc -l example_json_file | cut -d' ' -f1) -gt 0 ]
do
ed -s example_json_file < edchop+.txt
name=$(sed -n '/username/p' temp | cut -d':' -f2 | tr -d ' ' | tr -d ',' | tr -d '\"')
id=$(sed -n '/id/p' temp | cut -d':' -f2 | tr -d ' ' | tr -d ',')
echo -e "${name}","${id}" >> example_csvfile
done
rm -v ./edchop+.txt
rm -v ./temp

grep json value of a key name. (busybox without option -P)

I found tons of threads discussing how to grep JSON values.
Unfortunately, they are useless for me and for anyone using grep from busybox (embedded Linux). This grep version doesn't have the option "-P" (Perl regexp); only "-E" (extended regexp) is available.
BusyBox v1.20.2 () multi-call binary.
Usage: grep [-HhnlLoqvsriwFE] [-m N] [-A/B/C N] PATTERN/-e PATTERN.../-f FILE [FILE]...
Search for PATTERN in FILEs (or stdin)
-H Add 'filename:' prefix
-h Do not add 'filename:' prefix
-n Add 'line_no:' prefix
-l Show only names of files that match
-L Show only names of files that don't match
-c Show only count of matching lines
-o Show only the matching part of line
-q Quiet. Return 0 if PATTERN is found, 1 otherwise
-v Select non-matching lines
-s Suppress open and read errors
-r Recurse
-i Ignore case
-w Match whole words only
-x Match whole lines only
-F PATTERN is a literal (not regexp)
-E PATTERN is an extended regexp
-m N Match up to N times per file
-A N Print N lines of trailing context
-B N Print N lines of leading context
-C N Same as '-A N -B N'
-e PTRN Pattern to match
-f FILE Read pattern from file
I have a json example:
{
"one": "apple",
"two": "banana"
}
Now, I want to extract a value, e.g. "apple" from the key "one".
grep -E '".*?"' file.json
This is just an example of how it should look.
And by the way: how can I access groups from a regex?
I would be grateful for any help or alternatives.
With busybox awk:
busybox awk -F '[:,]' '/"one"/ {gsub("[[:blank:]]+", "", $2); print $2}'
-F '[:,]' sets the field separator as : or ,
/"one"/ {gsub("[[:blank:]]+", "", $2); print $2} matches lines that contain "one"; if so, it strips all horizontal whitespace from the second field and then prints the field
If you want to strip off the quotes too:
busybox awk -F '[:,]' '/"one"/ {gsub("[[:blank:]\"]+", "", $2); print $2}'
Example:
$ cat file.json
{
"one": "apple",
"two": "banana"
}
$ busybox awk -F '[:,]' '/"one"/ {gsub("[[:blank:]]+", "", $2); print $2}' file.json
"apple"
$ busybox awk -F '[:,]' '/"one"/ {gsub("[[:blank:]\"]+", "", $2); print $2}' file.json
apple
I like simple commands that are readable and easy to understand. In your file, we first have to remove whitespace so that the key can be matched. For that I usually prefer sed. After that we can use awk to find the match.
sed -r 's/(\t|\s|,)//g' file.json | awk -F: '$1=="\"one\"" {print $2}'
It will return:
"apple"
Note: I removed the comma (,) at the end of the line. If you need the comma in the output as well, use the command below.
sed -r 's/(\t|\s)//g' file.json | awk -F: '$1=="\"one\"" {print $2}'
It will return:
"apple",
The awk solutions won't work if the JSON is on a single line, for example:
{"one":"apple","two":"banana"}
The following sed will do:
busybox cat file.json | sed -n 's/.*"one":\([^}, ]*\).*/\1/p'
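The sed command above also answers the question about regex groups: `\(...\)` captures part of the match and `\1` refers back to it. A slightly more general sketch that strips the quotes and takes the key as an argument might look like this. The helper `json_string` is hypothetical; it assumes the key is a plain word without regex metacharacters and the value is a string without embedded double quotes. It uses only basic regular expressions, so it should not need grep -P:

```shell
#!/usr/bin/env bash
# json_string KEY: print the string value stored under KEY, read from stdin.
# The \( ... \) group captures the value and \1 refers back to it, which is
# how a regex group is accessed in sed.
# Assumptions: KEY is a plain word without regex metacharacters and the
# value is a string without embedded double quotes.
json_string() {
  sed -n 's/.*"'"$1"'"[ ]*:[ ]*"\([^"]*\)".*/\1/p'
}

# Multi-line input, as in the question.
printf '%s\n' '{' '"one": "apple",' '"two": "banana"' '}' | json_string one
# Single-line input.
printf '%s\n' '{"one":"apple","two":"banana"}' | json_string two
```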