Arithmetic in web scraping in a shell

So, I have this example code:
#!/bin/bash
clear
curl -s https://www.cnbcindonesia.com/market-data/currencies/IDR=/USD-IDR |
html2text |
sed -n '/USD\/IDR/,$p' |
sed -n '/Last updated/q;p' |
tail -n-1 |
head -c+6 && printf "\n"
exit 0
This should print a number in the range 14000 to 15000.
Let's start with the basics: what do I have to do to print the result + 1? If the output is 14000, incrementing it by 1 should give 14001. I suppose the output of html2text can't be used in a calculation directly, since it is a string rather than an integer.
The more advanced thing I want to know is how to do a calculation on the results of two curl calls.

What I would do, bash + xidel:
$ num=$(xidel -se '//div[@class="mark_val"]/span[1]/text()' 'https://url')
$ num=$((${num//,/}+1)) # num was 14050
$ echo $num
Output
14051
Explanations
$((...)) is an arithmetic substitution. After doing the arithmetic, the whole thing is replaced by the value of the expression. See http://mywiki.wooledge.org/ArithmeticExpression
Command Substitution: "$(cmd "foo bar")" causes the command 'cmd' to be executed with the argument 'foo bar' and "$(..)" will be replaced by the output. See http://mywiki.wooledge.org/BashFAQ/002 and http://mywiki.wooledge.org/CommandSubstitution
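For the second part of the question (a calculation on two scraped values), a minimal sketch might look like the following. The two hard-coded strings are stand-ins for the output of two separate fetches; in a real script each would come from a command substitution. The variable names are illustrative only.

```shell
#!/bin/bash
# Stand-ins for two scraped values; in a real script each would be
# assigned from a command substitution, e.g. first=$(xidel ...).
first='14,050'
second='14,125'

# Strip the thousands separators so the strings are valid integers.
first=${first//,/}
second=${second//,/}

# Arithmetic expansion works on integers only.
sum=$((first + second))
diff=$((second - first))

echo "sum:  $sum"
echo "diff: $diff"
```

Since bash arithmetic is integer-only, values with a decimal part would need an external tool such as bc or awk.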
Bonus
You can compute directly in xidel (thanks, Reino) using XQuery syntax:
$ xidel -s <url> -e 'replace(//div[@class="mark_val"]/span[1],",","") + 1'
And to add two values:
$ xidel -s <url> -e '
let $num:=replace(//div[@class="mark_val"]/span[1],",","")
return $num + $num
'

Related

Get specific string line from file bash

I have a file with this kind of text with pattern
[{"foo":"bar:baz:foo*","bar*":"baz*","etc":"etc"},
{"foo2":"bar2:baz2:foo2*","bar2*":"baz2*","etc":"etc"},
{"foo3":"bar3:baz3:foo3*","bar3*":"baz3*","etc":"etc"},
{"foo4":"bar4:baz4:foo4*","bar4*":"baz4*","etc":"etc"}]
I need to take every string like this
{"foo":"bar:baz:foo*","bar*":"baz*","etc":"etc"} and send each of them to some url via curl
for i in text.txt
do (awk,sed,grep etc)
then curl $string
I can't figure out how to get the desired lines properly from the file without unnecessary symbols

I suggest that you can use jq to process your json file. jq is capable of reading json, and formatting output. Here's an example jq script to process your json file (which I unimaginatively call 'jsonfile'):
jq -r '.[] | "curl -d '\'' \(.) '\'' http://restful.com/api " ' jsonfile
Here's the output:
curl -d ' {"foo":"bar:baz:foo*","bar*":"baz*","etc":"etc"} ' http://restful.com/api
curl -d ' {"foo2":"bar2:baz2:foo2*","bar2*":"baz2*","etc":"etc"} ' http://restful.com/api
curl -d ' {"foo3":"bar3:baz3:foo3*","bar3*":"baz3*","etc":"etc"} ' http://restful.com/api
curl -d ' {"foo4":"bar4:baz4:foo4*","bar4*":"baz4*","etc":"etc"} ' http://restful.com/api
Here's what's going on:
We pass three arguments to the jq program: jq -r <script> <inputfile>.
The -r tells jq to output the results in raw format (that is, please don't escape quotes and stuff).
The script looks like this:
.[] | "some string \(.)"
The first . means take the whole json structure and the [] means iterate through each array element in the structure. The | is a filter that processes each element in the array. The filter is to output a string. We are using \(.) to interpolate the whole element passed into the | filter.
Wow... I've never really explained a jq script before (and it shows). But the crux of it is, we are using jq to find each element in the json array and insert it into a string. Our string is this:
curl -d '<the json dictionary array element>' http://restful.com/api
Ok. And you see the output. It works. But wait a second, we only have output. Let's tell the shell to run each line like this:
jq -r '.[] | "curl -d '\'' \(.) '\'' http://restful.com/api " ' jsonfile | bash
By piping the output to bash, we execute each line that we output. Essentially, we are writing a bash script with jq to curl http://restful.com/api passing the json element as the -d data parameter to POST the json element.
Revisiting for single quote issue
@oguz ismail pointed out that bash will explode if there is a single quote in the json input file. This is true. We can work around the quote by escaping it, but that adds complexity, making this a non-ideal approach.
Here's the problem input (I just inserted a single quote):
[{"foo":"bar:'baz:foo*","bar*":"baz*","etc":"etc"},
{"foo2":"bar2:baz2:foo2*","bar2*":"baz2*","etc":"etc"},
{"foo3":"bar3:baz3:foo3*","bar3*":"baz3*","etc":"etc"},
{"foo4":"bar4:baz4:foo4*","bar4*":"baz4*","etc":"etc"}]
Notice above that baz is now 'baz. The problem is that a single single quote makes the bash shell complain about unmatched quotes:
$ jq -r '.[] | "curl -d '\'' \(.) '\'' http://restful.com/api " ' jsonfile | bash
bash: line 4: unexpected EOF while looking for matching `"'
bash: line 5: syntax error: unexpected end of file
Here's the solution:
$ jq -r $'.[] | "\(.)" | gsub( "\'" ; "\\\\\'" ) | "echo $\'\(.)\'" ' jsonfile | bash
{"foo":"bar:'baz:foo*","bar*":"baz*","etc":"etc"}
{"foo2":"bar2:baz2:foo2*","bar2*":"baz2*","etc":"etc"}
{"foo3":"bar3:baz3:foo3*","bar3*":"baz3*","etc":"etc"}
{"foo4":"bar4:baz4:foo4*","bar4*":"baz4*","etc":"etc"}
Above I am using $'' to quote the jq script. This allows me to escape single quotes using \'. I've also changed the curl command to echo so I can test the bash script without bothering the folks at http://restful.com/api.
The 'trick' is to make sure that the bash script we generate also escapes all single quotes with a backslash (\). So, we have to change ' to \'. That's what gsub is doing.
gsub( "\'" ; "\\\\\'" )
After making that substitution ( ' --> \' ) we pipe the entire string to this:
"echo $\'\(.)\'"
which surrounds the output of gsub with echo $''. Now we are using $' again so the \' is properly understood by bash.
So we wind up with this when we put the curl back in:
jq -r $'.[] | "\(.)" | gsub( "\'" ; "\\\\\'" ) | "curl -d $\'\(.)\' http://restful.com/api " ' jsonfile | bash

Use the jq command. This is just an example of the parsing.
for k in $(jq -c '.[]' a.txt); do
echo "hello-" $k
done
Output:
hello- {"foo":"bar:baz:foo*","bar*":"baz*","etc":"etc"}
hello- {"foo2":"bar2:baz2:foo2*","bar2*":"baz2*","etc":"etc"}
hello- {"foo3":"bar3:baz3:foo3*","bar3*":"baz3*","etc":"etc"}
hello- {"foo4":"bar4:baz4:foo4*","bar4*":"baz4*","etc":"etc"}
You can use $k anywhere you want inside the loop.
for k in $(jq -c '.[]' a.txt); do
curl -d "$k" <url>
done
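One caveat with for k in $(...): the shell splits the command output on any whitespace, so an element containing a space would be broken into several words. A line-oriented loop avoids that. In the sketch below, the here-document stands in for the output of jq -c '.[]' a.txt (one element per line), and send_element is a hypothetical placeholder that prints instead of calling curl.

```shell
#!/bin/bash
# Hypothetical helper: in real use the body would be: curl -d "$1" <url>
send_element() {
    printf 'would send: %s\n' "$1"
}

count=0
# The here-document simulates the one-element-per-line output of
# jq -c '.[]' a.txt; in a real script use: done < <(jq -c '.[]' a.txt)
while IFS= read -r element; do
    send_element "$element"
    count=$((count + 1))
done <<'EOF'
{"foo":"bar baz","etc":"etc"}
{"foo2":"it's quoted","etc":"etc"}
EOF

echo "elements: $count"
```

Because each element is passed as a quoted variable rather than pasted into generated shell code, spaces and single quotes in the data need no escaping.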

How to parse JSON in shell script?

I run the curl command $(curl -i -o - --silent -X GET --cert "${CERT}" --key "${KEY}" "$some_url") and save the response in the variable response. ${response} is as shown below
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Length: 34
Connection: keep-alive
Keep-Alive: timeout=5
X-XSS-Protection: 1;
{"status":"running","details":"0"}
I want to parse the JSON {"status":"running","details":"0"} and assign 'running' and 'details' to two different variables where I can print status and details both. Also if the status is equal to error, the script should exit. I am doing the following to achieve the task -
status1=$(echo "${response}" | awk '/^{.*}$/' | jq -r '.status')
details1=$(echo "${response}" | awk '/^{.*}$/' | jq -r '.details')
echo "Status: ${status1}"
echo "Details: ${details1}"
if [[ $status1 == 'error' ]]; then
exit 1
fi
Instead of parsing the JSON twice, I want to do it only once. Hence I want to combine the following lines but still assign the status and details to two separate variables -
status1=$(echo "${response}" | awk '/^{.*}$/' | jq -r '.status')
details1=$(echo "${response}" | awk '/^{.*}$/' | jq -r '.details')

First, stop using the -i argument to curl. That takes away the need for awk (or any other pruning of the header after-the-fact).
Second:
{
IFS= read -r -d '' status1
IFS= read -r -d '' details1
} < <(jq -r '.status + "\u0000" + .details + "\u0000"' <<<"$response")
The advantage of using a NUL as a delimiter is that it's the sole character that can't be present in the value of a C-style string (which is how shell variables' values are stored).
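To see why the NUL delimiter matters, here is a small self-contained sketch in which printf stands in for the jq command (an assumption made so the example runs without a server or jq). The second value deliberately contains a newline, which a whitespace-delimited read would not survive.

```shell
#!/bin/bash
# printf stands in for the jq command and emits two NUL-terminated fields.
{
    IFS= read -r -d '' status1
    IFS= read -r -d '' details1
} < <(printf '%s\0%s\0' 'running' 'line one
line two')

echo "Status: $status1"
echo "Details: $details1"
```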

You can use a construction like:
read status1 details1 < <(jq -r '.status + " " + .details' <<< "${response}")
You use read to assign the different inputs to two variables (or an array, if you want), and use jq to print the data you need separated by whitespace.

As Benjamin already suggested, only retrieving the json is a better way to go. Poshi's solution is solid.
However, if you're looking for the most compact way to do this, there is no need to save the response in a variable if the only thing you're going to do with it is extract other variables from it once. Just pipe curl directly into jq:
curl "whatever" | jq -r '[.status, .details] | @tsv'
or
curl "whatever" | jq -r '[.status, .details] |join("\t")'
and you'll get your values fielded for you.
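If the two fields still need to end up in separate variables, the tab-separated line can be fed to read. This is only a sketch: printf stands in for the curl and jq pipeline, and status_ok is a hypothetical helper covering the exit-on-error requirement from the question.

```shell
#!/bin/bash
# printf stands in for: curl "whatever" | jq -r '[.status, .details] | @tsv'
IFS=$'\t' read -r status1 details1 < <(printf 'running\t0\n')

echo "Status: ${status1}"
echo "Details: ${details1}"

# Hypothetical helper: succeeds unless the status is the string "error".
status_ok() {
    [[ $1 != 'error' ]]
}

if status_ok "$status1"; then
    echo "continuing"
else
    # A real script would call exit 1 here.
    echo "status is error, stopping" >&2
fi
```

Note that tab counts as IFS whitespace, so an empty first field would shift the values; the NUL-delimited approach does not have that limitation.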

how to get a value from a json key value pair in linux shell scripting

Hi, I am writing a small shell script in which I use curl to call an API. What it returns is the status of a scan:
{"status":"14"}
I want to get this status and check whether it is less than 100. This is what I have done so far:
a=0
while [ $a -lt 100 ]
do
curlout=$(curl "http://localhost:9090/JSON/spider/view/status/?zapapiformat=JSON&scanId=0");
echo "$curlout";
a=`expr $a + 1`
done
What I want to do is assign that status to $a. How do I read this JSON to get the value in a shell script?

If you need to work with JSON, you should obtain jq:
$ echo '{"status": "14"}' | jq '.status|tonumber'
14
or, less rigorously:
$ echo '{"status": "14"}' | jq -r '.status'
14

If you're sure about the format of the curl output, then it's very simple.
echo "$curlout" | tr -cd '[:digit:]'
From the man page of tr:
-c, -C, --complement
use the complement of SET1
-d, --delete
delete characters in SET1, do not translate
[:digit:]
all digits
So this command removes all characters other than digits.
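Putting this back into the loop from the question could look roughly like the sketch below. fetch_status is a hypothetical stand-in for the curl call that simulates a scan advancing by 50 on each poll, so the example runs without a server.

```shell
#!/bin/bash
# Hypothetical stand-in for the curl call: each call reports more progress
# and stores the simulated response in curlout.
progress=0
fetch_status() {
    progress=$((progress + 50))
    curlout="{\"status\":\"$progress\"}"
}

a=0
polls=0
while [ "$a" -lt 100 ]; do
    fetch_status    # real script: curlout=$(curl -s "$url")
    # Keep only the digits of the response.
    a=$(printf '%s' "$curlout" | tr -cd '[:digit:]')
    # Fall back to 0 if the response contained no digits at all.
    a=${a:-0}
    polls=$((polls + 1))
    echo "status: $a"
    # A real script would likely sleep between polls.
done
```

This relies on the response containing no digits other than the status value, which is the same assumption the tr approach makes.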

Shell Script CURL JSON value to variable

I was wondering how to parse the CURL JSON output from the server into variables.
Currently, I have -
curl -X POST -H "Content: agent-type: application/x-www-form-urlencoded" https://www.toontownrewritten.com/api/login?format=json -d username="$USERNAME" -d password="$PASSWORD" | python -m json.tool
But it only outputs the JSON from the server, pretty-printed, like so:
{
"eta": "0",
"position": "0",
"queueToken": "6bee9e85-343f-41c7-a4d3-156f901da615",
"success": "delayed"
}
But how do I put, for example, the success value returned from the server into a variable $SUCCESS with the value delayed, and queueToken into a variable $queueToken with the value 6bee9e85-343f-41c7-a4d3-156f901da615?
So that when I use
echo "$SUCCESS"
it shows this as the output:
delayed
And when I use
echo "$queueToken"
the output is
6bee9e85-343f-41c7-a4d3-156f901da615
Thanks!

Find and install jq (https://stedolan.github.io/jq/). jq is a JSON parser. JSON is not reliably parsed by line-oriented tools like sed because, like XML, JSON is not a line-oriented data format.
In terms of your question:
source <(
curl -X POST -H "$content_type" "$url" -d username="$USERNAME" -d password="$PASSWORD" |
jq -r '. as $h | keys | map(. + "=\"" + $h[.] + "\"") | .[]'
)
The jq syntax is a bit weird, I'm still working on it. It's basically a series of filters, each pipe taking the previous input and transforming it. In this case, the end result is some lines that look like variable="value"
This answer uses bash's "process substitution" to take the results of the jq command, treat it like a file, and source it into the current shell. The variables will then be available to use.
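One thing to keep in mind with source: whatever the server returns is executed as shell code, so a value containing $(...) inside the generated double quotes would be run. A more defensive sketch is shown below; it reads key=value lines and assigns them with printf -v instead of sourcing them. The here-document is a stand-in for the output of something like jq -r 'to_entries[] | "\(.key)=\(.value)"', and the note line is an invented hostile value added for illustration.

```shell
#!/bin/bash
# Read key=value lines and assign each one without executing anything.
while IFS='=' read -r key value; do
    # Accept only names that are valid shell identifiers.
    [[ $key =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
    printf -v "$key" '%s' "$value"
done <<'EOF'
eta=0
position=0
queueToken=6bee9e85-343f-41c7-a4d3-156f901da615
success=delayed
note=$(echo not executed)
EOF

echo "$success"
echo "$queueToken"
echo "$note"
```

A real script would probably also restrict the accepted keys to an expected list, so that a response cannot overwrite variables such as PATH or IFS.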

Here's an example, "Extract a JSON value from a BASH script":
#!/bin/bash
function jsonval {
temp=`echo $json | sed 's/\\\\\//\//g' | sed 's/[{}]//g' | awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}' | sed 's/\"\:\"/\|/g' | sed 's/[\,]/ /g' | sed 's/\"//g' | grep -w $prop`
echo ${temp##*|}
}
json=`curl -s -X GET http://twitter.com/users/show/$1.json`
prop='profile_image_url'
picurl=`jsonval`
curl -s -X GET "$picurl" -o "$1.png"
A bash script which demonstrates parsing a JSON string to extract a
property value. The script contains a jsonval function which operates
on two variables, json and prop. When the script is passed the name of
a twitter user it attempts to download the user's profile picture.

You could use a Perl module on the command line.
First, make sure it is installed. On Debian-based systems, you could run:
sudo apt-get install libjson-xs-perl
But for other OS, you could install perl modules via CPAN (the Comprehensive Perl Archive Network):
cpan App::cpanminus
cpan JSON::XS
Note: You may have to run this with superuser privileges.
then:
curlopts=(-X POST -H
"Content: agent-type: application/x-www-form-urlencoded"
-d username="$USERNAME" -d password="$PASSWORD")
curlurl=https://www.toontownrewritten.com/api/login?format=json
. <(
perl -MJSON::XS -e '
$/=undef;my $a=JSON::XS::decode_json <> ;
printf "declare -A Json=\047(%s)\047\n", join " ",map {
"[".$_."]=\"".$a->{$_}."\""
} qw|queueToken success eta position|;
' < <(
curl "${curlopts[@]}" "$curlurl"
)
)
The qw|...| list lets you specify which keys you want exported. It could be replaced by keys %$a, but that may need debugging, as some characters are forbidden in associative array key names.
echo ${Json[queueToken]}
6bee9e85-343f-41c7-a4d3-156f901da615
echo ${Json[eta]}
0

converting bash output to JSON / Dictionary

I am trying to create a JSON compatible output in bash that can be read by nodejs & python:
{"link":XX,"signal":YY,"noise":ZZ}
here's the unfiltered result:
iwconfig wlan0
wlan0 IEEE 802.11bg ESSID:"wifi@someplace" Nickname:"<WIFI@REALTEK>"
Mode:Managed Frequency:2.452 GHz Access Point: C8:4C:75:20:B4:8E
Bit Rate:54 Mb/s Sensitivity:0/0
Retry:off RTS thr:off Fragment thr:off
Encryption key:A022-1191-3A Security mode:open
Power Management:off
Link Quality=100/100 Signal level=67/100 Noise level=0/100
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:0 Invalid misc:0 Missed beacon:0
But after applying my filters:
iwconfig wlan0 | grep Link | tr -d '/100' | tr '=' ' ' | xargs | awk '{printf "{\"link\":"$3",\"signal\":"$6",\"noise\":"$9"}"}'
I am getting erratic and incomplete results:
{"link":98,"signal":6,"noise":}
{"link":Signal,"signal":Noise,"noise":}
The "noise" value is never captured, and sometimes printf returns the wrong chunk.
Is there a more 'reliable' way of doing this ?

The problem with the code in your question is here:
tr -d '/100'
What that command does is simply delete every occurrence of the characters '/', '1', and '0'.
From the manual for tr,
-d, --delete
delete characters in SET1, do not translate
That's not what you want. What you want is to remove the entire string /100.
Use sed 's/\/100//g' instead:
... | grep Link | sed 's/\/100//g' | tr '=' ' ' | awk '{printf "{\"link\":"$3",\"signal\":"$6",\"noise\":"$9"}"}'
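The difference is easy to see on a sample line. The sketch below uses a hard-coded copy of the Link Quality line from the question instead of calling iwconfig, so it runs anywhere.

```shell
#!/bin/bash
# Sample line copied from the iwconfig output in the question.
line='Link Quality=100/100 Signal level=67/100 Noise level=0/100'

# tr -d treats its argument as a set of characters:
# every '/', '1' and '0' is deleted, wherever it appears.
with_tr=$(printf '%s\n' "$line" | tr -d '/100')

# sed removes only the literal substring "/100".
with_sed=$(printf '%s\n' "$line" | sed 's|/100||g')

echo "tr:  $with_tr"
echo "sed: $with_sed"
```

With tr, the link and noise values vanish entirely because they consist only of the digits 1 and 0, which shifts the awk field positions and explains the missing noise value.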

You could restructure the output using perl, by piping the output through the following command:
perl -n -E 'if ($_ =~ qr{Link Quality=(\d+)/100.*?Signal level=(\d+)/100.*?Noise level=(\d+)/100}) { print qq({"link":$1,"signal":$2,"noise":$3}); }'

Using awk it is quite simple:
awk -F '[ =/]+' '$2=="Link"{printf "{\"%s\":%s,\"%s\":%s,\"%s\":%s}\n",
$2, $4, $6, $8, $10, $12}'
{"Link":100,"Signal":67,"Noise":0}
