Bash: Check json response and write to a file if string exists

I curl an endpoint for a json response and write the response to a file.
So far I've got a script that:
1). does the curl if the file does not exist and
2). else sets a variable
#!/bin/bash
instance="server1"
curl=$(curl -sk https://my-app-api.com | python -m json.tool)
json_response_file="/tmp/file"
if [ ! -f ${json_response_file} ] ; then
${curl} > ${json_response_file}
instance_info=$(cat ${json_response_file})
else
instance_info=$(cat ${json_response_file})
fi
The problem is that the file may exist but contain a bad response or be empty.
Possibly using bash until, I'd like to
(1). check (using JQ) that a field in the curl response contains $instance and only then write the file.
(2). retry the curl XX number of times until the response contains $instance
(3). write the file once the response contains $instance
(4). set the variable instance_info=$(cat ${json_response_file}) when the above is done correctly.
I started like this... then got stuck...
until [[ $(/usr/bin/jq --raw-output '.server' <<< ${curl}) = $instance ]]
do

One sane implementation might look something like this:
retries=10
instance=server1
response_file=filename
# define a function, since you want to run this code multiple times
# the old version only ran curl once and reused that result
fetch() { curl -sk https://my-app-api.com; }
instance_info=
for (( retries_left=retries; retries_left > 0; retries_left-- )); do
content=$(fetch)
server=$(jq --raw-output '.server' <<<"$content")
if [[ $server = "$instance" ]]; then
# Writing isn't atomic, but renaming is; doing it this way makes sure that no
# incomplete response will ever exist in response_file. If working in a directory
# like /tmp where other users may have write access, use $(mktemp) to create a
# tempfile with a random name to avoid a security risk.
printf '%s\n' "$content" >"$response_file.tmp" \
&& mv "$response_file.tmp" "$response_file"
instance_info=$content
break
fi
done
[[ $instance_info ]] || { echo "ERROR: Giving up after $retries retries" >&2; }
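The mktemp suggestion from the comment above could look something like the sketch below. atomic_write is a hypothetical helper name (it is not part of the answer above); it assumes the target's directory is writable and creates the temporary file next to the target, so that the final mv stays a rename within one filesystem.

```bash
#!/usr/bin/env bash
# atomic_write TARGET CONTENT
# Hypothetical helper: write CONTENT to a randomly named temp file beside
# TARGET, then rename it into place so readers never see a partial file.
atomic_write() {
  local target=$1 content=$2 tmp
  # mktemp replaces the trailing XXXXXX with random characters.
  tmp=$(mktemp "${target}.XXXXXX") || return 1
  if printf '%s\n' "$content" > "$tmp" && mv -- "$tmp" "$target"; then
    return 0
  fi
  # Clean up the temp file if the write or the rename failed.
  rm -f -- "$tmp"
  return 1
}
```

In the loop above, the printf/mv pair would then become atomic_write "$response_file" "$content". Note that mktemp creates the file readable only by its owner, so the renamed file keeps those permissions unless you chmod it.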

Related

JQ Group Multiple Files

I have a set of JSON files that all contain JSON in the following format:
File 1:
{ "host" : "127.0.0.1", "port" : "80", "data": {}}
File 2:
{ "host" : "127.0.0.2", "port" : "502", "data": {}}
File 3:
{ "host" : "127.0.0.1", "port" : "443", "data": {}}
These files can be rather large, up to several gigabytes.
I want to use JQ or some other bash json processing tool that can merge these json files into one file with a grouped format like so:
[{ "host" : "127.0.0.1", "data": {"80": {}, "443" : {}}},
{ "host" : "127.0.0.2", "data": {"502": {}}}]
Is this possible with jq and if yes, how could I possibly do this? I have looked at the group_by function in jq, but it seems like I need to combine all files first and then group on this big file. However, since the files can be very large, it might make sense to stream the data and group them on the fly.
With really big files, I'd look into a primarily disk based approach instead of trying to load everything into memory. The following script leverages sqlite's JSON1 extension to load the JSON files into a database and generate the grouped results:
#!/usr/bin/env bash
DB=json.db
# Delete existing database if any.
rm -f "$DB"
# Create table. Assuming each host,port pair is unique.
sqlite3 -batch "$DB" <<'EOF'
CREATE TABLE data(host TEXT, port INTEGER, data TEXT,
PRIMARY KEY (host, port)) WITHOUT ROWID;
EOF
# Insert the objects from the files into the database.
for file in file*.json; do
sqlite3 -batch "$DB" <<EOF
INSERT INTO data(host, port, data)
SELECT json_extract(j, '\$.host'), json_extract(j, '\$.port'), json_extract(j, '\$.data')
FROM (SELECT json(readfile('$file')) AS j) as json;
EOF
done
# And display the results of joining the objects. Could use
# json_group_array() instead of this sed hackery, but we're trying to
# avoid building a giant string with the entire results. It might still
# run into sqlite maximum string length limits...
sqlite3 -batch -noheader -list "$DB" <<'EOF' | sed '1s/^/[/; $s/,$/]/'
SELECT json_object('host', host,
'data', json_group_object(port, json(data))) || ','
FROM data
GROUP BY host
ORDER BY host;
EOF
Running this on your sample data prints out:
[{"host":"127.0.0.1","data":{"80":{},"443":{}}},
{"host":"127.0.0.2","data":{"502":{}}}]
If the goal is really to produce a single ginormous JSON entity, then presumably that entity is still small enough to have a chance of fitting into the memory of some computer, say C. So there is a good chance of jq being up to the job on C. At any rate, to utilize memory efficiently, you would:
use inputs while performing the grouping operation;
avoid the built-in group_by (since it requires an in-memory sort).
Here then is a two-step candidate using jq, which assumes grouping.jq contains the following program:
# emit a stream of arrays assuming that f is always string-valued
def GROUPS_BY(stream; f):
reduce stream as $x ({}; ($x|f) as $s | .[$s] += [$x]) | .[];
GROUPS_BY(inputs | .data=.port | del(.port); .host)
| {host: .[0].host, data: map({(.data): {}}) | add}
If the JSON files can be captured by *.json, you could then consider:
jq -n -f grouping.jq *.json | jq -s .
One advantage of this approach is that if it fails, you could try using a temporary file to hold the output of the first step, and then processing it later, either by "slurping" it, or perhaps more sensibly distributing it amongst several files, one per .host.
Removing extraneous data
Obviously, if the input files contain extraneous data, you might want to remove it first, e.g. by running
for f in *.json ; do
jq '{host,port}' "$f" | sponge "$f"
done
or by performing the projection in grouping.jq, e.g. using:
GROUPS_BY(inputs | {host, data: .port}; .host)
| {host: .[0].host, data: map( {(.data):{}} ) | add}
Here's a script which uses jq to solve the problem without requiring more memory than is needed for the largest group. For simplicity:
it reads *.json and directs output to $OUT as defined at the top of the script.
it uses sponge
#!/usr/bin/env bash
# Requires: sponge
OUT=big.json
/bin/rm -i "$OUT"
if [ -s "$OUT" ] ; then
echo $OUT already exists
exit 1
fi
### Step 0: setup
TDIR=$(mktemp -d /tmp/grouping.XXXX)
function cleanup {
if [ -d "$TDIR" ] ; then
/bin/rm -r "$TDIR"
fi
}
trap cleanup EXIT
### Step 1: find the groups
for f in *.json ; do
host=$(jq -r '.host' "$f")
echo "$f" >> "$TDIR/$host"
done
for f in $TDIR/* ; do
echo $f ...
# Merge each {port: {}} object into .data so the result is one object keyed by port.
jq -n 'reduce (inputs | {host, data: {(.port): {} }}) as $in (null;
.host=$in.host | .data += $in.data)' $(cat $f) | sponge "$f"
done
### Step 2: assembly
i=0
echo "[" > $OUT
find $TDIR -type f | while read f ; do
i=$((i + 1))
if [ $i -gt 1 ] ; then echo , >> $OUT ; fi
cat "$f" >> $OUT
done
echo "]" >> $OUT
Discussion
Besides requiring enough memory to handle the largest group, the main deficiencies of the above implementation are:
it assumes that the .host string is suitable as a file name.
the resultant file is not strictly speaking pretty-printed.
These two issues could however be addressed quite easily with minor modifications to the script without requiring additional memory.
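For the first issue, one possible modification is to encode the host before using it as a file name. The sketch below is only an illustration: host_to_filename is a hypothetical helper, and hex encoding is just one choice among several (it doubles the length of the name, so very long host strings could still hit file name length limits).

```bash
#!/usr/bin/env bash
# host_to_filename STRING
# Hypothetical helper: hex-encode STRING so the result contains only
# the characters 0-9 and a-f and is therefore safe as a file name.
host_to_filename() {
  # -An: no offsets, -v: never collapse repeated lines, -tx1: one hex byte each.
  printf '%s' "$1" | od -An -v -tx1 | tr -d ' \t\n'
}

host_to_filename '127.0.0.1'; echo   # prints 3132372e302e302e31
```

In Step 1 of the script, the append would then read echo "$f" >> "$TDIR/$(host_to_filename "$host")".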

How to get an element of a JSON object in bash?

I'm performing a curl command using a bash file and the return is a json object. How to get an element of this json object in this bash file?
Put request
https://sms.world-text.com/v2.0/sms/send?id=11111&key=Testkey&srcaddr=DA_Health&dstaddr=000000000000&method=PUT&txt=Message_Text_Text
Response
{"status":"1","error":"1000","desc":"Authorisation Failure"}
to="000000000000"
message="Test_message"
url="https://sms.world-text.com/v2.0/sms/send?id=11111&key=TestKey&srcaddr=SMSMsg&dstaddr=${to}&method=PUT&txt=${message}"
return=$(curl -sm 5 $url --data-urlencode "${message}" -A 'Test')
Finally, the "return" variable has the value below:
{"status":"1","error":"1000","desc":"Authorisation Failure"}
I expect to perform that validation
if [[ "$status" != 0]]; then
&2 echo "$return"
fi
But how can I get the element "status" and its value "1" from $return in the bash file?
It's a mistake to process JSON with tools that are not JSON-aware (like awk, sed, etc.). JSON must be processed with JSON-aware tools.
E.g., if your curl response was a multi-line JSON (which is quite often the case), then most likely the sed based solution would not work right for you.
One of the unix utilities to work with JSON is jtc, with that one your solution would look like this:
status=$(<<<$return jtc -w[status] -qq)
and then you can apply your check:
if [[ "$status" != 0 ]]; then
>&2 echo "$return"
fi
PS. Disclosure: I'm the creator of jtc, a shell CLI tool for JSON operations.
You could use sed to filter out the status code from $return.
status=$(echo "$return" | sed -E 's/\{"status"\s?:\s?"([0-9]+)".*/\1/')
and then can do test:
if [[ "$status" != 0 ]]; then
>&2 echo "$return"
fi
You can use jq. Here is a comprehensive guide about the tool -> https://stedolan.github.io/jq/tutorial/.
e.g.
echo '{ "foo": 123, "bar": 456 }' | jq '.foo'
This would print 123.
It also works with nested objects.
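Applied to the response from the question, a minimal sketch could look like this. It assumes jq is installed; json_field is a hypothetical helper name, and return_json stands in for the $return variable from the question.

```bash
#!/usr/bin/env bash
# json_field JSON KEY
# Hypothetical helper: print the raw (unquoted) value of top-level KEY,
# or nothing at all if the key is missing or null.
json_field() {
  jq -r --arg k "$2" '.[$k] // empty' <<< "$1"
}

# Stand-in for the curl result shown in the question.
return_json='{"status":"1","error":"1000","desc":"Authorisation Failure"}'

status=$(json_field "$return_json" status)
if [[ $status != 0 ]]; then
  # Report the whole response on standard error.
  printf '%s\n' "$return_json" >&2
fi
```

The -r option makes jq print the string without the surrounding JSON quotes, so the comparison against 0 works on the bare value.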

Newbie: unix bash, nested if statement, results from a loop results from sql

Newbie here, please pardon any confusing wording that I use.
A common task I have is to take a list of names and do a MySQL query to look the names up in a table and see if they are "live" on our site.
Doing this one at a time, my SQL query works fine. I then wanted to do the query using a loop from a file listing multiple names. This works fine, too.
I added this query loop to my bash profile so that I can quickly do the task by typing this:
$ validOnSite fileName
This works fine, and I even added a usage statement for my process to remind myself of the syntax. Below is what I have that works fine:
validOnSite() {
if [[ "$1" == "" ]] || [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]; then
echo "Usage:"
echo " $ validOnSite [filename]"
echo " Where validOnSite uses specified file as variables in sql query:"
echo " SELECT name, active FROM dbDb WHERE name=lines in file"
else
cat $1 | while read line ; do hgsql -h genome-centdb hgcentral -Ne "select name, active from dbDb where name='$line'" ; done
fi
}
Using a file "list.txt" which contains:
nameA
nameB
I would then type:
validOnSite list.txt
and both entries in list.txt meet my query criteria and are found in sql. My results will be:
nameA 1
nameB 1
Note the "1" after each result. I assume this is some sort of "yes" status.
Now, I add a third name to my list.txt, one that I know is not a match in sql. Now list.txt contains:
nameA
nameB
foo
When I again run this command for my list with 3 rows:
validOnSite list.txt
My results are the same as when I used the first version of list.txt, and I cannot see which lines failed; I still only see which lines were a success:
nameA 1
nameB 1
I have been trying all kinds of things to add a nested if statement, something that says, "If $line is a match, echo "pass", else echo "fail."
I do not want to see a "1" in my results. Using list.txt with 2 matches and 1 non-match, I would like my results to be:
nameA pass
nameB pass
foo fail
Or even better, color code a pass with green and a fail with red.
As I said, newbie here... :)
Any pointers in the right direction would help. Here is my latest sad attempt, but I realize I may be going in a wrong direction entirely:
validOnSite() {
if [[ "$1" == "" ]] || [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]; then
echo "Usage:"
echo " $ validOnSite [filename]"
echo " Where validOnSite uses specified file as variables in sql query:"
echo " SELECT name, active FROM dbDb WHERE name=lines in file"
else
cat $1 | while read line ; do hgsql -h genome-centdb hgcentral -Ne "select name, active from dbDb where name='$line'" > /dev/null ; done
if ( "status") then
echo $line "failed"
echo $line "failed" >> outfile
else
echo $line "ok"
echo $line "ok" >>outfile
clear
cat outfile
fi
fi
If something looks crazy in my last attempt, it's because it is - I am just googling around and trying as many things as I can while trying to learn. Any help appreciated, I feel stuck after working on this for a long time, but I am excited to move forward and find a solution! I think there is something I'm missing about understanding stdout, and also confusion about nested if's.
Note: I do not need an outfile, but it's ok if one is needed to accomplish the goal. stdout result alone would suffice, and is preferred.
Note: hgsql is just the name of our MySQL server. The MySQL part works fine, I am looking for a better way to deal with my bash output, and I think there is something about stderr that I'm missing. I'm looking for a fairly simple answer as I'm a newbie!
I guess that by hgsql you mean some Mercurial extension that lets you run MySQL queries. I don't know how hgsql works, but I know that MySQL returns only the matching rows. In terms of shell scripting, though, the result is a string that may contain extra information even if the number of matched rows is zero. For example, some MySQL clients may print a header or a string like "No rows found", although that is unlikely.
I'll show how it is done with the official mysql client. I'm sure you will manage to adapt hgsql with the help of its documentation to the following example.
if [ -t 1 ]; then
red_color=$(tput setaf 1)
green_color=$(tput setaf 2)
reset_color=$(tput sgr0)
else
red_color=
green_color=
reset_color=
fi
colorize_flag() {
local color
if [ "$1" = 'fail' ]; then
color="$red_color"
else
color="$green_color"
fi
printf '%s' "${color}${1}${reset_color}"
}
sql_fmt='SELECT IF(active, "pass", "fail") AS flag FROM dbDb WHERE name = "%s"'
while IFS= read -r line; do
sql=$(printf "$sql_fmt" "$line")
flag=$(mysql --skip-column-names dbname -e "$sql")
[ -z "$flag" ] && flag='fail'
printf '%-20s%s\n' "$line" "$(colorize_flag "$flag")"
done < file
The first block detects whether the script is running interactively by checking whether file descriptor 1 (standard output) is open on a terminal (see help test). If it is, standard output is connected directly to the user's terminal rather than to a pipe or a file, and the script uses the tput command to set the color variables to the terminal's color codes; otherwise the variables stay empty.
colorize_flag function accepts a string ($1) and outputs the string with the color codes applied according to its value.
The last block reads file line by line. For each line, it builds an SQL query string (sql) and invokes the mysql command with the column names stripped from the output. The output of the mysql command is assigned to flag by means of command substitution. If "$flag" is empty, it is set to 'fail'. The $line and the colorized flag are then printed to standard output.
You can test the non-interactive mode by chaining the output via pipe, e.g.:
./script | tee -a
I must warn you that it is generally a bad idea to pass shell variables into SQL queries unless the values are properly escaped. And the popular shells do not provide any tools to escape MySQL strings. So consider running the queries in Perl, PHP, or any programming language that is capable of building and running the queries safely.
Also note that in terms of performance it is better to run a single query and then parse the result set in a loop instead of running multiple queries in a loop, with the exception of prepared statements.
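As a rough sketch of the single-query idea: run one query that dumps name and active for all rows into a tab-separated file (the mysql client prints tab-separated rows in batch mode), then do the lookups in the shell. report_status is a hypothetical helper name, the two-column result format is an assumption, and associative arrays need bash 4 or later.

```bash
#!/usr/bin/env bash
# report_status NAMES_FILE RESULT_FILE
# Hypothetical helper. RESULT_FILE is assumed to hold "name<TAB>active"
# rows from a single query; NAMES_FILE holds one name per line.
report_status() {
  local names_file=$1 result_file=$2 name active
  declare -A active_by_name=()   # local to the function (bash 4+)

  # Load the whole result set once.
  while IFS=$'\t' read -r name active; do
    [[ -n $name ]] && active_by_name[$name]=$active
  done < "$result_file"

  # Look each requested name up; names missing from the result count as fail.
  while IFS= read -r name; do
    [[ -n $name ]] || continue
    if [[ ${active_by_name[$name]:-0} == 1 ]]; then
      printf '%-20s%s\n' "$name" pass
    else
      printf '%-20s%s\n' "$name" fail
    fi
  done < "$names_file"
}

# Demo with inline data instead of a real database.
report_status <(printf 'nameA\nfoo\n') <(printf 'nameA\t1\n')
```

Because no shell variable is interpolated into the SQL text here, this layout also sidesteps the escaping problem mentioned above.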
I found a way to get to my solution by piecing together the few basic things that I know. Not elegant, but it works well enough for now. I created a file "[filename]Results" with the output:
nameA 1
nameB 1
I then cut out the "1"s and made a new file. I then compared "[filename]Results" with list.txt in order to see which lines exist in list.txt but do not exist in the results.
Note: I have the following in my .zshrc file.
validOnSite() {
if [[ "$1" == "" ]] || [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]; then
echo "Usage:"
echo " $ validOnSite [filename]"
echo " Where validOnSite uses specified file as variables in sql query:"
echo " SELECT name, active FROM dbDb WHERE name=lines in file"
else
cat $1 | while read line ; do hgsql -h genome-centdb hgcentral -Ne "select name from dbDb where name='$line' and active='1'" >> $1"Pass"; done
autoload -U colors
colors
echo $fg_bold[magenta]Assemblies active on site${reset_color}
echo
cat $1"Pass"
echo
echo $fg_bold[red]Not active or not found on site${reset_color}
comm -23 $1 $1"Pass" 2> /dev/null
echo
echo
mv $1"Pass" ~cath/myFiles/validOnSiteResults
echo "Results file containing only active assemblies resides in ~cath/myFiles/validOnSiteResults"
fi
}
list.txt:
nameA
nameB
foo
My input:
validOnSite list.txt
My output:
Assemblies active on site (<--this font is magenta)
nameA
nameB
Not active or not found on site (<--this font is red)
foo
Results file containing only active assemblies resides in ~me/myFiles/validOnSiteResults

Periodically reading output from async background scripts

Context: I'm making my own i3-Bar script to read output from other (asynchronous) scripts running in background, concatenate them and then echo them to i3-Bar itself.
The way I'm passing outputs is in plain files, and I guess (logically) the problem is that the files are sometimes read and written at the same time. The best way to reproduce this behavior is by suspending the computer and then waking it back up - I don't know the exact cause of this, I can only go on what I see from my debug log files.
Main Code: Added comments for clarity
#!/usr/bin/env bash
cd "${0%/*}";
trap "kill -- -$$" EXIT; #The bg. scripts are on a while [ 1 ] loop, have to kill them.
rm -r ../input/*;
mkdir ../input/; #Just in case.
for tFile in ./*; do
#Run all of the available scripts in the current directory in the background.
if [ $(basename $tFile) != "main.sh" ]; then ("$tFile" &); fi;
done;
echo -e '{ "version": 1 }\n['; #I3-Bar can use infinite array of JSON input.
while [ 1 ]; do
input=../input/*; #All of the scripts put their output in this folder as separate text files
input=$(sort -nr <(printf "%s\n" $input));
output="";
for tFile in $input; do
#Read and add all of the files to one output string.
if [ $tFile == "../input/*" ]; then break; fi;
output+="$(cat $tFile),";
done;
if [ "$output" == "" ]; then
echo -e "[{\"full_text\":\"ERR: No input files found\",\"color\":\"#ff0000\"}],\n";
else
echo -e "[${output::-1}],\n";
fi;
sleep 0.2s;
done;
Example Input Script:
#!/usr/bin/env bash
cd "${0%/*}";
while [ 1 ]; do
echo -e "{" \
"\"name\":\"clock\"," \
"\"separator_block_width\":12," \
"\"full_text\":\"$(date +"%H:%M:%S")\"}" > ../input/0_clock;
sleep 1;
done;
The Problem
The problem isn't the script itself, but the fact that i3-Bar receives malformed JSON input (-> parse error) and terminates - I'll show such a log later.
Another problem is that the background scripts should run asynchronously, because some need to update every second and some only every minute, etc. So using a FIFO isn't really an option, unless I create some ugly, inefficient, hacky stuff.
I know there is a need for IPC here, but I have no idea how to do this efficiently.
Script output from randomly crashing - waking up error looks the same
[{ "separator_block_width":12, "color":"#BAF2F8", "full_text":"192.168.1.104 "},{ "separator_block_width":12, "color":"#BAF2F8", "full_text":"100%"}],
[{ "separator_block_width":12, "color":"#BAF2F8", "full_text":"192.168.1.104 "},,],
(Error is created by the second line)
As you see, the main script tries to read the file, doesn't get any output, but the comma is still there -> malformed JSON.
The immediate error is easy to fix: don't append an entry to output if the corresponding file is empty:
for tFile in $input; do
[[ $tFile != "../input/*" ]] &&
[[ -s $tFile ]] &&
output+="$(<$tFile),"
done
There is a potential race condition here, though. Just because a particular input file exists doesn't mean that the data is fully written to it yet. I would change your input scripts to look something like
#!/usr/bin/env bash
cd "${0%/*}";
while true; do
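# Create the temp file in the target directory (dot-prefixed so the reader's glob
# skips it); mv is only an atomic rename within a single filesystem.
o=$(mktemp ../input/.0_clock.XXXXXX)
# full_text must be a quoted JSON string; -1 means "now" (%(...)T needs bash 4.2+).
printf '{"name": "clock", "separator_block_width": 12, "full_text": "%(%H:%M:%S)T"}\n' -1 > "$o"
mv "$o" ../input/0_clock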
sleep 1
done
Also, ${output%,} is a safer way to trim a trailing comma when necessary.
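To illustrate that last point with a small sketch (join_blocks is a hypothetical helper name, not something from the scripts above): ${output%,} removes one trailing comma if there is one and otherwise leaves the string alone, so it also behaves when output is empty, whereas ${output::-1} always drops the last character and, as far as I know, raises an error on an empty string.

```bash
#!/usr/bin/env bash
# join_blocks BLOCK...
# Hypothetical helper: join the non-empty arguments with commas and wrap
# the result in [ ] without leaving a dangling comma.
join_blocks() {
  local output="" block
  for block in "$@"; do
    # Skip empty blocks, e.g. from a file that was read mid-write.
    if [[ -n $block ]]; then
      output+="$block,"
    fi
  done
  # %, strips a single trailing comma only if one is present.
  printf '[%s]\n' "${output%,}"
}

join_blocks '{"a":1}' '' '{"b":2}'   # prints [{"a":1},{"b":2}]
join_blocks                          # prints []
```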

How to parse json response in the shell script?

I am working with a bash shell script. I need to request a URL from the shell script and then parse the JSON data it returns.
This is my URL - http://localhost:8080/test_beat - and the response I get after hitting the URL will be one of these two:
{"error": "error_message"}
{"success": "success_message"}
Below is my shell script which executes the URL using wget.
#!/bin/bash
DATA=$(wget -O - -q -t 1 http://localhost:8080/test_beat)
#grep $DATA for error and success key
Now I am not sure how to parse json response in $DATA and see whether the key is success or error. If the key is success, then I will print a message "success" and print $DATA value and exit out of the shell script with zero status code but if the key is error, then I will print "error" and print $DATA value and exit out of the shell script with non zero status code.
How can I parse json response and extract the key from it in shell script?
I don't want to install any library to do this since my JSON response is fixed and it will always be same as shown above so any simpler way is fine.
Update:-
Below is my final shell script -
#!/bin/bash
DATA=$(wget -O - -q -t 1 http://localhost:8080/tester)
echo $DATA
#grep $DATA for error and success key
IFS=\" read __ KEY __ MESSAGE __ <<< "$DATA"
case "$KEY" in
success)
exit 0
;;
error)
exit 1
;;
esac
Does this look right?
If you are going to be using any more complicated json from the shell and you can install additional software, jq is going to be your friend.
So, for example, if you want to just extract the error message if present, then you can do this:
$ echo '{"error": "Some Error"}' | jq ".error"
"Some Error"
If you try this on the success case, it will do:
$ echo '{"success": "Yay"}' | jq ".error"
null
The main advantage of the tool is simply that it fully understands json. So, no need for concern over corner cases and whatnot.
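For the success/error check from the question, a sketch along these lines might do. It assumes jq is installed; response_ok is a hypothetical helper name. With -e, jq exits with status 0 only when the last value it printed is neither false nor null, so the exit status can drive an if directly.

```bash
#!/usr/bin/env bash
# response_ok JSON
# Hypothetical helper: succeed only if JSON is an object with a "success" key.
# Invalid JSON or a non-object also makes jq exit non-zero, so those fail too.
response_ok() {
  jq -e 'has("success")' <<< "$1" > /dev/null 2>&1
}

if response_ok '{"success": "Yay"}'; then
  echo "success"
else
  echo "error"
fi
```

In the real script, the literal would be replaced by "$DATA" and the two branches would exit 0 and exit 1 respectively.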
#!/bin/bash
IFS= read -d '' DATA < temp.txt ## Imitates your DATA=$(wget ...). Just replace it.
while IFS=\" read -ra LINE; do
case "${LINE[1]}" in
error)
# ERROR_MSG=${LINE[3]}
printf -v ERROR_MSG '%b' "${LINE[3]}"
;;
success)
# SUCCESS_MSG=${LINE[3]}
printf -v SUCCESS_MSG '%b' "${LINE[3]}"
;;
esac
done <<< "$DATA"
echo "$ERROR_MSG|$SUCCESS_MSG" ## Shows: error_message|success_message
* %b expands backslash escape sequences in the corresponding argument.
Update as I didn't really get the question at first. It should simply be:
IFS=\" read __ KEY __ MESSAGE __ <<< "$DATA"
[[ $KEY == success ]] ## Gives $? = 0 if true or else 1 if false.
And you can examine it further:
case "$KEY" in
success)
echo "Success message: $MESSAGE"
exit 0
;;
error)
echo "Error message: $MESSAGE"
exit 1
;;
esac
Of course similar obvious tests can be done with it:
if [[ $KEY == success ]]; then
echo "It was successful."
else
echo "It wasn't."
fi
From your last comment it can be simply done as
IFS=\" read __ KEY __ MESSAGE __ <<< "$DATA"
echo "$DATA" ## You really need to show $DATA and not $MESSAGE, right?
[[ $KEY == success ]]
exit ## Exits with code based from current $?. Not necessary if you're on the last line of the script.
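To spell out why the read works: with IFS set to a double quote, {"error": "error_message"} splits into the fields {, error, a colon plus space, error_message and }, so the second and fourth variables receive the key and the message. Here is a sketch that wraps this in a function; check_response is a hypothetical name, and the approach only holds for the fixed one-line format from the question (an escaped quote inside the message would break it).

```bash
#!/usr/bin/env bash
# check_response JSON
# Hypothetical helper for the fixed {"key": "message"} format only.
# Returns 0 for success, 1 for error, 2 for anything unrecognized.
check_response() {
  local junk key message
  # Split on double quotes: field 2 is the key, field 4 is the message.
  IFS='"' read -r junk key junk message junk <<< "$1"
  case $key in
    success) printf 'success: %s\n' "$message"; return 0 ;;
    error)   printf 'error: %s\n' "$message" >&2; return 1 ;;
    *)       printf 'unrecognized response: %s\n' "$1" >&2; return 2 ;;
  esac
}

check_response '{"success": "success_message"}'   # prints success: success_message
```

Treating an unrecognized key as its own failure code also covers the case where wget returned nothing at all.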
You probably already have python installed, which has json parsing in the standard library. Python is not a great language for one-liners in shell scripts, but here is one way to use it:
#!/bin/bash
DATA=$(wget -O - -q -t 1 http://localhost:8080/test_beat)
if python -c '
import json, sys
exit(1 if "error" in json.loads(sys.stdin.read()) else 0)' <<<"$DATA"
then
echo "SUCCESS: $DATA"
else
echo "ERROR: $DATA"
exit 1
fi
Given:
that you don't want to use JSON libraries.
and that the response you're parsing is simple and the only thing you care about is the presence of substring "success", I suggest the following simplification:
#!/bin/bash
wget -O - -q -t 1 http://localhost:8080/tester | grep -F -q '"success"'
exit $?
-F tells grep to search for a fixed (literal) string.
-q tells grep to produce no output and instead only reflect via its exit code whether a match was found or not.
exit $? simply exits with grep's exit code ($? is a special variable that reflects the most recently executed command's exit code).
Note that if all you care about is whether wget's output contains "success", the above pipeline will do - no need to capture wget's output in an auxiliary variable.