Newbie: unix bash, nested if statement, looping over results from SQL - MySQL

Newbie here, please pardon any confusing wording that I use.
A common task I have is to take a list of names and do a MySQL query to look the names up in a table and see if they are "live" on our site.
Doing this one at a time, my SQL query works fine. I then wanted to do the query using a loop from a file listing multiple names. This works fine, too.
I added this query loop to my bash profile so that I can quickly do the task by typing this:
$ validOnSite fileName
This works fine, and I even added a usage statement for my process to remind myself of the syntax. Below is what I have that works fine:
validOnSite() {
if [[ "$1" == "" ]] || [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]; then
echo "Usage:"
echo " $ validOnSite [filename]"
echo " Where validOnSite uses specified file as variables in sql query:"
echo " SELECT name, active FROM dbDb WHERE name=lines in file"
else
cat $1 | while read line ; do hgsql -h genome-centdb hgcentral -Ne "select name, active from dbDb where name='$line'" ; done
fi
}
Using a file "list.txt" which contains:
nameA
nameB
I would then type:
validOnSite list.txt
and both entries in list.txt meet my query criteria and are found in sql. My results will be:
nameA 1
nameB 1
Note the "1" after each result. I assume this is some sort of "yes" status.
Now, I add a third name to my list.txt, one that I know is not a match in sql. Now list.txt contains:
nameA
nameB
foo
When I again run this command for my list with 3 rows:
validOnSite list.txt
My results are the same as with the first version of list.txt; I cannot see which lines failed, only which lines succeeded:
nameA 1
nameB 1
I have been trying all kinds of things to add a nested if statement, something that says: if $line is a match, echo "pass"; else echo "fail".
I do not want to see a "1" in my results. Using list.txt with 2 matches and 1 non-match, I would like my results to be:
nameA pass
nameB pass
foo fail
Or even better, color code a pass with green and a fail with red.
As I said, newbie here... :)
Any pointers in the right direction would help. Here is my latest sad attempt, but I realize I may be going in the wrong direction entirely:
validOnSite() {
if [[ "$1" == "" ]] || [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]; then
echo "Usage:"
echo " $ validOnSite [filename]"
echo " Where validOnSite uses specified file as variables in sql query:"
echo " SELECT name, active FROM dbDb WHERE name=lines in file"
else
cat $1 | while read line ; do hgsql -h genome-centdb hgcentral -Ne "select name, active from dbDb where name='$line'" > /dev/null ; done
if ( "status") then
echo $line "failed"
echo $line "failed" >> outfile
else
echo $line "ok"
echo $line "ok" >>outfile
clear
cat outfile
fi
fi
}
If something looks crazy in my last attempt, it's because it is: I am just googling around and trying as many things as I can while trying to learn. Any help appreciated; I feel stuck after working on this for a long time, but I am excited to move forward and find a solution! I think there is something I'm missing about understanding stdout, and I am also confused about nested ifs.
Note: I do not need an outfile, but it's ok if one is needed to accomplish the goal. stdout result alone would suffice, and is preferred.
Note: hgsql is just the name of our MySQL server. The MySQL part works fine; I am looking for a better way to deal with my bash output, and I think there is something about stderr that I'm missing. I'm looking for a fairly simple answer as I'm a newbie!

I guess that by hgsql you mean some Mercurial extension that allows you to perform MySQL queries. I don't know how hgsql works, but I know that MySQL returns only the matching rows. In terms of shell scripting, though, the result is a string that may contain extra information even if the number of matched rows is zero. For example, some MySQL client might return a header or a string like "No rows found", although it is unlikely.
I'll show how it is done with the official mysql client. I'm sure you will manage to adapt the following example to hgsql with the help of its documentation.
if [ -t 1 ]; then
red_color=$(tput setaf 1)
green_color=$(tput setaf 2)
reset_color=$(tput sgr0)
else
red_color=
green_color=
reset_color=
fi
colorize_flag() {
local color
if [ "$1" = 'fail' ]; then
color="$red_color"
else
color="$green_color"
fi
printf '%s' "${color}${1}${reset_color}"
}
sql_fmt='SELECT IF(active, "pass", "fail") AS flag FROM dbDb WHERE name = "%s"'
while IFS= read -r line; do
sql=$(printf "$sql_fmt" "$line")
flag=$(mysql --skip-column-names dbname -e "$sql")
[ -z "$flag" ] && flag='fail'
printf '%-20s%s\n' "$line" "$(colorize_flag "$flag")"
done < file
The first block detects whether the script is running interactively by checking if file descriptor 1 (standard output) is opened on a terminal (see help test). If it is, standard output is connected to the user's terminal directly rather than, for example, through a pipe, and the script assigns the terminal color codes to variables with the help of the tput command.
The colorize_flag function accepts a string ($1) and outputs it with the color codes applied according to its value.
The last block reads the file line by line. For each line it builds an SQL query string (sql) and invokes the mysql command with the column names stripped from the output. The output of the mysql command is assigned to flag by means of command substitution. If "$flag" is empty, it is set to 'fail'. The $line and the colorized flag are then printed to standard output.
You can test the non-interactive mode by piping the output elsewhere, e.g.:
./script | tee -a
I must warn you that it is generally a bad idea to pass shell variables into SQL queries unless the values are properly escaped, and the popular shells do not provide any tools to escape MySQL strings. So consider running the queries in Perl, PHP, or any programming language that is capable of building and running the queries safely.
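One crude mitigation in pure shell, not a substitute for real parameterization, is to whitelist the characters you accept before interpolating anything. A sketch, reusing the mysql invocation above and assuming valid names only ever contain letters, digits, dots, underscores and dashes:
while IFS= read -r line; do
    # refuse anything that could break out of the SQL string literal
    if [[ $line =~ ^[A-Za-z0-9._-]+$ ]]; then
        mysql --skip-column-names dbname -e \
            "SELECT name, active FROM dbDb WHERE name = '$line'"
    else
        printf '%s: skipped (unexpected characters)\n' "$line" >&2
    fi
done < file
The pattern excludes quotes and backslashes, so the interpolated value cannot terminate the SQL string.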
Also note that in terms of performance it is better to run a single query and then parse the result set in a loop instead of running multiple queries in a loop, with the exception of prepared statements.
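For example, a single query with an IN list; this is just a sketch building on the code above (the in_list variable and the awk one-liner are for illustration), and the names are still interpolated into the SQL, so they should be validated first:
# assumes the file is non-empty: an empty IN () list is a syntax error
in_list=$(awk '{ printf "%s\"%s\"", (NR > 1 ? "," : ""), $0 }' file)
mysql --skip-column-names dbname -e \
    "SELECT name, IF(active, 'pass', 'fail') FROM dbDb WHERE name IN ($in_list)" |
while IFS=$'\t' read -r name flag; do
    printf '%-20s%s\n' "$name" "$(colorize_flag "$flag")"
done
Note that names absent from the table simply produce no row here, so a missing name must still be detected separately, unlike in the per-line loop above.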

I found a way to get to my solution by piecing together the few basic things that I know. Not elegant, but it works well enough for now. I created a file "[filename]Results" with the output:
nameA 1
nameB 1
I then cut out the "1"s to get a file of just the passing names (the final version below does this in one step by selecting only the name column where active='1'). I then compared that file against list.txt with comm to see which lines exist in list.txt but do not exist in the results.
Note: I have the following in my .zshrc file.
validOnSite() {
if [[ "$1" == "" ]] || [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]; then
echo "Usage:"
echo " $ validOnSite [filename]"
echo " Where validOnSite uses specified file as variables in sql query:"
echo " SELECT name, active FROM dbDb WHERE name=lines in file"
else
cat $1 | while read line ; do hgsql -h genome-centdb hgcentral -Ne "select name from dbDb where name='$line' and active='1'" >> $1"Pass"; done
autoload -U colors
colors
echo $fg_bold[magenta]Assemblies active on site${reset_color}
echo
cat $1"Pass"
echo
echo $fg_bold[red]Not active or not found on site${reset_color}
# comm requires sorted input, so sort both files on the fly
comm -23 <(sort $1) <(sort $1"Pass") 2> /dev/null
echo
echo
mv $1"Pass" ~cath/myFiles/validOnSiteResults
echo "Results file containing only active assemblies resides in ~cath/myFiles/validOnSiteResults"
fi
}
list.txt:
nameA
nameB
foo
My input:
validOnSite list.txt
My output:
Assemblies active on site (<--this font is magenta)
nameA
nameB
Not active or not found on site (<--this font is red)
foo
Results file containing only active assemblies resides in ~cath/myFiles/validOnSiteResults

Related

Periodically reading output from async background scripts

Context: I'm making my own i3-Bar script to read output from other (asynchronous) scripts running in the background, concatenate it, and then echo it to i3-Bar itself.
I'm passing the outputs around in plain files, and I guess (logically) the problem is that the files are sometimes read and written at the same time. The best way to reproduce this behavior is by suspending the computer and then waking it back up; I don't know the exact cause, I can only go on what I see in my debug log files.
Main Code: Added comments for clarity
#!/usr/bin/env bash
cd "${0%/*}";
trap "kill -- -$$" EXIT; #The bg. scripts are on a while [ 1 ] loop, have to kill them.
rm -r ../input/*;
mkdir ../input/; #Just in case.
for tFile in ./*; do
#Run all of the available scripts in the current directory in the background.
if [ $(basename $tFile) != "main.sh" ]; then ("$tFile" &); fi;
done;
echo -e '{ "version": 1 }\n['; #I3-Bar can use infinite array of JSON input.
while [ 1 ]; do
input=../input/*; #All of the scripts put their output in this folder as separate text files
input=$(sort -nr <(printf "%s\n" $input));
output="";
for tFile in $input; do
#Read and add all of the files to one output string.
if [ $tFile == "../input/*" ]; then break; fi;
output+="$(cat $tFile),";
done;
if [ "$output" == "" ]; then
echo -e "[{\"full_text\":\"ERR: No input files found\",\"color\":\"#ff0000\"}],\n";
else
echo -e "[${output::-1}],\n";
fi;
sleep 0.2s;
done;
Example Input Script:
#!/usr/bin/env bash
cd "${0%/*}";
while [ 1 ]; do
echo -e "{" \
"\"name\":\"clock\"," \
"\"separator_block_width\":12," \
"\"full_text\":\"$(date +"%H:%M:%S")\"}" > ../input/0_clock;
sleep 1;
done;
The Problem
The problem isn't the script itself, but the fact that i3-Bar receives malformed JSON input (hence the parse error) and terminates. I'll show such a log below.
Another problem is that the background scripts should run asynchronously, because some need to update every 1 second and some only every 1 minute, etc. So the use of a FIFO isn't really an option, unless I create some ugly inefficient hacky stuff.
I know there is a need for IPC here, but I have no idea how to do this efficiently.
Script output from a random crash (the wake-up error looks the same):
[{ "separator_block_width":12, "color":"#BAF2F8", "full_text":"192.168.1.104 "},{ "separator_block_width":12, "color":"#BAF2F8", "full_text":"100%"}],
[{ "separator_block_width":12, "color":"#BAF2F8", "full_text":"192.168.1.104 "},,],
(Error is created by the second line)
As you see, the main script tries to read the file, doesn't get any output, but the comma is still there -> malformed JSON.
The immediate error is easy to fix: don't append an entry to output if the corresponding file is empty:
for tFile in $input; do
[[ $tFile != "../input/*" ]] &&
[[ -s $tFile ]] &&
output+="$(<$tFile),"
done
There is a potential race condition here, though. Just because a particular input file exists doesn't mean that the data is fully written to it yet. I would change your input scripts to look something like
#!/usr/bin/env bash
cd "${0%/*}";
while true; do
o=$(mktemp ../input/.0_clock.XXXXXX)  # same filesystem as the target, so the mv below is an atomic rename; the leading dot keeps it out of the ../input/* glob
printf '{"name": "clock", "separator_block_width": 12, "full_text": "%(%H:%M:%S)T"}\n' -1 > "$o"  # the time value must be quoted, or the JSON is invalid
mv "$o" ../input/0_clock
sleep 1
done
Also, ${output%,} is a safer way to trim a trailing comma when necessary.
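A quick illustration of the difference between the two trims:
s='a,b,c,'
echo "${s%,}"     # a,b,c  (removes one trailing comma, only if present)
echo "${s::-1}"   # a,b,c  (unconditionally drops the last character)
t='a,b,c'
echo "${t%,}"     # a,b,c  (unchanged: nothing to strip)
echo "${t::-1}"   # a,b,   (eats the final "c"; also errors out on an empty string)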

Bash: Check json response and write to a file if string exists

I curl an endpoint for a json response and write the response to a file.
So far I've got a script that:
1). does the curl if the file does not exist, and
2). otherwise sets a variable
#!/bin/bash
instance="server1"
curl=$(curl -sk https://my-app-api.com | python -m json.tool)
json_response_file="/tmp/file"
if [ ! -f ${json_response_file} ] ; then
${curl} > ${json_response_file}
instance_info=$(cat ${json_response_file})
else
instance_info=$(cat ${json_response_file})
fi
The problem is that the file may exist but contain a bad response, or be empty.
Possibly using bash until, I'd like to
(1). check (using JQ) that a field in the curl response contains $instance and only then write the file.
(2). retry the curl XX number of times until the response contains $instance
(3). write the file once the response contains $instance
(4). set the variable instance_info=$(cat ${json_response_file}) when the above is done correctly.
I started like this... then got stuck...
until [[ $(/usr/bin/jq --raw-output '.server' <<< ${curl}) = $instance ]]
do
One sane implementation might look something like this:
retries=10
instance=server1
response_file=filename
# define a function, since you want to run this code multiple times
# the old version only ran curl once and reused that result
fetch() { curl -sk https://my-app-api.com; }
instance_info=
for (( retries_left=retries; retries_left > 0; retries_left-- )); do
content=$(fetch)
server=$(jq --raw-output '.server' <<<"$content")
if [[ $server = "$instance" ]]; then
# Writing isn't atomic, but renaming is; doing it this way makes sure that no
# incomplete response will ever exist in response_file. If working in a directory
# like /tmp where other users may have write access, use $(mktemp) to create a tempfile with
# a random name to avoid security risk.
printf '%s\n' "$content" >"$response_file.tmp" \
&& mv "$response_file.tmp" "$response_file"
instance_info=$content
break
fi
done
[[ $instance_info ]] || { echo "ERROR: Giving up after $retries retries" >&2; }
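Once the loop exits, a non-empty $instance_info means the file was written and the variable can be used directly, for example:
[[ $instance_info ]] && jq --raw-output '.server' <<<"$instance_info"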

Bourne shell function return variable always empty

The following Bourne shell script, given a path, is supposed to test each component of the path for existence; then set a variable comprising only those components that actually exist.
#! /bin/sh
set -x # for debugging
test_path() {
path=""
echo $1 | tr ':' '\012' | while read component
do
if [ -d "$component" ]
then
if [ -z "$path" ]
then path="$component"
else path="$path:$component"
fi
fi
done
echo "$path" # this prints nothing
}
paths=/usr/share/man:\
/usr/X11R6/man:\
/usr/local/man
MANPATH=`test_path $paths`
echo $MANPATH
When run, it always prints nothing. The trace using set -x is:
+ paths=/usr/share/man:/usr/X11R6/man:/usr/local/man
++ test_path /usr/share/man:/usr/X11R6/man:/usr/local/man
++ path=
++ echo /usr/share/man:/usr/X11R6/man:/usr/local/man
++ tr : '\012'
++ read component
++ '[' -d /usr/share/man ']'
++ '[' -z '' ']'
++ path=/usr/share/man
++ read component
++ '[' -d /usr/X11R6/man ']'
++ read component
++ '[' -d /usr/local/man ']'
++ '[' -z /usr/share/man ']'
++ path=/usr/share/man:/usr/local/man
++ read component
++ echo ''
+ MANPATH=
+ echo
Why is the final echo $path empty? The $path variable within the while loop was incrementally set for each iteration just fine.
The pipe runs all commands involved in sub-shells, including the entire while ... loop. Therefore, all changes to variables in that loop are confined to the sub-shell and invisible to the parent shell script.
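A minimal demonstration of the effect (assuming bash's default pipeline behaviour):
x=before
echo after | while read -r x; do :; done
echo "$x"    # prints "before": the read ran in a sub-shell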
One way to work around that is putting the while ... loop and the echo into a list that executes entirely in the sub-shell, so that the modified variable $path is visible to echo:
test_path()
{
echo "$1" | tr ':' '\n' | {
while read component
do
if [ -d "$component" ]
then
if [ -z "$path" ]
then
path="$component"
else
path="$path:$component"
fi
fi
done
echo "$path"
}
}
However, I suggest using something like this:
test_path()
{
echo "$1" | tr ':' '\n' |
while read dir
do
[ -d "$dir" ] && printf "%s:" "$dir"
done |
sed 's/:$/\n/'
}
... but that's a matter of taste.
Edit: As others have said, the behaviour you are observing depends on the shell. The POSIX standard describes pipelined commands as run in sub-shells, but that is not a requirement:
Additionally, each command of a multi-command pipeline is in a subshell environment; as an extension, however, any or all commands in a pipeline may be executed in the current environment.
Bash runs them in sub-shells, but some shells run the last command in the context of the main script, so that only the preceding commands in the pipeline run in sub-shells.
This should work in a Bourne shell that understands functions (and would work in Bash and other shells too):
test_path() {
echo $1 | tr ':' '\012' |
{
path=""
while read component
do
if [ -d "$component" ]
then
if [ -z "$path" ]
then path="$component"
else path="$path:$component"
fi
fi
done
echo "$path" # this prints nothing
}
}
The inner set of braces groups the commands into a unit, so path is only set in the subshell but is echoed from the same subshell.
Why is the final echo $path empty?
Until recently, Bash would give all components of a pipeline their own process, separate from the shell process in which the pipeline is run.
Separate process == separate address space, and no variable sharing.
In ksh93 and in recent Bash (may need a shopt setting), the shell will run the last component of a pipeline in the calling shell, so any variables changed inside the loop are preserved when the loop exits.
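In bash that behaviour is opt-in via the lastpipe option (bash 4.2 or later; it only takes effect when job control is off, which is the default in non-interactive scripts). A sketch:
#!/bin/bash
shopt -s lastpipe          # run the last component of a pipeline in this shell
path=""
echo "$PATH" | tr ':' '\n' | while read -r component; do
    [ -d "$component" ] && path="${path:+$path:}$component"
done
echo "$path"               # assignments made in the loop are still visible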
Another way to accomplish what you want is to make sure that the echo $path is in the same process as the loop, using parentheses:
#! /bin/sh
set -x # for debugging
test_path() {
path=""
echo $1 | tr ':' '\012' | ( while read component
do
[ -d "$component" ] || continue
path="${path:+$path:}$component"
done
echo "$path"
)
}
Note: I simplified the inner if. There was no else, so the test can be replaced with a shortcut. Also, the two path assignments can be combined into one, using the ${var:+ ...} parameter substitution trick.
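For instance:
path=""
component=/usr/share/man
echo "${path:+$path:}$component"    # /usr/share/man (path is empty, so no colon is prefixed)
path=/usr/share/man
component=/usr/local/man
echo "${path:+$path:}$component"    # /usr/share/man:/usr/local/man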
Your script works just fine with no change under Solaris 11, and probably also under most commercial Unixes like AIX and HP-UX, because on these OSes the underlying implementation of /bin/sh is provided by ksh. This would also be the case if /bin/sh were backed by zsh.
It doesn't work for you likely because your /bin/sh is implemented by one of bash, dash, mksh or busybox sh which all process each component of a pipeline in a subshell while ksh and zsh both keep the last element of a pipeline in the current shell, saving an unnecessary fork.
It is possible to "fix" your script for it to work when sh is provided by bash by adding this line somewhere before the pipeline:
shopt -s lastpipe
or better, if you want to keep portability:
command -v shopt > /dev/null && shopt -s lastpipe
This will keep the script working in ksh and zsh, but it still won't work in dash, mksh, or the original Bourne shell.
Note that both bash and ksh behaviors are allowed by the POSIX standard.

Shell script: variable scope in functions

I wrote a quick shell script to emulate the situation of xkcd #981 (without hard links, just symlinks to parent dirs) and used a recursive function to create all the directories. Unfortunately this script does not provide the desired result, so I think my understanding of the scope of variable $count is wrong.
How can I properly make the function use recursion to create twenty levels of folders, each containing 3 folders (3^20 folders, ending in soft links back to the top)?
#!/bin/bash
echo "Generating folders:"
toplevel=$PWD
count=1
GEN_DIRS() {
for i in 1 2 3
do
dirname=$RANDOM
mkdir $dirname
cd $dirname
count=$(expr $count + 1)
if [ $count < 20 ] ; then
GEN_DIRS
else
ln -s $toplevel "./$dirname"
fi
done
}
GEN_DIRS
exit
Try this (amended version of the script) — it seems to work for me. I decline to test to 20 levels deep, though; at 8 levels deep, each of the three top-level directories occupies some 50 MB on a Mac file system.
#!/bin/bash
echo "Generating folders:"
toplevel=$PWD
GEN_DIRS()
{
cur=${1:?}
max=${2:?}
for i in 1 2 3
do
dirname=$RANDOM
if [ $cur -le $max ]
then
(
echo "Directory: $PWD/$dirname"
mkdir $dirname
cd $dirname
GEN_DIRS $((cur+1)) $max
)
else
echo "Symlink: $PWD/$dirname"
ln -s $toplevel "./$dirname"
fi
done
}
GEN_DIRS 1 ${1:-4}
Lines 6 and 7 are giving names to the positional parameters ($1 and $2) passed to the function — the ${1:?} notation simply means that if you omit to pass a parameter $1, you get an error message from the shell (or sub-shell) and it exits.
The parentheses on their own (lines 13 and 18 above) mean that the commands in between are run in a sub-shell, so changes in directory inside the sub-shell do not affect the parent shell.
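For example:
cd /tmp
( cd / && pwd )    # prints /
pwd                # prints /tmp: the cd only affected the sub-shell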
The condition on line 11 now uses an arithmetic comparison (-le) instead of <. In the original [ $count < 20 ], the unquoted < is actually parsed as an input redirection; and even inside [[ ]], < performs a lexicographic comparison, so level 9 would not be less than level 10 once the nesting gets deep. Using -le also means the [ command is OK to use instead of the [[ command (although [[ would also work, I prefer the old-fashioned notation).
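A quick way to see the difference between the two comparisons:
[[ 9 -lt 10 ]] && echo arithmetic      # prints "arithmetic": 9 is numerically less
[[ 9 < 10 ]] && echo lexicographic     # prints nothing: the string "9" sorts after "10"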
I ended up creating a script like this:
#!/bin/bash
echo "Generating folders:"
toplevel=$PWD
level=0
maxlevel=4
function generate_dirs {
pushd "$1" >/dev/null || return
(( ++level ))
for i in 1 2 3; do
dirname=$RANDOM
if (( level < maxlevel )); then
echo "$PWD/$dirname"
mkdir "$dirname" && generate_dirs "$dirname"
else
echo "$PWD/$dirname (link to top)"
ln -sf "$toplevel" "$dirname"
fi
done
popd >/dev/null
(( --level ))
}
generate_dirs .
exit

BASH: check for errors before testing conditions

I am trying to write a simple Bash script to monitor MySQL replication status. The script is like this:
#!/bin/bash
dbhost=192.168.1.2
repluser=root
replpasswd=password
echo "show slave status\G"|\
mysql -h $dbhost -u $repluser -p$replpasswd > tmpd 2>/dev/null
repl_IO=$(cat tmpd | grep "Slave_IO_Running" | cut -f2 -d':')
repl_SQL=$(cat tmpd | grep "Slave_SQL_Running" | cut -f2 -d':')
if [ $repl_IO != "Yes" -o $repl_SQL != "Yes" ] ; then
echo
echo -e "\033[31m Replication Error."
echo -e "\033[0m"
mail -s "replication error" email@domain.com < tmpd
else
echo
echo -e "\033[32mReplication is working fine"
echo -e "\033[0m"
fi
The problem is that the script only works if both the master and the slave are up. If the master is down, and I run the script, it displays the error message and sends the email.
If both master/slave are up, the script displays "Replication is working fine" which is okay. But when I shutdown the slave and run the script, I get this error:
./monitor.bash: line 9: [: too many arguments
Replication is working fine
I know the problem is that since I'm querying the slave MySQL server and it's unreachable, the script never gets values for the Slave_IO_Running and Slave_SQL_Running conditions. How would I go about checking whether the slave server is up BEFORE testing those conditions? In short, I only want "Replication is working fine" to be displayed if both the master and the slave are up and running and the output matches those conditions. Any help would be appreciated. Thank you.
If $repl_IO and $repl_SQL are blank, then this:
if [ $repl_IO != "Yes" -o $repl_SQL != "Yes" ] ; then
is equivalent to this:
if [ != Yes -o != Yes ] ; then
and I think you can see why that doesn't work. You need either to wrap your parameter-expansions in double-quotes, so that they're treated as single arguments no matter what they contain:
if [ "$repl_IO" != "Yes" -o "$repl_SQL" != "Yes" ] ; then
or to use [[...]] instead of [...], since it's a bit smarter with these things (note that [[ does not accept -o; the equivalent inside [[ is ||):
if [[ $repl_IO != "Yes" || $repl_SQL != "Yes" ]] ; then
or both:
if [[ "$repl_IO" != "Yes" || "$repl_SQL" != "Yes" ]] ; then
One problem with your script: experienced shell programmers know that if a variable is empty the statement
if [ $foo = something ]
will look to the shell like
if [ = something ]
You can fix this with
if [ "$foo" = something ]
So, in general, put double-quote marks around all variables used inside [ ].
Or use [[ ]] if you use bash: quoting variables isn't needed there, and it's more powerful than [ ].
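For example, with an empty variable:
foo=
[ $foo = something ]      # error: bash: [: =: unary operator expected
[[ $foo = something ]]    # no error, just evaluates to false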
greybot on the freenode IRC channel said:
[[ is a bash keyword similar to (but more powerful than) the [
command. See http://mywiki.wooledge.org/BashFAQ/031 and
http://mywiki.wooledge.org/BashGuide/TestsAndConditionals. Unless
you're writing for POSIX sh, we recommend [[.