SGE: view jobs not on hold with qstat

I'm running some jobs on an SGE cluster. Is there a way to make qstat show me only jobs that are not on hold?
qstat -s p shows pending jobs, which is all those with state "qw" and "hqw".
qstat -s h shows hold jobs, which is all those with state "hqw".
I want to be able to see all jobs with state "qw" only and NOT state "hqw". The man pages seem to suggest it isn't possible, but I want to be sure I didn't miss something. It would be REALLY useful and it's really frustrating me that I can't make it work.
Other cluster users have a few thousand jobs on hold ("hqw") and only a handful actually in the queue waiting to run ("qw"). I want to see quickly and easily the stuff that is not on hold so I can see where my jobs are in the queue. It's a pain to have to show everything and then scroll back up to find the relevant part of the output.

So I figured out a way to show what I want by piping the output of qstat into grep:
qstat -u "*" | grep " qw"
(Note that I need to search for " qw" not just "qw" or it will return the "hqw" states as well.)
But I'd still love to know if it's possible using qstat options only.
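A slightly more robust variant of the grep trick is to match the state column exactly with awk (a sketch; it assumes the state sits in the fifth column of the default qstat output, the same assumption the alias further down makes):
qstat -u "*" | awk '$5 == "qw"'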

I find this approach useful as well: I have defined a small script (<code_script>) in my home directory and created an alias for it:
alias jobstat 'source <code_script>'
The script counts my jobs by state (field 5 of the qstat output is the state, field 4 the owner):
echo "user\n------------------------"
echo "Running " `qstat -u user | awk '{ if ($5 == "r") print $0 }' | wc -l`
echo "Pending " `qstat -u user | awk '{ if ($5 == "qw" || $5 == "hqw") print $0 }' | wc -l`
echo "------------------------"
echo "Total   " `qstat -u user | awk '{ if ($4 == "user") print $0 }' | wc -l`
This helps me a lot!

Related

Passing variable into a function that uses awk

I have a script that attempts to stop a certain process by name (but I need to specify a certain string that can't be killed, namely "notThisProcess"), then kills it after 20 seconds if it hasn't come down gracefully, ie:
#!/bin/ksh
bname=BTEST
bserver=BSERVER
PROCESS_ID=`ps auxww | awk '/PROCESS_NAME_ALPHA/ && !/awk/ && !/notThisProcess/ {print $2}'`
/apps/customapp/stopcommand -a $bname -processName PROCESS_NAME_ALPHA -serverName $bserver
sleep 20
kill -4 $PROCESS_ID
PROCESS_ID2=`ps auxww | awk '/PROCESS_NAME_BETA/ && !/awk/ && !/notThisProcess/ {print $2}'`
/apps/customapp/stopcommand -a $bname -processName PROCESS_NAME_BETA -serverName $bserver
sleep 20
kill -4 $PROCESS_ID2
#etc..
As my list of processes just increased I'm trying to put those steps into a function but I can't figure out how to pass the process name to awk. ie, this doesn't work:
#!/bin/ksh
bname=BTEST
bserver=BSERVER
cycleProcess()
{
PROCESS_ID=`ps auxww | awk '/$1/ && !/awk/ && !/notThisProcess/ {print $2}'`
/apps/customapp/stopcommand -a $bname -processName PROCESS_NAME_ALPHA -serverName $bserver
sleep 20
kill -4 $PROCESS_ID
}
cycleProcess PROCESS_NAME_ALPHA
cycleProcess PROCESS_NAME_BETA
exit
I've seen several references to assignment via -v but despite several attempts I haven't been successful. Any suggestions?
I'd write it like this:
#!/bin/ksh
bname=BTEST
bserver=BSERVER
cycleProcess() {
typeset procname="$1"
typeset pid=$(ps auxww | awk -v name="$procname" '$0 ~ name && !/awk/ && !/notThisProcess/ {print $2}')
if [[ -z "$pid" ]]; then
echo "$procname is not running"
return
fi
/apps/customapp/stopcommand -a "$bname" -processName "$procname" -serverName "$bserver"
sleep 20
kill -4 "$pid"
}
processes=(
PROCESS_NAME_ALPHA
PROCESS_NAME_BETA
)
for proc in "${processes[@]}"; do
cycleProcess "$proc"
done
typeset in a function is a way to declare a variable as local to that function.
I don't have access to an AIX box. ps auxww output on my Linux box shows the command name in field 11, so instead of /name/ && !/awk/ && !/thisScript/ you might be able to use $11 == name {print $2},
or $11 ~ name if the match is not exact.
You can pass them in a pipe-delimited list and compare against the last field.
ps ... | awk -v keep="process1|process2|process3" '$NF!~keep{print $2}'
Also note that in your script, awk '/$1/ && ..., the $1 is not the shell variable: inside the single quotes it belongs to awk, not to the shell, so your function's first argument is never substituted there.
As others already noted, shell variables may be passed to awk scripts using the option -v. This must be used if the awk script resides in a separate file (included with the option -f).
When specifying the awk script directly within the shell script between single quotes ('...'), you may also use the construct '"$shell_variable"'. Note that when doing so, there must be no spaces between the single and double quotes!
Example:
process_string="plugin-container"
pids=$( ps -fu $LOGNAME | awk '/'"$process_string"'/ { print $2 }' )

Bash loop to merge files in batches for mongoimport

I have a directory with 2.5 million small JSON files in it. It's 104GB on disk. They're multi-line files.
I would like to create a set of JSON arrays from the files so that I can import them using mongoimport in a reasonable amount of time. The files can be no bigger than 16MB, but I'd be happy even if I managed to get them in sets of ten.
So far, I can use this to do them one at a time at about 1000/minute:
for i in *.json; do mongoimport --writeConcern 0 --db mydb --collection all --quiet --file $i; done
I think I can use "jq" to do this, but I have no idea how to make the bash loop pass 10 files at a time to jq.
Note that using bash find results in an error as there are too many files.
With jq you can use --slurp to create arrays, and -c to make multiline json single line. However, I can't see how to combine the two into a single command.
Please help with both parts of the problem if possible.
Here's one approach. To illustrate, I've used awk as it can read the list of files in small batches and because it has the ability to execute jq and mongoimport. You will probably need to make some adjustments to make the whole thing more robust, to test for errors, and so on.
The idea is either to generate a script that can be reviewed and then executed, or to use awk's system() command to execute the commands directly. First, let's generate the script:
ls *.json | awk -v group=10 -v tmpfile=json.tmp '
function out() {
    print "jq -s . " files " > " tmpfile;
    print "mongoimport --writeConcern 0 --db mydb --collection all --quiet --file " tmpfile;
    print "rm " tmpfile;
    files = "";
}
BEGIN {
    n = 0; files = "";
    print "test -r " tmpfile " && rm " tmpfile;
}
# append each filename (quoted) to the current batch
{
    files = files " \"" $0 "\"";
    n++;
}
# flush once a full batch of group files has accumulated
n % group == 0 { out(); }
END { if (files) { out(); } }
'
Once you've verified this works, you can either execute the generated script, or change the "print ..." lines to use "system(....)"
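For example, once the generated commands look right, they can be piped straight into a shell (a sketch; batch.awk is a hypothetical file holding the awk program above):
ls *.json | awk -v group=10 -v tmpfile=json.tmp -f batch.awk | sh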
Using jq to generate the script
Here's a jq-only approach for generating the script.
Since the number of files is very large, the following uses features that were only introduced in jq 1.5, so its memory usage is similar to the awk script above:
# emit the same mongoimport command as the awk version above
def mongo(f):
  "mongoimport --writeConcern 0 --db mydb --collection all --quiet --file \(f)";

def read(n):
  # state: [answer, hold]
  foreach (inputs, null) as $i
    ([null, null];
     if $i == null then .[0] = .[1]
     elif .[1]|length == n then [.[1],[$i]]
     else [null, .[1] + [$i]]
     end;
     .[0] | select(.) );

"test -r json.tmp && rm json.tmp",
(read($group|tonumber)
 | map("\"\(.)\"")
 | join(" ")
 | ("jq -s . \(.) > json.tmp", mongo("json.tmp"), "rm json.tmp") )
Invocation:
ls *.json | jq -nRr --arg group 10 -f generate.jq
Here is what I came up with. It seems to work and is importing at roughly 80 a second into an external hard drive.
#!/bin/bash
files=(*.json)
for ((I = 0; I < ${#files[@]}; I += 500)); do
    jq -c '.' "${files[@]:I:500}" |
        mongoimport --writeConcern 0 --numInsertionWorkers 16 --db mydb --collection all --quiet
    echo $I
done
However, some are failing. I've imported 105k files but only 98547 appeared in the mongo collection. I think it's because some documents are > 16MB.
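To check that theory, you could list the files that exceed the 16MB BSON document limit before importing (a sketch assuming GNU find, which accepts the M size suffix):
find . -maxdepth 1 -name '*.json' -size +16M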

Newbie: unix bash, nested if statement, results from a loop results from sql

Newbie here, please pardon any confusing wording that I use.
A common task I have is to take a list of names and do a MySQL query to look the names up in a table and see if they are "live" on our site.
Doing this one at a time, my SQL query works fine. I then wanted to do the query using a loop from a file listing multiple names. This works fine, too.
I added this query loop to my bash profile so that I can quickly do the task by typing this:
$ validOnSite fileName
This works fine, and I even added a usage statement for my process to remind myself of the syntax. Below is what I have that works fine:
validOnSite() {
    if [[ "$1" == "" ]] || [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]; then
        echo "Usage:"
        echo " $ validOnSite [filename]"
        echo " Where validOnSite uses specified file as variables in sql query:"
        echo " SELECT name, active FROM dbDb WHERE name=lines in file"
    else
        cat $1 | while read line ; do hgsql -h genome-centdb hgcentral -Ne "select name, active from dbDb where name='$line'" ; done
    fi
}
Using a file "list.txt" which contains:
nameA
nameB
I would then type:
validOnSite list.txt
and both entries in list.txt meet my query criteria and are found in SQL; my results will be:
nameA 1
nameB 1
Note the "1" after each result. I assume this is some sort of "yes" status.
Now, I add a third name to my list.txt, one that I know is not a match in sql. Now list.txt contains:
nameA
nameB
foo
When I again run this command for my list with 3 rows:
validOnSite list.txt
My results are the same as with the first version of list.txt; I cannot see which lines failed, only which lines were a success:
nameA 1
nameB 1
I have been trying all kinds of things to add a nested if statement, something that says: if $line is a match, echo "pass", else echo "fail".
I do not want to see a "1" in my results. Using list.txt with 2 matches and 1 non-match, I would like my results to be:
nameA pass
nameB pass
foo fail
Or even better, color code a pass with green and a fail with red.
As I said, newbie here... :)
Any pointers in the right direction would help. Here is my latest sad attempt, but I realize I may be going in a wrong direction entirely:
validOnSite() {
if [[ "$1" == "" ]] || [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]; then
echo "Usage:"
echo " $ validOnSite [filename]"
echo " Where validOnSite uses specified file as variables in sql query:"
echo " SELECT name, active FROM dbDb WHERE name=lines in file"
else
cat $1 | while read line ; do hgsql -h genome-centdb hgcentral -Ne "select name, active from dbDb where name='$line'" > /dev/null ; done
if ( "status") then
echo $line "failed"
echo $line "failed" >> outfile
else
echo $line "ok"
echo $line "ok" >>outfile
clear
cat outfile
fi
fi
}
If something looks crazy in my last attempt, it's because it is - I am just googling around and trying as many things as I can while trying to learn. Any help appreciated, I feel stuck after working on this for a long time, but I am excited to move forward and find a solution! I think there is something I'm missing about understanding stdout, and also confusion about nested if's.
Note: I do not need an outfile, but it's ok if one is needed to accomplish the goal. stdout result alone would suffice, and is preferred.
Note: hgsql is just the name of our MySQL server. The MySQL part works fine; I am looking for a better way to deal with my bash output, and I think there is something about stderr that I'm missing. I'm looking for a fairly simple answer as I'm a newbie!
I guess that by hgsql you mean some Mercurial extension that allows you to perform MySQL queries. I don't know how hgsql works, but I know that MySQL returns only the matching rows. In terms of shell scripting, though, the result is a string that may contain extra information even if the number of matched rows is zero. For example, some MySQL client might return a header or a string like "No rows found", although it is unlikely.
I'll show how it is done with the official mysql client. I'm sure you will manage to adapt hgsql with the help of its documentation to the following example.
if [ -t 1 ]; then
    red_color=$(tput setaf 1)
    green_color=$(tput setaf 2)
    reset_color=$(tput sgr0)
else
    red_color=
    green_color=
    reset_color=
fi

colorize_flag() {
    local color
    if [ "$1" = 'fail' ]; then
        color="$red_color"
    else
        color="$green_color"
    fi
    printf '%s' "${color}${1}${reset_color}"
}

sql_fmt='SELECT IF(active, "pass", "fail") AS flag FROM dbDb WHERE name = "%s"'

while IFS= read -r line; do
    sql=$(printf "$sql_fmt" "$line")
    flag=$(mysql --skip-column-names dbname -e "$sql")
    [ -z "$flag" ] && flag='fail'
    printf '%-20s%s\n' "$line" "$(colorize_flag "$flag")"
done < file
The first block detects whether the script is running in interactive mode by checking if file descriptor 1 (standard output) is opened on a terminal (see help test). If it is, the script considers itself to be running interactively, i.e. the standard output is connected to the user's terminal directly rather than via a pipe, for example. For interactive mode, it assigns the terminal color codes to variables with the help of the tput command.
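You can see that test in action at a prompt (a quick sketch):
$ [ -t 1 ] && echo interactive || echo piped
interactive
$ ([ -t 1 ] && echo interactive || echo piped) | cat
piped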
The colorize_flag function accepts a string ($1) and outputs it with the color codes applied according to its value.
The last block reads file line by line. For each line it builds an SQL query string (sql) and invokes the mysql command with the column names stripped off the output. The output of the mysql command is assigned to flag by means of command substitution. If "$flag" is empty, it is set to 'fail'. The $line and the colorized flag are printed to standard output.
You can test the non-interactive mode by sending the output through a pipe, e.g.:
./script | tee -a
I must warn you that it is generally a bad idea to pass shell variables into SQL queries unless the values are properly escaped, and the popular shells do not provide any tools to escape MySQL strings. So consider running the queries in Perl, PHP, or any programming language that is capable of building and running the queries safely.
Also note that in terms of performance it is better to run a single query and then parse the result set in a loop instead of running multiple queries in a loop, with the exception of prepared statements; a sketch of that follows.
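For example, a single round trip could look like this (a sketch only, assuming the names contain no quotes, per the escaping caveat above; unlike the per-line loop, names missing from the table simply produce no output row):
# wrap each name in single quotes and join them with commas
names=$(sed "s/.*/'&'/" file | paste -sd, -)
mysql --skip-column-names dbname -e \
    "SELECT name, IF(active, 'pass', 'fail') FROM dbDb WHERE name IN ($names)" |
while IFS=$'\t' read -r name flag; do
    printf '%-20s%s\n' "$name" "$flag"
done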
I found a way to get to my solution by piecing together the few basic things that I know. Not elegant, but it works well enough for now. First I created a file "[filename]Results" containing the output:
nameA 1
nameB 1
I then cut out the "1"s to make a new file, and compared that file against list.txt to see which lines exist in list.txt but do not exist in the results.
Note: I have the following in my .zshrc file.
validOnSite() {
    if [[ "$1" == "" ]] || [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]; then
        echo "Usage:"
        echo " $ validOnSite [filename]"
        echo " Where validOnSite uses specified file as variables in sql query:"
        echo " SELECT name, active FROM dbDb WHERE name=lines in file"
    else
        cat $1 | while read line ; do hgsql -h genome-centdb hgcentral -Ne "select name from dbDb where name='$line' and active='1'" >> $1"Pass"; done
        autoload -U colors
        colors
        echo $fg_bold[magenta]Assemblies active on site${reset_color}
        echo
        cat $1"Pass"
        echo
        echo $fg_bold[red]Not active or not found on site${reset_color}
        comm -23 $1 $1"Pass" 2> /dev/null
        echo
        echo
        mv $1"Pass" ~cath/myFiles/validOnSiteResults
        echo "Results file containing only active assemblies resides in ~cath/myFiles/validOnSiteResults"
    fi
}
list.txt:
nameA
nameB
foo
My input:
validOnSite list.txt
My output:
Assemblies active on site (<--this font is magenta)
nameA
nameB
Not active or not found on site (<--this font is red)
foo
Results file containing only active assemblies resides in ~cath/myFiles/validOnSiteResults

shell script and database

I am retrieving course names from a database using a shell script. When there is a space in a course name, e.g. "computer science" or "environmental sciences", the name is stored in two different instances of the list, i.e. "computer" in one instance and "science" in another, but I want to store it as "computer science" in a single instance. How do I overcome this problem?
Here is my code for retrieving course names:
db_value=`mysql -uroot -proot -Dproject_db -rN --execute
"SELECT CourseName FROM applicant_result"`
z=1
for value in $db_value
do
echo "V,eligibility_$z=$value"
let "z+=1"
done
echo "E,resume"
And the result I get (note the values of V,eligibility_2 and V,eligibility_3):
V,eligibility_1=chemistry
V,eligibility_2=computer
V,eligibility_3=science
V,eligibility_4=mathematics
V,eligibility_5=physics
z=1
mysql -uroot -proot -Dproject_db -rN --execute "SELECT CourseName FROM applicant_result" | while read value
do
echo "V,eligibility_$z=$value"
let "z+=1"
done
echo "E,resume"
If that way doesn't work, then don't do it that way. for ... in splits on words rather than lines.
A possibly better solution is to use something more line aware:
mysql blah blah blah | awk '{
    n = n + 1;
    print "V,eligibility_" n "=" $0
} END { print "E,resume" }'
See the following transcript for proof of concept:
pax> echo 'chemistry
...> computer science
...> mathematics
...> physics
...> basket weaving' | awk '{
...> n = n + 1;
...> print "V,eligibility_"n"="$0
...> } END {print "E,resume"}'
V,eligibility_1=chemistry
V,eligibility_2=computer science
V,eligibility_3=mathematics
V,eligibility_4=physics
V,eligibility_5=basket weaving
E,resume
The key could be the IFS variable - the Input Field Separator.
It determines on which characters the input is broken into parts.
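A minimal demonstration of the difference (a sketch; the brackets mark each loop iteration):
$ for v in $(printf 'computer science\nphysics\n'); do echo "[$v]"; done
[computer]
[science]
[physics]
$ IFS=$'\n'; for v in $(printf 'computer science\nphysics\n'); do echo "[$v]"; done
[computer science]
[physics]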
So in your case you want to separate by newlines:
#!/bin/bash
db_value=`mysql -uroot -proot -Dproject_db -rN -e "SELECT CourseName, CourseEligibility FROM applicant_result"`
oldIFS="$IFS" # save the old value to be able to restore it later
IFS="
"
z=1
for value in $db_value; do
    name=`echo $value | cut -f1`
    elig=`echo $value | cut -f2`
    # do whatever with $name and $elig
    echo "V,eligibility_$z=$name"
    (( z++ ))
done
IFS="$oldIFS"
echo "E,resume"

BASH: check for errors before testing conditions

I am trying to write a simple Bash script to monitor MySQL replication status. The script is like this:
#!/bin/bash
dbhost=192.168.1.2
repluser=root
replpasswd=password
echo "show slave status\G"|\
mysql -h $dbhost -u $repluser -p$replpasswd > tmpd 2>/dev/null
repl_IO=$(cat tmpd | grep "Slave_IO_Running" | cut -f2 -d':')
repl_SQL=$(cat tmpd | grep "Slave_SQL_Running" | cut -f2 -d':')
if [ $repl_IO != "Yes" -o $repl_SQL != "Yes" ] ; then
echo
echo -e "\033[31m Replication Error."
echo -e "\033[0m"
mail -s "replication error" email#domain.com < tmpd
else
echo
echo -e "\033[32mReplication is working fine"
echo -e "\033[0m"
fi
The problem is that the script only works if both the master and the slave are up. If the master is down, and I run the script, it displays the error message and sends the email.
If both master/slave are up, the script displays "Replication is working fine" which is okay. But when I shutdown the slave and run the script, I get this error:
./monitor.bash: line 9: [: too many arguments
Replication is working fine
I know the problem is that since I'm querying the slave MySQL server and it isn't reachable, the Slave_IO_Running and Slave_SQL_Running conditions are never really checked. How would I go about checking whether the slave server is up BEFORE testing those conditions? In short, I only want "Replication is working fine" to be displayed if both the master and the slave are up and running and the conditions match. Any help would be appreciated. Thank you.
If $repl_IO and $repl_SQL are blank, then this:
if [ $repl_IO != "Yes" -o $repl_SQL != "Yes" ] ; then
is equivalent to this:
if [ != Yes -o != Yes ] ; then
and I think you can see why that doesn't work. You need either to wrap your parameter-expansions in double-quotes, so that they're treated as single arguments no matter what they contain:
if [ "$repl_IO" != "Yes" -o "$repl_SQL" != "Yes" ] ; then
or to use [[...]] instead of [...], since it's a bit smarter with these things. Note that inside [[...]] you combine tests with || rather than -o:
if [[ $repl_IO != "Yes" || $repl_SQL != "Yes" ]] ; then
or both:
if [[ "$repl_IO" != "Yes" || "$repl_SQL" != "Yes" ]] ; then
One problem with your script: experienced shell programmers know that if a variable is empty the statement
if [ $foo = something ]
will look to the shell like
if [ = something ]
You can fix this with
if [ "$foo" = something ]
So in general, put " marks around all variables used inside [ ].
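To see the failure and the fix side by side (a quick sketch at a bash prompt):
$ foo=
$ [ $foo = something ]
bash: [: =: unary operator expected
$ [ "$foo" = something ] || echo 'no match, and no error'
no match, and no error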
Or use [[ ]] if you use bash; quoting variables isn't needed there, and it's more powerful than [ ].
greybot on the freenode IRC channel said:
[[ is a bash keyword similar to (but more powerful than) the [
command. See http://mywiki.wooledge.org/BashFAQ/031 and
http://mywiki.wooledge.org/BashGuide/TestsAndConditionals. Unless
you're writing for POSIX sh, we recommend [[.