Serial section within an SGE script

Is it possible to force a 'serial' section within an SGE script?
#$ -S /bin/bash
#$ -N example
#$ -v MPI_HOME
#$ -q all.q
#$ -pe ompi 40
#$ -j yes
#$ -o example.log
$MPI_HOME/bin/mpirun example.exe
# now do some serial commands
grep 'success' example.log
mv example.out /archive
Currently, I split this type of job into two scripts and make one dependent on the other. It would be much simpler to maintain and schedule if I could keep everything in one script.

You can do this, but the job will hold on to all of its slots while the serial section runs. Because the serial code is invoked directly in the job script rather than via mpirun, it will only run once, on the head node of the job. For quick steps like those in your example this doesn't matter much, but if you have a long-running serial section, it is a more efficient use of resources to split the work into two jobs, as you are already doing.
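For reference, the two-script dependency chain can be wired up at submission time; a minimal sketch, assuming the parallel and serial parts live in hypothetical scripts parallel.sh and serial.sh:
#!/bin/bash
# Submit the parallel job; -terse makes qsub print only the job id
JOBID=$(qsub -terse parallel.sh)
# Submit the serial job with a hold so it starts only after the parallel job finishes
qsub -hold_jid "$JOBID" serial.sh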

Related

Pass flags to the Sphinx runner?

So I've got the following project, OpenFHE-development, and when I run the build process there are lots of warnings. However, most of these warnings are fine to ignore (we vet them before pushing to the main branch).
Specifically, is there a way to take
pth/python -m sphinx -T -E -b readthedocssinglehtmllocalmedia -d _build/doctrees -D language=en . _build/localmedia
and convert it to
pth/python -m sphinx -T -E -b readthedocssinglehtmllocalmedia -d _build/doctrees -D language=en . _build/localmedia 2> errors.txt
(redirect stderr to a file instead of having it display in the terminal)?
This does not seem to be possible at the moment. See the GitHub discussion.
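If you invoke Sphinx yourself rather than through the Read the Docs runner, ordinary shell redirection works; a sketch in bash, using the same command as above, that saves stderr to errors.txt while still showing it in the terminal:
# 2> >(tee ...) duplicates stderr: one copy to errors.txt, one back to stderr
pth/python -m sphinx -T -E -b readthedocssinglehtmllocalmedia \
    -d _build/doctrees -D language=en . _build/localmedia \
    2> >(tee errors.txt >&2)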

Setting the SGE cluster job name with Snakemake while using DRMAA?

Problem
I'm not sure the -N argument is being applied on my SGE cluster. Everything works except for the -N argument: Snakemake requires a valid -N call, but it doesn't set the job name properly and always reverts to the default name. This is my call, which produces the same results with or without the -N argument.
snakemake --jobs 100 --drmaa "-V -S /bin/bash -o log/mpileup/mpileupSPLIT -e log/mpileup/mpileupSPLIT -l h_vmem=10G -pe ncpus 1 -N {rule}.{wildcards}.varScan"
The only way I have found to influence the job name is to use --jobname.
snakemake --jobs 100 --drmaa "-V -S /bin/bash -o log/mpileup/mpileupSPLIT -e log/mpileup/mpileupSPLIT -l h_vmem=10G -pe ncpus 1 -N {rule}.{wildcards}.varScan" --jobname "{rule}.{wildcards}.{jobid}"
Background
I've tried a variety of things. Usually I just use a cluster configuration file, but that isn't working either, which is why in the code above I ditched the config file to make sure it's the '-N' option that isn't being applied.
My usual call is:
snakemake --drmaa "{cluster.clusterSpec}" --jobs 10 --cluster-config input/config.json
1) If I use '-n' instead of '-N', I receive a workflow error:
drmaa.errors.DeniedByDrmException: code 17: ERROR! invalid option argument "-n"
2) If I use '-N', but give it an incorrect wildcard, say {rule.name}:
AttributeError: 'str' object has no attribute 'name'
3) I cannot use both --drmaa AND --cluster:
snakemake: error: argument --cluster/-c: not allowed with argument --drmaa
4) If I specify the {jobid} in the config.json file, then Snakemake doesn't know what to do with it.
RuleException in line 13 of /extscratch/clc/projects/tboyarski/gitRepo-LCR-BCCRC/Snakemake/modules/mpileup/mpileupSPLIT:
NameError: The name 'jobid' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print $1}}
EDIT Added #5 w/ Solution
5) I can set the job name in config.json and concatenate the jobid onto it in my Snakemake call. That way I have a generic Snakemake call (--jobname "{cluster.jobName}.{jobid}") and a highly configurable, specific job name ({rule}-{wildcards.sampleMPUS}_chr{wildcards.chrMPUS}), which results in:
mpileupSPLIT-Pfeiffer_chr19.1.e7152298
The 1 is the Snakemake jobid according to the DAG.
The 7152298 is my cluster's job number.
2nd EDIT - Just tried v3.12; same behaviour. The concatenation must occur in the Snakemake call.
Alternative solution
I would also be okay with something like this:
snakemake --drmaa "{cluster.clusterSpec}" --jobname "{cluster.jobName}" --jobs 10 --cluster-config input/config.json
With my cluster file like this:
"mpileupSPLIT": {
"clusterSpec": "-V -S /bin/bash -o log/mpileup/mpileupSPLIT -e log/mpileup/mpileupSPLIT -l h_vmem=10G -pe ncpus 1 -n {rule}.{wildcards}.varScan",
"jobName": "{rule}-{wildcards.sampleMPUS}_chr{wildcards.chrMPUS}.{jobid}"
}
Documentation Reviewed
I've read the documentation but I was unable to figure it out.
http://snakemake.readthedocs.io/en/latest/executable.html?-highlight=job_name#cluster-execution
http://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#snakefiles-cluster-configuration
https://groups.google.com/forum/#!topic/snakemake/whwYODy_I74
System
Snakemake v3.10.2 (Will try newest conda version tomorrow)
Red Hat Enterprise Linux Server release 5.4
SGE Cluster
Solution
Use '--jobname' in your Snakemake call instead of '-N' in your qsub submission parameters.
Set up your cluster config file to have a targetable parameter for the job name. In this case, these are the overrides for my Snakemake rule named "mpileupSPLIT":
"mpileupSPLIT": {
"clusterSpec": "-V -S /bin/bash -o log/mpileup/mpileupSPLIT -e log/mpileup/mpileupSPLIT -l h_vmem=10G -pe ncpus 1",
"jobName": "{rule}-{wildcards.sampleMPUS}_chr{wildcards.chrMPUS}"
}
Utilize a generic Snakemake call which includes {jobid}. On a cluster (SGE), the 'jobid' variable contains both the Snakemake job number and the cluster job number; both are valuable, as the former corresponds to the Snakemake DAG and the latter is used for cluster logging. (E.g. --jobname "{cluster.jobName}.{jobid}")
EDIT Added solution to resolve post.
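Putting the pieces together, a minimal sketch of the full invocation (rule names and paths taken from the examples above):
snakemake --jobs 10 \
    --cluster-config input/config.json \
    --drmaa "{cluster.clusterSpec}" \
    --jobname "{cluster.jobName}.{jobid}"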

Sun Grid Engine - print task I am on in my array job

I've been using http://wiki.gridengine.info/wiki/index.php/Simple-Job-Array-Howto as my reference.
How do I print the number of the task I am currently on? So far I have this:
#!/bin/sh
# Grid Engine options (lines prefixed with #$)
#$ -t 1-2
#$ -N test$SGE_TASK_ID.txt
#$ -cwd
#$ -l h_rt=24:00:00
#$ -l h_vmem=10G
# Initialise the environment modules
. /etc/profile.d/modules.sh
echo $SGE_TASK_ID
This gives me nothing.
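For what it's worth, $SGE_TASK_ID is only set in the job's environment at run time, so it cannot be expanded inside a #$ -N directive, which is parsed at submission time. A minimal sketch that echoes the task number and, assuming SGE's $TASK_ID pseudo-variable is accepted in output paths, writes one log file per task:
#!/bin/sh
# Grid Engine options (lines prefixed with #$)
#$ -t 1-2
#$ -N test
#$ -cwd
#$ -o test.$TASK_ID.out
#$ -l h_rt=24:00:00
#$ -l h_vmem=10G
# Initialise the environment modules
. /etc/profile.d/modules.sh
# $SGE_TASK_ID is set by Grid Engine for each array task at run time
echo "Running task $SGE_TASK_ID"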

Creating custom DVD for RHEL 7 with kickstart

I am trying to create a custom CD/DVD to deploy RHEL 7 with kickstart file. Here is what I did:
Edited isolinux.cfg (in the isolinux folder) and grub.cfg (in the EFI/BOOT folder).
Created ISO using mkisofs.
But it is not working. Am I using correct files/method?
Edit the ISO image and put the ks.cfg file that you have created.
Preferably, put the ks.cfg file inside ks directory. More information can be found here.
You need to use the newer genisoimage command. Here is an example of what will work:
Add the kickstart file to your downloaded and extracted ISO tree.
Run this command, pointing it at the directory containing the extracted ISO and the kickstart file, and write the new ISO to another location:
genisoimage -r -v -V "OEL6 with KS for OVM Manager" -cache-inodes -J -l -b isolinux/isolinux.bin -c isolinux/boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table -o OEL6U6_OVM_Manager.iso /var/www/html/Template/ISO/
I found the way to create a custom DVD on the RHEL 7 documentation page.
Mount the downloaded image
mount -t iso9660 -o loop path/to/image.iso /mnt/iso
Create a working directory - a directory where you want to place the contents of the ISO image.
mkdir /tmp/ISO
Copy all contents of the mounted image to your new working directory. Make sure to use the -p option to preserve file and directory permissions and ownership.
cp -pRf /mnt/iso /tmp/ISO
Unmount the image.
umount /mnt/iso
Make sure your current working directory is the top-level directory of the extracted ISO image - e.g. /tmp/ISO/iso. Create the new ISO image using genisoimage:
genisoimage -U -r -v -T -J -joliet-long -V "RHEL-7.1 Server.x86_64" -Volset "RHEL-7.1 Server.x86_64" -A "RHEL-7.1 Server.x86_64" -b isolinux/isolinux.bin -c isolinux/boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table -eltorito-alt-boot -e images/efiboot.img -no-emul-boot -o ../NEWISO.iso .
Hope the answer is helpful.
I am editing my answer due to the comments posted. Here is a more comprehensive solution:
(A) You need to create the ISO properly. I found helpful information at this URL.
Here is the line that I actually ended up with, for my MBR/UEFI ISO creation:
mkisofs -U -A "<Volume Header>" -V "RHEL-7.1 x86_64" -volset "RHEL-7.1 x86_64" -J -joliet-long -r -v -T -x ./lost+found -o ${OUTPUT}/${HOST}.iso -b isolinux/isolinux.bin -c isolinux/boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table -eltorito-alt-boot -e images/efiboot.img -no-emul-boot -boot-load-size 18755 /dir/where/sources/for/ISO/are/located
Be careful with the -V parameter, as it has to match what the kernel has defined for inst.stage2. In the default grub.cfg included on the boot disk, it is configured as "hd:LABEL=RHEL-7.1\x20x86_64", which matches the settings above.
(B) You need the correct setup for EFI for RHEL 7. For some reason, this has changed from RHEL 6, where you could just use /EFI/BOOT/BOOTX64.conf; now it uses /EFI/BOOT/grub.cfg. Common wisdom from the Red Hat manuals states to add the inst.ks= parameter to the kernel line. The grub.cfg that comes in the /EFI/BOOT directory of the RHEL 7 boot ISO actually has a linuxefi parameter instead of a kernel one; I would guess they work the same. If you are including the KS file on the CD, this should get you there.
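For illustration, a hypothetical menuentry along those lines (the label must match the volume ID you set with -V, and the /ks/ks.cfg path is the location suggested earlier; both are assumptions about your layout):
menuentry 'Install RHEL-7.1 with kickstart' {
  # inst.stage2 must match the ISO volume label; inst.ks points at the kickstart on the DVD
  linuxefi /images/pxeboot/vmlinuz inst.stage2=hd:LABEL=RHEL-7.1\x20x86_64 inst.ks=hd:LABEL=RHEL-7.1\x20x86_64:/ks/ks.cfg quiet
  initrdefi /images/pxeboot/initrd.img
}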
Good Luck!

How to copy the environment variables in cluster system using qsub?

I use Sun's SGE to submit my jobs to a cluster. The problem is how to let the compute node see the environment variables of the submission host, i.e. how to configure the qsub script so that the compute node loads the environment variables from the submission host.
The following is an example script, but it fails with errors such as libraries not being found:
#!/bin/bash
#
#$ -V
#$ -cwd
#$ -j y
#$ -o /home/user/jobs_log/$JOB_ID.out
#$ -e /home/user/jobs_log/$JOB_ID.err
#$ -S /bin/bash
#
echo "Starting job: $SGE_TASK_ID"
# Modify this to use the path to matlab for your system
/home/user/Matlab/bin/matlab -nojvm -nodisplay -r matlab_job
echo "Done with job: $SGE_TASK_ID"
The technique you are using (adding -V) should work. One possibility, since you are specifying the shell with -S, is that Grid Engine is configured to launch /bin/bash as a login shell, and your profile scripts are stomping all over the environment you are trying to pass to the job.
Try using qstat -xml -j on the job while it is queued/running to see what environment variables grid engine is trying to pass to the job.
Try adding an env command to the script to see what variables are set.
Try adding shopt -q login_shell;echo $? in the script to tell you if it is being run as a login shell.
To list out shells that are configured as login shells in grid engine try:
SGE_SINGLE_LINE=true qconf -sconf|grep ^login_shells
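A small diagnostic block you could drop into the job script to combine the checks above (a sketch; the env listing can be long):
# Show the environment the job actually received
env | sort
# Prints 0 if bash was started as a login shell, 1 otherwise
shopt -q login_shell; echo "login_shell: $?"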
I think this issue is because you didn't configure bash in the login_shells setting of SGE.
Check your login_shells with qconf -sconf and see if bash is in there.
login_shells
UNIX command interpreters like the Bourne-Shell (see sh(1)) or the C-Shell (see csh(1)) can be used by Grid Engine to start job scripts. The command interpreters can either be started as login-shells (i.e. all system and user default resource files like .login or .profile will be executed when the command interpreter is started and the environment for the job will be set up as if the user has just logged in) or just for command execution (i.e. only shell specific resource files like .cshrc will be executed and a minimal default environment is set up by Grid Engine - see qsub(1)). The parameter login_shells contains a comma separated list of the executable names of the command interpreters to be started as login-shells. Shells in this list are only started as login shells if the parameter shell_start_mode (see above) is set to posix_compliant.
Changes to login_shells will take immediate effect. The default for login_shells is sh,csh,tcsh,ksh.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
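To add bash to that list, edit the global configuration (this requires SGE admin rights); a sketch:
# Show the current list of login shells
qconf -sconf | grep login_shells
# Open the global configuration in an editor and extend the line, e.g.:
#   login_shells                 sh,csh,tcsh,ksh,bash
qconf -mconf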