Changing the number of available slots in a running SGE instance

I would like to change the number of slots on a running SGE instance. The instance was started by StarCluster.
I tried following this page and running:
ubuntu@master: $ qconf -mattr exechost complex_values slots=1 master
ubuntu@master modified "master" in exechost list
but it does not look like anything changed:
ubuntu@master: $ qstat -f
queuename qtype resv/used/tot. load_avg arch states
---------------------------------------------------------------------------------
all.q@master BIP 0/1/8 1.04 linux-x64

Run
qconf -mq all.q
Then you will see a line like
slots 1,[master=0],[node1366=2],[node1379=2]
If you want master to have 1 slot, change its entry to
[master=1]
Save and quit. It's as simple as that.
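If you prefer a non-interactive change (e.g. from a script), the same qconf -mattr mechanism from the question should also apply to a queue instance; this is an untested sketch assuming the default all.q queue shown in the qstat output above:
qconf -mattr queue slots 1 all.q@master    # sets the slots override for the master instance of all.q
qstat -f    # verify that the tot. column for all.q@master now shows 1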

How to limit the number of jobs on a host using Sungrid?

I am using Sungrid 6.2u5. I am trying to submit some jobs on 4 hosts; I need to run 50 jobs using all 4 hosts, but I want to tell SGE that only 5 jobs may run on the 4th host at any given time. How do I do that?
I am new to SunGrid. Could anyone please point me to the SGE basics, i.e. where do I get started?
I found this online,
Beginner's Guide to Sun Grid Engine 6.2 by Daniel Templeton
but apparently it is intended for system administrators, while I am just a normal user who is trying to understand the SGE features.
Thanks,
If you should not run more than 5 jobs on the 4th node (let's call it computer04), it is probably not capable of running more than that. In general, you are encouraged to specify your jobs' resource requirements properly, to prevent overloaded cores and out-of-memory situations.
If computer04 has 20 GB of memory in total and each of your jobs uses 5 GB, you can limit every job to 5 GB of memory usage:
qsub -l vmem=5G my_work
The same holds for disk usage:
qsub -l fsize=10G my_work
I found it is possible to run a job on a specific host with the -l h= option:
qsub -l h=computer04 -l vmem=5G my_work
for the 5 jobs. Then use
qsub -l vmem=5G my_work
for the other 45 jobs.
(A dirtier way)
You could do the same without the memory/disk restrictions:
qsub -l h=computer04 my_work # 5 jobs
qsub -l h="!computer04" my_work # for the other 45 jobs
If you have different queues or resources, you can use them for different jobs. E.g., if you have a queue_4 that runs everything on computer04 and a queue_main that is linked to the other computers, then you run
qsub -q queue_4 my_work
for the 5 jobs, and
qsub -q queue_main my_work
for the other jobs.
UPD on comment:
It is possible to make SGE refuse more than X jobs per user/host. This has to be done by the queue administrator:
qconf -arqs
{
   name         max_jobs_per_computer04
   description  "maximal number of jobs for user1 on computer04 restricted to 5"
   enabled      TRUE
   limit        users user1 hosts computer04 to slots=5
}
If you want to restrict your user only when submitting jobs of a certain kind to computer04, you need to define a complex parameter, as shown here.
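Once such a resource quota set is in place, the affected user can check what applies to them with qquota; the -u filter below is just to narrow the output to user1:
qquota -u user1    # lists the resource quotas currently limiting user1's jobs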

Programmatically Create+Mount Disk From Within Google Compute VM

I'd like to write a script that can be run from a Google Compute instance, which creates a disk and mounts it. The disks I've created and mounted so far have been done through the web console. The problem I'm having is figuring out the parameters for safe_format_and_mount (and possibly a step before that).
From within the instance, here is my attempt so far:
ami#snowflake:~$ gcloud compute disks create foo --zone europe-west1-c
Created [https://www.googleapis.com/compute/v1/projects/snowflake-1056/zones/europe-west1-c/disks/foo].
NAME ZONE SIZE_GB TYPE STATUS
foo europe-west1-c 500 pd-standard READY
ami#snowflake:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 10G 0 disk
`-sda1 8:1 0 10G 0 part /
sdb 8:16 0 500G 0 disk /mnt/work
sdc 8:32 0 2T 0 disk /mnt/data1
The docs for safe_format_and_mount now give this sequence:
$ sudo mkdir MOUNT_POINT
$ sudo /usr/share/google/safe_format_and_mount -m "mkfs.ext4 -F" DISK_LOCATION MOUNT_POINT
However, I have no idea what DISK_LOCATION is, nor does lsblk's output give anything that seems pertinent.
Don't forget that you need to attach the disk to your instance before you can use it:
gcloud compute instances attach-disk myinstance --disk foo \
--zone europe-west1-c --device-name foo
The --device-name option allows you to specify the device name the guest operating system will see. If you use the same name as the disk name, the disk location will be /dev/disk/by-id/google-foo.
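Putting that together with the sequence quoted from the docs above, the remaining steps would look roughly like this; /mnt/foo is just an example mount point, adjust it to your layout:
sudo mkdir -p /mnt/foo    # example mount point
sudo /usr/share/google/safe_format_and_mount -m "mkfs.ext4 -F" /dev/disk/by-id/google-foo /mnt/foo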

Hot reconfiguration of HAProxy still leads to failed requests, any suggestions?

I found that there are still failed requests when the traffic is high and I use a command like this
haproxy -f /etc/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)
to hot-reload the updated config file.
Below is the load-testing result using webbench:
/usr/local/bin/webbench -c 10 -t 30 targetHProxyIP:1080
Webbench – Simple Web Benchmark 1.5
Copyright (c) Radim Kolar 1997-2004, GPL Open Source Software.
Benchmarking: GET targetHProxyIP:1080
10 clients, running 30 sec.
Speed=70586 pages/min, 13372974 bytes/sec.
**Requests: 35289 susceed, 4 failed.**
I ran the command
haproxy -f /etc/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)
several times during the load test.
In the haproxy documentation, it mentioned
They will receive the SIGTTOU signal to ask them to temporarily stop listening to the ports so that the new process can grab them
So there is a time window during which the old process is no longer listening on the port (say 80) and the new process has not yet started listening on it, and during that window new connections will fail. Does that make sense?
So is there any approach to reloading the HAProxy configuration that impacts neither existing connections nor new connections?
On recent kernels where SO_REUSEPORT is finally implemented (3.9+), this dead period does not exist anymore. While a patch has been available for older kernels for something like 10 years, it's obvious that many users cannot patch their kernels. If your system is recent enough, the new process will succeed in its bind() attempt before asking the previous one to release the port, so there is a period where both processes are bound to the port instead of no process at all.
There is still a very tiny possibility that a connection arrives in the leaving process's queue at the moment it closes it. There is no reliable way to stop this from happening, though.
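Independently of the kernel, it also helps to validate the new configuration before handing over, so a typo never takes the listener down; a minimal sketch using the paths from the question (-c only checks the configuration and exits):
# validate first; the reload only runs if the config check succeeds
haproxy -c -f /etc/haproxy.cfg && \
  haproxy -f /etc/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)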

Monit to monitor 2 searchd instances on one server

I have 2 Rails apps hosted on the same server, and each of them has its own thinking_sphinx/searchd configuration with a different port. I managed to get this setup working and I have 2 instances of searchd running.
My problem is getting Monit to monitor these 2 instances. Even though the 2 instances of searchd have their own PID files in separate directories, I was not able to define the configuration in monitrc because the process names in this case are the same, namely searchd.
In my monitrc, I have 2 separate checks as follows:
check process searchd with pidfile /var/www/app1/shared/pids/production.sphinx.pid
start program=....
stop program=....
check process searchd with pidfile /var/www/app2/shared/pids/production.sphinx.pid
start program=...
stop program=...
Monit requires a unique process name. Is it possible to start up my second instance of searchd using a different process name?
Thanks for the help.
You can call the process whatever you wish in monit configuration files - it doesn't need to match the executable. So:
check process searchd_app1 with pidfile /var/www/app1/shared/pids/production.sphinx.pid
start program=....
stop program=....
check process searchd_app2 with pidfile /var/www/app2/shared/pids/production.sphinx.pid
start program=...
stop program=...
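For completeness, a filled-in sketch of one of these entries; the searchd binary path and configuration file location are assumptions and need to match your Thinking Sphinx setup:
check process searchd_app1 with pidfile /var/www/app1/shared/pids/production.sphinx.pid
  # paths below are examples only; point them at your actual searchd binary and sphinx config
  start program = "/usr/local/bin/searchd --config /var/www/app1/shared/config/production.sphinx.conf"
  stop program  = "/usr/local/bin/searchd --config /var/www/app1/shared/config/production.sphinx.conf --stopwait"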

libvirt cpuset is not able to set affinity

I have been trying to set CPU affinity for a VM. I edited the VM's XML file at /etc/libvirt/qemu/$VM.xml and added a cpuset attribute. I have 4 cores and I set cpuset='1,3'. But when I ran virsh vcpuinfo $VM, it showed that my VM's vCPUs are still attached to pCPUs 0 and 2. What am I doing wrong?
Would you mind pasting the relevant elements of your domain XML? You may refer to [CPU Allocation] to compare.
A handy tool is the command taskset -p <your qemu process id>, which shows the CPU affinity on the KVM hypervisor.
BTW: you need qemu v0.8.5+ to get this feature.
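For comparison, the pinning is expressed on the <vcpu> element of the domain XML; a sketch assuming a guest with 2 vCPUs (the vCPU count is just an example):
<vcpu placement='static' cpuset='1,3'>2</vcpu>  <!-- 2 vCPUs, each restricted to host CPUs 1 and 3 -->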
Editing /etc/libvirt/qemu/$VM.xml behind libvirt's back is not what you should do, and neither is setting the affinity outside of libvirt; in that case libvirt does not know about the settings.
The right thing to do is to use 'virsh edit $VM', set what you want, and stop and start the domain. You can also use virsh to pin each vCPU to particular host CPU(s):
for i in $(seq 0 $((X - 1))); do # X is the number of vCPUs; vCPU indices start at 0
    virsh vcpupin $VM $i 1,3
done
virsh emulatorpin $VM 1,3
or
virsh numatune $VM --nodeset 1,3 # to pin the domain's memory to particular NUMA nodes
You can use '--config' and '--live' to apply the setting to the configuration or to the live domain, respectively. For further options see the manual for the virsh command (man virsh).
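To confirm the result afterwards, the commands from the question can be reused; vcpupin without a CPU list only reports the current pinning:
virsh vcpuinfo $VM   # shows which pCPU each vCPU currently runs on
virsh vcpupin $VM    # lists the configured CPU affinity per vCPU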