How to limit the number of jobs on a host using Sun Grid Engine?

I am using Sun Grid Engine 6.2u5 and I am trying to submit some jobs on 4 hosts. I need to run 50 jobs using all 4 hosts, but I want to tell SGE that only 5 jobs should run on the 4th host at any given time. How do I do that?
I am new to Sun Grid Engine. Could anyone please point me to the SGE basics? I mean, where do I get started?
I found this online,
Beginner's Guide to Sun Grid Engine 6.2 by Daniel Templeton
but apparently this is intended for system administrators. I am just a normal user who is trying to understand the SGE features.
Thanks,

If you should not run more than 5 jobs on the 4th node (let's call it computer04), it is probably not capable of running more. In general, you are encouraged to specify the amount of resources for your job properly, to prevent overloading the cores and running out of memory.
If you have 20 GB in total on computer04 and your job uses 5 GB, you can limit each of your jobs to 5 GB of memory usage:
qsub -l vmem=5G my_work
The same holds for disk space:
qsub -l fsize=10G my_work
I found it is possible to run a job on a specific host with the -l h= option:
qsub -l h=computer04 -l vmem=5G my_work
for 5 jobs. Then use
qsub -l vmem=5G my_work
for the other 45 jobs.
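Put together, a submission loop might look like this (a minimal sketch; my_work and the 5 GB request are just the examples from above, and h is the shortcut for the hostname resource):
# 5 jobs pinned to computer04
for i in $(seq 1 5); do
    qsub -l h=computer04 -l vmem=5G my_work
done
# the remaining 45 jobs, scheduled anywhere in the cluster
for i in $(seq 1 45); do
    qsub -l vmem=5G my_work
done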
(More dirty way)
You could do it without memory/disk restrictions:
qsub -l h=computer04 my_work # 5 jobs
qsub -l h='!computer04' my_work # for the other 45 jobs
If you have different queues or resources, you can use them for different jobs. E.g., if you have queue_4 that runs everything on computer04, and queue_main that is linked to the other computers, then you do
qsub -q queue_4 my_work
for 5 jobs, and
qsub -q queue_main my_work
for the other jobs.
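If you are not sure which queues exist on your cluster and which hosts they cover, you can check first (queue_4 and queue_main above are just example names):
qconf -sql        # list all cluster queues
qconf -sq queue_4 # show one queue's configuration, including its hostlist
qstat -g c        # per-queue summary of used/available slots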
UPD on comment:
It is possible to make SGE refuse more than X jobs for a given user/host. This has to be done by the queue administrator, as a resource quota set:
qconf -arqs
{
name max_jobs_per_computer04
description "maximal number of jobs for user1 on computer04 restricted to 5!"
enabled TRUE
limit users user1 hosts computer04 to slots=5
}
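Such a rule is added interactively with qconf -arqs (or from a file with qconf -Arqs). Once it is in place, you can inspect it and watch your current usage against it, for example:
qconf -srqs     # show the configured resource quota sets
qquota -u user1 # show user1's current usage against the quotas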
If you want to restrict your user only when submitting certain kinds of jobs for computer04, you need to define a custom complex parameter, as shown here.

Related

Programmatically check data transfer on IPFS

We are building a desktop app on Electron to share media on IPFS. We want to incentivize the people who, either by an IPFS add or a pin, make data available to other users and are in effect "seeding" the data. We want to track how much data is being sent and received by each user, programmatically and periodically.
Is there a standard pattern or a service to be able to do this?
TIA!
On the CLI you can use the ipfs stats bw -p <peer id> command to see the total bytes sent and received between your node and the peer id you pass in.
$ ipfs stats bw -p QmeMKDA6HbDD8Bwb4WoAQ7s9oKZTBpy55YFKG1RSHnBz6a
Bandwidth
TotalIn: 875 B
TotalOut: 14 kB
RateIn: 0 B/s
RateOut: 0 B/s
See: https://docs.ipfs.io/reference/api/cli/#ipfs-stats-bw
You can use the ipfs.stats.bw method to get the data programmatically from the JS implementation of IPFS, js-ipfs, or via the js-ipfs-http-client talking to the HTTP API of a locally running IPFS daemon.
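If you just want to script this against a locally running daemon without the JS client, you can also call the HTTP API directly (a sketch; the peer ID is the example one from above, and newer go-ipfs versions require POST for API calls):
curl -X POST "http://127.0.0.1:5001/api/v0/stats/bw?peer=QmeMKDA6HbDD8Bwb4WoAQ7s9oKZTBpy55YFKG1RSHnBz6a"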
ipfs.stats.bw will show all traffic between the two peers, which can include DHT queries and other traffic that isn't directly related to sharing blocks of data.
If you want info on just blocks of data shared then you can use ipfs bitswap ledger from the command line.
$ ipfs bitswap ledger QmeMKDA6HbDD8Bwb4WoAQ7s9oKZTBpy55YFKG1RSHnBz6a
Ledger for QmeMKDA6HbDD8Bwb4WoAQ7s9oKZTBpy55YFKG1RSHnBz6a
Debt ratio: 0.000000
Exchanges: 0
Bytes sent: 0
Bytes received: 0
See: https://docs.ipfs.io/reference/api/cli/#ipfs-bitswap-ledger
That API is not directly available in js-ipfs or the js-ipfs-http-client yet.
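The corresponding HTTP endpoint should still be reachable on a go-ipfs daemon, though, so as a workaround you could call it directly (again just a sketch, assuming the endpoint mirrors the CLI command as /api/v0/bitswap/ledger):
curl -X POST "http://127.0.0.1:5001/api/v0/bitswap/ledger?arg=QmeMKDA6HbDD8Bwb4WoAQ7s9oKZTBpy55YFKG1RSHnBz6a"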

Changing number of available slots in running SGE Instance

I would like to change the number of slots on a running SGE instance. The instance was started by StarCluster.
I tried following this page and running:
ubuntu@master: $ qconf -mattr exechost complex_values slots=1 master
ubuntu@master modified "master" in exechost list
but it does not look like anything changed:
ubuntu@master: $ qstat -f
queuename qtype resv/used/tot. load_avg arch states
---------------------------------------------------------------------------------
all.q#master BIP 0/1/8 1.04 linux-x64
Run
qconf -mq all.q
Then you will see a line
slots 1,[master=0],[node1366=2],[node1379=2]
If you want master to have 1 slot, change it to
[master=1]
Save and quit. It's as simple as that.
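Note that setting the slots complex on the exec host (as in the question) only adds a host-level limit; the total that qstat -f reports comes from the queue configuration, which is why the first attempt appeared to change nothing. If you prefer not to go through the editor, the same queue-instance attribute can also be changed non-interactively (a sketch, assuming the all.q queue and the master host from the question):
qconf -mattr queue slots 1 all.q@master # set 1 slot on the master's queue instance
qstat -f                                # verify the new resv/used/tot. column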

Hot reconfiguration of HAProxy still leads to failed requests, any suggestions?

I found there are still failed requests when the traffic is high, using a command like this
haproxy -f /etc/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)
to hot reload the updated config file.
Here is the pressure-testing result using webbench:
/usr/local/bin/webbench -c 10 -t 30 targetHProxyIP:1080
Webbench – Simple Web Benchmark 1.5
Copyright (c) Radim Kolar 1997-2004, GPL Open Source Software.
Benchmarking: GET targetHProxyIP:1080
10 clients, running 30 sec.
Speed=70586 pages/min, 13372974 bytes/sec.
Requests: 35289 susceed, 4 failed.
I run command
haproxy -f /etc/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)
several times during the pressure testing.
In the HAProxy documentation, it is mentioned that the old processes
"will receive the SIGTTOU signal to ask them to temporarily stop listening to the ports so that the new process can grab them"
so there is a time period during which the old process is not listening on the port (say 80) and the new process hasn't started listening on it yet, and during this specific time period new connections will fail. Does that make sense?
So is there any approach to reloading the HAProxy configuration that does not impact either existing connections or new connections?
On recent kernels where SO_REUSEPORT is finally implemented (3.9+), this dead period does not exist anymore. While a patch has been available for older kernels for something like 10 years, it's obvious that many users cannot patch their kernels. If your system is more recent, then the new process will succeed in its attempt to bind() before asking the previous one to release the port, so there is a period where both processes are bound to the port instead of neither.
There is still a very tiny possibility that a connection arrives in the leaving process's queue at the moment it closes it. There is no reliable way to stop this from happening, though.
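If you want to check whether the SO_REUSEPORT path applies to your setup, a quick kernel-version check is enough, and it never hurts to validate the new configuration before reloading (a sketch, using the paths from the question):
uname -r                       # must be 3.9 or newer for SO_REUSEPORT
haproxy -c -f /etc/haproxy.cfg # validate the config before reloading
haproxy -f /etc/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)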

Recording busy network traffic with tcpdump

I have set up a system on my Raspberry Pi to record some TCPDUMP data. This system works under a light workload, but for some unknown reason, doesn't work under my "heavy" traffic (27 relevant packets per second).
Under the last heavy-traffic run I tried to record, my monitor.log file had 35,200 rows, which only contained the last 16 minutes' worth of data (judging by the timestamps). My filter.log also only goes back 16 minutes. There should be something like 1 million rows.
Could anyone advise on how to find the possible bug, bottlenecks, dropped pipe data, etc.?
RC.LOCAL:
java -jar filter.jar > filter.log 2>&1 &
bash ./monitor &
MONITOR:
TCPDUMP -l | SED | tee monitor.log | tee myFIFO
You may try a utility like iptraf to monitor the traffic.
Try sed with its stream (unbuffered) option (-u on AIX, --unbuffered on GNU sed) so that sed does not wait for an EOF and instead flushes as it goes (like appending with >> to a file from a discontinuous stream).
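Putting the buffering fixes together, the monitor pipeline might look something like this (a sketch; the interface, capture filter, and sed expression are placeholders, -l keeps tcpdump line-buffered, and -u keeps GNU sed unbuffered):
tcpdump -l -i eth0 'tcp port 80' 2>/dev/null \
  | sed -u 's/pattern/replacement/' \
  | tee -a monitor.log > myFIFO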

libvirt cpuset is not able to set affinity

I have been trying to set CPU affinity for a VM. I edited the VM XML file present in /etc/libvirt/qemu/$VM.xml and added the cpuset attribute. I have 4 cores and I set cpuset='1,3'. But when I then ran virsh vcpuinfo $VM, it still showed that my VM's vCPUs are attached to pCPUs 0 and 2. What am I doing wrong?
Would you mind pasting the relevant elements of your domain XML? You may refer to [CPU Allocation] to compare.
A handy tool is the command taskset -p <your qemu process id>, which shows the CPU allocation on the KVM hypervisor.
BTW: you need qemu v0.8.5+ to get this feature.
Editing /etc/libvirt/qemu/$VM.xml behind libvirt's back is not what you should do, and neither is setting the affinity without libvirt; in either case libvirt doesn't know about the settings.
The right thing to do is to use 'virsh edit $VM', set what you want, and stop and start the domain. You can also use virsh to pin each vCPU to particular host CPU(s):
for i in $(seq 0 $((X-1))); do # X is the number of vCPUs, numbered from 0
    virsh vcpupin $VM $i 1,3
done
virsh emulatorpin $VM 1,3
or
virsh numatune $VM --nodeset 1,3 # to pin the memory to particular NUMA nodes
You can use '--config' and '--live' to set it in the config or for the live domain, respectively. For further options see the manual for the virsh command (man virsh).
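As a concrete sketch of the pinning approach (the domain name myvm and the vCPU count of 2 are assumptions; adjust them to your guest):
virsh vcpupin myvm 0 1,3 --live --config   # pin vCPU 0 to host CPUs 1 and 3
virsh vcpupin myvm 1 1,3 --live --config   # pin vCPU 1 to host CPUs 1 and 3
virsh emulatorpin myvm 1,3 --live --config # pin the emulator threads as well
virsh vcpuinfo myvm                        # verify the resulting CPU affinity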