I've eliminated core dumps in my C shell using the command "limit core 0". I've noticed that when I qsub jobs, that setting is not inherited even when "-V" is passed. Is there an option to inherit shell limits?
No. With batch schedulers like Grid Engine, resource limits are generally managed by the scheduler rather than inherited from the submitting shell; -V only propagates environment variables, and shell limits are not environment variables. You could, however, send the job to a queue with h_core set to 0.
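As a sketch (the queue name is hypothetical and must already exist), the queue's hard core-file-size limit can be set with qconf:

```shell
# Set the hard core file size limit to 0 for the (hypothetical) queue nocore.q
qconf -mattr queue h_core 0 nocore.q

# Jobs submitted to that queue then run with the equivalent of "limit core 0"
qsub -q nocore.q myjob.sh
```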
Related
I'm currently setting up Grid Engine on Ubuntu 16.04, using the Sun Grid Engine packages.
Most of the features I want to use are working. However, I'm struggling with the following problem:
I have a 32-core machine (64 threads).
I'm running jobs which use software like Matlab.
These software packages can use multiple threads for their calculations.
Current situation:
The queue has 2 slots, and Processors is set to 1.
I submit one job, and all 64 threads are used for the calculation.
I submit a second job, and both jobs run in parallel.
So for runtime tests, I cannot control the number of cores used.
I also tried setting up a parallel environment (connected to that queue), but even if I run a job there, all cores are used.
I guess I have a general understanding problem.
Does anybody know, or have an idea, how it would be possible to set up something like this:
a) each slot can only use one core (then the parallel environment would allow me to specify the slots/cores of a job), or
b) restrict the cores of a submitted job?
It is also important that this is not only an upper bound but also a lower bound. But that could be handled by the number of slots, I guess.
Thanks already in advance for any ideas.
You can't (easily) control the number of threads a process can spawn but, using a recent Grid Engine, you can control the number of cores it can access. If your Grid Engine is recent, check out the -binding parameter of qsub and the USE_CGROUPS option in sge_conf. If you have an older Grid Engine, then you could try playing tricks with the starter_method.
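For example (script name hypothetical; -binding is available in newer Grid Engine versions, and core binding is only a hint unless cgroup enforcement is enabled):

```shell
# Request a linear binding of 4 cores; the execution daemon restricts
# the job (and all its threads) to those cores:
qsub -binding linear:4 myjob.sh
```

With USE_CGROUPS enabled in sge_conf (edited via qconf -mconf), the binding is enforced through Linux cgroups on the execution host rather than being a scheduling hint.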
I'm trying to validate that our company's code works when NServiceBus v4.3 uses the MaximumConcurrencyLevel value set up in the config.
The problem is, when I try to process 12k+ queued entries, I cannot tell any difference in times between the five different maximum concurrency levels I tried. I set it to 1 and I can process the queue in 8 minutes; then I set it to 2 and I get 9 minutes, which seemed interesting (I was expecting more, but it was at least a change), but with 3, 4, and 5 the timings stay at around 8 minutes. I was expecting much better throughput.
My question is, how can I verify that NServiceBus is actually indeed using five threads to process entries on the queue?
PS: I've tried setting MaximumConcurrencyLevel="1" and MaximumMessageThroughputPerSecond, along with logging Thread.CurrentThread.ManagedThreadId, thinking/hoping I would ONLY see one thread ID value, but I'm seeing quite a few different ones, which surprised me. My plan was to see one, then bump the maximum concurrency level to 5 and hopefully see five different values.
What am I missing? Thank you in advance.
There can be multiple reasons why you don't see faster processing times when increasing the concurrency setting. The setting is described on the official documentation page: http://docs.particular.net/nservicebus/operations/tuning
You mentioned you're using MaximumMessageThroughputPerSecond, which will negate any performance gains from parallel message processing if a low value has been configured. Try removing this setting if possible.
Maybe you're accessing a resource in your handlers which doesn't support, or isn't optimized for, parallel access.
NServiceBus internally schedules the processing logic on the thread pool. This means that even with a MaximumConcurrencyLevel of 1, you will most likely see a different thread processing each message, since there is no thread affinity. But the configuration values work as expected; if your queue contains 5 messages:
it will process these messages one by one if you configured MaximumConcurrencyLevel to 1
it will process all messages in parallel if you configured MaximumConcurrencyLevel to 5.
Depending on your handlers it can of course happen that the first message is already processed at the time the fifth message is read from the queue.
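For reference, in NServiceBus v4 these values live in the endpoint's app.config; a minimal sketch (the values shown are illustrative):

```xml
<configuration>
  <configSections>
    <section name="TransportConfig"
             type="NServiceBus.Config.TransportConfig, NServiceBus.Core" />
  </configSections>
  <!-- MaximumMessageThroughputPerSecond="0" disables throttling -->
  <TransportConfig MaximumConcurrencyLevel="5"
                   MaximumMessageThroughputPerSecond="0" />
</configuration>
```

Removing the throughput cap (or setting it to 0) is the easiest way to rule it out as the bottleneck while measuring the effect of MaximumConcurrencyLevel alone.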
When qsubing jobs on a StarCluster / SGE cluster, is there an easy way to ensure that each node receives at most one job at a time? I am having issues where multiple jobs end up on the same node leading to out of memory (OOM) issues.
I tried using -l cpu=8, but I think that checks the number of cores on the box itself, not the number of cores in use.
I also tried -l slots=8 but then I get:
Unable to run job: "job" denied: use parallel environments instead of requesting slots explicitly.
In your config file (.starcluster/config) add this section:
[plugin sge]
setup_class = starcluster.plugins.sge.SGEPlugin
slots_per_host = 1
This largely depends on how the cluster's resources are configured, i.e. memory limits, etc. However, one thing to try is to request a lot of memory for each job:
-l h_vmem=xxG
This has the side effect of excluding other jobs from running on the node, by virtue of the fact that most of the memory on that node has already been requested by a previously running job.
Just make sure the memory you request is not above the allowable limit for the node. You can see whether you are exceeding this limit by checking the output of qstat -j <jobid> for errors.
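For example, on nodes with 64 GB of RAM (a hypothetical figure — adjust to your hardware), requesting most of it effectively reserves the node:

```shell
# Request 60 GB of virtual memory; a second such job won't fit on the node
qsub -l h_vmem=60G myjob.sh

# Afterwards, inspect the job for scheduling/limit errors
qstat -j <jobid>
```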
I accomplished this by setting the number of slots on each of my nodes to 1 using:
qconf -aattr queue slots "[nodeXXX=1]" all.q
I have set up a small cluster (9 nodes) for computing in our lab. Currently I am using one node as the Slurm controller, i.e. it is not being used for computing.
I would like to use it too, but I do not want to allocate all of its CPUs; I would like to keep 2 CPUs free for scheduling and other controller-related tasks.
Is it possible to write something like that in slurm.conf:
NodeName=master NodeHostname=master CPUs=10 RealMemory=192000 TmpDisk=200000 State=UNKNOWN
NodeName=node0[1-8] NodeHostname=node0[1-8] CPUs=12 RealMemory=192000 TmpDisk=200000 State=UNKNOWN
PartitionName=mycluster Nodes=node0[1-8],master Default=YES MaxTime=INFINITE State=UP
Or would I break something? I do not want to test it without asking first, because the cluster is already in production and I am worried about breaking something... In the partition above, master is the hostname of my controller and node0[1-8] are my normal computing nodes. As you can see, not using master loses about 10% of the cluster's CPUs...
Thanks in advance
Actually, YES, it works.
I also added Weight=1 to the nodes and Weight=2 to master, so that the master gets used only when the other nodes are busy.
Cheers
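Putting it together, the relevant slurm.conf fragment would look like this (Slurm allocates lower-weight nodes first, so the master fills up last):

```
NodeName=master NodeHostname=master CPUs=10 RealMemory=192000 TmpDisk=200000 Weight=2 State=UNKNOWN
NodeName=node0[1-8] NodeHostname=node0[1-8] CPUs=12 RealMemory=192000 TmpDisk=200000 Weight=1 State=UNKNOWN
PartitionName=mycluster Nodes=node0[1-8],master Default=YES MaxTime=INFINITE State=UP
```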
I'm trying to run two procedures in parallel. As Tcl is an interpreter, it processes procedures one by one. Can someone explain, with an example, how I can use multi-threading in Tcl?
These days, the usual way to do multi-threading in Tcl is to use its Thread extension. It is developed alongside the Tcl core, but on certain platforms (such as various Linux-based OSes) you might need to install a separate package to make the extension available.
The threading model the Thread extension implements is "one thread per interpreter". This means each thread can "host" just one Tcl interpreter (and an unlimited number of its child interpreters), but no code executed by any thread may access interpreters hosted in other threads. This, in turn, means that when you work with threads in Tcl, you have to master the idea of multiple interpreters.
The classical approach to exchanging data between interpreters running in different threads is message passing: you post scripts to the input queue of the target interpreter running in a different thread and then wait for the reply. Thread-shared variables (shared memory with locking) are also available, as is support for thread pools.
Read the "Tcl and threads" wiki page and the Thread extension's manual pages; code examples are on the wiki.
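A minimal sketch of the message-passing style (assuming tclsh with the Thread extension installed):

```shell
tclsh <<'EOF'
package require Thread

# Create a worker thread hosting its own interpreter; thread::wait
# keeps it alive, servicing scripts posted to its event queue.
set worker [thread::create {
    proc work {n} { expr {$n * 2} }
    thread::wait
}]

# thread::send posts a script to the worker and (by default)
# waits synchronously for the result.
set result [thread::send $worker {work 21}]
puts "worker returned $result"

thread::release $worker
EOF
```

Note that the proc work lives only in the worker's interpreter; the main interpreter interacts with it exclusively by sending scripts.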
Please note that if the procedures you want to run in parallel are mostly I/O-bound (that is, they read something from the network and/or send something there) rather than CPU-bound (doing heavy computations), you might have better results with an event-based approach to processing: Tcl has built-in support for the event loop, and you can make Tcl execute your code when the next chunk of data can be read from a channel (such as a network socket) or written to one.