How to prevent two CUDA programs from interfering

I've noticed that if two users try to run CUDA programs at the same time, it tends to lock up either the card or the driver (or both?). We need to either reset the card or reboot the machine to restore normal behavior.
Is there a way to get a lock on the GPU so other programs can't interfere while it's running?
Edit
OS is Ubuntu 11.10 running on a server. While there is no X Windows running, the card is used to display the text system console. There are multiple users.

If you are running on Linux, or on Windows with the TCC driver, you can put the GPU into compute-exclusive mode using the nvidia-smi utility.
Compute-exclusive mode makes the driver refuse a context-establishment request if another process already holds a context on that GPU. Any process trying to run on a busy compute-exclusive GPU will receive a "no device available" error and fail.
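For reference, here is a minimal sketch (not part of the original answer) of flipping GPU 0 into compute-exclusive mode by shelling out to nvidia-smi from Python. It needs root, and the exact flag spelling varies with driver generation (older nvidia-smi versions take a numeric mode such as "-c 3" instead of the symbolic name), so check nvidia-smi -h for your driver first.

import subprocess

# Ask the driver to allow only one compute context on GPU 0 at a time.
# EXCLUSIVE_PROCESS is the symbolic name on recent drivers; older ones
# use numeric modes (3 = exclusive process).
subprocess.run(
    ["nvidia-smi", "-i", "0", "-c", "EXCLUSIVE_PROCESS"],
    check=True,
)

Once this is set, a second process that tries to establish a context on the busy GPU typically fails right away with an "all CUDA-capable devices are busy or unavailable" error, which is the behavior described above.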

You can use something like Task Spooler to queue the programs and run them one at a time, as sketched below.
We use TORQUE Resource Manager, but it's harder to configure than ts. With TORQUE you can have multiple queues (e.g. one for CUDA jobs, two for CPU jobs) and assign a different job to each GPU.
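As a hedged illustration of the Task Spooler approach (the ts command, packaged as tsp on some distributions), the snippet below sets a single execution slot and enqueues a job. The program name is a placeholder, and the flags are worth double-checking against your ts man page.

import subprocess

subprocess.run(["ts", "-S", "1"], check=True)                           # one job runs at a time
subprocess.run(["ts", "./my_cuda_program", "--some-arg"], check=True)   # enqueue a job (placeholder name)
subprocess.run(["ts", "-l"], check=True)                                # show the queue

Queued jobs then start only when the previous one finishes, so two users' CUDA programs never touch the GPU at the same time.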

Related

Resizing cloud VM disk without taking instance down (Google cloud)

So I saw there is an option in Google Compute (I assume the same option exists with other cloud VM suppliers, so the question isn't specifically about Google Compute but about the underlying technology) to resize the disk without having to restart the machine, and I ask: how is this possible?
Even if it uses some sort of abstraction for the disk and they don't actually assign a physical disk to the VM, but just part of a disk (or parts of a number of disks), once the disk is created in the guest VM it has a certain size. How can that change without needing a restart? Does it utilize NFS somehow?
This is built directly into disk protocols these days. This capability has existed for a while, since disks have been virtualized since the late 1990s (either through network protocols like iSCSI / FibreChannel, or through a software-emulated version of hardware like VMware).
Like the VMware model, GCE doesn't require any additional network hops or protocols to do this; the hypervisor just exposes the virtual disk as if it is a physical device, and the guest knows that its size can change and handles that. GCE uses a virtualization-specific driver type for its disks called VirtIO SCSI, but this feature is implemented in many other driver types (across many OSes) as well.
Since a disk can be resized at any time, disk protocols need a way to tell the guest that an update has occurred. In general terms, this works as follows in most protocols:
1. The administrator resizes the disk from the hypervisor UI (or whatever storage virtualization UI they're using).
2. Nothing happens inside the guest until it issues an IO to the disk.
3. The guest OS issues an IO command to the disk, via the device driver in the guest OS.
4. The hypervisor emulates that IO command, notices that the disk has been resized and the guest hasn't been alerted yet, and returns a response to the guest telling it to update its view of the device.
5. The guest OS recognizes this response and re-queries the device size and other details via some other command.
I'm not 100% sure, but I believe the reason it's structured like this is that traditionally disks cannot send updates to the OS unless the OS requests them first. This is probably because the disk has no way to know what memory is free to write to, and even if it did, no way to synchronize access to that memory with the OS. However, those constraints are becoming less true to enable ultra-high-throughput / ultra-low-latency SSDs and NVRAM, so new disk protocols such as NVMe may do this slightly differently (I don't know).
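To make step 5 concrete, here is a rough guest-side sketch (my own, not from the explanation above) of manually asking Linux to re-read the capacity of a resized SCSI/virtio-scsi disk by poking its sysfs rescan attribute, which is useful when the guest hasn't noticed the resize yet. The device name "sda" is an assumption, it needs root, and the filesystem on top still needs its own resize (e.g. growpart plus resize2fs) afterwards.

from pathlib import Path

device = "sda"  # assumed device name; adjust for your guest
rescan = Path(f"/sys/class/block/{device}/device/rescan")
rescan.write_text("1")  # tells the SCSI layer to re-query the device size
print(f"Requested capacity rescan for /dev/{device}")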

Why do my google cloud compute instances always unexpectedly restart?

Help! Help! Help!
It is really annoying and I almost cannot bear it anymore! I'm using Google Cloud Compute Engine instances, but they often restart unexpectedly without any advance notification. The restarts seem to happen randomly and I have no idea what's going wrong. I'm pretty sure the instances are busy (CPU usage > 50% and all GPUs in use) when a restart happens. Could anyone please tell me how to solve this problem? Thanks in advance!
The issue is right here:
all GPUs are in use
If you check the official documentation about GPUs:
GPU instances must terminate for host maintenance events, but can automatically restart. These maintenance events typically occur once per week, but can occur more frequently when necessary. You must configure your workloads to handle these maintenance events cleanly. Specifically, long-running workloads like machine learning and high-performance computing (HPC) must handle the interruption of host maintenance events. Learn how to handle host maintenance events on instances with GPUs.
This is because an instance that has a GPU attached cannot be live-migrated to another host for maintenance, as happens for the rest of the virtual machines. To give you a physical GPU attached to the instance with bare-metal performance, GCE uses GPU passthrough, which sadly means that if the host has to go down for maintenance, the VM goes down with it.
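If it helps, here is a minimal sketch of watching for pending host maintenance from inside the instance by polling GCE's documented maintenance-event metadata entry; the checkpoint hook is hypothetical, and you should treat any value other than "NONE" as the cue to save state.

import time
import urllib.request

URL = ("http://metadata.google.internal/computeMetadata/v1/"
       "instance/maintenance-event")

def maintenance_state():
    # The Metadata-Flavor header is required by the GCE metadata server.
    req = urllib.request.Request(URL, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read().decode().strip()

while True:
    if maintenance_state() != "NONE":
        # Hypothetical hook: save checkpoints, drain queues, exit cleanly.
        print("Host maintenance pending; checkpoint and prepare for restart")
        break
    time.sleep(10)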
This sounds like a Preemptible VM instance.
Preemptible instances function like normal instances, but have the following limitations:
Compute Engine might terminate preemptible instances at any time due to system events. The probability that Compute Engine will terminate a preemptible instance for a system event is generally low, but might vary from day to day and from zone to zone depending on current conditions.
Compute Engine always terminates preemptible instances after they run for 24 hours.
To check if your instance is preemptible using the gcloud CLI, just run:
gcloud compute instances describe instance-name --format="(scheduling.preemptible)"
Result
scheduling:
preemptible: false
change "instance-name" to real name.
Or simply via UI, click on compute instance and scroll down:
To check for system operations performed on your instance, you can review it using following command:
gcloud compute operations list
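As a hedged follow-up, you can narrow that operations list down to preemption events. The filter value below is the one GCE's documentation uses for preempted instances, but verify it against your gcloud version; the snippet just shells out to the CLI and assumes it is installed and authenticated.

import subprocess

subprocess.run([
    "gcloud", "compute", "operations", "list",
    "--filter=operationType=compute.instances.preempted",
], check=True)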

PM2 cluster mode: what is the ideal number of workers?

I'm using PM2 to run my Node.js application.
When starting it in cluster mode with "pm2 start server -i 0", PM2 automatically spawns as many workers as you have CPU cores.
What is the ideal number of workers to run and why?
Beware of the context switch
When running multiple processes on your machine, try to make sure each CPU core will be kept busy by a single application thread at a time. As a general rule, you should look to spawn N-1 application processes, where N is the number of available CPU cores. That way, each process is guaranteed to get a good slice of one core, and there's one spare for the kernel scheduler to run other server tasks on. Additionally, try to make sure the server will be running little or no work other than your Node.js application, so processes don't fight for CPU.
We made a mistake where we deployed two busy node.js applications to our servers, both apps spawning N-1 processes each. The applications’ processes started vehemently competing for CPU, resulting in CPU load and usage increasing dramatically. Even though we were running these on beefy 8-core servers, we were paying a noticeable penalty due to context switching. Context switching is the behaviour whereby the CPU suspends one task in order to work on another. When context switching, the kernel must suspend all state for one process while it loads and executes state for another. After simply reducing the number of processes the applications spawned such that they each shared an equal number of cores, load dropped significantly:
https://engineering.gosquared.com/optimising-nginx-node-js-and-networking-for-heavy-workloads
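As a small illustration of the N-1 rule quoted above (my own sketch, not from the linked article), the snippet below computes one worker fewer than the number of CPU cores and hands it to PM2, reusing the "pm2 start server" invocation from the question; pm2 is assumed to be on PATH.

import os
import subprocess

# Leave one core free for the kernel scheduler and other server tasks.
workers = max(1, (os.cpu_count() or 1) - 1)
subprocess.run(["pm2", "start", "server", "-i", str(workers)], check=True)

The same arithmetic is why running two busy apps that each spawn N-1 workers backfires: together they oversubscribe the cores and pay for it in context switches.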

KVM/QEMU maximum VM count limit

For a research project I am trying to boot as many VMs as possible, using the python libvirt bindings, in KVM under Ubuntu Server 12.04. All the VMs are set to idle after boot and to use a minimal amount of memory. At most I was able to boot 1000 VMs on a single host, at which point the kernel (Linux 3.x) became unresponsive, even though both CPU and memory usage were nowhere near the limits (48-core AMD, 128 GB RAM). Before that, the booting process became successively slower after a couple of hundred VMs.
I assume this must be related to the KVM/QEMU driver, as the Linux kernel itself should have no problem handling that number of processes. However, I did read that the QEMU driver is now multi-threaded. Any ideas what the cause of this slowness may be - or at least where I should start looking?
You are booting all the VMs using qemu-kvm, right, and after hundreds of VMs you see them becoming successively slower. When the slowdown starts, stop using KVM and boot the remaining VMs with plain QEMU; I expect you will see the same slowness. My guess is that after that many VMs, the KVM hardware support is exhausted, because KVM is essentially a thin software layer over a few added hardware registers. So KVM might be the culprit here.
Also, what is the purpose of this experiment?
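One way to start looking, using the python libvirt bindings the question already mentions: time each domain creation so you can see exactly where the slowdown sets in, and compare the curve with <domain type='kvm'> against <domain type='qemu'> to test the KVM hypothesis above. This is only a sketch; make_domain_xml is a hypothetical helper that returns your minimal domain definition.

import time
import libvirt

conn = libvirt.open("qemu:///system")
creation_times = []

for i in range(1000):
    xml = make_domain_xml(i)       # hypothetical helper, returns domain XML
    start = time.time()
    dom = conn.createXML(xml, 0)   # transient domain, starts immediately
    creation_times.append(time.time() - start)
    print(f"{dom.name()} created in {creation_times[-1]:.2f}s")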
The following virtual hardware limits for guests have been tested. We ensure host and VMs install and work successfully, even when reaching the limits and there are no major performance regressions (CPU, memory, disk, network) since the last release (SUSE Linux Enterprise Server 11 SP1).
Max. Guest RAM Size --- 512 GB
Max. Virtual CPUs per Guest --- 64
Max. Virtual Network Devices per Guest --- 8
Max. Block Devices per Guest --- 4 emulated (IDE), 20 para-virtual (using virtio-blk)
Max. Number of VM Guests per VM Host Server --- Limit is defined as the total number of virtual CPUs in all guests being no greater than 8 times the number of CPU cores in the host
On the 48-core host from the question, that last rule would cap the total at 8 × 48 = 384 guest virtual CPUs. For more limitations of KVM, please refer to this document link.

GPUDirect RDMA transfer from GPU to remote host

Scenario:
I have two machines, a client and a server, connected with Infiniband. The server machine has an NVIDIA Fermi GPU, but the client machine has no GPU. I have an application running on the GPU machine that uses the GPU for some calculations. The result data on the GPU is never used by the server machine, but is instead sent directly to the client machine without any processing. Right now I'm doing a cudaMemcpy to get the data from the GPU to the server's system memory, then sending it off to the client over a socket. I'm using SDP to enable RDMA for this communication.
Question:
Is it possible for me to take advantage of NVIDIA's GPUDirect technology to get rid of the cudaMemcpy call in this situation? I believe I have the GPUDirect drivers correctly installed, but I don't know how to initiate the data transfer without first copying it to the host.
My guess is that it isn't possible to use SDP in conjunction with GPUDirect, but is there some other way to initiate an RDMA data transfer from the server machine's GPU to the client machine?
Bonus: If someone has a simple way to test whether I have the GPUDirect dependencies correctly installed, that would be helpful as well!
Yes, it is possible with supporting networking hardware. See the GPUDirect RDMA documentation.