GPU device ID mismatch while calling from Keras [duplicate] - cuda

Assume a single node has several devices with different compute capabilities. How does NVIDIA rank them (by rank I mean the number assigned for use with cudaSetDevice)?
Are there any general guidelines about this? Thanks.

I believe the ordering of devices corresponding to cudaGetDevice and cudaSetDevice (i.e. the CUDA runtime enumeration order) is either based on a heuristic that determines the fastest device and makes it first, or else based on PCI enumeration order. You can confirm this using the deviceQuery sample, which prints the properties of devices (including PCI ID) in the order they are enumerated for cudaSetDevice.
However, I would recommend not basing any decisions on this. There's nothing magical about PCI enumeration order, and even something like a system BIOS upgrade can change the device enumeration order (as can swapping devices, moving to another system, etc.).
It's usually best to query devices (see the deviceQuery sample) and then make decisions based on the specific devices returned and/or their properties. You can also use cudaChooseDevice to select a device heuristically.
In CUDA 8 you can cause the CUDA runtime to choose either "fastest first" or PCI enumeration order, based on the setting (or absence) of an environment variable.
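For example, a minimal sketch assuming CUDA 8 or newer on Linux (the variable in question is CUDA_DEVICE_ORDER):

CUDA_DEVICE_ORDER=PCI_BUS_ID ./my_app        # enumerate devices in PCI bus order
CUDA_DEVICE_ORDER=FASTEST_FIRST ./my_app     # default: the heuristically fastest device becomes device 0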

Related

How do I make sure Vulkan is using the same GPU as CUDA?

I'm using an application that uses both Vulkan and CUDA (specifically PyTorch) on an HPC cluster (Univa Grid Engine).
When a job is submitted, the cluster scheduler sets an environment variable SGE_HGR_gpu which contains the GPU ID the job should use (so other jobs run by other users do not use the same GPU).
The typical way to tell an application that uses CUDA to use a specific GPU is to set CUDA_VISIBLE_DEVICES=n.
As I'm also using Vulkan, I don't know how to make sure that I choose the same device from those listed by vkEnumeratePhysicalDevices.
I think that the order of the values that 'n' can take is the same as the order of the devices on the PCI bus, but I don't know whether the devices returned by vkEnumeratePhysicalDevices are in that order, and the documentation does not specify what this order is.
So how can I go about making sure I'm choosing the same physical GPU for both Vulkan and CUDA?
With VkPhysicalDeviceIDPropertiesKHR (Vulkan 1.1) or VkPhysicalDeviceVulkan11Properties (Vulkan 1.2) you can get the device UUID, which is one of the formats CUDA_VISIBLE_DEVICES seems to accept. You should also be able to convert an index to a UUID (or vice versa) with nvidia-smi -L (or with the NVML library).
Or the other way around: cudaDeviceProp includes PCI info, which could be compared against the output of the VK_EXT_pci_bus_info extension.
As for whether the order matches in Vulkan, that is best asked of NVIDIA directly; I cannot find information on how NV orders them. IIRC from the Vulkan Loader implementation, the order should match the order from the registry, and then the order the NV driver itself gives them. Even so, you would have to filter non-NV GPUs from the list in generic code, and you do not know whether the NV Vulkan ICD implementation matches CUDA without asking NV.
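A minimal sketch of the UUID-matching approach (my assumptions: Vulkan 1.1 or newer, CUDA 10 or newer for cudaDeviceProp::uuid, error handling omitted):

#include <cstdio>
#include <cstring>
#include <vector>
#include <vulkan/vulkan.h>
#include <cuda_runtime.h>

int main() {
    // Create a minimal Vulkan instance targeting Vulkan 1.1.
    VkApplicationInfo app{};
    app.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    app.apiVersion = VK_API_VERSION_1_1;
    VkInstanceCreateInfo ici{};
    ici.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    ici.pApplicationInfo = &app;
    VkInstance instance;
    vkCreateInstance(&ici, nullptr, &instance);

    uint32_t vkCount = 0;
    vkEnumeratePhysicalDevices(instance, &vkCount, nullptr);
    std::vector<VkPhysicalDevice> phys(vkCount);
    vkEnumeratePhysicalDevices(instance, &vkCount, phys.data());

    int cudaCount = 0;
    cudaGetDeviceCount(&cudaCount);

    for (uint32_t v = 0; v < vkCount; ++v) {
        // Query the Vulkan device UUID (core in Vulkan 1.1).
        VkPhysicalDeviceIDProperties idProps{};
        idProps.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES;
        VkPhysicalDeviceProperties2 props2{};
        props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
        props2.pNext = &idProps;
        vkGetPhysicalDeviceProperties2(phys[v], &props2);

        // Compare against each CUDA device's UUID.
        for (int c = 0; c < cudaCount; ++c) {
            cudaDeviceProp prop{};
            cudaGetDeviceProperties(&prop, c);
            if (std::memcmp(prop.uuid.bytes, idProps.deviceUUID, VK_UUID_SIZE) == 0)
                std::printf("Vulkan device %u matches CUDA device %d (%s)\n", v, c, prop.name);
        }
    }
    vkDestroyInstance(instance, nullptr);
    return 0;
}

(Link against the Vulkan loader and the CUDA runtime, e.g. -lvulkan -lcudart.)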

CUDA GPU selected by position, but how to set default to be something other than device 0?

I've recently installed a second GPU (Tesla K40) on my machine at home, and my searches have suggested that the first PCI slot becomes the default GPU chosen for CUDA jobs. A great link explaining it can be found here:
Default GPU Assignment
My original GPU is a TITAN X, also CUDA enabled, but it's really best for single precision calculations and the Tesla is better for double precision. My question for the group is whether there is a way to set up my default CUDA programming device to always be the second one. Obviously I can specify in the code each time which device to use, but I'm hoping that I can configure my setup such that it will always default to using the Tesla card.
Or is the only way to open the box up and physically swap positions of the devices? Somehow that seems wrong to me....
Any advice or relevant links to follow up on would be greatly appreciated.
As you've already pointed out, the CUDA runtime has its own heuristic for ordering GPUs and assigning device indices to them.
The CUDA_VISIBLE_DEVICES environment variable will allow you to modify this ordering.
For example, suppose that in ordinary use, my display device is enumerated as device 0, and my preferred CUDA GPU is enumerated as device 1. Applications written without any use of cudaSetDevice, for example, will default to using the device enumerated as 0. If I want to change this, under Linux I could use something like:
CUDA_VISIBLE_DEVICES="1" ./my_app
to cause the CUDA runtime to enumerate the device that would ordinarily be device 1 as device 0 for this application run (and the ordinary device 0 would be "hidden" from CUDA, in this case). You can make this "permanent" for the session simply by exporting that variable (e.g., in bash):
export CUDA_VISIBLE_DEVICES="1"
./my_app
If I simply wanted to reverse the default CUDA runtime ordering, but still make both GPUs available to the application, I could do something like:
CUDA_VISIBLE_DEVICES="1,0" ./deviceQuery
There are other specification options, such as using GPU UUID identifiers (instead of device indices) as provided by nvidia-smi.
Refer to the documentation or this writeup as well.
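For example, a hedged sketch of UUID-based selection (the UUIDs below are placeholders; use the values nvidia-smi -L prints on your own system):

nvidia-smi -L
# GPU 0: GeForce GTX TITAN X (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
# GPU 1: Tesla K40c (UUID: GPU-yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy)
CUDA_VISIBLE_DEVICES=GPU-yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy ./my_app

Selecting by UUID has the advantage of being stable across enumeration-order changes (BIOS upgrades, hardware swaps, etc.).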

A single program appears on two GPU cards

I have multiple GPU cards (no. 0, no. 1, ...), and every time I run a Caffe process on card no. 1 or 2 ... (anything except 0), it uses up 73 MiB on card no. 0.
For example, in the figure below, process 11899 uses 73 MiB on card no. 0 even though it actually runs on card no. 1.
Why? Can I disable this behavior?
The CUDA driver is like an operating system. It will reserve memory for various purposes when it is active. Certain features, such as managed memory, may cause substantial side-effect allocations to occur (although I don't think this is the case with Caffe). And it's even possible that the application itself is doing some explicit allocations on those devices, for some reason.
If you want to prevent this, one option is to use the CUDA_VISIBLE_DEVICES environment variable when you launch your process.
For example, if you want to prevent CUDA from doing anything with card "0", you could do something like this (on Linux):
CUDA_VISIBLE_DEVICES="1,2" ./my_application ...
Note that the enumeration used above (the CUDA enumeration) is the same enumeration that would be reported by the deviceQuery sample app, but not necessarily the same enumeration reported by nvidia-smi (the NVML enumeration). You may need to experiment or else run deviceQuery to determine which GPUs you want to use, and which you want to exclude.
Also note that using this option actually affects the devices that are visible to an application, and will cause a re-ordering of device enumeration (the device that was previously "1" will appear to be enumerated as device "0", for example). So if your application is multi-GPU aware, and you are selecting specific devices for use, you may need to change the specific devices you (or the application) are selecting, when you use this environment variable.
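A minimal sketch of such a check (assuming only the CUDA runtime; it prints each device in CUDA enumeration order together with its PCI address, which can be matched against the Bus-Id column in nvidia-smi):

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // PCI domain/bus/device let you map this CUDA index to the
        // enumeration used by nvidia-smi (NVML order).
        std::printf("CUDA device %d: %s (PCI %04x:%02x:%02x)\n",
                    i, prop.name, prop.pciDomainID, prop.pciBusID, prop.pciDeviceID);
    }
    return 0;
}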

How to choose device when running a CUDA executable?

I'm connecting to a GPU cluster from the outside and I have no idea how to select the device on which to run my CUDA programs.
I know there are two Tesla GPUs in the cluster, and I'd like to choose one of them.
Any ideas how? How do you choose the device you want to use when there are many connected to your computer?
The canonical way to select a device in the runtime API is using cudaSetDevice. That will configure the runtime to perform lazy context establishment on the nominated device. Prior to CUDA 4.0, this call didn't actually establish a context; it just told the runtime which GPU to try and use. Since CUDA 4.0, this call will establish a context on the nominated GPU at the time of calling. There is also cudaChooseDevice, which will select amongst available devices to find one which matches criteria supplied by the caller.
You can enumerate the available GPUs on a system with cudaGetDeviceCount, and retrieve their particulars using cudaGetDeviceProperties. The SDK deviceQuery example shows full details of how to do this.
You may need to be careful, however, about how you select GPUs in a multi-GPU system, depending on the host and driver configuration. With both the Linux driver and the Windows TCC driver, there is an option for GPUs to be marked "compute exclusive", meaning that the driver will limit each GPU to one active context at a time, or "compute prohibited", meaning that no CUDA program can establish a context on that device. If your code attempts to establish a context on a compute prohibited device, or on a compute exclusive device which is in use, the result will be an invalid device error. In a multi-GPU system where the policy is to use compute exclusivity, the correct approach is not to try and select a particular GPU, but simply to allow lazy context establishment to happen implicitly. The driver will automagically select a free GPU for your code to run on. The compute mode status of any device can be checked by reading the cudaDeviceProp.computeMode field using the cudaGetDeviceProperties call. Note that you are free to check unavailable or prohibited GPUs and query their properties, but any operation which would require context establishment will fail.
See the runtime API documentation for all of these calls.
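A minimal sketch of that flow (assumptions: runtime API, single host thread; the selection criterion here, most multiprocessors, is only an example, and any property-based check would work):

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);

    // Inspect each device and pick one explicitly...
    int best = 0, bestSM = -1;
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::printf("Device %d: %s, %d SMs, compute capability %d.%d\n",
                    i, prop.name, prop.multiProcessorCount, prop.major, prop.minor);
        if (prop.multiProcessorCount > bestSM) { best = i; bestSM = prop.multiProcessorCount; }
    }
    cudaSetDevice(best);  // context is established lazily on this device

    // ...or let the runtime choose a device that best matches desired properties.
    cudaDeviceProp wanted{};
    wanted.major = 3;     // e.g. prefer at least a compute capability 3.x device
    int chosen = 0;
    cudaChooseDevice(&chosen, &wanted);
    std::printf("cudaChooseDevice picked device %d\n", chosen);
    return 0;
}

Note that on a compute exclusive system you would skip the explicit cudaSetDevice call, as described above, and simply let the first CUDA call establish a context on whichever GPU the driver assigns.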
You can set the environment variable CUDA_VISIBLE_DEVICES to a comma-separated list of device IDs to make only those devices visible to an application. Use this either to mask out devices or to change the visibility order of devices so that the CUDA runtime enumerates them in a specific order.