MPI+CUDA mixed programming - Driver error

I'm using MPI+CUDA mixed mode to program a GPU cluster for matrix multiplication. When I offload the multiplication operations to the GPUs via MPI and CUDA, it gives an error message at run time:
FATAL: Error inserting nvidia (/lib/modules/3.2.0-23-generic-pae/kernel/drivers/video/nvidia.ko): No such device
MPI is used to transfer the data blocks; upon receiving the data, a plain C function is called that launches a CUDA kernel.
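Roughly, the worker side looks like this (a simplified sketch with hypothetical names; the real code partitions the matrices differently):

    /* Worker side of the pattern described above: receive a block over MPI,
       then hand it to a plain C wrapper that launches the CUDA kernel.
       Assumes MPI_Init() has already been called and rank 0 sends the data. */
    #include <stdlib.h>
    #include <mpi.h>

    /* defined in a separate .cu file compiled with nvcc; it copies the block
       to the GPU, launches the matrix-multiply kernel, and copies C back */
    extern void gpu_multiply_block(const float *A, const float *B, float *C,
                                   int rows, int n);

    void worker(int n, int rows)
    {
        float *A = malloc((size_t)rows * n * sizeof *A);
        float *B = malloc((size_t)n * n * sizeof *B);
        float *C = malloc((size_t)rows * n * sizeof *C);

        /* receive one row block of A and the whole of B from rank 0 */
        MPI_Recv(A, rows * n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(B, n * n, MPI_FLOAT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        gpu_multiply_block(A, B, C, rows, n);   /* offload to the local GPU */

        /* send the computed block back to rank 0 */
        MPI_Send(C, rows * n, MPI_FLOAT, 0, 2, MPI_COMM_WORLD);

        free(A); free(B); free(C);
    }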
The test setup is 3 machines, each with a single GPU.
I also tested a CUDA-only local version. That didn't give any error messages, but the results of the algorithms were wrong (even for small, simple cases).
What's the reason for this error?
Please note that this happens only when I try to use MPI with CUDA; the CUDA-only version works well. Thanks in advance.

The error is caused by Nouveau controlling the GPU instead of the NVIDIA driver. So, before installing the NVIDIA driver and the CUDA toolkit, nouveau should be blacklisted:
sudo nano /etc/modprobe.d/blacklist.conf
Add the line blacklist nouveau at the end of the file.
If the NVIDIA driver is already installed, re-install it after blacklisting nouveau.
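For reference, a typical blacklist entry looks like the following (a minimal sketch; the file location and the initramfs command assume Ubuntu and may differ on other distributions):

    # /etc/modprobe.d/blacklist.conf
    blacklist nouveau
    options nouveau modeset=0

After saving the file, rebuild the initramfs (sudo update-initramfs -u on Ubuntu) and reboot so that nouveau is no longer loaded, then re-install the NVIDIA driver.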

Related

Is there a way to compile CUDA programs on a machine that does not have an NVIDIA graphics card?

I tried to install the CUDA toolkit without the display driver on CentOS 6. It installed properly and I was able to compile, but the compiled programs run without performing any operation and I get garbage values in an array addition. cudaGetDeviceCount(&count) gives me a count of 0, which means I don't have any card on my machine.
You can install the CUDA toolkit without installing the driver.
You can then compile CUDA codes that use the runtime API.
You will not be able to run those codes unless you have a proper CUDA driver and GPU installed in the machine, however.
Codes that depend on the driver API will also fail to link in this configuration on older CUDA toolkits without additional work; newer CUDA toolkits provide stub versions of the driver libraries, which can be linked against.
This answer covers the method to install the CUDA toolkit without the driver.
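As a quick sanity check, something like the following minimal sketch makes the "no device" case explicit instead of silently producing garbage:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess) {
            /* with no driver or no GPU present this call fails (e.g. with
               cudaErrorInsufficientDriver or cudaErrorNoDevice) rather than
               returning a usable device count */
            printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("found %d CUDA device(s)\n", count);
        return 0;
    }

On a driverless machine this takes the error branch, which is more informative than trusting the garbage results of a kernel that never ran.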
If you just want to run the codes and profile performance and other parameters, it would be helpful to install the GPGPU-Sim simulator. It doesn't need any graphics card on your machine.

Loading a PTX programmatically returns error 209 when run against a device with compute capability 5.0

I am trying to use the ptxjit sample from the CUDA SDK as the basis for instrumenting the interaction with the GPU device.
I've managed to compile the instrumentation code successfully and to drive the device to load and execute a PTX module on a GeForce GT 440 that has compute capability 2.0.
When I compile the same instrumentation code on a laptop (using Bumblebee to control the discrete GPU) with a GeForce 830M that has compute capability 5.0, the code compiles but running it gives error 209 (CUDA_ERROR_NO_BINARY_FOR_GPU).
I've tried compiling the kernel to target compute capability 5.0, but with no success; I still get the same error.
Any ideas?
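For reference, the loading path is essentially the following (a simplified sketch modelled on the ptxjit sample, with the PTX read from a file and the failing call checked explicitly; file and kernel names are placeholders):

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda.h>

    /* Load a PTX image (e.g. produced with: nvcc -arch=compute_50 -ptx kernel.cu)
       through the driver API and report the call that returns error 209. */
    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s kernel.ptx\n", argv[0]); return 1; }

        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror(argv[1]); return 1; }
        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        fseek(f, 0, SEEK_SET);
        char *ptx = (char *)malloc(size + 1);
        fread(ptx, 1, (size_t)size, f);
        ptx[size] = '\0';                 /* the loader expects a NUL-terminated image */
        fclose(f);

        CUdevice dev;
        CUcontext ctx;
        CUmodule mod;
        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);

        CUresult res = cuModuleLoadDataEx(&mod, ptx, 0, NULL, NULL);
        if (res != CUDA_SUCCESS) {
            /* 209 == CUDA_ERROR_NO_BINARY_FOR_GPU: the image cannot be
               JIT-compiled or matched to the present device */
            fprintf(stderr, "cuModuleLoadDataEx failed with error %d\n", (int)res);
            return 1;
        }
        puts("PTX module loaded and JIT-compiled successfully");
        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }

On the GT 440 system the load succeeds; on the 830M laptop the same code takes the error branch with 209.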
In the end the problem was with the driver. It seems that it affects only the functions used for PTX code loading on GPUs with compute capability 5.0.
I removed all the NVIDIA driver packages that had been updated recently and installed the driver and OpenGL libraries that come with the CUDA SDK. The driver version for SDK 7.5 is 352.39; with this driver, both the original ptxjit sample and the modified one executed perfectly, as on the other systems.
I don't have any GPU with compute capability 3.0 to test whether the same problem would appear, and I haven't updated my desktop to the 367.44 driver to see if it would break the ptxjit sample.
For now, the solution is to keep the driver that comes with the CUDA SDK and turn off updates from the nvidia repository.
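If it helps to confirm which driver a given machine is actually running, the driver API reports it directly (a short sketch; version numbers map approximately to driver releases):

    #include <stdio.h>
    #include <cuda.h>

    int main(void)
    {
        int v = 0;
        /* reports the CUDA version supported by the installed driver,
           e.g. 7050 (CUDA 7.5) for a CUDA 7.5-era driver such as 352.39 */
        cuDriverGetVersion(&v);
        printf("driver supports CUDA %d.%d\n", v / 1000, (v % 1000) / 10);
        return 0;
    }

With the 352.39 driver this should report CUDA 7.5; a higher number indicates that a newer driver from the repository is still active.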

Can I compile a CUDA program without having a CUDA device

Is it possible to compile a CUDA program without having a CUDA-capable device on the same node, using only the NVIDIA CUDA Toolkit...?
The answer to your question is YES.
The nvcc compiler driver does not depend on the physical presence of a device, so you can compile CUDA codes even without a CUDA capable GPU. Be warned, however, that (as remarked by Robert Crovella) the CUDA driver library libcuda.so (cuda.lib on Windows) ships with the NVIDIA driver, not with the CUDA toolkit installer. This means that codes requiring driver API functions (whose entry points are prefixed with cu; see Appendix H of the CUDA C Programming Guide) need a forced installation of a "recent" driver even without an NVIDIA GPU present; the options for doing so can be listed by running the driver installer separately with the --help command-line switch.
Following the same rationale, you can compile CUDA codes for an architecture when your node hosts a GPU of a different architecture. For example, you can compile a code for a GeForce GT 540M (compute capability 2.1) on a machine hosting a GT 210 (compute capability 1.2).
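For example, a command along these lines compiles on either machine regardless of which GPU, if any, is present (a sketch; matmul.cu is a placeholder name and the exact -gencode values depend on the toolkit version, since compute capability 2.x targets were dropped from later toolkits):

    # target compute capability 2.1, whatever GPU the build machine has
    nvcc -gencode arch=compute_20,code=sm_21 matmul.cu -o matmul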
Of course, in both cases (no GPU, or a GPU of a different architecture), you will not be able to successfully run the code.
In the early versions of CUDA, it was possible to compile the code under an emulation mode and run it on the CPU, but device emulation has been deprecated for some time. If you don't have a CUDA capable device but want to run CUDA codes, you can try gpuocelot (though I don't have any experience with it).

Cuda driver initialization failed

I have a two-GPU system: a GeForce 8400 GS and a GeForce GT 520. I am able to run my CUDA programs on both GPUs, but when I use cuda-gdb to debug them I get an error saying that the CUDA driver initialization failed. Also, when I run the program under cuda-gdb, cudaGetDeviceCount says I have only 1 GPU. I can run the programs on either GPU when I am not using cuda-gdb. Can somebody help me with this?
I am running Ubuntu 11.04.
It looks like you have a display driver version older than the one required by the CUDA Toolkit. Make sure you installed the display driver downloaded from the same download page you got your toolkit from.
cuda-gdb hides from the application being debugged any GPU that is used to run your desktop environment; otherwise the desktop environment might hang while the application is suspended at a breakpoint. To see both GPUs in cuda-gdb you need to run without a desktop environment.
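A quick way to see this is to enumerate the visible devices once from a plain console and once under cuda-gdb (a minimal sketch):

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int count = 0;
        cudaGetDeviceCount(&count);
        printf("visible CUDA devices: %d\n", count);
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            /* outside cuda-gdb this should list both the 8400 GS and the GT 520;
               under cuda-gdb the GPU driving the desktop is hidden */
            printf("device %d: %s\n", i, prop.name);
        }
        return 0;
    }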

CUDA 5.0 cuda-gdb on Linux needs a dedicated GPU?

With a fresh CUDA 5.0 Linux install on CentOS 5.5, I am not able to run cuda-gdb. So I am wondering: do you still need a dedicated GPU for cuda-gdb on Linux? I tried it with the VESA device driver for X11, but got the same result. Profiling works and running the app works, but trying to run cuda-gdb gives:
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000
Any suggestions?
cuda-gdb still needs a GPU that is not used by the graphical environment (e.g. if you are running GNOME/KDE/etc., you need a system with several GPUs; not all of them have to be NVIDIA GPUs).
This particular message is not related to that problem; you can ignore it. cuda-gdb will tell you if it fails because no GPU can be used for debugging.