Loading a PTX programmatically returns error 209 when run against a device with CUDA capability 5.0 - cuda

I am trying to use the ptxjit sample from the CUDA SDK as the basis for instrumenting the interaction with the GPU device.
I've managed to compile the instrumentation code successfully and to control the device so that it loads and executes a PTX module on a GeForce GT 440, which has CUDA capability 2.0.
When I compile the same instrumentation code on a laptop (using Bumblebee to control the discrete GPU) with a GeForce 830M, which has CUDA capability 5.0, the code compiles but gives me error 209 (CUDA_ERROR_NO_BINARY_FOR_GPU) when loading the PTX.
I've tried to compile the kernel to be compatible with CUDA capability 5.0, but with no success; I still get the same error.
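For reference, the loading path in question looks roughly like this (a minimal sketch of the driver API calls, not my actual code; the PTX file name and kernel name are placeholders):

    #include <cuda.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void check(CUresult err, const char *what)
    {
        if (err != CUDA_SUCCESS) {
            fprintf(stderr, "%s failed with error %d\n", what, (int)err);
            exit(1);
        }
    }

    int main(void)
    {
        CUdevice   dev;
        CUcontext  ctx;
        CUmodule   mod;
        CUfunction fn;

        check(cuInit(0), "cuInit");
        check(cuDeviceGet(&dev, 0), "cuDeviceGet");
        check(cuCtxCreate(&ctx, 0, dev), "cuCtxCreate");

        /* Read the PTX text into a NUL-terminated buffer. */
        FILE *f = fopen("kernel.ptx", "rb");
        if (!f) { perror("kernel.ptx"); return 1; }
        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        fseek(f, 0, SEEK_SET);
        char *ptx = (char *)malloc(size + 1);
        fread(ptx, 1, size, f);
        ptx[size] = '\0';
        fclose(f);

        /* This is the call that fails with 209 (CUDA_ERROR_NO_BINARY_FOR_GPU)
           when the driver cannot produce a binary for the installed GPU. */
        check(cuModuleLoadDataEx(&mod, ptx, 0, NULL, NULL), "cuModuleLoadDataEx");
        check(cuModuleGetFunction(&fn, mod, "myKernel"), "cuModuleGetFunction");

        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        free(ptx);
        return 0;
    }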
Any ideas?

In the end the problem was with the driver. It seems the issue affects only the functions used for PTX code loading on GPUs that have CUDA capability 5.0.
I removed all the NVIDIA driver packages that had been updated recently and installed the driver and OpenGL libraries that come with the CUDA SDK. The driver version for SDK 7.5 is 352.39; with this driver both the original ptxjit sample and the modified one executed perfectly, just as on the other systems.
I don't have any GPU with CUDA capability 3.0 to test whether the same problem would appear, and I also didn't update my desktop to the 367.44 driver to see whether it would break the ptxjit sample.
For now, the solution is to keep the driver that comes with the CUDA SDK and turn off updates from the NVIDIA repository.
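As a sanity check after swapping drivers, a small program along these lines (a minimal sketch using the runtime API) reports which driver and runtime versions the system actually sees:

    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int driverVersion = 0, runtimeVersion = 0;
        cudaDriverGetVersion(&driverVersion);
        cudaRuntimeGetVersion(&runtimeVersion);
        /* Versions are encoded as 1000*major + 10*minor, e.g. 7050 for CUDA 7.5. */
        printf("Driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
               driverVersion / 1000, (driverVersion % 100) / 10,
               runtimeVersion / 1000, (runtimeVersion % 100) / 10);
        return 0;
    }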

Related

NVIDIA Ampere GPU Architecture Compatibility

Can anyone please help me understand the NVIDIA 30-series devices with the Ampere architecture and which CUDA versions are compatible with them?
From here and from all over the net I understand that support for Ampere was added in CUDA Toolkit v11:
https://forums.developer.nvidia.com/t/can-rtx-3080-support-cuda-10-1/155849
What I don't understand is how that makes sense together with this:
https://docs.nvidia.com/cuda/ampere-compatibility-guide/index.html
Section
"1.3.1. Applications Built Using CUDA Toolkit 10.2 or Earlier"
So 🤷‍♂️ is it possible or not with CUDA 10.1?
Thank you very very much 🙏
Note the sentence
CUDA applications built using CUDA Toolkit versions 2.1 through 10.2 are compatible with NVIDIA Ampere architecture based GPUs as long as they are built to include PTX versions
(emphasis mine)
Plus the explanation in the section above.
When a CUDA application launches a kernel on a GPU, the CUDA Runtime determines the compute capability of the GPU in the system and uses this information to find the best matching cubin or PTX version of the kernel. If a cubin compatible with that GPU is present in the binary, the cubin is used as-is for execution. Otherwise, the CUDA Runtime first generates compatible cubin by JIT-compiling the PTX and then the cubin is used for the execution. If neither compatible cubin nor PTX is available, kernel launch results in a failure.
In effect: The CUDA toolkit remains ABI-compatible between 2.1 and 11. Therefore an application built for an old version will continue to load at runtime. The CUDA runtime will then detect that your kernels are built for a version that is not compatible with Ampere. So it will take the PTX and compile a new version at runtime.
As noted in the comments, only a current driver is required on the production system for this to work.
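For example, a build with a pre-Ampere toolkit (say CUDA 10.x targeting Turing; the file names here are placeholders, not from the question) that embeds both a Turing cubin and the PTX needed for forward JIT compatibility could look like this:

    nvcc -O2 \
         -gencode arch=compute_75,code=sm_75 \
         -gencode arch=compute_75,code=compute_75 \
         -o app app.cu

The second -gencode clause embeds the PTX; without it, the runtime has nothing to JIT-compile on an Ampere GPU and the kernel launch fails as described above.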

Can a Cuda application built and running on Jetson TX2 run on Jetson Xavier?

I have a CUDA application that was built with CUDA Toolkit 9.0 and runs fine on a Jetson TX2 board.
I now have a Jetson Xavier board, flashed with JetPack 4, which installs CUDA Toolkit 10.0 (only 10.0 is available).
What do I need to do if I want to run the same application on Xavier?
NVIDIA documentation suggests that as long as I specify the correct target hardware when running nvcc, I should be able to run on future hardware thanks to JIT compilation. But does this hold across different versions of the CUDA toolkit (9 vs. 10)?
In theory (and note I don't have access to a Xavier board to test anything), you should be able to run a cross-compiled CUDA 9 application (and that might mean both ARM and GPU architecture settings) on a CUDA 10 host.
What you will need to make sure of is that you either statically link, or copy alongside your application onto the Xavier board, all the CUDA runtime API library components you require. Note that there is still an outside chance that those libraries might lack the GPU and ARM features necessary to run correctly on a Xavier system, or that there are more subtle issues like libc incompatibility. That you will have to test for yourself.
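A rough sketch of such a build, assuming cross-compilation from an x86 host with the aarch64 CUDA cross-compile packages installed (the cross-compiler name and file names are assumptions, not from the question):

    # sm_62 is the TX2; the compute_62 PTX is what allows the newer driver
    # on the Xavier (sm_72) to JIT-compile the kernels at load time.
    nvcc -ccbin aarch64-linux-gnu-g++ \
         -gencode arch=compute_62,code=sm_62 \
         -gencode arch=compute_62,code=compute_62 \
         -cudart static \
         -o app app.cu

-cudart static links the CUDA runtime into the binary, so there is nothing extra to copy to the Xavier board apart from any other libraries the application uses.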

Can I compile a cuda program without having a cuda device

Is it possible to compile a CUDA program without having a CUDA capable device on the same node, using only NVIDIA CUDA Toolkit...?
The answer to your question is YES.
The nvcc compiler driver is not related to the physical presence of a device, so you can compile CUDA code even without a CUDA-capable GPU. Be warned, however, that, as remarked by Robert Crovella, the CUDA driver library libcuda.so (cuda.lib on Windows) comes with the NVIDIA driver and not with the CUDA toolkit installer. This means that code using the driver API (whose entry points are prefixed with cu, see Appendix H of the CUDA C Programming Guide) will need the driver to be force-installed even though no NVIDIA GPU is present; run the driver installer separately with the --help command-line switch to see the relevant options.
Following the same rationale, you can compile CUDA code for one architecture when your node hosts a GPU of a different architecture. For example, you can compile code for a GeForce GT 540M (compute capability 2.1) on a machine hosting a GT 210 (compute capability 1.2).
Of course, in both cases (no GPU, or a GPU with a different architecture), you will not be able to actually run the code.
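For instance, a compile-only invocation like the following works on a machine with no NVIDIA GPU at all (the file names are placeholders):

    # No device is needed at compile time; this targets compute capability 2.1
    # (the GT 540M mentioned above) regardless of what hardware is installed.
    nvcc -arch=sm_21 -c kernel.cu -o kernel.o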
For the early versions of CUDA, it was possible to compile the code under an emulation modality and run the compiled code on a CPU, but device emulation has been deprecated for some time. If you don't have a CUDA-capable device but want to run CUDA code, you can try using gpuocelot (but I don't have any experience with that).

Cuda driver initialization failed

I have a two-GPU system, a GeForce 8400 GS and a GeForce GT 520. I am able to run my CUDA programs on both GPUs. But when I use cuda-gdb to debug them, I get an error saying that the CUDA driver initialization failed. Also, when I run the program under cuda-gdb, cudaGetDeviceCount says I have only 1 GPU. I am able to run the programs on either of the GPUs when I am not using cuda-gdb. Can somebody help me with this?
I am running Ubuntu 11.04.
It looks like you have a display driver version older than the one required by the CUDA Toolkit. Make sure you installed the display driver downloaded from the same download page you got your toolkit from.
cuda-gdb hides from the application being debugged any GPU that is used to run your desktop environment. Otherwise the desktop environment might hang when the application is suspended at a breakpoint. To see both GPUs in cuda-gdb you need to run without a desktop environment.
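A small enumeration program (a minimal sketch using the runtime API) makes this visible; run it once normally and once under cuda-gdb and compare the reported device count:

    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int count = 0;
        cudaGetDeviceCount(&count);
        printf("Visible CUDA devices: %d\n", count);
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            /* Under cuda-gdb, the GPU driving the desktop environment is
               expected to be missing from this list, as described above. */
            printf("  %d: %s (compute capability %d.%d)\n",
                   i, prop.name, prop.major, prop.minor);
        }
        return 0;
    }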

CUDA version is insufficient for CUDART version. running in emu mode without NVIDIA card

I get the error -
cutilCheckMsg() CUTIL CUDA error : Kernel execution failed
: CUDA version is insufficient for CUDART version.
when I run the sample code. The code, however, builds successfully.
Details of the environment I'm running the program in:
Windows XP with NO NVIDIA driver
Visual Studio 2008 Express Edition
Cuda toolkit, sdk 3.0
Emulation mode
Here is a similar question asked on SO before, but in that case the person had an NVIDIA card. I do not have an NVIDIA card on my machine: CUDA driver version is insufficient for CUDA runtime version
Please suggest a solution
Currently the only way to run CUDA code without a GPU is to use CUDA x86 from PGI. Emulation mode was dropped from CUDA several versions ago (current version is CUDA 4.2).