NVIDIA Ampere GPU Architecture Compatibility - cuda

Can anyone please help me understand the NVIDIA 30-series devices with the Ampere architecture and which CUDA versions are compatible with them?
From here, and from all over the net, I understand that support for Ampere was added in CUDA Toolkit v11:
https://forums.developer.nvidia.com/t/can-rtx-3080-support-cuda-10-1/155849
What I don't understand is how that squares with this:
https://docs.nvidia.com/cuda/ampere-compatibility-guide/index.html
Section
"1.3.1. Applications Built Using CUDA Toolkit 10.2 or Earlier"
So 🤷‍♂️ is it possible or not with CUDA 10.1?
Thank you very much 🙏

Note the sentence
CUDA applications built using CUDA Toolkit versions 2.1 through 10.2 are compatible with NVIDIA Ampere architecture based GPUs *as long as they are built to include PTX versions*
(emphasis mine)
Plus the explanation in the section above.
When a CUDA application launches a kernel on a GPU, the CUDA Runtime determines the compute capability of the GPU in the system and uses this information to find the best matching cubin or PTX version of the kernel. If a cubin compatible with that GPU is present in the binary, the cubin is used as-is for execution. Otherwise, the CUDA Runtime first generates compatible cubin by JIT-compiling the PTX and then the cubin is used for the execution. If neither compatible cubin nor PTX is available, kernel launch results in a failure.
In effect: the CUDA toolkit remains ABI-compatible from 2.1 through 11, so an application built against an old version will still load at runtime. The CUDA runtime will then detect that your kernels' cubins are not compatible with Ampere, take the embedded PTX, and JIT-compile a new version for the device at runtime.
As noted in the comments, only a current driver is required on the production system for this to work.
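A build that stays forward-compatible this way embeds PTX alongside the cubins in the fat binary. As a sketch (the source file name and the chosen architectures are just examples), a CUDA 10.x build might use:

```shell
# Embed SASS (cubins) for the GPUs you actually test on, plus PTX
# (code=compute_75) that a newer driver can JIT for architectures
# that did not exist yet, such as Ampere.
nvcc app.cu -o app \
    -gencode arch=compute_60,code=sm_60 \
    -gencode arch=compute_75,code=sm_75 \
    -gencode arch=compute_75,code=compute_75
```

The last `-gencode` line is the one that matters here: `code=compute_75` stores PTX in the binary, which the driver on an Ampere machine can JIT-compile to sm_86 code at load time.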

Related

Nsight Compute says: "Profiling is not supported on this device" - why?

I have a machine with an NVIDIA GTX 1050 Ti GPU (compute capability 6.1), and am trying to profile a kernel in a program I built with CUDA 11.4. My OS distribution is Devuan GNU/Linux 4 Chimaera (~= Debian 11 Bullseye).
Nsight Compute starts my program and shows me API call after API call, but when I get to the first kernel launch, it gives me an error message in the Details column of the API call listing:
Error: Profiling is not supported on this device
Why? What's wrong with my device? Is it a permissions issue?
tl;dr: Nsight Compute no longer supports Pascal GPUs.
Nsight Compute used to support Pascal-microarchitecture GPUs (Compute Capability 6.x) - up until version 2019.5.1. Beginning with 2020, Nsight Compute dropped support for Pascal.
If you're wondering why that is - no reason or justification was given to my knowledge (see also the quote below). This is especially puzzling, or annoying, given the short period of time between the release of post-Pascal GPUs and this dropping of support (as little as 1.5 years if you look at consumer GTX cards).
On the other hand, you may still use the NVIDIA Visual Profiler tool with Pascal cards, so they didn't throw you entirely under the bus. You can also download and use Nsight Compute 2019.5.1.
To quote an NVIDIA moderator's statement on the matter on the NVIDIA developer forums:
Pascal support was deprecated, then dropped from Nsight Compute after Nsight Compute 2019.5.1. The profiling tools that support Pascal in the CUDA Toolkit 11.1 and later are nvprof and visual profiler.
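So on a Pascal card with a recent toolkit, the supported kernel profilers are the legacy ones. For example (the program name is hypothetical):

```shell
# Kernel-level profiling on Pascal (CC 6.x) with CUDA 11.x:
# use nvprof or the Visual Profiler, not Nsight Compute.
nvprof ./my_app                                     # command-line summary
nvprof --analysis-metrics -o my_app.nvvp ./my_app   # data for the Visual Profiler
```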

Does CUDA 11.2 support backward compatibility with an application that was compiled with CUDA 10.2?

I have the base image for my application built with nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04. I have to run that application on a cluster whose nvidia-smi reports:
NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2.
My application is not giving me the right prediction results for the GPU-trained model (it returns the base score as the prediction output). However, it does return accurate prediction results for the CPU-trained model, so I suspect a CUDA version incompatibility between the two. I want to know whether CUDA version 11.2 works with an application that was compiled with CUDA 10.2.
Yes, it is possible for an application compiled with CUDA 10.2 to run in an environment that has CUDA 11.2 installed. This is part of the CUDA compatibility model/system.
Otherwise, there isn't enough information in this question to diagnose why your application behaves the way you describe. For that, Stack Overflow expects a minimal reproducible example.
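If you want to rule version mismatches in or out, you can ask the driver and runtime directly. A minimal sketch (error checking elided):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int driverVersion = 0, runtimeVersion = 0;
    // Highest CUDA version the installed driver supports (e.g. 11020 = 11.2)
    cudaDriverGetVersion(&driverVersion);
    // Version of the CUDA runtime the application was built against
    cudaRuntimeGetVersion(&runtimeVersion);
    printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
           driverVersion / 1000, (driverVersion % 100) / 10,
           runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    // The compatibility model requires driverVersion >= runtimeVersion.
    return 0;
}
```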

Loading a PTX programmatically returns error 209 when run against a device with compute capability 5.0

I am trying to use the ptxjit sample from the CUDA SDK as the basis for instrumenting the interaction with the GPU device.
I've managed to successfully compile the instrumentation code, and to control the device to load and execute a PTX module, with a GeForce GT 440 that has compute capability 2.0.
When compiling the same instrumentation code on a system (a laptop using Bumblebee to control the discrete GPU) with a GeForce 830M that has compute capability 5.0, the code compiles but gives me error 209 (CUDA_ERROR_NO_BINARY_FOR_GPU).
I've tried to compile the kernel to target compute capability 5.0, but had no success: still the same error.
Any ideas?
In the end, the problem was with the driver. It seems that it affects only the functions used for PTX code loading on GPUs with compute capability 5.0.
I removed all the NVIDIA driver packages that had been updated recently and installed the driver and OpenGL libraries that come with the CUDA SDK. The driver version for SDK 7.5 is 352.39; with this driver, both the original ptxjit sample and the modified one executed perfectly, as on the other systems.
I don't have any GPU with compute capability 3.0 to test whether the same problem would appear; I also didn't update my desktop to the 367.44 driver to see whether it would break the ptxjit sample.
For now, the solution is to keep the driver that comes with the CUDA SDK and turn off updates from the NVIDIA repository.
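For anyone debugging a similar failure: the error surfaces at module-load time in the driver API. A hedged sketch of how one might catch and interpret it (the helper function is my own illustration, not the sample's code):

```cuda
#include <cstdio>
#include <cuda.h>

// Load a NUL-terminated PTX image (e.g. the text produced by
// `nvcc -ptx kernel.cu`) and translate the common failure mode.
CUresult loadPtxModule(CUmodule *mod, const char *ptxImage) {
    CUresult rc = cuModuleLoadData(mod, ptxImage);
    if (rc == CUDA_ERROR_NO_BINARY_FOR_GPU) {
        // The driver could not produce device code from this image.
        // Check that the `.target sm_xx` directive in the PTX does not
        // exceed the device's compute capability, and that the driver
        // is not older than the toolkit that generated the PTX.
        fprintf(stderr, "no binary for this GPU (CUresult %d)\n", (int)rc);
    }
    return rc;
}
```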

Caffe using GPU with NVidia Quadro 2200

I'm using the deep learning framework Caffe on an Ubuntu 14.04 machine. I compiled Caffe with the CPU_ONLY option, i.e. I disabled GPU and CUDA usage. I have an NVIDIA Quadro K2200 graphics card and CUDA version 5.5.
I would like to know whether it is possible to use Caffe with CUDA enabled on my GPU. On NVIDIA's page, it is written that the Quadro K2200 has a compute capability of 5.0. Does that mean I can only use it with CUDA versions up to release 5.0? If it is possible to use Caffe with the GPU enabled on a Quadro K2200, how do I choose the appropriate CUDA version?
CUDA version is not the same thing as compute capability. For one, the current CUDA version is 7.5 (prerelease), while the highest compute capability of any shipping GPU is only 5.2. The K2200 supports CC 5.0.
The difference:
CUDA version means the library/toolkit/SDK/etc. version. You should always use the highest one available.
Compute capability describes your GPU's ability to perform certain instructions, etc. Every CUDA feature has a minimum CC requirement, and when you write a CUDA program, its CC requirement is the maximum of the requirements of all the features you use.
That said, I've no idea what Caffe is, but a quick search shows it requires a CC of 2.0, so you should be good to go. CC 5.0 is pretty recent, so very few things won't work on it.
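To see the distinction in practice: the compute capability is a property you query from the device at runtime, while the CUDA version is whatever toolkit you installed. A minimal sketch (error checking elided):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    // Query device 0; major and minor together form the compute
    // capability, e.g. 5.0 for a Quadro K2200.
    cudaGetDeviceProperties(&prop, 0);
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    return 0;
}
```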

Can I compile a CUDA program without having a CUDA device?

Is it possible to compile a CUDA program without having a CUDA-capable device on the same node, using only the NVIDIA CUDA Toolkit?
The answer to your question is YES.
The nvcc compiler driver is not related to the physical presence of a device, so you can compile CUDA codes even without a CUDA-capable GPU. Be warned, however, that (as remarked by Robert Crovella) the CUDA driver library libcuda.so (cuda.lib for Windows) comes with the NVIDIA driver, not with the CUDA toolkit installer. This means that codes requiring driver APIs (whose entry points are prefixed with cu; see Appendix H of the CUDA C Programming Guide) need a forced installation of a "recent" driver even without an NVIDIA GPU present; run the driver installer separately and check its --help command-line switch for the relevant options.
Following the same rationale, you can compile CUDA codes for an architecture when your node hosts a GPU of a different architecture. For example, you can compile a code for a GeForce GT 540M (compute capability 2.1) on a machine hosting a GT 210 (compute capability 1.2).
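As a concrete (hypothetical) example of both points, on a machine with no GPU at all, or with a GPU of a different architecture, this still succeeds:

```shell
# nvcc needs only the toolkit, not a device: this builds code for
# compute capability 2.1 regardless of what hardware (if any) is present.
nvcc -arch=sm_21 vectorAdd.cu -o vectorAdd
```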
Of course, in both cases (no GPU, or a GPU of a different architecture), you will not be able to actually run the code on that machine.
In the early versions of CUDA it was possible to compile code in an emulation mode and run it on the CPU, but device emulation has been deprecated for some time now. If you don't have a CUDA-capable device but want to run CUDA codes, you can try using gpuocelot (but I don't have any experience with that).