Nsight Compute says: "Profiling is not supported on this device" - why? - cuda

I have a machine with an NVIDIA GTX 1050 Ti GPU (compute capability 6.1), and am trying to profile a kernel in a program I built with CUDA 11.4. My OS distribution is Devuan GNU/Linux 4 Chimaera (~= Debian 11 Bullseye).
Nsight Compute starts my program and shows me API call after API call, but when it gets to the first kernel launch, it gives me an error message in the Details column of the API call listing:
Error: Profiling is not supported on this device
Why? What's wrong with my device? Is it a permissions issue?

tl;dr: Nsight Compute no longer supports Pascal GPUs.
Nsight Compute used to support Pascal-microarchitecture GPUs (compute capability 6.x), up until version 2019.5.1. Beginning with the 2020 releases, Nsight Compute dropped support for Pascal.
If you're wondering why that is: to my knowledge, no reason or justification was given (see also the quote below). This is especially puzzling, or annoying, given the short period of time between the release of post-Pascal GPUs and this dropping of support (as little as 1.5 years if you look at consumer GTX cards).
On the other hand, you may still use the NVIDIA Visual Profiler tool with Pascal cards, so they didn't throw you entirely under the bus. And you can also download and use Nsight Compute 2019.5.1.
To quote an NVIDIA moderator's statement on the matter on the NVIDIA developer forums:
Pascal support was deprecated, then dropped from Nsight Compute after Nsight Compute 2019.5.1. The profiling tools that support Pascal in the CUDA Toolkit 11.1 and later are nvprof and visual profiler.
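The support boundaries above can be summarized in a small Python sketch. The function name and the return strings are my own illustration; the one assumption beyond the quote is that current Nsight Compute releases target Volta (CC 7.0) and newer, which is where support picks up after Pascal.

```python
# Sketch: which profiler to reach for, given a GPU's compute capability.
# Boundaries follow the answer above: Nsight Compute dropped Pascal
# (CC 6.x) after version 2019.5.1; nvprof / Visual Profiler still
# cover Pascal and older architectures.

def suggested_profiler(cc_major: int, cc_minor: int = 0) -> str:
    """Return a profiler suggestion for a GPU of the given compute capability."""
    if cc_major >= 7:                      # Volta and newer
        return "Nsight Compute (current releases)"
    if cc_major == 6:                      # Pascal, e.g. GTX 1050 Ti (6.1)
        return "Nsight Compute <= 2019.5.1, or nvprof / Visual Profiler"
    return "nvprof / Visual Profiler"      # Maxwell and older

print(suggested_profiler(6, 1))  # the asker's GTX 1050 Ti
```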

Related

NVIDIA Ampere GPU Architecture Compatibility

Can anyone please help me understand NVIDIA's series 30 devices with the Ampere architecture and which CUDA versions are compatible with them?
From here and from all over the net I understand that support for Ampere was added in CUDA toolkit v11:
https://forums.developer.nvidia.com/t/can-rtx-3080-support-cuda-10-1/155849
What I don't understand is how that makes sense with this:
https://docs.nvidia.com/cuda/ampere-compatibility-guide/index.html
Section
"1.3.1. Applications Built Using CUDA Toolkit 10.2 or Earlier"
So 🤷‍♂️ is it possible or not with CUDA 10.1?
Thank you very much 🙏
Note the sentence
CUDA applications built using CUDA Toolkit versions 2.1 through 10.2 are compatible with NVIDIA Ampere architecture based GPUs as long as they are built to include PTX versions
(emphasis mine)
Plus the explanation in the section above.
When a CUDA application launches a kernel on a GPU, the CUDA Runtime determines the compute capability of the GPU in the system and uses this information to find the best matching cubin or PTX version of the kernel. If a cubin compatible with that GPU is present in the binary, the cubin is used as-is for execution. Otherwise, the CUDA Runtime first generates a compatible cubin by JIT-compiling the PTX, and then that cubin is used for the execution. If neither a compatible cubin nor PTX is available, the kernel launch results in a failure.
In effect: the CUDA toolkit remains ABI-compatible between versions 2.1 and 11, so an application built with an old version will still load at runtime. The CUDA runtime will then detect that your kernels were built for an architecture that is not compatible with Ampere, take the embedded PTX, and JIT-compile a new version at runtime.
As noted in the comments, only a current driver is required on the production system for this to work.
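The selection logic quoted above can be sketched in a few lines of Python. This is a simplified model of the fat-binary lookup, not the real CUDA runtime: here a cubin is usable only on its exact architecture (real cubins are also compatible within the same major architecture), while PTX can be JIT-compiled for any equal-or-newer architecture.

```python
# Sketch of the runtime's image selection: prefer a matching cubin,
# fall back to JIT-compiling embedded PTX, fail if neither is usable.

def select_kernel_image(device_cc, cubins, ptx_versions):
    """cubins / ptx_versions: lists of (major, minor) compute capabilities."""
    if device_cc in cubins:
        return ("cubin", device_cc)               # cubin is used as-is
    usable_ptx = [v for v in ptx_versions if v <= device_cc]
    if usable_ptx:
        return ("jit-from-ptx", max(usable_ptx))  # JIT the newest usable PTX
    raise RuntimeError("kernel launch failure: no compatible cubin or PTX")

# An app built with an older toolkit that embeds PTX for CC 7.5 still
# runs on an Ampere GPU (CC 8.6) via JIT compilation:
print(select_kernel_image((8, 6), cubins=[(7, 5)], ptx_versions=[(7, 5)]))
```

This is why the compatibility guide insists that old binaries must "include PTX versions" of their kernels: without the PTX entry, the lookup falls through to the failure case.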

Running 32-bit application on CUDA

I read on the CUDA toolkit documentation (11.3.0) that "Deployment and execution of CUDA applications on x86_32 is still supported, but is limited to use with GeForce GPUs."
This looks in conflict with the fact that I was able to run a 32-bit app on my Tesla T4. (I verified that the code was actually running on the GPU and the app was 32-bit).
Have I misinterpreted the documentation? Why am I able to run 32-bit apps on a Tesla GPU?
(I'm running Visual studio 2017 on Windows 10)
It's a question of what is supported: NVIDIA commits to supporting 32-bit deployment and execution only on GeForce GPUs.
Other combinations, such as a 32-bit app on a Tesla T4, may happen to work, or they may not; NVIDIA makes no guarantees about them.

Find supported GPU

I want to know if the latest CUDA version, which is 8.0, supports the GPUs in my computer, which are GeForce GTX 970 and Quadro K4200 (a dual-GPU system); I couldn't find the info online.
In general, how to find if a CUDA version, especially the newly released version, supports a specific Nvidia GPU?
Thanks!
In general, how to find if a CUDA version, especially the newly released version, supports a specific Nvidia GPU?
All CUDA versions from CUDA 7.0 to CUDA 8.0 support GPUs that have a compute capability of 2.0 or higher. Both of your GPUs are in this category.
Prior to CUDA 7.0, some older GPUs were supported also. You can find details of that here.
Note that CUDA 8.0 has announced that development for compute capability 2.0 and 2.1 is deprecated, meaning that support for these (Fermi) GPUs may be dropped in a future CUDA release.
In general, a list of currently supported CUDA GPUs and their compute capabilities is maintained by NVIDIA here although the list occasionally has omissions for very new GPUs just released.
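The rule in this answer can be captured as a small lookup table. The CUDA 7.0-8.0 entries follow the answer directly (minimum CC 2.0); the 6.5 entry is illustrative only, and the example compute capabilities (GTX 970 = 5.2, Quadro K4200 = 3.0) are the asker's GPUs.

```python
# Sketch: minimum supported compute capability per CUDA toolkit version.
# A GPU is supported by a toolkit if its CC meets the minimum;
# tuple comparison handles (major, minor) ordering naturally.

MIN_CC_BY_TOOLKIT = {
    "6.5": (1, 1),   # illustrative: older toolkits supported some CC 1.x GPUs
    "7.0": (2, 0),
    "7.5": (2, 0),
    "8.0": (2, 0),   # CC 2.x deprecated in 8.0, but still supported
}

def is_supported(toolkit: str, cc: tuple) -> bool:
    """True if a GPU of compute capability `cc` is supported by `toolkit`."""
    return cc >= MIN_CC_BY_TOOLKIT[toolkit]

print(is_supported("8.0", (5, 2)))  # GTX 970
print(is_supported("8.0", (3, 0)))  # Quadro K4200
```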

Caffe using GPU with NVidia Quadro 2200

I'm using the deep learning framework Caffe on an Ubuntu 14.04 machine. I compiled Caffe with the CPU_ONLY option, i.e. I disabled GPU and CUDA usage. I have an NVIDIA Quadro K2200 graphics card and CUDA version 5.5.
I would like to know if it is possible to use Caffe with CUDA enabled on my GPU. On NVIDIA's page, it is written that the Quadro K2200 has a compute capability of 5.0. Does that mean I can only use it with CUDA versions up to release 5.0? If it is possible to use Caffe with the GPU enabled on the Quadro K2200, how can I choose the appropriate CUDA version?
CUDA version is not the same thing as compute capability. For one, CUDA is currently at version 7.5 (prerelease), while the highest compute capability is 5.2. The K2200 supports CC 5.0.
The difference:
CUDA version means the library/toolkit/SDK/etc version. You should always use the highest one available.
Compute Capability is your GPU's hardware capability to perform certain instructions, etc. Every CUDA feature has a minimum CC requirement. When you write a CUDA program, its CC requirement is the maximum of the requirements of all the features you used.
That said, I've no idea what Caffe is, but a quick search shows they require CC of 2.0, so you should be good to go. CC 5.0 is pretty recent, so very few things won't work on it.
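The "maximum of all feature requirements" rule can be sketched as follows. The feature table here is a tiny illustrative sample, not a complete list, though the two non-trivial entries are real: double precision needs CC >= 1.3 and dynamic parallelism needs CC >= 3.5.

```python
# Sketch: a program's compute-capability requirement is the maximum
# of the requirements of the features it uses.

FEATURE_MIN_CC = {
    "basic_kernels":       (1, 0),
    "double_precision":    (1, 3),
    "dynamic_parallelism": (3, 5),
}

def required_cc(features):
    """Minimum compute capability needed to run a program using `features`."""
    return max(FEATURE_MIN_CC[f] for f in features)

print(required_cc(["basic_kernels", "double_precision"]))        # (1, 3)
print(required_cc(["double_precision", "dynamic_parallelism"]))  # (3, 5)
```

A K2200 at CC 5.0 exceeds both of those requirements, which is why Caffe's CC 2.0 minimum is no obstacle.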

Can I compile a cuda program without having a cuda device

Is it possible to compile a CUDA program without having a CUDA capable device on the same node, using only NVIDIA CUDA Toolkit...?
The answer to your question is YES.
The nvcc compiler driver does not depend on the physical presence of a device, so you can compile CUDA code even without a CUDA-capable GPU. Be warned, however, that (as remarked by Robert Crovella) the CUDA driver library libcuda.so (cuda.lib on Windows) ships with the NVIDIA driver, not with the CUDA toolkit installer. This means that code using the driver API (whose entry points are prefixed with cu; see Appendix H of the CUDA C Programming Guide) needs the driver installed even on a machine without an NVIDIA GPU, which requires forcing the installation; run the driver installer with the --help command-line switch to see the relevant options.
Following the same rationale, you can compile CUDA codes for an architecture when your node hosts a GPU of a different architecture. For example, you can compile a code for a GeForce GT 540M (compute capability 2.1) on a machine hosting a GT 210 (compute capability 1.2).
Of course, in both the cases (no GPU or GPU with different architecture), you will not be able to successfully run the code.
For the early versions of CUDA, it was possible to compile code in an emulation mode and run the compiled code on a CPU, but device emulation was deprecated and then removed some time ago. If you don't have a CUDA-capable device but want to run CUDA code, you can try using gpuocelot (though I don't have any experience with it).
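Since compilation only requires the toolkit, a build machine just needs nvcc on its PATH; no GPU and no driver are involved. Here is a small helper illustrating that check; the function is my own, not part of CUDA, and it simply locates nvcc and confirms it answers --version (which works with no GPU present at all).

```python
# Sketch: verify that the CUDA toolkit (nvcc) is installed, GPU or not.

import shutil
import subprocess

def find_nvcc():
    """Return the nvcc path if the CUDA toolkit is on PATH, else None."""
    path = shutil.which("nvcc")
    if path is None:
        return None
    # nvcc --version needs no GPU and no driver, only the toolkit.
    out = subprocess.run([path, "--version"], capture_output=True, text=True)
    return path if out.returncode == 0 else None

print(find_nvcc())  # e.g. /usr/local/cuda/bin/nvcc, or None without the toolkit
```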