cudart_static - when is it necessary? - cuda

Since newer drivers ship with the CUDA runtime (I can choose 9.1 or 9.2 in the drivers download page) my question is: should my library (which uses a CUDA kernel internally) be shipped with -lcudart_static?
I had issues launching kernels compiled with 9.2 on systems which used 9.1 CUDA drivers. What's the most 'compatible' way of ensuring my library will run everywhere a recent CUDA driver is installed? (I'm already compiling for a virtual architecture)

"Since newer drivers ship with the CUDA runtime (I can choose 9.1 or 9.2 in the drivers download page)"
No, that's incorrect. The choice on the driver download page exists because each CUDA version has a minimum required driver version associated with it. It does not mean that the driver ships with the CUDA runtime. Stated another way, the driver does not install libcudart.so on Linux and never has; with some careful experimentation on a clean install, you can prove this to yourself.
Some additional comments:
-lcudart_static is actually the default for current/recent versions of nvcc. You can discover this by reading the nvcc manual. Therefore, by default, an executable built with nvcc is already statically linked to the CUDA runtime library corresponding to the version of nvcc used for compilation. You might need to specify this flag, or something like it, if you are building the application with e.g. the GNU toolchain (on Linux) rather than nvcc.
The purpose of static linking to the CUDA runtime library is, as you surmise, so that an application can be built in such a way that it does not need an installation of the CUDA toolkit to run properly. It only needs a machine with a proper GPU driver install.
The most compatible way to ensure that an application will run on a range of machines with a range of GPU driver installs is to compile it with a CUDA toolkit old enough to be supported by the earliest GPU driver in the range you intend to cover. Again, you can refer to the table here.
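The rule in the last point reduces to taking a minimum over the target machines. Here is a minimal sketch, assuming each machine's driver reports the newest CUDA version it supports as the integer 1000*major + 10*minor (the encoding returned by cudaDriverGetVersion). The function name and the sample values are hypothetical and only illustrate the reasoning.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: given the CUDA version supported by the driver on each
// target machine (encoded as 1000 * major + 10 * minor, e.g. 9020 for CUDA 9.2),
// return the newest toolkit version that every machine in the list can run.
// Precondition: the list is not empty.
int newest_toolkit_for_fleet(const std::vector<int>& driver_cuda_versions) {
    assert(!driver_cuda_versions.empty());
    // The machine with the oldest driver limits the whole fleet.
    return *std::min_element(driver_cuda_versions.begin(),
                             driver_cuda_versions.end());
}
```

For machines whose drivers support CUDA 9.1, 9.2 and 10.0 (9010, 9020, 10000), the function returns 9010, so the library would be built with the CUDA 9.1 toolkit.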

Related

NVCC: is it possible to target an earlier driver while compiling with the most recent toolkit?

I've recently downloaded and successfully compiled a small CUDA DLL using NVCC (10.2). Unfortunately, because I have the most recent toolkit version, the distribution requires the most recent driver version too. So I was wondering whether there is an NVCC flag that lets me target an earlier driver version and then distribute with an older runtime.
Currently, I have to check the runtime and driver versions in order to check for compatibility.
The CUDA toolchain, the runtime API and its support libraries are versioned. If you build runtime API code with a given toolkit version, you must either ship the resulting code with all the libraries from that version or have users install that toolkit version (aka the TensorFlow problem).
If you use the driver API, then you can potentially target a lower compute capability with PTX which might be backward compatible with a different driver. I say might because there are still PTX version support limits which can stop it from working correctly.
If you want to support older CUDA versions, just install the older toolchain and build using that toolkit.
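The run-time check mentioned in the question can be sketched as follows. This is only a sketch under stated assumptions: the two integers are taken to come from cudaDriverGetVersion and cudaRuntimeGetVersion, which encode a version as 1000*major + 10*minor, and "driver version at least as new as the runtime version" is treated as the whole compatibility rule. The struct and function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper type: a CUDA version split into its two parts.
struct CudaVersion {
    int major_version;
    int minor_version;
};

// Decode the integer form used by cudaDriverGetVersion / cudaRuntimeGetVersion
// (1000 * major + 10 * minor, e.g. 10020 for CUDA 10.2).
CudaVersion decode_cuda_version(int encoded) {
    assert(encoded >= 0);
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Assumed rule: the runtime is usable only if the installed driver supports a
// CUDA version at least as new as the runtime the code was built against.
bool driver_supports_runtime(int driver_version, int runtime_version) {
    return driver_version >= runtime_version;
}
```

With a driver that supports CUDA 9.1 (9010) and a library built against the 9.2 runtime (9020), driver_supports_runtime returns false, so the library can report the mismatch before attempting a kernel launch.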

NVIDIA driver - what does the 'toolkit' option mean?

Not a duplicate of this question
For some time now, when downloading NVIDIA GPU drivers, I've also been asked which CUDA toolkit I prefer.
Now, what does this choice imply when downloading a driver?
As far as I know, different CUDA toolkits have different minimum drivers supporting them (also stated in the release notes), but what does this choice at the driver download page imply?
Generally speaking, there is a backwards compatibility strategy for drivers with respect to CUDA toolkits. For example, the latest driver should work with any older CUDA toolkit. An older driver may not work with a newer CUDA toolkit.
That is a general statement of compatibility. You can find it expressed here (e.g. table 1) also.
However, each CUDA toolkit ships with a particular driver branch. For example CUDA 10.1 ships with a 418.xx driver branch (this corresponds to the version of the GPU driver that is bundled with the CUDA toolkit installer).
So even though a 430.xx driver is compatible with and should work with CUDA 10.1, that isn't actually the driver branch that ships with CUDA 10.1.
The dropdown allows you to select a driver from the same branch as the driver that shipped with that particular CUDA toolkit, which is the branch the toolkit has the highest test coverage with.

Can a CUDA application built and running on Jetson TX2 run on Jetson Xavier?

I have a CUDA application that was built with CUDA Toolkit 9.0 and runs fine on a Jetson TX2 board.
I now have a Jetson Xavier board, flashed with Jetpack 4, which installs CUDA Toolkit 10.0 (only 10.0 is available).
What do I need to do if I want to run the same application on Xavier?
NVIDIA documentation suggests that as long as I specify the correct target hardware when running nvcc, I should be able to run on future hardware thanks to JIT compilation. But does this hold for different versions of the CUDA toolkit (9 vs 10)?
In theory (and note I don't have access to a Xavier board to test anything), you should be able to run a cross compiled CUDA 9 application (and that might mean both ARM and GPU architecture settings) on a CUDA 10 host.
You will need to make sure that you either statically link, or copy alongside your application on the Xavier board, all the CUDA runtime API library components it requires. Note that there is still an outside chance that those libraries might lack the necessary GPU and ARM features to run correctly on a Xavier system, or that you hit more subtle issues like libc incompatibility. That you will have to test for yourself.

Is there a way to compile CUDA programs in a machine that does not have NVIDIA graphics card? [duplicate]

I tried to install the CUDA toolkit without the display driver on CentOS 6. It installed properly. I was able to compile, but the program runs without performing any operation and I get garbage values in an array addition. For cudaGetDeviceCount(&count) I am getting a value of "0", which means I don't have any card in my machine.
You can install the CUDA toolkit without installing the driver.
You can then compile CUDA codes that use the runtime API.
You will not be able to run those codes unless you have a proper CUDA driver and GPU installed in the machine, however.
Codes that depend on the driver API will also not be compilable in this configuration, on older CUDA toolkits, without additional work. Newer CUDA toolkits provide stub libraries for driver libraries, which can be linked against.
This answer covers the method to install the CUDA toolkit without the driver.
If you just want to run the codes and profile their performance and other parameters, it would be helpful to install the GPGPU-Sim simulator. It doesn't need any graphics card in your machine.

Which version of the CUDA toolkit can I use with MSVC 2015?

I recently upgraded from MSVC 2005 to 2015.
I have compiled my code with revision 4.2 of the CUDA toolkit for years. I'm now learning the hard way that there is no forward compatibility between Visual Studio and CUDA; however, Google shows that some tricks exist to force the compilation (messing with .props and .targets files).
From what I understand, CUDA 4.2 is a no-go. nvcc seems to have a hardcoded check on the MSVC revision.
My questions are:
Is there a way to compile with CUDA 5.x or 6.x?
In the worst case I have to update to CUDA 7.5; does that even work?
Thanks for your help.
Update: CUDA 8RC supports VS2015 Update 1 officially (not update 2).
For CUDA toolkits prior to CUDA 8RC, none officially list MSVC 2015 as a supported environment, including CUDA 7.5 (the most recent production toolkit, at the moment).
For recent CUDA toolkits, the official support matrix is given in the Windows getting started guide or installation guide, which you would have to review for each toolkit version to find the support for that version.
Support for a VS version means that the CUDA toolkit will make changes to the VS environment (e.g. installing CUDA build customization rules, what you refer to as "messing with .props and .targets") and also provide appropriate project definition files for each of the CUDA sample projects. If you wanted to work around the lack of support, you would have to duplicate those functions yourself. There might be non-standard ways to do this, but you would be operating in unsupported territory.
CUDA 8 is the first version to support MSVC 2015, including the community edition (with the exception of cross compiling). At the time of writing this, CUDA 8 is available as a release candidate if you are signed up for the NVIDIA "Accelerated Computing Developer Program".