CUDA 6.5 and Jetson TK1

I have CUDA 6.5 on my host machine. To cross-compile for the Jetson TK1, do I have to have CUDA 6.0 on the host machine?

If you have CUDA 6.0 installed on your Jetson, then to cross-compile you need to have CUDA 6.0 (nvcc and libraries) installed on your host machine. (You could also have CUDA 6.5 installed on the host machine, if desired, but your build environment for cross-compiling would need to use the CUDA 6.0 tools and libraries.)
This blog post will be a useful read, I think.
Cross-compiling means the target executable is built on the host machine, not on the target. Therefore, the target executable must be compatible with the target machine (in particular, with the libraries on it). This compatibility is achieved by having the correct version of nvcc as well as the correct library versions (CUDA version and target OS) to link against, matching your target.
Note that it is possible to build "remotely" on the Jetson itself, as mentioned in the blog post, which would avoid this requirement altogether.
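One way to sanity-check that compatibility, once the cross-compiled binary has been copied to the Jetson, is to print the toolkit version it was built against alongside what the target's runtime and driver report. This is just an illustrative sketch (not from the original answer), using the standard cudaRuntimeGetVersion and cudaDriverGetVersion calls:

```
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int runtimeVersion = 0, driverVersion = 0;

    // Version of the CUDA runtime library the binary finds at run time
    cudaRuntimeGetVersion(&runtimeVersion);
    // Highest CUDA version supported by the GPU driver on the target
    cudaDriverGetVersion(&driverVersion);

    // CUDART_VERSION is fixed at compile time by the toolkit used on the host;
    // the encoding is 1000*major + 10*minor, so 6000 = CUDA 6.0, 6050 = CUDA 6.5
    printf("compiled against runtime : %d\n", CUDART_VERSION);
    printf("runtime found at run time: %d\n", runtimeVersion);
    printf("driver supports up to    : %d\n", driverVersion);
    return 0;
}
```

For the program to run on the Jetson, the driver value it reports should be at least as high as the runtime values.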

Related

Safe to install CUDA toolkit separately on WSL2 and Windows 10?

I've installed Nvidia CUDA toolkit on WSL2 Ubuntu following the specified instructions from the Windows site. I was wondering if installing the Nvidia toolkit on Windows 10 directly as well would cause any conflicts or override anything potentially for the WSL2 install?
I'll be using the two separate toolkits for two separate purposes (WSL2 for Linux libraries requiring the Linux toolkit, Windows for things such as Visual Studio Nsight requiring the Windows toolkit).
No, it won't be a problem, and this is what you would have to do to use CUDA on the pure-Windows side as well as on the WSL2 side.
Other expectations/requirements still apply. For example, the CUDA toolkit versions installed in each location should be consistent with the GPU driver you have already installed.

NVCC: is it possible to target an earlier driver while compiling with the most recent toolkit?

I've recently downloaded and successfully compiled a small CUDA DLL using NVCC (10.2). Unfortunately, because I have the most recent toolkit version, the distribution requires the most recent driver version too. So I was wondering if there is an NVCC flag that would let me effectively target an earlier driver version and then distribute with an older runtime.
Currently, I have to check the run time and driver versions in order to check for compatibility.
The CUDA toolchain, the runtime API and its support libraries are versioned, and if you build runtime API code with a given toolkit version, you must either ship the resulting code with all the libraries from that version or have users install that toolkit version (aka the TensorFlow problem).
If you use the driver API, then you can potentially target a lower compute capability with PTX which might be backward compatible with a different driver. I say might because there are still PTX version support limits which can stop it from working correctly.
If you want to support older CUDA versions, just install the older toolkit and build with it.
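As a rough sketch of the driver API route described above (the file name kernel.ptx and the kernel name vecAdd are placeholders, error handling is abbreviated, and the program must be linked against the driver library with -lcuda): PTX built for a low compute capability can be loaded at run time and JIT-compiled by whatever driver is present, subject to the PTX version limits mentioned above.

```
#include <cuda.h>
#include <stdio.h>

int main() {
    // Initialize the driver API and pick the first device
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    // "kernel.ptx" is a placeholder: PTX generated for a low compute
    // capability, e.g. with  nvcc -ptx -arch=compute_30 kernel.cu
    CUmodule mod;
    CUresult rc = cuModuleLoad(&mod, "kernel.ptx");
    if (rc != CUDA_SUCCESS) {
        printf("PTX load failed (driver too old for this PTX version?): %d\n", rc);
        return 1;
    }

    // "vecAdd" is a placeholder name of a kernel inside the PTX module
    CUfunction fn;
    cuModuleGetFunction(&fn, mod, "vecAdd");

    // ... set up arguments and launch with cuLaunchKernel ...

    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}
```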

Is there a way to compile CUDA programs in a machine that does not have NVIDIA graphics card? [duplicate]

I tried to install the CUDA toolkit without the display driver on CentOS 6. It installs properly and I am able to compile, but the compiled program runs without performing any operation and I get garbage values from the array addition. For cudaGetDeviceCount(&count) I get the value 0, which means I don't have any card on my machine.
You can install the CUDA toolkit without installing the driver.
You can then compile CUDA codes that use the runtime API.
You will not be able to run those codes unless you have a proper CUDA driver and GPU installed in the machine, however.
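For illustration (this snippet is not part of the original answer), checking the return status of cudaGetDeviceCount is what separates "no usable driver or GPU" from a genuine device count:

```
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);

    // With no driver/GPU installed this typically fails with
    // cudaErrorNoDevice or cudaErrorInsufficientDriver; depending on the
    // CUDA version it may instead succeed and report a count of 0.
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    if (count == 0) {
        printf("No CUDA-capable device detected\n");
        return 1;
    }
    printf("Found %d CUDA device(s)\n", count);
    return 0;
}
```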
Codes that depend on the driver API are also not buildable in this configuration on older CUDA toolkits without additional work. Newer CUDA toolkits provide stub versions of the driver libraries, which can be linked against.
This answer covers the method to install the CUDA toolkit without the driver.
If you just want to run the codes and profile their performance and other parameters, it may be helpful to install the GPGPU-Sim simulator. It doesn't need any graphics card on your machine.

cudart_static - when is it necessary?

Since newer drivers ship with the CUDA runtime (I can choose 9.1 or 9.2 on the drivers download page), my question is: should my library (which uses a CUDA kernel internally) be shipped with -lcudart_static?
I had issues launching kernels compiled with 9.2 on systems which used 9.1 CUDA drivers. What's the most 'compatible' way of ensuring my library will run everywhere a recent CUDA driver is installed? (I'm already compiling for a virtual architecture)
Since newer drivers ship with the CUDA runtime (I can choose 9.1 or 9.2 in the drivers download page)
No, that's incorrect. That choice on the drivers download page reflects the fact that each CUDA version has a minimum required driver version associated with it. It does not mean that the driver ships with the CUDA runtime (stated another way, the driver does not install libcudart.so on Linux and never has; with some careful experimentation on a clean install, you can prove this to yourself).
Some additional comments:
-lcudart_static is actually the default for current/recent versions of nvcc. You can discover this by reading the nvcc manual. Therefore, by default, your executable, when compiled/built with nvcc, should already be statically linked against the CUDA runtime library corresponding to the version of nvcc used for compilation. The reason you might need to specify this (or something like it) is if you are building an application with, say, the GNU toolchain (on Linux) rather than with nvcc.
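To make that concrete, here is a minimal sketch (the file name and install paths are illustrative); built with nvcc the runtime is statically linked by default, whereas a host-toolchain link has to name the static runtime library and its usual dependencies explicitly:

```
// vecinc.cu -- trivial kernel, used only to illustrate the two linking routes
#include <cstdio>
#include <cuda_runtime.h>

__global__ void inc(int *x) { x[threadIdx.x] += 1; }

int main() {
    int *d = nullptr;
    cudaMalloc(&d, 32 * sizeof(int));
    cudaMemset(d, 0, 32 * sizeof(int));
    inc<<<1, 32>>>(d);
    cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(d);
    return 0;
}

// Built with nvcc, static linking of the runtime is already the default:
//   nvcc -o vecinc vecinc.cu            (equivalent to --cudart=static)
// Linking with the host toolchain instead requires naming the static runtime
// and its dependencies explicitly, e.g. on Linux:
//   nvcc -c vecinc.cu -o vecinc.o
//   g++ vecinc.o -o vecinc -L/usr/local/cuda/lib64 -lcudart_static -lrt -lpthread -ldl
// Either way, "ldd ./vecinc" should show no dependency on libcudart.so.
```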
The purpose of static linking to the CUDA runtime library is, as you surmise, so that an application can be built in such a way that it does not need an installation of the CUDA toolkit to run properly. It only needs a machine with a proper GPU driver install.
The most compatible way to ensure that an application will run on a range of machines with a range of GPU driver installs is to compile your application using the oldest CUDA toolkit required to meet the needs of the earliest GPU driver in the range you intend to cover. Again, you can refer to the table here.

NVIDIA Visual Profiler: Encountered invalid option : --openacc-profiling

Running a simple application on nvidia Visual Profiler shows the error:
Encountered invalid option : --openacc-profiling
======== Use "nvprof --help" to get more information.
Any GPU application I try to profile gets the same error.
I tried to uncheck the option "Enable OpenACC profiling" and got the same error.
Versions:
nvprof --version
nvprof: NVIDIA (R) Cuda command line profiler
Copyright (c) 2013 - 2014 NVIDIA Corporation
Release version 6.5.14 (21)
And
NVIDIA Visual Profiler
Version: 6.5
It appears (based on comments above) that the issue here was a mixed configuration - a CUDA 8 version of nvvp (the visual profiler) calling a CUDA 6.5 version of nvprof.
The visual profiler performs some of its work by calling nvprof to do low-level profiling. As a result, it is passing command-line switches to nvprof, and so nvprof is expected to match, version-wise, the version of nvvp that is being used. If that is not the case, problems like this can occur.
The solution is to have a consistent install. It should be possible to have both CUDA 6.5 and CUDA 8 installed on the same machine, but the PATH and LD_LIBRARY_PATH variables need to be set so that, for example, the CUDA 8 version of nvvp will find and invoke the CUDA 8 version of nvprof. Generally, the instructions in the Linux install guide for setting these variables should be sufficient, but care should be taken that no earlier version of nvprof is found first on the PATH when using CUDA 8. It's not possible to cover all the ways in which this may happen, so some rudimentary Linux administration skills will be necessary to ensure such a configuration is internally consistent.
Otherwise, if these skills aren't available, the Linux install instructions may provide the best solution: remove all previous versions of CUDA when installing a new version. That is another possible approach which, if done correctly, should reliably prevent a problem like this from occurring.