Is it possible to run multiple CUDA versions on Windows? - cuda

I am working on a chest X-ray project and need multiple versions of the CUDA toolkit, but the problem is that my system only picks up the latest version, which I installed last.
Is it possible to run whichever CUDA version a given GitHub project requires, e.g. 9.0, 10.2, or 11.0?
I have done all the initial steps: added the toolkit path to the environment variables and copied the cuDNN files into place and added them to the environment.
Now the problem is that my code needs CUDA 9.0, but the default resolves to CUDA 11.0. What is the solution, or a script, to switch easily between these versions?

You may set CUDA_PATH_V9_0, CUDA_PATH_V10_0, etc. properly, then point CUDA_PATH at any one of them (e.g. CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0).
Then, in your VS project, set your CUDA library path using CUDA_PATH (e.g. $(CUDA_PATH)\lib).
To switch, just set CUDA_PATH to another version, then clean and rebuild your VS project(s).
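As a concrete sketch, the switch can be wrapped in a small batch file. The helper's name is hypothetical; the CUDA_PATH_V* variables are the ones each toolkit installer normally creates:

```bat
:: switch-cuda.bat -- hypothetical helper; assumes the per-version
:: variables (CUDA_PATH_V9_0, CUDA_PATH_V10_0, CUDA_PATH_V11_0) were
:: created by the respective toolkit installers.
@echo off
rem Point CUDA_PATH at the 9.0 toolkit for the current user:
setx CUDA_PATH "%CUDA_PATH_V9_0%"
rem ...or uncomment one of these to pick another version:
rem setx CUDA_PATH "%CUDA_PATH_V10_0%"
rem setx CUDA_PATH "%CUDA_PATH_V11_0%"
echo Remember to clean and rebuild your Visual Studio projects afterwards.
```

Note that setx writes the user environment, so already-open command prompts and Visual Studio instances must be restarted to see the new value.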

Related

No "nvcc" in "cuda-10.2/bin" after toolkit patch

I have a working cuda-10.0 toolkit and the 470 driver. I need to use the new virtual memory management features that I found in the 10.2 driver, and I can't install anything newer than 10.x because my old video card has compute capability 3.0.
So after applying new toolkit with:
sudo sh ./cuda_10.2.1_linux.run --toolkit --silent --override
it appears, as far as I can tell, to have installed successfully.
But now the "cuda-10.2" folder contains almost nothing: the "bin" folder only has the uninstaller, with no "nvcc" or other tools, and the newly created symlink points to that "nothing". How do I deal with this?
I tried the official docs and googling, but found nothing.
The patch updates for CUDA 10.2 do not contain complete toolkits. The idea behind a "patch" is that it contains only the files necessary to address the items that the patch is focused on.
To get a full CUDA 10.2 toolkit install, you must first install using a full CUDA 10.2 toolkit installer; a typical filename for that would be cuda_10.2.89_440.33.01_linux.run (a runfile installer, to match your indicated runfile installer usage). After that, if you decide you need/want the items addressed by the patch, you must also install the desired patch.
Note the statement on the download page:
These patches require the base installer to be installed first.

What is libcublasLt.so (not libcublas.so)?

I'm compiling source code using pgf95 (a Fortran compiler).
If I use CUDA 10.0, it successfully compiles the source code.
However, if I use CUDA 10.1, it fails, saying 'cannot find libcublasLt.so'.
When I scan the directories cuda-10.0/lib64 and cuda-10.1/lib64, neither has a file starting with 'libcublasLt'.
How can I solve this issue?
libcublasLt.so is the library that provides the implementation for the cublasLt API, which is defined here. It just happens to be a separate shared object from libcublas.so.
In the past (e.g. CUDA 10.0 and prior), most CUDA libraries were installed in /usr/local/cuda/lib64 (or similar) by default (on Linux). At about the CUDA 10.1 timeframe, it was decided that some libraries would be installed in different places. CUDA 10.1 is also where the cublasLt API and library were introduced. This affected some cublas libraries and is discussed in the CUDA 10.1 release notes here (both the introduction of the cublasLt library, as well as the change in library locations).
So there are 2 possibilities here (for CUDA 10.1, CUDA 10.2):
libcublasLt.so is on your machine, but it is simply not where you were expecting to find it.
libcublasLt.so is not on your machine. This means you are working with CUDA version prior to the introduction of the cublasLt API (i.e. 10.0 or prior), or you have a broken install.
So, assuming you are working with CUDA 10.1 or CUDA 10.2, the first step is to locate/determine whether libcublasLt.so is on your machine or not. You can use a linux utility like find or locate to accomplish that. They should have man pages available for you.
If you can find it, then you need to provide the path to it via a linker option (note that -L takes the directory containing the library, e.g. -L/path/to/directory/containing/libcublasLt.so).
If you can't find it, then either you are working with an older version of CUDA (10.0 or prior), or you need to reinstall CUDA.
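For example, a sketch of the locate-then-link steps (the search paths and the pgf95 command line are assumptions about your setup):

```shell
# Search a common install prefix for the cublasLt library; the exact
# location varies by distro and install method.
find /usr -name 'libcublasLt.so*' 2>/dev/null || true
# If it turns up in, say, /usr/lib/x86_64-linux-gnu, point the linker at
# that directory when compiling (hypothetical pgf95 invocation):
#   pgf95 myprog.f90 -L/usr/lib/x86_64-linux-gnu -lcublasLt -lcublas
```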
I believe by the time you get to CUDA 11.0, the CUDA packages put the cublas libraries back in /usr/local/cuda/lib64 with the other libraries. YMMV.

Best solution to have multiple CUDA/cuDNN versions installed on Ubuntu

I am using Conda on Ubuntu 16.04. My objective is to associate each Conda environment to a specific version of CUDA / cuDNN. I had a look around and I found this interesting article, which basically suggests to put different CUDA versions into different folders and then use an environment-specific bash script (run when the environment is activated) to properly set the PATH/LD_LIBRARY_PATH variables (which creates the association with the CUDA version).
This is fine, but when I try to install frameworks such as PyTorch using Conda, it forces me to also install the "cudatoolkit" package.
So, a couple of questions:
1) Does downloading cudatoolkit mess up my previous CUDA configurations? Which version will be used?
2) If Conda can install "cudatoolkit" and also "cudnn", why not just use Conda for everything? Why even apply the instructions of the above-mentioned article?
Thank you.
As an answer to the first question: no, downloading and installing another CUDA toolkit won't mess up other configurations. In the CUDA toolkit installer, you specify an installation directory, so just pick whatever works for you that is unique to that CUDA version. This won't affect any currently installed CUDA versions. A PyTorch install will look for a CUDA_HOME environment variable as well as in /usr/local/cuda (the default CUDA toolkit install directory), so it's just this environment variable that needs to be changed.
I can't speak for the second part. Perhaps the installation using Conda will use the default installation directory for the CUDA toolkit (seems silly but this is just speculation).
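As a sketch of the activation-script approach from the article mentioned in the question (the path and filename are assumptions; Conda runs any script placed in etc/conda/activate.d when an environment is activated):

```shell
# Hypothetical per-environment hook; save as
# $CONDA_PREFIX/etc/conda/activate.d/cuda-9.0.sh and adjust the path
# to wherever that toolkit version actually lives on your machine.
export CUDA_HOME=/usr/local/cuda-9.0
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

A matching script in etc/conda/deactivate.d can restore the old values, so each environment gets its own CUDA version only while it is active.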

TensorFlow Bazel Configuration

We are using a GTX 1080 with Ubuntu 16.04; we installed CUDA 7.5 and cuDNN v5.1, and used compute capability 6.1. The CUDA installation on 16.04 was made using the 15.04 Ubuntu version and some very minor changes suggested by https://www.pugetsystems.com/labs/hpc/NVIDIA-CUDA-with-Ubuntu-16-04-beta-on-a-laptop-if-you-just-cannot-wait-775/. All this seems to have worked fine.
In trying to install TensorFlow from source, per Google's instructions for anything other than the default configuration, we have run into a problem. We do not know whether this is something wrong on your end or Google's. If you cannot help us, can you refer us to someone who can? Thank you. Below is the relevant run from a script file, with embedded special characters edited out.
(tensorflow) laefsky@main:~/anaconda2/envs/tensorflow/tensorflow$ ./configure
~/anaconda2/envs/tensorflow/tensorflow ~/anaconda2/envs/tensorflow/tensorflow
Please specify the location of python. [Default is /home/laefsky/anaconda2/envs/tensorflow/bin/python]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
No Google Cloud Platform support will be enabled for TensorFlow
Found possible Python library paths:
/home/laefsky/anaconda2/envs/tensorflow/lib/python2.7/site-packages
Please input the desired Python library path to use. Default is [/home/laefsky/anaconda2/envs/tensorflow/lib/python2.7/site-packages]
/home/laefsky/anaconda2/envs/tensorflow/lib/python2.7/site-packages
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]:
Please specify the location where CUDA toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Leave empty to use system default]:
Please specify the location where cuDNN library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: "6.1"
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
......
ERROR: /home/laefsky/anaconda2/envs/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl:442:18: function 'repository_rule' does not exist.
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package 'external': Extension file '#local_config_cuda//cuda:build_defs.bzl' may not be loaded from a WORKSPACE file since the extension file is located in an external repository.
Configuration finished
(tensorflow) laefsky@main:~/anaconda2/envs/tensorflow/tensorflow$ logout

CUDA: output of Nsight different from the command-line (nvcc) output

I ran the code using the nvcc command and it gave correct output, but when I run the same code in Nsight Eclipse it gives wrong output. Does anyone have any idea why this happens?
Finally I found there was a problem in one of the array allocations. While the command-line build didn't show the problem, the Nsight build did.
Nsight EE builds projects by generating makefiles based on the project settings and by invoking the OS make utility to build the project. It uses the nvcc compiler found in PATH, but it relies on some newer options introduced in the NVCC 5.0 compiler (which is part of the same toolkit distribution).
Please do a clean rebuild in Nsight Eclipse - it will print out the command lines used to build your application. Then you can compare that command line with the one you use outside. Possible differences are:
Nsight specifies debug flags and optimization flags when building in debug and release modes, respectively.
By default, Nsight sets the new project to build for the hardware detected on your system. NVCC default is SM 1.0.
Make sure the compiler used by Nsight and from the command line are one and the same. It is possible that you have different compilers (e.g. 4.x and 5.0) installed on your system that may generate a slightly different code.
In any case, it is likely your code has some bug that manifests itself under different compilation settings. I would recommend running CUDA memcheck on your program to ensure there are no hidden bugs.
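For reference, a typical memcheck run looks like this (a sketch; the source filename kernel.cu is hypothetical, and the commands are guarded so they are a no-op on machines without the toolkit):

```shell
# Build with device-side debug info so memcheck can report source lines,
# then run the binary under cuda-memcheck.
if command -v nvcc >/dev/null 2>&1; then
  nvcc -g -G -o app kernel.cu   # -G keeps device code debuggable
  cuda-memcheck ./app           # reports out-of-bounds and misaligned accesses
else
  echo "CUDA toolkit not found on PATH"
fi
```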