No "nvcc" in "cuda-10.2/bin" toolkit patch - cuda

I have a working CUDA 10.0 toolkit and a 470 driver. I need the new virtual memory management features introduced in CUDA 10.2, and I can't install anything newer than 10.x because my old video card has compute capability 3.0.
So I installed the new toolkit with:
sudo sh ./cuda_10.2.1_linux.run --toolkit --silent --override
and it appeared to install successfully.
But now the "cuda-10.2" folder contains almost nothing: the "bin" folder has only the uninstaller, with no "nvcc" or the other tools, and the newly created symlink points to that near-empty install. How do I deal with this?
I tried the official docs and googling, but found nothing.

The patch updates for CUDA 10.2 do not contain complete toolkits. The idea behind a "patch" is that it contains only the files necessary to address the items that the patch is focused on.
To get a full CUDA 10.2 toolkit install, you must first install using a full CUDA 10.2 toolkit installer; a typical filename for that would be cuda_10.2.89_440.33.01_linux.run (a runfile installer, to match your indicated runfile usage). After that, if you decide you need/want the items addressed by the patch, you must also install the desired patch.
Note the statement on the download page:
These patches require the base installer to be installed first.
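A minimal sketch of that order, reusing the two runfile names already mentioned in this thread:
# 1. Full base toolkit first
sudo sh ./cuda_10.2.89_440.33.01_linux.run --toolkit --silent --override
# 2. Then the patch on top of the complete install
sudo sh ./cuda_10.2.1_linux.run --toolkit --silent --override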

Related

nvcc not found but cuda runs fine?

I was trying to run nvcc -V to check the CUDA version, but I got the following error message.
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit
But GPU acceleration is working fine for training models on CUDA. Is there another way to find out the CUDA compiler tools version? I know nvidia-smi doesn't give the right version.
Is there a way to install or configure nvcc so I don't have to install a whole new toolkit?
Most of the time, nvcc and other CUDA SDK binaries are not in the environment variable PATH. Check the installation path of CUDA; if it is installed under /usr/local/cuda, add its bin folder to the PATH variable in your ~/.bashrc:
export CUDA_HOME=/usr/local/cuda
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
You can apply the changes with source ~/.bashrc, or the next time you log in, everything is set automatically.
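Afterwards, a quick sanity check that the shell now finds the compiler (it should print the toolkit's release version):
source ~/.bashrc
nvcc --version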
As @pQB and @talonmies mentioned above, you only need to install the GPU drivers (versions 430-470 these days) to use PyTorch. If you are using your GPU's display port, you should be fine.
For the CUDA compilation tools you need to install the whole toolkit, which includes the driver as well. If you install the downloaded runfile manually from the CLI, the installer gives you the option to choose which components to install or skip.
Generally, it is recommended to install the compilation tools (which are system-wide) and the GPU drivers together, because this avoids compatibility issues.
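A minimal sketch of a non-interactive, component-selective runfile install; the filename is a placeholder, and the component flags should be checked against the installer's --help output:
sudo sh ./cuda_<version>_linux.run --silent --toolkit
# add --driver to also install the bundled driver, or omit it to keep a working one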
Append:
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
to
~/.bashrc
Note: your CUDA path may include a version number, so look in /usr/local/ for a cuda-XX.X directory and modify the paths in ~/.bashrc to point to it.
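A quick way to list the versioned install directories actually present on the machine:
ls -d /usr/local/cuda*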

Best solution to have multiple CUDA/cuDNN versions installed on Ubuntu

I am using Conda on Ubuntu 16.04. My objective is to associate each Conda environment with a specific version of CUDA / cuDNN. I had a look around and found this interesting article, which basically suggests putting different CUDA versions into different folders and then using an environment-specific bash script (run when the environment is activated) to properly set the PATH/LD_LIBRARY_PATH variables (which creates the association with the CUDA version).
This is fine, but when I try to install frameworks such as PyTorch using Conda, it forces me to also install the "cudatoolkit" package.
So, a couple of questions:
1) Does downloading cudatoolkit mess up my previous CUDA configurations? Which version will be used?
2) If Conda can install "cudatoolkit" and also "cudnn", why not just use Conda for everything? Why even apply the instructions of the above-mentioned article?
Thank you.
As an answer to the first question: no, downloading and installing another CUDA toolkit won't mess up other configurations. The CUDA toolkit installer lets you specify an installation directory, so just pick one that is unique to that CUDA version; this won't affect any currently installed CUDA versions. A PyTorch install will look for a CUDA_HOME environment variable as well as in /usr/local/cuda (the default CUDA toolkit install dir), so it's just this environment variable that needs to be changed.
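A minimal sketch of the per-environment script idea from the article, here using Conda's activate.d hook; the 10.0 path is an example assumption:
# save inside the environment as $CONDA_PREFIX/etc/conda/activate.d/cuda.sh
export CUDA_HOME=/usr/local/cuda-10.0
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}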
I can't speak for the second part. Perhaps the installation using Conda will use the default installation directory for the CUDA toolkit (seems silly but this is just speculation).

Install multiple versions of CUDA and cuDNN

I am currently using CUDA version 7.5 with cuDNN version 5 for MatConvNet. I'd like to install CUDA version 8.0 with cuDNN version 5.1, and I want to know if there will be any conflicts if I have the environment paths pointing to both versions of CUDA and cuDNN.
The only environment variables that matter are PATH and LD_LIBRARY_PATH. There shouldn't be any conflicts due to LD_LIBRARY_PATH, since all the libraries' sonames seem to be bumped properly in each version. As for PATH, the shell will execute the version from the path that appears first in the variable, so there is no point in PATH containing both versions at the same time; you'll need to decide which version to use at any given time.
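To see which copy the shell will actually pick up with the current PATH ordering, a quick check with standard utilities:
which nvcc    # the first nvcc found on PATH wins
echo $PATH    # inspect the ordering of the CUDA bin directories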
There's a good article that describes all the steps. The important ones for me were:
Run the CUDA install script with the --silent --toolkit --override options.
Set the LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64.
Change the /usr/local/cuda symbolic link to point back to the default version.
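A minimal sketch of that last step, assuming the cuda-9.0 paths used above; the version number is an example:
sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-9.0 /usr/local/cuda   # point the default at the desired version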

TensorFlow Bazel Configuration

We are using a GTX 1080 with Ubuntu 16.04, and we installed CUDA 7.5 and cuDNN v5.1. We used compute capability 6.1. The 16.04 installation of CUDA was made using the 15.04 Ubuntu version and some very minor changes suggested by https://www.pugetsystems.com/labs/hpc/NVIDIA-CUDA-with-Ubuntu-16-04-beta-on-a-laptop-if-you-just-cannot-wait-775/. All this seems to have worked fine.
In trying to install TensorFlow from sources, per Google's instructions for anything other than the default configuration, we have run into a problem. We do not know if this is something wrong on your end or Google's. If you cannot help us, can you refer us to someone who can? Thank you.
Below is the relevant run from a script file, with embedded special characters edited out.
(tensorflow) laefsky@main:~/anaconda2/envs/tensorflow/tensorflow$ ./configure
~/anaconda2/envs/tensorflow/tensorflow ~/anaconda2/envs/tensorflow/tensorflow
Please specify the location of python. [Default is /home/laefsky/anaconda2/envs/tensorflow/bin/python]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
No Google Cloud Platform support will be enabled for TensorFlow
Found possible Python library paths:
/home/laefsky/anaconda2/envs/tensorflow/lib/python2.7/site-packages
Please input the desired Python library path to use. Default is [/home/laefsky/anaconda2/envs/tensorflow/lib/python2.7/site-packages]
/home/laefsky/anaconda2/envs/tensorflow/lib/python2.7/site-packages
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]:
Please specify the location where CUDA toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Leave empty to use system default]:
Please specify the location where cuDNN library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: "6.1"
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
......
ERROR: /home/laefsky/anaconda2/envs/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl:442:18: function 'repository_rule' does not exist.
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package 'external': Extension file '@local_config_cuda//cuda:build_defs.bzl' may not be loaded from a WORKSPACE file since the extension file is located in an external repository.
Configuration finished
(tensorflow) laefsky@main:~/anaconda2/envs/tensorflow/tensorflow$ logout

Installing CUDA without gcc-4.3

So I downloaded the latest CUDA (5.0.35) script to install CUDA on my desktop, which runs Debian (kernel 2.6.32).
When I ran the script, though, I got an error in the log which says:
The compiler used to compile the kernel (gcc-4.3) does not exactly match the current compiler (gcc-4.7)
So I looked to install gcc-4.3 from the repositories, but it isn't there. Then I downloaded the gcc-4.3 package separately, but when I try to install it I get many conflicting dependencies, so installing it is really not an option. I installed gcc-4.4, which is in the repositories, and changed the soft link for gcc to point to the gcc-4.4 version, but I get the same message as above:
The compiler used to compile the kernel (gcc-4.3) does not exactly match the current compiler (gcc-4.4)
So the question is: is there a way to install the driver successfully without relying on gcc-4.3?
I installed a 3.2 kernel, which was compiled with gcc-4.6, and that worked for me. You could also compile the old kernel using gcc-4.7, although I tried that and ran into some errors. The problem is that Debian ships a very old kernel, so it was compiled with gcc-4.3.
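To see which gcc the running kernel was built with (the version the installer's check compares against), a standard check is:
cat /proc/version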