Caffe Installation Issue on Ubuntu 14.04 - cuda

I successfully installed Caffe on my dual-boot laptop (GTX 860M, Windows 7 + Ubuntu 14.04.2), and all the tests passed. After I restarted, however, Ubuntu got stuck on the boot splash screen (the one with the Ubuntu logo and five red dots), and I don't know how to get past it.
Has anyone run into the same issue before? I suspect something goes wrong when the graphics card driver loads at boot. I installed the newest CUDA 7 Toolkit with the bundled NVIDIA driver. Since all tests passed before I restarted, it seems the driver works once it has loaded successfully.
The stuck screen looks like this: http://i.stack.imgur.com/pRtEF.jpg

I had a similar issue when trying to install Caffe on my system. The steps below worked for me, but they have at least one known issue (documented below).
I'm not sure what precisely caused this problem, but it surely has something to do with the Nvidia Driver and Cuda Toolkit installation and is not caused by Caffe.
After completing the steps below, I've been able to successfully install Caffe on my system with the following tutorials and guides:
Official Install Guide
Github Install Guide
Update
Recently, I had the exact same problem trying to make Cuda 7.5 work on Ubuntu 14.04; this approach also solved that problem. Specs:
CPU: Intel Core i7-4700MQ (4x 2.40 GHz with Hyperthreading)
GPU: NVidia GT 940M
RAM: 8 GB
HDD: 52.7 GB (of which 6.7 GB used after installation)
INSTALL NVIDIA DRIVER AND CUDA ON UBUNTU 14.04
Source: ubuntuforums.org/showthread.php?t=2246526
!! Known Issues !!
After the system has been suspended (or hibernated, not confirmed), all applications using the Nvidia Driver and Cuda 6.5 Toolkit will freeze. When this happens, the command sudo shutdown -r now will print the reboot message but nothing will happen.
Executed and tested on a fresh 64-bit Ubuntu 14.04 install with the following hardware specifications:
CPU: Intel Core i5-2410m (2x 2.30 GHz with Hyperthreading)
GPU: NVidia GT 540M
RAM: 6 GB
HDD: 52.7 GB (of which 8.6 GB used after installation)
The following command was executed before installation:
sudo apt-get install -y build-essential vim git llvm clang
The following steps resulted in a stable system with the latest Nvidia Driver and Cuda 6.5 Toolkit installed:
Remove all traces of previous/legacy Nvidia Drivers and Cuda Toolkits or perform a fresh Ubuntu 14.04 install.
Download the latest Nvidia Driver .run file for Ubuntu 14.04 and your system specifications to the ~/Downloads directory.
e.g.: NVIDIA-Linux-x86_64-346.35.run
Download the latest Cuda 6.5 Toolkit .run file for Ubuntu 14.04 and your system specifications to the ~/Downloads directory.
e.g.: cuda_6.5.14_linux_64.run
Blacklist the 'nouveau' Driver by appending the following lines to /etc/modprobe.d/blacklist.conf (nouveau is a free, open-source driver for Nvidia cards and is the default on Ubuntu 14.04):
blacklist nouveau
options nouveau modeset=0
Reboot the system, do NOT log in but drop to the terminal with CTRL+ALT+F1
Kill lightdm (replace 'lightdm' with your own Display Manager if you have changed it, lightdm is the default for Ubuntu 14.04):
sudo service lightdm stop
The next step is critical, make sure to check twice before continuing!
Run the Nvidia Driver installer with the --no-opengl-files option (the option prevents OpenGL files from being overwritten; without this option, Unity would not function properly and the screen would freeze after login):
sudo chmod +x ~/Downloads/NVIDIA-Linux-x86_64-346.35.run
sudo ~/Downloads/NVIDIA-Linux-x86_64-346.35.run --no-opengl-files
Accept the EULA and acknowledge all further warnings, but decline to install anything extra.
Reboot and log in to the desktop, then verify with the 'Additional Drivers' utility (System Settings > Software & Updates > Additional Drivers) that the manually installed driver is in use.
Open a terminal and install the Cuda 6.5 Toolkit:
sudo chmod +x ~/Downloads/cuda_6.5.14_linux_64.run
sudo ~/Downloads/cuda_6.5.14_linux_64.run
Accept the EULA, do NOT install the driver, install the Toolkit and the Examples (if you want to), leave all default directories in place.
Add the Cuda 6.5 Toolkit environment variables by appending the following lines to ~/.bashrc:
# For 32-bit systems, append these:
export PATH=$PATH:/usr/local/cuda-6.5/bin
export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib
# For 64-bit systems, append these:
export PATH=$PATH:/usr/local/cuda-6.5/bin
export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64
The Nvidia Driver and Cuda 6.5 Toolkit should now be correctly installed.
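Several of the steps above append lines to /etc/modprobe.d/blacklist.conf or ~/.bashrc, and it is easy to end up with duplicates if the procedure is repeated. A minimal sketch of an idempotent append follows; the helper name append_once is hypothetical (not part of any tool), and the demo writes to a scratch file rather than the real configuration files.

```shell
# append_once FILE LINE: append LINE to FILE only if no identical line exists.
# (Hypothetical helper, for illustration only.)
append_once() {
    touch "$1"
    # -q quiet, -x match the whole line, -F treat LINE as a fixed string
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Demo on a scratch file; for real use, point this at
# /etc/modprobe.d/blacklist.conf (as root) or ~/.bashrc.
demo_file="$(mktemp)"
append_once "$demo_file" "blacklist nouveau"
append_once "$demo_file" "options nouveau modeset=0"
append_once "$demo_file" "blacklist nouveau"   # repeated call changes nothing
cat "$demo_file"
```

The scratch file ends up with each line exactly once, however many times the calls are repeated.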
Optional: confirm your Nvidia Driver and Cuda 6.5 Toolkit installation.
Confirm the Nvidia Driver installation by running the following command:
nvidia-smi
Confirm the Cuda Compiler installation by running the following command:
nvcc -V
Confirm everything works by building and running the optionally installed Cuda Examples: (build-essential is required to use 'make')
sudo apt-get install -y build-essential
cd ~/NVIDIA_CUDA-6.5_SAMPLES/1_Utilities/deviceQuery
make
./deviceQuery
cd ~/NVIDIA_CUDA-6.5_SAMPLES/1_Utilities/bandwidthTest
make
./bandwidthTest
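Before running the checks above by hand, it can help to confirm that the tools are on PATH at all, since a missing nvcc usually points to the ~/.bashrc lines not having been sourced yet. A small sketch, assuming nothing beyond a POSIX shell; check_tool is a hypothetical helper name.

```shell
# check_tool NAME: report whether NAME is on PATH; returns non-zero if it is not.
# (Hypothetical helper name, for illustration only.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

for tool in nvidia-smi nvcc; do
    check_tool "$tool" || echo "  -> re-check the install steps and the PATH lines in ~/.bashrc"
done
```

If nvcc is reported missing right after editing ~/.bashrc, open a new terminal or run source ~/.bashrc and try again.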

This problem is not related to caffe.
The problem is that the nVidia driver that is installed from the ubuntu software center does not support your card.
Uninstall any nvidia package (sudo apt-get purge nvidia-*) and install the latest driver version from the nvidia website.

I recommend switching to the Ubuntu 15.04 build of CUDA 7.5. I tried it on Ubuntu 14.04 and it solved this problem. When I installed the Ubuntu 14.04 build of CUDA 7.5 on Ubuntu 14.04, I ran into exactly the same problem.

Related

Confusing cuda versions

I just installed the latest CUDA 9.1 on Ubuntu 16.04 according to the official instructions. But when I run the command nvcc -V, it still reports CUDA version 7.5, as shown below.
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
Also, which nvcc gave me /usr/bin/nvcc which is not under /usr/local folder. Is this normal? Is this a compatibility issue? I have a GTX 1080 Ti and a GTX 980. I added commands below to .bashrc file, but it still didn't work.
export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
The best thing to do here is to remove all traces of CUDA binaries from the /usr/bin directory, and in the future always install the CUDA toolkit in the "default" locations at /usr/local/cuda-XX
To remove CUDA items from /usr/bin, just use the linux rm command as a root user. Not sure what to remove? Take a look in an "ordinary" CUDA install bin directory, such as /usr/local/cuda-8.0/bin
By having your CUDA install at the default locations e.g. /usr/local/cuda-8.0 and /usr/local/cuda-9.0 (for example), you can have "side-by-side" installs, and switch between them by modifying the PATH and LD_LIBRARY_PATH variables accordingly.
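Switching between side-by-side installs can be wrapped in a small shell function. This is only a sketch: the name use_cuda and the CUDA_ROOT variable are hypothetical, and it assumes the default /usr/local/cuda-XX layout described above.

```shell
# use_cuda VERSION: put <root>/cuda-VERSION first on PATH and LD_LIBRARY_PATH.
# CUDA_ROOT is a hypothetical override; it defaults to /usr/local.
use_cuda() {
    cuda_home="${CUDA_ROOT:-/usr/local}/cuda-$1"
    if [ ! -d "$cuda_home/bin" ]; then
        echo "use_cuda: $cuda_home/bin does not exist" >&2
        return 1
    fi
    export PATH="$cuda_home/bin${PATH:+:${PATH}}"
    export LD_LIBRARY_PATH="$cuda_home/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
}

# Example (in ~/.bashrc or an interactive shell):
#   use_cuda 9.1
```

The directory check catches the case where the expected install directory was never created. After calling it in bash, hash -r clears cached command locations and type -a nvcc lists every nvcc on PATH in lookup order, which shows which copy wins.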

Installing cuda via brew and dmg

After attempting to install the NVIDIA toolkit on a Mac by following this guide: http://docs.nvidia.com/cuda/cuda-installation-guide-mac-os-x/index.html#axzz4FPTBCf7X I received the error "Package manifest parsing error", which led me to this: NVidia CUDA toolkit 7.5.27 failing to install on OS X. I unmounted the dmg, and the upshot was that instead of showing "Package manifest parsing error", the installer would not launch (it seemed to launch briefly, then quit).
Installing via the command brew install Caskroom/cask/cuda (CUDA 7.5 install on Mac missing nvrtc) seems to have successfully installed CUDA.
The command nvcc --version returns:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Mon_Apr_11_13:23:40_CDT_2016
Cuda compilation tools, release 7.5, V7.5.26
I've built the example in /Developer/NVIDIA/CUDA-7.5/samples/1_Utilities with :
make -C bandwidthTest/
This executed without error.
It appears that installing with brew install Caskroom/cask/cuda is a safe method of installing? What is the difference between this install method and installing via the DMG file from NVIDIA?
Caskroom appears to be an extension for brew for installing GUI applications: https://github.com/caskroom/homebrew-cask
Should an IDE also be installed as part of the CUDA install?
Nowadays you have to do the following to install cuda via brew:
brew tap homebrew/cask-drivers
brew cask install nvidia-cuda
See https://github.com/caskroom/homebrew-cask/issues/38325 .
Then you also need to add the following to your file ~/.bash_profile:
export PATH=/Developer/NVIDIA/CUDA-9.0/bin${PATH:+:${PATH}}
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-9.0/lib${DYLD_LIBRARY_PATH:+:${DYLD_LIBRARY_PATH}}
See http://docs.nvidia.com/cuda/cuda-installation-guide-mac-os-x/index.html.
UPDATE: Newer versions of Mac OS X with activated SIP (System integrity protection) will prevent modifying the DYLD_LIBRARY_PATH (see https://groups.google.com/forum/#!topic/caffe-users/waugt62RQMU). You can check that via
source ~/.bash_profile
env | grep DYLD_LIBRARY_PATH
If the output of this command is empty, SIP is active and you might want to deactivate it as described at https://www.macworld.com/article/2986118/security/how-to-modify-system-integrity-protection-in-el-capitan.html . After doing this you should see
env | grep DYLD_LIBRARY_PATH
DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-9.0/lib
Both methods download and install from the same .dmg file from NVidia.
The homebrew-cask framework is the preferred method for installing software distributed as binaries in the homebrew paradigm.
This is my understanding.
To install using the DMG file, follow the steps below:
wget 'https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_mac.dmg' && \
hdiutil attach cuda_10.2.89_mac.dmg \
-nobrowse \
-mountpoint \
/Volumes/CUDAMacOSXInstaller
Open installer:
open /Volumes/CUDAMacOSXInstaller/CUDAMacOSXInstaller.app
Uncheck "CUDA Samples" before continuing.
Unmount and remove file:
hdiutil detach /Volumes/CUDAMacOSXInstaller && rm ./cuda_10.2.89_mac.dmg

How to set CUDA parameters with GTX1080 for Tensorflow?

After I installed the driver for the GTX1080, tensorflow shows that it can find the cudnn library.
However, modprobe fails to load the GPU driver module.
Detailed information is as follows:
$ python
[14:22:14]
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
>>> sess = tf.InteractiveSession()
modprobe: ERROR: could not insert 'nvidia_352_uvm': Invalid argument
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: work-data
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: work-data
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:347] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.27 Thu Jun 9 18:53:27 PDT 2016 GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) """
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 367.27.0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
The GTX1080 driver version is 367.27, as provided by NVIDIA.
I don't know why 'nvidia_352_uvm' shows up.
The result of nvidia-smi is here.
Maybe I need to reinstall CUDA, but I have already reinstalled it several times.
Should I remove all the CUDA libraries and the NVIDIA driver, then reinstall them all? Is there a required install order for the two?
Too long for a comment, but here are some tips I've learned after trying to get NVidia drivers to play nice with Ubuntu.
Installing a new driver on top of an existing driver gives a partially upgraded installation. You need to remove the previous one first.
sudo apt-get remove --purge nvidia-*
sudo rm /etc/X11/xorg.conf # if you ran nvidia-xconfig
Reload the NVidia driver as follows (from a virtual terminal, e.g. CTRL+ALT+F1):
sudo service lightdm stop # stop your display manager
killall python # kill all running TensorFlow instances to free GPU
sudo modprobe -r nvidia
sudo modprobe nvidia
dmesg | tail -100 # check for error messages
Check logs for any error messages from NVidia
dmesg | grep -i nvidia
lspci | grep -i nvidia
nvidia-smi # make sure this reports version 367.27
Also, there are two ways to install drivers: through Ubuntu's package manager with sudo apt-get install nvidia-current, or by downloading the installer from the NVidia website. I was not able to get the apt-get route to work for TensorFlow, so I would recommend downloading the drivers from the NVidia website.
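The log in the question shows modprobe trying to load 'nvidia_352_uvm' while the loaded kernel module reports 367.27, which fits the "partially upgraded" picture. A minimal sketch for pulling the loaded module's version out of the driver version text; nvrm_version is a hypothetical helper, and the parsing assumes the "Kernel Module <version>" wording seen in the log above.

```shell
# nvrm_version: read the driver version text on stdin and print the
# kernel module version (hypothetical helper, for illustration only).
nvrm_version() {
    sed -n 's/.*Kernel Module[[:space:]]*\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

if [ -r /proc/driver/nvidia/version ]; then
    echo "loaded kernel module: $(nvrm_version < /proc/driver/nvidia/version)"
else
    echo "no NVIDIA kernel module loaded (/proc/driver/nvidia/version is missing)"
fi
```

If the reported version differs from the number embedded in the module name that modprobe complains about, leftover packages from the older driver are a likely suspect, and dpkg -l | grep -i nvidia lists what is still installed.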

Seeing No CUDA capable GPU detected after i upgraded to cuda 6.5 from 5.5

I am receiving the error: No CUDA capable GPU detected
after I upgraded from CUDA 5.5 to CUDA 6.5.
The Nvidia driver version I have is 331.49.
Is this driver compatible with CUDA 6.5, and if not, what is the best stable driver version for CUDA 6.5?
CUDA 6.5 requires a r340 driver or newer. On linux that would be 340.29 or higher.
331.49 won't work. Whatever method you used to "upgrade" from 5.5 to 6.5 was incomplete.
There are getting started guides for each supported OS that may help.
If you just want to load a new driver, you can select a driver appropriate for your GPU and OS at http://www.nvidia.com/Download/index.aspx?lang=en-us
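The minimum-driver rule above is easy to check mechanically. A small sketch, assuming GNU sort with the -V (version sort) option is available; version_ge is a hypothetical helper name, and 340.29 is the Linux minimum quoted above.

```shell
# version_ge A B: succeed if dotted version A >= B.
# Relies on GNU sort -V; hypothetical helper, for illustration only.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="340.29"   # minimum Linux driver for CUDA 6.5, per the answer above
for installed in 331.49 346.35; do
    if version_ge "$installed" "$required"; then
        echo "$installed: new enough for CUDA 6.5"
    else
        echo "$installed: too old for CUDA 6.5 (needs >= $required)"
    fi
done
```

The installed version can be read from /proc/driver/nvidia/version or from the header of the nvidia-smi output.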
The best stable driver version for CUDA 6.5 is 346.
You can completely uninstall the current driver with
sudo nvidia-uninstall
and then add the xorg-edgers repository, update the package lists, and install the desired driver version:
sudo add-apt-repository ppa:xorg-edgers/ppa
sudo apt-get update
sudo apt-get install nvidia-346
and then run
sudo nvidia-xconfig
and reboot after that.
After startup, verify the driver installation with:
nvidia-smi
It should print some information about the GPU.
After that, you can verify the CUDA installation by running deviceQuery from the samples.

CUDA "No compatible Device" error on Ubuntu 11.10/12.04

I have been trying to set up an Ubuntu environment on my laptop for some time now for CUDA programming. I am currently dual booting Windows 8 and Ubuntu 12.04 and want to install CUDA 5 on Ubuntu.
The laptop has a GeForce GT 640M graphics card (See below for full specs). It is an Optimus card.
Originally I was dual booting Ubuntu 11.10 and have tried tutorials on both 11.10 and 12.04.
I have tried many tutorials of all shapes and sizes, including this tutorial. The installation process shows the device driver installing and the Toolkit installing, and the Samples failing, but when I go to test a simple Vector Add CUDA program in NSight, "No compatible CUDA Device" error is thrown.
Ubuntu Details also still shows "Unknown" for Graphics
Suggestions?
Laptop Specs:
Acer V3-771G
Intel Core i7 2670QM
nVidia GeForce GT 640M 2GB - Optimus
16GB DDR3-1600 RAM
120GB SSD + 500GB HDD + 32GB Cache SSD
Since it is an optimus device, there are some extra steps to be able to use the nvidia GPU. While it is not necessary, I suggest that you use the bumblebee wrapper program because it is the easiest solution.
After you have installed the bumblebee wrapper you can run your programs using optirun programname or start a shell with the nvidia card activated: optirun bash --login
An added bonus is that the bumblebee daemon will disable the GPU when it is not running and will save you some battery.
If you don't care about battery life and just want CUDA to be always enabled without wrapping commands you can load the nvidia kernel module and then create the necessary device nodes manually:
mknod /dev/nvidia0 c 195 0
mknod /dev/nvidiactl c 195 255
(This advanced method lets you run cuda programs from the console without starting Xorg, for example when SSH-ing to a machine without a running X server.)
See also https://askubuntu.com/questions/131506/how-can-i-get-nvidia-cuda-or-opencl-working-on-a-laptop-with-nvidia-discrete-car for a more detailed discussion.
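For machines with more than one GPU, the manual device-node step generalizes to one node per GPU plus the control node. The sketch below only prints the commands (a dry run), so they can be reviewed before being run as root. The helper name nvidia_mknod_cmds is hypothetical; the major number 195 and the nvidiactl minor 255 come from the commands above, and the -m 666 mode is an added assumption so that non-root users can open the devices.

```shell
# nvidia_mknod_cmds N: print the mknod commands for N GPUs plus the control node.
# Dry run only: nothing is created. Hypothetical helper, for illustration.
nvidia_mknod_cmds() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "mknod -m 666 /dev/nvidia$i c 195 $i"
        i=$((i + 1))
    done
    echo "mknod -m 666 /dev/nvidiactl c 195 255"
}

nvidia_mknod_cmds 1
```

Once the output looks right, it can be piped to a root shell, for example nvidia_mknod_cmds 1 | sudo sh, after the nvidia kernel module has been loaded.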
Try the command sudo apt-get install mesa-utils.
See if the graphics card is recognized, and then try to install CUDA.
If it is not recognized after the first command, try:
sudo add-apt-repository ppa:ubuntu-x-swat/x-updates
sudo apt-get update
sudo apt-get install nvidia-current
First install the following libraries & Tools:
sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev
Next we will blacklist some modules (drivers). In a terminal, enter:
sudo gedit /etc/modprobe.d/blacklist.conf
Add the following to the end of the file (one per line, like so):
blacklist amd76x_edac
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
Save the file and close the editor.
Now we want to get rid of any nvidia residuals. In a terminal:
sudo apt-get remove --purge nvidia*
Next you need to restart your machine (sudo reboot).
0) Press Ctrl+Alt+F1 at the login screen (you don't have to log in to the desktop; we'll restart later anyway), then log in at the console.
1) sudo service lightdm stop
2) cd Downloads
3) chmod +x devdriver*.run (your driver filename)
4) sudo ./devdriver*.run
You might have to run the driver installer once, reboot (it will remove the nouveau drivers), and repeat the steps again. Follow the installer instructions; when it asks you:
yes, you do want the 32-bit libraries, and you DO want it to change the xorg.conf file.
Once the installer completes, restart (sudo reboot). You're done :]
To install the SDK and Toolkit,
repeat steps 3 and 4 with the downloaded .run files.
In theory, the drivers included with CUDA 5.5 should natively support Optimus (as well as single GPU debugging for non-Optimus laptops). I haven't tried it yet because I'm waiting for a compute 3.5 Optimus laptop so that it'll support kernel recursion and HyperQ. In theory the HP Envy 15t-j000 has the GK208 version of the GT 740m, but I'd really rather have an ultrabook form factor like the upcoming Acer S3-392 with GT 735m. The NVIDIA guys at GTC assured me that Optimus should be working with the CUDA 5.5 RC. I found this 'CUDA Getting Started Guide for Linux' released this month that provides some flags for getting Optimus drivers installed correctly:
http://developer.download.nvidia.com/compute/cuda/5_5/rc/docs/CUDA_Getting_Started_Linux.pdf
Also, more information about GK208 Chips and Compute 3.5 in laptops:
https://devtalk.nvidia.com/default/topic/546357/sounds-like-gk208-laptops-cards-will-support-most-sm_35-features/
Anyone have luck with CUDA 5.5 and Optimus laptops under linux?