I have a laptop with a GeForce 940 MX. I want to get Tensorflow up and running on the gpu. I installed everything from their tutorial page, now when I import Tensorflow, I get
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:119] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH:
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: workLaptop
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: Permission denied: could not open driver version path for reading: /proc/driver/nvidia/version
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1092] LD_LIBRARY_PATH:
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1093] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so.1; dlerror: libnvidia-fatbinaryloader.so.367.57: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
>>>
after which I think it just switches to running on the cpu.
EDIT: After I nuked everything , started from scratch. Now I get this:
>>> import tensorflow
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:119] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: workLaptop
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: Permission denied: could not open driver version path for reading: /proc/driver/nvidia/version
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1092] LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1093] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so.1; dlerror: libnvidia-fatbinaryloader.so.367.57: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
libcuda.so.1 is a symlink to a file that is specific to the version of your NVIDIA drivers. It may be pointing to the wrong version or it may not exist.
# See where the link is pointing.
ls /usr/lib/x86_64-linux-gnu/libcuda.so.1 -la
# My result:
# lrwxrwxrwx 1 root root 19 Feb 22 20:40 \
# /usr/lib/x86_64-linux-gnu/libcuda.so.1 -> ./libcuda.so.375.39
# Make sure it is pointing to the right version.
# Compare it with the installed NVIDIA driver.
nvidia-smi
# Replace libcuda.so.1 with a link to the correct version
cd /usr/lib/x86_64-linux-gnu
sudo ln -f -s libcuda.so.<yournvidia.version> libcuda.so.1
Now in the same way, make another symlink from libcuda.so.1 to a link of the same name in your LD_LIBRARY_PATH directory.
You may also find that you need to create a link to libcuda.so.1 in /usr/lib/x86_64-linux-gnu named libcuda.so
In case anyone still encounters this. First make sure to add the --runtime=nvidia parameter in order to run your container.
docker run --runtime=nvidia -t tensorflow/serving:latest-gpu
where tensorflow/serving:latest-gpu is the name of the docker image.
In the case I just solved, it was updating the GPU driver to the latest and installing the cuda toolkit. First, the ppa was added and GPU driver installed:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-390
After adding the ppa, it showed options for driver versions, and 390 was the latest 'stable' version that was shown.
Then install the cuda toolkit:
sudo apt install nvidia-cuda-toolkit
Then reboot:
sudo reboot
It updated the drivers to a newer version than the 390 originally installed in the first step (it was 410; this was a p2.xlarge instance on AWS).
Related
This is a self-answered post documenting a problem I encountered:
When I run cuda_11.5.0_496.13_win10.exe as administrator on windows 10 and CUDA Setup extracts the installer to my computer, it leaves me the error:
CUDA Setup Package: error
Could not create folder “C:\Users[myName]\AppData\Local\Temp\CUDA\CUDADevelopment”. Access is denied.
I’ve tried various versions from CUDA 9.0-11.5 and they all fail; as does extracting to a different folder. I have drivers installed and they work properly (I’ve checked them through OpenGL, and CUDA-based python libraries with their own CUDA installations use GPU successfully).
rename the .exe file to a .7z file (e.g. cuda_11.5.0_496.13_win10.exe → cuda_11.5.0_496.13_win10.7z). I used the linux mv command.
Unzip the file with 7zip or winzip into any temporary folder.
run setup.exe (I ran as admin)
After attempting to install nvidia toolkit on MAC by following guide : http://docs.nvidia.com/cuda/cuda-installation-guide-mac-os-x/index.html#axzz4FPTBCf7X I received error "Package manifest parsing error" which led me to this : NVidia CUDA toolkit 7.5.27 failing to install on OS X . I unmounted the dmg and upshot was that instead of receiving "Package manifest parsing error" the installer would not launch (it seemed to launch briefly , then quit).
Installing via command brew install Caskroom/cask/cuda (CUDA 7.5 install on Mac missing nvrtc) seems to have successfully installed cuda.
command nvcc --version returns :
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Mon_Apr_11_13:23:40_CDT_2016
Cuda compilation tools, release 7.5, V7.5.26
I've built the example in /Developer/NVIDIA/CUDA-7.5/samples/1_Utilities with :
make -C bandwidthTest/
This executed without error.
It appears installing with brew install Caskroom/cask/cuda is safe method of installing ? What is difference between this install method and installing via DMG file from nvidia ?
Caskroom appears to be an extension for brew for installing GUI applications : https://github.com/caskroom/homebrew-cask
Should an IDE also be installed as part of the cuda install ?
Nowadays you have to do the following to install cuda via brew:
brew tap homebrew/cask-drivers
brew cask install nvidia-cuda
See https://github.com/caskroom/homebrew-cask/issues/38325 .
Then you also need to add the following to your file ~/.bash_profile:
export PATH=/Developer/NVIDIA/CUDA-9.0/bin${PATH:+:${PATH}}
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-9.0/lib${DYLD_LIBRARY_PATH:+:${DYLD_LIBRARY_PATH}}
See http://docs.nvidia.com/cuda/cuda-installation-guide-mac-os-x/index.html.
UPDATE: Newer versions of Mac OS X with activated SIP (System integrity protection) will prevent modifying the DYLD_LIBRARY_PATH (see https://groups.google.com/forum/#!topic/caffe-users/waugt62RQMU). You can check that via
source ~/.bash_profile
env | grep DYLD_LIBRARY_PATH
If the output of this command is empty SIP is active and you might want to deactivate it as described at https://www.macworld.com/article/2986118/security/how-to-modify-system-integrity-protection-in-el-capitan.html . After doing this you should see
env | grep DYLD_LIBRARY_PATH
DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-9.0/lib
Both methods download and install from the same .dmg file from NVidia.
The homebrew-cask framework is the preferred method for installing software distributed as binaries in the homebrew paradigm.
This is my understanding.
Using DMG file, follow below:
wget 'https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_mac.dmg' && \
hdiutil attach cuda_10.2.89_mac.dmg \
-nobrowse \
-mountpoint \
/Volumes/CUDAMacOSXInstaller
Open installer:
open /Volumes/CUDAMacOSXInstaller/CUDAMacOSXInstaller.app
Uncheck "CUDA Samples" before continue.
Unmount and remove file:
hdiutil detach /Volumes/CUDAMacOSXInstaller && rm ./cuda_10.2.89_mac.dmg
After I install the diriver of GTX1080, tensorflow shows that it can find the cudnn library.
However, the GPU driver is not recognized by the modprobe.
Detais information are as follows:
$ python
[14:22:14]
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
>>> sess = tf.InteractiveSession()
modprobe: ERROR: could not insert 'nvidia_352_uvm': Invalid argument
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: work-data
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: work-data
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:347] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.27 Thu Jun 9 18:53:27 PDT 2016 GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) """
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 367.27.0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
The version of GTX1080 driver is 367.27, which is provided by the NVIDIA.
I don't know why there is a 'nvidia_352_uvm'?
The result of nvidia-smi is here.
May be I need to reinstall cuda, but I really reinstall it several times.
Should I remove all the cuda library and nvidia dirver, then reinstall them all? Is there any install sequence about this two?
enter image description here
Too long for a comment, but here are some tips I've learned after trying to get NVidia drivers to play nice with Ubuntu.
Upgrading new driver on top of existing driver gives a partially upgraded installation. You need to remove the previous stuff first.
sudo apt-get remove --purge nvidia-*
sudo rm /etc/X11/xorg.conf # if you ran nvidia-xconfig
Reload NVidia driver as follows (from virtual terminal, CTRL+ALT+F7)
sudo service lightdm stop # stop your window manager
killall python # kill all running TensorFlow instances to free GPU
sudo modprobe -r nvidia
sudo modprobe nvidia
dmesg | tail -100 # check for error messages
Check logs for any error messages from NVidia
dmesg | grep -i nvidia
lspci | grep -i nvidia
nvidia-smi # make sure this reports version 367.27
Also, there are two ways to install drivers, using Ubuntu's built-in upgrade with sudo apt-get install nvidia-current, or by getting tar ball from NVidia website. I was not able to get sudo apt-get route to work for TensorFlow, so I would recommend downloading drivers from NVidia website
I'm trying to use cuda to accelerate tensorflow. I'm running tensorflow using the docker image.
Firstly, when I launch the gpu image, it has a mismatch in the LT_LIBRARY_PATH environment variable:
~# echo $LD_LIBRARY_PATH
/usr/local/nvidia/lib:/usr/local/nvidia/lib64:
root#d578acbbc2cd:~# ls /usr/local/
bin cuda cuda-7.0 etc games include lib man sbin share src
There's no nvidia directory there. When I try to run the convolutional.py demo, it can't initialise the cuda support:
# python models/image/mnist/convolutional.py
Succesfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Succesfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Succesfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Succesfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 8
modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/4.2.0-23-generic/modules.dep.bin'
E tensorflow/stream_executor/cuda/cuda_driver.cc:466] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:98] retrieving CUDA diagnostic information for host: d578acbbc2cd
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:106] hostname: d578acbbc2cd
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:131] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:242] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.68 Tue Dec 1 17:24:11 PST 2015
GCC version: gcc version 5.2.1 20151010 (Ubuntu 5.2.1-22ubuntu2)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:135] kernel reported version is: 352.68
I tensorflow/core/common_runtime/gpu/gpu_init.cc:112] DMA:
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 8
It then goes on to train using cpu only.
# find /usr -name libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so
So in the docker image, there's only the gnu cpu cuda implementation. No NVIDIA stuff. In the host ubuntu 15.10 session, I have libcuda.so installed:
$ find /usr -name libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so
/usr/lib/i386-linux-gnu/libcuda.so
/usr/local/cuda-7.5/targets/x86_64-linux/lib
/stubs/libcuda.so
So these seem to be stubs ... not sure why.
Is there some trick to getting this to work?
Try rebuilding the Docker image directly from the Tensorflow repository (i.e. don't rely on the image on the container registry) and use https://github.com/NVIDIA/nvidia-docker to run the container (the Docker command described in the Tensorflow documentation is not portable).
I had a similar problem, though not in docker. The libcuda.so in /usr/local/cuda/lib64/stubs was a broken sym link. When I searched for libcuda.so it only turned up a file in a lib32 folder.
It seems that the problem was how I originally installed the NVIDIA device driver. At some point in the driver install process you're given the option to install the lib32 drivers. I had thought this meant in addition to lib64 drivers so I selected it. Turns out it only installs lib32 and not lib64 drivers.
I reinstalled the NIVDIA device driver, this time not selecting the lib32 'option'. Now tensorflow finds libcuda.so.
I had the same problem with running tensorflow on a Ubuntu machine after I upgraded my driver to 352.63 and 352.93. (I remember it works with 346.* but when I try to install 346., it installs 352. automatically for some reason).
I finally figured out that it's caused by permission issue. (I can run it with root) So, I changed the permission of the libcuda.so.352-63 file to executable by anyone and it works well now.
Hope this will be helpful to those still struggling with this issue.
I didn't try the docker one, but I guess it's also caused by permission setting.
Try this command
sudo apt-get install nvidia-modprobe
As mentioned here:
https://github.com/tensorflow/tensorflow/issues/394
and
http://kkjkok.blogspot.in/2016_08_01_archive.html
After I updated NVIDIA driver to 378.09 on Ubuntu 14.10 I had the same error,
although all the right for lib files were set correctly.
Thanks to #PhoenixQ, I tried to run with sudo and it worked.
After that I tried to run without sudo one more time and error disappeared. I'm not sure what ecxactly happened, but maybe something was configured during call with sudo, which was not possible withous sudo.
So the solution:
Try to run the same thing with sudo.
After this. Tryu running without sudo. Worked for me.
Running this code on a shared host with a locally installed perl and modules which were installed via perlbrew. It worked fine for several weeks. One day, it started dying with this output:
/home/xxxx/perl5/perlbrew/perls/perl-5.16.2/bin/perl tweet.pl
install_driver(mysql) failed: Can't load '/home/xxxx/perl5/perlbrew/perls/perl-5.16.2/lib/site_perl/5.16.2/x86_64-linux/auto/DBD/mysql/mysql.so' for module DBD::mysql: libmysqlclient.so.15: cannot open shared object file: No such file or directory at /home/xxxx/perl5/perlbrew/perls/perl-5.16.2/lib/5.16.2/x86_64-linux/DynaLoader.pm line 190.
at (eval 27) line 3.
Compilation failed in require at (eval 27) line 3.
Perhaps a required shared library or dll isn't installed where expected
at subroutines.pm line 3.
The code hasn't changed. The way I run the script hasn't changed, either. Since I am running this one a shared host, I have no idea what might have been updated or changed on the server, but perl is installed to my home directory, as are all the modules I am using.
It looks like a problem with libmysqlclient. What distribution are you running?
If you are running Debian(based), try "sudo apt-get purge libmysqlclient libmysqlclient-dev" and then "sudo apt-get install libmysqlclient libmysqlclient-dev".