Any script to test the installation of PyTorch - deep-learning

I have installed PyTorch and would like to check the installation. Is there any script to test whether it is correct, e.g., whether it can enable CUDA or not?

Coming to your first question: in your Python script, just add
import torch
If this raises "ModuleNotFoundError: No module named 'torch'", then your PyTorch installation is not complete.
And for your second question, to check whether PyTorch can use CUDA, use:
torch.cuda.is_available()
This returns True if your PyTorch build can use CUDA.
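Putting both checks together, here is a minimal self-test sketch (the tensor sizes and device index are arbitrary choices of mine, not anything PyTorch mandates):
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device name:", torch.cuda.get_device_name(0))
    # Run a small matrix multiply on the GPU to confirm kernels actually launch.
    x = torch.rand(5, 3, device="cuda")
    y = torch.rand(3, 5, device="cuda")
    print((x @ y).cpu())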

You can use the collect_env.py script provided in PyTorch's torch.utils package; run it with python -m torch.utils.collect_env. Its output looks like this:
Collecting environment information...
PyTorch version: 1.2.0
Is debug build: No
CUDA used to build PyTorch: 10.0.130
OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: version 3.14.6
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce RTX 2080
Nvidia driver version: 410.48
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.4.1
Versions of relevant libraries:
[pip] numpy==1.16.4
[pip] torch==1.2.0
[pip] torchsample==0.1.3
[pip] torchsummary==1.5.1
[pip] torchvision==0.4.0a0+6b959ee
[conda] blas 1.0 mkl
[conda] mkl 2019.4 243
[conda] mkl-service 2.0.2 py37h7b6447c_0
[conda] mkl_fft 1.0.14 py37ha843d7b_0
[conda] mkl_random 1.0.2 py37hd81dba3_0
[conda] pytorch 1.2.0 py3.7_cuda10.0.130_cudnn7.6.2_0 pytorch
[conda] torchsample 0.1.3 pypi_0 pypi
[conda] torchsummary 1.5.1 pypi_0 pypi
[conda] torchvision 0.4.0 py37_cu100 pytorch
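If you'd rather trigger the same report from inside a Python program, a minimal sketch that simply shells out to the module mentioned above:
import subprocess
import sys

# Run PyTorch's bundled environment collector with the current interpreter.
subprocess.run([sys.executable, "-m", "torch.utils.collect_env"], check=True)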

If you installed it from here, you are doing fine.
Check this:
import torch

# Prefer the GPU when PyTorch can see one; otherwise fall back to the CPU.
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
print(dev)
If your GPU and its driver are installed correctly, you should have nvidia-smi.
(On Windows it should be inside C:\Program Files\NVIDIA Corporation\NVSMI.)
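To confirm the selected device is actually usable, you can run a small operation on it; a minimal sketch building on the dev variable above:
# Allocate a small tensor on whichever device was selected and do some work.
t = torch.rand(4, 4, device=dev)
print(t.sum().item())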

You can use my code. This repo implements a PyTorch environment checker and a set of CUDA-based operators, which help you verify whether your GPU-based PyTorch is installed properly.
https://github.com/BAI-Yeqi/PyTorch-Verification

Related

PyTorch - Cuda error: no kernel image is available for execution on the device [duplicate]

I have CUDA 9.2 installed:
(base) c:\>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Wed_Apr_11_23:16:30_Central_Daylight_Time_2018
Cuda compilation tools, release 9.2, V9.2.88
I installed PyTorch on Windows 10 using:
conda install pytorch cuda92 -c pytorch
pip3 install torchvision
I ran the test script:
(base) c:\>python
Python 3.6.5 |Anaconda custom (64-bit)| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from __future__ import print_function
>>> import torch
>>> x = torch.rand(5, 3)
>>> print(x)
tensor([[0.7041, 0.5685, 0.4036],
        [0.3089, 0.5286, 0.3245],
        [0.3504, 0.8638, 0.1118],
        [0.6517, 0.9209, 0.6801],
        [0.0315, 0.1923, 0.8720]])
>>> quit()
So far, so good. Then I ran:
(base) c:\>python
Python 3.6.5 |Anaconda custom (64-bit)| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False
>>>
Why did PyTorch say CUDA was not available?
The GPU is a compute capability 3.0 Quadro K3000M:
(base) C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi.exe
Mon Oct 01 16:36:47 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 385.54                 Driver Version: 385.54                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        TCC/WDDM     | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K3000M      WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   35C    P0    N/A /  N/A |     29MiB /  2048MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
Since v0.3.1 (https://github.com/pytorch/pytorch/releases/tag/v0.3.1), PyTorch binary releases have dropped support for old GPUs with CUDA compute capability 3.0. According to https://en.wikipedia.org/wiki/CUDA, the compute capability of the Quadro K3000M is 3.0.
Therefore, you might have to build PyTorch from source or try other packages. Please refer to this thread for more information: https://discuss.pytorch.org/t/pytorch-no-longer-supports-this-gpu-because-it-is-too-old/13803.
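When the binary still recognizes the device at all, you can read its compute capability directly from PyTorch; a minimal sketch (device index 0 is my assumption):
import torch

if torch.cuda.is_available():
    # Returns a (major, minor) tuple, e.g. (3, 0) for the Quadro K3000M.
    major, minor = torch.cuda.get_device_capability(0)
    print(f"Compute capability: {major}.{minor}")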
PyTorch officially calls for CUDA 9.0, and I would suggest the same. In other cases there are sometimes build issues that lead to 'CUDA not detected', so when using PyTorch it's best to use CUDA 9.0 and cuDNN 7. Here's a link where you can easily install CUDA 9.0 and cuDNN 7:
https://yangcha.github.io/CUDA90/
I had a similar problem; you have to check in the NVIDIA Control Panel that your card is selected by default.

Build TensorFlow from source

In the ./configure stage, after accepting all the default options, the terminal threw me an error. As requested, here's the full output:
Please specify the location of python. [Default is /home/jingw222/anaconda3/bin/python]:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n]
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]
No XLA support will be enabled for TensorFlow
Found possible Python library paths:
/home/jingw222/anaconda3/lib/python3.6/site-packages
Please input the desired Python library path to use. Default is [/home/jingw222/anaconda3/lib/python3.6/site-packages]
Using python library path: /home/jingw222/anaconda3/lib/python3.6/site-packages
Do you wish to build TensorFlow with OpenCL support? [y/N]
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N]
No CUDA support will be enabled for TensorFlow
Configuration finished
Extracting Bazel installation...
..................
unexpected pipe read status: (error: 2): No such file or directory
Server presumed dead. Now printing '/home/jingw222/.cache/bazel/_bazel_jingw222/ada033fd33c06190d78b77ab4907f1d0/server/jvm.out':
java.lang.ExceptionInInitializerError
at java.lang.J9VMInternals.ensureError(J9VMInternals.java:141)
at java.lang.J9VMInternals.recordInitializationFailure(J9VMInternals.java:130)
at com.google.devtools.build.lib.skyframe.SkyframeExecutor.skyFunctions(SkyframeExecutor.java:348)
at com.google.devtools.build.lib.skyframe.SkyframeExecutor.init(SkyframeExecutor.java:586)
at com.google.devtools.build.lib.skyframe.SequencedSkyframeExecutor.init(SequencedSkyframeExecutor.java:252)
at com.google.devtools.build.lib.skyframe.SequencedSkyframeExecutor.create(SequencedSkyframeExecutor.java:211)
at com.google.devtools.build.lib.skyframe.SequencedSkyframeExecutor.create(SequencedSkyframeExecutor.java:162)
at com.google.devtools.build.lib.skyframe.SequencedSkyframeExecutorFactory.create(SequencedSkyframeExecutorFactory.java:48)
at com.google.devtools.build.lib.runtime.WorkspaceBuilder.build(WorkspaceBuilder.java:81)
at com.google.devtools.build.lib.runtime.BlazeRuntime.initWorkspace(BlazeRuntime.java:204)
at com.google.devtools.build.lib.runtime.BlazeRuntime.newRuntime(BlazeRuntime.java:1023)
at com.google.devtools.build.lib.runtime.BlazeRuntime.createBlazeRPCServer(BlazeRuntime.java:850)
at com.google.devtools.build.lib.runtime.BlazeRuntime.serverMain(BlazeRuntime.java:789)
at com.google.devtools.build.lib.runtime.BlazeRuntime.main(BlazeRuntime.java:570)
at com.google.devtools.build.lib.bazel.BazelMain.main(BazelMain.java:56)
Caused by: java.lang.ClassCastException: com.ibm.lang.management.UnixExtendedOperatingSystem incompatible with com.sun.management.OperatingSystemMXBean
at com.google.devtools.build.lib.util.ResourceUsage.<clinit>(ResourceUsage.java:45)
... 13 more
So, basically, most of the options are not enabled by default. How should I approach problems like this?
The stack trace shows Bazel running on the IBM J9 JVM (com.ibm.lang.management.UnixExtendedOperatingSystem), which is incompatible with the com.sun.management classes Bazel expects. Install JDK 8 manually:
sudo apt-get install openjdk-8-jdk
After that, build again.
Check this issue for more info.
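To confirm which JVM ends up on your PATH after installing, a quick Python sketch (note that java -version writes its banner to stderr, not stdout):
import subprocess

# java -version prints to stderr; capture it and print the banner.
result = subprocess.run(["java", "-version"], capture_output=True, text=True)
print(result.stderr)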

multi process multi GPU with tensorflow, windows

I'm a little bit new to TensorFlow, so please be gentle with me.
I have a problem creating a second process that loads TensorFlow on a GPU that is already in use.
The error I get is:
\cuda\cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
\cuda\cuda_dnn.cc:392] error retrieving driver version: Permission denied: could not open driver version path for reading: /proc/driver/nvidia/version
\cuda\cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
\kernels\conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
\cuda\cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
Hardware details:
Supermicro 4028GR-TRT
8x GTX 1080 GPUs
CUDA: 8
cuDNN: 5.1
Windows: 10
TensorFlow: 0.12.1 / 1.0.1
My PC (where everything works) shouldn't be the problem:
Windows 7
GTX 1070
CUDA 8
cuDNN 5.1
TensorFlow 0.12.1
Can someone tell me why everything is OK on my PC but not on the big one (the Supermicro)?
Is this maybe a Windows/driver issue?
I tried updating the NVIDIA driver; that didn't help.
TensorFlow is not always good at sharing GPUs with other processes (including other instances of itself!). The typical workaround is to use the %CUDA_VISIBLE_DEVICES% environment variable to prevent the two processes from clashing over the same GPU. For example:
C:\>set CUDA_VISIBLE_DEVICES=0
C:\>python tensorflow_program_1.py
While in another command prompt you could tell TensorFlow to use a different GPU as follows:
C:\>set CUDA_VISIBLE_DEVICES=1
C:\>python tensorflow_program_2.py
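You can also set the variable from inside the script itself, as long as you do it before TensorFlow initializes CUDA; a minimal sketch (the GPU index "0" is my placeholder, pick whichever device this process should own):
import os

# Must be set before the first import of tensorflow, which initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf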

How to Install the CUDA Driver for TensorFlow (installing from source)

I'm trying to build TensorFlow from source and run it with GPU support. To install the toolkit I used the runfile; to install the driver I used the Additional Drivers tool, since I could not get Ubuntu to boot into text mode as specified in the CUDA documentation, and stop lightdm / start lightdm does not work either; it gives me (also with sudo):
Name com.ubuntu.Upstart does not exist
So far I could build a release from the TensorFlow repository. However, when I try to run the example as specified in the how-to:
bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
the GPU apparently cannot be found:
jonas#jonas-Aspire-V5-591G:~/Documents/repos/tensoflow_fork$ bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: jonas-Aspire-V5-591G
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: jonas-Aspire-V5-591G
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: 352.63.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:356] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.63 Sat Nov 7 21:25:42 PST 2015 GCC version: gcc version
4.9.2 (Ubuntu 4.9.2-10ubuntu13) """
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 352.63.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:293] kernel version seems to match DSO: 352.63.0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
F tensorflow/cc/tutorials/example_trainer.cc:125] Check failed: ::tensorflow::Status::OK() == (session->Run({{"x", x}}, {"y:0", "y_normalized:0"}, {}, &outputs)) (OK vs. Invalid argument: Cannot assign a device to node 'y': Could not satisfy explicit device specification '/gpu:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
[[Node: y = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/gpu:0"](Const, x)]])
Aborted
I'm using a clean Ubuntu 15.04 installation on an Acer Notebook with the GTX950M.
Can anybody tell me how to properly install the driver?
Can you run deviceQuery (it comes with the CUDA installation)? Can you see the NVIDIA driver in lspci/lsmod/nvidia-smi?
lsmod |grep nvidia
dmesg | grep -i nvidia
lspci | grep -i nvidia
nvidia-smi
You can also reload the nvidia kernel module and look for error messages:
modprobe -r nvidia
dmesg | tail
sudo dmesg | grep NVRM
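Since the log shows cuInit itself failing with CUDA_ERROR_UNKNOWN, you can reproduce that call in isolation; a minimal ctypes sketch (assumes libcuda.so.1 is on the loader path, as in the log above):
import ctypes

# Call the same driver entry point that fails in the TensorFlow log.
libcuda = ctypes.CDLL("libcuda.so.1")
status = libcuda.cuInit(0)
print("cuInit returned", status)  # 0 means CUDA_SUCCESS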
Related issue https://github.com/tensorflow/tensorflow/issues/601

nvcc -arch sm_52 gives error "Value 'sm_52' is not defined for option 'gpu-architecture'"

I updated my CUDA toolkit from 5.5 to 6.5. Then the following command
nvcc -arch=sm_52
started to give me an error:
nvcc fatal : Value 'sm_52' is not defined for option 'gpu-architecture'
Is this a bug, or does nvcc 6.5 not support the Maxwell virtual architecture?
CUDA Toolkit 6.5 was released before the sm_52 architecture came into production.
After the arrival of the sm_52 architecture, an update to CUDA 6.5 was released which enabled nvcc to generate code for sm_52.
Make sure you download the newer version of CUDA Toolkit 6.5.
P.S.: I would rather use the latest version of the toolkit (currently 7.0).
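To check which toolkit release your nvcc comes from and which sm_* values it accepts, a small Python sketch that parses nvcc's own help text (the token parsing is my heuristic, not an official interface):
import subprocess

# The version banner tells you which toolkit release is on the PATH.
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)

# nvcc --help lists the allowed values for --gpu-architecture; collect the sm_* tokens.
help_text = subprocess.run(["nvcc", "--help"], capture_output=True, text=True).stdout
archs = sorted({tok.strip("',") for tok in help_text.split() if tok.strip("',").startswith("sm_")})
print(archs)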