I am trying to understand why my CUDA kernel performs relatively poorly, and I am hoping to get some answers from the NVIDIA profiler.
My CUDA program is a 'boiled down' version of a larger application, isolating and exercising the kernel in question. The program launches the kernel several times in order to measure its execution time as a mean over multiple launches. After the timing loop, a memory copy from device to host is issued to make sure all kernel calls have finished. The program is written in CUDA C++.
This is how I built the program:
main.o: main.cu
	nvcc -res-usage -arch=sm_61 -c $<
main: main.o stopwatch.o
	g++ -o $@ $^ -lcudart -L/usr/local/cuda-11.0/lib64
This test was done on a PC with an Intel CPU and an NVIDIA GeForce GTX 1070. The OS is Ubuntu 20.04 with a freshly installed CUDA 11 from the NVIDIA website, along with driver 450.51.06:
nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    On   | 00000000:01:00.0  On |                  N/A |
| 28%   38C    P8     8W / 151W |    317MiB /  8111MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
The following command was used to generate the profiling file:
sudo /usr/local/cuda-11.0/bin/nvprof -o main.nvvp --profile-from-start off ./main
I also tried profiling from the start, but that leads to the same issue described below.
The following command was used to launch the visual profiler:
nvvp -vm /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java main.nvvp
The Visual profiler walks me through several steps and when it comes to "Perform Kernel Analysis" the program tells me:
Insufficient kernel bounds data. The data needed to calculate compute,
memory, and latency bounds for the kernel could not be collected
Is this sort of detailed profiling not available on my GPU? (maybe because it's a gamer card)
nvprof by default will capture only a small amount of information in the output file it generates. This is enough to generate an application timeline, when the output file is imported into nvvp, but not enough information to enable all of the different capabilities of nvvp.
According to the documentation, the --analysis-metrics switch for nvprof is recommended for this type of use.
--analysis-metrics is referred to about 6 different times in the profiler documentation, so you may simply want to search on it to see all of the references or recommendations for its use.
Note that --analysis-metrics can capture a large amount of information. For a large, complex application, it may substantially increase the times the profilers spend processing data. Therefore if you know specifically which data you are looking for, you may wish to specify specific metrics instead. Without --analysis-metrics, however, various nvvp analysis tools may not work correctly when you import the file.
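To make the recommendation above concrete, here is a minimal sketch in Python that assembles an nvprof command line with --analysis-metrics. The helper name build_nvprof_command and its parameters are made up for illustration; only the -o and --analysis-metrics switches come from nvprof itself, and ./main and main.nvvp are taken from the question.

```python
import shlex


def build_nvprof_command(app, output="main.nvvp", analysis_metrics=True, nvprof="nvprof"):
    """Return the nvprof argument list for profiling `app` (hypothetical helper)."""
    cmd = [nvprof, "-o", output]
    if analysis_metrics:
        # Ask nvprof to collect the metrics that nvvp's guided analysis relies on.
        cmd.append("--analysis-metrics")
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(shlex.quote(arg) for arg in build_nvprof_command("./main")))
```

The script only prints the command; it can then be run by hand, prefixed with sudo if the profiler needs elevated permissions as in the question.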
How can I find where CUDA 11.x for PyTorch-GPU 1.13 is installed on my Windows 10 computer?
What I tried:
I installed the NVIDIA CUDA drivers and toolkit for Windows from the NVIDIA website. I can verify this by typing !nvidia-smi in Jupyter Lab, which gives me the following output. This indicates that the CUDA tools are installed, but not being used by my PyTorch package. I need to find out which version of the CUDA drivers is installed so I can install the correct PyTorch-GPU package.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 513.63       Driver Version: 513.63       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P2000       WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   46C    P8    N/A /  N/A |      0MiB /  4096MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
I find many Ubuntu questions and answers for locating CUDA to add it to my PATH, but nothing specific for Windows 10.
For example:
Pytorch CUDA installation fails,
Pytorch CUDA installation using conda,
pytorch-says-that-cuda-is-not-available
What are the equivalent Python commands on Windows 10 to locate the CUDA 11.x toolkit and driver version that my PyTorch-GPU package must use? And how do I fix the problem if PyTorch is out of sync?
I am answering my own question here...
PyTorch-GPU must be compiled against specific CUDA binary drivers.
I finally found this hint, "Why torch.cuda.is_available() returns False even after installing pytorch with cuda?", which identifies the issue.
import torch
torch.zeros(1).cuda()
The resulting exception clearly identifies the problem.
AssertionError Traceback (most recent call last)
Cell In [222], line 2
1 import torch
----> 2 torch.zeros(1).cuda()
File C:\ProgramData\Anaconda3\envs\tf210_gpu\lib\site-packages\torch\cuda\__init__.py:221, in _lazy_init()
217 raise RuntimeError(
218 "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "
219 "multiprocessing, you must use the 'spawn' start method")
220 if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 221 raise AssertionError("Torch not compiled with CUDA enabled")
222 if _cudart is None:
223 raise AssertionError(
224 "libcudart functions unavailable. It looks like you have a broken build?")
AssertionError: Torch not compiled with CUDA enabled
The problem is: "Torch not compiled with CUDA enabled"
Now I have to see if I can just re-install PyTorch-GPU to replace the current PyTorch-CPU version with one that is compiled against my CUDA v11.6 driver, without rebuilding the entire conda environment. I would rather not rebuild the conda environment from scratch unless it is really necessary.
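A quicker way to see which kind of build is installed, without triggering the exception, might look like the sketch below. It relies on torch.__version__, torch.version.cuda and torch.cuda.is_available(); the function name describe_torch_build is only illustrative, and the import is guarded so the script also runs in an environment where PyTorch is missing.

```python
import importlib


def describe_torch_build():
    """Summarize whether the installed PyTorch build can use CUDA (illustrative helper)."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "torch_version": torch.__version__,
        # None for a CPU-only build, otherwise the CUDA version the package was built with.
        "built_with_cuda": torch.version.cuda,
        # False if the build is CPU-only or no usable driver/GPU is found.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

If built_with_cuda comes back as None, the package is a CPU-only build, which matches the "Torch not compiled with CUDA enabled" message above.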
I came across this post:
How do I use Nvidia Multi-process Service (MPS) to run multiple non-MPI CUDA applications?
But when I run ./mps_run before launching MPS, I get:
kernel duration: 4.999370s
kernel duration: 5.012310s
And when I check nvidia-smi during those 5 seconds:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000001:00:00.0 Off |                    0 |
| N/A   28C    P0    38W / 250W |    508MiB / 16280MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
It looks like the GPU I am using supports multiprocessing somehow.
When I run nvidia-smi -i 2 -c EXCLUSIVE_PROCESS, it reports: No devices were found
This is weird.
How do I know whether my GPU supports multiprocessing or not?
The GPU I am using: Tesla P100 (GP100GL)
In that post you linked, in the UPDATE section of my answer, I indicated that the GPU scheduler has changed in Pascal and beyond (your Tesla P100 is a Pascal GPU).
MPS is supported on all current NVIDIA GPUs.
The results you got are expected (in the non-MPS case) because the GPU scheduler allows both kernels to run, in a time-sliced fashion. All currently supported CUDA GPUs support multiprocessing (in Default compute mode). However the older GPUs (e.g. Kepler) would run the kernel from one process, then the kernel from the other process. Pascal and newer GPUs will run the kernel from one process for a period of time, then the other process for a period of time, then the first process, etc in a round-robin time-sliced fashion.
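To check which compute mode each GPU is currently in, something like the following Python sketch could be used. It assumes nvidia-smi is on the PATH and uses its --query-gpu and --format=csv,noheader options with the index, name and compute_mode fields; the function names and the sample row are illustrative only.

```python
import subprocess

# nvidia-smi query that prints one "index, name, compute_mode" row per GPU.
QUERY = ["nvidia-smi", "--query-gpu=index,name,compute_mode", "--format=csv,noheader"]


def parse_compute_modes(csv_text):
    """Parse 'index, name, compute_mode' rows into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split off the first and last fields so a comma inside the name is tolerated.
        index, rest = line.split(",", 1)
        name, mode = rest.rsplit(",", 1)
        gpus.append({"index": int(index), "name": name.strip(), "compute_mode": mode.strip()})
    return gpus


def query_compute_modes():
    """Run nvidia-smi and return the parsed rows (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_modes(result.stdout)


if __name__ == "__main__":
    # Illustrative row modeled on the table in the question, not real tool output.
    sample = "0, Tesla P100-PCIE..., Default\n"
    print(parse_compute_modes(sample))
```

The index column also shows which values are valid for nvidia-smi -i. In the table from the question the only GPU listed has index 0, which may be why -i 2 reported that no devices were found.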
I want to use GPU acceleration for my Android emulator in a Compute Engine instance.
I added a Tesla T4 GPU and am now trying to install the GPU GRID driver according to the guide linked below.
I use Ubuntu 20. Please advise.
https://cloud.google.com/compute/docs/gpus/install-grid-drivers
I get an error:
In file included from /tmp/selfgz11598/NVIDIA-Linux-x86_64-410.92-grid/kernel/nvidia/nv-rsync.c:24:
/tmp/selfgz11598/NVIDIA-Linux-x86_64-410.92-grid/kernel/common/inc/nv-linux.h:1775:6: error: "NV_BUILD_MODULE_INSTANCES" is not defined, evaluates to 0 [-Werror=undef]
 1775 | #if (NV_BUILD_MODULE_INSTANCES != 0)
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:275: /tmp/selfgz11598/NVIDIA-Linux-x86_64-410.92-grid/kernel/nvidia/nv_uvm_interface.o] Error 1
/tmp/selfgz11598/NVIDIA-Linux-x86_64-410.92-grid/kernel/nvidia/nvlink_linux.c: In function ‘nvlink_sleep’:
/tmp/selfgz11598/NVIDIA-Linux-x86_64-410.92-grid/kernel/nvidia/nvlink_linux.c:570:5: error: implicit declaration of function ‘do_gettimeofday’; did you mean ‘efi_gettimeofday’? [-Werror=implicit-function-declaration]
  570 |     do_gettimeofday(&tm_aux);
      |     ^~~~~~~~~~~~~~~
      |     efi_gettimeofday
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:275: /tmp/selfgz11598/NVIDIA-Linux-x86_64-410.92-grid/kernel/nvidia/nvlink_linux.o] Error 1
make[2]: Target '__build' not remade because of errors.
make[1]: *** [Makefile:1731: /tmp/selfgz11598/NVIDIA-Linux-x86_64-410.92-grid/kernel] Error 2
make[1]: Target 'modules' not remade because of errors.
make[1]: Leaving directory '/usr/src/linux-headers-5.4.0-1021-gcp'
make: *** [Makefile:79: modules] Error 2
ERROR: The nvidia kernel module was not created.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
The document you are using to install NVIDIA GRID® drivers for virtual workstations only contains examples of the commands needed to install the GRID drivers.
The example in that guide installs the NVIDIA 410.92 driver, which belongs to GRID7.1. Your log shows that driver failing to compile against the 5.4 kernel headers (for example, it calls do_gettimeofday, which recent kernels no longer provide), so I recommend using the latest version of GRID instead. You can consult the table of available drivers to see which ones you can choose from.
I’ve reproduced this scenario on my own project and I was able to install GRID11.0, using the NVIDIA 450.51.05 driver.
I’m using an instance with the following characteristics:
Machine type: n1-standard-1 (1 vCPU, 3.75 GB memory)
GPUs: 1 x NVIDIA Tesla T4
OS ubuntu-minimal-2004-focal-v20200702
Keep in mind that the option Enable Virtual Workstation (NVIDIA GRID) needs to be enabled when you create the instance to avoid issues.
I used the following commands for this installation:
user@instance-1:~$ curl -O https://storage.googleapis.com/nvidia-drivers-us-public/GRID/GRID11.0/NVIDIA-Linux-x86_64-450.51.05-grid.run
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 139M 100 139M 0 0 72.2M 0 0:00:01 0:00:01 --:--:-- 72.1M
user@instance-1:~$ sudo bash NVIDIA-Linux-x86_64-450.51.05-grid.run
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 450.51.05.....................................
................................................................................................................
................................................................................................................
................................................................................................................
................................................................................................................
........................................................................
user@instance-1:~$ nvidia-smi
Mon Jul 27 21:11:17 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:04.0 Off |                    0 |
| N/A   73C    P8    21W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
In my case I needed to install some dependencies like the gcc compiler, and I only used the command
$ sudo apt install build-essential
I hope this information is useful for you.
I have followed the instructions here and successfully built and set up geth.
Ethminer seems to work, except that it doesn't use the Titan X GPU and the mining rate is only 341022 H/s.
Also, when I try to use the -G option, ethminer says it is an invalid argument; the -G flag also doesn't appear in the ethminer help output.
Your GPU must have a minimum amount of memory to perform mining. Upgrade to a GPU with more memory (a minimum of 4GB is preferable).
The current DAG size is above 2GB. That means you can't mine with a GPU that has less than 2GB of memory.
I'm installing Caffe.
I'm using Ubuntu 14.04.
I tried to install CUDA. The Caffe site says that I need to install the library and the latest standalone driver separately.
I downloaded the driver from there. I tried every product type, but I get the same error:
You do not appear to have an NVIDIA GPU supported by the 346.46
NVIDIA Linux graphics driver installed in this system. For further
details, please see the appendix SUPPORTED NVIDIA GRAPHICS CHIPS in
the README available on the Linux driver download page at www.nvidia.com.
And then
You appear to be running an X server; please exit X before
installing. For further details, please see the section INSTALLING
THE NVIDIA DRIVER in the README available on the Linux driver
download page at www.nvidia.com.
And
Installation has failed. Please see the file
'/var/log/nvidia-installer.log' for details. You may find
suggestions on fixing installation problems in the README available
on the Linux driver download page at www.nvidia.com.
I successfully installed CUDA and cuDNN.
Then I downloaded Caffe from here.
Then I tried to compile, and after I did make all and make test,
I did make runtest and got this error:
Check failed: error == cudaSuccess (38 vs. 0) no CUDA-capable device is detected
I also found that I need to verify that I have a CUDA-capable GPU.
This command: lspci | grep -i nvidia doesn't return anything. update-pciids doesn't help either, though it returns Downloaded daily snapshot dated.
Can anyone help me install Caffe and everything correctly?
Your system apparently does not have a CUDA compatible GPU. Depending on what type of system you are using (most likely a desktop or server with appropriate free PCI-e slot(s), case space, and sufficient power supply capacity), it might be possible to purchase and install such a GPU.
You can still get started with Caffe without a GPU by uncommenting the CPU_ONLY flag in Makefile.config.
Check failed: error == cudaSuccess (38 vs. 0) no CUDA-capable device
is detected
Assuming you have a GPU card, the above error can occur if the NVIDIA driver is not installed or not used by the system.
Please check this link - https://askubuntu.com/questions/670485/how-to-inspect-the-currently-used-nvidia-driver-version-and-switch-it-to-another
Check the latest driver version for your card on the NVIDIA site. Then add the relevant repository and install the driver from it. It is best to restart afterwards.
sudo apt-add-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-3xx
sudo modprobe nvidia  # I also ran this before restarting
Check via the nvidia-smi command:
alex@alex-Lenovo-G400s-Touch:~$ nvidia-smi
Tue Feb 28 15:10:50 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 720M     Off  | 0000:01:00.0     N/A |                  N/A |
| N/A   51C    P0    N/A /  N/A |    271MiB /  1985MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+
Install samples and test via deviceQuery after making the samples --> http://xcat-docs.readthedocs.io/en/stable/advanced/gpu/nvidia/verify_cuda_install.html
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GT 720M"
...
After that, reconfigure Caffe and do a clean make.
Below are the CMake settings for reference and the CMake file http://pastebin.com/qAd40uvh
You probably don't have a CUDA-compatible card. Alternatively, you may have one but not be using it. For example, if you have both an NVIDIA card and integrated graphics, you should make sure your monitor is plugged into the NVIDIA card's output.
You should make sure that your graphics card does support CUDA by looking it up in the list at http://www.geforce.com/hardware/technology/cuda/supported-gpus?field_gpu_type_value=All.
p.s. To find your graphics card info, you can run lspci | grep VGA in the shell.
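If you want to do the same check from a script, here is a rough Python sketch that scans lspci output for NVIDIA display devices. The function name find_nvidia_devices and the sample lines are made up for illustration. It also matches "3D controller" entries, since on some laptops the NVIDIA GPU is listed that way rather than as a VGA device, in which case grepping for VGA alone would miss it.

```python
import re

# Device classes under which lspci typically lists GPUs.
DISPLAY_CLASS = re.compile(r"VGA compatible controller|3D controller", re.IGNORECASE)


def find_nvidia_devices(lspci_output):
    """Return the lspci lines that describe an NVIDIA display device."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if DISPLAY_CLASS.search(line) and "nvidia" in line.lower()
    ]


if __name__ == "__main__":
    # Hypothetical lspci output, not taken from the question.
    sample = (
        "00:02.0 VGA compatible controller: Intel Corporation Example Graphics (rev 06)\n"
        "01:00.0 3D controller: NVIDIA Corporation Example GPU (rev a1)\n"
    )
    print(find_nvidia_devices(sample))
```

An empty list suggests that no NVIDIA GPU is visible on the PCI bus, which would be consistent with the "no CUDA-capable device is detected" error.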