Sample deviceQuery CUDA program - cuda

I have an Intel Xeon machine with an NVIDIA GeForce GTX 1080 configured and CentOS 7 as the operating system. I have installed NVIDIA driver 410.93 and CUDA toolkit 10.0. After compiling the CUDA samples, I tried to run ./deviceQuery.
But it fails like this:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL
Some command outputs:
lspci | grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
nvidia-smi
Wed Feb 13 16:08:07 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.93 Driver Version: 410.93 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 00000000:01:00.0 On | N/A |
| 0% 54C P0 46W / 240W | 175MiB / 8119MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 6275 G /usr/bin/X 94MiB |
| 0 7268 G /usr/bin/gnome-shell 77MiB |
+-----------------------------------------------------------------------------+
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
PATH & LD_LIBRARY_PATH
PATH =/usr/local/cuda-10.0/bin:/usr/local/cuda/bin:/usr/local/bin:/usr/local/sbin:
LD_LIBRARY_PATH = /usr/local/cuda-10.0/lib64:/usr/local/cuda/lib64:
lsmod | grep nvidia
nvidia_drm 39819 3
nvidia_modeset 1036573 6 nvidia_drm
nvidia 16628708 273 nvidia_modeset
drm_kms_helper 179394 1 nvidia_drm
drm 429744 6 drm_kms_helper,nvidia_drm
ipmi_msghandler 56032 2 ipmi_devintf,nvidia
lsmod | grep nvidia-uvm
(no output)
dmesg | grep NVRM
[ 8.237489] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 410.93 Thu Dec 20 17:01:16 CST 2018 (using threaded interrupts)
Is this problem related to modprobe or nvidia-uvm?
I asked this on the NVIDIA DevTalk forum, but there has been no reply yet.
Please give some suggestions.
Thanks in advance.

I debugged it. The problem was a version mismatch between the installed NVIDIA driver (410.93) and the driver that came with the CUDA run file (410.48). I autoremoved all the drivers, deleted all the link files in /var/lib/dkms/nvidia/*, and reinstalled from the beginning.
Now it works fine, and nvidia-uvm is also loaded.
lsmod | grep nvidia
nvidia_uvm 786031 0
nvidia_drm 39819 3
nvidia_modeset 1048491 6 nvidia_drm
nvidia 16805034 274 nvidia_modeset,nvidia_uvm
drm_kms_helper 179394 1 nvidia_drm
drm 429744 6 drm_kms_helper,nvidia_drm
ipmi_msghandler 56032 2 ipmi_devintf,nvidia
nvidia-smi
Fri Feb 15 11:46:24 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48 Driver Version: 410.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 00000000:01:00.0 On | N/A |
| 0% 45C P8 10W / 240W | 242MiB / 8119MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 6063 G /usr/bin/X 120MiB |
| 0 7502 G /usr/bin/gnome-shell 119MiB |
+-----------------------------------------------------------------------------+
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1080"
CUDA Driver Version / Runtime Version 10.0 / 10.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 8119 MBytes (8513585152 bytes)
(20) Multiprocessors, (128) CUDA Cores/MP: 2560 CUDA Cores
GPU Max Clock rate: 1797 MHz (1.80 GHz)
Memory Clock rate: 5005 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS
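For anyone debugging a similar mismatch: the CUDA runtime exposes both version numbers directly, so you can compare them without running the full deviceQuery sample. A minimal sketch in Python via ctypes (an illustration, not the deviceQuery source; it assumes Linux with libcudart.so on the loader path, and the values encode 1000*major + 10*minor, e.g. 10000 for CUDA 10.0):

import ctypes

# Assumption: the CUDA runtime library is on LD_LIBRARY_PATH
cudart = ctypes.CDLL("libcudart.so")

drv, rt = ctypes.c_int(), ctypes.c_int()
cudart.cudaDriverGetVersion(ctypes.byref(drv))   # highest CUDA version the driver supports
cudart.cudaRuntimeGetVersion(ctypes.byref(rt))   # version of the installed runtime
print("driver CUDA version:", drv.value, "runtime version:", rt.value)

count = ctypes.c_int()
err = cudart.cudaGetDeviceCount(ctypes.byref(count))
print("cudaGetDeviceCount returned", err, "- devices found:", count.value)

On the broken setup above, cudaGetDeviceCount was the call that returned 30 (unknown error); after the reinstall it returns 0 (cudaSuccess) and reports one device.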

Related

CUDA is installed, but PyTorch v1.13 on Windows 10 is not working. PyTorch is not using the GPU; how do I fix PyTorch being out of sync with the CUDA 11.x drivers?

How can I find where CUDA 11.x for PyTorch-GPU 1.13 get installed on Windows 10 on my computer?
What I tried:
I installed the NVIDIA CUDA drivers and toolkit for Windows from the NVIDIA website. I can verify this by typing !nvidia-smi in Jupyter Lab, which gives me the following output. This indicates that the CUDA tools are installed but are not being used by my PyTorch package. I need to find out what version of the CUDA drivers is installed so I can install the correct PyTorch-GPU package.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 513.63 Driver Version: 513.63 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro P2000 WDDM | 00000000:01:00.0 Off | N/A |
| N/A 46C P8 N/A / N/A | 0MiB / 4096MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I find many Ubuntu questions and answers for locating CUDA to add it to my PATH, but nothing specific for Windows 10.
For example:
Pytorch CUDA installation fails,
Pytorch CUDA installation using conda,
pytorch-says-that-cuda-is-not-available
What are the equivalent Python commands on Windows 10 to locate the CUDA 11.x toolkits and driver version that my PyTorch-GPU package must use? And then, how do I fix the problem if PyTorch is out of sync?
I am answering my own question here...
PyTorch-GPU must be compiled against specific CUDA binary drivers.
I finally found this hint, Why torch.cuda.is_available() returns False even after installing pytorch with cuda?, which identifies the issue.
import torch
torch.zeros(1).cuda()
The return value clearly identifies the problem.
AssertionError Traceback (most recent call last)
Cell In [222], line 2
1 import torch
----> 2 torch.zeros(1).cuda()
File C:\ProgramData\Anaconda3\envs\tf210_gpu\lib\site-packages\torch\cuda\__init__.py:221, in _lazy_init()
217 raise RuntimeError(
218 "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "
219 "multiprocessing, you must use the 'spawn' start method")
220 if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 221 raise AssertionError("Torch not compiled with CUDA enabled")
222 if _cudart is None:
223 raise AssertionError(
224 "libcudart functions unavailable. It looks like you have a broken build?")
AssertionError: Torch not compiled with CUDA enabled
The problem is: "Torch not compiled with CUDA enabled"
Now I have to see if I can just re-install PyTorch-GPU to replace the current PyTorch-CPU version with one that is compiled against my CUDA v11.6 driver, without rebuilding the entire conda environment. I would rather not rebuild the conda environment from scratch unless it is really necessary.
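As a first check, PyTorch itself reports which CUDA version (if any) it was built against, which answers the CPU-vs-GPU build question directly (the version strings in the comments are illustrative):

import torch

print(torch.__version__)          # e.g. "1.13.0+cpu" for a CPU-only build, "1.13.0+cu116" for a CUDA 11.6 build
print(torch.version.cuda)         # None on a CPU-only build, "11.6" on a matching GPU build
print(torch.cuda.is_available())  # False until a CUDA-enabled build and the driver line up

If this reports a CPU-only build, replacing just the torch package with a cu116 build should be possible without rebuilding the whole conda environment.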

CUDA failed to launch kernel : no kernel image available for execution

I am trying to run CUDA on a rather old GPU. I tried the CUDA sample vectorAdd, which gives me the following error:
Failed to launch vectorAdd kernel (error code no kernel image is available for execution on the device)!
These are the outputs from
deviceQuery:
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 580"
CUDA Driver Version / Runtime Version 9.1 / 9.0
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 1467 MBytes (1538392064 bytes)
MapSMtoCores for SM 2.0 is undefined. Default to use 64 Cores/SM
MapSMtoCores for SM 2.0 is undefined. Default to use 64 Cores/SM
(16) Multiprocessors, ( 64) CUDA Cores/MP: 1024 CUDA Cores
GPU Max Clock rate: 1630 MHz (1.63 GHz)
Memory Clock rate: 2050 Mhz
Memory Bus Width: 384-bit
L2 Cache Size: 786432 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (65535, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 3 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.147 Driver Version: 390.147 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 580 Off | 00000000:03:00.0 N/A | N/A |
| 42% 48C P12 N/A / N/A | 257MiB / 1467MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
Now, according to the CUDA compatibility PDF
https://docs.nvidia.com/pdf/CUDA_Compatibility.pdf
I assume I have binary compatibility from CUDA 9.0.176 to the GPU driver.
For Compute Capability Support, the table does not list the 390 Driver.
Is it even possible to program CUDA on this GPU or should I get a newer one? If it is possible, what combination of driver and CUDA toolkit version do I need?
The GPU you are using is a Fermi class (compute capability 2.0) device. Support was officially removed from the CUDA toolkit when CUDA 9.0 was released in September 2017. The last release of the CUDA toolkit with Fermi support was CUDA 8.0. You will have to use that (or something even older) if you wish to use that GPU with CUDA.
[Answer assembled from comments and added as a community wiki entry to get this question off the unanswered list for the CUDA tag]
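If you do drop back to CUDA 8.0, you can confirm at runtime that the toolkit actually recognizes the Fermi device before launching kernels. A small sketch in Python via ctypes (assumptions: Linux with the CUDA 8 libcudart.so on the loader path; 75 and 76 are the cudaDevAttrComputeCapabilityMajor/Minor enum values from cuda_runtime_api.h):

import ctypes

# Assumption: the CUDA 8 runtime library is on LD_LIBRARY_PATH
cudart = ctypes.CDLL("libcudart.so")

major, minor = ctypes.c_int(), ctypes.c_int()
# 75 = cudaDevAttrComputeCapabilityMajor, 76 = cudaDevAttrComputeCapabilityMinor
cudart.cudaDeviceGetAttribute(ctypes.byref(major), 75, 0)
cudart.cudaDeviceGetAttribute(ctypes.byref(minor), 76, 0)
print("device 0 compute capability: %d.%d" % (major.value, minor.value))  # 2.0 on a GTX 580

When compiling with CUDA 8, pass an explicit Fermi target (e.g. nvcc -arch=sm_20 vectorAdd.cu), since "no kernel image is available" is exactly what a launch reports when the binary contains no image matching the device.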

How do I know my GPU supports multiprocessing by default?

I came across this post:
How do I use Nvidia Multi-process Service (MPS) to run multiple non-MPI CUDA applications?
But when I run ./mps_run before launching MPS, I get:
kernel duration: 4.999370s
kernel duration: 5.012310s
And when I check nvidia-smi during those 5 seconds:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04 Driver Version: 450.102.04 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000001:00:00.0 Off | 0 |
| N/A 28C P0 38W / 250W | 508MiB / 16280MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
It looks like the GPU I am using supports multiprocessing somehow.
But when I run nvidia-smi -i 2 -c EXCLUSIVE_PROCESS, it turns out that "No devices were found".
This is weird.
How do I know whether my GPU supports multiprocessing or not?
The GPU I am using: Tesla P100 (GP100GL)
In that post you linked, in the UPDATE section of my answer, I indicated that the GPU scheduler has changed in Pascal and beyond (your Tesla P100 is a Pascal GPU).
MPS is supported on all current NVIDIA GPUs.
The results you got are expected (in the non-MPS case) because the GPU scheduler allows both kernels to run in a time-sliced fashion. All currently supported CUDA GPUs support multiprocessing in the Default compute mode. However, older GPUs (e.g. Kepler) would run the kernel from one process to completion, then the kernel from the other process. Pascal and newer GPUs run the kernel from one process for a period of time, then the other process for a period of time, then the first process again, and so on, in a round-robin time-sliced fashion. (As an aside, nvidia-smi -i 2 reported "No devices were found" simply because -i 2 addresses GPU index 2, and this system only has GPU 0.)
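To check the compute mode programmatically rather than with nvidia-smi, the runtime exposes it as a device attribute. A sketch in Python via ctypes (assumptions: libcudart.so is loadable, and 20 is the cudaDevAttrComputeMode enum value from cuda_runtime_api.h):

import ctypes

# Assumption: the CUDA runtime library is on LD_LIBRARY_PATH
cudart = ctypes.CDLL("libcudart.so")

mode = ctypes.c_int()
# 20 = cudaDevAttrComputeMode; 0 = Default, 1 = Exclusive Thread (legacy),
# 2 = Prohibited, 3 = Exclusive Process
cudart.cudaDeviceGetAttribute(ctypes.byref(mode), 20, 0)
print("compute mode:", mode.value)  # 0 (Default) lets multiple host processes share the GPU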

Why does nvidia-smi return "GPU access blocked by the operating system" in WSL2 under Windows 10 21H2 [closed]

Installing CUDA on WSL2
I've installed Windows 10 21H2 on both my desktop (AMD 5950X system with RTX3080) and my laptop (Dell XPS 9560 with i7-7700HQ and GTX1050) following the instructions on https://docs.nvidia.com/cuda/wsl-user-guide/index.html:
Install CUDA-capable driver in Windows
Update WSL2 kernel in PowerShell: wsl --update
Install CUDA toolkit in Ubuntu 20.04 in WSL2
(Note that you don't install a CUDA driver in WSL2; the instructions explicitly state that the CUDA driver should not be installed.)
$ wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
$ sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda-repo-wsl-ubuntu-11-4-local_11.4.0-1_amd64.deb
$ sudo dpkg -i cuda-repo-wsl-ubuntu-11-4-local_11.4.0-1_amd64.deb
$ sudo apt-key add /var/cuda-repo-wsl-ubuntu-11-4-local/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get -y install cuda
The Error
On my desktop nvidia-smi and CUDA samples are working fine in WSL2.
But on my laptop running nvidia-smi in WSL2 returns:
$ nvidia-smi
Failed to initialize NVML: GPU access blocked by the operating system
Failed to properly shut down NVML: GPU access blocked by the operating system
I'm aware my laptop has NVIDIA Optimus with both an Intel IGP and an NVIDIA GTX 1050, but CUDA works fine in Windows, just not in WSL2.
But I also could not find any information that CUDA is not supposed to work in WSL2 for Optimus systems.
What I've tried
I've tried the following mitigations, but the error remains:
Reinstalling the Windows CUDA driver and rebooting
Making the GTX 1050 the preferred GPU in the global settings of the NVIDIA control panel
Making the GTX 1050 the default PhysX processor
Following the same steps on a fresh Ubuntu 18.04 in WSL2
The question
Is this a CUDA WSL2 bug? Does CUDA simply not work with Optimus? And how can I fix or further debug this?
More details
I've compared running nvidia-smi.exe in Windows powershell between my desktop and laptop, and they both return the same software versions:
PS C:\WINDOWS\system32> nvidia-smi
Wed Nov 17 21:46:50 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.06 Driver Version: 510.06 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... WDDM | 00000000:01:00.0 Off | N/A |
| N/A 44C P8 N/A / N/A | 75MiB / 4096MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Even more details
The full nvidia-smi.exe -q on my laptop in Windows Powershell returns the following information about my laptop's GPU:
PS C:\WINDOWS\system32> nvidia-smi -q
==============NVSMI LOG==============
Timestamp : Wed Nov 17 21:48:19 2021
Driver Version : 510.06
CUDA Version : 11.6
Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : NVIDIA GeForce GTX 1050
Product Brand : GeForce
Product Architecture : Pascal
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : N/A
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : WDDM
Pending : WDDM
Serial Number : N/A
GPU UUID : GPU-7645072f-7516-5488-316d-6277d101f64e
Minor Number : N/A
VBIOS Version : 86.07.3e.00.1c
MultiGPU Board : No
Board ID : 0x100
GPU Part Number : N/A
Module ID : 0
Inforom Version
Image Version : N/A
OEM Object : N/A
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x1C8D10DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x07BE1028
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 4096 MiB
Used : 75 MiB
Free : 4021 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 40 C
GPU Shutdown Temp : 102 C
GPU Slowdown Temp : 97 C
GPU Max Operating Temp : 78 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : N/A
Power Draw : N/A
Power Limit : N/A
Default Power Limit : N/A
Enforced Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 0 MHz
SM : 0 MHz
Memory : 405 MHz
Video : 0 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1911 MHz
SM : 1911 MHz
Memory : 3504 MHz
Video : 1708 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Processes : None
It turns out that the Windows 10 Update Assistant incorrectly reported that it had upgraded my OS to 21H2 on my laptop.
Checking the Windows version by running winver reported that my OS was still 21H1.
Of course, CUDA in WSL2 will not work on Windows 10 without 21H2.
After successfully installing 21H2, I can confirm that CUDA works in WSL2 even for laptops with Optimus NVIDIA cards.

Caffe-SSD installed with GPU support throws error "Cannot use GPU in CPU-only Caffe"

I have installed Caffe-SSD with OpenCV version 3.2.0, CUDA version 9.2.148 and cuDNN version 7.2.1.38.
These are my settings in Makefile.config
# cuDNN acceleration switch (uncomment to build with cuDNN).
USE_CUDNN := 1
# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1
# Uncomment if you're using OpenCV 3
OPENCV_VERSION := 3
# We need to be able to find Python.h and numpy/arrayobject.h.
PYTHON_INCLUDE := /usr/include/python2.7 \
/usr/local/lib/python2.7/dist-packages/numpy/core/include
# Uncomment to support layers written in Python (will link against Python libs)
WITH_PYTHON_LAYER := 1
# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial/
All tests passed.
[----------] Global test environment tear-down
[==========] 1266 tests from 168 test cases ran. (45001 ms total)
[ PASSED ] 1266 tests.
Thereafter I followed this link for SSD. The LMDB creation works without a problem, but when I run
python examples/ssd/ssd_pascal.py
I get the following error
I0820 14:16:29.089138 22429 caffe.cpp:217] Using GPUs 0
F0820 14:16:29.089301 22429 common.cpp:66] Cannot use GPU in CPU-only Caffe: check mode.
*** Check failure stack trace: ***
# 0x7f97322a00cd google::LogMessage::Fail()
# 0x7f97322a1f33 google::LogMessage::SendToLog()
# 0x7f973229fc28 google::LogMessage::Flush()
# 0x7f97322a2999 google::LogMessageFatal::~LogMessageFatal()
# 0x7f973284f8a0 caffe::Caffe::SetDevice()
# 0x55b05fe50dcb (unknown)
# 0x55b05fe4c543 (unknown)
# 0x7f9730ae3b97 __libc_start_main
# 0x55b05fe4cffa (unknown)
Aborted (core dumped)
I have an NVIDIA GeForce GTX 1080 Ti graphics card.
Mon Aug 20 14:26:48 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.51 Driver Version: 396.51 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:01:00.0 Off | N/A |
| 44% 37C P8 19W / 250W | 18MiB / 11177MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1356 G /usr/lib/xorg/Xorg 9MiB |
| 0 1391 G /usr/bin/gnome-shell 6MiB |
+-----------------------------------------------------------------------------+
I've tried compiling a simple CUDA program with nvcc and running it without any problem. I'm able to import caffe without any issue.
I have checked this question and that's not my problem.
For the error error == cudaSuccess (7 vs. 0):
change gpus = "0,1,2,3" to gpus = "0" in ssd_pascal.py, and also check the CUDA path in CUDA_DIR in Makefile.config and update it to the proper path and version installed on your system.
For the error "Cannot use GPU in CPU-only Caffe", build the SSD again using the make test command.
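Once Caffe has been rebuilt with the CPU_ONLY line still commented out in Makefile.config, a quick way to confirm the Python build can drive the GPU is the standard pycaffe mode switch (device index 0 is an assumption for a single-GPU machine):

import caffe

caffe.set_device(0)   # aborts with "Cannot use GPU in CPU-only Caffe" on a CPU-only build
caffe.set_mode_gpu()
print("pycaffe GPU mode enabled")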