cudaErrorNoDevice GTS250 gentoo - cuda

first question is
cudaGetDeviceCount return cudaErrorNoDevice:
This indicates that no CUDA-capable devices were detected by the installed CUDA driver.
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86 Kernel Module 260.19.26 Sun Nov 28 22:38:24 PST 2010
GCC version: gcc version 4.4.5 (Gentoo 4.4.5 p1.2, pie-0.4.5)
lspci -v
...
02:00.0 VGA compatible controller: nVidia Corporation G92 [GeForce GTS 250] (rev a2) (prog-if 00 [VGA controller])
Flags: bus master, fast devsel, latency 0, IRQ 19
Memory at fb000000 (32-bit, non-prefetchable) [size=16M]
Memory at e0000000 (64-bit, prefetchable) [size=256M]
Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
I/O ports at ec00 [size=128]
[virtual] Expansion ROM at fafe0000 [disabled] [size=128K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Virtual Channel <?>
Capabilities: [128] Power Budgeting <?>
Capabilities: [600] Vendor Specific Information <?>
Kernel driver in use: nvidia
Kernel modules: nvidia
...
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2010 NVIDIA Corporation
Built on Wed_Nov__3_16:14:08_PDT_2010
Cuda compilation tools, release 3.2, V0.2.1221
CUDA computing SDK 3.2.16
and when i try to make sdk samples i get second problem:
make[1]: Entering directory `/home/style/NVIDIA_GPU_Computing_SDK/C/src/MersenneTwister'
nvcc fatal : Unsupported gpu architecture 'compute_20'
i've tried to edit common.mk, but i don't know what exactly i should edit
Thanks for help

Verify if you have a /dev/nvidia0. If you do not, you may need to restart your X.

I don't see the error where it says you have no CUDA devices.
In answer to your second question:
GeForce GTS 250 supports CUDA 1.1 not 2.0. You need to edit the compilation flags to replace compute_20 with compute_11.

As I had the same problem, the only solution that worked for me was to reinstall the driver. Notice that the driver is not included in the SDK.

Related

No devices were found

No devices were found
error in installing
ZOTAC GAMING GeForce RTX 3080 Ti Trinity OC in Ubuntu 20.04.
cat /proc/driver/nvidia/version command gives
NVRM version: NVIDIA UNIX x86_64 Kernel Module 525.85.05 Sat Jan 14 00:49:50 UTC 2023
GCC version: gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
lspci | grep -i nvidia command gives
06:00.0 VGA compatible controller: NVIDIA Corporation Device 2216 (rev a1)
06:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
The system has Nvidia driver version 525.85.05 and
Cuda 11.6.
All installed successfully using run files.
But nvidia-smi command gives
No devices were found
What is wrong with installation?
EDIT:
dpkg -l | grep -i nvidia command has outputs as below
ii libnvidia-compute-510:i386 510.108.03-0ubuntu0.20.04.1 i386 NVIDIA libcompute package
ii libnvidia-decode-510:i386 510.108.03-0ubuntu0.20.04.1 i386 NVIDIA Video Decoding runtime libraries
ii libnvidia-encode-510:i386 510.108.03-0ubuntu0.20.04.1 i386 NVENC Video Encoding runtime library
ii libnvidia-fbc1-510:i386 510.108.03-0ubuntu0.20.04.1 i386 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii screen-resolution-extra 0.18build1 all Extension for the nvidia-settings control panel

Could I install Caffe framework on my laptop?

My graphics card
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV620/M82 [Mobility Radeon HD 3410/3430] (prog-if 00 [VGA controller])
Subsystem: Hewlett-Packard Company Device 30e9
Flags: bus master, fast devsel, latency 0, IRQ 33
Memory at 80000000 (32-bit, prefetchable) [size=256M]
I/O ports at 7000 [size=256]
Memory at 98400000 (32-bit, non-prefetchable) [size=64K]
Expansion ROM at 98420000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: radeon
So i do not have NVIDIA card,does this mean I can not install it?
No, it does not. You can install it with CPU_ONLY=ON variable in cmake. Of course, CUDA will be unavailable for you, but you still can use caffe on CPU.
Moreover, you can try to checkout opencl caffe branch to utilize your AMD hardware (but I didn't try it yet).

CUDA Installation for NVidia Quadro FX 3800

I'm having trouble installing CUDA 7.0 (to use with TensorFlow) on a workstation with the Nvidia Quadro FX 3800. I'm wondering if this is because the GPU is no longer supported.
Installation of the driver (340.96) seems to work fine:
$ sh ./NVIDIA-Linux-x86_64-340.96.run
Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64
(version: 340.96) is now complete. Please update your XF86Config or
xorg.conf file as appropriate; see the file
/usr/share/doc/NVIDIA_GLX-1.0/README.txt for details.
However, I think I may be having trouble with the following:
$ ./cuda_7.0.28_linux.run --kernel-source-path=/usr/src/linux-headers-3.13.0-76-generic
The driver installation is unable to locate the kernel source. Please make sure
that the kernel source packages are installed and set up correctly. If you know
that the kernel source packages are installed and set up correctly, you may pass
the location of the kernel source with the '--kernel-source-path' flag.
...
Logfile is /tmp/cuda_install_1357.log
$ vi /tmp/cuda_install_1357.log
WARNING: The NVIDIA Quadro FX 3800 GPU installed in this system is
supported through the NVIDIA 340.xx legacy Linux graphics drivers.
Please visit http://www.nvidia.com/object/unix.html for more
information. The 346.46 NVIDIA Linux graphics driver will ignore
this GPU.
WARNING: You do not appear to have an NVIDIA GPU supported by the 346.46
NVIDIA Linux graphics driver installed in this system. For
further details, please see the appendix SUPPORTED NVIDIA GRAPHICS
CHIPS in the README available on the Linux driver download page at
www.nvidia.com.
...
ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most
frequently when this kernel module was built against the wrong or
improperly configured kernel sources, with a version of gcc that
differs from the one used to build the target kernel, or if a driver
such as rivafb, nvidiafb, or nouveau is present and prevents the
NVIDIA kernel module from obtaining ownership of the NVIDIA graphics
device(s), or no NVIDIA GPU installed in this system is supported by
this NVIDIA Linux graphics driver release.
...
Please see the log entries 'Kernel module load error' and 'Kernel
messages' at the end of the file '/var/log/nvidia-installer.log' for
more information.
Is the installation failure due to CUDA dropping support for this graphics card?
I followed the link trail: https://developer.nvidia.com/cuda-gpus > https://developer.nvidia.com/cuda-legacy-gpus > http://www.nvidia.com/object/product_quadro_fx_3800_us.html and I would have thought the Quadro FX 3800 supported CUDA (at least at the beginning).
Yes, the Quadro FX 3800 GPU is no longer supported by CUDA 7.0 and beyond.
The last CUDA version that supported that GPU was CUDA 6.5.
This answer and this answer may be of interest. Your QFX 3800 is a compute capability 1.3 device.
If you review the release notes that come with CUDA 7, you will find a notice of the elimination of support for these earlier GPUs. Likewise, the newer CUDA driver versions also don't support those GPUs.

cuda-gdb run error:"Failed to read the valid warps mask(dev=0,sm=0,error=16)" [duplicate]

I tried to debug my CUDA application with cuda-gdb but got some weird error.
I set option -g -G -O0 to build my application. I could run my program without cuda-gdb, but didn't get correct result. Hence I decided to use cuda-gdb, however, I got following error message while running program with cuda-gdb
Error: Failed to read the valid warps mask (dev=1, sm=0, error=16).
What does it means? Why sm=0 and what's the meaning of error=16?
Update 1: I tried to use cuda-gdb to CUDA samples, but it fails with same problem. I just installed CUDA 6.0 Toolkit followed by instruction of NVIDIA. Is it a problem of my system?
Update 2:
OS - CentOS 6.5
GPU
1 Quadro 400
2 Tesla C2070
I'm using only 1 GPU for my program, but I've got same bug message from any GPU that I selected
CUDA version - 6.0
GPU Driver
NVRM version: NVIDIA UNIX x86_64 Kernel Module 331.62 Wed Mar 19 18:20:03 PDT 2014
GCC version: gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)
Update 3:
I tried to get more information in cuda-gdb, but I got following results
(cuda-gdb) info cuda devices
Error: Failed to read the valid warps mask (dev=1, sm=0, error=16).
(cuda-gdb) info cuda sms
Focus not set on any active CUDA kernel.
(cuda-gdb) info cuda lanes
Focus not set on any active CUDA kernel.
(cuda-gdb) info cuda kernels
No CUDA kernels.
(cuda-gdb) info cuda contexts
No CUDA contexts.
Actually, this issue is only specific to some old NVIDIA GPUs(like "Quadro 400", "GeForce GT220", or "GeForce GT 330M", etc).
On Liam Kim's setup, cuda-gdb should work fine by set environment variable "CUDA_VISIBLE_DEVICES", and let cuda-gdb running on Tesla C2070 GPUs specifically.
I.e
$export CUDA_VISIBLE_DEVICES=0 (or 2)
- the exact CUDA devices index could be found by running cuda sample - "deviceQuery".
And now, this issue has been fixed, the fix would be availble for CUDA developers in the next CUDA release(it will be posted out around early July, 2014).
This is internal cuda-gdb bug. You should report a bug.
Can you try installing CUDA toolkit from the package on NVIDIA site?

CUDA Runtime API error 38: no CUDA-capable device is detected

The Situation
I have a 2 gpu server (Ubuntu 12.04) where I switched a Tesla C1060 with a GTX 670. Than I installed CUDA 5.0 over the 4.2. Afterwards I compiled all examples execpt for simpleMPI without error. But when I run ./devicequery I get following error message:
foo#bar-serv2:~/NVIDIA_CUDA-5.0_Samples/bin/linux/release$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
What I have tried
To solve this I tried all of the thinks recommended by CUDA-capable device, but to no avail:
/dev/nvidia* is there and the permissions are 666 (crw-rw-rw-) and owner root:root
foo#bar-serv2:/dev$ ls -l nvidia*
crw-rw-rw- 1 root root 195, 0 Oct 24 18:51 nvidia0
crw-rw-rw- 1 root root 195, 1 Oct 24 18:51 nvidia1
crw-rw-rw- 1 root root 195, 255 Oct 24 18:50 nvidiactl
I tried executing the code with sudo
CUDA 5.0 installs driver and libraries at the same time
PS here is lspci | grep -i nvidia:
foo#bar-serv2:/dev$ lspci | grep -i nvidia
03:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 670] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GK104 HDMI Audio Controller (rev a1)
04:00.0 VGA compatible controller: NVIDIA Corporation G94 [Quadro FX 1800] (rev a1)
[update]
foo#bar-serv2:~/NVIDIA_CUDA-5.0_Samples/bin/linux/release$ nvidia-smi -a
NVIDIA: API mismatch: the NVIDIA kernel module has version 295.59,
but this NVIDIA driver component has version 304.54. Please make
sure that the kernel module and all NVIDIA driver components
have the same version.
Failed to initialize NVML: Unknown Error
How could that be, if I use the CUDA 5.0 installer to install driver and libs at the same time. Could the old 4.2 version, that is still lying around mess things up?
I came across this issue, and running
nvidia-smi
informed me of an API mismatch. The problem was that my Linux distro had installed updates that required a system restart, so restarting resolved the issue.
See this stack overflow question Installing cuda 5 samples in Ubuntu 12.10.
Ubuntu 12 is not a supported Linux distro (yet). For reference see CUDA 5.0 Toolkit Release Notes And Errata
** Distributions Currently Supported
Distribution 32 64 Kernel GCC GLIBC
----------------- -- -- --------------------- ---------- -------------
Fedora 16 X X 3.1.0-7.fc16 4.6.2 2.14.90
ICC Compiler 12.1 X
OpenSUSE 12.1 X 3.1.0-1.2-desktop 4.6.2 2.14.1
Red Hat RHEL 6.x X 2.6.32-131.0.15.el6 4.4.5 2.12
Red Hat RHEL 5.5+ X 2.6.18-238.el5 4.1.2 2.5
SUSE SLES 11 SP2 X 3.0.13-0.27-pae 4.3.4 2.11.3
SUSE SLES 11.1 X X 2.6.32.12-0.7-pae 4.3.4 2.11.1
Ubuntu 11.10 X X 3.0.0-19-generic-pae 4.6.1 2.13
Ubuntu 10.04 X X 2.6.35-23-generic 4.4.5 2.12.1
If you want to do it run on Ubuntu 12 anyway then see answer of rpardo. It looks like this distro instead of installing 64 bit libraries to /usr/lib64 installs them to /usr/lib/x86_64-linux-gnu/
I'd suggest searching for all instances of libcuda.so and libnvidia-ml.so on the system. Since the driver doesn't support this distro it might have installed libraries to a path that is not pointed by LD_LIBRARY_PATH. Then move the libraries around and/or change the LD_LIBRARY_PATH to point to this location (it should be the first path on the left). Then retry nvidia-smi or deviceQuery
Good luck
I got error 38 for cudaGetDeviceCount on a windows machine with GTX980 GPU.
After I downloaded the latest driver for GTX 980 fro the NVIDIA site, installed it and restarted, everything is fine. Looks like the CUDA installer is not installing the latest driver.
Try running the sample using sudo (or, you might do a 'sudo su', set LD_LIBRARY_PATH to the path of cuda libraries and run the sample while being root). Apparently, since you've probably installed CUDA 5.0 using sudo, the samples doesn't run with normal user. However, if you run a sample with root, then you'll be able to run samples with the regular user too! I've not yet restarted the system to see if samples work with normal user even after reboot, or each time you should run at least one CUDA application with root.
The problem might completely disappear if you install CUDA TookKit without using sudo.
I had very similar problem on Debian and it turns out that loaded nvidia module had different version than libcuda1.
To check for installed nvidia module you should do:
$ sudo modinfo nvidia-current | grep version
version: 319.82
If it doesn't match version of libcuda1 this the root of your problems.