CUDA Toolkit 4.1/4.2: nvcc Crashes with an Access Violation - cuda

I am developing a CUDA application for a GTX 580 with Visual Studio 2010 Professional on Windows 7 64-bit. My project builds fine with CUDA Toolkit 4.0, but nvcc crashes when I choose CUDA Toolkit 4.1 or 4.2 with the following error:
1> Stack dump:
1> 0. Running pass 'Promote Constant Global' on module 'moduleOutput'.
1>CUDACOMPILE : nvcc error : 'cicc' died with status 0xC0000005 (ACCESS_VIOLATION)
Strangely enough, the program compiles OK with "compute_10,sm_10" specified for "Code Generation", but "compute_20,sm_20" does not work. The code in question can be downloaded here:
http://www.meriken2ch.com/files/CUDA_SHA-1_Tripper_MERIKENs_Branch_0.04_Alpha_1.zip
(README.txt is in Japanese, but comments in source files are in English.)
I suspect a newly introduced bug in CUDA Toolkit 4.1/4.2. Has anybody encountered this issue? Is there any workaround? Any help would be much appreciated.

This appears to have been a compiler bug in CUDA 4.x that is fixed in CUDA 5.0 (according to a comment from #meriken2ch, the project builds fine with CUDA 5.0 RC).
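For reference, the "Code Generation" project setting corresponds to nvcc's -gencode flag on the command line. A minimal sketch of the two configurations described above (kernel.cu and the output name are placeholders, not files from the project):

```shell
# Illustrative nvcc invocations. Under CUDA 4.1/4.2 the first built
# fine, while the second made cicc die with an access violation:
nvcc -gencode arch=compute_10,code=sm_10 kernel.cu -o tripper   # builds
nvcc -gencode arch=compute_20,code=sm_20 kernel.cu -o tripper   # cicc crash
```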

Related

Autolykos GPU Miner software no longer recognizing NVIDIA GPU on Ubuntu 18.04

A month or so ago, the Autolykos miner (https://github.com/ergoplatform/Autolykos-GPU-miner) compiled and ran. Now it suddenly doesn't work because the .cu files don't recognize any installed NVIDIA GPU. I made NO changes to the Autolykos code; it just stopped working. I merely dropped into the source folder (as described in the README) and typed make. But when I install and make all of the CUDA examples, those run just fine. Running on Ubuntu 18.04 with a GeForce TITAN X. For example, the utility deviceQuery returns the following:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX TITAN X"
CUDA Driver Version / Runtime Version 10.1 / 10.1
CUDA Capability Major/Minor version number: 5.2
...
Whereas the output at startup of the mining binary spits out ONE line and quits:
Error Checking GPU: Using 0 GPU devices
Any suggestions would be welcome...
SOLVED: After re-compiling the CUDA code from NVIDIA, the miner is working. I suspect that a system update broke something.
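A sketch of the recovery steps described above, assuming a default CUDA 10.1 runfile install (the paths are assumptions; adjust them to your system):

```shell
# Rebuild a CUDA sample against the current driver, then verify the
# GPU is visible before rebuilding the miner:
cd /usr/local/cuda-10.1/samples/1_Utilities/deviceQuery
make
./deviceQuery        # should report "Detected 1 CUDA Capable device(s)"

# Then rebuild the miner from a clean tree (path is a placeholder):
cd ~/Autolykos-GPU-miner
make clean && make
```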

CUDA Installation for NVidia Quadro FX 3800

I'm having trouble installing CUDA 7.0 (to use with TensorFlow) on a workstation with the Nvidia Quadro FX 3800. I'm wondering if this is because the GPU is no longer supported.
Installation of the driver (340.96) seems to work fine:
$ sh ./NVIDIA-Linux-x86_64-340.96.run
Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64
(version: 340.96) is now complete. Please update your XF86Config or
xorg.conf file as appropriate; see the file
/usr/share/doc/NVIDIA_GLX-1.0/README.txt for details.
However, I think I may be having trouble with the following:
$ ./cuda_7.0.28_linux.run --kernel-source-path=/usr/src/linux-headers-3.13.0-76-generic
The driver installation is unable to locate the kernel source. Please make sure
that the kernel source packages are installed and set up correctly. If you know
that the kernel source packages are installed and set up correctly, you may pass
the location of the kernel source with the '--kernel-source-path' flag.
...
Logfile is /tmp/cuda_install_1357.log
$ vi /tmp/cuda_install_1357.log
WARNING: The NVIDIA Quadro FX 3800 GPU installed in this system is
supported through the NVIDIA 340.xx legacy Linux graphics drivers.
Please visit http://www.nvidia.com/object/unix.html for more
information. The 346.46 NVIDIA Linux graphics driver will ignore
this GPU.
WARNING: You do not appear to have an NVIDIA GPU supported by the 346.46
NVIDIA Linux graphics driver installed in this system. For
further details, please see the appendix SUPPORTED NVIDIA GRAPHICS
CHIPS in the README available on the Linux driver download page at
www.nvidia.com.
...
ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most
frequently when this kernel module was built against the wrong or
improperly configured kernel sources, with a version of gcc that
differs from the one used to build the target kernel, or if a driver
such as rivafb, nvidiafb, or nouveau is present and prevents the
NVIDIA kernel module from obtaining ownership of the NVIDIA graphics
device(s), or no NVIDIA GPU installed in this system is supported by
this NVIDIA Linux graphics driver release.
...
Please see the log entries 'Kernel module load error' and 'Kernel
messages' at the end of the file '/var/log/nvidia-installer.log' for
more information.
Is the installation failure due to CUDA dropping support for this graphics card?
I followed the link trail: https://developer.nvidia.com/cuda-gpus > https://developer.nvidia.com/cuda-legacy-gpus > http://www.nvidia.com/object/product_quadro_fx_3800_us.html and I would have thought the Quadro FX 3800 supported CUDA (at least at the beginning).
Yes, the Quadro FX 3800 GPU is no longer supported by CUDA 7.0 and beyond.
The last CUDA version that supported that GPU was CUDA 6.5.
This answer and this answer may be of interest. Your QFX 3800 is a compute capability 1.3 device.
If you review the release notes that come with CUDA 7, you will find a notice of the elimination of support for these earlier GPUs. Likewise, the newer CUDA driver versions also don't support those GPUs.
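As a quick check before choosing a toolkit version, the compute capability can be read from the deviceQuery sample; the path below assumes a default CUDA 6.5 samples build and is illustrative only:

```shell
# Print the compute capability of the installed GPU. CUDA 7.0+ requires
# compute capability 2.0 or higher, so a 1.3 device like the Quadro FX
# 3800 needs CUDA 6.5 or earlier:
/usr/local/cuda-6.5/samples/bin/x86_64/linux/release/deviceQuery \
  | grep "Capability"
```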

cuda-gdb run error:"Failed to read the valid warps mask(dev=0,sm=0,error=16)" [duplicate]

I tried to debug my CUDA application with cuda-gdb but got a weird error.
I built my application with the options -g -G -O0. The program runs without cuda-gdb, but doesn't produce the correct result, so I decided to use cuda-gdb. However, I get the following error message when running the program under cuda-gdb:
Error: Failed to read the valid warps mask (dev=1, sm=0, error=16).
What does it mean? Why sm=0, and what is the meaning of error=16?
Update 1: I tried cuda-gdb on the CUDA samples, but it fails with the same problem. I just installed the CUDA 6.0 Toolkit following NVIDIA's instructions. Is it a problem with my system?
Update 2:
OS: CentOS 6.5
GPUs: 1× Quadro 400, 2× Tesla C2070
I'm using only one GPU for my program, but I get the same error message from whichever GPU I select.
CUDA version: 6.0
GPU driver: NVRM version: NVIDIA UNIX x86_64 Kernel Module 331.62 Wed Mar 19 18:20:03 PDT 2014
GCC version: gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)
Update 3:
I tried to get more information in cuda-gdb, but I got following results
(cuda-gdb) info cuda devices
Error: Failed to read the valid warps mask (dev=1, sm=0, error=16).
(cuda-gdb) info cuda sms
Focus not set on any active CUDA kernel.
(cuda-gdb) info cuda lanes
Focus not set on any active CUDA kernel.
(cuda-gdb) info cuda kernels
No CUDA kernels.
(cuda-gdb) info cuda contexts
No CUDA contexts.
Actually, this issue is specific to some old NVIDIA GPUs (like the Quadro 400, GeForce GT 220, or GeForce GT 330M).
On Liam Kim's setup, cuda-gdb should work fine after setting the environment variable CUDA_VISIBLE_DEVICES so that cuda-gdb runs on the Tesla C2070 GPUs specifically, i.e.:
$ export CUDA_VISIBLE_DEVICES=0 (or 2)
The exact CUDA device index can be found by running the CUDA sample "deviceQuery".
This issue has now been fixed, and the fix will be available to CUDA developers in the next CUDA release (expected around early July 2014).
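A minimal sketch of the workaround (the device index 0 and the binary name are assumptions; verify the index with deviceQuery):

```shell
# Restrict CUDA to one Tesla C2070 so cuda-gdb never touches the
# unsupported Quadro 400:
export CUDA_VISIBLE_DEVICES=0
# cuda-gdb ./my_app    # then debug as usual
```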
This is an internal cuda-gdb bug; you should report it.
Can you try installing the CUDA toolkit from the package on the NVIDIA site?

exception (first chance) ... cudaError_enum at memory

I am working on a project that produces the error above; some research showed that the problem lies with the cuBLAS library.
So now I have the following "minimal" problem:
I opened the simpleCUBLAS example from the NVIDIA CUDA SDK (4.2) to test whether I can reproduce the problem.
The program itself works, but VS2010 gives me similar output:
First-chance exception at 0x75e3c41f in simpleCUBLAS.exe: Microsoft C++ exception: cudaError_enum at memory location 0x003bf704.
(repeated 7 times)
My specs: I use a GTX 460 for computing, compile with sm_20, and use VS2010 on Windows 7 64-bit.
nvcc --version gives me:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Fri_Jan_13_01:18:37_PST_2012
Cuda compilation tools, release 4.1, V0.2.1221
This is my first time posting here, so I apologize for the formatting.
The observation you are making has to do with an exception that is caught and handled properly within the CUDA libraries. It is, in some cases, a normal part of CUDA GPU operation. As you have observed, your application returns no API errors and runs correctly. If you were not within the VS environment that can report this, you would not observe this at all.
This is considered normal behavior under CUDA. I believe there were some attempts to eliminate it in CUDA 5.5. You might wish to try that, although it's not considered an issue either way.

Why CUDA Command line profiler doesn't recognize some counters?

I am working remotely on some CUDA program in the Linux environment. Since there are problems with X-forwarding, I cannot use CUDA Visual Profiler and have to use CUDA Command Line profiler instead.
The problem is, it doesn't recognize some of the basic counters I want it to follow. For example, running the program with the following command:
COMPUTE_PROFILE=1 COMPUTE_PROFILE_CSV=0 COMPUTE_PROFILE_LOG=log \
CUDA_PROFILE_CONFIG=Config.txt ./my_program
With the Config.txt file being:
warp_serialize
shared_replay_overhead
Results in the following log:
NV_Warning: Ignoring the invalid profiler config option: warp_serialize
NV_Warning: Ignoring the invalid profiler config option: shared_replay_overhead
CUDA_PROFILE_LOG_VERSION 2.0
CUDA_DEVICE 0 GeForce GTX 580
CUDA_CONTEXT 1
TIMESTAMPFACTOR fffff6c8b2653dd8
...
My environment specifications:
Card: GeForce GTX 580
CUDA Driver Version / Runtime Version: 4.1 / 4.1
CUDA Capability Major/Minor version number: 2.0
Any ideas what I might be doing wrong?
The warp_serialize counter is not supported for devices of compute capability 2.x. See Table 6, "Profiler Counter Types", in the Compute Visual Profiler User Guide.
Regarding shared_replay_overhead, I have not found anything related to it.
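As a sketch of a workable alternative: on compute capability 2.x, shared-memory serialization shows up through different counters. The counter names below are taken from the Fermi-era profiler documentation but should be verified against the Compute Visual Profiler counter table for your toolkit version:

```shell
# Rewrite the profiler config with counters that exist on CC 2.x
# hardware (names assumed from the Fermi profiler docs):
cat > Config.txt <<'EOF'
shared_load
shared_store
l1_shared_bank_conflict
EOF

# Then rerun the profiler with the new config:
# COMPUTE_PROFILE=1 COMPUTE_PROFILE_CSV=0 COMPUTE_PROFILE_LOG=log \
# CUDA_PROFILE_CONFIG=Config.txt ./my_program
```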