Numba CUDA: Possible mix of compiler/IR from different releases

I've been trying to run some numba/cuda code, like this module:
https://github.com/Maghoumi/pytorch-softdtw-cuda/blob/master/soft_dtw_cuda.py
However I run into the following error:
numba.cuda.cudadrv.error.NvvmError: Failed to compile
IR version 1.6 incompatible with current version 2.0
<unnamed>: error: incompatible IR detected. Possible mix of compiler/IR from different releases.
NVVM_ERROR_IR_VERSION_MISMATCH
I guess I installed incompatible versions for some packages, but have no idea where to start. Which packages are concerned?

The underlying reason for this appears to be using CUDA 12.
According to the CUDA 12 release notes:
NVVM IR Update: with CUDA 12.0 we are releasing NVVM IR 2.0 which is
incompatible with NVVM IR 1.x accepted by the libNVVM compiler in
prior CUDA toolkit releases. Users of the libNVVM compiler in CUDA
12.0 toolkit must generate NVVM IR 2.0.
From the error, it would appear that the Numba CUDA backend is generating NVVM IR 1.6, and from the release notes for CUDA 12, NVVM IR 1.6 is no longer supported by the NVVM compiler library supplied in CUDA 12.
In the short term, use CUDA 11.x or earlier. In the longer term, report this as a bug to the Numba developers and get them to update their compiler infrastructure to match the CUDA 12 NVVM requirements.
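To confirm which IR versions are involved, the mismatch line in the error can be parsed. The following is a minimal sketch, assuming the message keeps the wording quoted in the question; parse_ir_mismatch is a hypothetical helper, not part of Numba.

```python
import re

# Pattern taken from the error text quoted in the question; the exact
# wording is an assumption and may differ between CUDA releases.
_IR_MISMATCH = re.compile(
    r"IR version (\d+)\.(\d+) incompatible with current version (\d+)\.(\d+)"
)


def parse_ir_mismatch(message):
    """Return ((gen_major, gen_minor), (nvvm_major, nvvm_minor)) or None."""
    match = _IR_MISMATCH.search(message)
    if match is None:
        return None
    gen_major, gen_minor, nvvm_major, nvvm_minor = map(int, match.groups())
    return (gen_major, gen_minor), (nvvm_major, nvvm_minor)


error_text = (
    "IR version 1.6 incompatible with current version 2.0\n"
    "<unnamed>: error: incompatible IR detected."
)
generated, accepted = parse_ir_mismatch(error_text)
print(generated, accepted)  # (1, 6) (2, 0)
```

A major version of 1 on the generated side and 2 on the accepted side matches the CUDA 12 situation described in the release notes.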

Related

What is the minimum compute capability for CUDA compilation supported by LLVM compiler?

A CUDA source file can be compiled into PTX format using the LLVM compiler with the command clang -Xclang -I$LIBCLC/include/generic -I$LIBCLC/include/ptx -Dcl_clang_storage_class_specifiers -O3 cudaFile.cu -S -o ptxOutputFile.ptx --cuda-gpu-arch=sm_XX
where sm_XX can be replaced with sm_20 or sm_30. For compute capability 1.0, replacing sm_XX with sm_10 gives the error fatal error: cannot open file '/tmp/shared-25f2f5.s': No such file or directory
1 error generated.
So it seems that LLVM has a minimum compute capability of 2.0. Is this assumption correct?
It should be correct. As of CUDA 7.0, both the toolkit and the driver have dropped support for sm_1x. If sm_20 works, it has to be the minimum.
CUDA Toolkit and CUDA Driver Support for Tesla Architecture
The CUDA Toolkit and CUDA Driver no longer supports the sm_10, sm_11, sm_12, and sm_13 architectures. As a consequence, CU_TARGET_COMPUTE_1x enum values have been removed from the CUDA headers.
http://developer.download.nvidia.com/compute/cuda/7_0/Prod/doc/CUDA_Toolkit_Release_Notes.pdf
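The check implied by the answer can be sketched as follows. This is an illustration only: parse_sm and is_tesla_sm1x are hypothetical helper names, and the rule encodes just what the quoted release notes state (sm_10 through sm_13 removed in CUDA 7.0).

```python
import re


def parse_sm(arch):
    """Split a two-digit architecture string such as 'sm_20' into (major, minor)."""
    match = re.fullmatch(r"sm_(\d)(\d)", arch)
    if match is None:
        raise ValueError(f"unrecognized architecture string: {arch!r}")
    return int(match.group(1)), int(match.group(2))


def is_tesla_sm1x(arch):
    """True for the sm_1x targets that CUDA 7.0 dropped."""
    major, _minor = parse_sm(arch)
    return major == 1


for arch in ("sm_10", "sm_13", "sm_20", "sm_30"):
    print(arch, "dropped in CUDA 7.0" if is_tesla_sm1x(arch) else "not sm_1x")
```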

CUDA Installation for NVidia Quadro FX 3800

I'm having trouble installing CUDA 7.0 (to use with TensorFlow) on a workstation with the Nvidia Quadro FX 3800. I'm wondering if this is because the GPU is no longer supported.
Installation of the driver (340.96) seems to work fine:
$ sh ./NVIDIA-Linux-x86_64-340.96.run
Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64
(version: 340.96) is now complete. Please update your XF86Config or
xorg.conf file as appropriate; see the file
/usr/share/doc/NVIDIA_GLX-1.0/README.txt for details.
However, I think I may be having trouble with the following:
$ ./cuda_7.0.28_linux.run --kernel-source-path=/usr/src/linux-headers-3.13.0-76-generic
The driver installation is unable to locate the kernel source. Please make sure
that the kernel source packages are installed and set up correctly. If you know
that the kernel source packages are installed and set up correctly, you may pass
the location of the kernel source with the '--kernel-source-path' flag.
...
Logfile is /tmp/cuda_install_1357.log
$ vi /tmp/cuda_install_1357.log
WARNING: The NVIDIA Quadro FX 3800 GPU installed in this system is
supported through the NVIDIA 340.xx legacy Linux graphics drivers.
Please visit http://www.nvidia.com/object/unix.html for more
information. The 346.46 NVIDIA Linux graphics driver will ignore
this GPU.
WARNING: You do not appear to have an NVIDIA GPU supported by the 346.46
NVIDIA Linux graphics driver installed in this system. For
further details, please see the appendix SUPPORTED NVIDIA GRAPHICS
CHIPS in the README available on the Linux driver download page at
www.nvidia.com.
...
ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most
frequently when this kernel module was built against the wrong or
improperly configured kernel sources, with a version of gcc that
differs from the one used to build the target kernel, or if a driver
such as rivafb, nvidiafb, or nouveau is present and prevents the
NVIDIA kernel module from obtaining ownership of the NVIDIA graphics
device(s), or no NVIDIA GPU installed in this system is supported by
this NVIDIA Linux graphics driver release.
...
Please see the log entries 'Kernel module load error' and 'Kernel
messages' at the end of the file '/var/log/nvidia-installer.log' for
more information.
Is the installation failure due to CUDA dropping support for this graphics card?
I followed the link trail: https://developer.nvidia.com/cuda-gpus > https://developer.nvidia.com/cuda-legacy-gpus > http://www.nvidia.com/object/product_quadro_fx_3800_us.html and I would have thought the Quadro FX 3800 supported CUDA (at least at the beginning).
Yes, the Quadro FX 3800 GPU is no longer supported by CUDA 7.0 and beyond.
The last CUDA version that supported that GPU was CUDA 6.5.
This answer and this answer may be of interest. Your QFX 3800 is a compute capability 1.3 device.
If you review the release notes that come with CUDA 7, you will find a notice of the elimination of support for these earlier GPUs. Likewise, the newer CUDA driver versions also don't support those GPUs.

exception (first chance) ... cudaError_enum at memory

I am working on a project that produces this error; some research showed that the problem lies with the cuBLAS library.
So now I have the following "minimal" problem:
I opened the simpleCUBLAS example from the NVIDIA CUDA SDK (4.2) to test whether I can reproduce the problem.
The program itself works, but VS2010 gives me similar output:
First-chance exception at 0x75e3c41f in simpleCUBLAS.exe: Microsoft C++ exception: cudaError_enum at memory location 0x003bf704..
This message appears 7 times.
My specs:
I use a GTX 460 for computing, compile with sm_20, and use VS2010 on Windows 7 64-bit.
nvcc --version gives me:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Fri_Jan_13_01:18:37_PST_2012
Cuda compilation tools, release 4.1, V0.2.1221
This is my first time posting here, so I apologize for the poor formatting.
The observation you are making has to do with an exception that is caught and handled properly within the CUDA libraries. It is, in some cases, a normal part of CUDA GPU operation. As you have observed, your application returns no API errors and runs correctly. If you were not within the VS environment that can report this, you would not observe this at all.
This is considered normal behavior under CUDA. I believe there were some attempts to eliminate it in CUDA 5.5. You might wish to try that, although it's not considered an issue either way.
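The idea that a library can raise and handle an exception internally without the caller ever seeing an error can be illustrated with a small Python analogy. This is not CUDA code; InternalError, _probe and library_call are hypothetical names, and the sketch only mirrors the behavior described above.

```python
class InternalError(Exception):
    """Stands in for the cudaError_enum exception thrown inside the library."""


def _probe(value):
    # An internal step that signals a condition by raising.
    if value < 0:
        raise InternalError(value)
    return value


def library_call(value):
    """Return 0 (success) even when an internal exception was raised."""
    try:
        _probe(value)
    except InternalError:
        # Handled here: a debugger could still report the raise
        # (a "first-chance" notification), but the caller never sees it.
        pass
    return 0


status = library_call(-1)
print(status)  # 0
```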

CUDA Toolkit 4.1/4.2: nvcc Crashes with an Access Violation

I am developing a CUDA application for GTX 580 with Visual Studio 2010 Professional on Windows 7 64bit. My project builds fine with CUDA Toolkit 4.0, but nvcc crashes when I choose CUDA Toolkit 4.1 or 4.2 with the following error:
1> Stack dump:
1> 0. Running pass 'Promote Constant Global' on module 'moduleOutput'.
1>CUDACOMPILE : nvcc error : 'cicc' died with status 0xC0000005 (ACCESS_VIOLATION)
Strangely enough, the program compiles OK with "compute_10,sm_10" specified for "Code Generation", but "compute_20,sm_20" does not work. The code in question can be downloaded here:
http://www.meriken2ch.com/files/CUDA_SHA-1_Tripper_MERIKENs_Branch_0.04_Alpha_1.zip
(README.txt is in Japanese, but comments in source files are in English.)
I suspect a newly introduced bug in CUDA Toolkit 4.1/4.2. Has anybody encountered this issue? Is there a workaround for it? Any help will be much appreciated.
This appears to have been a compiler bug in CUDA 4.x that is fixed in CUDA 5.0 (according to a comment from @meriken2ch, the project builds fine with CUDA 5.0 RC).

Why CUDA Command line profiler doesn't recognize some counters?

I am working remotely on a CUDA program in a Linux environment. Since there are problems with X forwarding, I cannot use the CUDA Visual Profiler and have to use the CUDA command-line profiler instead.
The problem is that it doesn't recognize some basic counters I want it to track. For example, running the program with the following command
COMPUTE_PROFILE=1 COMPUTE_PROFILE_CSV=0 COMPUTE_PROFILE_LOG=log \
CUDA_PROFILE_CONFIG=Config.txt ./my_program
With the Config.txt file being:
warp_serialize
shared_replay_overhead
Results in the following log:
NV_Warning: Ignoring the invalid profiler config option: warp_serialize
NV_Warning: Ignoring the invalid profiler config option: shared_replay_overhead
CUDA_PROFILE_LOG_VERSION 2.0
CUDA_DEVICE 0 GeForce GTX 580
CUDA_CONTEXT 1
TIMESTAMPFACTOR fffff6c8b2653dd8
...
My environment specifications:
Card: GeForce GTX 580
CUDA Driver Version / Runtime Version: 4.1 / 4.1
CUDA Capability Major/Minor version number: 2.0
Any ideas what I might be doing wrong?
The warp_serialize counter is not supported for devices of compute capability 2.x. See Table 6, Profiler Counter Types, in the Compute Visual Profiler User Guide.
As for shared_replay_overhead, I have not found anything related to it.
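Rejected counters can be listed by scanning the profiler log for the warning shown in the question. A minimal sketch, assuming the warning text has exactly the form quoted above; rejected_counters is a hypothetical helper and not part of the profiler.

```python
# Warning prefix copied from the log in the question (assumed stable).
_PREFIX = "NV_Warning: Ignoring the invalid profiler config option:"


def rejected_counters(log_text):
    """Return the counter names the profiler reported as invalid."""
    rejected = []
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(_PREFIX):
            rejected.append(line[len(_PREFIX):].strip())
    return rejected


sample_log = """\
NV_Warning: Ignoring the invalid profiler config option: warp_serialize
NV_Warning: Ignoring the invalid profiler config option: shared_replay_overhead
CUDA_PROFILE_LOG_VERSION 2.0
CUDA_DEVICE 0 GeForce GTX 580
"""
print(rejected_counters(sample_log))
# ['warp_serialize', 'shared_replay_overhead']
```

Comparing this list against the counters documented for the device's compute capability shows which entries in Config.txt need to be replaced.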