Using CUDA 8.0 with GCC 6.x - bad function overloading complaint - cuda

I'm trying to build some CUDA code using GCC 6.2.1, the default compiler of my distribution (Note: not a GCC version officially supported by CUDA, so you can call this experimental). This is code which builds fine with GCC 4.9.3 and both CUDA versions 7.5 and 8.0.
Well, if I build the following (close-to) minimal example:
#include <tuple>
int main() { return 0; }
with the command-line
nvcc -std=c++11 -Wno-deprecated-gpu-targets -o main main.cu
I get the following errors:
/usr/local/cuda/bin/../targets/x86_64-linux/include/math_functions.h(8897): error: cannot overload functions distinguished by return type alone
/usr/local/cuda/bin/../targets/x86_64-linux/include/math_functions.h(8901): error: cannot overload functions distinguished by return type alone
2 errors detected in the compilation of "/tmp/tmpxft_000071fe_00000000-9_b.cpp1.ii".
Why is that? How can I correct/circumvent this?

TL;DR: Forget about it. Only use CUDA 8.x with GCC 5.x , and CUDA 9 or later with GCC 6.x
It seems other people have seen this issue with GCC 6.1.x and the suggestion is to add the following flags to nvcc: -Xcompiler -D__CORRECT_ISO_CPP11_MATH_H_PROTO (yes, two successive flags; see nvcc --help for details). (But I can't report complete success since other issues pop up instead.)
But remember that GCC 5.4.x is the latest supported version, and that probably has a good reason, so it's somewhat of a wild goose chase to force GCC 6.x onto it - especially when CUDA 9 is available now.

Related

Undefined reference to `cublasCreate_v2’ in ‘/tmp/tmpxft_0000120b_0000000-10_my_program”

I have tried to compile a code using CUDA 9.0 toolkit on NVIDIA Tesla P100 graphic card (Ubuntu version 16.04) and CUBLAS library is used in the code. For compilation, I have used the following command to compile “my_program.cu”
nvcc -std=c++11 -L/usr/local/cuda-9.0/lib64 my_program.cu -o mu_program.o -lcublas
But, I have got the following error:
nvlink error: Undefined reference to 'cublasCreate_v2’in '/tmp/tmpxft_0000120b_0000000-10_my_program’
As I have already linked the library path in the compilation command, why do I still get the error. Please help me to solve this error.
It seems fairly evident that you are trying to use the CUBLAS library in device code. This is different than ordinary host usage and requires special compilation/linking steps. You need to:
compile for the correct device architecture (must be cc3.5 or higher)
use relocatable device code linking
link in the cublas device library (in addition to the cublas host library)
link in the CUDA device runtime library
Use a CUDA toolkit prior to CUDA 10.0
The following additions to your compile command line should get you there:
nvcc -std=c++11 my_program.cu -o my_program.o -lcublas -arch=sm_60 -rdc=true -lcublas_device -lcudadevrt
The above assumes you are actually using a proper install of CUDA 9.0. The CUBLAS device library was deprecated and is now removed from newer CUDA toolkits (see here).

fail to link cuda example with clang++-9 under Ubuntu 18.04

I am trying to follow the example in
https://llvm.org/docs/CompileCudaWithLLVM.html#invoking-clang
I use Ubuntu 18.04.3 LTS, clang version 9.0.0-2
The device I have is (snippet from the output of deviceQuery):
Detected 1 CUDA Capable device(s)
Device 0: "Quadro P520"
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 6.1
I ran the command:
clang++-9 --verbose --cuda-path=/usr/local/cuda-10.2 axpy.cu -o axpy --cuda-gpu-arch=sm_61 -L/usr/local/cuda-10.2 -lcudart_static -ldl -lrt -pthread
And the output is:
clang version 9.0.0-2~ubuntu18.04.1 (tags/RELEASE_900/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/i686-linux-gnu/8
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/7.4.0
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.4.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/7.4.0
Candidate multilib: .;#m64
Selected multilib: .;#m64
Found CUDA installation: /usr/local/cuda-10.2, version unknown
clang: error: cannot find libdevice for sm_61. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
As far as I can tell, libdevice is right where it should be:
~>ls /usr/local/cuda-10.2/nvvm/libdevice/
libdevice.10.bc
What am I doing wrong ?
Added Nov 2020:
Following #ArtemB comment, I tried running it with clang++-10, which throws a warning, but compiles and runs just fine.
Short answer: The version of cuda my driver supports (10.2) is too current for my clang (9.0.0).
Here is the top of the output of nvidia-smi on my machine:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
So my driver indeed supports cuda-10.2. However, it seems this version is not supported by clang 9.0.0. Indeed when running the above command with the extra flag -nocudalib , one gets the following response (only showing the last lines):
In file included from <built-in>:1:
/usr/lib/llvm-9/lib/clang/9.0.0/include/__clang_cuda_runtime_wrapper.h:52:2: error: "Unsupported CUDA version!"
#error "Unsupported CUDA version!"
^
axpy.cu:23:7: error: use of undeclared identifier cudaConfigureCall
axpy<<<1, kDataLen>>>(a, device_x, device_y);
^
2 errors generated when compiling for sm_61.
When inspecting the offending file (the clang cuda runtime wrapper), one sees the following in lines 48-53:
#include "cuda.h"
#if !defined(CUDA_VERSION)
#error "cuda.h did not define CUDA_VERSION"
#elif CUDA_VERSION < 7000 || CUDA_VERSION > 10010
#error "Unsupported CUDA version!"
#endif
Until recently clang was rather particular about CUDA versions. I've relaxed it a bit lately, so clang-10 is more lenient and will attempt to use a newer CUDA version at a feature parity with the latest supported CUDA version (currently 10.1). It will also issue a warning. It does work with CUDA-11.0 well enough to compile Tensorflow.
CUDA-11.1 (and I believe 11.0 update1 on windows) have dropped the version.txt file from the distribution and that will break CUDA compilation with the currently released clang versions, again. This should be fixed in clang-11.0.1 when it's released (version match with CUDA is purely coincidental).

How to force cubin file generation for a higher compute version

In the samples provided with CUDA 6.0, I'm running the following compile command with error output:
foo#foo:/usr/local/cuda-6.0/samples/0_Simple/cdpSimpleQuicksort$ nvcc --cubin -I../../common/inc cdpSimpleQuicksort.cu
nvcc warning : The 'compute_10' and 'sm_10' architectures are deprecated, and may be removed in a future release.
cdpSimpleQuicksort.cu(105): error: calling a __global__ function("cdp_simple_quicksort") from a __global__ function("cdp_simple_quicksort") is only allowed on the compute_35 architecture or above
cdpSimpleQuicksort.cu(114): error: calling a __global__ function("cdp_simple_quicksort") from a __global__ function("cdp_simple_quicksort") is only allowed on the compute_35 architecture or above
2 errors detected in the compilation of "/tmp/tmpxft_0000241a_00000000-6_cdpSimpleQuicksort.cpp1.ii".
I then altered the command to this, with a new failure:
foo#foo:/usr/local/cuda-6.0/samples/0_Simple/cdpSimpleQuicksort$ nvcc --cubin -I../../common/inc -gencode arch=compute_35,code=sm_35 cdpSimpleQuicksort.cu
cdpSimpleQuicksort.cu(105): error: kernel launch from __device__ or __global__ functions requires separate compilation mode
cdpSimpleQuicksort.cu(114): error: kernel launch from __device__ or __global__ functions requires separate compilation mode
2 errors detected in the compilation of "/tmp/tmpxft_000024f3_00000000-6_cdpSimpleQuicksort.cpp1.ii".
Does this have anything to do with the fact that the machine I'm on is only Compute 2.1 capable and the build tools are blocking me? What's the resolution... I'm not finding anything in the documentation that is clearly handling this error.
I looked at this question, and that... a link to documentation is simply not helping. I need to know how I have to modify the compile command.
Look at the makefile that comes with that cdpSimpleQuicksort project. It shows some additional switches that are needed to compile it, due to CUDA dynamic parallelism (which is essentially the second set of errors you are seeing.) Go back and study that makefile, and see if you can figure out how to combine some of the compile commands there with --cubin.
The readers digest version is that this should compile without error:
nvcc --cubin -rdc=true -I../../common/inc -arch=sm_35 cdpSimpleQuicksort.cu
Having said all that, you should be able to compile for whatever kind of target you want, but you won't be able to run a cdp code on a cc2.1 architecture.
cdp documentation
and here

CUDA compiler warning about unrecognized GCC pragma

There is some C++ code in a CUDA file that uses this pragma:
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-result"
void foobar()
{
// some code
}
#pragma GCC diagnostic pop
When this CUDA file is compiled using CUDA 5.5 nvcc compiler, the host compiler stage is fine, but the device compiler stage produces this warning:
foobar.cu(420): warning: unrecognized GCC pragma
It looks like the CUDA compiler understands that this is a GCC pragma. I have no idea why it is trying to understand all GCC pragmas. Is there any method to fix this warning or make this warning go away?
Update: Please note that passing the -Xcudafe "--diag_suppress=unrecognized_pragma" option to the nvcc compiler does not seem to have any effect.
Try this one:
-Xcudafe "--diag_suppress=unrecognized_gcc_pragma"
As shown in the duplicate question, you need to pass the following flag to nvcc:
-Xcudafe "--diag_suppress=unrecognized_pragma"

CUDA: nvcc fatal error when trying to make CUDASW++

I keep getting:
nvcc fatal : Value 'sm_20' is not defined for option 'gpu-name'
My GPU is a GTX 590 and is indeed version 2.0 so that's not the problem. I switched to a lower version (sm_20) and get tons of errors with .h files.
Any ideas on what to try? I'm using cuda 5.0.
You could try compute_20 instead of sm_20.
Looking at the nvcc documentation in CUDA 5.0, the --gpu-name command line option is not mentioned. I guess it is an old option and you should probably instead use the -arch and/or -code options.