CUDA compiler warning about unrecognized GCC pragma - cuda

There is some C++ code in a CUDA file that uses this pragma:
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-result"
void foobar()
{
// some code
}
#pragma GCC diagnostic pop
When this CUDA file is compiled using CUDA 5.5 nvcc compiler, the host compiler stage is fine, but the device compiler stage produces this warning:
foobar.cu(420): warning: unrecognized GCC pragma
It looks like the CUDA compiler understands that this is a GCC pragma. I have no idea why it is trying to understand all GCC pragmas. Is there any method to fix this warning or make this warning go away?
Update: Please note that passing the -Xcudafe "--diag_suppress=unrecognized_pragma" option to the nvcc compiler does not seem to have any effect.

Try this one:
-Xcudafe "--diag_suppress=unrecognized_gcc_pragma"

As shown in the duplicate question, you need to pass the following flag to nvcc:
-Xcudafe "--diag_suppress=unrecognized_pragma"

Related

Undefined reference to `cublasCreate_v2’ in ‘/tmp/tmpxft_0000120b_0000000-10_my_program”

I have tried to compile a code using CUDA 9.0 toolkit on NVIDIA Tesla P100 graphic card (Ubuntu version 16.04) and CUBLAS library is used in the code. For compilation, I have used the following command to compile “my_program.cu”
nvcc -std=c++11 -L/usr/local/cuda-9.0/lib64 my_program.cu -o mu_program.o -lcublas
But, I have got the following error:
nvlink error: Undefined reference to 'cublasCreate_v2’in '/tmp/tmpxft_0000120b_0000000-10_my_program’
As I have already linked the library path in the compilation command, why do I still get the error. Please help me to solve this error.
It seems fairly evident that you are trying to use the CUBLAS library in device code. This is different than ordinary host usage and requires special compilation/linking steps. You need to:
compile for the correct device architecture (must be cc3.5 or higher)
use relocatable device code linking
link in the cublas device library (in addition to the cublas host library)
link in the CUDA device runtime library
Use a CUDA toolkit prior to CUDA 10.0
The following additions to your compile command line should get you there:
nvcc -std=c++11 my_program.cu -o my_program.o -lcublas -arch=sm_60 -rdc=true -lcublas_device -lcudadevrt
The above assumes you are actually using a proper install of CUDA 9.0. The CUBLAS device library was deprecated and is now removed from newer CUDA toolkits (see here).

Using CUDA 8.0 with GCC 6.x - bad function overloading complaint

I'm trying to build some CUDA code using GCC 6.2.1, the default compiler of my distribution (Note: not a GCC version officially supported by CUDA, so you can call this experimental). This is code which builds fine with GCC 4.9.3 and both CUDA versions 7.5 and 8.0.
Well, if I build the following (close-to) minimal example:
#include <tuple>
int main() { return 0; }
with the command-line
nvcc -std=c++11 -Wno-deprecated-gpu-targets -o main main.cu
I get the following errors:
/usr/local/cuda/bin/../targets/x86_64-linux/include/math_functions.h(8897): error: cannot overload functions distinguished by return type alone
/usr/local/cuda/bin/../targets/x86_64-linux/include/math_functions.h(8901): error: cannot overload functions distinguished by return type alone
2 errors detected in the compilation of "/tmp/tmpxft_000071fe_00000000-9_b.cpp1.ii".
Why is that? How can I correct/circumvent this?
TL;DR: Forget about it. Only use CUDA 8.x with GCC 5.x , and CUDA 9 or later with GCC 6.x
It seems other people have seen this issue with GCC 6.1.x and the suggestion is to add the following flags to nvcc: -Xcompiler -D__CORRECT_ISO_CPP11_MATH_H_PROTO (yes, two successive flags; see nvcc --help for details). (But I can't report complete success since other issues pop up instead.)
But remember that GCC 5.4.x is the latest supported version, and that probably has a good reason, so it's somewhat of a wild goose chase to force GCC 6.x onto it - especially when CUDA 9 is available now.

How to force cubin file generation for a higher compute version

In the samples provided with CUDA 6.0, I'm running the following compile command with error output:
foo#foo:/usr/local/cuda-6.0/samples/0_Simple/cdpSimpleQuicksort$ nvcc --cubin -I../../common/inc cdpSimpleQuicksort.cu
nvcc warning : The 'compute_10' and 'sm_10' architectures are deprecated, and may be removed in a future release.
cdpSimpleQuicksort.cu(105): error: calling a __global__ function("cdp_simple_quicksort") from a __global__ function("cdp_simple_quicksort") is only allowed on the compute_35 architecture or above
cdpSimpleQuicksort.cu(114): error: calling a __global__ function("cdp_simple_quicksort") from a __global__ function("cdp_simple_quicksort") is only allowed on the compute_35 architecture or above
2 errors detected in the compilation of "/tmp/tmpxft_0000241a_00000000-6_cdpSimpleQuicksort.cpp1.ii".
I then altered the command to this, with a new failure:
foo#foo:/usr/local/cuda-6.0/samples/0_Simple/cdpSimpleQuicksort$ nvcc --cubin -I../../common/inc -gencode arch=compute_35,code=sm_35 cdpSimpleQuicksort.cu
cdpSimpleQuicksort.cu(105): error: kernel launch from __device__ or __global__ functions requires separate compilation mode
cdpSimpleQuicksort.cu(114): error: kernel launch from __device__ or __global__ functions requires separate compilation mode
2 errors detected in the compilation of "/tmp/tmpxft_000024f3_00000000-6_cdpSimpleQuicksort.cpp1.ii".
Does this have anything to do with the fact that the machine I'm on is only Compute 2.1 capable and the build tools are blocking me? What's the resolution... I'm not finding anything in the documentation that is clearly handling this error.
I looked at this question, and that... a link to documentation is simply not helping. I need to know how I have to modify the compile command.
Look at the makefile that comes with that cdpSimpleQuicksort project. It shows some additional switches that are needed to compile it, due to CUDA dynamic parallelism (which is essentially the second set of errors you are seeing.) Go back and study that makefile, and see if you can figure out how to combine some of the compile commands there with --cubin.
The readers digest version is that this should compile without error:
nvcc --cubin -rdc=true -I../../common/inc -arch=sm_35 cdpSimpleQuicksort.cu
Having said all that, you should be able to compile for whatever kind of target you want, but you won't be able to run a cdp code on a cc2.1 architecture.
cdp documentation
and here

Cuda-gdb not stopping at breakpoints inside kernel

Cuda-gdb was obeying all the breakpoints I would set, before adding '-arch sm_20' flag while compiling. I had to add this to avoid error being thrown : 'atomicAdd is undefined' (as pointed here). Here is my current statement to compile the code:
nvcc -g -G --maxrregcount=32 Main.cu -o SW_exe (..including header files...) -arch sm_20
and when I set a breakpoint inside kernel, cuda-gdb stops once at the last line of the kernel, and then the program continues.
(cuda-gdb) b SW_kernel_1.cu:49
Breakpoint 1 at 0x4114a0: file ./SW_kernel_1.cu, line 49.
...
[Launch of CUDA Kernel 5 (diagonalComputation<<<(1024,1,1),(128,1,1)>>>) on Device 0]
Breakpoint 1, diagonalComputation (__cuda_0=15386, __cuda_1=128, __cuda_2=0xf00400000, __cuda_3=0xf00200000,
__cuda_4=100, __cuda_5=0xf03fa0000, __cuda_6=0xf04004000, __cuda_7=0xf040a0000, __cuda_8=0xf00200200,
__cuda_9=15258, __cuda_10=5, __cuda_11=-3, __cuda_12=8, __cuda_13=1) at ./SW_kernel_1.cu:183
183 }
(cuda-gdb) c
Continuing.
But as I said, if I remove the 'atomicAdd()' call and the flag '-arch sm_20' which though makes my code incorrect, but now the cuda-gdb stops at the breakpoint I specify. Please tell me the reasons of this behaviour.
I am using CUDA 5.5 on Tesla M2070 (Compute Capability = 2.0).
Thanks!
From the CUDA DEBUGGER User Manual, Section 3.3.1:
NVCC, the NVIDIA CUDA compiler driver, provides a mechanism for generating the
debugging information necessary for CUDA-GDB to work properly. The -g -G option
pair must be passed to NVCC when an application is compiled in order to debug with
CUDA-GDB; for example,
nvcc -g -G foo.cu -o foo
Using this line to compile the CUDA application foo.cu
forces -O0 compilation, with the exception of very limited dead-code eliminations
and register-spilling optimizations.
makes the compiler include debug information in the executable
This means that, in principle, breakpoints could not be hit in kernel functions even when the code is compiled in debug mode since the CUDA compiler can perform some code optimizations and so the disassembled code could not correspond to the CUDA instructions.
When breakpoints are not hit, a workaround is to put a printf statement immediately after the variable one wants to check, as suggested by Robert Crovella at
CUDA debugging with VS - can't examine restrict pointers (Operation is not valid)
The OP has chosen here a different workaround, i.e., to compile for a different architecture. Indeed, the optimization the compiler does can change from architecture to architecture.

CUDA: nvcc fatal error when trying to make CUDASW++

I keep getting:
nvcc fatal : Value 'sm_20' is not defined for option 'gpu-name'
My GPU is a GTX 590 and is indeed version 2.0 so that's not the problem. I switched to a lower version (sm_20) and get tons of errors with .h files.
Any ideas on what to try? I'm using cuda 5.0.
You could try compute_20 instead of sm_20.
Looking at the nvcc documentation in CUDA 5.0, the --gpu-name command line option is not mentioned. I guess it is an old option and you should probably instead use the -arch and/or -code options.