PyCUDA C++ kernel "error: this declaration may not have extern "C" linkage"

I tried using std::tuple in my kernel code, but received many "this declaration may not have extern "C" linkage" errors that pointed into the <utility> and <tuple> headers.
It complains about the include itself; the following minimal example reproduces it for me:
from pycuda.compiler import SourceModule
mod = SourceModule("""#include <tuple>""")
Do I need to do something special in my kernel code or in my Python code to specify I want to use the C++ compiler?
Cuda version: 11.8
PyCuda version: 2022.2.1

Do I need to do something special in my kernel code or in my Python code to specify I want to use the C++ compiler?
To be clear, you are using the C++ compiler. But PyCUDA automagically wraps the code you pass into a SourceModule instance in extern “C” unless you explicitly tell it not to:
Unless no_extern_c is True, the given source code is wrapped in extern “C” { … } to prevent C++ name mangling.
The underlying reason, from a C++ perspective, is that template instantiations of types and functions cannot be given C linkage, hence the error.
However, even after you fix that problem, be prepared to be disappointed. CUDA supports a lot of C++ language features, but it doesn't support the C++ standard library, and you can't use std::tuple within kernel code. NVIDIA does provide its own (very limited) reimplementation of parts of the C++ standard library (libcu++), and it does have a basic tuple type. That might work for you.
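For example, here is a minimal sketch of both fixes together: pass no_extern_c=True, wrap only the kernel itself in extern "C" so get_function can still find the unmangled name, and use libcu++'s cuda::std::tuple instead of std::tuple. The kernel below is illustrative, and it assumes a CUDA 11.x toolkit that ships the <cuda/std/tuple> header:

import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

mod = SourceModule("""
#include <cuda/std/tuple>

// Only the kernel itself gets C linkage, so its name is not mangled
// and get_function("unpack") can locate it.
extern "C" __global__ void unpack(float *out)
{
    cuda::std::tuple<int, float> t(2, 3.5f);
    out[threadIdx.x] = cuda::std::get<0>(t) * cuda::std::get<1>(t);
}
""", no_extern_c=True)

unpack = mod.get_function("unpack")
out = np.zeros(4, dtype=np.float32)
unpack(cuda.Out(out), block=(4, 1, 1), grid=(1, 1))
print(out)  # expected: [7. 7. 7. 7.]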

Related

Is there a way to specify __device__ for an entire file? (NVIDIA CUDA compiler)

I am importing a library and I get this error when compiling:
go.cu(61): error: calling a __host__ function("TinyJS::Interpreter::Interpreter()") from a __global__ function("capnduk_kernel") is not allowed
...is there a way to port an entire file (TinyJS) to run on the device?
I've checked the compiler documentation, and it doesn't look like there's a way to do this. I'm guessing the only way is to rewrite the file by hand, which is a can of worms.
There isn't a way to do this with nvcc. It will require manual effort.
While NVCC does not support this (as Robert points out), there is such an option for run-time compilation via the NVRTC library:
Documentation lists the following compilation option:
--device-as-default-execution-space (-default-device)
Treat entities with no execution space annotation as __device__ entities.
Notes:
With this being the case, I would consider submitting a bug report to NVIDIA and asking them to add this option to NVCC.
clang++ supports compiling CUDA, perhaps it has such a flag.
NVRTC is also supported by the Modern C++ wrappers library for CUDA, which is more convenient to use than working with NVRTC directly. (Caveat: that's my own library.)
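As a rough sketch of what the NVRTC option looks like in practice, here it is passed from Python, assuming NVIDIA's cuda-python nvrtc bindings are installed (not mentioned above; the source string and names are made up for illustration):

from cuda import nvrtc

src = b"""
int add_one(int x) { return x + 1; }   // no execution-space annotation

__global__ void kernel(int *out) { out[0] = add_one(41); }
"""

# Create the program and compile with the NVRTC-only option, which makes
# the unannotated add_one be treated as a __device__ function.
err, prog = nvrtc.nvrtcCreateProgram(src, b"demo.cu", 0, [], [])
opts = [b"--device-as-default-execution-space"]
err, = nvrtc.nvrtcCompileProgram(prog, len(opts), opts)

# Retrieve the generated PTX (error checking omitted for brevity).
err, size = nvrtc.nvrtcGetPTXSize(prog)
ptx = b" " * size
err, = nvrtc.nvrtcGetPTX(prog, ptx)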

How to find the CUDA __device__ definition of a function?

I am trying to find the source definition for a specific function, specifically the definition the nvcc compiler uses. This question is phrased to apply to any function (or symbol, I suppose) that is used in a __device__ function. Given:
__device__ void Foo() {
    int x = round(0.0f);
}
What is the standard/canonical/recommended way to find the definition for "round( float )" used by the nvcc compiler to generate device code?
Normally I use Visual Studio's F1 "Go to Definition", or search for "round" in the project files, etc. I have also searched the CUDA Toolkit documentation and the CUDA Math API. In this case, I find the Visual Studio cmath definition. But how do I determine which definition the nvcc compiler actually uses?
What is the standard/canonical/recommended way to find the definition for "round( float )" used by the nvcc compiler to generate device code?
Disassembly. Most built-in functions exist as stubs in headers that are expanded into inline assembly sequences as part of a device-compiler code-generation pass. There is no input source code to view.
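For what it's worth, one way to look at that generated code is to compile a tiny kernel and dump the SASS with cuobjdump. A sketch, assuming nvcc and cuobjdump from the toolkit are on PATH and using a throwaway round_demo.cu (the kernel is illustrative):

import pathlib
import subprocess

pathlib.Path("round_demo.cu").write_text("""
__global__ void demo(float in, float *out) { *out = round(in); }
""")

# Compile for a concrete architecture, then disassemble the machine code.
subprocess.run(["nvcc", "-cubin", "-arch=sm_70", "round_demo.cu",
                "-o", "round_demo.cubin"], check=True)
sass = subprocess.run(["cuobjdump", "-sass", "round_demo.cubin"],
                      capture_output=True, text=True, check=True).stdout
# The instruction sequence that round() expands to shows up here; there is
# no separate C source definition to jump to.
print(sass)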

CUDA constant memory symbols

I am using CUDA 5.0 and I have modules which are compiled separately.
I would like to access the same value in the constant memory from all modules.
The problem is the following: when I define the symbol in each module, the linker claims that the symbol has been redefined.
Is there a workaround or a solution for this problem?
Thank you for helping.
In CUDA separate compilation mode there is a true linker, and every symbol which is linked into the final device binary payload must be uniquely defined. This means __constant__ memory symbols must be defined in only one place across all the code which is linked together.
The solution is to declare the symbol as extern in every translation unit except the one which contains the definition of the symbol. Note that this is the only case where it is valid to use extern with __constant__ symbols; otherwise they are implicitly static. There is a general discussion of the separate compilation model which describes this scenario buried in the documentation (both the programming guide and the nvcc manual, IIRC).
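A minimal sketch of that pattern, driven from Python for convenience (the file names, symbol name and kernels are made up; it assumes nvcc is on PATH and uses separate compilation with -dc plus a device-link step):

import pathlib
import subprocess

# a.cu: the single translation unit that DEFINES the __constant__ symbol.
pathlib.Path("a.cu").write_text("""
__constant__ float coeffs[16];

__global__ void fill(float *out) { out[threadIdx.x] = coeffs[threadIdx.x]; }
""")

# b.cu: every other translation unit only DECLARES it with extern.
pathlib.Path("b.cu").write_text("""
extern __constant__ float coeffs[16];

__global__ void scale(float *data) { data[threadIdx.x] *= coeffs[0]; }
""")

# Separate compilation of each unit, then a device link; with only one
# definition of coeffs, the link succeeds.
subprocess.run(["nvcc", "-dc", "a.cu", "-o", "a.o"], check=True)
subprocess.run(["nvcc", "-dc", "b.cu", "-o", "b.o"], check=True)
subprocess.run(["nvcc", "-dlink", "a.o", "b.o", "-o", "device_link.o"], check=True)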

Visual Studio + Nsight : __syncthreads() undefined [duplicate]

At the moment, Visual Studio already recognizes key CUDA C/C++ functions such as cudaMalloc, cudaFree, cudaEventCreate, etc.
It also recognizes certain types like dim3 and cudaEvent_t.
However, it doesn't recognize other functions and types, such as the texture template, the __syncthreads function, or the atomicCAS function.
Everything compiles just fine, but I'm tired of seeing red underlining all over the place, and I want to see the parameter hints displayed when I type any recognized function.
How do I get VS to catch these functions?
You could create a dummy #include file of the following form:
#pragma once
#ifdef __INTELLISENSE__
void __syncthreads();
...
#endif
This should hide the fake prototypes from the CUDA and Visual C++ compilers, but still make them visible to IntelliSense.
Source for __INTELLISENSE__ macro: http://blogs.msdn.com/b/vcblog/archive/2011/03/29/10146895.aspx
You need to add CUDA-specific keywords like __syncthreads to the usertype.dat file for Visual Studio. An example usertype.dat file is included with the NVIDIA CUDA SDK. You also need to make sure that Visual Studio recognizes .cu files as C/C++ files, as described in this post:
Note however that where that post uses $(CUDA_INC_PATH), with recent versions of CUDA you should use $(CUDA_PATH)/include.
Also, I would recommend Visual Assist X -- not free, but worth the money -- to improve IntelliSense. It works well with CUDA if you follow these instructions:
http://www.wholetomato.com/forum/topic.asp?TOPIC_ID=5481
http://forums.nvidia.com/index.php?showtopic=53690

Does CUDA use an interpreter or a compiler?

This is a bit of a silly question, but I'm wondering if CUDA uses an interpreter or a compiler?
I'm wondering because I'm not quite sure how CUDA manages to get source code to run on two cards with different compute capabilities.
From Wikipedia:
Programmers use 'C for CUDA' (C with Nvidia extensions and certain restrictions), compiled through a PathScale Open64 C compiler.
So, your answer is: it uses a compiler.
And to touch on the reason it can run on multiple cards (source):
CUDA C/C++ provides an abstraction, it's a means for you to express how you want your program to execute. The compiler generates PTX code which is also not hardware specific. At runtime the PTX is compiled for a specific target GPU - this is the responsibility of the driver which is updated every time a new GPU is released.
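A small sketch of those two stages, assuming nvcc is on PATH and PyCUDA is installed (the kernel and file names are illustrative): the PTX is produced offline against a virtual architecture, and the driver JIT-compiles it for whatever GPU is present when the module is loaded.

import pathlib
import subprocess
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda

pathlib.Path("scale.cu").write_text("""
extern "C" __global__ void scale(float *x, float a) { x[threadIdx.x] *= a; }
""")

# Stage 1: compile the C++ source to PTX for a virtual architecture (not a GPU).
subprocess.run(["nvcc", "-ptx", "-arch=compute_52", "scale.cu",
                "-o", "scale.ptx"], check=True)

# Stage 2: the driver JIT-compiles the PTX for the installed GPU at load time.
mod = cuda.module_from_buffer(pathlib.Path("scale.ptx").read_bytes())
scale = mod.get_function("scale")

x = np.arange(4, dtype=np.float32)
scale(cuda.InOut(x), np.float32(10.0), block=(4, 1, 1), grid=(1, 1))
print(x)  # expected: [ 0. 10. 20. 30.]
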
These official documents CUDA C Programming Guide and The CUDA Compiler Driver (NVCC) explain all the details about the compilation process.
From the second document:
nvcc mimics the behavior of the GNU compiler gcc: it accepts a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process.
This isn't limited to CUDA: shaders in DirectX or OpenGL are also compiled to some kind of bytecode and converted to native code by the underlying driver.