Is there a way to specify __device__ for an entire file? (NVIDIA CUDA Compiler)

I am importing a library and I get this error when compiling:
go.cu(61): error: calling a __host__ function("TinyJS::Interpreter::Interpreter()") from a __global__ function("capnduk_kernel") is not allowed
...is there a way to port an entire file (TinyJS) to run on the device?
I've checked the compiler documentation, and it doesn't look like there's a way to do this. I'm guessing the only way is to rewrite the file by hand, which is a can of worms.

There isn't a way to do this with nvcc. It will require manual effort.

While NVCC does not support this (as Robert points out), it is an option for run-time compilation via the NVRTC library. The NVRTC documentation lists the following compilation option:
--device-as-default-execution-space (-default-device)
Treat entities with no execution space annotation as __device__ entities.
Notes:
Given that NVRTC supports it, you might consider submitting a feature request asking NVIDIA to add this option to NVCC as well.
clang++ also supports compiling CUDA; perhaps it has a similar flag.
NVRTC is also supported by the Modern-C++ wrappers library for CUDA, which is more convenient to use than working with NVRTC directly. (Caveat: that's my own library.)
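For illustration, here is a minimal sketch of passing that option through the NVRTC C API; the kernel source, the file name, and the lack of error checking are all just for the example:

#include <nvrtc.h>
#include <cstdio>
#include <vector>

int main() {
    // An unannotated function plus a kernel that calls it; without the
    // option below, NVRTC would treat square() as a __host__ function
    // and reject the call, just like in the question.
    const char* src =
        "int square(int x) { return x * x; }\n"
        "__global__ void kern(int* out) { *out = square(7); }\n";

    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, src, "demo.cu", 0, nullptr, nullptr);

    const char* opts[] = { "--device-as-default-execution-space" };
    nvrtcResult res = nvrtcCompileProgram(prog, 1, opts);

    if (res == NVRTC_SUCCESS) {
        size_t ptxSize;
        nvrtcGetPTXSize(prog, &ptxSize);
        std::vector<char> ptx(ptxSize);
        nvrtcGetPTX(prog, ptx.data());
        printf("compiled to %zu bytes of PTX\n", ptxSize);
    }
    nvrtcDestroyProgram(&prog);
    return 0;
}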

Related

PyCuda C++ kernel "error: this declaration may not have extern "C" linkage"

I tried using std::tuple in my kernel code, but received many errors of the form "this declaration may not have extern "C" linkage", pointing into the <utility> and <tuple> headers.
It complains at the include itself. The following minimal example reproduces the error for me:
import pycuda.autoinit  # sets up the CUDA context that SourceModule needs
from pycuda.compiler import SourceModule
mod = SourceModule("""#include <tuple>""")  # this line triggers the errors
Do I need to do something special in my kernel code or in my Python code to specify I want to use the C++ compiler?
CUDA version: 11.8
PyCUDA version: 2022.2.1
To be clear, you are using the C++ compiler. But PyCUDA automagically wraps the code you pass into a SourceModule instance in extern "C" unless you explicitly tell it not to. From the PyCUDA documentation:
Unless no_extern_c is True, the given source code is wrapped in extern "C" { ... } to prevent C++ name mangling.
So pass no_extern_c=True to the SourceModule constructor to disable the wrapping. The underlying reason, from a C++ perspective, is that templated types and functions cannot be given C linkage, hence the error.
However, even after you fix that problem, prepare to be disappointed. CUDA supports a lot of C++ language features, but it doesn't support the host standard library, and you can't use std::tuple within kernel code. NVIDIA does provide its own (very limited) reimplementation of the C++ standard library, libcu++, and it does have a basic tuple type. That might work for you.
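As a sketch of that, assuming a CUDA 11+ toolkit that ships libcu++ (the header path and namespace below are libcu++'s):

#include <cuda/std/tuple>
#include <cstdio>

__global__ void kern(int* out) {
    // cuda::std::tuple works in device code, unlike std::tuple
    cuda::std::tuple<int, float> t = cuda::std::make_tuple(2, 3.5f);
    *out = cuda::std::get<0>(t);
}

int main() {
    int* d_out;
    int h_out = 0;
    cudaMalloc(&d_out, sizeof(int));
    kern<<<1, 1>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("%d\n", h_out);   // prints 2
    cudaFree(d_out);
    return 0;
}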

Using NTL and GMP with CUDA [duplicate]

I am trying to use CUDA with the GNU multiple precision library (GMP). When I add GMP calls like mpf_init() to my device code I get this compiler error: tlgmp.cu(37): error: calling a __host__ function("__gmpf_init") from a __device__/__global__ function("histo") is not allowed.
Is it possible to redefine the GMP routines so that they can be used in device code?
The GMP library is compiled for the host, and so it can't be used directly in device code. That is the direct reason for the error you are seeing.
Since it's an open-source library, it might be possible, with some effort, to go through the code and create your own version that has appropriate __device__ decorators (and possibly other changes) on the various functions you need. This would probably require a substantial amount of work, however.
Alternatively, you might investigate the CUMP, XMP, or CAMPARY libraries.
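To make the first suggestion concrete, this is the shape of the change involved, with a made-up stand-in rather than actual GMP code:

// Hypothetical stand-in for a library routine: adding __host__ __device__
// makes nvcc compile it for both sides, so kernels may call it.
__host__ __device__ void my_init(long* v) { *v = 0; }

__global__ void histo(long* data) {
    my_init(data);   // no longer a host-only call
}

int main() {
    long* d;
    cudaMalloc(&d, sizeof(long));
    histo<<<1, 1>>>(d);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}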

Can I use cuda without using nvcc on my host code?

I'm writing a single header library that executes a cuda kernel. I was wondering if there is a way to get around the <<<>>> syntax, or get C source output from nvcc?
You can avoid the host language extensions by using the CUDA driver API instead. It is a little more verbose and you will need a little more boilerplate code to manage the context, but it is not too difficult.
Conventionally, you would compile to PTX or a binary payload to load at runtime; however, NVIDIA now also ships an experimental JIT CUDA C compiler library, libNVVM, which you could try if you want JIT from source.
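For illustration, a minimal sketch of the driver API path; the PTX file name and the kernel name are assumptions for the example, and error checking is omitted:

#include <cuda.h>
#include <cstdio>

int main() {
    cuInit(0);

    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    // Load a module built ahead of time, e.g. with: nvcc -ptx kernel.cu
    CUmodule mod;
    cuModuleLoad(&mod, "kernel.ptx");
    CUfunction fn;
    cuModuleGetFunction(&fn, mod, "kern");   // kern(int*) assumed extern "C"

    CUdeviceptr d_out;
    cuMemAlloc(&d_out, sizeof(int));

    // The equivalent of kern<<<1, 1>>>(d_out), with no language extensions
    void* args[] = { &d_out };
    cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, nullptr, args, nullptr);
    cuCtxSynchronize();

    int h_out = 0;
    cuMemcpyDtoH(&h_out, d_out, sizeof(int));
    printf("%d\n", h_out);

    cuMemFree(d_out);
    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}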

Does CUDA use an interpreter or a compiler?

This is a bit of silly question, but I'm wondering if CUDA uses an interpreter or a compiler?
I'm wondering because I'm not quite sure how CUDA manages to get source code to run on two cards with different compute capabilities.
From Wikipedia:
Programmers use 'C for CUDA' (C with Nvidia extensions and certain restrictions), compiled through a PathScale Open64 C compiler.
So, your answer is: it uses a compiler.
And to touch on the reason it can run on multiple cards (source):
CUDA C/C++ provides an abstraction, it's a means for you to express how you want your program to execute. The compiler generates PTX code which is also not hardware specific. At runtime the PTX is compiled for a specific target GPU - this is the responsibility of the driver which is updated every time a new GPU is released.
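As an illustration of that two-stage model (sm_70 is an arbitrary choice here), nvcc can embed both machine code for one GPU and PTX for the driver to JIT-compile on later ones:

# SASS for sm_70, plus compute_70 PTX that the driver can
# JIT-compile for GPUs released after the application was built
nvcc -gencode arch=compute_70,code=sm_70 \
     -gencode arch=compute_70,code=compute_70 \
     -o app app.cu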
These official documents CUDA C Programming Guide and The CUDA Compiler Driver (NVCC) explain all the details about the compilation process.
From the second document:
nvcc mimics the behavior of the GNU compiler gcc: it accepts a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process.
Nor is this limited to CUDA: shaders in DirectX or OpenGL are also compiled to some kind of byte code and converted to native code by the underlying driver.