Clang: GNU standard library requires -fcoroutines but Clang only supports -fcoroutines-ts - C++20

It seems like the Clang compiler flags are currently not compatible with the GNU standard library. Is there still a way to use coroutines?

It's not quite accurate to say that GCC's header requires -fcoroutines; what it actually requires is for the __cpp_impl_coroutine macro to be defined by the compiler, indicating that the compiler supports C++20 coroutines. With GCC, you activate that support by passing -fcoroutines.
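For reference, here is a minimal C++20 coroutine with the relevant compile invocations in a comment (a hedged sketch: exact flag behaviour varies by compiler version; a Clang that only offers -fcoroutines-ts does not define __cpp_impl_coroutine, which is why libstdc++'s <coroutine> rejects it, whereas newer Clang releases accept plain -std=c++20):

    // Minimal C++20 coroutine to check compiler/library support.
    //   GCC 10:       g++ -std=c++20 -fcoroutines main.cpp
    //   later GCC:    g++ -std=c++20 main.cpp        (coroutines enabled by -std=c++20)
    //   newer Clang:  clang++ -std=c++20 main.cpp    (defines __cpp_impl_coroutine)
    #include <coroutine>
    #include <iostream>

    struct Task {
        struct promise_type {
            Task get_return_object() { return {}; }
            std::suspend_never initial_suspend() { return {}; }
            std::suspend_never final_suspend() noexcept { return {}; }
            void return_void() {}
            void unhandled_exception() {}
        };
    };

    Task hello() {
        std::cout << "coroutine body ran\n";
        co_return;
    }

    int main() { hello(); }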

Related

Can libcu++ be included in a regular C++ project?

I have tried using <cuda/std/chrono> and <cuda/std/array> inside my C++ project, not compiled with NVCC. The reason I am not making a CUDA project is that NVCC fails to compile some of my template code.
I am getting several errors about undefined constants such as the ones below:
error C2065: '_LInf': undeclared identifier
error C3615: constexpr function 'cuda::std::__4::__libcpp_numeric_limits<long double,true>::infinity' cannot result in a constant expression
Can libcu++ be used inside C++ projects or only CUDA projects?
I have already tried adding to my C++ project the headers and libraries that Visual Studio automatically adds to CUDA projects.
Converting comments into an answer:
The system requirements page for libcu++ indicates that only NVIDIA’s toolchains (so nvcc or the HPC SDK compilers) are supported.
More generally, libcu++ is intended to implement a subset of the C++ standard library in a transparent way that allows it to be used identically in host and device code, without any programmer intervention beyond including a header and respecting its namespace conventions. It stands to reason that this magic requires the NVIDIA toolchain.
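For illustration, here is a minimal sketch of that intended usage, assuming the file is a .cu translation unit built with nvcc (the kernel and variable names are made up for the example):

    // Same header, same type, usable in host and device code when compiled by nvcc.
    #include <cuda/std/array>
    #include <cuda_runtime.h>

    __global__ void sum_kernel(int* out) {
        cuda::std::array<int, 4> a = {1, 2, 3, 4};   // device-side use
        int s = 0;
        for (int v : a) s += v;
        *out = s;
    }

    int main() {
        cuda::std::array<int, 4> a = {1, 2, 3, 4};   // identical host-side use
        int host_sum = 0;
        for (int v : a) host_sum += v;

        int* d_out = nullptr;
        cudaMalloc(&d_out, sizeof(int));
        sum_kernel<<<1, 1>>>(d_out);

        int device_sum = 0;
        cudaMemcpy(&device_sum, d_out, sizeof(int), cudaMemcpyDeviceToHost);
        cudaFree(d_out);
        return (host_sum == device_sum) ? 0 : 1;
    }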

Is there a way to specify __device__ for an entire file? (Nvidia Cuda Compiler)

I am importing a library and I get this error when compiling:
go.cu(61): error: calling a __host__ function("TinyJS::Interpreter::Interpreter()") from a __global__ function("capnduk_kernel") is not allowed
...is there a way to port an entire file (TinyJS) to run on the device?
I've checked the compiler documentation, and it doesn't look like there's a way to do this. I'm guessing the only way is to rewrite the file by hand, which is a can of worms.
There isn't a way to do this with nvcc. It will require manual effort.
While NVCC does not support this (as Robert points out), there is such an option for run-time compilation via the NVRTC library (a sketch follows after the notes below):
Documentation lists the following compilation option:
--device-as-default-execution-space (-default-device)
Treat entities with no execution space annotation as __device__ entities.
Notes:
With this being the case, I would consider submitting a bug report to NVIDIA and asking them to add this option to NVCC.
clang++ supports compiling CUDA; perhaps it has such a flag.
NVRTC is also supported by the Modern-C++ wrappers library for CUDA, which is more convenient to use than working with NVRTC directly. (Caveat: that's my own library.)
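A hedged sketch of what that looks like through the NVRTC API (error handling abbreviated; the source string, kernel name, and compute_70 target are illustrative; build against the CUDA toolkit headers and link with -lnvrtc):

    #include <nvrtc.h>
    #include <cstdio>
    #include <string>

    int main() {
        // Note: helper() carries no __device__ annotation in the source string.
        const char* src = R"(
            int helper(int x) { return x * 2; }   // treated as __device__ via the flag
            extern "C" __global__ void kernel(int* out) { *out = helper(21); }
        )";

        nvrtcProgram prog;
        nvrtcCreateProgram(&prog, src, "demo.cu", 0, nullptr, nullptr);

        const char* opts[] = {
            "--device-as-default-execution-space",   // the option quoted above
            "--gpu-architecture=compute_70"          // illustrative target
        };
        nvrtcResult res = nvrtcCompileProgram(prog, 2, opts);

        // The compile log explains any failure.
        size_t log_size = 0;
        nvrtcGetProgramLogSize(prog, &log_size);
        std::string log(log_size, '\0');
        nvrtcGetProgramLog(prog, &log[0]);
        if (res != NVRTC_SUCCESS) { std::printf("%s\n", log.c_str()); return 1; }

        // The PTX produced here can then be loaded with the driver API.
        size_t ptx_size = 0;
        nvrtcGetPTXSize(prog, &ptx_size);
        std::string ptx(ptx_size, '\0');
        nvrtcGetPTX(prog, &ptx[0]);

        nvrtcDestroyProgram(&prog);
        return 0;
    }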

How to remove all PTX from compiled CUDA to prevent Intellectual Property leaks

CUDA PTX is analogous to assembly, and as such reveals the source code. I have read Section 3.1 of the CUDA Programming Guide and Section 3.2.7 from the online CUDA compiler documentation. I have a basic understanding of the -arch versus -code compiler options.
If I understand correctly, specifying -arch compute_XX produces PTX, whereas -code sm_XX produces both PTX and cubin.
I desire only cubin, such that no PTX is in the resulting image. How can I achieve this?
Preferably via Visual Studio settings, although I only find the -gencode option within Visual Studio Project Settings.
PTX is not quite analogous to assembly. PTX is an intermediate representation of the program that can be compiled to the different, incompatible instruction set architectures (ISAs) that Nvidia GPUs have been using over time. Usually, a new ISA for Nvidia GPUs comes with an updated version of PTX that can represent the new features of that ISA.
The -arch and -code options to nvcc work slightly differently from what you describe. They are not mutually exclusive alternatives; rather, they control different aspects of compilation.
-arch controls which PTX version is used as the intermediate representation. As such, it is given a compute_XX PTX version as its argument.
-code controls what code is embedded into the resulting binary - either machine code for the specified ISA if used in the -code sm_XX form, or PTX to be just-in-time compiled by the GPU driver if -code compute_XX is specified.
As a special shortcut, specifying only -arch sm_XX will embed both the compiled code for the specified ISA and PTX code into the binary - this is probably the situation you are referring to and want to avoid.
Finally the -gencode option allows you to specify multiple -arch/-code pairs, with the resulting binary containing separate code for each of the pairs.
You can use nvprune to remove all but the desired ISA code from a binary.
If unsure, you can always use cuobjdump to check what is in a specific binary.
So the way to prevent any PTX code from being present in your resulting binary is to call nvcc as nvcc -arch compute_XX -code sm_XX (or use multiple such pairs together with -gencode).
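To make that concrete, here is a trivial kernel together with example invocations in the comments (the 75 values are illustrative; substitute your real target, and the exact cuobjdump output may differ between toolkit versions):

    // kernel.cu - minimal file to illustrate the build variants discussed above.
    //
    // Machine code only, no PTX embedded:
    //   nvcc -arch compute_75 -code sm_75 -c kernel.cu -o kernel.o
    // Equivalent -gencode form, repeatable for several targets:
    //   nvcc -gencode arch=compute_75,code=sm_75 -c kernel.cu -o kernel.o
    // Shortcut that embeds machine code AND PTX (what the question wants to avoid):
    //   nvcc -arch sm_75 -c kernel.cu -o kernel.o
    //
    // Check what actually ended up in the object:
    //   cuobjdump -ptx kernel.o
    //   cuobjdump -sass kernel.o

    __global__ void scale(float* data, float factor, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }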

Can I use CUDA without using nvcc on my host code?

I'm writing a single-header library that executes a CUDA kernel. I was wondering if there is a way to get around the <<<>>> syntax, or get C source output from nvcc?
You can avoid the host language extensions by using the CUDA driver API instead. It is a little more verbose and you will require a little more boilerplate code to manage the context, but it is not too difficult.
Conventionally, you would compile to PTX or a binary payload to load at runtime; however, NVIDIA now also ships an experimental JIT CUDA C compiler library, libNVVM, which you could try if you want JIT from source.
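A hedged sketch of the driver API route (error checking omitted; kernel.ptx and the kernel name scale are illustrative, e.g. a module produced separately with nvcc -ptx, with the kernel declared extern "C" so its name is not mangled; link the host program with -lcuda):

    #include <cuda.h>
    #include <cstdio>
    #include <vector>

    int main() {
        cuInit(0);

        CUdevice dev;
        cuDeviceGet(&dev, 0);
        CUcontext ctx;
        cuCtxCreate(&ctx, 0, dev);

        // Load a module compiled offline (PTX, cubin or fatbin) and look up the kernel.
        CUmodule mod;
        cuModuleLoad(&mod, "kernel.ptx");
        CUfunction fn;
        cuModuleGetFunction(&fn, mod, "scale");

        // Device buffer with some input data.
        const int n = 1024;
        std::vector<float> host(n, 1.0f);
        CUdeviceptr d_data;
        cuMemAlloc(&d_data, n * sizeof(float));
        cuMemcpyHtoD(d_data, host.data(), n * sizeof(float));

        // Kernel arguments are passed as an array of pointers to each argument.
        float factor = 2.0f;
        int count = n;
        void* args[] = { &d_data, &factor, &count };

        // The equivalent of scale<<<blocks, 256>>>(d_data, factor, n).
        unsigned blocks = (n + 255) / 256;
        cuLaunchKernel(fn, blocks, 1, 1, 256, 1, 1, 0, nullptr, args, nullptr);
        cuCtxSynchronize();

        cuMemcpyDtoH(host.data(), d_data, n * sizeof(float));
        std::printf("host[0] = %f\n", host[0]);

        cuMemFree(d_data);
        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }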

Does CUDA use an interpreter or a compiler?

This is a bit of silly question, but I'm wondering if CUDA uses an interpreter or a compiler?
I'm wondering because I'm not quite sure how CUDA manages to get source code to run on two cards with different compute capabilities.
From Wikipedia:
Programmers use 'C for CUDA' (C with Nvidia extensions and certain restrictions), compiled through a PathScale Open64 C compiler.
So, your answer is: it uses a compiler.
And to touch on the reason it can run on multiple cards (source):
CUDA C/C++ provides an abstraction, it's a means for you to express how you want your program to execute. The compiler generates PTX code which is also not hardware specific. At runtime the PTX is compiled for a specific target GPU - this is the responsibility of the driver which is updated every time a new GPU is released.
These official documents CUDA C Programming Guide and The CUDA Compiler Driver (NVCC) explain all the details about the compilation process.
From the second document:
nvcc mimics the behavior of the GNU compiler gcc: it accepts a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process.
This is not limited to CUDA: shaders in DirectX or OpenGL are also compiled to some kind of bytecode and converted to native code by the underlying driver.