Can libcu++ be included in a regular C++ project?

I have tried using <cuda/std/chrono> and <cuda/std/array> inside my C++ project, not compiled with NVCC. The reason I am not making a CUDA project is that NVCC fails to compile some of my template code.
I am getting several errors about undefined constants such as the ones below:
error C2065: '_LInf': undeclared identifier
error C3615: constexpr function 'cuda::std::__4::__libcpp_numeric_limits<long double,true>::infinity' cannot result in a constant expression
Can libcu++ be used inside C++ projects or only CUDA projects?
I have already tried including and linking to my C++ project the headers and libraries that are automatically added to CUDA projects by Visual Studio.

Converting comments into an answer:
The system requirements page for libcu++ indicates that only NVIDIA’s toolchains (so nvcc or the HPC SDK compilers) are supported.
More generally, libcu++ is intended to implement a subset of the C++ standard library in a transparent way that allows it to be used identically in host and device code, without any programmer intervention beyond including a header and respecting its namespace conventions. It stands to reason that this requires the NVIDIA toolchain to implement that magic.
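To illustrate the supported pattern, here is a minimal sketch built with nvcc (file name and flags are illustrative):

// example.cu -- the same cuda::std type used in host and device code,
// compiled with nvcc, the supported toolchain.
#include <cuda/std/array>
#include <cstdio>

__global__ void kernel()
{
    cuda::std::array<int, 3> a{1, 2, 3}; // works in device code
    printf("device: %d\n", a[0]);
}

int main()
{
    cuda::std::array<int, 3> a{1, 2, 3}; // and identically in host code
    kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    printf("host: %d\n", a[0]);
    return 0;
}

// build (illustrative): nvcc -std=c++14 example.cu -o example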

Related

Can I use cuda without using nvcc on my host code?

I'm writing a single header library that executes a cuda kernel. I was wondering if there is a way to get around the <<<>>> syntax, or get C source output from nvcc?
You can avoid the host language extensions by using the CUDA driver API instead. It is a little more verbose and you will require a little more boilerplate code to manage the context, but it is not too difficult.
Conventionally, you would compile to PTX or a binary payload to load at runtime, however NVIDIA now also ship an experimental JIT CUDA C compiler library, libNVVM, which you could try if you want JIT from source.
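As a hedged sketch of that driver API flow (kernel and file names are illustrative, error checking omitted), assuming the kernel was compiled offline with nvcc -ptx kernel.cu and declared extern "C" so its name is not mangled:

// host.cpp -- plain C++ built with the host compiler and linked
// against the driver library (-lcuda); nvcc is not needed for this file.
#include <cuda.h>

int main()
{
    cuInit(0);

    CUdevice dev;
    cuDeviceGet(&dev, 0);

    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    CUmodule mod;
    cuModuleLoad(&mod, "kernel.ptx"); // PTX produced by nvcc -ptx

    CUfunction fn;
    cuModuleGetFunction(&fn, mod, "my_kernel"); // hypothetical extern "C" kernel

    int n = 1024;
    CUdeviceptr d_buf;
    cuMemAlloc(&d_buf, n * sizeof(float));

    // cuLaunchKernel replaces the <<<grid, block>>> syntax.
    void *args[] = { &d_buf, &n };
    cuLaunchKernel(fn,
                   (n + 255) / 256, 1, 1, // grid dimensions
                   256, 1, 1,             // block dimensions
                   0, 0,                  // shared memory, stream
                   args, 0);
    cuCtxSynchronize();

    cuMemFree(d_buf);
    cuCtxDestroy(ctx);
    return 0;
}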

calling cuda c kernel from fortran 90

I am planning to call a typical matrix multiply CUDA C kernel from a Fortran program. I am referring to the following link http://www-irma.u-strasbg.fr/irmawiki/index.php/Call_CUDA_from_Fortran . I would be glad if any resources are available on this. I intend to avoid PGI CUDA Fortran as I do not possess the compiler. In the link above I cannot make out what the CUDA.F90 file should be. I assume the last code given in the link is that of main.F90. Kindly help.
Perhaps you need to re-read the very first line of that page you linked to. Those instructions are relying on a set of external ISO C bindings for the CUDA API. That is where the CUDA.F90 file you are asking about comes from. You will need to download and build the FortCUDA bindings to use the instructions on that wiki page.
Edited to add that given your last question was about compilation in Nsight Visual Studio Edition, it would seem that you are running on a Windows platform. You should know that you can't use gcc to build CUDA applications on Windows platforms. The supplied CUDA libraries will only work with either the Microsoft toolchain or (possibly) Intel's compilers in certain cases.
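For concreteness, the C side of such a binding is just a C-linkage wrapper around the kernel launch, which a Fortran interface block (via ISO_C_BINDING) can then declare and call. A sketch with illustrative names and no error checking:

// wrapper.cu -- compiled with nvcc and linked into the Fortran program.
#include <cuda_runtime.h>

__global__ void matmul_kernel(const float *a, const float *b, float *c, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += a[row * n + k] * b[k * n + col];
        c[row * n + col] = sum;
    }
}

// The unmangled name "matmul_launch" is what the Fortran interface declares.
extern "C" void matmul_launch(const float *a, const float *b, float *c, int n)
{
    size_t bytes = (size_t)n * n * sizeof(float);
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    matmul_kernel<<<grid, block>>>(d_a, d_b, d_c, n);

    cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
}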

In CUDA, how can we call a device function in another translation unit?

I'm pretty new to CUDA. I use Microsoft Visual Studio 2010, where I don't need to worry about writing a makefile. A problem arose when I tried to call, in a .cu file, a device function that was declared in a .h file and defined in another .cu file. At the end of the build, I received an error message:
1>ptxas : fatal error : Unresolved extern function '_Z22atomicAddEmulateDoublePdd'
This appears in both CUDA 4.2 and 5.0. I'm wondering how should I configure my MVS to avoid this error. Sorry for the nooby questions and thanks for any suggestion!
CUDA 4.2 and earlier do not support static linking, so device functions must be defined in the same compilation unit. A common technique is to write the device function in a .cuh file and include it in the .cu file.
CUDA 5.0 supports a new feature called separate compilation. The CUDA 5.0 VS msbuild rules should be available in the CUDA 5.0 RC download.
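A sketch of both approaches (the function body is the usual atomicCAS-based double emulation, as in the CUDA C Programming Guide; file names are illustrative):

// atomicAddEmulateDouble.cuh -- with CUDA 4.2, put the definition in a
// header so every .cu file that includes it gets its own copy.
__device__ inline double atomicAddEmulateDouble(double *address, double val)
{
    unsigned long long int *address_as_ull = (unsigned long long int *)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}

// With CUDA 5.0 separate compilation, the cross-file call links instead;
// the command-line equivalent of the VS build rules is roughly:
//   nvcc -arch=sm_20 -dc a.cu b.cu
//   nvcc -arch=sm_20 a.obj b.obj -o app.exe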

Does CUDA use an interpreter or a compiler?

This is a bit of silly question, but I'm wondering if CUDA uses an interpreter or a compiler?
I'm wondering because I'm not quite sure how CUDA manages to get source code to run on two cards with different compute capabilities.
From Wikipedia:
Programmers use 'C for CUDA' (C with Nvidia extensions and certain restrictions), compiled through a PathScale Open64 C compiler.
So, your answer is: it uses a compiler.
And to touch on the reason it can run on multiple cards (source):
CUDA C/C++ provides an abstraction, it's a means for you to express how you want your program to execute. The compiler generates PTX code which is also not hardware specific. At runtime the PTX is compiled for a specific target GPU - this is the responsibility of the driver which is updated every time a new GPU is released.
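For example, one fat binary can carry machine code (SASS) for specific GPUs plus PTX that the driver JIT-compiles for architectures released later; an illustrative nvcc invocation:

nvcc -gencode arch=compute_20,code=sm_20 \
     -gencode arch=compute_30,code=sm_30 \
     -gencode arch=compute_30,code=compute_30 \
     kernel.cu -o app

The code=compute_30 entry embeds the PTX itself, which is the forward-compatible fallback the driver compiles at runtime.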
These official documents CUDA C Programming Guide and The CUDA Compiler Driver (NVCC) explain all the details about the compilation process.
From the second document:
nvcc mimics the behavior of the GNU compiler gcc: it accepts a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process.
This is not limited to CUDA: shaders in DirectX or OpenGL are also compiled to some kind of bytecode and converted to native code by the underlying driver.

Difference between cuda.h, cuda_runtime.h, cuda_runtime_api.h

I'm starting to program with CUDA, and in some examples I find the include files cuda.h, cuda_runtime.h and cuda_runtime_api.h included in the code. Can someone explain to me the difference between these files?
In very broad terms:
cuda.h defines the public host functions and types for the CUDA driver API.
cuda_runtime_api.h defines the public host functions and types for the CUDA runtime API.
cuda_runtime.h defines everything cuda_runtime_api.h does, as well as built-in type definitions and function overlays for the CUDA language extensions and device intrinsic functions.
If you were writing host code to be compiled with the host compiler which includes API calls, you would include either cuda.h or cuda_runtime_api.h. If you needed other CUDA language built-ins, like types, and were using the runtime API and compiling with the host compiler, you would include cuda_runtime.h. If you are writing code which will be compiled using nvcc, it is all irrelevant, because nvcc takes care of inclusion of all the required headers automatically without programmer intervention.
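For instance, a host-only source file like this sketch (paths are illustrative) builds with an ordinary host compiler, as long as it links against the runtime library:

/* host_query.c -- no CUDA language extensions, so the host compiler works.
   Build, e.g.: gcc host_query.c -I/usr/local/cuda/include
                -L/usr/local/cuda/lib64 -lcudart */
#include <cuda_runtime_api.h>
#include <stdio.h>

int main(void)
{
    int n = 0;
    cudaError_t err = cudaGetDeviceCount(&n);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("found %d CUDA device(s)\n", n);
    return 0;
}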
A few observations in addition to @talonmies' answer:
cuda_runtime.h includes cuda_runtime_api.h internally, but not the other way around; so "cuda_runtime includes all of cuda_runtime_api" is the mnemonic to remember.
cuda_runtime_api.h does not declare the entire set of runtime API functions you'll find in the official documentation, while cuda_runtime.h does (example: cudaEventCreate()). However, the API calls defined in cuda_runtime.h are actually implemented, in the header file itself, using calls to functions in cuda_runtime_api.h. These are the "function overlays" that @talonmies mentioned.
cuda_runtime_api.h is a C-language header (IIANM) with only C-language function declarations; cuda_runtime.h is a C++ header file, with some templated functions implemented.