How to create a program from a binary in CUDA? - cuda

I have code in OpenCL where I use clCreateProgramWithBinary() to create a program from a binary.
I am porting this application to CUDA and I can't find any similar function.
Can someone help me with how to create a program from a binary, or with the equivalent of clCreateProgramWithBinary() in CUDA?

AFAIK, the CUDA equivalent of clCreateProgramWithBinary() is cuModuleLoad(). Please check cuModuleLoad() for loading a precompiled binary; it loads a compute module.
For further reference, please check the module management section in the CUDA Driver API Reference Manual.
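A minimal sketch of that flow, assuming the device code was precompiled to a file named kernel.cubin containing a kernel called myKernel (both names are placeholders): cuModuleLoad() plays roughly the role of clCreateProgramWithBinary(), and cuModuleGetFunction() roughly corresponds to clCreateKernel().

    #include <cuda.h>
    #include <stdio.h>

    int main(void)
    {
        CUdevice   dev;
        CUcontext  ctx;
        CUmodule   mod;
        CUfunction fn;

        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);

        // Rough equivalent of clCreateProgramWithBinary: the file may be
        // a cubin, PTX, or fatbin produced ahead of time by nvcc.
        if (cuModuleLoad(&mod, "kernel.cubin") != CUDA_SUCCESS) {
            fprintf(stderr, "failed to load module\n");
            return 1;
        }

        // Rough equivalent of clCreateKernel: look the kernel up by name.
        cuModuleGetFunction(&fn, mod, "myKernel");

        /* ... set up buffers and launch with cuLaunchKernel ... */

        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }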

Related

Trouble knowing when CUDA code gets compiled?

I want to know when CUDA code gets compiled. I mean, is it possible for the CUDA kernel to use parameter values that are only supplied as command-line arguments to the host code at run time? Is it possible to compile CUDA code during the run time of the host code?
In typical usage of the CUDA runtime API, CUDA device code gets compiled when you pass a file containing CUDA device code to nvcc, the CUDA compiler driver.
CUDA device code can also be compiled at run time, using either the driver API or the CUDA NVRTC mechanism. There is documentation for each of these approaches, CUDA sample codes for each, and various questions on the cuda SO tag covering them.
When you use the CUDA driver API, the device code you present for compilation at run time must be in the form of PTX, a CUDA intermediate language.
For compilation of typical CUDA C++ device code at run time, you would use the NVRTC mechanism.
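As a rough illustration of the NVRTC path, here is a sketch that compiles a small kernel held in a string to PTX at run time and then loads the PTX with the driver API; the kernel source and the name "scale" are made up for the example:

    #include <nvrtc.h>
    #include <cuda.h>
    #include <string>
    #include <vector>
    #include <iostream>

    int main()
    {
        const char* src =
            "extern \"C\" __global__ void scale(float* x, float s) {"
            "    x[threadIdx.x] *= s;"
            "}";

        // Compile the CUDA C++ string to PTX with NVRTC.
        nvrtcProgram prog;
        nvrtcCreateProgram(&prog, src, "scale.cu", 0, nullptr, nullptr);
        const char* opts[] = { "--gpu-architecture=compute_50" };
        nvrtcResult rc = nvrtcCompileProgram(prog, 1, opts);

        size_t logSize;
        nvrtcGetProgramLogSize(prog, &logSize);
        std::string log(logSize, '\0');
        nvrtcGetProgramLog(prog, &log[0]);
        if (rc != NVRTC_SUCCESS) { std::cerr << log << "\n"; return 1; }

        size_t ptxSize;
        nvrtcGetPTXSize(prog, &ptxSize);
        std::vector<char> ptx(ptxSize);
        nvrtcGetPTX(prog, ptx.data());
        nvrtcDestroyProgram(&prog);

        // Hand the PTX to the driver API, which JIT-compiles it for the GPU.
        CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);
        cuModuleLoadData(&mod, ptx.data());
        cuModuleGetFunction(&fn, mod, "scale");
        /* ... launch fn with cuLaunchKernel ... */
        return 0;
    }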

Can I use CUDA without using nvcc on my host code?

I'm writing a single-header library that executes a CUDA kernel. I was wondering if there is a way to get around the <<<>>> syntax, or to get C source output from nvcc?
You can avoid the host language extensions by using the CUDA driver API instead. It is a little more verbose and you will require a little more boilerplate code to manage the context, but it is not too difficult.
Conventionally, you would compile to PTX or a binary payload to load at runtime; however, NVIDIA now also ships an experimental JIT CUDA C compiler library, libNVVM, which you could try if you want JIT compilation from source.
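For the conventional route, here is a sketch of a driver API launch that avoids <<<>>> entirely, so the host file can be built with a plain C/C++ compiler. It assumes a context has already been created (as in the cuModuleLoad example above) and that a vecAdd kernel was compiled to vecadd.ptx beforehand; both names are illustrative:

    #include <cuda.h>

    void launch_vec_add(CUdeviceptr a, CUdeviceptr b, CUdeviceptr c, int n)
    {
        CUmodule mod;
        CUfunction fn;
        cuModuleLoad(&mod, "vecadd.ptx");
        cuModuleGetFunction(&fn, mod, "vecAdd");

        void* args[] = { &a, &b, &c, &n };  // pointers to each kernel argument
        unsigned block = 256;
        unsigned grid  = (n + block - 1) / block;

        // Replaces vecAdd<<<grid, block>>>(a, b, c, n);
        cuLaunchKernel(fn, grid, 1, 1,      // grid dimensions
                           block, 1, 1,     // block dimensions
                           0, 0,            // shared memory bytes, stream
                           args, nullptr);
        cuModuleUnload(mod);
    }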

calling cuda c kernel from fortran 90

I am planning to call a typical matrix multiply CUDA C kernel from a Fortran program. I am referring to the following link http://www-irma.u-strasbg.fr/irmawiki/index.php/Call_CUDA_from_Fortran . I would be glad if any resources are available on this. I intend to avoid PGI CUDA Fortran as I do not possess that compiler. In the link above I cannot make out what the CUDA.F90 file should be. I assume the last code given in the link is that of main.F90. Kindly help.
Perhaps you need to re-read the very first line of that page you linked to. Those instructions are relying on a set of external ISO C bindings for the CUDA API. That is where the CUDA.F90 file you are asking about comes from. You will need to download and build the FortCUDA bindings to use the instructions on that wiki page.
Edited to add: given that your last question was about compilation in Nsight Visual Studio Edition, it would seem that you are running on a Windows platform. You should know that you can't use gcc to build CUDA applications on Windows. The supplied CUDA libraries will only work with either the Microsoft toolchain or (possibly) Intel's compilers in certain cases.
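For reference, the CUDA C side of such a binding usually boils down to an extern "C" wrapper that Fortran can declare with BIND(C) via ISO_C_BINDING and call directly. A sketch (names are illustrative, not taken from FortCUDA):

    #include <cuda_runtime.h>

    __global__ void matmul_kernel(const float* A, const float* B, float* C, int n)
    {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < n && col < n) {
            float acc = 0.0f;
            for (int k = 0; k < n; ++k)
                acc += A[row * n + k] * B[k * n + col];
            C[row * n + col] = acc;
        }
    }

    // extern "C" gives the symbol an unmangled name that Fortran can bind to.
    // Compile this file with nvcc and link the object into the Fortran build.
    extern "C" void launch_matmul(const float* A, const float* B, float* C, int n)
    {
        size_t bytes = (size_t)n * n * sizeof(float);
        float *dA, *dB, *dC;
        cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
        cudaMemcpy(dA, A, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, B, bytes, cudaMemcpyHostToDevice);

        dim3 block(16, 16);
        dim3 grid((n + 15) / 16, (n + 15) / 16);
        matmul_kernel<<<grid, block>>>(dA, dB, dC, n);

        cudaMemcpy(C, dC, bytes, cudaMemcpyDeviceToHost);
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
    }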

how to check cuda availability(C++)?

Is it possible to check whether any CUDA devices are present before any cudaMalloc... calls are made?
I'm using C++ and I just want to print an error message before the program launches in case the user's machine doesn't support CUDA.
EDIT: if I can check it from C#, it will be even better.
Thanks!
You can use cudaGetDeviceCount to get the number of CUDA devices, and use the CUDA device properties to retrieve the compute capabilities you need.
A rather old version of the API documentation for cudaGetDeviceCount can be found here.
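A minimal sketch of that check, which you could also reach from C# by P/Invoking a small native helper built around the same calls (device index 0 and the output format are arbitrary choices here):

    #include <cuda_runtime.h>
    #include <cstdio>

    int main()
    {
        // cudaGetDeviceCount reports problems (e.g. no driver installed)
        // through its return code, so check both the status and the count.
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess || count == 0) {
            std::fprintf(stderr, "No CUDA-capable device found: %s\n",
                         cudaGetErrorString(err));
            return 1;
        }

        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);  // inspect compute capability
        std::printf("Found %d device(s); device 0: %s (cc %d.%d)\n",
                    count, prop.name, prop.major, prop.minor);
        return 0;
    }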

Does CUDA use an interpreter or a compiler?

This is a bit of a silly question, but I'm wondering if CUDA uses an interpreter or a compiler?
I'm wondering because I'm not quite sure how CUDA manages to get source code to run on two cards with different compute capabilities.
From Wikipedia:
Programmers use 'C for CUDA' (C with Nvidia extensions and certain restrictions), compiled through a PathScale Open64 C compiler.
So, your answer is: it uses a compiler.
And to touch on the reason it can run on multiple cards (source):
CUDA C/C++ provides an abstraction, it's a means for you to express how you want your program to execute. The compiler generates PTX code which is also not hardware specific. At runtime the PTX is compiled for a specific target GPU - this is the responsibility of the driver which is updated every time a new GPU is released.
These official documents, the CUDA C Programming Guide and The CUDA Compiler Driver (NVCC), explain all the details of the compilation process.
From the second document:
nvcc mimics the behavior of the GNU compiler gcc: it accepts a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process.
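To make the multi-target behavior concrete, a typical nvcc invocation (architectures chosen arbitrarily for illustration) embeds binary code for specific GPUs plus PTX that the driver can JIT-compile for newer ones:

    # Embed binary code (SASS) for two specific GPU generations, plus PTX
    # that the driver can JIT-compile for GPUs released later.
    nvcc -gencode arch=compute_50,code=sm_50 \
         -gencode arch=compute_70,code=sm_70 \
         -gencode arch=compute_70,code=compute_70 \
         -o app app.cu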
This is not limited to CUDA: shaders in DirectX or OpenGL are also compiled to some kind of byte code and then converted to native code by the underlying driver.