In CUDA, how can we call a device function in another translation unit?

I'm pretty new to CUDA. I use Microsoft Visual Studio 2010, where I don't need to worry about writing a makefile. A problem arose when I tried to call, in one .cu file, a device function that was declared in a .h file and defined in another .cu file. At the end of the build, I received this error message:
1>ptxas : fatal error : Unresolved extern function '_Z22atomicAddEmulateDoublePdd'
This appears in both CUDA 4.2 and 5.0. I'm wondering how I should configure Visual Studio to avoid this error. Sorry for the newbie questions, and thanks for any suggestions!

CUDA 4.2 and earlier do not support linking of device code, so device functions must be defined in the same compilation unit. A common technique is to write the device function in a .cuh file and include it in every .cu file that calls it.
CUDA 5.0 supports a new feature called separate compilation, which allows device code to be linked across compilation units. The CUDA 5.0 Visual Studio msbuild rules should be available in the CUDA 5.0 RC download.
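For the pre-5.0 toolchain, a minimal sketch of that header technique might look like the following. The file name and function body are assumptions: the mangled name in the error demangles to atomicAddEmulateDouble(double*, double), and the body shown is the standard CAS-based double-precision atomicAdd emulation from the CUDA C Programming Guide.

// atomicAddEmulate.cuh -- hypothetical header name; include it in every
// .cu file that calls the function, so each compilation unit gets its own
// definition and ptxas has nothing left unresolved.
#pragma once

__device__ inline double atomicAddEmulateDouble(double *address, double val)
{
    unsigned long long int *address_as_ull = (unsigned long long int *)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // Reinterpret bits so the add can be retried atomically via CAS.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}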

Related

Can libcu++ be included in a regular C++ project?

I have tried using <cuda/std/chrono> and <cuda/std/array> inside my C++ project, not compiled with NVCC. The reason I am not making a CUDA project is that NVCC fails to compile some of my template code.
I am getting several errors about undefined constants such as the ones below:
error C2065: '_LInf': undeclared identifier
error C3615: constexpr function 'cuda::std::__4::__libcpp_numeric_limits<long double,true>::infinity' cannot result in a constant expression
Can libcu++ be used inside C++ projects or only CUDA projects?
I have already tried including and linking to my C++ project the headers and libraries that are automatically added to CUDA projects by Visual Studio.
Converting comments into an answer:
The system requirements page for libcu++ indicates that only NVIDIA's toolchains (so nvcc or the HPC SDK compilers) are supported.
More generally, libcu++ is intended to implement a subset of the C++ standard library in a transparent way that allows it to be used identically in host and device code, without any programmer intervention beyond including a header and respecting its namespace conventions. It stands to reason that this requires the NVIDIA toolchain to implement this magic.
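As an illustration of that transparency, here is a minimal sketch (assuming the file is compiled with nvcc; the kernel and types are illustrative):

#include <cuda/std/array>

__global__ void kernel()
{
    // The same container type works unchanged in device code...
    cuda::std::array<int, 3> a{1, 2, 3};
    (void)a;
}

int main()
{
    // ...and identically in host code, with no extra programmer intervention.
    cuda::std::array<int, 3> a{1, 2, 3};
    kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return a[0] - 1;
}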

Trouble knowing when the CUDA code gets compiled?

I want to know when the CUDA code gets compiled. I mean, is it possible to know, at run time of the host code, the values of the parameters of a CUDA kernel that are given as command-line arguments? Is it possible to compile CUDA code during the run time of the host code?
In typical usage of the CUDA runtime API, CUDA device code gets compiled when you pass a file containing it to nvcc, the CUDA compiler driver.
CUDA device code can also be compiled at run time, using either the driver API or the CUDA NVRTC mechanism. There is documentation for each of these approaches, CUDA sample codes for each, and various questions here on the cuda SO tag covering both.
When you use the CUDA driver API, the device code you present for compilation at run time must be in the form of PTX, CUDA's intermediate language.
For compilation of typical CUDA C++ device code at runtime, you would use the NVRTC mechanism.
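A minimal NVRTC sketch (error checking omitted; the kernel string and names are illustrative, not from any particular sample):

#include <nvrtc.h>
#include <cuda.h>
#include <vector>

int main()
{
    const char *src =
        "extern \"C\" __global__ void k(float *x) { x[threadIdx.x] = 1.0f; }";

    // Compile the CUDA C++ source string to PTX at run time.
    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, src, "k.cu", 0, nullptr, nullptr);
    nvrtcCompileProgram(prog, 0, nullptr);

    size_t ptxSize;
    nvrtcGetPTXSize(prog, &ptxSize);
    std::vector<char> ptx(ptxSize);
    nvrtcGetPTX(prog, ptx.data());
    nvrtcDestroyProgram(&prog);

    // Hand the PTX to the driver API, which JIT-compiles it for the device.
    cuInit(0);
    CUdevice dev;  cuDeviceGet(&dev, 0);
    CUcontext ctx; cuCtxCreate(&ctx, 0, dev);
    CUmodule mod;  cuModuleLoadData(&mod, ptx.data());
    CUfunction fn; cuModuleGetFunction(&fn, mod, "k");
    // ... allocate device memory and launch fn with cuLaunchKernel ...
    return 0;
}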

CUDA compile multiple .cu files to one file

I am porting some calculations from C# to CUDA.
There are many classes in C# that I want to port; for each C# class I create a .cu and a .cuh file in my CUDA project.
All the classes are related, and all of them are used in the calculations.
I need to preserve the structure of my C# code, because otherwise it would be very easy to make mistakes.
P.S. If I put all the code in one file, everything works as expected, but reading it or fixing issues becomes a real pain.
I want to compile the CUDA project and use it from my C# code via the ManagedCuda library.
I can compile a test CUDA project with one .cu file to a .ptx file, load it in C# via ManagedCuda, and call a function from it.
But when I compile my real project with multiple .cu files, I get a separate .ptx file for each .cu file in the project; worse, I am not able to load these .ptx files via ManagedCuda. I get the following error:
ErrorInvalidPtx: This indicates that a PTX JIT compilation failed.
But this error is expected, because there are cross-references between the PTX files, and they only make sense when loaded together.
My goal is to compile my CUDA project to one file, but at the same time I do not want to be limited to the specific video card I have. For this I need to use PTX (or a cubin with PTX included); the PTX will then be compiled for the specific device at the moment you load it.
I tried setting Generate Relocatable Device Code to Yes (-rdc=true) and compiling to PTX and cubin; the result is the same, I still get several independent files, one per .cu file.
The very short answer is no, you can't do that. The toolchain cannot merge PTX code at the compilation phase.
If you produce multiple PTX files, you will need to use the JIT linker facilities of the CUDA runtime to produce a module which can be loaded into your context. I have no idea whether Managed CUDA supports that or not.
Edit to add that it appears that Managed CUDA does support runtime linking (see here).
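For reference, the driver API's JIT linker path (which a wrapper such as ManagedCuda ultimately calls into) looks roughly like this sketch; the .ptx file names are illustrative and error checking is omitted:

#include <cuda.h>

int main()
{
    cuInit(0);
    CUdevice dev;  cuDeviceGet(&dev, 0);
    CUcontext ctx; cuCtxCreate(&ctx, 0, dev);

    // Link several relocatable PTX files (built with -rdc=true) into one image.
    CUlinkState link;
    cuLinkCreate(0, nullptr, nullptr, &link);
    cuLinkAddFile(link, CU_JIT_INPUT_PTX, "classA.ptx", 0, nullptr, nullptr);
    cuLinkAddFile(link, CU_JIT_INPUT_PTX, "classB.ptx", 0, nullptr, nullptr);

    // Cross-references between the files are resolved here.
    void *cubin = nullptr;
    size_t cubinSize = 0;
    cuLinkComplete(link, &cubin, &cubinSize);

    // The linked image loads as a single module.
    CUmodule mod;
    cuModuleLoadData(&mod, cubin);
    cuLinkDestroy(link);  // the cubin buffer is owned by the link state
    return 0;
}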

calling cuda c kernel from fortran 90

I am planning to call a typical matrix multiply CUDA C kernel from a Fortran program. I am referring to the following link: http://www-irma.u-strasbg.fr/irmawiki/index.php/Call_CUDA_from_Fortran . I would be glad if any resources are available on this. I intend to avoid PGI CUDA Fortran, as I do not have that compiler. In the link above I cannot make out what the CUDA.F90 file should be. I assume the last code given in the link is that of main.F90. Kindly help.
Perhaps you need to re-read the very first line of that page you linked to. Those instructions are relying on a set of external ISO C bindings for the CUDA API. That is where the CUDA.F90 file you are asking about comes from. You will need to download and build the FortCUDA bindings to use the instructions on that wiki page.
Edited to add that given your last question was about compilation in Nsight Visual Studio Edition, it would seem that you are running on a Windows platform. You should know that you can't use gcc to build CUDA applications on Windows platforms. The supplied CUDA libraries will only work with either the Microsoft toolchain or (possibly) Intel's compilers in certain cases.
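Whatever bindings are used, the CUDA side of such an interface typically boils down to an extern "C" wrapper that Fortran can bind to via ISO_C_BINDING. A minimal sketch (the wrapper name and layout are illustrative, not taken from the linked wiki page):

#include <cuda_runtime.h>

__global__ void matmul_kernel(const float *a, const float *b, float *c, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += a[row * n + k] * b[k * n + col];
        c[row * n + col] = sum;
    }
}

// extern "C" suppresses C++ name mangling, so a Fortran interface block
// declared with bind(C, name="matmul_gpu") can call this directly.
extern "C" void matmul_gpu(const float *a, const float *b, float *c, int n)
{
    size_t bytes = (size_t)n * n * sizeof(float);
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);
    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    matmul_kernel<<<grid, block>>>(da, db, dc, n);
    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da); cudaFree(db); cudaFree(dc);
}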

Visual Studio + Nsight : __syncthreads() undefined [duplicate]

At the moment Visual Studio already recognizes key CUDA C/C++ functions such as cudaMalloc, cudaFree, cudaEventCreate, etc.
It also recognizes certain types like dim3 and cudaEvent_t.
However, it doesn't recognize other functions and types, such as the texture template, the __syncthreads function, or the atomicCAS function.
Everything compiles just fine, but I'm tired of seeing red underlining all over the place, and I want to see the parameter hints displayed when you type in any recognized function.
How do I get VS to pick up these functions?
You could create a dummy #include file of the following form:
#pragma once
#ifdef __INTELLISENSE__
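// Fake prototypes for CUDA built-ins, visible only to the IntelliSense parser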
void __syncthreads();
...
#endif
This should hide the fake prototypes from the CUDA and Visual C++ compilers, but still make them visible to IntelliSense.
Source for __INTELLISENSE__ macro: http://blogs.msdn.com/b/vcblog/archive/2011/03/29/10146895.aspx
You need to add CUDA-specific keywords like __syncthreads to the usertype.dat file for Visual Studio. An example usertype.dat file is included with the NVIDIA CUDA SDK. You also need to make sure that Visual Studio recognizes .cu files as C/C++ files, as described in this post:
Note however that where that post uses $(CUDA_INC_PATH), with recent versions of CUDA you should use $(CUDA_PATH)/include.
Also, I would recommend Visual Assist X -- not free, but worth the money -- to improve IntelliSense. It works well with CUDA if you follow these instructions:
http://www.wholetomato.com/forum/topic.asp?TOPIC_ID=5481
http://forums.nvidia.com/index.php?showtopic=53690