matrix as a parameter in CUDA

I have two problems in CUDA programming.
First, I want to pass a matrix as a function parameter in a CUDA program. I tried the following. The GCC compiler compiles this code, but the NVIDIA CUDA compiler does not and reports an error. (I have CUDA 7.5 installed.)
void printMatrix( size_t rows, size_t cols, int a[][cols] )
and
void printMatrix(int row, int col, int matrix[row][col])
Neither works; both give an "a parameter is not allowed" error.
Second, inside the main function I want to declare a matrix
int a[n][n];
where n runs from 1 to 5 (in a for loop). This gives an "expression must have a constant value" error.
Where am I making the error?
I have tried to compile the code from this question with both gcc and nvcc: gcc compiles it and nvcc does not.

Because you are doing this on Windows, nvcc (note that nvcc is not a compiler itself) uses the Visual Studio compiler to compile the host code. Visual Studio does not support C99 language features such as variable-length arrays, so you cannot use them in any host code you will compile on Windows in conjunction with CUDA. You will have to rewrite your code without using C99 language features in your host code.
If you were doing this on Linux, you would be using gcc to compile the host code via nvcc, and the C99 language features would be available if you provide the correct command-line options and pass the file with a .c extension, as you have ably demonstrated in your question. C99 features and CUDA cannot be mixed in a .cu file, because CUDA requires a C++ compiler to compile host code containing CUDA language extensions.
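As an illustration, here is a minimal sketch (not from the original question) of the same printMatrix idea without C99 variable-length arrays: the matrix is passed as a flat pointer and the index is computed by hand, which compiles as C++ host code in a .cu file.

#include <cstdio>

// Sketch: pass the matrix as a flat pointer plus its dimensions.
void printMatrix(size_t rows, size_t cols, const int *a)
{
    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j)
            printf("%d ", a[i * cols + j]);
        printf("\n");
    }
}

int main()
{
    for (int n = 1; n <= 5; ++n) {
        // Heap allocation avoids the "expression must have a constant value"
        // error that the stack declaration int a[n][n] triggers in C++.
        int *a = new int[n * n]();   // zero-initialized
        printMatrix(n, n, a);
        delete[] a;
    }
    return 0;
}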

Related

Trouble in knowing when the cuda code gets compiled?

I want to know when the CUDA code gets compiled. I mean, is it possible for the compiled kernel to use the values of parameters that are given as command-line arguments to the host code at run time? Is it possible to compile CUDA code during the run time of the host code?
In typical usage of the CUDA runtime API, CUDA device code gets compiled when you pass a file containing CUDA device code to nvcc, the CUDA compiler driver.
CUDA device code can also be compiled at run time, using either the driver API or the CUDA NVRTC mechanism. There is documentation for each of these approaches, CUDA sample codes for each, and various questions here on the cuda SO tag for each.
When you use the CUDA driver API, the device source code you present for compilation at run time is in the form of PTX, the CUDA intermediate language.
For compilation of typical CUDA C++ device code at runtime, you would use the NVRTC mechanism.
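A minimal NVRTC sketch may help illustrate this; the kernel string, option, and names below are illustrative, and error checking is omitted (link with -lnvrtc):

#include <nvrtc.h>
#include <cstdio>
#include <vector>

int main()
{
    // Device source supplied as a string at host run time.
    const char *source =
        "extern \"C\" __global__ void scale(float *x, float s) {\n"
        "    x[threadIdx.x] *= s;\n"
        "}\n";

    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, source, "scale.cu", 0, nullptr, nullptr);

    // Options (e.g. the target architecture) can be chosen at run time,
    // so the compilation can depend on values known only when the host runs.
    const char *opts[] = { "--gpu-architecture=compute_50" };
    nvrtcCompileProgram(prog, 1, opts);

    size_t ptxSize;
    nvrtcGetPTXSize(prog, &ptxSize);
    std::vector<char> ptx(ptxSize);
    nvrtcGetPTX(prog, ptx.data());
    nvrtcDestroyProgram(&prog);

    // The PTX can now be loaded and launched through the driver API
    // (cuModuleLoadData / cuModuleGetFunction / cuLaunchKernel).
    printf("%s\n", ptx.data());
    return 0;
}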

How to find the CUDA __device__ definition of a function?

I have a specific function I am trying to find the source definition for, specifically the definition the nvcc compiler is using. This question is phrased to apply to any function (or symbol, I suppose) used in a __device__ function. Given:
__device__ void Foo() {
    int x = round(0.0f);
}
What is the standard/canonical/recommended way to find the definition for "round( float )" used by the nvcc compiler to generate device code?
Normally I use Visual Studio's F1 "Go to Definition", or search for "round" in project files, etc. I also search the CUDA Toolkit documentation and CUDA MATH API. In this case, I find the VS cmath definition. But how do I determine which definition the nvcc compiler uses?
What is the standard/canonical/recommended way to find the definition for "round( float )" used by the nvcc compiler to generate device code?
Disassembly. Most built-in functions exist as stubs in headers that are expanded into inline assembly sequences as part of a device-compiler code-generation pass. There is no input code to view.
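For example, one way to inspect what the compiler actually emits for Foo() (a sketch; file names and architecture are illustrative):

nvcc -arch=sm_52 -ptx foo.cu -o foo.ptx       # intermediate PTX
nvcc -arch=sm_52 -cubin foo.cu -o foo.cubin   # device binary
cuobjdump -sass foo.cubin                     # disassemble to SASS

The PTX or SASS output shows what, if anything, round() was expanded into (in this trivial example the call may be optimized away entirely, since x is unused).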

Unable to call CUDA half precision functions from the host

I am trying to do some FP16 work that will have both CPU and GPU backends. I researched my options and decided to use CUDA's half-precision conversion functions and data types. The ones I intend to use are specified as both __device__ and __host__, which according to my understanding (and the official documentation) should mean that the functions are callable from both host and device code. I wrote a simple test program:
#include <iostream>
#include <cuda_fp16.h>

int main() {
    const float a = 32.12314f;
    __half2 test = __float2half2_rn(a);
    __half test2 = __float2half(a);
    return 0;
}
However when I try to compile it I get:
nvcc cuda_half2.cu
cuda_half2.cu(6): error: calling a __device__ function("__float2half2_rn") from a __host__ function("main") is not allowed
cuda_half2.cu(7): error: calling a __device__ function("__float2half") from a __host__ function("main") is not allowed
2 errors detected in the compilation of "/tmp/tmpxft_000013b8_00000000-4_cuda_half2.cpp4.ii".
The only thing that comes to mind is that my CUDA version is 9.1 and I'm reading the documentation for 9.2, but I can't find an older version of the documentation, nor can I find anything in the changelog. Ideas?
Ideas?
Switch to CUDA 9.2
Your code compiles without error on CUDA 9.2, but throws the errors you indicate on CUDA 9.1. If you have CUDA 9.1 installed, then the documentation for it is already installed on your machine. On a typical Linux install, it will be located in /usr/local/cuda-9.1/doc. If you look at /usr/local/cuda-9.1/doc/pdf/CUDA_Math_API.pdf you will see that the corresponding functions are only marked __device__, so this change was indeed made between CUDA 9.1 and CUDA 9.2.
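If staying on CUDA 9.1 is necessary, one possible workaround (a sketch, not part of the original answer) is to perform the conversion inside a trivial kernel, since the functions are callable from device code there:

#include <cuda_fp16.h>

// Sketch for CUDA 9.1: do the conversion in device code, where
// __float2half is available, and copy the result back to the host.
__global__ void convertKernel(float in, __half *out)
{
    *out = __float2half(in);
}

int main() {
    const float a = 32.12314f;
    __half *d_out;
    cudaMalloc((void **)&d_out, sizeof(__half));
    convertKernel<<<1, 1>>>(a, d_out);
    __half h_out;
    cudaMemcpy(&h_out, d_out, sizeof(__half), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return 0;
}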

complex CUDA kernel in MATLAB

I wrote a CUDA kernel to run via MATLAB,
with several cuDoubleComplex pointers. I invoked the kernel with complex double vectors (defined as gpuArray) and got the error message: "unsupported type in argument specification cuDoubleComplex".
How do I make MATLAB recognize this type?
The short answer: you can't.
The list of supported types for kernels is shown here, and that is all your kernel's argument list can contain to work correctly with the GPU computing toolbox. You will need to either modify your code to use double2 in place of cuDoubleComplex, or supply MATLAB with compiled PTX code and a function declaration that maps cuDoubleComplex to double2. For example,
__global__ void mykernel(cuDoubleComplex *a) { .. }
would be compiled to PTX using nvcc and then loaded in MATLAB as
k = parallel.gpu.CUDAKernel('mykernel.ptx','double2*');
Either method should work.
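For completeness, a sketch of the first approach, rewriting the kernel against double2 (cuDoubleComplex has the same layout, with .x holding the real part and .y the imaginary part; the kernel body here is purely illustrative):

__global__ void mykernel(double2 *a)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    a[i].x += 1.0;   // operate on the real part
}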

Several questions about CUDA and cuPrintf

I can compile my code that uses cuPrintf successfully with nvcc, but I cannot compile it in the Visual Studio 2012 environment. It says that "volatile char *" cannot be converted to "const void *" in the "cudaMemcpyToSymbol" function.
cuPrintf doesn't seem to work; no cuPrintf output is produced by the kernel code.
How do I make nvcc export a pdb file?
Is there any other convenient way to debug kernel functions? I have only one laptop.
First, cuPrintf is deprecated (as far as I know it has never been officially released). You can print data from a kernel using the in-kernel printf, but printing is not a recommended way of debugging your kernels.
Second, you are compiling with the CUDA nvcc compiler; there is no such thing as a pdb file in CUDA. That said, look at the -g and -G flags, although those may dramatically increase your running time.
Third, the best way to debug kernels is to use Nsight Visual Studio Edition.
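As an illustration of the in-kernel printf mentioned above (supported on compute capability 2.0 or newer; the names here are illustrative):

#include <cstdio>

__global__ void helloKernel()
{
    printf("Hello from thread %d\n", threadIdx.x);
}

int main()
{
    helloKernel<<<1, 4>>>();
    cudaDeviceSynchronize();   // flushes the device-side printf buffer
    return 0;
}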