Unable to call CUDA half precision functions from the host - cuda

I am trying to do some FP16 work that will have both a CPU and a GPU backend. I researched my options and decided to use CUDA's half-precision conversion functions and data types. The ones I intend to use are specified as both __device__ and __host__, which according to my understanding (and the official documentation) should mean that the functions are callable from both host and device code. I wrote a simple test program:
#include <iostream>
#include <cuda_fp16.h>
int main() {
const float a = 32.12314f;
__half2 test = __float2half2_rn(a);
__half test2 = __float2half(a);
return 0;
}
However when I try to compile it I get:
nvcc cuda_half2.cu
cuda_half2.cu(6): error: calling a __device__ function("__float2half2_rn") from a __host__ function("main") is not allowed
cuda_half2.cu(7): error: calling a __device__ function("__float2half") from a __host__ function("main") is not allowed
2 errors detected in the compilation of "/tmp/tmpxft_000013b8_00000000-4_cuda_half2.cpp4.ii".
The only thing that comes to mind is that my CUDA is 9.1 and I'm reading the documentation for 9.2, but I can't find an older version of it, nor can I find anything in the changelog. Ideas?

Ideas?
Switch to CUDA 9.2
Your code compiles without error on CUDA 9.2, but throws the errors you indicate on CUDA 9.1. If you have CUDA 9.1 installed, then the documentation for it is already installed on your machine. On a typical Linux install, it will be located in /usr/local/cuda-9.1/doc. If you look at /usr/local/cuda-9.1/doc/pdf/CUDA_Math_API.pdf you will see that the corresponding functions are only marked __device__, so this change was indeed made between CUDA 9.1 and CUDA 9.2.
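If the same source has to build against both toolkits, the version boundary described above can be expressed as a compile-time check. This is only a sketch: hostHalfConversionsAvailable is a hypothetical helper, not part of CUDA, and it assumes the usual CUDART_VERSION encoding of 1000 * major + 10 * minor.

```cpp
// CUDART_VERSION (defined by the CUDA runtime headers) encodes a release as
// 1000 * major + 10 * minor, so CUDA 9.1 is 9010 and CUDA 9.2 is 9020.
// hostHalfConversionsAvailable is a hypothetical helper, not a CUDA API.
constexpr bool hostHalfConversionsAvailable(int cudartVersion) {
    // Per the answer above, the conversions gained __host__ in CUDA 9.2.
    return cudartVersion >= 9020;
}

// Compile-time sanity checks for the two versions discussed in this thread.
static_assert(!hostHalfConversionsAvailable(9010), "CUDA 9.1: device only");
static_assert(hostHalfConversionsAvailable(9020), "CUDA 9.2: host and device");
```

In a real .cu file the same condition would typically appear as a preprocessor guard (#if CUDART_VERSION >= 9020) around the host-side calls to __float2half and __float2half2_rn.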

Related

Why can't a __device__ variable be marked constexpr?

If I declare/define a variable like this:
__device__ constexpr int test{5};
I get an error
error: A __device__ variable cannot be marked constexpr
I can't find that restriction in the guide. The guide says:
I.4.20.9. __managed__ and __shared__ variables cannot be marked with the keyword constexpr.
Moreover, my colleague with the same major compiler version (11) doesn't have this error.
What exactly causes this error on my machine?
In CUDA 11.2, such usage was not allowed. The programming guide for that version says:
G.4.16.9. __device__/__constant__/__shared__ variables
__device__, __constant__ and __shared__ variables cannot be marked with the keyword constexpr.
In CUDA 11.4 (and beyond) such usage is allowed.
The compiler's behavior changed sometime between CUDA 11.2.0 and CUDA 11.4.0.
Presumably the passing machine has a newer CUDA 11 version than yours.

matrix as a parameter in CUDA

I have two problems in CUDA programming.
I want to pass a matrix as a function parameter in a CUDA program. I tried the following. GCC compiles the code, but the NVIDIA CUDA C compiler does not and reports an error. (I have installed CUDA 7.5.)
void printMatrix( size_t rows, size_t cols, int a[][cols] )
and
void printMatrix(int row, int col, int matrix[row][col])
Neither works. It gives an "a parameter is not allowed" error.
Inside the main function I want to declare a matrix
int a[n][n];
where n runs from 1 to 5 (in a for loop). It gives an "expression must have a constant value" error.
Where am I making the error?
I have tried to compile the code from this question with both gcc and nvcc; gcc compiles it and nvcc does not.
Because you are doing this on Windows, nvcc (note that nvcc is not itself a compiler) uses the Visual Studio compiler to compile host code. Visual Studio does not support C99 language features such as variable-length arrays, so you cannot use them in any host code you will compile on Windows in conjunction with CUDA. You will have to rewrite your code without using C99 language features in your host code.
If you were doing this on Linux, you would be using gcc to compile the host code via nvcc, and C99 language features would be available if you provide the correct command line options and pass the file with a .c extension, as you have ably demonstrated in your question. C99 features and CUDA cannot be mixed in a .cu file because CUDA requires a C++ compiler to compile host code containing CUDA language extensions.
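One possible rewrite without C99 features is to store the matrix in a flat row-major buffer and pass a pointer together with the dimensions. The sketch below is plain C++ host code, which is what a .cu file is compiled as; formatMatrix and makeSquare are hypothetical names chosen for illustration, and the fill values are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Format a rows x cols matrix stored row-major in a flat buffer.
// Element (r, c) lives at index r * cols + c.
std::string formatMatrix(std::size_t rows, std::size_t cols, const int* a) {
    assert(a != nullptr || rows * cols == 0);
    std::ostringstream out;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            if (c > 0) out << ' ';
            out << a[r * cols + c];
        }
        out << '\n';
    }
    return out.str();
}

// Build an n x n matrix whose size is only known at run time,
// replacing the variable-length array `int a[n][n]`.
std::vector<int> makeSquare(std::size_t n) {
    std::vector<int> a(n * n);
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = static_cast<int>(i);  // arbitrary fill: 0, 1, 2, ...
    }
    return a;
}
```

Inside the loop over n from 1 to 5, each matrix would be created with makeSquare(n) and printed by streaming formatMatrix(n, n, m.data()) to std::cout. The flat layout also has the advantage that the same buffer can be copied to the device with a single cudaMemcpy.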

using thrust device_vector as global variable

Why does the following code crash at the end of the main?
#include <thrust/device_vector.h>
thrust::device_vector<float4> v;
int main(){
v.resize(1000);
return 0;
}
The error is:
terminate called after throwing an instance of 'thrust::system::system_error'
what(): unspecified driver error
If I use host_vector instead of device_vector, the code runs fine.
Do you think it's a Thrust bug, or am I doing something wrong here?
I tried it on Ubuntu 10.10 with CUDA 4.0 and on Windows 7 with CUDA 6.5.
The Thrust version is 1.7 in both cases.
thanks
The problem is neither a bug in Thrust, nor are you doing something wrong. Rather, this is a limitation of the design of the CUDA runtime API.
The underlying reason for the crash is that the destructor for the thrust::device_vector is called when the variable goes out of scope, which for a global happens after main() returns and after the CUDA runtime API context has been torn down. This will produce a runtime error (probably cudaErrorCudartUnloading) because the process is attempting to call cudaFree after it has already disconnected from the CUDA driver.
I am unaware of a workaround other than not using Thrust device containers declared at global scope in the translation unit that contains main().
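One way to follow that advice while still reaching the data through a global name is to keep only a handle at global scope and create and destroy the container itself while main() is still running. The sketch below shows the lifetime pattern in plain C++ so that it compiles without CUDA: std::vector<float> is a stand-in, and Container, g_v and resizeAndRelease are hypothetical names. In a .cu file the alias would presumably be thrust::device_vector<float4>, and the function body would sit inside main().

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for illustration only; in a .cu file this alias would be
// thrust::device_vector<float4>.
using Container = std::vector<float>;

// Only the handle is global. It is empty at program exit, so no container
// destructor runs after main() has returned.
std::unique_ptr<Container> g_v;

// Creates the container, uses it, and destroys it again before returning.
// Call this (or inline its body) from main().
std::size_t resizeAndRelease(std::size_t n) {
    g_v.reset(new Container());  // allocate after the program has started
    g_v->resize(n);
    const std::size_t size = g_v->size();
    assert(size == n);
    g_v.reset();                 // destroy while the runtime is still alive
    return size;
}
```

The explicit reset() is the important step: it frees the storage before main() returns, so nothing is left for static destruction to clean up.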

complex CUDA kernel in MATLAB

I wrote a CUDA kernel to run via MATLAB, with several cuDoubleComplex pointers. I invoked the kernel with complex double vectors (defined as gpuArray), and got the error message: "unsupported type in argument specification cuDoubleComplex".
How do I make MATLAB recognize this type?
The short answer: you can't.
The GPU computing toolbox documentation lists the supported types for kernels, and that is all your kernel code can contain to compile correctly with the toolbox. You will need to either modify your code to use double2 in place of cuDoubleComplex, or supply MATLAB with compiled PTX code and a function declaration which maps cuDoubleComplex to double2. For example,
__global__ void mykernel(cuDoubleComplex *a) { .. }
would be compiled to PTX using nvcc and then loaded in MATLAB as
k = parallel.gpu.CUDAKernel('mykernel.ptx','double2*');
Either method should work.

assert() in CUDA 5.5

I've just upgraded from CUDA 5.0 to 5.5 and all my VS2012 CUDA projects have stopped compiling due to a problem with assert(). To reproduce the problem, I created a new CUDA 5.5 project in VS 2012, added the code straight from the Programming Guide, and got the same error.
__global__ void testAssert(void)
{
int is_one = 1;
int should_be_one = 0;
// This will have no effect
assert(is_one);
// This will halt kernel execution
assert(should_be_one);
}
This produces the following compiler error:
kernel.cu(22): error : calling a __host__ function("_wassert") from a __global__ function("testAssert") is not allowed
Is there something obvious that I'm missing?
Make sure you are including assert.h, and make sure you are targeting sm_20 or later. Also check that you're not including Windows headers; if you are, try building without them.