Cannot overload make_uint4 function - cuda

I'm trying to overload make_uint4 in the following manner:
namespace A {
namespace B {
inline __host__ __device__ uint4 make_uint4(uint2 a, uint2 b) {
return make_uint4(a.x, a.y, b.x, b.y);
}
}
}
But when I try to compile it, nvcc returns an error:
error: no suitable constructor exists to convert from "unsigned int" to "uint2"
error: no suitable constructor exists to convert from "unsigned int" to "uint2"
error: too many arguments in function call
All these errors point to the "return…" line.

I was able to get a partial repro on VS 2010 and CUDA 4.0 (the compiler built the code OK but Intellisense flagged the error you are seeing). Try the following:
#include "vector_functions.h"
inline __host__ __device__ uint4 make_uint4(uint2 a, uint2 b)
{
return ::make_uint4(a.x, a.y, b.x, b.y);
}
This fixed it for me.

I have no problem compiling it in Visual Studio+nvcc. What compiler are you using?
If that would be of any help: make_uint4 is defined in vector_functions.h, line 170 as
static __inline__ __host__ __device__ uint4 make_uint4(unsigned int x, unsigned int y, unsigned int z, unsigned int w)
{
uint4 t; t.x = x; t.y = y; t.z = z; t.w = w; return t;
}
Update:
I get a similar error when I try to overload the function from inside a custom namespace. Are you certain you are not inside one? If you are, try putting :: in front of the function call to refer to the global scope, i.e.:
return ::make_uint4(a.x, a.y, b.x, b.y);

I don't have the library code, but this looks like ordinary C++ name hiding rather than a restriction on device functions. Your new make_uint4(va, vb), declared inside the namespace, shadows (hides) the global make_uint4(a, b, c, d), so the unqualified call in the function body only considers your two-argument version and tries to call it with 4 unsigned int arguments. That doesn't work because there is no conversion from unsigned int to uint2 (as indicated by the first two error messages) and there are 4 arguments instead of 2 (the last error message).
Use a slightly different function name like make_uint4_from_uint2s and you'll be fine.
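The hiding described above can be reproduced in plain C++ without CUDA. The following is a minimal sketch, assuming hypothetical stand-in types UInt2 and UInt4 and a hypothetical function name make_vec4 in place of CUDA's uint2, uint4 and make_uint4; it is meant to illustrate the lookup rule, not to mirror the CUDA headers.

```cpp
#include <cassert>

// Hypothetical stand-ins for CUDA's uint2/uint4 so this builds as plain C++.
struct UInt2 { unsigned x, y; };
struct UInt4 { unsigned x, y, z, w; };

// Global four-argument builder, playing the role of the global make_uint4.
UInt4 make_vec4(unsigned x, unsigned y, unsigned z, unsigned w) {
    UInt4 t;
    t.x = x; t.y = y; t.z = z; t.w = w;
    return t;
}

namespace A {
namespace B {
// Declaring the same name here hides the global make_vec4 for
// unqualified calls made inside A::B.
UInt4 make_vec4(UInt2 a, UInt2 b) {
    // The leading :: starts lookup at global scope. Without it, only the
    // two-argument version above is considered and the call fails to compile.
    return ::make_vec4(a.x, a.y, b.x, b.y);
}
}  // namespace B
}  // namespace A

// Small self-check of the namespaced overload.
void check_make_vec4() {
    UInt2 a = {1u, 2u};
    UInt2 b = {3u, 4u};
    UInt4 r = A::B::make_vec4(a, b);
    assert(r.x == 1u && r.y == 2u && r.z == 3u && r.w == 4u);
}
```

Removing the :: from the return statement should reproduce the same kind of diagnostics as in the question (no conversion from unsigned int to the two-component type, too many arguments). A using ::make_vec4; declaration inside namespace B, placed before the overload, is another way to bring the global version back into the overload set.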

Related

Why does this program produce output only when extern is used? Why not without it?

#include<stdio.h>
int main ()
{
printf("%d\n",z);
return 0;
}
int z=25;
Why does this code produce an error?
The order in which you declare your functions and variables matters in C. When the compiler parses your code, it encounters the symbol z before it has been declared.
So you need to put int z = ... before the first use of z, hence before main.
The extern keyword tells the compiler that the variable is defined elsewhere, either later in this file or in another file, so the reference is resolved during linking, i.e. when all the object files are assembled into a program. While compiling this file, the compiler therefore accepts z without having seen its definition => no compilation error.
Try this :
#include<stdio.h>
int z=25;
int main ()
{
printf("%d\n",z);
return 0;
}
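For comparison, the extern variant from the question can be sketched as follows. This is a minimal illustration written as C++ (the same declaration order works in C); the helper names read_z, print_z and check_z are hypothetical and only there to keep the example testable.

```cpp
#include <cassert>
#include <cstdio>

// Declaration only: states that an int named z is defined somewhere,
// either later in this file or in another translation unit.
extern int z;

// z can be used here because it has been declared, even though its
// definition has not been seen yet.
int read_z() {
    return z;
}

void print_z() {
    std::printf("%d\n", read_z());
}

// The definition. This is where storage for z is actually created.
int z = 25;

// Small self-check: the earlier declaration and this definition
// refer to the same variable.
void check_z() {
    assert(read_z() == 25);
}
```

If the definition int z = 25; were removed and no other file provided one, compilation of this file would still succeed, but linking would be expected to fail with an unresolved symbol.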

How can I execute a host class function in a CUDA kernel

I have a genetic algorithm and I'm trying to evaluate a population of chromosomes on the GPU:
class chromosome
{
int fitness;
int gene(int pos) { .... };
};
class eval
{
public :
__global__ doEval(Chromosome *population)
{
....
int jobid = population[tid].gene(X);
population[tid].fitness = Z;
....
}
};
int main()
{
Chromosome *dev_population;
Eval eval;
eval.doEval<<<1,N>>>(dev_population);
}
and I get these errors:
ga3.cu(121): warning: inline qualifier ignored for "global" function
ga3.cu(121): error: illegal combination of memory qualifiers
ga3.cu(323): error: a pointer to a bound function may only be used to call the function
ga3.cu(398): warning: nested comment is not allowed
Where are the problems?
I removed the Eval class, leaving only the doEval function, and made gene() a device/host function, like this:
__device__ __host__ gene()
{....};
__global__ doEval(Chromosome *population)
{
....
int jobid = population[tid].gene(X);
population[tid].fitness = Z;
....
}
int main()
{
Chromosome *dev_population;
doEval<<<1,N>>>(dev_population);
}
but now I get other errors, and it doesn't compile:
/usr/include/c++/4.6/iomanip(66): error: expected an expression
/usr/include/c++/4.6/iomanip(96): error: expected an expression
/usr/include/c++/4.6/iomanip(127): error: expected an expression
/usr/include/c++/4.6/iomanip(195): error: expected an expression
/usr/include/c++/4.6/iomanip(225): error: expected an expression
5 errors detected in the compilation of "/tmp/tmpxft_00006fe9_00000000-4_ga3.cpp1.ii".
There are two problems here, one soluble, the other one not.
It is illegal in CUDA for a __global__ function (i.e. a kernel) to be defined as a class member function. So doEval can never be defined as a member of eval. You are free to launch a kernel from a structure or class member function, but a kernel cannot itself be a member function. You will have to redesign this class; there is no workaround.
Any function called from device code must be explicitly denoted as a device function and be instantiated and compiled for the device. This applies to both regular functions and class member functions. All functions are treated by nvcc as host functions unless identified otherwise. You can, therefore, fix this error by doing something like the following:
class chromosome
{
int fitness;
__device__ __host__ int gene(int pos) { .... };
};
Note that every function called by gene must also have a valid device definition for the code to successfully compile.

Overloading the CUDA shuffle function makes the original ones invisible

I'm trying to implement my own 64-bit shuffle function in CUDA. However, if I do it like this:
static __inline__ __device__ double __shfl_xor(double var, int laneMask, int width=warpSize)
{
int hi, lo;
asm volatile( "mov.b64 { %0, %1 }, %2;" : "=r"(lo), "=r"(hi) : "d"(var) );
hi = __shfl_xor( hi, laneMask, width );
lo = __shfl_xor( lo, laneMask, width );
return __hiloint2double( hi, lo );
}
All subsequent calls to __shfl_xor will be instantiated from this 64-bit version, no matter what the type of the argument is. For example, if I am doing
int a;
a = __shfl_xor( a, 16 );
It would still use the double version.
A work-around might be using different function names. But since I'm calling this shuffle function from a template function, using different names means that I have to make a different version for 64-bit floating points, which is not quite neat.
So how can I overload the __shfl_xor(double,...) function while at the same time making sure that __shfl_xor(int,...) is still called where appropriate?
All integral types and float can be implicitly converted to double. When given a choice between the built-in function and your specialized double function, the compiler here might be picking yours for all types.
Have you tried creating a function with a different name, with one overload for your specialized double variant and thin pass-through overloads for the other types?
For example:
static __inline__ __device__ double foo_shfl_xor(double var, int laneMask, int width=warpSize)
{
// Your double shuffle implementation
}
static __inline__ __device__ int foo_shfl_xor(int var, int laneMask, int width=warpSize)
{
// For every non-double data type you use
// Just call the original shuffle function
return __shfl_xor(var, laneMask, width);
}
// Your code that uses shuffle
double d;
int a;
foo_shfl_xor(d, ...); // Calls your custom shuffle
foo_shfl_xor(a, ...); // Calls default shuffle
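Since the question calls the shuffle from a template function, here is a host-only sketch of how such a wrapper overload set dispatches by argument type. Everything in it is an assumption made for illustration: builtin_shfl_xor is a hypothetical stand-in for the real intrinsic that simply returns its input (a real shuffle exchanges values between lanes), my_shfl_xor and exchange are made-up names, and the call counters exist only to make the dispatch observable.

```cpp
#include <cassert>
#include <cstring>

static_assert(sizeof(double) == 2 * sizeof(int),
              "sketch assumes a 64-bit double and a 32-bit int");

// Hypothetical stand-in for the built-in 32-bit shuffle. It returns its
// input unchanged so that the example can run on the host.
int builtin_shfl_xor(int var, int laneMask) {
    (void)laneMask;
    return var;
}

// Counters used only to observe which overload was selected.
int int_calls = 0;
int double_calls = 0;

// Pass-through overload for int.
int my_shfl_xor(int var, int laneMask) {
    ++int_calls;
    return builtin_shfl_xor(var, laneMask);
}

// 64-bit overload: shuffle the two 32-bit halves separately.
double my_shfl_xor(double var, int laneMask) {
    ++double_calls;
    int halves[2];
    std::memcpy(halves, &var, sizeof var);
    halves[0] = builtin_shfl_xor(halves[0], laneMask);
    halves[1] = builtin_shfl_xor(halves[1], laneMask);
    std::memcpy(&var, halves, sizeof var);
    return var;
}

// Generic caller: overload resolution picks the wrapper matching T.
template <typename T>
T exchange(T value, int laneMask) {
    return my_shfl_xor(value, laneMask);
}

// Small self-check that each type reaches its own overload.
void check_dispatch() {
    int i0 = int_calls;
    int d0 = double_calls;
    assert(exchange(7, 16) == 7);
    assert(int_calls == i0 + 1 && double_calls == d0);
    assert(exchange(2.5, 16) == 2.5);
    assert(int_calls == i0 + 1 && double_calls == d0 + 1);
}
```

Because both wrappers are declared in the same scope, an int argument is an exact match for the int overload and never reaches the double one. Note that in this sketch a float argument would still select the double overload (floating-point promotion), so a separate float pass-through would be needed if float is used.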

operator overloading in Cuda

I successfully created an operator+ between two float4 by doing :
__device__ float4 operator+(float4 a, float4 b) {
// ...
}
However, if in addition I want an operator+ for uchar4 and do the same thing with uchar4, I get the following error:
error: more than one instance of overloaded function "operator+" has "C" linkage
I get a similar error message when I declare multiple functions with the same name but different arguments.
So, two questions :
Polymorphism: Is it possible to have multiple functions with the same name and different arguments in CUDA? If so, why do I get this error message?
operator+ for float4: it seems that this feature is already provided by "cutil_math.h", but when I include it (#include <cutil_math.h>) the compiler complains that there is no such file or directory. Is there anything particular I should do? Note: I am using PyCUDA, a Python wrapper for CUDA.
Thanks!
Note the "has "C" linkage" in the error. You are compiling your code with C linkage (pyCUDA does this by default to circumvent symbol mangling issues). C++ can't support multiple definitions of the same function name using C linkage.
The solution is to compile the code without the automatically generated extern "C", and to explicitly specify C linkage only for kernels. So your code would look something like:
__device__ float4 operator+(float4 a, float4 b) { ... };
extern "C"
__global__ void kernel() { };
rather than the standard pyCUDA emitted:
extern "C"
{
__device__ float4 operator+(float4 a, float4 b) { ... };
__global__ void kernel() { };
}
pycuda.compiler.SourceModule has an option no_extern_c which can be used to control whether extern "C" is emitted by the just in time compilation system or not.
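The linkage rule itself can be checked on the host with plain C++. The sketch below assumes hypothetical stand-in types Float4 and UChar4 for CUDA's float4 and uchar4, and a made-up entry point sum_components in the role of the kernel; it shows overloads with C++ linkage coexisting next to a single function with C linkage.

```cpp
#include <cassert>

// Hypothetical stand-ins for CUDA's float4 and uchar4.
struct Float4 { float x, y, z, w; };
struct UChar4 { unsigned char x, y, z, w; };

// C++ linkage: the mangled symbol names encode the parameter types,
// so both overloads of operator+ can coexist.
Float4 operator+(Float4 a, Float4 b) {
    Float4 r = {a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
    return r;
}

UChar4 operator+(UChar4 a, UChar4 b) {
    UChar4 r;
    r.x = static_cast<unsigned char>(a.x + b.x);
    r.y = static_cast<unsigned char>(a.y + b.y);
    r.z = static_cast<unsigned char>(a.z + b.z);
    r.w = static_cast<unsigned char>(a.w + b.w);
    return r;
}

// Only the entry point gets C linkage, so its symbol name stays unmangled.
// At most one function with a given name may have C linkage.
extern "C" float sum_components(float a, float b) {
    Float4 va = {a, a, a, a};
    Float4 vb = {b, b, b, b};
    Float4 s = va + vb;  // resolved to the Float4 overload
    return s.x + s.y + s.z + s.w;
}

// Small self-check covering both overloads and the C-linkage entry point.
void check_overloads() {
    UChar4 a = {1, 2, 3, 4};
    UChar4 b = {10, 20, 30, 40};
    UChar4 r = a + b;
    assert(r.x == 11 && r.y == 22 && r.z == 33 && r.w == 44);
    assert(sum_components(1.0f, 2.0f) == 12.0f);
}
```

Moving both operator+ definitions inside a single extern "C" { ... } block should trigger a diagnostic of the same kind as the one quoted in the question.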

C++: Explicit DLL Loading: First-chance Exception on non "extern C" functions

I am having trouble importing my C++ functions. If I declare them as C functions I can import them successfully. With explicit loading, if any of the functions is missing the extern "C" decoration, I get the following exception:
First-chance exception at 0x00000000 in cpp.exe: 0xC0000005: Access violation.
DLL.h:
extern "C" __declspec(dllimport) int addC(int a, int b);
__declspec(dllimport) int addCpp(int a, int b);
DLL.cpp:
#include "DLL.h"
int addC(int a, int b) {
return a + b;
}
int addCpp(int a, int b) {
return a + b;
}
main.cpp:
#include "../DLL/DLL.h"
#include <stdio.h>
#include <windows.h>
int main() {
int a = 2;
int b = 1;
typedef int (*PFNaddC)(int,int);
typedef int (*PFNaddCpp)(int,int);
HMODULE hDLL = LoadLibrary(TEXT("../Debug/DLL.dll"));
if (hDLL != NULL)
{
PFNaddC pfnAddC = (PFNaddC)GetProcAddress(hDLL, "addC");
PFNaddCpp pfnAddCpp = (PFNaddCpp)GetProcAddress(hDLL, "addCpp");
printf("a=%d, b=%d\n", a,b);
printf("pfnAddC: %d\n", pfnAddC(a,b));
printf("pfnAddCpp: %d\n", pfnAddCpp(a,b)); //EXCEPTION ON THIS LINE
}
getchar();
return 0;
}
How can I import c++ functions for dynamic loading? I have found that the following code works with implicit loading by referencing the *.lib, but I would like to learn about dynamic loading.
Thank you to all in advance.
Update:
dumpbin /exports
1 00011109 ?addCpp@@YAHHH@Z = @ILT+260(?addCpp@@YAHHH@Z)
2 00011136 addC = @ILT+305(_addC)
Solution:
Create a conversion struct as found here.
Take a look at the file exports and explicitly copy the C++ mangled name:
PFNaddCpp pfnAddCpp = (PFNaddCpp)GetProcAddress(hDLL, "?addCpp@@YAHHH@Z");
The access violation at address 0x00000000 occurs because GetProcAddress() returns NULL when it cannot find the requested symbol, and the code then calls through that null pointer.
The problem is that C++ names are mangled by the compiler to accommodate a variety of C++ features (namespaces, classes, and overloading, among other things). So, your function addCpp() is not really named addCpp() in the resulting library. When you declare the function with extern "C", you give up overloading and the option of putting the function in a namespace, but in return you get a function whose name is not mangled, and which you can call from C code (which doesn't know anything about name mangling.)
One option to get around this is to export the functions using a .def file to rename the exported functions. There's an article, Explicitly Linking to Classes in DLLs, that describes what is necessary to do this.
It's possible to just wrap a whole header file in extern "C" as follows. Then you don't need to worry about forgetting an extern "C" on one of your declarations.
#ifdef __cplusplus
extern "C" {
#endif
__declspec(dllimport) int addC(int a, int b);
__declspec(dllimport) int addCpp(int a, int b);
#ifdef __cplusplus
} /* extern "C" */
#endif
You can still use all of the C++ features that you're used to in the function bodies -- these functions are still C++ functions -- they just have restrictions on the prototypes to make them compatible with C code.
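As a minimal sketch of that last point, the function below has a C-compatible prototype but a body that uses the C++ standard library. The __declspec decoration is left out so that the example builds on any compiler; on Windows it would be added as in the header above. The use of std::vector and std::accumulate is purely illustrative.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Prototype restricted to C-compatible types. With C linkage the exported
// name is not mangled by the C++ compiler.
extern "C" int addCpp(int a, int b);

// The body is still compiled as C++, so containers, templates and
// algorithms are all available inside it.
extern "C" int addCpp(int a, int b) {
    std::vector<int> values;
    values.push_back(a);
    values.push_back(b);
    return std::accumulate(values.begin(), values.end(), 0);
}

// Small self-check using the values from the question's main().
void check_addCpp() {
    assert(addCpp(2, 1) == 3);
}
```

What the C linkage rules out is in the signature, not the body: the function cannot be overloaded, and namespaces do not distinguish its exported name.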