How can I execute a host class function in a CUDA kernel? - cuda

I have a genetic algorithm and I'm trying to evaluate a population of chromosomes on the GPU:
class Chromosome
{
int fitness;
int gene(int pos) { .... };
};
class Eval
{
public:
__global__ void doEval(Chromosome *population)
{
....
int jobid = population[tid].gene(X);
population[tid].fitness = Z;
....
}
};
int main()
{
Chromosome *dev_population;
Eval eval;
eval.doEval<<<1,N>>>(dev_population);
}
and I get these errors:
ga3.cu(121): warning: inline qualifier ignored for "global" function
ga3.cu(121): error: illegal combination of memory qualifiers
ga3.cu(323): error: a pointer to a bound function may only be used to call the function
ga3.cu(398): warning: nested comment is not allowed
Where are the problems?
I removed the Eval class, kept only the doEval function, and made gene() __device__ __host__, like this:
__device__ __host__ int gene(int pos)
{....};
__global__ void doEval(Chromosome *population)
{
....
int jobid = population[tid].gene(X);
population[tid].fitness = Z;
....
}
int main()
{
Chromosome *dev_population;
doEval<<<1,N>>>(dev_population);
}
But now I get other errors, and it doesn't compile:
/usr/include/c++/4.6/iomanip(66): error: expected an expression
/usr/include/c++/4.6/iomanip(96): error: expected an expression
/usr/include/c++/4.6/iomanip(127): error: expected an expression
/usr/include/c++/4.6/iomanip(195): error: expected an expression
/usr/include/c++/4.6/iomanip(225): error: expected an expression
5 errors detected in the compilation of "/tmp/tmpxft_00006fe9_00000000-4_ga3.cpp1.ii".

There are two problems here, one soluble, the other one not.
It is illegal in CUDA for a __global__ function (i.e. a kernel) to be defined as a class member function, so doEval can never be defined as a member of Eval. You are free to launch a kernel from a structure or class member function, but a kernel cannot itself be a member function. You will have to redesign this class; there is no workaround.
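A minimal sketch of that redesign, under stated assumptions: the fixed four-gene genome, the sum-of-genes fitness rule, and the names evaluateOne and HOST_DEVICE are hypothetical placeholders, not the asker's code. The kernel is a free function and Eval::run only launches it. The #else branch is a host-only fallback so the file can also be built and checked as plain C++; under nvcc, population must point to device memory.

```cpp
#include <cassert>

#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__  // hypothetical convenience macro
#else
#define HOST_DEVICE                      // plain C++ build: no CUDA qualifiers
#endif

struct Chromosome {
    int fitness;
    int genes[4];  // hypothetical fixed-size genome
    HOST_DEVICE int gene(int pos) const { return genes[pos]; }
};

// Per-individual evaluation, callable from the kernel and from host code.
// "Fitness = sum of genes" is a placeholder rule.
HOST_DEVICE inline void evaluateOne(Chromosome& c) {
    int sum = 0;
    for (int i = 0; i < 4; ++i) sum += c.gene(i);
    c.fitness = sum;
}

#ifdef __CUDACC__
// The kernel is a free function, not a member of Eval.
__global__ void doEval(Chromosome* population, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) evaluateOne(population[tid]);
}
#endif

class Eval {
public:
    // A member function may launch the kernel; it just cannot be the kernel.
    void run(Chromosome* population, int n) {
        if (n <= 0) return;
#ifdef __CUDACC__
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        doEval<<<blocks, threads>>>(population, n);  // population: device pointer
        cudaDeviceSynchronize();
#else
        for (int i = 0; i < n; ++i) evaluateOne(population[i]);  // host fallback
#endif
    }
};
```

Keeping the per-individual logic in a __host__ __device__ function like evaluateOne lets the same code be checked on the CPU and reused by the kernel.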
Any function called from device code must be explicitly marked as a device function, so that it is instantiated and compiled for the device. This applies to both regular functions and class member functions. nvcc treats all functions as host functions unless they are marked otherwise. You can, therefore, fix this error by doing something like the following:
class Chromosome
{
public:
int fitness;
__device__ __host__ int gene(int pos) { .... };
};
Note that every function called by gene must also have a valid device definition for the code to successfully compile.
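A small sketch of that transitive rule, assuming a hypothetical helper wrapIndex that gene() calls: because gene() is compiled for the device, wrapIndex needs the same qualifiers. HOST_DEVICE is a hypothetical convenience macro that expands to __host__ __device__ under nvcc and to nothing in a plain C++ build, so the class can also be checked on the host.

```cpp
#include <cassert>

#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__  // hypothetical convenience macro
#else
#define HOST_DEVICE                      // plain C++ build: qualifiers compile away
#endif

// Hypothetical helper called by gene(). Since gene() can run on the device,
// this helper needs a device definition as well.
HOST_DEVICE inline int wrapIndex(int pos, int n) {
    return ((pos % n) + n) % n;
}

class Chromosome {
public:
    int fitness;
    int genes[8];  // hypothetical fixed-size genome
    // Compiled for both host and device: callable from a kernel and from host code.
    HOST_DEVICE int gene(int pos) const { return genes[wrapIndex(pos, 8)]; }
};
```

Removing HOST_DEVICE from wrapIndex alone would leave gene() depending on a host-only function, which device code cannot call.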

Related

Exposing a non-static member function of a class to ChaiScript

I have a project that implements keyboard macro scripting with ChaiScript. I am writing a class based on Xlib to wrap the Xlib code.
I have a member function that adds a modifier key to an ignored list, because of an Xlib quirk.
How could I do something like the following minimal example?
#include <chaiscript/chaiscript.hpp>
#include <functional>
class MacroEngine{
public:
MacroEngine() = default;
//...
void addIgnoredMod(int modifier){
ignoredMods |= modifier;
}
//...
private:
int ignoredMods = 0;
};
int main(int argc, char *argv[]){
MacroEngine me;
chaiscript::ChaiScript chai;
//...
chai.add(chaiscript::fun(std::bind(&MacroEngine::addIgnoredMod, me, std::placeholders::_1)), "setIgnoredMods");
//...
return 0;
}
I tried std::bind, and it failed with the following error message:
In file included from ../deps/ChaiScript/include/chaiscript/dispatchkit/proxy_functions_detail.hpp:24:0,
from ../deps/ChaiScript/include/chaiscript/dispatchkit/proxy_functions.hpp:27,
from ../deps/ChaiScript/include/chaiscript/dispatchkit/proxy_constructors.hpp:14,
from ../deps/ChaiScript/include/chaiscript/dispatchkit/dispatchkit.hpp:34,
from ../deps/ChaiScript/include/chaiscript/chaiscript_basic.hpp:12,
from ../deps/ChaiScript/include/chaiscript/chaiscript.hpp:823,
from ../src/main.cpp:2:
../deps/ChaiScript/include/chaiscript/dispatchkit/callable_traits.hpp: In instantiation of ‘struct chaiscript::dispatch::detail::Callable_Traits<std::_Bind<void (MacroEngine::*(MacroEngine, std::_Placeholder<1>))(unsigned int)> >’:
../deps/ChaiScript/include/chaiscript/language/../dispatchkit/register_function.hpp:45:72: required from ‘chaiscript::Proxy_Function chaiscript::fun(const T&) [with T = std::_Bind<void (MacroEngine::*(MacroEngine, std::_Placeholder<1>))(unsigned int)>; chaiscript::Proxy_Function = std::shared_ptr<chaiscript::dispatch::Proxy_Function_Base>]’
../src/main.cpp:21:95: required from here
../deps/ChaiScript/include/chaiscript/dispatchkit/callable_traits.hpp:99:84: error: decltype cannot resolve address of overloaded function
typedef typename Function_Signature<decltype(&T::operator())>::Signature Signature;
^~~~~~~~~
../deps/ChaiScript/include/chaiscript/dispatchkit/callable_traits.hpp:100:86: error: decltype cannot resolve address of overloaded function
typedef typename Function_Signature<decltype(&T::operator())>::Return_Type Return_Type;
^~~~~~~~~~~
I also tried making the variable static, which worked, but that won't work if I want to ignore modifiers on a per-hotkey basis.
What am I doing wrong, and how can I fix it?
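ChaiScript deduces a callable's signature from &T::operator(), and the object returned by std::bind has an overloaded call operator, hence "decltype cannot resolve address of overloaded function". You can do this instead: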
chai.add(chaiscript::fun(&MacroEngine::addIgnoredMod, &me), "addIgnoredMod");
Or use a lambda:
chai.add(chaiscript::fun([&me](int modifier){ me.addIgnoredMod(modifier); }), "addIgnoredMod");
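Separately from the signature-deduction error, note that std::bind(&MacroEngine::addIgnoredMod, me, std::placeholders::_1) stores a copy of me, so even if it compiled, the script would modify that copy rather than the original engine. A minimal plain C++ check with no ChaiScript involved (the ignored() accessor and the two helper functions are hypothetical, added only for the demonstration):

```cpp
#include <cassert>
#include <functional>

class MacroEngine {
public:
    void addIgnoredMod(int modifier) { ignoredMods |= modifier; }
    int ignored() const { return ignoredMods; }  // hypothetical accessor for the check
private:
    int ignoredMods = 0;
};

// Hypothetical helper: std::bind with `me` stores a copy of the engine.
inline int viaBindByValue(MacroEngine& me, int mod) {
    auto bound = std::bind(&MacroEngine::addIgnoredMod, me, std::placeholders::_1);
    bound(mod);           // modifies the copy held inside `bound`
    return me.ignored();  // the original is unchanged
}

// Hypothetical helper: a lambda capturing by reference updates the original.
inline int viaLambdaByRef(MacroEngine& me, int mod) {
    auto add = [&me](int modifier) { me.addIgnoredMod(modifier); };
    add(mod);
    return me.ignored();
}
```

Binding std::ref(me) would also avoid the copy, but the result still has the overloaded call operator that ChaiScript cannot resolve, so the member-pointer or lambda forms remain the simpler choice.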
Jason Turner, the creator of ChaiScript, commented on this here: http://discourse.chaiscript.com/t/may-i-use-std-bind/244/4
"There’s really never any good reason to use std::bind. I much better solution is to use a lambda (and by much better, I mean much much better. std::bind adds to compile size, compile time and runtime)."

C++11: Why result_of can accept functor type as lvalue_reference, but not function type as lvalue_reference?

I've got the program below:
#include<type_traits>
#include<iostream>
using namespace std;
template <class F, class R = typename result_of<F()>::type>
R call(F& f) { return f(); }
struct S {
double operator()(){return 0.0;}
};
int f(){return 1;}
int main()
{
S obj;
call(obj);//ok
call(f);//error!
return 0;
}
It fails to compile on the line "call(f)".
It's strange that "call(obj)" is OK.
(1) I have a similar post in another thread, "C++11 result_of deducing my function type failed", but it doesn't explain why function objects are OK while functions are not.
(2) I'm not sure if this is related to "R call(F& f)": can a function type not be used as an lvalue?
(3) As far as I know, any token with a name, like a variable or function, should be considered an lvalue. And in the case of a function parameter, the compiler should "decay" my function name "f" to a function pointer, right?
(4) This is like an array decaying when it is passed to a function. And a function pointer can be an lvalue, so what's wrong with "call(F& f)"?
Could you explain why this happens in my case, and where I went wrong?
Thanks.
The problem with call(f) is that you deduce F as a function type, so it doesn't decay to a function pointer. Instead you get a reference to a function. Then the result_of<F()> expression is invalid, because F() is int()(), i.e. a function that returns a function, which is not a valid type in C++ (functions can return pointers to functions, or references to functions, but not functions).
It will work if you use result_of<F&()> which is more accurate anyway, because that's how you're calling the callable object. Inside call(F& f) you do f() and in that context f is an lvalue, so you should ask what the result of invoking an lvalue F with no arguments is, otherwise you could get the wrong answer. Consider:
struct S {
double operator()()& {return 0.0;}
void operator()()&& { }
};
Now result_of<F()>::type is void, which is not the answer you want.
If you use result_of<F&()> then you get the right answer, and it also works when F is a function type, so call(f) works too.
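A minimal sketch of the corrected template with result_of<F&()>, checked against both the plain function f and the ref-qualified S from this answer. The static_asserts show what result_of reports for an lvalue versus an rvalue S.

```cpp
#include <cassert>
#include <type_traits>

// Ask what invoking an *lvalue* F with no arguments yields: F&().
template <class F, class R = typename std::result_of<F&()>::type>
R call(F& f) { return f(); }

struct S {
    double operator()() & { return 0.0; }  // chosen when S is an lvalue
    void operator()() && { }               // chosen when S is an rvalue
};

inline int f() { return 1; }

// What result_of reports for each value category of S.
static_assert(std::is_same<std::result_of<S&()>::type, double>::value,
              "invoking an lvalue S yields double");
static_assert(std::is_same<std::result_of<S()>::type, void>::value,
              "invoking an rvalue S yields void");
```

std::result_of is deprecated in C++17 and removed in C++20; in newer code, std::invoke_result_t<F&> expresses the same query.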
(3) As far as I know, any token with a name, like a variable or function, should be considered an lvalue. And in the case of a function parameter, the compiler should "decay" my function name "f" to a function pointer, right?
No, see above. Your call(F&) function takes its argument by reference, so there is no decay.
(4) This is like an array decaying when it is passed to a function. And a function pointer can be an lvalue, so what's wrong with "call(F& f)"?
Arrays don't decay when you pass them by reference either.
If you want the argument to decay then you should write call(F f) not call(F& f). But even if you do that you still need to use result_of correctly to get the result of f() where f is an lvalue.
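A small sketch of the decay rules described here; byRefKeepsArray, byValueDecays and callByValue are hypothetical helper names. Taking the parameter by reference keeps the array or function type, while taking it by value decays it to a pointer, and callByValue still asks result_of about an lvalue F.

```cpp
#include <cassert>
#include <type_traits>

// By reference: T is deduced as the array type itself (e.g. int[3]).
template <class T>
constexpr bool byRefKeepsArray(T&) { return std::is_array<T>::value; }

// By value: arrays and functions decay, so T is deduced as a pointer type.
template <class T>
constexpr bool byValueDecays(T) { return std::is_pointer<T>::value; }

// By value plus result_of<F&()>: F is int(*)() for a plain function.
template <class F, class R = typename std::result_of<F&()>::type>
R callByValue(F f) { return f(); }

inline int g() { return 1; }
```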

How do I cast C++/CX runtime object to native C pointer type?

I am writing a C++/CX runtime wrapper, and I need to pass a C++/CX object pointer to native C. How do I do that, and how do I convert the native pointer back to a C++/CX object reference?
void XClassA::do(XClass ^ B)
{
void * ptr = (void*)(B); // how to convert it?
}
Also, C++/CX uses reference counting. If I cast the object reference to a native pointer, how do I manage the pointer's lifetime?
Update (requested by @Hans Passant)
Background of the question:
Native C
I am trying to use C++/CX to wrap a native C library (not C++) as a Windows Runtime Component. The native C library has many callback functions, declared like the following,
for example,
//declare in native c
typedef int (*GetData)(void *, char* arg1, size_t arg2);
The void * is a pointer to an object instance,
and the callback is executed by the native C code at runtime.
We expect the application (C#, C++/CX, ...) to implement the method.
WinRT wrapper (C++/CX)
My idea is the following:
(1) Provide an interface to the application:
// XRtWrapperNamespace
public interface class XWinRtDataWrapper
{
//declare in base class
void getData(IVector<byte> ^ data);
};
to let the application implement the function. As I cannot export native data types, I use IVector to get data from the application.
(2) Declare a global callback function that converts IVector<byte>^ to the native char * type, like the following:
// when Native C executes callback function,
// it will forward in the method in C++/CX.
// The method calls the implementation method via object pointer.
// (And here is my question)
void XRtWrapperNamespace::callbackWrapper(void * ptr, char *, int length)
{
// create Vector to save "out" data
auto data = ref new Vector<byte>();
// I expect I could call the implementation from Application.
ptr->getData(data); // bad example.
// convert IVector data to char *
// ...
}
My question is:
How do I keep a Windows Runtime object reference in native C?
It looks impossible, but is there any way to do it?
Application (example)
//Application
public ref class XAppData: public XWinRtDataWrapper
{
public:
virtual void getData(IVector<byte> ^ data)
{
//implementation here
}
};
You are not on the right track. I'll assume you #include a C header in your component:
extern "C" {
#include "native.h"
}
And this header contains:
typedef int (* GetData)(void* buffer, int buflen);
void initialize(GetData callback);
Here the initialize() function must be called to initialize the C code and set the callback function pointer. I'll also assume you want the client code to write directly into buffer, whose allocated size is buflen. Some sort of error indication would be useful, as would letting the client code report how many bytes it actually wrote into the buffer; hence the int return value.
The equivalent of function pointers in WinRT are delegates. So you'll want to declare one that matches your C function pointer in functionality. In your .cpp file write:
using namespace Platform;
namespace YourNamespace {
public delegate int GetDataDelegate(WriteOnlyArray<byte>^ buffer);
// More here...
}
There are two basic ways to let the client code use the delegate. You can add a method that lets the client set the delegate, equivalent to the way initialize() works. Or you can raise an event, the more WinRT-centric way. I'll use an event. Note that instancing is an issue: there is no decent mapping from multiple component objects to a single C function pointer. I'll gloss over this by declaring the event static. Writing the ref class declaration:
public ref class MyComponent sealed
{
public:
MyComponent();
static event GetDataDelegate^ GetData;
private:
static int GetDataImpl(void* buffer, int buflen);
};
The class constructor needs to initialize the C code:
MyComponent::MyComponent() {
initialize(GetDataImpl);
}
And we need the little adapter method that makes the C callback raise the event so the client code can fill the buffer:
int MyComponent::GetDataImpl(void* buffer, int buflen) {
return GetData(ArrayReference<byte>((byte*)buffer, buflen));
}
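This answer avoids per-instance state by making the event static. When the C API does carry an opaque context pointer, as the question's GetData typedef does, the usual native-side pattern is a static trampoline that casts the void* back to the object and forwards the call. A plain C++ sketch of that pattern (not C++/CX; GetDataFn, FakeNativeLib and DataSource are hypothetical stand-ins for the C library and the wrapper):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C-style callback type modelled on the question's GetData:
// the first argument is an opaque context pointer chosen at registration.
typedef int (*GetDataFn)(void* context, char* buffer, std::size_t length);

// Hypothetical stand-in for the native C library: it stores the callback
// and its context, and invokes them later.
struct FakeNativeLib {
    GetDataFn callback = nullptr;
    void* context = nullptr;
    int requestData(char* buffer, std::size_t length) {
        return callback(context, buffer, length);
    }
};

// Hypothetical native object that actually produces the data.
class DataSource {
public:
    explicit DataSource(char fill) : fill_(fill) {}
    int getData(char* buffer, std::size_t length) {
        for (std::size_t i = 0; i < length; ++i) buffer[i] = fill_;
        return static_cast<int>(length);  // number of bytes written
    }
    // Static member with a C-compatible shape: recover the object from the
    // context pointer and forward the call to it.
    static int trampoline(void* context, char* buffer, std::size_t length) {
        return static_cast<DataSource*>(context)->getData(buffer, length);
    }
private:
    char fill_;
};
```

The context pointer does not keep the object alive: whoever registers it must guarantee the object outlives the registration. For a C++/CX ref object, that means holding a handle to it on the C++/CX side for as long as the C code may call back.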

Operator overloading in CUDA

I successfully created an operator+ for two float4 values like this:
__device__ float4 operator+(float4 a, float4 b) {
// ...
}
However, if I also define an operator+ for uchar4 in the same way, I get the following error:
error: more than one instance of overloaded function "operator+" has "C" linkage
I get a similar error message when I declare multiple functions with the same name but different arguments.
So, two questions:
Polymorphism: Is it possible to have multiple functions with the same name and different arguments in CUDA? If so, why do I get this error message?
operator+ for float4: it seems this is already provided by "cutil_math.h", but when I include it (#include <cutil_math.h>) the compiler complains that there is no such file or directory. Is there anything particular I should do? Note: I am using PyCUDA, which is CUDA for Python.
Thanks!
Note the "has "C" linkage" in the error. You are compiling your code with C linkage (PyCUDA does this by default to circumvent symbol mangling issues), and C++ does not allow more than one function with the same name to have C linkage.
The solution is to compile the code without the automatically generated extern "C" and explicitly specify C linkage only for kernels. So your code would look something like:
__device__ float4 operator+(float4 a, float4 b) { ... };
extern "C"
__global__ void kernel() { };
rather than the wrapper PyCUDA emits by default:
extern "C"
{
__device__ float4 operator+(float4 a, float4 b) { ... };
__global__ void kernel() { };
}
pycuda.compiler.SourceModule has a no_extern_c option, which controls whether the just-in-time compilation system emits the extern "C" wrapper.
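A sketch of that layout in one translation unit. The two operator+ overloads keep C++ linkage, so name mangling distinguishes them, and only the kernel is declared extern "C". HOST_DEVICE is a hypothetical convenience macro, and the float4/uchar4 structs in the #else branch are host-only stand-ins for CUDA's built-in vector types, so the operators can also be checked with a plain C++ compiler. The uchar4 overload simply wraps modulo 256, which is one possible choice, not the only one.

```cpp
#include <cassert>

#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__  // hypothetical convenience macro
#else
#define HOST_DEVICE
// Host-only stand-ins for CUDA's built-in vector types.
struct float4 { float x, y, z, w; };
struct uchar4 { unsigned char x, y, z, w; };
#endif

// C++ linkage (no extern "C"): name mangling gives each overload its own
// symbol, so both operator+ definitions can coexist.
HOST_DEVICE inline float4 operator+(float4 a, float4 b) {
    float4 r = {a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
    return r;
}

HOST_DEVICE inline uchar4 operator+(uchar4 a, uchar4 b) {
    // Each component wraps modulo 256 (one possible choice).
    uchar4 r = {static_cast<unsigned char>(a.x + b.x),
                static_cast<unsigned char>(a.y + b.y),
                static_cast<unsigned char>(a.z + b.z),
                static_cast<unsigned char>(a.w + b.w)};
    return r;
}

#ifdef __CUDACC__
// Only the kernel gets C linkage, so its symbol name is not mangled and
// can be looked up by name (e.g. with PyCUDA's get_function).
extern "C" __global__ void addKernel(const float4* a, const float4* b, float4* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}
#endif
```

When source like this is handed to PyCUDA, it would be compiled with SourceModule(..., no_extern_c=True) so that PyCUDA does not wrap everything in its own extern "C" block.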

Cannot overload make_uint4 function

I'm trying to overload make_uint4 in the following manner:
namespace A {
namespace B {
inline __host__ __device__ uint4 make_uint4(uint2 a, uint2 b) {
return make_uint4(a.x, a.y, b.x, b.y);
}
}
}
But when I try to compile it, nvcc reports these errors:
error: no suitable constructor exists to convert from "unsigned int" to "uint2"
error: no suitable constructor exists to convert from "unsigned int" to "uint2"
error: too many arguments in function call
All these errors point to the "return…" line.
I was able to get a partial repro on VS 2010 and CUDA 4.0 (the compiler built the code OK, but IntelliSense flagged the error you are seeing). Try the following:
#include "vector_functions.h"
inline __host__ __device__ uint4 make_uint4(uint2 a, uint2 b)
{
return ::make_uint4(a.x, a.y, b.x, b.y);
}
This fixed it for me.
I have no problem compiling it with Visual Studio + nvcc. What compiler are you using?
In case it helps: make_uint4 is defined in vector_functions.h, line 170, as
static __inline__ __host__ __device__ uint4 make_uint4(unsigned int x, unsigned int y, unsigned int z, unsigned int w)
{
uint4 t; t.x = x; t.y = y; t.z = z; t.w = w; return t;
}
Update:
I get a similar error when I try to overload the function inside my own namespace. Are you certain you are not inside one? If you are, try putting :: in front of the function call to refer to the global scope, i.e.:
return ::make_uint4(a.x, a.y, b.x, b.y);
I don't have the library code, but it looks like a name-hiding problem: your new make_uint4(va, vb), declared inside your namespace, shadows (hides) the global make_uint4(a,b,c,d), so the call inside it tries to use the two-argument version with 4 uint parameters. That doesn't work because there is no conversion from uint to uint2 (as indicated by the first two error messages) and there are 4 instead of 2 arguments (the last error message).
Use a slightly different function name like make_uint4_from_uint2s and you'll be fine.
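A sketch of the two lookup fixes discussed in this thread, assuming the overload lives in namespace A::B as in the question: qualify the call with ::, or (shown in a hypothetical namespace C) add a using-declaration that brings the global make_uint4 into the namespace so both overloads are visible. The uint2/uint4 structs and the four-argument make_uint4 in the #else branch are host-only stand-ins for the definitions in vector_functions.h, and HOST_DEVICE is a hypothetical macro, so the lookup behaviour can be checked with a plain C++ compiler.

```cpp
#include <cassert>

#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__  // hypothetical convenience macro
#else
#define HOST_DEVICE
// Host-only stand-ins for CUDA's uint2/uint4 and the global make_uint4.
struct uint2 { unsigned int x, y; };
struct uint4 { unsigned int x, y, z, w; };
inline uint4 make_uint4(unsigned int x, unsigned int y, unsigned int z, unsigned int w) {
    uint4 t; t.x = x; t.y = y; t.z = z; t.w = w; return t;
}
#endif

namespace A {
namespace B {
// Inside A::B an unqualified make_uint4 would find only this overload
// (name hiding), so the call is qualified with :: to reach the global one.
HOST_DEVICE inline uint4 make_uint4(uint2 a, uint2 b) {
    return ::make_uint4(a.x, a.y, b.x, b.y);
}
}  // namespace B
}  // namespace A

namespace C {
using ::make_uint4;  // bring the global four-argument overload into scope
HOST_DEVICE inline uint4 make_uint4(uint2 a, uint2 b) {
    return make_uint4(a.x, a.y, b.x, b.y);  // unqualified: both overloads visible
}
}  // namespace C
```

Renaming the overload, as suggested here, avoids the lookup question altogether.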