What causes the dynamic initialization of a managed variable in my code? - cuda

I have the following code (godbolt link):
#include <cstdint>
class V
{
private:
class Buffer
{
private:
uint8_t m_buffer[10];
};
Buffer m_values;
std::size_t m_nFilled{0};
};
__managed__ V v;
which fails to compile with the message:
error: dynamic initialization is not supported for a managed
variable
I don't see where the dynamic initialization happens here.
What's interesting is that if I comment out either m_values or m_nFilled or both, it compiles.

Related

How to provide cpp class destructor member information in cython [duplicate]

I'm using the pimpl-idiom with std::unique_ptr:
class window {
window(const rectangle& rect);
private:
class window_impl; // defined elsewhere
std::unique_ptr<window_impl> impl_; // won't compile
};
However, I get a compile error regarding the use of an incomplete type, on line 304 in <memory>:
Invalid application of 'sizeof' to an incomplete type 'uixx::window::window_impl'
For as far as I know, std::unique_ptr should be able to be used with an incomplete type. Is this a bug in libc++ or am I doing something wrong here?
Here are some examples of std::unique_ptr with incomplete types. The problem lies in destruction.
If you use pimpl with unique_ptr, you need to declare a destructor:
class foo
{
class impl;
std::unique_ptr<impl> impl_;
public:
foo(); // You may need a def. constructor to be defined elsewhere
~foo(); // Implement (with {}, or with = default;) where impl is complete
};
because otherwise the compiler generates a default one, and it needs a complete declaration of foo::impl for this.
If you have template constructors, then you're screwed, even if you don't construct the impl_ member:
template <typename T>
foo::foo(T bar)
{
// Here the compiler needs to know how to
// destroy impl_ in case an exception is
// thrown !
}
At namespace scope, using unique_ptr will not work either:
class impl;
std::unique_ptr<impl> impl_;
since the compiler must know here how to destroy this static duration object. A workaround is:
class impl;
struct ptr_impl : std::unique_ptr<impl>
{
~ptr_impl(); // Implement (empty body) elsewhere
} impl_;
As Alexandre C. mentioned, the problem comes down to window's destructor being implicitly defined in places where the type of window_impl is still incomplete. In addition to his solutions, another workaround that I've used is to declare a Deleter functor in the header:
// Foo.h
class FooImpl;
struct FooImplDeleter
{
void operator()(FooImpl *p);
};
class Foo
{
...
private:
std::unique_ptr<FooImpl, FooImplDeleter> impl_;
};
// Foo.cpp
...
void FooImplDeleter::operator()(FooImpl *p)
{
delete p;
}
Note that using a custom Deleter function precludes the use of std::make_unique (available from C++14), as already discussed here.
use a custom deleter
The problem is that unique_ptr<T> must call the destructor T::~T() in its own destructor, its move assignment operator, and unique_ptr::reset() member function (only). However, these must be called (implicitly or explicitly) in several PIMPL situations (already in the outer class's destructor and move assignment operator).
As already pointed out in another answer, one way to avoid that is to move all operations that require unique_ptr::~unique_ptr(), unique_ptr::operator=(unique_ptr&&), and unique_ptr::reset() into the source file where the pimpl helper class is actually defined.
However, this is rather inconvenient and defies the very point of the pimpl idoim to some degree. A much cleaner solution that avoids all that is to use a custom deleter and only move its definition into the source file where the pimple helper class lives. Here is a simple example:
// file.h
class foo
{
struct pimpl;
struct pimpl_deleter { void operator()(pimpl*) const; };
std::unique_ptr<pimpl,pimpl_deleter> m_pimpl;
public:
foo(some data);
foo(foo&&) = default; // no need to define this in file.cc
foo&operator=(foo&&) = default; // no need to define this in file.cc
//foo::~foo() auto-generated: no need to define this in file.cc
};
// file.cc
struct foo::pimpl
{
// lots of complicated code
};
void foo::pimpl_deleter::operator()(foo::pimpl*ptr) const { delete ptr; }
Instead of a separate deleter class, you can also use a free function or static member of foo:
class foo {
struct pimpl;
static void delete_pimpl(pimpl*);
using deleter = void(&)(pimpl*);
std::unique_ptr<pimpl,deleter> m_pimpl;
public:
foo(some data);
};
Probably you have some function bodies within .h file within class that uses incomplete type.
Make sure that within your .h for class window you have only function declaration. All function bodies for window must be in .cpp file. And for window_impl as well...
Btw, you have to explicitly add destructor declaration for windows class in your .h file.
But you CANNOT put empty dtor body in you header file:
class window {
virtual ~window() {};
}
Must be just a declaration:
class window {
virtual ~window();
}
To add to the other's replies about the custom deleter, in our internal "utilities library" I added a helper header to implement this common pattern (std::unique_ptr of an incomplete type, known only to some of the TU to e.g. avoid long compile times or to provide just an opaque handle to clients).
It provides the common scaffolding for this pattern: a custom deleter class that invokes an externally-defined deleter function, a type alias for a unique_ptr with this deleter class, and a macro to declare the deleter function in a TU that has a complete definition of the type. I think that this has some general usefulness, so here it is:
#ifndef CZU_UNIQUE_OPAQUE_HPP
#define CZU_UNIQUE_OPAQUE_HPP
#include <memory>
/**
Helper to define a `std::unique_ptr` that works just with a forward
declaration
The "regular" `std::unique_ptr<T>` requires the full definition of `T` to be
available, as it has to emit calls to `delete` in every TU that may use it.
A workaround to this problem is to have a `std::unique_ptr` with a custom
deleter, which is defined in a TU that knows the full definition of `T`.
This header standardizes and generalizes this trick. The usage is quite
simple:
- everywhere you would have used `std::unique_ptr<T>`, use
`czu::unique_opaque<T>`; it will work just fine with `T` being a forward
declaration;
- in a TU that knows the full definition of `T`, at top level invoke the
macro `CZU_DEFINE_OPAQUE_DELETER`; it will define the custom deleter used
by `czu::unique_opaque<T>`
*/
namespace czu {
template<typename T>
struct opaque_deleter {
void operator()(T *it) {
void opaque_deleter_hook(T *);
opaque_deleter_hook(it);
}
};
template<typename T>
using unique_opaque = std::unique_ptr<T, opaque_deleter<T>>;
}
/// Call at top level in a C++ file to enable type %T to be used in an %unique_opaque<T>
#define CZU_DEFINE_OPAQUE_DELETER(T) namespace czu { void opaque_deleter_hook(T *it) { delete it; } }
#endif
May be not a best solution, but sometimes you may use shared_ptr instead.
If course it's a bit an overkill, but... as for unique_ptr, I'll perhaps wait 10 years more until C++ standard makers will decide to use lambda as a deleter.
Another side.
Per your code it may happen, that on destruction stage window_impl will be incomplete. This could be a reason of undefined behaviour.
See this:
Why, really, deleting an incomplete type is undefined behaviour?
So, if possible I would define a very base object to all your objects, with virtual destructor. And you're almost good. You just should keep in mind that system will call virtual destructor for your pointer, so you should define it for every ancestor. You should also define base class in inheritance section as a virtual (see this for details).
Using extern template
The issue with using std::unique_ptr<T> where T is an incomplete type is that unique_ptr needs to be able to delete an instance of T for various operations. The class unique_ptr uses std::default_delete<T> to delete the instance. Hence, in an ideal world, we would just write
extern template class std::default_delete<T>;
to prevent std::default_delete<T> from being instantiated. Then, declaring
template class std::default_delete<T>;
at a place where T is complete, would instantiate the template.
The issue here is that default_delete actually defines inline methods that will not be instantiated. So, this idea does not work. We can, however, work around this problem.
First, let us define a deleter that does not inline the call operator.
/* --- opaque_ptr.hpp ------------------------------------------------------- */
#ifndef OPAQUE_PTR_HPP_
#define OPAQUE_PTR_HPP_
#include <memory>
template <typename T>
class opaque_delete {
public:
void operator() (T* ptr);
};
// Do not move this method into opaque_delete, or it will be inlined!
template <typename T>
void opaque_delete<T>::operator() (T* ptr) {
std::default_delete<T>()(ptr);
}
Furthermore, for ease of use, define a type opaque_ptr which combines unique_ptr with opaque_delete, and analogously to std::make_unique, we define make_opaque.
/* --- opaque_ptr.hpp cont. ------------------------------------------------- */
template <typename T>
using opaque_ptr = std::unique_ptr<T, opaque_delete<T>>;
template<typename T, typename... Args>
inline opaque_ptr<T> make_opaque(Args&&... args)
{
return opaque_ptr<T>(new T(std::forward<Args>(args)...));
}
#endif
The type opaque_delete can now be used with the extern template construction. Here is an example.
/* --- foo.hpp -------------------------------------------------------------- */
#ifndef FOO_HPP_
#define FOO_HPP_
#include "opaque_ptr.hpp"
class Foo {
public:
Foo(int n);
void print();
private:
struct Impl;
opaque_ptr<Impl> m_ptr;
};
// Do not instantiate opaque_delete.
extern template class opaque_delete<Foo::Impl>;
#endif
Since we prevent opaque_delete from being instantiated this code compiles without errors. To make the linker happy, we instantiate opaque_delete in our foo.cpp.
/* --- foo.cpp -------------------------------------------------------------- */
#include "foo.hpp"
#include <iostream>
struct Foo::Impl {
int n;
};
// Force instantiation of opaque_delete.
template class opaque_delete<Foo::Impl>;
The remaining methods could be implemented as follows.
/* --- foo.cpp cont. -------------------------------------------------------- */
Foo::Foo(int n)
: m_ptr(new Impl)
{
m_ptr->n = n;
}
void Foo::print() {
std::cout << "n = " << m_ptr->n << std::endl;
}
The advantage of this solution is that, once opaque_delete is defined, the required boilerplate code is rather small.

exposing nonstatic member function of class to chaiscript

I have a project that tries to implement keyboard macro scripting with chaiscript. I am writing a class based on xlib to wrap the xlib code.
I have a member function to add a modifier key to an ignored list, because of a xlib quirk.
how could i do something like the following minimal example.
#include <chaiscript/chaiscript.hpp>
#include <functional>
class MacroEngine{
public:
MacroEngine() = default;
//...
void addIgnoredMod(int modifier){
ignoredMods |= modifier;
}
//...
private:
int ignoredMods;
};
int main(int argc, char *argv[]){
MacroEngine me;
chaiscript::ChaiScript chai;
//...
chai.add(chaiscript::fun(std::bind(&MacroEngine::addIgnoredMod, me, std::placeholders::_1)), "setIgnoredMods");
//...
return 0;
}
I tried bind and failed with the following error message:
In file included from ../deps/ChaiScript/include/chaiscript/dispatchkit/proxy_functions_detail.hpp:24:0,
from ../deps/ChaiScript/include/chaiscript/dispatchkit/proxy_functions.hpp:27,
from ../deps/ChaiScript/include/chaiscript/dispatchkit/proxy_constructors.hpp:14,
from ../deps/ChaiScript/include/chaiscript/dispatchkit/dispatchkit.hpp:34,
from ../deps/ChaiScript/include/chaiscript/chaiscript_basic.hpp:12,
from ../deps/ChaiScript/include/chaiscript/chaiscript.hpp:823,
from ../src/main.cpp:2:
../deps/ChaiScript/include/chaiscript/dispatchkit/callable_traits.hpp: In instantiation of ‘struct chaiscript::dispatch::detail::Callable_Traits<std::_Bind<void (MacroEngine::*(MacroEngine, std::_Placeholder<1>))(unsigned int)> >’:
../deps/ChaiScript/include/chaiscript/language/../dispatchkit/register_function.hpp:45:72: required from ‘chaiscript::Proxy_Function chaiscript::fun(const T&) [with T = std::_Bind<void (MacroEngine::*(MacroEngine, std::_Placeholder<1>))(unsigned int)>; chaiscript::Proxy_Function = std::shared_ptr<chaiscript::dispatch::Proxy_Function_Base>]’
../src/main.cpp:21:95: required from here
../deps/ChaiScript/include/chaiscript/dispatchkit/callable_traits.hpp:99:84: error: decltype cannot resolve address of overloaded function
typedef typename Function_Signature<decltype(&T::operator())>::Signature Signature;
^~~~~~~~~
../deps/ChaiScript/include/chaiscript/dispatchkit/callable_traits.hpp:100:86: error: decltype cannot resolve address of overloaded function
typedef typename Function_Signature<decltype(&T::operator())>::Return_Type Return_Type;
^~~~~~~~~~~
I also tried to make the variable static which worked, but it wont work if I try to make it possible to ignore modifiers on a per hotkey basis.
what am i doing wrong? and how can I fix it?
You can do this instead:
chai.add(chaiscript::fun(&MacroEngine::addIgnoredMod, &me), "addIgnoredMod");
Or use a lambda:
chai.add(chaiscript::fun([&me](int modifier){ me.addIgnoredMod(modifier); }), "addIgnoredMod");
Jason Turner, the creator of Chaiscript, commented on it here: http://discourse.chaiscript.com/t/may-i-use-std-bind/244/4
"There’s really never any good reason to use std::bind. I much better solution is to use a lambda (and by much better, I mean much much better. std::bind adds to compile size, compile time and runtime)."

nvcc warns about a device variable being a host variable - why?

I've been reading in the CUDA Programming Guide about template functions and is something like this working?
#include <cstdio>
/* host struct */
template <typename T>
struct Test {
T *val;
int size;
};
/* struct device */
template <typename T>
__device__ Test<T> *d_test;
/* test function */
template <typename T>
T __device__ testfunc() {
return *d_test<T>->val;
}
/* test kernel */
__global__ void kernel() {
printf("funcout = %g \n", testfunc<float>());
}
I get the correct result but a warning:
"warning: a host variable "d_test [with T=T]" cannot be directly read in a device function" ?
Has the struct in the testfunction to be instantiated with *d_test<float>->val ?
KR,
Iggi
Unfortunately, the CUDA compiler seems to generally have some issues with variable templates. If you look at the assembly, you'll see that everything works just fine. The compiler clearly does instantiate the variable template and allocates a corresponding device object.
.global .align 8 .u64 _Z6d_testIfE;
The generated code uses this object just like it's supposed to
ld.global.u64 %rd3, [_Z6d_testIfE];
I'd consider this warning a compiler bug. Note that I cannot reproduce the issue with CUDA 10 here, so this issue has most likely been fixed by now. Consider updating your compiler…
#MichaelKenzel is correct.
This is almost certainly an nvcc bug - which I have now filed (you might need an account to access that.
Also note I've been able to reproduce the issue with less code:
template <typename T>
struct foo { int val; };
template <typename T>
__device__ foo<T> *x;
template <typename T>
int __device__ f() { return x<T>->val; }
__global__ void kernel() { int y = f<float>(); }
and have a look at the result on GodBolt as well.

Declaring a global device array using a pointer with cudaMemcpyFromSymbol

When I use the following code, it shows the correct value 3345.
#include <iostream>
#include <cstdio>
__device__ int d_Array[1];
__global__ void foo(){
d_Array[0] = 3345;
}
int main()
{
foo<<<1,1>>>();
cudaDeviceSynchronize();
int h_Array[1];
cudaMemcpyFromSymbol(&h_Array, d_Array, sizeof(int));
std::cout << "values: " << h_Array[0] << std::endl;
}
But if we replace the line of code __device__ int d_Array[1]; by
__device__ int *d_Array; it shows a wrong value. Why?
The problem is in memory allocation. Try the same thing on C++ (on host) and you will either get an error or unexpected value.
In addition, you can check the Cuda errors calling cudaGetLastError() after your kernel. In the first case everything is fine, and the result is cudaSuccess. In the second case there is cudaErrorLaunchFailure error. Here is the explanation of this error (from cuda toolkit documentation):
"An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointer and accessing out of bounds shared memory. The device cannot be used until cudaThreadExit() is called. All existing device memory allocations are invalid and must be reconstructed if the program is to continue using CUDA."
Note that cudaMemcpyToSymbol also supports the offset parameter for array indexing

how can i execute a host class function in a CUDA kernel

I have a genetic algorithm and i'm traying to evaluate a population of chromosome on GPU :
class chromosome
{
int fitness;
int gene(int pos) { .... };
};
class eval
{
public :
__global__ doEval(Chromosome *population)
{
....
int jobid = population[tid].gene(X);
population[tid].fitness = Z;
....
}
};
int main()
{
Chromosome *dev_population;
Eval eval;
eval.doEval<<<1,N>>>(dev_population);
}
and i have this errors :
ga3.cu(121): warning: inline qualifier ignored for "global" function
ga3.cu(121): error: illegal combination of memory qualifiers
ga3.cu(323): error: a pointer to a bound function may only be used to call the function
ga3.cu(398): warning: nested comment is not allowed
where are the problems ?
i remove Eval class and left only doEval function , and make device host gene() , like this :
\__device\__ \__host\__ gene()
{....};
\__global\__ doEval(Chromosome *population)
{
....
int jobid = population[tid].gene(X);
population[tid].fitness = Z;
....
}
int main()
{
Chromosome *dev_population;
doEval<<<1,N>>>(dev_population);
}
but now i have have other errors , and it's not compile :
/usr/include/c++/4.6/iomanip(66): error: expected an expression
/usr/include/c++/4.6/iomanip(96): error: expected an expression
/usr/include/c++/4.6/iomanip(127): error: expected an expression
/usr/include/c++/4.6/iomanip(195): error: expected an expression
/usr/include/c++/4.6/iomanip(225): error: expected an expression
5 errors detected in the compilation of "/tmp/tmpxft_00006fe9_00000000-4_ga3.cpp1.ii".
There are two problems here, one soluble, the other one not.
It is illegal in CUDA for a __global__ function (ie. kernel) to be defined as a class member function. So doEval can never be defined as a member of eval. You are free to call a kernel in a structure or class member function, but a kernel cannot be a member function. You will have to redesign this class, there is no work around.
Any function called device code must be explicitly denoted as a device function and be instantiated and compiled for the device. This applies to both regular functions and class member functions. All functions are treated by nvcc as host functions unless identified as otherwise. You can, therefore, fix this error by doing something like the following:
class chromosome
{
int fitness;
__device__ __host__ int gene(int pos) { .... };
};
Note that every function called by gene must also have a valid device definition for the code to successfully compile.