Using functors in CUDA

I have the following functor class in CUDA:
class forSecondMax{
private:
int toExclude;
public:
__device__ void setToExclude(int val){
toExclude = val;
}
__device__ bool operator ()
(const DereferencedIteratorTuple& lhs, const DereferencedIteratorTuple& rhs)
{
using thrust::get;
// using <= returns the last occurrence of the largest element; < returns the first
if (get<0>(lhs)== get<2>(lhs) /*&& get<0>(rhs) == get<2>(rhs)*/ && get<0>(lhs) != toExclude/* && get<0>(rhs)!= toExclude */) return get<1>(lhs) < get<1>(rhs); else
return true ;
}
};
Is there a way to set the value of toExclude from the host?

All you need to do to achieve this is to define a constructor for the functor which sets the data member from an argument. So your class would look something like this:
class forSecondMax{
private:
int toExclude;
public:
__device__ __host__ forSecondMax(int x) : toExclude(x) {};
__device__ __host__ bool operator ()
(const DereferencedIteratorTuple& lhs,
const DereferencedIteratorTuple& rhs)
{
using thrust::get;
if (get<0>(lhs)== get<2>(lhs) && get<0>(lhs) != toExclude)
return get<1>(lhs) < get<1>(rhs);
else
return true ;
}
};
[disclaimer: written in browser, never tested or compiled, use at own risk]
To set the value before passing the functor to a thrust algorithm, create an instance of the functor and pass it to the thrust call. Because operator() here takes two arguments, it suits an algorithm that expects a binary comparison, for example:
forSecondMax op(10);
thrust::max_element(A.begin(), A.end(), op);
This sets the data member toExclude to 10 in a new instance of the class and uses that instance as the comparison in the algorithm call.
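As a rough host-only illustration of the same pattern (state passed in through the constructor, then the instance handed to an algorithm), here is a sketch in plain C++ using std::max_element. The names LessExcluding and second_max are made up for this example, and plain ints stand in for the zipped tuples. When building with nvcc, the constructor and operator() would also be marked __host__ __device__.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Comparison functor whose state (toExclude) is set by the constructor.
struct LessExcluding {
    int toExclude;
    explicit LessExcluding(int x) : toExclude(x) {}

    // Treats toExclude as smaller than every other value, so it is
    // never picked as the maximum unless nothing else is present.
    bool operator()(int lhs, int rhs) const {
        if (lhs == toExclude) return rhs != toExclude;
        if (rhs == toExclude) return false;
        return lhs < rhs;
    }
};

// Largest value that differs from the overall maximum (v must be non-empty).
inline int second_max(const std::vector<int>& v) {
    assert(!v.empty());
    const int largest = *std::max_element(v.begin(), v.end());
    // The functor instance carries `largest` into the second pass.
    return *std::max_element(v.begin(), v.end(), LessExcluding(largest));
}
```

The functor is copied by value into the algorithm, which is also how thrust gets host-initialized state onto the device.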

Related

Is there a transparent string_view hash in the standard? [duplicate]

C++14 introduces Compare::is_transparent for equivalent find operations in associative containers.
template< class K > iterator find( const K& x );
template< class K > const_iterator find( const K& x ) const;
Finds an element with key that compares equivalent to the value x.
This overload only participates in overload resolution if the
qualified-id Compare::is_transparent is valid and denotes a type. It
allows calling this function without constructing an instance of Key
Since no temporary instance of Key is constructed any more, these lookups can be more efficient.
There does not seem to be an equivalent for unordered containers.
Why is there no Compare::key_equal / Compare::hash_equal?
I imagine it would be relatively simple to allow efficient lookup of, e.g., string literals in unordered containers?
template<>
struct hash<string>
{
std::size_t operator()(const string& s) const
{
return ...;
}
// hash_equal=true allows hashing string literals
std::size_t operator()(const char* s) const
{
return ...;
}
};
Keys that compare equal should produce the same hash value. Decoupling the hash function from the predicate, while at the same time making one or both heterogeneous, could be too error-prone.
A recent paper, P0919R2, brings up the following example:
std::hash<long>{}(-1L) == 18446744073709551615ULL
std::hash<double>{}(-1.0) == 11078049357879903929ULL
Although -1L and -1.0 compare equal, a heterogeneous hash function that is not in line with the selected equality comparison logic could produce different values for them. The paper adds heterogeneous lookup-enabled function templates (find, count, equal_range, and contains) but makes them available only when the requirements below are met [unord.req]/p17:
If the qualified-id Hash::transparent_key_equal is valid and denotes a type ([temp.deduct]), then the program is ill-formed if either:
qualified-id Hash::transparent_key_equal::is_transparent is not valid or does not denote a type, or
Pred is a different type than equal_to<Key> or Hash::transparent_key_equal.
The member function templates find, count, equal_range, and contains shall not participate in overload resolution unless the qualified-id Hash::transparent_key_equal is valid and denotes a type ([temp.deduct]).
In such a case, Hash::transparent_key_equal overrides the default predicate (std::equal_to<Key>) and is used for (transparent) equality checking, together with Hash itself for (transparent) hashing.
Under these conditions, the below transparent function objects could be used to enable heterogeneous lookup:
struct string_equal
{
using is_transparent = void;
bool operator()(const std::string& l, const std::string& r) const
{
return l.compare(r) == 0;
}
bool operator()(const std::string& l, const char* r) const
{
return l.compare(r) == 0;
}
bool operator()(const char* l, const std::string& r) const
{
return r.compare(l) == 0;
}
};
struct string_hash
{
using transparent_key_equal = string_equal; // or std::equal_to<>
std::size_t operator()(const std::string& s) const
{
return s.size();
}
std::size_t operator()(const char* s) const
{
return std::strlen(s);
}
};
Both string_equal and std::equal_to<> are transparent comparators and can be used as transparent_key_equal for string_hash.
Having this type alias (or a type definition itself) within the hash function class definition makes it clear that it is a valid predicate that works fine with that particular hashing logic and the two can't diverge. Such an unordered set can be declared as:
std::unordered_set<std::string, string_hash> u;
or:
std::unordered_set<std::string, string_hash, string_hash::transparent_key_equal> u;
Either will use string_hash and string_equal.
If you watch the "Grill the Committee" video from CppCon, they explain why things like this happen: nobody fought for it.
C++ is standardized by committee but that committee requires input from the community. Someone has to write papers, respond to criticism, go to the meetings, etc... Then the feature can be voted on. The committee doesn't just sit there inventing language and library features. It only discusses and votes on those that are brought forward to it.
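The following example (derived from the accepted answer) compiles on Apple clang version 13.1.6. Note that I had to put is_transparent in both NodeHash and NodeEq. This matches the rule finally adopted in C++20, where heterogeneous lookup in unordered containers is enabled only when both Hash::is_transparent and Pred::is_transparent are valid and denote types.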
#include <unordered_set>
struct Node {
int id;
int count;
};
struct NodeEq {
using is_transparent = void;
bool operator() (Node const& a, Node const& b) const { return a.id == b.id; };
bool operator() (Node const& n, int const i) const { return n.id == i; };
bool operator() (int const i, Node const& n) const { return n.id == i; };
};
struct NodeHash {
using is_transparent = void;
using transparent_key_equal = NodeEq;
std::size_t operator() (Node const& n) const noexcept { return n.id; };
std::size_t operator() (int n) const noexcept { return n; };
};
using nodes_t = std::unordered_set< Node, NodeHash, NodeHash::transparent_key_equal >;
int main() {
nodes_t nodes;
nodes.find(1);
}
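For the string case the question asks about, a minimal sketch along the same lines might look like this. StringHash, StringSet and has_key are names invented for this example. It assumes a standard library that implements C++20 heterogeneous lookup for unordered containers; with an older library the same code should still compile, but the lookup falls back to constructing a temporary std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <unordered_set>

// One hash for std::string, std::string_view and const char*:
// all of them convert implicitly to std::string_view.
struct StringHash {
    using is_transparent = void;  // opt in to heterogeneous lookup
    std::size_t operator()(std::string_view sv) const noexcept {
        return std::hash<std::string_view>{}(sv);
    }
};

// std::equal_to<> is itself transparent, so it can compare
// std::string against const char* or std::string_view directly.
using StringSet = std::unordered_set<std::string, StringHash, std::equal_to<>>;

// Looks up a C string without naming std::string at the call site.
inline bool has_key(const StringSet& set, const char* key) {
    assert(key != nullptr);
    return set.find(key) != set.end();
}
```

Routing every key type through a single string_view overload keeps the hash consistent with the equality comparison, which is the concern raised in the accepted answer.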

CUDA thrust device pointer with transform copy crash

In CUDA 9.2 I have something like this:
#ifdef __CUDA_ARCH__
struct Context { float n[4]; } context;
#else
typedef __m128 Context;
#endif
struct A { float k[2]; };
struct B { float q[4]; };
struct FTransform : thrust::unary_function<A, B>
{
const Context context;
FTransform(Context context) : context(context){}
__device__ __host__ B operator()(const A& a) const
{
B b{{a.k[0], a.k[1], a.k[0]*context.n[0], a.k[1]*context.n[1]}};
return b;
}
};
void DoThrust(B* _bs, const Context& context, A* _as, uint32_t count)
{
thrust::device_ptr<B> bs = thrust::device_pointer_cast(_bs);
thrust::device_ptr<A> as = thrust::device_pointer_cast(_as);
FTransform fTransform(context);
auto first = thrust::make_transform_iterator(as, fTransform);
auto last = thrust::make_transform_iterator(as + count, fTransform);
thrust::copy(first, last, bs);
}
int main(int c, char **argv)
{
const uint32_t Count = 4;
Context context;
A* as;
B* bs;
cudaMalloc(&as, Count*sizeof(A));
cudaMalloc(&bs, Count*sizeof(B));
A hostAs[Count];
cudaMemcpy(as, hostAs, Count * sizeof(A), cudaMemcpyHostToDevice);
DoThrust(bs, context, as, Count);
B hostBs[Count];
cudaMemcpy(hostBs, bs, Count * sizeof(B), cudaMemcpyDeviceToHost);//crash
return 0;
}
When I later call a plain cudaMemcpy() on the results, I get the exception "an illegal memory access was encountered".
If I replace the thrust code with a non-thrust equivalent, there is no error and everything works fine. When I try various combinations such as copying to device_vectors, I get different crashes that look like thrust trying to release the device_ptrs for some reason, so maybe the problem is here?
== UPDATE ==
OK, that was confusing. It appears to be due to the context member variable of the FTransform functor in my actual, more complicated case. This specifically:
struct FTransform : thrust::unary_function<A, B>
{
#ifdef __CUDA_ARCH__
struct Context { float v[4]; } context;
#else
__m128 context;
#endif
...
};
So I guessed it was an alignment problem, and in fact it is, as this works:
#ifdef __CUDA_ARCH__
struct __align__(16) Context { float v[4]; } context;
#else
__m128 context;
#endif
The solution: if a thrust functor that is copied to the GPU has members of an aligned type (such as the __m128 SSE type), make sure those members are declared with the same alignment in both the CPU and the GPU build passes of NVCC. Do not assume that a type which looks naturally equivalent in the other pass ends up with the same alignment; otherwise hard-to-diagnose failures such as the illegal memory access above can occur.
So, for example, the __align__(16) is necessary in code like this:
struct FTransform : thrust::unary_function<A, B>
{
#ifdef __CUDA_ARCH__
struct __align__(16) Context { float v[4]; } context;
#else
__m128 context;
#endif
FTransform(Context context) : context(context){}
__device__ __host__ B operator()(const A& a) const; // function makes use of context
};
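To see why the two definitions can disagree, here is a small host-only C++ sketch comparing a plain four-float struct with a 16-byte aligned one. alignas(16) stands in for __align__(16) and for the alignment of __m128, and FunctorLayout with its tag member is a hypothetical wrapper added only to make the padding visible. It is an illustration of the layout mismatch, not a reproduction of what thrust does internally.

```cpp
#include <cassert>
#include <cstddef>

// What the device pass saw originally: four floats, float alignment.
struct PlainContext { float v[4]; };

// What the host pass effectively sees with __m128: 16-byte alignment.
struct alignas(16) AlignedContext { float v[4]; };

// Same size, so the two look interchangeable at first glance.
static_assert(sizeof(PlainContext) == sizeof(AlignedContext),
              "both hold four floats");

// Hypothetical enclosing object with one member before the context.
template <class Ctx>
struct FunctorLayout {
    int tag;      // illustrative extra member
    Ctx context;  // padded up to alignof(Ctx)
};

// With 4-byte int and float these are typically 20 and 32 bytes.
constexpr std::size_t plain_size = sizeof(FunctorLayout<PlainContext>);
constexpr std::size_t aligned_size = sizeof(FunctorLayout<AlignedContext>);
```

If one compilation pass assumes the first layout and the other assumes the second, the bytes of the functor are interpreted differently on each side.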

thrust copy_if with const source

My problem is in the following code:
The filter function compiles, and runs as it should when the source is not constant (the iterators are adjusted accordingly). However when I change the source to const, the compiler gives me the following error for the first two variables of the copy_if statement:
"the object has type qualifiers that are not compatible with the member function".
I believe there is a const to non-const conversion error somewhere, but frankly I have no idea where. Any help would be appreciated.
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/tuple.h>
#include <thrust/iterator/zip_iterator.h>
typedef thrust::device_vector<float>::const_iterator Dc_FloatIterator;
typedef thrust::device_vector<float>::iterator D_FloatIterator;
typedef thrust::device_vector<int>::const_iterator Dc_IntIterator;
typedef thrust::device_vector<int>::iterator D_IntIterator;
typedef thrust::tuple< Dc_IntIterator, Dc_IntIterator, Dc_FloatIterator> Dc_ListIteratorTuple;
typedef thrust::zip_iterator<Dc_ListIteratorTuple> Dc_ListIterator;//type of the class const iterator
typedef thrust::tuple< D_IntIterator, D_IntIterator, D_FloatIterator > D_ListIteratorTuple;
typedef thrust::zip_iterator<D_ListIteratorTuple> D_ListIterator;//type of the class iterator
struct selector{//selector functor for the copy if call
const int val;
selector(int _val) : val(_val) {}
__host__ __device__
bool operator()(const int& x ) {
return ( x == val );
}
};
class Foo{
public:
thrust::device_vector<int> ivec1;
thrust::device_vector<int> ivec2;
thrust::device_vector<float> fvec1;
Foo(){;}
~Foo(){;}
D_ListIterator begin(){//cast of begin iterator
return D_ListIterator(D_ListIteratorTuple( ivec1.begin(), ivec2.begin(), fvec1.begin() ));
}
D_ListIterator end(){//cast of end iterator
return D_ListIterator(D_ListIteratorTuple( ivec1.end(), ivec2.end(), fvec1.end() ));
}
Dc_ListIterator cbegin(){//cast of const begin iterator
return Dc_ListIterator(Dc_ListIteratorTuple( ivec1.cbegin(), ivec2.cbegin(), fvec1.cbegin() ));
}
Dc_ListIterator cend(){//cast of const end iterator
return Dc_ListIterator(Dc_ListIteratorTuple( ivec1.cend(), ivec2.cend(), fvec1.cend() ));
}
void const_filter( const Foo& TheOther, const int& target ){//doesn't work
//This function should copy those members of the vectors for which
//ivec2[i] == target is true
thrust::copy_if(
TheOther.cbegin(),
TheOther.cend(),
TheOther.ivec2.cbegin(),
this->begin(),
selector(target) );
}
void filter( Foo& TheOther, const int& target ){//works
//This function should copy those members of the vectors for which
//ivec2[i] == target is true
thrust::copy_if(
TheOther.begin(),
TheOther.end(),
TheOther.ivec2.cbegin(),
this->begin(),
selector(target) );
}
void insert(const int& one, const int& two,const float& three ){
ivec1.push_back(one);
ivec2.push_back(two);
fvec1.push_back(three);
}
int size(){
return ivec1.size();
}
};
bool CheckIfSublistIsConnected(const Foo& list,const int& sublist_num){
Foo tmp;
tmp.const_filter( list, sublist_num );
return (bool)tmp.size();//for simplicity; otherwise there is a function here that checks whether
//the edge list represents a connected graph
}
int main(void){
Foo list;
bool connected;
list.insert(10,2,1.0);
list.insert(11,2,1.0);
list.insert(12,2,1.0);
list.insert(10,3,1.0);
list.insert(10,3,1.0);
connected=CheckIfSublistIsConnected(list,2);
if( connected ) return 0;
else return -1;
}
I've found that if I replace TheOther.cbegin() / .cend() with the following, the compiler accepts it. This means I messed up somewhere in the typedef section, but where?
thrust::make_zip_iterator(
thrust::make_tuple(
TheOther.ivec1.cbegin(),
TheOther.ivec2.cbegin(),
TheOther.fvec1.cbegin() ))
As it turns out, I had forgotten to add the const qualifier to the definitions of cbegin/cend. Without it they cannot be called on a const Foo&, which is what the compiler error was saying.
Dc_ListIterator cbegin() const {
return Dc_ListIterator(Dc_ListIteratorTuple( ivec1.cbegin(), ivec2.cbegin(), fvec1.cbegin() ));
}
Dc_ListIterator cend() const {
return Dc_ListIterator(Dc_ListIteratorTuple( ivec1.cend(), ivec2.cend(), fvec1.cend() ));
}
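The same rule can be shown without thrust. In this small host-only sketch (EdgeList and sum are illustrative names, and std::vector replaces device_vector), the accessors are only callable through a const reference because they carry the trailing const.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

class EdgeList {
    std::vector<int> values_;
public:
    void insert(int v) { values_.push_back(v); }

    // The trailing const makes these callable on a const EdgeList.
    // Dropping it reproduces the "type qualifiers are not compatible
    // with the member function" kind of error in sum() below.
    std::vector<int>::const_iterator cbegin() const { return values_.cbegin(); }
    std::vector<int>::const_iterator cend() const { return values_.cend(); }
};

// Takes the list by const reference, like const_filter in the question.
inline int sum(const EdgeList& list) {
    return std::accumulate(list.cbegin(), list.cend(), 0);
}
```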

use host function on device

How can I use a host function inside a device function?
For example, in the function below I want to return a value:
__device__ float magnitude2( void ) {
return r * r + i * i;
}
But this is a device function, and I received this error:
calling a host function from a __device__/__global__ function is not allowed
What's the best approach to this problem?
Some extra context on the code:
I want to define this struct:
struct cuComplex {
float r;
float i;
cuComplex( float a, float b ) : r(a), i(b) {}
__device__ float magnitude2( void ) {
return r * r + i * i;
}
__device__ cuComplex operator*(const cuComplex& a) {
return cuComplex(r*a.r - i*a.i, i*a.r + r*a.i);
}
__device__ cuComplex operator+(const cuComplex& a) {
return cuComplex(r+a.r, i+a.i);
}
};
Now that we know the question involves a C++ structure, the answer is clear: the constructor of the class must also be available as a __device__ function so that the class can be instantiated inside a kernel. In your example, the structure should be defined like this:
struct cuComplex {
float r;
float i;
__device__ __host__
cuComplex( float a, float b ) : r(a), i(b) {}
__device__
float magnitude2( void ) {
return r * r + i * i;
}
__device__
cuComplex operator*(const cuComplex& a) {
return cuComplex(r*a.r - i*a.i, i*a.r + r*a.i);
}
__device__
cuComplex operator+(const cuComplex& a) {
return cuComplex(r+a.r, i+a.i);
}
};
The error you are seeing arises because the constructor needs to be called whenever the class is instantiated. In your original code, the constructor is declared only as a host function (the default when no qualifier is given), and operator* and operator+ call it from device code, leading to a compilation error.
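One common way to keep such a struct usable from both sides is to hide the qualifiers behind a macro, so the same header also builds with a plain C++ compiler. This is only a sketch: HOST_DEVICE is a name chosen for this example, and it relies on nvcc defining __CUDACC__ while compiling CUDA sources.

```cpp
#include <cassert>

// Expands to the CUDA qualifiers under nvcc and to nothing otherwise.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

struct cuComplex {
    float r;
    float i;

    // Constructor is callable from host and device code alike.
    HOST_DEVICE cuComplex(float a, float b) : r(a), i(b) {}

    HOST_DEVICE float magnitude2() const { return r * r + i * i; }

    // Both operators construct a new cuComplex, which is why the
    // constructor must be available wherever they are called.
    HOST_DEVICE cuComplex operator*(const cuComplex& a) const {
        return cuComplex(r * a.r - i * a.i, i * a.r + r * a.i);
    }
    HOST_DEVICE cuComplex operator+(const cuComplex& a) const {
        return cuComplex(r + a.r, i + a.i);
    }
};
```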

change SWIG wrapper-function return value

I'm using SWIG to make C# bindings that are compatible with the .NET Compact Framework (WinCE). I've worked through most of the immediate issues, but my next blocker is that some of the functions return a double. Wrappers are generated, but they fail at run time because the Compact Framework will not marshal non-integral datatypes (http://msdn.microsoft.com/en-us/library/aa446536.aspx)
My example failure is an attempt to wrap this function:
double getMaxMagnification() const
{
return m_maxMag;
}
SWIG generates this wrapper:
SWIGEXPORT double SWIGSTDCALL CSharp_LTIImageFilter_getMaxMagnification(void * jarg1) {
double jresult ;
LizardTech::LTIImageFilter *arg1 = (LizardTech::LTIImageFilter *) 0 ;
double result;
arg1 = (LizardTech::LTIImageFilter *)jarg1;
result = (double)((LizardTech::LTIImageFilter const *)arg1)->getMaxMagnification();
jresult = result;
return jresult;
}
which is no good (NG) because it requires marshalling a double return value.
I manually changed this to return the double via a passed-in pointer:
SWIGEXPORT void SWIGSTDCALL CSharp_LTIImageFilter_getMaxMagnification(void * jarg1, void *jarg2) {
fprintf(stderr, "CSharp_LTIImageFilter_getMaxMagnification\n");
//double jresult ;
LizardTech::LTIImageFilter *arg1 = (LizardTech::LTIImageFilter *) 0 ;
double result;
arg1 = (LizardTech::LTIImageFilter *)jarg1;
result = (double)((LizardTech::LTIImageFilter const *)arg1)->getMaxMagnification();
*((double*)jarg2) = result;
//jresult = result ;
//return jresult;
}
After making the corresponding changes in the C# declaration file and implementation class, this works as expected.
That is,
Interop Declaration
NG:
[DllImport("LizardTech_SdkInterop.dll", EntryPoint="CSharp_LTIImageFilter_getMaxMagnification")]
public static extern double LTIImageFilter_getMaxMagnification(IntPtr jarg1);
OK:
[DllImport("LizardTech_SdkInterop.dll", EntryPoint="CSharp_LTIImageFilter_getMaxMagnification")]
public static extern void LTIImageFilter_getMaxMagnification(IntPtr jarg1, ref double jarg2);
Implementation class
NG:
public override double getMaxMagnification() {
double ret = RasterSDKPINVOKE.LTIImageFilter_getMaxMagnification(swigCPtr);
return ret;
}
OK:
public override double getMaxMagnification() {
double ret = 0;
RasterSDKPINVOKE.LTIImageFilter_getMaxMagnification(swigCPtr, ref ret);
return ret;
}
How can I get SWIG to do this for me? I think the tasks are:
(a) change the return type of the wrapper function (only) from double to void
(b) add an argument (pointer to double) to the argument list so that the wrapper can send back the value that way
(c) make the interop declaration reflect the above two changes
(d) make the C# wrapper invoke the new wrapper function.
As always big-picture re-orientation is appreciated.
I'm indebted to David Piepgrass for this. It's not perfect, but it's good enough for me.
http://sourceforge.net/mailarchive/message.php?msg_id=26952332
////////////////////////////////////////////////////////////////////////////////
// Floating-point value marshalling for .NET Compact Framework:
// All floating-point values must be passed by reference. MULTITHREADING DANGER:
// For return values a pointer to a static variable is returned.
%define %cs_compact_framework_float(FLOAT)
%typemap(ctype, out="FLOAT*") FLOAT "FLOAT*"
%typemap(ctype, out="FLOAT*") FLOAT*, FLOAT&, const FLOAT& "FLOAT*"
%typemap(imtype, out="IntPtr") FLOAT, FLOAT*, FLOAT&, const FLOAT& "ref FLOAT"
%typemap(cstype, out="FLOAT") FLOAT, const FLOAT& "FLOAT"
%typemap(cstype, out="FLOAT") FLOAT*, FLOAT& "ref FLOAT"
%typemap(in) FLOAT %{ $1 = *$input; %}
%typemap(in) FLOAT*, FLOAT&, const FLOAT& %{ $1 = $input; %}
%typemap(out, null="NULL") FLOAT, FLOAT*, FLOAT&, const FLOAT& %{
// Not thread safe! FLOAT must be returned as a pointer in Compact Framework
static FLOAT out_temp;
out_temp = $1;
$result = &out_temp;
%}
%typemap(csin) FLOAT, const FLOAT& "ref $csinput"
%typemap(csin) FLOAT*, FLOAT& "ref $csinput"
%typemap(csout, excode=SWIGEXCODE) FLOAT, FLOAT*, FLOAT&, const FLOAT& {
IntPtr ptr = $imcall;$excode
FLOAT ret = (FLOAT)Marshal.PtrToStructure(ptr, typeof(FLOAT));
return ret;
}
%typemap(csvarout, excode=SWIGEXCODE2) FLOAT, FLOAT*, FLOAT&, const FLOAT&
%{
get {
IntPtr ptr = $imcall;$excode
FLOAT ret = (FLOAT)Marshal.PtrToStructure(ptr, typeof(FLOAT));
return ret;
}
%}
%enddef
%cs_compact_framework_float(float)
%cs_compact_framework_float(double)
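The thread-safety warning in the typemap comes from returning the address of a single static variable. A minimal C++ sketch of the hazard, and of one possible mitigation using thread_local storage, follows. The helper names are hypothetical and are not SWIG output, and whether thread_local is available depends on the compiler used for the device, so treat that part as an assumption to check.

```cpp
#include <cassert>

// Mirrors the typemap: one static slot shared by every caller and thread.
// A second call overwrites the value the first caller is still pointing at.
inline double* return_via_static(double value) {
    static double out_temp;
    out_temp = value;
    return &out_temp;
}

// Possible alternative (assumes C++11 thread_local support):
// each thread gets its own slot, so threads no longer race on it.
inline double* return_via_thread_local(double value) {
    thread_local double out_temp;
    out_temp = value;
    return &out_temp;
}
```

Either way the caller has to read the value before making another call on the same thread, which the generated csout code does immediately via Marshal.PtrToStructure.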