I use BOOST_FOREACH heavily for iterating over containers, and since I recently moved to C++0x, I thought I could replace BOOST_FOREACH with the range-based for construct. The following piece of code
#include <vector>
#include <boost/shared_ptr.hpp>
#include <boost/range.hpp>

using std::vector; using boost::shared_ptr;

class Node;

int main(void){
    vector<shared_ptr<Node>> nodes;
    for(const shared_ptr<Node>& n: nodes);
}
does not compile with gcc 4.6, leading to
error: call of overloaded 'end(std::vector<boost::shared_ptr<Node> >&)' is ambiguous
note: candidates are:
/usr/include/c++/4.6/bits/range_access.h:78:5: note: decltype (__cont->end()) std::end(const _Container&) [with _Container = std::vector<boost::shared_ptr<Node> >, decltype (__cont->end()) = __gnu_cxx::__normal_iterator<const boost::shared_ptr<Node>*, std::vector<boost::shared_ptr<Node> > >]
/usr/include/c++/4.6/bits/range_access.h:68:5: note: decltype (__cont->end()) std::end(_Container&) [with _Container = std::vector<boost::shared_ptr<Node> >, decltype (__cont->end()) = __gnu_cxx::__normal_iterator<boost::shared_ptr<Node>*, std::vector<boost::shared_ptr<Node> > >]
/usr/include/boost/range/end.hpp:103:47: note: typename boost::range_iterator<const T>::type boost::end(const T&) [with T = std::vector<boost::shared_ptr<Node> >, typename boost::range_iterator<const T>::type = __gnu_cxx::__normal_iterator<const boost::shared_ptr<Node>*, std::vector<boost::shared_ptr<Node> > >]
/usr/include/boost/range/end.hpp:92:41: note: typename boost::range_iterator<C>::type boost::end(T&) [with T = std::vector<boost::shared_ptr<Node> >, typename boost::range_iterator<C>::type = __gnu_cxx::__normal_iterator<boost::shared_ptr<Node>*, std::vector<boost::shared_ptr<Node> > >]
Is there a way to avoid this ambiguity, or is range-based for simply unusable in such a situation?
Tricky. You're pulling in both std::end and boost::end because the associated namespaces of std::vector<boost::shared_ptr<Node>> are both std and boost, so argument-dependent lookup finds the two. Both candidates are templates that match equally well, hence the ambiguity.
However, a non-template end() would be an even better match. So, just provide your own:
inline std::vector<boost::shared_ptr<Node> >::iterator
end(std::vector<boost::shared_ptr<Node> >& vsn)
{
    return std::end(vsn);
}
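For completeness, here is a minimal sketch of how the overload fits together with the original loop. It assumes Node is declared in the global namespace, so the free end() is found via argument-dependent lookup alongside the std and boost candidates and, being a non-template exact match, wins overload resolution:

#include <vector>
#include <boost/shared_ptr.hpp>
#include <boost/range.hpp>

class Node;

// Non-template overload: an exact match, so it beats the equally good
// std::end and boost::end template candidates.
inline std::vector<boost::shared_ptr<Node> >::iterator
end(std::vector<boost::shared_ptr<Node> >& vsn)
{
    return std::end(vsn);
}

// begin() is not reported as ambiguous above, but if it ever is,
// the same trick applies with a matching non-template begin().

int main()
{
    std::vector<boost::shared_ptr<Node> > nodes;
    for(const boost::shared_ptr<Node>& n: nodes); // no longer ambiguous
}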
I browsed the internet and only saw people doing forward declarations of classes using the typedef keyword, but I was wondering how to do that with functions/tasks.
I want to put the main function above the definitions of the other functions/tasks to make the code easier for a reader to skim. In C++, a forward declaration of a function looks something like this:
#include <iostream>
using std::cout;

// forward declaration of sub2
int sub2(int A, int B);

int main(){
    cout << "Difference: " << sub2(25, 10);
    return 0;
}

int sub2(int A, int B){ // definition of sub2
    return A - B;
}
For SystemVerilog, will it be something like this?
function somefunction();

virtual task body();
    somefunction();
endtask: body

function somefunction();
    // do something here.
endfunction: somefunction
Should I use typedef for forward declarations with functions/tasks?
Functions and tasks do not need to be declared before use as long as the call has a set of trailing parentheses (), which may also include any required arguments. They use search rules similar to hierarchical references. See section 23.8.1, "Task and function name resolution", in the IEEE 1800-2017 SystemVerilog LRM.
Function declaration order doesn't matter the way it does in C: you can call somefunction inside body before it has been declared. You don't need any kind of forward declaration.
I have the following function:
function void foo_arr_bit (int seed, ref bit [*] mem, string mem_name);
    for (int i=0; i< mem.size(); i++)
        mem[i] = my_randomize_int(seed, mem[i], mem_name);
endfunction: foo_arr_bit
I call the function by:
foo_arr_bit(seed, data_bit, "data_bit");
Where data_bit can be:
bit [1:0]/ bit [2:0]/ bit [3:0]/ bit [4:0]/ bit [5:0] etc...
When I try to compile, I get the following error:
near "[": syntax error, unexpected [ , expecting IDENTIFIER or TYPE_IDENTIFIER or NETTYPE_IDENTIFIER.
[*] is not the correct syntax for a dynamic array; use [].
Your array can only be dynamic in the unpacked dimension, so you cannot have bit [] mem_array; you must have bit mem_array[].
Finally, a function that passes arguments by reference cannot have a static lifetime; it must be declared automatic.
function automatic void foo_arr_bit (int seed, ref bit mem[], string mem_name);
    for (int i=0; i< mem.size(); i++)
        mem[i] = my_randomize_int(seed, mem[i], mem_name);
endfunction: foo_arr_bit
Edit: But even with these changes you face a bigger issue. Passing by reference demands very strict typing; no casting is allowed, so I expect there to be issues with type conversion.
Furthermore, passing by reference is not really necessary in your case. Use inout instead.
function automatic void foo_arr_bit (input int seed, string mem_name, inout bit mem[]);
    for (int i=0; i< mem.size(); i++)
        mem[i] = my_randomize_int(seed, mem[i], mem_name);
endfunction: foo_arr_bit
I want to use thrust::reduce to find the max value in an array A. However, A[i] should only be chosen as the max if it also satisfies a particular boolean condition in another array B; for example, B[i] should be true. Is there a version of thrust::reduce that does this? I looked at the documentation and found only the following API:
thrust::reduce(begin,end, default value, operator)
However, I was curious whether there is a version more suitable to my problem.
EDIT: Compilation fails on the last line!
typedef thrust::device_ptr<int> IntIterator;
typedef thrust::device_ptr<float> FloatIterator;
typedef thrust::tuple<IntIterator,FloatIterator> IteratorTuple;
typedef thrust::zip_iterator<IteratorTuple> myZipIterator;
thrust::device_ptr<int> deviceNBMInt(gpuNBMInt);
thrust::device_ptr<int> deviceIsActive(gpuIsActive);
thrust::device_ptr<float> deviceNBMSim(gpuNBMSim);
myZipIterator iter_begin = thrust::make_zip_iterator(thrust::make_tuple(deviceIsActive,deviceNBMSim));
myZipIterator iter_end = thrust::make_zip_iterator(thrust::make_tuple(deviceIsActive + numRow,deviceNBMSim + numRow));
myZipIterator result = thrust::max_element(iter_begin, iter_end, Predicate());
Yes, there is. I suggest you take a look at Thrust's extrema algorithms (such as thrust::max_element) and the zip iterator.
Something like this should do the trick (not sure if this code works out of the box):
typedef thrust::device_ptr<bool> BoolIterator;
typedef thrust::device_ptr<float> ValueIterator;
BoolIterator bools_begin, bools_end;
ValueIterator values_begin, values_end;
// initialize these pointers
// ...
typedef thrust::tuple<BoolIterator, ValueIterator> IteratorTuple;
typedef thrust::tuple<bool, float> DereferencedIteratorTuple;
typedef thrust::zip_iterator<IteratorTuple> ZipIterator;
ZipIterator iter_begin(thrust::make_tuple(bools_begin, values_begin));
ZipIterator iter_end(thrust::make_tuple(bools_end, values_end));
struct Predicate
{
    __host__ __device__ bool operator()
        (const DereferencedIteratorTuple& lhs,
         const DereferencedIteratorTuple& rhs) const
    {
        using thrust::get;
        // If both elements are active, order them by value.
        if (get<0>(lhs) && get<0>(rhs)) return get<1>(lhs) < get<1>(rhs);
        // Otherwise an inactive element compares "less" than an active one,
        // so max_element never picks an inactive element over an active one.
        return !get<0>(lhs) && get<0>(rhs);
    }
};
ZipIterator result = thrust::max_element(iter_begin, iter_end, Predicate());
Or you may consider a similar technique that combines the zip iterator with thrust::reduce, or try thrust::inner_product. I'm not sure which will be faster.
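To illustrate the thrust::reduce alternative just mentioned, here is an untested sketch; the MaxIfActive functor and conditional_max wrapper are made-up names for illustration, not Thrust API. The reduction carries (flag, value) pairs and keeps the larger value among the flagged ones:

#include <thrust/device_vector.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/reduce.h>
#include <thrust/tuple.h>
#include <cfloat>

typedef thrust::tuple<bool, float> BoolFloat;

struct MaxIfActive
{
    __host__ __device__
    BoolFloat operator()(const BoolFloat& a, const BoolFloat& b) const
    {
        // An inactive operand never wins; among two active operands the
        // larger value wins.
        if (!thrust::get<0>(a)) return b;
        if (!thrust::get<0>(b)) return a;
        return thrust::get<1>(a) < thrust::get<1>(b) ? b : a;
    }
};

float conditional_max(const thrust::device_vector<bool>&  bools,
                      const thrust::device_vector<float>& values)
{
    // Identity element: "no active value seen yet".
    BoolFloat init = thrust::make_tuple(false, -FLT_MAX);
    BoolFloat r = thrust::reduce(
        thrust::make_zip_iterator(thrust::make_tuple(bools.begin(), values.begin())),
        thrust::make_zip_iterator(thrust::make_tuple(bools.end(),   values.end())),
        init, MaxIfActive());
    return thrust::get<1>(r); // only meaningful if thrust::get<0>(r) is true
}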
I would like to use Thrust's stream compaction functionality (copy_if) for distilling indices of elements from a vector if the elements adhere to a number of constraints. One of these constraints depends on the values of neighboring elements (8 in 2D and 26 in 3D). My question is: how can I obtain the neighbors of an element in Thrust?
The function call operator of the functor for the 'copy_if' basically looks like:
__host__ __device__ bool operator()(float x) {
    bool mark = x < 0.0f;
    if (mark) {
        if (left neighbor of x > 1.0f) return false;
        if (right neighbor of x > 1.0f) return false;
        if (top neighbor of x > 1.0f) return false;
        //etc.
    }
    return mark;
}
Currently I use a work-around by first launching a CUDA kernel (in which it is easy to access neighbors) to appropriately mark the elements. After that, I pass the marked elements to Thrust's copy_if to distill the indices of the marked elements.
I came across counting_iterator as a sort of substitute for directly using threadIdx and blockIdx to acquire the index of the processed element. I tried the solution below, but when compiling it, it gives me a "/usr/include/cuda/thrust/detail/device/cuda/copy_if.inl(151): Error: Unaligned memory accesses not supported". As far as I know I'm not trying to access memory in an unaligned fashion. Does anybody know what's going on and/or how to fix this?
struct IsEmpty2 {
    float* xi;
    IsEmpty2(float* pXi) { xi = pXi; }

    __host__ __device__ bool operator()(thrust::tuple<float, int> t) {
        bool mark = thrust::get<0>(t) < -0.01f;
        if (mark) {
            int countindex = thrust::get<1>(t);
            if (xi[countindex] > 1.01f) return false;
            //etc.
        }
        return mark;
    }
};

thrust::copy_if(indices.begin(),
                indices.end(),
                thrust::make_zip_iterator(thrust::make_tuple(xi, thrust::counting_iterator<int>())),
                indicesEmptied.begin(),
                IsEmpty2(rawXi));
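As an untested illustration of the counting_iterator idea (not from the original post; IsEmptyByIndex is a hypothetical name), one can run copy_if over the index sequence itself and read the element and its neighbours through a raw device pointer inside the predicate, which avoids zipping values with counters altogether:

#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/iterator/counting_iterator.h>

// 1D variant: keep index i if x[i] < 0 and no existing neighbour exceeds 1.
struct IsEmptyByIndex {
    const float* x;
    int n;
    IsEmptyByIndex(const float* x_, int n_) : x(x_), n(n_) {}

    __host__ __device__ bool operator()(int i) const {
        if (x[i] >= 0.0f) return false;
        if (i > 0     && x[i - 1] > 1.0f) return false;
        if (i < n - 1 && x[i + 1] > 1.0f) return false;
        return true;
    }
};

// Usage sketch:
// thrust::device_vector<float> data = ...;
// thrust::device_vector<int>   indicesOut(data.size());
// const float* raw = thrust::raw_pointer_cast(data.data());
// thrust::device_vector<int>::iterator out_end =
//     thrust::copy_if(thrust::counting_iterator<int>(0),
//                     thrust::counting_iterator<int>((int)data.size()),
//                     indicesOut.begin(),
//                     IsEmptyByIndex(raw, (int)data.size()));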
@phoad: you're right about the shared mem; it struck me after I had already posted my reply, and I subsequently figured the cache would probably help me. But you beat me with your quick response. The if-statement, however, is executed in less than 5% of all cases, so either using shared mem or relying on the cache will probably have a negligible impact on performance.
Tuples only support 10 values, so that would mean I'd need tuples of tuples for the 26 values in the 3D case. Working with tuples and zip_iterator was already quite cumbersome, so I'll pass on this option (also from a code-readability standpoint). I tried your suggestion of directly using threadIdx.x etc. in the device function, but Thrust doesn't like that. I seem to be getting some unexplainable results and sometimes I end up with a Thrust error. The following program, for example, generates a 'thrust::system::system_error' with an 'unspecified launch failure', although it first correctly prints "Processing 10" to "Processing 41":
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/for_each.h>

struct printf_functor {
    __host__ __device__ void operator()(int e) {
        printf("Processing %d\n", threadIdx.x);
    }
};

int main() {
    thrust::device_vector<int> dVec(32);
    for (int i = 0; i < 32; ++i)
        dVec[i] = i + 10;
    thrust::for_each(dVec.begin(), dVec.end(), printf_functor());
    return 0;
}
The same applies to printing blockIdx.x; printing blockDim.x, however, generates no error. I was hoping for a clean solution, but I guess I'm stuck with my current work-around.
I'm trying to find the minimum number in an array using Thrust and CUDA.
The following device example returns 0:
thrust::device_vector<float4>::iterator it = thrust::min_element(IntsOnDev.begin(),IntsOnDev.end(),equalOperator());
int pos = it - IntsOnDev.begin();
However, this host version works perfectly:
thrust::host_vector<float4>arr = IntsOnDev;
thrust::host_vector<float4>::iterator it2 = thrust::min_element(arr.begin(),arr.end(),equalOperator());
int pos2 = it2 - arr.begin();
The comparator type:
struct equalOperator
{
    __host__ __device__
    bool operator()(const float4 x, const float4 y) const
    {
        return ( x.w < y.w );
    }
};
I just wanted to add that thrust::sort works with the same predicate.
Unfortunately, nvcc disagrees with some host compilers (some 64 bit versions of MSVC, if I recall correctly) about the size of certain aligned types. float4 is one of these. This often results in undefined behavior.
The work-around is to use types without alignment, for example my_float4:
struct my_float4
{
    float x, y, z, w;
};
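For completeness, a sketch of how the workaround might look end to end, continuing from the my_float4 definition above; it is untested, and the compareW functor is just an illustrative name:

#include <thrust/device_vector.h>
#include <thrust/extrema.h>

// Same ordering as equalOperator, but on the unaligned type.
struct compareW
{
    __host__ __device__
    bool operator()(const my_float4& a, const my_float4& b) const
    {
        return a.w < b.w;
    }
};

// Usage sketch:
// thrust::device_vector<my_float4> v = ...; // same data as IntsOnDev, copied element-wise
// thrust::device_vector<my_float4>::iterator it =
//     thrust::min_element(v.begin(), v.end(), compareW());
// int pos = it - v.begin();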