wrap a cusp sparse matrix variable in thrust pointer - cuda

I am using cusp for sparse matrix multiplication. From the resultant matrix I need the maximum value, without copying the matrix from device memory to host memory. I am planning to wrap the resultant matrix in a thrust device pointer and then use thrust::max_element to get the maximum element. The matrices are in COO format. If C is the resultant sparse matrix, then
C.row_indices[]: contains the row numbers
C.column_indices[]: contains the column numbers
C.values[]: contains the actual values
So basically I need the highest value from the C.values array.
Using
thrust::device_ptr<int> dev_ptr = C.values;
is giving error
error: no instance of constructor "thrust::device_ptr<T>::device_ptr [with T=int]" matches the argument list
argument types are: (cusp::array1d<float, cusp::host_memory>)
How can I wrap my resultant matrix so that I can use it with the thrust library?

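The error message shows that C.values is a cusp::array1d<float, cusp::host_memory>, so that matrix lives in host memory and its values are float, not int. If my device matrix definition is like this: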
cusp::coo_matrix<int, double, cusp::device_memory> A_d = A;
Then try this (6 is the number of stored entries in my example; in general, use A_d.num_entries):
thrust::device_ptr<double> dev_ptr = &(A_d.values[0]);
thrust::device_ptr<double> max_ptr = thrust::max_element(dev_ptr, dev_ptr + 6);
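For readers without a GPU at hand, the same idea can be sketched on the host with the standard library: the values array of a COO matrix is a flat array, so the largest stored value is a max_element over it. The struct CooMatrix and the function max_stored_value below are hypothetical stand-ins for illustration, not cusp or thrust types.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical host-side stand-in for a COO matrix (not a cusp type).
struct CooMatrix {
    std::vector<int> row_indices;
    std::vector<int> column_indices;
    std::vector<double> values;
};

// Returns the largest explicitly stored value; the matrix must have at least one entry.
double max_stored_value(const CooMatrix& m) {
    assert(!m.values.empty());
    return *std::max_element(m.values.begin(), m.values.end());
}
```

Note that only explicitly stored entries are searched. If every stored value is negative and the matrix also has implicit zeros, the true matrix maximum is 0, which this search will not report.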

Related

Convert raw data to a vector of complex numbers in Thrust

I have a pointer to raw data of complex numbers in interleaved format, i.e. real and imaginary parts stored alternately - R I R I R I ...
How do I convert this to a host (or device) vector of thrust::complex without incurring an extra copy?
The following does not work -
double dos[8] = {9.3252,2.3742,7.2362,5.3562,2.3323,2.2322,7.2362,3.2352};
thrust::host_vector<thrust::complex<double > > comp(dos, dos+8);
Just cast. Something like this:
double dos[8] = {9.3252,2.3742,7.2362,5.3562,2.3323,2.2322,7.2362,3.2352};
typedef thrust::complex<double> cdub;
cdub* cdos = reinterpret_cast<cdub*>(&dos[0]);
thrust::host_vector<cdub> comp(cdos, cdos+4);
should work.
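A minimal host-only sketch of the same cast, using std::complex instead of thrust::complex so that it compiles without CUDA. The helper name as_complex is made up for illustration. The C++ standard guarantees that an array of std::complex<double> has the interleaved real/imaginary layout; casting in the other direction, as here, relies on that layout and is common practice, but treat it as an assumption. Also note that constructing a vector from the casted range, as above, copies the elements; to avoid any copy, use the casted pointer directly.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Views interleaved (re, im) doubles as complex numbers without copying.
// n_doubles is the length of the raw array and must be even.
std::complex<double>* as_complex(double* interleaved, std::size_t n_doubles) {
    assert(n_doubles % 2 == 0);  // need whole (re, im) pairs
    return reinterpret_cast<std::complex<double>*>(interleaved);
}
```

With the eight doubles from the question, the returned pointer addresses four complex values and points at the same memory as the original array.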

Cuda/Thrust: remove_if doesn't change device_vector.size()?

I have a rather simple CUDA question about what seems like it should be a straightforward operation: removing elements from one array based on the values of a second bool array. The steps I take are:
1. Create a device_vector of bools with the same size as the processed input array.
2. Call a kernel which sets some of the elements from (1) to true.
3. Call remove_if on the input array with a predicate, using the processed array from (2).
For each value in the bool array that is set to true, the corresponding element should be removed from the input array.
What I am seeing is that the input array isn't changed, and I am not sure why.
struct EntryWasDeleted
{
__device__ __host__
bool operator()(const bool ifDeleted)
{ return true; }
};
//This array has about 200-300 elements
//thrust::device_vector<SomeStruct> & arrayToDelete
thrust::device_vector<bool>* deletedEntries =
new thrust::device_vector<bool>(arrayToDelete.size(), false);
cuDeleteTestEntries<<<grid, block>>>( thrust::raw_pointer_cast(arrayToDelete.data()), countToDelete, heapAccess, thrust::raw_pointer_cast(deletedEntries->data()));
cudaDeviceSynchronize();
thrust::remove_if(arrayToDelete.begin(), arrayToDelete.end(), deletedEntries->begin(), EntryWasDeleted());
//I am expecting testEntries to have 0 elements
thrust::host_vector<SomeStruct> testEntries = arrayToDelete;
for( int i = 0; i<testEntries.size(); i++)
{ printf( "%d", testEntries[i].someValue); }
In this sample I always return true from the predicate, for testing. When I instead do testEntries = deletedEntries and output the members, I can validate that deletedEntries is properly filled in with trues and falses.
My expectation is that testEntries would have 0 elements. But it doesn't: the output shows ALL elements from the input array, as if remove_if didn't do anything. Is there a specific way to remove elements from a device_vector?
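remove_if does not change the size of the vector: it moves the elements that are kept to the front and returns an iterator to the new logical end. So you need to capture the iterator that is returned from remove_if: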
thrust::device_vector<SomeStruct>::iterator endIterator =
thrust::remove_if(arrayToDelete.begin(), arrayToDelete.end(),
deletedEntries->begin(), EntryWasDeleted());
Then, when you copy the data back to the host, instead of using Thrust's default assignment between host and device, construct the host vector from the valid range only:
thrust::host_vector<SomeStruct> testEntries(arrayToDelete.begin(),endIterator);
As a side note, working with arrays of primitives can often be much more efficient. For example, could you store the indices of your structs in an array and operate on those indices instead?
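The "compact, then shrink" behaviour can be sketched on the host in plain C++. The function remove_flagged below is a hypothetical analog of the stencil form of remove_if, written for illustration; it is not the Thrust implementation. Like the Thrust call, it compacts in place and reports the new logical length without resizing the container.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Keeps data[i] wherever deleted[i] is false, preserving order.
// Returns the number of kept elements; data.size() is left unchanged.
std::size_t remove_flagged(std::vector<int>& data, const std::vector<bool>& deleted) {
    assert(data.size() == deleted.size());
    std::size_t kept = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (!deleted[i]) {
            data[kept++] = data[i];
        }
    }
    return kept;
}
```

The caller decides what to do with the tail: either read only the first kept elements, or shrink the container explicitly, for example with data.resize(kept).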

How to store data of a file using thrust::host_vector or device_vector?

The format of data is something like this:
TGCCACAGGTTCCACACAACGGGACTTGGTTGAAATATTGAGATCCTTGGGGGTCTGTTAATCGGAGACAGTATCTCAACCGCAATAAACCC
GTTCACGGGCCTCACGCAACGGGGCCTGGCCTAGATATTGAGGCACCCAACAGCTCTTGGCCTGAGAGTGTTGTCTCGATCACGACGCCAGT
TGCCACAGGTTCCACACAACGGGACTTGGTTGAAATATTGAGATCCTTGGGGGTCTGTTAATCGAAGACAGTATCTCAACCGCAATAAACCT
TGCCACAGGTTCCACACAACGGGACTTGGTTGAAATATTGAGATCCTTGGGGGTCTGTTAATCGAAGACAGTATCTCAACCGCAATAAACCT
Each line contains one sequence. I want to make (key, value) pairs, where the key is one sequence and the value is 1, and then use reduce_by_key to count the occurrences of each sequence.
But I found that thrust::host_vector can only store one sequence; if I push_back a second sequence, the program crashes.
Here is my code:
int main()
{
ifstream input_subset("subset.txt");
thrust::host_vector < string > h_output_subset;
string s;
while (getline(input_subset, s)) {
h_output_subset.push_back(s);
}
cout << h_output_subset.size() << endl;
return 0;
}
Is it possible to store all of the data in a host_vector or a device_vector? Or is there another way to solve this problem?
The host_vector segfault was confirmed as a bug in thrust::uninitialized_copy, and a patch has been applied to fix it.
The problem with doing this in a device_vector is a genuine limitation of CUDA (no std::string support in device code) and can't be avoided. One alternative is to use a fixed-length char[] array as a data member of the element type stored in the device_vector. Another is to use a single large device_vector to hold all the string data, with a second device_vector holding the starting index of each sub-string within the character array.
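The second alternative (one large character buffer plus a vector of starting indices) can be sketched on the host as follows. The names FlatStrings, flatten and sequence_at are hypothetical, and std::vector stands in for the two device_vectors; copying the two flat arrays to the device is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// All sequences stored back to back, plus where each one starts.
struct FlatStrings {
    std::vector<char> chars;           // concatenated characters of every sequence
    std::vector<std::size_t> offsets;  // offsets[i] = start of sequence i; last entry = chars.size()
};

// Packs the lines into one character buffer and records the start offsets.
FlatStrings flatten(const std::vector<std::string>& lines) {
    FlatStrings flat;
    flat.offsets.push_back(0);
    for (const std::string& s : lines) {
        flat.chars.insert(flat.chars.end(), s.begin(), s.end());
        flat.offsets.push_back(flat.chars.size());
    }
    return flat;
}

// Rebuilds sequence i from the flat buffer (host-side check only).
std::string sequence_at(const FlatStrings& flat, std::size_t i) {
    assert(i + 1 < flat.offsets.size());
    return std::string(flat.chars.data() + flat.offsets[i],
                       flat.chars.data() + flat.offsets[i + 1]);
}
```

Storing one extra offset at the end means the length of sequence i is always offsets[i + 1] - offsets[i], with no special case for the last sequence.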

Thrust Complex Transform of 3 different size vectors

I have this loop in C++ and was trying to convert it to thrust, but I am not getting the same results. Any ideas?
C++ Code
for (i=0;i<n;i++)
for (j=0;j<n;j++)
values[i]=values[i]+(binv[i*n+j]*d[j]);
Thrust Code
thrust::fill(values.begin(), values.end(), 0);
thrust::transform(make_zip_iterator(make_tuple(
thrust::make_permutation_iterator(values.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n))),
binv.begin(),
thrust::make_permutation_iterator(d.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexModFunctor(n))))),
make_zip_iterator(make_tuple(
thrust::make_permutation_iterator(values.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n))) + n,
binv.end(),
thrust::make_permutation_iterator(d.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexModFunctor(n))) + n)),
thrust::make_permutation_iterator(values.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n))),
function1()
);
Thrust Functions
struct IndexDivFunctor: thrust::unary_function<int, int>
{
int n;
IndexDivFunctor(int n_) : n(n_) {}
__host__ __device__
int operator()(int idx)
{
return idx / n;
}
};
struct IndexModFunctor: thrust::unary_function<int, int>
{
int n;
IndexModFunctor(int n_) : n(n_) {}
__host__ __device__
int operator()(int idx)
{
return idx % n;
}
};
struct function1
{
template <typename Tuple>
__host__ __device__
double operator()(Tuple v)
{
return thrust::get<0>(v) + thrust::get<1>(v) * thrust::get<2>(v);
}
};
To begin with, some general comments. Your loop
for (i=0;i<n;i++)
for (j=0;j<n;j++)
v[i]=v[i]+(B[i*n+j]*d[j]);
is the equivalent of the standard BLAS gemv operation, y <- alpha*A*x + beta*y (here with alpha = beta = 1, A = B, x = d and y = v), where the matrix is stored in row-major order. The optimal way to do this on the device would be using CUBLAS, not something constructed out of thrust primitives.
Having said that, there is absolutely no way the thrust code you posted is ever going to do what your serial code does. The errors you are seeing are not as a result of floating point associativity. Fundamentally thrust::transform applies the functor supplied to every element of the input iterator and stores the result on the output iterator. To yield the same result as the loop you posted, the thrust::transform call would need to perform (n*n) operations of the fmad functor you posted. Clearly it does not. Further, there is no guarantee that thrust::transform would perform the summation/reduction operation in a fashion that would be safe from memory races.
The correct solution is probably going to be something like:
Use thrust::transform to compute the (n*n) products of the elements of B and d
Use thrust::reduce_by_key to reduce the products into partial sums, yielding Bd
Use thrust::transform to add the resulting matrix-vector product to v to yield the final result.
In code, firstly define a functor like this:
struct functor
{
template <typename Tuple>
__host__ __device__
double operator()(Tuple v)
{
return thrust::get<0>(v) * thrust::get<1>(v);
}
};
Then do the following to compute the matrix-vector multiplication
typedef thrust::device_vector<int> iVec;
typedef thrust::device_vector<double> dVec;
typedef thrust::counting_iterator<int> countIt;
typedef thrust::transform_iterator<IndexDivFunctor, countIt> columnIt;
typedef thrust::transform_iterator<IndexModFunctor, countIt> rowIt;
// Assuming the following allocations on the device
dVec B(n*n), v(n), d(n);
// transform iterators: cv_* yields idx / n (the row index, used as the reduction key), rv_* yields idx % n (the column index, used to gather from d)
columnIt cv_begin = thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n));
columnIt cv_end = cv_begin + (n*n);
rowIt rv_begin = thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexModFunctor(n));
rowIt rv_end = rv_begin + (n*n);
dVec temp(n*n);
thrust::transform(make_zip_iterator(
make_tuple(
B.begin(),
thrust::make_permutation_iterator(d.begin(),rv_begin) ) ),
make_zip_iterator(
make_tuple(
B.end(),
thrust::make_permutation_iterator(d.begin(),rv_end) ) ),
temp.begin(),
functor());
iVec outkey(n);
dVec Bd(n);
thrust::reduce_by_key(cv_begin, cv_end, temp.begin(), outkey.begin(), Bd.begin());
thrust::transform(v.begin(), v.end(), Bd.begin(), v.begin(), thrust::plus<double>());
Of course, this is a terribly inefficient way to do the computation compared to using a purpose designed matrix-vector multiplication code like dgemv from CUBLAS.
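To check the device result, a plain C++ host reference that follows the same three steps (element-wise products, per-row sums, final add) may help. The function name gemv_accumulate is made up for this sketch; B is assumed to be n x n and stored row-major, as in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host reference for v <- v + B*d, with B stored row-major (n x n).
std::vector<double> gemv_accumulate(const std::vector<double>& B,
                                    const std::vector<double>& d,
                                    std::vector<double> v) {
    const std::size_t n = d.size();
    assert(B.size() == n * n && v.size() == n);

    // Step 1: the n*n products; the column index of flat index idx is idx % n.
    std::vector<double> temp(n * n);
    for (std::size_t idx = 0; idx < n * n; ++idx) {
        temp[idx] = B[idx] * d[idx % n];
    }

    // Step 2: sum the products that share a row key (idx / n), giving B*d.
    std::vector<double> Bd(n, 0.0);
    for (std::size_t idx = 0; idx < n * n; ++idx) {
        Bd[idx / n] += temp[idx];
    }

    // Step 3: add the matrix-vector product to v.
    for (std::size_t i = 0; i < n; ++i) {
        v[i] += Bd[i];
    }
    return v;
}
```

Because the summation order on the device may differ from this serial order, compare the two results with a small tolerance rather than for exact equality.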
How much do your results differ? Is it a completely different answer, or does it differ only in the last digits? Is the loop executed only once, or is it part of some iterative process?
Floating-point addition and multiplication are not associative because of rounding, so repeatedly accumulating values in a different order can change the result. Moreover, if you use fast-math optimisations, the operations may not be IEEE compliant.
For starters, check out this Wikipedia section on floating-point numbers: http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems

What is the value of a dereferenced pointer

I realized that I had some confusion regarding the value of a dereferenced pointer, as I was reading a C text with the following code snippet:
int main()
{
int matrix[3][10]; // line 3: matrix is tentatively defined
int (* arrPtr)[10] = matrix; // line 4: arrPtr is defined and initialized
(*arrPtr)[0] = 5; // line 5: what is the value of (*arrPtr) ?
return 0;
}
My confusion is in regard to the value of *arrPtr in the last line. This is my understanding up to that point.
Line 3: matrix is declared (tentatively defined) to be an array of 3 elements of type array of 10 elements of type int.
Line 4: arrPtr is defined as a pointer to an array of 10 elements of type int. It is also initialized as a pointer to an array of 10 elements (i.e. the first row of matrix).
Line 5: arrPtr is dereferenced, yielding the actual array, so its type is array of 10 ints.
My question: why is the value of the array just the address of the array, and not in some way related to its elements?
The value of the array variable matrix is the array; however, it (easily) decays into a pointer to its first item, which you then assign to arrPtr.
To see this, use &matrix (has type int (*)[3][10]) or sizeof matrix (equals sizeof(int) * 3 * 10).
Additionally, there's nothing tentative about that definition.
Edit: I missed the question hiding in the code comments: *arrPtr is an object of type int[10], so when you use [0] on it, you get the first item, to which you then assign 5.
Pointers and arrays are purposefully defined to behave similarly, and this is sometimes confusing (before you learn the various quirks), but also extremely versatile and useful.
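The type relationships described in this answer can be checked at compile time. The sketch below is C++ rather than C (it uses static_assert and type traits, which the question's C text does not), and the function write_first_element is a hypothetical wrapper around the question's lines 4 and 5.

```cpp
#include <cassert>
#include <type_traits>

int matrix[3][10];

// &matrix is a pointer to the whole 3x10 array.
static_assert(std::is_same<decltype(&matrix), int (*)[3][10]>::value,
              "&matrix has type int (*)[3][10]");
// In most expressions, matrix decays to a pointer to its first row.
static_assert(std::is_same<std::decay<decltype(matrix)>::type, int (*)[10]>::value,
              "matrix decays to int (*)[10]");
// sizeof sees the whole array, not a pointer.
static_assert(sizeof(matrix) == sizeof(int) * 3 * 10,
              "sizeof matrix covers all 30 ints");

// Writes through a pointer-to-row and returns the element it changed.
int write_first_element(int value) {
    int (*arrPtr)[10] = matrix;  // matrix decays to a pointer to its first row
    static_assert(std::is_same<decltype(*arrPtr), int (&)[10]>::value,
                  "*arrPtr is an array of 10 ints");
    assert(&(*arrPtr)[0] == &matrix[0][0]);  // same storage as matrix[0][0]
    (*arrPtr)[0] = value;
    return matrix[0][0];
}
```

So *arrPtr designates the first row itself; it is only when that row is used in an expression such as (*arrPtr)[0] that it decays to a pointer to its first int.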
I think you need to clarify your question. If you mean what is the value printed by printf("%p", (void *)arrPtr); then it will be the address of the array. If you mean printf("%i", (*arrPtr)[0]); then we've got a more meaty question.
In C, an array is not a pointer, although it converts to one in most expressions: an int[10] decays to an int* pointing at its first element. A two-dimensional array such as int[3][10] is a single contiguous block of 30 ints and decays to int (*)[10], a pointer to its first row; it is not an int**, and no intermediate pointers are stored in memory.