How to create a shallow copy of a thrust device_vector - cuda

I have a device_vector H. I want to create a shallow copy of H using selected indices. I call it J. I want to modify elements of J thereby modifying corresponding elements of H.
My attempt below fails to modify the elements of H when I change elements of J. It appears that thrust allocates new memory to J, instead of using the memory allocated to H.
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/execution_policy.h>
#include <iostream>
int main(void)
{
// H has storage for 10 integers
thrust::device_vector<int> H(10);
thrust::sequence(thrust::device, H.begin(), H.end(),1);
std::cout << "H="<< std::endl;
thrust::copy(H.begin(), H.end(), std::ostream_iterator<int>(std::cout, ","));
std::cout<< std::endl;
thrust::device_vector<int> J(H.begin()+3,H.begin()+9);
std::cout << "Before modifying J="<< std::endl;
thrust::copy(J.begin(), J.end(), std::ostream_iterator<int>(std::cout, ","));
std::cout<< std::endl;
thrust::sequence(thrust::device, J.begin(), J.end(),10);
std::cout << "after modifying J="<< std::endl;
thrust::copy(J.begin(), J.end(), std::ostream_iterator<int>(std::cout, ","));
std::cout<< std::endl;
std::cout << "After modifying H="<< std::endl;
thrust::copy(H.begin(), H.end(), std::ostream_iterator<int>(std::cout, ","));
std::cout<< std::endl;
return 0;
}

This:
thrust::device_vector<int> J(H.begin()+3,H.begin()+9);
is copy construction: J gets its own allocation and a copy of the elements. There is no way to do what you want without resorting to pointers to the underlying storage, and even then you need to be careful that the source vector never goes out of scope while those pointers are in use.
Two vectors cannot use the same underlying allocation. That is true for std::vector and it is true for thrust vectors.
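The difference between copy construction and a pointer into existing storage can be illustrated on the host with std::vector. This is only a sketch under that assumption (plain C++ rather than Thrust), and the function names are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Writes -1 through an independent copy of h[first, last) and returns h.
// The range constructor allocates new storage, so h comes back unchanged.
std::vector<int> modify_through_copy(std::vector<int> h, std::size_t first, std::size_t last) {
    assert(first <= last && last <= h.size());
    std::vector<int> j(h.begin() + first, h.begin() + last);  // copies the elements
    for (int& v : j) v = -1;
    return h;
}

// Writes -1 through raw pointers into h's own storage and returns h.
// No new allocation is made, so the writes land in h itself.
std::vector<int> modify_through_pointer(std::vector<int> h, std::size_t first, std::size_t last) {
    assert(first <= last && last <= h.size());
    int* j = h.data() + first;
    int* j_end = h.data() + last;
    for (int* p = j; p != j_end; ++p) *p = -1;
    return h;
}
```

Calling modify_through_copy leaves the input as it was, while modify_through_pointer overwrites the selected elements, which is the behavior the question asks for.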
You can do something similar to what you suggest with thrust::device_ptr:
$ cat t4.cu
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/execution_policy.h>
#include <thrust/device_ptr.h>
#include <iostream>
int main(void)
{
// H has storage for 10 integers
thrust::device_vector<int> H(10);
thrust::sequence(thrust::device, H.begin(), H.end(),1);
std::cout << "H="<< std::endl;
thrust::copy(H.begin(), H.end(), std::ostream_iterator<int>(std::cout, ","));
std::cout<< std::endl;
thrust::device_ptr<int> J(H.data()+3);
thrust::device_ptr<int> J_end = J+6;
std::cout << "Before modifying J="<< std::endl;
thrust::copy(J, J_end, std::ostream_iterator<int>(std::cout, ","));
std::cout<< std::endl;
thrust::sequence(thrust::device, J, J_end,10);
std::cout << "after modifying J="<< std::endl;
thrust::copy(J, J_end, std::ostream_iterator<int>(std::cout, ","));
std::cout<< std::endl;
std::cout << "After modifying H="<< std::endl;
thrust::copy(H.begin(), H.end(), std::ostream_iterator<int>(std::cout, ","));
std::cout<< std::endl;
return 0;
}
$ nvcc -o t4 t4.cu
$ ./t4
H=
1,2,3,4,5,6,7,8,9,10,
Before modifying J=
4,5,6,7,8,9,
after modifying J=
10,11,12,13,14,15,
After modifying H=
1,2,3,10,11,12,13,14,15,10,
$

I tried with iterators. It seems to work. The results are posted after the code. The memory location seems to be overwritten as well.
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/execution_policy.h>
#include <thrust/device_ptr.h>
#include <iostream>
int main(void)
{
// H has storage for 10 integers
thrust::device_vector<int> H(10);
thrust::sequence(thrust::device, H.begin(), H.end(),1);
std::cout << "H="<< std::endl;
thrust::copy(H.begin(), H.end(), std::ostream_iterator<int>(std::cout, ","));
std::cout<< std::endl;
thrust::device_vector<int>::iterator J = H.begin()+3;
thrust::device_vector<int>::iterator J_end = J+6;
std::cout << "Before modifying J="<< std::endl;
thrust::copy(J, J_end, std::ostream_iterator<int>(std::cout, ","));
std::cout<< std::endl;
thrust::sequence(thrust::device, J, J_end,10);
std::cout << "after modifying J="<< std::endl;
thrust::copy(J, J_end, std::ostream_iterator<int>(std::cout, ","));
std::cout<< std::endl;
std::cout << "After modifying H="<< std::endl;
thrust::copy(H.begin(), H.end(), std::ostream_iterator<int>(std::cout, ","));
std::cout<< std::endl;
return 0;
}
Results:
./a.out
H=
1,2,3,4,5,6,7,8,9,10,
Before modifying J=
4,5,6,7,8,9,
after modifying J=
10,11,12,13,14,15,
After modifying H=
1,2,3,10,11,12,13,14,15,10,

I want to create a shallow copy of H using selected indices.
No, you don't want to create a shallow copy.
I call it J [and] to modify elements of J thereby modifying corresponding elements of H.
What you actually want to do - and ended up doing - is modify a subrange of a container's range of elements. In C++ we do this using iterators; in many cases, these iterators are essentially just pointers.
Another way to do it - when the elements are contiguous in memory - is with a std::span, but that is a C++20 construct (and you may have some trouble with it due to lack of explicit CUDA support, i.e. potential lack of __device__ attributes; the same goes for gsl::span in some implementations).
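Where std::span is not available, a small non-owning view can play the same role. The sketch below is host-only C++; IntView and make_view are hypothetical names, not part of Thrust or the standard library, and a CUDA version would presumably need __host__ __device__ on the member functions:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view over a contiguous run of ints.
// It stores only a pointer and a length; it never allocates or copies.
struct IntView {
    int* first;
    std::size_t count;

    int* begin() const { return first; }
    int* end() const { return first + count; }
    std::size_t size() const { return count; }
    int& operator[](std::size_t k) const {
        assert(k < count);
        return first[k];
    }
};

// Builds a view of `count` elements of `owner`, starting at `offset`.
// The view is only valid while `owner` is alive and not reallocated.
IntView make_view(std::vector<int>& owner, std::size_t offset, std::size_t count) {
    assert(offset <= owner.size() && count <= owner.size() - offset);
    return IntView{owner.data() + offset, count};
}
```

Writing through the view (for example in a range-based for loop) changes the owning vector directly, just as the iterator pair does in the code above.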

Related

Using thrust::reverse with make_zip_iterator+make_tuple

I'm experiencing odd behavior while using the thrust::reverse function on a zip_iterator constructed with a thrust::make_zip_iterator( thrust::make_tuple( )) type syntax (see the answer from JackOLantern here for a good example of that combination).
I wish to reverse some arbitrarily-indicated section of multiple device vectors as in the example code below. When I do the reversing in one go by tupling and zipping them together, unexpected behavior ensues. The first half of the range is correctly changed to an inversion of the second half of the range, however, the second half of the range is left unchanged.
I've been using other thrust functions in a similar fashion (sort_by_key, unique_by_key, adjacent_difference, etc.) without issue. Am I just executing this incorrectly, or is there some reason that this will not work on a fundamental level? A thought I had is that perhaps the zip_iterator is not bidirectional as required for reverse. Is this true? I couldn't find documentation indicating as much.
A workaround is just to reverse the vectors individually, which works as shown below. However, I suspect this will be less efficient. Note that in my actual use-case I have vectors with sizes of the order of 10,000 and I'm zipping up anywhere from 3-7 vectors for the operations.
#include <iostream>
#include <ostream>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/tuple.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/sequence.h>
#include <thrust/reverse.h>
int main(){
// initial host vectors
const int N=10;
thrust::host_vector<int> h1(N);
thrust::host_vector<float> h2(N);
// fill them
thrust::sequence( h1.begin(), h1.end(), 0);
thrust::sequence( h2.begin(), h2.end(), 10., 0.5);
// print initial contents
for (size_t i=0; i<N; i++){
std::cout << h1[i] << " " << h2[i] << std::endl;
}
// transfer to device
thrust::device_vector<int> d1 = h1;
thrust::device_vector<float> d2 = h2;
// what chunk to invert
int iStart = 3; int iEnd = 8;
// attempt to reverse middle via zip_iterators
thrust::reverse(
thrust::make_zip_iterator( thrust::make_tuple( d1.begin()+iStart, d2.begin()+iStart)),
thrust::make_zip_iterator( thrust::make_tuple( d1.begin()+iEnd, d2.begin()+iEnd))
);
// pull back and write out unexpected ordering
thrust::host_vector<int> temp1 = d1;
thrust::host_vector<float> temp2 = d2;
std::cout << "<==========>" << std::endl;
for (size_t i=0; i<N; i++){
std::cout << temp1[i] << " " << temp2[i] << std::endl;
}
// reset device variables
d1 = h1;
d2 = h2;
// reverse individually
thrust::reverse( d1.begin()+iStart, d1.begin()+iEnd);
thrust::reverse( d2.begin()+iStart, d2.begin()+iEnd);
// pull back and write out the desired ordering
temp1 = d1;
temp2 = d2;
std::cout << "<==========>" << std::endl;
for (size_t i=0; i<N; i++){
std::cout << temp1[i] << " " << temp2[i] << std::endl;
}
return 0;
}
Output
0 10
1 10.5
2 11
3 11.5
4 12
5 12.5
6 13
7 13.5
8 14
9 14.5
<==========>
0 10
1 10.5
2 11
7 13.5
6 13
5 12.5
6 13
7 13.5
8 14
9 14.5
<==========>
0 10
1 10.5
2 11
7 13.5
6 13
5 12.5
4 12
3 11.5
8 14
9 14.5
The information from Robert Crovella in the comments combined with the initially given workaround in the initial post appears to answer the question - thus, I will combine them here so the question can be marked as "answered." If others wish to post other solutions, I'm more than willing to look at them and move the "official answer" check mark. That being said...
The solution to the question has two parts:
If using an older version of CUDA and upgrading is an option: upgrade to the newest CUDA version and the operation should work (tested to work on CUDA 9.2.148 - thanks Robert!)
If unable to upgrade to a newer version of CUDA: apply reverse to the vectors individually to achieve the same result as given in the initial post. The code with only the working solution is copied below for completeness.
#include <iostream>
#include <ostream>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/tuple.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/sequence.h>
#include <thrust/reverse.h>
int main(){
// initial host vectors
const int N=10;
thrust::host_vector<int> h1(N);
thrust::host_vector<float> h2(N);
// fill them
thrust::sequence( h1.begin(), h1.end(), 0);
thrust::sequence( h2.begin(), h2.end(), 10., 0.5);
// print initial contents
for (size_t i=0; i<N; i++){
std::cout << h1[i] << " " << h2[i] << std::endl;
}
// transfer to device
thrust::device_vector<int> d1 = h1;
thrust::device_vector<float> d2 = h2;
// what chunk to invert
int iStart = 3; int iEnd = 8;
// reverse individually
thrust::reverse( d1.begin()+iStart, d1.begin()+iEnd);
thrust::reverse( d2.begin()+iStart, d2.begin()+iEnd);
// pull back and write out the desired ordering
thrust::host_vector<int> temp1 = d1;
thrust::host_vector<float> temp2 = d2;
std::cout << "<==========>" << std::endl;
for (size_t i=0; i<N; i++){
std::cout << temp1[i] << " " << temp2[i] << std::endl;
}
return 0;
}
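For reference, this is what a correct zipped reverse is expected to do: each pair of positions (lo, hi) is swapped in every column at once. The following is a host-side sketch with std::vector, not Thrust's implementation, and reverse_columns is a made-up name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reverses the subrange [first, last) of both vectors in lockstep,
// so that row k of (a, b) stays together as a pair.
void reverse_columns(std::vector<int>& a, std::vector<float>& b,
                     std::size_t first, std::size_t last) {
    assert(a.size() == b.size());
    assert(first <= last && last <= a.size());
    if (first == last) return;  // empty range, nothing to do
    for (std::size_t lo = first, hi = last - 1; lo < hi; ++lo, --hi) {
        std::swap(a[lo], a[hi]);
        std::swap(b[lo], b[hi]);
    }
}
```

With the question's data (iStart = 3, iEnd = 8) this produces the same ordering as reversing each vector individually.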

Parameter Passing in C-like

int i, a[] = {0, 1, 2};
void foo(int x) {
i++;
x++;
cout << a[0] << " " << a[1] << " " << a[2];
}
void main() {
i = 0;
foo(a[i]);
}
So, the printing output will be:
By value-result: 0 - 1 - 2
By reference: 1 - 1 - 2
By name: 0 - 2 - 2
By constant reference: 0 - 1 - 2
Right ?
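The first three answers can be checked by simulating each passing mode explicitly. The sketch below is ordinary C++ with locals standing in for the globals; it assumes that value-result copies back only when foo returns (after the print), and it models call by name with a lambda that re-evaluates a[i] on every use. All function names are made up for illustration:

```cpp
#include <array>
#include <cassert>

using Array3 = std::array<int, 3>;

// Value-result: copy the argument in, work on the copy, copy out on return.
// Returns what foo would print, i.e. the array before the copy-out.
Array3 print_by_value_result() {
    int i = 0;
    Array3 a = {{0, 1, 2}};
    int& actual = a[i];        // argument location fixed at the call (a[0])
    int x = actual;            // copy in
    i++;
    x++;
    const Array3 printed = a;  // snapshot at the point of the print
    actual = x;                // copy out happens after the print
    assert(a[0] == 1);
    return printed;
}

// Reference: x is an alias for a[0], so x++ changes a[0] before the print.
Array3 print_by_reference() {
    int i = 0;
    Array3 a = {{0, 1, 2}};
    int& x = a[i];
    i++;
    x++;
    return a;
}

// Name: every use of x re-evaluates a[i]; after i++ that means a[1].
Array3 print_by_name() {
    int i = 0;
    Array3 a = {{0, 1, 2}};
    auto x = [&]() -> int& { return a[i]; };
    i++;
    x()++;
    assert(i == 1);
    return a;
}
```

The three functions return {0, 1, 2}, {1, 1, 2} and {0, 2, 2} respectively, matching the first three lines of the list above. Constant reference is left out because x++ on a const parameter would not compile in C++.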
Beware: the cout stream and the << operators are pure C++ constructs, so you are NOT in C!
You also have to understand that side effects inside a sequence such as cout << foo << bar << fuzz; can produce totally unpredictable output, depending on the choices made by the compiler. Before C++17 the language specification does not fix the order in which the operands are evaluated, and modifying the same variable more than once within such an expression is undefined behavior.
To illustrate what I am saying, try to compile the following small program (example.cpp):
#include <iostream>
using namespace std;
int main ()
{
int i = 0;
cout << "i++: " << i++ << i++ << i++ << i++ << endl;
i = 0;
cout << "++i: " << ++i << ++i << ++i << ++i << endl;
cout << "The address of i is: " << &i << endl;
return 0;
}
When compiled with g++ -o example example.cpp, you should get something like:
i++: 3210
++i: 4321
The address of i is: 0xbfb1c44c
Then, try to compile it with g++ -O2 -o example example.cpp, you should get something like:
i++: 3210
++i: 4444
The address of i is: 0xbf9af0cc
In fact, the difference between these two executions comes from the fact that once you enable optimization in g++, the compiler assumes that the program conforms to the C++ specification, i.e. that i is not modified more than once inside the cout << ... << endl; expression. So it is free to use the last value of i everywhere.
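A way to get the same output at every optimization level is to give each side effect its own statement, so that the order is fixed by the statement order. A minimal sketch, with function names made up for illustration:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Concatenates the results of `count` post-increments of a counter.
// Each increment is a separate statement, so the order is well defined.
std::string post_increments(int start, int count) {
    assert(count >= 0);
    std::ostringstream out;
    int i = start;
    for (int k = 0; k < count; ++k) {
        const int value = i++;  // old value, then i is bumped
        out << value;
    }
    return out.str();
}

// Same, but with pre-increments.
std::string pre_increments(int start, int count) {
    assert(count >= 0);
    std::ostringstream out;
    int i = start;
    for (int k = 0; k < count; ++k) {
        const int value = ++i;  // i is bumped, then read
        out << value;
    }
    return out.str();
}
```

Here post_increments(0, 4) yields "0123" and pre_increments(0, 4) yields "1234", regardless of compiler flags.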

CUDA Thrust - Run Length Encoding with run index

I am trying to build a "run length encoder" which produces a report of occurrences of runs within a file using CUDA Thrust. I will use this "report" to perform the run length encoding step later.
e.g.
Input sequence:
inputSequence = [a, a, b, c, a, a, a];
Output sequences:
runChar = [a, a];
runCount = [2, 3];
runPosition = [0, 4];
The output describes a run of 2 a's starting at position 0 and a run of 3 a's starting at position 4.
The Thrust run length encoder example described below outputs two arrays - one for the output char and one for its length.
I would like to modify this so runs of less than 2 are excluded and it also outputs the position each run occurs.
// input data on the host
const char data[] = "aaabbbbbcddeeeeeeeeeff";
const size_t N = (sizeof(data) / sizeof(char)) - 1;
// copy input data to the device
thrust::device_vector<char> input(data, data + N);
// allocate storage for output data and run lengths
thrust::device_vector<char> output(N);
thrust::device_vector<int> lengths(N);
// print the initial data
std::cout << "input data:" << std::endl;
thrust::copy(input.begin(), input.end(), std::ostream_iterator<char>(std::cout, ""));
std::cout << std::endl << std::endl;
// compute run lengths
size_t num_runs = thrust::reduce_by_key
(input.begin(), input.end(), // input key sequence
thrust::constant_iterator<int>(1), // input value sequence
output.begin(), // output key sequence
lengths.begin() // output value sequence
).first - output.begin(); // compute the output size
// print the output
std::cout << "run-length encoded output:" << std::endl;
for(size_t i = 0; i < num_runs; i++)
std::cout << "(" << output[i] << "," << lengths[i] << ")";
std::cout << std::endl;
return 0;
One possible approach, building on what you have shown already:
Take your output lengths, and do an exclusive_scan on them. This creates a corresponding vector of the starting indexes of each run.
Use stream compaction (copy_if in the code below; remove_if with the opposite predicate would work as well) to drop the entries of all arrays (output, lengths, and indexes) whose corresponding length is 1. We do this in two steps: the first copy_if operation cleans up output and indexes, using lengths as the stencil, and the second operates directly on lengths. This can probably be improved significantly by operating on all 3 at once, which will make the output length calculation a bit more complicated. How you handle this exactly will depend on which sets of data you intend to retain.
Here is a fully worked example, extending your code:
$ cat t601.cu
#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/reduce.h>
#include <thrust/scan.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/iterator/zip_iterator.h>
struct is_not_one{
template <typename T>
__host__ __device__
bool operator()(T data){
return data != 1;
}
};
int main(){
// input data on the host
const char data[] = "aaabbbbbcddeeeeeeeeeff";
const size_t N = (sizeof(data) / sizeof(char)) - 1;
// copy input data to the device
thrust::device_vector<char> input(data, data + N);
// allocate storage for output data and run lengths
thrust::device_vector<char> output(N);
thrust::device_vector<int> lengths(N);
// print the initial data
std::cout << "input data:" << std::endl;
thrust::copy(input.begin(), input.end(), std::ostream_iterator<char>(std::cout, ""));
std::cout << std::endl << std::endl;
// compute run lengths
size_t num_runs = thrust::reduce_by_key
(input.begin(), input.end(), // input key sequence
thrust::constant_iterator<int>(1), // input value sequence
output.begin(), // output key sequence
lengths.begin() // output value sequence
).first - output.begin(); // compute the output size
// print the output
std::cout << "run-length encoded output:" << std::endl;
for(size_t i = 0; i < num_runs; i++)
std::cout << "(" << output[i] << "," << lengths[i] << ")";
std::cout << std::endl;
thrust::device_vector<int> indexes(num_runs);
thrust::exclusive_scan(lengths.begin(), lengths.begin()+num_runs, indexes.begin());
thrust::device_vector<char> foutput(num_runs);
thrust::device_vector<int> findexes(num_runs);
thrust::device_vector<int> flengths(num_runs);
thrust::copy_if(thrust::make_zip_iterator(thrust::make_tuple(output.begin(), indexes.begin())), thrust::make_zip_iterator(thrust::make_tuple(output.begin()+num_runs, indexes.begin()+num_runs)), lengths.begin(), thrust::make_zip_iterator(thrust::make_tuple(foutput.begin(), findexes.begin())), is_not_one());
size_t fnum_runs = thrust::copy_if(lengths.begin(), lengths.begin()+num_runs, flengths.begin(), is_not_one()) - flengths.begin();
std::cout << "output: " << std::endl;
thrust::copy_n(foutput.begin(), fnum_runs, std::ostream_iterator<char>(std::cout, ","));
std::cout << std::endl << "lengths: " << std::endl;
thrust::copy_n(flengths.begin(), fnum_runs, std::ostream_iterator<int>(std::cout, ","));
std::cout << std::endl << "indexes: " << std::endl;
thrust::copy_n(findexes.begin(), fnum_runs, std::ostream_iterator<int>(std::cout, ","));
std::cout << std::endl;
return 0;
}
$ nvcc -arch=sm_20 -o t601 t601.cu
$ ./t601
input data:
aaabbbbbcddeeeeeeeeeff
run-length encoded output:
(a,3)(b,5)(c,1)(d,2)(e,9)(f,2)
output:
a,b,d,e,f,
lengths:
3,5,2,9,2,
indexes:
0,3,9,11,20,
$
I'm certain that this code can be improved upon, but my purpose is to show you one possible general approach.
In my opinion, for future reference, it's not very helpful for you to strip off the include headers from your sample code. I think it's better to provide a complete, compilable code. Not a big deal in this case.
Also note that there are thrust example codes for run length encoding and decoding.
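The same three steps (reduce by key, exclusive scan, compaction) can also be written sequentially on the host, which may help when checking the GPU results. This is a sketch in plain C++, not Thrust; RunReport and report_runs are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Report of all runs that are at least `min_length` long.
struct RunReport {
    std::vector<char> chars;
    std::vector<int> lengths;
    std::vector<int> positions;
};

RunReport report_runs(const std::string& input, int min_length = 2) {
    assert(min_length >= 1);

    // Step 1 (reduce_by_key analogue): one key and one count per run.
    std::vector<char> keys;
    std::vector<int> lengths;
    for (char c : input) {
        if (!keys.empty() && keys.back() == c) {
            ++lengths.back();
        } else {
            keys.push_back(c);
            lengths.push_back(1);
        }
    }

    // Step 2 (exclusive scan): start position of each run.
    std::vector<int> positions(lengths.size());
    int running = 0;
    for (std::size_t r = 0; r < lengths.size(); ++r) {
        positions[r] = running;
        running += lengths[r];
    }

    // Step 3 (stream compaction): keep only the runs that are long enough.
    RunReport report;
    for (std::size_t r = 0; r < lengths.size(); ++r) {
        if (lengths[r] >= min_length) {
            report.chars.push_back(keys[r]);
            report.lengths.push_back(lengths[r]);
            report.positions.push_back(positions[r]);
        }
    }
    return report;
}
```

For the input "aaabbbbbcddeeeeeeeeeff" this gives the same chars, lengths and indexes as the t601 output above.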

get reverse_iterator from device_ptr in CUDA

For a device_vector, I can use its rbegin() method to get its reverse iterator. But how to construct a reverse iterator directly from device_ptr?
Maybe this can be achieved by constructing a device_vector from the device_ptr. The code is as follows:
thrust::device_ptr<int> ptr = get_ptr();
thrust::device_vector<int> tmpVector(ptr , ptr + N);
thrust::inclusive_scan_by_key(tmpVector.rbegin(), tmpVector.rend(), ......);
But I don't know whether thrust::device_vector<int> tmpVector(ptr , ptr + N) will construct a new vector and copy the data from ptr, or whether it just keeps a reference to ptr. The Thrust documentation doesn't mention this.
Any ideas?
Providing an answer based on the comment by Jared, to get this off the unanswered list, and to preserve the question for future readers.
To make a reverse iterator from any kind of iterator, including thrust::device_ptr, use the thrust::make_reverse_iterator function.
Here is a simple example:
$ cat t615.cu
#include <thrust/device_vector.h>
#include <thrust/iterator/reverse_iterator.h>
#include <thrust/device_ptr.h>
#include <thrust/sequence.h>
#include <thrust/copy.h>
#include <iostream>
#define DSIZE 4
int main(){
int *data;
cudaMalloc(&data, DSIZE*sizeof(int));
thrust::device_ptr<int> my_data = thrust::device_pointer_cast<int>(data);
thrust::sequence(my_data, my_data+DSIZE);
thrust::copy_n(my_data, DSIZE, std::ostream_iterator<int>(std::cout, ","));
std::cout << std::endl;
typedef thrust::device_vector<int>::iterator Iterator;
thrust::reverse_iterator<Iterator> r_iter = make_reverse_iterator(my_data+DSIZE); // note that we point the iterator to the "end" of the device pointer area
thrust::copy_n(r_iter, DSIZE, std::ostream_iterator<int>(std::cout, ","));
std::cout << std::endl;
return 0;
}
$ nvcc -arch=sm_35 -o t615 t615.cu
$ ./t615
0,1,2,3,
3,2,1,0,
$
The creation of a reverse iterator does not create any "extra array".
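The standard library behaves the same way on the host: a reverse iterator built from a raw pointer is only an adaptor around that pointer. A small sketch in plain C++ (std::reverse_iterator rather than thrust::reverse_iterator; copy_reversed is a made-up helper):

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Reads the n ints at `data` back to front through reverse iterators.
// The adaptors wrap the pointer; the only copy made is the returned vector.
std::vector<int> copy_reversed(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::reverse_iterator<const int*> rfirst(data + n);  // starts at the "end"
    std::reverse_iterator<const int*> rlast(data);       // stops at the beginning
    return std::vector<int>(rfirst, rlast);
}
```

As in the Thrust example, the reverse iterator is constructed from the one-past-the-end pointer.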

qserialport does not send a char to arduino

I'm having trouble sending a char (i.e. "R") from my Qt5 application on Windows 7 to the COM port that an Arduino is connected to.
I intend to blink an LED on the Arduino, and the Arduino part works OK.
Here is my qt code:
#include <QTextStream>
#include <QCoreApplication>
#include <QtSerialPort/QSerialPortInfo>
#include <QSerialPort>
#include <iostream>
#include <QtCore>
QT_USE_NAMESPACE
using namespace std;
QSerialPort serial;
int main(int argc, char *argv[])
{
QCoreApplication a(argc, argv);
QTextStream out(stdout);
QList<QSerialPortInfo> serialPortInfoList = QSerialPortInfo::availablePorts();
out << QObject::tr("Total number of ports available: ") << serialPortInfoList.count() << endl;
foreach (const QSerialPortInfo &serialPortInfo, serialPortInfoList) {
out << endl
<< QObject::tr("Port: ") << serialPortInfo.portName() << endl
<< QObject::tr("Location: ") << serialPortInfo.systemLocation() << endl
<< QObject::tr("Description: ") << serialPortInfo.description() << endl
<< QObject::tr("Manufacturer: ") << serialPortInfo.manufacturer() << endl
<< QObject::tr("Vendor Identifier: ") << (serialPortInfo.hasVendorIdentifier() ? QByteArray::number(serialPortInfo.vendorIdentifier(), 16) : QByteArray()) << endl
<< QObject::tr("Product Identifier: ") << (serialPortInfo.hasProductIdentifier() ? QByteArray::number(serialPortInfo.productIdentifier(), 16) : QByteArray()) << endl
<< QObject::tr("Busy: ") << (serialPortInfo.isBusy() ? QObject::tr("Yes") : QObject::tr("No")) << endl;
}
serial.setPortName("COM5");
serial.open(QIODevice::ReadWrite);
serial.setBaudRate(QSerialPort::Baud9600);
serial.setDataBits(QSerialPort::Data8);
serial.setParity(QSerialPort::NoParity);
serial.setStopBits(QSerialPort::OneStop);
serial.setFlowControl(QSerialPort::NoFlowControl);
if(!serial.isOpen())
{
std::cout<<"port is not open"<<endl;
//serial.open(QIODevice::ReadWrite);
}
if(serial.isWritable()==true)
{
std::cout<<"port writable..."<<endl;
}
QByteArray data("R");
serial.write(data);
serial.flush();
std::cout<<"value sent!!! "<<std::endl;
serial.close();
return 0;
}
My source code consists of two parts:
1- serialportinfolist ..., which works just fine
2- opening the port and writing data. I get no error when running the code, and the output looks as if nothing had gone wrong!
However, the LED on the board does not turn on when I run this code.
When I test this with the Arduino Serial Monitor the LED turns on, but I cannot turn it on from Qt.
Are you waiting for cr lf (0x0D 0x0A) in your arduino code?
QByteArray ba;
ba.resize(3);
ba[0] = 0x52; // 'R' (0x5c would be a backslash)
ba[1] = 0x0d;
ba[2] = 0x0a;
Or append it to your string with
QByteArray data("R\r\n");
Or
QByteArray data("R\n");
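To make the byte values explicit, the framing can be sketched without Qt. frame_command is a hypothetical helper, and whether the Arduino sketch expects "\r\n", "\n" or no terminator at all is an assumption that has to be checked against the Arduino code:

```cpp
#include <cassert>
#include <string>

// Builds the bytes to send for a one-character command.
// With carriage_return the frame ends in CR LF (0x0D 0x0A), otherwise in LF only.
std::string frame_command(char command, bool carriage_return) {
    assert(command != '\0');  // a NUL command byte is not a usable command here
    std::string frame(1, command);
    if (carriage_return) frame += '\r';
    frame += '\n';
    return frame;
}
```

On an ASCII system frame_command('R', true) yields the three bytes 0x52 0x0D 0x0A, which could then be wrapped in a QByteArray before writing.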
I think I have found a partial solution, but it is still incomplete.
When I press debug the first time, Qt does not send anything to the Arduino, but when I press debug a second time it behaves as expected.
Isn't it weird that one has to run it twice to get it working?
Let me know if the problem lies somewhere else.
Any help is appreciated.