Convert raw data to a vector of complex numbers in Thrust - cuda

I have a pointer to raw data of complex numbers in interleaved format, i.e. real and imaginary parts stored alternately - R I R I R I ...
How do I convert this to a host (or device) vector of thrust::complex without incurring an extra copy?
The following does not work -
double dos[8] = {9.3252,2.3742,7.2362,5.3562,2.3323,2.2322,7.2362,3.2352};
thrust::host_vector<thrust::complex<double > > comp(dos, dos+8);

Just cast. Something like this:
double dos[8] = {9.3252,2.3742,7.2362,5.3562,2.3323,2.2322,7.2362,3.2352};
typedef thrust::complex<double> cdub;
cdub* cdos = reinterpret_cast<cdub*>(&dos[0]);
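thrust::host_vector<cdub> comp(cdos, cdos+4); // 8 doubles = 4 complex values
should work. The cast itself copies nothing, but note that constructing the host_vector from the range still copies the four elements into the vector's own storage.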
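To illustrate the cast on its own, here is a minimal host-only sketch. It uses std::complex<double> as a stand-in for thrust::complex<double> so that it compiles without CUDA; that substitution, the assumption that both types store the real part followed by the imaginary part, and the helper name as_complex are mine, not part of the answer above. The C++ standard only spells out the opposite direction (reading an array of std::complex as plain doubles), so treat this direction as a widely used convention rather than a guarantee.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical helper: view n_doubles interleaved doubles (R I R I ...) as
// n_doubles / 2 complex values. Nothing is copied; the returned pointer
// aliases the original buffer.
// std::complex<double> stands in for thrust::complex<double> here.
inline std::complex<double>* as_complex(double* raw, std::size_t n_doubles) {
    assert(n_doubles % 2 == 0);  // every complex value needs a real and an imaginary part
    return reinterpret_cast<std::complex<double>*>(raw);
}
```

Calling as_complex(dos, 8) on the array from the question would give a pointer to 4 complex values that share storage with dos.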

Related

How to use Critcl ByteArray?

I want to try out Critcl to enhance memory performance using a Z-order curve for a 2d-grid. What I need from Critcl is allocation, setter, getter and some size info. Reading about the Critcl ByteArray and examples does not make me confident on how to do it.
How do I create and return a ByteArray (i.e. Z-order curve)?
Any caveats I should know about when using ByteArray?
According to the documentation, you should be using the bytes type instead. With it you get a pointer to a structure that has a len field with the number of bytes in it, and an s field that is the pointer to the actual read-only block of bytes. (It is a char * and not an unsigned char * for reasons I don't know. And why it isn't const is another mystery to me; there are cases where that's indeed true, but you need to look at the o field to figure that out.)
To return a byte array, you use the object (or object0) result type, and make the object with, for example, Tcl_NewByteArrayObj(), or Tcl_NewObj() and Tcl_SetByteArrayLength().
Here's an example (just the command definition) that does trivial byte reversing (since I don't understand Z-order curves at all):
critcl::cproc example {bytes dataPtr} object0 {
Tcl_Obj *result = Tcl_NewObj();
unsigned char *targetBytes = Tcl_SetByteArrayLength(result, dataPtr->len);
for (int i = 0, j = dataPtr->len - 1; j >= 0; i++, j--) {
targetBytes[i] = (unsigned char) dataPtr->s[j];
}
return result;
}
Naturally, you'll want to read the Critcl usage guide when getting this to work, and if you're going to produce errors (by returning NULL), remember to set an error message in the interpreter. You can get access to that by using Tcl_Interp* interp as your first pseudo-argument to the command you create with critcl::cproc (it's documented, but easily missed).

How to store data of a file using thrust::host_vector or device_vector?

The format of data is something like this:
TGCCACAGGTTCCACACAACGGGACTTGGTTGAAATATTGAGATCCTTGGGGGTCTGTTAATCGGAGACAGTATCTCAACCGCAATAAACCC
GTTCACGGGCCTCACGCAACGGGGCCTGGCCTAGATATTGAGGCACCCAACAGCTCTTGGCCTGAGAGTGTTGTCTCGATCACGACGCCAGT
TGCCACAGGTTCCACACAACGGGACTTGGTTGAAATATTGAGATCCTTGGGGGTCTGTTAATCGAAGACAGTATCTCAACCGCAATAAACCT
TGCCACAGGTTCCACACAACGGGACTTGGTTGAAATATTGAGATCCTTGGGGGTCTGTTAATCGAAGACAGTATCTCAACCGCAATAAACCT
Each line contains one sequence. I want to make (key, value) pairs where the key is a sequence and the value is 1, then use reduce_by_key to count the occurrences of each sequence.
But I found that thrust::host_vector can only store one sequence; if I push_back a second sequence, the program crashes.
Here is my code:
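#include <fstream>
#include <iostream>
#include <string>
#include <thrust/host_vector.h>
using namespace std;

int main()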
{
ifstream input_subset("subset.txt");
thrust::host_vector < string > h_output_subset;
string s;
while (getline(input_subset, s)) {
h_output_subset.push_back(s);
}
cout << h_output_subset.size() << endl;
return 0;
}
Is it possible to store all of the data in a host_vector or a device_vector? Or is there another way to solve this problem?
The host_vector segfault was confirmed as a bug in thrust::uninitialized_copy and a patch has been applied to fix it.
The problem doing this with a device_vector is a genuine limitation of CUDA (no std::string support) and can't be avoided. An alternative would be to use a fixed length char[] array as a data member in a device_vector, or use a single large device_vector to hold all the string data, with a second device_vector holding the starting index of each sub-string within the character array.
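The second alternative (one large character buffer plus a vector of start indices) can be sketched on the host as follows. This is only an illustration in plain C++; the names PackedStrings, append and get are hypothetical, and the extra trailing offset is a convenience I added so that each string's length can be recovered without a separate array.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container: all characters live in one flat buffer, and offsets
// records where each string starts. offsets carries one extra trailing entry
// (the total size), so string i occupies [offsets[i], offsets[i + 1]).
struct PackedStrings {
    std::vector<char> chars;
    std::vector<std::size_t> offsets{0};
};

// Append one string to the flat buffer and record where the next one starts.
inline void append(PackedStrings& p, const std::string& s) {
    p.chars.insert(p.chars.end(), s.begin(), s.end());
    p.offsets.push_back(p.chars.size());
}

// Rebuild string i from the flat buffer (host-side convenience only).
inline std::string get(const PackedStrings& p, std::size_t i) {
    assert(i + 1 < p.offsets.size());
    return std::string(p.chars.begin() + p.offsets[i],
                       p.chars.begin() + p.offsets[i + 1]);
}
```

Both members hold plain trivially copyable elements, so under this layout they could presumably be copied into a device_vector<char> and a device_vector of indices, which is the arrangement the answer describes.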

Extracting integers from a query string

I am creating a program that performs MySQL transactions through C and HTML.
I have this query string
query = -id=103&-id=101&-id=102&-act=Delete
Extracting "Delete" with sscanf isn't that hard, but I need help extracting the integers and putting them in an array int id[]. The number of -id entries can vary depending on how many checkboxes were checked in the HTML form.
I've been searching for hours but haven't found any applicable solution, or I just did not understand the ones I found. Any ideas?
Thanks
You can use strstr and atoi to extract the numbers in a loop, like this:
char *query = "-id=103&-id=101&-id=102&-act=Delete";
char *ptr = strstr(query, "-id=");
if (ptr) {
ptr += 4;
int n = atoi(ptr);
printf("%d\n", n);
for (;;) {
ptr = strstr(ptr, "&-id=");
if (!ptr) break;
ptr += 5;
int n = atoi(ptr);
printf("%d\n", n);
}
}
Demo on ideone.
You want to use strtok, or a better solution, to tokenize this string with & and = as delimiters.
Take a look at cplusplus.com for more information and an example.
This is the output you would get from strtok
Output:
Splitting string "- This, a sample string." into tokens:
This
a
sample
string
Once you figure out how to split them, the next hurdle is to convert the numbers from strings to ints. For this you need to look at atoi or its safer, more robust cousin strtol.
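Putting the two steps together, a rough sketch might look like the following. The function name extract_ids is hypothetical, and the code deviates slightly from the suggestion above: it splits on & only and then checks for the -id= prefix, which I found easier to keep correct than splitting on both & and =. It is written as C++ around the C functions strtok and strtol.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: collect the integer that follows every "-id=" field.
// The query is taken by value because strtok overwrites delimiters in the
// buffer it scans. Note that strtok keeps hidden global state, so this is
// not safe to call from several threads at once.
inline std::vector<long> extract_ids(std::string query) {
    std::vector<long> ids;
    if (query.empty()) return ids;
    for (char* tok = std::strtok(&query[0], "&"); tok != nullptr;
         tok = std::strtok(nullptr, "&")) {
        if (std::strncmp(tok, "-id=", 4) != 0) continue;  // skip e.g. "-act=Delete"
        char* end = nullptr;
        long value = std::strtol(tok + 4, &end, 10);
        // Accept the field only if strtol consumed at least one character
        // and stopped exactly at the end of the token.
        if (end != tok + 4 && *end == '\0') ids.push_back(value);
    }
    return ids;
}
```

For the query string in the question, extract_ids would be expected to return the three ids in the order they appear.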
Most likely I would write a small lexical scanner to tackle the task. Meaning, I would analyze the string one character at a time, according to a regular expression representing the set of possible inputs.

wrap a cusp sparse matrix variable in thrust pointer

I am using cusp for sparse matrix multiplication. From the resultant matrix I need the max value without copying the matrix from device memory to host memory. I am planning to wrap the resultant matrix in a thrust device pointer and then use thrust::max_element to get the max element. The matrices are in COO format. If C is the resultant sparse matrix, then
C.row_indices[] : contains row number
C.column_indices[]: contains column number
C.values[]: contains actual value
So basically I need the highest value from the C.values array.
Using
thrust::device_ptr<int> dev_ptr = C.values;
is giving error
error: no instance of constructor "thrust::device_ptr<T>::device_ptr [with T=int]" matches the argument list
argument types are: (cusp::array1d<float, cusp::host_memory>)
How can I wrap my resultant matrix in order to use it with the thrust library?
If my device matrix definition is like this:
cusp::coo_matrix<int, double, cusp::device_memory> A_d = A;
Then try this:
thrust::device_ptr<double> dev_ptr = &(A_d.values[0]);
// 6 is the number of stored values in this example; in general use A_d.values.size()
thrust::device_ptr<double> max_ptr = thrust::max_element(dev_ptr, dev_ptr + 6);

Allocating array of strings in cuda

Let us assume that we have the following strings that we need to store in a CUDA array.
"hi there"
"this is"
"who is"
How do we declare an array on the GPU to do this? I tried using C++ strings, but that does not work.
Probably the best way to do this is to use a structure that is similar to common compressed sparse matrix formats. Store the character data packed into a single piece of linear memory, then use a separate integer array to store the starting indices, and perhaps a third array to store the string lengths. The storage overhead of the latter might be more efficient than storing a string termination byte for every entry in the data and trying to parse for the terminator inside the GPU code.
So you might have something like this:
struct gpuStringArray {
unsigned int * pos;
unsigned int * length; // could be a smaller type if strings are short
char4 * data; // 32 bit data type will improve memory throughput, could be 8 bit
};
Note I used a char4 type for the string data; the vector type will give better memory throughput, but it will mean strings need to be aligned/suitably padded to 4 byte boundaries. That may or may not be a problem depending on what a typical real string looks like in your application. Also, the type of the (optional) length parameter should probably be chosen to reflect the maximum admissible string length. If you have a lot of very short strings, it might be worth using an 8 or 16 bit unsigned type for the lengths to save memory.
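As a rough idea of how the host side might fill such a structure, here is a sketch in plain C++. Char4 is a stand-in for CUDA's char4 so the example compiles without the CUDA headers, and HostStringArray and push_string are hypothetical names. It assumes that pos and length are both counted in 4-byte words and that the final word of each string is zero-padded; the answer above leaves both choices open.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for CUDA's char4 so the sketch compiles without the CUDA headers.
struct Char4 { char x, y, z, w; };

// Host-side mirror of gpuStringArray. Assumption: pos and length are both
// measured in 4-byte words, not in characters.
struct HostStringArray {
    std::vector<unsigned int> pos;
    std::vector<unsigned int> length;
    std::vector<Char4> data;
};

// Append s, zero-padding its last word so that the next string starts on a
// 4-byte boundary.
inline void push_string(HostStringArray& a, const std::string& s) {
    assert(a.pos.size() == a.length.size());  // one pos and one length per string
    const unsigned int nwords = static_cast<unsigned int>((s.size() + 3) / 4);
    a.pos.push_back(static_cast<unsigned int>(a.data.size()));
    a.length.push_back(nwords);
    for (unsigned int w = 0; w < nwords; ++w) {
        char c[4] = {0, 0, 0, 0};
        for (unsigned int k = 0; k < 4 && 4 * w + k < s.size(); ++k) {
            c[k] = s[4 * w + k];
        }
        a.data.push_back(Char4{c[0], c[1], c[2], c[3]});
    }
}
```

With the three strings from the question, each one happens to occupy two words, so the padding costs at most two bytes per string here.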
Some really simplistic code to compare strings stored this way, in the style of strcmp, might look something like this:
__device__ __host__
int cmp4(const char4 & c1, const char4 & c2)
{
int result;
result = c1.x - c2.x; if (result !=0) return result;
result = c1.y - c2.y; if (result !=0) return result;
result = c1.z - c2.z; if (result !=0) return result;
result = c1.w - c2.w; if (result !=0) return result;
return 0;
}
__device__ __host__
int strncmp4(const char4 * s1, const char4 * s2, const unsigned int nwords)
{
for(unsigned int i=0; i<nwords; i++) {
int result = cmp4(s1[i], s2[i]);
if (result != 0) return result;
}
return 0;
}
__global__
void tkernel(const struct gpuStringArray a, const gpuStringArray b, int * result)
{
int idx = threadIdx.x + blockIdx.x * blockDim.x;
char4 * s1 = a.data + a.pos[idx];
char4 * s2 = b.data + b.pos[idx];
unsigned int slen = min(a.length[idx], b.length[idx]);
result[idx] = strncmp4(s1, s2, slen);
}
[disclaimer: never compiled, never tested, no warranty real or implied, use at your own risk]
There are some corner cases and assumptions in this which might catch you out depending on exactly what the real strings in your code look like, but I will leave those as an exercise to the reader to resolve. You should be able to adapt and expand this into whatever it is you are trying to do.
You have to use C-style character strings, char *str. Searching for "CUDA string" on Google would have given you this CUDA "Hello World" example as the first hit: http://computer-graphics.se/hello-world-for-cuda.html
There you can see how to use char* strings in CUDA. Be aware that standard C functions like strcpy or strcmp are not available in CUDA!
If you want an array of strings, you just have to use char** (as in C/C++). As for strcmp and similar functions, it depends heavily on what you want to do. CUDA is not really well suited to string operations; it would help if you provided a little more detail about what you want to do.