How to return a string from a __global__ function to main in C CUDA

I am trying to concatenate two char arrays in CUDA, but nothing is working.
I tried:
char temp[32];
strcpy(temp, my_array);
strcat(temp, my_array_2);
When I use this in the kernel, I get the error: calling a __host__ function("strcpy") from a __global__ function("Process") is not allowed
Then I tried calling these functions on the host instead of in the kernel. There is no error, but after the concatenation I get strange symbols like ĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶ.
So how can I concatenate two (or more) char arrays in CUDA?

Write your own __device__ functions:
// copy src, including its terminating '\0', into dest
__device__ char * my_strcpy(char *dest, const char *src){
  int i = 0;
  do {
    dest[i] = src[i];
  } while (src[i++] != 0);
  return dest;
}
// find the end of dest, then copy src there
__device__ char * my_strcat(char *dest, const char *src){
  int i = 0;
  while (dest[i] != 0) i++;
  my_strcpy(dest+i, src);
  return dest;
}
And while we're at it, here is strcmp
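The linked strcmp is not reproduced here. As a separate sketch (not the linked code), a strcmp in the same style could look like this. `HD` is a hypothetical convenience macro that marks the function `__host__ __device__` under nvcc and expands to nothing in a plain C compiler, so the same code can be checked on the CPU.

```c
/* HD (hypothetical): __host__ __device__ under nvcc, empty otherwise. */
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

/* Compare two NUL-terminated strings like strcmp: returns <0, 0 or >0.
   Characters are compared as unsigned char, as the C standard requires. */
HD int my_strcmp(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (int)(unsigned char)*a - (int)(unsigned char)*b;
}
```

Like the standard function, it stops at the first difference or at the end of the first string, so each thread does work proportional to the common prefix length.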

As the error message explains, you are trying to call a host function ("CPU function") from a __global__ kernel ("GPU function"). Inside a kernel you can only call __device__ functions: your own, or the ones CUDA provides for device code. The C standard library string functions, where strcpy and strcat are defined, are not among them.
You have to write your own str* functions according to what you want to do. Do you want to concatenate the char arrays in parallel, or serially within each thread?
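For the parallel case, one common approach (a sketch under our own assumptions, not taken from the answers) is to give each thread one output character. `concat_at` is the body each thread would run for its index `i` (in a kernel, `i` would be `threadIdx.x + blockIdx.x * blockDim.x`), and the hypothetical host loop `concat_all` stands in for the grid launch so the logic can be checked without a GPU. Both function names and the `HD` macro are illustrative.

```c
#include <stddef.h>

/* HD (hypothetical): __host__ __device__ under nvcc, empty otherwise. */
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

/* Work done by "thread" i: write one character of a+b into out.
   len_a and len_b exclude the terminators; out must hold
   len_a + len_b + 1 bytes. Index len_a + len_b writes the '\0'. */
HD void concat_at(size_t i, const char *a, size_t len_a,
                  const char *b, size_t len_b, char *out)
{
    if (i < len_a)
        out[i] = a[i];
    else if (i < len_a + len_b)
        out[i] = b[i - len_a];
    else if (i == len_a + len_b)
        out[i] = '\0';
    /* larger i (a rounded-up grid) does nothing */
}

/* Host stand-in for launching len_a + len_b + 1 threads. */
void concat_all(const char *a, size_t len_a,
                const char *b, size_t len_b, char *out)
{
    for (size_t i = 0; i <= len_a + len_b; i++)
        concat_at(i, a, len_a, b, len_b, out);
}
```

This layout needs the lengths before the launch (for example computed on the host), but in exchange no thread has to scan for a terminator.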


Send arguments to a function with argv and argc

Can someone help me understand how I need to pass the parameters to the function "lora_rf_config"? Thank you so much!
I tried:
char cfgred[7][10]={'lora_rf_config','915000000','10','0','1','8','14'};
lora_rf_config(7,&cfgred);
The function that I'm trying to use is:
static void lora_rf_config(int argc, char *argv[])
{
if (argc == 1) {
e_printf("OK%d,%d,%d,%d,%d,%d\r\n", g_lora_config.lorap2p_param.Frequency,
g_lora_config.lorap2p_param.Spreadfact,
g_lora_config.lorap2p_param.Bandwidth,
g_lora_config.lorap2p_param.Codingrate,
g_lora_config.lorap2p_param.Preamlen,
g_lora_config.lorap2p_param.Powerdbm );
return;
} else {
if (argc != 7) {
out_error(RAK_ARG_ERR);
return;
}
if (!(CHECK_P2P_FREQ(atoi(argv[1])) &&
CHECK_P2P_SF(atoi(argv[2])) &&
CHECK_P2P_BDW(atoi(argv[3])) &&
CHECK_P2P_CR(atoi(argv[4])) &&
CHECK_P2P_PREMLEN(atoi(argv[5])) &&
CHECK_P2P_PWR(atoi(argv[6])))) {
out_error(RAK_ARG_ERR);
return;
}
if (read_partition(PARTITION_0, (char *)&g_lora_config, sizeof(g_lora_config)) < 0) {
out_error(RAK_RD_CFG_ERR);
return;
}
g_lora_config.lorap2p_param.Frequency = atoi(argv[1]);
g_lora_config.lorap2p_param.Spreadfact = atoi(argv[2]);
g_lora_config.lorap2p_param.Bandwidth = atoi(argv[3]);
g_lora_config.lorap2p_param.Codingrate = atoi(argv[4]);
g_lora_config.lorap2p_param.Preamlen = atoi(argv[5]);
g_lora_config.lorap2p_param.Powerdbm = atoi(argv[6]);
write_partition(PARTITION_0, (char *)&g_lora_config, sizeof(g_lora_config));
e_printf("OK\r\n");
}
return;
}
The error I got is:
..\..\..\src\application\RAK811\app.c(107): error: #26: too many characters in character constant
char cfgred[7][10]={'lora_rf_config','915000000','10','0','1','8','14'};
I don't have experience with this kind of argument.
Thank you for your time.
lora_rf_config expects the same arguments as main: the number of strings, followed by an array of pointers to those strings.
A string in C is a char array (or buffer) terminated by a NUL character, and it is usually handled through a pointer to its first char; if the NUL is missing, it is just a character array, not a string. In other words, C has no string type: whether a char array is a string depends on the data in it. A "..." string literal creates a string, i.e. the compiler adds the terminating NUL in addition to the characters you write. By contrast, '...' is a character constant, which should hold a single character; that is why 'lora_rf_config' triggers "too many characters in character constant".
// cfgred is array([) of 7 pointers(*) to char.
// Note: string literals are read-only, so you must not modify these
// strings. If you want a modifiable string, this would be a bit more complex,
// but I think this is out of the scope of your question.
char *cfgred[7] = { "lora_rf_config" , "915000000", "10","0", "1", "8", "14"};
// you can get the number of elements in an array by dividing its sizeof size (bytes)
// by the size of its elements in bytes. Just make sure cfgred here is an array:
// inside the function it is already a pointer (arrays get converted to pointers),
// so you can't do this inside the function; you have to do it where you still
// have the original array.
int cfgred_len = sizeof cfgred / sizeof(cfgred[0]);
// when you pass array to function, it is automatically converted to pointer,
// so you must not use & when passing an array like this, otherwise types don't
// match
lora_rf_config(cfgred_len, cfgred);
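To check the calling convention without the RAK811 firmware, here is a self-contained sketch in which a hypothetical `parse_rf_args` stands in for `lora_rf_config`: it has the same `(int argc, char *argv[])` parameters and converts `argv[1]` onward with `atoi`, as the real function does. `demo_call` (also hypothetical) builds the argument vector exactly as above.

```c
#include <stdlib.h>

/* Hypothetical stand-in for lora_rf_config: same parameter list.
   Converts argv[1] .. argv[6] with atoi into out[0] .. out[5] and
   returns how many values were stored, or -1 if argc is wrong. */
static int parse_rf_args(int argc, char *argv[], long out[6])
{
    if (argc != 7)
        return -1;                 /* the real function reports RAK_ARG_ERR */
    for (int i = 1; i < argc; i++)
        out[i - 1] = atoi(argv[i]);
    return argc - 1;
}

/* Builds the argument vector as in the answer and passes it without &. */
static int demo_call(long out[6])
{
    char *cfgred[7] = { "lora_rf_config", "915000000", "10", "0", "1", "8", "14" };
    int cfgred_len = sizeof cfgred / sizeof(cfgred[0]);   /* 7 */
    return parse_rf_args(cfgred_len, cfgred, out);
}
```

Because `cfgred` already decays to `char **` when passed, the types match the `char *argv[]` parameter with no cast.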
As a side note, always turn on compiler warnings. They help you a lot, so fix them. For gcc and clang, use -Wall -Wextra; for Visual Studio, use /W3 or preferably /W4. Then fix any warnings you get, because they usually point at code that doesn't do what you expect.
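Your initialization is not done correctly; try changing
char cfgred[7][10]={'lora_rf_config','915000000','10','0','1','8','14'};
into
char cfgred[7][16]={"lora_rf_config","915000000","10","0","1","8","14"};
This fixes the initialization (double quotes, and rows of 16 chars so that "lora_rf_config" plus its NUL fits). Note, however, that a char[7][16] array converts to char (*)[16], not char **, so it still cannot be passed directly as the char *argv[] parameter of lora_rf_config; you need an array of pointers to its rows.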
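If the strings must be modifiable (the case the first answer calls out of scope), one option is to keep the characters in a 2-D array like the one above and build a separate array of pointers into its rows, since only the pointer array has the `char *argv[]` type the function expects. A minimal sketch; `build_argv`, `demo_modifiable`, `N_ARGS` and `ARG_LEN` are hypothetical names.

```c
#include <string.h>

#define N_ARGS 7
#define ARG_LEN 16

/* Point argv[i] at row i of buf, so argv can be passed as char *argv[]
   while the characters themselves stay writable. */
static void build_argv(char buf[N_ARGS][ARG_LEN], char *argv[N_ARGS])
{
    for (int i = 0; i < N_ARGS; i++)
        argv[i] = buf[i];
}

static void demo_modifiable(char buf[N_ARGS][ARG_LEN], char *argv[N_ARGS])
{
    static const char init[N_ARGS][ARG_LEN] = {
        "lora_rf_config", "915000000", "10", "0", "1", "8", "14"
    };
    memcpy(buf, init, sizeof init);   /* writable copy of the defaults */
    build_argv(buf, argv);
    argv[2][0] = '7';                 /* allowed here: "10" becomes "70" */
}
```

After `demo_modifiable(buf, args)`, the call would be `lora_rf_config(N_ARGS, args);`.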

C MySQL Types Error

I'm trying to store results taken from a MySQL query into an array of structs. I can't seem to get the types to work though, and I've found the MySQL documentation difficult to sort through.
My struct is:
struct login_session
{
char* user[10];
time_t time;
int length;
};
And the loop where I'm trying to get the data is:
while ( (row = mysql_fetch_row(res)) != NULL ) {
strcpy(records[cnt].user, &row[0]);
cnt++;
}
No matter what I try though I constantly get the error:
test.c:45: warning: passing argument 1 of ‘strcpy’ from incompatible pointer type
/usr/include/string.h:128: note: expected ‘char * __restrict__’ but argument is of type ‘char **’
test.c:45: warning: passing argument 2 of ‘strcpy’ from incompatible pointer type
/usr/include/string.h:128: note: expected ‘const char * __restrict__’ but argument is of type ‘MYSQL_ROW’
Any pointers?
Multiple problems, all related to pointers and arrays, I recommend you do some reading.
First, char * user[10] defines an array of 10 char * values, not an array of char, which is what I suspect you want. The warning even says as much: strcpy() expects a char *, but the user field on its own is seen as a char **.
Second, you have one & too many in the second argument.
Copied from mysql.h header:
typedef char **MYSQL_ROW; /* return data as array of strings */
A MYSQL_ROW is an array of strings (char *). Indexing with [] dereferences once, so row[0] is already the char * that strcpy() takes; adding & takes its address again, which gives a char **.
Your code should look more like this:
struct login_session
{
char user[10];
time_t time;
int length;
};
while ( (row = mysql_fetch_row(res)) != NULL ) {
strcpy(records[cnt].user, row[0]);
cnt++;
}
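I don't know what guarantees you have about the data coming from MySQL, but if you can't be absolutely sure that each value is at most 9 characters long (user has room for 9 characters plus the terminating '\0'), you should use a bounded copy to avoid any possibility of overflowing the user array. Note that strncpy() does not add a '\0' when it truncates, so terminate the buffer yourself afterwards. Also check for row[0] == NULL, which is how mysql_fetch_row() represents an SQL NULL value.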
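A minimal sketch of such a bounded copy, assuming silent truncation of long values is acceptable; `copy_field` is a hypothetical helper, not part of the MySQL API. It uses `snprintf`, which always NUL-terminates a non-empty buffer (unlike `strncpy`), and treats a NULL field as an empty string.

```c
#include <stdio.h>
#include <stddef.h>

/* Copy src into dst (capacity dstsz bytes, terminator included).
   A NULL src (SQL NULL) becomes "". Returns 1 if truncated, else 0. */
static int copy_field(char *dst, size_t dstsz, const char *src)
{
    if (src == NULL)
        src = "";
    int n = snprintf(dst, dstsz, "%s", src);
    return n >= 0 && (size_t)n >= dstsz;
}
```

In the loop it would be used as `copy_field(records[cnt].user, sizeof records[cnt].user, row[0]);`, with `user` declared as `char user[10]`.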

Allocating array of strings in cuda

Let us assume that we have the following strings that we need to store in a CUDA array.
"hi there"
"this is"
"who is"
How do we declare an array on the GPU to do this? I tried using C++ strings, but it does not work.
Probably the best way to do this is to use a structure similar to common compressed sparse matrix formats. Store the character data packed into a single piece of linear memory, then use a separate integer array to store the starting indices, and perhaps a third array to store the string lengths. The storage overhead of the latter might be more efficient than storing a string termination byte for every entry in the data and trying to parse for the terminator inside the GPU code.
So you might have something like this:
struct gpuStringArray {
  unsigned int * pos;
  unsigned int * length; // could be a smaller type if strings are short
  char4 * data; // 32 bit data type will improve memory throughput, could be 8 bit
};
Note I used a char4 type for the string data; the vector type will give better memory throughput, but it will mean strings need to be aligned/suitably padded to 4 byte boundaries. That may or may not be a problem depending on what a typical real string looks like in your application. Also, the type of the (optional) length parameter should probably be chosen to reflect the maximum admissible string length. If you have a lot of very short strings, it might be worth using an 8 or 16 bit unsigned type for the lengths to save memory.
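Building that layout happens on the host before the copy to the device. A plain-C sketch, assuming (our choice, to match the char4 kernel below) that both offsets and lengths are counted in 4-byte words and that each string is zero-padded to a word boundary; `packed_strings`, `pack_strings` and `free_packed` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Packed layout: one zero-padded character buffer plus per-string
   offset and length, both counted in 4-byte words (one char4 each). */
typedef struct {
    unsigned int *pos;     /* first word of string i */
    unsigned int *nwords;  /* words used by string i (length rounded up) */
    char *data;            /* 4 * total_words bytes */
    unsigned int count;
    unsigned int total_words;
} packed_strings;

/* Returns 0 on success, -1 if an allocation fails. */
int pack_strings(const char *const *strs, unsigned int count, packed_strings *out)
{
    unsigned int total = 0;
    out->pos = malloc((count + 1) * sizeof *out->pos);
    out->nwords = malloc((count + 1) * sizeof *out->nwords);
    out->data = NULL;
    if (out->pos == NULL || out->nwords == NULL)
        goto fail;
    for (unsigned int i = 0; i < count; i++) {
        size_t len = strlen(strs[i]);
        out->pos[i] = total;
        out->nwords[i] = (unsigned int)((len + 3) / 4);  /* round up */
        total += out->nwords[i];
    }
    /* calloc zero-fills, so the padding bytes are already '\0' */
    out->data = calloc(total > 0 ? total : 1, 4);
    if (out->data == NULL)
        goto fail;
    for (unsigned int i = 0; i < count; i++)
        memcpy(out->data + 4 * (size_t)out->pos[i], strs[i], strlen(strs[i]));
    out->count = count;
    out->total_words = total;
    return 0;
fail:
    free(out->pos);
    free(out->nwords);
    return -1;
}

void free_packed(packed_strings *p)
{
    free(p->pos);
    free(p->nwords);
    free(p->data);
}
```

The three arrays would then be uploaded with cudaMalloc/cudaMemcpy, with the device copy of `data` used as a `char4 *`. Zero padding means that two strings of different lengths ending in the same word still compare unequal in their last word.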
Really simplistic code to compare strings stored this way, in the style of strcmp, might look something like this:
__device__ __host__
int cmp4(const char4 & c1, const char4 & c2)
{
int result;
result = c1.x - c2.x; if (result !=0) return result;
result = c1.y - c2.y; if (result !=0) return result;
result = c1.z - c2.z; if (result !=0) return result;
result = c1.w - c2.w; if (result !=0) return result;
return 0;
}
__device__ __host__
int strncmp4(const char4 * s1, const char4 * s2, const unsigned int nwords)
{
for(unsigned int i=0; i<nwords; i++) {
int result = cmp4(s1[i], s2[i]);
if (result != 0) return result;
}
return 0;
}
__global__
void tkernel(const struct gpuStringArray a, const gpuStringArray b, int * result)
{
  // pos and length are counted in char4 words; one thread per string pair
  int idx = threadIdx.x + blockIdx.x * blockDim.x;
  char4 * s1 = a.data + a.pos[idx];
  char4 * s2 = b.data + b.pos[idx];
  unsigned int slen = min(a.length[idx], b.length[idx]);
  result[idx] = strncmp4(s1, s2, slen);
}
[disclaimer: never compiled, never tested, no warranty real or implied, use at your own risk]
There are some corner cases and assumptions in this which might catch you out depending on exactly what the real strings in your code look like, but I will leave those as an exercise to the reader to resolve. You should be able to adapt and expand this into whatever it is you are trying to do.
You have to use C-style character strings (char *str). Searching for "CUDA string" on Google would have given you this CUDA "Hello World" example as the first hit: http://computer-graphics.se/hello-world-for-cuda.html
There you can see how to use char * strings in CUDA. Be aware that standard C functions like strcpy or strcmp are not available in device code!
If you want an array of strings, you can use char ** (as in C/C++), but every string it points to must itself live in device memory. As for strcmp and similar functions, it highly depends on what you want to do. CUDA is not really well suited for string operations; it might help if you provided a little more detail about what you want to do.

Thrust Complex Transform of 3 different size vectors

Hello, I have this loop in C++, and I was trying to convert it to Thrust, but I don't get the same results...
Any ideas?
thank you
C++ Code
for (i=0;i<n;i++)
for (j=0;j<n;j++)
values[i]=values[i]+(binv[i*n+j]*d[j]);
Thrust Code
thrust::fill(values.begin(), values.end(), 0);
thrust::transform(make_zip_iterator(make_tuple(
thrust::make_permutation_iterator(values.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n))),
binv.begin(),
thrust::make_permutation_iterator(d.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexModFunctor(n))))),
make_zip_iterator(make_tuple(
thrust::make_permutation_iterator(values.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n))) + n,
binv.end(),
thrust::make_permutation_iterator(d.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexModFunctor(n))) + n)),
thrust::make_permutation_iterator(values.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n))),
function1()
);
Thrust Functions
struct IndexDivFunctor: thrust::unary_function<int, int>
{
int n;
IndexDivFunctor(int n_) : n(n_) {}
__host__ __device__
int operator()(int idx)
{
return idx / n;
}
};
struct IndexModFunctor: thrust::unary_function<int, int>
{
int n;
IndexModFunctor(int n_) : n(n_) {}
__host__ __device__
int operator()(int idx)
{
return idx % n;
}
};
struct function1
{
template <typename Tuple>
__host__ __device__
double operator()(Tuple v)
{
return thrust::get<0>(v) + thrust::get<1>(v) * thrust::get<2>(v);
}
};
To begin with, some general comments. Your loop
for (i=0;i<n;i++)
for (j=0;j<n;j++)
v[i]=v[i]+(B[i*n+j]*d[j]);
is the equivalent of the standard BLAS gemv operation $v \leftarrow \alpha B d + \beta v$ with $\alpha = \beta = 1$,
where the matrix $B$ is stored in row-major order. The optimal way to do this on the device would be to use CUBLAS, not something constructed out of Thrust primitives.
Having said that, there is absolutely no way the Thrust code you posted is ever going to do what your serial code does. The differences you are seeing are not the result of floating-point non-associativity. Fundamentally, thrust::transform applies the supplied functor to every element of the input range and stores each result in the output range. To yield the same result as the loop you posted, the thrust::transform call would need to perform (n*n) operations of the fmad functor you posted. Clearly it does not. Further, there is no guarantee that thrust::transform would perform the summation/reduction in a way that is safe from memory races, since several input elements map to the same values[i] through the output permutation iterator.
The correct solution is probably going to be something like:
1. Use thrust::transform to compute the (n*n) products of the elements of B and d.
2. Use thrust::reduce_by_key to reduce the products into partial sums, yielding Bd.
3. Use thrust::transform to add the resulting matrix-vector product to v to yield the final result.
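These three steps can be checked on the host before writing any Thrust. In this plain C++ sketch, std::vector loops stand in for the Thrust calls, and the function names are hypothetical. Step 1 forms one product per matrix element, step 2 sums the products that share the row key `k / n`, and step 3 adds the result into `v`. The check uses small integer-valued inputs because the two versions add the terms in a different order, so with general floating-point data they may differ in the last bits.

```cpp
#include <cstddef>
#include <vector>

// Serial reference: v[i] += sum_j B[i*n + j] * d[j]  (B row-major)
std::vector<double> gemv_serial(const std::vector<double>& B,
                                const std::vector<double>& d,
                                std::vector<double> v, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            v[i] += B[i * n + j] * d[j];
    return v;
}

// The same computation as three data-parallel steps.
std::vector<double> gemv_three_steps(const std::vector<double>& B,
                                     const std::vector<double>& d,
                                     std::vector<double> v, std::size_t n) {
    std::vector<double> temp(n * n);
    for (std::size_t k = 0; k < n * n; ++k)   // step 1: transform
        temp[k] = B[k] * d[k % n];            // column index j = k % n
    std::vector<double> Bd(n, 0.0);
    for (std::size_t k = 0; k < n * n; ++k)   // step 2: reduce by key
        Bd[k / n] += temp[k];                 // row index i = k / n
    for (std::size_t i = 0; i < n; ++i)       // step 3: transform with plus
        v[i] += Bd[i];
    return v;
}
```

Note that the key in step 2 is the row (`k / n`, i.e. IndexDivFunctor), while the permutation into `d` in step 1 uses the column (`k % n`, i.e. IndexModFunctor).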
In code, firstly define a functor like this:
struct functor
{
template <typename Tuple>
__host__ __device__
double operator()(Tuple v)
{
return thrust::get<0>(v) * thrust::get<1>(v);
}
};
Then do the following to compute the matrix-vector multiplication
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/zip_iterator.h>
typedef thrust::device_vector<int> iVec;
typedef thrust::device_vector<double> dVec;
typedef thrust::counting_iterator<int> countIt;
// for flat index k: k / n is the row index i, k % n the column index j
typedef thrust::transform_iterator<IndexDivFunctor, countIt> rowIt;
typedef thrust::transform_iterator<IndexModFunctor, countIt> columnIt;
// Assuming the following allocations on the device
dVec B(n*n), v(n), d(n);
// transformation iterators mapping each flat matrix index to its row and column
rowIt r_begin = thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n));
rowIt r_end = r_begin + (n*n);
columnIt c_begin = thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexModFunctor(n));
columnIt c_end = c_begin + (n*n);
// step 1: temp[k] = B[k] * d[k % n]
dVec temp(n*n);
thrust::transform(thrust::make_zip_iterator(
                      thrust::make_tuple(
                          B.begin(),
                          thrust::make_permutation_iterator(d.begin(), c_begin))),
                  thrust::make_zip_iterator(
                      thrust::make_tuple(
                          B.end(),
                          thrust::make_permutation_iterator(d.begin(), c_end))),
                  temp.begin(),
                  functor());
// step 2: sum the products of each row (runs of equal keys k / n)
iVec outkey(n);
dVec Bd(n);
thrust::reduce_by_key(r_begin, r_end, temp.begin(), outkey.begin(), Bd.begin());
// step 3: v = v + Bd
thrust::transform(v.begin(), v.end(), Bd.begin(), v.begin(), thrust::plus<double>());
Of course, this is a terribly inefficient way to do the computation compared to using a purpose designed matrix-vector multiplication code like dgemv from CUBLAS.
How much your results differ? Is it a completely different answer, or differs only on the last digits? Is the loop executed only once, or is it some kind of iterative process?
Floating-point operations, especially those that repeatedly add up or multiply values, are not associative because of rounding: the result depends on the order in which the operations are performed. Moreover, if you use fast-math optimisations, the operations may not be IEEE compliant.
For starters, check out this wikipedia section on floating-point numbers: http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems
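The non-associativity mentioned above is easy to demonstrate. In this small sketch (function names are illustrative), the same three doubles summed with different groupings give different results, which is why a parallel reduction that reorders additions need not match a serial loop bit for bit.

```c
/* The same three values, two groupings. */
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }
```

With IEEE double precision and no fast-math, `sum_left(0.1, 0.2, 0.3)` and `sum_right(0.1, 0.2, 0.3)` differ in the last bit.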

Why is mysql_num_rows returning zero?

I am using C MySQL API
int numr=mysql_num_rows(res);
It always returns zero, but my table has 4 rows. However, I am getting the correct field count.
What is the problem? Am I doing anything wrong?
Just a guess:
If you use mysql_use_result(), mysql_num_rows() does not return the correct value until all the rows in the result set have been retrieved.
(from the mysql manual)
With mysql_store_result(), the only reason to receive a zero from mysql_num_rows(<variable_name>) is that the query did not return anything.
You haven't posted the query that you run and whose result you assign to your res variable, so we can't check it.
But try running that exact query against your database through whatever DB management software you use and see whether it returns any rows.
If the query works fine there, then the problem must be the way you're running the query in C; otherwise your query is broken.
Maybe post a bit more of the C code where you build and run the query.
Thanks
If you just want to count the number of rows in a table, say
SELECT COUNT(*) FROM table_name
You will get back a single column in a single row containing the answer.
I have this problem too. I noticed that mysql.h declares mysql_num_rows() to return a my_ulonglong, and the header also contains a typedef for my_ulonglong. On my system the size of a my_ulonglong is 8 bytes. When we print it with the wrong format or cast it to an int, we probably only see the first four bytes, which are zero. However, I printed all eight bytes at the address of the my_ulonglong variable and they are all zeros. So I think this function just doesn't work.
my_ulonglong numChannels;
MYSQL_RES *resource;
MYSQL *connection;   // assumed to be connected already
unsigned char *tempChar;
int i;
mysql_query(connection, "SELECT * FROM channels");
resource = mysql_use_result(connection);
// note: with mysql_use_result() this stays 0 until all rows have been fetched
numChannels = mysql_num_rows(resource);
printf("Writing numChannels: %llu\n", numChannels); // returns 0
printf("Size of numChannels is %zu.\n", sizeof(numChannels)); // returns 8
// however
tempChar = (unsigned char *) &numChannels;
for (i = 0; i < (int) sizeof(numChannels); ++i) {
    printf("%02x", (unsigned int) *tempChar++);
}
printf("\n");
// returned 0000000000000000 so I think it's a bug.
//mysql.h typedef for my_ulonglong and function mysql_num_rows()
#ifndef _global_h
#if defined(NO_CLIENT_LONG_LONG)
typedef unsigned long my_ulonglong;
#elif defined (__WIN__)
typedef unsigned __int64 my_ulonglong;
#else
typedef unsigned long long my_ulonglong;
#endif
#endif
my_ulonglong STDCALL mysql_num_rows(MYSQL_RES *res);
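On the printing side of that snippet, my_ulonglong is unsigned long long on most non-Windows builds (per the typedef above), so the matching printf conversion is %llu, and sizeof yields a size_t, printed with %zu. A small sketch using a local stand-in typedef, assuming the unsigned long long branch of the header applies; `my_ulonglong_t` and `format_count` are hypothetical names.

```c
#include <stdio.h>
#include <stddef.h>

typedef unsigned long long my_ulonglong_t;  /* stand-in for the header's typedef */

/* Format a row count with the conversions that match its types.
   Returns what snprintf returns (the untruncated length). */
int format_count(char *buf, size_t bufsz, my_ulonglong_t rows)
{
    return snprintf(buf, bufsz, "rows=%llu size=%zu", rows, sizeof rows);
}
```

Using a mismatched conversion such as %lu or %d is undefined behavior, which is a separate issue from mysql_num_rows() returning 0.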