CUresult vs cudaError - how to get readable error descriptions?

I want to get a human-readable description of the result returned by cuInit. Every page I've found recommends using cudaGetErrorString for this purpose, but this results in an error:
error: cannot convert CUresult {aka cudaError_enum} to cudaError_t
{aka cudaError} for argument 1 to const char* cudaGetErrorString(cudaError_t)
What's the difference between CUresult and cudaError and what function can I use to interpret the former?

As of this writing, the current driver API does support translating a CUresult to its string representation and description.
From the Driver API doc:
CUresult cuGetErrorName ( CUresult error, const char** pStr )
Gets the string representation of an error code enum name.
CUresult cuGetErrorString ( CUresult error, const char** pStr )
Gets the string description of an error code.

LIBRARY        RETURN TYPE    VALUES
CUDA Driver    CUresult       enum cudaError_enum in cuda.h
CUDA Runtime   cudaError_t    enum cudaError in driver_types.h
The CUDA Runtime provides the function cudaGetErrorString to convert a cudaError enum value to a string.
When this answer was written, the CUDA Driver API did not provide a function to return an error string; newer toolkits add cuGetErrorName and cuGetErrorString for this.
The error names and values for CUresult and cudaError_t do not match.
For toolkits without those functions, NVIDIA does not supply a library for returning an error string for CUresult. It should only take a few minutes to convert the data in cuda.h into a function.
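The hand-written mapping mentioned above could look roughly like the sketch below. To keep it compilable without the CUDA headers, it declares a stand-in enum (the name cu_result_standin is hypothetical) with only four entries, whose names and values are taken from cuda.h as far as I know. In real code you would include cuda.h, use CUresult directly, and list every enumerator. The function name cu_result_name and the macro CU_NAME_CASE are also made up for this example.

```c
#include <stddef.h>

/* Stand-in for CUresult so the sketch builds without cuda.h.
   Assumption: real code includes <cuda.h> and uses CUresult instead. */
typedef enum {
    CUDA_SUCCESS               = 0,
    CUDA_ERROR_INVALID_VALUE   = 1,
    CUDA_ERROR_OUT_OF_MEMORY   = 2,
    CUDA_ERROR_NOT_INITIALIZED = 3
} cu_result_standin;

/* Expands to: case X: return "X" */
#define CU_NAME_CASE(code) case code: return #code

/* Returns the enumerator name for a known code, or a fallback string. */
const char *cu_result_name(cu_result_standin r)
{
    switch (r) {
        CU_NAME_CASE(CUDA_SUCCESS);
        CU_NAME_CASE(CUDA_ERROR_INVALID_VALUE);
        CU_NAME_CASE(CUDA_ERROR_OUT_OF_MEMORY);
        CU_NAME_CASE(CUDA_ERROR_NOT_INITIALIZED);
    }
    return "unknown CUresult value";
}
```

The stringizing macro keeps each name and its string in sync, so adding a code is a one-line change. On toolkits that do ship cuGetErrorName, that function makes a table like this unnecessary.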

Can I use vararg functions in CUDA device-side code?

I know that we can't write CUDA kernels with a variable number of parameters:
Is it possible to have a CUDA kernel with varying number of parameters?
(at least not in the C varargs sense; we can use C++ variadic templates.)
But what about non-kernel device-side code, i.e. __device__ functions? Can these be varargs functions?
Yes, we can write varargs device-side functions.
For example:
#include <stdio.h>
#include <stdarg.h>
__device__ void foo(const char* str, ...)
{
    va_list ap;
    va_start(ap, str);
    int arg = va_arg(ap, int); // making an assumption here
    printf("str is \"%s\", first va_list argument is %d\n", str, arg);
    va_end(ap); // every va_start needs a matching va_end
}
This compiles fine with NVCC - and works, provided you actually pass a null-terminated string and an int. I would not be surprised if CUDA's printf() itself were implemented this way.
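The caveat above (the callee has to trust the caller about what was passed) is usually handled by a convention such as passing an explicit count. Here is a minimal host-side C sketch of that convention; the function name sum_ints is hypothetical, and this version has not been compiled as a __device__ function, so treat any transfer to device code as an assumption.

```c
#include <stdarg.h>

/* Sums `count` int arguments. The callee cannot verify the count or the
   argument types, so the caller must pass exactly `count` ints. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);  /* assumes every extra argument is an int */
    va_end(ap);                    /* pairs with va_start */

    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) reads three ints. Passing fewer arguments than the count, or arguments of another type, is undefined behavior, which is the same trap as in foo above.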

dlsym function return type

I am loading libslabhidtouart.so using dlopen() without any error, but when I look up a function using dlsym() I get a "No such process" error.
Here is my code:
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    typedef unsigned int DWORD;
    typedef unsigned short WORD;
    typedef int HID_UART_STATUS;
    void *handle;
    HID_UART_STATUS (*cosine)(DWORD *, WORD, WORD);
    //typedef void (*simple_demo_function)(void);
    char *error;

    handle = dlopen("libslabhidtouart.so.1.0", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, " %s\n", dlerror());
        getchar();
        exit(EXIT_FAILURE);
    }
    dlerror(); /* Clear any existing error */
    /* Writing: cosine = (double (*)(double)) dlsym(handle, "cos");
       would seem more natural, but the C99 standard leaves
       casting from "void *" to a function pointer undefined.
       The assignment used below is the POSIX.1-2003 (Technical
       Corrigendum 1) workaround; see the Rationale for the
       POSIX specification of dlsym(). */
    *(void **) (&cosine) = dlsym(handle, "HidUart_GetNumDevices");
    if ((error = dlerror()) != NULL) {
        fprintf(stderr, " %s\n", error);
        getchar();
        exit(EXIT_FAILURE);
    }
    getchar();
    dlclose(handle);
    exit(EXIT_SUCCESS);
}
The return type of HidUart_GetNumDevices is int. Is there a casting problem, is my function signature wrong, or is it something else? Please guide me; I am not very experienced with C.
I also got strange "No such process" errors, but in my case they appeared directly on a dlopen() call inside OpenSSL:
2675996:error:25066067:lib(37):func(102):reason(103):dso_dlfcn.c:187:filename(./engine_pkcs11.dll): No such process
It turned out that the referenced DLL (or .so file) exists, but has a dependency on some other library (cygp11-2.dll in my case) that could not be resolved in the context of the application's process (taking into account its PATH environment variable setting). In this case, use ldd (or cygcheck.exe if applicable) to see if all dependencies are correctly resolved.
So the "no such process" error message returned by dlerror() can be quite misleading.
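Because the message alone can mislead, it helps to record which step failed. The sketch below wraps dlopen() and dlsym() in one helper that prefixes the dlerror() text with the failing stage. The helper name load_symbol and its parameters are hypothetical; on older glibc versions you may also need to link with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Looks up `symbol` in the shared library at `path`.
   On success: returns the symbol address and stores the library handle in
   *handle_out (the caller must dlclose() it later).
   On failure: returns NULL and writes "dlopen: ..." or "dlsym: ..." to errbuf. */
void *load_symbol(const char *path, const char *symbol,
                  void **handle_out, char *errbuf, size_t errlen)
{
    void *handle = dlopen(path, RTLD_NOW);
    if (!handle) {
        /* Unresolved dependencies of the library are reported here. */
        const char *msg = dlerror();
        snprintf(errbuf, errlen, "dlopen: %s", msg ? msg : "unknown error");
        return NULL;
    }

    dlerror();                        /* clear any stale error */
    void *sym = dlsym(handle, symbol);
    const char *msg = dlerror();      /* non-NULL only if dlsym failed */
    if (msg) {
        snprintf(errbuf, errlen, "dlsym: %s", msg);
        dlclose(handle);
        return NULL;
    }

    *handle_out = handle;
    return sym;
}
```

If the text starts with "dlopen:", the library or one of its dependencies could not be loaded, and ldd on the library is the next thing to check. If it starts with "dlsym:", the library loaded but does not export that name.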

casting to const void* arguments on typedef function to use qsort in C

I have made the following typedefs in my program (C):
typedef void* ListElement;
typedef int(*CompareListElements)(ListElement, ListElement);
I have declared a function pointer in my code:
CompareListElements compareElement
Later in the code I wish to use qsort on an array of ListElements:
qsort(elementsArray,listGetSize(list),sizeof(list->dummyHead->next->element), compareElement);
However the compiler states: "passing argument 4 of 'qsort' from incompatible pointer type".
I fear that it is because qsort requires a function of the form int (const void*, const void*), whereas I supply int (void*, void*).
Is there a way of casting the arguments of compareElement to (const void*, const void*), while calling qsort or before, WITHOUT changing the typedef?
Thanks
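The quick fix is to cast the function pointer to the type qsort expects:
typedef int(*ConstCompareListElements)(const void *, const void *);
qsort(elementsArray, listGetSize(list), sizeof(list->dummyHead->next->element),
(ConstCompareListElements)compareElement);
Note that this only silences the warning. Strictly speaking, calling a function through a pointer of an incompatible type is undefined behavior in C, although it works on common platforms. Also keep in mind that qsort passes the comparator pointers to the array slots (ListElement *), not the ListElement values themselves, so compareElement must be written with that in mind.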
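An alternative that avoids casting the function pointer is a small adapter with exactly the signature qsort wants, which unwraps the slot pointers and forwards to the existing comparator. The sketch below keeps the two typedefs from the question unchanged. The names current_compare, compare_adapter, sort_elements and compare_ints are hypothetical, and compare_ints assumes the elements point to ints purely for illustration.

```c
#include <stdlib.h>

typedef void *ListElement;
typedef int (*CompareListElements)(ListElement, ListElement);

/* qsort has no user-data parameter, so the active comparator lives in a
   file-scope variable. This makes the sketch non-reentrant. */
static CompareListElements current_compare;

/* qsort hands us pointers to array slots; each slot holds a ListElement. */
static int compare_adapter(const void *a, const void *b)
{
    ListElement ea = *(const ListElement *)a;
    ListElement eb = *(const ListElement *)b;
    return current_compare(ea, eb);
}

/* Sorts `count` elements in place using the caller's comparator. */
void sort_elements(ListElement *elements, size_t count,
                   CompareListElements compare)
{
    current_compare = compare;
    qsort(elements, count, sizeof elements[0], compare_adapter);
}

/* Example comparator (assumption: elements point to ints). */
int compare_ints(ListElement a, ListElement b)
{
    int x = *(int *)a, y = *(int *)b;
    return (x > y) - (x < y);
}
```

With this in place the call from the question becomes sort_elements(elementsArray, listGetSize(list), compareElement). The cost is the shared static variable; if that matters, the platform-specific qsort_r variants take a context argument instead.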

how can i execute a host class function in a CUDA kernel

I have a genetic algorithm and I'm trying to evaluate a population of chromosomes on the GPU:
class chromosome
{
int fitness;
int gene(int pos) { .... };
};
class eval
{
public :
__global__ doEval(Chromosome *population)
{
....
int jobid = population[tid].gene(X);
population[tid].fitness = Z;
....
}
};
int main()
{
Chromosome *dev_population;
Eval eval;
eval.doEval<<<1,N>>>(dev_population);
}
and I get these errors:
ga3.cu(121): warning: inline qualifier ignored for "global" function
ga3.cu(121): error: illegal combination of memory qualifiers
ga3.cu(323): error: a pointer to a bound function may only be used to call the function
ga3.cu(398): warning: nested comment is not allowed
Where are the problems?
I removed the Eval class, left only the doEval function, and made gene() a __device__ __host__ function, like this:
__device__ __host__ gene()
{....};
__global__ doEval(Chromosome *population)
{
....
int jobid = population[tid].gene(X);
population[tid].fitness = Z;
....
}
int main()
{
Chromosome *dev_population;
doEval<<<1,N>>>(dev_population);
}
but now I have other errors, and it does not compile:
/usr/include/c++/4.6/iomanip(66): error: expected an expression
/usr/include/c++/4.6/iomanip(96): error: expected an expression
/usr/include/c++/4.6/iomanip(127): error: expected an expression
/usr/include/c++/4.6/iomanip(195): error: expected an expression
/usr/include/c++/4.6/iomanip(225): error: expected an expression
5 errors detected in the compilation of "/tmp/tmpxft_00006fe9_00000000-4_ga3.cpp1.ii".
There are two problems here, one soluble, the other one not.
It is illegal in CUDA for a __global__ function (i.e. a kernel) to be defined as a class member function. So doEval can never be defined as a member of eval. You are free to call a kernel from a structure or class member function, but a kernel cannot itself be a member function. You will have to redesign this class; there is no workaround.
Any function called from device code must be explicitly denoted as a device function and be instantiated and compiled for the device. This applies to both regular functions and class member functions. nvcc treats every function as a host function unless it is marked otherwise. You can, therefore, fix this error by doing something like the following:
class chromosome
{
int fitness;
__device__ __host__ int gene(int pos) { .... };
};
Note that every function called by gene must also have a valid device definition for the code to successfully compile.

Writing a simple thrust functor operating on some zipped arrays

I am trying to perform a thrust::reduce_by_key using zip and permutation iterators.
i.e. doing this on a zipped array of several 'virtual' permuted arrays.
I am having trouble in writing the syntax for the functor density_update.
But first the setup of the problem.
Here is my function call:
thrust::reduce_by_key( dflagt,
dflagtend,
thrust::make_zip_iterator(
thrust::make_tuple(
thrust::make_permutation_iterator(dmasst, dmapt),
thrust::make_permutation_iterator(dvelt, dmapt),
thrust::make_permutation_iterator(dmasst, dflagt),
thrust::make_permutation_iterator(dvelt, dflagt)
)
),
thrust::make_discard_iterator(),
danswert,
thrust::equal_to<int>(),
density_update()
)
dmapt and dflagt are of type thrust::device_ptr<int>, and dvelt, dmasst and danswert are of type thrust::device_ptr<double>.
(They are thrust wrappers around my raw CUDA arrays.)
The arrays mapt and flagt are both index vectors from which I need to perform a gather operation from the arrays dmasst and dvelt.
After the reduction step I intend to write my data to the danswert array. Since multiple arrays are being used in the reduction, obviously I am using zip iterators.
My problem lies in writing the functor density_update, which is the binary operation.
struct density_update
{
typedef thrust::device_ptr<double> ElementIterator;
typedef thrust::device_ptr<int> IndexIterator;
typedef thrust::permutation_iterator<ElementIterator,IndexIterator> PIt;
typedef thrust::tuple< PIt , PIt , PIt, PIt> Tuple;
__host__ __device__
double operator()(const Tuple& x , const Tuple& y)
{
return thrust::get<0>(*x) * (thrust::get<1>(*x) - thrust::get<3>(*x)) + \
thrust::get<0>(*y) * (thrust::get<1>(*y) - thrust::get<3>(*y));
}
};
The value being returned is a double. Why the binary operation looks like the above functor is not important. I just want to know how I would go about correcting the above syntactically.
As shown above, the code throws a number of compilation errors. I am not sure where I have gone wrong.
I am using CUDA 4.0 on a GTX 570 on Ubuntu 10.10.
density_update should not receive tuples of iterators as parameters -- it needs tuples of the iterators' references.
In principle you could write density_update::operator() in terms of the particular reference type of the various iterators, but it's simpler to have the compiler infer the type of the parameters:
struct density_update
{
template<typename Tuple>
__host__ __device__
double operator()(const Tuple& x, const Tuple& y)
{
return thrust::get<0>(x) * (thrust::get<1>(x) - thrust::get<3>(x)) + \
thrust::get<0>(y) * (thrust::get<1>(y) - thrust::get<3>(y));
}
};