use of comparison operator in return - stl

I often use the standard function in C++

sort(A.begin(), A.end(), mycmp);

where

bool mycmp(int a, int b) { return (a > b); }

to sort the vector A. But when a question asks for customized sorting I often get confused and take many tries to fix my compare function. Can someone explain what exactly return (a > b); means? I read some posts too but still can't figure out how a > b determines that the order is descending.
Posts I read:
configure the compare function to work with std::sort

You can think of cmp as an implementation of the less-than operator <. I will write lt instead of < below.
To simplify the question, assume we have bubble sort:
#include <utility>  // std::swap

typedef bool (*cmp)(int, int);

bool inc(int a, int b) { return a < b; }  // "less than"    -> ascending order
bool dec(int a, int b) { return a > b; }  // "greater than" -> descending order

void bubble_sort(int a[], int n, cmp lt) {
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j + 1 < n - i; ++j) {
            if (lt(a[j+1], a[j])) std::swap(a[j], a[j+1]);
        }
    }
}
a[j] and a[j+1] are swapped only if lt(a[j+1], a[j]) is true.
If we pass inc as the comparison function, then lt(a[j+1], a[j]) being true means a[j+1] < a[j], so the array is sorted in increasing order.
If we pass dec as the comparison function, then lt(a[j+1], a[j]) being true means a[j+1] > a[j], so the array is sorted in decreasing order.
You can check the SGI STL sort implementation for more details:
https://www.sgi.com/tech/stl/sort.html
https://www.sgi.com/tech/stl/download.html
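Tying this back to std::sort from the question: the comparator you pass plays exactly the role of lt above -- it answers "should a go before b?". So returning a > b puts larger elements first, i.e. descending order. A minimal sketch:

#include <algorithm>
#include <iostream>
#include <vector>

bool mycmp(int a, int b) { return a > b; }   // true means "a should come before b"

int main() {
    std::vector<int> A = {3, 1, 4, 1, 5};
    std::sort(A.begin(), A.end(), mycmp);    // descending: 5 4 3 1 1
    for (int x : A) std::cout << x << ' ';
    std::cout << '\n';
}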

Related

Pass by result-value and values of variables after method calls

First, I understand the code is set up for pass by value; I think the question is hypothetical.
Also, this is not homework; it is from a study guide for the final.
This may appear to be a duplicate, but the other threads put a lot of emphasis on the code not being set up for anything other than pass by value, so the comments there were not very relevant.
I would like to know if b) and c) generate the same answer.
I worked out the problem and got the same result for b) and c), but it feels like it may be a bit of a trick question, and it would be easy to miss something. Here's the question and the code:
Consider the following program written in C syntax:
void swap(int a, int b)
{
    int temp;
    temp = a;
    a = b;
    b = temp;
}

void main()
{
    int value = 2, list[5] = {1, 3, 5, 7, 9};

    swap(value, list[0]);
    swap(list[0], list[1]);
    swap(value, list[value]);
}
For each of the following parameter-passing methods, what are all of the values of the variables value and list after each of the three calls to swap?
a) Passed by value
b) Passed by reference
c) Passed by value-result
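Not an answer to the study-guide question itself, but to make the mechanics concrete: standard C only passes by value, so the other two modes have to be simulated. A rough sketch (illustrative names) of how the same swap behaves under reference semantics and under value-result (copy in, copy out). It deliberately sidesteps the textbook subtlety of swap(value, list[value]), namely when the address of list[value] is evaluated:

#include <stdio.h>

/* Pass by reference, simulated with pointers: the caller's variables change. */
void swap_by_reference(int *a, int *b) {
    int temp = *a;
    *a = *b;
    *b = temp;
}

/* Pass by value-result: copies are made on entry, swapped locally,
   then copied back out to the caller's variables on return. */
void swap_by_value_result(int *a_out, int *b_out) {
    int a = *a_out, b = *b_out;   /* copy in  */
    int temp = a;
    a = b;
    b = temp;
    *a_out = a;                   /* copy out */
    *b_out = b;
}

int main(void) {
    int value = 2, list[5] = {1, 3, 5, 7, 9};
    swap_by_reference(&value, &list[0]);
    printf("%d %d\n", value, list[0]);   /* prints 1 2 */
    return 0;
}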

How to access the first element in a pointer to an array using the C mysql API

I am trying to use part of the MySQL C API to retrieve one known field, which will be a TINYINT value (boolean, either 1 or 0).
The MySQL C API offers a type, MYSQL_ROW row;, which is essentially an array of strings whose elements are accessed via row[i], where i is the index. The elements are returned as strings regardless of the data type in the database.
The field I am trying to access is effectively boolean and will be either 1 or 0 if the query finds it. I want to do a logic check on the value of this field but am struggling with the types. I tried casting row[i] to an int, but no good; I seem to get the pointer back. I know that C doesn't have a native bool type, but one can be emulated. Any ideas there would be welcome. Here's my code, many thanks in advance - Paul
void process_result_set(MYSQL *conn, MYSQL_RES *res_set) {
    MYSQL_ROW row;
    unsigned int i;
    unsigned int logonstatus;

    while ((row = mysql_fetch_row(res_set)) != NULL)
    {
        for (i = 0; i < mysql_num_fields(res_set); i++)
        {
            logonstatus = (int)(row[i]); // gives an int back, but it appears to be a memory location, i.e. a pointer
            printf("The value of logon status is: %d\n", logonstatus);
            printf("\nThe value of the logon field is: %s\n", row[i]);
        }
    }

    if (mysql_errno(conn) != 0)
        print_error(conn, "mysql_fetch_row() failed");
    else
        printf("%lu rows returned\n", (unsigned long) mysql_num_rows(res_set));
}
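The usual fix (a sketch, not tested against your schema): MYSQL_ROW is an array of C strings, so row[i] is a char * pointing at the text "0" or "1" (or NULL for SQL NULL). Casting the pointer just reinterprets the address; convert the text instead, e.g. with atoi or strtol:

#include <stdlib.h>   /* atoi */

/* Convert one MYSQL_ROW field (a C string, or NULL for SQL NULL) to an int. */
static int field_to_int(const char *field)
{
    return (field != NULL) ? atoi(field) : 0;   /* treat NULL as 0 here */
}

/* Inside the fetch loop, assuming column i is the TINYINT logon field:
       logonstatus = field_to_int(row[i]);
       if (logonstatus)
           printf("logged on\n");
*/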

controlling program flow without if-else / switch-case statements

Let's say I have 1000 functions defined as follows:
void dummy1(int a);
void dummy2(int a, int aa);
void dummy3(int a, int aa, int aaa);
.
.
.
void dummy1000(int a, int aa, int aaa, ...);
I want to write a function that takes an integer n (n < 1000) and calls the nth dummy function (for n = 10, dummy10) with exactly n arguments (the arguments can be any integers, say all 0). I know this could be achieved by writing a switch statement with 1000 cases, which is not plausible.
In my opinion, this cannot be achieved without recompilation at run time, so languages like Java, C, and C++ will never let such a thing happen.
Hopefully, there is a way to do this. If so, I am curious.
Note: This is not something that I will ever use; I asked the question purely out of curiosity.
In modern functional languages, you can make a list of functions which take a list as an argument. This will arguably solve your problem, but it is also arguably cheating, as it is not quite the statically-typed implementation your question seems to imply. However, it is pretty much what dynamic languages such as Python, Ruby, or Perl do when using "manual" argument handling...
Anyway, the following is in Haskell: it supplies the nth function (from its first argument fs) a list of n copies of the second argument (x), and returns the result. Of course, you will need to put together the list of functions somehow, but unlike a switch statement this list will be reusable as a first-class argument.
selectApplyFunction :: [ [Int] -> a ] -> Int -> Int -> a
selectApplyFunction fs x n = (fs !! (n-1)) (replicate n x)
dummy1 [a] = 5 * a
dummy2 [a, b] = (a + 3) * b
dummy3 [a, b, c] = (a*b*c) `div` (a*b + b*c + c*a)
...
myFunctionList = [ dummy1, dummy2, dummy3, ... ]
-- (myfunction n) provides n copies of the number 42 to the n'th function
myFunction = selectApplyFunction myFunctionList 42
-- call the 666'th function with 666 copies of 42
result = myFunction 666
Of course, you will get an exception if n is greater than the number of functions, or if the function can't handle the list it is given. Note, too, that this is poor Haskell style -- mainly because of the way it abuses lists to work around the type system and solve your problem...
No, you are incorrect. Most modern languages support some form of Reflection that will allow you to call a function by name and pass params to it.
You can create an array of functions in most of modern languages.
In pseudo code,
var dummy = new Array();
dummy[1] = function(int a);
dummy[2] = function(int a, int aa);
...
var result = dummy[whateveryoucall](1,2,3,...,whateveryoucall);
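To make the pseudo code concrete, here is roughly the same idea in C++ (illustrative names; the dummies are rewritten to take their arguments in a vector so that a single table type covers every arity):

#include <functional>
#include <iostream>
#include <vector>

// Illustrative stand-ins for dummy1..dummyN: each takes all of its
// arguments packed into one vector, which sidesteps the varying arity.
int dummy1(const std::vector<int>& a) { return 5 * a[0]; }
int dummy2(const std::vector<int>& a) { return (a[0] + 3) * a[1]; }
int dummy3(const std::vector<int>& a) { return a[0] + a[1] + a[2]; }

int callNth(int n, int value) {
    // Table of functions indexed by n-1; extend with dummy4..dummy1000.
    static const std::vector<std::function<int(const std::vector<int>&)>> table = {
        dummy1, dummy2, dummy3
    };
    std::vector<int> args(n, value);      // n copies of the same argument
    return table.at(n - 1)(args);         // at() throws if n is out of range
}

int main() {
    std::cout << callNth(2, 0) << "\n";   // calls dummy2 with {0, 0}
}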
In functional languages you could do something like this; in strongly typed ones like Haskell, the functions must all have the same type, though:
funs = [reverse, tail, init]        -- 3 functions of type [a] -> [a]
run fn args = (funs !! fn) $ args   -- applies the function at index fn to args
In object-oriented languages, you can use function objects and reflection together to achieve exactly what you want. The problem of the variable number of arguments is solved by passing appropriate POJOs (much like C structs) to the function object.
interface Functor<A, B> {
    public B compute(A input);
}

class SumInput {
    private int x, y;
    // getters and setters
}

class Sum implements Functor<SumInput, Integer> {
    @Override
    public Integer compute(SumInput input) {
        return input.getX() + input.getY();
    }
}
Now imagine you have a large number of these "functors". You gather them in a configuration file (maybe an XML file with metadata about each functor, usage scenarios, instructions, etc...) and return the list to the user.
The user picks one of them. Using reflection, you can see what the required input and the expected output are. The user fills in the input, and using reflection you instantiate the functor class (newInstance()), call the compute() method, and get the output.
When you add a new functor, you just have to change the list of the functors in the config file.

Allocating array of strings in cuda

Let us assume that we have the following strings that we need to store in a CUDA array.
"hi there"
"this is"
"who is"
How do we declare an array on the GPU to do this? I tried using C++ strings but it does not work.
Probably the best way to do this is to use a structure similar to common compressed sparse matrix formats. Store the character data packed into a single piece of linear memory, then use a separate integer array to store the starting indices, and perhaps a third array to store the string lengths. The storage overhead of the latter might be more efficient than storing a string-termination byte for every entry in the data and trying to parse for the terminator inside the GPU code.
So you might have something like this:
struct gpuStringArray {
    unsigned int * pos;    // starting offset of each string, in char4 words
    unsigned int * length; // could be a smaller type if strings are short
    char4 * data;          // 32 bit data type will improve memory throughput, could be 8 bit
};
Note I used a char4 type for the string data; the vector type will give better memory throughput, but it will mean strings need to be aligned/suitably padded to 4 byte boundaries. That may or may not be a problem depending on what a typical real string looks like in your application. Also, the type of the (optional) length parameter should probably be chosen to reflect the maximum admissible string length. If you have a lot of very short strings, it might be worth using an 8 or 16 bit unsigned type for the lengths to save memory.
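For completeness, a host-side packing sketch (untested, illustrative names) showing how a set of strings could be flattened into this layout before copying the three arrays to the device; offsets and lengths are stored in char4 words to match how the kernel below indexes data:

#include <cstring>
#include <string>
#include <vector>

// Flatten strings into one packed buffer plus per-string offsets and lengths,
// padding each string to a 4-byte boundary so it can be viewed as char4 data.
struct PackedStrings {
    std::vector<unsigned int> pos;     // starting offset of each string, in char4 words
    std::vector<unsigned int> length;  // length of each string, in char4 words
    std::vector<char> data;            // packed, zero-padded character data
};

PackedStrings pack(const std::vector<std::string>& strings) {
    PackedStrings p;
    for (const std::string& s : strings) {
        unsigned int padded = (unsigned int)((s.size() + 3) / 4) * 4; // round up to 4
        p.pos.push_back((unsigned int)(p.data.size() / 4));
        p.length.push_back(padded / 4);
        size_t old = p.data.size();
        p.data.resize(old + padded, '\0');                 // pad with zero bytes
        std::memcpy(p.data.data() + old, s.data(), s.size());
    }
    return p;
}

// The three host vectors can then be copied to the device (e.g. with cudaMemcpy)
// and their device pointers stored in a gpuStringArray instance.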
A really simplistic code to compare strings stored this way in the style of strcmp might look something like this:
__device__ __host__
int cmp4(const char4 & c1, const char4 & c2)
{
    int result;
    result = c1.x - c2.x; if (result != 0) return result;
    result = c1.y - c2.y; if (result != 0) return result;
    result = c1.z - c2.z; if (result != 0) return result;
    result = c1.w - c2.w; if (result != 0) return result;
    return 0;
}

__device__ __host__
int strncmp4(const char4 * s1, const char4 * s2, const unsigned int nwords)
{
    for (unsigned int i = 0; i < nwords; i++) {
        int result = cmp4(s1[i], s2[i]);
        if (result != 0) return result;
    }
    return 0;
}

__global__
void tkernel(const gpuStringArray a, const gpuStringArray b, int * result)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;

    char4 * s1 = a.data + a.pos[idx];
    char4 * s2 = b.data + b.pos[idx];
    unsigned int slen = min(a.length[idx], b.length[idx]);

    result[idx] = strncmp4(s1, s2, slen);
}
[disclaimer: never compiled, never tested, no warranty real or implied, use at your own risk]
There are some corner cases and assumptions in this which might catch you out depending on exactly what the real strings in your code look like, but I will leave those as an exercise to the reader to resolve. You should be able to adapt and expand this into whatever it is you are trying to do.
You have to use C-style character strings (char *str). Searching for "CUDA string" on Google would have given you this CUDA "Hello World" example as the first hit: http://computer-graphics.se/hello-world-for-cuda.html
There you can see how to use char* strings in CUDA. Be aware that standard C functions like strcpy or strcmp are not available in CUDA!
If you want an array of strings, you just have to use char** (as in C/C++). As for strcmp and similar functions, it highly depends on what you want to do. CUDA is not really well suited for string operations; maybe it would help if you provided a little more detail about what you want to do.

Thrust Complex Transform of 3 different size vectors

Hello, I have this loop in C++, and I was trying to convert it to Thrust, but without getting the same results...
Any ideas?
Thank you
C++ Code
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        values[i] = values[i] + (binv[i*n + j] * d[j]);
Thrust Code
thrust::fill(values.begin(), values.end(), 0);
thrust::transform(
    make_zip_iterator(make_tuple(
        thrust::make_permutation_iterator(values.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n))),
        binv.begin(),
        thrust::make_permutation_iterator(d.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexModFunctor(n))))),
    make_zip_iterator(make_tuple(
        thrust::make_permutation_iterator(values.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n))) + n,
        binv.end(),
        thrust::make_permutation_iterator(d.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexModFunctor(n))) + n)),
    thrust::make_permutation_iterator(values.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n))),
    function1()
);
Thrust Functions
struct IndexDivFunctor : thrust::unary_function<int, int>
{
    int n;
    IndexDivFunctor(int n_) : n(n_) {}

    __host__ __device__
    int operator()(int idx)
    {
        return idx / n;
    }
};

struct IndexModFunctor : thrust::unary_function<int, int>
{
    int n;
    IndexModFunctor(int n_) : n(n_) {}

    __host__ __device__
    int operator()(int idx)
    {
        return idx % n;
    }
};

struct function1
{
    template <typename Tuple>
    __host__ __device__
    double operator()(Tuple v)
    {
        return thrust::get<0>(v) + thrust::get<1>(v) * thrust::get<2>(v);
    }
};
To begin with, some general comments. Your loop
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        v[i] = v[i] + (B[i*n + j] * d[j]);

is the equivalent of the standard BLAS gemv operation, v := B*d + v, where the matrix B is stored in row-major order. The optimal way to do this on the device would be to use CUBLAS, not something constructed out of Thrust primitives.
Having said that, there is absolutely no way the Thrust code you posted will ever do what your serial code does. The errors you are seeing are not a result of floating-point associativity. Fundamentally, thrust::transform applies the supplied functor to every element of the input iterator and stores the result on the output iterator. To yield the same result as the loop you posted, the thrust::transform call would need to perform (n*n) applications of the fmad functor you posted. Clearly it does not. Further, there is no guarantee that thrust::transform would perform the summation/reduction operation in a fashion that is safe from memory races.
The correct solution is probably going to be something like:
1. Use thrust::transform to compute the (n*n) products of the elements of B and d.
2. Use thrust::reduce_by_key to reduce the products into partial sums, yielding B*d.
3. Use thrust::transform to add the resulting matrix-vector product to v to yield the final result.
In code, firstly define a functor like this:
struct functor
{
    template <typename Tuple>
    __host__ __device__
    double operator()(Tuple v)
    {
        return thrust::get<0>(v) * thrust::get<1>(v);
    }
};
Then do the following to compute the matrix-vector multiplication
typedef thrust::device_vector<int> iVec;
typedef thrust::device_vector<double> dVec;
typedef thrust::counting_iterator<int> countIt;
typedef thrust::transform_iterator<IndexDivFunctor, countIt> columnIt;
typedef thrust::transform_iterator<IndexModFunctor, countIt> rowIt;
// Assuming the following allocations on the device
dVec B(n*n), v(n), d(n);
// transformation iterators mapping to vector rows and columns
columnIt cv_begin = thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n));
columnIt cv_end = cv_begin + (n*n);
rowIt rv_begin = thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexModFunctor(n));
rowIt rv_end = rv_begin + (n*n);
dVec temp(n*n);
thrust::transform(make_zip_iterator(
make_tuple(
B.begin(),
thrust::make_permutation_iterator(d.begin(),rv_begin) ) ),
make_zip_iterator(
make_tuple(
B.end(),
thrust::make_permutation_iterator(d.end(),rv_end) ) ),
temp.begin(),
functor());
iVec outkey(n);
dVec Bd(n);
thrust::reduce_by_key(cv_begin, cv_end, temp.begin(), outkey.begin(), Bd.begin());
thrust::transform(v.begin(), v.end(), Bd.begin(), v.begin(), thrust::plus<double>());
Of course, this is a terribly inefficient way to do the computation compared to using a purpose designed matrix-vector multiplication code like dgemv from CUBLAS.
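For reference, the CUBLAS route looks roughly like this (a sketch, assuming a valid cublasHandle_t and device pointers for B, d and v already exist; error checking omitted). BLAS assumes column-major storage, so a row-major B is passed as the transposed op, and beta = 1 accumulates into v just like the original loop:

#include <cublas_v2.h>

// v = B * d + v, with B stored row-major as an n x n array.
// Row-major B is column-major B^T, so request the transposed operation.
void gemv_rowmajor(cublasHandle_t handle, int n,
                   const double *B, const double *d, double *v)
{
    const double alpha = 1.0;
    const double beta  = 1.0;   // accumulate into the existing contents of v
    cublasDgemv(handle, CUBLAS_OP_T, n, n, &alpha, B, n, d, 1, &beta, v, 1);
}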
How much do your results differ? Is it a completely different answer, or do they differ only in the last digits? Is the loop executed only once, or is it some kind of iterative process?
Floating-point operations, especially those that repeatedly add up or multiply certain values, are not associative, because of precision issues. Moreover, if you use fast-math optimisations, the operations may not be IEEE compliant.
For starters, check out this wikipedia section on floating-point numbers: http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems
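A tiny illustration of the non-associativity (single precision, values chosen to make the effect obvious):

#include <cstdio>

int main() {
    float a = 1.0e8f, b = -1.0e8f, c = 1.0f;
    // Both expressions are mathematically 1, but in single precision
    // b + c rounds back to b, so the two groupings give different results.
    std::printf("(a + b) + c = %f\n", (a + b) + c);   // prints 1.000000
    std::printf("a + (b + c) = %f\n", a + (b + c));   // prints 0.000000
    return 0;
}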