How to multiply two iterators and return the product to a thrust::reduce algorithm? - cuda

I have this reduce_by_key working fine, except I want to multiply dv_Vals with another vector (dv_Active) that contains 0 or 1 values so that the resultant product is a new iterator with either a value from dv_Vals or 0.
thrust::reduce_by_key(
    // dv_Keys = { 0,0,0,1,1,1,2,2,2,3,3,3 }
    dv_Keys.begin(),
    dv_Keys.end(),
    thrust::make_permutation_iterator(
        dv_Vals.begin(), // I want to multiply each element of dv_Vals with each element of dv_Active here.
        dv_Map.begin()
    ),
    thrust::make_discard_iterator(),
    dv_NewVals.begin()
);
My attempt at rewriting the reduce_by_key (replaced dv_Vals with make_transform_iterator) gives me errors:
thrust::reduce_by_key(
    // dv_Keys = { 0,0,0,1,1,1,2,2,2,3,3,3 }
    dv_Keys.begin(),
    dv_Keys.end(),
    thrust::make_permutation_iterator(
        thrust::make_transform_iterator(
            thrust::make_zip_iterator(
                thrust::make_tuple(
                    dv_Vals.begin(),
                    dv_Active.begin()
                )
            ),
            _1 * _2 // My attempt to multiply the value in dv_Vals with dv_Active.
        ),
        dv_Map.begin()
    ),
    thrust::make_discard_iterator(),
    dv_NewVals.begin()
);
What am I doing wrong? Or is there a better way to multiply two vectors within a reduce?

The problem lies in
thrust::make_transform_iterator(
    thrust::make_zip_iterator(
        thrust::make_tuple(
            dv_Vals.begin(),
            dv_Active.begin()
        )
    ),
    _1 * _2 // does **not** compile!
)
The zip_iterator gives you a tuple, i.e. a single argument. _1 * _2 generates a functor template with two arguments (a "binary functor"). This can be solved using the zip_function feature, which wraps the functor in another one (a "unary functor") that unpacks the tuple:
#include <thrust/zip_function.h>
// ...
thrust::make_transform_iterator(
    thrust::make_zip_iterator(
        thrust::make_tuple(
            dv_Vals.begin(),
            dv_Active.begin()
        )
    ),
    thrust::make_zip_function(_1 * _2)
)
Alternatively, you can use a device lambda or a functor that uses thrust::get:
thrust::make_transform_iterator(
    thrust::make_zip_iterator(
        thrust::make_tuple(
            dv_Vals.begin(),
            dv_Active.begin()
        )
    ),
    [] __device__ (auto val_and_active) {
        return thrust::get<0>(val_and_active) * thrust::get<1>(val_and_active);
    }
)
nvcc needs the --extended-lambda flag for this to compile.
Something like thrust::get<0>(_1) * thrust::get<1>(_1) doesn't work at the time of writing. I believe it could be implemented in Thrust by specializing thrust::get if someone asks for it (posts an issue), although the maintainers might not see much point to it, since thrust::make_zip_function already lets you write nicer-looking code (presumably without overhead).
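For completeness, here is a minimal self-contained sketch of the fixed reduce_by_key using make_zip_function. The toy data and the identity dv_Map are my own assumptions, only there so the snippet compiles and runs:
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <thrust/zip_function.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <vector>

int main()
{
    using namespace thrust::placeholders;

    // Toy data, assumed for illustration only.
    std::vector<int> h_keys   = {0,0,0, 1,1,1, 2,2,2, 3,3,3};
    std::vector<int> h_vals   = {1,2,3, 4,5,6, 7,8,9, 10,11,12};
    std::vector<int> h_active = {1,0,1, 1,1,0, 0,0,1, 1,1,1};
    std::vector<int> h_map    = {0,1,2, 3,4,5, 6,7,8, 9,10,11}; // identity map

    thrust::device_vector<int> dv_Keys(h_keys), dv_Vals(h_vals),
                               dv_Active(h_active), dv_Map(h_map), dv_NewVals(4);

    // Values masked by dv_Active: either dv_Vals[i] or 0.
    auto masked = thrust::make_transform_iterator(
        thrust::make_zip_iterator(
            thrust::make_tuple(dv_Vals.begin(), dv_Active.begin())),
        thrust::make_zip_function(_1 * _2));

    thrust::reduce_by_key(
        dv_Keys.begin(), dv_Keys.end(),
        thrust::make_permutation_iterator(masked, dv_Map.begin()),
        thrust::make_discard_iterator(),
        dv_NewVals.begin());

    // dv_NewVals now holds one masked sum per key: {4, 9, 9, 33}.
    return 0;
}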

Related

Recursive function that divides the number by 2 without using /

I need help.
I need to create a recursive function that receives a number n and returns n/2 without dividing.
Edit:
This is what I wrote, but it only works when the result of the division is a whole number and not a fractional value, which is why I asked.
int recursive(int a, int b)
{
    if ( a == (0.5 * b) )
        return a;
    return recursive(a-1, b);
}
Recursive solutions are much slower, but here's an implementation for positive integers using Python 3:
def divRecursive(a, b):
    if a < b:
        return 0
    return 1 + divRecursive(a - b, b)
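For the actual question (n/2 without using /), the same subtract-and-count idea specializes to b = 2. A minimal C++ sketch for non-negative n (the function name halve is mine, not from the post):
// Halves a non-negative integer by counting how many times 2 fits into it.
int halve(int n)
{
    if (n < 2)          // base case: 0/2 == 0 and 1/2 == 0
        return 0;
    return 1 + halve(n - 2);
}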

Controlling program flow without if-else / switch-case statements

Let's say I have 1000 functions defined as follows
void dummy1(int a);
void dummy2(int a, int aa);
void dummy3(int a, int aa, int aaa);
.
.
.
void dummy1000(int a, int aa, int aaa, ...);
I want to write a function that takes an integer n (n < 1000) and calls the nth dummy function (for n = 10, dummy10) with exactly n arguments (the arguments can be any integer, say 0). I know this could be achieved by writing a switch-case statement with 1000 cases, which is not practical.
In my opinion, this cannot be achieved without recompilation at run time, so languages like Java, C, and C++ will never allow such a thing.
Hopefully, there is a way to do this. If so, I am curious how.
Note: This is not something that I will ever use; I asked this question purely out of curiosity.
In modern functional languages, you can make a list of functions which take a list as an argument. This will arguably solve your problem, but it is also arguably cheating, as it is not quite the statically-typed implementation your question seems to imply. However, it is pretty much what dynamic languages such as Python, Ruby, or Perl do when using "manual" argument handling...
Anyway, the following is in Haskell: it supplies the nth function (from its first argument fs) a list of n copies of the second argument (x), and returns the result. Of course, you will need to put together the list of functions somehow, but unlike a switch statement this list will be reusable as a first-class argument.
selectApplyFunction :: [ [Int] -> a ] -> Int -> Int -> a
selectApplyFunction fs x n = (fs !! (n-1)) (replicate n x)
dummy1 [a] = 5 * a
dummy2 [a, b] = (a + 3) * b
dummy3 [a, b, c] = (a*b*c) / (a*b + b*c + c*a)
...
myFunctionList = [ dummy1, dummy2, dummy3, ... ]
-- (myfunction n) provides n copies of the number 42 to the n'th function
myFunction = selectApplyFunction myFunctionList 42
-- call the 666'th function with 666 copies of 42
result = myFunction 666
Of course, you will get an exception if n is greater than the number of functions, or if the function can't handle the list it is given. Note, too, that it is poor Haskell style -- mainly because of the way it abuses lists to solve your problem...
No, you are incorrect. Most modern languages support some form of Reflection that will allow you to call a function by name and pass params to it.
You can create an array of functions in most of modern languages.
In pseudo code,
var dummy = new Array();
dummy[1] = function(int a);
dummy[2] = function(int a, int aa);
...
var result = dummy[whateveryoucall](1,2,3,...,whateveryoucall);
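In a statically typed language the same array-of-functions idea works if each function takes its arguments packed in a single container. A rough C++ sketch (the lambdas and values are purely illustrative):
#include <functional>
#include <iostream>
#include <vector>

int main() {
    // Each "dummy" takes its arguments packed in a vector, sidestepping
    // the variable-arity problem.
    std::vector<std::function<int(const std::vector<int>&)>> dummy = {
        [](const std::vector<int>& a) { return 5 * a[0]; },
        [](const std::vector<int>& a) { return (a[0] + 3) * a[1]; },
        [](const std::vector<int>& a) { return a[0] * a[1] * a[2]; },
    };

    int n = 2;                       // call the nth function...
    std::vector<int> args(n, 0);     // ...with n arguments, all 0
    std::cout << dummy[n - 1](args) << '\n';
    return 0;
}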
In functional languages you could do something like this; in strongly typed ones like Haskell, though, the functions must all have the same type:
funs = [reverse, tail, init] -- 3 functions of type [a] -> [a]
run fn arg = (funs !! fn) $ arg -- applies the function at index fn to arg
In object oriented languages, you can use function objects and reflection together to achieve exactly what you want. The problem of the variable number of arguments is solved by passing appropriate POJOs (recalling C structs) to the function object.
interface Functor<A, B> {
    public B compute(A input);
}

class SumInput {
    private int x, y;
    // getters and setters
}

class Sum implements Functor<SumInput, Integer> {
    @Override
    public Integer compute(SumInput input) {
        return input.getX() + input.getY();
    }
}
Now imagine you have a large number of these "functors". You gather them in a configuration file (maybe an XML file with metadata about each functor, usage scenarios, instructions, etc...) and return the list to the user.
The user picks one of them. By using reflection, you can see what the required input and the expected output are. The user fills in the input, and by using reflection you instantiate the functor class (newInstance()), call the compute() function, and get the output.
When you add a new functor, you just have to change the list of the functors in the config file.

Passing multiple parameters

I'm trying to convert an existing C function to Erlang but am having a bit of trouble understanding how it's going to work. Let's say I have the following function in C:
void f(int *x, int *y, int z, int a)
{
    if (z < a)
    {
        *x = z + a;
        *y = z - a;
    }
}
How would I write something like that in Erlang as a function module? I understand that normally you write your function and it returns the result of an operation. But what if I have to do calculations on multiple variables?
You may return a tuple like {X, Y}.
Here is a function that doubles two values given as input:
-module(my_module).
-export([doubleus/2]).
doubleus(X, Y) ->
    {X*2, Y*2}.
In the shell:
1> c(my_module).
{ok, my_module}
2> {A, B} = my_module:doubleus(3,4).
{6, 8}
Operating with pointers means that you may change the state of some location in memory (for sequential flow this is not so bad).
But in a concurrent environment this may indirectly cause unpredictable changes in every process that points to that location (especially under race conditions).
That's why there are so many concurrency-oriented mechanisms in Java.
But this is not the Erlang way. In general, there are no pointers and no shared memory in Erlang.
You may store state, for example, in a tuple { X, Y, Z, A } and pass it from function to function. Sometimes your functions will return a new state tuple.
In the context of the above, your function may look like:
-module( my_module ).
-export( [ f/1 ] ).
f( { _X, _Y, Z, A } ) when Z < A -> { Z + A, Z - A, Z, A };
%% otherwise - don't change the state
f( State ) -> State.

Thrust Complex Transform of 3 different size vectors

Hello, I have this loop in C++, and I was trying to convert it to Thrust, but I'm not getting the same results...
Any ideas?
Thank you
C++ Code
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        values[i] = values[i] + (binv[i*n+j] * d[j]);
Thrust Code
thrust::fill(values.begin(), values.end(), 0);
thrust::transform(
    make_zip_iterator(make_tuple(
        thrust::make_permutation_iterator(values.begin(),
            thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n))),
        binv.begin(),
        thrust::make_permutation_iterator(d.begin(),
            thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexModFunctor(n))))),
    make_zip_iterator(make_tuple(
        thrust::make_permutation_iterator(values.begin(),
            thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n))) + n,
        binv.end(),
        thrust::make_permutation_iterator(d.begin(),
            thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexModFunctor(n))) + n)),
    thrust::make_permutation_iterator(values.begin(),
        thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n))),
    function1());
Thrust Functions
struct IndexDivFunctor : thrust::unary_function<int, int>
{
    int n;
    IndexDivFunctor(int n_) : n(n_) {}

    __host__ __device__
    int operator()(int idx)
    {
        return idx / n;
    }
};

struct IndexModFunctor : thrust::unary_function<int, int>
{
    int n;
    IndexModFunctor(int n_) : n(n_) {}

    __host__ __device__
    int operator()(int idx)
    {
        return idx % n;
    }
};

struct function1
{
    template <typename Tuple>
    __host__ __device__
    double operator()(Tuple v)
    {
        return thrust::get<0>(v) + thrust::get<1>(v) * thrust::get<2>(v);
    }
};
To begin with, some general comments. Your loop
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        v[i] = v[i] + (B[i*n+j] * d[j]);
is the equivalent of the standard BLAS gemv operation, v = B*d + v, where the matrix B is stored in row-major order. The optimal way to do this on the device would be using CUBLAS, not something constructed out of thrust primitives.
Having said that, there is absolutely no way the thrust code you posted is ever going to do what your serial code does. The errors you are seeing are not a result of floating point associativity. Fundamentally, thrust::transform applies the supplied functor to every element of the input iterator and stores the result on the output iterator. To yield the same result as the loop you posted, the thrust::transform call would need to perform (n*n) operations of the fmad functor you posted. Clearly it does not. Further, there is no guarantee that thrust::transform would perform the summation/reduction operation in a fashion that would be safe from memory races.
The correct solution is probably going to be something like:
1. Use thrust::transform to compute the (n*n) products of the elements of B and d.
2. Use thrust::reduce_by_key to reduce the products into partial sums, yielding Bd.
3. Use thrust::transform to add the resulting matrix-vector product to v to yield the final result.
In code, firstly define a functor like this:
struct functor
{
    template <typename Tuple>
    __host__ __device__
    double operator()(Tuple v)
    {
        return thrust::get<0>(v) * thrust::get<1>(v);
    }
};
Then do the following to compute the matrix-vector multiplication
typedef thrust::device_vector<int>    iVec;
typedef thrust::device_vector<double> dVec;
typedef thrust::counting_iterator<int> countIt;
typedef thrust::transform_iterator<IndexDivFunctor, countIt> columnIt;
typedef thrust::transform_iterator<IndexModFunctor, countIt> rowIt;

// Assuming the following allocations on the device
dVec B(n*n), v(n), d(n);

// transformation iterators mapping to vector rows and columns
columnIt cv_begin = thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n));
columnIt cv_end   = cv_begin + (n*n);
rowIt rv_begin = thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexModFunctor(n));
rowIt rv_end   = rv_begin + (n*n);

dVec temp(n*n);
thrust::transform(make_zip_iterator(
                      make_tuple(
                          B.begin(),
                          thrust::make_permutation_iterator(d.begin(), rv_begin))),
                  make_zip_iterator(
                      make_tuple(
                          B.end(),
                          thrust::make_permutation_iterator(d.end(), rv_end))),
                  temp.begin(),
                  functor());

iVec outkey(n);
dVec Bd(n);
thrust::reduce_by_key(cv_begin, cv_end, temp.begin(), outkey.begin(), Bd.begin());
thrust::transform(v.begin(), v.end(), Bd.begin(), v.begin(), thrust::plus<double>());
Of course, this is a terribly inefficient way to do the computation compared to using a purpose designed matrix-vector multiplication code like dgemv from CUBLAS.
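For comparison, a call to CUBLAS dgemv for the same operation v = B*d + v with a row-major B would look roughly like this (a sketch only; error checking is omitted and the cuBLAS handle is assumed to be created and destroyed elsewhere):
#include <cublas_v2.h>
#include <thrust/device_vector.h>

// Computes v = B*d + v where B is n x n and stored row-major on the device.
// cuBLAS assumes column-major storage, so requesting op(A) = A^T makes the
// column-major view of the row-major B act as B itself.
void gemv_rowmajor(cublasHandle_t handle, int n,
                   const thrust::device_vector<double>& B,
                   const thrust::device_vector<double>& d,
                   thrust::device_vector<double>& v)
{
    const double alpha = 1.0, beta = 1.0;
    cublasDgemv(handle, CUBLAS_OP_T, n, n, &alpha,
                thrust::raw_pointer_cast(B.data()), n,
                thrust::raw_pointer_cast(d.data()), 1, &beta,
                thrust::raw_pointer_cast(v.data()), 1);
}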
How much do your results differ? Is it a completely different answer, or do they differ only in the last digits? Is the loop executed only once, or is it some kind of iterative process?
Floating point operations, especially those that repeatedly add up or multiply certain values, are not associative, because of precision issues. Moreover, if you use fast-math optimisations, the operations may not be IEEE compliant.
For starters, check out this wikipedia section on floating-point numbers: http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems
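If you want to see the non-associativity for yourself, here is a tiny host-side example (the values are chosen to make the effect obvious):
#include <cstdio>

int main()
{
    // Regrouping the same three additions gives different results,
    // because 1.0f is lost when it is added to -1.0e8f first.
    float a = 1.0e8f, b = -1.0e8f, c = 1.0f;
    std::printf("%g vs %g\n", (a + b) + c, a + (b + c)); // prints "1 vs 0"
    return 0;
}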

Is there a programming language that allows variable declaration at call site?

Update 2: examples removed, because they were misleading. The ones below are more relevant.
My question:
Is there a programming language with such a construct?
Update:
Now that I think about it, Prolog has something similar.
It even allows defining operations at the definition line.
(Forget about backtracking and relations - think about syntax.)
I asked this question because I believe it's a nice thing to have symmetry in a language:
symmetry between "in" parameters and "out" parameters.
If returning values like that were easy, we could drop explicit returns in the designed language.
Returning pairs... I think this is a hack; we do not need a data structure to pass multiple parameters to a function.
Update 2:
To give an example of syntax I'm looking for:
f (s, d&) =                  // & indicates 'out' variable
    d = s + s.

main =
    f("say twice", &twice)   // & indicates 'out' variable declaration
    print(twice)

main2 =
    print(f("say twice", _))
Or in functional + prolog style
f $s (s+s). // use $ to mark that s will get its value in other part of the code

main =
    f "say twice" $twice // at the call site the second parameter will get its value from
    print twice

main2 =
    print (f "Say twice" $_) // anonymous variable
In a proposed language, there are no expressions, because all returns are through parameters. This would be cumbersome in situations where deep hierarchical function calls are natural. Lisp'ish example:
(let x (* (+ 1 2) (+ 3 4))) // equivalent to C x = ((1 + 2) * (3 + 4))
would need in the language names for all temporary variables:
+ 1 2 res1
+ 3 4 res2
* res1 res2 x
So I propose anonymous variables that turn a whole function call into the value of this variable:
* (+ 1 2 _) (+ 3 4 _)
This is not very natural, because of all the cultural baggage we have, but I want to throw away all the preconceptions about syntax we currently have.
<?php
function f($param, &$ret) {
    $ret = $param . $param;
}

f("say twice", $twice);
echo $twice;
?>
$twice is seen after the call to f(), and it has the expected value. If you remove the ampersand, there are errors. So it looks like PHP will declare the variable at the point of calling. I'm not convinced that buys you much, though, especially in PHP.
"Is there a programming language with such a construct?"
Your question is in fact a little unclear.
In a sense, any language that supports assignment to [the variable state associated with] a function argument, supports "such a construct".
C supports it because "void f (type *address)" allows modification of anything address points to. Java supports it because "void f (Object x)" allows any (state-modifying) invocation of some method of x. COBOL supports it because "PROCEDURE DIVISION USING X" can involve an X that holds a pointer/memory address, ultimately allowing to go change the state of the thing pointed to by that address.
From that perspective, I'd say almost every language known to mankind supports "such a construct", with the exception perhaps of languages such as Tutorial D, which claim to be "absolutely pointer-free".
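To make that concrete, here is the question's f("say twice", ...) example with an ordinary C++ reference parameter; note that, unlike the syntax asked about, the out variable still has to be declared before the call (a sketch using the question's names):
#include <iostream>
#include <string>

// The second parameter is an "out" variable passed by reference.
void f(const std::string& s, std::string& d)
{
    d = s + s;
}

int main()
{
    std::string twice;            // must be declared before the call
    f("say twice", twice);
    std::cout << twice << '\n';   // prints "say twicesay twice"
    return 0;
}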
I'm having a hard time understanding what you want. You want to put the return type in the call signature? I'm sure someone could hack that together, but is it really useful?
// fakelang example - use a ; to separate ins and outs
function f(int in1, int in2; int out1, int out2, int out3) {...}
// C++0x-ish
auto f(int in1, int in2) -> int o1, int o2, int o3 {...}
int a, b, c;
a, b, c = f(1, 2);
I get the feeling this would be implemented internally this way:
LEA EAX, c // push output parameter pointers first, in reverse order
PUSH EAX
LEA EAX, b
PUSH EAX
LEA EAX, a
PUSH EAX
PUSH 1 // push input parameters
PUSH 2
CALL f // Caller treats the outputs as references
ADD ESP,20 // clean the stack
For your first code snippet, I'm not aware of any such languages, and frankly I'm glad it is the case. Declaring a variable in the middle of expression like that, and then using it outside said expression, looks very wrong to me. If anything, I'd expect the scope of such variable to be restricted to the function call, but then of course it's quite pointless in the first place.
For the second one - multiple return values - pretty much any language with first-class tuple support has something close to that. E.g. Python:
def foo(x, y):
    return (x + 1), (y + 1)

x, y = foo(1, 2)
Lua doesn't have first-class tuples (i.e. you can't bind a tuple value to a single variable - you always have to expand it, possibly discarding part of it), but it does have multiple return values, with essentially the same syntax:
function foo(x, y)
    return (x + 1), (y + 1)
end

local x, y = foo(x, y)
F# has first-class tuples, and so everything said earlier about Python applies to it as well. But it can also simulate tuple returns for methods that were declared in C# or VB with out or ref arguments, which is probably the closest to what you describe - though it is still implicit (i.e. you don't specify the out-argument at all, even as _). Example:
// C# definition
int Foo(int x, int y, out int z)
{
    z = y + 1;
    return x + 1;
}

// explicit F# call
let mutable y = 0
let x = Foo(1, 2, &y)

// tupled F# call
let x, y = Foo(1, 2)
Here is how you would do it in Perl:
sub f { $_[1] = $_[0] . $_[0] } #in perl all variables are passed by reference
f("say twice", my $twice);
# or f("...", our $twice) or f("...", $twice)
# the last case is only possible if you are not running with "use strict;"
print $twice;
[edit] Also, since you seem interested in minimal syntax:
sub f { $_[1] = $_[0] x 2 } # x is the repetition operator
f "say twice" => $twice; # => is a quoting comma, used here just for clarity
print $twice;
is perfectly valid perl. Here's an example of normal quoting comma usage:
("abc", 1, "d e f", 2) # is the same as
(abc => 1, "d e f" => 2) # the => only quotes perl /\w+/ strings
Also, on return values, unless exited with a "return" early, all perl subroutines automatically return the last line they execute, be it a single value, or a list. Lastly, take a look at perl6's feed operators, which you might find interesting.
[/edit]
I am not sure exactly what you are trying to achieve with the second example, but the concept of implicit variables exists in a few languages; in Perl, it is $_.
An example would be some of Perl's builtins, which look at $_ when they don't have an argument.
$string = "my string\n";
for ($string) { # loads "my string" into $_
chomp; # strips the last newline from $_
s/my/our/; # substitutes my for our in $_
print; # prints $_
}
without using $_, the above code would be:
chomp $string;
$string =~ s/my/our/;
print $string;
$_ is used in many cases in perl to avoid repeatedly passing temporary variables to functions
Not programming languages, but various process calculi have syntax for binding names at the receiver call sites in the scope of process expressions dependent on them. While Pict has such syntax, it doesn't actually make sense in the derived functional syntax that you're asking about.
You might have a look at Oz. In Oz you only have procedures and you assign values to variables instead of returning them.
It looks like this:
proc {Max X Y Z}
    if X >= Y then Z = X else Z = Y end
end
There are functions (that return values) but this is only syntactic sugar.
Also, Concepts, Techniques, and Models of Computer Programming is a great SICP-like book that teaches programming by using Oz and the Mozart Programming System.
I don't think so. Most languages that do something like that use tuples so that there can be multiple return values. Come to think of it, the C-style reference and output parameters are mostly hacks around not being able to return tuples...
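For instance, C++17 can return a std::tuple and unpack it at the call site with structured bindings, which covers most of what out parameters are used for (a sketch of my own, not from the original answer):
#include <iostream>
#include <tuple>

// Returns both results at once instead of writing through out parameters.
std::tuple<int, int> f(int z, int a)
{
    return { z + a, z - a };
}

int main()
{
    auto [x, y] = f(3, 7);                 // C++17 structured bindings
    std::cout << x << ' ' << y << '\n';    // prints "10 -4"
    return 0;
}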
Somewhat confusing, but C++ is quite happy with declaring variables and passing them as out parameters in the same statement:
void foo ( int &x, int &y, int &z ) ;
int a,b,c = (foo(a,b,c),c);
But don't do that outside of obfuscation contests.
You might also want to look at pass by name semantics in Algol, which your fuller description smells vaguely similar to.