Does CUDA's thrust::inclusive_scan() have an 'init' parameter?

According to CUDA's Thrust library documentation, thrust::inclusive_scan() has 4 parameters:
OutputIterator thrust::inclusive_scan(InputIterator first,
                                      InputIterator last,
                                      OutputIterator result,
                                      AssociativeOperator binary_op)
Yet the usage demonstration in the same documentation passes 5 arguments. An extra 4th argument is passed as an initial value for the scan (exactly like in thrust::exclusive_scan()):
int data[10] = {-5, 0, 2, -3, 2, 4, 0, -1, 2, 8};
thrust::maximum<int> binary_op;
thrust::inclusive_scan(data, data + 10, data, 1, binary_op); // in-place scan
Now, my code only compiles when I pass 4 arguments (passing 5 gives the error: no instance of overloaded function "thrust::inclusive_scan" matches the argument list), but I need to initialise my rolling maximum just like in the example.
Can anyone clarify how to initialise the inclusive scan?
Many thanks.

It seems you don't understand what the inclusive scan operation is. There is no such thing as initialising an inclusive scan. By definition, the first value of an inclusive scan is always the first element of the sequence.
So for the sequence
[ 1, 2, 3, 4, 5, 6, 7 ]
the inclusive scan is
[ 1, 3, 6, 10, 15, 21, 28 ]
and the exclusive scan (initialised to zero) is
[ 0, 1, 3, 6, 10, 15, 21 ]
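To make the two definitions concrete, here is a minimal sketch in plain Python (not Thrust) using itertools.accumulate. The last part shows one possible way to emulate a seeded rolling maximum: fold the initial value in first, then drop it from the output. The variable names are illustrative, and the seed value 1 is taken from the documentation example quoted in the question.

```python
from itertools import accumulate
import operator

values = [1, 2, 3, 4, 5, 6, 7]

# Inclusive scan: element i combines values[0..i].
inclusive = list(accumulate(values, operator.add))

# Exclusive scan seeded with 0: element i combines values[0..i-1].
# accumulate(..., initial=0) yields one extra trailing element, so drop it.
exclusive = list(accumulate(values, operator.add, initial=0))[:-1]

# Emulating a seeded inclusive scan (rolling maximum starting from 1):
# accumulate emits the seed itself first, so drop that leading element.
data = [-5, 0, 2, -3, 2, 4, 0, -1, 2, 8]
seeded_max = list(accumulate(data, max, initial=1))[1:]

print(inclusive)   # [1, 3, 6, 10, 15, 21, 28]
print(exclusive)   # [0, 1, 3, 6, 10, 15, 21]
print(seeded_max)  # [1, 1, 2, 2, 2, 4, 4, 4, 4, 8]
```

The same idea carries over to any associative operator: a seeded inclusive scan is an ordinary inclusive scan whose first input has already been combined with the seed.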

Related

How to make the sum of output to 1

My (PyTorch) sum of model’s output isn’t 1. And this is the structure of model.
LSTM(4433, 64)
LSTM(64, 64)
Linear(64, 4433)
Sigmoid()
And this is the predicted output of the model.
Input
[1, 0, 0, …, 0, 0]
Output
[.7842, .5, .5, …, .5, .5]
Do you know any function that can make its sum 1?

The sigmoid activation function maps each input independently to a value in the range (0, 1), without taking the other elements of the input vector into account, so the outputs have no reason to sum to 1. Softmax does a similar transformation, but it normalises across the whole vector so that the output sums to 1.
TL;DR: use softmax instead of sigmoid.
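The difference can be sketched in plain Python with only the math module. The input values below are made up for illustration and are not the asker's model outputs. In PyTorch itself, the corresponding layer is torch.nn.Softmax with an appropriate dim argument.

```python
import math

def sigmoid(x):
    # Element-wise: each output depends only on its own input.
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    # Dividing by the total is what forces the outputs to sum to 1.
    return [e / total for e in exps]

logits = [1.29, 0.0, 0.0, 0.0]  # hypothetical pre-activation values
sig = [sigmoid(x) for x in logits]
soft = softmax(logits)

print(sum(sig))   # greater than 1: sigmoid outputs are independent
print(sum(soft))  # 1.0 up to floating-point rounding
```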

How to write a function in Julia that accepts a variable number of arguments?

I am trying to write a function that should be allowed to take in a variable number of arguments. However, it's not too clear how I can do that in Julia.

In Julia, as in many other languages, you can write vararg functions, which accept a variable number of arguments. Here is a quick reference to the Julia docs on this idea, and an example:
julia> varargs(a,b,c...) = (a,b,c)
varargs (generic function with 1 method)
julia> varargs(5, 10)
(5, 10, ())
julia> varargs(3,4,5)
(3, 4, (5,))
julia> varargs(10, 20, 30, 40, 50, 60, 70, 80)
(10, 20, (30, 40, 50, 60, 70, 80))
julia> d = (2,3,4,5,6,7,8,9)
(2, 3, 4, 5, 6, 7, 8, 9)
julia> varargs(1,2,d)
(1, 2, ((2, 3, 4, 5, 6, 7, 8, 9),))
To reiterate, the magic happens when we define the varargs function and write c... in its signature. The trailing ... collects any extra arguments into a tuple bound to c, which is what allows a variable number of arguments.

Multiple Return Calls In Function

I'm trying to wrap my head around how the return call works in a function. In the example below, I'm assigning 5 to number1 and 6 to number2. Then I return both below. When I print the output, I only get "5" as a result.
Can someone please explain why it's doing this? Why does it not print both numbers?
Thanks!
def numberoutput ():
    number1 = 5
    number2 = 6
    return number1
    return number2
print (numberoutput())

Here's a compact way to do the loops that you ask for. Your lists should not contain 1, though.
>>> list1 = list(range(2,11))
>>> list2 = list(range(2,11))
>>> primes = [a for a in list1 if all((a % b) != 0 for b in list2 if a != b) ]
>>> primes
[2, 3, 5, 7]
There are no duplicates in the results, because the comprehension just collects elements of list1. But there are plenty of ways to improve prime number detection, of course. This just shows you how to apply comprehensions to your algorithm.

Try this (replace 10 with the upper limit you want):
primes = []
for number in range(2, 10):  # start at 2, because 1 is not prime
    is_prime = True
    for div in range(2, number-1):
        if number % div == 0:
            is_prime = False
            break
    if is_prime:
        primes.append(number)
Be careful though, this is not efficient at all. A small improvement is to replace (number - 1) with int(sqrt(number)) + 1, because any composite number has a divisor no larger than its square root. Even so, if you want the first 1000000 primes, this will be too slow. You may want to look at more advanced methods for finding primes if you need more.
Explanation:
First you iterate over all numbers from 2 to 10 - 1 = 9. Each one is stored in the variable "number". Then you iterate over the possible dividers. If the remainder of number divided by any divider is 0, then it is not a prime number, so you mark it as not prime (is_prime = False) and break out of the inner loop. After the inner loop, you check the boolean is_prime and append the number to the list if it is still True.

Here's a reasonably efficient way to find primes with a list comprehension, although it's not as efficient as the code by Robert William Hanks that I linked in the comments.
We treat 2 as a special case so we don't need to bother with any higher even numbers. And we only need to check factors less than the square root of the number we're testing.
from math import floor, sqrt
primes = [2] + [i for i in range(3, 100, 2)
                if all(i % j != 0 for j in range(3, 1 + floor(sqrt(i)), 2))]
print(primes)
output
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
Here's an alternative (less efficient) version that iterates over your list1 (which isn't really a list, it's a range object).
list1 = range(2, 100)
primes = [i for i in list1 if not [j for j in list1 if j*j <= i and i % j == 0]]
print(primes)
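The answers above all use trial division and note that faster methods exist. As a minimal sketch of one standard alternative, here is the Sieve of Eratosthenes. It is not the linked code mentioned above, and the function name primes_below is illustrative.

```python
def primes_below(limit):
    """Return all primes p with p < limit, using the Sieve of Eratosthenes."""
    if limit < 3:
        return []
    # is_prime[k] is True while k is still a prime candidate.
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    i = 2
    while i * i < limit:
        if is_prime[i]:
            # Cross out multiples of i, starting at i*i: smaller multiples
            # were already crossed out by smaller primes.
            for multiple in range(i * i, limit, i):
                is_prime[multiple] = False
        i += 1
    return [k for k, flag in enumerate(is_prime) if flag]

print(primes_below(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Instead of testing each number separately, the sieve marks composites in bulk, which avoids repeating divisions for every candidate.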

"TypeError: need a summation variable" and variable scope in Sage program

I tried to make a summation list in Sage. The commands were:
sage: var('n')
sage: var('x')
sage: f = (2/n)*(sin(n*x)*(-1)^(n+1))
sage: funclist = [sum(f,n,1,20) for n in range(1,3)]
but it raised an error:
TypeError: need a summation variable
But when I tried something similar in the Python shell, there was no problem:
>>> x=1
>>> [pow(x,2) for x in range(1,9)]
[1, 4, 9, 16, 25, 36, 49, 64]
Back in Sage, there was also no problem when I ran it like this:
sage: var('n')
sage: var('x')
sage: sum(f,n,1,20)
-1/2*sin(4*x) + 2/3*sin(3*x) - sin(2*x) + 2*sin(x)
I don't know how Sage handles the 'sum' function internally, and I don't know how to solve this problem.

The Sage shell is different from the Python shell, and the function sum is different too. In Sage, sum tries to compute a symbolic sum, which is why the second argument needs to be a symbolic variable. In your first code block, the list comprehension rebinds n to the integers 1 and 2, so you are essentially trying to evaluate
[sum(f, 1, 1, 20), sum(f, 2, 1, 20)]
From the mathematical point of view, how do you sum over 1? That's why Sage gives you an error. Notice that in the last code block, when you use the variable n, Sage is able to calculate the sum.
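The rebinding can be seen in plain Python, without Sage. In this minimal sketch a string stands in for the symbolic variable (an assumption made purely for illustration): inside the comprehension, the name n refers to the loop integers, not to the outer object.

```python
# Stand-in for Sage's symbolic variable; in Sage this would come from var('n').
n = "symbolic n"

# Inside the comprehension, n is the loop variable, so it is an int.
types_inside = [type(n).__name__ for n in range(1, 3)]

print(types_inside)  # ['int', 'int']
print(n)             # 'symbolic n' (Python 3 keeps the outer binding intact)
```

Using a different name for the loop variable (say k) leaves n symbolic inside the comprehension.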

Simple Linear Transform Algorithm not working

__global__
void transpose(double *input, double *output, int *width, int *height)
{
    int threadidx = (blockIdx.x * blockDim.x) + threadIdx.x;
    int row = threadidx / (*width);
    int column = (threadidx+3) % (*height);
    output[column * (*height) + row] = input[threadidx];
}
Above is my kernel for linear transformations. For an input matrix of [0, 1, 2, 3, 4, 5, 6, 7, 8] the output matrix should be [0, 3, 6, 1, 4, 7, 2, 5, 8], but when I run this code using the aforementioned example, the output is [0, 3, 6, 0, 0, 0, 0, 0, 0]. I've written a serial implementation of the algorithm in Python, and it works. The only thing I can think of is some sort of thread memory access problem. Any help? Thanks.

As the comments have already pointed out, your code happens to work correctly for the sample input case you have identified:
[0, 1, 2, 3, 4, 5, 6, 7, 8]
And if you are not getting the results you have indicated, then the error is outside of the code you have shown. However, it appears you are trying to transpose an array.
This code will not work for the general case (e.g. try a 2x2 array: [0, 1, 2, 3])
This line of code in particular isn't right, if your intention is to transpose an array:
int column = (threadidx+3) % (*height);
If you change it to:
int column = (threadidx) % (*width);
Your code will produce a correct transpose result for various matrix sizes.
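The index arithmetic can be checked on the CPU. The following is a minimal Python simulation of the kernel, with one loop iteration standing in for each thread. The helper names (run_kernel, original_column, fixed_column) are hypothetical, and the sketch assumes exactly width * height threads are launched.

```python
def run_kernel(data, width, height, column_of):
    # Simulate the kernel body once per "thread" index t.
    out = [0] * len(data)
    for t in range(len(data)):
        row = t // width
        column = column_of(t, width, height)
        out[column * height + row] = data[t]
    return out

def original_column(t, width, height):
    return (t + 3) % height  # column computation from the question

def fixed_column(t, width, height):
    return t % width         # column computation suggested in the answer

# 3x3: both versions happen to agree.
print(run_kernel(list(range(9)), 3, 3, original_column))  # [0, 3, 6, 1, 4, 7, 2, 5, 8]
print(run_kernel(list(range(9)), 3, 3, fixed_column))     # [0, 3, 6, 1, 4, 7, 2, 5, 8]

# 2x2: the original indexing breaks, the fixed one does not.
print(run_kernel([0, 1, 2, 3], 2, 2, original_column))    # [1, 3, 0, 2]
print(run_kernel([0, 1, 2, 3], 2, 2, fixed_column))       # [0, 2, 1, 3]
```

The original expression only works for 3x3 because adding 3 is a no-op modulo 3 and width equals height there.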
