Simple Linear Transform Algorithm not working - cuda

__global__
void transpose(double *input, double *output, int *width, int *height)
{
    int threadidx = (blockIdx.x * blockDim.x) + threadIdx.x;
    int row = threadidx / (*width);
    int column = (threadidx+3) % (*height);
    output[column * (*height) + row] = input[threadidx];
}
Above is my kernel for linear transformations. For an input matrix of [0, 1, 2, 3, 4, 5, 6, 7, 8] the output matrix should be [0, 3, 6, 1, 4, 7, 2, 5, 8], but when I run this code using the aforementioned example, the output is [0, 3, 6, 0, 0, 0, 0, 0, 0]. I've written a serial implementation of the algorithm in Python, and it works. The only thing I can think of is some sort of thread memory access problem. Any help? Thanks.

As the comments have already pointed out, your code happens to work correctly for the sample input case you have identified:
[0, 1, 2, 3, 4, 5, 6, 7, 8]
If you are not getting the output you expect, then the error lies outside the code you have shown. However, it appears you are trying to transpose an array.
This code will not work for the general case (e.g. try a 2x2 array: [0, 1, 2, 3]).
This line of code in particular isn't right if your intention is to transpose an array:
int column = (threadidx+3) % (*height);
If you change it to:
int column = (threadidx) % (*width);
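your code will produce a correct transpose for various matrix sizes, square or not, because each thread then splits its index into the (row, column) position that element actually occupies in the input.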
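To check the corrected index arithmetic without a GPU, here is a minimal serial sketch in Python. The function name transpose_flat is made up for illustration, and the loop variable stands in for the CUDA thread index; it assumes a row-major input with height rows and width columns.

```python
def transpose_flat(values, width, height):
    """Transpose a row-major matrix with `height` rows and `width` columns."""
    output = [None] * (width * height)
    for threadidx in range(width * height):  # one iteration per CUDA thread
        row = threadidx // width     # row of this element in the input
        column = threadidx % width   # column of this element in the input
        # Element (row, column) moves to (column, row); the output has `height` columns.
        output[column * height + row] = values[threadidx]
    return output


print(transpose_flat(list(range(9)), 3, 3))  # the 3x3 example from the question
print(transpose_flat([0, 1, 2, 3], 2, 2))    # the 2x2 case
print(transpose_flat(list(range(6)), 3, 2))  # non-square: 2 rows, 3 columns
```

The same mapping in the kernel also relies on launching exactly width * height threads, since the kernel as shown has no bounds check.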

How to make the sum of output to 1

The sum of my (PyTorch) model's output isn't 1. This is the structure of the model:
LSTM(4433, 64)
LSTM(64, 64)
Linear(64, 4433)
Sigmoid()
And this is the predicted output of the model.
Input
[1, 0, 0, …, 0, 0]
Output
[.7842, .5, .5, …, .5, .5]
Do you know any function that can make its sum 1?
The sigmoid activation function maps each input to a value between 0 and 1 independently, without taking the other elements of the input vector into account. Softmax performs a similar transformation, but normalizes across the vector so that the output sums to 1.
TL;DR: use softmax instead of sigmoid.
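A small sketch of the difference, written in plain Python so it runs without PyTorch. The logits are made-up values, not taken from the model above. In PyTorch itself the corresponding layer is torch.nn.Softmax, which takes a dim argument.

```python
import math


def sigmoid(logits):
    # Each element is squashed on its own; nothing ties the outputs together.
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


def softmax(logits):
    # Subtracting the maximum keeps exp() from overflowing; the result is unchanged.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.29, 0.0, 0.0, 0.0]  # made-up values for illustration
print(sigmoid(logits), sum(sigmoid(logits)))  # the sum is not constrained
print(softmax(logits), sum(softmax(logits)))  # the sum is 1 up to rounding
```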

How to write a function in Julia that accepts a variable number of arguments?

I am trying to write a function that should be allowed to take in a variable number of arguments. However, it's not too clear how I can do that in Julia.
In Julia, as in many other languages, you can write vararg functions, which accept a variable number of arguments. Here is an example along the lines of the one in the Julia docs:
julia> varargs(a,b,c...) = (a,b,c)
varargs (generic function with 1 method)
julia> varargs(5, 10)
(5, 10, ())
julia> varargs(3,4,5)
(3, 4, (5,))
julia> varargs(10, 20, 30, 40, 50, 60, 70, 80)
(10, 20, (30, 40, 50, 60, 70, 80))
julia> d = (2,3,4,5,6,7,8,9)
(2, 3, 4, 5, 6, 7, 8, 9)
julia> varargs(1,2,d)
(1, 2, ((2, 3, 4, 5, 6, 7, 8, 9),))
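To reiterate, the magic happens when we define the varargs function and write c... as its last argument. The trailing ... collects any remaining arguments into a tuple, which is why passing the tuple d as a single argument gives a one-element tuple containing d.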

Does CUDA's thrust::inclusive_scan() have an 'init' parameter?

According to CUDA's Thrust library documentation, thrust::inclusive_scan() has 4 parameters:
OutputIterator thrust::inclusive_scan(InputIterator first,
                                      InputIterator last,
                                      OutputIterator result,
                                      AssociativeOperator binary_op
                                      )
Yet in the usage demonstration (in the same documentation), they pass 5 parameters. An extra 4th parameter is passed as an initial value for the scan (exactly like in thrust::exclusive_scan()):
int data[10] = {-5, 0, 2, -3, 2, 4, 0, -1, 2, 8};
thrust::maximum<int> binary_op;
thrust::inclusive_scan(data, data + 10, data, 1, binary_op); // in-place scan
Now, my code will only compile passing 4 parameters (passing 5 gives error no instance of overloaded function "thrust::inclusive_scan" matches the argument list), but I happen to need to initialise my rolling maximum just like in the example.
Can anyone clarify how to initialise the inclusive scan?
Many thanks.
It seems you don't understand what the inclusive scan operation is. There is no such thing as initialising an inclusive scan. By definition, the first value of an inclusive scan is always the first element of the sequence.
So for the sequence
[ 1, 2, 3, 4, 5, 6, 7 ]
the inclusive scan is
[ 1, 3, 6, 10, 15, 21, 28 ]
and the exclusive scan (initialised to zero) is
[ 0, 1, 3, 6, 10, 15, 21 ]
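For comparison, here is a rough sketch of both scans in plain Python using itertools.accumulate (this is not Thrust code). It also shows one way to read an "initialised" rolling maximum: the seed is simply combined with the first element before the scan starts. The helper name seeded_inclusive_scan is hypothetical, and the initial keyword assumes Python 3.8 or later.

```python
from itertools import accumulate
import operator

values = [1, 2, 3, 4, 5, 6, 7]

# Inclusive scan: element k is the sum of values[0..k].
inclusive = list(accumulate(values, operator.add))

# Exclusive scan: start from 0 and drop the final total.
exclusive = list(accumulate(values, operator.add, initial=0))[:-1]


def seeded_inclusive_scan(data, init, op):
    """Inclusive scan whose running value starts from `init` (hypothetical helper)."""
    # accumulate() emits `init` first; dropping it leaves one output per input.
    return list(accumulate(data, op, initial=init))[1:]


data = [-5, 0, 2, -3, 2, 4, 0, -1, 2, 8]  # the array from the question
rolling_max = seeded_inclusive_scan(data, 1, max)

print(inclusive)
print(exclusive)
print(rolling_max)
```

Seen this way, the seeded result is the same as an ordinary inclusive scan over the data with its first element replaced by max(1, data[0]).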

Octave Boxplot Axis

I would like to create multiple boxplots in one graph using Octave. I am trying to set the x-axis labels so that each label corresponds to one data set.
Here is my code
x = [1, 2, 4];
y1 = [6, 2, 3];
y2 = [1, 7, 3];
y3 = [1, 9, 2];
boxplot ({y1,y2,y3});
set(gca,'XTickLabel',x);
refresh;
but the result looks strange. The axis appears three times.
I want the x-axis to show 1 for data y1, 2 for data y2 and 4 for data y3.
I could not find in the Octave documentation how to set the axis this way. I found that Matlab can do it :(
Please help me to solve this problem.
Before set(gca,'XTickLabel',x); you have to add set(gca, 'xtick', [1:3]);. This makes sure that each (and only each) box in the plot is assigned an x-axis number before these numbers are overridden by manual labels.
Here's the full code:
x = [1, 2, 4];
y1 = [6, 2, 3];
y2 = [1, 7, 3];
y3 = [1, 9, 2];
boxplot ({y1,y2,y3});
set(gca, 'xtick', [1:3]);
set(gca,'XTickLabel',x);
refresh;

Efficient way to get bit in binary expansion of [0,1] real in Mathematica?

As is well known, any real in [0,1] can be written as a binary expansion in base 1/2:
x = b1 * 1/2^1 + b2 * 1/2^2 + ...
I would like an efficient way to get bi for a given x and index i, and I don't think there's any built-in way to do that in Mathematica. IntegerDigits and RealDigits don't seem to be able to help, and none of the related functions are pertinent.
The obvious solution is to do the manual conversion, but I was hoping to avoid that. Am I missing something?
EDIT: for future reference, what I was looking for can be done this way,
BinaryExpansionBit[p_, j_] := RealDigits[p, 2, 1, -j][[1, 1]]
where
BinaryExpansionBit[x, i]
gives the bi I was talking about.
I don't see what's wrong with RealDigits.
rd=RealDigits[0.1,2]
gives a nice binary expansion:
(* out:
{{1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0,
0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1,
1, 0, 0, 1, 1, 0, 1, 0}, -3}
*)
testing:
rd[[1]].Table[1/2^(n - rd[[2]]), {n, Length[rd[[1]]]}]
(* out: 3602879701896397 / 36028797018963968, which is 0.1*)
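The second element of RealDigits' output tells you the position of the first digit relative to the binary point. Here it is -3, so the first digit in the list is b4, and in general the n-th digit has weight 1/2^(n - rd[[2]]), as in the test above. So, for a real r with 0<r<1, your bi = rd[[1, i + rd[[2]]]] (and bi = 0 when i + rd[[2]] < 1).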
It depends on what you mean by "efficient". Mathematica can easily convert to binary, as this Wolfram Alpha example shows.
Otherwise what you are looking for is the parity of the integer part of x * 2^i.
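The parity idea can be sketched in a few lines of Python. The function name binary_expansion_bit is made up for illustration. Fraction is used so that scaling by 2^i is exact for the stored floating-point value, which is itself only an approximation of a decimal such as 0.1.

```python
from fractions import Fraction
import math


def binary_expansion_bit(x, i):
    """Return b_i, the coefficient of 1/2^i, for 0 <= x < 1 and i >= 1."""
    scaled = Fraction(x) * 2**i    # exact: Fraction(x) is the float's exact value
    return math.floor(scaled) % 2  # parity of the integer part of x * 2^i


print([binary_expansion_bit(0.1, i) for i in range(1, 9)])
```

For 0.1 the first three bits are 0 and the fourth is 1, which agrees with the exponent of -3 in the RealDigits output above.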