It must be something simple, but I can't find an answer. I'm writing a program that has to calculate the states of cellular automata, and to get a feeling for how CUDA works I first tried to write a very simple program. It takes a matrix, and every thread has to increment the value in its own cell and in the cells directly above and below it. So, if I give it the following matrix:
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
I expect to get the following result:
[2 2 2 2 2 2 2]
[3 3 3 3 3 3 3]
[3 3 3 3 3 3 3]
[3 3 3 3 3 3 3]
[3 3 3 3 3 3 3]
[3 3 3 3 3 3 3]
[2 2 2 2 2 2 2]
The first row has values of 2, because it doesn't have a row above which could increment values of first row one more time. And in a similar manner the last row has values of 2.
But I'm getting a matrix which looks like this:
[2 2 2 2 2 2 2]
[3 3 3 3 3 3 3]
[3 3 3 3 3 3 3]
[3 3 3 3 2 2 2]
[2 2 2 2 2 2 2]
[2 2 2 2 3 3 3]
[2 2 2 2 2 2 2]
And I can't understand why there are values of 2 in the 4th, 5th and 6th rows; they should be 3, not 2.
Here goes my code:
import numpy
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule
w = 7
mod = SourceModule("""
__global__ void diffusion( int* result, int width, int height) {
    int xIndex = blockDim.x * blockIdx.x + threadIdx.x;
    int yIndex = blockDim.y * blockIdx.y + threadIdx.y;
    int flatIndex = xIndex + width * yIndex;
    int topIndex = xIndex + width * (yIndex - 1);
    int bottomIndex = xIndex + width * (yIndex + 1);
    int inc = 1;
    result[flatIndex] += inc;
    result[bottomIndex] += inc;
    result[topIndex] += inc;
}
""")
diff_func = mod.get_function("diffusion")
def diffusion(res):
    height, width = numpy.int32(len(res)), numpy.int32(len(res[0]))
    diff_func(
        cuda.InOut(res),
        width,
        height,
        block=(w,w,1)
    )
def run(res, step):
    diffusion(res)
    print res
res = numpy.array([[0 \
    for _ in xrange(0, w)]\
    for _ in xrange(0, w)], dtype='int32')
run(res, 0)
One more interesting thing: if I comment one of the following lines:
result[bottomIndex] += inc;
result[topIndex] += inc;
Everything works as expected and there aren't any unexpected values. It looks like in some cases CUDA can't work with values of three adjacent cells in one thread.
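You have what is known as a memory race: multiple independent threads are attempting to update the same value in memory at the same time. Each result[...] += inc is a read, an add and a write, so when two threads hit the same cell, one of the increments can be lost. The CUDA memory model doesn't define what happens when two threads try to update the same memory location simultaneously.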
The solution is either to use atomic memory operations (see the CUDA programming guide for more information), or a different approach for updating adjacent cells (for example, colour the grid and update like coloured cells on separate passes through the grid).
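The colouring idea can be tried out without a GPU. The following is only a sketch in plain Python, not CUDA code: scatter_pass, racy_update and coloured_update are hypothetical names, the "threads" are simulated by reading every target cell before writing any of them (the worst-case interleaving), and colouring rows by their index modulo 3 is my assumption about a colouring that suits this particular stencil. The bounds check is also an addition that the kernel in the question does not have.

```python
def scatter_pass(grid, rows):
    """Simulate one kernel launch covering the given rows.

    Every simulated thread first reads the cells it will update and only
    then writes them back, which is the worst-case interleaving for a race.
    """
    height, width = len(grid), len(grid[0])
    pending = []
    for y in rows:
        for x in range(width):
            for target in (y - 1, y, y + 1):
                if 0 <= target < height:  # bounds check, not in the original kernel
                    pending.append((target, x, grid[target][x] + 1))
    for target, x, value in pending:
        grid[target][x] = value  # a later write overwrites an earlier one


def racy_update(height, width):
    """All rows in a single pass: increments to shared cells get lost."""
    grid = [[0] * width for _ in range(height)]
    scatter_pass(grid, range(height))
    return grid


def coloured_update(height, width):
    """Three passes; rows of one colour are 3 apart, so targets never overlap."""
    grid = [[0] * width for _ in range(height)]
    for colour in range(3):
        scatter_pass(grid, range(colour, height, 3))
    return grid


racy = racy_update(7, 7)
coloured = coloured_update(7, 7)
for row in coloured:
    print(row)
```

On a real device each pass would be a separate kernel launch (or be separated by a grid-wide synchronisation), which is the price paid for avoiding atomic operations.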
Related
I am confused by the very last figure (figure 20) here:
https://towardsdatascience.com/a-comprehensible-explanation-of-the-dimensions-in-cnns-841dba49df5e
What I understood is that the output dimension of a convolution layer is calculated by
[(INPUT - KERNEL + 2xPADDING) / STRIDE] + 1
In the first example the input is 1 x 28 x 28, the kernel is 5 x 5 with valid padding (which, as far as I know, essentially means no padding at all), and the stride is 1.
This would mean: [(28 - 5 + 0) / 1] + 1 = 24, so essentially 1 x 24 x 24 but instead the output is 16 x 13 x 13 - how can this be?
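The arithmetic above can be checked with a small helper. This is only a sketch: conv_output_size is a hypothetical name, and reading the square brackets in the formula as a floor is my assumption. It confirms the calculation for one spatial dimension; it does not explain the numbers shown in the figure.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Return floor((input - kernel + 2 * padding) / stride) + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


# 28 x 28 input, 5 x 5 kernel, valid padding (0), stride 1
side = conv_output_size(28, 5, padding=0, stride=1)
print(side)
```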
Let's say I have a matrix with n columns and n different functions.
Is it possible to apply the i-th function to each element of the i-th column efficiently, that is, without using a loop?
For example, for the following variables:
funs = @(x) [x, cos(x), x.^2]
A = [1 0 1
2 0 2
3 0 3
4 0 4] ;
I would like to obtain the following result:
B = [1 1 1
2 1 4
3 1 9
4 1 16] ;
without looping through columns...
I know how to find the number of triangles in an adjacency matrix.
tri = trace(A^3) / 6
But I need to find the nodes of each triangle so that I can then look up the values of its edges in the adjacency matrix, since it's a signed graph. Is there an existing function that does that?
Taking the power of the adjacency matrix loses information about the intermediate nodes. Instead of a 2-dimensional matrix, we need 3 dimensions.
Given a graph:
and its adjacency matrix:
A =
0 0 0 0 1 1 0 1 0 0
0 0 0 1 0 1 0 0 0 0
0 0 0 1 0 0 0 1 0 1
0 1 1 0 1 0 1 0 0 0
1 0 0 1 0 0 1 0 0 0
1 1 0 0 0 0 0 1 1 0
0 0 0 1 1 0 0 0 1 0
1 0 1 0 0 1 0 0 0 0
0 0 0 0 0 1 1 0 0 0
0 0 1 0 0 0 0 0 0 0
Compute the 3d matrix T such that T(i,j,k) == 1 iff there is a path in the graph i=>j=>k=>i.
T = and(A, permute(A, [3 1 2]))
This is the equivalent of squaring the adjacency matrix, but keeping the path information. The and function is used here instead of multiplication in case A is a weighted adjacency matrix. If you sum along the 2nd dimension, you'll get A^2:
>> isequal(squeeze(sum(T,2)), A^2)
ans = 1
Now that we've got the paths of length 2, we just need to filter so we keep only the paths that return to their starting points.
T = and(T, permute(A.', [1 3 2])); % Transpose A in case graph is directed
Now, if T(i,j,k) == 1, then there is a triangle starting at node i, through nodes j and k and returning to node i. If you want to find all such paths:
[M,N,P] = ind2sub(size(T), find(T));
P = [M,N,P];
P will be a list of all triangular paths:
P =
8 6 1
6 8 1
7 5 4
5 7 4
7 4 5
4 7 5
8 1 6
1 8 6
5 4 7
4 5 7
6 1 8
1 6 8
In this case we get 12 paths. In an undirected graph each triangle appears 6 times: once for each of its 3 starting nodes, times 2 directions. The total matches the result of trace:
>> trace(A^3)
ans = 12
If you want to remove the duplicates, the simplest way for triangles is to simply sort the vertex ordering and then take the unique rows of the list. This works for triangles only because all permutations of the nodes in the cycle are present. For longer cycles, this will not work.
P = unique(sort(P, 2), 'rows');
P =
1 6 8
4 5 7
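For readers without MATLAB, the same idea can be sketched in plain Python. This is an illustration rather than a translation of the vectorised code: it loops over all (i, j, k) instead of building the 3-d array T, and the variable names paths and triangles are my own. Node numbers are shifted to 1-based so they can be compared with the lists above.

```python
# Adjacency matrix copied from the example above (10 nodes, undirected).
A = [
    [0, 0, 0, 0, 1, 1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
]
n = len(A)

# T(i,j,k) is true iff i -> j -> k -> i is a closed path of length 3.
paths = [
    (i + 1, j + 1, k + 1)
    for i in range(n)
    for j in range(n)
    for k in range(n)
    if A[i][j] and A[j][k] and A[k][i]
]

# Sort the nodes of each path and keep the unique triples.
triangles = sorted({tuple(sorted(p)) for p in paths})

print(len(paths))
print(triangles)
```

The triple loop costs O(n^3) time but no extra memory, whereas the array T needs n^3 entries; which trade-off is better depends on the size of the graph.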
Here is a solution using matrix multiplication:
C = (A * A.') & A;
[x, y] = find(tril(C));
n = numel(x);
D = sparse([x; y], [1:n 1:n].', 1, size(A,1), n);
[X, ~, V] = find(C * D);
tri = [x y X(V == 2)]
tri = unique(sort(tri, 2), 'rows');
First we need to define triangle nodes. Two nodes are triangle nodes if they are neighbors of each other and also have a common neighbor.
We use this definition to compute an adjacency matrix C that contains only the edges between triangle nodes; all other edges are removed.
The expression A * A.' selects pairs of nodes that have common neighbors, and & A requires that those nodes are also neighbors of each other.
Now we can use [x, y] = find(tril(C)); to extract the first and second nodes of each triangle as x and y respectively.
For the third node we need to find a node that has both x and y as its neighbors. As before, we can use the boolean matrix multiplication trick to speed up the computation.
Finally, the result tri contains duplicates, which are removed using sort and unique.
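The definition used above (two adjacent nodes plus a common neighbor) can also be written with sets. The sketch below is a set-based variant in plain Python, not the matrix-multiplication code from this answer; the edge list is read off the adjacency matrix of the example graph, and neighbours and triangles are names I chose for illustration.

```python
from collections import defaultdict

# Undirected edges of the example graph, using 1-based node numbers.
edges = [
    (1, 5), (1, 6), (1, 8), (2, 4), (2, 6), (3, 4), (3, 8),
    (3, 10), (4, 5), (4, 7), (5, 7), (6, 8), (6, 9), (7, 9),
]

# Build the neighbour set of every node.
neighbours = defaultdict(set)
for u, v in edges:
    neighbours[u].add(v)
    neighbours[v].add(u)

# Every common neighbour w of an edge (u, v) closes a triangle u-v-w.
triangles = set()
for u, v in edges:
    for w in neighbours[u] & neighbours[v]:
        triangles.add(tuple(sorted((u, v, w))))

triangles = sorted(triangles)
print(triangles)
```

Because every common neighbour is collected separately, this version also copes with an edge that belongs to more than one triangle.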
What is the best method of detecting and dropping duplicate rows from an array in Julia?
x = Integer.(round.(10 .* rand(1000,4)))
# In R I would apply the duplicated function.
x = x[duplicated(x),:]
unique is what you are looking for (although it does not cover the detection part of the question):
julia> x = Integer.(round.(10 .* rand(1000,4)))
1000×4 Array{Int64,2}:
7 3 10 1
7 4 8 9
7 7 3 0
3 4 8 2
⋮
julia> unique(x, 1)
973×4 Array{Int64,2}:
7 3 10 1
7 4 8 9
7 7 3 0
3 4 8 2
⋮
As for the detection part, a dirty fix would be editing this line:
@nref $N A d->d == dim ? sort!(uniquerows) : (indices(A, d))
to:
(@nref $N A d->d == dim ? sort!(uniquerows) : (indices(A, d))), uniquerows
Alternatively, you could define your own unique2 with the above-mentioned changes:
using Base.Cartesian
import Base.Prehashed
@generated function unique2(A::AbstractArray{T,N}, dim::Int) where {T,N}
......
end
julia> y, idx = unique2(x, 1)
julia> y
960×4 Array{Int64,2}:
8 3 1 5
8 3 1 6
1 1 0 1
8 10 1 10
9 1 8 7
⋮
julia> setdiff(1:1000, idx)
40-element Array{Int64,1}:
99
120
132
140
216
227
⋮
The benchmark on my machine is:
x = rand(1:10,1000,4) # 48 dups
@btime unique2($x, 1);
124.342 μs (31 allocations: 145.97 KiB)
@btime duplicated($x);
407.809 μs (9325 allocations: 394.78 KiB)
x = rand(1:4,1000,4) # 751 dups
@btime unique2($x, 1);
66.062 μs (25 allocations: 50.30 KiB)
@btime duplicated($x);
222.337 μs (4851 allocations: 237.88 KiB)
The result shows that the convoluted-metaprogramming-hashtable way in Base benefits a lot from lower memory allocation.
In Julia v1.4 and above, you need to write unique(a, dims=1), where a is your N-by-2 array:
julia> a=[2 2 ; 2 2; 1 2; 3 1]
4×2 Array{Int64,2}:
2 2
2 2
1 2
3 1
julia> unique(a,dims=1)
3×2 Array{Int64,2}:
2 2
1 2
3 1
You can also go with:
duplicated(x) = foldl(
(d,y)->(x[y,:] in d[1] ? (d[1],push!(d[2],y)) : (push!(d[1],x[y,:]),d[2])),
(Set(), Vector{Int}()),
1:size(x,1))[2]
This collects a set of seen rows, and outputs the indices of those already seen. This is essentially the minimal effort needed to get the result, so it should be fast.
julia> x = rand(1:2,5,2)
5×2 Array{Int64,2}:
2 1
1 2
1 2
1 1
1 1
julia> duplicated(x)
2-element Array{Int64,1}:
3
5
julia> x[duplicated(x),:]
2×2 Array{Int64,2}:
1 2
1 1
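The seen-set idea is not specific to Julia. Here is a minimal Python sketch of the same logic, using the 5×2 example above as a list of rows; the function mirrors the Julia duplicated but is my own illustration, and the indices it returns are 0-based, so Julia's 3 and 5 become 2 and 4.

```python
def duplicated(rows):
    """Return the indices of rows that already occurred earlier."""
    seen = set()
    repeats = []
    for index, row in enumerate(rows):
        key = tuple(row)  # lists are not hashable, tuples are
        if key in seen:
            repeats.append(index)
        else:
            seen.add(key)
    return repeats


x = [[2, 1], [1, 2], [1, 2], [1, 1], [1, 1]]
dup_idx = duplicated(x)

# Dropping the detected rows leaves one copy of every distinct row.
dup_set = set(dup_idx)
unique_rows = [row for i, row in enumerate(x) if i not in dup_set]

print(dup_idx)
print(unique_rows)
```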
In Octave, I can do
octave:1> A = [1 2; 3 4]
A =
1 2
3 4
octave:2> A(A>1) -= 1
A =
1 1
2 3
but in Julia, the equivalent syntax does not work.
julia> A = [1 2; 3 4]
2x2 Array{Int64,2}:
1 2
3 4
julia> A[A>1] -= 1
ERROR: `isless` has no method matching isless(::Int64, ::Array{Int64,2})
in > at operators.jl:33
How do you conditionally assign values to certain array or matrix elements in Julia?
Your problem isn't with the assignment, per se, it's that A > 1 itself doesn't work. You can use the elementwise A .> 1 instead:
julia> A = [1 2; 3 4];
julia> A .> 1
2×2 BitArray{2}:
false true
true true
julia> A[A .> 1] .-= 1000;
julia> A
2×2 Array{Int64,2}:
1 -998
-997 -996
Update:
Note that in modern Julia (>= 0.7), we need to use . to say that we want to broadcast the action (here, subtracting the scalar 1000) to match the size of the filtered target on the left. (At the time this question was originally asked, we needed the dot in A .> 1 but not in .-=.)
In Julia v1.0 you can use the replace! function instead of logical indexing, with considerable speedups:
julia> B = rand(0:20, 8, 2);
julia> @btime (A[A .> 10] .= 10) setup=(A=copy($B))
595.784 ns (11 allocations: 4.61 KiB)
julia> @btime replace!(x -> x>10 ? 10 : x, A) setup=(A=copy($B))
13.530 ns (0 allocations: 0 bytes)
For larger matrices, the difference hovers around 10x speedup.
The reason for the speedup is that the logical indexing solution relies on creating an intermediate array, while replace! avoids this.
A slightly terser way of writing it is
replace!(x -> min(x, 10), A)
There doesn't seem to be any speedup using min, though.
And here's another solution that is almost as fast:
A .= min.(A, 10)
and that also avoids allocations.
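The difference between the two strategies compared above can be pictured in plain Python. This is only a conceptual sketch with hypothetical function names; it says nothing about Julia's actual implementation or timings. The first version builds a mask before writing, like A[A .> 10] .= 10, while the second makes a single pass with no temporary, like replace! or A .= min.(A, 10).

```python
def clamp_with_mask(values, limit):
    """Two steps: build a boolean mask, then assign through it."""
    mask = [v > limit for v in values]  # intermediate list, one flag per element
    for i, flagged in enumerate(mask):
        if flagged:
            values[i] = limit
    return values


def clamp_in_place(values, limit):
    """One pass over the data, no intermediate mask."""
    for i, v in enumerate(values):
        values[i] = min(v, limit)
    return values


a = clamp_with_mask([3, 12, 10, 20, 0], 10)
b = clamp_in_place([3, 12, 10, 20, 0], 10)
print(a, b)
```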
To make it work in Julia 1.0, one needs to change = to .=. In other words:
julia> a = [1 2 3 4]
julia> a[a .> 1] .= 1
julia> a
1×4 Array{Int64,2}:
1 1 1 1
Otherwise you will get something like
ERROR: MethodError: no method matching setindex_shape_check(::Int64, ::Int64)