In my script I am creating an array whose number of dimensions is not known in advance. I want to get a certain sub-matrix from the array. Typically, with an array of a known number of dimensions, I would just write array(i1, i2, i3, ... iN, :, :). However, the indices are contained in an array I = [i1, i2, i3, ... iN]. How do I accomplish this?
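You can convert the index vector to a cell array first and expand it into a comma-separated list:
C = num2cell(idx);
A(C{:}, :, :)
Octave also accepts the one-line form A(num2cell(idx){:}, :, :), but MATLAB does not allow indexing directly into the result of a function call, so the temporary variable is needed there. There is probably a more elegant solution than this.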
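For readers more used to nested containers than to MATLAB's comma-separated lists, the same idea (apply a run-time list of leading indices and keep the trailing dimensions whole) can be sketched in Python on nested lists. This is only an illustration: `take_leading` is a hypothetical helper name, and Python indices are 0-based where MATLAB's are 1-based.

```python
from functools import reduce
from operator import getitem

def take_leading(array, idx):
    """Apply each index in idx to successive leading dimensions of a nested list."""
    return reduce(getitem, idx, array)

# A 2 x 3 x 2 x 2 nested list whose entries encode their own position.
A = [[[[1000 * a + 100 * b + 10 * c + d for d in range(2)]
       for c in range(2)]
      for b in range(3)]
     for a in range(2)]

idx = [1, 2]                 # leading indices, known only at run time
sub = take_leading(A, idx)   # same as A[1][2]: the trailing 2 x 2 sub-matrix
print(sub)
```

Because the index list is consumed one dimension at a time, the same call works however many leading dimensions there are.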
I am trying to compute the variance of a 2D gpu_array. A reduction kernel sounds like a good idea:
http://documen.tician.de/pycuda/array.html
However, this documentation implies that reduction kernels just reduce 2 arrays into 1 array. How do I reduce a single 2D array into a single value?
I guess the first step is to define variance for this case. In MATLAB, the variance function on a 2D array returns a vector (1D array) of values. But it sounds like you want a single-valued variance, so as others have already suggested, probably the first thing to do is to treat the 2D array as 1D. In C this requires no special steps: if you have a pointer to the array, you can index into it as if it were a 1D array. I'm assuming you don't need help on how to handle a 2D array with a 1D index.
Now if it's the 1D variance you're after, I'm assuming you want a function like variance(x) = sum((x[i]-mean(x))^2)/N, where the sum is over all i and N is the number of elements (based on my reading of the Wikipedia article). We can break this down into 3 steps:
1. compute the mean (this is a classical reduction - one value is produced for the data set - sum all elements then divide by the number of elements)
2. compute the value (x[i]-mean)^2 for all i - this is an element-by-element operation producing an output data set equal in size (number of elements) to the input data set
3. compute the sum of the elements produced in step 2 and divide by the number of elements - this is another classical reduction, as one value is produced for the entire data set.
Both steps 1 and 3 are classical reductions which are summing all elements of an array. Rather than cover that ground here, I'll point you to Mark Harris' excellent treatment of the topic as well as some CUDA sample code. For step 2, I'll bet you could figure out the kernel code on your own, but it would look something like this:
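__global__ void var(const float *input, float *output, unsigned N, float mean){
    // one thread per element; the guard keeps surplus threads from writing out of bounds
    unsigned idx=threadIdx.x+(blockDim.x*blockIdx.x);
    if (idx < N) {
        // square with a plain multiply: the fast __powf intrinsic goes through a
        // logarithm and can return NaN when the base (input[idx]-mean) is negative
        float diff = input[idx]-mean;
        output[idx] = diff*diff;
    }
}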
Note that you will probably want to combine the reductions and the above code into a single kernel.
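As a host-side reference for checking a GPU implementation, the three steps can be sketched in plain Python. This assumes the population variance (dividing by the number of elements N) and a list-of-rows input; `flatten` and `variance_three_steps` are illustrative names, not part of PyCUDA.

```python
import statistics

def flatten(rows):
    """Treat a 2D array (list of rows) as one 1D sequence."""
    return [value for row in rows for value in row]

def variance_three_steps(rows):
    x = flatten(rows)
    n = len(x)
    mean = sum(x) / n                         # step 1: sum reduction, then divide
    squared = [(v - mean) ** 2 for v in x]    # step 2: element-wise map
    return sum(squared) / n                   # step 3: second sum reduction, divided by n

data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
result = variance_three_steps(data)
print(result, statistics.pvariance(flatten(data)))
```

Comparing against `statistics.pvariance` on small inputs is a cheap way to validate the kernel output before fusing the steps.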
I want to perform a sort_by_key where I have a single key-sequence
and multiple value sequences.
One usually performs this with
sort_by_key(
    key,
    key + N,
    make_zip_iterator(
        make_tuple(x1, x2, ...)
    )
)
However, I want to perform a sort with more than 10 sequences, each of length N. Thrust does not support
tuples with more than 10 elements. So is there a way around this?
Of course one can keep a separate copy of the key vector and perform
sorts on bunches of 10 sequences. But I would like to do everything in a single call.
thrust::tuple is hardcoded to hold at most 10 elements, so there isn't a direct way to form a zip_iterator from more than ten individual iterators, and therefore no way of sorting more than 10 distinct iterators by key in a single fused operation (and implicitly no way of passing more than 10 iterators into a user functor either).
If you really can't think of a useful way to combine some of the individual vectors into a single iterator (for example, forming a vector of tuple values), then one alternative might be to use permutation iterators. Create an index array from a counting iterator and sort it by key, something like:
device_vector<int> indices(N);
copy(make_counting_iterator(0), make_counting_iterator(N), indices.begin());
sort_by_key(key, key+N, indices.begin());
indices now holds the ordered indices into the vectors you would otherwise have sorted. You can then create a permutation iterator which can be used to "gather" the input data by your key as part of subsequent algorithm calls. You can make as many permutation iterators as needed, and they can be permutations of zip iterators, providing different "views" of the 12 input iterators as you need them in subsequent code.
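The index-sort idea is independent of Thrust, so a small Python sketch may make it concrete: sort the keys once to obtain a permutation, then reuse that permutation for any number of value sequences. The helper names `sort_permutation` and `gather` are illustrative only, and the data is made up.

```python
def sort_permutation(keys):
    """Indices that would sort keys (stable), like sorting a counting sequence by key."""
    return sorted(range(len(keys)), key=keys.__getitem__)

def gather(index_map, source):
    """result[i] = source[index_map[i]]"""
    return [source[i] for i in index_map]

keys = [30, 10, 20]
x1 = ['c', 'a', 'b']
x2 = [3.0, 1.0, 2.0]

order = sort_permutation(keys)      # computed once
sorted_keys = gather(order, keys)
sorted_x1 = gather(order, x1)       # reused for every value sequence
sorted_x2 = gather(order, x2)
print(order, sorted_keys, sorted_x1, sorted_x2)
```

The number of value sequences never enters the sort itself, which is why the tuple size limit stops mattering.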
Actually, you can use a simple "gather" operation. Perform only one thrust::sort_by_key operation on the keys together with an index sequence, then apply thrust::gather to each data vector. Each value is copied to its sorted location. (thrust::scatter with the same index map would apply the inverse permutation instead.)
thrust::sequence(indices.begin(), indices.end());
thrust::sort_by_key(keyvals.begin(), keyvals.end(), indices.begin());
// now indices[i] holds the original location of the i-th sorted key
for ( ... ) {  // loop over the data vectors
    thrust::gather(indices.begin(), indices.end(), data.begin(), sorteddata.begin());
}
Gather and scatter operations are quite powerful and open up many possibilities.
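One detail worth checking is the direction of the index map. As a sketch of the usual definitions (gather reads `source[map[i]]`, scatter writes `result[map[i]]`), the Python below shows that the permutation produced by sorting an index sequence by key sorts the data when gathered, while scattering with the same map applies the inverse permutation. The function names mirror the Thrust algorithms, but this is plain Python, not the Thrust API.

```python
def gather(index_map, source):
    """result[i] = source[index_map[i]]"""
    return [source[i] for i in index_map]

def scatter(source, index_map):
    """result[index_map[i]] = source[i]"""
    result = [None] * len(source)
    for value, target in zip(source, index_map):
        result[target] = value
    return result

keys = [30, 10, 20]
data = ['c', 'a', 'b']                       # data[i] belongs to keys[i]
order = sorted(range(len(keys)), key=keys.__getitem__)

gathered = gather(order, data)               # data in key order
scattered = scatter(data, order)             # inverse permutation, not key order
restored = scatter(gathered, order)          # scatter undoes the gather
print(gathered, scattered, restored)
```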
Most of the programming books I have ever read have the following line:
"X language does not support true multidimensional arrays, but you can simulate (approximate) them with arrays of arrays."
Since most of my experience has been with C-based languages, i.e. C++, Java, JavaScript, PHP, etc., I'm not sure what a "true" multidimensional array is.
What is the definition of a true multidimensional array and what languages support it?
Also, please show an example of a true multidimensional array in code if possible.
C# supports both true multi-dimensional arrays, and "jagged" arrays (array of arrays) which can be a replacement.
// jagged array
string[][] jagged = new string[12][]; // rows are allocated separately, e.g. jagged[0] = new string[7];
// multidimensional array
string[,] multi = new string[12,7];
Jagged arrays are generally considered better since they can do everything a multi-dimensional array can do and more. In a jagged array each sub-array can be a different size, whereas you cannot do that in a multi-dimensional array. There is even a Code Analysis rule to this effect (http://msdn.microsoft.com/en-us/library/ms182277.aspx).
Java uses arrays of arrays too:
int[][] a2 = new int[10][5];
Here's an interesting use of it that I've found:
String[][] Data = new String[2][3];
//Assign the values, do it either dynamically or statically
//For first row
Data[0][0] = "S"; //lastname
Data[0][1] = "Pradeep"; //firstname
Data[0][2] = "Kolkata"; //location
//Second row
Data[1][0] = "Bhimani"; //lastname
Data[1][1] = "Shabbir"; //firstname
Data[1][2] = "Kolkata"; //location
//Add as many rows you want
//printing
System.out.print("Lastname\tFirstname\tLocation\n");
for(int i=0;i<2;i++)
{
for(int j=0;j<3;j++)
{
System.out.print(Data[i][j]+"\t");
}
//move to new line
System.out.print("\n");
}
Without going through the reams of literature on the Sun and Microsoft sites, this is what I remember from my C days. Hope this helps.
To make it simple, if we just think in 2 dimensions, arrays can be represented either as a two-dimensional array or as an array of pointers. In code this amounts to
int x[15][20];
int *y[15];
In this example, x[5][6] and y[5][6] are both syntactically valid and end up referring to a single int.
That being said, x is a true two-dimensional array: once you create it, 300 locations (each able to hold an int) have been set aside, and you can use the well-known subscript convention to access this rectangular array (15 rows by 20 columns), where x[row][col] is found by calculating (20 * row) + col.
However in case of y, while the structure is being defined, only 15 pointers are allocated, but not initialized. (Initialization will need to be done explicitly)
There are advantages and disadvantages of this approach (pointer array or "array of arrays" or jagged array as it is called):
Advantage:
The rows of this array can be of different lengths, i.e. each element of y does not need to point to a twenty-element row; one element may point to 2 elements, the second to 3 elements, the third to zero elements, and so on.
Disadvantage:
Even in the best case, where each element of y does point to a twenty-element array, there will be 300 int locations set aside plus fifteen cells for the pointers, which is extra storage.
For a current example, the C# examples given above (in one of the previous posts) should suffice.
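The difference between the two layouts can be sketched in Python, assuming the 15 x 20 shape from the C declarations above. The flat list stands in for the contiguous block behind x, and `offset` is a hypothetical helper that spells out the (20 * row) + col calculation; the list of lists stands in for the pointer array y with rows of different lengths.

```python
ROWS, COLS = 15, 20

# "True" 2D array: one contiguous block, addressed through a computed offset.
flat = [0] * (ROWS * COLS)

def offset(row, col):
    """Row-major position of element (row, col) inside the contiguous block."""
    return COLS * row + col

flat[offset(5, 6)] = 42        # plays the role of x[5][6] = 42

# Array of row references: every row is a separate object and may differ in length.
jagged = [[0] * 2, [0] * 3, []]
jagged[1][2] = 7               # follows a reference first, then indexes within the row

row_lengths = [len(row) for row in jagged]
print(len(flat), offset(5, 6), row_lengths)
```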
Common Lisp supports both types of arrays.
The multidimensional array is called Array, while the "one-dimensional" one is called Vector.
Just like the topic says: can one access a CUDA texture using integer coordinates?
ex.
tex2D(myTex, 1, 1);
I'd like to store float values in a texture and use it as my framebuffer.
I will then pass it to OpenGL to render on the screen.
Is this addressing possible? I don't want to interpolate between pixels. I want value from exactly specified point.
Note: there isn't really any interpolation going on when you use the 0.5 offset notation for multi-dimensional textures (the texel centers sit at half-integer coordinates, starting at (0.5, 0.5)). If you're really worried, make sure the filter mode is point sampling (cudaFilterModePoint, i.e. nearest texel) rather than linear filtering (cudaFilterModeLinear).
If you use 1D textures instead (when the underlying data is 2D), you may lose performance due to lack of data locality in the other dimension.
If you want to use the texture cache without using any of the texture-specific operations such as interpolation, you can use tex1Dfetch(). This lets you index with integers.
The size limit is 2^27 elements, so you will be able to access 512 MB with floats, or 1 GB with int2 [which can also be used to retrieve doubles via __hiloint2double()]. Larger data can be accessed by mapping multiple textures onto it that together cover the data.
You will have to map any multi-dimensional array accesses to the one-dimensional array supported by tex1Dfetch(). I have always used simple C macros for that.
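The arithmetic behind those limits, and one possible way of splitting a larger buffer across several textures, can be sketched in Python. The consecutive, non-overlapping windows used by `locate` are an assumption made for this sketch, and all function names are illustrative rather than CUDA API.

```python
MAX_ELEMENTS = 2 ** 27          # per-texture element limit quoted above

def bytes_addressable(element_size):
    """Bytes reachable through one 1D texture of elements of the given size."""
    return MAX_ELEMENTS * element_size

def flat_index(x, y, width):
    """Row-major mapping of a 2D coordinate onto the 1D index used for the fetch."""
    return y * width + x

def locate(global_index):
    """Split a global element index into (texture number, offset in that texture)."""
    return divmod(global_index, MAX_ELEMENTS)

float_mib = bytes_addressable(4) // 2 ** 20    # 4-byte float
int2_mib = bytes_addressable(8) // 2 ** 20     # 8-byte int2
print(float_mib, int2_mib, flat_index(1, 1, 640), locate(2 ** 27 + 5))
```

In C the `flat_index` mapping is typically what the macro mentioned above would expand to.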
If I have a 2d array such as
smallArray = [[1,0],[0,1]]
and I have a larger 2d array such as
largeArray = [[0,0,0,0],[0,0,0,0],[0,0,0,0],[0,0,0,0]]
What would be the most efficient way to "tile" the smaller array in the bigger one so that the bigger array would end up looking like
largeArray = [[1,0,1,0],[0,1,0,1],[1,0,1,0],[0,1,0,1]]
A complicated sequence of for loops?
In AS3, an array doesn't care what the types of its elements are, right? Why not just largeArray.push(smallArrayN)? And if efficiency is a consideration, you should probably be using vectors, as they are like arrays but typed and generally much faster.
Yes, it requires nested for loops, because every element of the large array has to be written with the matching element of the small array. Whenever each element of an array is updated, we use loops or nested loops.
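A minimal sketch of the nested-loop approach, written in Python because its list literals match the ones in the question: the modulo operator wraps each large-array coordinate back into the small array, so two loops are enough. `tile_into` is a hypothetical helper name; the same two loops translate directly to AS3.

```python
def tile_into(large, small):
    """Fill large in place by repeating small in both directions."""
    height, width = len(small), len(small[0])
    for r, row in enumerate(large):
        for c in range(len(row)):
            row[c] = small[r % height][c % width]   # wrap coordinates into small
    return large

small_array = [[1, 0], [0, 1]]
large_array = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
tile_into(large_array, small_array)
print(large_array)
```

This also works when the large array's dimensions are not multiples of the small array's, in which case the last tile is simply cut off.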