CUDA texture memory with floating point co-ordinates - cuda

Is it possible to assign a value to texture memory at a non-integer co-ordinate?
For example, assume we have a one-dimensional texture array. I understand that array elements are stored at integer co-ordinates, and that we can then READ values at fractional co-ordinates using linear interpolation.
My question is: does CUDA allow the programmer to WRITE values to fractional co-ordinates?
Thanks.

It is not possible to write to fractional coordinates. There would be nowhere for the hardware to store the new values. Even though you can read with linear interpolation, the values between which interpolation is being performed can only be stored at integer locations in memory.
One way to implement this might be to write a kernel that reads your initial array of values and creates a higher resolution array with interpolated values. Then, you write your new values in this new array at the integer locations that are closest to the ones you actually want to write to.
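The workaround described above can be sketched on the host in plain C++. This is only an illustration under stated assumptions: the names `upsampleLinear` and `nearestIndex` and the integer `factor` parameter are hypothetical, and on the GPU the loop body would become one kernel thread per output element.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Builds a higher-resolution copy of src with `factor` steps per original
// interval, filling the new samples by linear interpolation.
// (Hypothetical helper; on the GPU each j would be handled by one thread.)
std::vector<float> upsampleLinear(const std::vector<float>& src, int factor) {
    assert(!src.empty() && factor >= 1);
    const std::size_t n = src.size();
    std::vector<float> dst((n - 1) * factor + 1);
    for (std::size_t j = 0; j < dst.size(); ++j) {
        const std::size_t i0 = j / factor;              // left neighbour
        const std::size_t i1 = std::min(i0 + 1, n - 1); // right neighbour
        const float t = static_cast<float>(j % factor) / factor;
        dst[j] = src[i0] * (1.0f - t) + src[i1] * t;
    }
    return dst;
}

// Maps a fractional coordinate in the original array to the closest
// integer location in the upsampled array.
std::size_t nearestIndex(float coord, int factor) {
    assert(coord >= 0.0f && factor >= 1);
    return static_cast<std::size_t>(std::lround(coord * factor));
}
```

With `factor = 4`, a write aimed at coordinate 2.3 would land on index 9 of the new array, which corresponds to coordinate 2.25 in the original one.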

Related

Creating a 2D Array in MIPS with Single Point numbers

I am struggling to understand how to create a 2D array in MIPS that holds single-precision floating-point numbers.
I understand how to create a 2D array of integers, but I cannot figure out how to make it hold single-precision numbers instead.
Single-precision floating-point numbers are 4 bytes long, the same size as an integer word. An array, whether 1D, 2D, or 3D, and whether it holds characters/bytes, integers, or single-precision floats, simply uses memory: as much as is needed for all of its elements.
Memory doesn't care how you use the bytes; it is up to the program to use the array in a meaningful way. If you want to copy the array to another memory location, the equivalent of memcpy or memmove is appropriate. If you want to do floating-point arithmetic on elements of the array, then you would use floating-point instructions.
In short, a single-precision floating-point array needs nothing more than some memory. You might want to initialize the array, though if it is declared as a global array (e.g. using .space 400), all the bytes will be zero, which corresponds to the integer 0 as well as the floating-point value 0.0. Just like an integer array, the array should start on a 4-byte boundary. .space does not guarantee that, but putting .align 2 in front of it will.
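The same idea can be sketched in C++ rather than MIPS assembly. The row-major layout and the helper names `byteOffset`, `loadFloat`, and `storeFloat` are assumptions made for illustration; the original answer does not prescribe a layout. The sketch treats a raw byte buffer (the analogue of `.space 400`) as a 10x10 array of floats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::int32_t),
              "a float and a 32-bit integer word are both 4 bytes");

// Byte offset of element [row][col], assuming row-major storage
// with ncols elements per row (an assumption, not from the original).
std::size_t byteOffset(std::size_t row, std::size_t col, std::size_t ncols) {
    assert(col < ncols);
    return (row * ncols + col) * sizeof(float);
}

// Interprets the 4 bytes at `offset` as a float (like a float load).
float loadFloat(const unsigned char* base, std::size_t offset) {
    float value;
    std::memcpy(&value, base + offset, sizeof value);
    return value;
}

// Writes a float into the 4 bytes at `offset` (like a float store).
void storeFloat(unsigned char* base, std::size_t offset, float value) {
    std::memcpy(base + offset, &value, sizeof value);
}
```

The memory itself is untyped here as well: only the load and store helpers decide that the bytes are floats.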

Where does normalized texture memory start?

If I have a 200 size array in texture memory with linear interpolation enabled, to access the value of the first element I need to access value 0.5, not 0. Basically I need to access desiredValue+0.5. This ensures that the indexes cover [0-200] inside the image.
How does this work with normalized texture coordinates? Do 0 and 1 refer to the corners of the array, or to the first and last elements? To access the first element, would I need to use (0+0.5)/200?
According to the Texture Fetching section of the documentation, and specifically the figures there, [0-1] are the corners of the array. So to access a specific array element in normalized units, you do indeed need to use (desiredValue+0.5)/totalSize.
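A host-side C++ sketch of this rule follows. It emulates 1D linear filtering with normalized coordinates and clamped addressing in ordinary floating point; the real texture unit uses a low-precision fixed-point interpolation weight, so results on hardware can differ slightly. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Normalized coordinate of the centre of texel i in a texture of n texels.
float normalizedCoord(int i, int n) {
    assert(n > 0 && i >= 0 && i < n);
    return (i + 0.5f) / n;
}

// Emulates a linearly filtered 1D fetch at normalized coordinate x,
// clamping out-of-range texel indices to the edges.
float sampleLinearNormalized(const std::vector<float>& texels, float x) {
    const int n = static_cast<int>(texels.size());
    assert(n > 0);
    const float xb = x * n - 0.5f;      // shift so texel centres sit on integers
    const float base = std::floor(xb);
    const float alpha = xb - base;      // interpolation weight in [0, 1)
    const int i0 = std::max(0, std::min(static_cast<int>(base), n - 1));
    const int i1 = std::max(0, std::min(static_cast<int>(base) + 1, n - 1));
    return (1.0f - alpha) * texels[i0] + alpha * texels[i1];
}
```

Sampling at `normalizedCoord(i, n)` returns (up to rounding) the stored value of element i, while sampling at `i / n` lands halfway between two neighbouring elements.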

Allocating memory for multiple images in cuda to calculate their mean image

I want to calculate the mean image from a dataset of around 100 images. All the images are two-dimensional. Can I use the built-in cudaMalloc3D function, or is there another way to allocate the memory?
I often treat a multidimensional array as a 1D array in CUDA. Let's say you want to allocate a 3D array of size (NxMxK). With cudaMalloc, you can allocate a 1D array a of size (N*M*K). To access the element with indexes [i][j][k], you use a[i+j*N+k*N*M] (assuming 0-based indexing and column-major ordering).
This is also a way to index threads in multidimensional blocks (you can have 1D,2D or 3D blocks):
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#thread-hierarchy
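The flat indexing above, applied to the mean-image problem, might look like the following host-side C++ sketch. It assumes the K images are stored back to back in one buffer, each of size N x M; `flatIndex` and `meanImage` are hypothetical names. In a CUDA kernel, the two inner loops would typically be replaced by one thread per pixel (i, j), with each thread looping over k.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat offset of element [i][j][k] in an N x M x K array,
// 0-based and column-major as in the answer above.
std::size_t flatIndex(std::size_t i, std::size_t j, std::size_t k,
                      std::size_t N, std::size_t M) {
    assert(i < N && j < M);
    return i + j * N + k * N * M;
}

// Averages K images of size N x M stored contiguously in `stack`.
std::vector<float> meanImage(const std::vector<float>& stack,
                             std::size_t N, std::size_t M, std::size_t K) {
    assert(K > 0 && stack.size() == N * M * K);
    std::vector<float> mean(N * M, 0.0f);
    for (std::size_t k = 0; k < K; ++k)
        for (std::size_t j = 0; j < M; ++j)
            for (std::size_t i = 0; i < N; ++i)
                mean[i + j * N] += stack[flatIndex(i, j, k, N, M)];
    for (float& value : mean) value /= static_cast<float>(K);
    return mean;
}
```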

Cuda linear interpolation using textures

I have a curve as follows:
float points[] = {1, 4, 6, 9, 14, 25, 69};
float images[] = {0.3, 0.4, 0.7, 0.9, 1, 2.5, 5.3};
To evaluate, say, f(3), I would use linear interpolation between the points 1 and 4.
To evaluate, say, f(15), I would run a binary search on the points array, get the lower bound (25), interpolate over the interval [14, 25], and so on.
I have found that this method makes my device function very slow. I've heard I can use texture memory and tex1D to do this. Is that possible even if points[] is not uniform (i.e. not incremented by a constant step)?
Any ideas?
It looks like this problem can be broken into two parts:
1. Use the points array to convert the x value in f(x) to a floating-point index between 0 and 6 (requires a binary search on points[]).
2. Use that floating-point index to get a linearly interpolated value from the images array.
CUDA texture memory can make step 2 very fast. I am guessing, however, that most of the time in your kernel is spent on step 1, and I don't think texture memory can help you there.
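The two steps can be sketched in host-side C++ as follows. This is a minimal illustration, assuming points[] is strictly increasing and that queries outside its range are clamped to the end points; the function names are hypothetical. On the GPU, step 2 is the part a linearly filtered texture fetch could replace (with unnormalized coordinates the fetch would be made at index + 0.5).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Step 1: convert x to a fractional index into points[] by binary search.
// Assumes points[] is strictly increasing; out-of-range x is clamped.
float fractionalIndex(const float* points, std::size_t n, float x) {
    assert(n >= 2);
    if (x <= points[0]) return 0.0f;
    if (x >= points[n - 1]) return static_cast<float>(n - 1);
    const float* upper = std::upper_bound(points, points + n, x); // first > x
    const std::size_t hi = static_cast<std::size_t>(upper - points);
    const std::size_t lo = hi - 1;
    const float t = (x - points[lo]) / (points[hi] - points[lo]);
    return static_cast<float>(lo) + t;
}

// Step 2: linearly interpolate values[] at a fractional index.
float lerpAt(const float* values, std::size_t n, float index) {
    assert(n >= 1 && index >= 0.0f && index <= static_cast<float>(n - 1));
    const std::size_t lo = static_cast<std::size_t>(std::floor(index));
    const std::size_t hi = std::min(lo + 1, n - 1);
    const float t = index - static_cast<float>(lo);
    return values[lo] * (1.0f - t) + values[hi] * t;
}

// Both steps combined: evaluates the piecewise-linear curve at x.
float interpolate(const float* points, const float* images,
                  std::size_t n, float x) {
    return lerpAt(images, n, fractionalIndex(points, n, x));
}
```

For the arrays in the question, `interpolate(points, images, 7, 3.0f)` interpolates between 0.3 and 0.4, and `interpolate(points, images, 7, 15.0f)` interpolates between 1 and 2.5.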
If you aren't already taking advantage of shared memory, moving your arrays to shared memory will give you a much bigger speedup than using texture memory. There is 48 KB of shared memory on recent hardware, so if each of your arrays is smaller than 24 KB (6k float elements), both should fit in shared memory. Step 1 can benefit greatly from shared memory because it requires non-contiguous reads of points[], which are very slow in global memory.
If your arrays don't fit in shared memory, you should break them up into equally sized pieces of 6k elements each and assign each piece to a block. Have each block read through all of the points you are interpolating, and have it ignore any point that does not fall within the portion of the points[] array stored in its shared memory.

Can one index CUDA texture with integers

Just like the title says: can one access a CUDA texture using integer coordinates?
e.g.
tex2D(myTex, 1, 1);
I'd like to store float values in a texture and use it as my framebuffer.
I will then pass it to OpenGL to render on screen.
Is this kind of addressing possible? I don't want to interpolate between pixels. I want the value at exactly the specified point.
Note: no interpolation actually takes place when you use the 0.5 offset for multi-dimensional textures, because the texel centers sit at half-integer coordinates (the first one at (0.5, 0.5)). If you're really worried, set the filter mode to point sampling (nearest texel) rather than bilinear filtering.
If you use 1D textures instead (when the underlying data is 2D), you may lose performance due to lack of data locality in the other dimension.
If you want to use the texture cache without using any of the texture-specific operations such as interpolation, you can use tex1Dfetch(). This lets you index with integers.
The size limit is 2^27 elements, so you will be able to access 512 MB with floats, or 1 GB with int2 (which can also be used to retrieve doubles via __hiloint2double()). Larger data can be accessed by mapping multiple textures on top of it that together cover the data.
You will have to map any multi-dimensional array accesses to the one-dimensional array supported by tex1Dfetch(). I have always used simple C macros for that.
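A macro of the kind mentioned above might look like this host-side C++ sketch. The macro name `IDX2D` and the row-major layout are assumptions for illustration, and the plain vector lookup in `fetch2D` only stands in for the integer-indexed `tex1Dfetch()` call that would be used on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Maps 2D integer coordinates to a 1D element index,
// assuming row-major storage with `width` elements per row.
#define IDX2D(x, y, width) ((y) * (width) + (x))

// Host stand-in for an integer-indexed fetch from linear memory:
// no filtering and no 0.5 offset, just the element at (x, y).
float fetch2D(const std::vector<float>& linear, int x, int y, int width) {
    assert(x >= 0 && x < width && y >= 0);
    const std::size_t idx = static_cast<std::size_t>(IDX2D(x, y, width));
    assert(idx < linear.size());
    return linear[idx];
}
```

The parentheses around each macro argument matter: they keep an expression such as `IDX2D(x + 1, y, width)` from being expanded with the wrong operator precedence.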