When using a 3D texture in CUDA, why don't we need to set texture coordinates? - cuda

In OpenGL, after creating a 3D texture, we always need to draw proxy geometry such as GL_QUADS to contain the 3D texture, and set the texture coordinates with glTexCoord3f.
However, when I use a 3D texture in CUDA, I've never found a function like glTexCoord3f to specify the texture coordinates. We just use a CUDA array and then bind the array to the texture; after that, we can use the texture fetch function tex3D to get the value.
Therefore, I'm confused: how can the tex3D function work correctly even though we've never set the texture coordinates?
Thanks for answering.

The texture coordinates are the input arguments to the tex3D() fetch function.
In more detail, in OpenGL, when you call glTexCoord3f() it specifies the texture coordinates at the next issued vertex. The texture coordinates at each pixel of the rendered polygon are interpolated from the texture coordinates specified at the vertices of the polygon (typically triangles).
In CUDA, there is no concept of polygons, or interpolation of coordinates. Instead, each thread is responsible for computing (or loading) its texture coordinates and specifying them explicitly for each fetch. This is where tex3D() comes in.
Note that if you use GLSL pixel shaders to shade your polygons in OpenGL, you actually do something very similar to CUDA -- you explicitly call a texture fetch function, passing it coordinates. These coordinates can be computed arbitrarily. The difference is that you have the option of using input coordinates that are interpolated at each pixel. (And you can think of each pixel as a thread!)
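For concreteness, here is a minimal sketch of such a kernel, assuming a texture object bound to a 3D CUDA array of floats with unnormalized coordinates (the kernel and parameter names are illustrative, not from the question):

    __global__ void sampleVolume(cudaTextureObject_t volTex, float* out,
                                 int width, int height, int depth)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        int z = blockIdx.z * blockDim.z + threadIdx.z;
        if (x >= width || y >= height || z >= depth) return;

        // These three arguments play the role of glTexCoord3f: each thread chooses
        // its own coordinates. The +0.5f addresses the texel center.
        float val = tex3D<float>(volTex, x + 0.5f, y + 0.5f, z + 0.5f);

        out[(z * height + y) * width + x] = val;
    }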

Related

How to obtain the physical coordinates of the nodes in an itk::Mesh obtained from a 3D volume of CT images

I am using the ITK library to get a mesh from a 3D image; the 3D image is a volume of slices. I get the mesh using itk::BinaryMask3DMeshSource, but I need the physical coordinates of each mesh node and I don't know how to obtain them.
I know how to obtain the physical coordinates of a voxel in an image with ITK, using the TransformIndexToPhysicalPoint function, but I don't know how to do the same for an itk::Mesh. I need to know whether there is any relationship between the nodes of the mesh and the voxels in the image that would let me find the physical coordinates.
Mesh points should already be in physical space, judging by both the code and the accompanying comment.
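For illustration, a minimal sketch of reading those points, assuming the usual itk::BinaryMask3DMeshSource pipeline ('binaryImage' and the object value are placeholders):

    using ImageType = itk::Image<unsigned char, 3>;
    using MeshType  = itk::Mesh<double, 3>;

    auto meshSource = itk::BinaryMask3DMeshSource<ImageType, MeshType>::New();
    meshSource->SetInput(binaryImage);   // segmented volume (placeholder)
    meshSource->SetObjectValue(255);     // foreground value (assumption)
    meshSource->Update();

    MeshType::Pointer mesh = meshSource->GetOutput();

    // Each point is already expressed in physical (world) coordinates, because the
    // mesh source accounts for the image origin, spacing and direction internally.
    for (auto it = mesh->GetPoints()->Begin(); it != mesh->GetPoints()->End(); ++it)
    {
        MeshType::PointType p = it.Value();   // p[0], p[1], p[2] in physical units (e.g. mm)
        // ... use p ...
    }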

D3D11VA/CUDA interoperability issue with NV12 surfaces

I'm trying to build a transcoding pipeline in which video is decoded using D3D11VA, then brought to CUDA, optionally modified and/or analyzed with a CUDA kernel, and finally encoded using NVENC (using CUDA-NVENC interop); the idea is to do everything on the GPU without video frames ever hitting main memory. Some things I was able to do:
D3D11VA decoding works (using a Texture2D array with 20 surfaces in NV12 format bound to the video decoder); the decoder gives me an index into this array for every decoded frame
I can easily get the data out to main memory by using a separate Texture2D of the same dimensions and format as the decoding array, but with D3D11_USAGE_STAGING and D3D11_CPU_ACCESS_READ; once the decoder provides me with an index into the decoder array, I just do a CopySubresourceRegion from the decoder array slice to this staging texture, then map the staging texture and read the data (I can successfully read the Y and UV planes)
I can also register the staging texture as a CUDA resource (even though the CUDA manual doesn't list NV12 as a supported pixel format); I can then map this resource, apply cudaGraphicsSubResourceGetMappedArray to it, and copy data from the returned cudaArray into cudaMalloc'ed memory (see the sketch after this question)
So the issue is: I can only copy the Y plane from the cudaArray. I tried everything I could think of to get the UV data from the texture, to no avail. The only "solution" that worked was to create yet another texture with 1.5x the height of the original texture in R8 format, create two shader views into the staging texture, and use a shader that just copies the data from both views into this helper texture; I could then map this texture to CUDA and copy all the data into CUDA memory.
I really dislike this solution: it's ugly, bloated, and involves an extra useless data copy. Is there any other way to achieve this? A way to get CUDA to see all the data in the NV12 texture, or alternatively to copy all the data out of the NV12 texture into a single R8 texture or a pair of R8/R8 or R8/R8G8 textures?
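For reference, a rough sketch of the register/map/copy sequence described above, with error checking omitted ('stagingTex', 'dYPlane', 'pitch', 'width' and 'height' are placeholders); this is exactly the path that only reaches the Y plane:

    #include <cuda_runtime.h>
    #include <cuda_d3d11_interop.h>

    cudaGraphicsResource_t cudaRes = nullptr;
    cudaGraphicsD3D11RegisterResource(&cudaRes, stagingTex, cudaGraphicsRegisterFlagsNone);

    cudaGraphicsMapResources(1, &cudaRes, 0);

    cudaArray_t planeArray = nullptr;
    cudaGraphicsSubResourceGetMappedArray(&planeArray, cudaRes, 0, 0);  // subresource 0, mip 0

    // The mapped array only exposes the Y (luma) plane of the NV12 surface;
    // the interleaved UV plane is the part that cannot be reached this way.
    cudaMemcpy2DFromArray(dYPlane, pitch, planeArray, 0, 0,
                          width /* bytes per row for R8 */, height,
                          cudaMemcpyDeviceToDevice);

    cudaGraphicsUnmapResources(1, &cudaRes, 0);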

Using CUDA textures to store 2D surfaces

I am currently developing a 3D heat flow simulation on a 3D triangular mesh (basically any shape) with CUDA.
I was thinking of exploiting spatial locality by using CUDA textures or surfaces. Since I have a 3D mesh, I thought a 3D texture would be appropriate. After looking at different examples, however, I am not so sure anymore: 3D textures are often used for volumes, not for surfaces like in my case.
Can I use 3D textures for polygon meshes? Does it make sense? If not, are there other approaches or data structures in CUDA that would be useful for my case?
Using 3D textures to store surface meshes is in fact a good idea. To illustrate this, let me recall the clever approach in
Octree Textures on the GPU, GPU Gems 2
which uses 2D and 3D textures to store an OctTree and to:
Create an OctTree using a 3D texture;
Quickly traverse the OctTree by exploiting the filtering properties of the 3D texture;
Store the surface polygons in a 2D texture.
OCTTREE TRAVERSAL BY THE FILTERING FEATURES OF A 3D TEXTURE
The tree is stored as an 8-bit RGBA 3D texture mapped to the unit cube [0,1]x[0,1]x[0,1], called the indirection pool. Each node of the tree is an indirection grid. Each child node is addressed by the first three channels (RGB) of a texel, while the fourth (alpha) channel stores additional information, for example whether the node is a leaf, an internal node, or empty.
Consider the QuadTree example reported in the paper (see the corresponding figure there).
The A, B, C and D nodes (boxes) are stored as indirection grids at texture elements (0,0), (1,0), (2,0) and (3,0), respectively; for a QuadTree, each grid contains 4 elements, each storing a link to a child node. In this way, any access to the tree can be performed by exploiting the hardware filtering features of texture memory, a possibility that is illustrated in another figure of the paper
and by code like the following (the paper's listing is written in Cg, but it ports easily to CUDA):
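The Cg listing itself is not reproduced here; as a rough guide, a CUDA port of the lookup loop might look like the sketch below. The fixed maximum depth, the 255-based decoding of the child-grid origin and the alpha thresholds are assumptions modelled on the paper, and indirPool is assumed to be an RGBA8 3D texture object read back as normalized floats.

    #define TREE_MAX_DEPTH 8   // fixed number of iterations (assumption)

    // indirPool: indirection pool as an RGBA8 3D texture object, fetched with
    //            cudaReadModeNormalizedFloat, so texels come back as float4 in [0,1].
    // invS:      1 / size of the indirection pool, per axis.
    // N:         resolution of one indirection grid (2 for an OctTree).
    // M:         lookup point in the unit cube.
    __device__ float4 treeLookup(cudaTextureObject_t indirPool,
                                 float3 invS, float N, float3 M)
    {
        float4 I   = make_float4(0.0f, 0.0f, 0.0f, 0.0f);
        float3 MND = M;   // coordinates of M within the current node

        for (int depth = 0; depth < TREE_MAX_DEPTH; ++depth)
        {
            // Texel to fetch: origin of the current indirection grid (decoded from the
            // previously fetched RGB) plus the local offset of M within that grid.
            float3 P = make_float3((MND.x + floorf(0.5f + I.x * 255.0f)) * invS.x,
                                   (MND.y + floorf(0.5f + I.y * 255.0f)) * invS.y,
                                   (MND.z + floorf(0.5f + I.z * 255.0f)) * invS.z);

            if (I.w < 0.9f)                    // not in a leaf yet: descend one level
                I = tex3D<float4>(indirPool, P.x, P.y, P.z);

            if (I.w > 0.9f) break;             // alpha flags a leaf: done
            if (I.w < 0.1f) break;             // alpha flags an empty cell: done

            // Position of M within the next-depth indirection grid (fractional part).
            MND = make_float3(MND.x * N - floorf(MND.x * N),
                              MND.y * N - floorf(MND.y * N),
                              MND.z * N - floorf(MND.z * N));
        }
        return I;
    }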
STORING THE TREE ELEMENTS BY A 2D TEXTURE
The elements of the tree can be stored with the classical approach exploiting (u,v) coordinates; see UV mapping. The paper linked above discusses a way to improve this method, but that is beyond the scope of this answer.

Bulk texture uploads

I have a specialised rendering app that needs to load up any number of JPEGs from a PDF and then write the images out onto a rendered page inside a kernel. This is oversimplified, but the point is that I want to find a way to collectively send up 'n' images as textures and then, within the kernel, index into this collection of textures for tex2D() calls. Any ideas for doing this gracefully are welcome.
As a side question, I haven't yet found a way to decode the JPEG images in the kernel, forcing me to decode on the CPU and then send up (slowly) a large bitmap. Can I improve this?
First: if texture upload performance is not a bottleneck, consider not bulk uploading. Here are some suggestions, each with different trade-offs.
For varying-sized textures, consider creating a texture atlas. This is a technique popular in game development that packs many textures into a single 2D image. This requires offsetting texture coordinates to the corner of the image in question, and it precludes the use of texture coordinate clamping and wrapping. So you would need to store the offset of the corner of each sub-texture instead of its ID. There are various tools available for creating texture atlases.
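As an illustration, a fetch into such an atlas might look like the following sketch (the atlas texture object and the per-image offset/scale tables are assumptions, not an existing API):

    // Fetch local coordinates (u, v) in [0,1] of sub-texture i from a packed atlas.
    // offsets[i] and scales[i] hold the corner and size of image i in normalized atlas
    // coordinates; clamping and wrapping must be emulated manually, since the hardware
    // address modes would bleed into neighbouring sub-textures.
    __device__ float fetchFromAtlas(cudaTextureObject_t atlas,
                                    const float2* offsets, const float2* scales,
                                    int i, float u, float v)
    {
        float2 o = offsets[i];
        float2 s = scales[i];
        return tex2D<float>(atlas, o.x + u * s.x, o.y + v * s.y);
    }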
For constant-sized textures, or for the case where you don't mind the waste with varying-sized textures, you could consider using a layered texture. This is a texture with a number of independent layers that can be indexed at texture fetch time using a separate layer index. Quoting the CUDA C Programming Guide:
A one-dimensional or two-dimensional layered texture (also known as texture array in Direct3D and array texture in OpenGL) is a texture made up of a sequence of layers, all of which are regular textures of same dimensionality, size, and data type.
A one-dimensional layered texture is addressed using an integer index and a floating-point texture coordinate; the index denotes a layer within the sequence and the coordinate addresses a texel within that layer. A two-dimensional layered texture is addressed using an integer index and two floating-point texture coordinates; the index denotes a layer within the sequence and the coordinates address a texel within that layer.
A layered texture can only be a CUDA array created by calling cudaMalloc3DArray() with the cudaArrayLayered flag (and a height of zero for a one-dimensional layered texture).
Layered textures are fetched using the device functions described in tex1DLayered() and tex2DLayered(). Texture filtering (see Texture Fetching) is done only within a layer, not across layers.
Layered textures are only supported on devices of compute capability 2.0 and higher.
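A hedged sketch of setting one up with the texture object API (the float element type and the sizes are assumptions for illustration):

    #include <cuda_runtime.h>

    cudaTextureObject_t makeLayeredTexture(const float* hostData,
                                           int width, int height, int numLayers)
    {
        // A layered CUDA array: the depth of the extent holds the number of layers.
        cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
        cudaExtent extent = make_cudaExtent(width, height, numLayers);
        cudaArray_t arr = nullptr;
        cudaMalloc3DArray(&arr, &desc, extent, cudaArrayLayered);

        // Copy the stacked host images (width x height x numLayers floats) into the array.
        cudaMemcpy3DParms p = {};
        p.srcPtr   = make_cudaPitchedPtr((void*)hostData, width * sizeof(float), width, height);
        p.dstArray = arr;
        p.extent   = extent;
        p.kind     = cudaMemcpyHostToDevice;
        cudaMemcpy3D(&p);

        cudaResourceDesc res = {};
        res.resType = cudaResourceTypeArray;
        res.res.array.array = arr;

        cudaTextureDesc tex = {};
        tex.filterMode       = cudaFilterModeLinear;   // filtering stays within one layer
        tex.readMode         = cudaReadModeElementType;
        tex.normalizedCoords = 0;

        cudaTextureObject_t texObj = 0;
        cudaCreateTextureObject(&texObj, &res, &tex, nullptr);
        return texObj;
    }

    // In a kernel, the integer index picks the layer and the two coordinates pick the texel:
    //   float value = tex2DLayered<float>(texObj, x + 0.5f, y + 0.5f, layer);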
You could consider a hybrid approach: sort the textures into same-sized groups and use a layered texture for each group. Or use a layered texture atlas, where the groups are packed such that each layer contains one or a few textures from each group to minimize waste.
Regarding your side question: a Google search for "cuda jpeg decode" turns up a lot of results, including at least one open source project.

Get current buffer depth in fragment shader in agal

Is it possible in AGAL, in a fragment shader, to get the current fragment depth, if any?
No, I'm afraid there is no way to read from the depth buffer in AGAL.
You can, however, work around this by first rendering a depth map into a texture and then using it (which may be enough, depending on the effect you are trying to implement).
In fact, even rendering a depth map with good precision can be (a little) tricky, because there are no float32 textures in Flash, so the depth has to be stored in an R8G8B8A8 texture (by packing and unpacking values on the GPU).
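For reference, the usual pack/unpack scheme behind that trick looks roughly like the following sketch (written here as plain C-style float math rather than AGAL; the shaders implement the same arithmetic per fragment):

    #include <math.h>

    struct RGBA { float r, g, b, a; };   // each channel is ultimately stored as 8 bits

    // Pack a depth value in [0, 1) into four 8-bit channels.
    RGBA packDepth(float depth)
    {
        float r = depth;
        float g = depth * 255.0f;
        float b = depth * 65025.0f;      // 255^2
        float a = depth * 16581375.0f;   // 255^3
        r -= floorf(r); g -= floorf(g); b -= floorf(b); a -= floorf(a);   // keep fractional parts
        // Remove from each channel the part that the next channel already encodes.
        r -= g / 255.0f; g -= b / 255.0f; b -= a / 255.0f;
        return RGBA{ r, g, b, a };
    }

    // Reconstruct the depth value from the four stored channels.
    float unpackDepth(RGBA c)
    {
        return c.r + c.g / 255.0f + c.b / 65025.0f + c.a / 16581375.0f;
    }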