Bulk texture uploads - CUDA

I have a specialised rendering app that needs to load any number of JPEGs from a PDF and then write the images out into a rendered page inside a kernel. This is oversimplified, but the point is that I want a way to collectively send up 'n' images as textures and then, within the kernel, index into this collection of textures for tex2D() calls. Any ideas for doing this gracefully are welcome.
As a side question, I haven't yet found a way to decode the JPEG images in the kernel, which forces me to decode on the CPU and then send up (slowly) a large bitmap. Can I improve this?

First: if texture upload performance is not a bottleneck, consider not bulk uploading. Here are some suggestions, each with different trade-offs.
For varying-sized textures, consider creating a texture atlas. This is a technique popular in game development that packs many textures into a single 2D image. It requires offsetting texture coordinates to the corner of the sub-image in question, and it precludes the use of hardware texture coordinate clamping and wrapping. So instead of storing each sub-texture's ID, you would store the offset of its corner. There are various tools available for creating texture atlases.
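As a kernel-side illustration, here is a minimal sketch. It assumes the texture-object API with unnormalized coordinates, and the AtlasEntry table of per-image placements is a hypothetical helper of mine, built by whatever packing tool you use:

```
// Per-image placement inside the atlas, in texels; this table is what you
// look up instead of a texture ID (hypothetical layout, for illustration).
struct AtlasEntry { float x0, y0; int w, h; };

__global__ void drawImage(cudaTextureObject_t atlas,
                          const AtlasEntry* entries, int imageId,
                          uchar4* out, int outPitch)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    AtlasEntry e = entries[imageId];
    if (x >= e.w || y >= e.h) return;
    // Offset the coordinates to the sub-image's corner. Clamping and
    // wrapping must be emulated manually or avoided, as noted above.
    out[y * outPitch + x] = tex2D<uchar4>(atlas, e.x0 + x + 0.5f,
                                                 e.y0 + y + 0.5f);
}
```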
For constant-sized textures, or for the case where you don't mind the waste with varying-sized textures, you could consider using a layered texture. This is a texture with a number of independent layers that can be indexed at texture fetch time using a separate layer index. Quoting the CUDA C Programming Guide:
A one-dimensional or two-dimensional layered texture (also known as texture array in Direct3D and array texture in OpenGL) is a texture made up of a sequence of layers, all of which are regular textures of same dimensionality, size, and data type.
A one-dimensional layered texture is addressed using an integer index and a floating-point texture coordinate; the index denotes a layer within the sequence and the coordinate addresses a texel within that layer. A two-dimensional layered texture is addressed using an integer index and two floating-point texture coordinates; the index denotes a layer within the sequence and the coordinates address a texel within that layer.
A layered texture can only be a CUDA array created by calling cudaMalloc3DArray() with the cudaArrayLayered flag (and a height of zero for a one-dimensional layered texture).
Layered textures are fetched using the device functions described in tex1DLayered() and tex2DLayered(). Texture filtering (see Texture Fetching) is done only within a layer, not across layers.
Layered textures are only supported on devices of compute capability 2.0 and higher.
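To make this concrete, here is a minimal sketch using the texture-object API, assuming uchar4 texels and host images packed contiguously layer after layer; error checking is omitted and all names are mine:

```
#include <cuda_runtime.h>

// Fetch one texel from the given layer and write it into the output page.
__global__ void renderPage(cudaTextureObject_t tex, int layer,
                           uchar4* out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    // The integer index selects the layer; the coordinates address a texel.
    out[y * w + x] = tex2DLayered<uchar4>(tex, x + 0.5f, y + 0.5f, layer);
}

cudaTextureObject_t makeLayeredTexture(const uchar4* hostImages,
                                       int w, int h, int numLayers)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<uchar4>();
    cudaArray_t arr;
    // The depth of the extent is the layer count when cudaArrayLayered is set.
    cudaMalloc3DArray(&arr, &desc, make_cudaExtent(w, h, numLayers),
                      cudaArrayLayered);

    // A single cudaMemcpy3D uploads all layers at once.
    cudaMemcpy3DParms p = {};
    p.srcPtr = make_cudaPitchedPtr((void*)hostImages,
                                   w * sizeof(uchar4), w, h);
    p.dstArray = arr;
    p.extent = make_cudaExtent(w, h, numLayers);
    p.kind = cudaMemcpyHostToDevice;
    cudaMemcpy3D(&p);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;     // filtering never crosses layers
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}
```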
You could consider a hybrid approach: sort the textures into same-sized groups and use a layered texture for each group. Or use a layered texture atlas, where the groups are packed such that each layer contains one or a few textures from each group to minimize waste.
Regarding your side question: a Google search for "cuda jpeg decode" turns up many results, including at least one open source project.
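One concrete option is NVIDIA's nvJPEG library (shipped with recent CUDA toolkits), which decodes JPEGs directly on the GPU so the decoded pixels never take the slow bitmap round trip. A minimal sketch, with error checking omitted and the helper name my own:

```
#include <nvjpeg.h>
#include <cuda_runtime.h>

// Decode one JPEG byte stream to interleaved RGB in device memory.
void decodeJpegOnGpu(const unsigned char* jpegData, size_t jpegSize,
                     cudaStream_t stream)
{
    nvjpegHandle_t handle;
    nvjpegJpegState_t state;
    nvjpegCreateSimple(&handle);
    nvjpegJpegStateCreate(handle, &state);

    // Query dimensions and chroma subsampling before allocating output.
    int nComponents;
    nvjpegChromaSubsampling_t subsampling;
    int widths[NVJPEG_MAX_COMPONENT], heights[NVJPEG_MAX_COMPONENT];
    nvjpegGetImageInfo(handle, jpegData, jpegSize,
                       &nComponents, &subsampling, widths, heights);

    // One interleaved RGB plane on the device.
    nvjpegImage_t out = {};
    out.pitch[0] = widths[0] * 3;
    cudaMalloc((void**)&out.channel[0], (size_t)out.pitch[0] * heights[0]);

    // The decoded image stays in GPU memory, ready for your kernel.
    nvjpegDecode(handle, state, jpegData, jpegSize,
                 NVJPEG_OUTPUT_RGBI, &out, stream);
    cudaStreamSynchronize(stream);

    cudaFree(out.channel[0]);
    nvjpegJpegStateDestroy(state);
    nvjpegDestroy(handle);
}
```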

Related

TextureAtlas - how does it affect RAM usage in an open world game?

I plan to use TextureAtlas in my open world 2D game.
I need to load textures dynamically (there are thousands of them, so I cannot load them all at once). I plan to load the textures that are needed at a specific moment in gameplay. Also, for certain reasons, I cannot have many texture atlases per map location.
Generally, I must avoid a situation where I read all the textures (the entire atlas), because RAM usage would be too large. How does TextureAtlas work? Is it possible to keep the atlas open during the entire game, but read only chosen textures from it (into RAM) when needed, without worrying about RAM usage?
Best regards.
You cannot load portions of a TextureAtlas. It is all or nothing. You will have to use multiple atlases and carefully plan what to put in each atlas such that you don’t have to load all of them simultaneously.
A Texture represents a single image loaded into GPU memory for use in OpenGL. A page of a TextureAtlas corresponds to a single Texture. Typically, you will have multiple TextureRegions on each page (Texture) of a TextureAtlas. If you don't, there's no point in using TextureAtlas. The point of TextureAtlas is to avoid SpriteBatch having to flush vertex data and swap OpenGL textures for every sprite you draw. If you draw consecutive TextureRegions from the same Texture (or atlas page), they are batched together into a single mesh and OpenGL texture so it performs better.
Libgdx doesn’t support loading individual pages of a TextureAtlas though. That would make it too complicated to use. So, if you are doing a lot of loading and unloading, I recommend avoiding multipage atlases. Use AssetManager to very easily load and unload what you need.

Writing an HD Game in Stage3D

I'm trying to write a 2D game using Starling; most of my textures are up to 4K, and I'm getting a resource limit exception.
Is there a way, an algorithm, or an idea for using HD textures within Stage3D's limited texture resources? Compression?
Stage3D (which Starling uses underneath) has a maximum texture size of 2048x2048. If your textures are larger than this, you will have to split them and stitch the pieces together at runtime.
If you find yourself running out of memory (rather than hitting the dimensional size limit), you can look into compression using ATF textures.

Using CUDA textures to store 2D surfaces

I am currently developing a 3D heat flow simulation on a 3D triangular mesh (basically any shape) with CUDA.
I was thinking of exploiting spatial locality by using CUDA textures or surfaces. Since I have a 3D mesh, I thought that a 3D texture would be appropriate. After looking at different examples, however, I am not so sure anymore: 3D textures are often used for volumes, not for surfaces as in my case.
Can I use 3D textures for polygon meshes? Does it make sense? If not, are there other approaches or data structures in CUDA that would be of use in my case?
Using 3D textures to store surface meshes is in fact a good idea. To illustrate the point, let me recall the clever approach in
Octree Textures on the GPU, GPU Gems 2
which uses 2D and 3D textures to:
create an octree using a 3D texture;
quickly traverse the octree by exploiting the filtering properties of the 3D texture;
store the surface polygons in a 2D texture.
OCTREE TRAVERSAL BY THE FILTERING FEATURES OF A 3D TEXTURE
The tree is stored as an 8-bit RGBA 3D texture mapped onto the unit cube [0,1]x[0,1]x[0,1], called the indirection pool. Each node of the tree is an indirection grid. Each child node is identified by the first three channels (RGB), while the fourth (A) stores other information, for example whether the node is a leaf or whether it is empty.
Consider the quadtree example reported in the paper. The A, B, C, and D nodes (boxes) are stored as the texture elements (0,0), (1,0), (2,0), and (3,0), respectively; for a quadtree, each indirection grid contains 4 elements, each storing a link to a child node. In this way, any access to the tree can be done by exploiting the hardware filtering features of texture memory.
The paper implements this lookup with a short routine written in Cg, which can easily be ported to CUDA.
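As a rough illustration, here is my own minimal CUDA sketch of such a descent (not the paper's code). It assumes the pool is bound as a 3D texture object with point filtering, normalized coordinates, and cudaReadModeNormalizedFloat, and that an interior node's RGB channels store its child grid's texel origin divided by 255:

```
// My own reconstruction of the descent, not the paper's listing.
__device__ float4 octreeLookup(cudaTextureObject_t indirPool,
                               float3 invPoolSize, // 1 / pool size, in texels
                               float3 p,           // query point in [0,1]^3
                               int maxDepth)
{
    float3 nodeOrigin = make_float3(0.f, 0.f, 0.f); // root grid's texel origin
    const float res = 2.f;                          // 2x2x2 cells per node
    for (int d = 0; d < maxDepth; ++d) {
        // Which child cell of the current node contains p?
        float3 cell = make_float3(floorf(p.x * res),
                                  floorf(p.y * res),
                                  floorf(p.z * res));
        // Fetch that cell's descriptor from the indirection pool.
        float4 node = tex3D<float4>(indirPool,
            (nodeOrigin.x + cell.x + 0.5f) * invPoolSize.x,
            (nodeOrigin.y + cell.y + 0.5f) * invPoolSize.y,
            (nodeOrigin.z + cell.z + 0.5f) * invPoolSize.z);
        if (node.w > 0.9f) return node;   // leaf: RGB carries the payload
        if (node.w < 0.1f) break;         // empty cell: nothing stored here
        // Interior node: RGB encodes the child grid's texel origin / 255.
        nodeOrigin = make_float3(node.x * 255.f,
                                 node.y * 255.f,
                                 node.z * 255.f);
        // Rescale p to local coordinates within the chosen cell.
        p = make_float3(p.x * res - cell.x,
                        p.y * res - cell.y,
                        p.z * res - cell.z);
    }
    return make_float4(0.f, 0.f, 0.f, 0.f);
}
```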
STORING THE TREE ELEMENTS IN A 2D TEXTURE
The elements of the tree can be stored using the classical approach of exploiting the (u,v) coordinates; see UV mapping. The paper linked above discusses a way to improve this method, but that is beyond the scope of this answer.
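Continuing the sketch above, a leaf's payload can then be interpreted as normalized (u,v) coordinates into that 2D texture (again my own illustration, not the paper's code):

```
// The leaf's first two channels are read as normalized (u,v) coordinates
// into a 2D texture holding the surface data (e.g., per-polygon color).
__device__ float4 fetchSurfaceData(cudaTextureObject_t indirPool,
                                   cudaTextureObject_t surfaceTex,
                                   float3 invPoolSize, float3 p, int maxDepth)
{
    float4 leaf = octreeLookup(indirPool, invPoolSize, p, maxDepth);
    return tex2D<float4>(surfaceTex, leaf.x, leaf.y);
}
```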

Get current buffer depth in fragment shader in agal

Is it possible in AGAL to read the current fragment depth in a fragment shader, if one exists?
No, I'm afraid there is no way to read from the depth buffer in AGAL.
You can, however, work around this by first rendering a depth map into a texture and then using that (which may be enough, depending on the effect you are trying to implement).
In fact, even rendering a depth map with good precision can be (a little) tricky, because there are no float32 textures in Flash, so the depth has to be stored in an R8G8B8A8 texture (by packing and unpacking values on the GPU).

Cuda 2d or 3d arrays

I am dealing with a set of largish (2k x 2k) images.
I need to do per-pixel operations down a stack of a few sequential images.
Are there any opinions on using a single large 2D texture and calculating offsets versus using 3D arrays?
It seems that 3D arrays are a bit 'out of the mainstream' in the CUDA API; the allocation and transfer functions are very different from their 2D counterparts.
There doesn't seem to be any good documentation on the higher-level 'how and why' of CUDA, as opposed to the specific calls.
There is the best practices guide, but it doesn't address this.
I would recommend reading the book "CUDA by Example". It goes through all these things that aren't documented as well, and it explains the "how and why".
If you're rendering the result of the CUDA kernel, I think you should use OpenGL interop. That way, your code processes the image on the GPU and leaves the processed data there, making it much faster to render. There's a good example of doing this in the book.
If each CUDA thread needs to read only one pixel from the first frame and one pixel from the next frame, you don't need to use textures. Textures only benefit you if each thread is reading in a bunch of consecutive pixels. So you're best off using a 3D array.
Here is an example of using CUDA and 3D cuda arrays:
https://github.com/nvpro-samples/gl_cuda_interop_pingpong_st
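For completeness, here is a minimal sketch of the pitched 3D allocation route, with each thread reading one pixel from two sequential slices; all names and the per-pixel operation are illustrative:

```
#include <cuda_runtime.h>

// Each thread reads one pixel from frame 'slice' and frame 'slice + 1'
// and writes a per-pixel result back into frame 'slice'.
__global__ void diffSlices(cudaPitchedPtr stack, int w, int h, int slice)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    char* base = (char*)stack.ptr;
    size_t slicePitch = stack.pitch * h;
    float* row0 = (float*)(base + slice * slicePitch + y * stack.pitch);
    float* row1 = (float*)(base + (slice + 1) * slicePitch + y * stack.pitch);
    row0[x] = row1[x] - row0[x];
}

int main()
{
    const int w = 2048, h = 2048, depth = 8;
    cudaPitchedPtr stack;
    // Pitched 3D allocation: rows are padded so accesses stay coalesced.
    // Note the extent width is given in bytes for linear memory.
    cudaMalloc3D(&stack, make_cudaExtent(w * sizeof(float), h, depth));

    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    diffSlices<<<grid, block>>>(stack, w, h, 0);
    cudaDeviceSynchronize();
    cudaFree(stack.ptr);
    return 0;
}
```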