I am currently developing a 3D heat flow simulation on a 3D triangular mesh (basically any shape) with CUDA.
I was thinking of exploiting spatial locality by using CUDA textures or surfaces. Since I have a 3D mesh I thought that a 3D texture would be appropriate. After looking on different examples, however, I am not so sure anymore: 3D Textures are often used for volumes not for surfaces like in my case.
Can I use 3D textures for polygon meshes? Does it make sense? If not, are there other approaches or data structures in CUDA of use for my case?
Using 3D textures to store surface meshes is in fact a good idea. To better point this out, let me recall the clever approach in
Octree Textures on the GPU, GPU Gems 2
using 2D and 3D meshes to store an OctTree and to
Create an OctTree using a 3D texture;
Fastly traverse the OctTree by exploiting the filtering properties of the 3D texture;
Storing the surface polygons by a 2D texture.
OCTTREE TRAVERSAL BY THE FILTERING FEATURES OF A 3D TEXTURE
The tree is stored as an 8-bit RGBA 3D texture mapped in the unit cube [0,1]x[0,1]x[0,1], named as indirection pool. Each node of the tree is an indirection grid. Each child node is identified by the first three coordinates of the RGBA, while the fourth stores some other information, for example, if the node is a leaf or not or if it is empty.
Consider the QuadTree example reported in the paper (figure borrowed from the paper).
The A, B, C and D nodes (boxes) are stored as the texture elements (0,0), (1,0), (2,0) and (3,0), respectively, containing, for a QuadTree, 4 elements, each element storing a link to the child node. In this way, any access to the tree can be done by exploiting the hardware filtering features of the texture memory, a possibility that is illustrated in the following figure:
and by the following code (it is written in Cg, but I'm sure it can be easily ported to CUDA):
STORING THE TREE ELEMENTS BY A 2D TEXTURE
The elements of the tree can be stored by the classical approach exploiting the (u,v) coordinates, see UV mapping. The paper linked to above discusses a way to improve this method, but this is beyond the scope of this answer.
Related
I often see guides and examples using Convolutional Layers when implementing Deep Q-Networks. This makes sense for some scenarios, typically where you do not have access to the state in for example an array representation.
In my case, I have a game environment which gives me complete access to the state, in form of a 2D array. This 2D array is later interpreted by a graphics engine and dawn to the screen.
I have been recommended to use Convolutional Layers for interpreting images, but I have yet to see any recommendations about flattening the 2D State representation directly and utilize dense layers instead.
Does it make any sense to use Convolutional Networks/Layers for data which are not an image?
I have a closed surface mesh generated using Meshlab from point clouds. I need to get a volume mesh for that so that it is not a hollow object. I can't figure it out. I need to get an *.stl file for printing. Can anyone help me to get a volume mesh? (I would prefer an easy solution rather than a complex algorithm).
Given an oriented watertight surface mesh, an oracle function can be derived that determines whether a query line segment intersects the surface (and where): shoot a ray from one end-point and use the even-odd rule (after having spatially indexed the faces of the mesh).
Volumetric meshing algorithms can then be applied using this oracle function to tessellate the interior, typically variants of Marching Cubes or Delaunay-based approaches (see 3D Surface Mesh Generation in the CGAL documentation). The initial surface will however not be exactly preserved.
To my knowledge, MeshLab supports only surface meshes, so it is unlikely to provide a ready-to-use filter for this. Volume mesher packages should however offer this functionality (e.g. TetGen).
The question is not perfectly clear. I try to give a different interpretation. According to your last sentence:
I need to get an *.stl file for printing
It means that you need a 3D model that is ok for being fabricated using a 3D printer, i.e. you need a watertight mesh. A watertight mesh is a mesh that define in a unambiguous way the interior of a volume and corresponds to a mesh that is closed (no boundary), 2-manifold (mainly that each edge is shared exactly by two face), and without self intersections.
MeshLab provide tools for both visualizing boundaries, non manifold and self-intersection. Correcting them is possible in many different ways (deletion of non manifoldness and hole filling or drastic remeshing).
I have a specialised rendering app that needs to load up any number of jpegs from a pdf, and then write out the images into a rendered page inside a kernel. This is oversimplified, but the point is that I want to find a way to collectively send up 'n' images as textures, and then, within the kernel, to index into this collective of textures for tex2d() calls. Any ideas welcome for doing this gracefully.
As a side question, I haven't yet found a way to decode the jpeg images in the kernel, forcing me to decode on the CPU and then send up (slowly) a large bitmap. Can i improve this?
First: if texture upload performance is not a bottleneck, consider not bulk uploading. Here are some suggestions, each with different trade-offs.
For varying-sized textures, consider creating a texture atlas. This is a technique popular in game development that packs many textures into a single 2D image. This requires offsetting texture coordinates to the corner of the image in question, and it precludes the use of texture coordinate clamping and wrapping. So you would need to store the offset of the corner of each sub-texture instead of its ID. There are various tools available for creating texture atlases.
For constant-sized textures, or for the case where you don't mind the waste of varying-sized textures, you could consider using a layered texture. This is a texture with a number of independent layers that can be indexed at texture fetch time using a separate layer index. Quote from the link above:
A one-dimensional or two-dimensional layered texture (also know as texture array in Direct3D and array texture in OpenGL) is a texture made up of a sequence of layers, all of which are regular textures of same dimensionality, size, and data type.
A one-dimensional layered texture is addressed using an integer index and a floating-point texture coordinate; the index denotes a layer within the sequence and the coordinate addresses a texel within that layer. A two-dimensional layered texture is addressed using an integer index and two floating-point texture coordinates; the index denotes a layer within the sequence and the coordinates address a texel within that layer.
A layered texture can only be a CUDA array by calling cudaMalloc3DArray() with the cudaArrayLayered flag (and a height of zero for one-dimensional layered texture).
Layered textures are fetched using the device functions described in tex1Dlayered() and tex2Dlayered(). Texture filtering (see Texture Fetching) is done only within a layer, not across layers.
Layered textures are only supported on devices of compute capability 2.0 and higher.
You could consider a hybrid approach: sort the textures into same-sized groups and use a layered texture for each group. Or use a layered texture atlas, where the groups are packed such that each layer contains one or a few textures from each group to minimize waste.
Regarding your side question: a google search for "cuda jpeg decode" turns up a lot of results, including at least one open source project.
I am dealing with a set of (largish 2k x 2k) images
I need to do per-pixel operations down a stack of a few sequential images.
Are there any opinions on using a single 2D large texture + calculating offsets vs using 3D arrays?
It seems that 3D arrays are a bit 'out of the mainstream' in the CUDA api, the allocation transfer functions are very different from the same 2D functions.
There doesn't seem to be any good documentation on the higher level "how and why" of CUDA rather than the specific calls
There is the best practices guide but it doesn't address this
I would recommend you to read the book "Cuda by Example". It goes through all these things that aren't documented as well and it'll explain the "how and why".
I think what you should use if you're rendering the result of the CUDA kernel is to use OpenGL interop. This way, your code processes the image on the GPU and leaves the processed data there, making it much faster to render. There's a good example of doing this in the book.
If each CUDA thread needs to read only one pixel from the first frame and one pixel from the next frame, you don't need to use textures. Textures only benefit you if each thread is reading in a bunch of consecutive pixels. So you're best off using a 3D array.
Here is an example of using CUDA and 3D cuda arrays:
https://github.com/nvpro-samples/gl_cuda_interop_pingpong_st
I'm new to OpenGL and graphics programming in general, though I've always been interested in the topic so have a grounding in the theory.
What I'd like to do is create a scene in which a set of objects move about. Specifically, they're robotic soccer players on a field. The objects are:
The lighting, field and goals, which don't change
The ball, which is a single mesh which will undergo translation and rotation but not scaling
The players, which are each composed of body parts, each of which are translated and rotated to give the illusion of a connected body
So to my GL novice mind, I'd like to load these objects into the scene and then just move them about. No properties of the vertices will change, either their positioning nor texture/normals/etc. Just the transformation of their 'parent' object as a whole.
Furthermore, the players all have identical bodies. Can I optimise somehow by loading the model into memory once, then painting it multiple times with a different transformation matrix each time?
I'm currently playing with OpenTK which is a lightweight wrapper on top of OpenGL libraries.
So a helpful answer to this question would either be:
What parts of OpenGL give me what I need? Do I have to redraw all the faces every frame? Just those that move? Can I just update some transformation matrices? How simple can I make this using OpenTK? What would psuedocode look like? Or,
Is there a better framework that's free (ideally open source) and provides this level of abstraction?
Note that I require any solution to run in .NET across multiple platforms.
Using so called vertex arrays is probably the surest way to optimize such a scene. Here's a good tutorial:
http://www.songho.ca/opengl/gl_vertexarray.html
A vertex array or more generally, a gl data array holds data like vertex positions, normals, colors. You can also have an array that hold indexes to these buffers to indicate in which order to draw them.
Then you have a few closely related functions which manage these arrays, allocate them, set data to them and paint them. You can perform a rendering of a complex mesh with just a single OpenGL command like glDrawElements()
These arrays generally reside on the host memory, A further optimization is to use vertex buffer objects which are the same concept as regular arrays but reside on the GPU memory and can be somewhat faster. Here's abit about that:
http://www.songho.ca/opengl/gl_vbo.html
Working with buffers as opposed to good old glBegin() .. glEnd() has the advantage of being compatible with OpenGL ES. in OpenGL ES, arrays and buffers are the only way to draw stuff.
--- EDIT
Moving things, rotating them and transforming them in the scene is done using the Model View matrix and does not require any changes to the mesh data. To illustrate:
you have your initialization:
void initGL() {
// create set of arrays to draw a player
// set data in them
// create set of arrays for ball
// set data in them
}
void drawScene {
glMatrixMode(GL_MODEL_VIEW);
glLoadIdentity();
// set up view transformation
gluLookAt(...);
drawPlayingField();
glPushMatrix();
glTranslate( player 1 position );
drawPlayer();
glPopMatrix();
glPushMatrix();
glTranslate( player 2 position );
drawPlayer();
glPopMatrix();
glPushMatix();
glTranslate( ball position );
glRotate( ball rotation );
drawBall();
glPopMatrix();
}
Since you are beginning, I suggest sticking to immediate mode rendering and getting that to work first. If you get more comfortable, you can improve to vertex arrays. If you get even more comfortable, VBOs. And finally, if you get super comfortable, instancing which is the fastest possible solution for your case (no deformations, only whole object transformations).
Unless you're trying to implement something like Fifa 2009, it's best to stick to the simple methods until you have a demonstrable efficiency problem. No need to give yourself headaches prematurely.
For whole object transformations, you typically transform the model view matrix.
glPushMatrix();
// do gl transforms here and render your object
glPopMatrix();
For loading objects, you'll even need to come up with some format or implement something that can load mesh formats (obj is one of the easiest formats to support). There are high-level libraries to simplify this but I recommend going with OpenGL for the experience and control that you'll have.
I'd hoped the OpenGL API might be easy to navigate via the IDE support (intellisense and such). After a few hours it became apparent that some ground rules need to be established. So I stopped typing and RTFM.
http://www.glprogramming.com/red/
Best advice I could give to anyone else who finds this question when finding their OpenGL footing. A long read, but empowering.