Allocate memory in GPU, flash/air - actionscript-3

This is more of an "implementation of technology" kind of question.
Back when I worked in C, you could choose whether bitmap structures were allocated in VGA memory or in RAM, and working with them in VGA memory was a lot faster.
Now it is 2013: when I create a bitmap in AS3, it is allocated in RAM (I've seen no option to use the GPU, and I'm sure it is using RAM in 100% of cases, because memory usage increases by exactly the expected bitmap size).
Is there any option to use GPU memory?
Thanks

Check out the API docs for flash.display3D.textures.Texture - there are three upload methods:
uploadCompressedTextureFromByteArray(data:ByteArray, byteArrayOffset:uint, async:Boolean = false):void
Uploads a compressed texture in Adobe Texture Format (ATF) from a ByteArray object.
uploadFromBitmapData(source:BitmapData, miplevel:uint = 0):void
Uploads a texture from a BitmapData object.
uploadFromByteArray(data:ByteArray, byteArrayOffset:uint, miplevel:uint = 0):void
Uploads a texture from a ByteArray.
So you can't allocate the memory directly in the GPU. You must upload data from a ByteArray or BitmapData, which first exists in RAM. However, to minimize CPU RAM usage, you could potentially reuse a single ByteArray or BitmapData in RAM, change its contents, and upload it many times, or release it after loading. But you can't access the contents of GPU memory directly, as far as I know.
As far as "read access" goes, the only way to get data back from GPU memory (again, a slow workaround) is to draw the Context3D back into a BitmapData via Context3D.drawToBitmapData... basically like a screen grab. The Starling Framework exposes this functionality via Stage.drawToBitmapData.
Basically, the Stage3D APIs weren't set up so you can easily access GPU memory.

You cannot allocate GPU memory manually like in other languages, but you can indeed accelerate your graphics using the GPU with different Adobe technologies.
For example, if you want GPU-accelerated video decoding you should be using StageVideo, and if you want to accelerate 2D or 3D graphics you can use Stage3D.
Unless you want to work with Stage3D in a low-level fashion, it is recommended that you use an intermediary framework.
For 2D the best solution is by far Starling. It is a solid framework endorsed by Adobe, used in countless commercial projects and constantly optimised.
As for 3D, take a look at Flare3D or Away3D.

Related

D3D texture format conversion

I have a D3D11 Texture2D with the format DXGI_FORMAT_R10G10B10A2_UNORM and want to convert it into a D3D11 Texture2D with a DXGI_FORMAT_R32G32B32A32_FLOAT or DXGI_FORMAT_R8G8B8A8_UINT format, as only those formats can be imported into CUDA.
For performance reasons I want this to operate fully on the GPU. I read some threads suggesting that I should set the second texture as a render target and render the first texture onto it, or convert the texture via a pixel shader.
But as I don't know a lot about D3D, I wasn't able to do it like that.
In an ideal world I would be able to do this without setting up a whole rendering pipeline including IA, VS, etc...
Does anyone have an example of this, or any hints?
Thanks in advance!
On the GPU, the way you do this conversion is a render-to-texture which requires at least a minimal 'offscreen' rendering setup.
Create a render target view (DXGI_FORMAT_R32G32B32A32_FLOAT, DXGI_FORMAT_R8G8B8A8_UINT, etc.). The restriction here is it needs to be a format supported as a render target view on your Direct3D Hardware Feature level. See Microsoft Docs.
Create a SRV for your source texture. Again, needs to be supported as a texture by your Direct3D Hardware device feature level.
Render the source texture to the RTV as a 'full-screen quad'. With Direct3D Hardware Feature Level 10.0 or greater, you can have the quad self-generated in the Vertex Shader, so you don't really need a Vertex Buffer for this. See this code.
Given you are starting with DXGI_FORMAT_R10G10B10A2_UNORM, you pretty much require Direct3D Hardware Feature Level 10.0 or better. That actually makes it pretty easy. You still need to get a full rendering pipeline going, although you don't need a 'swapchain'.
You may find this tutorial helpful.
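Putting the steps above together, here is a minimal, hedged sketch of the offscreen conversion pass in C++. It targets DXGI_FORMAT_R32G32B32A32_FLOAT (a UINT destination would additionally need a uint4 pixel-shader output), and it assumes you already have a device, an immediate context, the source texture, and compiled shader blobs for a full-screen-triangle vertex shader and a pass-through pixel shader (the HLSL is sketched in the comment; compiling it is not shown). The function name and parameters are illustrative, not from the original answer.

// Sketch only -- assumes a working device/context and compiled shader blobs.
// HLSL (compile separately, e.g. with D3DCompile; compilation not shown):
//
//   Texture2D    srcTex : register(t0);
//   SamplerState samp   : register(s0);
//   struct VSOut { float4 pos : SV_Position; float2 uv : TEXCOORD0; };
//   VSOut FullScreenVS(uint id : SV_VertexID) {
//       VSOut o;
//       float2 uv = float2((id << 1) & 2, id & 2);          // (0,0) (2,0) (0,2)
//       o.pos = float4(uv * float2(2, -2) + float2(-1, 1), 0, 1);
//       o.uv  = uv;
//       return o;
//   }
//   float4 ConvertPS(VSOut i) : SV_Target {
//       return srcTex.Sample(samp, i.uv);                   // UNORM read returns float4
//   }

#include <d3d11.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Illustrative helper (name and signature are not from the original answer).
ComPtr<ID3D11Texture2D> ConvertToFloatTexture(
    ID3D11Device* device, ID3D11DeviceContext* context,
    ID3D11Texture2D* srcTex, UINT width, UINT height,
    ID3DBlob* vsBlob, ID3DBlob* psBlob)
{
    // 1. Destination texture, bindable as a render target (and as an SRV so
    //    CUDA interop or later passes can read it).
    D3D11_TEXTURE2D_DESC dstDesc = {};
    dstDesc.Width            = width;
    dstDesc.Height           = height;
    dstDesc.MipLevels        = 1;
    dstDesc.ArraySize        = 1;
    dstDesc.Format           = DXGI_FORMAT_R32G32B32A32_FLOAT;
    dstDesc.SampleDesc.Count = 1;
    dstDesc.Usage            = D3D11_USAGE_DEFAULT;
    dstDesc.BindFlags        = D3D11_BIND_RENDER_TARGET | D3D11_BIND_SHADER_RESOURCE;
    ComPtr<ID3D11Texture2D> dstTex;
    device->CreateTexture2D(&dstDesc, nullptr, &dstTex);

    ComPtr<ID3D11RenderTargetView> rtv;
    device->CreateRenderTargetView(dstTex.Get(), nullptr, &rtv);

    // 2. SRV over the R10G10B10A2_UNORM source so the pixel shader can sample it.
    ComPtr<ID3D11ShaderResourceView> srv;
    device->CreateShaderResourceView(srcTex, nullptr, &srv);

    // 3. Shaders and a point-clamp sampler for the full-screen pass.
    ComPtr<ID3D11VertexShader> vs;
    ComPtr<ID3D11PixelShader>  ps;
    device->CreateVertexShader(vsBlob->GetBufferPointer(), vsBlob->GetBufferSize(), nullptr, &vs);
    device->CreatePixelShader(psBlob->GetBufferPointer(), psBlob->GetBufferSize(), nullptr, &ps);

    D3D11_SAMPLER_DESC sd = {};
    sd.Filter   = D3D11_FILTER_MIN_MAG_MIP_POINT;
    sd.AddressU = sd.AddressV = sd.AddressW = D3D11_TEXTURE_ADDRESS_CLAMP;
    ComPtr<ID3D11SamplerState> sampler;
    device->CreateSamplerState(&sd, &sampler);

    // 4. Bind and draw a single full-screen triangle; positions come from
    //    SV_VertexID, so no vertex buffer or input layout is needed.
    D3D11_VIEWPORT vp = { 0.0f, 0.0f, (float)width, (float)height, 0.0f, 1.0f };
    context->OMSetRenderTargets(1, rtv.GetAddressOf(), nullptr);
    context->RSSetViewports(1, &vp);
    context->IASetInputLayout(nullptr);
    context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    context->VSSetShader(vs.Get(), nullptr, 0);
    context->PSSetShader(ps.Get(), nullptr, 0);
    context->PSSetShaderResources(0, 1, srv.GetAddressOf());
    context->PSSetSamplers(0, 1, sampler.GetAddressOf());
    context->Draw(3, 0);

    return dstTex;   // now DXGI_FORMAT_R32G32B32A32_FLOAT
}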

TextureAtlas - how does it affect RAM usage in an open world game?

I plan to use TextureAtlas in my open world 2d game.
I need to load textures dynamically (there are thousands of them, so I cannot load them all at once). I plan to load only the textures that are needed at a specific moment in gameplay. Also, for various reasons I cannot have many texture atlases per map location.
Generally, I must avoid a situation where I read all textures (the entire atlas), because RAM usage would be too large. How does TextureAtlas work? Is it possible to keep the atlas open during the entire game, but read only chosen textures from the atlas (into RAM) when needed, without worrying about RAM usage?
Best regards.
You cannot load portions of a TextureAtlas. It is all or nothing. You will have to use multiple atlases and carefully plan what to put in each atlas such that you don’t have to load all of them simultaneously.
A Texture represents a single image loaded into GPU memory for use in OpenGL. A page of a TextureAtlas corresponds to a single Texture. Typically, you will have multiple TextureRegions on each page (Texture) of a TextureAtlas; if you don't, there's no point in using a TextureAtlas. The point of TextureAtlas is to avoid SpriteBatch having to flush vertex data and swap OpenGL textures for every sprite you draw. If you draw consecutive TextureRegions from the same Texture (or atlas page), they are batched together into a single mesh and OpenGL texture, so it performs better.
Libgdx doesn’t support loading individual pages of a TextureAtlas though. That would make it too complicated to use. So, if you are doing a lot of loading and unloading, I recommend avoiding multipage atlases. Use AssetManager to very easily load and unload what you need.

Does watching HD videos slow down my program using the CUDA CPU? [duplicate]

I'm trying to figure out if I can use OpenACC in place of normal CPU serial execution calls. Usually my programming is all about 3D graphics, or uses the GPU in some way, e.g. image processing or some other type of rendering that requires the use of shaders. I'm trying to figure out whether this library would benefit me or not.
The reason I ask is that if I'm rendering 3D graphics (as fast as possible), would OpenACC slow down that process in any way? Or can the GPU maintain its (in theory) high frame rates?
If so, what's the trade-off, and how much? I'm not willing to lose 3D graphics (display) performance to speed up operations that can be done serially on the CPU.
Edit:
This is a C++ context.
On the AMD and NVIDIA GPUs that I am familiar with, OpenACC programs make use of compute resources that would also be used to some degree by shader programs. Many other pieces of graphics hardware in a GPU are not shared between compute and graphics, but there are some shared resources. Likewise, the GPU may be connected to the system by PCIe, so this can also present a shared resource or contention point (although it's the rare compute or graphics program that would even come close to using up the bandwidth of a modern Gen3 x16 PCIe connection).
So if you were using both graphics (or compute) shaders and OpenACC acceleration, there would be contention for resources to some degree. The level of contention, or the trade-off, is not something I can generalize about. It will depend very much on the specifics of your program, and on the extent and detailed sequencing of the compute work and the graphics work.
GPU designers have these types of use-cases in mind, and so GPUs are generally pretty good at rapid context switching between the various tasks that may compete for resources.
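To make the contention point concrete, here is a generic, hypothetical OpenACC sketch in C++ (not taken from the question): the copyin/copy clauses move data over the PCIe link, and the parallel loop runs on the GPU's compute units, which are exactly the resources that concurrent graphics work also draws on.

// Hypothetical sketch of an OpenACC-offloaded loop (compile with an OpenACC
// compiler, e.g. nvc++ -acc; the pragma is ignored by plain C++ compilers).
void saxpy(int n, float a, const float* x, float* y)
{
    // The copyin/copy clauses move the arrays across PCIe; the loop itself
    // runs on the GPU's compute units. Both are resources that concurrent
    // graphics work (shaders, texture uploads) also uses.
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}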

Writing an HD Game in Stage3D

I'm trying to write a 2D game using Starling, and most textures are up to 4K, so I'm getting a resource limit exception.
Is there a way, an algorithm, or an idea for getting HD textures and using them within the limited texture resources of Stage3D? Compression?
Stage3D (which Starling uses underneath) has a max texture size of 2048x2048. If your textures are larger than this, then you are going to have to split them and stitch them together at runtime.
If you find yourself running out of memory (rather than the dimensional size limit) then you can look into compression using ATF textures.

CUDA: Is texture memory still useful to speed up access times for compute capability 2.x and newer?

I'm writing an image-processing app where I have to fetch pixel data in an uncoalesced manner.
Initially I implemented my algorithm using global memory. Later I reimplemented it using texture memory. To my amazement it became slower! I thought maybe something was wrong with the cudaMalloc/tex1Dfetch style, so I changed it to cudaArray/tex2D. Nothing changed.
Then I stumbled upon Shane Cook's "CUDA Programming", where he wrote:
As compute 1.x hardware has no cache to speak of, the 6–8K of texture memory per SM provides the only method to truly cache data on such devices. However, with the advent of Fermi and its up to 48 K L1 cache and up to 768 K shared L2 cache, this made the usage of texture memory for its cache properties largely obsolete. The texture cache is still present on Fermi to ensure backward compatibility with previous generations of code.
I have GeForce GT 620M (Fermi, compute cap. 2.1).
So I need some advice from professionals! Should I dig deeper into texture memory and its texture cache, trying to optimize performance? Or would I be better off sticking with global memory and the L1/L2 cache?
Textures can indeed be useful on devices of compute capability >= 2.0.
Textures and cudaArrays can store their data in a space-filling-curve layout, which can allow for a better cache hit rate thanks to better 2D spatial locality.
The texture cache is separate from the other caches, so it has its own dedicated memory and bandwidth, and reading from it does not interfere with the other caches. This can become important if there is a lot of pressure on your L1/L2 caches.
Textures also provide built-in functionality such as interpolation, various addressing modes (clamp, wrap, mirror), and normalized addressing with floating-point coordinates. These can be used without any extra cost and can greatly improve performance in kernels where such functionality is needed.
On early CUDA architectures, textures and cudaArrays could not be written by a kernel. On architectures of compute capability >= 2.0, they can be written via CUDA surfaces.
Determining if you should use textures or a regular buffer in global memory comes down to the intended usage and access patterns for the memory. It will be project specific.
You are using the Fermi architecture, with a device that has been rebranded into the 6xx series.
For those on the Kepler architecture, take a look at NVIDIA's Inside Kepler Presentation. In particular, the slides, Texture Performance, Texture Cache Unlocked and const __restrict Example.
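As an illustration of the texture path, here is a minimal CUDA C++ sketch using the texture object API with a cudaArray. Note that texture objects require compute capability 3.0 or later, so a Fermi device like the GT 620M would use the older texture reference API instead. The names copyThroughTexture and runExample are illustrative, and error checking is omitted.

// Sketch: reading a 2D image through the texture path with the texture object
// API (compute capability >= 3.0). Error checking omitted.
#include <cuda_runtime.h>

__global__ void copyThroughTexture(cudaTextureObject_t tex, float* out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h)
        out[y * w + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f); // texel centres
}

void runExample(const float* hostImg, float* devOut, int w, int h)
{
    // Pixel data goes into a cudaArray, which uses the GPU's block-linear
    // (space-filling) layout mentioned above.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &desc, w, h);
    cudaMemcpy2DToArray(arr, 0, 0, hostImg, w * sizeof(float),
                        w * sizeof(float), h, cudaMemcpyHostToDevice);

    // Describe the resource and the sampling behaviour: clamp addressing,
    // no filtering, raw element reads, unnormalised coordinates.
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = arr;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.filterMode = cudaFilterModePoint;
    texDesc.readMode = cudaReadModeElementType;
    texDesc.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);

    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    copyThroughTexture<<<grid, block>>>(tex, devOut, w, h);

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
}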