I'm trying to improve the FPS of my deferred renderer, and so I stumbled on chrome://tracing. I added console.time and console.timeEnd calls for every layer I render. You can see them here:
http://i.imgur.com/Rh5jfpN.jpg
My trace points are the topmost items starting with node/...; together they run for about 2ms. Everything, all the gl* calls, is done within these two milliseconds. My current frame rate is around 38 FPS, and one frame accounts for around 28ms. Here is another, zoomed-out picture that includes the GPU process:
http://i.imgur.com/sM4aAXB.jpg
You can still see the trace points there, as tiny bars at the top. Strangely, Chrome renders two frames very quickly, then runs these mysterious DoSwapBuffers / Onscreen tasks that halt rendering for about 25ms, plus a MakeCurrent task that overlaps with the second of the two fast consecutive frames and takes 15ms.
It seems to me that Chrome is deferring all the WebGL work until later, which makes any kind of profiling impossible. But that's just my guess. What should I make of this, and how can I profile my WebGL code?
Because the GPU runs independently of the main JS thread, it is not synchronized with the JS activity as long as no rendered data is fed back into the JS context. So issuing a WebGL command does not block JavaScript from going on, and the frame is only presented once the GPU has finished all operations issued by JS.
So it is impossible to see the actual computation time for any WebGL operation in this trace.
There is, however, a simple trick: synchronize the JS thread by pulling some rendering-dependent data back into the JS world. The easiest way to do so is to render some pixels of every intermediate texture to the 3D canvas, draw that canvas into a 2D canvas via the 2D context's drawImage method, and (just to be sure) read that canvas back with ctx.getImageData(). This method is quite slow and will add some time, but it also makes all WebGL draw operations visible to profiling/tracing.
Beware that this may also skew the results to some extent, as everything is forced to compute in order and the GPU driver cannot optimize independent draw operations to interleave with each other.
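To make this concrete, here is a minimal TypeScript sketch of the forced-readback idea, assuming a deferred renderer with a WebGL canvas canvas3d and a per-layer drawLayer function (both placeholder names of mine, not from the question):

// Assumed to exist in your renderer (hypothetical names):
declare const canvas3d: HTMLCanvasElement;        // the canvas with the WebGL context
declare function drawLayer(name: string): void;   // issues the gl.* calls for one layer

// Tiny 2D canvas used purely as a readback target.
const canvas2d = document.createElement('canvas');
canvas2d.width = canvas2d.height = 4;
const ctx = canvas2d.getContext('2d')!;

function profileLayer(name: string): void {
  console.time(name);
  drawLayer(name);
  // Copy a few pixels of the 3D canvas into the 2D canvas and read them back.
  // The getImageData call forces the GPU to finish the work issued above,
  // so console.timeEnd now includes the actual GPU time for this layer.
  ctx.drawImage(canvas3d, 0, 0, 4, 4);
  ctx.getImageData(0, 0, 1, 1);
  console.timeEnd(name);
}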
Related
I am making a VR animation using A-Frame (HTML). My animation has many 3D models. The problem is that when I run the animation it gives a low frame rate (15-20 FPS) and a high number of draw calls (230-240). Because of this, both the animation and the camera controls are lagging. So how can I increase the FPS and reduce the draw calls?
The number of draw calls sounds high, but not so high as to cause a frame rate drop as low as 15-20 FPS (though it depends a bit on the spec of the system you are running on).
As well as looking at ways to reduce draw calls, you might also want to reduce the complexity of the models you are using, or the resolution of the textures, and look into other possible causes of performance problems.
Some options:
reducing texture resolutions - just open the textures in a picture editor like Paint or GIMP and reduce the resolution. Keep textures to power-of-two resolutions where possible, e.g. 512 x 512 or 1024 x 1024.
reducing model complexity - look at decimation. This is best done outside the browser as a pre-processing step, with a 3D modelling tool such as Blender. It's also worth checking how many meshes are in each model, and whether those can be combined into a single mesh.
reducing draw calls - you need to either merge geometries or (if you are using the same model multiple times) use instancing; a rough sketch follows after this list. Some suggestions for instancing here: Is there an instancing component available for A-Frame to optimize my scene with many repeated objects?. Merging geometries will involve writing some JavaScript code yourself, but might be a better option if you don't have repeated geometries.
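To make the instancing option concrete, here is a rough three.js sketch in TypeScript (A-Frame is built on three.js, so you can reach the scene via the a-scene element's object3D). The box geometry, count and placement are placeholders for your own models:

import * as THREE from 'three';

// Placeholder: in A-Frame you can get the THREE.Scene from the <a-scene> element,
// e.g. document.querySelector('a-scene').object3D.
declare const scene: THREE.Scene;

// One InstancedMesh draws many copies of the same geometry in a single draw call.
const geometry = new THREE.BoxGeometry(1, 1, 1);          // stand-in for your model's geometry
const material = new THREE.MeshStandardMaterial();
const count = 200;
const instanced = new THREE.InstancedMesh(geometry, material, count);

const dummy = new THREE.Object3D();
for (let i = 0; i < count; i++) {
  dummy.position.set(Math.random() * 50, 0, Math.random() * 50);
  dummy.updateMatrix();
  instanced.setMatrixAt(i, dummy.matrix);                 // per-instance transform
}
instanced.instanceMatrix.needsUpdate = true;

scene.add(instanced);                                     // ~200 objects, 1 draw call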
If you haven't already done so, it's also worth reviewing this list of performance tips: https://aframe.io/docs/1.2.0/introduction/best-practices.html#performance
It could be something else on that list, e.g. raycasting, garbage collection issues etc., that's causing the problem.
Using the browser debugger to profile your code may give some further clues as to what's going on with performance.
Is there a way to directly test whether a CUDA device is currently in use by any kernels?
I have a background thread that launches "raw" CUDA kernels at full occupancy for a fractal program. The thread builds up large image arrays that I then want to let the user smoothly pan, rotate and zoom.
My GUI thread would like to use the GPU if it is not currently in use for the large image transformations since this runs at 100 fps. If the GPU is in use I can fall back to using CPU code instead at 10-20 fps.
If the GUI-thread GPU code is used when a background thread kernel is already running then the GUI-thread will freeze noticeably until the background kernel finishes. This freezing is what I'm seeking to eliminate by switching instead to CPU code for those frames. I've looked into interrupting the background kernel but solutions I've seen that do this add computational cost to the kernel and/or reset the context, both of which seem like overkill.
Is there a way to directly (asynchronously) detect whether the GPU is in use (by any kernel)? I suppose the GPU is always technically in use as a 2-D display driver, so excluding that activity of course.
My workaround would be to have a flag in my program which keeps track of whether all the kernels have completed. I would need to pass that flag between the two host threads and between the most nested objects within Model and View in my program. I started writing this and thought it was a bit of a messy solution and even then not always 100% accurate. So I wondered if there was a better way and in particular if the GPU could be tested directly at the point in the GUI thread that the decision is needed on whether to use GPU or CPU code for the next frame.
I'm using Python 3.7, with CuPy to access the GPU, but I would be willing to try to adapt a C++ solution.
I've looked in the docs, but with only basic knowledge of CUDA it feels like looking for a needle in a haystack:
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE
This is the solution I used, following help from @RobertCrovella.
import cupy as cp
stream_done: bool = cp.cuda.get_current_stream().done
if stream_done or worker_ready:
    ...  # use cupy to draw next frame
else:
    ...  # use numpy to draw next frame
Where worker_ready is a bool passed from the background GPU worker thread indicating its activity.
For stream_done, see the docs. In my program I'm only using 1 cuda stream, the (unspecified) default stream. Otherwise I imagine you would need to test each stream depending on the problem.
After a lot of testing I found that:
cp.cuda.get_current_stream().done is True in the background thread immediately after the kernel has run, but it can then become False at the point where I need to do the test, even though my code does not call the GPU between the True and the False states. I haven't been able to explain this behaviour, but I found I could not rely solely on stream_done. My testing suggests that: if stream_done is True at the point required, then it is always safe to use the GPU; if stream_done is False, it may or may not be safe to use the GPU.
I also have the background thread fire an event when it starts and stops; this event changes the worker_ready bool for the GUI thread. My testing showed worker_ready was more accurate than stream_done for determining whether the GPU could be used. In cases where stream_done was True and worker_ready was False, my testing showed the GPU code would still run quickly, presumably because the background thread was performing CPU work at that point in time.
So the best solution to the problem as I asked it was to use the GPU code if either condition is met. However, even this didn't remove the visual lag I was seeking to eliminate.
The problem I was trying to solve was that when a background process is running on the GPU and the user tries to pan, there is occasionally a noticeable lag of at least 0.5s. I attempted to quantify this lag by measuring the time from the mouse press to the panned image being displayed. The time measured was 0.1s or less, so no matter how fast the code runs after the mouse press, it cannot account for the lag, whether it uses the GPU or the CPU.
To me this implies that the mouse press event itself is delayed in firing when the GPU is occupied, presumably because the GPU is also running the display driver. I don't have any solid evidence of this beyond the following:
If the background thread does not run then the lag is removed.
Making the kernels orders of magnitude shorter did not reduce the lag at all.
Increasing the block_size to move away from full occupancy seemed to remove the lag most of the time, although it did not eliminate it completely.
I am making a racing game in libGDX. My game APK size is 9.92 MB and I am using four texture packs with a total size of 9.92 MB. My game runs fine on desktop but runs very slowly on an Android device. What is the reason behind this?
There are a few pitfalls that we tend to neglect while programming.
Desktop processors are far more powerful, so a game may run smoothly on desktop but slowly on a mobile device.
Here are some key notes which you should follow for optimum game flow:
No I/O operations in the render method.
Avoid creating objects in the render method.
Objects must be reused (for instance, if your game has 1000 platforms but the current screen can only display 3, then instead of making 1000 objects make 5 or 6 and reuse them). You can use the Pool class provided by libGDX for object pooling; a generic sketch of the pattern follows after this list.
Try to load only those assets which are necessary for the current screen.
Check your logcat to see whether the garbage collector is being called. If so, try using the finalize method of the Object class to find which classes' objects are being collected as garbage, and improve on that.
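As a rough illustration of the reuse point above, here is a minimal generic object pool, written in TypeScript just to show the pattern; in libGDX you would normally use the built-in Pool/Poolable classes rather than rolling your own:

// A tiny generic pool: obtain() reuses a freed object when one is available,
// otherwise it creates a new one; free() hands an object back for later reuse.
class SimplePool<T> {
  private readonly freeObjects: T[] = [];

  constructor(
    private readonly create: () => T,
    private readonly reset: (obj: T) => void,
  ) {}

  obtain(): T {
    return this.freeObjects.pop() ?? this.create();
  }

  free(obj: T): void {
    this.reset(obj);                 // clear state so the object is safe to reuse
    this.freeObjects.push(obj);
  }
}

// Hypothetical usage: a handful of platform objects recycled as they scroll off screen.
interface Platform { x: number; y: number; }
const platformPool = new SimplePool<Platform>(
  () => ({ x: 0, y: 0 }),
  (p) => { p.x = 0; p.y = 0; },
);
const platform = platformPool.obtain();   // instead of creating a new Platform every time
platformPool.free(platform);              // instead of letting it become garbage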
Good luck.
I've got some additional tips for improving performance:
Try to minimize texture bindings (or bindings in general, when you're making a 3D game for example) in your render loop. Use texture atlases and try to draw as much as possible with one texture after binding it, before binding another texture.
Don't display things that are not in the frustum/viewport. First calculate whether the drawn object can even be seen by the active camera; if it can't be seen, just don't send it to the GPU when rendering (a rough sketch of such a check follows after this list).
Don't call spritebatch.begin() or spritebatch.end() too often in the render loop, because every begin/end flushes the batch and uploads its data to the GPU.
Do NOT load assets while rendering, unless you do it in another thread.
The latest versions of libGDX also provide a GLProfiler, with which you can measure how many draw calls, texture bindings, vertices, etc. you have per frame. I'd strongly recommend this, since there can always be memory or computational overhead in places where you would not expect it.
Use libGDX Poolable (interface) objects and Pool for pooling objects and minimizing object-creation time, since creating objects can cause tiny but noticeable stutters in your render loop.
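As a rough illustration of the "don't draw what isn't visible" point, here is a simple on-screen test in TypeScript with made-up types; in libGDX you could do the equivalent with camera.frustum or a Rectangle overlap check:

// Hypothetical axis-aligned bounds for a drawable object and for the camera view.
interface Bounds { x: number; y: number; width: number; height: number; }

function overlaps(a: Bounds, b: Bounds): boolean {
  return a.x < b.x + b.width && a.x + a.width > b.x &&
         a.y < b.y + b.height && a.y + a.height > b.y;
}

// Only objects that intersect the view rectangle are drawn at all.
function drawVisible(objects: { bounds: Bounds; draw(): void }[], view: Bounds): void {
  for (const obj of objects) {
    if (overlaps(obj.bounds, view)) {
      obj.draw();
    }
  }
}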
By the way, without any additional information, no one's going to give you a good or precise answer. If you think it's not worth it to write enough text or information for your question, why should it be worth it to answer it?
To really understand why your game is running slow you need to profile your application.
There are free tools available for this.
On Desktop you can use VisualVM.
On Android you can use Android Monitor.
With profiling you will find exactly which methods are taking up the most time.
A likely cause of slowdowns is texture binding. Do you switch between different pages of packed textures often? Try to draw everything from one page before switching to another page; see the sketch below.
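For example, sorting your drawables by atlas page once per frame keeps the number of texture switches down to the number of pages rather than the number of sprites. A TypeScript sketch of the idea, with a made-up Sprite type:

// Hypothetical sprite type: each sprite knows which atlas page its region lives on.
interface Sprite { page: number; draw(): void; }

function drawSortedByPage(sprites: Sprite[]): void {
  // Draw all sprites that share a page back to back, so the texture is only
  // rebound when the page actually changes.
  sprites.sort((a, b) => a.page - b.page);
  for (const sprite of sprites) {
    sprite.draw();
  }
}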
The answer is likely a little more than just "Computer fast; phone slow". Rather, it's important to note that your computer's Java VM is likely Oracle's very nicely optimized JVM, while your phone's Java VM is likely Dalvik, which, to say nothing else of its performance, does not have the same optimizations for object creation and management.
As others have said, libGDX provides a Pool class for just this reason. Take a look here: https://github.com/libgdx/libgdx/wiki/Memory-management
One very important thing in libGDX is to make sure that asset loading never happens in the render() method. Load your assets at the right times so that none of that work ends up in the render method.
Another very important thing is to keep your calculations independent of rendering, in the sense that your next frame should not have to wait for those calculations to finish!
These are the 2 major things I encountered when I was making the Snake game tutorial.
Thanks,
Abhijeet.
One thing I have found is that drawing is expensive. This means that if you are drawing offscreen items, you are wasting a lot of resources. If you just check whether items are on screen before drawing them, your performance improves by a lot, surprisingly.
Points to ponder (From personal experience)
DO NOT keep calling a function in the render method that updates something like the time or score on the HUD. Make these updates only when required, e.g. update the score ONLY when it increases (see the sketch below).
Make calls conditional (update only on a specific condition, not all the time).
E.g. calling/updating in the render method at 60 FPS means you update the time 60 times a second when it only needs to be updated once per second.
These points will have a huge effect on performance (thumbs up).
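A minimal sketch of the "only update when it changes" idea, in TypeScript with a hypothetical setHudText function standing in for whatever HUD/label API you use:

declare function setHudText(text: string): void;   // placeholder for your HUD label update

let lastShownScore = -1;

// Called every frame from the render loop; the HUD text is only rebuilt
// when the score has actually changed since the last frame.
function updateHud(score: number): void {
  if (score !== lastShownScore) {
    setHudText(`Score: ${score}`);
    lastShownScore = score;
  }
}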
You need to check the image sizes in your game. If your images are large, reduce their size using the following link: http://tinypng.org/.
It will help you.
I'm developing a flex game which is really jerky and not smooth at all on mobile devices.
I changed the application frameRate to 60 in my MXML file and it seems to run more smoothly (but not as smoothly as it should). Does this have any impact on performance?
Is there any other way to do this? I don't have long or complex operations; I mention this because I found some open-source libraries through which I can use async threads, but I read that this also has downsides.
I'm really confused because the only objects I have on stage are:
15 Image objects, each one with a Move object attached and an OnClick listener.
4 timers that repeat every 500 ms, 1 second, 2 seconds and 5 seconds.
The longest operation in the listeners and timers is O(n) where n = image count = 15, but most of them are O(1).
All the objects are created on view creationComplete event and I reuse them throughout the entire time.
Memory is managed correctly, I checked using memory profiler.
Can you point me in some directions?
I'm making a game in AS3 that requires a huge amount of bullets to be fired sequentially in an extremely short amount of time. For example, at a certain point, I need to fire one bullet, every 1-5 millisecond, for about 1 second. The game runs (smoothly) at 60 FPS with around 800+ objects on screen, but the timers don't seem to be able to tick faster than my framerate (around once every 16 milliseconds). I only have one enterFrame going, which everything else updates from.
Any tips?
The 16 milliseconds sounds about right... According to the docs, the Timer has a resolution no smaller than 16.6 milliseconds:
delay:Number — The delay between timer events, in milliseconds. A delay lower than 20 milliseconds is not recommended. Timer frequency is limited to 60 frames per second, meaning a delay lower than 16.6 milliseconds causes runtime problems.
I would recommend that you create x objects (bullets) off-screen, at different offsets, on each tick to get the required number of objects within 1 second. This assumes that your context allows off-screen enemies to shoot.
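For example, an accumulator-based spawner along these lines (a TypeScript sketch with a made-up spawnBullet function; the same idea translates directly to an AS3 enterFrame handler):

declare function spawnBullet(offsetMs: number): void;  // placeholder for your bullet creation

const BULLET_INTERVAL_MS = 3;     // desired gap between bullets (1-5 ms in the question)
let timeOwedMs = 0;               // time accumulated since the last spawned bullet

// Call once per frame with the elapsed time in milliseconds; spawns however many
// bullets "should" have been fired during that frame, each with a sub-frame offset
// so it can be positioned as if it had been fired partway through the frame.
function onFrame(deltaMs: number): void {
  timeOwedMs += deltaMs;
  while (timeOwedMs >= BULLET_INTERVAL_MS) {
    timeOwedMs -= BULLET_INTERVAL_MS;
    spawnBullet(timeOwedMs);
  }
}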
How can you possibly have 800+ objects on screen? Is each object a single pixel, or is the entire screen just filled? To be fair, I have a 1920x1080 screen in front of me, so each object could be 2 pixels wide and 2 pixels tall and, side by side, they still wouldn't quite fill the screen width (800 x 2 = 1600 pixels). I'm just curious why you would have such a scenario, as I've been toying with game development a bit.
As for the technical question: a Timer is not guaranteed to be triggered the moment its delay has expired (just some time after); it depends on how quickly the runtime can get around to processing the code for the timer tick. My guess is that having so many objects is exhausting the CPU (on *NIX systems use top in the console, or on Windows use Task Manager - is it maxing out a core of the CPU?). That can confirm or deny it; alternatively, turn off the creation/updating of your objects and see whether the timer itself then ticks at the correct rate. If either is true, it suggests the CPU is maxing out.
Consider using Stage3D to offload the object drawing to the GPU and free up the CPU to run your Timer. You may also want to consider a "game framework" like Flixel to help manage your resources, though I don't know whether it takes advantage of the GPU... I actually just Googled it and found an interesting post discussing it:
http://forums.flixel.org/index.php?topic=6101.0