Why is WebGL render speed so inconsistent?

In my application I plot about 8 million vertices with a single call to WebGL's gl.drawArrays() in LINE_STRIP mode. I don't actually want one long line; I want about 200k short lines, so I cap all the short lines with extra vertices and tell the vertex shader to "push" the line caps into negative z to create invisible bridges. The rendering is quasi-static (the user can click various things that trigger a re-render), so it doesn't have to be super-fast, but I'd really hoped it would take less than 200ms on modernish computers.
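[For illustration, one alternative way to hide the bridge segments (instead of the negative-z push) is to flag the extra cap vertices with an attribute and discard their fragments. This is a sketch, not the asker's actual shader; aPosition, aIsCap and uColor are invented names:]

    // If each cap vertex duplicates its neighbour's position, the real-to-cap
    // segments have zero length and rasterize nothing, and the only segments
    // with vIsCap near 1.0 along their whole span are the cap-to-cap bridges.
    var vsSource =
      'attribute vec3 aPosition;\n' +
      'attribute float aIsCap;\n' +    // 1.0 on cap vertices, 0.0 elsewhere
      'varying float vIsCap;\n' +
      'void main() {\n' +
      '  vIsCap = aIsCap;\n' +
      '  gl_Position = vec4(aPosition, 1.0);\n' +
      '}';

    var fsSource =
      'precision mediump float;\n' +
      'varying float vIsCap;\n' +
      'uniform vec4 uColor;\n' +
      'void main() {\n' +
      '  if (vIsCap > 0.99) discard;\n' +  // drop bridge fragments entirely
      '  gl_FragColor = uColor;\n' +
      '}';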
On my laptop [UPDATE: which runs Win7 on an Intel i7 CPU with an integrated HD Graphics 4000 GPU] I get around 100ms in Chrome, which is good. Oddly, though, Firefox takes around 1-2 seconds. On my Samsung Chromebook 550 I get anything from 600ms to 2s; often it starts quick and then subsequent renders get slower, but it can get faster too.
Questions:
What might be causing the change in render speed on my Chromebook?
Why is Firefox so much slower than Chrome on my laptop?
Is it worth spending ages trying to make it run faster (i.e. can I expect much improvement)? Any tips?
Notes:
For the Chromebook repeated rendering tests, the only thing happening between renders is a uniform is changed to toggle between color palettes (implemented as textures). Chrome dev tools doesn't seem to think there are any major changes in the page's memory usage during the testing.
I'm using gl.finish() and console.time() to see how long the rendering takes.
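[For reference, a minimal sketch of that timing pattern; gl and vertexCount stand in for the app's own state:]

    console.time('render');
    gl.drawArrays(gl.LINE_STRIP, 0, vertexCount);
    gl.finish();  // wait for the GPU to actually finish before stopping the clock
    console.timeEnd('render');

    // Caveat: some implementations treat finish() as little more than flush();
    // reading a pixel back is a harder sync point if the numbers look suspicious:
    // gl.readPixels(0, 0, 1, 1, gl.RGBA, gl.UNSIGNED_BYTE, new Uint8Array(4));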
Except during debugging, I render to an orphaned canvas and then copy sections of the result to various small canvases on the page [UPDATE: using drawImage, with the WebGL canvas as the first argument]. This probably does take a bit of time, but the numbers reported above don't seem to change much with or without the copy operation, and with or without the WebGL canvas attached to the page body (and visible).
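[For reference, that copy step looks something like this; glCanvas, smallCanvas and the source rectangle are placeholders:]

    // Copy a region of the detached WebGL canvas into a small on-page 2D canvas.
    // This must run in the same event-loop turn as the render (or the context
    // needs preserveDrawingBuffer: true, since the drawing buffer may be
    // cleared after compositing).
    var ctx = smallCanvas.getContext('2d');
    ctx.drawImage(glCanvas,
                  sx, sy, sw, sh,                               // source rect
                  0, 0, smallCanvas.width, smallCanvas.height); // dest rect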
UPDATE: There is a limit to how many vertices my laptop will render in one go, but the limit seems to fluctuate from moment to moment; if you go over the limit then it doesn't render anything. The number is around the 8 million mark, but sometimes it's happy to go over 11 million. I've now set it to batch 2 million at a time. Interestingly, this seems to make my Chromebook go faster, but I can't be sure as it's so inconsistent.
UPDATE: I've disabled DEPTH_TEST and BLEND as I don't need them. I'm not convinced it made any difference.
UPDATE: I've tried rendering with POINTS instead of LINES. On my Chromebook it seemed to take about 1s with a point size of 0 (i.e. rendering nothing), and then around 1.5-2s as I increased the point size through 1, 2, and 5.
UPDATE: Keeping everything on the z=0 plane doesn't seem to change the speed much, maybe it goes a little slower (which I'd expect as there are a lot more pixels to get through the fragment shader, though the fragment shader is just funneling a varying straight into gl_FragColor).

Although the usual (good) advice is to render as much as possible in each draw call, some GPUs (at least one that I know of) have internal buffers used while processing vertex data. Exceeding the capacity of these buffers can make your performance fall off a cliff. Reduce the size of your vertex batches until you start to see a performance drop from the batches being too small.
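[A minimal sketch of that batch splitting, assuming the vertices already sit in one bound buffer; gl, vertexCount and the batch size are placeholders:]

    // Note: each LINE_STRIP batch boundary drops one connecting segment, so
    // align the boundaries with the invisible bridge caps (or overlap batches
    // by one vertex) to avoid visible gaps.
    var BATCH = 2000000;  // tune: shrink until small batches start costing more
    for (var first = 0; first < vertexCount; first += BATCH) {
      gl.drawArrays(gl.LINE_STRIP, first, Math.min(BATCH, vertexCount - first));
    }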

Related

Large gaps in chrome profiler flame chart

I am trying to speed up a slow three.js based app. Here's Chrome's flame chart for 2 frames:
I can see that my draw takes a long time, so that needs to be improved, fair enough. But what I do not understand is why there is such a lot of white space, during which nothing seems to be happening (and in particular, there is no GPU bar in that gap either). I am doing a requestAnimationFrame after every draw, and I'd expect that to get fired, not to wait for a full 250ms or so. Any way to find out what it is doing in these white gaps?

Is there a performance benefit to pre-rendering an HTML5 canvas circle?

I understand it's often faster to pre-render graphics to an off-screen canvas. Is this the case for a shape as simple as a circle? Would it make a significant difference for rendering 100 circles at a game-like framerate? 50 circles? 25?
To break this up, there are two slightly different aspects to what you're asking:
1) Is drawing a shape off-screen and putting it on-screen faster?
2) Is drawing a shape one time and copying it to 100 different places faster than drawing the shape 100 times?
The answer to the first one is "it depends".
That's a technique known as "buffering" and it's not really about speed.
The goal of buffering an image is to remove jerkiness from it.
If you drew everything on-screen, then as you loop through all of your objects and draw them, they're updating in real-time.
In the NES days that was normal, because there wasn't much room in memory or much power to do anything about it, and because programmers didn't know much better, given the limited instructions they had to work with.
But that's not really the way games do things, these days.
Typically, they call all of the draw updates for one frame, then they take that whole frame as a finished image, and paste that whole thing on the screen.
The GPU (and GL/DirectX) takes care of this, by default, in a process called "double-buffering".
It's a double buffer because there's the "in-progress" buffer used for the updates, as well as the buffer holding the final image from the last frame, which is what the monitor reads.
At the end of the frame processing, the buffers will "swap". The newly full frame will be sent to the monitor and the old frame will be overwritten with the new image data from the other draw calls.
Now, in HTML5 there isn't really access to the frame buffer, so we do it ourselves: make every draw call against an off-screen canvas. When all of the updates are finished (the image is stable), copy that whole image to the on-screen canvas.
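[A minimal sketch of that manual buffering; screenCanvas and drawScene are placeholders:]

    // Off-screen "back buffer": every individual draw call goes here first.
    var buffer = document.createElement('canvas');
    buffer.width = screenCanvas.width;
    buffer.height = screenCanvas.height;
    var bufferCtx = buffer.getContext('2d');
    var screenCtx = screenCanvas.getContext('2d');

    function renderFrame() {
      bufferCtx.clearRect(0, 0, buffer.width, buffer.height);
      drawScene(bufferCtx);               // all object draws happen off-screen
      screenCtx.drawImage(buffer, 0, 0);  // one copy shows the finished frame
    }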
There is a large speed-optimization in here, called "blitting", which basically copies over only the parts that have changed, and reuses the old image.
There's a lot more to it than that, and there are a lot of caveats, these days, because of all of the special-effects we add, but there it is.
The second part of your question has to do with a concept called "instancing".
Instancing is similar to blitting, but while blitting is about only redrawing what's changed, instancing is about drawing the exact same thing several times in different places.
Say you're painting a forest in Photoshop.
You've got two options:
Draw every tree from scratch.
Draw one tree, copy it, paste it all over the image.
The downside of the second one is that each "instance" of the image looks exactly the same.
If your "template" image changes colour or takes damage, then all instances of the image do, too.
Also, if you had 87 different tree variations for an 8000-tree forest, instancing them all would still be very fast, but it would take more memory, because you now need to keep 87 images around to reference on every draw call, instead of just one.
The upside is that it's still much, much faster.
To answer your specific question about X circles, versus instancing 1 circle:
Yes, it's still going to be a lot faster.
What a "lot" means, though, will change based on a lot of different things, because now you're talking about browsers on PCs.
How strong is the PC?
How good is the videocard?
How large is the canvas in software-pixels (not CSS pixels)?
How large are the circles? Do they have alpha-blending?
Is this written in WebGL or software?
If software, is the canvas compositing in hardware mode?
For a typical PC, you should still be able to hit 60fps in Chrome drawing 20 circles, I think (depending on what you're doing to them; just drawing them on-screen every frame is simple), so in this case the instances are still a "lot" faster, but it's not going to matter, because you've already passed the performance ceiling of Canvas.
I don't know that the same would be true on phones/tablets, or battery-powered laptops/netbooks, though.
Yes, transferring from an offscreen canvas is faster than even primitive drawings like an arc-circle.
That's because the GPU just copies the pixels from the off-screen canvas (not much CPU effort required).
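[To make that concrete, a minimal sketch of the pre-render-and-copy approach for the circles in the question; sizes and names are arbitrary:]

    // Pre-render one circle to a small off-screen canvas, once...
    var stamp = document.createElement('canvas');
    stamp.width = stamp.height = 32;
    var stampCtx = stamp.getContext('2d');
    stampCtx.beginPath();
    stampCtx.arc(16, 16, 15, 0, Math.PI * 2);
    stampCtx.fillStyle = 'teal';
    stampCtx.fill();

    // ...then each frame is just N cheap copies instead of N arc+fill paths.
    function drawCircles(ctx, positions) {
      for (var i = 0; i < positions.length; i++) {
        ctx.drawImage(stamp, positions[i].x - 16, positions[i].y - 16);
      }
    }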

How to measure complete all-in performance of DOM changes?

I've found lots of information about measuring load time for pages and quite a bit about profiling FPS performance of interactive applications, but this is something slightly different than that.
Say I have a chart rendered in SVG and every click I make causes the chart to render slightly differently. I want to get a sense of the complete time elapsed between the click and the point in time that the pixels on the screen actually change. Is there a way to do this?
Measuring the Javascript time is straight forward but that doesn't take into consideration any of the time the browser spends doing any layout, flow, paint, etc.
I know that Chrome's Timeline view shows a ton of good information about this, which is great for digging into issues but not so great for taking measurements, because the tool itself affects performance and, more importantly, it's Chrome-only. I was hoping there was a browser-independent technique that might work, something akin to how the Navigation Performance API works for page load times.
You may consider using HDMI capture hardware (just google for it) or a high-speed camera to create a video, which could be analyzed offline.
http://www.webpagetest.org/ supports capturing using software only, but I guess it would be too slow for what you want to measure.
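[For a rough, browser-independent approximation inside the page itself, one common trick is to start a timer on the click and stop it in a callback scheduled just after the next paint. A sketch, where chartContainer and updateChart are placeholders; note this measures when the browser believes it painted, not when the pixels on the glass actually change:]

    chartContainer.addEventListener('click', function () {
      var t0 = performance.now();
      updateChart();  // placeholder for the SVG re-render the click triggers
      requestAnimationFrame(function () {
        // rAF runs just before the next paint; a zero-delay timeout queued
        // from inside it runs just after that paint has been committed.
        setTimeout(function () {
          console.log('click-to-paint: ' + (performance.now() - t0) + 'ms');
        }, 0);
      });
    });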

Is there a trick to creating an animated gif of tv static that will allow it to be relatively small?

Apologies in advance, but this isn't really a photoshop question. Rather, I'm trying to come up with something that is convincing but exploits the compression and features of the gif format as best as possible to produce the smallest possible file for the animation.
Some constraints:
It needs to be at least 20 or 30 frames. I've tried with fewer (and since the frames are largely incompressible, 15 frames is half the size of 30, generally speaking)
Size needs to be no less than about 256x192
It doesn't need to be color though, nor even full grayscale. I've seen convincing stills with as few as about 16 grays
It can have a pattern, but not one that is instantly obvious to the human eye. If someone takes a single frame and after a minute or two can spot the pattern (which makes it compressible?), that's ok
Frames 2 through n can use quite a bit of alpha, but when I started using big horizontal stripes of alpha, it was instantly noticeable to my eyes. So you don't get to rack up a bunch of RLE with the easy cheat.
All of the above and still needs to look good at 30-33ms frame speed. No variable speed or relying on anything significantly faster than that.
Also acceptable: an apng that complies with the above constraints. Possibly even mpeg, if you can come up with that (I'm ignorant of how the DCT does its magic).
Ideally I could get something down in the 250kbyte range, but I'd settle for anything significantly smaller than the 9 meg monstrosity I cooked up last week.
Oh, and one last thing: obviously I don't expect anyone to supply the graphic for me. I'm just looking for some trick(s) that will let me get there myself eventually.
This is a very interesting question.
Static (random noise) is by its nature highly incompressible. Information theory says that true noise basically cannot be compressed, and the more patterns something contains, the more compressible it becomes (to the point of a solid run of 1s or 0s being perfectly compressible).
The ideal would be to create a true noise generator (just random numbers), but that doesn't help within the constraints of your problem.
The best thing I can think of is storing a number of small tiles of static and displaying them in staggered fashion to prevent the eye catching on to any patterns. Aside from that, you won't have much luck compressing this below 256 x 192 x 20 / 2 bytes, or about 500 kilobytes (assuming 20 frames at 256 x 192 with 4-bit color depth).
Simply encoding your animated gif in 16 color mode should get you to that point.
Well, this is old but still has no accepted answer, so:
1) Create the NoSignal image data. If it is not obvious how, read this: NoSignal in asm and C++
2) Encode it into a GIF.
I played with this a bit: I used a resolution of 320x240, and the lowest usable bit depth is 3 bits per pixel (lower does not look good), with a single global palette only (obviously). Here is a 300KB example.
[Notes]
If this is just for some app, then generate the image on the fly; it is really just a few lines of code (see the linked answer in bullet #1), sketched below.
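[A sketch of that "few lines of code" version on an HTML5 canvas; canvas is a placeholder, and the linked answer builds its own palette instead:]

    // Generate one frame of noise directly into a canvas-sized ImageData.
    var ctx = canvas.getContext('2d');
    function drawNoiseFrame() {
      var img = ctx.createImageData(canvas.width, canvas.height);
      for (var i = 0; i < img.data.length; i += 4) {
        var v = (Math.random() * 256) | 0;  // grayscale; quantize for fewer levels
        img.data[i] = img.data[i + 1] = img.data[i + 2] = v;
        img.data[i + 3] = 255;
      }
      ctx.putImageData(img, 0, 0);
    }
    setInterval(drawNoiseFrame, 33);  // ~30fps, matching the frame speed above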
Yes, you can achieve that with lossy GIF compression, or rather a specifically rigged compressor that outputs a noisy LZW stream.
A best-case scenario for LZW compression is to output X pixels, then X+1 pixels, then X+2 pixels, etc. It's easy to make that noisy.
Try screwing up the gfc_lookup function to (almost) always return longest dictionary item and compress series of noisy frames with it:
https://github.com/pornel/giflossy/blob/master/src/gifwrite.c#L270
Not easily, normally. Good randomness (high entropy) by definition does not compress well. Having it greyscale may help, but not much.
If you want to do this on a web page and you have (some) control, you can always write a very small bit of JS to help. If you can do that, then do the following:
Create a gif about 1.5x the size you need with high-entropy static.
Set the clipping to the size you want.
Then you randomly move it around by changing the starting offset.
As long as your offsets are a decent distance away from one another (and don't repeat patterns) it is usually difficult to discern it as movement, and it looks truly like static.
I did this trick about 20 years ago on an Amiga to emulate static in a limited-memory demo, and it worked remarkably well... it also does not require fast low-level code, as everything was done by changing offsets and the co-processor blitted the rest.
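[A sketch of that offset trick in a browser, assuming a 256x192 container with overflow:hidden holding an absolutely positioned noise image about 1.5x that size; the element id and "big-noise.gif" are placeholders:]

    // The container clips a ~384x288 noise image to 256x192; jumping (never
    // gliding) to a random offset each frame reads as fresh static.
    var img = document.getElementById('noise');
    setInterval(function () {
      // Stay within the spare 1.5x margin: 384-256 = 128, 288-192 = 96.
      img.style.left = -((Math.random() * 128) | 0) + 'px';
      img.style.top  = -((Math.random() * 96) | 0) + 'px';
    }, 33);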

Do browsers render canvas elements that are not inside the viewport?

I have a page that has pretty heavy (mid-weight, rather) canvas operations going on. To cater for users on mobile devices and older computers, I was thinking I could implement a mechanism that checks whether the canvas element is actually visible, and decides whether the constant calculations and canvas updates (an animation running at 30fps) have to be done or not.
This is working fine, yet when doing a performance test with the Chrome Dev Tools I noticed that even when I disable my visibility check and just let things render all the time, the CPU usage of the function in question drops quite a bit when no part of the canvas element(s) is visible (although in theory it should still be performing the same tasks). So: at least on my computer running Chrome 17, it does not make a real difference whether I check for the element's actual visibility.
To cut a long story short: Do I need to do this or are browsers smart enough to handle such a case without even telling them (and I can save the visibility checking)?
EDIT:
So I made some "research" on this topic and built this fiddle.
What happens is that it just generates noise at 30 frames per second. Not too pleasing to the eye but, well... The upper part is just a plain div to block the viewport. When I scroll down and have the canvas element in the viewport, the CPU profile tells me it's taking up about 40%, so apparently the browser does have quite a lot to do here. When I scroll back up so that I just have the maroon-colored div in my viewport and profile the CPU usage, it drops to something around 10%. When I scroll back down: usage goes up again.
So when I implement a visibility check like in this modified fiddle, I do see an increase (a tiny one to be honest) in CPU usage instead of a drop (as it has the additional task of checking if the canvas is inside the viewport).
So I am still wondering if this is some side effect of something that I am not aware of (or I am making some major mistake when profiling) or if I can expect browsers to be smart enough to handle such situations?
If anyone could shed a light on that I'd be very thankful!
I think you're confused between whether the logic is running and whether the rendering is happening. Many browsers now hardware-accelerate their canvases so all rendering happens on the GPU, so actual pixel pushing takes no CPU time anyway. However, your tick function has non-trivial code to generate random noise on the CPU, so you're only really concerned with whether the tick function is running. If the canvas is offscreen, it certainly won't be rendered to the display (it's not visible). As for the canvas draw calls, it probably depends on the browser. It could render all draw calls to an off-screen canvas in case you suddenly scroll it back into view, or it could just queue up all the draw calls and not actually do anything with them until you scroll the canvas into view. I'm not sure what each browser does there.
However, you shouldn't use setInterval or setTimeout for animating Canvas. Use the new requestAnimationFrame API. Browsers don't know what you do in a timer call so will always call the timer. requestAnimationFrame on the other hand is designed specifically for visual things, so the browser has the opportunity to not call the tick function, or to reduce the rate it's called at, if the canvas or page is not visible.
As for how browsers actually handle it, I'm not sure. However, you should definitely prefer it since future browsers may be able to better optimise requestAnimationFrame in ways they cannot optimise setInterval or setTimeout. I think modern browsers also reduce the ordinary timers to 1 Hz if the page is not visible, but it's definitely much easier for the browser to optimise requestAnimationFrame, plus some browsers get you V-syncing and other niceness with it.
I'm not certain requestAnimationFrame will mean your tick function is not called if the canvas is scrolled out of view, so I'd recommend using both requestAnimationFrame and the existing visibility check, as in the sketch below. That should guarantee you the most efficient rendering.
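[A minimal sketch of combining the two; isInViewport and update stand in for the fiddle's own code:]

    function tick() {
      if (isInViewport(canvas)) {   // the existing visibility check
        update();                   // the expensive noise generation and drawing
      }
      requestAnimationFrame(tick);  // the browser may throttle or pause this
    }                               // when the page itself is not visible
    requestAnimationFrame(tick);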
From my own experience, it renders whatever you tell it to render, regardless of position on screen.
For example, if you draw tiles that exceed the canvas size, you will still see the performance drop unless you optimize the script.
Try your function with a performance-demanding animation and see if you still get the same results.