Why is an HTTP/2 multiplexing demo using multiple connections? - google-chrome

These days I'm evaluating HTTP/2 (on Nginx) as a possible candidate for boosting the performance of my application.
I was looking at this nice Akamai HTTP/2 demo. From this demo I can see that the "http2" part loads much faster, apparently thanks to the HTTP/2 multiplexing feature.
So, I decided to look a bit closer. I opened Chrome (version 51) developer tools and examined the Network panel.
I expected to see one single network connection handling all the requests (i.e. multiplexing).
However, I see multiple connections issued, one per image tile:
Moreover, I see that there is a delay ("stalled") for almost every request:
I expected that (contrary to HTTP/1) all requests would be issued in parallel without delays. Could someone help me understand what is going on?

What you see are not multiple connections, one per image tile, but multiple requests, one per image tile, on a single TCP connection.
The fact that they are multiplexed is evident from the large number of requests (tens or even hundreds) that are sent at the same time.
See how the requests are all aligned vertically.
Compare this with an HTTP/1.1 profile and you will see a ladder-like, ziggurat-style profile, because typically only 6 requests can be in flight at a time. See for example this presentation I gave at 39:54.
What you see is therefore the expected profile for a multiplexed protocol like HTTP/2.
The tiny "stalled" delay that you see for those requests may be due to internal implementation delays (such as queuing) as well as HTTP/2 protocol details such as flow control.
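As a rough illustration of what multiplexing means at the protocol level, here is a minimal Node.js sketch (not part of the demo) that issues many requests as separate streams over one HTTP/2 session. The origin, path and stream count are placeholders chosen for illustration.

```typescript
// A minimal sketch of HTTP/2 multiplexing using Node's http2 module:
// twenty GET requests are sent as separate streams over ONE TCP/TLS
// connection. The origin and path stand in for the demo's tile URLs.
import { connect } from "node:http2";

const session = connect("https://http2.akamai.com"); // the single connection

function fetchPath(path: string): Promise<number> {
  return new Promise((resolve, reject) => {
    const stream = session.request({ ":path": path }); // one HTTP/2 stream
    let bytes = 0;
    stream.on("data", (chunk) => (bytes += chunk.length));
    stream.on("end", () => resolve(bytes));
    stream.on("error", reject);
    stream.end(); // no request body
  });
}

// All requests are issued at once; on the wire they are interleaved as
// independent streams rather than queued behind each other.
const paths = Array.from({ length: 20 }, () => "/");

Promise.all(paths.map(fetchPath)).then((sizes) => {
  console.log(`fetched ${sizes.length} responses over a single connection`);
  session.close();
});
```

In Chrome's Network panel you can also right-click the column header and enable the "Connection ID" column; all the tile requests should show the same ID, confirming the single connection.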


Chrome Queueing Requests

Chrome Timing View
The image above shows that Chrome spends most of the time queueing up the request. I am trying to figure out why that is happening so that I can minimize it.
According to chrome developer documents:
A request being queued indicates that:
1. The request was postponed by the rendering engine because it's considered lower priority than critical resources (such as scripts/styles). This often happens with images.
2. The request was put on hold to wait for an unavailable TCP socket that's about to free up.
3. The request was put on hold because the browser only allows six TCP connections per origin on HTTP/1.
4. Time spent making disk cache entries (typically very quick).
Number 3 seems to be the most likely problem according to the Chrome developer documentation, but I know that only one request is going out at a time, so that can't be it. I don't think it is number 1 either, because the performance monitor doesn't show a lag from rendering. Maybe it is either 2 or 4, but I don't know how to test that.
Chrome Performance Monitor
I've included a picture of the performance monitor that shows these long tasks where something is happening in the system. These are also a mystery to me and seem related.
Any help is greatly appreciated!
Edit: It turns out you can disable the disk cache while DevTools is open, but that didn't fix the problem.
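One way to narrow this down without relying on screenshots is to log both the long main-thread tasks and the per-resource waiting time from the page itself. A rough sketch using PerformanceObserver; the 100 ms threshold is an arbitrary value for illustration.

```typescript
// Log long main-thread tasks and any resource that sat waiting for a long
// time before its request was actually sent.
const QUEUE_THRESHOLD_MS = 100;

new PerformanceObserver((list) => {
  for (const task of list.getEntries()) {
    console.log(`long task: ${task.duration.toFixed(0)} ms at ${task.startTime.toFixed(0)} ms`);
  }
}).observe({ entryTypes: ["longtask"] });

new PerformanceObserver((list) => {
  for (const entry of list.getEntries() as PerformanceResourceTiming[]) {
    // Time from being queued to the request bytes going out; this covers the
    // DevTools "Queueing" and "Stalled" phases plus DNS/connection setup.
    const waiting = entry.requestStart - entry.startTime;
    if (entry.requestStart > 0 && waiting > QUEUE_THRESHOLD_MS) {
      console.log(`${entry.name} waited ${waiting.toFixed(0)} ms before being sent`);
    }
  }
}).observe({ entryTypes: ["resource"] });
```

If the long tasks line up with the stretches where requests sit queued, the delay is coming from the main thread rather than the network stack.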

Chrome Devtools: Accessing values in the summary of the network tab

I am running performance tests on a webpage and I noticed that the Network tab of Chrome DevTools has a summary bar at the bottom with the number of requests, bytes transferred and finish time. I would like to console.log (or, even better, log to a file) these values at the end of each test.
I do see chrome.loadTimes(), which has the timing info. Is there something similar for retrieving the number of requests and bytes transferred?
Check out the Resource Timing API. It can give you a whole bunch of info on every resource that a page requests. And it's got good cross-browser support, so you can use it to collect some Real User Metrics on your page's load performance in the wild.
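There is no API for the summary bar itself, but you can approximate its numbers from Resource Timing entries. A sketch; the totals won't match DevTools exactly, since cached responses and cross-origin resources without Timing-Allow-Origin report a transferSize of 0.

```typescript
// A rough equivalent of the Network tab's summary bar, built from the
// Resource Timing API. The byte total is a lower bound compared with DevTools.
const resources = performance.getEntriesByType("resource") as PerformanceResourceTiming[];
const nav = performance.getEntriesByType("navigation")[0] as
  PerformanceNavigationTiming | undefined;

const requestCount = resources.length + (nav ? 1 : 0); // resources + the document itself
const bytes = resources.reduce((sum, r) => sum + r.transferSize, nav?.transferSize ?? 0);
const finishMs = Math.max(nav?.responseEnd ?? 0, ...resources.map((r) => r.responseEnd));

console.log(
  `${requestCount} requests | ${(bytes / 1024).toFixed(1)} kB transferred | ` +
  `finish: ${(finishMs / 1000).toFixed(2)} s`
);
```

Run it at the end of each test (or collect the values and POST them to your own logging endpoint) to record the same three figures the summary bar shows.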

Why does Chrome use Http/1.1 instead of H2 for some resources

We are in the process of enabling H2 for our website. During testing I am observing that some resources seem to be requested with HTTP/1.1 and most of the others with H2. Interestingly, the same resource, when requested via one path, seems to be using HTTP/1.1 and at another point seems to be using H2.
I am using Chrome version 58.0.3029.96 (64-bit) running on OSX Sierra, running in Incognito mode
Both the resources are requested from the same origin.
See below the screenshot from Chrome developer tools for the resources.
In addition, there are a few other resources that are also requested using HTTP/1.1. Any ideas as to why this is happening? Also, when switching from HTTP/2 to HTTP/1.1 the same connection seems to be reused; could this be causing a head-of-line blocking issue as well?
Any help here would be appreciated !!
I can't explain why HTTP/1.1 is used some of the time but not others based on the limited info you have given in your screenshots, as this should not happen.
Are you 100% sure they are both to the same origin? Are the resources perhaps being served from cache and were they cached under HTTP/1.1?
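To help with those checks, a quick sketch for the DevTools console that groups every loaded resource by the protocol it actually used and flags likely cache hits (note that transferSize is also 0 for cross-origin responses without Timing-Allow-Origin):

```typescript
// Group every resource on the page by the protocol it was fetched with.
// nextHopProtocol is "h2" for HTTP/2 and "http/1.1" for HTTP/1.1; a
// transferSize of 0 usually indicates a cache hit.
const byProtocol = new Map<string, string[]>();

for (const r of performance.getEntriesByType("resource") as PerformanceResourceTiming[]) {
  const key = r.nextHopProtocol || "unknown";
  const label = r.transferSize === 0 ? `${r.name} (cached?)` : r.name;
  if (!byProtocol.has(key)) byProtocol.set(key, []);
  byProtocol.get(key)!.push(label);
}

for (const [protocol, urls] of byProtocol) {
  console.log(protocol, urls.length, urls);
}
```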
On another point, why are you requesting the same resource twice within the same page load anyway? That seems wrong. Fair enough for data that changes (e.g. JSON requests), but I don't understand why you would load jQuery UI multiple times, or even the same CSS file, as you seem to be doing. It seems a very odd use case, and at the very least you should be caching the resource so it can be reused.
To your second question: under HTTP/2 the same connection is reused for the same origin (and, in certain cases, this effectively extends to other origins if a separate vhost on the same IP address uses the same HTTPS certificate). This does not lead to head-of-line blocking, as the HTTP/2 protocol has been specifically designed for this scenario and uses multiplexing to interleave requests.
However, this does change how the requests appear in dev tools, depending on the client, the server and the bandwidth. For example, let's say you have two resources that each take 5 seconds to download. Under HTTP/1.1 you would see:
Example 1
Request 1: start 0 seconds, end 5 seconds.
Request 2: start 5 seconds, end 10 seconds.
Total time to download could be calculated as: 5s + 5s = 10s
Overall Page load time would be 10 seconds
Under HTTP/2 you might see this (assuming first request was prioritised to be sent back in full first):
Example 2a
Request 1: start 0 seconds, end 5 seconds.
Request 2: start 0 seconds, end 10 seconds.
Total time **looks** like 5s + 10s = 15s
Overall Page load time would still be 10 seconds
Or alternatively, it might look like this if you have enough bandwidth to handle both requests in flight at the same time and the server responds to the second request one second later than the first:
Example 2b
Request 1: start 0 seconds, end 5 seconds.
Request 2: start 0 seconds, end 6 seconds.
Total time **looks** like 5s + 6s = 11s
Overall Page load time would be 6 seconds
The point is that both "look" slower under HTTP/2 if you try to sum the parts, even though the total time is the same for Example 2a, and in fact 4 seconds faster for Example 2b. You cannot compare individual requests on a like-for-like basis in Developer Tools between HTTP/1.1 and HTTP/2.
It's the same as comparing multiple HTTP/1.1 requests (browsers typically open 4-8 connections per host rather than just one), except that there is no overhead for opening and managing multiple connections under HTTP/2, since it's baked into the protocol. And there is no 4-8 limit under HTTP/2, though browsers and servers will often implement one (Apache defaults to 100, for example).
Saying all that, I still think there are a lot of optimisations to be done on both client and server to get the most out of HTTP/2. The internet has also optimised heavily for HTTP/1.1 and how it worked, so some of those things may need to be undone, or at least tweaked, to make the most of HTTP/2. For example, a page load typically loads HTML, then CSS, then images, which naturally leads to a priority order. Under HTTP/2 you can request all assets at the same time, but you really should prioritise the CSS over the images, for example. Most browsers do this, but do they do it in the most optimal way?
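One client-side lever for that prioritisation question is the fetchpriority attribute (Fetch Priority API), supported in Chromium-based browsers and recent Safari, so feature-check before relying on it. A sketch with placeholder URLs and selector:

```typescript
// Nudge request priority from the page using fetchpriority.
// Preload the critical stylesheet at high priority.
const link = document.createElement("link");
link.rel = "preload";
link.as = "style";
link.href = "/css/critical.css";
link.setAttribute("fetchpriority", "high");
document.head.appendChild(link);

// Demote a below-the-fold image so it doesn't compete with render-blocking
// resources on the shared HTTP/2 connection.
document
  .querySelector<HTMLImageElement>("img.below-the-fold")
  ?.setAttribute("fetchpriority", "low");
```

The same hint is available as a priority option on fetch() in browsers that support it; since HTTP/2 priority handling varies between servers, it is worth checking in DevTools that the hints actually change the request order.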

Chromium: is communicating with the page faster than communicating with a worker?

Suppose I've got the following parts in my system: Storage (S) and a number of Clients (C). The clients are separate Web Workers and I'm actually trying to emulate something like shared memory for them.
Right now I've got just one Client and it's communicating with the Storage pretty intensively. For the sake of testing it is spinning in a for-loop, requesting some information from the Storage and processing it (processing is really cheap).
It turns out that this is slow. I checked the process list and noticed chrome --type=renderer eating lots of CPU, so I thought that it might be redrawing the page or doing some kind of DOM processing after each message, since the Storage is running in the page context. So I decided to try moving the Storage to a separate Worker, so that the page is totally idle now, and ended up with even worse performance: exactly twice as slow (I tried a Shared Worker and a Dedicated Worker with explicit MessageChannels, with the same results).
So, here is my question: why is sending a message from one Worker to another Worker exactly twice as slow as sending a message from a Worker to the page? Are the messages routed through the page? Is it “by design” or a bug? I was going to check the source code, but I'm afraid it's a bit too complex and, probably, someone is already familiar with this part of Chromium's internals…
P.S. I'm testing in Chrome 27.0.1453.93 on Linux and Chrome 28.0.1500.20 on Windows.
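A self-contained sketch of how the comparison can be measured in a modern browser (results on current Chrome may well differ from the 2013 builds above). The Storage/Client naming mirrors the setup described in the question; the iteration count is arbitrary, and the workers are built from Blob URLs so no extra files are needed.

```typescript
// Measure postMessage round-trip time for (a) a worker talking to the page
// and (b) a worker talking to another worker, both over an explicit
// MessageChannel so the two cases are symmetric.
const clientSrc = `
  onmessage = (e) => {
    const { port, iterations } = e.data;
    let remaining = iterations;
    const start = performance.now();
    port.onmessage = () => {
      if (--remaining === 0) postMessage(performance.now() - start);
      else port.postMessage(remaining);
    };
    port.postMessage(remaining);
  };
`;
const echoSrc = `
  onmessage = (e) => {
    const port = e.data.port;
    port.onmessage = (ev) => port.postMessage(ev.data); // the "Storage" just echoes
  };
`;

const makeWorker = (src: string) =>
  new Worker(URL.createObjectURL(new Blob([src], { type: "text/javascript" })));

const ITERATIONS = 10_000;

function runCase(storageOnPage: boolean): Promise<number> {
  return new Promise((resolve) => {
    const client = makeWorker(clientSrc);
    const { port1, port2 } = new MessageChannel();
    client.onmessage = (e) => { client.terminate(); resolve(e.data as number); };

    if (storageOnPage) {
      port2.onmessage = (ev) => port2.postMessage(ev.data); // "Storage" on the page
    } else {
      makeWorker(echoSrc).postMessage({ port: port2 }, [port2]); // "Storage" in a worker
    }
    client.postMessage({ port: port1, iterations: ITERATIONS }, [port1]);
  });
}

(async () => {
  const viaPage = await runCase(true);
  const viaWorker = await runCase(false);
  console.log(`worker <-> page:   ${((viaPage / ITERATIONS) * 1000).toFixed(1)} µs per round trip`);
  console.log(`worker <-> worker: ${((viaWorker / ITERATIONS) * 1000).toFixed(1)} µs per round trip`);
})();
```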

Server Scalability - HTML 5 websockets vs Comet

Many Comet implementations like Caplin provide server scalable solutions.
Following is one of the statistics from Caplin site:
A single instance of Caplin liberator can support up to 100,000 clients each receiving 1 message per second with an average latency of less than 7ms.
How does this compare to HTML5 WebSockets on any web server? Can anyone point me to any HTML5 WebSocket statistics?
Disclosure - I work for Caplin.
There is a bit of misinformation on this page, so I'd like to try to make it clearer.
I think we can split the methods we are talking about into three camps:
Comet HTTP polling - including long polling
Comet HTTP streaming - server to client messages use a single persistent socket with no HTTP header overhead after initial setup
Comet WebSocket - single bidirectional socket
I see them all as Comet, since Comet is just a paradigm, but since WebSocket came along some people want to treat it as if it were different from, or a replacement for, Comet - but it is just another technique - and unless you are happy supporting only the latest browsers, you can't rely on WebSocket alone.
As far as performance is concerned, most benchmarks concentrate on server to client messages - numbers of users, numbers of messages per second, and the latency of those messages. For this scenario there is no fundamental difference between HTTP Streaming and WebSocket - both are writing messages down an open socket with little or no header overhead.
Long polling can give good latency if the frequency of messages is low. However, if you have two messages (server to client) in quick succession then the second one will not arrive at the client until a new request is made after the first message is received.
I think someone touched on HTTP KeepAlive. This can obviously improve Long polling - you still have the overhead of the roundtrip and headers, but not always the socket creation.
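To make the long-polling latency point concrete, here is a minimal client sketch; the /poll endpoint and the JSON payload are placeholders, and the server is assumed to hold the request open until it has data (or a timeout elapses).

```typescript
// A minimal long-polling client loop.
async function longPoll(onMessage: (msg: unknown) => void): Promise<void> {
  while (true) {
    const res = await fetch("/poll", { headers: { Accept: "application/json" } });
    if (res.status === 200) {
      onMessage(await res.json());
    }
    // Only here does the next request go out, so a second message produced
    // by the server in the meantime waits a full round trip before it can
    // be delivered - the latency gap described above.
  }
}

longPoll((msg) => console.log("server pushed:", msg)).catch(console.error);
```

Keep-alive means each iteration reuses the TCP connection, but the request and response headers are still paid every time around the loop.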
Where WebSocket should improve upon HTTP streaming is in scenarios where there are more client-to-server messages. Relating these scenarios to the real world creates slightly more arbitrary setups, compared to the 'send lots of messages to lots of clients' scenario which everyone can understand. For example, in a trading application, creating a scenario where you include users executing trades (i.e. client-to-server messages) is easy, but the results are a bit less meaningful than the basic server-to-client scenarios. Traders are not trying to do 100 trades/sec - so you end up with results like '10,000 users receiving 100 messages/sec while also sending a client message once every 5 minutes'. The more interesting part of the client-to-server message is the latency, since the number of messages required is usually insignificant compared to the server-to-client messages.
Another point someone made above was about 64k clients. You do not need to do anything clever to support more than 64k sockets on a server - other than configuring the number of file descriptors, etc. If you were trying to make 64k connections from a single client machine, that is totally different, as each one needs a port number - on the server end it is fine though, since that is the listening end, and you can go above 64k sockets just fine.
In theory, WebSockets can scale much better than HTTP but there are some caveats and some ways to address those caveats too.
The complexity of handshake header processing for HTTP vs WebSockets is about the same. The HTTP (and initial WebSocket) handshake can easily be over 1K of data (due to cookies, etc.). The important difference is that with HTTP the handshake overhead is paid again for every message. Once a WebSocket connection is established, the overhead per message is only 2-14 bytes.
The excellent Jetty benchmark links posted in @David Titarenco's answer (1, 2) show that WebSockets can easily achieve more than an order of magnitude better latency compared to Comet.
See this answer for more information on scaling of WebSockets vs HTTP.
Caveats:
WebSocket connections are long-lived unlike HTTP connections which are short-lived. This significantly reduces the overhead (no socket creation and management for every request/response), but it does mean that to scale a server above 64k separate simultaneous client hosts you will need to use tricks like multiple IP addresses on the same server.
Due to security concerns with web intermediaries, browser to server WebSocket messages have all payload data XOR masked. This adds some CPU utilization to the server to decode the messages. However, XOR is one of the most efficient operations in most CPU architectures and there is often hardware assist available. Server to browser messages are not masked and since many uses of WebSockets don't require large amounts of data sent from browser to server, this isn't a big issue.
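For reference, the masking described above is just a 4-byte XOR (RFC 6455, section 5.3). A sketch of the server-side unmasking step, with a made-up key:

```typescript
// Unmask a client-to-server WebSocket payload: each payload byte is XORed
// with one of the 4 masking-key bytes. This is the extra per-message work a
// server does for browser-to-server traffic; server-to-browser frames skip it.
function unmaskPayload(payload: Uint8Array, maskingKey: Uint8Array): Uint8Array {
  const out = new Uint8Array(payload.length);
  for (let i = 0; i < payload.length; i++) {
    out[i] = payload[i] ^ maskingKey[i % 4];
  }
  return out;
}

// Masking is its own inverse, so the same function applies the mask client-side.
const key = new Uint8Array([0x12, 0x34, 0x56, 0x78]);
const masked = unmaskPayload(new TextEncoder().encode("hello"), key);
console.log(new TextDecoder().decode(unmaskPayload(masked, key))); // "hello"
```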
It's hard to know how that compares to anything, because we don't know how big the (average) payload size is. Under the hood (i.e. in how the server is implemented), HTTP streaming and WebSockets are virtually identical - apart from the initial handshake, which is obviously more complicated when done with HTTP.
If you wrote your own WebSocket server in C (à la Caplin), you could probably reach those numbers without too much difficulty. Most WebSocket implementations are done through existing server packages (like Jetty), so the comparison wouldn't really be fair.
Some benchmarks:
http://webtide.intalio.com/2011/09/cometd-2-4-0-websocket-benchmarks/
http://webtide.intalio.com/2011/08/prelim-cometd-websocket-benchmarks/
However, if you look at C event lib benchmarks, like libev and libevent, the numbers look significantly sexier:
http://libev.schmorp.de/bench.html
Ignoring any form of polling, which, as explained elsewhere, can introduce latency when the update rate is high, the three most common techniques for JavaScript streaming are:
WebSocket
Comet XHR/XDR streaming
Comet Forever IFrame
WebSocket is by far the cleanest solution, but there are still issues in terms of browser and network infrastructure not supporting it. The sooner it can be relied upon the better.
XHR/XDR and Forever IFrame are both fine for pushing data to clients from the server, but require various hacks to work consistently across all browsers. In my experience these Comet approaches are always slightly slower than WebSockets, not least because a lot more client-side JavaScript code is required to make them work - from the server's perspective, however, sending data over the wire happens at the same speed.
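For comparison, this is roughly all the client-side code a WebSocket stream needs; the endpoint and the subscribe message are placeholders.

```typescript
// A complete WebSocket streaming client, for contrast with the XHR/XDR and
// Forever-IFrame hacks.
const ws = new WebSocket("wss://example.com/stream");

ws.onopen = () => ws.send(JSON.stringify({ subscribe: "prices" })); // hypothetical subscribe message
ws.onmessage = (event) => console.log("update:", event.data);
ws.onclose = (event) => console.log("closed:", event.code, event.reason);
ws.onerror = () => console.log("error - this is where a fallback to a Comet transport would go");
```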
Here are some more WebSocket benchmark graphs, this time for our product my-Channels Nirvana.
Skip past the multicast and binary data graphs down to the last graph on the page (JavaScript High Update Rate)
In summary - the results show Nirvana WebSocket delivering 50 events/sec to 2,500 users with 800 microseconds of latency. At 5,000 users (a total of 250k events/sec streamed) the latency is 2 milliseconds.