Google CDN behavior when serving concurrent (chunked) requests - google-cloud-cdn

I'm trying to understand the Google CDN behavior in the following scenario:
Let's assume I have a backend service serving chunked HTTP data. For the sake of the explanation, let's assume that serving a single request takes up to 10s.
Now imagine that a file is requested through the CDN by a client A, and that this file is not currently cached in the CDN. The request goes to the backend service, which starts serving the file. Client A immediately starts receiving HTTP chunks.
After 5s, another client B requests the same file. I can envision 3 possible behaviors, but I can't figure out how to control this through the CDN configuration:
Option a: the CDN simply passes the request to the backend service, ignoring that half of the file has already been served and could already be cached. Not desirable, as the backend service is reached twice and serves the same data twice.
Option b: the CDN puts the second request on "hold", waiting for the first one to finish before serving client B from its cache (in that case, request B never reaches the backend service). OK, but still not great, as client B waits 5s before receiving any HTTP data.
Option c: the CDN immediately serves the first half of the HTTP chunks, then serves the remaining chunks at the same pace as request A. Ideal!
Any ideas on the current behavior? And what could we do to get option C, which is by far our preferred one?
Tnx, have a great day!
Jeannot

It is important to note that GFE historically cached only complete responses and stored each response as a single unit. As a result, the current behavior will follow option A. You can take a look at this help center article for more details.
However, with the introduction of Chunk caching, which is currently in Beta, large response bodies are treated as a sequence of chunks that can each be cached independently. Response bodies less than or equal to 1 MB in size can be cached as a unit, without using chunk caching. Response bodies larger than 1 MB are never cached as a unit. Such resources will either be cached using chunk caching or not cached at all.
Only resources for which byte range serving is supported are eligible for chunk caching. GFE caches only chunk data received in response to byte range requests it initiated, and GFE initiates byte range requests only after receiving a response that indicates the origin server supports byte range serving for that resource.
To be clear, once chunk caching reaches GA, you will be able to achieve your preferred option C.
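The eligibility rules above can be condensed into a small decision function. This is an illustration only, not actual GFE code; the 1 MB threshold and the reliance on byte-range support (signaled by the `Accept-Ranges` header) come from the passage above:

```javascript
// Illustrative sketch of the chunk-caching eligibility rules described
// above (a simplification, not Google's actual implementation).
const UNIT_CACHE_LIMIT = 1024 * 1024; // responses <= 1 MB cached as a unit

function cacheStrategy(headers) {
  const size = parseInt(headers['content-length'] || '0', 10);
  const supportsRanges = (headers['accept-ranges'] || '').toLowerCase() === 'bytes';
  if (size <= UNIT_CACHE_LIMIT) return 'unit';    // cached whole, no chunking
  if (supportsRanges) return 'chunked';           // large + range support: chunk caching
  return 'none';                                  // large, no range support: not cached
}
```

For example, a 5 MB response from an origin that advertises `Accept-Ranges: bytes` would fall into the `'chunked'` bucket, while the same response without range support would not be cached at all.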

Regarding your recent query: unfortunately, only resources for which byte range serving is supported are eligible for chunk caching at the moment. You can definitely create a feature request for your use case on the Google Issue Tracker.
The good news is that chunk caching with Cloud CDN is now in GA, and you can try the functionality anytime you wish.

Related

Caching in HTTP requests: ETag vs max-age

I have a SPA which consumes some static assets from the backend server. For reasons, I picked ETag validation as the caching mechanism. In short, I want the browser to keep the assets in its cache forever, as long as the related ETags remain unchanged.
To signal the browser about caching, the Cache-Control header must be present in the responses. To me that's absolutely comprehensible, but what confuses me is that I have to provide max-age in the header as well. In other words, Cache-Control: public doesn't work whereas Cache-Control: public, max-age=100 is the correct header.
To me it sounds contradictory. The browser asks the server whether an asset has changed using If-None-Match={ETag} every time it requests it. What's the role of max-age here then?
A resource cached in the browser with an ETag will still be requested every time. If it is a *.js file that changed on the server, the server sends a new version with a new ETag, and the browser refreshes its cached copy.
But each such check still performs a full network round trip of request and response, and that is quite expensive.
If you expect that a file really may change at any time, then you have to use ETag.
Cache-Control is a directive telling the browser not to even try to retrieve an updated version for the time specified by max-age. This is much more performant.
It is useful for static assets that probably won't change, e.g. a jquery-3.1.js
file will always be the same.
It also fits resources where serving a slightly stale copy is not a big deal, e.g. style.css.
During development, when assets change often, Cache-Control is usually disabled.
But please be careful with the public modifier: it means the resource may be cached on a proxy server (like CloudFlare) and shared between different users. If the resource contains private info, e.g. messages, users may see each other's data.
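How the two mechanisms interact on the server side can be sketched as follows. This is a minimal, framework-agnostic sketch (the function and the `resource` shape are hypothetical, not a specific library's API):

```javascript
// Sketch of ETag revalidation combined with max-age (names hypothetical).
// Within max-age, the browser serves from cache without any request at all.
// After max-age expires, it revalidates with If-None-Match; an unchanged
// ETag lets the server answer 304 with no body.
function buildResponse(resource, requestHeaders) {
  const headers = {
    'Cache-Control': 'public, max-age=100', // fresh for 100s, no request sent
    'ETag': resource.etag,
  };
  if (requestHeaders['if-none-match'] === resource.etag) {
    return { status: 304, headers, body: null }; // cheap: headers only
  }
  return { status: 200, headers, body: resource.body }; // full transfer
}
```

So max-age eliminates the round trip entirely while the copy is fresh, and the ETag makes the round trip cheap once freshness has expired.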

Understanding results from Chrome request timing

I am doing some profiling of my web app and I am puzzled about the results I am getting from Chrome's built-in request timing. To keep things simple, let's say I request a web page composed of:
The HTML.
The static files (js, css, etc.).
Data loaded using AJAX.
According to the timing provided by Chrome the main HTML loading time is like this:
So far, so good. If I understand it, this means my server takes around 7-8 secs to process the request and send the response. In fact, according to this info, the first byte arrives at 7.79 secs, so the browser cannot start processing the HTML and loading other resources until this time.
However, if I look at the timing for static resources and AJAX requests, I found something puzzling. For instance, this is the timing for the request for bootstrap.min.css:
Notice the request was queued at 21.6 ms, much earlier than the arrival of the first byte of the HTML. By the way, the file was retrieved from cache (from disk cache, according to Chrome), but how did the browser know the HTML was going to request that file before receiving the response?
In the case of the AJAX request we have something similar:
but this time it is even stranger, because to perform this request some JavaScript from the HTML has to be executed, yet the first byte had not been received when the AJAX request was queued!
How can this be possible? How am I supposed to interpret these results regarding the performance of the website (i.e. did the HTML really take 7 secs to load the very first byte)?

technical inquiry - HTML transfer of image from server to browser

When an image is uploaded from the client's machine to the client (browser), it requires the FileReader() API; thereafter a base64-encoded URL (say) of the image is sent in chunks to the server, where it needs to be reassembled. All of this is taken care of by the developer.
However, when an image is sent from the server to the client, only the path of the image on the server machine suffices; no chunking or encoding is required.
My questions are:
1. Does the server send the image in chunks to the HTML file? If it does not, how does sending full images not bottleneck the server's network? What would happen in the case of large video files?
2. In what form of binary data is the image sent to the client - base64 URL / ArrayBuffer / binary string / text / etc.?
3. If the server does send the image in chunks, who is doing the chunking, and the reassembly on the client thereafter?
Thanks.
HTML isn't really important here. What you care about are the transport protocols used - HTTP and TCP, most likely.
HTTP isn't chunked by default, though there are headers that allow it - byte-range requests, mostly used to support seeking in large files (e.g. PDF, video). Technically, this isn't really chunking - it's just infrastructure for partial downloads (i.e. "Give me data from byte 1024 to byte 2048.").
TCP is a stream-based protocol. From the programmer's point of view, that's all there is to it - no chunking. Technically, though, it will process your input data and send it as distinct packets that are reassembled in order on the other side. This is a matter of practicality - it allows you to manage data latency, streaming, packet retransmission etc. TCP handles the specifics during connection negotiation - flow control, window sizing, congestion control...
That's not the end of it, though. All the other layers add their own bits - their own ways to package the payload and split it as necessary, their own ways to handle routing and filtering, their own ways to handle congestion...
Finally, just like HTTP natively supports downloads, it supports uploading data as well. Just send an HTTP request (usually POST or PUT) and write data in a format the server understands - anything from text through base-64 to raw binary data. The limiting factor in your case isn't the server, the browser or HTTP - it's JavaScript. The basic mechanism is still the same - a request followed by a response.
Now, to address your questions:
The server doesn't send images to the HTML file. The HTML only contains a URL of the image[1], and when the browser sees a URL in the img tag, it will initiate a new, separate request just for the image data. It isn't fundamentally different from downloading a file from a link. As for the transfer itself, it follows pretty much exactly the same path as the original HTML document - HTTP headers, then the payload.
Usually, raw binary data. HTTP is a text-based protocol, but its payload can be arbitrary. There's little reason to use Base-64 to transfer image data (though in the past, there have been HTTP and FTP servers that didn't support binary at all, so you had to use something like Base-64).
The HTTP server doesn't care (with the exception of "partial downloads" mentioned above). The underlying network protocols handle this.
[1] Nowadays, there are methods to embed images directly in the HTML text, but their practicality varies depending on the image size, caching requirements etc.
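The embedding mentioned in footnote [1] usually means a base64 `data:` URI placed straight into the `img` tag. A minimal sketch (using Node's `Buffer` for the base64 step; in a browser you would read the bytes via `FileReader` or `btoa` instead):

```javascript
// Sketch: turning raw image bytes into a data: URI that can be embedded
// directly in HTML, instead of linking to a separate image URL.
function toDataUri(bytes, mimeType) {
  const base64 = Buffer.from(bytes).toString('base64');
  return `data:${mimeType};base64,${base64}`;
}

// The bytes would normally come from disk or the network; here, just the
// first three bytes of a GIF header ("GIF") for illustration:
const uri = toDataUri([0x47, 0x49, 0x46], 'image/gif');
// usable as <img src="data:image/gif;base64,...">
```

Note the base64 form is roughly a third larger than the raw binary transfer, which is one reason it only pays off for small images.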

Understanding Firebug's Net panel

I am trying to get the hang of analysing the performance of a web page using Firebug's Net panel.
The following screenshot shows an example of a google query. For the sake of this discussion I clicked twice, so some requests are cached.
So here are my questions:
1) What is happening between the end of the first request and the beginning of the next request (which is the third one)? In the same context: why does the third request start earlier than the second request?
2) The next 6 requests are served from the cache. The purple bar indicates waiting time, and I assumed this is the time the browser spends "waiting for the server to do something". So if the response comes from the cache, what exactly is the browser waiting for? Also: what could be the reason that the waiting time for a 4.4 KB response is longer (63 ms) than for a 126.3 KB response (50 ms)?
3) In the next request there is a fairly long green bar indicating the time spent receiving the response. How come this doesn't seem to be even roughly proportional to the size of the response?
4) The red vertical line indicates the load event. According to https://developer.mozilla.org/en-US/docs/Web/Events/load this means: "The load event is fired when a resource and its dependent resources have finished loading." In the timeline you can see that there are still a couple of requests performed after the load event. How come? Are they considered to be not dependent, and if so, why?
The response of the first request needs to be parsed to find out what else needs to be loaded. This parsing time causes the gap to the second request. See also my answer to a related question.
Responses coming from cache still have an associated network request, which returns the 304 HTTP status code. You can see the request and response headers as well as the response headers of the cached response when you expand the request.
In contrast to that, there are also responses served directly from a special cache called the Back-Forward Cache (or BFCache). These happen directly after the browser starts, when you have the option enabled to restore your tabs from the last session, and also when you navigate back and forth in the tab history.
This depends on the network connection speed and the response's size in the first place but also on how long the server takes to send the full response. Why that one request takes that long in comparison to the others can't be explained without knowing what happens on the server side.
The load event is fired when the page request is loaded including all its dependent resources like CSS, images, JavaScript sources, etc. Requests initiated after the load event are loaded asynchronously, e.g. through an XMLHttpRequest or the defer attribute of the <script> element.

Where to store and cache JSON?

Have been considering caching my JSON on Amazon CloudFront.
The issue with that is it can take 15 minutes to manually clear that cache when the JSON is updated.
Is there a way to store a simple JSON value in a CDN-like HTTP cache that -
does not touch an application server (Heroku) after initial generation
allows me to instantly expire the cache
Update
In response to AdamKG's point:
If it's being "updated", it's not static :D Write a new version and
tell your servers to use the new URL.
My actual idea is to cache a new CloudFront URL every time an HTML page changes. That was my original focus.
The reason I want the JSON is to store the version number of the latest CloudFront URL. That way I can make an AJAX call to discover which version to load, then a second AJAX call to actually load the content. This way I never need to expire CloudFront content; I just redirect the AJAX call that loads it.
But then I have the issue of the JSON needing to be cached. I don't want people hitting the Heroku dynos every time they want to see the single JSON version number. I know Memcache and Rack can help me speed that up, but it's a problem I just don't want to have.
Some ideas I've had:
Maybe there is a third-party service, similar to a Memcache DB, that allows me to expose a value at a JSON URL? That way my dynos are never touched.
Maybe there is an alternative to CloudFront that allows quicker manual expiration? I know that kind of defeats the nature of caching, but maybe there are more intermediary services, like a Varnish layer or something.
One method is to use asset expiration similar to the way that Rails static assets are expired. Rails adds a hash signature to filenames, so something like application.js becomes application-abcdef1234567890.js. Then, each time a user requests your page, if application.js has been updated, the script tag has the new address.
Here is how I envision you doing this:
User → CloudFront (CDN) → Your App (Origin)
User requests http://www.example.com/. The page has meta tag
<meta content="1231231230" name="data-timestamp" />
based on the last time you updated the JSON resource. This could be generated from something like <%= Widget.order(updated_at: :desc).pluck(:updated_at).first.to_i %> if you are using Rails.
Then, in your application's JavaScript, grab the timestamp and use it for your JSON url.
var timestamp = $('meta[name=data-timestamp]').attr('content');
$.get('http://cdn.example.com/data-' + timestamp + '.json', function(data, textStatus, jqXHR) {
  blah(data);
});
The first request to CloudFront will hit your origin server at /data-1231231230.json, which can be generated and cached forever. Each time your JSON is updated, the user gets a new URL to query on the CDN.
Update
Since you mention that the actual page is what you want to cache heavily, you are left with a couple options. If you really want CloudFront in front of your server, your only real option would be to send an invalidation request every time your homepage updates. You can invalidate 1,000 times per month for free, and $5 per 1,000 after that. In addition, CloudFront invalidations are not fast, and you will still have a delay before the page is updated.
The other option is to cache your content in Memcached and serve it from your dynos. I will assume that you are using Ruby on Rails or another Ruby framework based on your asking history (but please clarify if you are not). This entails getting Rack::Cache installed. The instructions on Heroku are for caching assets, but it works for dynamic content as well. Next, you would use Rack::Cache's invalidate method each time the page is updated. Yes, your dynos will handle some of the load, but it will be a simple Memcached lookup and response.
Your server layout would look like:
User → CloudFront (CDN) → Rack::Cache → Your App (Origin) on cdn.example.com
User → Rack::Cache → Your App (Origin) on www.example.com
When you serve static assets like your images, CSS, and JavaScript, use the cdn.example.com domain. This will route requests through CloudFront and they will be cached for long periods of time. Requests to your app will go directly to your Heroku dyno, and the cacheable parts will be stored and retrieved by Rack::Cache.