How to force re-validation of cached resource?

I have a page /data.txt, which is cached in the client's browser. Based on data which might be known only to the server, I now know that this page is out of date and should be refreshed. However, since it is cached, they will not re-request it for a long time (until the cache expires).
The client is now requesting a different page /foo.html. How can I make the client's browser re-request /data.txt and update its cache?
This should be done using HTTP or HTML (not all clients have JS).
(I specifically want to avoid the "cache-busting" pattern of appending version numbers to the /data.txt URL, like /data.txt?v=2. This fills the cache with useless entries rather than replacing expired ones.)
Edit for clarity: I specifically want to cache /data.txt for a long time, so telling the client not to cache it is unfortunately not what I'm looking for (for this question). I want /data.txt to be cached forever until the server chooses to invalidate it. But since the user never re-requests /data.txt, I need to invalidate it as a side effect of another request (for /foo.html).

To expand my comment:
You can use If-Modified-Since and ETag, and to invalidate a resource that has already been downloaded you may take a look at the different approaches suggested in "Clear the cache in JavaScript" and "fetch(), how do you make a non-cached request?". Most of the suggestions there involve fetching the resource from JavaScript with a cache-bypassing option such as fetch(url, {cache: "no-store"}).
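As a minimal sketch (assuming the /data.txt path from the question; note that "no-store" bypasses the cache without updating it, while "reload" fetches a fresh copy and also writes it back into the cache, which is what the questioner wants):

// Fetch a fresh copy of the resource, bypassing the HTTP cache.
// cache: "reload" also stores the fresh response back in the cache,
// so later plain requests for /data.txt are served the new version.
fetch('/data.txt', { cache: 'reload' })
  .then((response) => response.text())
  .then(() => console.log('/data.txt cache refreshed'))
  .catch((err) => console.error('refresh failed:', err));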
Or you can try sending a Clear-Site-Data header, if your clients' browsers support it.
Or maybe fall back on the cache-busting solution just this once. And if it's possible for you, rename the file rather than adding a query string, as suggested in Revving Filenames: don’t use querystring.
Update after clarification:
If you are not maintaining a legacy implementation with users that already have /data.txt cached, the ETag and If-Modified-Since headers should help.
And for users who already have the cached version, you may redirect from /foo.html to /newFile.txt or /data.txt?v=1; the new requests will carry the newly added headers.

The first step is to fix your cache headers on the data.txt resource so it uses your desired cache policy (perhaps Cache-Control: no-cache in conjunction with an ETag for conditional validation). Otherwise you're just going to have this problem over and over again.
The next step is to get clients who have it in their cache already to re-request it. In general there's no automatic way to achieve this, but if you know they're accessing foo.html then it should be possible. On that page you can make an AJAX request to data.txt with the Cache-Control: no-cache request header. That should force the browser to bypass the cache and get a fresh version, and the cache should then be repopulated with the new version.
(At least, that's how it's supposed to work. I've never tried this, and I've seen reports here that browsers don't handle Cache-Control request headers properly.)
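A sketch of that request (given the caveat just mentioned about request-header support, the cache: 'reload' option shown earlier may be the more reliable way to express the same intent):

// Ask the browser to revalidate instead of serving its stored copy;
// the fresh response should repopulate the cache for /data.txt.
fetch('/data.txt', { headers: { 'Cache-Control': 'no-cache' } })
  .then((response) => response.text())
  .then(() => console.log('cache repopulated with a fresh /data.txt'));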

Related

Cache JSON for 1 second with Google CDN to reduce server load

I'm looking for a solution to reduce server load on my API. I thought it would be a good idea to cache the content for 1 second, so the server gets only 1 request each second instead of many. However, I wasn't able to make it work:
In Use origin settings based on Cache-Control headers mode: the Google CDN ignores the Cache-Control: public, max_age=1 header. It looks like I can't cache JSON in this mode, probably because of the missing Content-Range or Transfer-Encoding: chunked header?
In Force cache all content mode: it actually works, but the minimum TTL is 1 minute, which is just too much for me. I'd like to update the content every second.
I'm quite surprised no one uses short-lived caches for APIs, since it can theoretically reduce load and maybe save costs when used with external APIs (e.g. you don't need to pay twice if you request the same thing from an expensive API endpoint). Any ideas?
In order to cache an object, even if you are using the honor-origin cache mode, you need to have a valid Content-Range header. Using the force-cache-all mode will decorate the object with the cache-control directive Cloud CDN needs to cache the object, even if the origin doesn't send one: FORCE_CACHE_ALL basically takes whatever the origin pushes out and overwrites the cache-control directive.
If you are using the console UI, you are correct: it doesn't allow you to set a lower TTL value. However, you can set a lower TTL via gcloud commands:
gcloud compute backend-services update BACKEND_SERVICE_NAME --client-ttl=1 --default-ttl=1 --global
When using the "Use origin settings based on Cache-Control headers" cache mode, you can instruct Google Cloud CDN to cache a response for 1 second by including a Cache-Control: public, max-age=1 response header.
The problem with your earlier attempt appears to be a typo: the correct directive is max-age (with a hyphen), not max_age (with an underscore).
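For example, a minimal origin handler that emits the corrected header (a sketch using Node's built-in http module; the port and payload are made up):

const http = require('http');

http.createServer((req, res) => {
  // max-age=1 lets Cloud CDN (and browsers) reuse this response for one
  // second, collapsing many client requests into ~1 origin hit per second.
  res.setHeader('Cache-Control', 'public, max-age=1');
  res.setHeader('Content-Type', 'application/json');
  res.end(JSON.stringify({ now: Date.now() }));
}).listen(8080);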

<link rel=preload> with additional HTTP headers

I added a preload for my site like this: <link rel=preload href=http://api.github.com/... as=fetch crossorigin=anonymous>, which is fetch()ed later. It worked very well (the preload request was sent to the remote server at the very beginning of loading, the answer came back, everything was fine).
Later I added an Authorization: Bearer ... header to the fetch() call (for other reasons), which caused the preload's HTTP headers to no longer match the later fetch's HTTP headers, so the preload result is not re-used anymore (both Chrome and Firefox correctly notify me about this).
I also tried adding the preload with a Link HTTP header on the main page's response, but that did not help either.
So the current situation is this: I can't add the same Authorization header to the preload request because that is simply not possible, so the two requests will never match and the preload has become useless.
Please correct me and advise:
Is there a way to add that Authorization: Bearer ... to the preload request too?
OR is there a way to ask the browser to ignore the difference between the two requests' headers?
OR any other idea?
Web standards do not allow for such a possibility.
The Fetch standard defines the conditions under which prefetched resources may be used as follows:
To fetch, given a request request, […] run the steps below. […]
If all of the following conditions are true:
[…]
request’s unsafe-request flag is not set or request’s header list is empty
then:
Let foundPreloadedResource be the result of invoking consume a preloaded resource for [request]
Having an Authorization header in the fetch request disqualifies it from reusing preloaded resources. Unless you happen to know of a non-standard extension that allows you to bypass this, this means there is no way to prefetch a resource with custom headers.
There are three ways you can resolve this: skip the Authorization header in the request proper, give up on preloading entirely, or reimplement prefetching yourself. That is, inject a script that fetches the resource early during page loading, preferably gated by navigator.connection && !navigator.connection.saveData, and stores it in your own cache; then simply look up the data there.
I listed those solutions in what is, in my opinion, decreasing order of appropriateness. Prefetching has been designed mostly for the sake of static resources that present the same to any user who may want to download them; as such, an Authorization header is not supposed to matter, so if you can get away with avoiding it, do. If authorization does matter, then maybe the resource isn't such a great candidate for prefetching. If you insist, though, you can do it manually.
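A minimal sketch of that last option (the window.__apiPrefetch name and the token variable are made up; this assumes the token is already available when the page starts loading):

// Early in page load (e.g. an inline script in <head>): start the
// authorized request immediately and stash the in-flight promise.
// Skip the prefetch when the user has asked to save data.
if (!(navigator.connection && navigator.connection.saveData)) {
  window.__apiPrefetch = fetch('http://api.github.com/...', {
    headers: { Authorization: 'Bearer ' + token },  // token: assumed global
  });
}

// Later, where the data is needed (inside an async function or module):
// reuse the in-flight request if it exists, else fetch normally.
const response = await (window.__apiPrefetch ||
  fetch('http://api.github.com/...', {
    headers: { Authorization: 'Bearer ' + token },
  }));
const data = await response.json();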

Caching in HTTP requests: ETag vs max-age

I have a SPA which consumes some static assets from the backend server. For reasons, I picked ETag validation as the caching mechanism. In short, I want the browser to keep the assets in its cache forever, as long as the related ETags remain unchanged.
To signal the browser about caching, the Cache-Control header must be present in the responses. To me that's absolutely comprehensible, but what makes me confused is that I have to provide max-age in the header as well. In other words, Cache-Control: public doesn't work, whereas Cache-Control: public, max-age=100 is the correct header.
To me it sounds contradictory. The browser queries the server to see if an asset has changed using If-None-Match: {ETag} any time it asks for it. What's the role of max-age here then?
A resource cached in the browser with an ETag will still be requested each time. If, say, a *.js file changed on the server, the server will send the new version with a new ETag and the browser will refresh its cached copy.
But each of those conditional requests still performs a full network round trip of request and response, and that is quite expensive.
If you do expect that some file really may change at any time, then you have to use an ETag.
Cache-Control with max-age is a directive telling the browser not to even try to retrieve an updated version for the time specified by max-age. This is much more performant.
It is useful for static assets that probably won't change (e.g. a jquery-3.1.js file will always be the same), or where a slightly stale copy is not a big deal (e.g. style.css).
During development, when assets change often, Cache-Control is usually disabled.
But please be careful with the public modifier: it means that the resource may be cached on a proxy server (like Cloudflare) and shared between different users. If the resource contains private info (e.g. messages), users may see each other's data.
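To tie the two together for the questioner's goal (cache until the ETag changes), the usual recipe is Cache-Control: no-cache plus an ETag: the browser may store the asset indefinitely but must revalidate before every reuse, and the server answers 304 Not Modified while the tag still matches. A sketch with Node's built-in http module (the asset and hashing scheme are illustrative):

const http = require('http');
const crypto = require('crypto');

const body = 'body { color: teal }';  // the asset being served
const etag = '"' + crypto.createHash('sha1').update(body).digest('hex') + '"';

http.createServer((req, res) => {
  // no-cache = "store it, but revalidate with me before every reuse"
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('ETag', etag);
  if (req.headers['if-none-match'] === etag) {
    res.statusCode = 304;  // cheap: no body is sent
    res.end();
  } else {
    res.setHeader('Content-Type', 'text/css');
    res.end(body);         // full response only when the tag changed
  }
}).listen(8080);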

Chrome, Firefox caching 302 redirects

According to the HTTP spec, upon loading a resource that results in a 302 redirect:
...the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests. This response is only cacheable if indicated by a Cache-Control or Expires header field.
However, within a single page load, I'm seeing current Chrome and Firefox both resolving subsequent requests to the initial Request-URI to the resolved value from the first request, even when the redirect specifies no caching.
I've set up a minimal repro case here:
http://chrome-302-broke.herokuapp.com/test.html
It's on a free heroku dyno (in case you reach it while it's offline).
Am I missing something? Caching the redirect from the initial response, even within the same page load, seems to take liberties with the description from the spec. A strict interpretation shouldn't cache this request at all.
Especially with a growing number of web applications that don't navigate between pages for a considerable amount of time, this seems like it would cause problems for an increasing number of use cases.
Is this something I should submit as a bug to Chrome/Firefox?

Chrome is not sending if-none-match

I'm trying to make requests to my REST API. I have no problems with Firefox, but in Chrome I can't get it to work: the server always returns 200 OK, because no If-None-Match (or similar) header is sent with the request.
With Firefox I get 304 perfectly.
I think I'm missing something. I tried with Cache-Control: max-age=10 to test, but nothing changed.
One reason Chrome may not send If-None-Match is when the response includes an "HTTP/1.0" instead of an "HTTP/1.1" status line. Some servers, such as Django's development server, send the older status line (probably because they do not support keep-alive), and when they do so, ETags don't work in Chrome.
In the "Response Headers" section of the DevTools Network panel, click "view source" instead of the parsed version. The first line will probably read something like HTTP/1.1 200 OK. If it says HTTP/1.0 200 OK, Chrome seems to ignore any ETag header and won't use it on the next load of this resource.
There may be other reasons too (e.g. make sure your ETag header value is sent inside quotes), but in my case I eliminated all other variables and this is the one that mattered.
UPDATE: looking at your screenshots, it seems this is exactly the case (HTTP/1.0 server from Python) for you too!
Assuming you are using Django, put the following hack in your local settings file, otherwise you'll have to add an actual HTTP/1.1 proxy in between you and the ./manage.py runserver daemon. This workaround monkey patches the key WSGI class used internally by Django to make it send a more useful status line:
# HACK: without HTTP/1.1, Chrome ignores certain cache headers during development!
# see https://stackoverflow.com/a/28033770/179583 for a bit more discussion.
from wsgiref import simple_server
simple_server.ServerHandler.http_version = "1.1"
Also check that caching is not disabled in the browser, as is often done when developing a web site so you always see the latest content.
I had a similar problem in Chrome: I was using http://localhost:9000 for development, and Chrome did not send If-None-Match.
By switching to http://127.0.0.1:9000, Chrome¹ automatically started sending the If-None-Match header in requests again.
Additionally, ensure the DevTools > Network > Disable cache checkbox is unchecked.
¹ I can't find anywhere this is documented; I'm assuming Chrome was responsible for this logic.
Chrome is not sending the appropriate headers (If-Modified-Since and If-None-Match) because no cache mode is set, so the default is used (which is what you're experiencing). Read more about the cache options here: https://developer.mozilla.org/en-US/docs/Web/API/Request/cache.
You can get the desired behaviour on the server by setting the Cache-Control: no-cache response header, or on the browser/client through the cache: 'no-cache' fetch option.
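A sketch of the client-side variant (the /api/items URL is a placeholder; run it inside an async function or a module):

// 'no-cache' makes the browser revalidate with the server on every use:
// it sends If-None-Match / If-Modified-Since, and the server can answer
// 304 Not Modified instead of resending the whole body.
const response = await fetch('/api/items', { cache: 'no-cache' });
const items = await response.json();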
Chrome was not sending the If-None-Match header for me either, and I didn't have any cache-control headers. I closed the browser, opened it again, and it started sending If-None-Match as expected. So restarting your browser is one more thing to check if you have this kind of problem.