How to: Prevent Caching? - html

After symcbean's answer I decided to change my question to:
What's the correct way to cache only images/CSS/JS? HTML should never be cached by any web browser.

Go read some good books on the topic - or the specs. You're currently very badly informed.
A normal "trick" is to use:
Normal for whom? Setting Pragma: no-cache has nothing to do with what the browser caches. Setting Expires to -1 should prevent the current document from being cached - but it's an HTTP/1.0-only attribute, and HTTP/1.1 has been in widespread use for the past 8 years.
However this is a very expensive decision. The cost is to retrieve all the images, CSS and JavaScript files on every request.
No - the example you've given is an HTML tag, which can only occur in an HTML file. By default (i.e. in the absence of any specific caching directions) browsers "MAY" use a cached file - in my experience, it's only some mobile devices which cache so aggressively - but none of them implement the requirement to warn the user about this (see RFC 2616, section 13.1.5).
Caching instructions (and indeed all meta-data) should be sent in the HTTP headers - the META tags provide a surrogate mechanism in some cases.
Have a google for Mark Nottingham's caching tutorial - it's a good starting point, but only a starting point.

Configure your server to send the Pragma: no-cache and Expires: ... headers with HTML content. It's trivial to do with Apache in a .htaccess: just add a files section with a pattern that matches any .html file and set the headers there using mod_headers, or better yet mod_expires.
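For example, a minimal .htaccess sketch using mod_headers (the directives are standard Apache; the exact file patterns and cache lifetimes are assumptions, chosen to illustrate the HTML-vs-assets split from the revised question):

# Never serve HTML from cache without revalidation
<FilesMatch "\.html?$">
    Header set Cache-Control "no-cache"
    Header set Pragma "no-cache"
    Header set Expires "0"
</FilesMatch>

# Cache images/CSS/JS for 30 days
<FilesMatch "\.(css|js|png|jpe?g|gif|ico)$">
    Header set Cache-Control "max-age=2592000, public"
</FilesMatch>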

Related

How to force re-validation of cached resource?

I have a page /data.txt, which is cached in the client's browser. Based on data which might be known only to the server, I now know that this page is out of date and should be refreshed. However, since it is cached, the browser will not re-request it for a long time (until the cache expires).
The client is now requesting a different page /foo.html. How can I make the client's browser re-request /data.txt and update its cache?
This should be done using HTTP or HTML (not all clients have JS).
(I specifically want to avoid the "cache-busting" pattern of appending version numbers to the /data.txt URL, like /data.txt?v=2. This fills the cache with useless entries rather than replacing expired ones.)
Edit for clarity: I specifically want to cache /data.txt for a long time, so telling the client not to cache it is unfortunately not what I'm looking for (for this question). I want /data.txt to be cached forever until the server chooses to invalidate it. But since the user never re-requests /data.txt, I need to invalidate it as a side effect of another request (for /foo.html).
To expand on my comment:
You can use If-Modified-Since and ETag. To invalidate a resource that has already been downloaded, take a look at the different approaches suggested in "Clear the cache in JavaScript" and "fetch(), how do you make a non-cached request?"; most of the suggestions there involve fetching the resource from JavaScript with a no-cache directive, e.g. fetch(url, {cache: "no-store"}).
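A minimal sketch of that fetch-based approach, using the question's own /data.txt; note the difference between the two standard cache modes:

// "no-store" bypasses the HTTP cache entirely and does NOT update it.
fetch('/data.txt', { cache: 'no-store' });

// "reload" also bypasses the cache, but stores the fresh response,
// which is what you want if the goal is to repopulate the cache.
fetch('/data.txt', { cache: 'reload' });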
Or you can try sending a Clear-Site-Data header, if your clients' browsers support it.
Or maybe give up, this one time, and use the cache-busting solution. If it's possible for you, rename the file to something else rather than adding a querystring, as suggested in "Revving Filenames: don't use querystring".
Update after clarification:
If you are not maintaining a legacy implementation with users that already have /data.txt cached, the use of the ETag and If-Modified-Since headers should help.
And for the users with cached versions, you may redirect from /foo.html to /newFile.txt or /data.txt?v=1; the new requests will carry the newly added headers.
The first step is to fix your cache headers on the data.txt resource so it uses your desired cache policy (perhaps Cache-Control: no-cache in conjunction with an ETag for conditional validation). Otherwise you're just going to have this problem over and over again.
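A minimal server-side sketch of that first step, assuming Express (the route handler and currentData variable are hypothetical): no-cache lets the browser store /data.txt but forces a conditional revalidation before each reuse, and Express's default weak ETag turns unchanged content into cheap 304 responses.

const express = require('express');
const app = express();

app.get('/data.txt', (req, res) => {
  // Store, but revalidate with the server before every reuse.
  res.set('Cache-Control', 'no-cache');
  res.send(currentData); // hypothetical variable holding the file body
});

app.listen(3000);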
The next step is to get clients who have it in their cache already to re-request it. In general there's no automatic way to achieve this, but if you know they're accessing foo.html then it should be possible. On that page you can make an AJAX request to data.txt with the Cache-Control: no-cache request header. That should force the browser to bypass the cache and get a fresh version, and the cache should then be repopulated with the new version.
(At least, that's how it's supposed to work. I've never tried this, and I've seen reports here that browsers don't handle Cache-Control request headers properly.)
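A hedged sketch of that AJAX step (subject to the caveat above about inconsistent browser handling of Cache-Control request headers): from /foo.html, re-request /data.txt with a no-cache request header so the browser revalidates and repopulates its cache.

fetch('/data.txt', { headers: { 'Cache-Control': 'no-cache' } })
  .then((res) => res.text())
  .then(() => {
    // The HTTP cache should now hold the fresh copy of /data.txt.
  });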

How to force Cloudflare to cache an API which it keeps categorizing as CF-Cache-Status: DYNAMIC?

I am trying to cache an API GET request that produces JSON, using Cloudflare page rules: Cache Everything with Edge Cache TTL > 0.
No matter what I try, I am unable to get past CF-Cache-Status: DYNAMIC!
Referring to the documentation on page rules, I even tried a sample XML file, but the same issue persists.
Page rule:
https://.com:8443/cache_*
Cache Level: Cache Everything
Using CF Workers availed no result either. The API is being served by Spring Boot, and I have tried virtually every combination of headers. What am I doing wrong?
You'll want to read up on how to use the Cache-Control header, as this will enable you to control the caching behavior on file types that Cloudflare would not normally cache:
https://developers.cloudflare.com/cache/about/cache-control
There are some cases where Cloudflare will never cache the response (for your safety) however. For example, if the server is returning a Set-Cookie response header.
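For illustration, a response header sketch (the values are assumptions): s-maxage targets shared caches such as Cloudflare's edge, while max-age governs the browser, and there must be no Set-Cookie header on the response.

Cache-Control: public, max-age=60, s-maxage=600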
Oops! I was using an unsupported port. Apparently 8443 is NOT supported for caching.
This was pointed out by M4rt1n at cloudflare forum:
https://community.cloudflare.com/t/unable-to-cache-json-api-with-page-rules-stays-at-dynamic/322632/4
https://support.cloudflare.com/hc/en-us/articles/218411427 (only ports 80, 8080 & 443 are supported for caching).
I wish Cloudflare had simply said "8443 is not supported for caching" when I was trying to add the page rule with Cache Everything.

ETags for server-side rendered pages that contain CSP nonce

I have a server-side-rendered React app, and Node/Express has so far been able to generate correct, stable ETags, allowing me to take advantage of client-side caching.
Additionally, generated HTML contains fragments of render-blocking (above-the-fold) CSS and JS inlined as <script> and <style> tags for faster client-side first renders (as promoted by Google and its PageSpeed and Lighthouse tools).
Now I want to enable a Content Security Policy (CSP), and I provide a nonce as an attribute to those <script> and <style> tags on every page request, to avoid unsafe-inline violations. However, the ever-changing nonce makes the ETag change on every request as well. The HTML is never cached and every request hits my Express server.
Is there a way to simultaneously combine:
inlined CSS and JS,
CSP features (that is, a nonce or similar),
and ETags or alternatives?
So far, I see a contradiction between current performance and security guidelines.
Are there equivalents to CSP nonce or can CSP nonce be provided while keeping HTML intact? Is there a way to otherwise cache pages that contain CSP nonce?
Ideally, I would like a solution to be contained within the Express server, without resorting to tinkering with my reverse proxy config, but any options are welcome.
One solution is to leave the content generation and caching to the web application (Node in your case) and the CSP nonce generation to the front-end web server (e.g. Nginx). I have implemented this with Django, which does the page caching with ETags, handles all the Vary header logic etc., and produces HTML containing a static CSP nonce placeholder:
<script nonce="+++CSP_NONCE+++"> ... </script>
This placeholder is then filled in by Nginx using ngx_http_subs_filter_module:
sub_filter_once off;
sub_filter +++CSP_NONCE+++ $ssl_session_id;
add_header Content-Security-Policy "script-src 'nonce-$ssl_session_id'";
I have seen solutions using an additional Nginx module to generate a truly unique random nonce for each request, but I believe it's overkill; I'm just using the TLS session identifier, which is unique per connecting client and may be cached for some time (e.g. 10 minutes) depending on your Nginx configuration.
Just make sure the web application returns uncompressed HTML as Nginx won't be able to do string substitution.
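On the Express side, a minimal sketch of the same idea (the page content is hypothetical; the placeholder text just has to match the Nginx sub_filter pattern): because the emitted bytes are identical on every request, Express's default weak ETag stays stable, and Nginx substitutes the real nonce downstream.

const express = require('express');
const app = express();

const page = `<!doctype html>
<html><head>
<style nonce="+++CSP_NONCE+++">/* inlined critical CSS */</style>
</head><body>
<script nonce="+++CSP_NONCE+++">/* inlined bootstrap JS */</script>
</body></html>`;

app.get('/', (req, res) => {
  res.type('html');
  // The body is byte-identical across requests, so the auto-generated
  // weak ETag never changes. Do not compress here: Nginx's sub_filter
  // needs to see the placeholder in plain text.
  res.send(page);
});

app.listen(3000);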

What is the "Upgrade-Insecure-Requests" HTTP header?

I made a POST request to an HTTP (non-HTTPS) site, inspected the request in Chrome's Developer Tools, and found that it added its own header before sending it to the server:
Upgrade-Insecure-Requests: 1
After doing a search on Upgrade-Insecure-Requests, I can only find information about the server sending this header:
Content-Security-Policy: upgrade-insecure-requests
This seems related, but still very different since in my case, the CLIENT is sending the header in the Request, whereas all the information I've found is concerning the SERVER sending the related header in a Response.
So why is Chrome (44.0.2403.130 m) adding Upgrade-Insecure-Requests to my request and what does it do?
Update 2016-08-24:
This header has since been added as a W3C Candidate Recommendation and is now officially recognized.
For those who just came across this question and are confused, the excellent answer by Simon East explains it well.
The Upgrade-Insecure-Requests: 1 header used to be HTTPS: 1 in the previous W3C Working Draft and was renamed quietly by Chrome before the change became officially accepted.
(This question was asked during this transition, when there was no official documentation on this header and Chrome was the only browser that sent it.)
Short answer: it's closely related to the Content-Security-Policy: upgrade-insecure-requests response header, indicating that the browser supports it (and in fact prefers it).
It took me 30 minutes of Googling, but I finally found it buried in the W3C spec.
The confusion comes about because the header in the spec was HTTPS: 1, and this is how Chromium implemented it. But after this broke lots of websites that were poorly coded (particularly WordPress and WooCommerce), the Chromium team apologized:
"I apologize for the breakage; I apparently underestimated the impact based on the feedback during dev and beta."
— Mike West, in Chrome Issue 501842
Their fix was to rename it to Upgrade-Insecure-Requests: 1, and the spec has since been updated to match.
Anyway, here is the explanation from the W3 spec (as it appeared at the time)...
The HTTPS HTTP request header field sends a signal to the server expressing the client’s preference for an encrypted and authenticated response, and that it can successfully handle the upgrade-insecure-requests directive in order to make that preference as seamless as possible to provide.
...
When a server encounters this preference in an HTTP request’s headers, it SHOULD redirect the user to a potentially secure representation of the resource being requested.
When a server encounters this preference in an HTTPS request’s headers, it SHOULD include a Strict-Transport-Security header in the response if the request’s host is HSTS-safe or conditionally HSTS-safe [RFC6797].
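Put together, a hypothetical exchange following the spec's SHOULD (the host and path are placeholders):

GET /page HTTP/1.1
Host: example.com
Upgrade-Insecure-Requests: 1

HTTP/1.1 307 Temporary Redirect
Location: https://example.com/page
Vary: Upgrade-Insecure-Requests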
This explains the whole thing:
The HTTP Content-Security-Policy (CSP) upgrade-insecure-requests directive instructs user agents to treat all of a site's insecure URLs (those served over HTTP) as though they have been replaced with secure URLs (those served over HTTPS). This directive is intended for web sites with large numbers of insecure legacy URLs that need to be rewritten.
The upgrade-insecure-requests directive is evaluated before block-all-mixed-content and if it is set, the latter is effectively a no-op. It is recommended to set one directive or the other, but not both.
The upgrade-insecure-requests directive will not ensure that users visiting your site via links on third-party sites will be upgraded to HTTPS for the top-level navigation and thus does not replace the Strict-Transport-Security (HSTS) header, which should still be set with an appropriate max-age to ensure that users are not subject to SSL stripping attacks.
Source: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy/upgrade-insecure-requests
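In header form, the combination the MDN text recommends looks like this (the max-age value is illustrative):

Content-Security-Policy: upgrade-insecure-requests
Strict-Transport-Security: max-age=31536000; includeSubDomains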

"Cache-Control: max-age=0, no-cache" but browser bypasses server query (and hits cache)?

I'm using Chrome 40 (so something nice and modern).
Cache-Control: max-age=0, no-cache is set on all pages - so I expect the browser to only use something from its cache if it has first checked with the server and gotten a 304 Not Modified response.
However on pressing the back button the browser merrily hits its own cache without checking with the server.
If I open the same page, as I reached with the back button, in a new tab then it does check with the server (and gets a 303 See Other response as things have changed).
See the screen captures below showing the output for the two different cases from the Network tab of the Chrome Developer Tools.
I thought I could use max-age=0, no-cache as a lighter weight alternative to no-store where I don't want users seeing stale data via the back button (but where the data is non-valuable and so can be cached).
My understanding of no-cache (see here and here on SO) is that the browser must always revalidate all responses. So why doesn't Chrome do this when using the back button?
Is no-store the only option?
200 response (from cache) on pressing back button:
303 response on requesting the same page in a new tab:
From http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1
no-cache
If the no-cache directive does not specify a field-name, then a cache MUST NOT use the response to satisfy a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent caching even by caches that have been configured to return stale responses to client requests.
If the no-cache directive does specify one or more field-names, then a cache MAY use the response to satisfy a subsequent request, subject to any other restrictions on caching. However, the specified field-name(s) MUST NOT be sent in the response to a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent the re-use of certain header fields in a response, while still allowing caching of the rest of the response.
Contrary to what the name implies, no-cache does not require that the response must not be stored in a cache. It only specifies that the cached response must not be reused to serve a subsequent request without revalidating, so it's a shorthand for must-revalidate, max-age=0.
It is up to the browser what qualifies as a subsequent request, and to my understanding, using the back button does not. This behavior varies between browser engines.
no-store forbids the use of the cached response for all requests, not only for subsequent ones.
Note that even with no-store, the RFC actually permits the client to store the response for use in history buffers. That means the client may still use a cached response even when no-store has been specified.
The latter behavior covers cases where the page has been recorded with its original page title in the browser history. Another use case is the behavior of various mobile browsers, which will not discard the previous page until the following page has fully loaded, since the user might want to abort.
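In header form, the distinction between the two directives discussed above:

Cache-Control: no-cache   (store, but revalidate with the origin before each reuse)
Cache-Control: no-store   (do not write the response to any cache at all)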
For clarification on the behavior of the back button: it is not subject to any cache header, according to http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.13
User agents often have history mechanisms, such as "Back" buttons and history lists, which can be used to redisplay an entity retrieved earlier in a session.
History mechanisms and caches are different. In particular history mechanisms SHOULD NOT try to show a semantically transparent view of the current state of a resource. Rather, a history mechanism is meant to show exactly what the user saw at the time when the resource was retrieved.
By default, an expiration time does not apply to history mechanisms. If the entity is still in storage, a history mechanism SHOULD display it even if the entity has expired, unless the user has specifically configured the agent to refresh expired history documents.
That means that disrespecting any cache control headers when using the back button is the recommended behavior. If your browser happens to honor a backdated expiration date or applies the no-store directive not only to the browser cache but also to the history, it's actually already departing from that recommendation.
As for how to solve it:
You can't, and you are not supposed to. If the user is returning to a previously visited page, most browsers will even try to restore the viewport. You may use a deferred mechanism like AJAX to refresh content if that was the original behavior before the user left the page, but otherwise you should not even modify the content.
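If you do want such a deferred refresh, a hedged sketch (the endpoint and render function are hypothetical) using the pageshow event, whose persisted flag indicates the page was restored from the back/forward cache:

window.addEventListener('pageshow', (event) => {
  if (event.persisted) {
    // Page was restored from the back/forward cache:
    // re-fetch only the volatile data and re-render it.
    fetch('/data.json', { cache: 'no-store' })
      .then((res) => res.json())
      .then(renderFreshData); // hypothetical render function
  }
});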
Have you tried using the full old set of no-cache headers?
<meta http-equiv="cache-control" content="max-age=0" />
<meta http-equiv="cache-control" content="no-cache" />
<meta http-equiv="expires" content="0" />
<meta http-equiv="expires" content="Tue, 01 Jan 1980 1:00:00 GMT" />
<meta http-equiv="pragma" content="no-cache" />
This seems to work all the time, unless you are running a "pushState" web app.
Looks like this is a known 'quirk' in Chrome with using the back button. There is a good discussion of the issue in the bug report for it here:
https://code.google.com/p/chromium/issues/detail?id=28035
Sadly it looks like most people reverted to using no-store instead.
I expect, though, that most users are used to not getting a full page refresh when using the back button. Think of most Angular or Backbone apps, which manage the back action themselves so that you just refresh content and not the page. With this in mind, I suspect that having the customer refresh or get updates when they come back might not be that unexpected.