HTML - Cache control max age

I'd like to always present the latest website content to the user, but also have it load quickly. While researching, I came across posts suggesting using the cache to speed up loading.
So what do I need to add to my website to "overwrite" the cache after 3 days to display the latest content?

The Cache-Control header is used in HTTP 1.1 to control the behavior of caches. The max-age directive is used to specify (in seconds) the maximum age of the content before it becomes stale (i.e., the content will not change for some period of time). So if you know that your content will not change for 3 days, you want your server to add the following HTTP header:
Cache-Control: max-age=259200
(259200 = 60s x 60m x 24h x 3d)
To do that in PHP, call header() before any output is sent:
header('Cache-Control: max-age=259200');
Read here for more info on the header function:
http://php.net/manual/en/function.header.php
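The arithmetic behind the max-age value can be checked with a short sketch. Python is used here purely for illustration, and the helper name cache_control_max_age is hypothetical, not part of any library.

```python
SECONDS_PER_DAY = 60 * 60 * 24  # 60 s x 60 min x 24 h


def cache_control_max_age(days):
    """Return a Cache-Control header line for a lifetime given in whole days."""
    return "Cache-Control: max-age=%d" % (days * SECONDS_PER_DAY)


print(cache_control_max_age(3))  # Cache-Control: max-age=259200
```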

There is more than one way to do this, but you need to consider exactly what you need to cache and what you don't. The biggest speed increases will likely come from making sure your assets (CSS, images, JavaScript) are cached, rather than the HTML itself. You then need to look at various factors: how often do these assets change, and how will you force a user to download a new version of a file if you do change it?
Often, as part of a site's release process, updated files are given a new filename to force the user's browser to download them again, but this is only one approach.
You should take a look at Apache's mod_expires, and the ability to set expiry times for assets using the .htaccess file.
http://www.google.com/?q=apache+cache+control+htaccess#q=apache+cache+control+htaccess
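The renaming approach mentioned above could be sketched as follows. This is only one way to derive a new filename (from a hash of the file's content); the function name versioned_name and the 8-character hash length are assumptions made for the example.

```python
import hashlib
import os.path


def versioned_name(filename, content):
    """Insert a short content hash before the extension: app.js -> app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, ext = os.path.splitext(filename)
    return "%s.%s%s" % (stem, digest, ext)


old = versioned_name("app.js", b"console.log('v1');")
new = versioned_name("app.js", b"console.log('v2');")
# Changed content produces a different name, so the browser sees a new URL
# and cannot reuse the cached copy of the old file.
print(old, new)
```

Because the name only changes when the content does, such files can be served with a long max-age.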

As mentioned, Expires and Cache-Control headers are usually the best way to convey how long content remains valid.
Because clients are not very reliable at interpreting this information, many people prefer caching proxies such as Squid or Varnish. You also need to consider whether you want to cache only static content (images, stylesheets, ...) or dynamically generated content as well.

As per the YSlow recommendations you could configure your web server to add an Expires or a Cache-Control HTTP header to the response which will result in user agents caching the response for the specified duration.

Related

Do browsers not follow the HTTP spec's Cache-Control correctly?

I am somewhat new to web development and have noticed an issue: browsers seem not to respect the Cache-Control header. I have it set to no-cache, no-store, must-revalidate, yet many of my clients have a cached copy to begin with (which no-store should prevent according to https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control#no-store), and the cache is used rather than revalidating with the server. This leads to broken pages when I change a JS script referenced in a page; only after I tell them to refresh without cache does the browser fetch the new file. To be compliant with the HTTP protocol and spec, don't browsers need to respect the no-store policy? Or are none of the major browsers properly compliant, and if so, why haven't they been fixed so that we don't need workarounds like query strings appended to files, or using the file's hash or last modification date?
You initially served the resource without cache headers. In that case, the specification allows the client to choose the cache time itself:
Since origin servers do not always provide explicit expiration times,
a cache MAY assign a heuristic expiration time when an explicit time
is not specified, employing algorithms that use other header field
values (such as the Last-Modified time) to estimate a plausible
expiration time.
Different browsers will use different algorithms, but in any case it probably won't be very long. Your problem might have already resolved itself.
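As a rough illustration of such a heuristic: RFC 7234 mentions 10% of the time since Last-Modified as a typical setting. The sketch below assumes that fraction; real browsers use their own algorithms and limits, and the function name heuristic_freshness is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def heuristic_freshness(response_date, last_modified, fraction=0.1):
    """Estimate a freshness lifetime as a fraction of the time since Last-Modified."""
    age_at_response = response_date - last_modified
    if age_at_response < timedelta(0):
        # A Last-Modified in the future gives no usable estimate.
        return timedelta(0)
    return age_at_response * fraction


served = datetime(2020, 1, 11, tzinfo=timezone.utc)
modified = datetime(2020, 1, 1, tzinfo=timezone.utc)
# A file last changed 10 days before it was served is treated as fresh
# for roughly one day under this assumption.
print(heuristic_freshness(served, modified))
```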
As for query strings, I think your confusion comes from conflating at least three distinct issues. One is the HTTP protocol mechanism for communicating cache policies. That is covered in RFC 7234 and mainly involves the proper use of the Cache-Control response header.
A separate issue is what cache strategy to use. That is, which resources should be cached and for how long? There are different ways to approach this, my suggestion would be to follow the best practices discussed here.
Finally, there's how to fix your mistake if you communicated the wrong cache policy and now need an already-cached resource to be ignored or invalidated. In that case, if possible, you could just use a different resource (i.e. change the name). Adding query strings is sometimes suggested here, but it's not a great solution since the standard does not forbid clients from caching resources with query strings.
Getting back to your question, you can temporarily fix your mistake (missing Cache-Control headers) by changing the name of the linked resource, or just by waiting a short time for the heuristic expiration time to pass. Longer term, you should decide how you want your different resources to be cached, and then use Cache-Control to communicate that intent to the browser.

New content not visible because browser cache

I have updated my website with some new content.
I talked to some people to view the content on their computers,
but it seems like they can't see the content unless they delete their browser cache. Is there a way to handle this on my side, so all new things show up automatically in every browser?
There is no way to accomplish this without cleaning the cache.
You can, however, force a reload that bypasses the cache by pressing CTRL+SHIFT+R (at least in Firefox).
You cannot remote-wipe someone's cache. This time your only options are to wait, or tell your users to clear their cache, or instruct them to vigorously press refresh a few times, which will cause most browsers to refresh the page.
For future reference, there are two types of caching: expiration based caches and ETag caches.
If you set an explicit expiration date on your HTTP response, the client will not check back at all until that expiration date has passed. This greatly reduces network traffic, at the tradeoff of possibly having outdated content out there. Choose your expiration dates wisely for the best tradeoff.
The alternative is ETags, in which case the server sends an ETag token, and the client will inquire with "send me new content unless this token is still valid". This only reduces network traffic somewhat, but you're guaranteed to always have the latest content out there.
You need to balance your caching strategy in practice. First decide if you need caching at all, then decide how much you need and what tradeoff you're willing to make. For a high-traffic site even a cache of a few minutes can be worthwhile, while the issue of outdated content will be minuscule in this scenario.
You cannot erase a remote user's browser cache from server-side or client-side code.
Going forward, the best you can do is tell browsers not to reuse a cached copy without checking with the server first, or to cache for a specified time (less than the time until the next expected update):
Cache-Control: no-cache
Cache-Control: max-age=315600
ETag sounds like it could serve the purpose (though I've never used it).
The server generates and returns an arbitrary token which is typically
a hash or some other fingerprint of the contents of the file. The
client does not need to know how the fingerprint is generated, it only
needs to send it to the server on the next request: if the fingerprint
is still the same then the resource has not changed and we can skip
the download.
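The fingerprint exchange described above can be sketched in a few lines. This is a simplified model, not a full HTTP implementation: it assumes a single strong ETag in If-None-Match (the real header may carry a list or weak validators), and the names make_etag and respond are hypothetical.

```python
import hashlib


def make_etag(body):
    """Fingerprint the response body; ETag values are quoted strings in HTTP."""
    return '"%s"' % hashlib.sha256(body).hexdigest()[:16]


def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match == etag:
        # Fingerprint unchanged: tell the client to reuse its copy.
        return 304, headers, b""
    return 200, headers, body


status, headers, payload = respond(b"hello")                      # first visit
status_again, _, _ = respond(b"hello", headers["ETag"])           # unchanged
status_changed, _, _ = respond(b"hello v2", headers["ETag"])      # updated
print(status, status_again, status_changed)  # 200 304 200
```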

Do I have to set cache expiration every time on HTML?

I want to set the cache expiration for my HTML page to maybe 10 days from now:
<META HTTP-EQUIV="expires" CONTENT="Thu, 12 Apr 2012 08:21:57 GMT">
So my question is
What happens after 10 days? Yes, the cache will expire, but do I have to set the expiration date again?
Is there a way to set the lifetime as a number of days, e.g. 10?
I'm confused here; please give me some reference.
Please help...
The tag has limited effect. In particular, it does not affect proxies, since they work on HTTP headers and do not parse HTML documents.
After the expiry time, browsers are expected to treat the copy of the page in their caches as stale and not use it, but instead request the page from the server (if online), at least conditionally (send if modified since such-and-such). This means that after any new request for the page, the copy received, which still carries the old and now past expiry date, should not be cached at all. So yes, you should set a new expiry date, unless you really want to prevent caching.
The Expires header or its meta simulation needs to have a specific time mentioned. There are other ways to affect caches, see http://www.mnot.net/cache_docs/
You need to use some kind of server-side scripting language (like PHP or ASP or JSP) to set that date dynamically. This is only a 'hint' and browsers may or may not even listen to it.
That is a hint telling browsers that they should keep the HTML in cache until the specified date. That means that, if the browser complies, then whenever it sees the same URL, it will not make a request to retrieve it, but rather it would take the HTML from its cache and show that instead.
Therefore you can safely generate a new time for each request, since the browser that's caching the page won't make the request anyway, and the browsers making the new requests will get an updated hint.
Note though that no one's forcing the browsers to comply, they may simply ignore the hint and make the request anyway.
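Generating the date dynamically, as suggested above, might look like the following sketch. Python is used for illustration (the answers mention PHP, ASP or JSP); the function name expires_value is hypothetical, and the 10-day default simply mirrors the question.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_value(now, days=10):
    """Format an HTTP date `days` after `now`; `now` must be timezone-aware UTC."""
    return format_datetime(now + timedelta(days=days), usegmt=True)


# A fixed "current time" keeps the example reproducible; a real page would
# pass datetime.now(timezone.utc) on every request.
now = datetime(2012, 4, 2, 8, 21, 57, tzinfo=timezone.utc)
print(expires_value(now))  # Thu, 12 Apr 2012 08:21:57 GMT
```

The resulting string can be placed in the CONTENT attribute of the meta tag or, preferably, sent as a real Expires HTTP header.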

How to trigger browser html refresh for cached html files?

YSLOW suggests: For static components: implement "Never expire" policy by setting far future Expires header.... if you use a far future Expires header you have to change the component's filename whenever the component changes. At Yahoo! we often make this step part of the build process: a version number is embedded in the component's filename, for example, yahoo_2.0.6.js.
http://developer.yahoo.com/performance/rules.html
I'd like to take advantage of caching for my mostly static pages and reload the js files when the version # changes. I've set a version # for my .js files, but my main.html page has Expires set to the future, so it doesn't reload and therefore doesn't reload the js files. Ideally I'd like to tell the browser (using a psychic technique) to reload main.html when a new version of the site is released. I could make my main.html page always reload, but then I lose the caching benefit. I'm not looking for the ctrl-F5 answer, as this needs to happen automatically for our users.
I think the answer is: main.html can't be cached, but I'd like to hear what others are doing to solve this problem. How are you getting the best caching vs. reload benefits?
Thanks.
Your analysis is correct. Web performance best practices suggest a far future expiration date for static components (i.e., those which don't change often), and using a version number in the URL manages those changes nicely.
For the main page (main.html), you would not set a far future expiration date. Instead, you could not set an expiration, or set it for a minimal amount of time, for example +24 hours.
Guess it depends on why you want to cache the HTML page - to improve user load-times or reduce server load.
Even with a long expiry time you might find that it's not actually cached at the client for very long (Yahoo studies show that files don't live in the cache for very long), so a shorter expiry time e.g. 1 day, might not be an issue.
If it's to reduce backend load, it might be worth looking at whether a proxy like Varnish would help, i.e. it caches the pages from the origin server and serves them when requested. This way you could control how long pages are cached with a finer level of control.

Website files caching?

I want to know how long files such as CSS, HTML and JS should be cached via .htaccess settings, and why different file types get different durations.
In a few examples I saw HTML cached for 10 minutes, JS for a month and images for a year.
I think it depends on how often a resource is updated. Your HTML content is probably dynamic, so you can't cache it for a long time; otherwise visitors would see changes only after a long delay.
On the other side, pictures are rarely updated, so you can set longer cache time.
JavaScript files are often updated for new features or bugfixes. Maybe you can use a version number for these files (core.js?v=12323) so that you can change the number in your HTML content to get them refreshed by a visitor. This way you can cache them for a longer time as well.
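A per-file-type policy like the one discussed here could be sketched as a simple lookup. The durations for HTML, JS and images are the example values from the question, not recommendations; the CSS value, the no-cache fallback and the function name cache_control_for are assumptions made for the example.

```python
import os.path

# Lifetimes in seconds, keyed by file extension.
MAX_AGE_BY_EXTENSION = {
    ".html": 10 * 60,            # 10 minutes: content changes often
    ".js": 30 * 24 * 60 * 60,    # about a month
    ".css": 30 * 24 * 60 * 60,   # assumed to match JS
    ".png": 365 * 24 * 60 * 60,  # a year: images rarely change
    ".jpg": 365 * 24 * 60 * 60,
}


def cache_control_for(path):
    """Pick a Cache-Control value from the file extension of `path`."""
    ext = os.path.splitext(path)[1].lower()
    max_age = MAX_AGE_BY_EXTENSION.get(ext)
    if max_age is None:
        # Unknown types: require revalidation rather than guess a lifetime.
        return "no-cache"
    return "max-age=%d" % max_age


print(cache_control_for("index.html"))  # max-age=600
print(cache_control_for("logo.PNG"))    # max-age=31536000
```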