What is the complete set of factors that affect image caching in web browsers? How much control does a web developer have over this, and how much is down to browser settings? Are there different considerations for other types of assets (e.g. scripts, audio)?
Thanks
The complete set of factors:
HTTP headers which affect caching
the user agent's (browser's) built-in caching behavior
may be modified through user settings, depending on UA
including private browsing modes that may use and then clear a separate cache per session
the user's actions, such as manually clearing the cache
Web developers have very little control, but this is fine. Remember that caching is done for the benefit of the end user, usually to reduce page load time, and it's generally infeasible for you to know all the considerations specific to every user.
The bit you can control is expiration time and no-cache behavior. These respectively tell the user agent when to refetch the resource because it is expected to have changed, or that it should not be served from cache for other reasons.
Browsers may treat images differently than other resources (mainly differing in default expiration time when unspecified), but you can send HTTP headers for any resource.
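As a rough illustration of the two knobs mentioned above (expiration time and no-cache behavior), here is a minimal Python sketch that builds the corresponding response headers. The function name `cache_headers` and its parameters are made up for this example; how you attach the headers depends on your server or framework.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def cache_headers(max_age_seconds=None, now=None):
    """Return response headers for a cacheable or non-cacheable resource.

    max_age_seconds=None means "do not cache"; a non-negative integer means
    "fresh for this many seconds". `now` is injectable so the result is testable.
    """
    if max_age_seconds is None:
        # Ask caches not to store the response and to revalidate before reuse.
        return {"Cache-Control": "no-cache, no-store, must-revalidate"}
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        "Cache-Control": f"max-age={max_age_seconds}",
        # Expires is the older equivalent; max-age takes precedence if both are sent.
        "Expires": format_datetime(expires, usegmt=True),
    }
```

The same headers can be sent for images, scripts, audio, or any other resource; only the chosen lifetime would normally differ.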
From the client side, check whether the browser sends an If-Modified-Since header to the server. If it does, and the file has not changed since that date, IIS will respond with 304 Not Modified and the client will use its local cache to display/use the file.
The client settings are responsible for this. In IE, Tools -> Internet Options -> Browsing History -> Settings -> Automatically will ensure that this takes place. Different browsers will have this setting in different places.
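The If-Modified-Since exchange described above can be sketched as follows. This is only an illustration of the decision a server such as IIS makes, not its actual implementation; `conditional_status` and its arguments are hypothetical names.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(request_headers, last_modified):
    """Return 304 if the client's cached copy is still current, else 200.

    `last_modified` is a timezone-aware datetime for the resource.
    """
    header = request_headers.get("If-Modified-Since")
    if header is None:
        return 200  # No validator sent: send the full response.
    try:
        since = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return 200  # Unparseable date: ignore the header.
    if since.tzinfo is None:
        return 200  # No usable timezone: treat as invalid.
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # Client may reuse its local copy.
    return 200
```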
For scripts/audio you can place them in a special folder for content, and simply set content expiration from your server so that the server sends appropriate information to the client to cache the file when it is asked for. This won't be a developer setting though.
The developer setting is typically for the dynamic files. This varies by language and framework [in ASP.NET, the OutputCache directive creates different cache headers].
In ASP.NET MVC we have page output caching. We enable it by adding the attribute
[OutputCache]
Then there is HTTP caching, which is used by the browser. I hope I'm correct up to this point.
Using HTTP headers we can enable or disable this HTTP caching.
Is there a relation between the two? If I disable either, will it have an impact on the other?
Output caching tells the server to hold the rendered result of the page (as a string) in server memory, ready for the next request. This means that (for example) any database or file requests needed to populate data for the page do not need to happen again for further requests whilst the cache remains valid, and neither does the (small) overhead of building the view and any components or partials. HTTP caching tells the client and/or downstream proxies that the content will remain valid for the specified period, and that it can be served from a local or proxy cache without having to be re-requested.
It is worth noting that child actions can have OutputCache applied, allowing you to cache portions of the page which do not change between users, whilst still allowing per-user customisation of the page overall. This is sometimes called Donut Hole Caching (where the "hole" doesn't change but the rest of the "donut" around it does).
There is another concept of "donut caching" where the majority of a page is cached but a small portion (the hole) is not. This is not yet supported out of the box in ASP.NET MVC.
The OutputCacheAttribute does allow you to specify the "location" - client, downstream, server, server and client - which allows a handy method to specify both client and server output caching in one place, but each can be controlled independently.
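To make the distinction above concrete, here is a small Python sketch (not ASP.NET code) in which the server-side output cache and the HTTP caching header are two separate settings. `OutputCache`, `respond`, and `client_max_age` are names invented for this illustration.

```python
import time


class OutputCache:
    """Server-side cache of rendered pages, keyed by URL (hypothetical)."""

    def __init__(self, duration_seconds, clock=time.monotonic):
        self.duration = duration_seconds
        self.clock = clock
        self._entries = {}  # url -> (expiry_time, rendered_body)

    def get_or_render(self, url, render):
        entry = self._entries.get(url)
        now = self.clock()
        if entry is not None and now < entry[0]:
            return entry[1]  # Served from server memory; render() is skipped.
        body = render()  # The expensive part: database queries, view building.
        self._entries[url] = (now + self.duration, body)
        return body


def respond(url, render, output_cache, client_max_age=None):
    """Build (headers, body); the two caches are configured independently."""
    body = output_cache.get_or_render(url, render)
    headers = {}
    if client_max_age is not None:
        # HTTP caching: tells the browser and proxies how long to reuse the body.
        headers["Cache-Control"] = f"max-age={client_max_age}"
    return headers, body
```

Turning off the header does not empty the server-side cache, and vice versa: the server can keep reusing its rendered copy while telling clients not to cache it at all.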
I am using HTML5 appcache in my web application. The basic idea is to serve content from the appcache when the server is offline and, as soon as the server is up, have the application take content from the server.
Is there a way to detect if the server is offline and toggle between appcache and server?
Not exactly. You have a couple of options:
1. So long as you update the cache manifest properly when there's a content change in your site, as users 'go online' this update will be detected and the application will cache the relevant content.
2. If there are areas of your site that simply should not be cached ever, that's what the NETWORK section of the cache manifest is for.
3. Following on from #2, conceivably you could also then combine online-only elements of your site with cached content and serve different items depending on the online state.
(Note that when it comes to detecting whether a browser is off-line or not, opinions vary across the different browser vendors. It may be wise to use a library for this, e.g. offline.js).
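For reference, a manifest with a NETWORK section has roughly the shape produced by the sketch below. The helper `build_manifest`, the version-comment convention, and the example paths are assumptions made for illustration; the file itself must be served as text/cache-manifest.

```python
def build_manifest(version, cached, network=("*",)):
    """Return the text of an application cache manifest.

    `version` goes into a comment; changing it changes the manifest bytes,
    which is what prompts browsers to re-download the cached files.
    """
    lines = ["CACHE MANIFEST", f"# version {version}", "", "CACHE:"]
    lines.extend(cached)          # resources to keep for offline use
    lines += ["", "NETWORK:"]
    lines.extend(network)         # resources that must always hit the server
    return "\n".join(lines) + "\n"
```

For example, `build_manifest("2", ["/index.html", "/app.js"], network=["/api/"])` lists two files for offline use and marks everything under `/api/` as online-only.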
Sadly, no. Per the appcache spec, applicationcache does not work if the server can't be contacted.
Otherwise, if fetching the manifest fails in some other way (e.g. the server returns another 4xx or 5xx response or equivalent, or there is a DNS error, or the connection times out, or the user cancels the download, or the parser for manifests fails when checking the magic signature), or if the server returned a redirect, or if the resource is labeled with a MIME type other than text/cache-manifest, then run the cache failure steps.
Source
This Yahoo Developer Network article says that browsers handle non-cacheable resources that are referenced more than once in a single HTML page differently. I didn't find any rule about this in the HTTP/1.1 cache RFC.
I ran some experiments in Chrome, but I couldn't figure out the exact rules. It loaded a duplicated non-cacheable script tag only once. Then I referenced the same script in 3 iframes. The first one triggered a network request, but the others were served from the cache. I then referenced the same URL as the src of an image, and that triggered a network request again.
Is there any documentation about this behavior? How does this differ between browsers?
When a client decides to retrieve a resource, it's RFC2616 that governs the rules of whether that resource can be returned from a cache, or needs to be revalidated/reloaded from the origin server (mostly section 14.9, but you really need to read the whole thing).
However, when you have multiple copies of the same resource on the same page, after the first copy has been retrieved following the rules of RFC2616, the decision as to whether to retrieve additional copies is now covered by the HTML5 spec (mostly specified in the processing model for fetching resources).
In particular, note step 10:
If the resource [...] is already being downloaded for other reasons (e.g. another invocation of this algorithm), and this request would be identical to the previous one (e.g. same Accept and Origin headers), and the user agent is configured such that it is to reuse the data from the existing download instead of initiating a new one, then use the results of the existing download instead of starting a new one.
This clearly describes a number of factors that could come into play in deciding whether a resource may be reused or not. Some key points:
Same Accept and Origin headers: While most browsers use the same Accept headers everywhere, in Internet Explorer they're different for an image vs a script vs HTML. And every browser sends a different Referer when frames are involved, and while Referer isn't directly mentioned, Accept and Origin were only given as examples.
Already being downloaded: Note that that is something quite different from already downloaded. So if the resource occurs multiple times on the page, but the first occurrence is finished downloading before the second occurrence is encountered, then the option to reuse may not be applicable.
The user agent is configured to reuse the data: That implies to me that the decision to reuse or re-retrieve the data is somewhat at the discretion of the user-agent, or at least a user option.
The end result is that every single browser handles caching slightly differently. And even in a particular browser, the results may differ based on timing.
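The three factors above can be modelled with a toy Python sketch. It is not how any particular browser is implemented; `Fetcher`, its request key, and the `reuse_in_flight` switch are assumptions that simply mirror the wording of the quoted step.

```python
class Fetcher:
    """Toy model of the 'already being downloaded' reuse rule."""

    def __init__(self, reuse_in_flight=True):
        self.reuse_in_flight = reuse_in_flight  # "user agent is configured" to reuse
        self.in_flight = {}        # request key -> consumers awaiting the download
        self.network_requests = 0

    @staticmethod
    def _key(url, headers):
        # Requests count as identical only if the URL and compared headers match.
        return (url, headers.get("Accept"), headers.get("Origin"))

    def request(self, url, headers, consumer):
        key = self._key(url, headers)
        waiting = self.in_flight.get(key)
        if waiting is not None and self.reuse_in_flight:
            waiting.append(consumer)  # Piggyback on the existing download.
            return
        self.network_requests += 1    # Otherwise start a new network request.
        self.in_flight.setdefault(key, []).append(consumer)

    def complete(self, url, headers, body):
        # Once finished, the download is no longer "being downloaded".
        for consumer in self.in_flight.pop(self._key(url, headers), []):
            consumer(body)
```

In this model a second reference only avoids a network request if it arrives before `complete()` is called and sends the same headers, which is one way the timing-dependent results can arise.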
I created a test case with three nested frames (i.e. a page containing an iframe, which itself contained an iframe) and 6 copies of the same script, 2 on each page (using Cache-Control:no-cache to make them non-cacheable, but also tested with other variations, including max-age=0).
Chrome loaded only 1 copy.
Internet Explorer tended to vary, assumedly based on load, but was between 1 and 3.
Safari loaded 3 copies, one for each frame (i.e. with different Referer headers).
Opera and Firefox loaded all 6 copies.
When I reused the same resource in a couple of images (one on the root page, one in the first iframe) and a couple of other images for reference, the behaviour changed.
Chrome now loaded 5 copies, 1 of each type on each page. While Accept headers in Chrome are the same for images and scripts, the header order is different, which suggests they may be treated differently, and potentially cached differently.
Internet Explorer loaded 2 copies, 1 of each type which was to be expected for them. Assumedly that could have varied though, given their behaviour when it was just scripts.
Safari was still only 3 copies, one per frame.
Opera was inexplicably still 6. Couldn't tell what portion of those were scripts and which were images. But possibly this is also something that could vary based on load or timing.
Firefox loaded 8 copies, which was to be expected for them. The 6 scripts, plus the 2 new images.
Now this was what happened when viewing the page normally, i.e. just entering the page URL into the address bar. Forcing a reload with F5 (or whatever the equivalent is on Safari) produced a whole different set of results. And in general, the whole concept of reloading, F5 vs Ctrl-F5, what headers get sent by the client, etc. also differs wildly from one browser to the next. But that's a subject for another day.
The bottom line is caching is very unpredictable from one browser to the next, and the specs somewhat leave it up to the implementors to decide what works best for them.
I hope this has answered your question.
Additional Note: I should mention that I didn't go out of my way to test the latest version of every browser (Safari in particular was an ancient v4, Internet Explorer was v9, but the others were probably fairly up to date). I doubt it makes much difference though. It is highly unlikely that all browsers have suddenly converged on consistent behaviour in this regard.
Molnarg, if you read the article properly it will become clear why this happens.
Unnecessary HTTP requests happen in Internet Explorer, but not in
Firefox. In Internet Explorer, if an external script is included twice
and is not cacheable, it generates two HTTP requests during page
loading. Even if the script is cacheable, extra HTTP requests occur
when the user reloads the page.
This behavior is unique to Internet Explorer. If you ask me why this happens, I would say that the IE developers chose to ignore the HTTP/1.1 cache RFC, or at least could not implement it. Maybe it is a work in progress. But then again, there are a lot of aspects in which IE differs from most other browsers (JavaScript, HTML5, CSS). This can't be helped unless the devs update it.
The Yahoo Dev article you linked lists best practices for high performance. These have to accommodate all the IE users, who are affected by this. That is why including the same script multiple times, though OK for other browsers, hurts IE users and should be avoided.
Update
Non-cacheable resources will generate a network request whether they are referenced once or multiple times.
From 2. Overview of Cache Operation from the HTTP/1.1 cache RFC
Although caching is an entirely OPTIONAL feature of HTTP, we assume
that reusing the cached response is desirable and that such reuse
is the default behavior when no requirement or locally-desired
configuration prevents it.
So using the cache means attempting to reuse a response, and non-cacheable means the opposite. Think of it like this: a non-cacheable request is like an HTTP request with the cache turned off (a fallback from HTTP with the cache on).
Cache-Control: max-age=n does not prevent the cache from storing the response; it merely states that the cached item is stale after n seconds. To prevent the cache from being used, send these headers for the image:
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
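The difference between "stale after n seconds" and "never stored" can be sketched in Python as below. This is a simplified model written for illustration: the function names are invented, and it treats a response without max-age as not reusable, whereas real caches may apply heuristics.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def may_store(directives):
    # Only no-store forbids keeping a copy; max-age=0 does not.
    return "no-store" not in directives


def may_reuse_without_revalidation(directives, age_seconds):
    if not may_store(directives) or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    # A stored response older than max-age is stale and must be revalidated.
    return max_age is not None and age_seconds < int(max_age)
```

With `max-age=0` the response may still be stored, but it is stale immediately; with `no-store` it is never kept in the first place.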
I finally managed to get the HTML5 cache to work, but I can't find any way to control how long given elements are cached. Google Page Speed says that cached elements should have an expiration date.
I would be grateful if you could provide any tips concerning that issue.
From http://www.w3.org/TR/html5/browsers.html#offline
5.7.7 Expiring application caches
As a general rule, user agents should not expire application caches,
except on request from the user, or after having been left unused for
an extended period of time.
Application caches and cookies have similar implications with respect
to privacy (e.g. if the site can identify the user when providing the
cache, it can store data in the cache that can be used for cookie
resurrection). Implementors are therefore encouraged to expose
application caches in a manner related to HTTP cookies, allowing
caches to be expunged together with cookies and other origin-specific
data.
For example, a user agent could have a "delete site-specific data"
feature that clears all cookies, application caches, local storage,
databases, etc, from an origin all at once.
About the Google Page Speed warning, it is most likely to be on the SERVER-side.
You should take a look at your server config files, or perhaps your .htaccess files.
Related : https://stackoverflow.com/search?q=html5+cache+control
and from : HTML 5 Cache Manifest Vs. Etags, Expires or cache-control header
Here are some resources that will get you started:
http://www.html5rocks.com/en/tutorials/appcache/beginner/ A Beginner's Guide to Using the Application Cache
https://developer.mozilla.org/en-US/docs/HTML/Using_the_application_cache Using the application cache
http://en.wikipedia.org/wiki/Cache_manifest_in_HTML5 Cache manifest in HTML5
http://www.w3.org/TR/offline-webapps/ Offline Web Applications
http://www.whatwg.org/specs/web-apps/current-work/multipage/offline.html Offline Web applications
What is new in HTML 5’s “offline web applications” feature which was not already available in all browsers?
Offline caching is the job of the browser — how did it become a job of HTML?
A web cache is a mechanism for the temporary storage (caching) of web documents, such as HTML pages and images, to reduce bandwidth usage, server load, and perceived lag. A web cache stores copies of documents passing through it; subsequent requests may be satisfied from the cache if certain conditions are met.
As written in Wikipedia’s article for Web cache.
And this is what the W3C website says about the offline web cache:
In order to enable users to continue interacting with Web applications and documents even when their network connection is unavailable — for instance, because they are traveling outside of their ISP's coverage area — authors can provide a manifest which lists the files that are needed for the Web application to work offline and which causes the user's browser to keep a copy of the files for use offline.
What is HTML 5 doing better and different in caching?
Is it similar to offline mode in Internet Explorer 5? And can we cache data beyond the storage limit set in the browser?
Please give me an example so that I can understand the difference between the HTML 5 offline cache and browser caches.
Web browser caching is when browsers decide to store files locally to improve performance. HTTP allows web servers to suggest to browsers how long to store the files for, and allows browsers to ask the server whether a file has changed (so that they can avoid re-downloading it).
However, it’s not designed to reliably store assets required by an offline application. It’s ultimately up to the browser whether, and for how long, it caches the files. And browsers will often stop using their cached version if they can’t contact the server to check that it’s up-to-date.
The HTML5 offline web applications spec provides web authors with the ability to tell browsers what to store for offline access, and requires browsers to keep those files up-to-date when it is online. It also provides a DOM property that tells the developer whether the browser is online or offline, and events that fire when the online status changes.
As Peeter describes in his answer, this allows web app developers to store user-inputted data whilst the user is offline, then sync it with the server when they’re online again. The developer has to do this storage and syncing manually, as the browser only provides the events indicating online status, but if the browser also supports localStorage, the developer can store the data there.
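The store-then-sync approach described above can be sketched as follows. In a browser this would be written in JavaScript using localStorage and the online/offline events; the Python below only illustrates the logic, and `OfflineTodoStore`, `send_to_server`, and `set_online` are hypothetical names.

```python
class OfflineTodoStore:
    """Queue changes while offline and flush them when back online."""

    def __init__(self, send_to_server):
        self.send_to_server = send_to_server  # hypothetical upload callable
        self.online = True
        self.pending = []  # stands in for localStorage in a browser

    def change(self, item, state):
        update = {"item": item, "state": state}
        if self.online:
            self.send_to_server(update)
        else:
            self.pending.append(update)  # Keep it locally until we reconnect.

    def set_online(self, online):
        """Stand-in for the browser's online/offline events."""
        self.online = online
        if online:
            # Replay queued changes in the order they were made.
            while self.pending:
                self.send_to_server(self.pending.pop(0))
```

A real application would also need to handle failed uploads and conflicting edits, which this sketch leaves out.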
I can do no better than point you to the relevant chapter of Dive into HTML5: http://diveintohtml5.ep.io/offline.html
You can now cache dynamic data, instead of just js/css/html files / images.
Let's say you've got a todo list application open in your browser. You're connected to the internet and you're adding a bunch of stuff you have to do.
Boom, you're on an airplane without a connection. You've got 6 hours of time to kill so you decide to get some work done. You finish all of the things on your todo list (the list was still open in your browser). You select all of the items and change their state to "finished".
Your plane lands, you open up your laptop and refresh the page. All the changes you made without a connection are now synced to the server, as you have an internet connection again.