Dangers of using HTML5 prefetch?

Ok, so it isn't a huge worry yet as it is only supported by a few browsers:
Mozilla Firefox: Supported
Google Chrome: Supported since version 13 (using an alternate syntax; see the markup example below)
Safari: Currently not supported
Internet Explorer: Currently not supported
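For reference, the prefetch hint is just a link element in the head of the page, and Chrome's alternate syntax referred to above was the separate prerender hint; the URL below is only a placeholder:
<link rel="prefetch" href="https://example.com/next-article.html">
<!-- Chrome's alternate syntax (speculatively renders the whole target page): -->
<link rel="prerender" href="https://example.com/next-article.html">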
However, prefetch makes me twitch. If the user lands on your page and bounces off to another site, have you paid for the bandwidth of them visiting your prefetch links?
Isn't there a risk of developers prefetching every link on the page, which in turn would make the website a slower experience for the user?
It looks like it can alter analytics. Will people be forcing page views onto users via prefetch?
Security: you won't know what pages are being prefetched. Can it prefetch malicious files?
Will all this prefetching be painful for mobile users with limited data allowances?

I can't call myself an expert on the subject, but I can make these observations:
Prefetch should be considered only where it is known to be beneficial. Enabling prefetch on everything would just be silly. It's essentially a balance of server load vs user experience.
I haven't looked into the HTML5 prefetching spec, but I would imagine they've specified a header that states "this request is being performed as part of prefetching", which could be used to fix the analytics problem - i.e. "if this is a prefetch, don't include it in analytics stats".
From a security standpoint, one would expect prefetch to follow the same cross-domain rules as Ajax does. This would mitigate any cases where XSS is an issue.
Mobile browsers that support HTML5 prefetch should be smart enough to turn it on when using WiFi, and off when using potentially expensive or slow forms of network connection, e.g. 2G/3G.
As I've stated, I can't guarantee any of the above things, but (like with any technology) it's a case of best practices. You wouldn't use Cache-Control to force every page on your site to be cached for a year. Nor would you expect a browser to satisfy a cross-domain Ajax request. Hopefully the same considerations were/will be taken for prefetching.

To answer the question of analytics and statistics, the spec has the following to say:
To ensure compatibility and improve the success rate of prerendering requests the target page can use the [PAGE-VISIBILITY] to determine the visibility state of the page as it is being rendered and implement appropriate logic to avoid actions that may cause the prerender to be abandoned (e.g. non-idempotent requests), or unwanted side-effects from being triggered (e.g. analytics beacons firing prior to the page being displayed).
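Building on that, here is a minimal sketch of deferring an analytics beacon until the page is actually shown, assuming a browser that reports the 'prerender' visibility state and a hypothetical fireAnalytics() helper:
// Defer the analytics beacon until the user actually sees the page.
function fireAnalytics() {
  // e.g. navigator.sendBeacon('/analytics', payload); the endpoint is hypothetical
}

if (document.visibilityState === 'prerender') {
  document.addEventListener('visibilitychange', function onChange() {
    if (document.visibilityState === 'visible') {
      document.removeEventListener('visibilitychange', onChange);
      fireAnalytics();
    }
  });
} else {
  fireAnalytics();
}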


Why use protocol-relative URLs at all?

What this means has been an oft-discussed question on Stack Overflow:
<script src="//cdn.example.com/somewhere/something.js"></script>
This gives the advantage that if you're accessing it over HTTPS, you get HTTPS automatically, instead of that scary "Insecure elements on this page" warning.
But why use protocol-relative URLs at all? Why not simply use HTTPS always in CDN URLs? After all, an HTTP page has no reason to complain if you decide to load some parts of it over HTTPS.
(This is more specifically for CDNs; almost all CDNs have HTTPS capability. Whereas, your own server may not necessarily have HTTPS.)
As of December 2014, Paul Irish's blog on protocol-relative URLs says:
2014.12.17: Now that SSL is encouraged for everyone and doesn’t have performance concerns, this technique is now an anti-pattern. If the asset you need is available on SSL, then always use the https:// asset.
Unless you have specific performance concerns (such as the slow mobile network mentioned in Zakjan's answer) you should use https:// to protect your users.
Because of performance. Establishing an HTTPS connection takes longer than HTTP: the TLS handshake adds up to 2 RTTs of latency, which is noticeable on mobile networks (for example, on a link with a 300 ms round-trip time, two extra round trips add roughly 600 ms before the first byte arrives). So it is better not to use HTTPS asset URLs if you don't need them.
There are a number of potential reasons, though none of them are particularly crucial:
How about the next time every business with an agenda pushes a new protocol? Are we going to have to swap out thousands of strings again then? No thanks.
HTTPS is slower than HTTP of the same version
If any of the notes listed at caniuse.com for HTTP/2 are a problem
Conceptually, if the server enforces the protocol, there is no reason to be specific about it in the first place. Agnosticism is what it is. It's covering all your bases.
One thing to note, if you are using CSP's upgrade-insecure-requests, you can safely use protocol-agnostic URLs (//example.com).
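With that directive in place, the browser rewrites insecure (http://) subresource requests, including protocol-relative ones that resolve to http, to https:// before sending them. It can be delivered as a response header or as a meta tag:
Content-Security-Policy: upgrade-insecure-requests
<meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">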
Protocol-relative URLs sometimes break JS code that tries to detect location.protocol. They are also not understood by extremely old browsers. If you are developing web services that require maximum backward compatibility (e.g. serving crucial emergency information that can be received/sent on slow connections and/or old devices), do not use PRURLs.

Is Service Worker intended to replace or coexist with Appcache?

Is ServiceWorker intended to replace Appcache, or is the intention that the two will coexist? Phrased another way, is appcache about to become deprecated?
Blink's Service Worker team is keen on deprecating AppCache (We will follow our usual intent to deprecate process). We believe that Service Worker is a much better solution. Also, it should be pretty easy to offer a drop-in replacement for AppCache built on top of SW. We'll start by collecting usage metrics and do some outreach.
AppCache and Service Worker should coexist without any issue since offering offline support via AppCache for browsers that don't support Service Workers is a valid use case.
#flo850 If it's not working, please let us know by filing a bug.
I must say that Service Worker is not only a replacement for AppCache, but far more capable. An AppCache can't be partially updated, its byte-by-byte manifest comparison for triggering updates is odd, and there are several use cases that lead to security problems and terrible usability.
Chrome and Firefox are even planning to stop supporting AppCache in the near future, now that service workers are supported by Chrome, Opera, and Firefox. The noises coming from Microsoft and Safari have also been positive, with implementations under consideration.
As a cache tool, it will coexist with AppCache. AppCache works on virtually every browser.
But service workers are a solid foundation that will permit new uses like push (even when the browser is in the background), geofencing, or background synchronization.
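To make the comparison concrete, here is a minimal sketch of AppCache-style offline support rebuilt on a service worker; the cache name, file list, and sw.js path are placeholders:
// sw.js: precache a fixed list of assets at install time, then serve from cache, falling back to the network.
var CACHE = 'offline-v1';
var ASSETS = ['/', '/app.js', '/app.css'];

self.addEventListener('install', function (event) {
  event.waitUntil(caches.open(CACHE).then(function (cache) {
    return cache.addAll(ASSETS);
  }));
});

self.addEventListener('fetch', function (event) {
  event.respondWith(
    caches.match(event.request).then(function (cached) {
      return cached || fetch(event.request);
    })
  );
});

// In the page, registration is a no-op in browsers without support:
if ('serviceWorker' in navigator) {
  navigator.serviceWorker.register('/sw.js');
}
Unlike an AppCache manifest, every step here is scriptable, which is what makes partial updates and custom fallback logic possible.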

How do browsers handle non-cacheable resources that are referenced more than once?

This Yahoo Developer Network article says that browsers handle non-cacheable resources that are referenced more than once in a single HTML page differently. I didn't find any rule about this in the HTTP/1.1 caching RFC.
I made some experiments in Chrome, but I couldn't figure out the exact rules. It loaded a duplicate non-cacheable script tag only once. Then I referenced the same script in 3 iframes. The first one triggered a network request, but the others were served from the cache. I tried to reference the same URL as the src of an image, and that triggered a network request again.
Is there any documentation about this behavior? How does this differ between browsers?
When a client decides to retrieve a resource, it's RFC 2616 that governs the rules of whether that resource can be returned from a cache or needs to be revalidated/reloaded from the origin server (mostly section 14.9, but you really need to read the whole thing).
However, when you have multiple copies of the same resource on the same page, after the first copy has been retrieved following the rules of RFC2616, the decision as to whether to retrieve additional copies is now covered by the HTML5 spec (mostly specified in the processing model for fetching resources).
In particular, note step 10:
If the resource [...] is already being downloaded for other reasons (e.g. another invocation of this algorithm), and this request would be identical to the previous one (e.g. same Accept and Origin headers), and the user agent is configured such that it is to reuse the data from the existing download instead of initiating a new one, then use the results of the existing download instead of starting a new one.
This clearly describes a number of factors that could come into play in deciding whether a resource may be reused or not. Some key points:
Same Accept and Origin headers: While most browsers use the same Accept headers everywhere, in Internet Explorer they're different for an image vs a script vs HTML. And every browser sends a different Referer when frames are involved, and while Referer isn't directly mentioned, Accept and Origin were only given as examples.
Already being downloaded: Note that that is something quite different from already downloaded. So if the resource occurs multiple times on the page, but the first occurrence is finished downloading before the second occurrence is encountered, then the option to reuse may not be applicable.
The user agent is configured to reuse the data: That implies to me that the decision to reuse or re-retrieve the data is somewhat at the discretion of the user-agent, or at least a user option.
The end result is that every single browser handles caching slightly differently. And even in a particular browser, the results may differ based on timing.
I created a test case with three nested frames (i.e. a page containing an iframe, which itself contained an iframe) and 6 copies of the same script, 2 on each page (using Cache-Control:no-cache to make them non-cacheable, but also tested with other variations, including max-age=0).
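A hypothetical reconstruction of that kind of test page (file names are made up; middle.html repeats the same pattern and nests the innermost page):
<!-- outer.html: two copies of the same non-cacheable script, plus the nested frame -->
<script src="/script.js"></script>
<script src="/script.js"></script>
<iframe src="middle.html"></iframe>
<!-- /script.js is served with: Cache-Control: no-cache -->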
Chrome loaded only 1 copy.
Internet Explorer tended to vary, assumedly based on load, but was between 1 and 3.
Safari loaded 3 copies, one for each frame (i.e. with different Referer headers).
Opera and Firefox loaded all 6 copies.
When I reused the same resource in a couple of images (one on the root page, one in the first iframe) and a couple of other images for reference, the behaviour changed.
Chrome now loaded 5 copies, 1 of each type on each page. While Accept headers in Chrome are the same for images and scripts, the header order is different, which suggests they may be treated differently, and potentially cached differently.
Internet Explorer loaded 2 copies, 1 of each type which was to be expected for them. Assumedly that could have varied though, given their behaviour when it was just scripts.
Safari was still only 3 copies, one per frame.
Opera was inexplicably still 6. Couldn't tell what portion of those were scripts and which were images. But possibly this is also something that could vary based on load or timing.
Firefox loaded 8 copies, which was to be expected for them. The 6 scripts, plus the 2 new images.
Now this was what happened when viewing the page normally, i.e. just entering the page URL into the address bar. Forcing a reload with F5 (or whatever the equivalent is on Safari) produced a whole different set of results. And in general, the whole concept of reloading, F5 vs Ctrl-F5, what headers get sent by the client, etc. also differs wildly from one browser to the next. But that's a subject for another day.
The bottom line is caching is very unpredictable from one browser to the next, and the specs somewhat leave it up to the implementors to decide what works best for them.
I hope this has answered your question.
Additional Note: I should mention that I didn't go out of my way to test the latest copy of every browser (Safari in particular was an ancient v4, Internet Explorer was v9, but the others were probably fairly up to date). I doubt it makes much difference though. It is highly unlikely that all browsers have suddenly converged on consistent behaviour in this regard.
Molnarg, if you read the article properly it will become clear why this happens.
Unnecessary HTTP requests happen in Internet Explorer, but not in Firefox. In Internet Explorer, if an external script is included twice and is not cacheable, it generates two HTTP requests during page loading. Even if the script is cacheable, extra HTTP requests occur when the user reloads the page.
This behavior is unique to Internet Explorer. If you ask me why this happens, I would say that the IE developers chose to ignore the HTTP/1.1 caching RFC, or at least could not implement it. Maybe it is a work in progress. But then again, there are a lot of aspects in which IE differs from most other browsers (JavaScript, HTML5, CSS). This can't be helped unless the devs update it.
The Yahoo Dev article you gave lists best practices for high performance. These must accommodate all the IE users, who are impaired by this behavior. That is why including the same script multiple times, though OK for other browsers, hurts IE users and should be avoided.
Update
Non-cacheable resources will generate a network request every time they are referenced, whether that is once or multiple times.
From section 2, Overview of Cache Operation, of the HTTP/1.1 caching RFC:
Although caching is an entirely OPTIONAL feature of HTTP, we assume that reusing the cached response is desirable and that such reuse is the default behavior when no requirement or locally-desired configuration prevents it.
So using a cache means attempting to reuse responses, and non-cacheable means the opposite. Think of it like this: a non-cacheable request is like an HTTP request with caching turned off (a fallback from HTTP with caching on).
Cache-Control: max-age=n does not prevent the cache from storing the response; it merely states that the cached item becomes stale after n seconds. To prevent the cache from being used at all, send these headers for the image:
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0

Preventing HTML5 applicationCache checking event on offline application load

I have an HTML5/jquery mobile web app at http://app.bluedot.mobi. It is used for long distance races to track competitors via SPOT satellite tracking. The issue I have not yet resolved is that when loading the app when no data connection exists, the browser throws a "no data connection" alert popup as it is attempting to fetch the manifest during the checking event. Even when a data connection is present, loading the app can take a very long time. There are ~ 500 files to check. The fastest way to load the app (from a phone) is to be in airplane mode and dismiss the browser's alert - not so elegant.
Rather than force an update on users who tend to be in the backcountry with a spotty connection, I want to use applicationCache.update() programmatically, giving the user control over the process and speeding up app load whether on or offline.
Is this currently possible with the HTML5 spec and respective browser implementations?
Sounds like you need the abort() method. Unfortunately it is very new, and it will probably be a while before it is implemented by the majority of mobile browsers.
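As a sketch of what that could look like, assuming a browser that implements abort(), you could drive the update manually and bail out after an arbitrary timeout:
// Give the user control over the update; the 5-second timeout is arbitrary.
var cache = window.applicationCache;

function checkForUpdate() {
  try {
    cache.update(); // starts the checking/downloading cycle manually
  } catch (e) {
    return; // throws if there is no application cache to update
  }
  var timer = setTimeout(function () {
    if (cache.abort) cache.abort(); // only in browsers that implement abort()
  }, 5000);
  cache.addEventListener('updateready', function () {
    clearTimeout(timer);
    cache.swapCache(); // switch to the freshly downloaded version
  });
}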
There are ~ 500 files to check.
It sounds like you're implying that the browser checks each file to see if any of them have changed. This is not correct. The browser only checks whether the manifest file has changed, and that is a simple byte-for-byte comparison. If the manifest file has not changed, the browser assumes nothing has changed.
So if your application is slow to start, it might be because your application is complex and there's a lot of HTML and JavaScript to parse. I would advise you to take a look at the application and see if there's anything you can optimize. In that case, you might want to take a look at Yahoo's Best Practices for Speeding Up Your Web Site page.
For example, I noticed you have a lot of JavaScript code in the HEAD section. The aforementioned article advises moving all JavaScript (to the extent possible) to the bottom of the page, so that the browser can start rendering the page as soon as possible. And there's a lot of other sound advice in the article. So take a look, I'm sure you'll find it useful. :-)

What is new in HTML 5 "offline web application" which was not already available in all browsers?

What is new in HTML 5’s “offline web applications” feature which was not already available in all browsers?
Offline caching is the job of the browser — how did it become a job of HTML?
A web cache is a mechanism for the temporary storage (caching) of web documents, such as HTML pages and images, to reduce bandwidth usage, server load, and perceived lag. A web cache stores copies of documents passing through it; subsequent requests may be satisfied from the cache if certain conditions are met.
As written in Wikipedia’s article for Web cache.
And this is written for offline web cache in the W3C website:
In order to enable users to continue interacting with Web applications and documents even when their network connection is unavailable — for instance, because they are traveling outside of their ISP's coverage area — authors can provide a manifest which lists the files that are needed for the Web application to work offline and which causes the user's browser to keep a copy of the files for use offline.
What is HTML 5 doing better and differently in caching?
Is it similar to offline mode in Internet Explorer 5? And can we cache data beyond the amount of space set in the browser?
Please give me an example so that I can understand the difference between the HTML 5 offline cache and browser caches.
Web browser caching is when browsers decide to store files locally to improve performance. HTTP allows web servers to suggest browsers how long to store the files for, and allows browsers to ask the server whether a file has changed (so that they can avoid re-downloading it).
However, it’s not designed to reliably store assets required by an offline application. It’s ultimately up to the browser whether, and for how long, it caches the files. And browsers will often stop using their cached version if they can’t contact the server to check that it’s up-to-date.
The HTML5 offline web applications spec provides web authors with the ability to tell browsers what to store for offline access, and requires browsers to keep those files up-to-date when it is online. It also provides a DOM property that tells the developer whether the browser is online or offline, and events that fire when the online status changes.
As Peeter describes in his answer, this allows web app developers to store user-inputted data whilst the user is offline, then sync it with the server when they’re online again. The developer has to do this storage and syncing manually, as the browser only provides the events indicating online status, but if the browser also supports localStorage, the developer can store the data there.
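For comparison, the offline mechanism itself is declared with a manifest attribute pointing at a plain-text file that lists the resources to keep; the file names here are placeholders:
<html manifest="offline.appcache">

CACHE MANIFEST
# offline.appcache: everything listed here is stored for offline use
/app.js
/app.css
NETWORK:
*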
I can do no better than point you to the relevant chapter of Dive into HTML5: http://diveintohtml5.ep.io/offline.html
You can now cache dynamic data, instead of just JS/CSS/HTML files and images.
Let's say you've got a todo list application open in your browser. You're connected to the internet and you're adding a bunch of stuff you have to do.
Boom, you're on an airplane without a connection. You've got 6 hours of time to kill, so you decide to get some work done. You finish all of the things on your todo list (the list was still open in your browser). You select all of the items and change their state to "finished".
Your plane lands, you open up your laptop and refresh the page. All the changes you made without a connection are now synced to the server, as you have an internet connection again.
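A minimal sketch of that store-then-sync pattern, using localStorage and the online event; the storage key and endpoint are placeholders:
// Always keep a local copy; push it to the server whenever we are online.
function saveTodos(todos) {
  localStorage.setItem('todos', JSON.stringify(todos));
  if (navigator.onLine) syncToServer(todos);
}

function syncToServer(todos) {
  var xhr = new XMLHttpRequest();
  xhr.open('POST', '/api/todos'); // hypothetical endpoint
  xhr.setRequestHeader('Content-Type', 'application/json');
  xhr.send(JSON.stringify(todos));
}

// When the connection comes back, push whatever was saved while offline.
window.addEventListener('online', function () {
  var saved = localStorage.getItem('todos');
  if (saved) syncToServer(JSON.parse(saved));
});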