I've been using HTML5 offline caching on my website for a while, and for some reason I am considering turning it off. To my surprise, turning it off doesn't work.
This is how I've implemented HTML5 Offline caching.
In my index.html I give path to the manifest file
<html manifest="app.manifest">
In the app.manifest file I list all the js/css/png files that I would like the browser to cache for offline usage. Every time I deploy updates, I update the app.manifest file, which causes the browser to fetch the latest version of all the files listed in the manifest file.
In order to turn off the offline caching, I changed my index.html's opening tag to
<html>
I made a dummy change to the app.manifest file, so that the browser (which has already cached my website) would detect the change and download the latest version of all the files (including index.html).
What I noticed is, the browser indeed gets the latest version of all the files. I see the new <html> tag in the updated version without the manifest declaration, however the behavior of the browser for future changes does not change. i.e. I now expect the browser to immediately fetch the new version of the index.html file, when it's changed on server. However that doesn't happen. The browser doesn't download updated index.html until I make any changes to the manifest file.
Thus it appears to me that the browser has permanently associated app.manifest file with my website URL and it won't get rid of it even when I don't mention it in <html> tag.
I have tested this on both Google Chrome and Firefox, same results. I also tried restarting Chrome, but it won't forget that my site ever had app.manifest defined for it. I haven't found any discussion on this aspect of offline caching on the web.
Update: I managed to get rid of the behavior in Chrome by clearing all the browsing data (by going to settings). But that's not something I can tell the users to do.
Make the manifest URL return a 404 (or 410) to indicate you don't want offline web applications anymore. According to Step 5 of HTML5 §5.6.4, this marks the cache as obsolete, and the browser will remove it.
You can also manually delete the offline web application in Chrome by going to about:appcache-internals.
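To illustrate the 404 approach, here is a minimal server-side sketch. The helper name retireManifest and the MANIFEST_PATH value are my own assumptions for illustration; adapt them to wherever your old manifest attribute pointed. It only decides what to answer for a given request path, so it can be plugged into whatever server you use.

```javascript
// Hypothetical helper (not part of any library): decides how the server
// should answer while the application cache is being retired.
// MANIFEST_PATH is an assumption; use the path your old
// <html manifest="..."> attribute referred to.
const MANIFEST_PATH = '/app.manifest';

function retireManifest(requestPath) {
  if (requestPath === MANIFEST_PATH) {
    // A 404 or 410 for the manifest tells the browser that the cache
    // group is obsolete, so it stops serving the site from AppCache.
    return { status: 410, headers: { 'Cache-Control': 'no-store' }, body: '' };
  }
  // Not the manifest: let the normal file handler deal with it.
  return null;
}

// Sketch of wiring it into Node's built-in http module:
// http.createServer((req, res) => {
//   const r = retireManifest(new URL(req.url, 'http://localhost').pathname);
//   if (r) { res.writeHead(r.status, r.headers); return res.end(r.body); }
//   /* ...serve other files as usual... */
// });
```

Keep the manifest URL answering 404/410 long enough for returning visitors to hit it at least once; simply removing the attribute from the page is not enough, as the question shows.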
Edited to clarify the underlying question.
I am trying to debug a simple HTML5 webpage containing one image and one video. Everything displays fine. The video plays correctly. But, when I try to refresh the page, everything is downloaded except the video file. I am using the Firefox developer tools but I can't understand what is going on.
On the network tab I see the .html file being downloaded, then the image.jpg file. But I never see the video.mp4 file downloaded. The video plays OK, but it is not the current version on the server. It seems to be a previous version that has been cached.
I'm mystified why this should be. The cache is disabled in developer tools. I'm refreshing the page with Ctrl+F5. It's as if the video is being served from some secret local cache that I don't know about. I'm using Firefox 47.0.1. The same thing also happens when I test with Firebug.
Edit. I have now tried Developer Tools in Chrome and it's exactly the same. The very first time I access the page, I can see video.mp4 being downloaded. On subsequent reloads, I see the .html and .jpg files normally, but not the video.mp4 file. It must be cached somewhere because it plays. I disabled the cache in Chrome Dev Tools. I cleared the cache explicitly and tried an incognito window. Apart from the very first time, I never see any indication of the video file being downloaded.
I must be missing something obvious. Can anyone else reproduce this?
Here is my HTML.
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
<p>Test page.</p>
<img src="media/image.jpg">
<video src="media/video.mp4" controls="">
Display this if the browser can't play video.
</video>
</body>
</html>
Information moved from comments on an answer to the question:
1:
Thanks @nakji. Clearing the cache and private browsing made no difference at all. But closing the browser did. I reopened the browser after clearing the cache. On my very first access to the page I could see two GETs for video.mp4 with responses 206 (Partial Content). But after that it was back to the original problem. I will download Chrome and try that.
2:
@ManoDestro. I tried everything possible to force a fresh download of video.mp4. But it's not happening. I reloaded the page with Ctrl+F5. I turned off caching in Dev Tools settings. I cleared the cache manually. I tried a private browsing window. I can't think of anything else. It's like the video is served from a secret cache that doesn't obey the normal caching rules. I have used multiple tools to confirm that the file is not coming down the wire: FF Dev Tools, Firebug, and now Wireshark. Can someone please test with a similar setup?
After a whole day's Googling I can now answer my own question. It turns out that Firefox has a special "media cache" for HTML5 video and audio content which is completely separate from the regular cache that everyone knows about. It is optimised for the high bandwidth and huge files associated with media content. One of the devs, Robert O'Callahan, explains it all here.
The dumb thing is that this media cache doesn't seem to get cleared when you would expect it to. In fact it never seems to get cleared. Ever. The result is that Firefox keeps serving up stale content from the cache when you really want it to fetch the media file again from the server. This was the problem I was trying to debug originally. Firefox kept playing the wrong video after I changed the file on the server. I couldn't get it to download the new version.
All the things you normally do to force a page reload don't work with the media cache. The following have no effect.
The user selects 'Clear recent history' and deletes everything.
The user turns off caching in Developer tools.
The user forces a complete page reload with Ctrl+F5.
The only thing that does work is closing the browser and starting again. I'm still finding my way around this complex area. If anyone knows any more about it, please comment.
I reported this as a bug to Firefox here.
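One workaround that is commonly used for stale cached files in general is to change the URL whenever the file changes, for example with a version query parameter. I have not verified that the media cache is keyed on the full URL, so treat this as an assumption to test rather than a confirmed fix. The function name withVersion and the parameter name v are made up for this sketch.

```javascript
// Hypothetical helper: append (or replace) a version query parameter so
// that a changed media file is requested under a different URL.
// Assumption: the cache in question treats a different URL as a
// different resource.
function withVersion(src, version, base) {
  const url = new URL(src, base); // resolves relative paths against base
  url.searchParams.set('v', String(version));
  return url.href;
}

// In a page you might then do something like:
// video.src = withVersion('media/video.mp4', '2016-07-01', document.baseURI);
```

The version string can be anything that changes when the file does, such as a deploy date or a content hash.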
In previous versions of Chrome, on a webpage with the following:
<script>
document.write('<plaintext>');
</script>
<img src="http://example.com/image.jpg">
the image would not be downloaded. At some point a Chrome update changed this behavior. Now when I look at the network tab, I see the image is downloaded. (fiddle here: https://jsfiddle.net/doojunqx/)
I have a script on a page, and I would like to use it to stop the browser from downloading (and using up network bandwidth on) images and other unwanted assets that appear below my script tag.
Mobify does something similar here:
http://cdn.mobify.com/mobifyjs/examples/capturing-grumpycat/index.html
As they say on the page, "Open your web inspector and note the original imgs did not load." However, when I open Chrome developer tools and look at the network tab, I see the original images ARE now loading. I'm not sure which version of Chrome changed this, but I think it is recent, within the last month or two.
Is there any way to force chrome back to the old behavior? Or any other way to stop these unwanted assets from loading?
Thanks,
Great question, and you're correct that it is a recent change in Chromium that affected the plaintext tag behaviour. In versions up to and including version 42.*, the HTML document parser would not spawn an asynchronous parsing thread until an external resource was found in the original HTML document. Once such a resource was found, an asynchronous thread would be spawned that would aggressively download all resources referenced within the HTML.
The recent change simplified the parsing behaviour by moving all document parsing to the asynchronous thread which now kicks off automatically. Whereas before, using the plaintext tag would ensure that no resources would be loaded if it was inserted before the first external resource, the plaintext tag is now racy as resources will download up to the moment the plaintext tag is executed in the main HTML document. As there is a time delay for the script to execute, an unknown number of resources will be retrieved.
There is as of yet no solution to this new behaviour, nor is there a way to disable the preload scanner as you would like. You will need to rely on workarounds such as polyfills to control your resource downloads. This new behaviour is present in all versions of Chrome >= 43.* and has not been implemented in Safari, Firefox, or other browsers.
I'm implementing an appcache in my application and I'm having a lot of problems setting it up correctly.
Right now I'm having trouble determining whether the loaded files really come from the appcache and not from the regular cache.
Can someone provide me with links or tips on how to check that? E.g. in Chrome, the Network tab in dev tools shows a (from cache) label for cached resources, but how can I know this is the correct cache?
In Firefox, for files which should be cached in the appcache, Firebug sometimes notifies me that they are loaded from BFCache rather than AppCache, and something like that shouldn't happen.
So once again, can you point me to e.g. some plugins for popular browsers (IE, Firefox, Chrome) to check that?
You explicitly declare a page to be cached by the AppCache by referencing a manifest file, so you can be sure it's using AppCache. A manifest file is a
simple text file that lists the resources the browser should cache for
offline access.
and,
The manifest attribute should be included on every page of your web
application that you want cached.
<html manifest="example.appcache">
....
</html>
http://www.html5rocks.com/en/tutorials/appcache/beginner/
BFCache on the other hand is specific to Firefox (other browsers have similar implementation) and serves its purpose differently from AppCache.
AppCache helps your web apps be accessible offline while BFCache speeds up your backward and forward page navigation between visited pages.
If AppCache is implemented correctly, you won't need any plugins beyond Firebug and the browser's built-in developer tools.
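If you want to check from script whether the current page is associated with an application cache, the page's applicationCache object exposes a numeric status (in browsers that still provide it). Here is a small sketch that turns that number into a readable name. The function names are my own; the status codes (0 to 5) are the constants defined on the applicationCache interface.

```javascript
// Status constants of the applicationCache interface, indexed by value:
// UNCACHED = 0 means the page is NOT associated with an application cache.
const APPCACHE_STATUS = [
  'UNCACHED', 'IDLE', 'CHECKING', 'DOWNLOADING', 'UPDATEREADY', 'OBSOLETE'
];

// Hypothetical helper: map a numeric status to its name.
function appCacheStatusName(status) {
  return APPCACHE_STATUS[status] || 'UNKNOWN';
}

// Hypothetical helper: takes the window object as a parameter so it can
// be exercised outside a browser as well.
function describeAppCache(win) {
  const cache = win && win.applicationCache;
  if (!cache) return 'applicationCache not available';
  return appCacheStatusName(cache.status);
}

// In the browser console: describeAppCache(window)
```

Anything other than UNCACHED means the document was loaded with an application cache attached, which is a more direct signal than the "(from cache)" label in the Network tab.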
For an HTML5 game I'm making at a company, we've hit a snag. Safari doesn't even seem to try to load our manifest file, while Chrome does, and the game runs offline in Chrome too. Are there any major differences between how the two handle it that could trip it up?
I'll check how Firefox handles it and update in a bit. This is literally how the manifest looks. I've already had it validated and everything.
CACHE MANIFEST
#v 1.01
CACHE:
/graphics/Apalia_Map 02.jpg
/graphics/comic/PAGE4.jpg
/graphics/comic/PAGE2.jpg
/graphics/comic/PAGE8.jpg
/graphics/comic/PAGE7.jpg
/graphics/comic/PAGE3.jpg
/graphics/comic/PAGE6.jpg
/graphics/comic/PAGE5.jpg
/graphics/comic/PAGE1.jpg
/graphics/gameComplete.jpg
/graphics/ui/main_menu_bg.jpg
/graphics/ui/apaliaCredits.jpg
/graphics/levels/elpala3-lvl1.jpg
/graphics/levels/elpala1-lvl1.jpg
/graphics/levels/elpala2-lvl1.jpg
/graphics/effects/fswipe_northwest_1_4.png
/graphics/effects/spinfx08.png
/graphics/effects/shieldfx_7.png
/graphics/effects/spinfx01.png
etc...
I have found the answer to this question.
Safari is fussier than Chrome: I can easily make Chrome cache my page, but not Safari.
I'll list the key points to make the instructions clear for dummies like me:
The HTML tag contains the manifest file name, like this:
<!DOCTYPE html>
<html manifest="safari.manifest">
AddType: I use an Apache server, and my httpd.conf contains this line inside an IfModule block, alongside the other AddType directives:
<IfModule>
....(other content...)
AddType text/cache-manifest .manifest
</IfModule>
The manifest file is named "safari.manifest". Its content is the oddest part. My HTML page contains only JavaScript. I have no images, so I have nothing to put in NETWORK or FALLBACK, and at first I did not include those sections in safari.manifest at all.
My failing safari.manifest content is:
CACHE MANIFEST
This does not work.
My working safari.manifest content is:
CACHE MANIFEST
NETWORK:
FALLBACK:
Oddly, I STILL need the empty "NETWORK:" and "FALLBACK:" sections in the file to make Safari cache the page. If I leave those two lines out, Safari will not cache anything.
That's all I found.
I'm not sure, as I can't see what is happening, but the problem could be related either to the way you link to the manifest file or (and I'll place my bets now) to the MIME type the file gets sent with (it has to be "text/cache-manifest").
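The two conditions mentioned in this thread (the MIME type and the "CACHE MANIFEST" first line) are easy to check mechanically once you have the response header and body in hand. Below is a rough sketch; checkManifest is a made-up name, and it deliberately checks only these two things, not the whole manifest syntax.

```javascript
// Hypothetical checker: returns a list of problems found in a manifest
// response. Only covers the Content-Type and the first line.
function checkManifest(contentType, body) {
  const problems = [];

  // Ignore parameters such as "; charset=utf-8" when comparing.
  const mime = String(contentType || '').split(';')[0].trim().toLowerCase();
  if (mime !== 'text/cache-manifest') {
    problems.push('unexpected Content-Type: ' + (mime || '(none)'));
  }

  // Strip an optional byte order mark, then look at the first line only.
  const firstLine = String(body || '').replace(/^\uFEFF/, '').split(/\r\n|\r|\n/)[0];
  if (!/^CACHE MANIFEST[ \t]*$/.test(firstLine)) {
    problems.push('first line is not "CACHE MANIFEST"');
  }

  return problems;
}
```

An empty array means both checks passed. You can feed it the values shown in the browser's network panel for the manifest request.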
I'm trying to get a simple html5 webcache to work.
This is my one and only html page, index.html:
<!DOCTYPE HTML>
<html manifest="./main.manifest">
<body>
<p>Hi.</p>
</body>
</html>
This is my only cache file, main.manifest:
CACHE MANIFEST
# 2011-05-02-03
index.html
I'm running on Apache shared hosting. I put a .htaccess file in my web directory, next to the other two files, because I thought I might have to define the MIME type:
AddType text/cache-manifest .manifest
So in the end I just have these three files in that directory:
index.html
main.manifest
.htaccess
When I visit the page in Chrome on my Mac, Safari on my iPhone, or Chrome on my Android 2.3 device, nothing happens; the page just loads as usual. If I turn airplane mode on (killing all connections), the page can't be loaded (so I guess caching failed).
What am I missing here?
Thanks
------------ Update ------------------
I think the mime type was not being recognized correctly. I updated .htaccess to:
AddType text/cache-manifest manifest
Now if I run in google chrome with console on, I see:
Document was loaded from Application Cache with manifest
http://example.com/foo/main.manifest
Application Cache Checking event
Application Cache NoUpdate event
Firefox prompts me when I load the page about the website wanting to let me store it to disk, so that's good. Looks like it's also working on android 2.3.4. The browser still says "This page cannot be loaded because you are not connected to the internet", but then it loads anyway.
Thanks!
First, you were right the first time on your mime type declaration. It should be like this:
AddType text/cache-manifest .manifest
Next, read this paragraph from Dive Into HTML5:
Q: Do I need to list my HTML pages in my cache manifest?
A: Yes and no. If your entire web application is contained in a single
page, just make sure that page points to the cache manifest using the
manifest attribute. When you navigate to an HTML page with a manifest
attribute, the page itself is assumed to be part of the web
application, so you don’t need to list it in the manifest file itself.
However, if your web application spans multiple pages, you should list
all of the HTML pages in the manifest file, otherwise the browser
would not know that there are other HTML pages that need to be
downloaded and cached.
So, in this case, you don't need to list index.html in the cache manifest. The browser will automatically cache the page that points to the manifest. Any other resources the page uses (a CSS file or JavaScript file, for example) would still have to be listed.
For more information, visit the link above.
I have had some trouble using "explicitly cached" items in my manifests, so I usually set it up like this:
CACHE MANIFEST
# 2011-05-02-03
CACHE:
index.html
But the other answer is correct: the browser will automatically cache any page that references an application cache manifest.
I recommend using Chrome's JavaScript Console -- it outputs application cache events as they are happening, including errors.
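Since the manifest only triggers an update when its bytes change, it can help to generate it with a version comment instead of editing it by hand. Here is a minimal sketch that produces the same layout as the manifest above; buildManifest is a name I made up, and the version string is whatever you bump on each deploy.

```javascript
// Hypothetical generator: builds a manifest with a version comment and
// an explicit CACHE: section, matching the layout shown above.
function buildManifest(version, files) {
  const lines = ['CACHE MANIFEST', '# ' + version, 'CACHE:'].concat(files);
  return lines.join('\n') + '\n';
}

// Example:
// buildManifest('2011-05-02-03', ['index.html'])
// produces the four-line manifest shown in this answer.
```

Changing only the version comment is enough to make browsers re-download everything listed in the file.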