About HTML5 offline storage

I have a few questions about HTML5 offline storage that I could not figure out.
Where exactly are these files stored on Windows? I could not find them here:
C:\Documents and Settings\[User Name]\Application Data\Mozilla\Firefox\Profiles\
Is there any expiration time after which the browser deletes these files automatically? Or do the files remain forever?
If I change the contents of the page, is there any way to refresh the data that is stored offline?
Thanks.

I found them in %AppData%/Profiles/<currentprofilename>.default/OfflineCache. I'm using Windows 7.
This depends on the Expires headers your web server sends for the files in question. It is commonly recommended to set the Expires header to one week, but it is up to you; you can even make the files never expire. Note that the manifest file itself should be set to never be cached.
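A minimal sketch of that header policy as a Python helper that a server or build script might use. The one-week lifetime and the "never cache the manifest" rule come from the answer above. The function name `cache_headers` and the assumption that manifests are the files ending in `.appcache` or `.manifest` are illustrative choices. How you actually attach these headers depends on your web server.

```python
import time
from email.utils import formatdate
from typing import Dict, Optional

ONE_WEEK = 7 * 24 * 60 * 60  # seconds

def cache_headers(path: str, now: Optional[float] = None) -> Dict[str, str]:
    """Return caching headers for a file belonging to an offline web app.

    Assumption: manifest files are the ones ending in .appcache or .manifest.
    """
    if path.endswith((".appcache", ".manifest")):
        # The manifest must always be revalidated with the server, otherwise the
        # browser could keep a stale copy and never notice that it changed.
        return {"Cache-Control": "no-cache"}
    if now is None:
        now = time.time()
    # Every other asset may be cached for one week.
    return {
        "Cache-Control": f"max-age={ONE_WEEK}",
        "Expires": formatdate(now + ONE_WEEK, usegmt=True),
    }
```

For example, `cache_headers("offline.manifest")` yields only `Cache-Control: no-cache`, while `cache_headers("style.css")` yields a one-week `max-age` plus a matching `Expires` date.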
In order to refresh the data you must actually change the manifest file. It is recommended that somewhere in the manifest file you put a comment with the version number, then update it every time you change any of your other files.
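A small sketch of the version-comment approach, assuming the comment follows a hypothetical `# version N` convention. The browser compares the manifest byte for byte, so any change is enough; the number just makes the change deliberate and traceable.

```python
import re

# Matches a comment line such as "# version 7" (hypothetical convention).
VERSION_RE = re.compile(r"^# version (\d+)$", re.MULTILINE)

def bump_manifest_version(text: str) -> str:
    """Return the manifest text with its version comment incremented.

    Any byte change makes the browser treat the manifest as updated and fetch
    the listed files again (subject to normal HTTP caching).
    """
    match = VERSION_RE.search(text)
    if match is None:
        # No version comment yet: insert one right after "CACHE MANIFEST".
        first_line, _, rest = text.partition("\n")
        return f"{first_line}\n# version 1\n{rest}"
    new_version = int(match.group(1)) + 1
    return f"{text[:match.start()]}# version {new_version}{text[match.end():]}"
```

A deploy script could read the manifest, pass it through `bump_manifest_version`, and write it back whenever any of the cached files change.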
Edit: I had answered these questions thinking you meant offline application cache, not local storage.

Well, for the sake of accuracy, it should be mentioned that although localStorage was indeed part of the HTML5 specification, it was split into its own specification (Web Storage) after it became slightly too complicated to be included alongside the rest of HTML5.
It really depends on your browser, but it should be found in your AppData folder, in /profiles//OfflineCache (for Windows 7).
There is generally NO expiration date for localStorage; it can remain forever unless it is specifically removed by the website.
JavaScript is what changes the localStorage data (assuming you don't touch the underlying file), so the website you are using (or writing) needs to be smart enough to refresh its localStorage data along with the page's content.

Related

PDF in Object tag does not read latest version

I am hosting a website where a couple of PDF files, currently embedded with object tags, are updated weekly.
The names of these PDF files stay the same, but the data changes.
Currently I'm using:
<object id="men" data="seasons/S2223/Men2023.pdf?" type="application/pdf" width="100%" height="750px">
<p>The file could not be read in the browser
Click here to download
</p>
</object>
When I update the PDF, I expect
data="seasons/S2223/Men2023.pdf?"
to load the latest PDF, but it stays the same as before.
I added the ? to the end of the filename, which should make the browser check for the latest version, but it doesn't seem to work.
When I clear my browser's cache the PDF is updated, but of course this isn't a suitable option for users.
All help is appreciated.
Caching, in this context, is where the browser has loaded the data from a URL in the past and still has a local copy of it. To speed things up and save bandwidth, it uses its local copy instead of asking the server for a fresh copy.
If you want the browser to fetch a fresh copy then you need to do something to make it think that the copy it has in the cache is no good.
Cache Busting Query String
You are trying to use this approach, but it isn't really suitable for your needs and your implementation is broken.
This technique is designed for resources that change infrequently and unpredictably, such as the stylesheet for a website. (Since your resources change weekly, this isn't a good option for you.)
It works by changing the URL to the resource whenever the resource changes. This means that the URL doesn't match the one the browser has cached data for. Since the browser doesn't know about the new URL it has to ask for it fresh.
Since the query string is hardcoded (in your code it is just a bare ?), it never changes, which defeats the object.
Common approaches are to set the value of the query to a time stamp or a checksum of the file. (This is usually done with the website's build tool as part of the deployment process.)
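A minimal sketch of the checksum variant as a build step might apply it, assuming the PDF's bytes are available when the page is generated. The `v` parameter name and the 8-character hash prefix are arbitrary choices, not requirements.

```python
import hashlib

def versioned_url(url_path: str, content: bytes, length: int = 8) -> str:
    """Append a short content hash so the URL changes whenever the file does.

    The "v" parameter name and the hash length are arbitrary choices.
    """
    digest = hashlib.sha256(content).hexdigest()[:length]
    return f"{url_path}?v={digest}"
```

A build script would call it with the file's current bytes, for example `versioned_url("seasons/S2223/Men2023.pdf", Path("seasons/S2223/Men2023.pdf").read_bytes())`, and write the result into the object tag's data attribute. Unchanged files keep the same URL (and stay cached), while a changed file gets a new one.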
Cache Control Headers
HTTP provides mechanisms to tell the browser when it should get a new copy. There are a variety of headers and I encourage you to read this Caching Tutorial for Web Authors and Webmasters as it covers the topic well.
Since your documents expire weekly, I think the best approach for you would be to set an Expires header on the HTTP resource for the PDF's URL.
You could programmatically set it to (for example) one hour after the time a new version is expected to be uploaded.
How you go about this would depend on the HTTP server and/or server-side programming capabilities of the host where you deploy the PDF.
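One way to compute that Expires value, sketched in Python under a hypothetical schedule of uploads every Monday at 09:00 UTC (adjust `UPLOAD_WEEKDAY` and `UPLOAD_HOUR_UTC` to the real schedule). The result is formatted as an HTTP date. Actually sending it is left to whatever server-side mechanism your host offers.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

UPLOAD_WEEKDAY = 0   # hypothetical: Monday (datetime.weekday() numbering)
UPLOAD_HOUR_UTC = 9  # hypothetical: new PDFs go up at 09:00 UTC

def next_expiry(now: datetime) -> datetime:
    """Return one hour after the next scheduled upload.

    `now` must be a timezone-aware UTC datetime.
    """
    days_ahead = (UPLOAD_WEEKDAY - now.weekday()) % 7
    upload = (now + timedelta(days=days_ahead)).replace(
        hour=UPLOAD_HOUR_UTC, minute=0, second=0, microsecond=0
    )
    if upload + timedelta(hours=1) <= now:
        # This week's expiry time has already passed; use next week's.
        upload += timedelta(days=7)
    return upload + timedelta(hours=1)

def expires_header(now: datetime) -> str:
    """Format the expiry as an HTTP date for an Expires header."""
    return format_datetime(next_expiry(now), usegmt=True)

print(expires_header(datetime.now(timezone.utc)))
```

Adding the hour of slack means a copy cached just before the upload still expires after the new PDF is in place.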

Html Application Cache - Check if empty

We are using a "manifest.appcache" file to control the application cache on our site. Part of the application should be accessible offline, which means that some pages reference the manifest in their html tag and others don't.
Is there any way to check, from any page, whether the cache is empty?
Example
Page A is available online only, so no manifest is referenced. Page B is available online and offline, so the manifest is referenced. Now we want to check on page A (online only) if page B is already cached (the cache is not empty).
There is no way to check whether the cache is empty or whether a page is in the cache (with the application cache).
If you really need to do it, there are two solutions you can use:
You can use cookies to track the sites that are loaded in the cache. This opens new problems: What if the user clears the cookies and not the application cache?
A cleaner solution is to use IndexedDB or the HTML5 FileSystem API to save the content and only cache wrappers for that content. You can cache the wrappers up front and then handle the content with the APIs I mentioned (I used this approach). This way you can simply check whether a page is in the cache or not.
Sorry for answering my own question, but I hope this saves someone some time.

What is the difference between HTML5 AppCache and the normal browser cache?

I don't understand the point of the HTML5 AppCache. We already have a normal cache. If you visit a website the first time it'll already cache all the assets. What extra value does the AppCache provide? Is it just a list of files so that the browser knows what assets to download, even if they're not referenced by the HTML right now? Does the browser make sure that the caching is "all-or-nothing", i.e. does it ensure that everything referenced by the manifest is cached, or nothing at all?
I think the point you're missing is that AppCache is specifically designed to make web apps (and web sites) available offline; when the user is online, it also provides the same speed benefits as the normal browser cache.
The key difference with the browser cache is that you can specify all the assets the browser should cache in a manifest file (conceivably your entire site) whereas the browser cache will only store the pages (and associated assets) you have actually visited.
I'm no expert on the AppCache, but I do know it is not without its problems. There's a really good article here from a chap who used AppCache to allow parts of his mobile site to be available offline. It includes some rationale on their decision to use it and a number of gotchas they encountered in doing so.
This HTML5 Rocks article on the subject also has some good information.
AppCache actually uses the browser cache in support of its operation. It is the browser equivalent of downloading an app to run locally.
The first time a user visits the page, the resources of that page will be loaded from the server and stored in the normal cache. If the page specifies an appcache manifest, the browser will download the manifest and fetch all the resources in there (even if they do not appear on the page that embedded the manifest). These are then stored in appcache.
The second time a user visits the page, the browser will check its appcache. If an entry exists for that URL, it will load the page from appcache instead of from the server, based on the rules specified in the manifest (the manifest can mark some resources explicitly as fetched from the network).
After the browser loads the page from appcache, it will contact the server to see if there is an updated manifest. If the manifest is updated, it will fetch the resources from the manifest. These fetches are done using normal browser cache rules, so some of these resources may actually end up being fetched from the regular browser cache instead of from the server (this allows you to do differential updates when using appcache to develop offline apps). The new version of the appcache is kept separate from the old version. After the new version is fetched the user keeps interacting with the resources from the old version until they refresh the main page, after which the new version is loaded and the old is discarded.
Another important point is that appcache has different rules for when resources are discarded. Appcache basically never discards the latest set of resources, and it caches them as a whole. To prevent abuse, it enforces limits (sometimes as little as 5 MB) on how big a site's cache can be. By contrast, the browser cache has no per-site limits, but will discard individual resources from a site if the global cache limit is reached.
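To make the "all-or-nothing" and "old version until refresh" behavior concrete, here is a toy model in Python. It illustrates the rules described above and is not the browser's actual algorithm. The class and method names are made up, a fetch that returns None stands in for a failed download, and the rule that one failed download abandons the whole update is the model's assumption, consistent with appcache caching the resources "as a whole".

```python
from typing import Callable, Dict, List, Optional

class ToyAppCache:
    """Toy model of appcache versioning (illustration only).

    A new version is staged as a complete set and only becomes active on the
    next page load; one failed download abandons the whole update.
    """

    def __init__(self) -> None:
        self.manifest: Optional[str] = None
        self.active: Dict[str, bytes] = {}               # version pages are served from
        self.pending: Optional[Dict[str, bytes]] = None  # fetched, not yet in use

    def check_for_update(self, manifest: str, urls: List[str],
                         fetch: Callable[[str], Optional[bytes]]) -> bool:
        """Return True if a new version was downloaded and staged."""
        if manifest == self.manifest:
            return False  # manifest unchanged byte for byte: nothing to do
        staged: Dict[str, bytes] = {}
        for url in urls:
            body = fetch(url)  # in a browser this goes through normal HTTP caching
            if body is None:
                return False  # all-or-nothing: the old version stays untouched
            staged[url] = body
        self.manifest = manifest
        self.pending = staged
        return True

    def reload(self) -> None:
        """The user refreshes the page: a staged version replaces the old one."""
        if self.pending is not None:
            self.active, self.pending = self.pending, None
```

The key design point is that `active` is never modified in place. A version is either fully staged and swapped in on reload, or it is not used at all.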
The key feature of the HTML5 application cache is that it makes the web application available offline, which the normal browser cache does not.
In addition, the application cache provides:
Speed - since the entire contents of the specified pages are cached in the browser, they load faster than with the normal browser cache alone
Reduced Server Load - there is no need to go back to the server every time, since all the contents stay in the cache until the manifest file changes
Cache manifest - the cache manifest file is the heart of the HTML5 application cache. It specifies which pages should be cached and which should not, and it can even serve as an error-handling technique: you can specify a custom error page in the FALLBACK section to show if the user requests content that requires network connectivity
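A sketch that writes the three manifest sections described above from Python lists. The function name is hypothetical, as are any file names you pass in. In a FALLBACK line, the first entry is a URL prefix and the second is the page used when a request under that prefix cannot be fetched.

```python
from typing import Dict, List

def build_manifest(cache: List[str], network: List[str],
                   fallback: Dict[str, str]) -> str:
    """Assemble an appcache manifest with CACHE, NETWORK and FALLBACK sections."""
    lines = ["CACHE MANIFEST", "", "CACHE:"]
    lines += cache                       # stored for offline use
    lines += ["", "NETWORK:"] + network  # always requested from the server
    lines += ["", "FALLBACK:"]
    # Each FALLBACK line is "<URL prefix> <page to use if the fetch fails>".
    lines += [f"{prefix} {page}" for prefix, page in fallback.items()]
    return "\n".join(lines) + "\n"
```

For example, `build_manifest(["index.html", "style.css"], ["*"], {"/": "/offline.html"})` caches two files, lets every other request go to the network, and shows a hypothetical `/offline.html` page when the user is offline.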
For a basic understanding of the application cache, see this tutorial.

Browsers don't refresh html

I'm building a website whose hosting supports only HTML and JavaScript. I've corrected some mistakes, but visitors probably won't see the fixes, because their browsers keep showing the old pages instead of the updated ones.
I remember there is some "html expires" code, but it's too late to add it, because many visitors have already seen our site.
I'm assuming it's your HTML files that aren't being updated, not CSS style sheets or JS files (for which the answer would be different).
This is mainly a server-side issue - you would have to check with your hosting provider what caching headers the server sends. Ideally, the server would honor If-Modified-Since requests from browsers, so it can serve updated content if there is any and let the browser use its cached copy if there is none.
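On a host where you can control the server (which the question suggests may not be possible), the If-Modified-Since check boils down to a date comparison. Below is a minimal Python sketch. It assumes `last_modified` is the file's timezone-aware UTC modification time, and `conditional_status` is a hypothetical helper, not part of any framework. A 304 response tells the browser to reuse its cached copy.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

def conditional_status(last_modified: datetime,
                       if_modified_since: Optional[str]) -> int:
    """Hypothetical helper: 304 if the browser's copy is still current, else 200.

    `last_modified` must be a timezone-aware UTC datetime.
    """
    if if_modified_since is None:
        return 200  # the browser has no cached copy: send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: safest to send the full page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # "-0000" parses as naive
    # HTTP dates have one-second resolution, so ignore sub-second precision.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # Not Modified: the browser reuses its cached copy
    return 200
```

With this in place, a corrected HTML file gets a newer modification time, so the next conditional request returns 200 with the new content instead of 304.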
To remedy the problem at hand, you may need to rename your existing HTML files, or put them in a new directory. That will force the browser to actively re-fetch them. Of course, that still means that the entry point (usually index.html) will have to be re-fetched by the client - otherwise they will never notice the new structure, and hence not re-load anything.
A hosting configuration that indiscriminately dishes out caching instructions that prevent frequent updates from being made is not really useful. Talking to your hosting provider would probably be a good idea.

Basic HTML5 caching

I am a little behind on HTML5 caching; I just have some simple questions.
1) How long is data in a caching manifest cached?
2) If I update the data, how can I make sure the client checks for a newer version when it is available, or is this already done?
3) Also, is this completely useless in a non-mobile environment, or can it speed up load times on a desktop?
<html lang="en" manifest="offline.manifest">
offline.manifest
CACHE MANIFEST
index.html
style.css
image.jpg
image-med.jpg
image-small.jpg
notre-dame.jpg
1) As long as the user cares to cache it. The only way to completely get rid of the cache is to go into the browser settings and explicitly remove it.
2) If you update the manifest file, the client will download new versions of all the files. This download is still governed by 'old' HTTP caching rules, so set headers appropriately, also make sure you send a 'no-cache' header on the manifest file itself. The rules from HTML5 Boilerplate are probably a good place to start.
3) Remember desktops can lose connectivity too. Also, having files in application cache means they are always served locally so, providing you're sensible about what you put in it, the application cache can reduce bandwidth and latency. What I mean by sensible is: if most visitors only see a couple of pages of your site and you update the manifest of your entire site every week, then they could end up using more bandwidth if you're forcing them to cache a load of static files for pages they never look at.
To really cut down on bandwidth and latency in your HTML5 website of the future: use the application cache for all your assets and a static framework; use something like mustache to render all your content from JSON; send that JSON over Web Sockets instead of HTTP, saving you ~800 bytes and a two way network handshake per request; cache data with Local Storage to save you fetching it again, and manage navigation with the History API.
1) How long is data in a caching manifest cached?
Once an application is cached, it remains cached until one of the following happens:
The user clears the browser's cache
The manifest file is modified
The application cache is programmatically updated
2) If I update the data, how can I make sure the client checks for a newer version when it is available, or is this already done?
You can specify which files not to cache (in the NETWORK: section).
If you want to update your cached files, you need to modify something in the manifest file. The best way is to put a comment in the file and change it whenever you want the browser to update the cache.
3) Also, is this completely useless for a non-mobile environment or can it speed up load times on a desktop?
Yes, it is useful, because the internet connection can drop on any device.