Providing hashes of included files in HTML for caching?

I was wondering if it's possible to include hashes of external files within an HTML file. This should basically serve 2 purposes:
Including unencrypted content into encrypted pages. (The hashes would ensure the integrity of the data)
Allow more caching for resources that are used on multiple pages
Let's focus on the second case and clarify it with a made-up example:
<script type="text/javascript" src="jQuery-1.5.1.min.js" hash-md5="b04a3bccd23ddeb7982143707a63ccf9">
Browsers could now download and cache the file initially. For every following page that uses the same hash, it would be clear that the cached version could be used. This technique should work independently of file origin, file type, and transmission protocol, and without even hitting the server once to know that a file is already cached locally.
My question is: Is such a mechanism available in HTML?
The following example is just to clarify the idea further and does not add new information.
An example of a library included in 2 unrelated pages would lead to the following steps.
User navigates to page A for the first time
Browser loads page A and looks for external files (images, scripts, …)
Browser finds page A includes a script with hash b04a3bccd23ddeb7982143707a63ccf9
Browser checks its cache and finds no file with that hash
Browser downloads the file from the given URL (which points to a file on page A's domain)
Browser calculates hash and compares it with the hash as stated on page A
Browser adds the file to its cache, keyed by the hash. If the calculated hash had not matched the given hash, the file would have been rejected with an error message
Browser executes file.
At some point later in time:
User navigates to page B for the first time
Browser loads page B and looks for external files (images, scripts, …)
Browser finds page B includes a script with hash b04a3bccd23ddeb7982143707a63ccf9
Browser checks its cache and finds a file with that hash
Browser loads the file from its cache. The browser did not care about the URL given on page B pointing to the file. It also did not matter how the file's content found its way into the cache: protocol, connection encryption and source are ignored. No connection to any server was made to load the file for page B
Browser executes file.

It's basically a kernel of a good idea, but I don't think there's anything in HTML to support it. Might be able to kludge something together with JavaScript, I suppose.
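A minimal sketch of such a JavaScript kludge, under heavy assumptions: SubtleCrypto does not implement MD5, so SHA-256 is used instead, and the loadVerified helper is made up for illustration.

async function loadVerified(url, expectedHex) {
  // Download the script and hash its bytes (SHA-256; SubtleCrypto has no MD5).
  const buf = await (await fetch(url)).arrayBuffer();
  const digest = await crypto.subtle.digest('SHA-256', buf);
  const hex = Array.from(new Uint8Array(digest))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');
  // Reject the file with an error if the hashes do not match.
  if (hex !== expectedHex) throw new Error('Hash mismatch for ' + url);
  // Execute the verified script by injecting it inline.
  const el = document.createElement('script');
  el.textContent = new TextDecoder().decode(buf);
  document.head.appendChild(el);
}

For what it's worth, the integrity half of this idea was later standardized as Subresource Integrity (the integrity attribute on <script> and <link>), though browsers still do not share cached files across origins by hash.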

It is not necessary, and it is not a new idea at all.
You can do this, using your example (omitting the "type" attribute for brevity):
<script src="jQuery-1.5.1.min.js?b04a3bccd23ddeb7982143707a63ccf9">
This has been practised for a long time on quite a few sites, using the file's timestamp instead of MD5. Rails supports it too, see here (look for "timestamp"), or here for an example with PHP.
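A minimal sketch of the timestamp variant in PHP; the file name is taken from the example above, and the inline-echo style is just one way to do it:

<script src="jQuery-1.5.1.min.js?<?php echo filemtime('jQuery-1.5.1.min.js'); ?>"></script>

Because filemtime() changes whenever the file is modified, the URL (and therefore the cache entry) changes exactly when the content does.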
Also see How to set up caching for css/js static properly

Related

Application Cache - HTML 5

In one of the online documents that talks about appcache for HTML5, it indicates that the cached files get updated once an offline user reconnects. I checked the original HTML5 appcache definition by the W3C, and I am not able to find anything that supports this statement.
Does anyone know if this is true?
Thanks in advance
MDN says the following, although if you scroll up on that page it says it's being deprecated.
If an application cache exists, the browser loads the document and its associated resources directly from the cache, without accessing the network. This speeds up the document load time.
The browser then checks to see if the cache manifest has been updated on the server.
If the cache manifest has been updated, the browser downloads a new version of the manifest and the resources listed in the manifest. This is done in the background and does not affect performance significantly.
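That background check can also be observed from script. A minimal sketch using the (now-deprecated) window.applicationCache API:

// Fires once a newer manifest and its resources have been downloaded.
window.applicationCache.addEventListener('updateready', function () {
  if (window.applicationCache.status === window.applicationCache.UPDATEREADY) {
    window.applicationCache.swapCache(); // switch to the freshly downloaded cache
    window.location.reload();            // reload so the page actually uses it
  }
});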
And logic tells me that it would also depend on the app you're using, the server you're trying to connect to and any special settings it might have, how long your browser keeps its history, what it keeps, and, if you saved the page to view offline, whether or not you have all the code/images saved in the right location(s).
Example:
Imagine you saved a page to view offline, and that page has a JS event handler running a loop that makes an AJAX request every n seconds to do something, like updating a number on the page as long as you are online. If that loop is running when you suddenly connect to the internet, and it makes the request to the proper URL with the right arguments, then it should go through, even though the URL in your browser might say something like file:///C:/Users/you/Desktop/....
I've done this before, even though my URL was like the one above. One time I was adding Braintree's drop-in JavaScript to a website and using its API on my backend. Trying to load the page when offline = nothing. Online = it updated the spot on the page just fine when I had the required arguments and it was pointing to the right URL. If I went offline again, I could refresh the page and see the same images loaded in the <div>, but I couldn't send any data with it.
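A minimal sketch of the polling pattern described above; the endpoint URL and element ID are made up:

// Poll every 5 seconds; the request only succeeds while the browser is online.
setInterval(function () {
  if (!navigator.onLine) return; // skip the attempt while offline
  var xhr = new XMLHttpRequest();
  xhr.open('GET', 'https://example.com/api/number'); // hypothetical endpoint
  xhr.onload = function () {
    document.getElementById('number').textContent = xhr.responseText;
  };
  xhr.send();
}, 5000);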

How does using external scripts in HTML reduce the download time for a web page

I know that using external files can reduce download time for browsers. Is this because the lines of code are shorter? But if this was the case, wouldn't the number of HTTP connections required to download an external script in HTML be higher, and therefore cause it to download more slowly?
Using external .js files does reduce the amount of time required for the .html file to download because the JavaScript is removed from it. However, this does not necessarily mean that the total time to initially process the page is reduced.
Here are some things to take into account:
If the references to the external script files are placed near the end of the HTML <body>, the HTML content can be parsed, without interruption, and a visible UI can be seen quicker than if the user agent had to process the JavaScript first.
If the external references point to well-known and used resources from popular Content Delivery Networks (CDNs), like jQuery, then it's possible the user may already have a cached copy of this resource stored locally, and so the browser won't need to actually download the file again.
On the contrary side...
If the scripts are not files that the user has already downloaded, then the total time to download the page vs. if the scripts were embedded inside the page will be the same.
Even if the scripts are separated into their own files, if the scripts are not at the bottom of the HTML, the page may appear to have a longer load time because the UI may not be visible and/or responsive right away.
Browsers/operating systems usually put a "cap" on the number of simultaneous HTTP requests that can be made at any given time. As such, it is often recommended to combine external .js files (where appropriate) to reduce the number of HTTP requests a page has to make. Keep in mind that every image, every CSS file, every iframe source, every .js file and every AJAX call all amount to more requests.
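A minimal sketch combining the two caching-friendly points above; the CDN URL is a typical one, and /js/site.min.js is a made-up combined file:

<body>
  <h1>Hello</h1>
  <!-- visible content above parses and renders before any script runs -->
  <!-- widely used CDN copy: the user may already have it cached -->
  <script src="https://code.jquery.com/jquery-1.5.1.min.js"></script>
  <!-- site scripts combined into one file to cut down HTTP requests -->
  <script src="/js/site.min.js"></script>
</body>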
If you mean a CDN with external files, the answer is caching. The user most likely visited another page that pointed to the resource before, and the browser still knows about it. Therefore it does not need to load the resource again, which is faster.

HTML5 read files from path

Well, using the HTML5 file handling API we can read files with the help of an input of type file. What about reading files with a path like
/images/myimage.png
etc.?
Any kind of help is appreciated
Yes, if it is Chrome! Play with the FileSystem API and you will be able to do that.
The simple answer is: no. When your HTML/CSS/images/JavaScript is downloaded to the client's end, you are breaking loose from the server.
Simplistic Flowchart
User requests URL in Browser (for example; www.mydomain.com/index.html)
Server reads and fetches the required file (www.mydomain.com/index.html)
index.html and its linked resources will be downloaded to the user's browser
The user's Browser will render the HTML page
The user's Browser will only fetch the files that came with the request (images/someimages.png and stuff like scripts/jquery.js)
Explanation
The problem you are facing here is that once the HTML is being rendered locally it has no link with the server anymore, so asking what files /images/ contains is not logically possible, as that directory resides on the server.
Work-around
What you can do, though it somewhat defeats the point of the question, is write a server-side script in JSP/PHP/ASP/etc. that traverses the directory you want. In PHP you can do this by using opendir() (http://php.net/opendir).
With an XHR/AJAX call you can then request that page and get the directory listing back. The easiest way to do this is jQuery's $.post() function in combination with JSON.
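A minimal sketch of that work-around; the file name my_image_dirlist.php is reused from the caution below, everything else is an assumption:

<?php
// my_image_dirlist.php: return the contents of images/ as JSON
header('Content-Type: application/json');
$files = array();
$dir = opendir('images');
while (($entry = readdir($dir)) !== false) {
    if ($entry !== '.' && $entry !== '..') {
        $files[] = 'images/' . $entry;
    }
}
closedir($dir);
echo json_encode($files);

And on the client, with jQuery:

$.post('my_image_dirlist.php', function (list) {
    // e.g. ["images/myimage.png", ...]
    console.log(list);
}, 'json');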
Caution!
You need to keep in mind that if you use the work-around, the link will be visible for everyone to see what's in the online directory you request (for example, http://www.mydomain.com/my_image_dirlist.php would then return a stringified list of everything (or less, based on certain rules in the server-side script) inside http://www.mydomain.com/images/).
Notes
http://www.html5rocks.com/en/tutorials/file/filesystem/ (seems to work only in Chrome, but would still not be exactly what you want)
If you don't need all the files from a folder, but only those that were downloaded to your browser's cache by the URL request, you could try searching online for ways to access the browser cache (downloaded files) of the currently loaded page. Or make something like a DOM-walker and CSS reader (regex?) to see where all the file relations are.

HTTP Headers - Hard refresh JavaScript/CSS

I've recently added HTTP headers to my site to inform the browser to check with the server every time it comes across a given JS/CSS URL. I've tested it and it works perfectly; all browsers now make conditional GET requests.
Here's the problem though -- people still have the old headers cached; headers which more or less told the browser "cache this forever; don't bother asking for an update!". This can be busted with a hard refresh. I don't want to have to communicate to everyone to please hit F5 on any buggy pages after we push out code.
Are there any HTTP header(s)/HTML meta tag(s) I could put on the HTML document itself to say "Browser, ignore the headers you have on the JS/CSS files and download the latest version of all the included files on this page"?
Eventually this problem will work itself out as more and more people clear their cache or learn to refresh on their own. But, I'd rather fix it now. Then in a month or so, I'll remove the HTML-level headers to get caching where I want -- on a per resource basis.
EDIT: I do not want to rename the resources or add on query parameters. That's what we used to use (?v=18, ?v=19, etc.) and it was a chore to increment that number every time we updated resources. Even doing that programmatically isn't the ideal solution; especially now that our server is configured correctly. It makes more sense to do it on the HTTP level so it works regardless of how you're accessing it -- included on a page, directly from the address bar, or otherwise.
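For reference, response headers along these lines produce the check-every-time, conditional-GET behaviour described; the exact values the asker used aren't shown, so this is an assumption (the ETag value is arbitrary):

Cache-Control: no-cache
ETag: "abc123"

With Cache-Control: no-cache the browser may keep a copy, but it must revalidate it (via If-None-Match, answered with 304 Not Modified when unchanged) before each use.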
Pass a parameter on the script source which will force a reload of the script... in fact you could do it by version or similar.
<script src="/test/script/myawesomescript.js?ver=1.0&pwn=yes" ...>
That would work and be seamless to the other users... when you feel it has been long enough, go back to the old way. But this will work if you want to force a refresh from users.
This method is utilized by some frameworks to prevent caching of webpages. Let me know if you were successful.
http://css-tricks.com/can-we-prevent-css-caching/ -- here is a link to the concept for CSS (it should work for JS too) -- the biggest difference is you don't want it to never cache, so don't use a timestamp; use my style from above :) enjoy!
Basically the only way is to get the browser not to use the cached URL.
One method is to use a cache-busting dummy parameter on the end of the URL.
some-name.css?q=1
That will force the browser to reload that file (because that URL isn't in the cache), and the downloaded file won't be cached forever, because of your new headers. However: you may need to keep this new name indefinitely, because you can't guarantee that, once you drop the dummy parameter again, the old cached version won't be used.
The other method is to completely rename the file.
my-new-name.css

How do I specify a wildcard in the HTML5 cache manifest to load all images in a directory?

I have a lot of images in a folder that are used in the application. When using the cache manifest, it would be easier maintenance-wise if I could specify a wildcard to load all the images or files in a certain directory to be cached.
E.g.
CACHE MANIFEST
# 2011-11-3-v0.1.8
#--------------------------------
# Pages
#--------------------------------
../index.html
../edit.html
#--------------------------------
# JavaScript
#--------------------------------
../js/jquery.js
../js/main.js
#--------------------------------
# Images
#--------------------------------
../img/*.png
Can this be done? I have tried it in a few browsers with ../img/* as well, but it doesn't seem to work.
It would be easier, but how's it going to work? The manifest file is something which is parsed and acted upon in the browser, which has no special knowledge of files on your server other than what you've told it. If the browser sees this:
../img/*.png
What is the first image the browser should request from the server? Let's start with these:
../img/1.png
../img/2.png
../img/3.png
../img/4.png
...
../img/2147483647.png
That's all the images that might exist with a numeric name, stopping semi-arbitrarily at 2^31 - 1. How many of those 2 billion files exist in your img directory? Do you really want a browser making all those requests only to get 2 billion 404s? For completeness the browser would probably also want to request all the zero-filled equivalents:
../img/01.png
../img/02.png
../img/03.png
../img/04.png
...
../img/001.png
../img/002.png
../img/003.png
../img/004.png
...
../img/0001.png
../img/0002.png
../img/0003.png
../img/0004.png
...
Now the browser's made more than 4 billion HTTP requests for files which mostly aren't there, and it's not yet even got on to letters or punctuation in constructing the possible filenames which might exist on the server. This is not a feasible way for the manifest file to work. The server is where the files in the img directory are known, so it's on the server that the list of files has to be constructed.
I don't think it works that way. You'll have to specify all of the images one by one, or have a simple PHP script loop through the directory and output the manifest (with the correct text/cache-manifest header, of course).
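A minimal sketch of such a script; the paths are copied from the manifest above, and the file name manifest.php is made up:

<?php
// manifest.php: emit a cache manifest listing every PNG in ../img/
header('Content-Type: text/cache-manifest');
$pngs = glob('../img/*.png');
echo "CACHE MANIFEST\n";
// The manifest must change bytewise for clients to re-download anything,
// so embed the newest image mtime as a revision comment.
echo "# revision " . max(array_map('filemtime', $pngs)) . "\n";
echo "../index.html\n";
echo "../edit.html\n";
echo "../js/jquery.js\n";
echo "../js/main.js\n";
foreach ($pngs as $png) {
    echo $png . "\n";
}

The page would then point at it with <html manifest="manifest.php">.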
It would be a big security issue if browsers could request folder listings - that's why Tomcat turns that capability off by default now.
But, the browser could locate all matches to the wildcards referenced by the pages it caches. This approach would still be problematic (like, what about images not initially used but set dynamically by JavaScript, etc.), and it would require that all cached items not only be downloaded but parsed as well.
If you are trying to automate this process instead of doing it manually, use a script, or do as I do and use manifestR. It will output your manifest/appcache file, and all you have to do is copy and paste. I've used it successfully and usually only have to make a few changes.
Also, I recommend using the NETWORK section header with the wildcard:
NETWORK:
*
This allows the page to fetch assets that are not listed in the manifest, for instance JSON from other linked domains, instead of those requests failing. I believe this is the only section where you can specify a wildcard. Like the others have said here, that's for security reasons.
The cache manifest is now deprecated, and you should use HTTP caching headers (or their <meta http-equiv> equivalents) to control caching.
For example:
<meta http-equiv="Cache-control" content="public">
Public - may be cached in public shared caches.
Private - may only be cached in a private (per-user) cache.
No-Cache - may be stored, but must be revalidated with the server before reuse.
No-Store - may not be stored in any cache at all.