Website files caching?

I want to know how long files like CSS, HTML, and JS should be cached via .htaccess settings, and why a different cache time is set for each file type.
In a few examples I saw, someone cached HTML for 10 minutes, JS for a month, and images for a year.

I think it depends on how often a resource is updated. Your HTML content is probably dynamic, so you can't cache it for a long time; otherwise a visitor sees changes only after a long delay.
On the other hand, pictures are rarely updated, so you can set a longer cache time.
JavaScript files are often updated with new features or bugfixes. Maybe you can use a version number for these files (core.js?v=12323), so that you can change the number in your HTML content to get them refreshed by a visitor. This way you can cache them for a longer time as well.
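A minimal sketch of such an .htaccess, assuming Apache with mod_expires enabled (the durations are just the examples from above; adjust them to how often each type of resource really changes):

# Requires mod_expires (e.g. a2enmod expires on Debian/Ubuntu)
<IfModule mod_expires.c>
    ExpiresActive On
    # HTML is likely dynamic: keep it fresh
    ExpiresByType text/html "access plus 10 minutes"
    # CSS/JS change with releases; version numbers handle forced refreshes
    ExpiresByType text/css "access plus 1 month"
    ExpiresByType application/javascript "access plus 1 month"
    # Images rarely change
    ExpiresByType image/png "access plus 1 year"
    ExpiresByType image/jpeg "access plus 1 year"
</IfModule>

mod_expires emits both an Expires header and a matching Cache-Control: max-age, so this covers older HTTP/1.0 caches as well as HTTP/1.1 clients.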

Related

How to minimise the time for new static content to appear on the GitHub Pages CDN?

Assume we are only pushing lightweight static content like small HTML or JS files with no Liquid tags. There are no plugins, and there is no _posts/ directory, and files are never changed once committed.
Because nothing really needs to be built, in theory if we configure incremental_build: true and keep_files: ['.html', '.js'], then the build should be very fast.
However, right now, the GitHub Pages build only happens every 5 minutes or so, so effectively there is a lag of 0 to 10 minutes.
Is there a way to reduce the time it takes for the file to appear at [repo].github.io/[path]? Is there some logic to it: for example, do more commits, more files, or more reads have an effect one way or another?
GitHub Pages does not respect those options. You could try prebuilding your site, but that will possibly increase the total time to deploy. It's also possible that the build is happening instantly but it's taking time for the CDN to receive updates and invalidate caches.
You can try using another host (like running your own Jekyll server on EC2) or having your build upload the static content to S3 instead.
However, I recommend taking a step back and asking why you need less than 10 minutes of latency on deploys. If there are highly volatile resources you need to serve, then perhaps you need to identify those and serve them in a different way. Static site generators are good at, well, static content, not so much at highly volatile content.
If the volatile resources are page content, then it sounds like you have a use case better served by a mainstream CMS like WordPress. If it's code, then deploy that separately to S3 and reference it in your site.

Changing the name of a CSS file every time the website code is updated on the server

We are using a product called "Mouseflow" which basically does heatmaps and user recordings. The problem is that because we update the site a few times a day (due to bugs found, UI/UX changes, etc.), the recordings in the dashboard don't look right.
Here is the answer received from their support:
" It has to do with how we save the recorded pages on our end. We save the HTML shown to the visitor, but not the external resources like stylesheets, images and script-files. Those are loaded directly from your webserver. And if these files suddenly become available, it can throw off the playback and heatmaps.
In your case it seems you've recently made some updates to your live page, changing the filename of one of your stylesheets. The saved HTML was referencing the file 'https://mywebsite.com/app.e28780aef0c5d0f9ed85.css', which is no longer available on your server. Instead, you are now using the file 'https://mywebsite.com/app.20d77e2240a25ae92457.css'.
I suspect the filename of this stylesheet is automatically updated whenever the content is changed."
The problem is: my tech team tells me that the CSS file name always changes after it's minified and they really can't do anything about it. On the other hand, we really want to know what the user is actually seeing.
Is there any way around it? Can we have a stable file name even after minifying the file?
Another option for you could be to copy the contents of the CSS files to a static file hosted on your server. The file should have a name that would never change (like mouseflow.css). Mouseflow could then insert a reference to that file, to load the needed CSS. This is something I know they can do quite easily.
You would need to manually update the static file whenever major changes are made to the CSS on the live site - but you wouldn't have to do it every time the file name changes.
Like Ouroborus just said, there is no such thing as "we can't do anything about it". It comes down to how you or the lead designer decide things will work.
Updating the CSS 10 times a day isn't that much, so you could still rename the file manually. If the stylesheet is referenced in several files, each of which includes it again (instead of using a shared header), that's a place you can start working on.
You can also keep all those old versions of the file on your web server, so that previously recorded pages can still load the stylesheets they reference.
And last but not least, you could stop minifying your files and work with something like SASS or LESS instead. It's way more productive and you will avoid this kind of issue.
Hope it helps you, and sorry about my English.
Best regards.
The point of changing the file name is to invalidate the client cache. Every time your team makes changes, the filename changes, and the browser knows it needs to download the file again. If the content hasn't changed between two visits, the file name will be the same, and the client browser will use a locally cached version of the file if it has one.
So changing the filename makes the site update for everyone right after changes are published.
One solution is to remove the hash from the filename, and set a short cache duration, but that's bad for performance and not good practice.
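The flip side is that hashed files themselves can be cached very aggressively, since a change always produces a new name. A sketch in .htaccess, assuming Apache with mod_headers and build output named like the files above (app. followed by a 20-character hex hash — the pattern is an assumption based on those example filenames):

# Requires mod_headers; matches names like app.20d77e2240a25ae92457.css
<IfModule mod_headers.c>
    <FilesMatch "\.[0-9a-f]{20}\.(css|js)$">
        # Safe to cache for a year: any content change produces a new filename
        Header set Cache-Control "max-age=31536000, immutable"
    </FilesMatch>
</IfModule>

The immutable directive tells supporting browsers not to revalidate at all; browsers that don't know it simply ignore it.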

How to trigger browser html refresh for cached html files?

YSlow suggests: For static components, implement a "Never expire" policy by setting a far future Expires header... If you use a far future Expires header, you have to change the component's filename whenever the component changes. At Yahoo! we often make this step part of the build process: a version number is embedded in the component's filename, for example, yahoo_2.0.6.js.
http://developer.yahoo.com/performance/rules.html
I'd like to take advantage of caching for my mostly static pages and reload the js files when the version # changes. I've set a version # for my .js files, but my main.html page has Expires set to the future, so it doesn't reload and therefore doesn't reload the js files. Ideally I'd like to tell the browser (using a psychic technique) to reload main.html when a new version of the site is released. I could make my main.html page always reload, but then I lose the caching benefit. I'm not looking for the ctrl-F5 answer, as this needs to happen automatically for our users.
I think the answer is that main.html can't be cached, but I'd like to hear what others are doing to solve this problem. How are you getting the best of both caching and timely reloads?
Thanks.
Your analysis is correct. Web performance best practices suggest a far future expiration date for static components (i.e., those which don't change often), and using a version number in the URL manages those changes nicely.
For the main page (main.html), you would not set a far future expiration date. Instead, you could either set no expiration or set a minimal one, for example +24 hours.
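As a sketch, assuming Apache with mod_headers (main.html and the versioned-filename pattern are just the examples from this question):

<IfModule mod_headers.c>
    # Entry page: may be stored, but the browser must revalidate on every visit
    <Files "main.html">
        Header set Cache-Control "no-cache"
    </Files>
    # Versioned scripts (e.g. yahoo_2.0.6.js): far future, the name changes per release
    <FilesMatch "_[0-9]+\.[0-9]+\.[0-9]+\.js$">
        Header set Cache-Control "max-age=31536000"
    </FilesMatch>
</IfModule>

With no-cache the browser may still keep a copy of main.html, but it sends a conditional request each time, so a new release is picked up immediately while an unchanged page costs only a cheap 304 response.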
I guess it depends on why you want to cache the HTML page - to improve user load times or to reduce server load.
Even with a long expiry time you might find that the page is not actually cached at the client for very long (Yahoo studies show that files don't live in the cache very long), so a shorter expiry time, e.g. 1 day, might not be an issue.
If it's to reduce backend load, it might be worth looking at whether a proxy like Varnish would help, i.e., it caches the pages from the origin server and serves them when requested. This way you get finer-grained control over how long pages are cached.

HTML - Cache-Control max-age

I'd like to always present the latest website content to the user but also have it load fast. While researching, I came across posts suggesting the use of caching to speed up loading.
So what do I need to add to my website to "overwrite" the cache after 3 days to display the latest content?
The Cache-Control header is used in HTTP 1.1 to control the behavior of caches. The max-age directive is used to specify (in seconds) the maximum age of the content before it becomes stale (i.e., the content will not change for some period of time). So if you know that your content will not change for 3 days, you want your server to add the following HTTP header:
Cache-Control: max-age=259200
(259200 = 60s x 60m x 24h x 3d)
To do that in PHP, add this line before any other output is sent:
header('Cache-Control: max-age=259200');
Read here for more info on the header function:
http://php.net/manual/en/function.header.php
There is more than one way to do this - but you need to consider exactly what you need to cache and what you don't. The biggest speed increases will likely come from making sure your assets (css, images, javascript) are cached, rather than the html itself. You then need to look at various factors (how often do these assets change, and how will you force a user to download a new version of a file if you do change it?).
Often, as part of a site's release process, new (updated) files are given a new filename to force the user's browser to redownload the file, but this is only one approach.
You should take a look at Apache mod_expires, and the ability to set expiry times for assets using the .htaccess file.
http://www.google.com/?q=apache+cache+control+htaccess#q=apache+cache+control+htaccess
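For example, the same three-day lifetime as the PHP header above could be set from .htaccess instead; a minimal sketch, assuming mod_expires is enabled:

# .htaccess equivalent of header('Cache-Control: max-age=259200');
<IfModule mod_expires.c>
    ExpiresActive On
    # 259200 seconds = 3 days
    ExpiresByType text/html "access plus 3 days"
</IfModule>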
As mentioned, Expires and Cache-Control headers are usually the best way to convey information about content lifetime.
Because clients are not very reliable at interpreting this information, caching proxies like Squid or Varnish are preferred by many people. You also need to consider whether you want to cache only static content (like images, stylesheets, ...) or dynamically generated content as well.
As per the YSlow recommendations, you could configure your web server to add an Expires or a Cache-Control HTTP header to the response, which will result in user agents caching the response for the specified duration.

Why do people always encourage a single js file for a website?

I read some website development materials on the Web, and every time someone asks how to organize a website's js, css, html, and php files, people suggest a single js file for the whole website. The argument is speed.
I clearly understand that the fewer requests there are, the faster the page responds. But I never understood the single-js argument. Suppose you have 10 webpages and each webpage needs one js function to manipulate the DOM objects on it. If you put all 10 functions in a single js file and execute that js on every webpage, 9 out of 10 functions are doing useless work. CPU time is wasted searching for non-existent DOM objects.
I know that CPU time on an individual client machine is trivial compared to bandwidth on a single server machine. I am not saying that you should have many js files on a single webpage. But I don't see anything wrong with every webpage referring to 1 to 3 js files that are cached on the client machine. There are many good ways to do caching; for example, you can use an expiry date, or you can include a version number in your js file name. Compared to cramming the functionality needed by many webpages of a website into one big js file, I far prefer splitting the js code into smaller files.
Any criticism/agreement on my argument? Am I wrong? Thank you for your suggestions.
A function does zero work unless it is called, so 9 uncalled functions are zero work, just a little extra space.
A client only has to make one request to download one big JS file, which is then cached on every other page load. That's less work than making a small request on every single page.
I'll give you the answer I always give: it depends.
Combining everything into one file has many great benefits, including:
less network traffic - you might be retrieving one file, but you're sending/receiving multiple packets, and each transaction has a series of SYN, SYN-ACK, and ACK messages sent across TCP. A large majority of the transfer time is spent establishing the session, and there is a lot of overhead in the packet headers.
one location/manageability - although you may only have a few files, it's easy for functions (and class objects) to grow between versions. With the multiple-file approach, sometimes functions from one file call functions/objects from another file (e.g., Ajax in one file, arithmetic functions in another - your arithmetic functions might grow to need to call the Ajax and have a certain variable type returned). What ends up happening is that your set of files needs to be seen as one version, rather than each file being its own version. Things get hairy down the road if you don't have good management in place, and it's easy to fall out of line with JavaScript files, which are always changing. Having one file makes it easy to manage the version across each of your pages and your (1 to many) websites.
Other topics to consider:
dormant code - you might think that the uncalled functions are potentially reducing performance by taking up space in memory, and you'd be right; however, this cost is so minuscule that it doesn't matter. Functions are indexed in memory, and while the index table may grow, it's trivial for small projects, especially given today's hardware.
memory leaks - this is probably the largest reason why you wouldn't want to combine all the code; however, it is a small issue given the amount of memory in systems today and the better garbage collection browsers have. Also, this is something that you, as a programmer, have the ability to control. Quality code leads to fewer problems like this.
Why it depends
While it's easy to say to throw all your code into one file, that would be wrong. It depends on how large your code is, how many functions there are, who maintains it, etc. Surely you wouldn't pack your locally written functions into the jQuery package, and you may have different programmers maintaining different blocks of code - it depends on your setup.
It also depends on size. Some programmers embed images in their files as base64-encoded text to reduce the number of files sent. This can bloat files. Surely you don't want to package everything into one 50 MB file, especially if there are core functions that are needed for the page to load.
So to bring my response to a close, we'd need more information about your setup, because it depends. Surely 3 files is acceptable regardless of size, combining where you see fit; that probably wouldn't really hurt network traffic, but 50 files is unreasonable. I use the hand rule (no more than 5), but surely you'll see a benefit combining those five 1 KB files into one 5 KB file.
Two reasons that I can think of:
Less network latency. Each .js file requires another request/response to the server it's downloaded from.
Fewer bytes on the wire and less memory. If it's a single file, you can strip out unnecessary characters and minify the whole thing.
The JavaScript should be designed so that the extra functions don't execute at all unless they're needed.
For example, you can define a set of functions in your script but only call them from (very short) inline <script> blocks in the pages themselves.
My line of thought is that you have fewer requests. When you make a request in the header of the page, it stalls the output of the rest of the page; the user agent cannot render the rest of the page until the JavaScript files have been obtained. Also, JavaScript files download synchronously - they queue up instead of being pulled at once (at least, that is the theory).