I started wondering what the overall impact is of using spaces to indent HTML documents.
Why not simply use tabs to indent? Wouldn't this be more cost-effective: 1 char (\t) vs., for example, 4 chars (spaces)?
I did a little experimenting by converting an ASP.NET page to use tabs and comparing the sizes of the rendered markup.
Replacing the whitespace in just one partial view reduced a 22 KB page to 19.4 KB, a 12% reduction. After changing all indentation, the page came to 16.7 KB, a 24% reduction! (I used Chrome DevTools and Fiddler to verify.)
Is my reasoning sound? Should tabs be used primarily for indenting HTML? Is there any reason to use spaces (such as compatibility with exotic browsers)?
P.S. Stack Overflow seems to use spaces too. Converting the SO main page to use tabs gave a 9% reduction. Is this a valid observation? If so, why haven't they used tabs?
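A minimal Python sketch of the conversion described in the question, assuming 4-space indentation. `spaces_to_tabs` is a hypothetical helper, not the tool the asker used. It only rewrites leading spaces, but it knows nothing about `<pre>`, so indented lines inside a `<pre>` block would be changed too.

```python
import re


def spaces_to_tabs(html: str, width: int = 4) -> str:
    """Replace each full group of `width` leading spaces with one tab.

    Only indentation at the start of a line is rewritten; leftover spaces
    (fewer than `width`) are kept. Lines inside <pre> are not special-cased.
    """
    def convert(match: re.Match) -> str:
        tabs, rest = divmod(len(match.group(0)), width)
        return "\t" * tabs + " " * rest

    # ^ with MULTILINE anchors at the start of every line.
    return re.sub(r"^ +", convert, html, flags=re.MULTILINE)


page = "<ul>\n    <li>one</li>\n    <li>two</li>\n</ul>\n"
tabbed = spaces_to_tabs(page)
print(len(page), len(tabbed))  # 45 39
```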
StackOverflow uses HTTP Compression - when this is turned on, the difference between using spaces and tabs shrinks - a lot.
You need to run your tests against the compressed versions for reliable results.
You do have a point though for the cases when a browser does not support the compression schemes the server supports.
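To follow the advice of measuring compressed sizes, here is a rough Python sketch that builds the same hypothetical list markup twice (4 spaces vs. one tab per line) and reports how much the raw and gzipped sizes differ. The sample page and helper names are assumptions for illustration; real pages will give different numbers.

```python
import gzip


def raw_and_gzip_size(text: str) -> tuple[int, int]:
    """Byte count of the UTF-8 text before and after gzip compression."""
    data = text.encode("utf-8")
    return len(data), len(gzip.compress(data))


def build_page(indent: str, items: int = 200) -> str:
    """Hypothetical sample page: one indented <li> per line."""
    lines = ["<ul>"] + [f"{indent}<li>item {i}</li>" for i in range(items)] + ["</ul>"]
    return "\n".join(lines) + "\n"


spaces_raw, spaces_gz = raw_and_gzip_size(build_page("    "))
tabs_raw, tabs_gz = raw_and_gzip_size(build_page("\t"))
print(f"raw difference:  {spaces_raw - tabs_raw} bytes")
print(f"gzip difference: {spaces_gz - tabs_gz} bytes")
```

The raw gap here is exactly 600 bytes (3 bytes on each of 200 lines); after gzip it should be far smaller, because repeated indentation is the kind of redundancy deflate encodes cheaply.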
First thing: HTML has no rule about indentation. Indentation is done by programmers for code readability and to show the program's structure. Moreover, we can reduce the size taken up by indentation and whitespace through compression.
Minify/compact/compress HTML: compacting HTML code can save many bytes of data and speed up downloading, parsing, and execution time.
Minifying HTML has the same benefits as those for minifying CSS and JS: reducing network latency, enhancing compression, and faster browser loading and execution. Moreover, HTML frequently contains inline JS code (in <script> tags) and inline CSS (in <style> tags), so it is useful to minify these as well.
Note: This rule is experimental and is currently focused on size reduction rather than strict HTML well-formedness. Future versions of the rule will also take into account correctness. For details on the current behavior, see the Page Speed wiki.
Tip: When you run Page Speed against a page referencing HTML files, it automatically runs the Page Speed HTML compactor (which will in turn apply JSMin and cssmin.js to any inline JavaScript and CSS) on the files and saves the minified output to a configurable directory.
Reference: http://code.google.com/speed/page-speed/docs/payload.html#MinifyHTML
Why not simply use tabs to indent? Wouldn't this be more cost-effective: 1 char (\t) vs., for example, 4 chars (spaces)?
If you're worried about downloaded HTML size, you won't fuss over tabs vs. spaces; you'll compress your HTML as it goes over the wire and minify your markup, CSS, and JavaScript, which provide real savings and don't interfere with your own coding guidelines.
Related
If you view the source of http://www.google.com, it is highly minified, even the HTML part. I am just wondering whether formatted HTML takes up more space than minified HTML.
All I can think of is that in formatted HTML, spaces, tabs, and newlines take up space, and that this is the only area where HTML minification can save space.
Yes, your thinking is correct. Removing whitespace and compressing the HTML will result in smaller download sizes.
If you'd like to see test cases for HTML minification, check out this blog post on Perfection Kills.
Excerpt:
Original size: 217KB (35.8KB gzipped)
Minified size: 206.6KB (34.3KB gzipped)
Savings: 10.4KB (1.5KB gzipped)
Minifying the home page of amazon.com saves about 10KB on the uncompressed document, and only 1.5KB on the compressed one.
Yes, there’s a difference. But for many (most?) websites this difference is not worth thinking about, because (1) the server will probably serve the HTML gzipped anyway, and (2) you don’t have enough pageviews to make the difference substantial. (Google does.)
Yes, minifying HTML, CSS, and JavaScript by removing spaces, tabs, newlines, and comments saves on bandwidth cost.
In addition to minifying the HTML, you should also make sure your HTML, CSS, and JavaScript are gzipped when sent over the wire, for even better performance. For more information about gzip, read: http://developer.yahoo.com/performance/rules.html#gzip
I would also add that it is very important to think about bandwidth cost and page speed in this day and age. Mobile web use is on a large upward swing. Even if you are not expecting a large mobile audience, you are doing a disservice to people trying to reach your site on their 3G mobile devices if you don't properly consider bandwidth cost and speed.
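Web servers usually handle gzip through configuration rather than application code, but the negotiation itself is simple. The Python sketch below (`encode_body` is a hypothetical name) gzips a response only when the client's `Accept-Encoding` header lists gzip; a client without gzip support simply gets the uncompressed body. It deliberately ignores q-values, which a real implementation should honour.

```python
import gzip


def encode_body(body: bytes, accept_encoding: str) -> tuple[bytes, dict[str, str]]:
    """Gzip `body` only if the client's Accept-Encoding lists gzip.

    Simplified: q-values such as "gzip;q=0" are ignored rather than honoured.
    """
    offered = {token.split(";")[0].strip().lower() for token in accept_encoding.split(",")}
    # Vary tells caches that the response depends on Accept-Encoding.
    headers = {"Content-Type": "text/html; charset=utf-8", "Vary": "Accept-Encoding"}
    if "gzip" in offered:
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(body), headers
    return body, headers  # client without gzip support gets the plain body


page = b"<ul>\n" + b"    <li>item</li>\n" * 500 + b"</ul>\n"
compressed, headers = encode_body(page, "gzip, deflate, br")
print(len(page), len(compressed), headers["Content-Encoding"])
```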
This is a simple question.
What is the difference between
http://code.jquery.com/mobile/1.1.0/jquery.mobile-1.1.0.css
and
http://code.jquery.com/mobile/1.1.0/jquery.mobile-1.1.0.min.css
Will anything go wrong if I replace one with the other on a live site, assuming neither has been edited?
You can replace them interchangeably.
The regular one is meant for examination and (if necessary) editing. The minified version makes the file as small as possible by removing all the whitespace it can. This makes it load faster for users.
The min version is minified, compressed. Functionally they should be identical. The minified version is smaller and downloads faster and should be used in production, but is unreadable and therefore bad during development.
The one with min simply means it's minified, the one without min is human readable.
To quote from Wikipedia:
Minification (also minimisation or minimization), in computer programming languages and especially JavaScript, is the process of removing all unnecessary characters from source code, without changing its functionality. These unnecessary characters usually include white space characters, new line characters, comments, and sometimes block delimiters, which are used to add readability to the code but are not required for it to execute.
The purpose of minifying code for the web is obvious once you compare the file sizes: removing unnecessary characters significantly reduces the size of the file that needs to be transferred.
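To make "removing unnecessary characters" concrete for CSS, here is a deliberately naive Python minifier. It is an illustrative sketch, not how jQuery's `.min.css` files are actually produced; real minifiers tokenize the stylesheet instead of using regexes, and this one assumes string literals contain nothing the regexes would touch.

```python
import re


def minify_css(css: str) -> str:
    """Naive CSS minifier, for illustration only.

    Assumes no string literals containing comment markers, meaningful
    spacing, or punctuation (e.g. content: "a, b"), which this would alter.
    """
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # drop comments
    css = re.sub(r"\s+", " ", css)                         # collapse whitespace runs
    # Trim spaces around punctuation. ':' is left alone on purpose, because
    # "a :hover" and "a:hover" are different selectors.
    css = re.sub(r"\s*([{};,>])\s*", r"\1", css)
    css = css.replace(";}", "}")                           # last ';' in a block is optional
    return css.strip()


source = """
/* header styles */
.header {
    color: red;
    margin: 0 auto;
}
"""
print(minify_css(source))  # .header{color: red;margin: 0 auto}
```

The rules applied are the same declarations either way, which is why the two files can be swapped.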
The min version is a minified version of the "regular" CSS file. The end result is exactly the same (the same styles are applied). The min version is just smaller, as unnecessary whitespace and the like are stripped from the file.
The reason for this is to save bandwidth and speed up page load times, as the browser has to download less in order to render the page.
min means a minified version of a regular CSS file. The result will be the same, although the min version will load faster. You should probably not delete the file if your website is very large, but if it's a small website, you can probably delete it.
-BurningPotato
Using minified versions of files brings the following advantages:
It drastically reduces loading times and bandwidth usage on your website.
It also improves site speed and accessibility, directly translating into a better user experience.
Minification has become standard practice for page optimization.
All major JavaScript library developers (Bootstrap, jQuery, AngularJS, etc.) provide minified versions of their files for production deployments, usually denoted with a .min.js extension.
In summary: developers tend to use spacing, comments, and well-named variables to make code and markup readable for themselves. This is a plus during development, but it becomes a negative when serving your pages. Minification removes comments and extra spaces, reducing file size and network bandwidth.
Hence it is better to use minified version in your PROD environment.
You may want to have a look at this.
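One way to act on this is to keep referencing the readable file during development and switch to the `.min` variant only in production. The helper below is a hypothetical Python sketch; it assumes the minified file is published next to the readable one, as with the jQuery Mobile CDN URLs in the question.

```python
def stylesheet_href(href: str, production: bool) -> str:
    """Hypothetical helper: point at the .min.css twin of `href` in production.

    Assumes the minified file is published next to the readable one with the
    same name plus ".min", as on the jQuery CDN.
    """
    if production and href.endswith(".css") and not href.endswith(".min.css"):
        return href[: -len(".css")] + ".min.css"
    return href


base = "http://code.jquery.com/mobile/1.1.0/jquery.mobile-1.1.0.css"
print(stylesheet_href(base, production=True))
# http://code.jquery.com/mobile/1.1.0/jquery.mobile-1.1.0.min.css
```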
The question is pretty self explanatory. Why shouldn't I strip it? It seems to me that most of the whitespace is used purely for formatting in the text editor and has no impact on the final page.
What's more, when these random whitespace nodes do have an impact on the final page, it is usually an impact I do not want, such as a mysterious one-character gap (after whitespace collapse) between inline-blocks.
I can strip all these whitespace text nodes pretty easily. Is there any reason I shouldn't?
edit:
It's mainly because of the strange behaviour caused by whitespace, rather than for performance. One example is wanting to put images side by side using inline-block instead of float, while preventing wrapping to the next line and allowing them to spill out of the parent.
The whitespace causes these mysterious gaps, which can be removed by basically minifying the HTML source code to remove the whitespace between inline-blocks manually (and completely messing up your source code formatting in the process).
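Stripping the whitespace-only text nodes can be approximated on the source text with a single regex. This Python sketch is an assumption about how one might do it, not a recommended tool; the markup and file names are hypothetical.

```python
import re


def strip_inter_tag_whitespace(html: str) -> str:
    """Remove whitespace-only text between one tag's '>' and the next '<'.

    Crude on purpose: it does not skip <pre>, <textarea>, or inline elements
    where that space is visible, so apply it only where the gaps are unwanted.
    """
    return re.sub(r">\s+<", "><", html)


row = '<div class="row">\n  <img src="a.png">\n  <img src="b.png">\n</div>'
print(strip_inter_tag_whitespace(row))
# <div class="row"><img src="a.png"><img src="b.png"></div>
```

With the whitespace gone there is no text node between the inline-block images, so the one-character gap disappears while the source file you edit can stay indented.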
There's no reason not to, really. It can be done very easily with something like htmlcompressor.
However, assuming you're delivering all your html, css, and js files via gzip, then the amount of real-world bandwidth savings you'll see from stripping whitespace will be very small. The question then becomes, is it worth the trouble?
UPDATE:
Perhaps this will affect your decision. I performed a simple minification on a page of my website just to see what kind of difference it would make. Here are the results:
BEFORE minification
22232 bytes (uncompressed)
5276 bytes (gzip)
AFTER minification
19207 bytes (uncompressed)
5146 bytes (gzip) - 130 bytes saved
The uncompressed file is about 3 KB smaller after minification. But that's not really what matters. The gzip compressed file is what is sent over the wire. And you can clearly see that gzip does a pretty good job even with the non-minified HTML.
I see the benefit of minifying js libraries, or things that aren't changing constantly. But I don't think it's worth the trouble doing this to your HTML for a measly 130 bytes.
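Expressing the numbers above as percentages makes the point sharper. This is only arithmetic on the reported figures, written in Python:

```python
def savings(before: int, after: int) -> tuple[int, float]:
    """Bytes saved and the saving as a percentage of the original size."""
    saved = before - after
    return saved, 100.0 * saved / before


for label, before, after in (("uncompressed", 22232, 19207), ("gzip", 5276, 5146)):
    saved, pct = savings(before, after)
    print(f"{label}: {saved} bytes saved ({pct:.1f}%)")
# uncompressed: 3025 bytes saved (13.6%)
# gzip: 130 bytes saved (2.5%)
```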
Let me give one reason why you shouldn't minify html:
How HTML eventually gets rendered is strongly tied to the CSS applied to it, but minifiers usually work without accounting for CSS. All the minifiers available at the time of writing remove spaces in HTML based on certain assumptions about your markup and CSS styling; if you don't code the way they expect, the minified page may render differently in the browser than it did before minification.
For example, some minifiers assume the space between "block elements" (such as <div/>, <p/>) can be removed. This is usually true, because spaces between them have no effect on the final rendering. But what if your CSS sets "display: inline" or "inline-block" on elements whose default display is block?
Will the HTML snippet below still render as it should if you remove the spaces between the <div/>s?
<div style="display: inline">will</div> <div style="display: inline">this</div> <div style="display: inline">still</div> <div style="display: inline">work?</div>
You may argue that we can preserve at least one space and remove the remaining consecutive spaces, which still saves a lot of bytes. Then what about the <pre> tag and white-space: pre?
Try copying the HTML snippet from the URL below into your minifier and see whether it produces the same result as before minification:
https://jsfiddle.net/normanzb/58rpazL2/
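The inline-`div` case can be checked without a browser. This Python sketch applies a naive "remove whitespace between tags" rule (an assumed stand-in for what such minifiers do) to the snippet above and compares the text a reader would see, using the standard library's `html.parser`.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects character data, roughly what a reader sees for inline content."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def visible_text(html: str) -> str:
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    # Collapse whitespace runs to one space, roughly as normal white-space handling does.
    return re.sub(r"\s+", " ", "".join(collector.parts)).strip()


snippet = ('<div style="display: inline">will</div> <div style="display: inline">this</div> '
           '<div style="display: inline">still</div> <div style="display: inline">work?</div>')
minified = re.sub(r">\s+<", "><", snippet)  # naive "spaces between block elements" rule

print(visible_text(snippet))   # will this still work?
print(visible_text(minified))  # willthisstillwork?
```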
The only downside of stripping out whitespace from production pages is readability, and maintainability for the person that follows you in editing that/those page(s); but if you maintain a 'properly'/'readable' whitespaced-version for editing, and then minify that post-editing to form the production pages then it doesn't really cause significant problems.
I'm not sure how effective, or useful, the technique will be, but there's nothing to stop you trying it.
Short answer: no reason whatsoever
The only real purpose white space serves is to make the code more human-readable. You can, over time, save a lot of bandwidth by stripping all the unnecessary white space out of your documents, and it should be considered good practice for production code. If you're compressing your content the saving will be less, but even 1% of 1GB is 10MB... If you're doing 100GB a month on a busy web site, cutting out 1% of the data might be the difference between two pricing tiers of hosting...
As you say, some browsers (usually IE, grrrr....) will occasionally interpret the white space when they render the page, but usually when this happens it's in a way you'd rather it hadn't...
Minifying HTML is the only section of Google's Page Speed report where my site still has room for improvement.
My site is all dynamic and the HTML is already deflated, so there is no reason to put any more pressure on the server (I don't want to minify pages in real time before sending).
What I could do is minify the template files. My template files are a mix of PHP and HTML, so I've come up with some code that I think is pretty safe but would like the community to review.
// this runs inside a loop through all template files
// php is cleaned first so that line-comments will not interfere with the regex
$original = file_get_contents($dir.'/'.$file);
$php_clean = php_strip_whitespace($dir.'/'.$file);
$minimized = preg_replace('/\s+/', ' ', $php_clean);
This turns each template file into a single very long line, interspersed with the places where DB content is inserted. Google's homepage source looks more or less like what I get, so I wonder if they follow a similar approach.
Question 1: Do you anticipate potential problems?
Question 2: Is there a better (more efficient) way to do this?
And please remember that I'm not trying to validate HTML as the templates are not valid HTML (header and footer are includes, for example).
Edit: Please take into consideration that the template files will be minified on deploy. Just as the CSS and JavaScript files are minified and compressed using YUI Compressor and Closure, the template files would be minified likewise, on deploy, not on client request.
Thank you.
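Question 1 is essentially about what a blanket `\s+` replace can break. A rough Python rendering of the same step (not PHP, and not the asker's code) shows one possible guard; the set of preserved elements is an assumption you would adjust for your own templates.

```python
import re

# Hypothetical choice of elements whose whitespace must survive verbatim.
PRESERVE = re.compile(r"<(pre|textarea|script)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)


def collapse_whitespace(template: str) -> str:
    r"""Collapse whitespace runs to one space, except inside PRESERVE blocks.

    Mirrors preg_replace('/\s+/', ' ', ...) from the question, plus a guard:
    inside <script>, joining lines would let a // comment swallow the code
    after it, and inside <pre>/<textarea> the whitespace is content.
    """
    out, pos = [], 0
    for block in PRESERVE.finditer(template):
        out.append(re.sub(r"\s+", " ", template[pos:block.start()]))
        out.append(block.group(0))  # keep the preserved block untouched
        pos = block.end()
    out.append(re.sub(r"\s+", " ", template[pos:]))
    return "".join(out)


template = "<div>\n   <p>a   b</p>\n</div>\n<pre>  x\n  y</pre>\n<script>// note\nrun();</script>"
print(repr(collapse_whitespace(template)))
```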
Google's own Closure Templates (Soy) strips whitespace at the end of the line by default, and the template designer explicitly inserts a space using {sp}. This probably isn't a good enough reason to switch away from PHP, but I just wanted to bring it to your attention.
In addition, realize that HTML 4 allows you to exclude some tags, as recommended by the Page Speed documentation on minifying HTML (http://code.google.com/p/page-speed/wiki/MinifyHtml). You can exclude </p>, </td>, </tr>, etc. For a complete list of elements for which you can omit the end tag, search for "- O" in the HTML 4 DTD (http://www.w3.org/TR/REC-html40/sgml/dtd.html). You can even omit the <html>, <head>, <body>, and <tbody> tags entirely, as both start and end tags are optional ("O O" in the DTD).
You can also omit the quotes around attributes (http://www.w3.org/TR/REC-html40/intro/sgmltut.html#h-3.2.2) such as id, class (with a single class name), and type that have simple content (i.e., matches /^[-A-Za-z0-9._:]+$/). For attributes that have a single possible value, you can exclude the value (e.g., say simply checked rather than checked=checked).
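The attribute rules above can be expressed as a small serializer. This Python sketch uses the value pattern quoted in the answer; the `BOOLEAN_ATTRS` set is a partial, illustrative assumption rather than the full list from the HTML 4 DTD.

```python
import re

# Attribute values that may go unquoted, per the pattern quoted in the answer.
SAFE_UNQUOTED = re.compile(r"[-A-Za-z0-9._:]+")
# Partial, illustrative list of boolean attributes (an assumption, not exhaustive).
BOOLEAN_ATTRS = {"checked", "selected", "disabled", "readonly", "multiple"}


def format_attr(name: str, value: str) -> str:
    """Serialize one attribute as compactly as the rules above allow."""
    if name in BOOLEAN_ATTRS and value in ("", name):
        return name                        # checked="checked" -> checked
    if SAFE_UNQUOTED.fullmatch(value):
        return f"{name}={value}"           # id="main" -> id=main
    escaped = value.replace("&", "&amp;").replace('"', "&quot;")
    return f'{name}="{escaped}"'           # anything else keeps its quotes


print(" ".join(format_attr(n, v) for n, v in
               [("id", "main"), ("class", "nav item"), ("checked", "checked")]))
# id=main class="nav item" checked
```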
Some people may find these tips repulsive because we've been conditioned for so many years to prepare for the upcoming world of simple LALR parsers for XHTML. Thus, tools like Dave Raggett's HTML Tidy generate HTML with proper closing tags and quotes around attribute values. But let's face it, all the browsers already have parsers that understand HTML 4, any new browser will use the HTML 5 parser rather than XHTML, and we should get comfortable writing HTML that is optimized for size.
That being said, besides a couple large companies like Google and Facebook, my guess is that page size is a negligible component of latency, so if you're optimizing your own site it's probably because of your own obsessive tendencies rather than performance.
White space can be significant (e.g. in pre elements).
When I had a particularly large page (i.e. large enough that there was a benefit in minifying the HTML) I used HTML Tidy and cached the results.
tidy -c -n -omit -ashtml -utf8 --doctype strict \
--drop-proprietary-attributes yes --output-bom no \
--wrap 0
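The answer doesn't say how the Tidy output was cached. One simple option, sketched here in Python under that assumption, is an in-memory dict keyed by a SHA-256 of the input, with `minify` standing in for whatever actually invokes tidy.

```python
import hashlib
from typing import Callable


def cached_minify(html: str, minify: Callable[[str], str], cache: dict[str, str]) -> str:
    """Run `minify` once per distinct input, keyed by a SHA-256 of the HTML."""
    key = hashlib.sha256(html.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = minify(html)
    return cache[key]


calls = []

def fake_minify(html: str) -> str:
    # Stand-in for whatever actually invokes tidy; records how often it runs.
    calls.append(html)
    return " ".join(html.split())

cache: dict[str, str] = {}
page = "<p>\n   hello\n</p>"
cached_minify(page, fake_minify, cache)
cached_minify(page, fake_minify, cache)
print(len(calls))  # 1
```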
I think you'll end up running into issues with load time with this approach, as the get contents, strip whitespace, and preg replace calls are going to take a lot longer to do than whatever bandwidth the minified HTML is saving you.
I've been running tests on all my sites for a couple of weeks and I can say that this method is pretty consistent. It only affects template content, so there is little risk of messing up unknown <pre> blocks or similar.
It is run before deploy so there is no impact on server - actually there should be a little speed up as the file becomes smaller.
Do remember that content coming from the database is not affected since, as said before, this runs before deploy and on template files only.
The method seems solid enough to put into production.
If anything goes wrong I'll post it here.
I have a few hand-crafted web pages. When deploying them I would like to run them through a tool so that new smaller HTML files are created, with extraneous whitespace taken out, etc.
We already use YUICompressor for our Javascript and our CSS, and we tend to follow all of the techniques described by the Yahoo performance team.
Is there a good, free tool that does this? I prefer tools that would fit into our deployment process similarly to YUICompressor.
HTML Tidy does the job.
I use the following on one document that I generate (a rather large one). This saved me about 10% on the post-gzip size.
tidy -c -omit -ashtml -utf8 --doctype strict \
--drop-proprietary-attributes yes --output-bom no \
--wrap 0 source.html > target.html
-c — Replace surplus presentational tags and attributes
-omit — Drop optional end tags
-ashtml — use HTML rather than XHTML (HTML is leaner and XHTML provides no benefits for most use cases)
-utf8 — So we don't have to use entities for characters outside the character set (entities are more bytes)
--doctype strict — use Strict (again, leaner)
--drop-proprietary-attributes yes — get rid of proprietary junk
--output-bom no — BOMs cause issues in some clients
--wrap 0 — Have very long lines
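To slot this into a deployment script the way YUICompressor is usually called, a thin Python wrapper around the same command could look like this. It is a sketch that assumes `tidy` is on the `PATH`; the function names are hypothetical.

```python
import subprocess


def tidy_argv(source: str) -> list[str]:
    """The tidy invocation from the answer above, as an argument list."""
    return ["tidy", "-c", "-omit", "-ashtml", "-utf8", "--doctype", "strict",
            "--drop-proprietary-attributes", "yes", "--output-bom", "no",
            "--wrap", "0", source]


def run_tidy(source: str, target: str) -> int:
    """Write tidy's output for `source` to `target`, like `> target.html`."""
    with open(target, "wb") as out:
        result = subprocess.run(tidy_argv(source), stdout=out, check=False)
    # tidy exits 0 when clean, 1 when it only emitted warnings, 2 on errors.
    if result.returncode >= 2:
        raise RuntimeError(f"tidy reported errors for {source}")
    return result.returncode
```

A deploy step would then call `run_tidy("source.html", "target.html")` for each page, treating exit code 1 (warnings only) as success.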
Plain old minify will also attack your HTML for you, if you want.
But HTML minification isn't, generally, hugely effective:
Taking runs of whitespace down to one won't do that much. If you're already using gzip/deflate, that'll be compressing the whitespace quite efficiently. You can't remove all whitespace as single whitespaces can often have an effect on rendering that it is desirable to keep.
Taking comments out may have an effect, depending on how much comment content you actually have. But you'd have to be careful not to hit conditional comments.
Apart from that, there is not much in an HTML document that can be ‘minified’. Obviously the JS idea of packing variable names down to the shortest possible string is inapplicable.
Doing all this with regex, as most minifiers do, is a bit dodgy. You have to stick to a limited ‘normal’ range of markup that won't trip it up.
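As an example of both the comment point and the regex caveat, here is a Python sketch that strips ordinary comments while leaving IE conditional comments alone. Like any regex approach, it assumes no comment-like text inside `<script>` blocks or attribute values.

```python
import re

# Matches an ordinary comment but skips IE conditional comments:
# "<!--[if ...]>", the "<!-->" in "<!--[if !IE]><!-->", and "<!--<![endif]-->".
COMMENT = re.compile(r"<!--(?!\[if|<!\[endif|>)[\s\S]*?-->")


def strip_comments(html: str) -> str:
    return COMMENT.sub("", html)


doc = '<p>a</p><!-- note --><!--[if IE]><p>ie</p><![endif]--><p>b</p>'
print(strip_comments(doc))
# <p>a</p><!--[if IE]><p>ie</p><![endif]--><p>b</p>
```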
With HTML minification you're typically getting less gain (and less post-gzip gain) than JS/CSS minification, and for dynamically-generated pages you have more overhead (as you can't pre-minify them like with static scripts/styles). Some templating languages may already have built-in features for trimming whitespace at generation time; if available in your environment, use that.