I have a problem. I am creating website and it is working well. But my boss told me that I shouldn't indent my codes because it can affect memory space because it reads the spaces or every indention in the code? Is this true? If yes can you provide me a reference for that so that I can defend myself. My boss show me some website in Japan that didn't indent their code and He ask me If indention is a program standard. If yes why some of the website in Japan didn't indent their code. It is all align left. I forgot some of the websites He showed me because it is in Japanese.
That's all thanks.
Here's the website that has no indention
http://www.dior.com/couture/home/ja_jp
For compiled code, it makes no difference whatsoever unless whitespace is significant in the language and means something that impacts memory usage.
Code transmitted over a network (HTML, CSS, JS, XML, etc.) can be made marginally smaller by removing spaces (in addition to compressing the output, which is considered a best practice). But this should never impact coding style and readability!
Whitespace removal can be done automatically when the page is served. This will actually increase the load on the server slightly (CPU, possibly memory), not reduce it. If the bandwidth savings are worthwhile (doubtful, if compression is enabled), it is an acceptable trade.
This answer shows the savings (or lack thereof) achieved by removing whitespace.
But my boss told me that I shouldn't indent my codes because it can
affect memory space because it reads the spaces or every indention in
the code?
"It" could refer to the web server sending the content or the browser reading the content.
Server: must send bytes of data from disk/memory in response to a request. So yes, extra whitespace may take a trivial amount more memory if the data is buffered and/or cached.
Browser: the browser can discard whitespace it doesn't need. It may never use any memory at all for those extra bytes of whitespace.
So (to be generous) your boss is right, but he is focused on the wrong thing. The savings (if at all) is measured in bytes and nanoseconds.
There are many, many other things that can be optimized first and will yield much more substantial benefits.
Hurting developer productivity is expensive. People indent code to make it more readable, and lack of readability equals lost productivity.
Related
If one views the source code of http://www.google.com, it's highly minified. Even the html part. I am just wondering if formatted html takes up more space than minified HTML.
All I can think of is, that in formatted html, the characters : spaces, tabs and newline take space. And that is the only scope where html minification can save memory.
Yes, your thinking is correct. Removing whitespace and compressing the HTML will result in smaller download sizes.
If you'd like to see test cases for HTML minification, check out this blog post on Perfection Kills.
Excerpt:
Original size: 217KB (35.8KB gzipped)
Minified size: 206.6KB (34.3KB gzipped)
Savings: 10.4KB (1.5KB gzipped)
Minifying home page of amazon.com saves about 10KB with uncompressed
document, and only 1.5KB with compressed one.
Yes, there’s a difference. But for many (most?) websites this difference is not worth thinking about, because (1) the server will probably serve the HTML gzipped anyway, and (2) you don’t have enough pageviews to make the difference substantial. (Google does.)
Yes, minifying HTML, CSS, and JavaScript by removing spaces, tabs, newlines, and comments saves on bandwidth cost.
In addition to minifying the HTML, you should also be certain your HTML, CSS, and JavaScript is being GZIP'ed when being sent over the wire for even better performance. For more information about GZIP, read: http://developer.yahoo.com/performance/rules.html#gzip
I would also like to add that it is very important to think about bandwidth cost and page speed to any degree this day in age. Mobile web users are on a large upward swing. Even if you are not expecting a large mobile draw from your site, you are doing a disserve to those trying to access your site on their mobile 3G devices by not taking the proper considerations into bandwidth cost and speed.
I am storing the HTML from the body of emails in a SQL Server nvarchar(max) column.
Is there any benefit in minimizing the HTML on the way in?
By minimizing I mean removing redundant white space and carriage returns/linefeeds in the HTML text stream. My terminology might not be quite right: I'm not looking at removing any HTML tags/comments or anything like that.
By benefit I mean in terms of efficiency of storage space, speed of insert/retrieval, so benefits are focused on the database side.
If it is worthwhile to do, what should I look out for (e.g. if I replace linefeeds with a single space, might it render the HTML incorrectly at a later time)?
You'd still have to have a full HTML parser to understand what's HTML and whats not. Most browsers do a bit of 'fixing up' to make otherwise unpresentable HTML graphically renderable -- in such a way that without fully parsing the tree would be impossible.
Someone could stick some bad HTML in that'd goof up your 'simple' parser pretty easily more often by mistake than malice. Don't get in the business of fixing HTML, handle it verbatim and let the bad content hang itself.
HTML will be just be stored as a BLOB in the database. You won't be able to parse it, search it etc (well, you technically can but that's silly). In that case, you can (un)compress it in the client and send it+store it as varbinary(max) in the database.
The trade off is CPU time to manage compression vs increased storage+network traffic.
I wouldn't sanitise the HTML because you'll lose readability and possibly original content.
I started wondering what is the overall impact of using whitespaces to indent html documents.
Why not simply use tabs to indent? Wouldn't this be more cost-effective: 1 char (\t) vs. example 4 chars (spaces)?
I did little experimenting by converting an asp.net-page to use tabs and compared sizes of rendered markups.
By replacing only one partial view's white space caused a page of 22kb size to be reduced to 19,4kb -> that's 12% reduction. Changing all indentation, page ended up allocating 16,7kb - 24% reduction! (used chrome dev tools and Fiddler for verifying)
Is my reasoning sound? Should tabs be used primary for indentation of HTML? Is there any reason to use spaces(such as compatibility with exotic browsers)?
ps. Stackoverflow seems to use spaces too. Converting SO main page to use tabs gave 9% reduction. Is this valid observation? If so, why haven’t they used tabs?
StackOverflow uses HTTP Compression - when this is turned on, the differences between using spaces versus tabs goes down - a lot.
You need to run your tests against the compressed versions for reliable results.
You do have a point though for the cases when a browser does not support the compression schemes the server supports.
First thing : html doesn't have a rule of doing indentation. It's done by programmers for code readability and program's structure. More ever We can reduce size taken by indents and white spaces by compression.
Minify/compact/compressing HTML : Compacting HTML code, can save many bytes of data and speed up downloading, parsing, and execution time.
StackOverflow uses HTTP Compression
Minifying HTML has the same benefits as those for minifying CSS and JS: reducing network latency, enhancing compression, and faster browser loading and execution. Moreover, HTML frequently contains inline JS code (in tags) and inline CSS (in tags), so it is useful to minify these as well.
Note: This rule is experimental and is currently focused on size reduction rather than strict HTML well-formedness. Future versions of the rule will also take into account correctness. For details on the current behavior, see the Page Speed wiki.
Tip: When you run Page Speed against a page referencing HTML files, it automatically runs the Page Speed HTML compactor (which will in turn apply JSMin and cssmin.js to any inline JavaScript and CSS) on the files and saves the minified output to a configurable directory.
Refer : http://code.google.com/speed/page-speed/docs/payload.html#MinifyHTML
Why not simply use tabs to indent? Wouldn't this be more cost-effective: 1 char (\t) vs. example 4 chars (spaces)?
If you're worried about downloaded HTML size, you won't fuss over tabs-vs-spaces — you'll compress your HTML as it goes over the wire and minify your markup, CSS, and Javascript, which provide real savings and don't interfere with your own coding guidelines.
Most web pages are filled with significant amounts of whitespace and other useless characters which result in wasted bandwidth for both the client and server. This is especially true with large pages containing complex table structures and CSS styles defined at the level. It seems like good practice to preprocess all your HTML files before publishing, as this will save a lot of bandwidth, and where I live, bandwidth aint cheap.
It goes without saying that the optimisation should not affect the appearance of the page in any way (According to the HTML standard), or break any embedded Javascript or backend ASP code, etc.
Some of the functions I'd like to perform are:
Removal of all whitespace and carriage returns. The parser needs to be smart enough to not strip whitespace from inside string literals. Removal of space between HTML elements or attributes is mostly safe, but iirc browsers will render the single space between div or span tags, so these shouldn't be stripped.
Remove all comments from HTML and client side scripts
Remove redundant attribute values. e.g. <option selected="selected"> can be replaced with <option selected>
As if this wasn't enough, I'd like to take it even farther and compress the CSS styles too. Pages with large tables often contain huge amounts of code like the following: <td style="TdInnerStyleBlaBlaBla">. The page would be smaller if the style label was small. e.g. <td style="x">. To this end, it would be great to have a tool that could rename all your styles to identifiers comprised of the least number of characters possible. If there are too many styles to represent with the set of allowable single digit identifiers, then it would be necessary to move to larger identifiers, prioritising the smaller identifiers for the styles which are used the most.
In theory it should be quite easy to build a piece of software to do all this, as there are many XML parsers available to do the heavy lifting. Surely someone's already created a tool which can do all these things and is reliable enough to use on real life projects. Does anyone here have experience with doing this?
The term you're probably after is 'minify' or 'minification'.
This is very similar to an existing conversation which you may find helpfull:
https://stackoverflow.com/questions/728260/html-minification
Also, depending on the web server you use and the browser used to look at your site, it is likely that your server is already compressing data without you having to do anything:
http://en.wikipedia.org/wiki/HTTP_compression
your 3 points are actually called "Minimizing HTML/JS/CSS"
Can have a look these:
HTML online minimizer/compressor?
http://tidy.sourceforge.net/
I have done some compression HTML/JS/CSS too, in my personal distributed crawler. which use gzip, bzip2, or 7zip
gzip = fastest, ~12-25% original filesize
bzip2 = normal, ~10-20% original filesize
7zip = slow, ~7-15% original filesize
The question is pretty self explanatory. Why shouldn't I strip it? It seems to me that most of the whitespace is used purely for formatting in the text editor and has no impact on the final page.
What's more, when these random nodes of whitespace do have an impact on the final page, it is usually an impact I do not want, such as a mysterious one character (after whitespace collapse) gap between inline-blocks.
I can strip all these whitespace text nodes pretty easily. Is there any reason I shouldn't?
edit:
It's mainly for the strange behaviour where whitespace, rather than for performance. One example is me wanting to put images side by side using inline-block instead of float, while preventing wrapping to next line and allowing them to spill out of the parent.
The whitespace causes these mysterious gaps, which can be removed by basically minifying the HTML source code to remove the whitespace between inline-blocks manually (and completely messing up your source code formatting in the process).
There's no reason not to, really. It can be done very easily with something like htmlcompressor.
However, assuming you're delivering all your html, css, and js files via gzip, then the amount of real-world bandwidth savings you'll see from stripping whitespace will be very small. The question then becomes, is it worth the trouble?
UPDATE:
Perhaps this will affect your decision. I performed a simple minification on a page of my website just to see what kind of difference it would make. Here are the results:
BEFORE minification
22232 bytes (uncompressed)
5276 bytes (gzip)
AFTER minification
19207 bytes (uncompressed)
5146 bytes (gzip) - 130 bytes saved
The uncompressed file is about 3 KB smaller after minification. But that's not really what matters. The gzip compressed file is what is sent over the wire. And you can clearly see that gzip does a pretty good job even with the non-minified HTML.
I see the benefit of minifying js libraries, or things that aren't changing constantly. But I don't think it's worth the trouble doing this to your HTML for a measly 130 bytes.
Let me give one reason why you shouldn't minify html:
How html eventually gets rendered is strongly tie to the CSS applied up on it, but the minifiers usually work without expecting the influence of CSS. All minifiers you can get out there at the time of writing, they remove the spaces in html based on certain assumptions of your coding and CSS styling, if you don't code it the way they expected, the minified rendering result in browser will be different from before minification.
For example, some of minifiers assume the space between "block elements" (such as <div/>, <p/>) can be removed, this is usually true, because spaces between them has no effect on rendering the final result. But what if in the CSS you set "display: inline" or "inline-block" for elements whose default display property is block?
Will below html snippet still rendering as it should be if you remove the spaces between <div/>s ?
<div style="display: inline">will</div> <div style="display: inline">this</div> <div style="display: inline">still</div> <div style="display: inline">work?</div>
You may argue that, we can reserve at least 1 space, and remove remaining consecutive spaces and that still save a lot bytes. Then how about <pre> tag and white-space: pre?
Try copy the html code snippet from below url and paste into your minifier, see if it produces result as before the minification:
https://jsfiddle.net/normanzb/58rpazL2/
The only downside of stripping out whitespace from production pages is readability, and maintainability for the person that follows you in editing that/those page(s); but if you maintain a 'properly'/'readable' whitespaced-version for editing, and then minify that post-editing to form the production pages then it doesn't really cause significant problems.
I'm not sure how effective, or useful, the technique will be, but there's nothing to stop you trying it.
Short answer: no reason whatsoever
The only real purpose white space serves is to make the code more human-readable. You can, over time, save a lot of bandwidth by stripping all the unnecessary white space out of your documents and it should be considered good practice for production code. If your compressing your content the saving will be less, but even 1% of 1GB is 10MB... If your doing 100GB in a month on a busy web site, cutting out 1% of the data might be the difference between two pricing tiers of hosting...
As you say, some browsers (usually IE, grrrr....) will occasionally interpret the white space when they render the page, but usually when this happens it's in a way you'd rather it hadn't...