Value of whitespace in html source code - html

I'm using a function to generate all output in php. Using that function I can control whether to display the code like this:
<html><header></header><body><p>Hello World!</p></body></html>
or like this
<html>
<header>
</header>
<body>
<p>Hello World!</p>
</body>
</html>
including the indentation and all.
Is there a particular value to displaying the code indented and spaced (besides seemingly slower loading time)? I usually don't need to view the source code, since I can simply access the PHP file. During development I would most likely prefer whitespace, but when on production would it necessarily be advantageous?
Thanks!

I'd space it out if you have the option, there's nothing wrong with white-space to make something readable, and with GZip it makes the download difference not all that major anyway. You never know when you'll have to debug a style, it'll save you time later by having it pretty now, trust me.

All whitespace is condensed to a single space, rather than nothing, so there is a slight difference. For example:
<img src="image.jpg"><img src="image2.jpg">
Will produce slightly different results to this:
<img src="image.jpg">
<img src="image2.jpg">
So at a minimum, use a single space/newline between tags. Personally I prefer using spacing on live sites because it aids live debugging, and when using gzip the difference between space and no-space is tiny anyway.
And of course, it would also help budding new developers who might like to see "how it was done".

I prefer to omit whitespaces, especially in production.
You can still view the code through Firebug. there is no reason to do "view source".
Note that spaces can cause some problems, because they are considered as a space.

If you strip out whitespace you'll rue your parsimonious nature one day when you have to View Source in Internet Explorer on some remote client machine and have to wade through a swamp of HTML tags.

Whitespace will unnecessarily accumulate network bandwidth. No, GZIP won't fix it up to with 100%. I myself trim all the whitespace from the response and then pass it through GZIP. The only ones who cares about whitespace in HTML source are webdevelopers who are curious how the page source look like. They are really not worth the waste of network bandwidth --unless you're practically the only visitor ;)

Is Javascript or PHP generating your Html? Either way - utilize an escape character.
\n = new line
\t = tab
\r = carriage return
<?php echo "This is a test. <br> \n"; ?>

Related

How do I disable Markdown code block generation?

I have an unusual situation. I am in a transitional state for a website that will eventually be a wiki-like site that uses markdown files to generate documentation. However, for our phase 0 demonstration to upper management, I need to use HTML instead of markdown for advanced layouts. This leads to large portions of the Markdown files being HTML. Generally speaking, this is working fine, but sometimes the "4 spaces means code block" "feature" of markdown means that instead of rendering the page, I just get the HTML pasted to the screen in a <pre>.
So, my question is, how can I turn off the "4 spaces means code block" thing? IMO, this is an idiotic design in the first place, but it's really screwing with my current project!
For example:
I have a banner
<div class="banner detail">
<div class="banner-inner">
...
</div>
</div>
On some pages, this renders exactly as expected. On others, it spits out the "banner-inner" div and everything inside it to the page. Hell, even convincing this editor to display that code snippet instead of processing it took 5 minutes of trial and error poking...
Please, some one help me turn off or get around (without simply not using indenting...) this "feature"!!
Sadly, whether this "feature" can be turned off is relegated to a customization question for the specific software package in use.
On the bright side, I was able to eventually determine that the specific problem I was having was caused by the engine interpreting invalid HTML code (ie, missing a closing tag) as code to be displayed rather than processed. So in the end, seeing this happen actually tended to mean that I had a bug to fix.

Are white spaces and blank lines in http response affect the speed of downloading the page?

I found by wget some_url that it has so many white spaces and blank lines like
<span class="meta">
someValue
</span>
And the whole document downloaded by wget is in good layout as we can see in Chrome dev tools,Does the document has so many white spaces and blank lines (or tabs) and they're downloaded as well as the main content.
e.g
if the document(also, download by wget or curl) is:
<div class=" someclass">
somevalue
</div>
there're 5 spaces(3 before someclass , 2 before </div>) and 2 blank lines wrapping somevalue
Was it downloaded in tighten form like:
<div class="someclass">somevalue</div>
if not,I'm shock by the fact that some many bandwidth is wasted by these mostly useless information,Are the just the wasted(except they're for layout purpose)?
It is my understanding that whitespace takes up just as much as a character- So technically yes, it's "a waste". However, generally speaking, it is something that you would not ever notice as there are many other things that hinder load time. if You were loading an incredibly large page with a high percentage of whitespace on an incredibly slow network, you might be able to notice.
Generally it is better to think not about how it affects performance (because it doesn't) and think more about whether it makes your code readable. When writing something you need to revisit or show to others, whitespace is very important. When obscuring code so people won't mess with it, getting rid of whitespace can go a long way.
You can set a compression algorithm for the webserver to use with the Content-Encoding header. For example, gzip: http://betterexplained.com/articles/how-to-optimize-your-site-with-gzip-compression/
However, the webserver doesn't have to do it. It's like you're strongly hinting for the webserver to compress your traffic.

How do I remove excess whitespace in an HTML file? (And only excess whitespace)

I have a horrible, ugly HTML file that was spat out by a form generator and slightly modified to look nice. This HTML file needs to be translated, so I hooked up some scripts using po4a and csv2po, and that all works fairly well except for one thing: some of the base strings in our translation templates are surrounded by whitespace, and the translators get rather confused.
The other thing is I have this working with a Makefile (because that generated form is updated quite frequently and I'm a nerd). I'd like to keep it that way because it's nice for my workflow. So, I need a command line tool.
I'm really looking for the simplest solution in this case, so I ran the HTML file through HTML Tidy, and that removes the weird whitespace quite competently. However, it does a lot of stuff I don't need. It messes with the doctype (and it doesn't support an html5 doctype), and I've ended up with a really crazy command line just to get it to not mangle things. It is not very pleasant.
All I really want is a command line tool (not an online one) whose single goal in life is to look at my HTML file and format it nicely. Ideally not a "compressor" thing, but if that's the only option, suggestions would be nice :)
Stick it in an ide or text editor like notepad++ or net beans and hit the "format code" button which is available in nearly every ide?
I'm not sure if it is still being developed, but would HTML Tidy do the trick?

When "viewing source", some sites have neat markup, some sites don't. Why? (pic attached)

Notice how in the 'ugly' side, the doctype is all the way indented and some of the meta lines extend past the left indent.
How can I get my markup looking neat when viewing source in a browser? Is there a certain way to encode the code while using an editor? I use Notepad++ by the way.
Large blocks of unindented code like you see in the left hand side are probably being written out server side, and so although the tag that creates them is nicely indented in your HTML the erver script output will not honour that.
It's not about encoding, it's about writing neat source code, haha. If you are outputting from php or something you can use keep track of how far to indent each thing or you an use some sort of template output function that keeps track of how many tags are open for you and indents the correct amount each time. But, there is no point on having neat HTML, the only important thing is that it's valid. Developer Tools will make it neat for you when you're trying to debug, and actually removing all that whitespace used to make it neat can reduce your page size quite a bit.
The ugly ones probably look pretty in the underlying php or other source. Once generated into HTML it looks ugly, and very few programmers will try to make that pretty too - it's not worth it.
It's funny that what you list as "ugly" seems properly indented to me... at least from what I can tell from the screenshot.
In any case, it doesn't matter. Most of the time these days, sites are made with something dynamic, and a lot of the HTML formatting isn't explicitly output.
If you were to view the source on many of my sites, it is all rammed together on one line, as that is how I echo it out. I don't see the point in wasting bytes on line feeds. Especially these days with all of the browser tools available that reformat the source while debugging.
I use Eclipse to do my coding and I can use Source->Format to clean up my code and format it nicely.
For Notepad++, I believe you can use HTML tidy as per: Formatting code in Notepad++
TextFX -> HTML Tidy -> Tidy: Reindent XML
You really want your HTML code to look like this:
view-source:http://lightningsoul.com/
As it uses the minimum amount of data to present itself to the browser. Remember that indents and white-spaces consume data as well as any other character.

What should I consider before minifying HTML?

I've googled around but can't find any HTML minification scripts.
It occurred to me that maybe there is nothing more to HTML minification than removing all unneeded whitespace.
Am I missing something or has my Google Fu been lost?
You have to be careful when removing stuff from HTML as it's a fragile language. Depending on how your pages are coded some of that whitespace might be more significant; also if you have CSS styles such as white-space: pre then you may need to keep the whitespace. Plus there are numerous browser bugs, etc, and basically every character in an HTML file might be there to satisfy some requirement or appease some browser.
In my opinion your best bet is to design the pages well with CSS techniques (I was recently able to take an important page on the site I work for and reduce it's size by 50% just by recoding it using CSS instead of tables and nested style="..." attributes). Then, use GZip to reduce the size of your pages for browsers that understand gzip. This will save bandwidth while preserving the structure of the html.
Sometimes, depending on the enclosing tags and/or on the CSS, whitespace may be significant.
Outside of HTML Tidy/removing white space as the other answers mentioned, there isn't much.
This is more of a manual task pulling out style attributes into CSS (hopefully you're not using FONT tags, etc.), using fewer tags and attributes where possible (like not embedding <strong> tags in an element but using CSS to make the whole element font-weight: bold, unless of course it makes semantic sense to use >strong<), etc.
Yes I guess it's pretty much removing whitespace and comments. You cannot replace identifiers with shorter ones like in javascript, since chances are that CSS classes or javascript will depend on those identifiers.
Also, you should be careful when removing whitespace and make sure that there is always at least whitespace character left, otherwise allyourtextwilllooklikethis.
There's a pretty lengthy discussion on this Wordpress blog about this topic. You can find a very lengthy proposed solution using PHP and HTML Tidy there.
You can find some good references here to things like HTML tidy and others.
If you don't want to use one of those options, Prototype has a means to clean the whitespace in the DOM. You could do that on your own and copy it via 'View Generated Source' in the Firefox extension Web Developer Toolbar. Then you can replace the original html with prototype's fix. Sorry for not making that apparent nickf.
(I recommend the first link)
I haven’t tried it yet, but htmlcompressor is an HTML minifier, if you fancy giving one a try.
If you have installed node.js and you are a windows user you can create this .bat
It will minify all html in your folder in the min subfolder.
The output will be in min folder
open the console. run--> npm install html-minifier -g
create the .bat. don't forget to change the route in cd command. It's easier to change the folder in the bat file than copy and paste.
go in console into the .bat folder and run it.
cd the_destination_folder
dir /b *.HTML > list1.txt
for /f "tokens=*" %%A in (list1.txt) do html-minifier --collapse-whitespace --remove-comments --remove-optional-tags %%~nxA -o min\%%~nxA
pause
Couldn't JavaScript be used as a decompresser for a compressed HTML string, for instance have a DEV build for the uncompressed format, run a 'publish' script to compress the DEV build to production and attach a JavaScript to the HTML source (with the whitespace and such removed as before)?
The bandwidth would be reduced on the server, but the downside is there is a lot more client strain for decompressing the string to HTML. Also JavaScript would need to be enabled and be able to parse the decompressed string to HTML.
I am not saying its a definite solution, but something that might work - it all depends on if your looking in regards to bandwidth without the users JavaScript permissions/systems spec, or such.
Otherwise look for obfuscation scripts, a simple google search produced http://tinyurl.com/phpob - dependent on what your looking for there should be a software package available.
If I am on the wrong lines, please shout and I will see what else I can do.
Good Luck!
I recently found a PHP based script that minify your sites HTML - Inline css - Inline javascript on the fly it is called as
Dynamic website compressor
I've used this regexp for years, without any problems: s/>\s*</></g
In Python re.sub(r'>\s*<', '><', html)
Or in PHP preg_replace('/>\s*</', '><', $html);
This removed all whitespace between tags, but not anywhere, this is fairly safe (but not perfect, there are situations where this will break, but they're rare).
My main reason for doing this isn't speed/file size, but because the whitespace often introduces a, well, space. This would be okay, but when you start mucking about in your DOM with Javascript, spaces are often lost, creating (minor) layout differences.
Consider:
<div>
<a>link1</a>
<a>link2</a>
</div>
There's a space between the links, but now I do something like:
$('div').append('<a>link3</a>')
And there's no space ... I need to manually add the space in my JS, which is fairly ugly & error-prone IMHO.
Here is a minifier for HTML5 written in PHP.
<?PHP
$in=file_get_contents('path/to/source.html');
//Strips spaces if there are more than one.
$in=preg_replace('/\s{2,}/m',' ',$in);
//trim
$in=preg_replace('/^\s+|\s+$/m','',$in);
/*Strips spaces between tags.
Use ( or ­ or better) padding or margin if necessary, otherwise the html
parser appends a one space textnode.*/
$in=preg_replace('/ ?> < ?/','><',$in);
//Removes tag end slash.
$in=preg_replace('# ?/>#','>',$in);
//Removes HTML comments except conditional IE comments.
$in=preg_replace('/<!--[^\[]*?-->/','',$in);
//Removes quotes where possible.
$in=preg_replace('/="([^ \'"\=><]+)"/','=$1',$in);
$in=preg_replace("/='([^ '\"\=><]+)'/",'=$1',$in);
file_put_contents('path/to/min.html',$in);
?>
After that you have a one line, shorter html code.
Better you make an array from the regular expressions, but aware to escape the back slashes.