HTML Style Guide Google vs W3Schools (Omit Optional Tag) - html

I was reading some style guides and saw a conflicting recommendation
regarding optional tags.
Google says:
Omit optional tags (optional). For file size optimization and
scannability purposes, consider omitting optional tags. The HTML5
specification defines what tags can be omitted.
(This approach may require a grace period to be established as a wider
guideline as it’s significantly different from what web developers are
typically taught. For consistency and simplicity reasons it’s best
served omitting all optional tags, not just a selection.)
W3Schools says:
Close All HTML Elements In HTML5, you don't have to close all elements
(for example the <p> element).
We recommend closing all HTML elements:
And
We do not recommend omitting the <html> and <body> tags.
This means Google prefers:
<!-- Recommended -->
<!DOCTYPE html>
<title>Saving money, saving bytes</title>
<p>Qed.
W3Schools prefers:
<!DOCTYPE html>
<html>
<title>Page Title</title>
<body>
<h1>This is a heading</h1>
<p>This is a paragraph.</p>
</body>
</html>
It is also considered "bad looking" to write the following, whereas Google would recommend it:
<section>
<p>This is a paragraph.
<p>This is a paragraph.
</section>
I found it very interesting that W3Schools makes an exception for the head tag.
Is there any good reason to stop using the optional tags?
Personally, I find the code less readable without them, but that is purely opinion-based, and I guess with some training I would come to prefer one style over the other.
Google states that it is for size optimization and scannability purposes, but is that really a good reason? The articles below offer some suggestions, but they seem mostly opinion-based to me, and I am looking for good reasons to stop using the optional tags.
Here are the resources:
Google Style Guide
HTML5 Style Guide
html-include-or-exclude-optional-closing-tags
Omitting optional tags of html

Many times we use the optional closing tags because it makes the document more readable. As Google says, removing them reduces file size but, then, most of us don't have the traffic Google does. That suggestion is for those who do. Then, again, reducing download size is always a good thing.
I often leave out the body tag altogether because even the opening tag is optional in most cases. However, there is a danger that leaving that out, and leaving out closing tags, may cause issues later on. I would say putting body tags in and closing all elements removes the possibility of causing those issues. For example, you can only leave the html and body tags out under certain situations.
Reading the spec:
An html element's start tag can be omitted if the first thing inside
the html element is not a comment. An html element's end tag can be
omitted if the html element is not immediately followed by a comment.
For some, this is very important. To others it's not.
It can be more of an issue for dynamically generated sites where the content is created on the fly and the surrounding elements may not be known. Does one really know that the following element will cause a div element to be closed?

Another opinion:
If your site uses gzip compression, DO go out of your way to include the optional tags, at least in very large tables where so many cell tags are repeated.
If they are omitted, the original file is smaller, but the receiving browser then has to spend time inferring every optional tag. They will be in the parsed document. Inspect the rendered page in the browser's developer tools and you will see this is true. So omission is a balance of faster download speed and slower parsing speed.
But if gzip is enabled and the tags are included, the compression squeezes the repeated tags down to almost nothing, so the transmitted file is hardly any larger than without them (so next to no download savings). And then the browser does not have to spend any time inferring them. With hundreds of cells or more in a table, this time might add up.
So with gzip, omitting optional HTML tags seems a minus for speed, not a plus (certainly in the case of very large tables). Your web host likely provides a feature to enable gzip compression.
Single tags like /body or /tbody can't matter to speed, but /td and /tr can be very numerous in large tables.
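The size half of this argument is easy to check for yourself. Here is a minimal sketch, assuming a synthetic table of 200 rows by 10 cells with made-up cell text (the function names are hypothetical, not from any tool). It only compares byte counts and says nothing about parsing time.

```python
import gzip

def build_table(rows: int, cols: int, close_tags: bool) -> bytes:
    """Build an HTML table, optionally omitting the optional </td> and </tr> end tags."""
    td_end = "</td>" if close_tags else ""
    tr_end = "</tr>" if close_tags else ""
    lines = ["<table>"]
    for r in range(rows):
        cells = "".join(f"<td>r{r}c{c}{td_end}" for c in range(cols))
        lines.append(f"<tr>{cells}{tr_end}")
    lines.append("</table>")
    return "\n".join(lines).encode("utf-8")

def sizes(rows: int = 200, cols: int = 10) -> dict:
    """Return raw and gzip-compressed byte counts for both variants."""
    with_tags = build_table(rows, cols, close_tags=True)
    without_tags = build_table(rows, cols, close_tags=False)
    return {
        "raw_with": len(with_tags),
        "raw_without": len(without_tags),
        "gzip_with": len(gzip.compress(with_tags)),
        "gzip_without": len(gzip.compress(without_tags)),
    }

if __name__ == "__main__":
    for name, size in sizes().items():
        print(f"{name}: {size} bytes")
```

On repetitive markup like this, the gap between the two gzip sizes should come out much smaller than the gap between the two raw sizes; how much smaller depends on the real content, so measure your own pages before deciding.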

Related

What can be validly placed between </body> and </html>, if anything?

I get a lot of requests from clients, all of whom are working with ad agencies and SEO firms, who insist on having all kinds of tags and scripts placed after </body> and before </html>. I always thought this was inappropriate, but it is asked for quite often. Some people even become upset and demanding when the code I insert is not explicitly between those two closing tags.
Is this in any way an acceptable HTML practice? If so, why? And what benefit would it even provide?
No published HTML specification allows any tags between </body> and </html>, because the body element may only appear as the last child of the html element. This is defined by the content model of the html element.
What exactly is allowed there depends somewhat on HTML version, but the most permissive version in this respect, HTML5 CR, allows (by the general rules on Content models) “Inter-element whitespace, comment nodes, and processing instruction nodes”.
If you put some elements between </body> and </html>, browsers actually treat them as appearing at the end of the body element. That is, they simply ignore the </body> tag. (If you put elements after the </html> tag, browsers similarly interpret them as being in the body. So the </html> tag has no impact, really.)
Putting anything between </body> and </html> is normally a pointless risk, since browsers could behave differently, and normally nothing should prevent you from putting your elements before the </body> tag.
You don't even need the html or body tags, except in certain situations. You can put comments between </body> and </html> if you like, but why would you?
Nothing but comments should be placed there and even comments won't have any effect on SEO or page ranking as search engines don't read comments. Script tags should be placed just before the </body> as that will make the page load faster, but should not go between </body> and </html> as that is really bad practice and will probably mess the website up as the browser might render it differently.
I think your clients must be reading instructions incorrectly and you should probably ask them where they got their information from. A lot of sites which ask you to place script tags will always say before </body></html> and they might be misreading the before text.
If you are ever unsure, then please check http://www.w3.org/ and validate your code through http://validator.w3.org/.

What would happen to my SEO and Scripts if my <HEAD> tag was below my <BODY> tag?

Just thinking about it, XHTML1.1 spec and by extension, HTML5 (assumed)... well Markup is designed so that unless otherwise specified, "order" isn't supposed to matter.
Everything in the Body tag obviously is ordered a specific way for the browsers rendering engine to interpret, but the HEAD and BODY tags themselves conceptually have nothing to do with render order (despite their name, and except includes in HEAD; if an include depends on another include obviously that must be loaded in first), and thus follow the same rules as any Markup language.
Throwing the HEAD tag block below the BODY tag block works (at least in WebKit-based browsers anyway), but all I've been able to do so far is test that the Title tag works as it should. Not a totally conclusive test, but as I write this on my phone, I didn't have time to go any further with my thought process.
I'm wondering how doing this would affect SEO, and worse yet: loading of Script and CSS files typically handled in the HEAD. I understand a practice lately has external loading of script files happening at or near the bottom of the markup to ahem delay their loading for when the page is ready, would this react any differently?
Basically I'm asking: what are the repercussions of having a website where the HEAD block is located below the BODY block?
<html>
<body>
Test
</body>
<head>
<title>Test</title>
<script src="test.js" type="text/javascript"></script>
<link rel="stylesheet" type="text/css" href="test.css" />
</head>
</html>
It would only have a negative effect on SEO, if any at all.
First off, your proposal results in incorrect HTML. The HTML 4.01 DTD, which strictly defines the structure of HTML documents, mandates that <head> comes before <body>:
<!ENTITY % html.content "HEAD, BODY">
(If the order didn't matter, then it would be <!ENTITY % html.content "(HEAD|BODY)+">.)
Secondly, I'd wager most spiders look for a <head> element as early as possible; if one can't find it before the <body> element, it will probably discount your document at best, if not ignore it completely. I suspect most spiders would ignore any <head> element encountered after <body>.
Third, it ruins the user experience. Sometimes pages can take a while to load, but a browser parses the HTML as it downloads. As soon as it sees <title> it displays it to the user so the user knows the page has at least partially loaded (even if it hasn't been rendered yet). Without this ability your users might close the browser tab/window out of frustration if it loads too slowly, as they'd think the site was completely unresponsive.
Interesting question. I strongly believe that HTML structure is very similar to human anatomy (head, body, foot): what happens, and how does it look, if it is not in the proper order? It looks ugly and the person is hard to recognize. Browsers likewise act on the universal structure (head, body, footer), so this is a predefined structure that we need to follow for the best results.
Regarding SEO, it depends partly on how well we follow this structure, so of course it will affect Google's spider and more.

Is leaving out end tags valid?

I remember reading a while ago that in some cases leaving out end tags (</li>, for example) speeds up the rendering (and loading/parsing, since there are fewer bytes) of a webpage.
Unfortunately, I forgot where I read this, but I remember it saying this feature was specific to HTML 4.0.
Since I no longer have access to this source I was wondering if someone can confirm this or link to the documentation on W3C (since I wasn't able to find it myself)?
Thanks!
EDIT: Forgot to mention that I meant to ask if this behaviour is also available in HTML5.
EDIT 2: I managed to find the article again, and it mentions that it only speeds up the download of the page, not the actual rendering:
One good reason for leaving out the end tags for these elements is because they add extra characters to the page download and thus slow down the pages. If you are looking for things to do to speed up your web page downloads, getting rid of optional closing tags is a good place to start. For documents that have lots of paragraphs or table cells this can be a significant savings.
Sorry for asking a pointless question! :(
Here is the list of HTML 4.01 elements.
http://www.w3.org/TR/html401/index/elements.html
The End Tag column says where end tags are optional.
However, take note that this is valid only in HTML 4.01. In XHTML, all end tags are required. Not 100% sure about HTML5.
I wrote an HTML parser once, and believe me, if you're a parser and you're inside a <p> and you encounter a </table> end tag, it's slower to check in your document tree if that is correct, and if so, to close the current <p> first, than if you simply encounter a </p>.
Edit:
Ah, found it: http://dev.w3.org/html5/html-author/#index-of-elements
Same requirements as HTML 4.01.
New edit:
Oh, that was a page from 2009. This one is more up to date:
http://dev.w3.org/html5/spec/syntax.html#optional-tags
Some tags in some version of the HTML spec have optional end tags. However, I believe it is generally considered bad form to exclude the end tag.
As mentioned, the end tag of li is optional in html4:
http://www.w3.org/TR/html401/struct/lists.html#h-10.2
so technically this is valid:
<ul>
<li>
text
<li>
<span>stuff</span>
</ul>
But you are only saving 5 characters per li, not really worth what you lose in readability/maintainability.
EDIT: The HTML5 spec is sort of interesting:
An li element's end tag may be omitted if the li element is
immediately followed by another li element or if there is no more
content in the parent element.
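To make the quoted rule concrete, here is a toy sketch in Python using the standard library's html.parser. It is not how any browser is implemented: the class name and the dict-based tree are hypothetical, and it applies only the li rule (it does not handle void elements such as <br>, or any other optional tag).

```python
from html.parser import HTMLParser

class ImpliedLiParser(HTMLParser):
    """Toy tree builder that applies only the li end-tag rule."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#root", "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # A new <li> implicitly closes an <li> that is still open.
        if tag == "li" and self.stack[-1]["tag"] == "li":
            self.stack.pop()
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Closing an element also closes anything left open inside it,
        # which covers "no more content in the parent element".
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1]["children"].append(text)

def parse(html: str) -> dict:
    """Parse a fragment and return the root of the toy tree."""
    parser = ImpliedLiParser()
    parser.feed(html)
    parser.close()
    return parser.root
```

Feeding it a list with the </li> tags left out yields a ul node with two li children, the same shape you would get with the end tags written out.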
Leaving out end tags is usually forgiven by browsers (they're generally smart enough to know what you're doing). However, any CSS or JS behavior attached to the unclosed element can affect descendant and/or sibling elements, leaving you scratching your head as to what happened.
While XHTML does expect you to add a closing forward slash to self-contained tags, HTML 5 does not.
XHTML: <img src="" />
HTML5: <img src="">
If you're writing using an xhtml DOCTYPE, then the answer is 'yes', they are required. An xhtml document needs to be valid XML, which means that all tags need to be properly closed.
An HTML document is a bit less fussy. Some tags are specified as being 'self closing', which means you don't need to close them specifically. These include <br>, <img>, etc.
The browsers are generally pretty lenient, because they need to be able to cope with badly written code. But beware that sometimes skipping closing tags can result in different browsers interpreting your code differently, and producing hard-to-debug layout glitches.
In terms of page load speed, you might be right that there would be a marginal gain to be had in download speed and bandwidth costs, but it would be marginal. In terms of rendering, I suspect you'd actually lose speed if you provided invalid HTML, as the browser would have to work harder to parse it.
So even if there is a speed gain to be had it will be marginal, and I don't think skipping closing tags deliberately is a worthwhile exercise. It might possibly be helpful to reduce bandwidth if you're running a site that has massive traffic, but very few of us are writing for Facebook or Google; for virtually everyone else, it's better to write valid code than to try to shave those few bytes.
If you're that worried about bandwidth and page loading speeds, there are likely to be other better ways to reduce your page load sizes than this. For example, compressing your files with gzip will drastically reduce your bandwidth, with zero impact on your code or the browser. gzip compression can be configured in your web server, so you just switch it on and forget about it. You can also 'minify' your CSS and JS code by stripping out unnecessary white space. (HTML can also be minified to a certain extent, but beware that white space is syntactically relevant in HTML, so minifying may not be the right thing to do in all cases).
AFAIK, in XHTML you must always at least self-close a tag <img ... />
In HTML (non xml-html) some tags do not need to be closed. <img> for instance. However, I'd suggest making sure you know exactly which version you're targeting and use W3C's validation service to double-check.
http://validator.w3.org/
I don't see how this would speed things up except that you'd have to send fewer bytes of data per page (no /'s for some tags, no closing tags for others). As for building the DOM, I don't know the details of a given implementation (WebKit, Mozilla, etc.) to know which way is faster to parse. I would imagine XML is, simply because it is more regular.
EDIT: Yes this behavior is available in HTML5. Note that the help pages are confusing, such as:
http://www.w3schools.com/html5/tag_meta.asp
meta elements in non-XML HTML do not require the /, but they can have it. Because of the (in my opinion) leaning towards XML-flavored HTML, the trailing slash is more prevalent in written HTML, but you can see they use both styles in the document. The Validator will let you know for sure what you can get away with. :)
In HTML 4.01, which became a W3C Recommendation way back in 1999, you're right:
9.3.1 Paragraphs: the P element
Start tag: required, End tag: optional
http://www.w3.org/TR/1999/REC-html401-19991224/struct/text.html#h-9.3.1
And as for <li>,
Start tag: required, End tag: optional
http://www.w3.org/TR/1999/REC-html401-19991224/struct/lists.html#h-10.2

How does the Traditional "HTML is only for content" line of thought handle dynamic formatting?

For so long, I've read and understood the following truths concerning web development:
HTML is for content
CSS is for presentation
JavaScript is for behavior.
This is normally all fine and good, and I find that when I strictly follow these guidelines and use external .css and .js files, it makes my entire site much much more manageable. However, I think I found a situation that breaks this line of thought.
I have a custom forums system that I've built for one of my sites. In addition to the usual formatting for such a system (links, images, bold, italics and underline, etc.) I've allowed my users to set the formatting of their text, including color, font family, and size. All of this is saved in my database of forum messages as formatting code, and then translated to the corresponding HTML when the page is viewed. (A bit inefficient, technically I should translate before saving, but this way I can work on the system live.)
Due to the nature of this and other similar systems, I end up with a lot of formatting tags floating around the resulting HTML code, which I believe are unofficially deprecated since I'm supposed to be using CSS for formatting. This breaks rules one and two, which state that HTML should not contain formatting information, preferring that information to be located in the CSS document instead.
Is there a way to achieve dynamic formatting in CSS without including that information in the markup? Is it worth the trouble? Or, considering the implied limitations of proper code, am I to limit what my users can do in order to follow the "correct" way to format my code?
It's okay to use the style attribute for elements:
This is <span style="color: red;">red text</span>.
If users are limited to only some options, you can use classes:
This is <span class="red">red text</span>.
Be sure to use semantic HTML elements:
This is <strong>strong and <em class="blue">emphasized</em></strong>
text with a link.
Common semantic elements and their user-space terms:
<p> (paragraphs)
<strong> (bold)
<em> (italic)
<blockquote> (quotes)
<ul> and <ol> with <li> (lists)
More...?
Likely less common in forum posts, but still usable semantic elements:
<h1>, <h2>, etc. (headings; be sure to start at a value so your page makes sense)
<del>, and, to a lesser extent, <ins> (strikeout)
<sup> and <sub> (superscript and subscript, respectively)
<dl> with <dt> and <dd> (list of pairs)
<address> (contact information)
More...
This is a bit tricky. I would think about what you really want to allow visitors to do. Arbitrary colours and fonts? That seems rather useless. Emphasis, headings, links, and images? Well that you can handle easily enough by restricting to those tags / using a wikitext/forumtext markup that only provides these features.
You could dynamically build an inline style sheet in the head of the HTML page fed to the users. Put it in the head of the page and let it target those elements that are configurable by the user.
Alternatively, there's the notion of using external stylesheets that feature the most common adjustments, but there'd be hundreds of them to account for every possible alternative. If you use this you'd need an external style sheet for a specific font size, colour and so on, and dynamically link to those in the header. As with any external stylesheet. Though this is almost unbearably complex to enable.
Option one would work okay though.
As an example:
<STYLE>
h1,h2,h3,h4 {font-family: Helvetica, Calibri;}
p {font-size: 1.2em; /* Populate all this with values from the Db. */
font-weight: bold;
}
a {text-decoration: underline;
color: #f00;
}
</STYLE>
Also, it just occurred to me that you could probably create a per-user stylesheet to apply the configurable aspects. Use
<link href="/css/defaultstylesheet.css" type="text/css" rel="stylesheet" media="all" />
<link href="/css/user1245configured.css" type="text/css" rel="stylesheet" media="all" />
<!-- clearly the second is a stylesheet created for 'user 1245'. -->
The bonus of this approach is that it allows caching of the stylesheet by the browser. Though it might likely clutter up the css folder, unless you have specific user-paths to the user sheet? Wow, this could get complex... :)
This is an interesting situation because you can have an infinite number of different styles, depending on your users' tastes and their own personal styles.
There are a couple of things you can be doing to manage this situation efficiently. Probably the easiest would be to just use style overrides:
<p style="color: blue; font-size: 10pt;">Lorem Ipsum</p>
This is quick and easy. And remember, this is what style overrides are there for. But as you've said, this does not fit well with this content-presentation separation paradigm. To separate them a little more, you could build some CSS information on page load and then insert it into the <head> tag of your HTML. This still keeps the HTML and the CSS somewhat distinct, even though you're not technically separating them.
Your other option would be to build the CSS and then output that to a file. This, however, would not be efficient (in my opinion). If you were to, on every page load, build a new CSS file that accounts for your users' unique preferences, this would sort of defeat the purpose. It's the same thing as the second option, using the <head> tag, you're just making it look separated. Even if you used techniques such as caching to try to limit how often you have to build a CSS file, will the ends really justify the means?
This is a completely subjective topic and you should, in the end, choose what you're most comfortable with.
I don't know which framework or even language you are using but e.g. Django uses a certain template language to sort of represent the HTML being output. I think a nice solution would be to simply use a different "template" depending on what the user has chosen. This way you wouldn't have to care about breaking the "rules" or having a bunch of basically unused tags floating around in the DOM.
Unless I completely misunderstood...!
The easiest way to manage this is probably to emit dynamic CSS when the pages are generated, based on the user's settings. Then everything is doing the job it is supposed to be doing and the server is doing the work of converting the user's settings into the appropriate CSS.
With the CSS doing this work, you can use appropriate attributes in the HTML (id and name and class and so on) and emit CSS that will cleanly format everything the way you want.
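As a rough illustration of emitting CSS from stored settings, here is a minimal Python sketch. Everything in it is an assumption made for the example: the setting names, the validation patterns, and the .post-user-N class convention are hypothetical, not part of the asker's system.

```python
import re

# Hypothetical whitelist: setting name -> (CSS property, validation pattern).
ALLOWED = {
    "color": ("color", re.compile(r"#[0-9a-fA-F]{3}(?:[0-9a-fA-F]{3})?")),
    "font_family": ("font-family", re.compile(r"[A-Za-z][A-Za-z ]*")),
    "font_size": ("font-size", re.compile(r"\d{1,2}(?:px|pt|em)")),
}

def user_css(user_id: int, settings: dict) -> str:
    """Return a CSS rule for one user's posts, skipping anything not whitelisted."""
    declarations = []
    for name, value in settings.items():
        if name not in ALLOWED:
            continue  # unknown setting: never copied into the stylesheet
        prop, pattern = ALLOWED[name]
        if pattern.fullmatch(str(value)):
            declarations.append(f"  {prop}: {value};")
    body = "\n".join(declarations)
    return f".post-user-{user_id} {{\n{body}\n}}"
```

The markup then only needs class="post-user-1245" on that user's posts, and values that fail validation never reach the stylesheet.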
Consider the benefits versus the costs before you do anything. What is actually wrong with your code right now? Tag soup and combined content/presentation is to be avoided not because it makes a bad website, but because it is hard to maintain. If your HTML/CSS is being generated, who cares what the output is? If what you've got now works, then stick to it.
I assume you are allowing only a limited white list of safe options, and therefore parsing the user's HTML already.
When rendering the HTML you could convert each style declaration to a class:
<span style="font-family: SansSerif; font-size: 18px;">Hello</span>
To:
<span class="SansSerif"><span class="size_18px">Hello</span></span>
Laborious to generate (and maintain) the list. However you needn't worry about a class for each combination, which is of course your main problem.
It also has the benefit of extra security as user's CSS is less likely to slip through your filter as it's all replaced, and this should also ensure all the CSS is valid.
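The conversion could look roughly like the following Python sketch. The whitelist contents and function names are hypothetical; a real system would generate the table from whatever options it offers users.

```python
from html import escape

# Hypothetical whitelist mapping a (property, value) pair to a class name.
CLASS_FOR = {
    ("font-family", "SansSerif"): "SansSerif",
    ("font-size", "18px"): "size_18px",
    ("color", "red"): "red",
}

def style_to_classes(style: str) -> list:
    """Translate an inline style string into whitelisted class names."""
    classes = []
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue
        prop, value = (part.strip() for part in declaration.split(":", 1))
        name = CLASS_FOR.get((prop.lower(), value))
        if name is not None:  # anything not in the whitelist is dropped
            classes.append(name)
    return classes

def wrap(text: str, style: str) -> str:
    """Wrap escaped text in one nested span per class, outermost first."""
    html = escape(text)
    for name in reversed(style_to_classes(style)):
        html = f'<span class="{name}">{html}</span>'
    return html
```

Because only known pairs map to a class, a declaration like position: fixed simply disappears instead of reaching the page.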
I've allowed my users to set the
formatting of their text, including
color, font family, and size. All of
this is saved in my database of forum
messages as formatting code, and then
translated to the corresponding HTML
when the page is viewed.
So, you've done formatting through HTML, and you know that formatting is supposed to be done through CSS, and you realise this is a problem, and you got as far as asking a 300-word SO question about it ... ?
You don't see the solution, even though you can formulate the question ... ?
Here, I'll give you a hint:
All of this is saved in my database of
forum messages as formatting code, and
then translated to the corresponding
CSS (not HTML) when the page is viewed.
Does that help?
Is this question a joke?

Order of tags in <head></head>

does it matter at all what order the <link> or <script> or <meta> tags are in in the <head></head>?
(daft question but one of those things i've never given any thought to until now.)
Optimization
According to the folks over at Yahoo! you should put CSS at the top and scripts at the bottom because scripts block parallel downloads. But that is mostly a matter of optimization and is not critical for the page actually working. Joeri Sebrechts pointed out that Cuzillion is a great way to test this and see the speed improvement for yourself.
Multiple Stylesheets
If you are linking multiple stylesheets, the order they are linked may affect how your pages are styled depending on the specificity of your selectors. In other words, if you have two stylesheets that define the same selector in two different ways, the latter will take precedence. For example:
Stylesheet 1:
h1 { color: #f00; }
Stylesheet 2:
h1 { color: #00f; }
In this example, h1 elements will have the color #00f because it was defined last with the same specificity.
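The tie-breaking rule can be written down as a tiny Python sketch. This is a deliberate simplification: it models only specificity and source order, and ignores !important, origin, and inheritance. The function name and the tuple representation of specificity are assumptions made for the example.

```python
def winning_value(declarations):
    """Pick the value that applies among competing declarations.

    declarations: list of (specificity, value) in source order, where
    specificity is an (ids, classes, elements) tuple.
    """
    best_key, best_value = None, None
    for order, (specificity, value) in enumerate(declarations):
        key = (specificity, order)  # higher specificity wins; ties go to the later rule
        if best_key is None or key > best_key:
            best_key, best_value = key, value
    return best_value
```

With two h1 rules of equal specificity, (0, 0, 1), the second stylesheet's value wins; a more specific selector in the first stylesheet would still beat it.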
Multiple Scripts
If you are using multiple scripts, the order they are used may be important if one of the scripts depends on something in another script. In this case, if the scripts are linked in the wrong order, some of your scripts may throw errors or not work as expected. This, however, is highly dependent on what scripts you are using.
The accepted answer is kind of wrong, depending on the encoding of the document. If no encoding is sent in the HTTP header, the browser has to determine the encoding from the document itself.
If the document uses a <meta http-equiv="Content-Type" … declaration to declare its encoding, then every ASCII-valued byte (value < 128) occurring before this declaration must stand for an ASCII character, as per the HTML 4 spec. Therefore, it's important that this meta declaration occurs before any other element that may contain non-ASCII characters.
It's recommended to put the meta tag with the character encoding as high as possible. If the encoding is not included in (or differs from) the response header of the requested page, the browser will have to guess what the encoding is. Only when it finds this meta tag does it know what it is dealing with, and it then has to re-read everything it has already parsed.
See for instance Methods for indicating the character set.
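A quick way to check how early the declaration appears is sketched below in Python. The current HTML standard requires the element carrying the charset declaration to sit entirely within the first 1024 bytes of the document, which is the limit used here. The regex is a rough approximation for illustration, not the browser's actual prescan algorithm, and the function name is hypothetical.

```python
import re

# Rough pattern covering <meta charset="..."> and the http-equiv form.
META_CHARSET = re.compile(
    rb"""<meta[^>]*charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)[^>]*>""",
    re.IGNORECASE,
)

def charset_declaration(document: bytes, limit: int = 1024):
    """Return (encoding, end_offset, within_limit), or None if no meta charset is found."""
    match = META_CHARSET.search(document)
    if match is None:
        return None
    encoding = match.group(1).decode("ascii").lower()
    return encoding, match.end(), match.end() <= limit
```

If within_limit comes back False, move the meta element up so that it is the first thing inside the head.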
One important thing to note: if you're using the Internet Explorer meta X-UA-Compatible tag to switch rendering modes for IE, it needs to be the first thing in the HEAD:
<head>
<meta http-equiv="X-UA-Compatible" content="IE=7" />
<title>Page title</title>
...etc
</head>
meta does not matter, but link (for CSS) and script do.
A script will block most browsers from rendering the page until it has loaded.
Therefore, if possible, put scripts in the body rather than in the head.
A CSS link will not block parsing of the page the way a script does.
It is usually recommended to place the <script> tag as far down the page as possible (not in the head but in the body).
Other than that, I don't think it makes much of a difference because the body cannot be parsed unless you have the <head> section completely loaded. And, you want your <link> tag to be in the head as you want your styling to occur as the browser renders your page and not after that!
If you declare the charset in a meta element, you should do it before any other element.
Not a daft question at all.
CSS above Script tags for reasons already mentioned.
CSS is applied in the order you place the tags - the more specific the stylesheet, the lower down the order it should be.
Same goes for scripts - scripts that use functions declared in other files should be loaded after the dependency is loaded.
Put the meta tag that declares the charset as the first element in head. The browser only searches so far for the tag. If you have too much stuff before the meta element, the charset might not get applied.
If you use the BASE element, put it before any elements that load URIs (if desired).
It would only matter if one of the linked files (CSS/Javascript) depended on another. In that case, all dependencies must be loaded first.
Say, for example, you are loading a jQuery plugin, you'd then need to first load jQuery itself. Same when you have a CSS file with some rules extending other rules.
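If the dependencies are known, the load order can be derived rather than maintained by hand. Here is a minimal Python sketch using graphlib from the standard library (Python 3.9+); the file names and helper names are hypothetical.

```python
from graphlib import TopologicalSorter

def load_order(dependencies: dict) -> list:
    """dependencies maps each file to the set of files that must load before it."""
    return list(TopologicalSorter(dependencies).static_order())

def script_tags(dependencies: dict) -> str:
    """Emit one script tag per file, dependencies first."""
    return "\n".join(
        f'<script src="{src}"></script>' for src in load_order(dependencies)
    )

if __name__ == "__main__":
    deps = {
        "app.js": {"jquery.plugin.js", "jquery.js"},
        "jquery.plugin.js": {"jquery.js"},
        "jquery.js": set(),
    }
    print(script_tags(deps))
```

A circular dependency makes static_order raise graphlib.CycleError, which is a useful early warning in itself.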
As already pointed out, the meta element describing the content charset should come first; otherwise it could actually be a security hole in certain situations. (Sorry, I don't remember the situation well enough to describe it here, but it was demonstrated to me at a web security training course.)
I recently was having a problem with a draggable jQuery UI element. It was behaving properly in Firefox, but not Safari. After a ton of trial and error, the fix was to move my CSS links above the JavaScript links in the head. Very odd, but will now become my standard practice.
For the purposes of validation as XHTML, yes. Otherwise you're probably going to care about the optimization answers.
Nope, it doesn't matter, except for CSS linking or inclusion, because of the CSS cascade: a later rule overrides what was already styled. (Sorry for my English, I think my sentence is not really clear :-/)