HTML: Include, or exclude, optional closing tags? - html

Some HTML1 closing tags are optional, i.e.:
</HTML>
</HEAD>
</BODY>
</P>
</DT>
</DD>
</LI>
</OPTION>
</THEAD>
</TH>
</TBODY>
</TR>
</TD>
</TFOOT>
</COLGROUP>
Note: Not to be confused with closing tags that are forbidden to be included, i.e.:
</IMG>
</INPUT>
</BR>
</HR>
</FRAME>
</AREA>
</BASE>
</BASEFONT>
</COL>
</ISINDEX>
</LINK>
</META>
</PARAM>
Note: xhtml is different from HTML. xhtml is a form of xml, which requires every element have a closing tag. A closing tag can be forbidden in html, yet mandatory in xhtml.
Are the optional closing tags
ideally included, but we'll accept them if you forgot them, or
ideally not included, but we'll accept them if you put them in
In other words, should I include them, or should I not include them?
The HTML 4.01 spec talks about closing element tags being optional, but doesn't say if it's preferable to include them, or preferable to not include them.
On the other hand, a random article on DevGuru says:
The ending tag is optional. However, it is recommended that it be included.
The reason I ask is because you just know it's optional for compatibility reasons; and they would have made them (mandatory | forbidden) if they could have.
Put it another way: What did HTML 1, 2, 3 do with regards to these, now optional, closing tags. What does HTML 5 do? And what should I do?
Note
Some elements in HTML are forbidden from having closing tags. You may disagree with that, but that is the specification, and it's not up for debate. I'm asking about optional closing tags, and what the intention was.
Footnotes
1HTML 4.01

There are cases where explicit tags help, but sometimes it's needless pedantry.
Note that the HTML spec clearly specifies when it's valid to omit tags, so it's not always an error.
For example you never need </body></html>. Nobody ever remembers to put <tbody> explicitly (to the point that XHTML made exceptions for it).
You don't need </head><body> unless you have DOM-manipulating scripts that actually search <head> (then it's better to close it explicitly, because rules for implied end of <head> could surprise you).
Nested lists are actually better off without </li>, because then it's harder to create erroneous ul > ul tree.
Valid:
<ul>
<li>item
<ul>
<li>item
</ul>
</ul>
Invalid:
<ul>
<li>item</li>
<ul>
<li>item</li>
</ul>
</ul>
And keep in mind that end tags are implied whether you try to close all elements or not. Putting end tags won't automatically make parsing more robust:
<p>foo <p>bar</p> baz</p>
will parse as:
<p>foo</p><p>bar</p> baz
It can only help when you validate documents.

The optional ones are all ones that should be semantically clear where they end, without needing the end tag.
E.G. each <li> implies a </li> if there isn't one right before it.
The forbidden end tags all would be immediately followed by their end tag so it would be kind of redundant to have to type <img src="blah" alt="blah"></img> every time.
I almost always use the optional tags (unless I have a very good reason not to) because it lends to more readable and updateable code.

I am adding some links here to help you with the history of HTML, for you to understand the various contradictions. This is not the answer to your question, but you will know more after reading these various digests.
How Did We Get Here? – Dive Into HTML5
The History of the Web
Brief History of HTML
HTML’s History – HTML WG Wiki
Some excerpts from Dive Into HTML5:
[T]he fact that “broken” HTML markup still worked in web browsers led authors to create broken HTML pages. A lot of broken pages. By some estimates, over 99% of HTML pages on the web today have at least one error in them. But because these errors don’t cause browsers to display visible error messages, nobody ever fixes them.
The W3C saw this as a fundamental problem with the web, and they set out to correct it. XML, published in 1997, broke from the tradition of forgiving clients and mandated that all programs that consumed XML must treat so-called “well-formedness” errors as fatal. This concept of failing on the first error became known as “draconian error handling,” after the Greek leader Draco who instituted the death penalty for relatively minor infractions of his laws. When the W3C reformulated HTML as an XML vocabulary, they mandated that all documents served with the new application/xhtml+xml MIME type would be subject to draconian error handling. If there was even a single well-formedness error in your XHTML page […] web browsers would have no choice but to stop processing and display an error message to the end user.
This idea was not universally popular. With an estimated error rate of 99% on existing pages, the ever-present possibility of displaying errors to the end user, and the dearth of new features in XHTML 1.0 and 1.1 to justify the cost, web authors basically ignored application/xhtml+xml. But that doesn’t mean they ignored XHTML altogether. Oh, most definitely not. Appendix C of the XHTML 1.0 specification gave the web authors of the world a loophole: “Use something that looks kind of like XHTML syntax, but keep serving it with the text/html MIME type.” And that’s exactly what thousands of web developers did: they “upgraded” to XHTML syntax but kept serving it with a text/html MIME type.
Even today, millions of web pages claim to be XHTML. They start with the XHTML doctype on the first line, use lowercase tag names, use quotes around attribute values, and add a trailing slash after empty elements like <br /> and <hr />. But only a tiny fraction of these pages are served with the application/xhtml+xml MIME type that would trigger XML’s draconian error handling. Any page served with a MIME type of text/html — regardless of doctype, syntax, or coding style — will be parsed using a “forgiving” HTML parser, silently ignoring any markup errors, and never alerting end users (or anyone else) even if the pages are technically broken.
XHTML 1.0 included this loophole, but XHTML 1.1 closed it, and the never-finalized XHTML 2.0 continued the tradition of requiring draconian error handling. And that’s why there are billions of pages that claim to be XHTML 1.0, and only a handful that claim to be XHTML 1.1 (or XHTML 2.0). So are you really using XHTML? Check your MIME type. (Actually, if you don’t know what MIME type you’re using, I can pretty much guarantee that you’re still using text/html.) Unless you’re serving your pages with a MIME type of application/xhtml+xml, your so-called “XHTML” is XML in name only.
[T]he people who had proposed evolving HTML and HTML forms were faced with two choices: give up, or continue their work outside of the W3C. They chose the latter, registered the whatwg.org domain, and in June 2004, the WHAT Working Group was born.
[T]he WHAT working group was quietly working on a few other things, too. One of them was a specification, initially dubbed Web Forms 2.0, which added new types of controls to HTML forms. (You’ll learn more about web forms in A Form of Madness.) Another was a draft specification called “Web Applications 1.0,” which included major new features like a direct-mode drawing canvas and native support for audio and video without plugins.
In October 2009, the W3C shut down the XHTML 2 Working Group and issued this statement to explain their decision:
When W3C announced the HTML and XHTML 2 Working Groups in March 2007, we indicated that we would continue to monitor the market for XHTML 2. W3C recognizes the importance of a clear signal to the community about the future of HTML.
While we recognize the value of the XHTML 2 Working Group’s contributions over the years, after discussion with the participants, W3C management has decided to allow the Working Group’s charter to expire at the end of 2009 and not to renew it.
The ones that win are the ones that ship.

The reason i ask is because you just know it's optional for compatibility reasons; and they would have made them (mandatory | forbidden) if they could have.
That's an interesting inference. My reading of it is that just about any time a tag could be reliably inferred, the tag is optional. The design suggests that the intention was to make it quick and easy to write.
What did HTML 1, 2, 3 do with regards to these, now optional, closing tags.
The DTD for HTML 2 is embedded in the RFC which, along with the original HTML DTD, has optional start and end tags all over the place.
HTML 3 was abandoned (thanks to the browser wars) and replaced with HTML 3.2 (which was designed to describe the then current state of the web).
What does HTML 5 do?
HTML 5 was geared towards "paving the cowpaths" from the outset.
And what should i do?
Ah, now that is subjective and argumentative :)
Some people think that explicit tags are better for readability and maintainability by virtue of being in front of the readers eyes.
Some people think that inferred tags are better for readability and maintainability by virtue of not cluttering up the editor.

What does HTML 5 do?
The answer to this question is in the W3C Working Draft:
http://www.w3.org/TR/html5/syntax.html#syntax-tag-omission
And what should i do?
It's a matter of style. I try to never omit end tags because it helps me to be rigorous and not omit tags that are necessary.

If it is superfluous, leave it out.
If it serves a purpose (even a seemingly trivial purpose, such as appeasing your IDE or appeasing your eyes), leave it in.
It's rare in a well-defined spec to see optional items that do not affect behavior. With the exception of "comments", of course. But the HTML spec is less of a design spec, and more of a document of the state of current major implementations. So when an item is optional in HTML and it seems to serve no purpose, we may guess that optional nature is merely documentation of a quirk in specific browser.
Looking at the HTML-5 spec RFC section linked above, you see that the optional tags are strangely linked to the presence of comments! That should tell you that the authors are not wearing design hats. They are instead playing the game of "document the quirks" in major implementations. So we can't take the spec too seriously in this respect.
So, the solution is: Don't sweat it. Move on to something that actually matters. :)

I think the best answer is to include closing tags for readability or error detection. However, if you have lots of generated HTML (say, tables of data), you could save significant bandwidth by omitting optional tags.

My recommendation is that you omit most optional close tags, and all optional attributes that you can get away with. Many IDEs will complain so you may not be able to get away with omitting some of these but it is generally better for smaller file size and less clutter. If you have code generators definitely omit end tags there because you can get some good size reduction from it. Usually it doesn't really matter one way or the other.
But when it does matter then act on it. On some recent work of mine I was able to reduce the size of my rendered HTML from 1.5 MB to 800 KB by eliminating most of the generated end and redundant value attributes for the open tag, where the text of the element was the same as the value. I have about 200 tags. I could implement this some other way entirely, but that would be more work ($$$), so this allows me to easily make the page more responsive.
Just out of curiosity I found that if I removed quotes around attributes that didn't need them I could save 20 KB, but my IDE (Visual Studio) doesn't like it. I also was surprised to find that the really long ID that ASP.NET generates account for 20% of my file.
The idea that we could ever get any relevant fraction of HTML strictly valid was misguided in the first place, so do whatever works best for you and your customers. Most tools that I have ever seen or used will say they generate xhtml, but they don't really work 100%, and there isn't any benefit to strict adherence anyway.

Personally, I'm a fan of XHTML and, like ghoppe, "I try to never omit end tags because it helps me to be rigorous and not omit tags that are necessary."
but
If you're deliberately using HTML 4.n, one can't argue that including them makes it easier to consume the document, as the notion of well-formedness as opposed to validity is an XML concept, and you lose that benefit when you forbid certain close tags. So the only issue becomes validity... and if it's still valid without them... you might as well save the bandwidth, no?

Using end tags makes dealing with fragments easier because their behaviour is not dependant on sibling elements. This reason alone should be compelling enough. Does anyone deal with monolithic html documents anymore?

In some curly bracket languages like C#, you can omit the curly braces around an if statement if its only two lines long. for example...
if ([condition])
[code]
but you can't do this...
if ([condition])
[code]
[code]
the third line won't be a part of the if statement. it hurts readability, and bugs can be easily introduced, and be difficult to find.
for the same reasons, i close all tags. tags like the img tag do still need to be closed, just not with a separate closing tag.

Do whatever you feel makes the code more readable and maintainable.
Personally I would always be inclined to close <td> and <tr>, but I would never bother with <li>.

If you were writing an HTML parser, would it be easier to parse HTML that included optional closing tags, or HTML that doesn't? I think the optional closing tags being present would make it easier, as I wouldn't have to infer where the closing tag should be.
For that reason, I always include the optional closing tags - on the theory that my page might render faster, as I'm creating less work for the browser's HTML parser.

For forbidden closing types use syntax like: <img /> With the /> to close the tag which is accepted in xml

Related

Do we really need to use </body> and </html> closing tags?

More often than not I see HTML without the closing tags, especially body and html.
According to:
http://www.w3.org/TR/html5/sections.html#the-body-element
http://www.w3.org/TR/html5/semantics.html#the-html-element
This can be omitted, but what about cross device issues? Like running such HTML on androids or windows phone's or whatever you know where not having these closing tags this would not work.
Do we need it? Well that depends on your DTD. If you're using XHTML, then yes, you will need it to conform. For accessibility sake I would include the closing tags, you never know if there's a screen reader (or other piece of software) out there that only parses valid XHTML, you could be hindering partially sighted people for example.
Google will also, apparently, rank your valid documents higher than invalid documents in their listings.
Here's a document by a friend of a friend that answers your question a bit better; granted that it was written in 2008, I think some of the points still apply.
If you ever need to use the same html in an XHTML application you won't need to mess around with it, you can just copy it across and not have to worry about conforming (because you already are).
On a separate note, you are essentially future proofing your markup. Who's to say that the spec won't eventually change to "You must include the closing head and body tags"? You won't need to worry if you already have them. It is, however, highly unlikely that the spec will change to, "You must not include the closing head and body tags".
As a great man once said:
Should I close the lid of the toilet when I'm finished? Yes,
especially if the wife is going to use it after me.
- Darren Gourley (Nov 2015)
use https://validator.w3.org/
select your standards target... if that says it passes then I think its Gd enough.
Please bared in mind the HTML5 spec is still being defined/evolved.
Technically you could omit html, head and body tags all together as long as the markup follows the following conditions:
http://www.w3.org/TR/2011/WD-html5-20110525/syntax.html#optional-tags
In regards to your comment about half your team using them and half your team not I would suggest that as long as either option is technically standards compliant you just choose one and move on as the entire subject is open for discussion and interpretation. My personal opinion would be that it's probably more important for your team to get on the same page and produce work of a similar standard especially if you have more than one person working on a project simultaneously.
You can leave out the end tags. Indeed you can leave out the opening tags, too (obviously not if you are using any attributes on them).
Not only is that the case with the more recent standard, but it's been the case since the very beginning. (The obvious exception being if you are using the XML syntax, since XML itself requires all elements have an explicit closing tag.)
Browsers have been dealing with the existence of HTML documents lacking the trailing closing tags since the 1990s. If the standards hadn't allowed it they'd probably still have dealt with them, much as they try their best to deal with all manner of messy code. (This causes it's own problems, which was one of the motivations behind XML not allowing optional tags, but that's another matter).
Many people consider it poor style. I would be one of them. But it's certainly widely supported.

html validator versus SU:BADGE

How to fix that
element "SU:BADGE" undefined <su:badge layout="5"></su:badge>
I use HTML 4.01 Transitional on my website ...
The BADGE is from stumbleupon.com website
Thank you . Regards.
Since such markup is not valid in HTML 4.01, the validator is just doing what you asked it to do: reporting any reportable markup errors.
The only practical reason why such error messages might be disturbing is that if there are many of them, you might accidentally miss to notice some other error messages, relating to constructs where your markup unintentionally deviates from HTML 4.01. If this is an issue, consider writing a custom DTD. It requires some understanding of SGML, but so does the use of validators in general, does it not?
On the other hand, you might decide to refrain from using tags suggested by people who cannot do such a simple thing using valid HTML. There are many ways to put information on a web page in a manner that does not affect the visible appearance but can be retrieved by programs that search for specific constructs (e.g., meta tags). They decided on a way that causes problems to authors who wish to use validtors.
You don’t.
Validation is nice (especially as a sanity check while developing) but if you don’t use valid markup then there’s nothing you can do, and if you need invalid markup (because you use a third-party API for instance) then validation simply ceases to be relevant.
Alternatively, you could serve your page as XML instead of HTML 4 and define an appropriate su XML namespace. But since this would mean that some browsers no longer display your page correctly this is more a theoretical possibility.

Is leaving out end tags valid?

I remember reading a while ago that in some cases leaving out end tags (</li>, for example) speeds up the rendering (and loading/parsing, since there is less bytes) of a webpage?
Unfortunately, I forgot where I read this, but I remember it saying this feature was specific to HTML 4.0.
Since I no longer have access to this source I was wondering if someone can confirm this or link to the documentation on w3c (since I wasn't able it find it myself)?
Thanks!
EDIT: Forgot to mention that I meant to ask if this behaviour is also available in HTML5.
EDIT 2: I manged to find the article again, and it does mention it only speeds the download speed of the page, not actual rendering:
One good reason for leaving out the end tags for these elements is because they add extra characters to the page download and thus slow down the pages. If you are looking for things to do to speed up your web page downloads, getting rid of optional closing tags is a good place to start. For documents that have lots of paragraphs or table cells this can be a significant savings.
Sorry for asking a pointless question! :(
Here is the list of HTML 4.01 elements.
http://www.w3.org/TR/html401/index/elements.html
The End Tag column says where end tags are optional.
However, take note that this is valid only in HTML 4.01. In Xhtml, all end tags are required. Not 100% sure about HTML5.
I wrote a HTML parser once, and believe me, if you're a parser and you're inside a <p> and you encounter a </table> end tag, it's slower to check in your document tree if that is correct, and if so, to close the current <p> first, than if you simply encounter a </p>.
Edit:
Ah, found it: http://dev.w3.org/html5/html-author/#index-of-elements
Same requirements as HTML 4.01.
New edit:
Oh, that was a page from 2009. This one is more up to date:
http://dev.w3.org/html5/spec/syntax.html#optional-tags
Some tags in some version of the HTML spec have optional end tags. However, I believe it is generally considered bad form to exclude the end tag.
As mentioned, the end tag of li is optional in html4:
http://www.w3.org/TR/html401/struct/lists.html#h-10.2
so technically this is valid:
<ul>
<li>
text
<li>
<span>stuff</span>
</ul>
But you are only saving 5 characters per li, not really worth what you lose in readability/maintainability.
EDIT: The HTML5 spec is sort of interesting:
An li element's end tag may be omitted if the li element is
immediately followed by another li element or if there is no more
content in the parent element.
Leaving out ending tags is usually forgivable by browsers (it's generally smart enough to know what you're doing). However, any css or js markup properties that the unclosed tag has can affect descendant and/or sibling tags, leaving you scratching your head as to what happened.
While XHTML does expect you to add a closing forward slash to self-contained tags, HTML 5 does not.
XHTML: <img src="" />
HTML5: <img src="">
If you're writing using an xhtml DOCTYPE, then the answer is 'yes', they are required. An xhtml document needs to be valid XML, which means that all tags need to be properly closed.
An HTML document is a bit less fussy. Some tags are specified as being 'self closing', which means you don't need to close them specifically. These include <br>, <img>, etc.
The browsers are generally pretty lenient, because they need to be able to cope with badly written code. But beware that sometimes skipping closing tags can result in different browsers interpreting your code differently, and producing hard-to-debug layout glitches.
In terms of page load speed, you might be right that there would be a marginal gain to be had in download speed and bandwidth costs, but it would be marginal. In terms of rendering, I suspect you'd actually lose speed if you provided invalid HTML, as the browser would have to work harder to parse it.
So even if there is a speed gain to be had it will be marginal, and I don't think skipping closing tags deliberately is a worthwhile exercise. It might possibly be helpful to reduce bandwidth if you're running a site that has massive traffic, but very few of us are writing for Facebook or Google; for virtually everyone else, it's better to write valid code than to try to shave those few bytes.
If you're that worried about bandwidth and page loading speeds, there are likely to be other better ways to reduce your page load sizes than this. For example, compressing your files with gZip will drastically reduce your bandwidth, with zero impact on your code or the browser. gZip compression can be configured in your web server, so you just switch it on and forget about it. You can also 'minify' your CSS and JS code by stripping out unnecessary white space. (HTML can also be minified to a certain extent, but beware that white space is syntactically relevant in HTML, so minifying may not be the right thing to do in all cases).
AFAIK, in XHTML you must always at least self-close a tag <img ... />
In HTML (non xml-html) some tags do not need to be closed. <img> for instance. However, I'd suggest making sure you know exactly which version you're targeting and use W3C's validation service to double-check.
http://validator.w3.org/
I don't see how this would speed things up except that you'd have to send less bytes of data per page (no /'s for some tags, no closing tags for others.) As for building the DOM, I don't know the details of a given implementation (webkit, mozilla, etc) to know which way is faster to parse. I would imagine XML is simply because it is more regular.
EDIT: Yes this behavior is available in HTML5. Note that the help pages are confusing, such as:
http://www.w3schools.com/html5/tag_meta.asp
Meta's in non-xml-html do not require the /, but they can have it. Because of the (in my opinion) leaning towards XML-flavored HTML's the ending slash is more prevalent in written HTML, but you can see they use both styles in the document. The Validator will let you know for sure what you can get away with. :)
In HTML 4.01, which became a W3C Recommendation way back in 1999, you're right:
9.3.1 Paragraphs: the P element
Start tag: required, End tag: optional
http://www.w3.org/TR/1999/REC-html401-19991224/struct/text.html#h-9.3.1
And as for <li>,
Start tag: required, End tag: optional
http://www.w3.org/TR/1999/REC-html401-19991224/struct/lists.html#h-10.2

Why is XHTML syntax so widely used in web pages?

First of all let's emphasize that syntax rules don't work alone, but they need the correct Content-type header to be fully interpreted by the clients. Currently web pages cannot be served with the correct XHTML header because Internet Explorer doesn't understand that.
The first advantage usually mentioned is that XHTML requires pages to be well-formed: true, but when browsers treat them as (malformed) HTML nothing enforces this rule, so it's up to you being a disciplined developer -- but you can be as disciplined writing good well-formed HTML too.
Another point often mentioned is that XHTML promotes the separation between content and presentation, but even in this case it doesn't really offer anything that can't be done with HTML -- it still depends on the developer since nothing is enforced, and no exclusive tools are offered.
So why do so many developer (including those of famous CMS/blogging softwares) still use XHTML syntax instead of directly writing what those pages will become anyway (i.e. plain HTML)?
Related fact: Stackoverflow uses HTML strict.
http://en.wikipedia.org/wiki/XHTML
From the wiki:
"The only essential difference between XHTML and HTML is that XHTML must be well-formed XML, while HTML need not be."
It's up to you which one you choose. There is no real difference in terms of what the user sees. Whichever you choose, please try to make it well-formed and make sure that your HTML/XHTML validates and follows the standards.
This probably isn't the actual reason, but it makes them parsable using a regular XML parser.
Sadly, XHTML syntax isn't as widely used as the XHTML doctype. You'd think people would be conscious about it, but a lot of the time (at least a few years ago), an XHTML doctype was used mostly because HTML 4 was being "dissed". That hasn't stopped people from continuing to use HTML syntax though. Open ended <li> and <p> tags, non-terminated <br> and <img> tags, tag attributes not enclosed in quotes, and more hypocritical nonsense.
Currently web pages cannot be served with the correct XHTML header because Internet Explorer doesn't understand that.
Sure they can, provided you're prepared to use content negotiation to serve a application/xhtml+xml content type to those user-agents that say they accept it.
There a number of reasons both good and bad why xhtml is so widely used. Jay Askren has a point about people who use XML in other contexts, (I'm one of them), but I doubt if that accounts for much use. If there is a good reason why XHTML is popular, it's most likely that the orthogonality of XML is a very seductive idea. It's simply easier remembering "Always close every tag, always quote the attribute values" than trying to remember all the rules about when you can safely omit tags and leave attributes unquoted etc., even though it results in a more verbose document.
There are other reasons like the fact that it's easier to indent your code if every opening tag has a matching closing one, and if you do, you've got a pretty accurate picture of the DOM laid out in the source code, which can aid with scripting. But I doubt that this is a primary reason.
Using XHTML states an intent, don’t underestimate that (but don’t overestimate this either). Web standards are politics: if nobody cares, nothing is gonna change. Using XHTML (or HTML5) signals “yes, we are in fact interested in the continued development of the standards.
Furthermore, while clients certainly don’t enforce XHTML rules with a text/html content type, design tools still can do this. XHTML is much easier to support for editors than real HTML (with “real” I mean the whole ugly SGML package). There are good XHTML validators that do much more than HTML validators can (e.g. Schneegans’ XML schema validator).
All in all, many arguments against XHTML are in fact straw-men that aim at some of the poorly-formulated arguments for XHTML. For instance, Microsoft is responsible of publishing long lists of purported XHTML advantages (such as semantic web design). Attacking those arguments is like reductio ad absurdum. But there are good arguments for XHTML.
I suspect a major reason xhtml is so popular is cultural and historical more than anything. XML became quite popular some time ago and it is still used quite heavily. It is good for for defining a data model that can be sent over the wire using webservices. There are lots of tools/technologies that work with it such as xslt and many others. It is natural for a developer to use html which is structured like xml, even if there is no real advantage just because they use xml in other contexts.

What problem does XHTML strict solve?

I really don't understand the fascination with XHTML strict. Inline JavaScript typically requires a rats nest of escapes to make it compatible with XHTML and semi-backwards compatible with MSIE 5 & 6. Then there is the issue of not being OCD enough on user input to make sure you don't miss any illegal characters. It just seems like more effort then its worth. Nevermind that almost every developer I've worked along side of keeps forgetting to ensure the content-type returned from the server is reset for XHTML pages from text/html to application/xhtml+xml.
Wish I knew the name of the blogger, but someone else pointed out that a majority of supposedly XHTML compliant websites and open source packages are actually not because of that last issue, forgetting to set the content-type header correctly.
I'm looking to understand why XHTML is useful, or build enough of an arsenal of arguments to prevent it ever being used in future projects that I have influence on.
XHTML1 vs HTML4 and Strict vs Transitional are completely orthogonal issues.
XML might not give any huge advantage to browsers today, but on the server end it's an order of magnitude easier to process documents using XML than trying to parse the mess that is old-school-SGML-except-not-really HTML4.
Restricting yourself to [X]HTML Strict doesn't achieve anything in itself, other than simply that it discourages the use of old, less-maintainable techniques you shouldn't be using anyway.
Inline javascript typically requires a rats nest of escapes to make it compatible with XHTML
You can get away without any escapes as long as you don't use the characters < or &. And ‘// < [CDATA[’ isn't really much worse than ‘< !--’ was in the old days.
In any case, keeping the scripting external is much more manageable; you don't want to be doing anything significant inline.
Then there is the issue of not being OCD enough on user input to make sure you don't miss any illegal characters.
Out-of-band characters are exactly as invalid in HTML4 Transitional as in XHTML1 Strict.
If you're accepting user-submitted HTML and not checking/escaping it with enough of a fine tooth comb to prevent well-formedness errors you have much bigger problems than just complying with a doctype. You'll be letting injection hacks through and making your site vulnerable to cross-site-scripting security holes.
forgetting to ensure the content-type returned from the server is reset for XHTML pages from text/html to application/html+xml.
It's not ‘forgetting’, it's deliberate: there is not really that much point in serving application/xhtml+xml today. To account for IE you have to sniff UA, and then make sure you understand the CSS and JavaScript differences that pop up in both parsing modes... you can do it to prove your technical prowess, but it doesn't really get you anything.
Serving XHTML as legacy HTML may not be ideal, but it lets you keep the simpler, more processable syntax of XML (and potential interoperability with other XML languages like SVG) whilst still being browser-friendly.
People complain about the pickiness of the well-formedness errors, but having those errors picked up straight away for you to fix them is way better than leaving them there silently, ready to trip up some future browser.
there is a great post about the usage of XHTML # Beware of XHTML.
Hope it helps,
Bruno Figueiredo
XHTML 1.0 Strict tries to solve four problems:
XML is W3C technology and HTML4 wasn’t using it. Not your problem.
Strict seeks to be more theoretically pure than Transitional when it comes to presentationalism. But this is not an XHTML vs. HTML issue.
XML parser is supposedly simpler. (Not entirely true; the code for dealing with the DTD part is pretty complex.) These days, you get both XML and HTML parsers off-the-shelf, so this isn’t your problem. (Aside: the mobile argument is utterly bogus.)
application/xhtml+xml (though not valid XHTML 1.0 Strict!) allows you to mix other vocabularies. If you want to use inline MathML or SVG today, this is the main reason to use application/xhtml+xml today. However, the direction the HTML5 work is taking is making it possible to use MathML and SVG in text/html.
XHTML is useful because it's much easier to create a simple transforming stylesheet or roll your own parser for it, than it is for HTML.
Do you have to parse your HTML with a program, o for some tests? Then, use XHTML.
For everything else, HTML 4.01 (strict, loose, transitional, whatever) is perfectly "standard" and less "troublesome".
XHTML enables you to advanced rendering like SVG (Scalable Vector Graphics), which itself is an XML, but can easily be embedded in XHTML through the XML namespace extension without <embed> or <object>. Unfortunately, only Firefox and Safari does support it. Sorry IE6 users.
For more on SVG at http://en.wikipedia.org/wiki/Svg
XHTML makes HTML orthogonal with all the other xml-based structures in our universe, which has two primary benefits.
Design patterns we use in dealing with xml can be applied to html.
Software tools ditto.
XHTML has the advantages of xml. But then why the strict variant?
I see some similarities with deprecated functions. You can still use them this version, but they are possibly removed the next version. So I see the transitional version as deprecated use. It still works and it will work for a couple of versions, but if you want to build for the future, use the strict version.
Strict is intended to formalize the separation between content and style by making it more difficult to commingle the two. Elliotte Rusty Harold has a good write up on XHTML in one of his books, here's the relevant excerpt on 'Why XHTML'.
The only thing I've seen solved by XHTML is the "problem" of users using Safari: I don't know if the bug is still there, but when we were last asked to write in XHTML, we ran across a bug that made XHTML unusable with Safari. In XHTML, the following URL isn't allowed in anchor tags, because the ampersand isn't escaped:
http://www.example.com/page.php?arg1=val1&arg2=val2
so what you have to do is replace it with & like this:
http://www.example.com/page.php?arg1=val1&arg2=val2
but Safari converts & to & so you get this URL:
http://www.example.com/page.php?arg1=val1&arg2=val2
...and the hash symbol ends the URL as far as PHP is concerned. I know that there are ugly hacks that allow you to pass two variables in other ways, but if XHTML is going to force you to use ugly hacks, then you're better off without it.
Personally, I liked the concept of XHTML: much cleaner than most HTML we can see, easier to parse and validate. Like everybody, I started to code XHTML pages. BTW, I don't see an issue with inline JavaScript, no need for escapes if you put the code in CDATA. And IE5 is fortunately a bit out of the browser landscape, like Netscape 4 which forced us to write / > instead of />, thing I still see in pure XML sometime...
Now, I have read a number of articles, like the one linked by Bruno, which has lot of good arguments against its use in most cases. Basically, it says most browsers aren't just ready for strict XHTML (served as XML), it doesn't make much sense to server XHTML as HTML, and anyway it isn't that useful in the majority of sites.
Look at the arguments above: they are perfectly valid, and it is great to be able to put MathML or SVG directly in the page, to transform XML with an XSLT parser, to process the page with an XML parser.
But how often do you do that? Parsing the page is most often the problem of end users, which can use a good HTML parser. And given the number of browsers able to manage MathML, SVG or XSLT, it is more a need for intranet than for the vast Internet.
You can have an e-commerce or a blog or a forum, which spits out good XHTML pages. And the persons writing the descriptions, articles or messages insert <p><p><p> to skip some lines, when it isn't <p/> or some other exotic construct...
I believe in XHTML, but I think I will no longer use it for the little pages I do for my site. I will use HTML 4 with well written code (quoted attributes, closing tags even if optional, etc.).
And after all, if W3C is working in HTML 5, it is for a reason: HTML has still a live ahead, otherwise it would have been killed in favor of XHTML 2.
XHTML is by definition XML, unlike HTML.
This means you can do funky useful stuff with it, such as easily validate and parse it (since you know it's XML and thus can use the myriad of tools available).
Also, geeks like to make things "more correct" ;-)
This is a global standard issue
This is not just about xHTML, but about all the standards in the world. You need to make things clearer, from version to version.
xHTML is square and pushes coders to add semantic value to the code. It's fully XML compatible and therefor more easyly parseable, stylisable, etc.
Remember that a code is not just for coders, bot for machines too. In 10 years, people creating browsers or libraries won't want to implement the same complexes rules for old HTML processing but will rather expect something as clean as possible.
Search engine needs something to rely on to build semantic links between value and so it's better if there is only one easy way to do it.
And I am not talking about screen readers...
Standard, is above all, about going toward one unique open solution that fit everybody's need. Not just about adding new shiny features.