HTML version choice [closed]

Closed 10 years ago.
When developing a new web-based application, which version of HTML should you aim for?
EDIT:
Cool, I was just attempting to get a feel from others. I tend to use XHTML 1.0 Strict in my own work and Transitional when others are involved in the content creation.
I marked the first XHTML 1.0 Transitional post as the 'correct answer', but I believe strongly that all the answers given at that point were equally valid.

HTML 4.01. There is absolutely no reason to use XHTML for anything but experimental or academic problems that you only want to run on the 'obscure' web browsers.
XHTML Transitional is completely pointless even to those browsers, so I'm not sure why anyone would aim for that. It's actually pretty alarming that a number of people would recommend that.
I'd say aiming for HTML 4.01 is the most predictable, but Teifion is right really, "anything that renders your page will do".
In response to Michael Stum:
XHTML is XML based, so it allows easier parsing and you can also use the XML Components of most IDEs to programmatically query and insert stuff.
This is certainly not true. A lot of XHTML on the web (if not most) is not well-formed XML (and it needn't be, since it's not being sent as XML). Trying to treat it like XML is just going to earn you a lot of headaches. This page on Stack Overflow, for instance, will generate errors with many unforgiving XML tools for having invalid markup.
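To make that concrete, here is a minimal Python sketch. Assumptions: the standard library's xml.etree.ElementTree stands in for an "unforgiving XML tool", and html.parser stands in for a browser's lenient parsing (html.parser is only a tokenizer, not a full HTML5 tree builder). The sample markup is hypothetical but typical of "XHTML" served as text/html: an unquoted attribute and an unclosed <p>.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical "XHTML" as found in the wild: XHTML namespace, but an
# unquoted attribute and an unclosed <p>. Served as text/html, browsers cope.
TAG_SOUP = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<p class=intro>First paragraph
<p>Second paragraph</p>
</body>
</html>"""


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient HTML tokenizer: records start tags and never raises on soup."""

    def __init__(self) -> None:
        super().__init__()
        self.start_tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


collector = TagCollector()
collector.feed(TAG_SOUP)
collector.close()

print(parses_as_xml(TAG_SOUP))  # False: the XML parser gives up
print(collector.start_tags)     # ['html', 'body', 'p', 'p']
```

The same bytes are "fine" or "fatal" depending only on which kind of parser reads them, which is why treating text/html XHTML as XML tends to hurt.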

I'd shoot for XHTML 1.0 Transitional. There are still a few things out there that don't like XHTML Strict, and most editors I've seen now will give you the proper nudges to make sure that things are done right.

Transitional flavors of XHTML and HTML are deprecated. They were intended only for old user-agents that don't support CSS. See explanation in the DTD.
W3C advises that you should use Strict whenever possible, and these days it's certainly possible.
The Transitional version has already been removed in XHTML/1.1 and HTML5.
XHTML/1.0 has exactly the same elements and attributes (semantics) as HTML4. The XHTML/1.0 specification doesn't even specify any elements! For anything other than syntax, it refers to HTML4.
Additionally, you'll be unable to use any feature of XHTML that is not available in HTML (namespaces, XML DOM) if you send documents as text/html, and unfortunately that is required for compatibility with IE and other HTML-only browsers.
In 2008 the correct choice would be HTML4 Strict:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
but as of 2016, there's only one version of HTML that matters.
<!DOCTYPE html>

Dillie-O is right on with his answer of XHTML 1.0 Transitional but I would suggest shooting for XHTML 1.0 Strict and only falling back on Transitional if there's some piece of functionality you absolutely need that Strict is not allowing.

@Mike:
While I agree that validity is not needed to make a page render (after all, we have to keep IE6 compatibility in...), creating XHTML that IS compatible AND valid is not a problem. The problems start when people are used to HTML 4 and using the deprecated tags and attributes.
Just because the Web is a pile of crap does not mean that every new page needs to be a pile of crap as well. Most validation errors on SO are so trivial that they shouldn't take long to fix, like missing quotes on attributes.
But it may still be kind of pointless, given that the W3C does not have any idea where it wants to go anyway (see HTML 5), and a certain big browser company that also makes operating systems does not care either, so a site could just as well send out its doctype as HTML 1337 Sucks and browsers would still try to render it.

There are some compelling warnings about the usage of XHTML, primarily centering around the fact that the mime-type for such a document should be sent as:
Content-type: application/xhtml+xml
Yet IE 6 and 7 don't support this, and then websites must send it as:
Content-type: text/html
Unfortunately that method is considered harmful.
Some also bemoan the fact that although the intent of XHTML is to make web pages parsable by an XML parser, it has in practice failed due to incorrect usage on existing websites.
I still prefer to write documents in XHTML 1.0 Strict, mostly because of the challenge, and the cleanliness and error-checking that a validator gives. I enjoy the syntax a bit better, because it forces me to be very explicit in when tags end, etc. It's more for me a personal choice than purely technical.

Anything that renders your page will do so regardless of which popular standard you use. XHTML is stricter and probably "better", but I can't see what advantages you will get from one standard over another.

Personally, I prefer XHTML 1.0 Transitional.
XHTML is XML based, so it allows easier parsing, and you can also use the XML components of most IDEs to programmatically query and insert stuff.
Transitional is not as strict as Strict, which makes it relatively easy to work with compared to Strict, which can often be a PITA. Comparison between Transitional and Strict
1.0 is "more compatible" than 1.1 and 1.1 seems to be still under some sort of development.

I aim for XHTML 1.0 Trans. It's better to conform so when bugs are fixed in the browsers you won't suddenly be working against the clock trying to figure out what actually needs changing.
In my opinion 1.1 is borked and 2.0 has been smashed to smithereens: Do I really need/want a header/footer tag?

I'm all for XHTML Strict every time. I strongly believe that HTML should be more like XML. It's not hard to validate if you know XML, and the W3's validator points you on the right track anyway.
XHTML 2.0 is heading toward what the W3 has been aiming for for a long time: the semantic web. The best benefit of XHTML 2.0 for me is that every conformant page on the web will be understandable as content, or an article (for that's what pages are: documents), because they all apply to the same standard. You would then be able to construct interpreters (i.e. browsers) that present the content in a completely different manner; there are literally thousands of ideas waiting here.

If you want to use XHTML 1.0 in an HTML-compatible way, that's fine. However, do note that the W3C validator and the XHTML DTDs know nothing about mime types and how browsers behave differently (like <map> name/id matching) between them. The DTDs know nothing about how well browsers support certain elements (like <embed> for example) either.
What this means is that the XHTML DTDs and the validator don't reflect reality and trying to conform to them is pointless.
If you want to use XHTML just so you can close certain elements with /> (where html-compatible), just use HTML5 markup (so the browser is in full standards mode). HTML5 allows the use of /> in an HTML-compatible way (the same HTML-compatible way you have to do it when using XHTML 1.0 markup with text/html). Then, just stick to what works (you know better than some DTD) in browsers.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<title></title>
</head>
<body>
<p>Line1<br/>Line2</p>
<p><img src="" alt="blank"/></p>
<p><input type="text"/></p>
<p><embed type="application/x-something" src=""/></p>
</body>
</html>
Then, use http://validator.nu/ to make sure it's well formed at least.

If you have tools to generate your XHTML like any other XML document, then go with XHTML. But when you just use plain text templates, text concatenation, etc. you are OK with good old HTML 4.01.
Browsers now start to support this 10 year old standard.
Important: Avoid being called a bozo when producing XML

I don't think it actually matters whether you use XHTML or plain HTML. The end goal here is to have low maintenance and quick development through a predictable rendering. You can get this from using xhtml or html, as long as you have validating code. I've even heard arguments that it's best to target quirks mode, because new versions of browsers don't change quirks mode, so maintenance is easy.
In the end, it all becomes tag soup, for good reason, because getting web app developers to write error-free html means asking them to write bug-free code. Validators are no help, because they only validate the initial page view. This is also why I've never seen the point in xhtml served as xml for anything beyond static sites. The level of arrogance a web apps developer would need to have to serve up their web app as xml is staggering.

HTML 4.0 Strict, or ISO HTML.

Related

Why put an XHTML doctype declaration on HTML files? What does that do?

I wonder about the number of web pages I encounter that are HTML files, but that wear an XHTML DOCTYPE declaration.
Why are people doing this? What do they hope to achieve? Why not reserve the XHTML doctype declaration for actual XHTML files?
Or am I missing something?
Edit: there is some confusion about what "actual XHTML files" are; to demonstrate that the difference is not caused by the DOCTYPE declaration, compare this file to this one. The first is HTML, the second is XHTML, although the contents are identical; only the file types differ. Both display fine in compliant browsers, but the first one is parsed with the HTML parser and the second one with the XML parser.
Why put an XHTML doctype declaration on HTML files? What does that do?
All that does is tell markup validators that they're about to validate an XHTML document, as opposed to a regular, SGML-rooted, HTML document. It describes the content, or more specifically the markup that follows, but nothing else.
Why are people doing this? What do they hope to achieve? Why not reserve the XHTML doctype declaration for actual XHTML files?
Or am I missing something?
Kind of. What actually happened was that people weren't aware that just putting an XHTML doctype declaration on top of an HTML document didn't automatically transform it into an XHTML document, although admittedly that was what everybody was hoping for.
You see, most web applications out there aren't configured to serialize XHTML documents as application/xhtml+xml properly, instead opting to serve pages as just text/html. (It's typically because of the .html file extension more than anything else, really; generally speaking, servers do correctly apply application/xhtml+xml to documents with .xhtml or .xht as the extension, but only static sites that actually make use of the file format will benefit from this.) That leads browsers to decide that they received a regular HTML document, and so that tag soup parsing nonsense we've all come to know and love inevitably ensues.
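A minimal sketch of that extension-to-MIME mapping, using Python's mimetypes module as a stand-in for a static file server's configuration (real servers such as Apache or nginx use their own tables; the helper name content_type_for is hypothetical). The .xhtml mapping is registered explicitly rather than assumed to be present.

```python
import mimetypes

# A MimeTypes instance built from Python's bundled defaults only, so the
# result does not depend on the host's /etc/mime.types. We register the
# XHTML mapping ourselves instead of assuming it exists.
server_types = mimetypes.MimeTypes()
server_types.add_type("application/xhtml+xml", ".xhtml")


def content_type_for(path: str) -> str:
    """Content-Type a naive static file server would send for this path."""
    mime, _encoding = server_types.guess_type(path)
    return mime or "application/octet-stream"


# Same markup, different file extension, different parser in the browser.
for name in ("index.html", "index.xhtml"):
    print(name, "->", content_type_for(name))
```

An XHTML document saved as index.html is therefore announced as text/html, whatever its doctype says.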
Note that it doesn't matter even if you have a meta tag like this on your XHTML document:
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
Browsers will ignore that, and only look at the actual HTTP Content-Type header that was sent along with the XHTML document.
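A rough Python model of that rule, assuming a simplified set of XML MIME types (real browsers also treat other +xml types, such as image/svg+xml, as XML). The function name parser_mode is hypothetical; the point is that only the HTTP header is consulted, never the <meta> inside the body.

```python
# Simplified set of MIME types that make a browser use its XML parser.
XML_MIME_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def parser_mode(http_content_type: str, body: str = "") -> str:
    """Return "xml" or "html" for a response.

    Only the HTTP Content-Type header is consulted. The body parameter is
    accepted but deliberately ignored: a <meta http-equiv="Content-Type">
    inside the document cannot switch the browser to its XML parser.
    """
    mime = http_content_type.split(";", 1)[0].strip().lower()
    return "xml" if mime in XML_MIME_TYPES else "html"


page = '<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />'
print(parser_mode("text/html; charset=utf-8", page))        # html
print(parser_mode("application/xhtml+xml; charset=utf-8"))  # xml
```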
To make matters worse, Internet Explorer, the most-used browser during XHTML's heyday, never properly supported the application/xhtml+xml MIME type before version 9 was finally released: instead of parsing the markup, constructing the DOM and rendering the page, all it would do was ask for a file download. That doesn't make a very usable XHTML page!
So, guess what we all had to live with until HTML5 became cool?
This, along with things like IE6 going quirky on pages with the XML declaration before the doctype declaration, is also one of the biggest factors leading to XHTML's downfall (along with XHTML 1.1 never gaining widespread usage, and XHTML 2.0 being canceled in favor of HTML5).
Most people use the XHTML doctype because they read about it in an old book or on a forum, but otherwise use it for no technical reason they are aware of. Hardly anyone uses it properly by serving it as application/xhtml+xml. Serving XHTML pages as text/html means "tag soup" or "broken HTML". It should not be done, but browsers generally handle it well.
You are correct in your wondering about this. It drives me crazy.
I assume that you're asking why people are serving XHTML documents as HTML, by using the text/html MIME type instead of application/xhtml+xml.
Mostly, it's because of a misguided understanding of compatibility: lots of browsers simply don't understand the XHTML+XML MIME type, which has led people to simply serve it as HTML to work around this. Since browsers often don't complain about what they get, and people don't tend to research much, most people assume that browsers treat the XHTML-doctyped document as XHTML even though it was served as HTML. But they don't; they parse it as HTML. Since the two languages are so much alike, people rarely notice the difference.
So no, you're not missing anything; it's very bad practice. Nowadays, after HTML5, luckily, it seems to become less common.
The hilarious thing about XHTML is that because IE didn't understand the XML mimetype (application/xhtml+xml) at the peak of XHTML's popularity, most people never actually used the XML part of it as IE8 and lower refuse to render the content.
This meant that millions of sites think they are using standards compliant XHTML, when in fact they are being parsed as malformed/weird HTML4.
Luckily HTML5 came along and properly defined the parsing of documents, removing much of the ambiguity that surrounded XHTML (all that transitional and strict rubbish).
People who add the XML prolog before the doctype are doing themselves an extra disservice, as anything before the doctype (the XML prolog or even a comment) will cause old IE to use quirks mode, which among other things brings back the old box model in IE6 and below. This undoubtedly has contributed to the mass hatred of IE6, as in quirks mode it has significant bugs that cause modern layouts to be completely broken, rather than just lacking newer features.
The short answer is that in this industry many people just copy and paste code without understanding it.

Why is XHTML syntax so widely used in web pages?

First of all let's emphasize that syntax rules don't work alone, but they need the correct Content-type header to be fully interpreted by the clients. Currently web pages cannot be served with the correct XHTML header because Internet Explorer doesn't understand that.
The first advantage usually mentioned is that XHTML requires pages to be well-formed: true, but when browsers treat them as (malformed) HTML nothing enforces this rule, so it's up to you being a disciplined developer -- but you can be as disciplined writing good well-formed HTML too.
Another point often mentioned is that XHTML promotes the separation between content and presentation, but even in this case it doesn't really offer anything that can't be done with HTML -- it still depends on the developer since nothing is enforced, and no exclusive tools are offered.
So why do so many developers (including those behind famous CMS/blogging software) still use XHTML syntax instead of directly writing what those pages will become anyway (i.e. plain HTML)?
Related fact: Stackoverflow uses HTML strict.
http://en.wikipedia.org/wiki/XHTML
From the wiki:
"The only essential difference between XHTML and HTML is that XHTML must be well-formed XML, while HTML need not be."
It's up to you which one you choose. There is no real difference in terms of what the user sees. Whichever you choose, please try to make it well-formed and make sure that your HTML/XHTML validates and follows the standards.
This probably isn't the actual reason, but it makes them parsable using a regular XML parser.
Sadly, XHTML syntax isn't as widely used as the XHTML doctype. You'd think people would be conscious about it, but a lot of the time (at least a few years ago), an XHTML doctype was used mostly because HTML 4 was being "dissed". That hasn't stopped people from continuing to use HTML syntax though. Open ended <li> and <p> tags, non-terminated <br> and <img> tags, tag attributes not enclosed in quotes, and more hypocritical nonsense.
Currently web pages cannot be served with the correct XHTML header because Internet Explorer doesn't understand that.
Sure they can, provided you're prepared to use content negotiation to serve an application/xhtml+xml content type to those user-agents that say they accept it.
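A minimal Python sketch of such content negotiation, assuming a deliberately conservative policy: send application/xhtml+xml only when the Accept header names it explicitly with a q-value at least as high as text/html, so clients that never mention it (old IE's Accept header was a list of image and plugin types ending in a bare */*) get text/html. This is not a full RFC 9110 implementation (wildcard precedence is ignored), and both function names are hypothetical. A server doing this should also send Vary: Accept so caches keep the two variants apart.

```python
def parse_accept(header: str) -> dict[str, float]:
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    prefs: dict[str, float] = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media = parts[0].lower()
        if not media:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media] = q
    return prefs


def negotiate_markup_type(accept_header: str) -> str:
    """Choose application/xhtml+xml only for clients that explicitly ask for it."""
    prefs = parse_accept(accept_header or "")
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    html_q = prefs.get("text/html", 0.0)
    if xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"


firefox_like = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
old_ie_like = "image/gif, image/jpeg, image/pjpeg, */*"
print(negotiate_markup_type(firefox_like))  # application/xhtml+xml
print(negotiate_markup_type(old_ie_like))   # text/html
```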
There are a number of reasons, both good and bad, why XHTML is so widely used. Jay Askren has a point about people who use XML in other contexts (I'm one of them), but I doubt that accounts for much use. If there is a good reason why XHTML is popular, it's most likely that the orthogonality of XML is a very seductive idea. It's simply easier to remember "always close every tag, always quote the attribute values" than to remember all the rules about when you can safely omit tags and leave attributes unquoted, etc., even though it results in a more verbose document.
There are other reasons like the fact that it's easier to indent your code if every opening tag has a matching closing one, and if you do, you've got a pretty accurate picture of the DOM laid out in the source code, which can aid with scripting. But I doubt that this is a primary reason.
Using XHTML states an intent; don’t underestimate that (but don’t overestimate it either). Web standards are politics: if nobody cares, nothing is gonna change. Using XHTML (or HTML5) signals “yes, we are in fact interested in the continued development of the standards.”
Furthermore, while clients certainly don’t enforce XHTML rules with a text/html content type, design tools still can do this. XHTML is much easier to support for editors than real HTML (with “real” I mean the whole ugly SGML package). There are good XHTML validators that do much more than HTML validators can (e.g. Schneegans’ XML schema validator).
All in all, many arguments against XHTML are in fact straw men that target some of the poorly formulated arguments for XHTML. For instance, Microsoft is responsible for publishing long lists of purported XHTML advantages (such as semantic web design). Attacking those arguments is like reductio ad absurdum. But there are good arguments for XHTML.
I suspect a major reason XHTML is so popular is cultural and historical more than anything. XML became quite popular some time ago, and it is still used quite heavily. It is good for defining a data model that can be sent over the wire using web services. There are lots of tools/technologies that work with it, such as XSLT and many others. It is natural for developers who use XML in other contexts to use HTML that is structured like XML, even if there is no real advantage.

At the end of the day, why choose XHTML over HTML? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I wonder why I should use XHTML instead of HTML.
XHTML is supposed to be "modularized", but I haven't seen any server side language take advantage of any of that.
XHTML is also more strict, and I don't see the advantage. What does XHTML offer that I need so bad? How does it make my code "better"?
EDIT: another question I found in the comments: Does XHTML parse faster than HTML?
EDIT2: after reading all your comments and the links, I indeed agree that another post deserves to be the correct answer, so I chose the one that directly links to the best source.
It also goes to show that people upvote the accepted (green) answer without even reading it.
You should read Beware of XHTML, which is an informative article that warns about some of the pitfalls of XHTML over HTML.
I was pretty gung-ho about XHTML until I read it, but it does make several valid points, including the following bit:
XHTML 1.x is not “future-compatible”. XHTML 2, currently in the drafting stages, is not backwards-compatible with XHTML 1.x. XHTML 2 will have lots of major changes to the way documents are written and structured, and even if you already have your site written in XHTML 1.1, a complete site rewrite will usually be necessary in order to convert it to proper XHTML 2. A simple XSL transformation will not be sufficient in most cases, because some semantics won't translate properly.
HTML 4.01 is actually more future-compatible. A valid HTML 4.01 document written to modern support levels will be valid HTML 5, and HTML 5 is where the majority of attention is from browser developers and the W3C.
Future compatibility can be huge when working on some projects. The article goes on to make several other good points, but I think that may have stood out the most for me.
Don't mistake the article for a rant against XHTML, the author does talk about the good points of XHTML, but it is good to be aware of the shortcomings before you dive in.
I was going to add this as a comment to one of the other posts, but it grew a little too large.
The fundamental point that most people seem to be missing is the purpose behind XHTML. One of the major reasons for developing the XHTML specification was to de-emphasise presentation-related tags in the markup and to defer presentation to CSS. Whilst this separation can be achieved with plain HTML, this behaviour isn't promoted by the specification.
Separating meta-markup and presentation is a vital part of developing for the 'programmable web'. It will not only improve SEO and access for screen readers/text browsers, but will also make your website more easily analysable by those wishing to access it programmatically (in many simple cases, this can negate the need for developing a specific API, or just allow client-side scripts to do things like identify phone numbers readily). If your web page conforms to the XHTML specification, it can easily be traversed using XML-related tools and things such as XPath... which is fantastic news for those who want to extract particular information from your website.
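As a minimal illustration of that claim, assuming the page really is well-formed XHTML, Python's xml.etree.ElementTree can query it with its limited XPath subset. The markup and the class="tel" convention are illustrative assumptions (in the spirit of hCard-style microformats), not taken from any real site. Note the namespace prefix: every XHTML element lives in the XHTML namespace, so unprefixed queries would find nothing.

```python
import xml.etree.ElementTree as ET

# Prefix used in our queries; every XHTML element lives in this namespace.
NS = {"h": "http://www.w3.org/1999/xhtml"}

PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Contact</title></head>
  <body>
    <p>Call <span class="tel">+1 555 0100</span> or
       <span class="tel">+1 555 0199</span>.</p>
    <p><a href="/about">About us</a></p>
  </body>
</html>"""

root = ET.fromstring(PAGE)  # raises ET.ParseError if the page is not well-formed

# ElementTree implements a limited XPath subset, including attribute predicates.
phones = [span.text for span in root.iterfind(".//h:span[@class='tel']", NS)]
links = [a.get("href") for a in root.iterfind(".//h:a", NS)]

print(phones)  # ['+1 555 0100', '+1 555 0199']
print(links)   # ['/about']
```

This only works for pages that are actually well-formed; the same code raises ParseError on typical tag soup.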
XHTML was not developed for use by itself, but by use with a variety of other technologies. It relies heavily on the use of CSS for presentation, and places a foundation for things like Microformats (whether you love them, or hate them) to offer a standardised markup for common data presentation.
Don't be fooled by the crowd who think that XHTML is insignificant, and is just overly restrictive and pointless... it was created with a purpose that 95% of the world seems to ignore/not know about.
By all means use HTML, but use it for what it's good for, and take the same approach when looking at XHTML.
With regard to parsing speed, I imagine there would be very little difference in the parsing of the actual documents between XHTML and HTML. The trade-off will come purely in how you describe the document using the available markup. XHTML tags tend to be longer, due to required attributes, proper closing, etc. but will forego the need for any presentational markup in the document itself. With that being the case, I think you're talking about comparing one type of apple, with a very slightly different type of apple... they're different, but it's unlikely to be of any consequence (in terms of parsing and rendering) when all you want is a healthy, tasty apple.
For the visitor of a website it probably doesn't make any visible difference. Furthermore, XHTML is usually more of a pain to use as at least one widespread browser still doesn't know how to handle it and you need to serve it as text/html in that case (which yields invalid HTML).
If your HTML is going to be regularly processed by automated tools instead of being read by humans, then you might want to use XHTML because of its stricter structure; being XML, it's easier to parse (from an application standpoint, that is; not that XML is inherently easy to parse).
Apart from that I don't see any compelling reasons to use it, though. XHTML was created in an approach of making use of XML features for HTML and basically it boils down to "HTML 4 with several annoying side-effects" (IMHO, at least).
Use HTML (HTML4 Strict or HTML5).
HTML can fully utilize CSS, can be validated and parsed unambiguously. Separation of structure and presentation has been done in HTML4 and XHTML merely continued that.
All browsers support HTML. Only some browsers support XHTML, and those that do often have more mature, better tested and better optimized support for HTML (because only a tiny fraction of pages use XML mode).
If you care about IE and Google, you have to use HTML or the subset of XHTML and HTML defined in Appendix C of the XHTML spec. The latter is almost the worst of both worlds, because such XHTML cannot be generated with standard XML tools, cannot use extension mechanisms new to XHTML and has additional limitations over those in HTML alone.
XHTML1.0 is now over 10 years old. It was designed in "Web1.0" times, and as the head of the W3C said, in retrospect it didn't work out and a better approach is needed. W3C HTML5 is being written as we speak, addresses the needs of web applications used today, and has very good backwards compatibility.
HTML5 closes many gaps that existed between HTML4 and XHTML1 (e.g. it adds inline SVG, MathML and RDF) and cleans up the language beyond what was done in XHTML1.0 and XHTML1.1.
XHTML2 is not going to be supported by web browsers in the foreseeable future. It's likely that it will never be supported (all browser vendors heavily support [X]HTML5, and some have already declared that they won't implement XHTML2).
XHTML1.0 has exactly the same semantics and separation of presentation from structure as HTML4.01. Anybody who says otherwise hasn't read the specification. I encourage everybody to read the spec: it's surprisingly short and uninteresting.
Stylesheets were introduced in HTML4.01 and were not changed in XHTML1.0.
Presentational elements were deprecated in HTML4.01 and were not removed in XHTML1.0.
XHTML myths.
There are no intractable differences between HTML and XHTML that would make parsing of one much slower than the other. It depends on how the parser is implemented.
Both SGML and XML parsers need to load and parse entire DTD in order to understand entities. This alone is usually more work than parsing of the document itself. HTML parsers almost always "cheat" and use hardcoded entities and element information. XHTML parsers in browsers cheat too.
Parsing of HTML requires handling of implied start and end tags, and real-world HTML requires additional work to handle misplaced tags.
Proper parsing of XHTML requires tracking of XML namespaces.
Draconian XML rules require checking that every character is properly encoded. HTML parsers may get away without this, but OTOH they need to look for a <meta> charset declaration.
The overall difference in cost of parsing is tiny compared to time it takes to download document, build DOM, run scripts, apply CSS and all other things browsers have to do.
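The entity point above is easy to see with Python's standard library. This is a sketch: expat (which backs xml.etree.ElementTree) stands in for "an XML parser that has not loaded the DTD", and html.unescape stands in for an HTML parser's hardcoded entity table.

```python
import html
import xml.etree.ElementTree as ET

# &nbsp; and &euro; are declared in the (X)HTML DTDs, not in XML itself.
SNIPPET = "<p>Price:&nbsp;10&euro;</p>"

try:
    ET.fromstring(SNIPPET)
    xml_result = "parsed"
except ET.ParseError as err:
    # XML predefines only &lt; &gt; &amp; &quot; &apos;. Without the DTD,
    # expat reports any other named entity as undefined.
    xml_result = f"rejected ({err})"

# HTML parsing "cheats": the named references are built into the library.
html_text = html.unescape("Price:&nbsp;10&euro;")

print(xml_result)
print(repr(html_text))
```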
I'm surprised that all the answers here recommend XHTML over HTML. I am firmly of the opposite opinion - you should not use XHTML, for the foreseeable future. Here's why:
No browser interprets XHTML as XHTML unless you serve it as mimetype application/xhtml+xml. If you just serve it with the default mimetype, all browsers will interpret it as HTML - eg, accepting unclosed or improperly nested elements.
However, you should never actually do this, as Internet Explorer does not recognise application/xhtml+xml, and would fail to render the page completely.
There are significant differences in the DOM between XHTML and HTML. Since all so-called XHTML pages are being served as HTML at the moment, all JavaScript code is written using the HTML DOM. If support for the XHTML mimetype becomes significant enough to convince people to start using it, most of their JavaScript code will break, even if they think their pages validate as XHTML.
Instead of continuing to debate HTML 4.01 Strict vs XHTML Strict, I would suggest starting to use HTML 5 today. John Resig, the author of jQuery, made a similar suggestion last year on his blog.
The HTML 5 doctype, in its beautiful simplicity, will trigger standards mode in all browsers (including IE6).
<!DOCTYPE html>
That's it.
HTML 5 provides some exciting new features such as the <canvas> tag which potentially can push javascript application development to the next level. HTML 5 also has proper support for media (and media is a fairly important aspect of the web these days!) in the form of <video> and <audio> tags.
If you like the syntax of XHTML, i.e. closing "empty" tags such as <br />, that is fully supported in HTML 5. From Karl Dubost of the W3C's post Learn How To Write HTML 5:
auto-closing tag is allowed and conformant in HTML 5.
XHTML2 has received relatively little attention compared to HTML 5. It's becoming increasingly clear that HTML 5 is the future of markup on the web. Microsoft's latest browser, IE8, still renders XHTML served as text/xml as text/html.
Microsoft have a co-chair on the W3C HTML working group and there's an implied support from them for HTML 5. All of the browser vendors have publicly announced their support for HTML 5.
At the end of the day, even if XHTML2 regains support from the industry, it won't be a significant issue having two competing standards as it has been in the past. Both languages support XML namespaces (in the case of HTML 5, serialization of HTML i.e. DOCTYPE switching).
As a programmer, you should be VERY concerned about your code. HTML is ugly and follows few rules.
XHTML on the other hand, turns HTML into a proper language, following strict structural and syntactic rules.
XHTML is better for everyone, as it will help move the web to a point where everyone (all browsers) can agree on how to display a web page.
XHTML is an XML descendant, and as such is much easier on parsers built for the job of analysing syntactically sound XML documents.
If you can't see the benefit of XHTML, you might as well be using MS Word to create your HTML documents.
Take a look at http://www.w3.org/MarkUp/2004/xhtml-faq#need. There are some good reasons apart from modularisation.
I favor XHTML because it's stricter and more clearly laid out. HTML is quirky and browsers have to accept things like <b><i>sadasd</b></i>.
While this is a really simple example, it could also get more confusing and different browsers could lay out things differently.
Also I think that XHTML has to be "faster", since the browser doesn't have to do that kind of "repair".
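A browser silently repairs markup like <b><i>sadasd</b></i>; an XML parser refuses it outright. Below is a hypothetical nesting checker built on Python's html.parser that reports the kind of error an XHTML-as-XML parser would stop on. It is a sketch: the void-element list is taken from HTML5, and it does not model HTML's implied end tags, so legitimate HTML such as an unclosed <li> is reported too.

```python
from html.parser import HTMLParser

# HTML5 void elements: they never take an end tag, so they never go on the stack.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


class NestingChecker(HTMLParser):
    """Hypothetical checker: flags end tags that don't match the innermost open element."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.errors: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closed tags such as <br/> or <img ... /> open nothing.
        pass

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            expected = f"</{self.stack[-1]}>" if self.stack else "no end tag"
            self.errors.append(f"</{tag}> found, expected {expected}")


def nesting_errors(markup: str) -> list[str]:
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    unclosed = [f"<{tag}> never closed" for tag in reversed(checker.stack)]
    return checker.errors + unclosed


print(nesting_errors("<b><i>sadasd</b></i>"))
# ['</b> found, expected </i>', '<b> never closed']
print(nesting_errors("<b><i>sadasd</i></b>"))  # []
```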
Some differences are:
XHTML tags must be properly nested
The documents must have one root element
XHTML tags are always in lowercase
Tags must always be closed (e.g. a <br> tag in XHTML must be written as <br /> or <br></br>)
Here are some links on it
wiki XHTML
wiki HTML vs XHTML
XHTML allows you to use all those tools designed for XML. Among them are XSLT, embedding SVG, etc...
Interesting development: XHTML 2 Working Group Expected to Stop Work End of 2009, W3C to Increase Resources on HTML 5
2009-07-02: Today the Director announces that when the XHTML 2 Working Group charter expires as scheduled at the end of 2009, the charter will not be renewed. By doing so, and by increasing resources in the Working Group, W3C hopes to accelerate the progress of HTML 5 and clarify W3C's position regarding the future of HTML. A FAQ answers questions about the future of deliverables of the XHTML 2 Working Group, and the status of various discussions related to HTML. Learn more about the HTML Activity.
Well, I guess that makes the future of HTML pretty clear.
XHTML forces you to be neat.
For example, in HTML, you can write:
<img src="image.jpg">
This isn't very logical, because the img tag never gets closed. In XHTML, however, you're forced to close the tag neatly, like this:
<img src="image.jpg" />
I like using something that forces me to be neat.
Steve
The subtitle to the XHTML 1.0 recommendation:
A Reformulation of HTML 4 in XML 1.0
Many tools exist today to process XML. By using XHTML, you are allowing a huge set of tools to operate on your pages and to extract information programmatically.
If you were to use HTML, this would be possible too. There are tools in existence to parse HTML DOM trees. However, these tools can often be more specialized than those for XML. You may not find your favorite XML data processing tools compatible with HTML. Furthermore, there are so many uses for XML nowadays that you may be using XML for some other part of an application; why not also use that same XML parser to parse your web pages? This is the motivation behind XHTML.
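To illustrate the point about reusing an XML parser: a minimal sketch, assuming Python 3 and its built-in xml.etree.ElementTree, that reads a small, hypothetical XHTML page as ordinary XML. XHTML elements live in the http://www.w3.org/1999/xhtml namespace, so queries need a prefix mapping (the "h" prefix is an arbitrary choice).

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
NS = {"h": XHTML_NS}  # arbitrary prefix -> XHTML namespace, used in queries

# Hypothetical page. The DOCTYPE is left out to keep the sketch short; the
# stdlib parser does not load the XHTML DTD, so named entities such as
# &nbsp; would be rejected as undefined.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <h1>Products</h1>
    <a href="/a.html">A</a>
    <a href="/b.html">B</a>
  </body>
</html>"""

root = ET.fromstring(page)
title = root.find("h:head/h:title", NS).text
links = [a.get("href") for a in root.iterfind(".//h:a", NS)]
print(title, links)
```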
If you're already comfortable and familiar with HTML 4.01, you have an established project using HTML 4, and you don't have tons of spare time, just go with HTML 4.01. If you have spare time, learn XHTML 1.1 anyway, and start your new projects in XHTML 1.1 – there's no harm in doing so. If you're using something other than HTML 4.01 or are pretty unfamiliar with HTML 4 anyway, just learn XHTML 1.1.
Using XHTML with the correct DocType will force the browser to render the content in a more standards-compliant (strict) mode. This makes the different browsers behave better and, most importantly, more like each other. This makes your job as a web developer a lot easier, since it reduces the number of browser-specific tweaks needed to make the content look the same in all browsers.
Quirksmode.org has a lot of good info on this subject.
In my opinion, the strictness is, at least in theory, a good thing. Because HTML doesn't need to be strict (and because of the HTML5 junk), browsers have advanced error-correction algorithms that make the best of broken HTML. The problem is that the algorithms are not exactly the same, and they lead to really strange behaviour you can't predict. With XHTML, on the other hand, you typically have fine, valid XHTML, so the error-correction algorithms are not needed, i.e. the browser's behaviour is entirely predictable. In addition, strict code makes it easier for your tools to work with it. So you actually have nothing to lose by using XHTML, and some potential to gain. Things will get worse with plain HTML when HTML5 is finally out and "be open in what you accept" leads to the strange behaviour described above. But at least then it will be standardized strange behaviour. Sigh.
On the other hand, if you use a good IDE like Visual Studio, it's almost impossible to produce broken HTML code anyway, so the result is the same.
Use XHTML
Fails fast. If there are any inconsistencies they will be found during validation.
It encourages better design by separating semantic markup from presentation etc.
It's structured which means that you can treat it as a data object and run all sorts of queries against it. For example you could find all addresses or citations within your website.
You can do build-time optimizations. Since it's well-formed XML you can easily do find/replace operations during build time. Or any document management and manipulation.
You can write XSLT or other transformation scripts to programmatically transform your XHTML for other platforms. For example, you could have an XSLT for the iPhone that transforms all XHTML to make it compatible with, or more user-friendly on, the iPhone.
You are future proofing yourself. Transforming XHTML to newer semantics is again, very easy using transformation.
Search engines will continue to evolve to gather more semantic information as part of the programmable web.
DOM operations are more reliable since it's structured.
From an algorithmic perspective, it yields easier and faster parsing.
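Two of the points above (querying the page as data, and build-time find/replace) can be sketched with the same standard-library parser. This assumes Python 3; the page content, the old.example.com and cdn.example.com hosts and the rewrite rule are all hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
NS = {"h": XHTML_NS}
ET.register_namespace("", XHTML_NS)  # serialize as default xmlns, no ns0: prefixes

page = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
  <p>As <cite>Harold</cite> notes...</p>
  <address>1 Example Street</address>
  <img src="http://old.example.com/logo.png" alt="logo" />
</body></html>"""

root = ET.fromstring(page)

# Treat the page as data: collect every citation and address.
cites = [c.text for c in root.iterfind(".//h:cite", NS)]
addresses = [a.text for a in root.iterfind(".//h:address", NS)]

# A build-time rewrite: point images at a (hypothetical) CDN host.
for img in root.iterfind(".//h:img", NS):
    img.set("src", img.get("src").replace("old.example.com", "cdn.example.com"))

output = ET.tostring(root, encoding="unicode")
print(cites, addresses)
print(output)
```

The same loop structure works for any well-formed XHTML file, which is what makes this kind of scripted maintenance cheap compared with HTML that first has to be repaired.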
XHTML is a good starting point because, if you want valid code, you need to provide some help to the disabled community: screen readers need the alt and title attributes on image and link tags.
It must be faster to parse to an extent, because, unlike with HTML, the parser doesn't need to check whether tags were closed properly, nested correctly, etc.
It is also better to use because, yes, it is strict, but (in my opinion) it helps you think more logically when it comes to learning programming languages.
I believe XHTML is (or should be) faster to parse. A valid XHTML document must be written to a stricter spec in that errors are fatal when parsing, whereas HTML is more lenient and allows for oddities mentioned before my comment like out of order closing tags and such. I found this helpful in uncovering the differences between HTML and XHTML parsing:
http://wiki.whatwg.org/wiki/HTML_vs._XHTML#Parsing
One reason you might use XHTML over HTML is if you intend to have mobile users as part of your audience. If I recall correctly, many phones use something closer to an XML parser than an HTML one to display the web. If you are writing for desktop browsers, HTML would probably be acceptable.
That said, if you are going to serve the data as text/html anyway, you should use HTML:
http://www.hixie.ch/advocacy/xhtml

HTML vs XHTML does it still matter? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I'm wondering if I should bother about the markup language at all, as long as I produce valid markup.
I've read articles arguing that HTML is the best choice, and they come straight from the horse's mouth (the browser implementors!):
http://webkit.org/blog/68/understanding-html-xml-and-xhtml/
https://developer.mozilla.org/en/Mozilla_Web_Developer_FAQ
Other articles, by James Bennett, make the further point that if you're not serving XHTML as XML, then you don't want XHTML but HTML.
http://www.b-list.org/weblog/2008/jun/18/html/
http://www.b-list.org/weblog/2008/jun/21/xhtml/
So I thought that if I wanted to trigger standards-compliant mode I should just validate against HTML Strict. But that's no longer necessary with modern browsers (i.e. everything but IE6): valid XHTML Strict also triggers standards-compliant mode. Hence, as long as I produce valid markup, why bother?
I always use HTML 4.01 Strict for the time being; HTML 5 isn't final yet. I used to be a diehard XHTML user, but my reasons wore away, and I'm much happier and more productive now.
The arguments for XHTML generally tend towards the "cleaner markup" or talking about well-formed markup. This seems mostly like a strawman argument, and doesn't hold up under a thorough beating.
If XHTML is guaranteed to be parsed by an XML parser, it generally won't look cleaner than HTML 4.01 strict (just comparing strict doctypes).
For one, having to write URLs as http://example.com/?foo=bar&amp;baz=qux looks awkward. Declaring the entity types gets old.
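The awkwardness comes from the fact that what you type and what the parser returns differ. A short sketch, assuming Python 3 and using the example URL above: an XML parser rejects the bare & outright, but decodes &amp; back to a plain & in the attribute value, so the actual link target is unchanged.

```python
import xml.etree.ElementTree as ET

raw = '<a href="http://example.com/?foo=bar&baz=qux">link</a>'
escaped = '<a href="http://example.com/?foo=bar&amp;baz=qux">link</a>'

try:
    ET.fromstring(raw)
    raw_ok = True
except ET.ParseError:
    raw_ok = False  # a bare "&" starts an entity reference that never ends with ";"

# The parser turns "&amp;" back into "&", so the link target is the real URL.
href = ET.fromstring(escaped).get("href")
print(raw_ok, href)
```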
The other thing is that markup generally doesn't translate remarkably well into an XML tree, whereas a DOM tree is fine.
HTML 4.01 Strict is moderately easier to use and to create valid sites with. You don't have to put meaningless self-closing slashes on elements like <img>, <br> or <link>. Just putting in the forward slash doesn't change anything of any particular meaning.
Douglas Crockford, of Yahoo and everything markupy, says it is best to think of the markup as an application delivery format.
As such, the question is what is easiest to deliver and most robust and reliable. This is what ultimately made the decision for me. All web browsers handle XHTML differently and require munging of the Content-Type header: if you use "application/xhtml+xml" or "text/xml" you get different results.
Additionally, "text/xml" doesn't play nicely with REST because that should mean XML serialization of the data and not a formatted markup page (Safari gets this one wrong, in my opinion, by requesting text/xml before text/html as desired content-types!)
So, use HTML 4.01 because:
It works more similarly across all browsers
Doesn't require Content-type based handling (text/html does the trick across the board)
Isn't as brittle as XHTML
HTML 5 doesn't offer anything significant over HTML 4.01 strict
Pick one and stick to it, and be as compliant as possible in the face of other constraints. Don't think for a second, however, that HTML vs. XHTML is a more important issue than getting the job done and the site up and running, because that's what's going to bring in the revenue. Users don't give a toss about XHTML.
If you go XHTML, please choose XHTML 1.0, not XHTML 1.1, unless you intend to serve it with a correct XML or XHTML MIME type. Actually, on second thought, don't do that either. There are huge, crippling disadvantages to serving XHTML 1.0 or 1.1 with the correct MIME type: the slightest error and you get the yellow screen of death!
The W3C specs say that it is okay to serve XHTML 1.0 as text/html as long as you follow certain rules for backward compatibility (mainly for self-closing tags: include a space before the forward slash).
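As an illustration of that compatibility rule: a minimal sketch, assuming Python 3, showing that xml.etree.ElementTree already serializes empty elements in the <br /> form, with the space before the slash that older HTML parsers cope with.

```python
import xml.etree.ElementTree as ET

# Build <p>line one<br />line two</p> programmatically.
p = ET.Element("p")
p.text = "line one"
br = ET.SubElement(p, "br")
br.tail = "line two"  # text that follows <br /> inside the <p>

# Empty elements are written with " />" (space before the slash).
markup = ET.tostring(p, encoding="unicode")
print(markup)
```

This is not a complete text/html-compatible serializer, though: the same behaviour turns an element that happens to be empty, such as <p></p>, into <p />, which the compatibility guidelines advise against.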
Aside from that, there are other arguments for and against. I tend to use XHTML because of the various tools and libraries available that can parse valid XML, making things like XSLT transformations to and from XHTML possible (useful?).
Another thing is that it's possible to parse valid XML in Flash, so you may choose XHTML for dynamic content replacement within a Flash movie, or to otherwise load XHTML content into a Flash movie dynamically. Really, anything that can read XML can read XHTML, and that's a lot of things.
My advice is to use HTML4 Strict or HTML5. Valid, in standards mode, with CSS for layout. You'll get all the benefits that are commonly associated with XHTML, but without any of the problems.
Remember: XHTML DOCTYPE does not enable parsing of document as XHTML. It only enables standards mode, the same which is available to HTML 4 Strict and HTML 5.
XHTML/1.0 has identical semantics and practically identical CSS support as HTML 4.01.
Valid HTML 4.01 is parsed unambiguously just like valid and well-formed XHTML/1.0.
XML DOM gives you namespaces support, but takes away support for document.write and innerHTML.
Without proper XML MIME type set in HTTP headers (not document itself) all you get is parsing of everything as HTML and HTML DOM.
XHTML is still not supported in Internet Explorer at all (including IE8). The best you can get in IE (and Googlebot) is XHTML misinterpreted as HTML with syntax errors (whether that's 70% or 30% of your audience, it's still something to think about).
Try forcing "XHTML" websites to use actual XML mode, and you'll quickly notice that almost nobody uses XHTML. They just slap the wrong DOCTYPE on their HTML:
http://schneegans.de/xp/?url=http://www.wired.com/
http://schneegans.de/xp/?url=http://script.aculo.us/
These "XHTML" pages work only because they're usually interpreted as HTML.
I would go for XHTML, mostly because the code is cleaner and easier to maintain.
But here are some interesting points on why not to use XHTML: http://meiert.com/en/blog/20081219/html-vs-xhtml/
If you really don't care about sketchy support, try HTML 5. My fallback would be HTML 4.01 Strict, unless I needed something like inline SVG or similar, in which case it's XHTML (served as XML) all the way.
In most cases it does not matter; in fact, using XHTML causes more headaches. However, there are a few situations where XHTML is needed. I can think of two.
First, if you want to use embedded SVG, you need XHTML.
Second, if you want to use HTML markup as XML. Sometimes (for unknown reasons) I found that my AJAX requests verified the markup even when I marked it as html or text, and to quickly avoid that, I changed it to XHTML.
That is all.

What problem does XHTML strict solve?

I really don't understand the fascination with XHTML Strict. Inline JavaScript typically requires a rat's nest of escapes to make it compatible with XHTML and semi-backwards-compatible with MSIE 5 & 6. Then there is the issue of not being OCD enough about user input to make sure you don't miss any illegal characters. It just seems like more effort than it's worth. Never mind that almost every developer I've worked alongside keeps forgetting to make sure the content-type returned from the server for XHTML pages is changed from text/html to application/xhtml+xml.
Wish I knew the name of the blogger, but someone else pointed out that a majority of supposedly XHTML compliant websites and open source packages are actually not because of that last issue, forgetting to set the content-type header correctly.
I'm looking to understand why XHTML is useful, or build enough of an arsenal of arguments to prevent it ever being used in future projects that I have influence on.
XHTML1 vs HTML4 and Strict vs Transitional are completely orthogonal issues.
XML might not give any huge advantage to browsers today, but on the server end it's an order of magnitude easier to process documents using XML than trying to parse the mess that is old-school-SGML-except-not-really HTML4.
Restricting yourself to [X]HTML Strict doesn't achieve anything in itself, other than simply that it discourages the use of old, less-maintainable techniques you shouldn't be using anyway.
Inline javascript typically requires a rats nest of escapes to make it compatible with XHTML
You can get away without any escapes as long as you don't use the characters < or &. And ‘//<![CDATA[’ isn't really much worse than ‘<!--’ was in the old days.
In any case, keeping the scripting external is much more manageable; you don't want to be doing anything significant inline.
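A quick way to check the claim about escapes: a sketch, assuming Python 3, that parses the same made-up inline script with and without the commented-out CDATA wrapper. The < and && are fatal to an XML parser only when left bare; inside the CDATA section they are plain text, and the leading // keeps the wrapper lines harmless to JavaScript when the page is parsed as HTML instead.

```python
import xml.etree.ElementTree as ET

# Inline script containing "<" and "&&", which are markup-significant in XML.
unwrapped = "<script>if (a < b && c) { go(); }</script>"
wrapped = ("<script>//<![CDATA[\n"
           "if (a < b && c) { go(); }\n"
           "//]]></script>")

def parses(markup: str) -> bool:
    """True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

print(parses(unwrapped), parses(wrapped))

# The CDATA content comes back as ordinary text, "<" and "&&" intact.
script_text = ET.fromstring(wrapped).text
print(script_text)
```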
Then there is the issue of not being OCD enough on user input to make sure you don't miss any illegal characters.
Out-of-band characters are exactly as invalid in HTML4 Transitional as in XHTML1 Strict.
If you're accepting user-submitted HTML and not checking/escaping it with enough of a fine tooth comb to prevent well-formedness errors you have much bigger problems than just complying with a doctype. You'll be letting injection hacks through and making your site vulnerable to cross-site-scripting security holes.
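For the plain-text case, escaping is enough to keep both problems away. A minimal sketch, assuming Python 3: render_comment is a hypothetical helper that passes untrusted text through the standard html.escape function before wrapping it in markup, and the check confirms that even hostile input stays well-formed and comes back unchanged as text.

```python
import html
import xml.etree.ElementTree as ET

def render_comment(user_text: str) -> str:
    """Hypothetical helper: wrap untrusted text in a <p>, escaped.

    html.escape turns &, <, > and (with quote=True) both quote characters
    into character references, so the input cannot open new tags.
    """
    return '<p class="comment">' + html.escape(user_text, quote=True) + "</p>"

hostile = "<script>alert('x')</script> & <p><p><p>"
fragment = render_comment(hostile)

# The escaped fragment parses as XML, and the text round-trips unchanged.
assert ET.fromstring(fragment).text == hostile
print(fragment)
```

This only covers text that should contain no markup at all. Accepting a subset of user HTML needs a whitelist-based sanitizer, which is a separate and harder problem.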
forgetting to ensure the content-type returned from the server is reset for XHTML pages from text/html to application/xhtml+xml.
It's not ‘forgetting’, it's deliberate: there is not really that much point in serving application/xhtml+xml today. To account for IE you have to sniff UA, and then make sure you understand the CSS and JavaScript differences that pop up in both parsing modes... you can do it to prove your technical prowess, but it doesn't really get you anything.
Serving XHTML as legacy HTML may not be ideal, but it lets you keep the simpler, more processable syntax of XML (and potential interoperability with other XML languages like SVG) whilst still being browser-friendly.
People complain about the pickiness of the well-formedness errors, but having those errors picked up straight away for you to fix them is way better than leaving them there silently, ready to trip up some future browser.
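If you do decide to serve application/xhtml+xml, a common alternative to the UA sniffing mentioned above is to look at the request's Accept header, since browsers that cannot render XHTML (such as IE up to version 8) do not list application/xhtml+xml there. The sketch below assumes Python 3 and is an illustration, not a full RFC-compliant parser: choose_mime_type is a hypothetical helper, it only honours an explicit application/xhtml+xml entry (wildcards such as */* are deliberately ignored), and malformed q-values are treated as 0.

```python
def choose_mime_type(accept_header: str) -> str:
    """Hypothetical helper: pick application/xhtml+xml only when the client
    lists it explicitly with a q-value at least as high as text/html's."""
    prefs = {}
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs[media] = q
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    html_q = prefs.get("text/html", 0.0)
    if xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"

print(choose_mime_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(choose_mime_type("image/gif, image/jpeg, */*"))
```

Falling back to text/html for anything unrecognised keeps the failure mode the forgiving one, which fits the answer's point that serving legacy HTML is the browser-friendly default.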
There is a great post about the usage of XHTML: Beware of XHTML.
Hope it helps,
Bruno Figueiredo
XHTML 1.0 Strict tries to solve four problems:
XML is W3C technology and HTML4 wasn’t using it. Not your problem.
Strict seeks to be more theoretically pure than Transitional when it comes to presentationalism. But this is not an XHTML vs. HTML issue.
XML parser is supposedly simpler. (Not entirely true; the code for dealing with the DTD part is pretty complex.) These days, you get both XML and HTML parsers off-the-shelf, so this isn’t your problem. (Aside: the mobile argument is utterly bogus.)
application/xhtml+xml (though not valid XHTML 1.0 Strict!) allows you to mix other vocabularies. If you want to use inline MathML or SVG today, this is the main reason to use application/xhtml+xml today. However, the direction the HTML5 work is taking is making it possible to use MathML and SVG in text/html.
XHTML is useful because it's much easier to create a simple transforming stylesheet or roll your own parser for it, than it is for HTML.
Do you have to parse your HTML with a program, or for some tests? Then use XHTML.
For everything else, HTML 4.01 (strict, loose, transitional, whatever) is perfectly "standard" and less "troublesome".
XHTML enables advanced rendering such as SVG (Scalable Vector Graphics), which is itself XML but can easily be embedded in XHTML through XML namespaces, without <embed> or <object>. Unfortunately, only Firefox and Safari support it. Sorry, IE6 users.
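To show what the namespace mechanism buys you: a minimal sketch, assuming Python 3, that parses a small, hypothetical XHTML page with an inline SVG circle. The SVG elements keep their own namespace (http://www.w3.org/2000/svg), so XML tools can pick them out separately from the XHTML ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "h": "http://www.w3.org/1999/xhtml",
    "svg": "http://www.w3.org/2000/svg",
}

# Hypothetical page mixing XHTML and SVG; no <embed> or <object> needed.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
  <p>A red circle:</p>
  <svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
    <circle cx="50" cy="50" r="40" fill="red" />
  </svg>
</body>
</html>"""

root = ET.fromstring(page)
# Each element carries its namespace in its tag, e.g. "{...svg}circle".
circle = root.find(".//svg:svg/svg:circle", NS)
paragraph = root.find(".//h:p", NS).text
print(circle.tag, circle.get("r"), paragraph)
```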
For more on SVG, see http://en.wikipedia.org/wiki/Svg
XHTML makes HTML orthogonal with all the other xml-based structures in our universe, which has two primary benefits.
Design patterns we use in dealing with xml can be applied to html.
Software tools ditto.
XHTML has the advantages of xml. But then why the strict variant?
I see some similarities with deprecated functions: you can still use them in this version, but they may be removed in the next one. So I see the Transitional version as deprecated usage. It still works, and it will keep working for a couple of versions, but if you want to build for the future, use the Strict version.
Strict is intended to formalize the separation between content and style by making it more difficult to commingle the two. Elliotte Rusty Harold has a good write up on XHTML in one of his books, here's the relevant excerpt on 'Why XHTML'.
The only thing I've seen solved by XHTML is the "problem" of users using Safari: I don't know if the bug is still there, but when we were last asked to write in XHTML, we ran across a bug that made XHTML unusable with Safari. In XHTML, the following URL isn't allowed in anchor tags, because the ampersand isn't escaped:
http://www.example.com/page.php?arg1=val1&arg2=val2
so what you have to do is replace it with &amp; like this:
http://www.example.com/page.php?arg1=val1&amp;arg2=val2
but Safari converts &amp; to &#38; so you get this URL:
http://www.example.com/page.php?arg1=val1&#38;arg2=val2
...and the hash symbol ends the URL as far as PHP is concerned. I know that there are ugly hacks that allow you to pass two variables in other ways, but if XHTML is going to force you to use ugly hacks, then you're better off without it.
Personally, I liked the concept of XHTML: much cleaner than most of the HTML we see, and easier to parse and validate. Like everybody, I started to code XHTML pages. BTW, I don't see an issue with inline JavaScript; there's no need for escapes if you put the code in CDATA. And IE5 is fortunately mostly gone from the browser landscape, like Netscape 4, which forced us to write ' />' (with a space) instead of '/>', something I still sometimes see in pure XML...
Now, I have read a number of articles, like the one linked by Bruno, which make a lot of good arguments against its use in most cases. Basically, they say most browsers just aren't ready for strict XHTML (served as XML), it doesn't make much sense to serve XHTML as HTML, and anyway it isn't that useful for the majority of sites.
Look at the arguments above: they are perfectly valid, and it is great to be able to put MathML or SVG directly in the page, to transform XML with an XSLT parser, to process the page with an XML parser.
But how often do you do that? Parsing the page is most often the problem of end users, who can use a good HTML parser. And given the number of browsers able to handle MathML, SVG or XSLT, it is more of a need for intranets than for the vast Internet.
You can have an e-commerce site, a blog or a forum that spits out good XHTML pages, and the people writing the descriptions, articles or messages will insert <p><p><p> to skip some lines, if not <p/> or some other exotic construct...
I believe in XHTML, but I think I will no longer use it for the little pages I do for my site. I will use HTML 4 with well written code (quoted attributes, closing tags even if optional, etc.).
And after all, if the W3C is working on HTML 5, it is for a reason: HTML still has a life ahead of it; otherwise it would have been killed in favor of XHTML 2.
XHTML is by definition XML, unlike HTML.
This means you can do funky useful stuff with it, such as easily validate and parse it (since you know it's XML and thus can use the myriad of tools available).
Also, geeks like to make things "more correct" ;-)
This is a global standard issue
This is not just about xHTML, but about all the standards in the world. You need to make things clearer, from version to version.
xHTML is square and pushes coders to add semantic value to the code. It's fully XML compatible and therefore more easily parseable, styleable, etc.
Remember that code is not just for coders, but for machines too. In 10 years, people creating browsers or libraries won't want to implement the same complex rules for old HTML processing, but will rather expect something as clean as possible.
Search engines need something to rely on to build semantic links between values, so it's better if there is only one easy way to do it.
And I am not talking about screen readers...
A standard is, above all, about moving toward one open solution that fits everybody's needs, not just about adding shiny new features.