HTML Validator for websites - html

I have website templates and want to validate its HTML. I've heard that there is a w5 or something like that, I don't remember the validator to check HTML errors. So can anyone post me a link of that?

http://validator.w3.org/ This is the standard.
This checks the markup validity of Web documents in HTML, XHTML, SMIL,
MathML, etc.
For things like CSS validity you should take a look at: http://jigsaw.w3.org/css-validator/

The website you are looking for is W3 Markup Validation Service (http://validator.w3.org/). The W3 stands for World Wide Web.

This depends on what you expect validation to be. The W3C Markup Validation Service http://validator.w3.org/ is the usual answer, but it answers a question different from what people usually mean when they ask for validation.
It is originally based on the SGML concept of validation, which is a purely formal process and does not check for correctness in general, only syntax rules specified in a formalized way. In addition to detecting actual markup errors, it issues messages about violations of formal syntax that do no harm in any browser (e.g. title=Hello! without quotation marks) and accepts constructs that do not work in any browser (say <em/Hello). More info: “HTML validation” is a good tool, but just a tool.
The W3C Markup Validation Service isn’t really a pure SGML validator (or a pure XML validator) but has some checks and warnings beyond that. There’s a more pure SGML validator: WDG HTML Validator.
When processing a document declared as HTML5, using <!DOCTYPE html>, the W3C Markup Validation Service changes it nature: it becomes an ad hoc checker for specifically HTML5, behaving rather differently from an SGML validator. It is described as experimental, and it does not necessarily correspond to the most recent version or HTML5 (or of “HTML Living Standard”). Technically it is based on Validator.nu Living Validator.
With very small exceptions, these services do not check for the correctness of embedded or linked CSS and JavaScript at all, or links, semantic correctness, usability, accessibility, etc. So a “valid” page may look complete mess and fail to work at all.
There is also software marketed as “HTML validator” (as you can find out by googling), typically performing a large set of heuristic checks, partly based on HTML specifications, partly on other specs, and partly by software vendor’s own ideas of what is correct.

Related

Is there a better approach to html standards validation than w3c validator?

Granted, this is a very generic question, but I am wondering if w3c validation is considered a best practice for html validation, or if there are better approaches to ensure contemporary standards-compliant markup.
This question arose when I noticed duplicate IDs on an MDN page (a site I would have assumed would be very strict about its coding practices). It appeared to be an artifact of how they generated the sections of the page.
Curious, I validated the page's code on the w3c validator, and there were various "errors" that suggested that MDN was just ignoring that a certain attribute or value was not valid. Generally, these related to seemingly appropriate uses of rel attributes.
I was left wondering if standards for valid, semantic markup matter less, or if there's a new ideal approach to code validation and standardization than relying on w3c validation.
Maintainer of the current W3C HTML Checker (validator) here. I think it's important to understand the intended purpose of the current HTML checker, which is different from the purpose of the legacy W3C Markup Validator.
The purpose of the checker is documented at https://validator.w3.org/nu/about.html#why-validate:
The core reason to run your HTML documents through a conformance checker is simple: To catch unintended mistakes—mistakes you might have otherwise missed—so that you can fix them.
Beyond that, some document-conformance requirements (validity rules) in the HTML spec are there to help you and the users of your documents avoid certain kinds of potential problems.
There are some markup cases defined as errors because they are potential problems for accessibility, usability, interoperability, security, or maintainability—or because they can result in poor performance, or that might cause your scripts to fail in ways that are hard to troubleshoot.
Along with those, some markup cases are defined as errors because they can cause you to run into potential problems in HTML parsing and error-handling behavior—so that, say, you’d end up with some unintuitive, unexpected result in the DOM
Validating your documents alerts you to those potential problems.
So as far as your question about "are better approaches to ensure contemporary standards-compliant markup", the answer is that it's not an either-or thing; there are a variety of approaches and the W3C HTML Checker is just one of them, and its goal isn't to be the single way to determine anything but instead to just help you catch mistakes you might otherwise miss and that might cause unexpected problems for your users.
As far as ways to get alerted to specific device issues or browser-implementation issues, we don’t have good automated checking tools for that, but a couple of things which are huge help there are:
https://caniuse.com/ — detailed information about the level of support for particular web-runtime features in different browsers, and in different versions of those browsers, and in release of the browsers for mobile devices vs desktop
https://wptdashboard.appspot.com/ — current test results across all major browser engines for dozens of web-runtime features/specs; if https://caniuse.com/ doesn’t have information about a particular feature, you can look through this dashboard and browse to the directory that has tests for that feature, and find whether a browser passes the tests for the feature
But as far as good automated tools we do actually have for checking other things, here are two:
https://validator.w3.org/i18n-checker/ — W3C Internationalization Checker
https://observatory.mozilla.org/ — for doing a security assessment of the content of your site
I recently faced a problem with mentioned above W3C HTML Checker. I respect a huge amount of work that was done by author of this validator, but it did not allow me in any way a tag <script type="text/vbscript" src="file.vbs">. It was said to change type value to empty string, a JavaScript MIME type, or module, which makes my page useless.
I know than VBScript language is rarely used now, it was just a test page, but let me share with you less tricky alternative, as good as the first one for HTML error checking.
Maintainer of the current JsonFormatter (validator) is here

Declaring Doctype HTML5 [duplicate]

The "old" HTML/XHTML standards have a DTD (Document Type Definition) defined for them:
HTML 4.01 http://www.w3.org/TR/html401/sgml/dtd.html
XHTML 1.0 http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_XHTML-1.0-Strict
This DTDs specify the rules for nesting elements - "which types of elements may appear in which types of elements". I made a diagram for XHTML 1.0 here (sorry, I no longer have that resource)
I would like to update that diagram with a new version which also includes the new HTML5 elements. However, there doesn't seem to be a HTML5 DTD. It seems that the nesting rules are defined by the various content models that are defined in HTML5.
So there is no DTD, correct?
Follow-up question: Is there a reason why there is no DTD in HTML5? The DTD is such a nice method of defining the nesting rules for all the different types of elements. Why wouldn't they include such a thing?
Update: I found this: http://www.w3.org/TR/html5/dom.html#kinds-of-content I guess, this is the closest to having a DTD.
Update: The Visual Studio Team made a XML Schema for XHTML5. I guess that answers my question: Link
There is no HTML5 DTD. The HTML5 RC explicitly says this when discussing XHTML serialization, and this clearly applies to HTML serialization as well.
DTDs have been regarded by the designers of HTML5 as too limited in expressive power, and HTML5 validators (basically the HTML5 mode of http://validator.nu and its copy at http://validator.w3.org/nu/) use schemas and ad hoc checks, not DTD-based validation.
Moreover, HTML5 has been designed so that writing a DTD for it is impossible. For example, there is no SGML way to capture the HTML5 rule that any attribute name that starts with “data-” and complies with certain general rules is valid. In SGML, attributes need to be listed individually, so a DTD would need to be infinite.
It is possible to design DTDs that correspond to HTML5 with some omissions and perhaps with some extra rules imposed, but they won’t really be HTML5 DTDs. My experiment with the idea is not very encouraging: too many limitations, too tricky, and the DTD would need to be so permissive that many syntax errors would go uncaught.
Correct. There is no DTD. However, HTML5 documents should start with <!DOCTYPE html>
So there's a DOCTYPE, but no DTD.
See:
http://dev.w3.org/html5/spec/syntax.html#the-doctype
http://en.wikipedia.org/wiki/Document_Type_Declaration#HTML5_DTD-less_DOCTYPE
I have created an HTML5 DTD for use in my PHP XML projects. It ain't beautiful, but it works with well-formed XHTML5 (that is, HTML5 expressed as XML).
You can grab it from my bitbucket account here:
https://bitbucket.org/kashbridge/dtd/overview
Enjoy!
Certain Marcus from sgmljs.net created and analyzed an SGML DTD for HTML 5.1 and started a thread in the XML-DEV mailing list for review and discussion. The discussion revolves around entity definitions so far.
I've just completed my analysis of W3C's HTML 5.1 recommendation at
http://sgmljs.net/docs/html5.html (from a markup language rather than
web development PoV), and I'm publishing it here for review in the
form of an initial SGML DTD for parsing HTML 5.1, along with a lengthy
analysis text.
[…]
I'm aware that WHATWG and W3C have since long moved away from SGML
(and XML in most web-related specification work), treating it as a
legacy technique and with a somewhat presumptuous attitude in the
specification text and elsewhere. But as the analysis of HTML5's
grammar shows, they've essentially abandoned use of any formal methods
alltogether (and it shows in at least two flaws discussed in the
analysis).
Nothing official yet, but maybe this initiative will get traction, or at least find its users as an unofficial resource.
I think they did away with the old DTDs, now we just start HTML pages with: <!DOCTYPE HTML>
Maybe the W3C will come out with one eventually.

html validator versus SU:BADGE

How to fix that
element "SU:BADGE" undefined <su:badge layout="5"></su:badge>
I use HTML 4.01 Transitional on my website ...
The BADGE is from stumbleupon.com website
Thank you . Regards.
Since such markup is not valid in HTML 4.01, the validator is just doing what you asked it to do: reporting any reportable markup errors.
The only practical reason why such error messages might be disturbing is that if there are many of them, you might accidentally miss to notice some other error messages, relating to constructs where your markup unintentionally deviates from HTML 4.01. If this is an issue, consider writing a custom DTD. It requires some understanding of SGML, but so does the use of validators in general, does it not?
On the other hand, you might decide to refrain from using tags suggested by people who cannot do such a simple thing using valid HTML. There are many ways to put information on a web page in a manner that does not affect the visible appearance but can be retrieved by programs that search for specific constructs (e.g., meta tags). They decided on a way that causes problems to authors who wish to use validtors.
You don’t.
Validation is nice (especially as a sanity check while developing) but if you don’t use valid markup then there’s nothing you can do, and if you need invalid markup (because you use a third-party API for instance) then validation simply ceases to be relevant.
Alternatively, you could serve your page as XML instead of HTML 4 and define an appropriate su XML namespace. But since this would mean that some browsers no longer display your page correctly this is more a theoretical possibility.

Yet another question regarding the html5 dtd/schema

If there is no DTD or schema to validate the H5 document against, how are we supposed to do document validation? And by document validation, I mean "how are we supposed to ensure our html5 documents are both syntactically accurate and structurally sound?" Please help! This is going to become a huge problem for our industry if we have no way to accurately validate HTML5 documents!
Sure, the W3C has an online tool that validates individual pages. But, if I'm creating A LOT of pages (hundreds, say) and I want to validate them in a sort of batch mode, what is the accepted method of ensuring valid structure and syntax? I mean, it seems rather rudimentary to just look at the document and say "yep. that's a valid xml document." What about custom tags? What about tag attributes? It seems like the W3C is leaving us out in the cold a little bit here.
Maybe the best answer will be found in the HTML editor. But then you get DTD/schema fragmentation. Each editor vendor coming up with their own rendition of what a valid structure is.
Maybe the answer is "wait for HTML5 to become official". But I really can't wait for that. I need to start creating and validating content now. I have applications I want to publish that can only be accomplished with html5.
So, any thoughts?
If there is no DTD or schema to validate the H5 document against, how are we supposed to do document validation?
With a specialized HTML5 validator rather then a generic SGML or XML validator.
Obviously, as the specification is still in draft form, the tools that do exist are immature and likely to be out of date or become out of date.
Sure, the W3C has an online tool that validates individual pages. But, if I'm creating A LOT of pages (hundreds, say) and I want to validate them in a sort of batch mode, what is the accepted method of ensuring valid structure and syntax?
Either use a different tool or download the W3C validator and run a local copy. It has a SOAP API so writing a batch validation tool isn't difficult.
What about custom tags?
HTML5 doesn't allow custom elements.
What about tag attributes?
The only custom attributes in HTML5 are data-* attributes, so an HTML 5 validator can recognize them.
It seems like the W3C is leaving us out in the cold a little bit here.
It seems like you expect the state of QA tools for HTML 5 (unfinished) to be up to the same standard as those for HTML 4 (over a decade old). This isn't a realistic expectation.
Maybe the best answer will be found in the HTML editor. But then you get DTD/schema fragmentation. Each editor vendor coming up with their own rendition of what a valid structure is.
The specification is clear (although in flux) even if it isn't expressed in the form of a DTD or schema. If each editor has a different idea of what is valid, then most or all of them are going to be either out of date or just buggy.
Maybe the answer is "wait for HTML5 to become official". But I really can't wait for that. I need to start creating and validating content now. I have applications I want to publish that can only be accomplished with html5.
If you need to live in the bleeding edge, then you have to accept the limitations and risks of doing so.
You might find this question/answer interesting: Will HTML 5 validation be worth the candle? . The answer is written by the developer of http://about.validator.nu/ .
You should start by taking a look at http://about.validator.nu/ .
Some, though not all, of your concerns are addressed there. You can host your own validator, there's a python based submission script, you can use a RESTFUL web service API and there are ways to get validation output in a variety of different forms.
I can't however see a simple way to integrate XHTML5 with other applications of XML such that one can easily create a validator of such compound documents. Not that there's really been a way to do that with earlier versions of XHTML either though.
This is working well for me: https://github.com/hober/html5-el
To get this to work, I renamed the default '/etc/schema/schemas.xml' file in order to move it out of the way and let the 'html5-el' one be used by nxml-mode.
If there is no DTD or schema to validate the H5 document against, how are we supposed to do document validation? And by document validation, I mean "how are we supposed to ensure our html5 documents are both syntactically accurate and structurally sound?" Please help! This is going to become a huge problem for our industry if we have no way to accurately validate HTML5 documents!
If testing pages with either Firefox or Opera, both of those will report errors such as code that is not "well-formed" and mismatched tags. Beyond that, one of the validators such as validator.w3.org or validator.nu will definitely help.
Sure, the W3C has an online tool that validates individual pages. But, if I'm creating A LOT of pages (hundreds, say) and I want to validate them in a sort of batch mode, what is the accepted method of ensuring valid structure and syntax? I mean, it seems rather rudimentary to just look at the document and say "yep. that's a valid xml document."
There are ways to run the W3C validator in batch mode.
What about custom tags? What about tag attributes? It seems like the W3C is leaving us out in the cold a little bit here.
The easy answer to that one is that "custom tags" are simply not considered valid. The Working Group has thoroughly addressed the issue of "distributed extensibility", particularly with respect to allowing "decentralized
parties to create their own languages" and "extension attributes" (http:// lists.w3.org/Archives/Public/public-html/2011Feb/0085.html). There are numerous ways to extend HTML (http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#extensibility) but adding custom tags is not one of them. Custom data and microdata attributes should validate fine.
Maybe the answer is "wait for HTML5 to become official". But I really can't wait for that. I need to start creating and validating content now. I have applications I want to publish that can only be accomplished with html5.
Since HTML 5 was stabilized at the end of last year (Dec. 2010), IMO we don't need to wait for it to become an official "recommendation" by the W3C. The stabilized spec provides a solid base that all browser vendors can implement consistently and for the ongoing evolution beyond HTML 5 of the spec, which is now being called the "HTML Living Standard" (Jan. 2011 and later). There is a good diagram of this at http://www.HTML-5.com/html-versions-and-history.html#html-versions (scroll down to see the diagram).

At the end of the day, why choose XHTML over HTML? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
I wonder why I should use XHTML instead of HTML.
XHTML is supposed to be "modularized", but I haven't seen any server side language take advantage of any of that.
XHTML is also more strict, and I don't see the advantage. What does XHTML offer that I need so bad? How does it make my code "better"?
EDIT: another question I found in the comments: Does XHTML parse faster than HTML?
EDIT2: after reading all your comments and the links, I indeed agree that another post deserves to be the correct answer, so I chose the one that directly links to the best source.
Also, goes to show that people upvote the green comment without even reading it.
You should read Beware of XHTML, which is an informative article that warns about some of the pitfalls of XHTML over HTML.
I was pretty gung-ho about XHTML until I read it, but it does make several valid points. Including the following bit;
XHTML 1.x is not “future-compatible”. XHTML 2, currently in the drafting stages, is not backwards-compatible with XHTML 1.x. XHTML 2 will have lots of major changes to the way documents are written and structured, and even if you already have your site written in XHTML 1.1, a complete site rewrite will usually be necessary in order to convert it to proper XHTML 2. A simple XSL transformation will not be sufficient in most cases, because some semantics won't translate properly.
HTML 4.01 is actually more future-compatible. A valid HTML 4.01 document written to modern support levels will be valid HTML 5, and HTML 5 is where the majority of attention is from browser developers and the W3C.
Future compatibility can be huge when working on some projects. The article goes on to make several other good points, but I think that may have stood out the most for me.
Don't mistake the article for a rant against XHTML, the author does talk about the good points of XHTML, but it is good to be aware of the shortcomings before you dive in.
I was going to add this as a comment to one of the other posts, but it grew a little too large.
What the fundamental point that most people seem to be missing, is the purpose behind XHTML. One of the major reasons for developing the XHTML specification was to de-emphasise presentation-related tags in the markup, and to defer presentation to CSS. Whilst this separation can be achieved with plain HTML, this behaviour isn't promoted by the specifcation.
Separating meta-markup and presentation is a vital part of developing for the 'programmable web', and will not only improve SEO, and access for screen readers/text browsers, but will also lead towards your website being more easily analysable by those wishing to access it programmatically (in many simple cases, this can negate the need for developing a specific API, or even just allow for client-side scripts to do things like, identify phone numbers readily). If your web-page conforms to the XHTML specification, it can easily be traversed using XML-related tools, and things such as XPath... which is fantastic news for those who want to extract particular information from your website.
XHTML was not developed for use by itself, but by use with a variety of other technologies. It relies heavily on the use of CSS for presentation, and places a foundation for things like Microformats (whether you love them, or hate them) to offer a standardised markup for common data presentation.
Don't be fooled by the crowd who think that XHTML is insignificant, and is just overly restrictive and pointless... it was created with a purpose that 95% of the world seems to ignore/not know about.
By all means use HTML, but use it for what it's good for, and take the same approach when looking at XHTML.
With regard to parsing speed, I imagine there would be very little difference in the parsing of the actual documents between XHTML and HTML. The trade-off will come purely in how you describe the document using the available markup. XHTML tags tend to be longer, due to required attributes, proper closing, etc. but will forego the need for any presentational markup in the document itself. With that being the case, I think you're talking about comparing one type of apple, with a very slightly different type of apple... they're different, but it's unlikely to be of any consequence (in terms of parsing and rendering) when all you want is a healthy, tasty apple.
For the visitor of a website it probably doesn't make any visible difference. Furthermore, XHTML is usually more of a pain to use as at least one widespread browser still doesn't know how to handle it and you need to serve it as text/html in that case (which yields invalid HTML).
If your HTML is going to be regularly processed by automated tools instead of being read by humans, then you might want to use XHTML because of its more strict structure and being XML it's more easy to parse (from an application standpoint. Not that XML is inherently easy to parse, though).
Apart from that I don't see any compelling reasons to use it, though. XHTML was created in an approach of making use of XML features for HTML and basically it boils down to "HTML 4 with several annoying side-effects" (IMHO, at least).
Use HTML (HTML4 Strict or HTML5).
HTML can fully utilize CSS, can be validated and parsed unambiguously. Separation of structure and presentation has been done in HTML4 and XHTML merely continued that.
All browsers support HTML. Only some browsers support XHTML and those that do, often have more mature and better tested and optimized support for HTML (it's caused by the fact that tiny fraction of pages uses XML mode).
If you care about IE and Google, you have to use HTML or subset of XHTML and HTML defined in Appendix C of XHTML spec. The latter is almost worst of the both worlds, because such XHTML cannot be generated with standard XML tools, cannot use extension mechanisms new to XHTML and has additional limitations over those in HTML alone.
XHTML1.0 is now over 10 years old, it was designed in "Web1.0" times, and as head of W3C said, in retrospect it didn't work out and better approach is needed. W3C HTML5 is written as we speak and addresses needs of web applications used today, and has very good backwards compatibility.
HTML5 closes many gaps that were between HTML4 and XHTML1 (e.g. adds inline SVG, MathML i RDF), cleans up language beyond what was done in XHTML1.0 and XHTML1.1.
XHTML2 is not going to be supported by web browsers in forseeable future. It's likely that it will never be supported (all browser vendors heavily support [X]HTML5, some have already declared that they won't implement XHTML2).
XHTML1.0 has exactly the same semantics and separation of presentation from structure as HTML4.01. Anybody who says otherwise, hasn't read the specification. I encourage everybody to read the spec – it's suprisingly short and uninteresting.
Stylesheets were introduced in HTML4.01 and were not changed in XHTML1.0.
Presentational elements were deprecated in HTML4.01 and were not removed in XHTML1.0.
XHTML myths.
There are no untractable differences in HTML and XHTML that would make parsing of one much slower than another. It depends how the parser is implemented.
Both SGML and XML parsers need to load and parse entire DTD in order to understand entities. This alone is usually more work than parsing of the document itself. HTML parsers almost always "cheat" and use hardcoded entities and element information. XHTML parsers in browsers cheat too.
Parsing of HTML requires handling of implied start and end tags, and real-world HTML requires additional work to handle misplaced tags.
Proper parsing of XHTML requires tracking of XML namespaces.
Draconian XML rules require checking if every character is properly encoded. HTML parsers may get away with this, but OTOH they need to look for <meta>.
The overall difference in cost of parsing is tiny compared to time it takes to download document, build DOM, run scripts, apply CSS and all other things browsers have to do.
I'm surprised that all the answers here recommend XHTML over HTML. I am firmly of the opposite opinion - you should not use XHTML, for the foreseeable future. Here's why:
No browser interprets XHTML as XHTML unless you serve it as mimetype application/xhtml+xml. If you just serve it with the default mimetype, all browsers will interpret it as HTML - eg, accepting unclosed or improperly nested elements.
However, you should never actually do this, as Internet Explorer does not recognise application/xhtml+xml, and would fail to render the page completely.
There are significant differences in the DOM between XHTML and HTML. Since all so-called XHTML pages are being served as HTML at the moment, all javascript code is written using the HTML DOM. If, support for the XHTML mimetype becomes significant enough to convince people to start using it, most of their javascript code will break - even if they think their pages validate as XHTML.
Instead of continuing to debate HTML 4.01 Strict vs XHTML Strict, I would suggest starting to use HTML 5 today. John Resig, the author of jquery, made a similar suggestion last year on his blog.
The HTML 5 doctype, in it's beautiful simplicity will trigger standards mode in all browsers (including IE6).
<!DOCTYPE html>
That's it.
HTML 5 provides some exciting new features such as the <canvas> tag which potentially can push javascript application development to the next level. HTML 5 also has proper support for media (and media is a fairly important aspect of the web these days!) in the form of <video> and <audio> tags.
If you like the syntax of XHTML, i.e. closing "empty" tags such as <br />, that is fully supported in HTML 5. From Karl Dubost of the W3C's post Learn How To Write HTML 5:
auto-closing tag is allowed and conformant in HTML 5.
XHTML2 has received relatively little attention compared to HTML 5. It's becoming increasingly clear that HTML 5 is the future of markup on the web. Microsoft's latest browser, IE8 still renders XHTML served as text/xml as text/html.
Microsoft have a co-chair on the W3C HTML working group and there's an implied support from them for HTML 5. All of the browser vendors have publicly announced their support for HTML 5.
At the end of the day, even if XHTML2 regains support from the industry, it won't be a significant issue having two competing standards as it has been in the past. Both languages support XML namespaces (in the case of HTML 5, serialization of HTML i.e. DOCTYPE switching).
As a programmer, you should be VERY concerned about your code. HTML is ugly and follows few rules.
XHTML on the other hand, turns HTML into a proper language, following strict structural and syntactic rules.
XHTML is better for everyone, as it will help move the web to a point where everyone (all browsers) can agree on how to display a web page.
XHTML is an XML descendent, and us such is much easier on parsers built for the job of analysing syntactically sound XML documents.
If you can't see the benefit of XHTML, you might as well be using MS Word to create your HTML documents.
Take a look at http://www.w3.org/MarkUp/2004/xhtml-faq#need. There are some good reasons apart from modularisation.
I favor XHTML because it's stricter and more clearly laid out. HTML is quirky and browsers have to accept things like <b><i>sadasd</b></i>.
While this is a really simple example, it could also get more confusing and different browsers could lay out things differently.
Also I think that XHTML has to be "faster" since the browser doesn't have to do that kind of "reparations".
Some differences are:
XHTML tags must be properly nested
The documents must have one root element
XHTML tags are always in lowercase
Tags must always be closed (e.g. using the <br> tag in XHTML must have closing tag <br /> or <br></br> in XHTML)
Here are some links on it
wiki XHTML
wiki HTML vs XHTML
XHTML allows to use all those tools designed for XML. Among then, there is XSLT, embedding SVG, etc...
Interesting development: XHTML 2 Working Group Expected to Stop Work End of 2009, W3C to Increase Resources on HTML 5
2009-07-02: Today the Director announces that when the XHTML 2 Working Group charter expires as scheduled at the end of 2009, the charter will not be renewed. By doing so, and by increasing resources in the Working Group, W3C hopes to accelerate the progress of HTML 5 and clarify W3C's position regarding the future of HTML. A FAQ answers questions about the future of deliverables of the XHTML 2 Working Group, and the status of various discussions related to HTML. Learn more about the HTML Activity.
Well, I guess that makes the future of HTML pretty clear.
XHTML forces you to be neat.
For example, in HTML, you can write:
<img src="image.jpg">
This isn't very logical, because the img tag never gets closed. In XHTML, however, you're forced to close the tag neatly, like this:
<img src="image.jpg" />
I like using something that forces me to be neat.
Steve
The subtitle to the XHTML 1.0 recommendation:
A Reformulation of HTML 4 in XML 1.0
Many tools exist today to process XML. By using XHTML, you are allowing a huge set of tools to operate on your pages and to extract information programmatically.
If you were to use HTML, this would be possible too. There are tools in existence to parse HTML DOM trees. However, these tools can often be more specialized than those for XML. You may not find your favorite XML data processing tools compatible with HTML. Furthermore, there are so many uses for XML nowadays that you may be using XML for some other part of an application; why not also use that same XML parser to parse your web pages? This is the motivation behind XHTML.
If you're already comfortable and familiar with HTML 4.01, you have an established project using HTML 4, and you don't have tons of spare time, just go with HTML 4.01. If you have spare time, learn XHTML 1.1 anyway, and start your new projects in XHTML 1.1 – there's no harm in doing so. If you're using something other than HTML 4.01 or are pretty unfamiliar with HTML 4 anyway, just learn XHTML 1.1.
Using XHTML with the correct DocType will force the browser to render the content in a more standards compliant (strict) mode. This makes the different browsers behave better and, most importantly, more like each other. This makes your job as a webdeveloper a lot easier since it reduces the amount of browser specific tweaks needed to make the content look the same in all browsers.
Quirksmode.org has a lot of good info on this subject.
In my opinion, the strictness is, at least in theory, a good thing, because in HTML, you don't need to be strict, and because of that and the HTML5 junk, Browsers have advanced error correction algorithms that will make the best out of broken HTML. The problem is, the algorithms are not exactly the same and will lead to really strange behaviour you can't predict. With XHTML, on the other hand, you typically have fine, valid XHTML and so the error correction algorithms are not needed, i.e. the entire Browser behaviour is predictable. In addition, strict code makes it easier for your tools to work with the code. So you have actually nothing to lose by using XHTML, but there is some potential to gain. Things will get worse with plain HTML when HTML5 is finally out and the "be open in what you accept" will lead to the described strange behaviour. But at least then it's a standardized strange behaviour. Sigh.
On the other hand, if you use a good IDE like Visual Studio, it's almost impossible to produce broken HTML code anyway, so the result is the same.
Use XHTML
Fails fast. If there are any inconsistencies they will be found during validation.
It encourages better design by separating semantic markup from presentation etc.
It's structured which means that you can treat it as a data object and run all sorts of queries against it. For example you could find all addresses or citations within your website.
You can do build-time optimizations. Since it's well-formed XML you can easily do find/replace operations during build time. Or any document management and manipulation.
You can write XSLT or other transformation scripts to programatically transform your XHTML for other platforms. For example you could have an XSLT for the iPhone that would transform all XHTML to make it compatible or more user-friendly for the iPhone
You are future proofing yourself. Transforming XHTML to newer semantics is again, very easy using transformation.
Search engines will continue to evolve to gather more semantic information as part of the programmable web.
DOM operations are more reliable since it's structured.
From an algorithmic perspective, it yields easier and faster parsing.
XHTMl is a good standing point to use because if you want valid code you would need to provide some aspect of help to the disabled community due to the fact screen readers need the alt and title parts of the image and link tags.
It must be faster to parse to an extent because unlike HTML the parser wouldn't need to check to see if the tag wasn't closed properly, if it was nested correctly etc.
Also it is better to use it because yes it is strict but it helps you to think more logically (in my opinion) when it comes to learning programming languages.
I believe XHTML is (or should be) faster to parse. A valid XHTML document must be written to a stricter spec in that errors are fatal when parsing, whereas HTML is more lenient and allows for oddities mentioned before my comment like out of order closing tags and such. I found this helpful in uncovering the differences between HTML and XHTML parsing:
http://wiki.whatwg.org/wiki/HTML_vs._XHTML#Parsing
A reason you might use XHTML over HTML might be if you intend to have mobile users as part of your audience. If I recall, many phones use something more of an XML parser, rather than an HTML one to display the web. If you are writing for desktop browsers, HTML would probably be acceptable.
That said, if you are going to serve the data as text/html anyway, you should use HTML:
http://www.hixie.ch/advocacy/xhtml