Why do many developers mix HTML5 and XHTML? - html

I see many posts related to this, but I have never seen a good explanation of why people do this and what the best practice is professionally.
We all normally use HTML5 rules, and that's fine, but is there any reason to keep following XHTML rules in modern designs "just in case"?
I see many WordPress, Joomla, Drupal, and lesser-known templates these days that combine HTML5 with XHTML conventions: properly nested HTML tags, no attribute minimization, and closed empty elements like <br />, <hr />, <img />, <input />, etc.
Why this mix? Is it for old-browser support, old-school development mixed with new technology, or just a lack of knowledge of the HTML5 rules?

XHTML had really strict rules, and the browsers wouldn't show things correctly if they weren't coded using the correct syntax.
HTML5 is not that strict. Even if you write a page with doctype set to HTML5, XHTML code will still work.
In some cases it is still a good idea to use XHTML, e.g. e-books. Even though the EPUB format now supports HTML5, older screen readers still don't. Because of this, a lot of e-books are still written using XHTML.

The key is understanding the difference between which version of HTML you write (e.g. HTML5) and which parser is used to read it.
The HTML parser is loose and will accept almost anything.
The XML parser is strict and will not tolerate poor code.
Also:
XHTML (application/xhtml+xml) is an application of XML.
HTML up to 4.01 was defined as an application of SGML; HTML5 defines its own parsing rules.
So you can use HTML5 with the XML parser; my web platform does this (see my profile).
Why serve HTML5 as XML? I had already been using XHTML 1.1 years ago and witnessed a thread on a different PHP programming forum. Some guy could not figure out why Safari would not style an element like all of the other browsers. After three days he figured out he was missing a quote on an attribute. If he had been parsing the page as application/xhtml+xml, the page would have broken (in Gecko/Firefox/Waterfox the whole page breaks; other browsers render up to the error), and he would have been aware of the issue, fixed it, and recovered in seconds.
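The missing-quote anecdote can be reproduced with Python's standard library, used here only as a stand-in for a browser's two parsers. This is a rough sketch: `html.parser` is much simpler than a real HTML5 parser, and the sample markup and function names are made up for illustration. The lenient parser recovers silently, while the XML parser reports the problem straight away.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical markup: the class attribute is missing its closing quote.
BROKEN = '<p class="note>Hello</p>'

def html_parser_complains(markup):
    """Return True only if the lenient HTML parser raises an error."""
    parser = HTMLParser()
    try:
        parser.feed(markup)
        parser.close()
    except Exception:
        return True
    return False

def xml_parser_complains(markup):
    """Return True if the strict XML parser rejects the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return True
    return False

print("HTML parser complains:", html_parser_complains(BROKEN))
print("XML parser complains:", xml_parser_complains(BROKEN))
```

The HTML side never signals the mistake, which is why it can go unnoticed for days; the XML side fails on the first parse.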
Those websites are not XHTML, they are simply using an XHTML doctype. The page must be served as application/xhtml+xml (see the network requests panel in any browser developer tools) to be considered XHTML (e.g. XHTML5) otherwise it's actually HTML code with invalid bits of code that are ignored by the browser.
Your comment about the trailing slash is either correct or incorrect depending on what you meant, because it is vague. If you meant that people generally switched from XHTML 1 to HTML5, then yes; if you meant that XHTML now allows omitting the trailing forward slash, then no. XML / XHTML require the trailing slash without exception.

The correct syntax for HTML's elements base, link, meta, hr, br, wbr, source, img, embed, param, track, area, col, and input (called void elements in HTML 5) is not to use an XML-style empty-element tag.
From the current WHATWG/W3C HTML specification at W3C:
Void elements only have a start tag; end tags must not be specified for void elements
This isn't a case of XML/XHTML being stricter than HTML or something; it's just due to HTML's SGML legacy: in HTML 4's SGML grammar (DTD) from 1999 these elements were declared to have content EMPTY. If anything, using XML-style empty-element syntax is less formal, since it is merely tolerated and ignored by HTML 5 parsers, whereas a start tag followed by an end tag for a void element is not tolerated.
See also How to find empty elements in html5 for a more elaborate discussion of empty elements.
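As a small illustration of the rule quoted above, the following sketch scans markup for explicit end tags on void elements. It assumes Python's `html.parser` as the tokenizer; the class and function names are hypothetical, and the element set is copied from the list in this answer.

```python
from html.parser import HTMLParser

# Void elements as listed in the answer above.
VOID_ELEMENTS = {
    "base", "link", "meta", "hr", "br", "wbr", "source", "img",
    "embed", "param", "track", "area", "col", "input",
}

class VoidEndTagFinder(HTMLParser):
    """Collects explicit end tags such as </br> written for void elements."""

    def __init__(self):
        super().__init__()
        self.offending = []

    def handle_startendtag(self, tag, attrs):
        # <br /> is tolerated by HTML parsers, so it is deliberately not reported.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            self.offending.append(tag)

def void_end_tags(markup):
    finder = VoidEndTagFinder()
    finder.feed(markup)
    finder.close()
    return finder.offending

print(void_end_tags('<p>a<br>b<br />c<br></br>d</p><img src="x.png"></img>'))
```

Only `</br>` and `</img>` are flagged; `<br>` and `<br />` pass, matching the distinction the answer draws between tolerated and disallowed forms.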

Related

How exactly does declaring the page doctype as HTML5 reduce the errors in W3C markup validation?

I have a website with around 1000 pages. I declared the doctype of every page as XHTML 1.0 Strict.
I checked the pages using the W3C markup validation tool and got 320 errors. Then I changed the doctype to HTML 4.0 and the errors dropped to 300.
Then I used the HTML5 doctype, and the errors dropped to 75. How did the number of errors drop just by changing the doctype?
EDIT
My Question is:
1) Validating my pages against the XHTML 1.0 standard gives me more than 300 errors, which is a lot and somewhat difficult to resolve.
2) Validating my pages against the HTML5 standard gives me around 70 errors, which is manageable and easy to resolve.
So in this case, which HTML version should I use so that it does not affect the SEO of the pages? I ask because W3C validation also affects SEO.
If I just use the HTML5 doctype but not the HTML5 page structure (nav, header, section, footer, article ...), does that really matter? I have around 1000 pages, and it would be very difficult to make them all follow the HTML5 page structure.
What I am thinking is that, to reduce the W3C errors, I will just change the doctype to HTML5 and then resolve the remaining errors. Is this a good idea? If you have another suggestion, please share it.
As @Quentin says, there are many differences between XHTML 1.0 Strict and HTML5. Apart from the new tags, there are other significant differences. Some examples:
1 - All XHTML tags and attributes should be written in lower case.
Are there any uppercase tags or attributes in your code?
2 - In XHTML, when you use a singleton tag like <br/>, you are required to include a trailing slash in the element for valid XHTML. In HTML 5, the trailing slash is optional.
Have you self-closed the singleton tags?
3 - All XHTML attribute values must be quoted. In HTML5, you don’t need to place quotation marks around attribute values if there are no spaces.
Are your attribute values properly quoted?
4 - All XHTML tags must be nested properly.
Is this the case in your markup?
5 - The HTML5 <meta> tag with the charset attribute is simpler than in XHTML: <meta charset=utf-8>
If you're using this tag, your document fails in XHTML.
6 - There’s also no need to include the type attribute for style sheet links and scripts.
If you didn't declare this attribute, your document fails in XHTML.
These are a few examples of how different the validation results can be simply from changing the doctype. You could check these points to see whether any of them apply to you.
You can retrieve all the info here: Baby steps from XHTML to HTML5
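Two of the points listed above (uppercase tag names and unquoted attribute values) are easy to spot mechanically. The following is only a rough regex heuristic written for illustration, not a validator: the function name is hypothetical, it ignores comments and scripts, and it can be fooled by text such as ` b=c` inside a quoted value. The W3C validator remains the authority.

```python
import re

# Matches a start tag and captures its name and raw attribute text.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^<>]*)>")
# Matches an attribute whose value does not begin with a quote character.
UNQUOTED_VALUE = re.compile(r"""\s[\w-]+=(?!["'])[^\s>]+""")

def xhtml_syntax_hints(markup):
    """List start tags that would trip XHTML 1.0 rules 1 and 3 above."""
    hints = []
    for match in START_TAG.finditer(markup):
        name, attributes = match.groups()
        if name != name.lower():
            hints.append(f"uppercase tag: {name}")
        if UNQUOTED_VALUE.search(attributes):
            hints.append(f"unquoted attribute value in <{name}>")
    return hints

print(xhtml_syntax_hints("<P><meta charset=utf-8></P>"))
print(xhtml_syntax_hints('<p><meta charset="utf-8" /></p>'))
```

Running something like this over a sample of pages gives a quick sense of how many of the XHTML errors come from pure syntax rather than from structural problems.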
I will just change the doctype to HTML5 and resolve the w3c errors. Is this a good idea?
Well, HTML5 is "easier" to write because it is more flexible, but this is a decision you should make before you start building the website. I suggest you read the W3C specifications for XHTML 1.0 and HTML5, and then decide which language better fits your requirements and how to code it to get valid markup.
Because, quite simply, different versions of HTML are different and allow different things.
<video> for example is new in HTML 5 so will error in HTML 4.
Poor code is poor code, regardless of doctype. You will see fewer errors when validating with an html5 doctype because html5 as a spec is much less rigid in how it defines html to be structured.
Google doesn't validate pages. That said, better markup can help a search engine to better understand your website. Although if you're just changing the doctype and not cleaning up the poor code, it's not going to have an effect.
It happens because XHTML uses an XML parser, which demands stricter syntax. I've found that <!DOCTYPE html> is much more tolerant, perhaps because it uses a standard that is still in development (that last part is more my guess than anything concrete).

What is the effect of using HTML tags that are invalid according to the doctype?

We're currently working with a system (for better or worse) that declares a doctype as follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
The trouble is, many of our users, who will be writing content, are used to using XHTML-style tags like <br />, or <img ... />, instead of those that strictly should be used (i.e. <br> and <img>).
My question is: what is the real-world effect of this on a browser's rendering capability, and on semantics?
My first inclination is that it's a) not fair on the browser to throw this at it and expect it to bend over backwards and to know what to do, and b) removes the "guarantee" that any browser today or in the future will know how to display our pages correctly.
The page appears outwardly fine (although looking at the source code makes me shudder), but is this having some more sinister effect that isn't immediately apparent?
Browsers simply don't care about things like that. They usually even support attributes that simply do not exist in the given doctype (<a target="..."> in XHTML strict).
However, if you use XHTML with an XML content-type they may use an XML parser which will be strict and throw an error if you do invalid things - IE is known to behave like this.
The question appears to be about “self-closing” tags in HTML 4.01, rather than the much more general question in the heading. The answer is that they have no effect on browsers and it is highly unlikely that this would change, given the vast amount of such code around.
Technically, <br /> and <img ... /> are not invalid in HTML 4.01. HTML has formally been defined so that due to certain syntactic specialties, these constructs mean the same as <br>> and <img ...>> (where the final > is a data character). Browsers do not implement HTML this way; instead they just treat the / as an unrecognized and therefore discarded part of the tag.
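The practical equivalence described above can be observed with a lenient parser. This sketch uses Python's `html.parser` as an assumed stand-in for a browser's tag-soup parser (real browsers differ in detail), and the recorder class is hypothetical. It records the start tags seen and shows that the trailing slash changes nothing for `<br>` and `<img>`.

```python
from html.parser import HTMLParser

class StartTagRecorder(HTMLParser):
    """Records every start tag together with its parsed attributes."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # The default handle_startendtag() also routes <br /> through here.
        self.start_tags.append((tag, attrs))

def start_tags(markup):
    recorder = StartTagRecorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.start_tags

print(start_tags('<img src="a.png" />'))
print(start_tags('<img src="a.png">'))
```

Both calls yield the same tag name and attribute list, which is consistent with the observation that browsers simply discard the slash.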

Why put an XHTML doctype declaration on HTML files? What does that do?

I wonder about the number of web pages I encounter that are HTML files, but that wear an XHTML DOCTYPE declaration.
Why are people doing this? What do they hope to achieve? Why not reserve the XHTML doctype declaration for actual XHTML files?
Or am I missing something?
Edit: there is some confusion about what "actual XHTML files" are; to demonstrate that the difference is not caused by the DOCTYPE declaration, compare this file to this one. The first is HTML, the second is XHTML, although the contents are identical; only the file types differ. Both display fine in compliant browsers, but the first one is parsed with the HTML parser and the second one with the XML parser.
Why put an XHTML doctype declaration on HTML files? What does that do?
All that does is tell markup validators that they're about to validate an XHTML document, as opposed to a regular, SGML-rooted, HTML document. It describes the content, or more specifically the markup that follows, but nothing else.
Why are people doing this? What do they hope to achieve? Why not reserve the XHTML doctype declaration for actual XHTML files?
Or am I missing something?
Kind of. What actually happened was that people weren't aware that just putting an XHTML doctype declaration on top of an HTML document didn't automatically transform it into an XHTML document, although admittedly that was what everybody was hoping for.
You see, most web applications out there aren't configured to serialize XHTML documents as application/xhtml+xml properly, instead opting to serve pages as just text/html. (It's typically because of the .html file extension more than anything else, really; generally speaking, servers do correctly apply application/xhtml+xml to documents with .xhtml or .xht as the extension, but only static sites that actually make use of the file format will benefit from this.) That leads browsers to decide that they received a regular HTML document, and so that tag soup parsing nonsense we've all come to know and love inevitably ensues.
Note that it doesn't matter even if you have a meta tag like this on your XHTML document:
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
Browsers will ignore that, and only look at the actual HTTP Content-Type header that was sent along with the XHTML document.
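A simplified model of that decision, assuming only the two media types discussed here, might look like the following. The function name is hypothetical and real browsers also handle other XML types and content sniffing. Note that the function never looks at the document body, so a `<meta http-equiv>` element cannot influence the result.

```python
def parser_for(content_type_header):
    """Pick a parser from the HTTP Content-Type header value alone."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    if media_type == "application/xhtml+xml":
        return "xml"
    if media_type == "text/html":
        return "html"
    raise ValueError(f"media type not covered by this sketch: {media_type}")

print(parser_for("text/html; charset=utf-8"))
print(parser_for("application/xhtml+xml; charset=utf-8"))
```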
To make matters worse, Internet Explorer, the most-used browser during XHTML's heyday, never properly supported the application/xhtml+xml MIME type before version 9 was finally released: instead of parsing the markup, constructing the DOM and rendering the page, all it would do was ask for a file download. That doesn't make a very usable XHTML page!
So, guess what we all had to live with until HTML5 became cool?
This, along with things like IE6 going quirky on pages with the XML declaration before the doctype declaration, is also one of the biggest factors leading to XHTML's downfall (along with XHTML 1.1 never gaining widespread usage, and XHTML 2.0 being canceled in favor of HTML5).
Most people use the XHTML doctype because they read it in an old book somewhere or read it on a forum but otherwise are using it for no technical reason they are aware of. Hardly anyone uses it properly by serving it as application/xhtml+xml. Serving XHTML pages as text/html means "tag soup" or "broken html". It should not be done but browsers generally handle it well.
You are correct in your wondering about this. It drives me crazy.
I assume that you're asking why people are serving XHTML documents as HTML, by using the text/html MIME type instead of application/xhtml+xml.
Mostly, it's because of a misguided understanding of compatibility: lots of browsers simply don't understand the application/xhtml+xml MIME type, so authors serve their pages as HTML to work around this. Since browsers often don't complain about what they get, and people don't tend to research a lot, most people assume that browsers treat the XHTML-doctyped document as XHTML even though it was served as HTML. But they don't; they treat it as HTML. Since the two languages are so much alike, people rarely notice the difference.
So no, you're not missing anything; it's very bad practice. Nowadays, after HTML5, luckily, it seems to become less common.
The hilarious thing about XHTML is that because IE didn't understand the XML mimetype (application/xhtml+xml) at the peak of XHTML's popularity, most people never actually used the XML part of it as IE8 and lower refuse to render the content.
This meant that millions of sites think they are using standards compliant XHTML, when in fact they are being parsed as malformed/weird HTML4.
Luckily HTML5 came along and properly defined the parsing of documents, removing much of the ambiguity that surrounded XHTML (all that transitional and strict rubbish).
People who add the XML prolog before the doctype are doing themselves an extra disservice, as a comment before the doctype will cause old IE to use quirks mode, which among other things brings back the old box-model in IE6 and below. This undoubtedly has contributed to the mass hate of IE6, as in quirks mode it has significant bugs that cause modern layouts to be completely broken, rather than just lacking in newer features.
The short answer is that in this industry many people just copy and paste code without understanding it.

When can XHTML unexpectedly cause problems on IE?

Since IE won't render XHTML as XHTML but treats it as HTML instead, when can this actually cause problems in IE?
I know of one case:
<div style="clear:both" />
In browsers that support XHTML, the div is closed. But IE treats the div as still open, so the layout can produce unexpected results later.
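The difference in nesting can be sketched as follows. The XML side uses Python's `xml.etree.ElementTree`; the HTML side is a deliberately crude model (a hypothetical `HtmlNesting` class that ignores the trailing slash on non-void elements), not a real HTML tree builder, and the surrounding `<body>` and `<p>` are made-up sample markup.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}

class HtmlNesting(HTMLParser):
    """Very rough model of text/html nesting: '/>' never closes an element."""

    def __init__(self):
        super().__init__()
        self.open_elements = []
        self.parent_of = {}

    def handle_starttag(self, tag, attrs):
        self.parent_of[tag] = self.open_elements[-1] if self.open_elements else None
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Under HTML rules the trailing slash is ignored, so <div /> is just <div>.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in self.open_elements:
            # Pop everything up to and including the matching open element.
            while self.open_elements.pop() != tag:
                pass

MARKUP = '<body><div style="clear:both" /><p>text</p></body>'

def parent_of_p_as_html(markup):
    model = HtmlNesting()
    model.feed(markup)
    model.close()
    return model.parent_of["p"]

def children_of_root_as_xml(markup):
    return [child.tag for child in ET.fromstring(markup)]

print("HTML model, parent of <p>:", parent_of_p_as_html(MARKUP))
print("XML parser, children of <body>:", children_of_root_as_xml(MARKUP))
```

Under the HTML model the paragraph ends up inside the div, while the XML parser makes the two elements siblings, which is the layout surprise described in the question.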
Internet Explorer will have trouble distinguishing XHTML documents from XML documents if the MIME-type is not specified as text/html. However, because it fully supports HTML 4.01 the majority of problems arise from inconsistent and non-standards implementations of positioning, layout, and CSS properties. To avoid any problems it is best to write valid XHTML and specify a DOCTYPE.
A list of all known Internet Explorer Bugs
Self-closing syntax won't work (it will appear to work only on elements that are always empty in HTML). XML serializers might generate <textarea/>, <script/> and similar, which break pages in various ways (triggering complicated error recovery, sometimes involving re-parsing of remainder of the page).
Explicitly closed HTML "empty" elements might behave oddly (</br> inserts break in IE).
<![CDATA[ outside HTML's hardcoded CDATA elements will be recognized as a tag. It won't affect escaping and might make some content disappear.
In HTML's CDATA elements (namely <script>) entities won't be recognized. XHTML requires <script> if (1 &lt; 2) …, which is going to be a syntax error in IE.
Background of <body> will be applied differently in IE.
There will be no cross-browser syntax for namespace-aware selectors in CSS.
You'll get all implied HTML elements (e.g. <tbody> in all tables) and implicitly closed elements (it's usually not a problem when document is valid, but other browsers won't warn you as long as markup is well-formed).
Elements and attributes with prefixes won't be namespaced and will get different tagName in IE (which is also illegal in XML). They won't get appropriate default styling and behaviour either (<xhtml:a> can't be a link).
You won't be able to use namespace-aware methods like createElementNS (they don't exist in IE), .tagName will be uppercase in IE, but not in all cases.
Elements and attributes with prefixes won't be namespaced and will get different local name in IE (which is also illegal in XML).
These are only problems concerning switching from working XML document to HTML. There are as many surprises when you're going from HTML (i.e. what everyone expects and takes as normal behavior) to real XML, e.g. document.write doesn't work rendering most of Google's scripts useless.
These all apply to any browser treating XHTML as text/html rather than specifically IE, but you should read Appendix C of the XHTML 1.0 spec here: http://www.w3.org/TR/xhtml1/#guidelines
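The `<script>` entity point from the list above can be checked with Python's standard parsers, again only as an assumed approximation of browser behaviour; the sample script and helper names are made up. The HTML parser hands the script body over verbatim, while the XML parser decodes `&lt;` before the script engine would ever see it.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical script written the way XML well-formedness requires.
SCRIPT = "<script>if (1 &lt; 2) { run(); }</script>"

class TextCollector(HTMLParser):
    """Collects all character data passed to handle_data()."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def script_text_as_html(markup):
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)

def script_text_as_xml(markup):
    return ET.fromstring(markup).text

print("as HTML:", script_text_as_html(SCRIPT))
print("as XML: ", script_text_as_xml(SCRIPT))
```

Read as HTML, the script still contains the literal characters `&lt;`, which is not valid JavaScript; read as XML, it contains `<` as intended.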

HTML vs XHTML does it still matter? [closed]

I'm wondering if I should bother at all about the markup language, as long as I produce valid markup.
I've read articles that point out HTML is the best choice and they come directly from the horse's mouth (the browsers implementors!):
http://webkit.org/blog/68/understanding-html-xml-and-xhtml/
https://developer.mozilla.org/en/Mozilla_Web_Developer_FAQ
Other articles, by James Bennett, make the further point that if you're not serving XHTML as XML, then you don't want XHTML but HTML.
http://www.b-list.org/weblog/2008/jun/18/html/
http://www.b-list.org/weblog/2008/jun/21/xhtml/
So I thought that if I wanted to trigger Standards Compliant Mode, I should just use HTML Strict validation. But that's no longer the case, at least with the most modern browsers (aka everything but IE6): if you have valid XHTML Strict you still trigger Standards Compliant Mode. Hence, as long as I produce valid markup, why bother?
I always use HTML 4.01 strict for the time being. HTML 5 isn't definitive yet. I used to be a diehard XHTML user, but my reasons etched away and I'm much happier and more productive.
The arguments for XHTML generally tend towards the "cleaner markup" or talking about well-formed markup. This seems mostly like a strawman argument, and doesn't hold up under a thorough beating.
If XHTML is guaranteed to be parsed by an XML parser, it generally won't look cleaner than HTML 4.01 strict (just comparing strict doctypes).
For one, having to write URLs as http://example.com/?foo=bar&amp;baz=qux looks awkward. Declaring the entity types gets old.
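To make the escaping requirement concrete, here is a small sketch using Python's standard library as a stand-in XML parser. The `link` and `is_well_formed` helpers are hypothetical names introduced for illustration. A raw `&` in an attribute value is rejected, and the escaped form round-trips back to the original URL.

```python
from html import escape
import xml.etree.ElementTree as ET

URL = "http://example.com/?foo=bar&baz=qux"

def link(href):
    """Build a minimal anchor element around the given href text."""
    return f'<a href="{href}">link</a>'

def is_well_formed(markup):
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

print(is_well_formed(link(URL)))
print(escape(URL))
print(ET.fromstring(link(escape(URL))).get("href"))
```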
The other thing is that markup generally doesn't translate remarkably well as an XML Tree, but a Dom tree is fine.
HTML 4.01 strict is moderately easier to use for creating valid sites. You don't have to put meaningless closing tags on elements like <img>, <br> or <link>. Just adding the trailing slash doesn't change anything of any particular meaning.
Douglas Crockford, of Yahoo and everything markupy, says it best to think of the markup as an application delivery format.
As such, the question is what is going to be the easiest to deliver and the most robust and reliable. This is what ultimately made the decision for me. All web browsers handle XHTML differently, and require munging of the Content-type header. If you use "text/xhtml" or "text/xml" you get different results.
Additionally, "text/xml" doesn't play nicely with REST because that should mean XML serialization of the data and not a formatted markup page (Safari gets this one wrong, in my opinion, by requesting text/xml before text/html as desired content-types!)
So, use HTML 4.01 because:
It works more similarly across all browsers
Doesn't require Content-type based handling (text/html does the trick across the board)
Isn't as brittle as XHTML
HTML 5 doesn't offer anything significant over HTML 4.01 strict
Pick one and stick to it, and be as compliant as possible in the face of other constraints. Don't think for a second, however, that HTML vs. XHTML is a more important issue than getting the job done and the site up and running, because that's what's going to bring in the revenue. Users don't give a toss about XHTML.
If you go XHTML, please choose XHTML 1.0, and not XHTML 1.1, unless you intend to serve it with a correct XML or XHTML MIME type. Actually, on second thought, don't do that either. There are huge crippling disadvantages to serving XHTML 1.0 or 1.1 with the correct MIME type. The slightest error and you get the yellow screen of death!
The W3C specs say that it is okay to serve XHTML 1.0 as text/html as long as you follow certain rules for backward compatibility (mainly, in self-closing tags, include a space before the / forward slash).
Aside from that, there are other arguments for and against. I tend to use XHTML because of the various tools and libraries that can parse valid XML, making things like XSLT transformations to/from XHTML possible (useful?).
Another thing is that it's possible to parse valid XML in Flash, so you may choose XHTML for dynamic content replacement with a Flash movie, or otherwise dynamically load XHTML content into a Flash movie. Or really, anything that can read XML can read XHTML. That's a lot of things.
My advice is to use HTML4 Strict or HTML5. Valid, in standards mode, with CSS for layout. You'll get all the benefits that are commonly associated with XHTML, but without any of the problems.
Remember: XHTML DOCTYPE does not enable parsing of document as XHTML. It only enables standards mode, the same which is available to HTML 4 Strict and HTML 5.
XHTML/1.0 has identical semantics and practically identical CSS support as HTML 4.01.
Valid HTML 4.01 is parsed unambiguously just like valid and well-formed XHTML/1.0.
XML DOM gives you namespaces support, but takes away support for document.write and innerHTML.
Without proper XML MIME type set in HTTP headers (not document itself) all you get is parsing of everything as HTML and HTML DOM.
XHTML is still not supported in Internet Explorer at all (including IE8). The best you can get in IE (and Googlebot) is XHTML misinterpreted as HTML with syntax errors (whether that's 70% or 30% of your audience, it's still something to think about).
Try forcing "XHTML" websites to use actual XML mode, and you'll quickly notice that almost nobody uses XHTML. They just slap wrong DOCTYPE on their HTML:
http://schneegans.de/xp/?url=http://www.wired.com/
http://schneegans.de/xp/?url=http://script.aculo.us/
These "XHTML" pages work only because they're usually interpreted as HTML.
I would go for XHTML, mostly because the code is cleaner and easier to maintain.
But here are some interesting points on why not to use xhtml http://meiert.com/en/blog/20081219/html-vs-xhtml/
If you really don't care about sketchy support, try HTML 5. My fallback would be HTML 4.01 Strict, unless I needed something like inline SVG or similar, in which case it's XHTML (served as XML) all the way.
In most cases, it does not matter. In fact, using XHTML causes more headaches. However, there are a few situations where XHTML is needed. I can think of two.
First, if you want to use embedded SVG, you need XHTML.
Second, if you want to use HTML markup as XML. Sometimes (for unknown reasons), I found that my AJAX request verified the code even when I marked it as html or text. To quickly avoid that, I changed it to XHTML.
That is all.