Closing empty tags, HTML, XML - html

I posed a similar question some time ago and got a pretty good answer, now I would like to slightly modify that question in light of something I noticed while working with Google Maps.
The question is, I've always used <div style="clear:both"></div> to clear a float. There are other times when I've needed to create an empty element to be populated with JS, for example. Now, since HTML is a subset of XML, why can't I use <div style="clear:both" /> instead of typing the ugly closing tag?
I was given a great answer that I admittedly don't fully understand in my previous question, but while working with Google Maps I noticed that Google had the same idea that I did. In their very first code sample, they use <div id="map-canvas"/> without the ending tag.
So my new question is, even if this is not quite proper HTML, would there ever realistically be a case where this would not work?
Please and thanks.

Now, since HTML is a subset of XML
HTML is not a subset of XML.
HTML 4 and earlier were SGML applications, but browsers never implemented the SGML specification properly.
XHTML 1.x is an XML application, but browsers will only use XML parsing rules if you serve XHTML with an XML content-type (like application/xhtml+xml).
HTML 5 has its own parsing rules that better reflect what browsers actually do. It allows a / character at the end of elements where the end tag must be omitted for the sake of people who are addicted to XML or have poor syntax highlighting software, but only those elements.
I noticed that Google had the same idea that I did. In their very first code sample, they use <div id="map-canvas"/> without the ending tag.
That is an error and is not allowed in HTML. It only "works" in browsers because the end of the document comes before the start of any element or text that is allowed as a child node of a div element.
The question is, I've always used <div style="clear:both"></div> to clear a float.
That's a nasty approach to the problem in the first place. It requires an extra element, and can add space where the element is rendered. Better, in almost every case, to set overflow: hidden on the containing element to cause it to wrap the floats.

HTML is not a subset of XML. HTML has a different structure. Some elements are singular (like img) and don't need a closing tag or a trailing / at all. The ones that do need closing must be closed with a proper closing tag.
Some people use XHTML, though, which is basically HTML using XML syntax. It depends on the doctype you use.
Google's example is wrong in this case. It uses an HTML5 doctype. If you run their snippet through the W3C validator, it tells you:
Line 26, Column 26: Self-closing syntax (/>) used on a non-void HTML
element. Ignoring the slash and treating as a start tag.
<div id="map-canvas"/>
And that's what will probably happen in most browsers as well. They read it as if the div is just opened there. At a certain point it will be automatically closed. The / is ignored.
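The effect of ignoring the slash can be sketched in Python. This is only an approximation: Python's html.parser is not a browser-grade HTML5 parser, and the class name NestingSketch, the helper nesting, and the trailing <p> element are hypothetical names chosen for illustration. The sketch applies the rule quoted above (treat "/>" on a non-void element as a plain start tag) and records how deeply each element ends up nested.

```python
from html.parser import HTMLParser

# Void elements per the HTML standard: they never have content or an end tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class NestingSketch(HTMLParser):
    """Records (depth, tag) pairs, ignoring "/>" on non-void elements."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.seen = []       # (depth, tag) in document order

    def handle_starttag(self, tag, attrs):
        self.seen.append((len(self.open_tags), tag))
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # text/html rule: the trailing slash is ignored, so this is a start tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Simplification: close back to the nearest matching open element.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def nesting(markup):
    parser = NestingSketch()
    parser.feed(markup)
    parser.close()
    return parser.seen


print(nesting('<div id="map-canvas"/><p>after</p>'))
print(nesting('<div id="map-canvas"></div><p>after</p>'))
```

In the first call the paragraph is recorded one level deep, inside the div that was never closed; in the second it is a sibling at the top level.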
About the clearing of floats: the way you are doing it is old, and ugly because you need extra markup, for what is basically a CSS issue. Fortunately, there are better ways, a couple of which are described in detail here: What methods of ‘clearfix’ can I use?

W3C validation for <ui-select>

I am using angular-ui-select within a website where the styled select fields are configured with an own tag named ui-select. This works great, but doing a W3C validation leads to this error:
Element ui-select not allowed as child of element div in this context. (Suppressing further errors from this subtree.)
Here's an example code:
<!doctype html>
<html lang="en">
<head><title>x</title></head>
<body>
<div>
<ui-select></ui-select>
</div>
</body></html>
I understand that <ui-select> is not expected to be there but how can I handle this better?
Can I wrap it into a different tag or is there a different approach for ui-select instead of using HTML markup?
W3C HTML5 validator maintainer here. The short answer with regard to the validator behavior right now is, the validator's going to emit errors for any custom elements you use in documents, and currently there's no way you as a user can work around it doing that—and it's going to continue that way for some time longer until we get around to figuring out a solution.
We're having some ongoing discussions about how to solve this. Changing the validator to just ignore any element name with a hyphen is not viable as a complete solution, because the consequence of that is we could then not practically check any child elements it might have—we'd just have to ignore the entire subtree, because to do otherwise would lead to other errors. So that's way short of being an ideal solution.
Anyway, I'd love to find a good way to solve this, so if others have ideas I'd like to hear them. Two good places to send ideas/proposals on this are the public-webapps#w3.org mailing list https://lists.w3.org/Archives/Public/public-webapps/ and the whatwg#whatwg.org mailing list https://whatwg.org/mailing-list#specs
One idea I've thought of myself is, we could just have the validator treat all custom elements in the same way it currently treats the <div> element (as far as where it's allowed in a document and what child elements it's allowed to contain). That's also short of ideal, but at least it would give a way to check for errors in descendant elements in the custom element's subtree.
Update 2017-02-06: the W3C HTML Checker now supports custom elements
So, I added support for custom elements to the W3C HTML Checker (validator) on 2016-12-16 and a few days later refined it to do more detailed checking for prohibited names.
The trick I ended up figuring out to implement it in the checker architecture (which is at its core a RelaxNG grammar/schema-based validator) was to add a pre-processing filter that takes any elements that have a hyphen in their element name and puts them in a separate XML namespace.
Then I updated the RelaxNG schema to allow any elements from that XML namespace anywhere. (Which is ironic because I pretty much hate XML namespaces and all the problems they cause.)
So we’re now looking at doing something similar for custom-attribute names—probably just by defining those as being any attribute names that contain a hyphen (like custom-element names).
But the HTML checker can’t be changed to allow custom-attribute names until the HTML spec is updated to allow them. For that, see the proposal being discussed in the HTML-spec issue tracker.
That's indeed a long-known issue with AngularJS.
A few things you can do:
Instead of using the element <ui-select>, you can use <div ui-select>, but that will still fail validation because of the attribute.
An attribute prefixed with x- or data- will pass, but I am not sure ui-select supports that.
HTML W3C validation is useful, but I think it is mostly important for HTML emails so they don't get screened as spam. It's also good for search engines, but really not that critical.
If you look at 'why validate', the reasons are mostly for cleanliness, ease of debugging, and overall good practice.
Angular (un?)fortunately expands the realm of possibilities for HTML5, in a way that, naturally, deviates from the latest specifications for HTML.
We are having the same problem using Knockout custom components.
http://knockoutjs.com/documentation/component-overview.html
I filed a suggestion for a small enhancement to the validator for users who want to use custom elements even though the specification is not yet final (http://w3c.github.io/webcomponents/spec/custom/#custom-tag-example):
https://github.com/validator/validator/issues/94

Why Use The <html> Tag? [duplicate]

Everything I have seen says that the HTML code needs it, but mine works fine without it. I'm only using extremely basic HTML without CSS or javascript, if that makes a difference. Could someone please explain?
Everything still works because the browser is plugging it in for you. You should use it because it makes your code more clear and standard.
A developer looking at your code might be confused about what they're seeing at first because they would wonder where the <html> tag is.
As with any standard, the <html> tag guarantees that things will work. Currently the tag can be omitted, but I still wouldn't recommend it, to be safe. This is from the W3C spec:
An html element's start tag may be omitted if the first thing inside
the html element is not a comment.
An html element's end tag may be omitted if the html element is not
immediately followed by a comment.
Everything I have seen says that the HTML code needs it, but mine works fine without it.
That's because you don't need it. The HTML specification says that you can omit the starting tags (and even the ending tags) of many elements, including html and head, which makes documents like this actually perfectly valid:
<!DOCTYPE html>
<title>Text to make me non-empty</title>
<p>Hello world!
Browsers will create the html and head elements even if you don't write out the tags, so omit them if you want. Do note that not all browsers follow the spec properly, so while this behavior would be ideal, some browsers will parse your HTML improperly and force you to be more explicit with your structure.
It's part of the standard. It helps tell the browser it is HTML vs XML or some other type of markup and makes it clear what kind of document it is.
http://www.w3schools.com/tags/tag_html.asp
The browser is just being nice to you and showing what is there without the tag.
It is what tells various parties reading the document that it is an HTML document and that this is where it starts.

<div/><div/> auto closing creates child, not sibling

I really like to use "short closing" for tags, using the ordinary <tag/> format, but unfortunately doing so in a browser (e.g. Chrome) causes quite unexpected behavior.
When in document I have:
<div/><div/>
it's interpreted as
<div>
<div></div>
</div>
No matter which DOCTYPE I use (XHTML or HTML5), it is parsed in this wrong way.
I'm also using this "notation" for custom tags in a namespace, <widget:aSampleWidgetA/> <widget:aSampleWidgetB/>, which also introduces this problem.
I don't want to use the full closing notation as it makes a lot of visual mess in the code.
Is there some way to force the browser to parse those tags as proper XML?
Apologies, I can't find great documentation on this, but I suspect it is because div is not a valid self-closing tag. Looking at the XHTML DTD, empty tags are specifically marked as EMPTY and div is not, so Chrome instead behaves as if it is HTML5, where the closing tags can be left off, and takes a best guess as to where to close them.
Alternatively, if you don't like the look of html, perhaps you might prefer something like haml or jade templates.
There is a way to make browsers (except IE8 and below) parse the markup as XML. You need to serve it with the proper XHTML content type, application/xhtml+xml.
The doctype is irrelevant for parsing; it affects only the rendering mode (Standards or Quirks). When served as text/html, all pages will be parsed by HTML rules (HTML5 rules for modern browsers), which effectively means that the end slash in the 'self-closing' syntax is just ignored, and the ability of an element to be 'self-closed' is actually hard-coded in the parser. Divs and custom tags don't have this ability.
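For contrast, here is a minimal Python sketch of what an XML parser does with the same shorthand. It uses xml.etree.ElementTree from the standard library as a stand-in for the XML parser a browser would use for application/xhtml+xml; the <body> wrapper and the id values are hypothetical, added only so the elements can be told apart.

```python
import xml.etree.ElementTree as ET

# Under XML rules, <div/> is a complete, empty element.
root = ET.fromstring('<body><div id="a"/><div id="b"/></body>')

# Direct children of <body>: both divs are siblings.
children = [child.get("id") for child in root]

# Elements nested inside those divs: there are none.
grandchildren = [inner.get("id") for child in root for inner in child]

print(children)
print(grandchildren)
```

Both divs come out as direct children of the wrapper, which is the sibling structure the question was hoping for.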

Is leaving out end tags valid?

I remember reading a while ago that in some cases leaving out end tags (</li>, for example) speeds up the rendering (and loading/parsing, since there are fewer bytes) of a webpage.
Unfortunately, I forgot where I read this, but I remember it saying this feature was specific to HTML 4.0.
Since I no longer have access to this source, I was wondering if someone can confirm this or link to the documentation on w3c (since I wasn't able to find it myself)?
Thanks!
EDIT: Forgot to mention that I meant to ask if this behaviour is also available in HTML5.
EDIT 2: I managed to find the article again, and it does mention that it only improves the download speed of the page, not the actual rendering:
One good reason for leaving out the end tags for these elements is because they add extra characters to the page download and thus slow down the pages. If you are looking for things to do to speed up your web page downloads, getting rid of optional closing tags is a good place to start. For documents that have lots of paragraphs or table cells this can be a significant savings.
Sorry for asking a pointless question! :(
Here is the list of HTML 4.01 elements.
http://www.w3.org/TR/html401/index/elements.html
The End Tag column says where end tags are optional.
However, take note that this is valid only in HTML 4.01. In XHTML, all end tags are required. Not 100% sure about HTML5.
I wrote an HTML parser once, and believe me, if you're a parser and you're inside a <p> and you encounter a </table> end tag, it's slower to check in your document tree whether that is correct, and if so, to close the current <p> first, than if you simply encounter a </p>.
Edit:
Ah, found it: http://dev.w3.org/html5/html-author/#index-of-elements
Same requirements as HTML 4.01.
New edit:
Oh, that was a page from 2009. This one is more up to date:
http://dev.w3.org/html5/spec/syntax.html#optional-tags
Some tags in some version of the HTML spec have optional end tags. However, I believe it is generally considered bad form to exclude the end tag.
As mentioned, the end tag of li is optional in html4:
http://www.w3.org/TR/html401/struct/lists.html#h-10.2
so technically this is valid:
<ul>
<li>
text
<li>
<span>stuff</span>
</ul>
But you are only saving 5 characters per li, not really worth what you lose in readability/maintainability.
EDIT: The HTML5 spec is sort of interesting:
An li element's end tag may be omitted if the li element is
immediately followed by another li element or if there is no more
content in the parent element.
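The quoted rule can be illustrated with a small Python sketch. It is a simplification built on the standard library's html.parser, which is not a full HTML5 parser; the names ListItemSketch and list_items are hypothetical, and nested lists are deliberately not handled. The sketch treats each li start tag as closing any li that is still open, and treats the end of the parent list as closing the last one.

```python
from html.parser import HTMLParser


class ListItemSketch(HTMLParser):
    """Collects the text of each li, closing an open li when the next one starts."""

    def __init__(self):
        super().__init__()
        self.items = []
        self.in_item = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # An li that is still open is implicitly closed by the next li.
            self.items.append("")
            self.in_item = True

    def handle_endtag(self, tag):
        if tag in ("li", "ul", "ol"):
            # An explicit </li>, or the end of the parent list, closes the item.
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.items[-1] += data


def list_items(markup):
    parser = ListItemSketch()
    parser.feed(markup)
    parser.close()
    # Collapse whitespace so the two spellings can be compared directly.
    return [" ".join(item.split()) for item in parser.items]


omitted = "<ul>\n<li>\ntext\n<li>\n<span>stuff</span>\n</ul>"
explicit = "<ul><li>text</li><li><span>stuff</span></li></ul>"
print(list_items(omitted) == list_items(explicit))
```

Both spellings yield the same two items, which is the sense in which the end tag is optional.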
Browsers usually forgive missing end tags (they're generally smart enough to know what you mean). However, any CSS or JS behavior attached to the unclosed element can affect descendant and/or sibling elements, leaving you scratching your head as to what happened.
While XHTML does expect you to add a closing forward slash to self-contained tags, HTML 5 does not.
XHTML: <img src="" />
HTML5: <img src="">
If you're writing using an xhtml DOCTYPE, then the answer is 'yes', they are required. An xhtml document needs to be valid XML, which means that all tags need to be properly closed.
An HTML document is a bit less fussy. Some tags are specified as being 'self closing', which means you don't need to close them specifically. These include <br>, <img>, etc.
The browsers are generally pretty lenient, because they need to be able to cope with badly written code. But beware that sometimes skipping closing tags can result in different browsers interpreting your code differently, and producing hard-to-debug layout glitches.
In terms of page load speed, you might be right that there would be a marginal gain to be had in download speed and bandwidth costs, but it would be marginal. In terms of rendering, I suspect you'd actually lose speed if you provided invalid HTML, as the browser would have to work harder to parse it.
So even if there is a speed gain to be had it will be marginal, and I don't think skipping closing tags deliberately is a worthwhile exercise. It might possibly be helpful to reduce bandwidth if you're running a site that has massive traffic, but very few of us are writing for Facebook or Google; for virtually everyone else, it's better to write valid code than to try to shave those few bytes.
If you're that worried about bandwidth and page loading speeds, there are likely to be other, better ways to reduce your page load sizes than this. For example, compressing your files with gzip will drastically reduce your bandwidth, with zero impact on your code or the browser. gzip compression can be configured in your web server, so you just switch it on and forget about it. You can also 'minify' your CSS and JS code by stripping out unnecessary white space. (HTML can also be minified to a certain extent, but beware that white space is syntactically relevant in HTML, so minifying may not be the right thing to do in all cases.)
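A rough way to see how compression interacts with optional end tags is to measure it. The Python sketch below uses the standard library's gzip module on synthetic markup; the helper names list_markup and savings and the 1000-item list are assumptions made for illustration, and real pages will behave differently depending on their content.

```python
import gzip


def list_markup(count, close_tags):
    """Builds a synthetic <ul> with `count` items, optionally omitting </li>."""
    end = "</li>" if close_tags else ""
    items = "".join(f"<li>item {n}{end}\n" for n in range(count))
    return f"<ul>\n{items}</ul>\n".encode("utf-8")


def savings(count=1000):
    """Bytes saved by omitting </li>, before and after gzip compression."""
    with_tags = list_markup(count, close_tags=True)
    without_tags = list_markup(count, close_tags=False)
    return {
        "raw": len(with_tags) - len(without_tags),
        "gzipped": len(gzip.compress(with_tags)) - len(gzip.compress(without_tags)),
    }


print(savings())
```

Because repeated end tags are highly compressible, the difference after gzip should be much smaller than the raw difference of 5 bytes per item.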
AFAIK, in XHTML you must always at least self-close a tag <img ... />
In HTML (non xml-html) some tags do not need to be closed. <img> for instance. However, I'd suggest making sure you know exactly which version you're targeting and use W3C's validation service to double-check.
http://validator.w3.org/
I don't see how this would speed things up except that you'd have to send fewer bytes of data per page (no /'s for some tags, no closing tags for others). As for building the DOM, I don't know the details of a given implementation (WebKit, Mozilla, etc.) to know which way is faster to parse. I would imagine XML is, simply because it is more regular.
EDIT: Yes this behavior is available in HTML5. Note that the help pages are confusing, such as:
http://www.w3schools.com/html5/tag_meta.asp
Meta elements in non-XML HTML do not require the /, but they can have it. Because of the (in my opinion) leaning towards XML-flavored HTML, the ending slash is more prevalent in written HTML, but you can see they use both styles in the document. The Validator will let you know for sure what you can get away with. :)
In HTML 4.01, which became a W3C Recommendation way back in 1999, you're right:
9.3.1 Paragraphs: the P element
Start tag: required, End tag: optional
http://www.w3.org/TR/1999/REC-html401-19991224/struct/text.html#h-9.3.1
And as for <li>,
Start tag: required, End tag: optional
http://www.w3.org/TR/1999/REC-html401-19991224/struct/lists.html#h-10.2

is there a reason why <script> has to have a separate closing tag?

To start, I know that I have to have the </script> tag, and there are existing questions about that. The question isn't whether or not I need a closing tag. My question is: why was it designed this way?
The source of confusion for me comes from looking at the <link /> element - it appears to have similar functionality (importing external text files and defining their type) but has the self-closing property (which we see in other but not all element types). I may be oversimplifying things, but I don't understand why one external reference element should use a style that is different from another similar (obviously not the same) external reference element.
It looks like this doesn't change in the HTML5 draft either. I just want to understand the reasoning behind it so I can have a better/deeper understanding of basic HTML and why it works the way it does.
why was it designed this way?
It must have an explicit end tag because you can have inline script:
<script>
foo();
</script>
Having a forbidden end tag wouldn't work (since then you couldn't have content). Having an optional end tag would be more trouble than it is worth (since the element contains CDATA … that might actually make it impossible to have an optional end tag, I don't know that bit of SGML well enough to say).
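To illustrate why the end tag matters, here is a small Python sketch using the standard library's html.parser, which treats script content as raw text up to the closing tag. It is only an approximation of browser behavior, and the names ScriptSketch and split_script, along with the sample markup, are hypothetical.

```python
from html.parser import HTMLParser


class ScriptSketch(HTMLParser):
    """Separates the raw text inside <script> from the tags the parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.script_chunks = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.script_chunks.append(data)


def split_script(markup):
    parser = ScriptSketch()
    parser.feed(markup)
    parser.close()
    return parser.tags, "".join(parser.script_chunks).strip()


tags, code = split_script('<script>\nfoo("<p>");\n</script><p>after</p>')
print(tags)
print(code)
```

The "<p>" inside the script is kept as script text rather than parsed as an element; only the paragraph after </script> is seen as a tag, so the end tag is what tells the parser where the script stops.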
It doesn't use <link> because it was a product of the browser wars and not something that was discussed in the W3C before being introduced.
It looks like this doesn't change in the HTML5 draft either.
It wouldn't be backwards compatible.