Is there a list of all html tags that do not need a solidus or a closing tag? - html

I built a parser for html, but I worked under the assumption that it would follow the rule that there were only two forms:
<foo> </foo>
<foo/>
Obviously that is wrong. Tags such as base,meta, and link do not need this.
I kind of wish this wasn't the case, because I have found things like this in a script:
for(var d=b.length,e=b[a];a<d>>1;)
Oh look, the mythical <d> tag.
So I need to make myself a whitelist of tags to ignore. Is there a comprehensive list for tags that do not require a solidus or closing tag? If not, I'll have to rewrite my parser.
Thanks

You can extract a list from the WHATWG HTML Living Standard. Or, if you prefer, the W3C's HTML 5 Specification or the subsequent draft. According to Wikipedia, the conflict has somewhat recently been resolved in favour of WHATWG, so you probably want to go with the first one.
In any case, pay particular attention to the subheading "Tag omission in text/html" in each element description. But you need to read the document carefully to understand the ins and outs of HTML parsing.
Note: It's not just that end tags can be omitted. There are also elements whose open tag can be omitted. (The classic example is <tbody>, which is hardly ever physically present in an HTML document, but there are lots of others. <head>, for example.) The mere fact that an element's open tag has been omitted does not force the omission of the element's close tag, although it's pretty commonly the case. So you can't do it with just a list of omitable tags; you need to take element containment rules into account, too.
Also, even though the full parsing algorithm is suprisingly complicated even for valid documents, the standard algorithm and real-world HTML parsers are even more complicated, because they try to deal gracefully with web pages which don't conform to the standard.

Related

Do we really need to use </body> and </html> closing tags?

More often than not I see HTML without the closing tags, especially body and html.
According to:
http://www.w3.org/TR/html5/sections.html#the-body-element
http://www.w3.org/TR/html5/semantics.html#the-html-element
This can be omitted, but what about cross device issues? Like running such HTML on androids or windows phone's or whatever you know where not having these closing tags this would not work.
Do we need it? Well that depends on your DTD. If you're using XHTML, then yes, you will need it to conform. For accessibility sake I would include the closing tags, you never know if there's a screen reader (or other piece of software) out there that only parses valid XHTML, you could be hindering partially sighted people for example.
Google will also, apparently, rank your valid documents higher than invalid documents in their listings.
Here's a document by a friend of a friend that answers your question a bit better; granted that it was written in 2008, I think some of the points still apply.
If you ever need to use the same html in an XHTML application you won't need to mess around with it, you can just copy it across and not have to worry about conforming (because you already are).
On a separate note, you are essentially future proofing your markup. Who's to say that the spec won't eventually change to "You must include the closing head and body tags"? You won't need to worry if you already have them. It is, however, highly unlikely that the spec will change to, "You must not include the closing head and body tags".
As a great man once said:
Should I close the lid of the toilet when I'm finished? Yes,
especially if the wife is going to use it after me.
- Darren Gourley (Nov 2015)
use https://validator.w3.org/
select your standards target... if that says it passes then I think its Gd enough.
Please bared in mind the HTML5 spec is still being defined/evolved.
Technically you could omit html, head and body tags all together as long as the markup follows the following conditions:
http://www.w3.org/TR/2011/WD-html5-20110525/syntax.html#optional-tags
In regards to your comment about half your team using them and half your team not I would suggest that as long as either option is technically standards compliant you just choose one and move on as the entire subject is open for discussion and interpretation. My personal opinion would be that it's probably more important for your team to get on the same page and produce work of a similar standard especially if you have more than one person working on a project simultaneously.
You can leave out the end tags. Indeed you can leave out the opening tags, too (obviously not if you are using any attributes on them).
Not only is that the case with the more recent standard, but it's been the case since the very beginning. (The obvious exception being if you are using the XML syntax, since XML itself requires all elements have an explicit closing tag.)
Browsers have been dealing with the existence of HTML documents lacking the trailing closing tags since the 1990s. If the standards hadn't allowed it they'd probably still have dealt with them, much as they try their best to deal with all manner of messy code. (This causes it's own problems, which was one of the motivations behind XML not allowing optional tags, but that's another matter).
Many people consider it poor style. I would be one of them. But it's certainly widely supported.

Why does CSS work with fake elements?

In my class, I was playing around and found out that CSS works with made-up elements.
Example:
imsocool {
color:blue;
}
<imsocool>HELLO</imsocool>
When my professor first saw me using this, he was a bit surprised that made-up elements worked and recommended I simply change all of my made up elements to paragraphs with ID's.
Why doesn't my professor want me to use made-up elements? They work effectively.
Also, why didn't he know that made-up elements exist and work with CSS. Are they uncommon?
Why does CSS work with fake elements?
(Most) browsers are designed to be (to some degree) forward compatible with future additions to HTML. Unrecognised elements are parsed into the DOM, but have no semantics or specialised default rendering associated with them.
When a new element is added to the specification, sometimes CSS, JavaScript and ARIA can be used to provide the same functionality in older browsers (and the elements have to appear in the DOM for those languages to be able to manipulate them to add that functionality).
(There is a specification for custom elements, but they have specific naming requirements and require registering using JavaScript.)
Why doesn't my professor want me to use made-up elements?
They are not allowed by the HTML specification
They might conflict with future standard elements with the same name
There is probably an existing HTML element that is better suited to the task
Also; why didn't he know that made-up elements existed and worked with CSS. Are they uncommon?
Yes. People don't use them because they have the above problems.
TL;DR
Custom tags are invalid in HTML. This may lead to rendering issues.
Makes future development more difficult since code is not portable.
Valid HTML offers a lot of benefits such as SEO, speed, and professionalism.
Long Answer
There are some arguments that code with custom tags is more usable.
However, it leads to invalid HTML. Which is not good for your site.
The Point of Valid CSS/HTML | StackOverflow
Google prefers it so it is good for SEO.
It makes your web page more likely to work in browsers you haven't tested.
It makes you look more professional (to some developers at least)
Compliant browsers can render [valid HTML faster]
It points out a bunch of obscure bugs you've probably missed that affect things you probably haven't tested e.g. the codepage or language set of the page.
Why Validate | W3C
Validation as a debugging tool
Validation as a future-proof quality check
Validation eases maintenance
Validation helps teach good practices
Validation is a sign of professionalism
YADA (yet another (different) answer)
Edit: Please see the comment from BoltClock below regarding type vs tag vs element. I usually don't worry about semantics but his comment is very appropriate and informative.
Although there are already a bunch of good replies, you indicated that your professor prompted you to post this question so it appears you are (formally) in school. I thought I would expound a little bit more in depth about not only CSS but also the mechanics of web browsers. According to Wikipedia, "CSS is a style sheet language used for describing ... a document written in a markup language." (I added the emphasis on "a") Notice that it doesn't say "written in HTML" much less a specific version of HTML. CSS can be used on HTML, XHTML, XML, SGML, XAML, etc. Of course, you need something that will render each of these document types that will also apply styling. By definition, CSS does not know / understand / care about specific markup language tags. So, the tags may be "invalid" as far as HTML is concerned, but there is no concept of a "valid" tag/element/type in CSS.
Modern visual browsers are not monolithic programs. They are an amalgam of different "engines" that have specific jobs to do. At a bare minimum I can think of 3 engines, the rendering engine, the CSS engine, and the javascript engine/VM. Not sure if the parser is part of the rendering engine (or vice versa) or if it is a separate engine, but you get the idea.
Whether or not a visual browser (others have already addressed the fact that screen readers might have other challenges dealing with invalid tags) applies the formatting depends on whether the parser leaves the "invalid" tag in the document and then whether the rendering engine applies styles to that tag. Since it would make it more difficult to develop/maintain, CSS engines are not written to understand that "This is an HTML document so here are the list of valid tags / elements / types." CSS engines simply find tags / elements / types and then tell the rendering engine, "Here are the styles you should apply." Whether or not the rendering engine decides to actually apply the styles is up it.
Here is an easy way to think of the basic flow from engine to engine: parser -> CSS -> rendering. In reality it is much more convoluted but this is good enough for starters.
This answer is already too long so I will end there.
Unknown elements are treated as divs by modern browsers. That's why they work. This is part of the oncoming HTML5 standard that introduces a modular structure to which new elements can be added.
In older browsers (I think IE7-) you can apply a Javascript-trick after which they will work as well.
Here is a related question I found when looking for an example.
Here is a question about the Javascript fix. Turns out it is indeed IE7 that doesn't support these elements out of the box.
Also; why didn't he know that made-up tags existed and worked with CSS. Are they uncommon?
Yes, quite. But especially: they don't serve additional purpose. And they are new to html5. In earlier versions of HTML an unknown tag was invalid.
Also, teachers seem to have gaps in their knowledge, sometimes. This might be due to the fact that they need to teach students the basics about a given subject, and it doesn't really pay off to know all ins and outs and be really up to date.
I once got detention because a teacher thought I programmed a virus, just because I could make a computer play music using the play command in GWBasic. (True story, and yes, long ago). But whatever the reason, I think the advice not to use custome elements is a sound one.
Actually you can use custom elements. Here is the W3C spec on this subject:
http://w3c.github.io/webcomponents/spec/custom/
And here is a tutorial explaining how to use them:
http://www.html5rocks.com/en/tutorials/webcomponents/customelements/
As pointed out by #Quentin: this is a draft specification in the early days of development, and that it imposes restrictions on what the element names can be.
There are a few things about the other answers that are either just poorly phrased or perhaps a little incorrect.
FALSE(ish): Non-standard HTML elements are "not allowed", "illegal", or "invalid".
Not necessarily. They're "non-conforming". What's the difference? Something can "not conform" and still be "allowed". The W3C aren't going to send the HTML police to your home and haul you away.
The W3C left things this way for a reason. Conformance and specifications are defined by a community. If you happen to have a smaller community consuming HTML for more specific purposes and they all agree on some new Elements they need to make things easier, they can have what the W3C refers to as "other applicable specifications". (this is a gross over simplification, obviously, but you get the idea)
That said, strict validators will declare your non-standard elements to be "invalid". but that's because the validator's job is to ensure conformance to whatever spec it's validating for, not to ensure "legality" for the browser or for use.
FALSE(ish): Non-standard HTML elements will result in rendering issues
Possibly, but unlikely. (replace "will" with "might") The only way this should result in a rendering issue is if your custom element conflicts with another specification, such as a change to the HTML spec or another specification being honored within the same system (such as SVG, Math, or something custom).
In fact, the reason CSS can style non-standard tags is because the HTML specification clearly states that:
User agents must treat elements and attributes that they do not understand as semantically neutral; leaving them in the DOM (for DOM processors), and styling them according to CSS (for CSS processors), but not inferring any meaning from them
Note: if you want to use a custom tag, just remember a change to the HTML spec at a later time could blow your styling up, so be prepared. It's really unlikely that the W3C will implement the <imsocool> tag, however.
Non-standard tags and JavaScript (via the DOM)
The reason you can access and alter custom elements using JavaScript is because the specification even talks about how they should be handled in the DOM, which is the (really horrible) API that allows you to manipulate the elements on your page.
The HTMLUnknownElement interface must be used for HTML elements that are not defined by this specification (or other applicable specifications).
TL;DR: Conforming to the spec is done for purposes of communication and safety. Non-conformance is still allowed by everything but a validator, whose sole purpose is to enforce conformity, but whose use is optional.
For example:
var wee = document.createElement('wee');
console.log(wee.toString()); //[object HTMLUnknownElement]
(I'm sure this will draw flames, but there's my 2 cents)
According to the specs:
CSS
A type selector is the name of a document language element type written using the syntax of CSS qualified names
I thought this was called the element selector, but apparently it is actually the type selector. The spec goes on to talk about CSS qualified names which put no restriction on what the names actually are. That is to say that as long as the type selector matches CSS qualified name syntax it is technically correct CSS and will match the element in the document. There is no CSS-specific restriction on elements that do not exist in a particular spec -- HTML or otherwise.
HTML
There is no official restriction on including any tags in the document that you want. However, the documentation does say
Authors must not use elements, attributes, or attribute values for purposes other than their appropriate intended semantic purpose, as doing so prevents software from correctly processing the page.
And it later says
Authors must not use elements, attributes, or attribute values that are not permitted by this specification or other applicable specifications, as doing so makes it significantly harder for the language to be extended in the future.
I'm not sure specifically where or if the spec says that unkown elements are allowed, but it does talk about the HTMLUnknownElement interface for unrecognized elements. Some browsers may not even recognize elements that are in the current spec (IE8 comes to mind).
There is a draft for custom elements, though, but I doubt it is implemented anywhere yet.
This is possible with html5 but you need to take into consideration of older browsers.
If you do decide to use them then, make sure to COMMENT your html!! Some people may have some trouble figuring out what it is so a comment could save them a ton of time.
Something like this,
<!-- Custom tags in use, refer to their CSS for aid -->
When you make your own custom tag/elements the older browsers will have no clue what that is just like html5 elements like nav/section.
If you are interested in this concept then I recommend to do it the right way.
Getting started
Custom Elements allow web developers to define new types of HTML
elements. The spec is one of several new API primitives landing under
the Web Components umbrella, but it's quite possibly the most
important. Web Components don't exist without the features unlocked by
custom elements:
Define new HTML/DOM elements Create elements that extend from other
elements Logically bundle together custom functionality into a single
tag Extend the API of existing DOM elements
There is a lot you can do with it and it does make your script beautiful as this article likes to put it. Custom Elements defining new elements in HTML.
So lets recap,
Pros
Very elegant and easy to read.
It is nice to not see so many divs. :p
Allows a unique feel to the code
Cons
Older browser support is a strong thing to consider.
Other developers may have no clue what to do if they don't know about custom tags. (Explain to them or add comments to inform them)
Lastly one thing to take into consideration, but I am unsure, is block and inline elements. By using custom tags you are going to end up writing more css because of the custom tag won't have a default side to it.
The choice is entirely up to you and you should base it on what the project is asking for.
Update 1/2/2014
Here is a very helpful article I found and figured I would share, Custom Elements.
Learn the tech Why Custom Elements? Custom Elements let authors define
their own elements. Authors associate JavaScript code with custom tag
names, and then use those custom tag names as they would any standard
tag.
For example, after registering a special kind of button called
super-button, use the super button just like this:
Custom elements are still elements. We
can create, use, manipulate, and compose them just as easily as any
standard or today.
This seems like a very good library to use but I did notice it didn't pass Window's Build status. This is also in a pre-alpha I believe so I would keep an eye on this while it develops.
Why doesn't he want you to use them? They are not common nor part of the HTML5 standard.
Technically, they are not allowed. They are a hack.
I like them myself, though. You may be interested in XHTML5. It allows you to define your own tags and use them as part of the standard.
Also, as others have pointed out, they are invalid and thus not portable.
Why didn't he know that they exist? I don't know, except that they are not common. Possibly he was just not aware that you could.
Made-up tags are hardly ever used, because it's unlikely that they will work reliably in every current browser, and every future browser.
A browser has to parse the HTML code into elements that it knows, to made-up tags will be converted into something else to fit in the document object model (DOM). As the web standards doesn't cover how to handle everyting that is outside of the standards, web browsers tend to handle non-standars code in different ways.
Web development is tricky enough with a bunch of different browsers that have their own quirks, without adding another element of uncertainty. The best bet it to stick with things that are actually in the standards, that is what the browser vendors try to follow, so that has the best chance to actually work.
I think made-up tags are just potentially more confusing or unclear than p's with IDs (some block of text generally). We all know a p with an ID is a paragraph, but who knows what made-up tags are intended for? At least that's my thought. :) Therefore this is more of a style / clarity issue than one of functionality.
Others have made excellent points but its worth noting that if you look at a framework such as AngularJS, there is a very valid case for custom elements and attributes. These convey not only better semantic meaning to the xml, but they also can provide behavior, look and feel for the web page.
CSS is a style sheet language that can be used to present XML documents, not only (X)HTML documents. Your snippet with the made-up tags could be part of a legal XML document; it would be one if you enclose it in a single root element. Probably you already have a <html> ...</html> around it? Any current browser can display XML documents.
Of course it is not a very good XML document, it lacks a grammar and an XML declaration. If you use an HTML declaration header instead (and probably a server configuration that sends the correct mime type) it would instead be illegal HTML.
(X)HTML has advantages over plain XML as elements have a semantic meaning that is useful in the context of a web page presentation. Tools can work with this semantics, other developers know the meaning, it is less error prone and better to read.
But in other contexts it is better to use CSS with XML and/or XSLT to do the presentation. This is what you did. As this wasn't your task, you didn't know what you were doing, and HTML/CSS is the better way to go most of the time you should stick to it in your scenario.
You should add an (X)HTML header to your document so tools can give you meaningful error messages.
...I simply change all of my made up tags to paragraphs with ID's.
I actually take issue with his suggestion of how to do it properly.
A <p> tag is for paragraphs. I see people using it all the time instead of a div -- simply for spacing purposes or because it seems gentler. If it's not a paragraph, don't use it.
You don't need or want to stick ID's on everything unless you need to target it specifically (e.g. with Javascript). Use classes or just a straight-up div.
From its early days CSS was designed to be markup agnostic so it can be used with any markup language producing tree alike DOM structures (SVG for example). Any tag that comply to name token production is perfectly valid in CSS. So your question is rather about HTML than CSS itself.
Elements with custom tags are supported by HTML5 specification. HTML5 standardize the way how unknown elements must be parsed in the DOM. So HTML5 is the first HTML specification that enables custom elements strictly speaking. You just need to use HTML5 doctype <!DOCTYPE html> in your document.
As of custom tag names themselves...
This document http://www.w3.org/TR/custom-elements/ recommends custom tags you choose to contain at least one '-' (dash) symbol. This way they will not conflict with future HTML elements. Therefore you'd better change your doc to something like this:
<style>
so-cool {
color:blue;
}
</style>
<body>
<so-cool>HELLO</so-cool>
</body>
Surprisingly, nobody (including my past self) mentioned accessibility. Another reason that using valid tags instead of custom ones is for compatibility with the greatest amount of software, including screen-readers and other tools that people need for accessibility purposes. Moreover, accessibility laws like WAI require making accessible websites, which generally means requiring them to use valid markup.
Apparently nobody mentioned it, so I will.
This is a by-product of browser wars.
Back in the 1990’s when the Internet was first starting to go mainstream, competition incrased in the browser market. To stay competitive and draw users, some browsers (most notably Internet Explorer) tried to be helpful and “user-friendly” by attempting to figure out what page designers meant and thus allowed markup that are incorrect (e.g., <b><i>foobar</b></i> would correctly render as bold-italics).
This made sense to some degree because if one browser kept complaining about syntax errors while another ate anything you threw at it and spit out a (more-or-less) correct result, then people would naturally flock to the latter.
While many thought the browser wars were over, a new war between browser vendors has reignited in the past few years since Chrome was released, Apple started growing again and pushing Safari, and IE lost its dominance. (You could call it a “cold war” due to the perceived cooperation and support of standards by browser vendors.) Therefore, it is not a surprise that even contemporary browsers which supposedly conform strictly to web standards actually try to be “clever” and allow standard-breaking behavior such as this in order to try to gain an advantage as before.
Unfortunately, this permissive behavior led to a massive (some might even say cancerous) growth of poorly marked up webpages. Because IE was the most lenient and popular browser, and due to Microsoft’s continued flouting of standards, IE became infamous for encouraging and promoting bad design and propagating and perpetuating broken pages.
You may be able to get away with using quirks and exploits like that on some browsers for now, but other than the occasional puzzle or game or something, you should always stick to web standards when creating web pages and sites to ensure they display correctly and avoid them becoming broken (possibly completely ignored) with a browser update.
While browsers will generally relate CSS to HTML tags regardless of whether or not they are valid, you should ABSOLUTELY NOT do this.
There is technically nothing wrong with this from a CSS perspective. However, using made up tags is something you should NEVER do in HTML.
HTML is a markup language, which means that each tag corresponds to a specific type of information.
Your made up tags don't correspond to any type of information. This will create problems from web crawlers, such as Google.
Read more information on the importance of correct markup.
Edit
Divs refer to groups of multiple related elements, meant to be displayed in block form and can be manipulated as such.
Spans refer to elements that are to be styled differenly than the context they are currently in and are meant to be displayed inline, not as a block. An example is if a few words in a sentence needs to be all caps.
Custom tags do not correlate to any standards and thus span/div should be used with class/ID properties instead.
There are very specific exemptions to this, such as Angular JS
Although CSS has a thing called a "tag selector," it doesn't actually know what a tag is. That's left for the document's language to define. CSS was designed to be used not just with HTML, but also with XML, where (assuming you're not using a DTD or other validation scheme) the tags can be just about anything. You could use it with other languages too, though you would need to come up with your own semantics for exactly what things like "tags" and "attributes" correspond to.
Browsers generally apply CSS to unknown tags in HTML, because this is considered better than breaking completely: at least they can display something. But it is very bad practice to use "fake" tags deliberately. One reason for this is that new tags do get defined from time to time, and if one is defined that looks sort of like your fake tag but doesn't quite work the same way, that can cause problems with your site on new browsers.
Why does CSS work with fake elements? Because it doesn't hurt anyone because you're not supposed to use them anyways.
Why doesn't my professor want me to use made-up elements? Because if that element is defined by a specification in the future your element will have an unpredictable behavior.
Also, why didn't he know that made-up elements exist and work with CSS. Are they uncommon? Because he, like most other web developers, understand that we shouldn't use things that might break randomly in the future.

HTML5: Why not use my own arbitrary tag names?

As far as I know, it is perfectly legal to make up tag names in HTML5, and they work like normal with CSS styling and nesting and all.
Of course my arbitrary tag names will make no difference to the browser, which does not understand them, but it goes very far in making my code more readable, which makes it easier to maintain.
So why should I NOT use arbitrary tag names on my pages? Will it affect SEO? Will it break anything?
IMPORTANT EDIT: Older browsers DO NOT choke on unsupported tags when used with http://ejohn.org/blog/html5-shiv/
UPDATE
The original answer is quite old and was created before web components existed (although they had been discussed since 2011). The rules on custom elements have changed quite a bit. The W3C now has a whole standard for custom elements
Custom elements provide a way for authors to build their own fully-featured DOM elements. Although authors could always use non-standard elements in their documents, with application-specific behaviour added after the fact by scripting or similar, such elements have historically been non-conforming and not very functional. By defining a custom element, authors can inform the parser how to properly construct an element and how elements of that class should react to changes.
In other words, the W3C is now fine with you using custom elements (or arbitrary tag names as the question calls them). However, there are quite a few rules surrounding them and their use so you should refer to the linked documentation.
Original answer
Older browsers choke on tags they don't recognize. There is nothing that guarantees that your custom tags will work as you expect in any browser
There is no semantic meaning applied to your tags since this meaning only exists in the agreed upon tags in the spec. Whether or not this affects SEO is unknown, but anything that parses the HTML may have trouble if it runs into a tag it doesn't recognize including screen readers or web scrapers
You may not get the help you need from the browser if your tags are improperly nested
In the same vein, some browsers may screw up rendering and be different from one another. For example, <p><ul></ul></p> is parsed in Chrome as <p></p><ul></ul>. There is no guarantee how your tags will be treated since they have no agreed upon content model
The W3C doesn't want you to do it.
Authors must not use elements, attributes, or attribute values that are not permitted by this specification or other applicable specifications, as doing so makes it significantly harder for the language to be extended in the future.
If you really need to use custom tags for this purpose, you can use a template language of some sort such as XSLT and create a stylesheet that transforms it into valid HTML

Is leaving out end tags valid?

I remember reading a while ago that in some cases leaving out end tags (</li>, for example) speeds up the rendering (and loading/parsing, since there is less bytes) of a webpage?
Unfortunately, I forgot where I read this, but I remember it saying this feature was specific to HTML 4.0.
Since I no longer have access to this source I was wondering if someone can confirm this or link to the documentation on w3c (since I wasn't able it find it myself)?
Thanks!
EDIT: Forgot to mention that I meant to ask if this behaviour is also available in HTML5.
EDIT 2: I manged to find the article again, and it does mention it only speeds the download speed of the page, not actual rendering:
One good reason for leaving out the end tags for these elements is because they add extra characters to the page download and thus slow down the pages. If you are looking for things to do to speed up your web page downloads, getting rid of optional closing tags is a good place to start. For documents that have lots of paragraphs or table cells this can be a significant savings.
Sorry for asking a pointless question! :(
Here is the list of HTML 4.01 elements.
http://www.w3.org/TR/html401/index/elements.html
The End Tag column says where end tags are optional.
However, take note that this is valid only in HTML 4.01. In Xhtml, all end tags are required. Not 100% sure about HTML5.
I wrote a HTML parser once, and believe me, if you're a parser and you're inside a <p> and you encounter a </table> end tag, it's slower to check in your document tree if that is correct, and if so, to close the current <p> first, than if you simply encounter a </p>.
Edit:
Ah, found it: http://dev.w3.org/html5/html-author/#index-of-elements
Same requirements as HTML 4.01.
New edit:
Oh, that was a page from 2009. This one is more up to date:
http://dev.w3.org/html5/spec/syntax.html#optional-tags
Some tags in some version of the HTML spec have optional end tags. However, I believe it is generally considered bad form to exclude the end tag.
As mentioned, the end tag of li is optional in html4:
http://www.w3.org/TR/html401/struct/lists.html#h-10.2
so technically this is valid:
<ul>
<li>
text
<li>
<span>stuff</span>
</ul>
But you are only saving 5 characters per li, not really worth what you lose in readability/maintainability.
EDIT: The HTML5 spec is sort of interesting:
An li element's end tag may be omitted if the li element is
immediately followed by another li element or if there is no more
content in the parent element.
Leaving out ending tags is usually forgivable by browsers (it's generally smart enough to know what you're doing). However, any css or js markup properties that the unclosed tag has can affect descendant and/or sibling tags, leaving you scratching your head as to what happened.
While XHTML does expect you to add a closing forward slash to self-contained tags, HTML 5 does not.
XHTML: <img src="" />
HTML5: <img src="">
If you're writing using an xhtml DOCTYPE, then the answer is 'yes', they are required. An xhtml document needs to be valid XML, which means that all tags need to be properly closed.
An HTML document is a bit less fussy. Some tags are specified as being 'self closing', which means you don't need to close them specifically. These include <br>, <img>, etc.
The browsers are generally pretty lenient, because they need to be able to cope with badly written code. But beware that sometimes skipping closing tags can result in different browsers interpreting your code differently, and producing hard-to-debug layout glitches.
In terms of page load speed, you might be right that there would be a marginal gain to be had in download speed and bandwidth costs, but it would be marginal. In terms of rendering, I suspect you'd actually lose speed if you provided invalid HTML, as the browser would have to work harder to parse it.
So even if there is a speed gain to be had it will be marginal, and I don't think skipping closing tags deliberately is a worthwhile exercise. It might possibly be helpful to reduce bandwidth if you're running a site that has massive traffic, but very few of us are writing for Facebook or Google; for virtually everyone else, it's better to write valid code than to try to shave those few bytes.
If you're that worried about bandwidth and page loading speeds, there are likely to be other better ways to reduce your page load sizes than this. For example, compressing your files with gZip will drastically reduce your bandwidth, with zero impact on your code or the browser. gZip compression can be configured in your web server, so you just switch it on and forget about it. You can also 'minify' your CSS and JS code by stripping out unnecessary white space. (HTML can also be minified to a certain extent, but beware that white space is syntactically relevant in HTML, so minifying may not be the right thing to do in all cases).
AFAIK, in XHTML you must always at least self-close a tag <img ... />
In HTML (non xml-html) some tags do not need to be closed. <img> for instance. However, I'd suggest making sure you know exactly which version you're targeting and use W3C's validation service to double-check.
http://validator.w3.org/
I don't see how this would speed things up except that you'd have to send less bytes of data per page (no /'s for some tags, no closing tags for others.) As for building the DOM, I don't know the details of a given implementation (webkit, mozilla, etc) to know which way is faster to parse. I would imagine XML is simply because it is more regular.
EDIT: Yes this behavior is available in HTML5. Note that the help pages are confusing, such as:
http://www.w3schools.com/html5/tag_meta.asp
Meta's in non-xml-html do not require the /, but they can have it. Because of the (in my opinion) leaning towards XML-flavored HTML's the ending slash is more prevalent in written HTML, but you can see they use both styles in the document. The Validator will let you know for sure what you can get away with. :)
In HTML 4.01, which became a W3C Recommendation way back in 1999, you're right:
9.3.1 Paragraphs: the P element
Start tag: required, End tag: optional
http://www.w3.org/TR/1999/REC-html401-19991224/struct/text.html#h-9.3.1
And as for <li>,
Start tag: required, End tag: optional
http://www.w3.org/TR/1999/REC-html401-19991224/struct/lists.html#h-10.2

HTML: Include, or exclude, optional closing tags?

Some HTML1 closing tags are optional, i.e.:
</HTML>
</HEAD>
</BODY>
</P>
</DT>
</DD>
</LI>
</OPTION>
</THEAD>
</TH>
</TBODY>
</TR>
</TD>
</TFOOT>
</COLGROUP>
Note: Not to be confused with closing tags that are forbidden to be included, i.e.:
</IMG>
</INPUT>
</BR>
</HR>
</FRAME>
</AREA>
</BASE>
</BASEFONT>
</COL>
</ISINDEX>
</LINK>
</META>
</PARAM>
Note: xhtml is different from HTML. xhtml is a form of xml, which requires every element have a closing tag. A closing tag can be forbidden in html, yet mandatory in xhtml.
Are the optional closing tags
ideally included, but we'll accept them if you forgot them, or
ideally not included, but we'll accept them if you put them in
In other words, should I include them, or should I not include them?
The HTML 4.01 spec talks about closing element tags being optional, but doesn't say if it's preferable to include them, or preferable to not include them.
On the other hand, a random article on DevGuru says:
The ending tag is optional. However, it is recommended that it be included.
The reason I ask is because you just know it's optional for compatibility reasons; and they would have made them (mandatory | forbidden) if they could have.
Put it another way: What did HTML 1, 2, 3 do with regards to these, now optional, closing tags. What does HTML 5 do? And what should I do?
Note
Some elements in HTML are forbidden from having closing tags. You may disagree with that, but that is the specification, and it's not up for debate. I'm asking about optional closing tags, and what the intention was.
Footnotes
1HTML 4.01
There are cases where explicit tags help, but sometimes it's needless pedantry.
Note that the HTML spec clearly specifies when it's valid to omit tags, so it's not always an error.
For example you never need </body></html>. Nobody ever remembers to put <tbody> explicitly (to the point that XHTML made exceptions for it).
You don't need </head><body> unless you have DOM-manipulating scripts that actually search <head> (then it's better to close it explicitly, because rules for implied end of <head> could surprise you).
Nested lists are actually better off without </li>, because then it's harder to create erroneous ul > ul tree.
Valid:
<ul>
<li>item
<ul>
<li>item
</ul>
</ul>
Invalid:
<ul>
<li>item</li>
<ul>
<li>item</li>
</ul>
</ul>
And keep in mind that end tags are implied whether you try to close all elements or not. Putting end tags won't automatically make parsing more robust:
<p>foo <p>bar</p> baz</p>
will parse as:
<p>foo</p><p>bar</p> baz
It can only help when you validate documents.
The optional ones are all ones that should be semantically clear where they end, without needing the end tag.
E.G. each <li> implies a </li> if there isn't one right before it.
The forbidden end tags all would be immediately followed by their end tag so it would be kind of redundant to have to type <img src="blah" alt="blah"></img> every time.
I almost always use the optional tags (unless I have a very good reason not to) because it lends to more readable and updateable code.
I am adding some links here to help you with the history of HTML, for you to understand the various contradictions. This is not the answer to your question, but you will know more after reading these various digests.
How Did We Get Here? – Dive Into HTML5
The History of the Web
Brief History of HTML
HTML’s History – HTML WG Wiki
Some excerpts from Dive Into HTML5:
[T]he fact that “broken” HTML markup still worked in web browsers led authors to create broken HTML pages. A lot of broken pages. By some estimates, over 99% of HTML pages on the web today have at least one error in them. But because these errors don’t cause browsers to display visible error messages, nobody ever fixes them.
The W3C saw this as a fundamental problem with the web, and they set out to correct it. XML, published in 1997, broke from the tradition of forgiving clients and mandated that all programs that consumed XML must treat so-called “well-formedness” errors as fatal. This concept of failing on the first error became known as “draconian error handling,” after the Greek leader Draco who instituted the death penalty for relatively minor infractions of his laws. When the W3C reformulated HTML as an XML vocabulary, they mandated that all documents served with the new application/xhtml+xml MIME type would be subject to draconian error handling. If there was even a single well-formedness error in your XHTML page […] web browsers would have no choice but to stop processing and display an error message to the end user.
This idea was not universally popular. With an estimated error rate of 99% on existing pages, the ever-present possibility of displaying errors to the end user, and the dearth of new features in XHTML 1.0 and 1.1 to justify the cost, web authors basically ignored application/xhtml+xml. But that doesn’t mean they ignored XHTML altogether. Oh, most definitely not. Appendix C of the XHTML 1.0 specification gave the web authors of the world a loophole: “Use something that looks kind of like XHTML syntax, but keep serving it with the text/html MIME type.” And that’s exactly what thousands of web developers did: they “upgraded” to XHTML syntax but kept serving it with a text/html MIME type.
Even today, millions of web pages claim to be XHTML. They start with the XHTML doctype on the first line, use lowercase tag names, use quotes around attribute values, and add a trailing slash after empty elements like <br /> and <hr />. But only a tiny fraction of these pages are served with the application/xhtml+xml MIME type that would trigger XML’s draconian error handling. Any page served with a MIME type of text/html — regardless of doctype, syntax, or coding style — will be parsed using a “forgiving” HTML parser, silently ignoring any markup errors, and never alerting end users (or anyone else) even if the pages are technically broken.
XHTML 1.0 included this loophole, but XHTML 1.1 closed it, and the never-finalized XHTML 2.0 continued the tradition of requiring draconian error handling. And that’s why there are billions of pages that claim to be XHTML 1.0, and only a handful that claim to be XHTML 1.1 (or XHTML 2.0). So are you really using XHTML? Check your MIME type. (Actually, if you don’t know what MIME type you’re using, I can pretty much guarantee that you’re still using text/html.) Unless you’re serving your pages with a MIME type of application/xhtml+xml, your so-called “XHTML” is XML in name only.
[T]he people who had proposed evolving HTML and HTML forms were faced with two choices: give up, or continue their work outside of the W3C. They chose the latter, registered the whatwg.org domain, and in June 2004, the WHAT Working Group was born.
[T]he WHAT working group was quietly working on a few other things, too. One of them was a specification, initially dubbed Web Forms 2.0, which added new types of controls to HTML forms. (You’ll learn more about web forms in A Form of Madness.) Another was a draft specification called “Web Applications 1.0,” which included major new features like a direct-mode drawing canvas and native support for audio and video without plugins.
In October 2009, the W3C shut down the XHTML 2 Working Group and issued this statement to explain their decision:
When W3C announced the HTML and XHTML 2 Working Groups in March 2007, we indicated that we would continue to monitor the market for XHTML 2. W3C recognizes the importance of a clear signal to the community about the future of HTML.
While we recognize the value of the XHTML 2 Working Group’s contributions over the years, after discussion with the participants, W3C management has decided to allow the Working Group’s charter to expire at the end of 2009 and not to renew it.
The ones that win are the ones that ship.
The reason i ask is because you just know it's optional for compatibility reasons; and they would have made them (mandatory | forbidden) if they could have.
That's an interesting inference. My reading of it is that just about any time a tag could be reliably inferred, the tag is optional. The design suggests that the intention was to make it quick and easy to write.
What did HTML 1, 2, 3 do with regards to these, now optional, closing tags.
The DTD for HTML 2 is embedded in the RFC which, along with the original HTML DTD, has optional start and end tags all over the place.
HTML 3 was abandoned (thanks to the browser wars) and replaced with HTML 3.2 (which was designed to describe the then current state of the web).
What does HTML 5 do?
HTML 5 was geared towards "paving the cowpaths" from the outset.
And what should i do?
Ah, now that is subjective and argumentative :)
Some people think that explicit tags are better for readability and maintainability by virtue of being in front of the readers eyes.
Some people think that inferred tags are better for readability and maintainability by virtue of not cluttering up the editor.
What does HTML 5 do?
The answer to this question is in the W3C Working Draft:
http://www.w3.org/TR/html5/syntax.html#syntax-tag-omission
And what should i do?
It's a matter of style. I try to never omit end tags because it helps me to be rigorous and not omit tags that are necessary.
If it is superfluous, leave it out.
If it serves a purpose (even a seemingly trivial purpose, such as appeasing your IDE or appeasing your eyes), leave it in.
It's rare in a well-defined spec to see optional items that do not affect behavior. With the exception of "comments", of course. But the HTML spec is less of a design spec, and more of a document of the state of current major implementations. So when an item is optional in HTML and it seems to serve no purpose, we may guess that optional nature is merely documentation of a quirk in specific browser.
Looking at the HTML-5 spec RFC section linked above, you see that the optional tags are strangely linked to the presence of comments! That should tell you that the authors are not wearing design hats. They are instead playing the game of "document the quirks" in major implementations. So we can't take the spec too seriously in this respect.
So, the solution is: Don't sweat it. Move on to something that actually matters. :)
I think the best answer is to include closing tags for readability or error detection. However, if you have lots of generated HTML (say, tables of data), you could save significant bandwidth by omitting optional tags.
My recommendation is that you omit most optional close tags, and all optional attributes that you can get away with. Many IDEs will complain so you may not be able to get away with omitting some of these but it is generally better for smaller file size and less clutter. If you have code generators definitely omit end tags there because you can get some good size reduction from it. Usually it doesn't really matter one way or the other.
But when it does matter then act on it. On some recent work of mine I was able to reduce the size of my rendered HTML from 1.5 MB to 800 KB by eliminating most of the generated end and redundant value attributes for the open tag, where the text of the element was the same as the value. I have about 200 tags. I could implement this some other way entirely, but that would be more work ($$$), so this allows me to easily make the page more responsive.
Just out of curiosity I found that if I removed quotes around attributes that didn't need them I could save 20 KB, but my IDE (Visual Studio) doesn't like it. I also was surprised to find that the really long ID that ASP.NET generates account for 20% of my file.
The idea that we could ever get any relevant fraction of HTML strictly valid was misguided in the first place, so do whatever works best for you and your customers. Most tools that I have ever seen or used will say they generate xhtml, but they don't really work 100%, and there isn't any benefit to strict adherence anyway.
Personally, I'm a fan of XHTML and, like ghoppe, "I try to never omit end tags because it helps me to be rigorous and not omit tags that are necessary."
but
If you're deliberately using HTML 4.n, one can't argue that including them makes it easier to consume the document, as the notion of well-formedness as opposed to validity is an XML concept, and you lose that benefit when you forbid certain close tags. So the only issue becomes validity... and if it's still valid without them... you might as well save the bandwidth, no?
Using end tags makes dealing with fragments easier because their behaviour is not dependant on sibling elements. This reason alone should be compelling enough. Does anyone deal with monolithic html documents anymore?
In some curly bracket languages like C#, you can omit the curly braces around an if statement if its only two lines long. for example...
if ([condition])
[code]
but you can't do this...
if ([condition])
[code]
[code]
the third line won't be a part of the if statement. it hurts readability, and bugs can be easily introduced, and be difficult to find.
for the same reasons, i close all tags. tags like the img tag do still need to be closed, just not with a separate closing tag.
Do whatever you feel makes the code more readable and maintainable.
Personally I would always be inclined to close <td> and <tr>, but I would never bother with <li>.
If you were writing an HTML parser, would it be easier to parse HTML that included optional closing tags, or HTML that doesn't? I think the optional closing tags being present would make it easier, as I wouldn't have to infer where the closing tag should be.
For that reason, I always include the optional closing tags - on the theory that my page might render faster, as I'm creating less work for the browser's HTML parser.
For forbidden closing types use syntax like: <img /> With the /> to close the tag which is accepted in xml