What's the key difference between HTML 4 and HTML 5? - html

What are the key differences between HTML4 and HTML5 draft?
Please keep the answers related to changed syntax and added/removed html elements.

HTML5 has several goals which differentiate it from HTML4.
Consistency in Handling Malformed Documents
The primary one is consistent, defined error handling. As you know, HTML purposely supports 'tag soup', or the ability to write malformed code and have it corrected into a valid document. The problem is that the rules for doing this aren't written down anywhere. When a new browser vendor wants to enter the market, they just have to test malformed documents in various browsers (especially IE) and reverse-engineer their error handling. If they don't, then many pages won't display correctly (estimates place roughly 90% of pages on the net as being at least somewhat malformed).
So, HTML5 is attempting to discover and codify this error handling, so that browser developers can all standardize and greatly reduce the time and money required to display things consistently. As well, long in the future after HTML has died as a document format, historians may still want to read our documents, and having a completely defined parsing algorithm will greatly aid this.
Better Web Application Features
The secondary goal of HTML5 is to develop the ability of the browser to be an application platform, via HTML, CSS, and Javascript. Many elements have been added directly to the language that are currently (in HTML4) Flash or JS-based hacks, such as <canvas>, <video>, and <audio>. Useful things such as Local Storage (a js-accessible browser-built-in key-value database, for storing information beyond what cookies can hold), new input types such as date for which the browser can expose easy user interface (so that we don't have to use our js-based calendar date-pickers), and browser-supported form validation will make developing web applications much simpler for the developers, and make them much faster for the users (since many things will be supported natively, rather than hacked in via javascript).
Improved Element Semantics
There are many other smaller efforts taking place in HTML5, such as better-defined semantic roles for existing elements (<strong> and <em> now actually mean something different, and even <b> and <i> have vague semantics that should work well when parsing legacy documents) and adding new elements with useful semantics - <article>, <section>, <header>, <aside>, and <nav> should replace the majority of <div>s used on a web page, making your pages a bit more semantic, but more importantly, easier to read. No more painful scanning to see just what that random </div> is closing - instead you'll have an obvious </header>, or </article>, making the structure of your document much more intuitive.

From Wikipedia:
New parsing rules oriented towards flexible parsing and compatibility
New elements – section, video, progress, nav, meter, time, aside, canvas
New input attributes – dates and times, email, url
New attributes – ping, charset, async
Global attributes (that can be applied for every element) – id, tabindex, repeat
Deprecated elements dropped – center, font, strike

HTML5 introduces a number of APIs that help in creating Web applications. These can be used together with the new elements introduced for applications:
An API for playing of video and audio which can be used with the new video and audio elements.
An API that enables offline Web applications.
An API that allows a Web application to register itself for certain protocols or media types.
An editing API in combination with a new global contenteditable attribute.
A drag & drop API in combination with a draggable attribute.
An API that exposes the history and allows pages to add to it to prevent breaking the back button.

You'll want to check HTML5 Differences from HTML4: W3C Working Group Note 9 December 2014 for the complete differences. There are many new elements and element attributes. Some elements were removed and others have different semantic value than before.
There are also APIs defined, such as the use of canvas, to help build the next generation of web apps and make sure implementations are standardized.

You might be interested in this list of HTML5 elements and attributes.
Also, please note that it's "HTML 4", not "HTML4". Indeed, for HTML 5, both variants are used, but there is an important difference in meaning. HTML 5 refers to the name of the W3C specification, whereas "HTML5" is the document type of those HTML files with a text/html MIME type that follow this spec.
The same goes for XHTML 5 vs. XHTML5.

Now W3c provides an official difference on their site:
http://www.w3.org/TR/html5-diff/

HTML 5 invites you give add a lot of semantic value to your code. What's more, there are natives solution to embed multimedia content.
The rest is important, but it's more technical sugar that will save you from doing the same stuff with a client programming language.

In short it is much simple compared to html, the long doctype is removed and also center and font tag is removed.
I also answered this difference in my blog :
http://ravisinghblog.in/key-difference-between-html-and-html-5/

Related

Why does CSS work with fake elements?

In my class, I was playing around and found out that CSS works with made-up elements.
Example:
imsocool {
color:blue;
}
<imsocool>HELLO</imsocool>
When my professor first saw me using this, he was a bit surprised that made-up elements worked and recommended I simply change all of my made up elements to paragraphs with ID's.
Why doesn't my professor want me to use made-up elements? They work effectively.
Also, why didn't he know that made-up elements exist and work with CSS. Are they uncommon?
Why does CSS work with fake elements?
(Most) browsers are designed to be (to some degree) forward compatible with future additions to HTML. Unrecognised elements are parsed into the DOM, but have no semantics or specialised default rendering associated with them.
When a new element is added to the specification, sometimes CSS, JavaScript and ARIA can be used to provide the same functionality in older browsers (and the elements have to appear in the DOM for those languages to be able to manipulate them to add that functionality).
(There is a specification for custom elements, but they have specific naming requirements and require registering using JavaScript.)
Why doesn't my professor want me to use made-up elements?
They are not allowed by the HTML specification
They might conflict with future standard elements with the same name
There is probably an existing HTML element that is better suited to the task
Also; why didn't he know that made-up elements existed and worked with CSS. Are they uncommon?
Yes. People don't use them because they have the above problems.
TL;DR
Custom tags are invalid in HTML. This may lead to rendering issues.
Makes future development more difficult since code is not portable.
Valid HTML offers a lot of benefits such as SEO, speed, and professionalism.
Long Answer
There are some arguments that code with custom tags is more usable.
However, it leads to invalid HTML. Which is not good for your site.
The Point of Valid CSS/HTML | StackOverflow
Google prefers it so it is good for SEO.
It makes your web page more likely to work in browsers you haven't tested.
It makes you look more professional (to some developers at least)
Compliant browsers can render [valid HTML faster]
It points out a bunch of obscure bugs you've probably missed that affect things you probably haven't tested e.g. the codepage or language set of the page.
Why Validate | W3C
Validation as a debugging tool
Validation as a future-proof quality check
Validation eases maintenance
Validation helps teach good practices
Validation is a sign of professionalism
YADA (yet another (different) answer)
Edit: Please see the comment from BoltClock below regarding type vs tag vs element. I usually don't worry about semantics but his comment is very appropriate and informative.
Although there are already a bunch of good replies, you indicated that your professor prompted you to post this question so it appears you are (formally) in school. I thought I would expound a little bit more in depth about not only CSS but also the mechanics of web browsers. According to Wikipedia, "CSS is a style sheet language used for describing ... a document written in a markup language." (I added the emphasis on "a") Notice that it doesn't say "written in HTML" much less a specific version of HTML. CSS can be used on HTML, XHTML, XML, SGML, XAML, etc. Of course, you need something that will render each of these document types that will also apply styling. By definition, CSS does not know / understand / care about specific markup language tags. So, the tags may be "invalid" as far as HTML is concerned, but there is no concept of a "valid" tag/element/type in CSS.
Modern visual browsers are not monolithic programs. They are an amalgam of different "engines" that have specific jobs to do. At a bare minimum I can think of 3 engines, the rendering engine, the CSS engine, and the javascript engine/VM. Not sure if the parser is part of the rendering engine (or vice versa) or if it is a separate engine, but you get the idea.
Whether or not a visual browser (others have already addressed the fact that screen readers might have other challenges dealing with invalid tags) applies the formatting depends on whether the parser leaves the "invalid" tag in the document and then whether the rendering engine applies styles to that tag. Since it would make it more difficult to develop/maintain, CSS engines are not written to understand that "This is an HTML document so here are the list of valid tags / elements / types." CSS engines simply find tags / elements / types and then tell the rendering engine, "Here are the styles you should apply." Whether or not the rendering engine decides to actually apply the styles is up it.
Here is an easy way to think of the basic flow from engine to engine: parser -> CSS -> rendering. In reality it is much more convoluted but this is good enough for starters.
This answer is already too long so I will end there.
Unknown elements are treated as divs by modern browsers. That's why they work. This is part of the oncoming HTML5 standard that introduces a modular structure to which new elements can be added.
In older browsers (I think IE7-) you can apply a Javascript-trick after which they will work as well.
Here is a related question I found when looking for an example.
Here is a question about the Javascript fix. Turns out it is indeed IE7 that doesn't support these elements out of the box.
Also; why didn't he know that made-up tags existed and worked with CSS. Are they uncommon?
Yes, quite. But especially: they don't serve additional purpose. And they are new to html5. In earlier versions of HTML an unknown tag was invalid.
Also, teachers seem to have gaps in their knowledge, sometimes. This might be due to the fact that they need to teach students the basics about a given subject, and it doesn't really pay off to know all ins and outs and be really up to date.
I once got detention because a teacher thought I programmed a virus, just because I could make a computer play music using the play command in GWBasic. (True story, and yes, long ago). But whatever the reason, I think the advice not to use custome elements is a sound one.
Actually you can use custom elements. Here is the W3C spec on this subject:
http://w3c.github.io/webcomponents/spec/custom/
And here is a tutorial explaining how to use them:
http://www.html5rocks.com/en/tutorials/webcomponents/customelements/
As pointed out by #Quentin: this is a draft specification in the early days of development, and that it imposes restrictions on what the element names can be.
There are a few things about the other answers that are either just poorly phrased or perhaps a little incorrect.
FALSE(ish): Non-standard HTML elements are "not allowed", "illegal", or "invalid".
Not necessarily. They're "non-conforming". What's the difference? Something can "not conform" and still be "allowed". The W3C aren't going to send the HTML police to your home and haul you away.
The W3C left things this way for a reason. Conformance and specifications are defined by a community. If you happen to have a smaller community consuming HTML for more specific purposes and they all agree on some new Elements they need to make things easier, they can have what the W3C refers to as "other applicable specifications". (this is a gross over simplification, obviously, but you get the idea)
That said, strict validators will declare your non-standard elements to be "invalid". but that's because the validator's job is to ensure conformance to whatever spec it's validating for, not to ensure "legality" for the browser or for use.
FALSE(ish): Non-standard HTML elements will result in rendering issues
Possibly, but unlikely. (replace "will" with "might") The only way this should result in a rendering issue is if your custom element conflicts with another specification, such as a change to the HTML spec or another specification being honored within the same system (such as SVG, Math, or something custom).
In fact, the reason CSS can style non-standard tags is because the HTML specification clearly states that:
User agents must treat elements and attributes that they do not understand as semantically neutral; leaving them in the DOM (for DOM processors), and styling them according to CSS (for CSS processors), but not inferring any meaning from them
Note: if you want to use a custom tag, just remember a change to the HTML spec at a later time could blow your styling up, so be prepared. It's really unlikely that the W3C will implement the <imsocool> tag, however.
Non-standard tags and JavaScript (via the DOM)
The reason you can access and alter custom elements using JavaScript is because the specification even talks about how they should be handled in the DOM, which is the (really horrible) API that allows you to manipulate the elements on your page.
The HTMLUnknownElement interface must be used for HTML elements that are not defined by this specification (or other applicable specifications).
TL;DR: Conforming to the spec is done for purposes of communication and safety. Non-conformance is still allowed by everything but a validator, whose sole purpose is to enforce conformity, but whose use is optional.
For example:
var wee = document.createElement('wee');
console.log(wee.toString()); //[object HTMLUnknownElement]
(I'm sure this will draw flames, but there's my 2 cents)
According to the specs:
CSS
A type selector is the name of a document language element type written using the syntax of CSS qualified names
I thought this was called the element selector, but apparently it is actually the type selector. The spec goes on to talk about CSS qualified names which put no restriction on what the names actually are. That is to say that as long as the type selector matches CSS qualified name syntax it is technically correct CSS and will match the element in the document. There is no CSS-specific restriction on elements that do not exist in a particular spec -- HTML or otherwise.
HTML
There is no official restriction on including any tags in the document that you want. However, the documentation does say
Authors must not use elements, attributes, or attribute values for purposes other than their appropriate intended semantic purpose, as doing so prevents software from correctly processing the page.
And it later says
Authors must not use elements, attributes, or attribute values that are not permitted by this specification or other applicable specifications, as doing so makes it significantly harder for the language to be extended in the future.
I'm not sure specifically where or if the spec says that unkown elements are allowed, but it does talk about the HTMLUnknownElement interface for unrecognized elements. Some browsers may not even recognize elements that are in the current spec (IE8 comes to mind).
There is a draft for custom elements, though, but I doubt it is implemented anywhere yet.
This is possible with html5 but you need to take into consideration of older browsers.
If you do decide to use them then, make sure to COMMENT your html!! Some people may have some trouble figuring out what it is so a comment could save them a ton of time.
Something like this,
<!-- Custom tags in use, refer to their CSS for aid -->
When you make your own custom tag/elements the older browsers will have no clue what that is just like html5 elements like nav/section.
If you are interested in this concept then I recommend to do it the right way.
Getting started
Custom Elements allow web developers to define new types of HTML
elements. The spec is one of several new API primitives landing under
the Web Components umbrella, but it's quite possibly the most
important. Web Components don't exist without the features unlocked by
custom elements:
Define new HTML/DOM elements Create elements that extend from other
elements Logically bundle together custom functionality into a single
tag Extend the API of existing DOM elements
There is a lot you can do with it and it does make your script beautiful as this article likes to put it. Custom Elements defining new elements in HTML.
So lets recap,
Pros
Very elegant and easy to read.
It is nice to not see so many divs. :p
Allows a unique feel to the code
Cons
Older browser support is a strong thing to consider.
Other developers may have no clue what to do if they don't know about custom tags. (Explain to them or add comments to inform them)
Lastly one thing to take into consideration, but I am unsure, is block and inline elements. By using custom tags you are going to end up writing more css because of the custom tag won't have a default side to it.
The choice is entirely up to you and you should base it on what the project is asking for.
Update 1/2/2014
Here is a very helpful article I found and figured I would share, Custom Elements.
Learn the tech Why Custom Elements? Custom Elements let authors define
their own elements. Authors associate JavaScript code with custom tag
names, and then use those custom tag names as they would any standard
tag.
For example, after registering a special kind of button called
super-button, use the super button just like this:
Custom elements are still elements. We
can create, use, manipulate, and compose them just as easily as any
standard or today.
This seems like a very good library to use but I did notice it didn't pass Window's Build status. This is also in a pre-alpha I believe so I would keep an eye on this while it develops.
Why doesn't he want you to use them? They are not common nor part of the HTML5 standard.
Technically, they are not allowed. They are a hack.
I like them myself, though. You may be interested in XHTML5. It allows you to define your own tags and use them as part of the standard.
Also, as others have pointed out, they are invalid and thus not portable.
Why didn't he know that they exist? I don't know, except that they are not common. Possibly he was just not aware that you could.
Made-up tags are hardly ever used, because it's unlikely that they will work reliably in every current browser, and every future browser.
A browser has to parse the HTML code into elements that it knows, to made-up tags will be converted into something else to fit in the document object model (DOM). As the web standards doesn't cover how to handle everyting that is outside of the standards, web browsers tend to handle non-standars code in different ways.
Web development is tricky enough with a bunch of different browsers that have their own quirks, without adding another element of uncertainty. The best bet it to stick with things that are actually in the standards, that is what the browser vendors try to follow, so that has the best chance to actually work.
I think made-up tags are just potentially more confusing or unclear than p's with IDs (some block of text generally). We all know a p with an ID is a paragraph, but who knows what made-up tags are intended for? At least that's my thought. :) Therefore this is more of a style / clarity issue than one of functionality.
Others have made excellent points but its worth noting that if you look at a framework such as AngularJS, there is a very valid case for custom elements and attributes. These convey not only better semantic meaning to the xml, but they also can provide behavior, look and feel for the web page.
CSS is a style sheet language that can be used to present XML documents, not only (X)HTML documents. Your snippet with the made-up tags could be part of a legal XML document; it would be one if you enclose it in a single root element. Probably you already have a <html> ...</html> around it? Any current browser can display XML documents.
Of course it is not a very good XML document, it lacks a grammar and an XML declaration. If you use an HTML declaration header instead (and probably a server configuration that sends the correct mime type) it would instead be illegal HTML.
(X)HTML has advantages over plain XML as elements have a semantic meaning that is useful in the context of a web page presentation. Tools can work with this semantics, other developers know the meaning, it is less error prone and better to read.
But in other contexts it is better to use CSS with XML and/or XSLT to do the presentation. This is what you did. As this wasn't your task, you didn't know what you were doing, and HTML/CSS is the better way to go most of the time you should stick to it in your scenario.
You should add an (X)HTML header to your document so tools can give you meaningful error messages.
...I simply change all of my made up tags to paragraphs with ID's.
I actually take issue with his suggestion of how to do it properly.
A <p> tag is for paragraphs. I see people using it all the time instead of a div -- simply for spacing purposes or because it seems gentler. If it's not a paragraph, don't use it.
You don't need or want to stick ID's on everything unless you need to target it specifically (e.g. with Javascript). Use classes or just a straight-up div.
From its early days CSS was designed to be markup agnostic so it can be used with any markup language producing tree alike DOM structures (SVG for example). Any tag that comply to name token production is perfectly valid in CSS. So your question is rather about HTML than CSS itself.
Elements with custom tags are supported by HTML5 specification. HTML5 standardize the way how unknown elements must be parsed in the DOM. So HTML5 is the first HTML specification that enables custom elements strictly speaking. You just need to use HTML5 doctype <!DOCTYPE html> in your document.
As of custom tag names themselves...
This document http://www.w3.org/TR/custom-elements/ recommends custom tags you choose to contain at least one '-' (dash) symbol. This way they will not conflict with future HTML elements. Therefore you'd better change your doc to something like this:
<style>
so-cool {
color:blue;
}
</style>
<body>
<so-cool>HELLO</so-cool>
</body>
Surprisingly, nobody (including my past self) mentioned accessibility. Another reason that using valid tags instead of custom ones is for compatibility with the greatest amount of software, including screen-readers and other tools that people need for accessibility purposes. Moreover, accessibility laws like WAI require making accessible websites, which generally means requiring them to use valid markup.
Apparently nobody mentioned it, so I will.
This is a by-product of browser wars.
Back in the 1990’s when the Internet was first starting to go mainstream, competition incrased in the browser market. To stay competitive and draw users, some browsers (most notably Internet Explorer) tried to be helpful and “user-friendly” by attempting to figure out what page designers meant and thus allowed markup that are incorrect (e.g., <b><i>foobar</b></i> would correctly render as bold-italics).
This made sense to some degree because if one browser kept complaining about syntax errors while another ate anything you threw at it and spit out a (more-or-less) correct result, then people would naturally flock to the latter.
While many thought the browser wars were over, a new war between browser vendors has reignited in the past few years since Chrome was released, Apple started growing again and pushing Safari, and IE lost its dominance. (You could call it a “cold war” due to the perceived cooperation and support of standards by browser vendors.) Therefore, it is not a surprise that even contemporary browsers which supposedly conform strictly to web standards actually try to be “clever” and allow standard-breaking behavior such as this in order to try to gain an advantage as before.
Unfortunately, this permissive behavior led to a massive (some might even say cancerous) growth of poorly marked up webpages. Because IE was the most lenient and popular browser, and due to Microsoft’s continued flouting of standards, IE became infamous for encouraging and promoting bad design and propagating and perpetuating broken pages.
You may be able to get away with using quirks and exploits like that on some browsers for now, but other than the occasional puzzle or game or something, you should always stick to web standards when creating web pages and sites to ensure they display correctly and avoid them becoming broken (possibly completely ignored) with a browser update.
While browsers will generally relate CSS to HTML tags regardless of whether or not they are valid, you should ABSOLUTELY NOT do this.
There is technically nothing wrong with this from a CSS perspective. However, using made up tags is something you should NEVER do in HTML.
HTML is a markup language, which means that each tag corresponds to a specific type of information.
Your made up tags don't correspond to any type of information. This will create problems from web crawlers, such as Google.
Read more information on the importance of correct markup.
Edit
Divs refer to groups of multiple related elements, meant to be displayed in block form and can be manipulated as such.
Spans refer to elements that are to be styled differenly than the context they are currently in and are meant to be displayed inline, not as a block. An example is if a few words in a sentence needs to be all caps.
Custom tags do not correlate to any standards and thus span/div should be used with class/ID properties instead.
There are very specific exemptions to this, such as Angular JS
Although CSS has a thing called a "tag selector," it doesn't actually know what a tag is. That's left for the document's language to define. CSS was designed to be used not just with HTML, but also with XML, where (assuming you're not using a DTD or other validation scheme) the tags can be just about anything. You could use it with other languages too, though you would need to come up with your own semantics for exactly what things like "tags" and "attributes" correspond to.
Browsers generally apply CSS to unknown tags in HTML, because this is considered better than breaking completely: at least they can display something. But it is very bad practice to use "fake" tags deliberately. One reason for this is that new tags do get defined from time to time, and if one is defined that looks sort of like your fake tag but doesn't quite work the same way, that can cause problems with your site on new browsers.
Why does CSS work with fake elements? Because it doesn't hurt anyone because you're not supposed to use them anyways.
Why doesn't my professor want me to use made-up elements? Because if that element is defined by a specification in the future your element will have an unpredictable behavior.
Also, why didn't he know that made-up elements exist and work with CSS. Are they uncommon? Because he, like most other web developers, understand that we shouldn't use things that might break randomly in the future.

Doubts related to html vs html5

Everyone is now talking about HTML5, i dont really understand what is the difference between it and the normal .xhtml
could you please solve some of my doubts:
What are the differences?
What are the advantages and disadvantages?
Is HTML5 considered a markup language or an scripting language?
What are the differences?
HTML 5 is just the next version of HTML. It takes HTML 4 and adds more stuff to it (while throwing out a few bits that shouldn't be used and changes the rules on how to parse it (to what browsers have been doing for the last decade anyway)).
What are the advantages and disadvantages?
It lets you do more stuff.
It doesn't have any disadvantages (other than that the new stuff isn't well supported yet).
Is HTML5 considered a markup language or an scripting language?
It is a markup language with a bunch of DOM APIs.
I guess the confusion is that HTML4 and HTML5 don't do much (on their own). You need javascript and css and that is when the party really starts.
When people are talking about HTML5 I guess they are talking about HTML5+ CSS3 + Javascript (compared to HTML4 + CSS2 + Javascript).
For a good example of the adventures you can have in the new world check this out:
http://slides.html5rocks.com/
Remember this isn't using .NET or PHP or any thing, its "just" HTML5 + javascript + css
pro
fancy new options
more elements to better define your content
video without flash
con
not supported by all browsers
Not a lot of good quality docs available (yet)
And it is still a markup language as it has no dynamic elements in it that are scriptable.
basically html5 is just like html, with some extra tags added,
like the canvas element for drawing, the video and audio elements for media playback, some new content specific elements, like article, footer, header, nav, section. it has also better support for local offline storage. and also some new form controls, like calendar, date, time, email, url and search
So actually not much you couldn't do earlier with some JavaScript (or jquery, as some people like to call it now :p), but it's designed to make those (nowadays) common tasks a lot easier
What are the differences?
See HTML5 differences from HTML4
What are the advantages and disadvantages?
That's going to depend to a certain extent on what you're trying to achieve, but it might help to understand the rationale behind the spec. Basically, backwards compatibility is one of the main goals so, if you avoid using any of the new features like video or canvas, there should be no disadvantages over the previous standards.
Is HTML5 considered a markup language or an scripting language?
From the WHATWG FAQ:
HTML5 is a new version of HTML4,
XHTML1, and DOM Level 2 HTML
addressing many of the issues of those
specifications while at the same time
enhancing (X)HTML to more adequately
address Web applications. Besides
defining a markup language that can be
written in both HTML and XML (XHTML)
it also defines many APIs that form
the basis of the Web architecture.
Some of these APIs were known as "DOM
Level 0" and were never documented
before. Yet they are extremely
important for browser vendors to
support existing Web content and for
authors to be able to build Web
applications.

What is DOM? (summary and importance)

What is the Document Object Model (DOM)?
I am asking this question because I primarily worked in .NET and I only have limited experience, but I often hear more experienced developers talk about/mention it. I read tutorials online but I am unable to make sense of the whole picture. I know that it is an API!
More specific questions are:
Where is it currently used?
What field(s) of developers use it (ex-.NET developers)?
How relevant is it for all developers in general to understand?
In general terms a DOM is a model for a structured document.
It is a central concept in today's IT and no developer can opt out of DOM. Be it in .net, in HTML, in XML or other domains where it is used.
It applies to all documents (word documents, HTML pages, XML files, etc). In the developer sphere it applies mainly in the HTML and the XML domains with slightly different meanings.
HTML
In the HTML arena, the DOM was introduced to support the revolution called in the late 90ies "dynamic HTML". Before IE4 and Netscape 4.0, HTML documents where not changeable inside the browser (all you had in these remote times to sprite up a web page was "animated GIF" !!!! and HTML was version 3.2).
Therefore dynamically manipulating inside the browser the document sent by the server was a huge revolution and initiated the march towards the attractive web sites we see today.
Javascript had been introduced by Netscape (baptised javascript to surf on the new Java trend, but unrelated) and was supported by both Netscape HTTP servers and Netscape browsers, with Internet Explorer eagerly following the move inside the browser. However When javascript is used to manipulate the content of a document, you need an easy way to designate the part of the document you want to interact with. That's where the DOM comes in. Although HTML 4 is not "well formed", browsers build an internal representation of the page with the "body" element at its top and plenty of html tags below, in a hierarchical organisation (child nodes, parent nodes attributes etc). The DOM is the model underpinning the API that allows to navigate this hierarchy.
Since both Netscape and IE browsers were competing solutions, there was little chance the NS and the IE DOM would converge. The W3C stepped in to allow smaller browser vendors to enter the competition and endeavoured to standardised the DOM. Hence the W3C DOM. All it did was just to introduce another dialect and as everybody knows it took years and two serious competitors to force MS to comply with the standards.
Even though more moderns navigating techniques like JQuery have shorthand notations for the DOM, they internally rely on the DOM.
XML
HTML made obvious the disadvantages of showing leniency towards the "well-formedness" of documents and this ushered a new craze : XML. In the web arena, XML and XSLT were first supported by IE5 and adopted in many more domains than just displaying pages.
To parse XML, in the Java Word mainly, you would develop a SAX parser which is basically a plugin to a SAX engine in which you describe what the engine should do of all the XML events (tags...) it will encounter in the parsed document. Developing a SAX parser is not straightforward but is a low footprint solution.
However you have to develop a specific one for each new document type...
So it was not long before libraries started to appear to parse any document and build an in-memory map of its hierarchy. Because it also had the same concepts of root, parents and children (inherited from SGML through HTML), it was also termed a DOM and the name applies regardless of the library.
Other domains
The concept of DOM is not restricted to or even invented for HTML or XML. A DOM is a general concept applicable to any document, especially those (the vast majority of them do) showing a hierarchical structure in which you need to navigate. You can speak about the DOM of a MS-Word document and there are APIs to navigate these as well.
The DOM is the application programming interface for well-defined HTML and XML structures (per W3C's document). It is used in any place where you interact with the elements of a web page (any element - style, text, attributes, etc). You will hear a lot about the DOM with JavaScript and/or JavaScript libraries, such as jQuery (which, of course, is JavaScript). It is also referenced with Java, ECMAScript, JScript, and VBScript.
If you are programming .NET it is important if you are doing web-based work. If you are doing application programming, it's not as important. The DOM is definitely not a thing of the past - it is used and worked within every day by many developers. With that said, there has been work towards standardization of the DOM across web browsers. (Again, libraries can help hide these differences. This is one reason jQuery is so popular. You don't have to worry about the browser specifics - you just do what you need to do.)
The document I linked to above does a great job of answering all your questions and more. I would highly recommend reading it. If you have more questions, you can also check out the links below:
What is the Document Object Model? (W3C)
Document Object Model (Wikipedia)
I'm really not going to be able to explain it any better than the Wikipedia Article on DOM
But to answer a few of your questions:
Where do we still use it?
Every web browser since the mid-nineties.
Who uses it,
Every web developer since the mid-nineties.
in what technology?
Mostly the web via JavaScript, but pretty much anytime you access XML/HTML programatically you are using some kind of DOM implementation.
How important is it for anyone in .net
carrier? [sic]
Extremely, although you probably use it without even knowing it.
Is this just a thing of the past which
was heavily used but had problems?
If it is then somebody needs to tell John Resig that he has wasted the past 3 years of his life.
When a browser loads an HTML page, its convert it to the Document Object Model (DOM).
The browser's produced HTML DOM, constructs as a tree that consists of all your HTML page element as objects. For example, assuming that you load below HTML page on a browser:
<!DOCTYPE html>
<html>
<head>
<title>website title</title>
</head>
<body>
<p id="js_paragraphId">I'm a paragraph</p>
some website
</body>
</html>
After loading, the browser converts it to:
Some of the ability of scripting languages on HTML DOM consists of:
1- Change all the HTML elements in the page.
2- change all the HTML attributes in the page.
3- Change all the CSS styles on the page.
4- Remove existing HTML elements and attributes.
5- Add new HTML elements and attributes.
6- React to all existing HTML events in the page.
7- create new HTML events on the page.
Let's back to your questions:
1- It currently used in all modern browsers.
2- Front-end developers.
3- All Front-end developers that using scripting languages especially JavaScript.
What is DOM?
The Document Object Model (DOM) is a programming API for HTML and XML documents. It defines the logical structure of documents and the way a document is accessed and manipulated. A standard defined by w3 consortium.
Source: http://www.w3.org/TR/WD-DOM/introduction.html

Yet another question regarding the html5 dtd/schema

If there is no DTD or schema to validate the H5 document against, how are we supposed to do document validation? And by document validation, I mean "how are we supposed to ensure our html5 documents are both syntactically accurate and structurally sound?" Please help! This is going to become a huge problem for our industry if we have no way to accurately validate HTML5 documents!
Sure, the W3C has an online tool that validates individual pages. But, if I'm creating A LOT of pages (hundreds, say) and I want to validate them in a sort of batch mode, what is the accepted method of ensuring valid structure and syntax? I mean, it seems rather rudimentary to just look at the document and say "yep. that's a valid xml document." What about custom tags? What about tag attributes? It seems like the W3C is leaving us out in the cold a little bit here.
Maybe the best answer will be found in the HTML editor. But then you get DTD/schema fragmentation. Each editor vendor coming up with their own rendition of what a valid structure is.
Maybe the answer is "wait for HTML5 to become official". But I really can't wait for that. I need to start creating and validating content now. I have applications I want to publish that can only be accomplished with html5.
So, any thoughts?
If there is no DTD or schema to validate the H5 document against, how are we supposed to do document validation?
With a specialized HTML5 validator rather then a generic SGML or XML validator.
Obviously, as the specification is still in draft form, the tools that do exist are immature and likely to be out of date or become out of date.
Sure, the W3C has an online tool that validates individual pages. But, if I'm creating A LOT of pages (hundreds, say) and I want to validate them in a sort of batch mode, what is the accepted method of ensuring valid structure and syntax?
Either use a different tool or download the W3C validator and run a local copy. It has a SOAP API so writing a batch validation tool isn't difficult.
What about custom tags?
HTML5 doesn't allow custom elements.
What about tag attributes?
The only custom attributes in HTML5 are data-* attributes, so an HTML 5 validator can recognize them.
It seems like the W3C is leaving us out in the cold a little bit here.
It seems like you expect the state of QA tools for HTML 5 (unfinished) to be up to the same standard as those for HTML 4 (over a decade old). This isn't a realistic expectation.
Maybe the best answer will be found in the HTML editor. But then you get DTD/schema fragmentation. Each editor vendor coming up with their own rendition of what a valid structure is.
The specification is clear (although in flux) even if it isn't expressed in the form of a DTD or schema. If each editor has a different idea of what is valid, then most or all of them are going to be either out of date or just buggy.
Maybe the answer is "wait for HTML5 to become official". But I really can't wait for that. I need to start creating and validating content now. I have applications I want to publish that can only be accomplished with html5.
If you need to live in the bleeding edge, then you have to accept the limitations and risks of doing so.
You might find this question/answer interesting: Will HTML 5 validation be worth the candle? . The answer is written by the developer of http://about.validator.nu/ .
You should start by taking a look at http://about.validator.nu/ .
Some, though not all, of your concerns are addressed there. You can host your own validator, there's a python based submission script, you can use a RESTFUL web service API and there are ways to get validation output in a variety of different forms.
I can't however see a simple way to integrate XHTML5 with other applications of XML such that one can easily create a validator of such compound documents. Not that there's really been a way to do that with earlier versions of XHTML either though.
This is working well for me: https://github.com/hober/html5-el
To get this to work, I renamed the default '/etc/schema/schemas.xml' file in order to move it out of the way and let the 'html5-el' one be used by nxml-mode.
If there is no DTD or schema to validate the H5 document against, how are we supposed to do document validation? And by document validation, I mean "how are we supposed to ensure our html5 documents are both syntactically accurate and structurally sound?" Please help! This is going to become a huge problem for our industry if we have no way to accurately validate HTML5 documents!
If testing pages with either Firefox or Opera, both of those will report errors such as code that is not "well-formed" and mismatched tags. Beyond that, one of the validators such as validator.w3.org or validator.nu will definitely help.
Sure, the W3C has an online tool that validates individual pages. But, if I'm creating A LOT of pages (hundreds, say) and I want to validate them in a sort of batch mode, what is the accepted method of ensuring valid structure and syntax? I mean, it seems rather rudimentary to just look at the document and say "yep. that's a valid xml document."
There are ways to run the W3C validator in batch mode.
What about custom tags? What about tag attributes? It seems like the W3C is leaving us out in the cold a little bit here.
The easy answer to that one is that "custom tags" are simply not considered valid. The Working Group has thoroughly addressed the issue of "distributed extensibility", particularly with respect to allowing "decentralized
parties to create their own languages" and "extension attributes" (http:// lists.w3.org/Archives/Public/public-html/2011Feb/0085.html). There are numerous ways to extend HTML (http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#extensibility) but adding custom tags is not one of them. Custom data and microdata attributes should validate fine.
Maybe the answer is "wait for HTML5 to become official". But I really can't wait for that. I need to start creating and validating content now. I have applications I want to publish that can only be accomplished with html5.
Since HTML 5 was stabilized at the end of last year (Dec. 2010), IMO we don't need to wait for it to become an official "recommendation" by the W3C. The stabilized spec provides a solid base that all browser vendors can implement consistently and for the ongoing evolution beyond HTML 5 of the spec, which is now being called the "HTML Living Standard" (Jan. 2011 and later). There is a good diagram of this at http://www.HTML-5.com/html-versions-and-history.html#html-versions (scroll down to see the diagram).

Is semantic markup too open-ended? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I am taking a peek at Dive Into HTML5. It seems nice and interesting, but I am puzzled.
In the 1990s, at the time when Netscape was the browser and HTML was HTML2 or HTML3, there were a lot of tags: address, cite, code... Most of them are unused as of today, probably even obsolete.
HTML5 introduces tags to express "semantic meaning" to the tag itself. This is all fun and games, but I see something very strange in this approach. Technically, the semantics can be very open ended. HTML5 has tags for article, time, navigation bars, footer. Why shouldn't it contain tags for post icon, author's place, name and surname, or whatever else you want to assign specific semantics to (I'm confident <rant> and <nsfw> would be very important tags): ? I thought XML was the strategy to assign semantics to stuff. Nothing forbids you to put an XML chunk under a XHTML div element, and assign a stylesheet to it so to style it properly, or to delegate to the proper viewer the handling of that namespace (for example, when handling RSS or SVG).
In conclusion, I don't understand the reason behind this extensions focused towards semantics, when it's clear that semantic is a very broad topic, which is guaranteed to require a potentially infinite amount of semantic tags. Since I am pretty sure there are clever people at W3C, I think I'm wrong, but I'd like to know why.
Why are tags for article, time, navigation bars, footer useful?
Because they facilitate parsing for text processing tools like Google.
It's nothing about semantics (at least in 'broad' meaning). Instead they just say: here is the body of page (most important text part) and there is the navigation bar full of links. With such an approach you can easily extract just what you need.
I too hate the way that W3C is going with their specs. There are many things that I don't like, and this "semantics" fad is one of them. (Others include taking forever to complete their specs and leaving too many important details for the browsers to implement as they choose)
Most of all I don't like it because it makes my work as a web developer more difficult. I often have to make a choice whether to make the webpage "semantically correct" or "visually/aesthetically pleasing". The latter wins of course, because that is what the users want, but as a result validations start failing and the whole thing gets quite non-semantic (tables for layout and other things).
Another issue at which I frown is that they have officialy declared that the "class" attribute is for semantics, but then they used it for visual presentation selectors in CSS.
Bottom line - DON'T MIX SEMANTICS AND VISUAL REPRESENTATION. If you use some mechanism for describing semantics (like tag names, attribute values, or what not else), then don't use it for funcional/visual purposes and vice versa.
If I would design HTML, I would simply add an attribute "semantic" which could (like the "class" attribute) be added to any tag. Then there would be a number of predefined values like all those headers/footers/articles/quotes/etc.
Tags would define functionality. Basically you could reduce HTML tags to just a handful, like "div", "table/tr/td", "a", "img", "form", "input" and "select". I probably missed a few but this is the bulk. Visual styling would be accomplished through CSS.
This way the three areas - semantics, visual representation, and functionality - would be completely independent and wouldn't clash in real life solutions.
Of course, I don't think W3C is interested in practical solutions...
There is already a lot of semantics in HTML markup in the forms of classes and IDs, of which there is a (near) infinite amount of possibilities of, And everyone has their own way of handling these semantics. One of the goals of HTML5 is to try to bring some structure to this. you will still be able to extend the semantics of tags with classes and ids. It will also most likely make things easier for search engines.
Look at it from the angle of trying to make statements either about the page, or about objects referenced from the page. If you see a <footer> tag, all you can say is "stuff in here is a footer" and pass it by. As such, adding custom tags is not as generic a solution as adding attributes and allowing people to use their own choice of URIs to specify predicates and optionally values - RDFa wins hands-down because you can express any triple-statement you like from RDF in a page, one way or another.
I just want to address one part of your question. You say:
In the nineties, at the time when
Netscape was the browser and html was
HTML2 or HTML3, there were a lot of
tags: address, cite, code... Most of
them are unused as of today, probably
even obsolete.
There are a great deal of tags to choose from in html, but the lack of usage does not imply that they are obsolete. In particular the header tags <h1>, etc, and <ul>, <ol> are used to join items into lists in a way I consider semantic. Many people may not use tags semantically, but the effort to create microformats is an ongoing continuation of the idea you consider an artifact of the 1990s. Efforts to make the semantic web be a winner keeps going, despite full-text search and link analysis (in the form of Google) being the winner as far as how to find and understand the web.
It would be great to see an updated version of Google's Web Stats which show "html as she is spoke." But you are right that many tags are underused.
Whether html5 will be successful is an open and interesting question, but the tags you describe as obsolete didn't go anywhere, they were there in HTML 4.01 and xhtml. HTML5 seems to be an effort to solidify what is useful in tags. In the end if html5 gets support in browsers and makes the job of web developers easier, it will succeed. xhtml2 failed because it roundly failed to gain adoption in browsers and did nothing to make the job of web page makers easier. The forces working on html5 seem keenly aware of the failure of xhtml2, and I think are avoiding having html5 suffer a similar fate.
"Why shouldn't it contain tags for post icon, author's place, name and surname, or whatever else you want to assign specific semantics to (I'm confident and would be very important tags): ?"
You use <dialog> to describe conversations or comments. Rant and NSFW are subjective terms therefore it makes sense not to use them.
From what I understand a bunch of experienced web developers did research and looked for what most websites have in common in html. They noticed that most websitse have id="header", id="footer", id="section" and id="nav" tags so they decided that we need HTML tags to replace those id's. So in other words, don't expect them to give you a HUGE amount of HTML vocabulary. Just keep it simple as possible as you can while addressing the MOST common needed HTML tags.
NAV tag is VERY important for providing accessibility as well. You want them to know where the navigation is rather than to force them to find whether links are for navigation or not.
I disagree with adding extra tags. If detailed vocabulary were actually import then there could be a different tag name for every word in the dictionary. Additional tags names are not helpful as they may communicate additional meaning to humans, but do nothing to facilitate machine parsing of the language. This is why I don't like the "semantic" tags for HTML5 as I believe this to be slippery slope to providing a vocabulary too complex while only providing a weak solution to a problem not fully addressed.
In my opinion markup language structure data as much as describe it in a tree diagram form. Through parsing of the structure and proper use of semantic conventions, such as RDFa, context can be leveraged to provide specific meaning to otherwise generic tag names. In such as case excessive vocabulary need not exist and structurally redundant tag names, such as footer and aside, could be eliminated. The final objective is to make content faster and more accurate to interpret by both humans and machines simultaneously while using as little code as possible to achieve that result. How that solution is lesser important, except to HTML5.
I thought XML was the strategy to assign semantics to stuff.
As far as I know, no it wasn’t. XML allows new languages to be defined which are all parsed in the same way, because they all use the XML syntax.
It doesn’t, of itself, provide any way to add meaning (“semantic” just means “meaningful”) to those languages. And until computers get artificial intelligence, they don’t actually understand meaning, so meaning is just what is agreed between human beings. HTML is the most commonly-used language with agreed meaning of its tags.
As HTML is so common, it’s helpful to add a few meaningful tags to it that are quite general in their application. The new HTML5 tags are aimed at that. The HTML5 spec’s authors could indeed carry on down this route, creating tags for every specific bit of meaning possible, but as they’re not robots, they probably won’t.
<section> is useful, and general enough to be meaningfully applicable in lots of documents. <author-last-name> isn’t. Distinguishing between the two is a judgment call, which is why humans, and not computers, write the spec.
For custom semantics that are too specific to be added to HTML as tags, HTML5 defines microdata.
I've been reading Andy Clark's book Transcending CSS (page 33).
...,it is now widely accepted that presentational names such as header, left, or red that describe an element's look or position are poor choices.
After reading these lines I asked myself: hey, aren't there elements in HTML5 spec such as header, footer?? Why is footer more semantic ? Andy in his book advocates to use site-info for the ID of the footer div and this makes more sense IMHO. Footer is a presentational name (describes the element's position).
In a word, AJAX. The new tags are meant to support what real-world developers are doing by replacing some of the <div class="sidebar-wrap"><div class="styling-hook"><div><ul class="nav"> type of divitis many websites suffer from. The only <div> left in the HTML5 is the styling hook.
The semantics that get promoted to tags from classes are those that developers have freely adopted en-masse as best practices, given an extended xhtml/css adoption period. Check out the WHATWG developer's edition of the spec's sections pagehere. The document itself is a pleasure, but I won't spoil it if you haven't seen it yet.
One of the less obvious reasons for some decisions made by the W3C is the importance of Webkit. If you look, you can see that they were better than some at taking the current work of the HTML5 Working Group and implementing ideas. They have historically been way out ahead in compliance (see here). The W3C placed a high priority on their (i.e. Android, iPhone, the Googlebot, Chrome, Safari, Dreamweaver, etc.,). Google, framework users, Wordpress/Moveable Type/Joomla! type users and others wanted self contained building blocks, so this is the style we get.
Facebook is modular. Responsive design's grids are modular. Wordpress is modular. Ajax works best with modular page structures. Widgets are modules. Plug-ins are modules. It would seem that we should be trying to figure out stuff like how to apply these tags to make it easier to hook the appropriate elements and activate them in our document/application/info-network hybrid Web 2.0.
In closing, HTML5 is meant to be written as xml (again, see the spec) in order to ensure that tools and machines making ajax requests for a portion of a document will get a well-formed useful response. How awesome in combination with things like media queries for devices like feed readers, braille printers, annotators, etc.,. I see a (near)future where anything with good semantic content is it's own newsfeed automagically! This only happens if developers adopt and write compliant documents.