Are we not supposed to be using the <main> element anymore? - html

Like most of my SO questions, this one stems from my inability to find up-to-date Google results.
It's been almost 3 years since <main> was accepted into the HTML5.1 spec. It seems to make perfect semantic sense to use:
<header></header>
<main></main>
<footer></footer>
But I see a lot of semantics-powered sites (like CanIUse and CSS-Tricks) that simply ignore the element, instead using something like:
<header></header>
<div class="main-wrapper">
<!--no ARIA role, nothing to semantically indicate "main" content-->
</div>
<footer></footer>
I feel like I've missed some conversation about how everyone needs to stop using <main> and Google's not helping me find that conversation. Was the element deemed unnecessary (i.e. clients don't really ever parse for it)?
Now it seems IE never ended up supporting it (sans polyfill), but is that why folks aren't using it? The same sites I've seen use div.main-wrapper do LOTS of things that still require polyfills for IE. Why not still use the semantic benefits of <main>, which only requires a one-line JS shiv and a display:block?
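For context, the shiv mentioned above amounts to very little code. This is a minimal sketch, assuming an old IE that does not recognize <main>. The helper name shivElements is made up for illustration, the call has to run before the element is parsed, and the rule it returns still has to be added to the page's CSS:

```javascript
// Hypothetical helper: creating one instance of each unknown element up front
// makes old IE parse that tag name properly. It returns the CSS rule needed to
// display those elements as blocks.
function shivElements(doc, names) {
  names.forEach(function (name) {
    doc.createElement(name); // the created element is simply discarded
  });
  return names.join(',') + '{display:block}';
}

// In a browser: shivElements(document, ['main']) returns 'main{display:block}'
```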

(i.e. clients don't really ever parse for it)?
All major browsers except IE have implemented the parsing/styling and semantics mapping (role=main) for the main element; Edge has implemented it too. 3 years is not a long time in terms of uptake for a new element (although its usage is already much higher than that of some other new elements added years before it). Its use is steadily growing over time (you can grep the data from http://webdevdata.org if you are so inclined).
All major screen readers support main element semantics as part of landmark navigation.
63% of screenreader users sometimes/ often/ always use landmarks/ regions (so add them, or I’ll spank you). - Bruce Lawson

Answer: You should not avoid using the main element just because some other prominent sites/developers you run across are not using it.
There has been no conclusion among anybody anywhere to stop using main or to suggest to others that they should not be using it.
It’s not a requirement that you use it. But if you use it in a way that doesn’t cause the W3C validator to emit an error or warning, and in a way that you judge conveys the meaning/structure of your document as you, the author, intend, then go for it. That’s what it’s there for.

My guess is it's a chicken/egg thing. There's not much point in using it in sites if clients aren't doing anything special with it. And there's not much point in some clients doing anything with it if adoption is low. And I would guess the problem isn't causing enough pain for the majority of users & developers.

Related

Use header tag in html

Why would one use <header> tags or <footer> or <address> tags? Is it just for SEO, or another reason?
I ask this question because IE8 and older don't support many of these elements.
These are all tags introduced with HTML5. They are part of an evolution of HTML. They're not supported in IE8 because they were introduced after support for IE8 ended. They were introduced to provide more logical elements that were commonly used in web page designs.
If you need to support IE8, you can do so by not using these tags and sticking with <div> tags with classes, such as:
<div class="header"></div>
<div class="footer"></div>
<div class="address"></div>
accompanied, of course, by CSS styles for each.
They have nothing to do with SEO.
The tags you describe, such as a header tag, are what is known as 'syntactic sugar'. That is, it makes it easier to read and know what the intent of the tag is. This is good for human readers, but it is especially useful for automated systems.
Compare these examples that all could mean exactly the same thing:
<header>...</header>
<div class="header">...</div>
<div class="hdr">...</div>
Note that header is easy to differentiate from the two div tags. Only if you know what the class attribute means will you understand what the div tags are defining. Because the class attribute value is free-form, it means there is no standard definition.
A search engine is one example of an automated system that might need to read the tag directly and understand that it has a semantic meaning. A particularly observant search engine might understand that the above three tags all refer to the same semantic definition, but you will agree that writing an engine that would presciently know all three mean 'header' would be difficult. In fact, there is nothing to disambiguate the latter two from <div class="zebra">.
Having automated systems be able to read your code and understand the semantic meaning goes well beyond SEO: it can make automated handling of mobile versions easier, for instance. You no longer have to hand-roll code for your specific implementation. You can use JavaScript libraries that might do handy things with your headers, such as allowing them to be sortable or auto-linking them. Anyone writing those libraries has an easier time doing it.
You also ask why you would use something not supported by older browsers that are still widely used. That is a question of demographic: what is your app targeting? If you need the 6% of the world that is using an old browser to utilize your application, then you should absolutely use backwards compatible techniques. If, on the other hand, you want to make the UX the best possible for a set of users likely to be using a modern browser, then you should use the new tags. (Note that having bad UX or long development time as you roll your own solutions to things may cost you more than 6% of your application's userbase...)

Proper way to use h1? (Regarding document outline and SEO)

I'm still trying to familiarize myself with HTML5, and there's this stuff which feels a bit confusing....
I once read in Jeremy Keith's book and on HTML5 Doctor (via this question) that HTML5 makes it possible to use multiple h1s. In HTML5, each section can have its own heading element, so it is okay to have more than one h1. I've seen a WordPress theme framework, "underscores", which seems to apply this to the fullest.
However, this may pose a problem for older browsers (which don't yet support HTML5) in defining the site structure/document outline. It also poses a problem for SEO.
I stumbled upon a video by Matt Cutts (from Google) and re-read Keith's book, both of which recommend limiting the use of h1 and using the conventional document outline (only one or two h1s per page, followed by multiple h2, h3, etc). Matt Cutts also implies that multiple h1s are not too good for SEO.
However,
I previously never paid serious attention to site structure/document outline, so I don't know how old (pre-HTML5) browsers read a site structure/document outline. There is an HTML5 outliner, but I can't find an outliner for HTML4.
Matt Cutts's video (regarding HTML5 and SEO) was published in 2009. I don't know if Google already supports the new HTML5 way of outlining a document.
So my question is, if I want to:
Support older browsers (e.g. Firefox 3.0 and IE 6) to display correct site structure/document outline
Have a good result in SEO
Which one should I use: multiple h1s (the way it is done in HTML5) or the conventional way?
This HTML5 one (example taken from HTML5 Doctor):
<h1>My fantastic site</h1>
<section>
<h1>About me</h1>
<p>I am a man who lives a fascinating life. Oh the stories I could tell you...</p>
<section>
<h1>What I do for a living</h1>
<p>I sell enterprise-managed ant farms.</p>
</section>
</section>
<section>
<h1>Contact</h1>
<p>Shout my name and I will come to you.</p>
</section>
or the conventional way?
<h1>My fantastic site</h1>
<h2>About me</h2>
<p>I am a man who lives a fascinating life. Oh the stories I could tell you...</p>
<h3>What I do for a living</h3>
<p>I sell enterprise-managed ant farms.</p>
<h2>Contact</h2>
<p>Shout my name and I will come to you.</p>
Use the new format.
Plenty of people will use h3s or h2s, and that's perfectly fine as well.
In fact, they'll use the section or article or header or footer elements offered by html5, and then use h3 or h4 as headings for that document-segment (for fear of SEO penalties / legacy styling|layout quirks).
And that's fine, too.
If you watch Cutts's video again, he says to keep the h1 use to a minimum -- only using multiples when they're really warranted.
That hasn't really changed at this point.
Google isn't going to murder you for having multiples.
Google IS going to expect each one to mean that there was a fundamental change in content.
That's true whether or not you have the sectioning (section/article/etc) elements in there.
Google has also gotten to the point where they're properly spidering AJAX-only or JavaScript-dependent websites, and they have their own rich-content metadata system, so they're sophisticated enough to parse section or article.
Worry more about the quality of the content (and, if you're ready to take it on, the Google-specific metadata which they use for search results), and let Google worry about navigating the semantics (as long as you're using them well, and not doing anything shady).
As for lesser crawlers, who knows? That varies per crawler, and most people only need to be concerned with Google, Bing and Yahoo, with other crawlers either feeding off of Google or being very domain-specific (like if you want to rank highly on an opt-in car-rental crawler for some reason, at which point you should be supplying an XML/JSON feed of some sort anyway).
deathlock, your second example doesn't contain any sectioning elements. However, you could use sectioning elements with headings other than h1. I think that's the point of your question:
h1 in every sectioning element
<section>
<h1>…</h1>
<section>
<h1>…</h1>
</section>
</section>
or "calculated" heading level
<section>
<h2>…</h2>
<section>
<h3>…</h3>
</section>
</section>
Semantically/technically, they are the same.
SEO shouldn't be a problem, because "h1 everywhere" will be (and already is) used all over the web, and the major search engines know this. If they want to support HTML5, they have to understand the outlining algorithm. I bet that their crawler/APIs already correctly calculate the real heading level, like the HTML5 outliner does, for example.
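The "calculated" heading level can be illustrated with a small sketch. It is not the spec's outlining algorithm, only a simplification that assumes each section has exactly one heading and that nesting depth alone determines the level. The { heading, sections } tree shape and the headingLevels name are invented for this example.

```javascript
// Simplified illustration: walk a tree of sections and report the heading
// rank each one effectively has, regardless of which hN tag was written.
function headingLevels(section, depth, out) {
  depth = depth || 1;
  out = out || [];
  // HTML only defines h1-h6, so cap the rank at 6.
  out.push({ heading: section.heading, level: Math.min(depth, 6) });
  (section.sections || []).forEach(function (child) {
    headingLevels(child, depth + 1, out);
  });
  return out;
}

// The example page from the question, with h1 used in every section.
var page = {
  heading: 'My fantastic site',
  sections: [
    { heading: 'About me', sections: [{ heading: 'What I do for a living' }] },
    { heading: 'Contact' }
  ]
};

headingLevels(page).forEach(function (h) {
  console.log('h' + h.level + ' ' + h.heading);
});
```

For this input the computed ranks match the "conventional" markup in the question: one h1, then h2, h3, h2.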
The only reason why you'd want to use h2-h6 as sectioning element heading would be old accessibility software, e.g. screenreaders. They usually offer an outline menu, so the user can jump directly to a certain heading. So if you always use h1, older screenreaders, that don't know HTML5, would announce all headings as h1, because they don't calculate the correct outline levels. However, Jaws 13 for example (current version of a screenreader), only gets "h1 everywhere" for HTML5 correct in IE, AFAIR, and it gets confused if you use other heading levels in a HTML5 page. This is, of course, a bug, but it's a nice example that sticking to the "old way" will not always work for newer software.
So you might get problems either way.
In my opinion you should stick with what the HTML5 spec recommends, and this would be: use h1 for all sectioning element headings. Because this specification is what future user-agents, accessibility tools, search engines and other services/software will use to build their products.
However, it depends on your use case, of course. If you know your visitor statistics, you should use them to make the right decision for your special case. E.g. if your site will not live for many years in the future, use what is now best supported.
The best way is to use HTML5 and add the following script to make the new tags work in old browsers, since Google reads your website much better, and considers it to be using newer technology (and so to be a better site), if you use the new tags.
<!--[if IE]>
<script src="http://html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
Put it in the head section of your site and it'll work fine for old IE versions.

Is it OK to use unknown HTML tags?

Correct me if I'm mistaken, but AFAIK, unknown HTML tags in markup (i.e. tags not defined in the HTML spec, like say, <foobar>) will eventually be treated as a regular <div> in an HTML 5 browser environment.
I'm thinking: how supportable is this practice? I mean, if I use unknown HTML tags in my markup, what pitfalls can I expect? Will a velociraptor pounce on me within the next few seconds?
The reason I ask is that if these tags defer to <div>, I can potentially use these tags in a more semantic manner than, say, assigning class names that identify modules. Have a look at this article, for example, of a .media class. Now what if instead of writing up that CSS to target .media, I make it target <media> instead? In my opinion, that makes the markup much more readable and maintainable, but I do acknowledge that it's not "correct" HTML.
EDIT
Just to be transparent, I did find this SO question from a few years back. It's been closed as off-topic, but I feel that I have a valid point in my own wording. It's a close duplicate, I admit, but it's from a few years back, so there might have been changes in the general environ of opinions across web developers about the topic.
user1309389 had a very good answer, and I agree with the appeal to the spec. But I disagree with their conclusion, and I think they're wrong about made-up elements leading to "undefined behaviour". I want to propose an alternative way of thinking about it, rooted in how the spec and browsers actually handle made-up elements.
It's 2015, we're on the verge of the CustomElement spec being widely adopted, and polyfills are readily available. Now is a great time to be wondering about "making up your own elements". In the near future, you'll be able to create new elements with your own choice of tag and behaviour in a fully standard and supported way that everyone can love. But until this lands in all browsers, and until the majority of people are using supporting browsers, you can take advantage of the hard work of the Polymer or X-Tags projects to check out the future of custom elements in a nearly-standard and mostly-supported way that quite a few people can love. This is probably the "right thing" to do. But it doesn't answer your question, and frankly I find "just use X" or "don't do X" to be less helpful than "here's how the spec covers this, and here's what browsers do". So, here's what I love.
Against the heartfelt recommendation (and sometimes screaming) of much of the web dev community, I've been working with "made-up" elements for the past year, in all of my production projects, without a polyfill, and I've had no unsolvable issues (yet), and no complaints. How? By relying on the standard behaviour of HTMLUnknownElement, the part of the W3C spec that covers the case of "made-up" elements. If a browser encounters an unrecognized HTML element, there is a well-defined and unambiguous way that it should be handled, and HTMLUnknownElement defines that behaviour, and you can build on top of that. HTMLUnknownElement also has to be powerful and correct enough to "not break the web" when encountering all the tags that are now obsolete, like the <blink> tag. It's not recommended that you use HTMLUnknownElement, but in theory and in practice, there's absolutely no harm in doing so, if you know what you're doing.
So how does HTMLUnknownElement work? It is just an extension of the HTMLElement interface, which is the standard interface underlying all HTML elements. Unlike most other elements however, HTMLUnknownElement doesn't add any special behaviour — you get a raw element, unadorned with any special behaviour nor constraining rules about use. The HTMLDivElement interface works almost exactly the same way, extending HTMLElement and adding almost no additional behaviour. Put simply, making up your own element is almost identical to using a div or span.
What I like about "making-up" elements is the change of mindset. You should use or invent HTML elements based on several factors, ranging from how clear it makes the markup to read, to how the browser and screen readers and search engines parse your code, to how likely your code is to be "correct" by some objective measure. I use made-up elements sparingly, but I use them in exactly the way Richard described: to make things more meaningful for the author of the HTML, not just meaningful to a computer service that extracts metadata. When used in a consistent way across a team, there can be a big benefit, since made-up elements can concisely express what they're for.
I particularly like using made-up elements to indicate when I will be using JS to define extra behaviour for an element. For instance, if I have an element that will have children added/removed by JS, I will use a made-up element as a clue that this element is subject to special behaviour. By the same token, I don't use a made-up element when a standard element will suffice. You will see <dynamic-list> live happily next to <div> in my code.
Now, about those pesky validators. Yes, using made-up elements isn't "valid" in the sense that it won't pass a "validator". But many commonly used features, patterns, and systems of modern HTML and JS development fail all the W3C validators. The validators aren't the law; the spec is. And the law isn't binding; the implementations in all the browsers are. The utility of validators has been diminishing for years as the flexibility of HTML has been increasing, and as browsers have shifted in their relationship to the spec. Validators are great for people who aren't comfortable with HTML and need guidance. But if you're comfortable taking your guidance from the spec and from browser implementations, there's no reason to worry about being flunked by the validator. Certainly, if you follow many of the guidelines offered by Google, Apple, Microsoft, etc, for implementing any experimental features, you'll be working outside the bounds of the validator. This is absolutely an okay thing to do, so long as you're doing it deliberately and you know enough about what you're doing.
Therefore, if you're going to make up your own elements and rely on HTMLUnknownElement, don't just treat it like a wild west. You need to follow a few simple rules.
You have to use a hyphen in the name of your tag. If you do, you are guaranteed to never collide with a future edition of the HTML spec. So never say <wrong>, always say <quite-right>.
Made-up elements can't be self-closing — you have to close them with a closing tag. You can't just say <wrong> or <still-wrong />, you have to say <totally-good></totally-good>.
You have to define a display property for your element in CSS, otherwise the rendering behaviour is undefined.
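The naming rule is easy to check mechanically. Here is a rough sketch, assuming ASCII-only names and using an invented function name, isSafeMadeUpName. It covers only the hyphen rule plus the handful of hyphenated names already taken by SVG and MathML, not the full custom element name grammar:

```javascript
// Hyphenated names already used by SVG/MathML, so not available for made-up tags.
var RESERVED = [
  'annotation-xml', 'color-profile', 'font-face', 'font-face-src',
  'font-face-uri', 'font-face-format', 'font-face-name', 'missing-glyph'
];

// Simplified check (ASCII only): starts with a lowercase letter, contains at
// least one hyphen, has no uppercase letters, and is not a reserved name.
function isSafeMadeUpName(name) {
  return /^[a-z][a-z0-9._]*-[a-z0-9._-]*$/.test(name) &&
    RESERVED.indexOf(name) === -1;
}

console.log(isSafeMadeUpName('wrong'));       // false
console.log(isSafeMadeUpName('quite-right')); // true
console.log(isSafeMadeUpName('font-face'));   // false
```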
That's about it. If you do these things, you should be good to use made-up elements in IE9 and up, relying on the safety net of the HTMLUnknownElement. For me, the benefits far, far outweigh the costs, so I've been using this pattern heavily. I run a SaaS site catering to major industrial corporations, and I've had no trouble or complaints thus far. If you have to support older versions of IE, it's wise to stay far away from any "2015" technology or their crude approximations, and stay safely within the well-trodden parts of the spec.
So, in summary, the answer to your question is "yes, if you know what you're doing".
You should always approach HTML as it is defined in its respective specification. "Defining" new tags is a bit of an extreme approach. It might pass a browser check because it implements various failsafes, but there is no guarantee of this. You're entering the land of Undefined Behaviour, at best. Not to mention you will fail validation tests, but you seem to be aware of that.
If you wish to be more semantically expressive in your markup, you can use HTML5, which defines quite a few more descriptive tags for describing the structure of your page, instead of generic divs that need ids or classes added to them.
In the end, a short answer: No, it's bad practice, you shouldn't do it and there could be unforeseen problems later on in your development.
No. You will fail validation, you will get random issues cross browser and you WILL be eaten by said dinosaurs. CSS is the answer if you want your page to behave predictably.
Yes We Can.
There is a new spec in the works for custom elements/tags - http://w3c.github.io/webcomponents/spec/custom/.
The only issue with this is that you have to use JS to register your new element.
You can read more about this at
https://developers.google.com/web/fundamentals/getting-started/primers/customelements
Rule #1 of browser interoperability is: don't have errors. No matter how many browsers you test in, there are always browsers you can't test, for instance because they don't exist yet.
Also, unknown elements will be treated as <span>, not <div> by most browsers currently.
If it's really source readability(*) you're after, you should look into XML+XSLT.
That way, you can use all the tag names you want, and make them behave in any way you like and you don't have to worry that <media> will be a real element in some future version of HTML.
One good real world example is the element <picture>. If a website ever used <picture> and relied on the notion that this element would have no styles or special content by itself, they are in trouble now!
(*) With XML+XSLT, the readability will be in the XML part, not the XSLT part, obviously.
Generally not recommended; e.g. IE won't apply CSS styles to unknown tags.
All other browsers render unknown tags as inline elements (which causes problems with nesting).
I recommend the following article: http://diveintohtml5.info/ There is a section about unknown tags.
In my case I use a lot of them in my WebKit-powered game GUI system, and everything works.
W3Schools says:
HTML5 supports unknown tags as inline elements, but it recommends CSS styling for it.
Here is the source: https://www.w3schools.com/html/html5_browsers.asp
In your example you are talking about <media>. It could be great, but if HTML6 adds this tag for another purpose, your code will no longer be compatible.
The one downside that worries me: what if a custom tag I use now becomes an official HTML tag next year or even later?
Therefore how about this: Instead of custom tags use 'div' + a custom CSS-class.
CSS-classes are meant to be custom, it is definitely ok to have your own custom CSS-classes. Your div can then further have any number of CSS-classes associated with it making the semantic machinery even more flexible, you could call it multiple inheritance.
Instead of div you could use span for the same purpose. I would like to use something shorter actually, say p but unfortunately p has its own special behavior of what happens if you don't close it.
But definitely if you go the route of expressing semantics with CSS-classes then it does help to use a tag-name that is as short as possible. I wish there was something shorter than div, say for instance t for 'tag'.
Custom elements are a core part of many JavaScript frameworks. They are state of the art in modern JavaScript:
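// document.registerElement() was the old "v0" API and has since been removed
// from browsers; customElements.define() is its standardized replacement.
class XFoo extends HTMLElement {}
customElements.define('x-foo', XFoo);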
https://developer.mozilla.org/en-US/docs/Web/Web_Components/Using_custom_elements
https://www.html5rocks.com/en/tutorials/webcomponents/customelements/
Is it OK to use unknown HTML tags? If you ask the wrong questions, you always get wrong answers. If you define your unknown tag with JavaScript as a custom element, your unknown tag is no longer part of the HTML5/HTML definition.
Yes, you can use unknown tags, if you define the tag in your javascript.
What's wrong with the judicious use of < !-- your stuff here -- >. It worked for scripts back around the time of the Cretaceous–Paleogene boundary. Dinosaurs ceased to be a problem around that time, that is apart from the flying, feathered variety.
It's bad practice and you should not do it. The browser renders it as div as fallback solution in most cases but it's not valid html and therefore never will pass a validity test.

We hear so much about "semantic html". Where/what are the algorithms reading our semantic html?

I keep making attempts at properly using HTML5 but I feel like it's still not even close to anything semantically valuable.
My attempts:
HTML5 Article node Architecture
HTML5 Blog Page Architecture
But there are such subtleties in every single tag!
My question is, what specific software out there on the web is actually doing things like processing our HTML DOM, calculating and comparing elements to say "oh, this is a <header>, and it's just after <section>, and it has <time> in it, so the <time> tag must be "metadata" in relation to the <header>...", and saying "The content within the <time> tag not only is the "published time", but also relates to the author's birthday, so it must be a special post (say because there was also a <cite> or <address class='vcard'> tag in there too)".
I mean, what benefit am I ever going to get in using HTML5 if I don't know the algorithms that are interpreting it? If I just stuck with the basic div, ol, ul, li, p, a, h[1-6] tags, I could do everything with half the number of DOM elements.
Looking forward to some specific algorithms that I can use to shape how I structure the DOM from here on out.
I'm at the point where I don't even think we should be using HTML5 tags at all. For example, on the iPhone especially, the goal should be to minimize DOM elements to decrease load time. Plus, if the iPhone site is a mirror of the traditional browser version, the search engines won't even see the iPhone site (ideally). So there's no real point in making the DOM semantic. So if I can use half the number of <div> tags to achieve the same layout as if I used a somewhat "semantic HTML5" rendition, and that's a good thing for the iPhone, why don't I do that for the regular browser too? That's where I'm coming from.
Articles like this are basically saying it's pointless to worry about semantic HTML.
What algorithms are reading your semantic HTML? Google, that's who. Their algorithm tries to extract every bit of meaning from pages that it can, because that helps Google construct smart, relevant search results. For one example, Google tries to determine the dates of things by reading the HTML and gives headers extra consideration in determining the overall topic of a page.
Also, your assertion that we shouldn't use HTML5 tags on the iPhone "to minimize dom elements" isn't founded in any technical basis. HTML5 doesn't dictate that we use more DOM elements, and in fact it can let us leave out tags that would be required by XHTML. You should use HTML5 on the iPhone more than anywhere else. For example, the new input types like number and email don't do much on the desktop, but that extra information can really make things nicer on the iPhone by allowing it to present an appropriate interface.
Whenever a "machine" tries to make sense of your content.
In addition to search engines (→ SEO), screen readers (→ Accessibility) interpret the markup. They get better from version to version.
Also, think of all the tools that might come one day. The great thing about the Web is that all the web pages could still exist 5, 10, 100 … years from now. Imagine the user-agents and algorithms and search tools that might exist then, and how they could extract the meaning of your old documents.
Search engines can/will better interpret your pages which combined with other factors will result in better rankings for your pages.
Moreover if you use the tags consistently and semantically, you could build your own reusable widgets and libraries that derive knowledge from the HTML structure independent of how the data is stored in the backend.
Consider this sample Google search where you can filter results by date. By using semantic HTML, for let's say, <article> and <time>, you can write a simple crawler that recreates this functionality or allows users to specify a timespan within which to search articles in your own site(s).
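As a rough sketch of that idea, the function below pulls <time datetime> values out of <article> elements and keeps the articles that fall within a date range. It assumes well-formed, non-nested articles with ISO dates in a double-quoted datetime attribute, and it uses regular expressions only to stay self-contained; a real crawler would use a proper HTML parser. articlesBetween is an invented name.

```javascript
// Return the inner HTML of every <article> whose first <time datetime="...">
// falls within [from, to]. Dates are ISO strings such as '2013-05-01'.
function articlesBetween(html, from, to) {
  var articleRe = /<article\b[^>]*>([\s\S]*?)<\/article>/gi;
  var timeRe = /<time\b[^>]*\bdatetime="([^"]+)"/i;
  var start = Date.parse(from);
  var end = Date.parse(to);
  var matches = [];
  var m;
  while ((m = articleRe.exec(html)) !== null) {
    var t = timeRe.exec(m[1]);
    if (!t) continue; // no machine-readable date, so skip this article
    var when = Date.parse(t[1]);
    if (when >= start && when <= end) matches.push(m[1].trim());
  }
  return matches;
}

// Usage: articlesBetween(pageHtml, '2013-01-01', '2013-12-31')
```

The same filter over <div class="post"> markup would need site-specific knowledge of where the date lives, which is the point being made above.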
Off the top of my head, I don’t know of any algorithms making use of the new semantic tags in HTML5. (Obviously, that doesn’t mean there aren’t any.)
But the idea that you should tailor your HTML to specific algorithms is, I think, a bit contrary to how the web works. The web is worldwide, and will hopefully be around for a long time. We can’t know what uses our HTML will be put to, and useful algorithms can’t be written until there’s a good amount of actual content out there.
The <a> tag wasn’t designed with Google’s PageRank algorithm in mind. Some people thought links would be useless if they weren’t inherently two-way, because you’d get too many broken links when one end went away.
Of course, if the vague possibility of undefined future benefits makes it not worth using some or all HTML5 tags for whatever project you’re working on, don’t use them.
For me, the benefit of using them is that there’s a well-known, public, non-proprietary specification that tells you, and anyone else working on the code, what we’ve agreed the tags mean. Future developers don’t just get a <div> with a class name that I made up in a coffee-fuelled 7 p.m. code sprint, they get a tag designed and documented by people smarter and more experienced than me. There’s also the chance that the code will become more useful in future if people use the meaning contained in HTML5 tags in algorithms, whereas there’s less chance of that if it’s all just a bunch of <div>s.
I don’t think the size increase of our pages from HTML5 tags is particularly worth worrying about though. After gzipping, the size increases aren’t enough to worry about, especially as mobile performance is as much hampered by the latency (which you can’t do much about) as the bandwidth. Plus mobile bandwidth is likely to trend up, rather than down.

Is semantic markup too open-ended? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I am taking a peek at Dive Into HTML5. It seems nice and interesting, but I am puzzled.
In the 1990s, at the time when Netscape was the browser and HTML was HTML2 or HTML3, there were a lot of tags: address, cite, code... Most of them are unused as of today, probably even obsolete.
HTML5 introduces tags to express "semantic meaning" in the tag itself. This is all fun and games, but I see something very strange in this approach. Technically, the semantics can be very open-ended. HTML5 has tags for article, time, navigation bars, footer. Why shouldn't it contain tags for post icon, author's place, name and surname, or whatever else you want to assign specific semantics to? (I'm confident <rant> and <nsfw> would be very important tags.) I thought XML was the strategy to assign semantics to stuff. Nothing forbids you from putting an XML chunk under an XHTML div element and assigning a stylesheet to it so as to style it properly, or delegating the handling of that namespace to the proper viewer (for example, when handling RSS or SVG).
In conclusion, I don't understand the reason behind these extensions focused on semantics, when it's clear that semantics is a very broad topic, which is guaranteed to require a potentially infinite number of semantic tags. Since I am pretty sure there are clever people at W3C, I think I'm wrong, but I'd like to know why.
Why are tags for article, time, navigation bars, footer useful?
Because they facilitate parsing for text processing tools like Google.
It has nothing to do with semantics (at least in the 'broad' meaning). Instead they just say: here is the body of the page (the most important text part) and there is the navigation bar full of links. With such an approach you can easily extract just what you need.
I too hate the way that W3C is going with their specs. There are many things that I don't like, and this "semantics" fad is one of them. (Others include taking forever to complete their specs and leaving too many important details for the browsers to implement as they choose)
Most of all I don't like it because it makes my work as a web developer more difficult. I often have to make a choice whether to make the webpage "semantically correct" or "visually/aesthetically pleasing". The latter wins of course, because that is what the users want, but as a result validations start failing and the whole thing gets quite non-semantic (tables for layout and other things).
Another issue I frown at is that they have officially declared that the "class" attribute is for semantics, but then they used it for visual presentation selectors in CSS.
Bottom line - DON'T MIX SEMANTICS AND VISUAL REPRESENTATION. If you use some mechanism for describing semantics (like tag names, attribute values, or whatever else), then don't use it for functional/visual purposes, and vice versa.
If I were designing HTML, I would simply add a "semantic" attribute which could (like the "class" attribute) be added to any tag. Then there would be a number of predefined values, like all those headers/footers/articles/quotes/etc.
Tags would define functionality. Basically you could reduce HTML tags to just a handful, like "div", "table/tr/td", "a", "img", "form", "input" and "select". I probably missed a few but this is the bulk. Visual styling would be accomplished through CSS.
This way the three areas - semantics, visual representation, and functionality - would be completely independent and wouldn't clash in real life solutions.
Of course, I don't think W3C is interested in practical solutions...
There is already a lot of semantics in HTML markup in the form of classes and IDs, for which there is a (near) infinite number of possibilities, and everyone has their own way of handling these semantics. One of the goals of HTML5 is to try to bring some structure to this. You will still be able to extend the semantics of tags with classes and IDs. It will also most likely make things easier for search engines.
Look at it from the angle of trying to make statements either about the page, or about objects referenced from the page. If you see a <footer> tag, all you can say is "stuff in here is a footer" and pass it by. As such, adding custom tags is not as generic a solution as adding attributes and allowing people to use their own choice of URIs to specify predicates and optionally values - RDFa wins hands-down because you can express any triple-statement you like from RDF in a page, one way or another.
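As a rough sketch of the attribute-based approach, the snippet below puts RDFa Lite attributes (vocab, typeof, property) on ordinary elements. The schema.org vocabulary, the post title, the author name and the date are all assumptions made up for illustration; any vocabulary URI could be used instead.

```html
<!-- Illustrative only: the vocabulary and all values are assumed, not taken from a real page. -->
<div vocab="https://schema.org/" typeof="BlogPosting">
  <h2 property="headline">Why semantic tags?</h2>
  <p>
    Posted by
    <span property="author" typeof="Person">
      <span property="name">Jane Doe</span>
    </span>
    on <time property="datePublished" datetime="2010-01-15">15 January 2010</time>
  </p>
</div>
```

Each property attribute yields one statement about the enclosing typeof resource, so the elements stay generic (div, span) while the predicates carry the meaning.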
I just want to address one part of your question. You say:
"In the nineties, at the time when Netscape was the browser and html was HTML2 or HTML3, there were a lot of tags: address, cite, code... Most of them are unused as of today, probably even obsolete."
There are a great many tags to choose from in html, but lack of usage does not imply that they are obsolete. In particular, the heading tags <h1>, etc., and the list tags <ul> and <ol> are used to structure headings and join items into lists in a way I consider semantic. Many people may not use tags semantically, but the effort to create microformats is an ongoing continuation of the idea you consider an artifact of the 1990s. Efforts to make the semantic web a winner keep going, despite full-text search and link analysis (in the form of Google) being the winner as far as how to find and understand the web.
It would be great to see an updated version of Google's Web Stats, which shows "html as she is spoke." But you are right that many tags are underused.
Whether html5 will be successful is an open and interesting question, but the tags you describe as obsolete didn't go anywhere; they were there in HTML 4.01 and xhtml. HTML5 seems to be an effort to solidify what is useful in tags. In the end, if html5 gets support in browsers and makes the job of web developers easier, it will succeed. xhtml2 failed because it roundly failed to gain adoption in browsers and did nothing to make the job of web page makers easier. The forces working on html5 seem keenly aware of the failure of xhtml2, and I think they are avoiding having html5 suffer a similar fate.
"Why shouldn't it contain tags for post icon, author's place, name and surname, or whatever else you want to assign specific semantics to? (I'm confident <rant> and <nsfw> would be very important tags.)"
You use <dialog> to describe conversations or comments. Rant and NSFW are subjective terms, so it makes sense not to use them.
From what I understand, a bunch of experienced web developers did research and looked for what most websites have in common in their html. They noticed that most websites use id="header", id="footer", id="section" and id="nav" attributes, so they decided that we need HTML tags to replace those ids. So in other words, don't expect them to give you a HUGE HTML vocabulary. The idea is to keep it as simple as possible while addressing the MOST commonly needed HTML tags.
The NAV tag is VERY important for accessibility as well. You want users of assistive technology to know where the navigation is, rather than forcing them to work out whether links are for navigation or not.
I disagree with adding extra tags. If detailed vocabulary were actually important, then there could be a different tag name for every word in the dictionary. Additional tag names are not helpful: they may communicate additional meaning to humans, but they do nothing to facilitate machine parsing of the language. This is why I don't like the "semantic" tags in HTML5; I believe this to be a slippery slope to a vocabulary that is too complex while providing only a weak solution to a problem not fully addressed.
In my opinion, markup languages structure data as much as they describe it, in the form of a tree. Through parsing of the structure and proper use of semantic conventions, such as RDFa, context can be leveraged to provide specific meaning to otherwise generic tag names. In such a case, excessive vocabulary need not exist, and structurally redundant tag names, such as footer and aside, could be eliminated. The final objective is to make content faster and more accurate to interpret by both humans and machines simultaneously, while using as little code as possible to achieve that result. How that solution is reached is less important, except to HTML5.
I thought XML was the strategy to assign semantics to stuff.
As far as I know, no it wasn’t. XML allows new languages to be defined which are all parsed in the same way, because they all use the XML syntax.
It doesn’t, of itself, provide any way to add meaning (“semantic” just means “meaningful”) to those languages. And until computers get artificial intelligence, they don’t actually understand meaning, so meaning is just what is agreed between human beings. HTML is the most commonly-used language with agreed meaning of its tags.
As HTML is so common, it’s helpful to add a few meaningful tags to it that are quite general in their application. The new HTML5 tags are aimed at that. The HTML5 spec’s authors could indeed carry on down this route, creating tags for every specific bit of meaning possible, but as they’re not robots, they probably won’t.
<section> is useful, and general enough to be meaningfully applicable in lots of documents. <author-last-name> isn’t. Distinguishing between the two is a judgment call, which is why humans, and not computers, write the spec.
For custom semantics that are too specific to be added to HTML as tags, HTML5 defines microdata.
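A minimal sketch of what that could look like for the "name and surname" case from the question. The itemscope, itemtype and itemprop attributes are the microdata attributes; the schema.org types and the sample values are assumptions chosen for illustration.

```html
<!-- Illustrative only: the schema.org types and all values are assumed. -->
<article itemscope itemtype="https://schema.org/BlogPosting">
  <h2 itemprop="headline">Why semantic tags?</h2>
  <p>
    By
    <span itemprop="author" itemscope itemtype="https://schema.org/Person">
      <span itemprop="givenName">Jane</span>
      <span itemprop="familyName">Doe</span>
    </span>
  </p>
</article>
```

The general-purpose <article> tag stays in HTML itself, while the very specific meaning (given name, family name) lives in attributes tied to an external vocabulary.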
I've been reading Andy Clarke's book Transcending CSS (page 33).
...,it is now widely accepted that presentational names such as header, left, or red that describe an element's look or position are poor choices.
After reading these lines I asked myself: hey, aren't there elements in the HTML5 spec such as header and footer? Why is footer more semantic? In his book, Andy advocates using site-info for the ID of the footer div, and this makes more sense IMHO. Footer is a presentational name (it describes the element's position).
In a word, AJAX. The new tags are meant to support what real-world developers are doing by replacing some of the <div class="sidebar-wrap"><div class="styling-hook"><div><ul class="nav"> type of divitis many websites suffer from. The only <div> left in the HTML5 is the styling hook.
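To make the comparison concrete, here is a hedged before/after sketch. The "before" markup expands the fragment quoted above; mapping sidebar-wrap to <aside> and ul.nav to <nav> is my own assumption about what those classes meant, and the link is a placeholder.

```html
<!-- Before: nested divs whose roles are only implied by class names -->
<div class="sidebar-wrap">
  <div class="styling-hook">
    <div>
      <ul class="nav">
        <li><a href="/archive">Archive</a></li>
      </ul>
    </div>
  </div>
</div>

<!-- After: one possible HTML5 equivalent; a single div remains as the styling hook -->
<aside>
  <div class="styling-hook">
    <nav>
      <ul>
        <li><a href="/archive">Archive</a></li>
      </ul>
    </nav>
  </div>
</aside>
```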
The semantics that get promoted from classes to tags are those that developers have freely adopted en masse as best practices, given an extended xhtml/css adoption period. Check out the sections page of the WHATWG developer's edition of the spec here. The document itself is a pleasure, but I won't spoil it if you haven't seen it yet.
One of the less obvious reasons for some decisions made by the W3C is the importance of WebKit. If you look, you can see that they were better than some at taking the current work of the HTML5 Working Group and implementing ideas. They have historically been way out ahead in compliance (see here). The W3C placed a high priority on them (i.e. Android, iPhone, the Googlebot, Chrome, Safari, Dreamweaver, etc.). Google, framework users, WordPress/Movable Type/Joomla! type users and others wanted self-contained building blocks, so this is the style we get.
Facebook is modular. Responsive design's grids are modular. Wordpress is modular. Ajax works best with modular page structures. Widgets are modules. Plug-ins are modules. It would seem that we should be trying to figure out stuff like how to apply these tags to make it easier to hook the appropriate elements and activate them in our document/application/info-network hybrid Web 2.0.
In closing, HTML5 can be written as XML (again, see the spec), which ensures that tools and machines making ajax requests for a portion of a document will get a well-formed, useful response. How awesome in combination with things like media queries for devices like feed readers, braille printers, annotators, etc. I see a (near) future where anything with good semantic content is its own newsfeed automagically! This only happens if developers adopt and write compliant documents.