HTML5 introduced many semantic elements (<nav>, <section>, <article>, etc.). But aside from helping us read and structure our HTML, do they have any other unique properties? Or are they essentially just <div>s with different names? It seems like the latter is true.
I keep reading about the "semantics" but can't find a direct answer.
The <nav>, <section>, <article>, etc., elements don’t have any special properties that are exposed to frontend JavaScript code; instead they all just use the HTMLElement interface.
However, they do have special properties in screen readers: they get announced to screen-reader users in a way that a div element doesn't.
Screen readers can announce that a certain part of a document is a section or article, and allow screen-reader users to navigate through the document section-by-section, or to more easily jump among articles.
That said, screen readers also let users navigate a document by jumping among its h1-h6 headings, regardless of whether those headings are inside section or article elements. So for screen-reader users it's actually more important that your documents have good, informative h1-h6 headings and a logical structure.
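To make the heading point concrete, here is a rough Python sketch of the list a screen reader might offer when the user asks for landmarks and headings. It uses only the standard library's html.parser; the LANDMARKS set and the outline() helper are hypothetical simplifications, not how any particular screen reader works (real ones rely on the browser's accessibility tree, which applies more rules than this). Note that the h2 inside a plain div is still found.

```python
from html.parser import HTMLParser

# Hypothetical simplification: treat these tags as landmarks. Real screen
# readers use the browser's accessibility tree, which applies more rules.
LANDMARKS = {"header", "nav", "main", "aside", "footer"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineParser(HTMLParser):
    """Collect landmarks and headings in document order."""

    def __init__(self):
        super().__init__()
        self.items = []       # (kind, label) tuples in document order
        self._heading = None  # tag of the heading currently being read
        self._text = []       # text fragments of that heading

    def handle_starttag(self, tag, attrs):
        if tag in LANDMARKS:
            self.items.append(("landmark", tag))
        elif tag in HEADINGS:
            self._heading = tag
            self._text = []

    def handle_data(self, data):
        if self._heading:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._heading:
            # Collapse whitespace the way a heading would be announced.
            text = " ".join("".join(self._text).split())
            self.items.append((tag, text))
            self._heading = None


def outline(html):
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.items


page = ("<nav><a href='/'>Home</a></nav>"
        "<main><h1>Recipes</h1><div><h2>Roast</h2></div></main>")
print(outline(page))
# [('landmark', 'nav'), ('landmark', 'main'), ('h1', 'Recipes'), ('h2', 'Roast')]
```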
Semantic elements are elements that describe the content. They provide more information to the browser without requiring any extra attributes.
Considering Microformats
When you’re adding semantics into your web pages, you should consider using microformats to add even more meaning, when appropriate. Microformats use human-readable text inside the HTML (usually in the class attribute of an element) to define the contents.
Microformats add semantic information about the elements, and this information is already being used in certain situations.
The figure above shows a Google search for reviews of the movie Ender’s Game. The second and third results show “rich snippets,” including information like the star rating.
Google and Bing both use these types of rich snippets to enhance their search results, and most of the data behind them comes from semantically marked-up HTML using microformats. You can learn more about how to use microformats in my book Sams Teach Yourself HTML5 Mobile Application Development in 24 Hours.
By writing semantic HTML, you give user agents more information to use to display the information correctly. For example, if a screen reader sees the element, it knows that this is the main point of the page, and it will read it aloud before reading anything in an element. And as web pages get more sophisticated, so do the things user agents do with them. For instance, in the future, your semantically marked-up recipe could tell a web-ready refrigerator what time to alert the robot butler to start the roast.
Why use HTML5 semantic tags like header, section, nav, and article instead of simply a div with the preferred CSS applied to it?
I created a webpage and used those tags, but they don't look any different from a div. What is their main purpose?
Is it only about giving the tags appropriate names, or is there more to it?
Please explain. I have gone through many sites, but I could not find these basics.
The Oxford Dictionary states:
semantics: the branch of linguistics and logic concerned with meaning.
As their name says, these tags are meant to improve the meaning of your web page. Good semantics plays an important role in the automated processing of documents. This automated processing happens more often than you realize: every search-engine ranking is derived from automated processing of all the websites out there.
If you visit a (well designed) web page, you as the human reader can immediately (visually) distinguish all the page elements and more importantly understand the content. In the top left you see the company logo, next to it is the site navigation, there is a search bar and some text about the company, a link to a product you can buy and a legal disclaimer at the bottom.
However, machines are dumb and cannot do this:
Looking at the same page as you, all a web crawler would see is an image, a list of anchor tags, a text node, an input field and an image with a link on it. At the bottom there is another text node.
Now, how should they know which part of the document you intended to be the navigation, the main article, or some not-so-important footnote? They can guess by analyzing your document structure against some common criteria that hint at a specific kind of element.
E.g. a ul list of internal links is most likely some kind of page navigation, and the text at the end of the document is something necessary but not so important to the everyday viewer (the legal disclaimer).
Now imagine that a nav element is used instead of a plain div; the machine immediately knows what the purpose of this element is:
<!-- machine: okay, this structure looks like it might be a navigation element? -->
<div><ul><li><a href="internal_link">...</a></li></ul></div>
<!-- machine: ah, a navigation element! -->
<nav><ul><li><a>...</a></li></ul></nav>
Now the text inside a main tag – this is clearly the most important information of the page! Over there to the left, that text node, the image and the anchor node all belong together, because they are grouped inside a section tag, and down at the bottom there is some text inside a footer element (they still don't know the meaning of that text, but now they can deduce it's some sort of fine print).
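Here is a toy Python sketch of that difference, using the standard library's html.parser. The rule "a ul with at least two links is probably navigation" and the RegionGuesser/classify names are made up for illustration; real crawlers use far more elaborate heuristics. The point is only that a div can at best be guessed at, while a nav is declared.

```python
from html.parser import HTMLParser


class RegionGuesser(HTMLParser):
    """Label each <div> or <nav> block the way a simple crawler might."""

    def __init__(self):
        super().__init__()
        self.stack = []    # open div/nav blocks with simple counters
        self.results = []  # (tag, verdict) for every closed block

    def handle_starttag(self, tag, attrs):
        if tag in ("div", "nav"):
            self.stack.append({"tag": tag, "lists": 0, "links": 0})
        elif self.stack and tag == "ul":
            self.stack[-1]["lists"] += 1
        elif self.stack and tag == "a":
            self.stack[-1]["links"] += 1

    def handle_endtag(self, tag):
        if self.stack and tag == self.stack[-1]["tag"]:
            block = self.stack.pop()
            if block["tag"] == "nav":
                verdict = "navigation (declared)"
            elif block["lists"] and block["links"] >= 2:
                # Heuristic: a list of several links is *probably* navigation.
                verdict = "navigation? (guessed)"
            else:
                verdict = "unknown"
            self.results.append((block["tag"], verdict))


def classify(html):
    guesser = RegionGuesser()
    guesser.feed(html)
    guesser.close()
    return guesser.results


page = ('<div><ul><li><a href="/a">A</a></li><li><a href="/b">B</a></li></ul></div>'
        '<nav><ul><li><a href="/a">A</a></li></ul></nav>'
        '<div><p>Legal disclaimer</p></div>')
print(classify(page))
# [('div', 'navigation? (guessed)'), ('nav', 'navigation (declared)'), ('div', 'unknown')]
```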
Example:
You, as the user (reading a page without seeing the actual markup), don't care whether an element is enclosed in an <i> or an <em> tag. In most browsers both of these tags are rendered identically, as italic text, and as long as the text stands out from its surroundings it serves its purpose.
However, there is a big difference in terms of semantics:
<i> means italic - it's simply a presentational hint for the browser on how to render it (italic) and does not necessarily contain deeper semantic information.
<em> means emphasize: it indicates a piece of information that should be emphasized. Now the browser is no longer bound to an italic instruction, but could render it in italic, bold, underlined, or in a different color. For visually impaired users, a screen reader could raise its voice, or use whatever method seems best suited in a specific situation to emphasize this information.
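As a rough illustration of what "whatever method seems best suited" could mean, here is a toy Python front end for a text-to-speech engine. It turns HTML into SSML-like markup, wrapping <em> in SSML's <emphasis> element and ignoring <i> (a simplification of this sketch; HTML5 actually gives <i> its own meaning of an alternate voice or mood). The class and function names are hypothetical, and real screen readers differ in whether and how they voice emphasis.

```python
from html import escape
from html.parser import HTMLParser


class SpeechRenderer(HTMLParser):
    """Toy TTS front end: convert HTML into SSML-like markup."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "em":
            # <em> carries meaning (stress), so ask the voice to emphasize it.
            self.out.append("<emphasis>")
        # <i> and every other tag are ignored here.

    def handle_endtag(self, tag):
        if tag == "em":
            self.out.append("</emphasis>")

    def handle_data(self, data):
        # Escape &, < and > so the text stays valid inside the SSML document.
        self.out.append(escape(data, quote=False))


def to_ssml(html):
    renderer = SpeechRenderer()
    renderer.feed(html)
    renderer.close()
    return "<speak>" + "".join(renderer.out) + "</speak>"


print(to_ssml("<p>I <em>really</em> mean it, <i>sic</i>.</p>"))
# <speak>I <emphasis>really</emphasis> mean it, sic.</speak>
```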
Final thought:
Semantic tags are not the end. There are things like metadata, ontologies, resource description languages which go a step further and help connect data between different web pages and can even help create new knowledge!
Wikipedia, for example, does a really bad job of presenting data semantically.
https://en.wikipedia.org/wiki/Barack_Obama
https://en.wikipedia.org/wiki/Donald_Trump
https://en.wikipedia.org/wiki/Joe_Biden
All three are people who at some point in time were president of the USA.
All three articles contain a sidebar that displays this information, and you can compare them (by opening the pages and switching back and forth between them), but the data is not semantically described.
Now imagine Wikipedia used an ontology to describe a person, such as http://dbpedia.org/ontology/Person:
<!-- President is a subclass of Politician which is a subclass of Person -->
<President>
<birthname>Barack Hussein Obama II</birthname>
<birthdate>1961-08-04</birthdate>
<headOf>country::USA</headOf>
<tenure>2009-01-20 – 2017-01-20</tenure>
</President>
Not only could you (and machines) now compare those three directly (on a dynamically generated page!), but you could even create new knowledge: a list of all presidents of the United States (quite boring), but also cooler things like who all the current world leaders are, how many female world leaders there are, who the youngest leader is, how many types of leaders there are (presidents/emperors/queens/dictators), who served the longest, how many of them are taller than 175cm and have brown eyes, etc.
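A minimal Python sketch of that kind of query, assuming records in the ad-hoc <President> format from the example above (it is not actual DBpedia markup). The Obama record is the one from the example; the Trump and Biden records are added here for illustration, and Trump's covers only his first term. Once the data is structured, "who was youngest when taking office" becomes a one-liner.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Records in the ad-hoc format from the example (not real DBpedia markup).
RECORDS = """
<presidents>
  <President>
    <birthname>Barack Hussein Obama II</birthname>
    <birthdate>1961-08-04</birthdate>
    <headOf>country::USA</headOf>
    <tenure>2009-01-20 – 2017-01-20</tenure>
  </President>
  <President>
    <birthname>Donald John Trump</birthname>
    <birthdate>1946-06-14</birthdate>
    <headOf>country::USA</headOf>
    <tenure>2017-01-20 – 2021-01-20</tenure>
  </President>
  <President>
    <birthname>Joseph Robinette Biden Jr.</birthname>
    <birthdate>1942-11-20</birthdate>
    <headOf>country::USA</headOf>
    <tenure>2021-01-20 – 2025-01-20</tenure>
  </President>
</presidents>
"""


def age_on(born, day):
    """Age in whole years on a given day."""
    return day.year - born.year - ((day.month, day.day) < (born.month, born.day))


def leaders(xml_text):
    rows = []
    for p in ET.fromstring(xml_text).iter("President"):
        born = date.fromisoformat(p.findtext("birthdate"))
        # The tenure is "start – end", separated by an en dash.
        start, end = (date.fromisoformat(part.strip())
                      for part in p.findtext("tenure").split("–"))
        rows.append({
            "name": p.findtext("birthname"),
            "country": p.findtext("headOf").split("::")[1],
            "age_at_start": age_on(born, start),
            "days_served": (end - start).days,
        })
    return rows


rows = leaders(RECORDS)
youngest = min(rows, key=lambda r: r["age_at_start"])
print(youngest["name"], youngest["age_at_start"])  # Barack Hussein Obama II 47
```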
In conclusion, good semantics is super cool (but also – on a technical level – hard to achieve and maintain).
There's a nice little article on HTML5 semantics on HTML5Doctor.
Semantics have been part of HTML in one form or another all along. They help you understand what's happening where on the page.
Earlier when <div> was used for pretty much everything, we still implemented semantics by giving it a "semantic" class name or an id name.
These tags help in proper structuring and understanding of the layout.
If you do,
<div class="nav"></div>
as opposed to,
<nav></nav>
OR
<div class="sidebar"></div>
as opposed to,
<aside></aside>
There's nothing wrong with the former, but the latter provides better readability for you as well as for crawlers, readers, etc.
With a div, you have to give it an id that says what kind of content it holds: body, header, footer, etc.
With HTML5 semantic elements, the name itself says what kind of content the element holds and which part of the website it belongs to.
Semantic elements are <header>, <footer>, <section>, <aside>, etc.
From what I've read, new HTML5 tags like section, header, and article give meaning and readability to a web page instead of using too many meaningless divs.
However, my question is: Do they have any special behavior or limitations in browsers or CSS properties?
This question came to me when I learned that <img> tags can't have the pseudo-elements :before and :after.
So what other tags have some differences?
In a way
All tags in HTML do something that other tags don't.
However, while some tags (like <img>, as you have found) restrict what you can do, and others provide functionality (such as <button> or <video>), some offer more of a semantic role.
Even though these elements may not look any different from other elements when rendered on the page (for some, such as <blockquote>, browsers may provide default styles, but these should not be relied upon, as they differ or are missing on other browsers or operating systems), they still provide functionality.
How?
Screen readers may use semantic elements to change their tone of voice or volume, to announce a new section, etc.;
Services such as Google will scan pages and find content based around semantic tags (for example, they could be used when creating featured snippets)
There are many other use cases. It is mainly so that computers (and other programmers looking at your code) can understand immediately what each element can contain. We can look at a website and see an item's styles to figure out it's a navbar, but a computer's job is made easier if it is a <nav> rather than <div class="second-navi-bar">, etc.
In Summary...
Semantic elements provide structure for your site's code rather than styles for its users. For the semantic elements that do come with styles (such as <code> or <blockquote>), those styles vary between browsers, so you should always style semantic elements yourself.
They are called semantic elements.
They don't typically behave differently while being rendered by the browser. Usually, they're used by search engines or other applications which parse the data on a website.
For example, does the class film_review mean anything in <article class="film_review"> (example from MDN) if there's no CSS or Javascript interacting with the page, or does it provide semantic information?
It doesn't provide any information that contemporary browsers would interpret or use without CSS or JavaScript per se.
However, it can carry semantic information; see e.g. microformats. For example, you could put an hCard
<div id="hcard-John-Doe" class="vcard">
<span class="fn">John Doe</span>
<div class="org">Cool Institute, Inc.</div>
<div class="adr"><span class="locality">Prague</span></div>
</div>
on your page, and it carries semantic information. A search engine like Google could infer that "John Doe" is the name of a person located in "Prague". There are other microformats that can represent geo information, calendar events, etc.
Anyone can write their own processor of HTML documents that would interpret class attribute values, so the answer is yes, it provides semantic information.
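For instance, a bare-bones processor for the hCard snippet above could look like the following Python sketch (standard library only). It understands just three hCard class names (fn, org, locality), tolerates unclosed tags only crudely, and the HCardParser/parse_hcards names are invented here; a real microformats parser handles the full vocabulary and its parsing rules.

```python
from html.parser import HTMLParser

# The tiny subset of hCard property class names this sketch understands.
PROPERTIES = {"fn", "org", "locality"}


class HCardParser(HTMLParser):
    """Read hCard-style properties from class attributes."""

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per element with class "vcard"
        self.stack = []   # (tag, property-or-None) for each open element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "vcard" in classes:
            self.cards.append({})
        prop = next((c for c in classes if c in PROPERTIES), None)
        self.stack.append((tag, prop))

    def handle_endtag(self, tag):
        # Pop back to the matching start tag (tolerates unclosed elements).
        while self.stack:
            if self.stack.pop()[0] == tag:
                break

    def handle_data(self, data):
        if not self.cards:
            return
        # Text belongs to the innermost open element that names a property.
        for _, prop in reversed(self.stack):
            if prop:
                card = self.cards[-1]
                card[prop] = card.get(prop, "") + data.strip()
                break


def parse_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


snippet = """<div id="hcard-John-Doe" class="vcard">
<span class="fn">John Doe</span>
<div class="org">Cool Institute, Inc.</div>
<div class="adr"><span class="locality">Prague</span></div>
</div>"""
print(parse_hcards(snippet))
# [{'fn': 'John Doe', 'org': 'Cool Institute, Inc.', 'locality': 'Prague'}]
```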
Quoting from the hCard microformat examples:
Per the HTML4.01 specification, authors should be using the element to indicate the "contact information for a document or a major part of a document." E.g.
<address>
Tantek Çelik</address>
By adding hCard to such existing semantic XHTML, you can explicitly indicate the name of the person, their URL, etc.:
<address class="vcard">
<a class="fn url" href="http://tantek.com/">Tantek Çelik</a>
</address>
It provides semantics purely in the sense that it semantically connects that element with other elements of the same class.
There's no rule which states that anything (specifically CSS and/or JavaScript in this case) must use that class. The class itself is simply part of the markup and is coincidentally being ignored by the current styling rules.
You might have other elements with the film_review class, and they are "semantically" connected in the sense that they represent "film reviews" in the markup. That's really all semantic information is... context about the thing being represented in the code. Well-named classes can provide such additional context.
But there's nothing special that the browser is going to do with this information. It's just there in case anybody (styling, code, or even just somebody looking at the markup) wants to know that this article belongs to a named class of elements.
Semantics in HTML5 are oriented more toward standardizing the most commonly used elements around the web. As described on HTML Semantic Elements:
With HTML4, developers used their own favorite attribute names to style page elements:
header, top, bottom, footer, menu, navigation, main, container, content, article, sidebar, topnav, ...
This made it impossible for search engines to identify the correct web page content.
With HTML5 elements like: <header> <footer> <nav> <section> <article>, this will become easier.
So something as specific as a "Film Review" would not provide much semantic information at the HTML5 level.
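A toy Python illustration of why that standardization matters to a crawler: with HTML4-style class names it needs a synonym table that is never complete, whereas the HTML5 tag name is a single agreed-upon word. The tables and the region_of() helper are hypothetical, built from names in the quoted list.

```python
# Hypothetical synonym table built from the HTML4-era names quoted above.
CLASS_GUESSES = {
    "navigation": {"menu", "navigation", "topnav"},
    "header": {"header", "top"},
    "footer": {"footer", "bottom"},
    "sidebar": {"sidebar"},
}

# The HTML5 elements that name the same regions directly.
SEMANTIC_TAGS = {"nav": "navigation", "header": "header",
                 "footer": "footer", "aside": "sidebar"}


def region_of(tag, class_name=""):
    """Return the region a crawler would infer, or None if it can't tell."""
    if tag in SEMANTIC_TAGS:
        return SEMANTIC_TAGS[tag]          # declared by the markup itself
    for region, names in CLASS_GUESSES.items():
        if class_name in names:
            return region                  # guessed from an author's naming
    return None


print(region_of("nav"))                      # navigation
print(region_of("div", "topnav"))            # navigation
print(region_of("div", "second-navi-bar"))   # None
```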
That depends. Who and what else is processing your HTML?
For example, microformats sometimes use classes to add semantic information to elements which don't naturally possess rich semantics. In that case, neither ECMAScript nor CSS process that information, but a microformats parser might. film_review doesn't belong to any well-known microformat, however.
Everything on the page gets parsed (read) by a search engine, so the answer is yes, it does provide semantic information. However, different weights are associated with different HTML tokens (elements, attribute names, attribute values).
How much weight an HTML token gets depends on the type of document you declare (HTML4/HTML5): the <!DOCTYPE> tag at the top of your page tells the search-engine bot/parser what type of document it is, which in turn controls the bot's parsing schema (behavior) for reading your document.
The entire purpose of HTML5 was to provide "semantics": different tags let you mark up and define your document, giving content more importance and allowing search engines to understand it better. That gives the search engine a much better way to supply end users searching for something with content relevant to their search term. If you're using HTML4 instead of HTML5, the bots rely mostly on HTML attributes to define the content within tags such as <div>, which provides no semantic meaning to the content inside it.
I am using the section tag to group topics and replies on the forum page. When I need to load a topic and its replies on another article page, I use a div tag for the same block and change the topic title from h1 to h2. This is valid, but for assistive technology, will it make navigation a bit confusing?
Assuming that the assistive technology you are talking about concerns mainly screenreaders, the best way for you to know how accessible your pages are is by downloading one yourself and testing it out. A free screenreader that I have used to do this is called NVDA but there are more out there.
In general, screen readers work best when a page has a logical structure behind it. If you are displaying several articles, make sure that each article is located at a similar hierarchical position on the page and that each article resembles the others in its structure. Using HTML5 semantic tags like article, aside and the like can be helpful but is not necessary. Screen readers and other assistive technologies have made do for a lot longer than these tags have been around. They are certainly good to use when possible, but there are other, more important ways to make your page accessible to as wide an audience as possible.
Another good thing to do is to use heading tags for titles, and to use them in order. Screen readers often let users skip from heading to heading to get a summary of what is on the page. You can also include visually hidden links (placed far off the edge of the page using CSS) at the top of the page, or in sections where placing a heading may not be appropriate visually. These will be read in context by screen readers without your sighted users seeing them.
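If you want to check the "use them in order" advice automatically, here is a small Python sketch (standard library only) that flags places where a heading jumps down more than one level, e.g. an h2 followed directly by an h4. The skipped_levels() helper is hypothetical, treats the page as starting at level 0 so a first heading other than h1 is flagged, and is no substitute for testing with an actual screen reader.

```python
from html.parser import HTMLParser


class HeadingLevels(HTMLParser):
    """Record the level (1-6) of every h1-h6 start tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def skipped_levels(html):
    """Return (previous, current) pairs where a heading skips a level."""
    parser = HeadingLevels()
    parser.feed(html)
    parser.close()
    problems = []
    previous = 0  # so a page whose first heading isn't h1 gets flagged
    for level in parser.levels:
        if level > previous + 1:
            problems.append((previous, level))
        previous = level
    return problems


print(skipped_levels("<h1>Forum</h1><h2>Topic</h2><h4>Reply</h4>"))  # [(2, 4)]
print(skipped_levels("<h2>Topic</h2><h3>Reply</h3>"))                # [(0, 2)]
```

Going back up (say from an h3 to an h2) is not flagged, since that simply starts a new subsection.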
If you are concerned about accessibility, a good way to get a clearer picture of how accessible your pages are is by following the WCAG (Web Content Accessibility Guidelines) standard recommendations. WCAG is managed by the W3C, and has various levels of accessibility that you can consider respecting when developing your content. The W3C has a list of validators that can be found here.
To answer your question from comments:
How does it sound when a topic title is read as an h2, I click it, arrive at the forum page, and the same topic title becomes an h1?
This shouldn't confuse most people, especially if you do it consistently. I am assuming that you are making a news-like site.
Above, Levi mentioned article tags. I would recommend using them if you have multiple stories per page. The div tag is roughly the garbage can of the HTML world; you should only use it when nothing else is available. Article tags both give your code better semantic value and come with another feature, called a role. Roles allow a person using a screen reader to jump around a page, as they can with heading tags.
I guess I'm stupid, but I don't see the reason behind some of the new HTML5 tags.
Some tags, like <audio> and <canvas> provide some great new functionalities to the browser, but others such as <article>, <section> etc. just show up as a <div> or <span>. Therefore, why have these tags been added to HTML5?
I made this jsfiddle: http://jsfiddle.net/6BHAM/. In Chrome at least, there is no exciting layout or whatever. One could just use <span> / <div>.
So what's the use of those HTML5 tags which don't add new functionality?
The semantics behind the tags are much more important than the resulting layout by a browser. HTML is not only parsed by a browser, it can also be parsed by a web crawler or accessibility tools such as a screen reader. A screen reader may add a particular emphasis for a certain tag used, or a web crawler could choose some content over others for the meta data it stores.
Further reading:
Semantic HTML - Wikipedia
Semantic HTML and Search Engine Optimization - dev.opera.com
The idea is to make the HTML code more understandable as it was not created only for graphical rendering.
Using tags such as article or section makes the HTML easier to read, and it also helps with styling your page, because you can apply different CSS to article than to other divs without adding a special class or id.
The idea is that HTML should convey the semantic meaning of the enclosed text rather than how it should be rendered.
This allows a cleaner separation of content structuring and content presentation.
It's also best practice for SEO: robots will search and reference content in article first, before searching the nav or aside areas.
HTML's purpose is to describe what a document means as opposed to what it looks like. The extensions are helpful to this end.
The reason is we are moving to a Semantic Web. This is to help allow machines to understand the context of the information presented in a webpage.
The W3C defines a <div> as semantically meaningless and generic.
The DIV and SPAN elements, in conjunction with the id and class attributes, offer a generic mechanism for adding structure to documents. These elements define content to be inline (SPAN) or block-level (DIV) but impose no other presentational idioms on the content.
Now see the specification for <article>:
The article element represents a self-contained composition in a document, page, application, or site and that is, in principle, independently distributable or reusable, e.g. in syndication. This could be a forum post, a magazine or newspaper article, a blog entry, a user-submitted comment, an interactive widget or gadget, or any other independent item of content.