In semantic HTML does the class attribute mean anything in the absence of CSS or Javascript? - html

For example, does the class film_review mean anything in <article class="film_review"> (example from MDN) if there's no CSS or Javascript interacting with the page, or does it provide semantic information?

It doesn't provide an information that contemporary browsers would interpret or use without CSS or Javascript per se.
However it can carry semantic information - see e.g. microformats. For example, you could put an hcard
<div id="hcard-John-Doe" class="vcard">
<span class="fn">John Doe</span>
<div class="org">Cool Institute, Inc.</div>
<div class="adr"><span class="locality">Prague</span></div>
</div>
on your page and it carries a semantic information. A search engine like Google could infer that "John Doe" is a name of a person located in "Prague". There are other microformats that can represent geo information, calendar events, etc.
Anyone can write their own processor of HTML documents that would interpret class attribute values, so the answer is yes, it provides semantic information.
Quoting from hcard microformat example:
Per the HTML4.01 specification, authors should be using the element to indicate the "contact information for a document or a major part of a document." E.g.
<address>
Tantek Çelik</address>
By adding hCard to such existing semantic XHTML, you can explicitly indicate the name of the person, their URL, etc.:
<address class="vcard">
<a class="fn url" href="http://tantek.com/">Tantek Çelik</a>
</address>

It provides semantics purely in the sense that it semantically connects that element with other elements of the same class.
There's no rule which states that anything (specifically CSS and/or JavaScript in this case) must use that class. The class itself is simply part of the markup and is coincidentally being ignored by the current styling rules.
You might have other elements with the film_review class, and they are "semantically" connected in the sense that they represent "film reviews" in the markup. That's really all semantic information is... context about the thing being represented in the code. Well-named classes can provide such additional context.
But there's nothing special that the browser is going to do with this information. It's just there in case anybody (styling, code, or even just somebody looking at the markup) wants to know that this article belongs to a named class of elements.

Semantics on HTML5 are more oriented on standarizing the most used elements around the web. As described on HTML Semantic Elements:
With HTML4, developers used their own favorite attribute names to style page elements:
header, top, bottom, footer, menu, navigation, main, container, content, article, sidebar, topnav, ...
This made it impossible for search engines to identify the correct web page content.
With HTML5 elements like: <header> <footer> <nav> <section> <article>, this will become easier.
So an element so specific as a "Film Review" would not provide that much semantic information at HTML5 level.

That depends. Who and what else is processing your HTML?
For example, microformats sometimes use classes to add semantic information to elements which don't naturally possess rich semantics. In that case, neither ECMAScript nor CSS process that information, but a microformats parser might. film_review doesn't belong to any well-known microformat, however.

Everything on the page gets parsed (read) by a search-engine, so your answer is, YES, it does provide semantic information, however there are different weighted value associated with different HTML tokens (elements, attribute-names, attribute-values).
However what really defines how much weight a HTML token gets, is really dependent on the type of document that you declare it is (HTML4/HTML5), the <!DOCTYPE> tag at the top of your page declares that to the search-engine bot/parser what type of document it is, which in turn controls the search-engine bot's parsing-schema (behavior) on how to read your document.
The entire purpose of HTML5 was to provide "semantics", allowing you to use different tags so you can markup/define your document giving content more importance allowing search-engines to understand it better. This allows the search-engine a much better way to then supply the end-user, whom is searching for something with more relevant content associated with their search term... if your not using HTML5 and using HTML4 then the bots are relying mostly on HTML attributes to define the content within tags such as a <div> which provides no semantic meaning to the content inside it.

Related

Semantic HTML5 element properties

HTML5 introduced many semantic elements (<nav>, <section>, <article>, etc.). But aside from helping read and structure our HTML, do they have any other unique properties? Or are they essentially just <div>s with different names? It seems like the latter is true.
I keep reading about the "semantics" but can't find a direct answer.
The <nav>, <section>, <article>, etc., elements don’t have any special properties that are exposed to frontend JavaScript code; instead they all just use the HTMLElement interface.
However, they do have special properties in screen readers—in that they get announced to screen-reader users in a special way that a div element doesn’t.
Screen readers can announce that a certain part of a document is a section or article, and allow screen-reader users to navigate through the document section-by-section, or to more easily jump among articles.
That said, screen readers also enable users to easily navigate through a document by jumping among its h1-h6 headings—regardless of whether those headings are in section or article elements—so for screen-reader users it’s actually more important that your documents have good informative h1-h6 headings and a logical structure.
Semantic elements are elements that describe the content. They provide more information to the browser without requiring any extra attributes.
Considering Microformats
When you’re adding semantics into your web pages, you should consider using microformats to add even more meaning, when appropriate. Microformats use human-readable text inside the HTML (usually in the class attribute of an element) to define the contents.
Microformats add semantic information about the elements, and this information is already being used in certain situations.
Above figure shows a Google search for reviews of the movie Ender’s Game. The second and third results show “rich snippets,” including information like the star rating.
Google and Bing are both using these types of rich snippets to enhance their search results, and most of the data they are using to get it is semantically marked-up HTML using microformats. You can learn more about how to use microformats in my book Sams Teach Yourself HTML5 Mobile Application Development in 24 Hours.
By writing semantic HTML, you give more information to user agents to use to display the information correctly. For example, if a screen reader sees the element, it knows that this is the main point of the page, and it will read it aloud before reading anything in an element. Plus, as web pages get more and more sophisticated, what the user agents do with them gains sophistication. For instance, in the future, your semantically marked-up recipe could tell a web-ready refrigerator what time to alert the robot butler to start the roast.

Why use Schema.org microdata to mark up web page elements?

I understand why and how to use Schema.org to add microdata to your site, this is not a question about that. The question is why Schema.org has support for certain things that can be marked up with simple HTML5. Among these are
Types
WebPage and WebSite
I can see why WebPage and WebSite would be needed, for example, to reference the page/site of a certain organization in a link, but there's no need to mark up your own page with this—the <html> tag does this.
SiteNavigationElement
Why not just use <nav>?
Table
Just use <table>.
properties
WebPage/mainContentOfPage
<main> element
WebPage/relatedLink
<link> element inside <head>
This answer is primarily about the WebPageElement types (like SiteNavigationElement).
For WebPage, see my answer to the question Implicity of web page structure in Schema.org (tl;dr: it can be useful to provide WebPage, even for the current page).
For WebSite, similar reasons from the answer above apply. HTML doesn’t allow you to state something about the whole site (and, by the way, a Google rich result makes use of this type).
Schema.org is not restricted to HTML5.
Schema.org is a vocabulary which can be used with various syntaxes (like JSON-LD, Microdata, RDFa, Turtle, …), stand-alone or in various host languages (like HTML 4.01, XHTML 1.0/1.1, (X)HTML5, XML, SVG, …). So having other ways to specify that something is (or: is about; or: represents) a site-wide navigation, a table etc. is the exception rather than the rule.
But there can be reasons to use these types even in HTML5 documents, for example:
The HTML5 markup and the annotations from Microdata/RDFa are two "different worlds": a Microdata/RDFa parser is only interested in the annotations, and after successfully parsing a document, the underlying markup is of no relevance anymore (e.g., the information that something was specified in a table element is lost in the Microdata/RDFa layer).
By using types like WebPageElement, you can specify metadata that is not possible to specify in plain HTML5. For example, the author/license/etc. of a table.
You can use these types to specify data about something which does not exist on the current document, e.g., you could say on your personal website that you are the author of a table in Wikipedia.
That said, these are not typical use cases relevant for a broad range of authors. Unless you have a specific reason for using them, you might want to omit them. They are not useful for typical websites. Using them can even be problematic in some cases.
See also my Schema.org issue The purpose of WebPageElement and mainContentOfPage, where I suggested to deprecate WebPageElement and the mainContentOfPage property.
Just use <table>.
You seem to be reading the title of the pages and no further. The <table> tag doesn't have the dozens of special properties listed on that page like isFamilyFriendly or license or timeRequired.
Schema.org microdata is intended to build a standard set of additional, semantic metadata that can be used by automated systems - search engine spiders, parser robots, etc. - to better understand the nature and features of the content.

More semantic tags in HTML?

HTML5 introduced a variety of new semantic tags (such as <header>, <footer> and <article>) to replace divs and similar generic tags where appropriate. This seems like a good thing.
I've been experimenting with Polymer, and the Web Components spec seems like the future, albeit a little ahead of its time. In particular, its custom tag names and ability to separate data from layout makes pages of HTML extremely semantic and readable.
The components have templates that form the shadow DOM, and the templates are populated using content tags that select data elements based on their tag names. Putting HTML5's semantic tags and <content> tags together seems like a no-brainer, distributing uniquely-named elements rather than trying to insert three <div>s in three different places and relying on declaration order.
In particular, it would be great if we could use custom semantic tags for data, and use those semantic names for selection. For example, a name card component could have <content> tags that select data elements such as <name> and <description> rather than using attributes.
However, <name> and <description> are not valid tags in HTML5. Browsers treat all unrecognised tags as <span> elements, so there would be no implementation problems with allowing custom tags in the DOM. They would carry additional semantic information for readers of the code, and potentially search engine crawlers as well. Code might be less consistent, but arguably 'better'.
If this is too vague a question, I'll make it explicit: is there a definitive reason why tags such as <name> are not valid HTML5?
EDIT: I just tested the <content> selection, and verified that you can in fact use selectors other than just tag name in the select attribute. I hadn't seen this in any tutorials, but it mostly satisfies my practical desire for semantic tags. For my example, you could instead use <content select="p.name"> and <content select="p.description">.
I'm still interested from a theoretical perspective, however.
The HTML5 spec gives a reason, in section 3.2.1 Semantics:
Authors must not use elements, attributes, or attribute values that are not permitted by this specification or other applicable specifications, as doing so makes it significantly harder for the language to be extended in the future.

html5: how to markup proper names and common names?

<p>...the favourite color of Purple is purple...</p>
the first "Purple" is a name of a company, the second one is a color name,
how should I markup this according to html5 spec?
thank you in advance
You have a number of options:
Leave it as is, HTML isn't really concerned with semantics which aren't about describing document structure (paragraphs, headings, lists etc.). If you do want to express more detailed document or application semantics look at WAI-ARIA.
If it's important for you to distinguish between the two uses of the word purple as part of your website or app then use the class attribute or data-* attributes
If the words have canonical machine readable forms and you want the values to be parsed by a computer somehow, use the data element.
If distinguishing between the two uses is important to users or systems consuming your site content, use the semantic extensibility feature of HTML5: Microdata. (If you're using the XML dialect of HTML, see also: RDFa)
Combine any of the above approaches according to your immediate needs.
To decide between the approaches you should ask yourself:
For what purpose do I need to extend the semantic vocabulary of HTML?
Is it for my own uses, or am I trying to publish information to be used by others?
If I'm publishing for others, what shared vocabulary am I going to use?
Code examples:
Class attributes
What they're for is adding additional information to your markup, remember the class attribute is in the HTML spec, not the CSS spec:
<p>...the favourite color of <span class="company">Purple</span>
is <span class="color">purple</span>...</p>
Having said that, of course, the obvious thing to do once you have things marked up in this way is provide in page tools to do things like 'highlight all companies'. People have used the class attribute as the basis for a general purpose semantic extension mechanism however, for this approach taken to the extreme see microformats.
Data attributes
The data-* attributes are to allow you to add custom attributes to your markup for processing with scripts in a way which guarantees you won't accidentally use a custom attribute which then gets used in a future version of HTML:
<p>...the favourite color of <span data-typeofthing="company">Purple</span>
is <span data-typeofthing="color">purple</span>...</p>
It's up to scripts on your page to do something useful with the data-* attributes, browsers and other web clients will ignore them.
Custom data elements
Data elements are for things that have an imprecise natural language expression but also a precise machine readable expression. Assuming that the company can be uniquely identified by a ticker symbol and RGB will do for the colour:
<p>...the favourite color of <data value="purp">Purple</data>
is <data value="rgb(128,0,128)">purple</data>...</p>
Browsers probably won't do anything special with the data element. It's most likely you'll use data elements in concert with microformats, RDFa or Microdata.
Microdata
Using the Organization schema:
<p>...the favourite color of
<span itemscope itemtype="http://schema.org/Organization">
<span itemprop="name">Purple</span>
</span>
is purple...</p>
There isn't anything for colours that I'm aware of, but you could always publish your own schema for that if it's important to you. This approach only really benefits anyone if there is a shared vocabulary of some kind.
Which element?
The first task would be to decide which element should be used to enclose the "entities" (company name and color name). Most probably you want to use span here. If in doubt, use span. There are some cases (depends on content) where other elements could be used:
the b element might be used if the entity is some kind of keyword ("a span of text to which attention is being drawn for utilitarian purposes without conveying any extra importance and with no implication of an alternate voice or mood")
if the entity is the title of a work (book, film, song etc.), the cite element should be used.
the dfn element might be used if the entities are defined in the same paragraph or in the nearest ancestor sectioning element
in some cases the i element could be appropriate
If the term is an abbreviation/acronym, use the abbr element (instead of span resp. in addition to b/dfn/i).
A good idea might be to use the a element to link the entity name to a relevant webpage. The rel attribute might give additional metadata (you can use the rel values listed in the HTML5 spec, or the registered rel values in the microformats wiki), depending on the content/context.
Which attributes?
Have a look at the global attributes.
The class attribute would be used if you'd like to use microformats. You could of course use other class names, but they would only be useful for yourself (documentation, CSS, JS) or other people that read your markup (documentation, scraping).
For entity names like person/company names you probably want to use the translate attribute with the no keyword, because such names should not be translated.
The title might give additional information (note that it has special semantics for dfn/abbr), but don't rely on it for important information.
Use lang if the entity names are in a foreign language.
How to annotate the content with meaning?
There are three popular choices, which can also be used together (see this answer (the "third step") for a short summary of the differences):
microformats
RDFa
microdata
You may need additional elements; if so, use span.

What's the use of some HTML5 tags?

I guess I'm stupid but I don't see what is the reason behind some new HTML5 tags.
Some tags, like <audio> and <canvas> provide some great new functionalities to the browser, but others such as <article>, <section> etc. just show up as a <div> or <span>. Therefore, why have these tags been added to HTML5?
I made this jsfiddle: http://jsfiddle.net/6BHAM/. In Chrome at least, there is no exciting layout or whatever. One could just use <span> / <div>.
So what's the use of those HTML5 tags which don't add new functionality?
The semantics behind the tags are much more important than the resulting layout by a browser. HTML is not only parsed by a browser, it can also be parsed by a web crawler or accessibility tools such as a screen reader. A screen reader may add a particular emphasis for a certain tag used, or a web crawler could choose some content over others for the meta data it stores.
Further reading:
Semantic HTML - Wikipedia
Semantic HTML and Search Engine Optimization - dev.opera.com
The idea is to make the HTML code more understandable as it was not created only for graphical rendering.
Using tags such as article or section make the HTML reading easier, and it also help you styling your page because you can use a different css for article than others div without adding a special class or id.
The idea is that HTML would allow more to give the semantic meaning of the encapsulated test rather than how it should be rendered.
This allows a cleaner separation of content structuring and content presentation.
And it's best practice for SEO. robots will shearch and reference first in article before search in nav or aside area.
HTML's purpose is to describe what a document means as opposed to what it looks like. The extensions are helpful to this end.
The reason is we are moving to a Semantic Web. This is to help allow machines to understand the context of the information presented in a webpage.
The w3c defines a <div> as semantically meaningless and generic.
The DIV and SPAN elements, in conjunction with the id and class attributes, offer a generic mechanism for adding structure to documents. These elements define content to be inline (SPAN) or block-level (DIV) but impose no other presentational idioms on the content.
Now see the specification for <article>:
The article element represents a self-contained composition in a document, page, application, or site and that is, in principle, independently distributable or reusable, e.g. in syndication. This could be a forum post, a magazine or newspaper article, a blog entry, a user-submitted comment, an interactive widget or gadget, or any other independent item of content.