HTML5 and Schema.org, why use both? - html

Microdata with Schema.org already describes an element better than HTML5 does, so using both seems redundant. For example:
<nav itemscope itemtype="http://schema.org/SiteNavigationElement">
<!-- might as well just be... -->
<div itemscope itemtype="http://schema.org/SiteNavigationElement">
and
<article itemscope itemtype="http://schema.org/NewsArticle">
<!-- might as well just be... -->
<div itemscope itemtype="http://schema.org/NewsArticle">
Some elements create an "outline" for the webpage, but aside from that what's the point? Why not just use divs and forget about the semantic tags, and just use Microdata and Schema.org?

The schema.org definitions are aimed specifically at applications such as search engines (from "What is schema.org?"):
This site provides a collection of schemas, i.e., html tags, that
webmasters can use to markup their pages in ways recognized by major
search providers. Search engines including Bing, Google, Yahoo! and
Yandex rely on this markup to improve the display of search results,
making it easier for people to find the right web pages.
Your mark-up needs to be understood by browsers and screen-readers as well as search engines (from the schema.org Getting started page):
Usually, HTML tags tell the browser how to display the information
included in the tag. For example, <h1>Avatar</h1> tells the browser to
display the text string "Avatar" in a heading 1 format. However, the
HTML tag doesn't give any information about what that text string
means—"Avatar" could refer to the hugely successful 3D movie, or it
could refer to a type of profile picture—and this can make it more
difficult for search engines to intelligently display relevant content
to a user.
So microdata allows you to add additional semantic meaning to your mark-up (using definitions provided by schema.org) which can be ignored by applications which don't need it, such as browsers, and read by applications which do, such as search engines.
Microdata is not a replacement for the appropriate semantic HTML tags where they are available; it should be used to augment that information. So the simple reason to use nav and article tags along with the microdata is that these tags have meaning to browsers and screen readers, while the microdata does not.
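As a rough sketch of that augmentation, the snippet below keeps the semantic element and layers Microdata on top of it. The headline and body text are placeholders, and the property names (headline, articleBody) are taken from the Schema.org NewsArticle type rather than from the question.

```html
<!-- The <article> element carries meaning for browsers and screen readers. -->
<!-- The item* attributes carry meaning for Microdata consumers such as search engines. -->
<article itemscope itemtype="http://schema.org/NewsArticle">
  <h1 itemprop="headline">Placeholder headline</h1>
  <p itemprop="articleBody">Placeholder body text.</p>
</article>
```

A consumer that ignores Microdata still sees a normal article with a heading, which is the reason for not falling back to a plain div.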
Actually, your examples are fairly simplistic. I would suggest you have a look at some of the examples on the schema.org getting started page to see how microdata can be used more meaningfully.
To see microdata being used in practice, try searching for yourself on Google and inspecting the results. If I search for myself, the first three results (LinkedIn, GitHub and my portfolio page) all display information marked up using microdata, which Google can pull from the pages and present to the user to provide more meaningful search results.

The vast majority of terms that we have in schema.org have no overlap with HTML terminology, since they represent kinds of real-world things such as places, processes, products etc.
The problem area highlighted here is the small set of terms around http://schema.org/WebPageElement. I am not aware that any current search engine features make specific use of these, and I would suggest that any publishers who do see value in their use should also employ the corresponding pure HTML markup.

Related

Does rel="home" on an anchor tag (<a>) do anything helpful?

I see this a lot on site title links in WordPress themes (probably because Underscores does it and everyone copies that):
<a href="…" rel="home">Some Site Title</a>
I cannot find even a semi-authoritative statement anywhere that rel="home" on an anchor tag is used meaningfully today by any browser, screen reader, or other user agent. The only "official" documentation I've located is this draft specification from 2005 on the microformats.org site.
That doc proposes home as a valid value on both <link> tags in the <head> and <a> tags. Using it on a <link> has some pedigree from HTML v3, and there's reference to it in the wild from 2002. But I haven't seen anything about the <a> tag usage.
So, is including it helpful for anything/anyone? Would I do better to use <link rel="home"> in the <head>, or is that obsolete too in 2020?
rel="home" was part of an initiative to create permalinks (see section 'Permalink detection') as part of a standard in HTML 4.
However, with HTML5 it no longer has a purpose and does not offer any accessibility or SEO value. It also might not validate with the W3C validator anymore (not tested).
rel="something" should only be used on <link> elements, with the exception of rel="noopener", rel="nofollow" or rel="noreferrer" on anchors (<a> tags).
Note: there may be other rel values for hyperlinks, but the ones stated above are the only ones I can think of. It is no longer valid to use rel for page locations, bookmarks, etc.
Update
Thanks to @Sean, who pointed out in the comments that other elements can accept rel="". However, microformats are not the preferred way of adding structured data according to Google, and their development is not as full-fledged as that of https://schema.org and JSON-LD.
“We currently prefer JSON-LD markup. I think most of the new
structured data come out for JSON-LD first. So that’s
what we prefer.” - John Mueller
I was obviously incorrect in what I said, as it is perfectly valid. Personally, however, I would not bother and would stick with what Google prefers, apart from the few items I listed.
See @Sean's answer for a bit more info on the subject.
For clarity, rel="" has no bearing on accessibility.
home isn't one of the keywords explicitly defined by the current HTML spec as allowed values for the rel attribute. However, the spec goes on to state that:
Types defined as extensions in the microformats wiki existing-rel-values page with the status "proposed" or "ratified" may be used with the rel attribute on link, a, and area elements in accordance to the "Effect on..." field.
On that microformats page, home has the "proposed" status—so it is valid to use according to the spec.
There's a specific rel-home page within that microformats site that goes into more detail about the usage with examples. It makes the statement—
Opera browser supports rel="home"
—which would imply that Opera has some functionality tied to that usage, but it doesn't provide any additional details.
Summary: rel="home" is valid to use on a elements. Its benefits aren't clear, but it doesn't hurt to use it. The draft spec for it has been around since 2005, so there are bound to be some technologies that make use of it.
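For reference, the two placements discussed in this thread would look roughly like the sketch below. The example.com URL is a placeholder, and nothing here implies that any particular browser or crawler acts on the relation.

```html
<!-- In the document <head>: a link relation pointing at the site's home page -->
<link rel="home" href="https://example.com/">

<!-- In the <body>: the same relation on an ordinary hyperlink -->
<a href="https://example.com/" rel="home">Some Site Title</a>
```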

In semantic HTML does the class attribute mean anything in the absence of CSS or Javascript?

For example, does the class film_review mean anything in <article class="film_review"> (example from MDN) if there's no CSS or Javascript interacting with the page, or does it provide semantic information?
It doesn't provide any information that contemporary browsers would interpret or use without CSS or JavaScript per se.
However, it can carry semantic information; see e.g. microformats. For example, you could put an hCard
<div id="hcard-John-Doe" class="vcard">
<span class="fn">John Doe</span>
<div class="org">Cool Institute, Inc.</div>
<div class="adr"><span class="locality">Prague</span></div>
</div>
on your page and it carries semantic information. A search engine like Google could infer that "John Doe" is the name of a person located in "Prague". There are other microformats that can represent geo information, calendar events, etc.
Anyone can write their own processor of HTML documents that would interpret class attribute values, so the answer is yes, it provides semantic information.
Quoting from the hCard microformat example:
Per the HTML4.01 specification, authors should be using the <address> element to indicate the "contact information for a document or a major part of a document." E.g.
<address>Tantek Çelik</address>
By adding hCard to such existing semantic XHTML, you can explicitly indicate the name of the person, their URL, etc.:
<address class="vcard">
<a class="fn url" href="http://tantek.com/">Tantek Çelik</a>
</address>
It provides semantics purely in the sense that it semantically connects that element with other elements of the same class.
There's no rule which states that anything (specifically CSS and/or JavaScript in this case) must use that class. The class itself is simply part of the markup and is coincidentally being ignored by the current styling rules.
You might have other elements with the film_review class, and they are "semantically" connected in the sense that they represent "film reviews" in the markup. That's really all semantic information is... context about the thing being represented in the code. Well-named classes can provide such additional context.
But there's nothing special that the browser is going to do with this information. It's just there in case anybody (styling, code, or even just somebody looking at the markup) wants to know that this article belongs to a named class of elements.
Semantics in HTML5 are more oriented toward standardizing the most commonly used elements around the web. As described on HTML Semantic Elements:
With HTML4, developers used their own favorite attribute names to style page elements:
header, top, bottom, footer, menu, navigation, main, container, content, article, sidebar, topnav, ...
This made it impossible for search engines to identify the correct web page content.
With HTML5 elements like: <header> <footer> <nav> <section> <article>, this will become easier.
So something as specific as a "Film Review" would not provide much semantic information at the HTML5 level.
That depends. Who and what else is processing your HTML?
For example, microformats sometimes use classes to add semantic information to elements which don't naturally possess rich semantics. In that case, neither ECMAScript nor CSS process that information, but a microformats parser might. film_review doesn't belong to any well-known microformat, however.
Everything on the page gets parsed (read) by a search engine, so the answer is yes, it does provide semantic information. However, different HTML tokens (elements, attribute names, attribute values) carry different weights.
How much weight an HTML token gets depends on the type of document you declare (HTML4 or HTML5). The <!DOCTYPE> declaration at the top of your page tells the search engine bot/parser what type of document it is, which in turn controls the bot's parsing schema (behavior) when reading your document.
The entire purpose of HTML5 was to provide "semantics": different tags let you mark up and define your document, giving content more importance and allowing search engines to understand it better. This gives the search engine a much better way to supply the end user, who is searching for something, with more relevant content associated with their search term. If you're not using HTML5 and are using HTML4 instead, the bots rely mostly on HTML attributes to define the content within tags such as <div>, which provides no semantic meaning to the content inside it.

Is Google microformats supposed to be visible on the web page?

I was trying to add microformats to my webpage as follows:
<div itemscope itemtype="http://schema.org/Product">
<span itemprop="brand">Company Name</span>
<span itemprop="name">Product Name</span>
<span itemprop="description">Product Description</span>
Product #: <span itemprop="sku">12345</span>
</div>
I thought this microformat would only show up in a Google search result page. But after adding it, that information became visible on my webpage, and not in good shape.
Is there something wrong? Or should I use display:none to make it invisible on my webpage?
Microformats are meant to add machine readable meaning to existing content on the page. They're not invisible meta data, they augment content that's already there. So, yes, it'll show up. You can hide or style it via any of the usual ways in which you hide or style content.
You are using Microdata, not Microformats.
Microdata is a syntax to include structured data within HTML5. Ideally you would use your existing content (i.e., add the needed attributes like itemprop etc. to your already existing markup), and only if that’s not possible, the hidden elements meta and link (which are allowed in the body if used for Microdata).
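A minimal sketch of that approach, reusing the Product from the question: the visible name and description are annotated in place, while the SKU and a page URL go into meta and link elements that are not rendered. The URL and the choice of which properties to hide are assumptions for illustration only.

```html
<div itemscope itemtype="http://schema.org/Product">
  <!-- Existing visible content, annotated in place -->
  <h2 itemprop="name">Product Name</h2>
  <p itemprop="description">Product Description</p>

  <!-- Not rendered, but still part of the Product item -->
  <meta itemprop="sku" content="12345">
  <link itemprop="url" href="https://example.com/products/12345">
</div>
```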
If you don’t want to use your existing markup and the visible content, you could use an alternative syntax: JSON-LD. This gets included as a data block (using the script element), which is not visible by default.
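As a sketch, the Product from the question could be expressed in JSON-LD roughly as follows. Wrapping the brand in a Brand object is an assumption about how the data should be modelled; a plain text value may also be accepted by consumers.

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "brand": { "@type": "Brand", "name": "Company Name" },
  "name": "Product Name",
  "description": "Product Description",
  "sku": "12345"
}
</script>
```

Because the script element is a data block, none of this is rendered, so the visible page can be laid out independently of the structured data.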
Don't try to hide your content or style it out of view; it will have a bad impact on your site. You might get penalized for cloaking if you practice it on all of your pages.
If you are trying to let the bots know about some more info that is not on your page, you can try using the Data Highlighter in your Search Console (Webmaster Tools) for simple things, or JSON-LD on your pages for more complicated stuff.
Microformats are HTML, used to publish a standard API that is consumed and used by search engines, browsers, and other web sites. Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. Microformats are a way to enable "smart scraping" of web pages, so that you can create tools and scripts that losslessly extract machine-readable information from cleanly formatted, human-readable HTML. Structured data is the name given to content which is marked up in a specific way, using microformats, to explain what that content is all about.
It is always recommended to show the Microdata information and not to hide it. You can try to style it into good shape. It will show up in the Google and Bing result pages as well, but you need to wait a little for that. There is nothing wrong with the markup you applied; SEO just needs some more patience.

Why use Schema.org microdata to mark up web page elements?

I understand why and how to use Schema.org to add microdata to your site, this is not a question about that. The question is why Schema.org has support for certain things that can be marked up with simple HTML5. Among these are
Types:
WebPage and WebSite: I can see why WebPage and WebSite would be needed, for example, to reference the page/site of a certain organization in a link, but there's no need to mark up your own page with this, since the <html> tag does this.
SiteNavigationElement: Why not just use <nav>?
Table: Just use <table>.
Properties:
WebPage/mainContentOfPage: use the <main> element.
WebPage/relatedLink: use a <link> element inside <head>.
This answer is primarily about the WebPageElement types (like SiteNavigationElement).
For WebPage, see my answer to the question Implicity of web page structure in Schema.org (tl;dr: it can be useful to provide WebPage, even for the current page).
For WebSite, similar reasons from the answer above apply. HTML doesn’t allow you to state something about the whole site (and, by the way, a Google rich result makes use of this type).
Schema.org is not restricted to HTML5.
Schema.org is a vocabulary which can be used with various syntaxes (like JSON-LD, Microdata, RDFa, Turtle, …), stand-alone or in various host languages (like HTML 4.01, XHTML 1.0/1.1, (X)HTML5, XML, SVG, …). So having other ways to specify that something is (or: is about; or: represents) a site-wide navigation, a table etc. is the exception rather than the rule.
But there can be reasons to use these types even in HTML5 documents, for example:
The HTML5 markup and the annotations from Microdata/RDFa are two "different worlds": a Microdata/RDFa parser is only interested in the annotations, and after successfully parsing a document, the underlying markup is of no relevance anymore (e.g., the information that something was specified in a table element is lost in the Microdata/RDFa layer).
By using types like WebPageElement, you can specify metadata that is not possible to specify in plain HTML5. For example, the author/license/etc. of a table.
You can use these types to specify data about something which does not exist on the current document, e.g., you could say on your personal website that you are the author of a table in Wikipedia.
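The second point (metadata that plain HTML5 cannot express) might look like the sketch below. The author name, caption text, and license URL are placeholders; author and license are used here because Table inherits them from CreativeWork in the Schema.org vocabulary.

```html
<table itemscope itemtype="http://schema.org/Table">
  <caption>
    <span itemprop="name">Placeholder table title</span>,
    compiled by
    <span itemprop="author" itemscope itemtype="http://schema.org/Person">
      <span itemprop="name">Jane Doe</span>
    </span>
    (<a itemprop="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a>)
  </caption>
  <tr><th>Item</th><th>Value</th></tr>
  <tr><td>A</td><td>1</td></tr>
</table>
```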
That said, these are not typical use cases relevant for a broad range of authors. Unless you have a specific reason for using them, you might want to omit them. They are not useful for typical websites. Using them can even be problematic in some cases.
See also my Schema.org issue The purpose of WebPageElement and mainContentOfPage, where I suggested deprecating WebPageElement and the mainContentOfPage property.
Just use <table>.
You seem to be reading the title of the pages and no further. The <table> tag doesn't have the dozens of special properties listed on that page like isFamilyFriendly or license or timeRequired.
Schema.org microdata is intended to build a standard set of additional, semantic metadata that can be used by automated systems - search engine spiders, parser robots, etc. - to better understand the nature and features of the content.

<nav> vs <article> for SEO

In terms of SEO, if I want to group relevant page content together to maximize search engine readability, should I use the tag <nav> or <article>?
1) It's not there yet.
2) If it was, and you were wrapping menus as article, or wrapping affiliate link-farms as article, Google would slap you (keep that in mind in three or four years).
3) If you have lots of legitimate content, and each piece of content is self-contained (ie: suitable for article), then not only should you wrap it in an article tag, but you should also learn how to use Google's "Rich Snippet Tool", which was recently renamed "Structured Data Tool".
If you learn how to mark things up, both in an html5-friendly way, and in a Google-friendly microformat, then GoogleBot will grab all of the content it knows how, and it will be displayed in search results and elsewhere, when relevant.
Like I said... ...that's if you've got content which is worthy of doing this, because otherwise, Google will slap you, eventually, if you try to use it for evil.
article tag:
The <article> tag allows you to mark separate entries in an online publication, such as a blog or a magazine. It is expected that when articles are marked with the <article> tag, this will make the HTML code cleaner because it will reduce the need to use <div> tags. Also, search engines will probably put more weight on the text inside the <article> tag compared to the contents of other parts of the page.
nav tag: Navigation is one of the important factors for SEO, and everything that eases navigation is welcome. The new <nav> tag can be used to identify a collection of links to other pages.
So both tags have their own function and can be used according to need.