Microdata vs RDFa - html

I have a quick question about RDFa and Microdata.
My current understanding is that RDFa is RDF embedded in HTML but is complicated for new developers like myself, whereas Microdata seems really easy and quick to implement.
What are the other advantages and disadvantages of these two semantic formats?

Differences between Microdata and RDFa
While there are many (smaller, technical) differences, here’s a selection of those I consider important (based on my answer on Webmasters).
Specifications
As W3C’s HTML WG found no volunteer to edit the Microdata specification, it is now merely a W3C Group Note (see history), which means that there are no plans for any further work on it.
So the Microdata section in WHATWG’s "HTML Living Standard" is the only place where Microdata may evolve. Depending on what gets changed, their Microdata may become incompatible with W3C’s HTML5.
Update: In 2017, work started again, with the aim of publishing Microdata as a W3C Recommendation.
RDFa is published as a W3C Recommendation.
Applicability
Microdata can only be used in (X)HTML5 (resp. HTML as defined by the WHATWG).
RDFa can be used in various host languages, i.e. several (X)HTML variants and XML (thus also in SVG, MathML, Atom etc.).
And new host languages can be supported, as RDFa Core "is a specification for attributes to express structured data in any markup language".
Use of multiple vocabularies
In Microdata, it’s harder, and sometimes impossible, to use several vocabularies for the same content.
Thanks to its use of prefixes, RDFa allows mixing vocabularies.
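A minimal sketch of what such mixing can look like, assuming the schema: and foaf: prefixes from the RDFa initial context (pairing Schema.org with FOAF here is only an illustration):

```html
<!-- One element, typed and described with two vocabularies at once -->
<p typeof="schema:Person foaf:Person">
  <span property="schema:name foaf:name">John Doe</span> is his name.
</p>
```

Both typeof and property accept a space-separated list, so a consumer that only knows FOAF and one that only knows Schema.org can each find something they understand.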
Use of reverse properties
Microdata doesn’t provide a way to use reverse properties. You need this for vocabularies that don’t define inverse properties (e.g., they only define parent instead of parent & child). The popular Schema.org is such a vocabulary (with only a few older exceptions).
While the W3C Note Microdata to RDF defines the experimental itemprop-reverse, this attribute is part of neither W3C’s nor WHATWG’s Microdata.
RDFa supports the use of reverse properties (with the rev attribute).
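As a rough sketch of rev in use (the #me and #post identifiers and the text values are hypothetical; note that rev belongs to RDFa Core and HTML+RDFa, not to RDFa Lite): Schema.org defines author on a creative work but no "author of" property on a person, so on a page that starts from the person, rev lets you state the relationship in the other direction.

```html
<div resource="#me" typeof="schema:Person">
  <span property="schema:name">Jane Doe</span>
  <!-- rev flips the direction: the triple is "#post schema:author #me" -->
  <div rev="schema:author" resource="#post" typeof="schema:BlogPosting">
    <span property="schema:headline">Hello, Semantic Web</span>
  </div>
</div>
```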
Semantic Web
By using Microdata, you are not directly playing a part in the Semantic Web (and AFAIK Microdata doesn’t intend to), mostly because it’s not defined as an RDF serialization (although there are ways to extract RDF from Microdata).
RDFa is an RDF serialization, and RDF is the foundation of W3C’s Semantic Web.
The specifications RDFa Core and HTML+RDFa may be more complex than HTML Microdata, but it’s not a "fair" comparison because they offer more features.
Similar to Microdata would be RDFa Lite (which "does work for most day-to-day needs"), and this spec, at least in my opinion, is way less complex than Microdata.
What to do?
If you want to support specific consumers (for example, a search engine and a browser add-on), you should check their documentation about supported syntaxes.
If you want to learn only one syntax and have no specific consumers in mind, (attention, subjective opinion!) go with RDFa. Why?
RDFa matured over the years and is a W3C Rec, while Microdata is a relatively new invention and not standardized by the W3C.
RDFa can be used in many languages, not only HTML5.
RDFa allows mixed use of vocabularies for the same content, and it natively supports the use of reverse properties.
Can’t decide? Use both.
Note that you can also use several syntaxes for the same content, so you could have Microdata and RDFa (and Microformats, and JSON-LD, and …) for maximum compatibility.
Here’s a simple Microdata snippet:
<p itemscope itemtype="http://schema.org/Person">
<span itemprop="name">John Doe</span> is his name.
</p>
Here’s the same snippet using RDFa (Lite):
<p typeof="schema:Person">
<span property="schema:name">John Doe</span> is his name.
</p>
And here both syntaxes are used together:
<p itemscope itemtype="http://schema.org/Person" typeof="schema:Person">
<span itemprop="name" property="schema:name">John Doe</span> is his name.
</p>
But it’s typically not necessary/recommended to go down this route.

The main advantage you get from any semantic format is the ability for consumers to reuse your data.
For example, search engines like Google are consumers that reuse your data to display Rich Snippets.
In order to decide which format is best, you need to know which consumers you want to target. For example, Google says in their FAQ that they will only process microdata (though the testing tool does now work with RDFa, so it is possible that they accept RDFa).
Unless you know that your target consumer only accepts RDFa, you are probably best going with microdata. While many RDFa-consuming services (such as the semantic search engine Sindice) also accept microdata, microdata-consuming services are less likely to accept RDFa.

I'm not certain that unor's suggestion to use both Microdata and RDFa is a good idea. If you use Google's Structured Data Testing Tool (or other similar tools) on his example, it shows duplicate data, which seems to imply that Googlebot would pick up two people named John Doe on the webpage instead of the one that was intended.
I'm assuming therefore that using one syntax for a given item is a better idea (you should still be able to mix syntaxes as long as they describe separate entities).
Though I would be happy to be proven wrong on this.

I would say it largely depends on the use case: for scientific use cases, RDF is common and used in various ways.
For enriching Websites, JSON-LD is now recommended, for example by Google.
A JavaScript notation embedded in a <script> tag in the page head or body. The markup is not interleaved with the user-visible text, which makes nested data items easier to express, such as the Country of a PostalAddress of a MusicVenue of an Event. Also, Google can read JSON-LD data when it is dynamically injected into the page's contents, such as by JavaScript code or embedded widgets in your content management system.
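To make the nesting from that quote concrete, here is a minimal sketch of a JSON-LD block using Schema.org types (the event, venue, and address values are all made up for illustration):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Event",
  "name": "Example Concert",
  "startDate": "2030-01-15T20:00",
  "location": {
    "@type": "MusicVenue",
    "name": "Example Hall",
    "address": {
      "@type": "PostalAddress",
      "addressLocality": "Exampleville",
      "addressCountry": {
        "@type": "Country",
        "name": "Exampleland"
      }
    }
  }
}
</script>
```

Because the data lives in its own script element, none of the visible markup has to be restructured to express the Event > MusicVenue > PostalAddress > Country chain.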

Related

What exactly is the Prefix attribute on the HTML tag and why is it necessary/used?

So I came across BritishMuseum.org. Inspecting the HTML as you do I noticed a strange prefix attribute.
<html lang="en" prefix="content: http://purl.org/rss/1.0/modules/content/ dc: http://purl.org/dc/terms/ foaf: http://xmlns.com/foaf/0.1/ og: http://ogp.me/ns# rdfs: http://www.w3.org/2000/01/rdf-schema# schema: http://schema.org/ sioc: http://rdfs.org/sioc/ns# sioct: http://rdfs.org/sioc/types# skos: http://www.w3.org/2004/02/skos/core# xsd: http://www.w3.org/2001/XMLSchema#"> . . .</html>
Doing some research: some say Open Graph, some say RDF vocabulary, some say FOAF (Friend of a Friend), and some say XML. I'm so confused. A post on Quora said this:
RDFa is used to implement the Semantic Web in web pages represented in many markup languages, like HTML and XML. Instead of having a web page telling the browser just how it should be structured, now you can also tell it what the page represents, like a Person, a List of Products, etc.
What do we mean by Semantic Web? We all strive to make a webpage completely accessible and 'Semantically' correct, sure. Section tags... Article tags... Can we literally enforce that by this prefix tag? This Museum site isn't a person, a List or a product but it is a place.
Why does this museum need to add this prefix tag to the HTML element?
Also inspecting... I can see what looks like Schema. Now, this does make sense because of the visibility they wish to gain through search engines.
<img src="/src.jpg" typeof="foaf:img">
Then on their Donate page
<div about="/support-us/donate" typeof="schema:WebPage"> ... </div>
Who knew you could put URLs there? :-)
It seems with a majority of their headings we also have
<span property="schema:name">Corporate support</span>
Does doing this have any real benefit? And why the very long HTML prefix? Seldom seen on the web nowadays in my opinion. Penny for your thoughts on this?
What do we mean by Semantic Web?
Wikipedia has a good introduction, but essentially it involves adding more machine readable data to the WWW.
We all strive to make a webpage completely accessible and 'Semantically' correct, sure. Section tags... Article tags... Can we literally enforce that by this prefix tag?
The goal of the semantic web is to go beyond the semantics of HTML.
some say Open Graph, some say RDF vocabulary, some say FOAF (Friend of a Friend), and some say XML. I'm so confused.
Open Graph is a means to encode a certain set of additional semantics.
FOAF is a means to encode a different set of additional semantics (specifically about how people relate to each other).
RDF is a more generic means to encode semantics. FOAF and Open Graph are used with it.
XML is a generic markup language that is designed to be used as the foundation of other markup languages.
RDF is often expressed using XML.
RDFa defines the prefix attribute that you have quoted.
foaf: http://xmlns.com/foaf/0.1/ defines a prefix for FOAF data in the document.
og: http://ogp.me/ns# defines a prefix for Open Graph data in the document.
Attributes then reference these prefixes. For example, <img src="/src.jpg" typeof="foaf:img"> states that the element is what FOAF defines as img.
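A stripped-down sketch of the same mechanism, with only two prefixes declared (the page title and the person are hypothetical, not taken from the museum site):

```html
<!DOCTYPE html>
<html lang="en" prefix="foaf: http://xmlns.com/foaf/0.1/ og: http://ogp.me/ns#">
<head>
  <title>Example page</title>
  <!-- og:title expands to http://ogp.me/ns#title -->
  <meta property="og:title" content="Example page">
</head>
<body>
  <!-- foaf:Person expands to http://xmlns.com/foaf/0.1/Person -->
  <p typeof="foaf:Person">
    <span property="foaf:name">John Doe</span>
  </p>
</body>
</html>
```

Each prefix is just shorthand: a parser replaces the part before the colon with the declared URL, so every type and property ends up as a full, globally unique identifier.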
Does doing this have any real benefit?
I have no data on either:
Third party tools interacting with the data on webpages (beyond the use of social media using Open Graph for thumbnails et al)
Any internal tools the museum is using on their own data.
And why the very long HTML prefix?
Because they have data from a lot of namespaces.

Why use Schema.org microdata to mark up web page elements?

I understand why and how to use Schema.org to add microdata to your site, this is not a question about that. The question is why Schema.org has support for certain things that can be marked up with simple HTML5. Among these are
Types
WebPage and WebSite
I can see why WebPage and WebSite would be needed, for example, to reference the page/site of a certain organization in a link, but there's no need to mark up your own page with this—the <html> tag does this.
SiteNavigationElement
Why not just use <nav>?
Table
Just use <table>.
Properties
WebPage/mainContentOfPage
<main> element
WebPage/relatedLink
<link> element inside <head>
This answer is primarily about the WebPageElement types (like SiteNavigationElement).
For WebPage, see my answer to the question Implicity of web page structure in Schema.org (tl;dr: it can be useful to provide WebPage, even for the current page).
For WebSite, similar reasons from the answer above apply. HTML doesn’t allow you to state something about the whole site (and, by the way, a Google rich result makes use of this type).
Schema.org is not restricted to HTML5.
Schema.org is a vocabulary which can be used with various syntaxes (like JSON-LD, Microdata, RDFa, Turtle, …), stand-alone or in various host languages (like HTML 4.01, XHTML 1.0/1.1, (X)HTML5, XML, SVG, …). So having other ways to specify that something is (or: is about; or: represents) a site-wide navigation, a table etc. is the exception rather than the rule.
But there can be reasons to use these types even in HTML5 documents, for example:
The HTML5 markup and the annotations from Microdata/RDFa are two "different worlds": a Microdata/RDFa parser is only interested in the annotations, and after successfully parsing a document, the underlying markup is of no relevance anymore (e.g., the information that something was specified in a table element is lost in the Microdata/RDFa layer).
By using types like WebPageElement, you can specify metadata that is not possible to specify in plain HTML5. For example, the author/license/etc. of a table.
You can use these types to specify data about something which does not exist on the current document, e.g., you could say on your personal website that you are the author of a table in Wikipedia.
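For illustration only, a sketch of what attaching such metadata to a table could look like in Microdata (the caption, author, and figures are invented; author and license are available because Table is a kind of CreativeWork in Schema.org):

```html
<table itemscope itemtype="http://schema.org/Table">
  <caption>
    <span itemprop="name">Visitors per month</span>,
    compiled by
    <span itemprop="author" itemscope itemtype="http://schema.org/Person">
      <span itemprop="name">Jane Doe</span>
    </span>
    (<a itemprop="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a>)
  </caption>
  <tr><th>Month</th><th>Visitors</th></tr>
  <tr><td>January</td><td>1200</td></tr>
</table>
```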
That said, these are not typical use cases relevant for a broad range of authors. Unless you have a specific reason for using them, you might want to omit them. They are not useful for typical websites. Using them can even be problematic in some cases.
See also my Schema.org issue The purpose of WebPageElement and mainContentOfPage, where I suggested to deprecate WebPageElement and the mainContentOfPage property.
Just use <table>.
You seem to be reading the title of the pages and no further. The <table> tag doesn't have the dozens of special properties listed on that page like isFamilyFriendly or license or timeRequired.
Schema.org microdata is intended to build a standard set of additional, semantic metadata that can be used by automated systems - search engine spiders, parser robots, etc. - to better understand the nature and features of the content.

Correct URL for itemtype product

I've been reading on Google about microdata for products and they show the itemtype url as:
itemtype="http://data-vocabulary.org/Product"
Yet when I look at, for example, Ebay, they use:
itemtype="http://schema.org/Product"
Is one correct and one not? Or do they both serve the same purpose?
There's not as much data out there as I would have thought... Is this still in its infancy or just not really catching on?
I understand this is essentially two questions but they are related.
They both serve the same purpose, but schema.org is becoming the standard.
Data-vocabulary.org is the older vocabulary, and its site tells you to use schema.org.
From www.data-vocabulary.org:
Since June 2011, several major search engines have been collaborating on a new common data vocabulary called schema.org.
The schema.org vocabulary can be used with both Microdata or RDFa 1.1 Lite syntax
Schema.org and Data-Vocabulary.org are different vocabularies.
Their respective Product types are not identical:
Schema.org defines it as "Any offered product or service" while Data-Vocabulary.org says it "represents a product" (not mentioning services; however, its RDF/XML version does).
Data-Vocabulary.org defines only 8 properties for this type, while Schema.org has more than 30. For example, you can’t specify the manufacturer with Data-Vocabulary.org (while it’s possible with Schema.org’s manufacturer property).
There is no "correct" or "wrong" type. While the Data-Vocabulary.org vocabulary is inactive and probably no longer subject to evolution, you can still use it. However, if you have no specific reason to use it (where a reason could be a specific consumer that interprets this markup), go with Schema.org.
You could also use both (or even more) vocabularies for the same content.
Microdata is limited in that regard, so you could only use Schema.org’s additionalType property for this:
<div itemscope itemtype="http://schema.org/Product">
<link itemprop="additionalType" href="http://data-vocabulary.org/Product">
<!-- you may only use properties from Schema.org; for properties from other vocabularies, you’d have to use absolute URIs -->
</div>
But with RDFa 1.1 (Lite), you have the full power of multi-vocabulary use:
<div typeof="schema:Product v:Product">
<!-- you may use any property, no matter from which vocabulary -->
</div>
(schema: and v: are pre-defined prefixes for Schema.org and Data-Vocabulary.org.)

Where is the official HTML 5 API?

For JavaScript it seems easy. If you want to know the API for the language itself just consult ES5. For a library such as jquery just check out www.api.jquery.com.
But for HTML 5, where is the go-to place to look for the API for a specific tag?
Suppose I want to know the interface for <video>
My guess is
https://developer.mozilla.org/en-US/
but this is from the perspective of a company - Mozilla. Is there a published API by those that release the specs?
Can we use <video> as an example?
Here is one useful site I found that states that it parses the different specifications:
http://html5index.org/
but it looks like it is just for the JS portion.
I found it using this google search:
https://www.google.com/?gws_rd=ssl#q=html5+api&spell=1
I have been using w3schools because it has the best layout, but I've heard many on SO say not to use it.
If not, what is the go-to resource?
There is no official HTML5 API, or official HTML5, so far. What people regard as “de facto standard” is one of the following:
W3C HTML5 CR, a Candidate Recommendation, which means that it is not expected to change substantially before it becomes a W3C Recommendation (which is as official as things like this ever get), except that some features marked as being “at risk” may be removed due to lack of implementation.
W3C HTML 5.1 Nightly, an Editor’s Draft, a further development of W3C HTML5. As the name says, it may and will change daily.
WHATWG HTML Living Standard. Largely compatible with the W3C documents but with some minor and some major differences. Apparently never expected to become any more official than it is now: a mutable document maintained by Ian Hickson and his orchestra (the WHATWG group).
Note that even the most official of these, HTML5 CR, says: “This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.” In reality, it’s more stable and closer to a “standard” than this may suggest.
All the documents mentioned above are incomplete in the sense that they cite many documents, e.g. DOM specifications and drafts, leaving essential parts to be defined in them. And the cited documents may be very mutable and even sketchy. For example, WHATWG URL Living Standard is cited, instead of the Internet-standard on URLs (URIs), and instead of the various old DOM specs and drafts, new emerging documents are cited. Currently, HTML5 CR cites W3C DOM4 CR.
Here's the HTML standard. It sounds like that's what you're looking for.
http://www.whatwg.org/specs/web-apps/current-work/multipage/
For the <video> example, here's the interface:
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#htmlvideoelement
Of course, a lot of the interesting things are in the HTMLMediaElement interface.
If you keep going into the super-interfaces, you'll find that it extends the Element interface, which is part of DOM.
http://dom.spec.whatwg.org/#interface-element
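As a small sketch of how that interface chain shows up in practice (the movie.mp4 file name and the clip id are placeholders), the same element can be checked against each interface in a browser console:

```html
<video id="clip" src="movie.mp4" controls></video>
<script>
  const video = document.getElementById("clip");

  // videoWidth and videoHeight are defined on HTMLVideoElement itself;
  // play(), pause(), paused and currentTime come from HTMLMediaElement.
  console.log(video instanceof HTMLVideoElement);
  console.log(video instanceof HTMLMediaElement);

  // Further up the chain are HTMLElement and the DOM's Element interface.
  console.log(video instanceof HTMLElement);
  console.log(video instanceof Element);
</script>
```

So when a member is not listed under the video element itself, the next place to look is the interface it inherits from.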
Another popular standard comes from the W3C:
http://www.w3.org/TR/html5/
Here is a list of differences provided by the W3C:
http://www.w3.org/wiki/HTML/W3C-WHATWG-Differences
The W3C published its official Recommendation of HTML5 on 28 October 2014.
There you will find a complete reference for all HTML5 elements including the video element.

Microdata & Vocabulary

I'm working on forum software and looking to use HTML5 and Microdata (I'm new to Microdata). I was considering adding a vocabulary to the software itself, instead of linking to schema.org, data-vocabulary.org, or whatever.
Then again, I wondered about the impact this may have on server performance, being hit by all those spiders crawling the vocabulary.
What are your thoughts on this?
You should only create a new vocabulary if there is no (popular) existing one that could be used.
Dublin Core, FOAF and SIOC are popular ones that almost every forum could use. However, I'm not sure if these can be used with microdata (I guess it should be possible, but I don't know microdata very well). But they work with RDFa, which is very similar to microdata and a W3C recommendation (like HTML). RDFa can be used in HTML5, too. If new to RDF, you'd probably want to use RDFa Lite first (it has prefixes for DC, FOAF and schema.org vocabularies pre-defined).
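A rough sketch of how a single forum post might be annotated this way, relying on the dc:, foaf: and sioc: prefixes from the RDFa initial context (the post identifier, title, and user name are hypothetical, and the choice of properties is only one plausible mapping):

```html
<article resource="#post-1" typeof="sioc:Post">
  <h2 property="dc:title">Welcome to the forum</h2>
  <p>
    Posted by
    <!-- property together with typeof creates a nested, typed item -->
    <span property="sioc:has_creator" typeof="sioc:UserAccount">
      <span property="foaf:name">alice</span>
    </span>
  </p>
  <div property="sioc:content">Feel free to introduce yourself here.</div>
</article>
```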
I wondered about the impact this may have on server performance, being hit by all those spiders crawling the vocabulary.
I don't think there are many (if any at all) microdata crawlers that would try to visit the vocabulary URIs. In most cases they wouldn't find any content there that they could make use of, because the vocabulary URIs work as identifiers only most of the time. Even if such crawlers arise, you'd hardly notice any performance impact because they'd probably cache it anyway.