Microdata: moving from microformats to schema.org (example hAtom news markup?)

BACKGROUND
I have been using microformats for the past 5 years. I'm switching to the schema.org approach for all new sites because it's — IMHO — a better separation of style and meta info.
In addition, all the major search providers have adopted and now fully support the schema.org approach to microdata.
It's been a pretty painless process finding schema.org equivalents for most microformats objects, e.g. hCard, hCalendar etc., and I am pleased with the extra possibilities.
QUESTION
I am looking for clear examples of schema.org markup equivalent to the
hAtom/hNews (hFeed) flavour. Can anyone point me in the right direction
or give some tips? I have searched but been unsuccessful up to now, and
on schema.org I don't see a clear equivalent.
We have this handy markup generator, http://schema-creator.org/,
for Person, Product, Event, Organization, Movie, Book and Review,
but has anyone seen a tool that creates the schema.org
variant of hFeeds?

question 01: CreativeWork -> Blog is schema's equivalent to hatom.
no clue if anyone's used it or written about it yet.
i'd like to know what about schema.org is better at separation of concerns vs. microformats? schema.org has meta elements within the body element. microformats are html classes and as such natively support separation. also, every major search provider already provided coverage of microformats and it hasn't decreased. curious, i am.

You have to choose an item type for the page, for example http://schema.org/Blog, and then mark up each article/blog post inside it as a http://schema.org/BlogPosting.
Here is a very simple example:
<div itemscope itemtype="http://schema.org/Blog">
...
<article itemprop="blogPost" itemscope itemtype="http://schema.org/BlogPosting">
...
</article>
<article itemprop="blogPost" itemscope itemtype="http://schema.org/BlogPosting">
...
</article>
</div>
I have tried to implement it in a WordPress theme, perhaps my code will help you: https://github.com/pfefferle/SemPress/
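To make the skeleton above a little more concrete, here is a minimal sketch of one post with a few BlogPosting properties filled in. The property names (name, headline, datePublished, author, articleBody) come from schema.org; the blog name, post title, date, author and text are made-up placeholders. Very roughly, these play the roles that entry-title, published, author and entry-content play in hAtom, though the mapping is not one-to-one.

```html
<div itemscope itemtype="http://schema.org/Blog">
  <h1 itemprop="name">Example Blog</h1>

  <!-- One post; itemprop="blogPost" links it to the enclosing Blog item -->
  <article itemprop="blogPost" itemscope itemtype="http://schema.org/BlogPosting">
    <h2 itemprop="headline">Hypothetical post title</h2>
    <!-- The datetime attribute carries the machine-readable value -->
    <time itemprop="datePublished" datetime="2013-01-15">15 January 2013</time>
    <!-- The author is a nested item with its own type -->
    <span itemprop="author" itemscope itemtype="http://schema.org/Person">
      by <span itemprop="name">Jane Doe</span>
    </span>
    <div itemprop="articleBody">
      <p>Post content goes here.</p>
    </div>
  </article>
</div>
```

Running a page like this through a structured data testing tool is a quick way to check that the nested items are picked up as intended.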

HTML5 and Schema.org, why use both?

Microdata with Schema.org already describes any element better than HTML5 does, so using both seems redundant. For example:
<nav itemscope itemtype="http://schema.org/SiteNavigationElement">
<!-- might as well just be... -->
<div itemscope itemtype="http://schema.org/SiteNavigationElement">
and
<article itemscope itemtype="http://schema.org/NewsArticle">
<!-- might as well just be... -->
<div itemscope itemtype="http://schema.org/NewsArticle">
Some elements create an "outline" for the webpage, but aside from that what's the point? Why not just use divs and forget about the semantic tags, and just use Microdata and Schema.org?
The schema.org definitions are specifically for applications such as search engines (From What is schema.org?):
This site provides a collection of schemas, i.e., html tags, that
webmasters can use to markup their pages in ways recognized by major
search providers. Search engines including Bing, Google, Yahoo! and
Yandex rely on this markup to improve the display of search results,
making it easier for people to find the right web pages.
Your mark-up needs to be understood by browsers and screen-readers as well as search engines (from the schema.org Getting started page):
Usually, HTML tags tell the browser how to display the information
included in the tag. For example, <h1>Avatar</h1> tells the browser to
display the text string "Avatar" in a heading 1 format. However, the
HTML tag doesn't give any information about what that text string
means—"Avatar" could refer to the hugely successful 3D movie, or it
could refer to a type of profile picture—and this can make it more
difficult for search engines to intelligently display relevant content
to a user.
So microdata allows you to add additional semantic meaning to your mark-up (using definitions provided by schema.org) which can be ignored by applications which don't need it, such as browsers, and read by applications which do, such as search engines.
Microdata is not a replacement for the appropriate semantic HTML tags where they are available; it should be used to augment that information. So the simple reason to use nav and article tags along with the microdata is that these tags have meaning to browsers and screen-readers, while the microdata does not.
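As a rough sketch of the two layers working together: the elements below carry the structural meaning for browsers and screen-readers, and the microdata attributes add the schema.org meaning on top. The headline, date, links and text are placeholders, not taken from the question.

```html
<!-- <nav> tells browsers and assistive technology that this is navigation -->
<nav>
  <a href="/">Home</a>
  <a href="/news">News</a>
</nav>

<!-- <article> gives the document structure; microdata adds meaning for search engines -->
<article itemscope itemtype="http://schema.org/NewsArticle">
  <h1 itemprop="headline">Hypothetical headline</h1>
  <time itemprop="datePublished" datetime="2013-01-15">15 January 2013</time>
  <div itemprop="articleBody">
    <p>Story text goes here.</p>
  </div>
</article>
```

Stripping the microdata attributes leaves a page that still makes sense to a browser; replacing the elements with divs would not.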
Actually, your examples are fairly simplistic. I would suggest you have a look at some of the examples on the schema.org getting started page to see how microdata can be used more meaningfully.
To see microdata being used in practice, try googling yourself and inspecting the results. If I search for myself, the first three results (LinkedIn, GitHub and my portfolio page) all display information marked up using microdata, which Google can pull from the pages and present to the user to provide more meaningful search results.
The vast majority of terms that we have in schema.org have no overlap with HTML terminology, since they represent kinds of real-world things such as places, processes, products etc.
The problem area highlighted here is the small set of terms around http://schema.org/WebPageElement . I am not aware that any current search engine features make specific use of these, and I would suggest that any publishers who do see value in their use should employ the corresponding pure HTML markup as well.

Microdata vs RDFa

I have a quick question about RDFa and Microdata.
My current understanding is that RDFa is RDF embedded in HTML, but it is complicated for new developers like myself, whereas Microdata seems really easy and quick to implement.
What are the other advantages and disadvantages of these two semantic formats?
Differences between Microdata and RDFa
While there are many (technical, smaller) differences, here’s a selection of those I consider important (used my answer on Webmasters as a base).
Specifications
As W3C’s HTML WG found no volunteer to edit the Microdata specification, it is now merely a W3C Group Note (see history), which means that there are no plans for any further work on it.
So the Microdata section in WHATWG’s "HTML Living Standard" is the only place where Microdata may evolve. Depending on what gets changed, it may happen that their Microdata becomes incompatible with W3C’s HTML5.
Update: In 2017, work started again, with the aim to publish Microdata as a W3C Recommendation.
RDFa is published as a W3C Recommendation.
Applicability
Microdata can only be used in (X)HTML5 (or HTML as defined by the WHATWG).
RDFa can be used in various host languages, i.e. several (X)HTML variants and XML (thus also in SVG, MathML, Atom etc.).
And new host languages can be supported, as RDFa Core "is a specification for attributes to express structured data in any markup language".
Use of multiple vocabularies
In Microdata, it’s harder, and sometimes impossible, to use several vocabularies for the same content.
Thanks to its use of prefixes, RDFa lets you mix vocabularies.
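A minimal sketch of what mixing looks like in RDFa, assuming you want to describe the same person with both Schema.org and FOAF (the choice of FOAF is only for illustration):

```html
<!-- prefix maps each short name to a vocabulary IRI -->
<p prefix="schema: http://schema.org/ foaf: http://xmlns.com/foaf/0.1/"
   typeof="schema:Person foaf:Person">
  <!-- one element, one value, two properties from different vocabularies -->
  <span property="schema:name foaf:name">John Doe</span> is his name.
</p>
```

Both typeof and property accept a space-separated list, so no markup has to be duplicated.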
Use of reverse properties
Microdata doesn’t provide a way to use reverse properties. You need this for vocabularies that don’t define inverse properties (e.g., they only define parent instead of parent & child). The popular Schema.org is such a vocabulary (with only a few older exceptions).
While the W3C Note Microdata to RDF defines the experimental itemprop-reverse, this attribute is part of neither W3C’s nor WHATWG’s Microdata.
RDFa supports the use of reverse properties (with the rev attribute).
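As a sketch of how rev can be used: Schema.org defines author on a creative work but no inverse property on the person, so when the markup starts from the person you can reverse the direction. The fragment identifiers (#john, #post) and the post title are hypothetical. Note that rev belongs to RDFa Core and HTML+RDFa; it is not one of the RDFa Lite attributes.

```html
<div vocab="http://schema.org/" typeof="Person" resource="#john">
  <span property="name">John Doe</span> wrote
  <!-- rev reverses the direction: the statement is  #post  author  #john -->
  <span rev="author" typeof="BlogPosting" resource="#post">
    <span property="headline">Hypothetical post title</span>
  </span>.
</div>
```

Without rev, the same statement would have to be written from the post's side, with the person nested inside it.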
Semantic Web
By using Microdata, you are not directly playing a part in the Semantic Web (and AFAIK Microdata doesn’t intend to), mostly because it’s not defined as an RDF serialization (although there are ways to extract RDF from Microdata).
RDFa is an RDF serialization, and RDF is the foundation of W3C’s Semantic Web.
The specifications RDFa Core and HTML+RDFa may be more complex than HTML Microdata, but it’s not a "fair" comparison because they offer more features.
Similar to Microdata would be RDFa Lite (which "does work for most day-to-day needs"), and this spec, at least in my opinion, is way less complex than Microdata.
What to do?
If you want to support specific consumers (for example, a search engine and a browser add-on), you should check their documentation about supported syntaxes.
If you want to learn only one syntax and have no specific consumers in mind, (attention, subjective opinion!) go with RDFa. Why?
RDFa matured over the years and is a W3C Rec, while Microdata is a relatively new invention and not standardized by the W3C.
RDFa can be used in many languages, not only HTML5.
RDFa allows mixed use of vocabularies for the same content, and it natively supports the use of reverse properties.
Can’t decide? Use both.
Note that you can also use several syntaxes for the same content, so you could have Microdata and RDFa (and Microformats, and JSON-LD, and …) for maximum compatibility.
Here’s a simple Microdata snippet:
<p itemscope itemtype="http://schema.org/Person">
<span itemprop="name">John Doe</span> is his name.
</p>
Here’s the same snippet using RDFa (Lite):
<p typeof="schema:Person">
<span property="schema:name">John Doe</span> is his name.
</p>
And here both syntaxes are used together:
<p itemscope itemtype="http://schema.org/Person" typeof="schema:Person">
<span itemprop="name" property="schema:name">John Doe</span> is his name.
</p>
But it’s typically not necessary/recommended to go down this route.
The main advantage you get from any semantic format is the ability for consumers to reuse your data.
For example, search engines like Google are consumers that reuse your data to display Rich Snippets.
In order to decide which format is best, you need to know which consumers you want to target. For example, Google says in their FAQ that they will only process microdata (though the testing tool does now work with RDFa, so it is possible that they accept RDFa).
Unless you know that your target consumer only accepts RDFa, you are probably best going with microdata. While many RDFa-consuming services (such as the semantic search engine Sindice) also accept microdata, microdata-consuming services are less likely to accept RDFa.
I'm not certain if unor's suggestion to use both Microdata and RDFa is a good idea. If you use Google's Structured Data Testing Tool (or other similar tools) on his example it shows duplicate data which seems to imply that the Google bot would pick up two people named John Doe on the webpage instead of one which was the original intention.
I'm assuming therefore that using one syntax for a given item is a better idea (you should still be able to mix syntaxes as long as they describe separate entities).
Though I would be happy to be proven wrong on this.
I would say it largely depends on the use case. For scientific use cases, RDF is common and used in many different contexts.
For enriching Websites, JSON-LD is now recommended, for example by Google.
A JavaScript notation embedded in a <script> tag in the page head or
body. The markup is not interleaved with the user-visible text, which
makes nested data items easier to express, such as the Country of a
PostalAddress of a MusicVenue of an Event. Also, Google can read
JSON-LD data when it is dynamically injected into the page's contents,
such as by JavaScript code or embedded widgets in your content
management system.
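To illustrate the nesting mentioned in that quote, here is a minimal sketch of an Event with a MusicVenue, a PostalAddress and a country. The type and property names are from schema.org; the event name, date, venue and city are placeholders.

```html
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Event",
  "name": "Hypothetical concert",
  "startDate": "2016-05-01T20:00",
  "location": {
    "@type": "MusicVenue",
    "name": "Example Hall",
    "address": {
      "@type": "PostalAddress",
      "addressLocality": "Example City",
      "addressCountry": "US"
    }
  }
}
</script>
```

Because the data sits in its own block, none of the visible markup has to be restructured to express the nesting.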

HTML5 - Correct usage of the <article> tag

Reading an article on the <article> tag in HTML5, I really think my biggest confusion is in the first question of this section:
Using <article> gives more semantic meaning to the content. By contrast <section> is only a block of related content, and <div> is only a block of content... To decide which of these three elements is appropriate, choose the first suitable option:
Would the content make sense on its own in a feed reader? If so, use <article>.
Is the content related? If so, use <section>.
Finally, if there’s no semantic relationship, use <div>.
So I guess my question is really: What types of content belong in a feed reader?
The spec answers this quite clearly:
The article element represents a self-contained composition in a
document, page, application, or site and that is, in principle,
independently distributable or reusable, e.g. in syndication. This
could be a forum post, a magazine or newspaper article, a blog entry,
a user-submitted comment, an interactive widget or gadget, or any
other independent item of content.
see: http://dev.w3.org/html5/spec/Overview.html#the-article-element
The W3C spec leaves a lot open to interpretation and it ultimately comes down to the author's opinion. Here is a short and simple answer in the form of a question:
What are the primary significant pieces of content you want to share on the page?
Here are a few examples:
On this very page, each answer could be an article.
On Flickr each photo displayed in the photostream could be considered an article.
On Dribbble each shot displayed on the page could be an article.
On Google each search result listed could be an article.
On a blog each article... well, each article could be an article.
On a blog page with an article and a series of comments you could have two major sections. One with an article and another for comments in which each comment could be considered an article.
It's the author's discretion as to how far they want to go. Most blog authors have an RSS feed for their articles, but others may also provide feeds for comments, and shared links.
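The blog page from the last example could be sketched roughly like this, with one section for the post and one for the comments. The headings and text are placeholders.

```html
<section>
  <!-- the post itself is the primary self-contained piece of content -->
  <article>
    <h1>Blog post title</h1>
    <p>Post content goes here.</p>
  </article>
</section>

<section>
  <h2>Comments</h2>
  <!-- each comment stands on its own, so each gets its own article -->
  <article>
    <p>First comment.</p>
    <footer>Posted by a hypothetical reader</footer>
  </article>
  <article>
    <p>Second comment.</p>
    <footer>Posted by another hypothetical reader</footer>
  </article>
</section>
```

The HTML spec also allows nesting the comment articles inside the post's own article, which marks them as related to that post; which layout to use is again the author's call.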
A lot of people have written on this subject. For further information I highly recommend reading:
http://html5doctor.com/the-article-element/ (you've already shared this)
http://www.impressivewebs.com/html5-section/
http://www.iandevlin.com/blog/2011/04/html5/html5-section-or-article
You've brought up a good argument and yes the spec does rather clearly define <article> as a syndication-worthy collection of content. The way I see it, your article would be the composed blog post – what you as the content writer of the site produce. While comments on that section are related to the article, they are not, in fact, part of the article, and should be relegated to another block in the <section>, either a non-semantic <div> or simply <p>s with display:block set. This is a decision that's left to the designer, depending on how they semantically evaluate the worth of the commentary.
Remember too that you have the <aside> tag, which is almost tailor-made for commentary, whether from the author or from the reader.
Most feed readers can handle many types of content, which could include copy, images, videos, etc. The feed for your site will include the content on your site that is repeated or includes multiple versions. A question and answer site will have a feed of new questions. A video sharing site will have a feed of new videos. A software review site will have a feed of new software or new reviews.
I'd recommend considering what the typical consumer of your content would want to find easily in their feed reader. You get to define what types of content belong in a feed reader.
A feed reader, in general, should contain a list of stories. Look at http://google.com/elections/ - it's a good example of the sort of thing a feed reader might contain. The important part is that all the stories are self-contained, and in theory do not need to be related at all.
The markup for that document could look like the following:
<body>
  <header>...</header>
  <nav>...</nav>
  <article>
    <section>
      ...
    </section>
  </article>
  <aside>...</aside>
  <footer>...</footer>
</body>
You may find more information in this article on A List Apart.

HTML5 <article> for ecommerce products

The new HTML5 article tag all seems very great and wonderful and there has been much discussion here and elsewhere about its uses.
Unfortunately, all this discussion seems to be in the context of blog or news sites where the content is all just that, content.
In an ecommerce site, the biggest question to be asking is, how do I now mark up a product?
Taking the spec for guidance, it seems that a saleable item is indeed something distinct that could be syndicated (and often is). The article tag seems like a good match, yet I see no mention of its use in this way.
Is it appropriate here, with all the examples being blogs etc. only because they seem to fit a bit more intuitively with the name of the tag? Or am I stretching the spec too far?
Any guidance would be much appreciated.
I don't think <article> is suitable for product data. Although it doesn't involve semantic elements, you may wish to look at the Product schema from schema.org.
EDIT:
See the following quote from the W3C spec. Perhaps article is suited after all, as a product can be considered an "independent item of content."
The article element represents a component of a page that consists of
a self-contained composition in a document, page, application, or site
and that is intended to be independently distributable or reusable,
e.g. in syndication. This could be a forum post, a magazine or
newspaper article, a blog entry, a user-submitted comment, an
interactive widget or gadget, or any other independent item of
content.
You should take a look at this article
Looks like <article> is not that bad an idea. I am using pointers from here and http://schema.org/Product for an e-commerce site project.
Having custom tags bothers IE a lot, and we cannot ignore Internet Explorer yet.
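Combining the two suggestions, a product could be sketched as an <article> carrying http://schema.org/Product microdata. The type and property names are from schema.org; the product name, description and price are made up.

```html
<article itemscope itemtype="http://schema.org/Product">
  <h2 itemprop="name">Hypothetical widget</h2>
  <p itemprop="description">A short description of the product.</p>

  <!-- the offer is a nested item describing price and currency -->
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    <!-- <data> carries the machine-readable price in its value attribute -->
    <data itemprop="price" value="19.99">$19.99</data>
    <meta itemprop="priceCurrency" content="USD">
  </div>
</article>
```

If <article> turns out to be a poor fit for a given page, the same attributes work unchanged on a <div>.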

Semantic HTML markup for a copyright notice

When a web site is licensed under Creative Commons, I use the rel-license microformat. When a web site is licensed under regular copyright, I have a boring paragraph element.
<p id="copyright">© 2008 Example Corporation</p>
That id attribute on there is just for CSS styling purposes. I'm wondering if there's some better way to mark up a copyright notice that is more semantic. Is this a job for Dublin Core metadata? If so, how do I go about it? (I've never used Dublin Core before.)
Some web sites advocate using a meta tag in the head element:
<meta name="copyright" content="name of owner">
This might be seen by search engines, but it doesn't replace the user-visible notice on the page itself.
Thanks to Owen for pointing me in the direction of RDFa, I think I've got the solution now:
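<!-- dateCopyrighted is defined in the DCMI Terms namespace, which also defines rights, publisher and creator -->
<div id="footer" xmlns:dc="http://purl.org/dc/terms/">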
<p id="copyright" property="dc:rights">©
<span property="dc:dateCopyrighted">2008</span>
<span property="dc:publisher">Example Corporation</span>
</p>
</div>
Depending on the situation, it might be better to use dc:creator instead of dc:publisher. From the Dublin Core web site:
If the Creator and Publisher are the same, do not repeat the name in the Publisher area. If the nature of the responsibility is ambiguous, the recommended practice is to use Publisher for organizations, and Creator for individuals. In cases of lesser or ambiguous responsibility, other than creation, use Contributor.
I will also be adding a meta tag to my head element for search engines that don't support RDFa yet.
<meta name="copyright" content="© 2008 Example Corporation" />
Have you taken a look at RDFa? It was recently accepted as a W3C recommendation. I mention that just in case you want to take a look at other aspects of semantic structure it recommends. The licensing part is the same as the format you currently use. (So in that sense to answer your question, I think you're handling it correctly, assuming people adopt RDFa)
For lazy people who don't want to click links:
<!-- RDFa recommendation and rel=license microformat -->
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">
a Creative Commons License
</a>
Probably the most semantically correct way to mark it up is with a definition list.
<dl id="copyright">
<dt title="Copyright">©</dt>
<dd>2008 Example Corporation</dd>
</dl>
Why not use the CC format, but indicate that no rights are granted?
In any case, the main problem with the use of the CC formats is that people do not clearly identify which elements of the webpage the license applies to.