Using schema.org WebPage and data-vocabulary.org Breadcrumb together - html

I want to semantically enhance my HTML markup by adding elements from the schema.org WebPage vocabulary including semantic markup for the breadcrumb navigation. According to the definition I should use schema.org BreadcrumbList to achieve this.
When looking at Google's documentation about adding structured data for Breadcrumbs though, they explicitly state that the schema.org markup for breadcrumbs is not yet supported.
Instead, the apparently older definition for a data-vocabulary.org Breadcrumb should be applied. This seems to be because the schema.org BreadcrumbList is still disputed. Google actually parses schema.org BreadcrumbList markup in its Structured Data Testing Tool but doesn't use it for a nice representation in the search results, as it does for breadcrumbs annotated using the data-vocabulary.org Breadcrumb definition.
However, it would be nice to bring together both worlds and have semantic markup for webpage and breadcrumbs. The best I was able to come up with looks like this (using itemref to avoid having to nest each Breadcrumb into the other):
<body itemscope itemtype="http://schema.org/WebPage">
<h1 itemprop="name">George Orwell</h1>
<p itemprop="description">Eric Arthur Blair (25 June 1903 – 21 January
1950), who used the pen name George Orwell, was an English novelist,
essayist, journalist and critic.</p>
<nav itemprop="breadcrumb">
<ul itemscope>
<li id="bc1" itemscope itemref="bc2"
itemtype="http://data-vocabulary.org/Breadcrumb">
<a href="http://example.com/books" itemprop="url">
<span itemprop="title">Books</span>
</a> ›
</li>
<li id="bc2" itemscope itemprop="child" itemref="bc3"
itemtype="http://data-vocabulary.org/Breadcrumb">>
<a href="http://example.com/books/authors" itemprop="url">
<span itemprop="title">Authors</span>
</a> ›
</li>
<li id="bc3" itemscope itemprop="child"
itemtype="http://data-vocabulary.org/Breadcrumb">>
<a href="http://example.com/books/authors/orwell" itemprop="url">
<span itemprop="title">George Orwell</span>
</a>
</li>
</ul>
</nav>
</body>
The itemscope attribute on the <ul> is needed so the subsequent breadcrumbs with itemprop="child" are not interpreted as properties of WebPage.
When I throw this code at the Structured Data Testing Tool, all data is recognised as I want it to be, but there are warnings for the undefined ul item.
Is it safe to ignore these warnings? Are there other approaches or even best practices to solve the problem? What about future-proofness: would it be wise to use this kind of code on a website that may not be updated for years?

When testing your markup, Google’s Testing Tool doesn’t seem to report any errors or warnings. It says "All good" for every item.
Your use of Microdata is valid. You are adding an item without type, which does not have any content (because no properties are added).
Using Schema.org’s breadcrumb property seems to be appropriate, as one of its expected types is Text. So Schema.org consumers would extract only the text content of the child items, no URLs:
Books › > Authors › > George Orwell
I don’t think that the linked issue is the reason why Google does not support Schema.org’s BreadcrumbList: the issue is from 2012, but the BreadcrumbList type was added only a few months ago (2014-12-11) to Schema.org.
The issue is about using the breadcrumb property without a type (which did not exist back then), which is not ideal because it does not allow specifying metadata for each breadcrumb (e.g., its URL).
The future-proof way would be to use both vocabularies for breadcrumbs. The Microdata syntax makes this hard/impossible, but the RDFa syntax allows this (however, the odd requirement from Data-Vocabulary.org that the breadcrumbs have to be nested might require markup changes).
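For reference, here is a minimal sketch of the Schema.org-only half in Microdata, using the three breadcrumbs from the question. It attaches a BreadcrumbList item to the WebPage through the breadcrumb property, so each breadcrumb keeps its URL. It does not add the Data-Vocabulary.org markup, and whether a given search engine displays it is an assumption to verify.

```html
<body itemscope itemtype="http://schema.org/WebPage">
  <h1 itemprop="name">George Orwell</h1>
  <nav>
    <!-- breadcrumb takes a BreadcrumbList item here instead of plain text -->
    <ol itemprop="breadcrumb" itemscope itemtype="http://schema.org/BreadcrumbList">
      <li itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
        <a itemprop="item" href="http://example.com/books">
          <span itemprop="name">Books</span>
        </a>
        <meta itemprop="position" content="1">
      </li>
      <li itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
        <a itemprop="item" href="http://example.com/books/authors">
          <span itemprop="name">Authors</span>
        </a>
        <meta itemprop="position" content="2">
      </li>
      <li itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
        <a itemprop="item" href="http://example.com/books/authors/orwell">
          <span itemprop="name">George Orwell</span>
        </a>
        <meta itemprop="position" content="3">
      </li>
    </ol>
  </nav>
</body>
```

Because the list items are siblings rather than nested items, no itemref and no untyped wrapper item are needed in this variant.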

Related

Is it ok to have undefined itemscope, or should I pick from available schema?

I'd like to use Microdata for a web page. But none of the existing available schema seem to fit my content. Do I need to stick with only defined schema or can I define my own? Also, can I have an empty itemscope or is it better to define?
<h1>Page Title</h1>
(table of contents)
term 1
term 2
...
<div itemscope>
<h2 itemprop="term">1. Piston</h2>
<h3>Definition - What does Piston mean?</h3>
<span itemprop="definition">A definition</span>
<h3>Explanation of Piston</h3>
<span itemprop="explanation">An explanation</span>
<h3>How to use Piston in a sentence.</h3>
<span itemprop="usage">Sentence using term.</span>
</div>
I have 10 terms on the same page, each with this same bit of info. Is it ok to have an undefined itemscope? Or should I define it something like "car parts"? Or can we not define our own itemscope and instead, choose from existing schema structure?
Ran through Google schema tool and it says no warning or errors, but of course gives me the 'unspecified type' and the following.
#type
https://search.google.com/term
https://search.google.com/definition
https://search.google.com/usage
Option 1: You could use itemscope without itemtype (like in your example). That would be a local vocabulary, and you can’t expect Microdata consumers to make use of the data.
<div itemscope>
<p itemprop="term">…</p>
<p itemprop="definition">…</p>
</div>
Option 2: You could define and use your own vocabulary. It’s unlikely that many Microdata consumers would make use of the data, though, as most of them only recognize certain vocabularies.
<div itemscope itemtype="https://example.com/my-vocabulary/">
<p itemprop="term">…</p>
<p itemprop="definition">…</p>
</div>
Option 3 (preferable): You could use Schema.org as far as possible, and use your own types/properties where Schema.org doesn’t offer suitable terms. Your own properties would have to be specified as absolute URIs, and your own types would have to be specified as URI values for Schema.org’s additionalType property. As Schema.org type, you could always use Thing if there is no more specific type available.
<div itemscope itemtype="http://schema.org/Thing">
<link itemprop="additionalType" href="https://example.com/my-vocabulary/CarPartTerm" />
<p itemprop="https://example.com/my-vocabulary/term">…</p>
<p itemprop="https://example.com/my-vocabulary/definition">…</p>
</div>
That said, it could be the case that Schema.org does offer suitable types/properties for your case, e.g., maybe DefinedTerm (Pending). If you think that a useful type/property is missing in Schema.org, you could propose that it gets added.

How to cite a blog post using HTML microdata and schema.org?

My goal is to cite a blog post by using HTML microdata.
How can I improve the following markup for citations?
I am seeking improvements on the syntax and semantics, to produce a result that works well with HTML5 standards, renders well in current browsers, and parses well in search engines.
The bounty on this question is for expert advice and guidance. My research is turning up many opinions and snippets, so I'm seeking clear answers, complete samples, and canonical documentation.
This is my work in progress and I'm seeking advice on its correctness:
Use <div class="citation"> to wrap everything.
Use <article> with itemscope and BlogPosting to wrap the post info including its nested info.
Use <header> and <h1 itemprop="headline"> to wrap the post name link.
Use <cite> to wrap the post name.
Use <footer> to wrap the author info and blog info.
Use <address> to wrap the author link and name.
Use rel="author" to annotate the link to the author's name.
Use itemprop="isPartOf" to connect the post to the blog.
This is my work in progress HTML source:
<!-- Hello World authored by Alice posted on Bob's blog -->
<div class="citation">
<article itemscope itemtype="http://schema.org/BlogPosting">
<header>
<h1 itemprop="headline">
<a itemprop="url" href="…">
<cite itemprop="name">Hello World</cite>
</a>
</h1>
</header>
<footer>
authored by
<span itemprop="author" itemscope itemtype="http://schema.org/Person">
<address>
<a itemprop="url" rel="author" href="…">
<span itemprop="name">Alice</span>
</a>
</address>
</span>
posted on
<span itemprop="isPartOf" itemscope itemtype="http://schema.org/Blog">
<a itemprop="url" href="…">
<span itemprop="name">Bob's blog</span>
</a>
</span>
</footer>
</article>
</div>
Related notes thus far:
The <cite> tag W3 reference says the tag is "phrase level", so it works like an inline span, not a block div. But the <article> tag seems to benefit from using <h1>, <header>, <footer>. As best I can tell, the spec does not give a solution for citing an article by using <cite> to wrap <article>. Is there a solution to this or a workaround? (The work in progress fudges this by using <div class="citation">)
The <address> tag W3 reference says: "The address element must not be used to represent arbitrary addresses, unless those addresses are in fact the relevant contact information." As best I can tell, the spec does not give a solution for marking the article's author's URL and name, as distinct from the article's contact info. Is there a solution for this or a workaround? (The work in progress fudges this by using <address> for the author's URL and name)
Please ask questions in the comments. I will update this post as I learn more.
If you asked me which markup to use for a list of links to blog posts (OP’s context), without seeing your example, I’d go with something like this:
<body itemscope itemtype="http://schema.org/WebPage">
<ul>
<li>
<cite itemprop="citation" itemscope itemtype="http://schema.org/BlogPosting">
<span itemprop="name headline">Hello World</span>,
authored by <span itemprop="author" itemscope itemtype="http://schema.org/Person"><span itemprop="name">Alice</span></span>,
posted on <span itemprop="isPartOf" itemscope itemtype="http://schema.org/CreativeWork"><span itemprop="name">Bob’s blog</span></span>.
</cite>
</li>
<li>
<cite itemprop="citation" itemscope itemtype="http://schema.org/BlogPosting">…</cite>
</li>
</ul>
</body>
Using the sectioning content element article, like in your example, is certainly possible, although perhaps unusual (if I understand your use case correctly): As article is a sectioning content element, it creates an entry in the document outline, which may or may not be what you want for your case. (You can check the outline with the HTML5 Outliner, for example.)
Another indication that a sectioning content element might not be the best choice: Your article doesn’t contain any actual "main" content. Simply put, the main content of a sectioning content element could be determined by stripping its metadata: header, footer, and address elements. (This is not explicitly specified, but it follows from the definitions in Sections.)
However, not having this content is not wrong. And one could easily imagine (and maybe you intend to do so anyway) that you’ll quote a snippet from the blog post (making this case similar to a search result entry), in which case you’d have:
<article>
<header></header>
<blockquote></blockquote> <!-- the non-metadata part of the article -->
<footer></footer>
</article>
I’ll further on assume that you want to use article.
Notes about your HTML5:
Semantically, the wrapping div is not needed. You could add the citation class to the article directly.
The header element is optional if it just contains a heading element (this element makes sense when your header consists of more than just a heading element). However, having it is not wrong, of course.
I’d prefer to include the a element in the cite element (similar to the fifth example in the spec).
The span element can only contain phrasing content, so address isn’t allowed as a child.
The address element must only be used if it contains contact information. So whether this element is appropriate depends on what is available at the linked page: if it’s a contact form, yes; if it’s a list of authored blog posts, no.
The author link type might not be appropriate, as it’s defined to give information about the author of the article element. But, strictly speaking, you are the author. If the article consisted only of the blog post author’s actual content, using the author link type would be appropriate; but in your case, you are writing the content ("authored by", "posted on").
You might want to use the external link type for all external links.
Notes about your Microdata:
You can specify the Schema.org properties headline and name in the same itemprop (separated with space).
You might want to use Schema.org’s citation property on the article (which requires that you specify a parent type of CreativeWork or one of its child types; i.e., it could be WebPage or maybe even AboutPage in your case).
Taking your example, this would give:
<body itemscope itemtype="http://schema.org/WebPage">
<article itemprop="citation" itemscope itemtype="http://schema.org/BlogPosting" class="citation">
<header>
<h1>
<cite itemprop="headline name"><a itemprop="url" href="…" rel="external">Hello World</a></cite>
</h1>
</header>
<footer>
authored by
<span itemprop="author" itemscope itemtype="http://schema.org/Person">
<a itemprop="url" href="…" rel="external"><span itemprop="name">Alice</span></a>
</span>
posted on
<span itemprop="isPartOf" itemscope itemtype="http://schema.org/Blog">
<a itemprop="url" href="…" rel="external"><span itemprop="name">Bob’s blog</span></a>
</span>
</footer>
</article>
</body>
(All things considered, I still prefer the section-less variant.)

How to add Schema.org type for feeds?

<div itemscope="" itemtype="http://schema.org/ItemList" >
<span itemprop="???">RSS of this section</span>
<ul>
<li itemprop="itemListElement" itemscope itemtype="http://example.net/Item">
title
</li>
<li></li>
<li></li>
<li></li>
</ul>
</div>
What about feed in span tag? Is there any Schema.org type for that?
Update 2019: Schema.org will likely have a webFeed property for podcasts soon (see pull request), and its definition will probably be extended so that it can be used in non-podcast contexts, too, but I’d doubt that it will be possible to add it to ItemList.
The Schema.org vocabulary doesn’t define a class or a property for feeds.
But HTML5 defines a way to link feeds: use the alternate link type together with the type attribute set to application/rss+xml resp. application/atom+xml.
<a rel="alternate" type="application/rss+xml" href="…">RSS of this section</a>
(The first feed linked that way is treated as the default feed for autodiscovery; so when you have several section feeds linked on the same page, you should probably make sure that the first feed is a general one and not specific to only one section.)
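As a sketch of that ordering advice, the same link type can also be used on link elements in the head. The feed URLs and titles below are hypothetical; only the rel and type values come from the answer above.

```html
<head>
  <title>Section page</title>
  <!-- General site feed first, so it becomes the default for autodiscovery -->
  <link rel="alternate" type="application/rss+xml" href="/feed.rss" title="All sections">
  <!-- Section-specific feed second (hypothetical URL) -->
  <link rel="alternate" type="application/rss+xml" href="/section/feed.rss" title="RSS of this section">
</head>
```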
Like a PDF or AMP page, RSS is simply an output format, or encoding, of a document or resource. Here's what schema.org has to say about encoding:
In cases where a CreativeWork has several media type representations, encoding can be used to indicate each MediaObject alongside particular encodingFormat information.
Applied to your example using microdata:
<div itemscope itemtype="http://schema.org/MediaObject">
<meta itemprop="encodingFormat" content="application/rss+xml">
<a itemprop="contentUrl" href="feed.rss">
<span itemprop="description">RSS of this section</span>
</a>
</div>
And tested with Yandex results in the following:
mediaobject
itemType = http://schema.org/MediaObject
encodingformat = application/rss+xml
contenturl
href = feed.rss
text = RSS of this section
description = RSS of this section
You can also use JSON-LD or RDFa instead of microdata if you prefer.
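A rough JSON-LD equivalent of the Microdata above might look like the following sketch. It carries the same three values; the link itself would stay in the visible markup, since a JSON-LD script block is not rendered.

```html
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "MediaObject",
  "encodingFormat": "application/rss+xml",
  "contentUrl": "feed.rss",
  "description": "RSS of this section"
}
</script>
```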

Microdata - How to mark up Item properties placed inside a child Item (and therefore out of its scope)

Is there any way to mark up Item properties placed inside a child Item (and therefore out of its scope)?
I'm using microdata and schema.org to mark up a web page, and I have code like this:
<body itemscope itemtype="http://schema.org/WebPage">
<header itemscope itemtype="http://schema.org/WPHeader">
<a href="index.html">
<img id="logo" src="xxx" alt="xxx" itemprop="primaryImageOfPage">
</a>
</header>
<!--the rest of the page-->
</body>
I have the logo inside the WPHeader Item and I want it to be the primaryImageOfPage for the WebPage Item. I know I can use itemref to include properties which are out of the item's scope, but this does not take the property out of the child item's scope. That's really a problem if both items can have the same property, such as name or description.
This is only an example to explain the problem I have. For the moment I solve it using itemref, but there has to be a better way to do that.
I know there's no need to markup everything, I just want to know which is the best way to avoid having this problem.
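To make the problem concrete, the itemref workaround described above could be sketched like this (an illustration of the issue, not a fix). The image becomes a property of WebPage through itemref, but because it still sits inside the header item, it is also picked up as a property of WPHeader.

```html
<body itemscope itemtype="http://schema.org/WebPage" itemref="logo">
  <header itemscope itemtype="http://schema.org/WPHeader">
    <a href="index.html">
      <!-- Reached by WebPage via itemref="logo", yet still a property of WPHeader -->
      <img id="logo" src="xxx" alt="xxx" itemprop="primaryImageOfPage">
    </a>
  </header>
  <!--the rest of the page-->
</body>
```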
Microdata is an RDFa rip-off, constrained to what is relevant for search as a cross-cutting concern over semantic fragments. It therefore assumes that the advanced scoping abilities of CURIEs can be discarded. For the wholeness that good-quality domain-specific content pages exhibit, RDFa along with a vocabulary covering the domain-specific aspects is the best option so far. While search providers dominated the HTML5 spec to make Microdata part of the standard, as the Web keeps growing more semantic, the differences between the two are ending up as a mere matter of "what's in a name?"

Marking up a search result list with HTML5 semantics

Making a search result list (like in Google) is not very hard, if you just need something that works. Now, however, I want to do it with perfection, using the benefits of HTML5 semantics. The goal is to define the de facto way of marking up a search result list that potentially could be used by any future search engine.
For each hit, I want to
order them by increasing number
display a clickable title
show a short summary
display additional data like categories, publishing date and file size
My first idea is something like this:
<ol>
<li>
<article>
<header>
<h1>
<a href="url-to-the-page.html">
The Title of the Page
</a>
</h1>
</header>
<p>A short summary of the page</p>
<footer>
<dl>
<dt>Categories</dt>
<dd>
<nav>
<ul>
<li>First category</li>
<li>Second category</li>
</ul>
</nav>
</dd>
<dt>File size</dt>
<dd>2 kB</dd>
<dt>Published</dt>
<dd>
<time datetime="2010-07-15T13:15:05-02:00" pubdate>Today</time>
</dd>
</dl>
</footer>
</article>
</li>
<li>
...
</li>
...
</ol>
I am not really happy about the <article/> within the <li/>. First, the search result hit is not an article by itself, but just a very short summary of one. Second, I am not even sure you are allowed to put an article within a list.
Maybe the <details/> and <summary/> tags are more suitable than <article/>, but I don't know if I can add a <footer/> inside that?
All suggestions and opinions are welcome! I really want every single detail to be perfect.
1) I think you should stick with the article element, as
[t]he article element represents a
self-contained composition in a
document, page, application, or site
and that is intended to be
independently distributable or
reusable [source]
You merely have a list of separate documents, so I think this is fully appropriate. The same is true for the front page of a blog, containing several posts with titles and outlines, each in a separate article element. Besides, if you intend to quote a few sentences of the articles (instead of providing summaries), you could even use blockquote elements, like in the example of a forum post showing the original posts a user is replying to.
2) If you're wondering if it's allowed to include article elements inside a li element, just feed it to the validator. As you can see, it is permitted to do so. Moreover, as the Working Draft says:
Contexts in which this element may be
used:
Where flow content is expected.
3) I wouldn't use nav elements for those categories, as those links are not part of the main navigation of the page:
only sections that consist of major navigation blocks are appropriate for the nav element. In particular, it is common for footers to have a short list of links to various pages of a site, such as the terms of service, the home page, and a copyright page. The footer element alone is sufficient for such cases, without a nav element. [source]
4) Do not use the details and/or summary elements, as those are used as part of interactive elements and are not intended for plain documents.
UPDATE: Regarding if it's a good idea to use an (un)ordered list to present search results:
The ul element represents a list of
items, where the order of the items is
not important — that is, where
changing the order would not
materially change the meaning of the
document. [source]
As a list of search results actually is a list, I think this is the appropriate element to use; however, as it seems to me that the order is important (I expect the best matching result to be on top of the list), I think that you should use an ordered list (ol) instead:
The ol element represents a list of
items, where the items have been
intentionally ordered, such that
changing the order would change the
meaning of the document. [source]
Using CSS you can simply hide the numbers.
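A minimal sketch of hiding the numbers while keeping the ordered list semantics (the class name results is a hypothetical choice):

```html
<style>
  /* "results" is a hypothetical class name; the list stays an ol in the markup */
  ol.results { list-style: none; padding-left: 0; }
</style>
<ol class="results">
  <li><article>…</article></li>
  <li><article>…</article></li>
</ol>
```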
EDIT: Whoops, I just realized you already use an ol (due to my fatigue, I thought you used a ul). I'll leave my ‘update’ as is; after all, it might be useful to someone.
I'd mark it up this way (without using any RDFa/microdata vocabularies or microformats; so only using what the plain HTML5 spec gives):
<ol start="1">
<li id="1">
<article>
<h1><a rel="external" href="url-to-the-page.html">The Title of the Page</a></h1>
<p>A short summary of the page</p>
<footer>
<dl>
<dt>Categories</dt>
<dd>First category</dd>
<dd>Second category</dd>
<dt>File size</dt>
<dd>2 <abbr title="kilobyte">kB</abbr></dd>
<dt>Published</dt>
<dd><time datetime="2010-07-15T13:15:05-02:00">Today</time></dd>
</dl>
</footer>
</article>
</li>
<li id="2">
<article>
…
</article>
</li>
</ol>
start attribute for ol
If the search engine uses pagination, you should give the start attribute to the ol, so that each li reflects the correct ranking position.
id for each li
Each li should get an id attribute, so that you can link to it. The value should be the rank/position.
One could think that the id should be given to the article instead, but I think this would be wrong: the rank/order could change over time. You are not referring to a specific result but to a result position.
Remove the header
It is not needed if it contains only the heading (h1).
Add rel="external" to the link
The link to each search result is an external link (leading to a different website), so it should get the rel value external.
Remove nav
The category links are not navigation in scope of the article. So remove the nav.
Each category in a dd
You used:
<dt>Categories</dt>
<dd>
<ul>
<li>First category</li>
<li>Second category</li>
</ul>
</dd>
Instead, you should list each category in its own dd and remove the ul:
<dt>Categories</dt>
<dd>First category</dd>
<dd>Second category</dd>
abbr for file size
The unit in "2 kB" should be marked-up with abbr:
2 <abbr title="kilobyte">kB</abbr>
Remove pubdate attribute
It's not in the spec anymore.
Other things that could be done
give hreflang attribute to the link if the linked result has a different language than the search engine
give lang attribute to the link description and the summary if it is in a different language than the search engine
summary: use blockquote (with cite attribute) instead of p, if the search engine does not create a summary itself but uses the meta-description or a snippet from the page.
title/link description: use q (with cite attribute) if the link description is exactly the title from the linked webpage
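Several of these optional points can be combined in one sketch. It assumes a second result page with 10 results per page, an English-language search engine, and a German result whose title and snippet are taken verbatim from the linked page; all URLs and texts are placeholders.

```html
<!-- Second result page, assuming 10 results per page -->
<ol start="11">
  <li id="11">
    <article>
      <!-- hreflang: the linked page is German (assumed) -->
      <h1>
        <a rel="external" hreflang="de" href="url-to-the-page.html">
          <!-- q: the link text is exactly the title of the linked page -->
          <q lang="de" cite="url-to-the-page.html">Der Titel der Seite</q>
        </a>
      </h1>
      <!-- blockquote: the summary is a snippet from the page, not written by the search engine -->
      <blockquote lang="de" cite="url-to-the-page.html">
        <p>Ein kurzer Ausschnitt aus der Seite</p>
      </blockquote>
    </article>
  </li>
</ol>
```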
Aiming for a 'perfect' HTML5 template is futile because the spec itself is far from perfect, with most of the prescribed use-cases for the new 'semantic' elements obscure at best. As long as your document is structured in a logical fashion, you won't have any problems with search engines (most of the new tags don't have the slightest impact). Indeed, following the HTML5 spec to the letter - for example, using <h1> tags within each new sectioning element - may make your site less accessible (to screen readers, for example). Don't strive for 'perfect' or close-to, because it doesn't exist - HTML5 is not thought-out well enough for that. Just concentrate on keeping your markup logical and uncluttered.
I found a good resource for HTML5 is HTML5Doctor. Check the article archive for practical implementations of the new tags. Not a complete reference mind you, but nice enough to ease into it :)
As shown by the Footer element page, sections can contain footers :)