Preferred approach for marking up a blog with Schema.org - blogs

Background/context
As schema.org is relatively new, perhaps this question will promote more discussion than a definitive answer. Either way, hopefully some learning from others' application/experience can be gained.
Having studied the http://schema.org documentation pages, I find that while there is an extensive array of properties (read: itemprop attributes) available to enrich a blog post, there seem to be some inconsistencies and 'grey areas' regarding the best approach to marking up blog comments. Let me provide an example:
The schema.org documentation for blogs can be found within Thing > CreativeWork > Blog and, for reference, a blog post lives within Thing > CreativeWork > Article > BlogPosting
So far, the documentation and markup examples on the aforementioned pages provide enough reference to format a blog index page, and the bulk of content within an individual post (author, pubDate, articleBody, interactionCount, etc.)
The problem: applying the UserInteraction schema to individual blog comments
It is when we start to look at individual UserInteraction elements (blog comments) within the interactionCount that things get a little vague. The documentation leads us through to Thing > Event > UserInteraction > UserComments, which is described as 'User interaction: A comment about an item.' However, all of the suggested properties of UserInteraction are geared towards a physical event.
The only property that appears to be relevant to a blog comment in this schema's documentation is description, which could be used for the comment body. What feels lacking is some specific context for user comments about a blog post. There is also no example markup for such comments; even a search for 'comments' on the site doesn't yield any clarity.
Has anyone marked up their blog using schema.org – and how did you approach/solve this?
I'll also raise this matter via the schema.org feedback form and update this post if anything comes to light.

Have a look at the examples here http://schema.org/WebPage and notice how the reviews are used for the Books.
You can do the same for Comments in Article, here's an example:
<div itemscope itemtype="http://schema.org/Article">
<!-- Article content -->
<div itemprop="comment" itemscope itemtype="http://schema.org/UserComments">
<meta itemprop="discusses" content="A masterpiece of literature" />
<span itemprop="creator">John Doe</span>
<time itemprop="commentTime" datetime="2011-05-08T19:30">May 8, 7:30pm</time>
<span itemprop="commentText">I really enjoyed this book. It captures the essential
challenge people face as they try to make sense of their lives.</span>
</div>
<div itemprop="comment" itemscope itemtype="http://schema.org/UserComments">
<meta itemprop="discusses" content="A masterpiece of literature" />
<span itemprop="creator">John Doe</span>
<time itemprop="commentTime" datetime="2011-05-08T19:30">May 8, 7:30pm</time>
<span itemprop="commentText">I really enjoyed this book. It captures the essential
challenge people face as they try to make sense of their lives.</span>
</div>
</div>

Some years later, http://schema.org/Comment has been introduced.
A comment on an item - for example, a comment on a blog post. The comment's content is expressed via the "text" property, and its topic via "about", properties shared with all CreativeWorks.
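As a rough sketch of how that type might be applied to a blog post (not taken from the schema.org examples): this assumes the comment property on CreativeWork together with the author, dateCreated and text properties, and every name, date and piece of text is a placeholder.
<article itemscope itemtype="http://schema.org/BlogPosting">
  <h1 itemprop="headline">Post title</h1>
  <div itemprop="articleBody">Post body goes here.</div>
  <!-- Each comment is its own Comment item, attached through the "comment" property -->
  <div itemprop="comment" itemscope itemtype="http://schema.org/Comment">
    <span itemprop="author" itemscope itemtype="http://schema.org/Person">
      <span itemprop="name">John Doe</span>
    </span>
    <time itemprop="dateCreated" datetime="2011-05-08T19:30">May 8, 7:30pm</time>
    <p itemprop="text">I really enjoyed this post.</p>
  </div>
</article>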

My understanding is that UserComments is not for marking up blog comments. It exists only as one of the possible interaction types to be used with the interactionCount property on CreativeWork, such as:
<div itemscope itemtype="http://schema.org/Article">
<span itemprop="http://schema.org/interactionCount">UserComments:7</span>
</div>
I would mark up each of the comments as a CreativeWork or Article, and make sure that their about property points to the blog post that they are commenting on.
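A minimal sketch of that idea, assuming the post is given a global identifier via itemid and each comment refers to it with a link element. The example.com URL is hypothetical, and whether a given consumer resolves the reference is not guaranteed.
<article itemscope itemtype="http://schema.org/Article"
         itemid="https://example.com/blog/my-post">
  <h1 itemprop="name">My post</h1>
  <span itemprop="interactionCount">UserComments:1</span>
</article>
<!-- The comment is a separate CreativeWork whose "about" points at the post's identifier -->
<div itemscope itemtype="http://schema.org/CreativeWork">
  <link itemprop="about" href="https://example.com/blog/my-post">
  <p itemprop="text">Comment body goes here.</p>
</div>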

Blaise appears to be correct as of now. The example used on schema.org/Comment is:
A comment on an item - for example, a comment on a blog post. The comment's content is expressed via the "text" property, and its topic via "about", properties shared with all CreativeWorks.

I personally don't like the <meta> approach when I have the content matching the data. Since schema.org is not yet fully documented, I went ahead with this:
<span itemprop="interactionCount">100</span>
and/or this:
<span itemprop="interactionCount">100 comments</span>
I know that it doesn't specify "UserComments" anywhere. Thoughts?

Related

Is it ok to have undefined itemscope, or should I pick from available schema?

I'd like to use Microdata for a web page. But none of the existing available schema seem to fit my content. Do I need to stick with only defined schema or can I define my own? Also, can I have an empty itemscope or is it better to define?
<h1>Page Title</h1>
(table of contents)
term 1
term 2
...
<div itemscope>
<h2 itemprop="term">1. Piston</h2>
<h3>Definition - What does Piston mean?</h3>
<span itemprop="definition">A definition</span>
<h3>Explanation of Piston</h3>
<span itemprop="explanation">An explanation</span>
<h3>How to use Piston in a sentence.</h3>
<span itemprop="usage">Sentence using term.</span>
</div>
I have 10 terms on the same page, each with this same bit of info. Is it ok to have an undefined itemscope? Or should I define it something like "car parts"? Or can we not define our own itemscope and instead, choose from existing schema structure?
I ran it through Google's schema tool, and it reports no warnings or errors, but of course it gives me 'unspecified type' and the following.
#type
https://search.google.com/term
https://search.google.com/definition
https://search.google.com/usage
Option 1: You could use itemscope without itemtype (like in your example). That would be a local vocabulary, and you can’t expect Microdata consumers to make use of the data.
<div itemscope>
<p itemprop="term">…</p>
<p itemprop="definition">…</p>
</div>
Option 2: You could define and use your own vocabulary. It’s unlikely that many Microdata consumers would make use of the data, though, as most of them only recognize certain vocabularies.
<div itemscope itemtype="https://example.com/my-vocabulary/">
<p itemprop="term">…</p>
<p itemprop="definition">…</p>
</div>
Option 3 (preferable): You could use Schema.org as far as possible, and use your own types/properties where Schema.org doesn’t offer suitable terms. Your own properties would have to be specified as absolute URIs, and your own types would have to be specified as URI values for Schema.org’s additionalType property. As Schema.org type, you could always use Thing if there is no more specific type available.
<div itemscope itemtype="http://schema.org/Thing">
<link itemprop="additionalType" href="https://example.com/my-vocabulary/CarPartTerm" />
<p itemprop="https://example.com/my-vocabulary/term">…</p>
<p itemprop="https://example.com/my-vocabulary/definition">…</p>
</div>
That said, it could be the case that Schema.org does offer suitable types/properties for your case, e.g., maybe DefinedTerm (Pending). If you think that a useful type/property is missing in Schema.org, you could propose that it gets added.
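For illustration only, here is how the glossary entry from the question might look with DefinedTerm, assuming name and description (inherited from Thing) are acceptable for the term and its definition. The explanation and usage properties have no obvious Schema.org equivalent, so this sketch keeps them as hypothetical absolute-URI properties under example.com, as in Option 3.
<div itemscope itemtype="https://schema.org/DefinedTerm">
  <h2 itemprop="name">Piston</h2>
  <h3>Definition - What does Piston mean?</h3>
  <span itemprop="description">A definition</span>
  <h3>Explanation of Piston</h3>
  <!-- Hypothetical custom property, given as an absolute URI -->
  <span itemprop="https://example.com/my-vocabulary/explanation">An explanation</span>
  <h3>How to use Piston in a sentence.</h3>
  <!-- Hypothetical custom property, given as an absolute URI -->
  <span itemprop="https://example.com/my-vocabulary/usage">Sentence using term.</span>
</div>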

Marking up product boxes in HTML5

What would be semantically correct way of marking up products in HTML5?
This is how I do it currently. I am using BEM in the following example:
<div class="product__box">
<h2 class="product__box-title">
Chair
</h2>
<img class="product__box-img" src="but-can-it-do-this.jpeg" alt="image-of-chair">
<p class="product__box-price">
$399
</p>
<a href="#" class="product__box-button" role="button">
Add To Cart
</a>
</div>
I saw some people using <article> tags instead of <div>, but I am not sure that is correct. This is the shortened definition from W3C:
The article element (...) could be a forum post, a magazine or newspaper article, a blog entry, a user-submitted comment, an interactive widget or gadget, or any other independent item of content.
How would you mark up a product to follow HTML5 spec?
This is pretty much up for debate and depends on the general semantics of your website, but I think you could replace the surrounding div with article in your example.
As the spec states:
article
Represents a section of a page that consists of a composition that
forms an independent part of a document, page, or site.
I'd argue a "product box" (a.k.a. teaser as far as I understand) is a "composition that forms an independent part"...
Some further reading about the article-tag and HTML5 semantics in general:
html5doctor: Let's talk about semantics
adactio.com: Pursuing semantic value
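Applied to the markup from the question, the suggestion might look like the sketch below. Only the wrapper element changes; the class names are the asker's, and the alt text is an assumed, more descriptive placeholder.
<article class="product__box">
  <h2 class="product__box-title">Chair</h2>
  <img class="product__box-img" src="but-can-it-do-this.jpeg" alt="Wooden chair">
  <p class="product__box-price">$399</p>
  <a href="#" class="product__box-button" role="button">Add To Cart</a>
</article>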

Correct use of Microdata and Schema.org to specify "email" as itemscope?

I've been using Schema.org to add microdata to my first website, but I'm finding its documentation very vague; most of the answers to my questions so far have come from Stack Overflow or Webmasters.
On one page I have the schema set up as
<div itemscope itemtype="https://schema.org/Person">
<h1><span itemprop="name">Name Here</span> — SEO</h1>
<h2 itemprop="address"><i>Manchester, New Hampshire</i></h2>
<h3>Email Me
—<span itemprop="email">email@gmail.com</span></h3>
</div>
No problem there, I hope.
On another page of my website I only have the email me header as none of the other information is relevant. Right now I have that schema set up as
<h2><a href="mailto:email@gmail.com?Subject=Service%20Inquiry">Email Me</a> —<span itemscope
itemtype="https://schema.org/email">email@gmail.com</span></h2>
email is a property of Person, but I can't find any documentation saying that I can't use a property as the itemtype. Did I use this correctly, and will a crawler read this schema any differently from the itemprop use of email?
You should not use a property as type and vice versa. While nothing stops you from doing this, it’s not defined what this would mean, so it’s likely useless for consumers.
If you want to mark up the email address on the other page, you should add the appropriate type (the one to which this email address belongs), e.g.:
<div itemscope itemtype="http://schema.org/Person">
<h2>Email Me — <span itemprop="email">email@gmail.com</span></h2>
</div>
If you want to make sure that consumers can understand that both Person items describe the same person, you could provide an itemid attribute. An implicit alternative would be to provide properties with unique values (and you are doing exactly this by specifying email). See my answer about both ways (itemid and unique property values).
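A sketch of the itemid variant, assuming you pick one URI to identify the person and repeat it on every page. The example.com URI is hypothetical, and consumers are not guaranteed to merge the two items.
<!-- Page 1: full profile -->
<div itemscope itemtype="http://schema.org/Person"
     itemid="https://example.com/about#me">
  <h1><span itemprop="name">Name Here</span></h1>
  <span itemprop="email">email@gmail.com</span>
</div>
<!-- Page 2: only the email address, but the same itemid -->
<div itemscope itemtype="http://schema.org/Person"
     itemid="https://example.com/about#me">
  <h2>Email Me: <span itemprop="email">email@gmail.com</span></h2>
</div>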

Using schema.org WebPage and data-vocabulary.org Breadcrumb together

I want to semantically enhance my HTML markup by adding elements from the schema.org WebPage vocabulary including semantic markup for the breadcrumb navigation. According to the definition I should use schema.org BreadcrumbList to achieve this.
When looking at Google's documentation about adding structured data for Breadcrumbs though, they explicitly state that the schema.org markup for breadcrumbs is not yet supported.
Instead, the apparently older data-vocabulary.org Breadcrumb definition should be applied. This seems to be because the schema.org BreadcrumbList is still disputed. Google does parse schema.org BreadcrumbList markup in its Structured Data Testing Tool, but doesn't use it for a nicer presentation in the search results as it does for breadcrumbs annotated using the data-vocabulary.org Breadcrumb definition.
However, it would be nice to bring together both worlds and have semantic markup for webpage and breadcrumbs. The best I was able to come up with looks like this (using itemref to prevent needing to nest each Breadcrumb into the other):
<body itemscope itemtype="http://schema.org/WebPage">
<h1 itemprop="name">George Orwell</h1>
<p itemprop="description">Eric Arthur Blair (25 June 1903 – 21 January
1950), who used the pen name George Orwell, was an English novelist,
essayist, journalist and critic.</p>
<nav itemprop="breadcrumb">
<ul itemscope>
<li id="bc1" itemscope itemref="bc2"
itemtype="http://data-vocabulary.org/Breadcrumb">
<a href="http://example.com/books" itemprop="url">
<span itemprop="title">Books</span>
</a> ›
</li>
<li id="bc2" itemscope itemprop="child" itemref="bc3"
itemtype="http://data-vocabulary.org/Breadcrumb">>
<a href="http://example.com/books/authors" itemprop="url">
<span itemprop="title">Authors</span>
</a> ›
</li>
<li id="bc3" itemscope itemprop="child"
itemtype="http://data-vocabulary.org/Breadcrumb">>
<a href="http://example.com/books/authors/orwell" itemprop="url">
<span itemprop="title">George Orwell</span>
</a>
</li>
</ul>
</nav>
</body>
The itemscope attribute on the <ul> is needed so the subsequent breadcrumbs with itemprop="child" are not interpreted as properties of WebPage.
When I throw this code at the Structured Data testing tool, all data is recognised as I want it to be, but there are warnings for the undefined ul item.
Is it safe to ignore these errors? Are there other approaches or even best practices to solve the problem? What about future-proofness: would it be wise to use this kind of code on a website that may not be updated for years?
When testing your markup, Google’s Testing Tool doesn’t seem to report any errors or warnings. It says "All good" for every item.
Your use of Microdata is valid. You are adding an item without type, which does not have any content (because no properties are added).
Using Schema.org’s breadcrumb property seems to be appropriate, as one of its expected types is Text. So Schema.org consumers would extract only the text content of the child items, no URLs:
Books › > Authors › > George Orwell
I don’t think that the linked issue is the reason why Google does not support Schema.org’s BreadcrumbList: the issue is from 2012, but the BreadcrumbList type was added only a few months ago (2014-12-11) to Schema.org.
The issue is about using the breadcrumb property without a type (which did not exist back then), which is not ideal because it does not allow specifying metadata for each breadcrumb (e.g., its URL).
The future-proof way would be to use both vocabularies for breadcrumbs. The Microdata syntax makes this hard/impossible, but the RDFa syntax allows this (however, the odd requirement from Data-Vocabulary.org that the breadcrumbs have to be nested might require markup changes).
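For reference, a Microdata sketch of the same trail using Schema.org's BreadcrumbList alone, assuming the itemListElement, item, name and position properties. It does not attempt to combine both vocabularies, and the URLs are the ones from the question.
<ol itemscope itemtype="http://schema.org/BreadcrumbList">
  <li itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
    <a itemprop="item" href="http://example.com/books">
      <span itemprop="name">Books</span></a>
    <meta itemprop="position" content="1">
  </li>
  <li itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
    <a itemprop="item" href="http://example.com/books/authors">
      <span itemprop="name">Authors</span></a>
    <meta itemprop="position" content="2">
  </li>
  <li itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
    <a itemprop="item" href="http://example.com/books/authors/orwell">
      <span itemprop="name">George Orwell</span></a>
    <meta itemprop="position" content="3">
  </li>
</ol>
Unlike the Data-Vocabulary.org version, the items are siblings in a flat list, so no itemref chaining or nesting is needed.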

Semantic HTML for messages

I'm making a small web-chat utility and am looking for advice on which elements to use for messages.
Here's what I'm thinking of using at the moment:
<p id="message-1">
<span class="timestamp" id="2009-03-10T12:04:01+00:00">
12:04
</span>
<cite class="admin">
Ross
</cite>
Lorem ipsum dolor sit amet.
</p>
I'd take advantage of CSS here to add brackets around the timestamp, icons for the cited user etc. I figured it would be silly (and incorrect) to use a blockquote for each message, although I consider the cite correct as it's referring to the user that posted the message.
I know this isn't a) an exact science and b) entirely essential but I'd prefer to use meaningful elements rather than spans throughout. Are there any other elements I should consider? Any microformats?
HTML isn't very semantic in a customizable way. Nevertheless your format should be understandable in any browser (with proper CSS, as you have pointed out).
What I see in the code example above is very similar to XML. It might be cumbersome and overkill for your needs, but I'd like to point out that you can use XML with XSLT as a substitute for (X)HTML. This way you can make your tags as semantic as possible, without compromising on the limitations of the HTML tags.
w3schools has an article about the topic. I could swear that I saw a webpage on sun.com that was done in XML, but I can't find it anymore.
Unless you intend this to be interpreted or parsed by third-party software, though, I'd advise against this method and stick with proven HTML.
Seems reasonable to me, except that the ‘id’ is invalid. NAME tokens can't start with a number or contain ‘+’.
Plus if two people spoke at once you'd have non-unique IDs. Perhaps that data should go in another attribute, such as ‘title’ (so you can hover to see the exact timestamp).
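Something along these lines, as a sketch: the timestamp moves from id to title, so the id rules no longer apply and two messages sent in the same second don't clash.
<p id="message-1">
  <!-- Exact timestamp is shown on hover; only the message itself carries an id -->
  <span class="timestamp" title="2009-03-10T12:04:01+00:00">
    12:04
  </span>
  <cite class="admin">
    Ross
  </cite>
  Lorem ipsum dolor sit amet.
</p>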
If you're going for semantic HTML, you'll probably want to know that HTML5 doesn't consider your use of the <cite> element correct anymore.
A person's name is not the title of a work — even if people call that person a piece of work — and the element must therefore not be used to mark up people's names.
<ol>
<li class="message" value="1">
<span class="timestamp" id="2009-03-10T12:04:01+00:00">
12:04
</span>
<cite class="admin">
<address class="email">
<a href="mailto:ross@email.com">
Ross
</a>
</address>
</cite>
Lorem ipsum dolor sit amet.
</li>
</ol>
I would try something like the above. Notice I have placed everything in an Ordered list, as comments can be construed in the linear manner fitting an ordered list. Also, I have embedded, inside your Cite tag, an Address tag with an Anchor element. The unfortunately named Address element is actually meant to convey contact information for an Author, so you would probably want to link to the author's email address there.
What you suggested is already very good. If you want to take it a step further and be able to allow tons of different presentation options with the same markup (at the expense of heavier html) you may want to do something like:
<dl class="message" id="message-1">
<dt class="datetime">Datetime</dt>
<dd class="datetime">
<span class="day">Wed</span>
<span class="dayOfMonth">11</span>
<span class="month">Mar</span>
<span class="year">2009</span>
<span class="hourMin">17:34</span>
<span class="sec">33</span>
</dd>
<dt class="author">Author</dt>
<dd class="author">Ross</dd>
<dt class="message">Message</dt>
<dd class="message">Lorem ipsum dolor sit amet</dd>
</dl>
Since you mention microformats in the question, you are no doubt already familiar with the microformats wiki. It contains a good number of examples for different situations.
Another possibility would be to borrow parts of SIOC, which among other things is an ontology for forums - pretty similar to chat.
By re-using existing formats, you can take advantage of plugins and tools like Operator and maybe get more out of your app for free.
I'd use XML with XSLT to transform (style) the data.
It makes sense semantically here, but you also have the conversations in a suitable format for archiving (i.e. XML) - I assume you will have some sort of log or 'history'.
As @bobince said, the id="2009-03-10T12:04:01+00:00" is invalid.
You should change this:
<span class="timestamp" id="2009-03-10T12:04:01+00:00">
12:04
</span>
To this:
<time class="timestamp" datetime="2009-03-10T12:04:01+00:00">
12:04
</time>
You can get more information about the time tag at HTML5 Doctor and W3C:
The time tag on HTML5 offers a new element for unambiguously encoding dates and times for machines while still displaying them in a human-readable way.
The time element represents either a time on a 24 hour clock, or a precise date in the proleptic Gregorian calendar, optionally with a time and a time-zone offset.
...
I agree with the ordered list (ol) solution posted by @Robotsu, except for the time tag I just described and the incorrect address inside the cite tag!
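Putting those pieces together, a message might look roughly like this. The plain span with an assumed "author" class stands in for cite, since HTML5 doesn't allow cite for a person's name, and the id scheme is the one from the question.
<ol>
  <li class="message" id="message-1">
    <time class="timestamp" datetime="2009-03-10T12:04:01+00:00">
      12:04
    </time>
    <!-- A plain span instead of cite/address for the sender's name -->
    <span class="author admin">
      Ross
    </span>
    Lorem ipsum dolor sit amet.
  </li>
</ol>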