After reading thousands of posts, questions, blog articles and opinions, I'm still a bit confused about how to mark up a web page with microdata. If the main purpose of microdata is to help search engines better understand the content of a web page (and the web page is assumed implicitly), is it correct to start with the itemtype WebPage on the body element and then mark up the nested elements, defining which one is the main entity? Or is it better to start with an itemtype that is ideally the main topic of the web page and associate properties at the top level? Or is it better to have different itemtypes at the top level (i.e., web page, blog post, and main topic of the page)?
An example will explain my question better: if I have to mark up a web page that contains a blog post about a specific topic (let's say wireless technology), what should the top-level item be? Should it be WebPage, BlogPosting, or wireless technology?
The more the better (with exceptions)
When it comes to structured data, the guideline should be, in the typical case: the more the better. If you provide more structured data (i.e., you make things explicit instead of keeping them implicit), the chance is higher that a consumer finds something it can make use of.
Reasons not to follow this guideline might include:
You know exactly which consumers you want to support, and what they look for, and you don’t care about other (e.g., unknown or new) consumers.
You know that a consumer is bugged in a way that it can’t cope with certain structures.
You need to save as many characters as possible (bandwidth/performance).
It’s too complex/expensive to provide additional structured data.
The structured data is most likely useless to any conceivable consumer.
…
What WebPage offers
So unless you have a reason not to, it’s probably a good idea to provide the WebPage type … if you can provide possibly interesting data. For example:
It allows you to provide different URIs for the page and the thing(s) on the page, or what the page represents, like a person, a building, etc. (see why this can be useful and a slightly more technical answer with details).
hasPart allows you to connect items which might otherwise be top-level items, for which it wouldn’t necessarily be clear in which relation they are.
isPartOf allows you to make this WebPage part of something else (e.g., of the website if you provide a WebSite item, or of a CollectionPage).
You have breadcrumbs on the page: use breadcrumb to make clear that they represent the breadcrumbs for this page.
You provide accessibility information: use accessibilityAPI, accessibilityControl, accessibilityFeature, accessibilityHazard
The author/contributor/copyrightHolder/editor/funder/etc. of the page is different from the author/… of e.g. the page’s main content.
The page has a different license than some of the parts included in the page.
You provide actions that can be done on/with the page: use potentialAction.
…
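As a rough illustration of two of the properties listed above, here is a minimal sketch of a WebPage item that uses isPartOf and breadcrumb. The example.com URLs, the site name, and the breadcrumb labels are made-up placeholders.

```html
<body itemscope itemtype="http://schema.org/WebPage">

  <!-- isPartOf: this page belongs to a WebSite item (placeholder name and URL) -->
  <div itemprop="isPartOf" itemscope itemtype="http://schema.org/WebSite">
    <link itemprop="url" href="https://example.com/">
    <meta itemprop="name" content="Example Site">
  </div>

  <!-- breadcrumb: states that this list is the breadcrumb trail of this page -->
  <ol itemprop="breadcrumb" itemscope itemtype="http://schema.org/BreadcrumbList">
    <li itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
      <a itemprop="item" href="https://example.com/">
        <span itemprop="name">Home</span>
      </a>
      <meta itemprop="position" content="1">
    </li>
    <li itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
      <a itemprop="item" href="https://example.com/blog/">
        <span itemprop="name">Blog</span>
      </a>
      <meta itemprop="position" content="2">
    </li>
  </ol>

</body>
```

Without the WebPage item, the WebSite and the BreadcrumbList would be two unrelated top-level items; nesting them under isPartOf and breadcrumb makes their relation to the page explicit.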
Of course it also allows you to use mainEntity, but if this were the only thing you need the WebPage item for, you could as well use the inverse property mainEntityOfPage.
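A minimal sketch of that inverse approach, assuming a hypothetical page URL and headline: the BlogPosting is the only item, and it points at its page with mainEntityOfPage instead of being nested in a WebPage item.

```html
<!-- No WebPage item: the post references its page via the inverse property -->
<div itemscope itemtype="http://schema.org/BlogPosting">
  <!-- placeholder URL of the page that this post is the main entity of -->
  <link itemprop="mainEntityOfPage" href="https://example.com/blog/wireless-technology">
  <h1 itemprop="headline">An introduction to wireless technology</h1>
</div>
```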
More specific WebPage types
And the same is true for the more specific types, which give additional signals:
AboutPage if it’s a page about e.g. the site, you, or your organization.
CheckoutPage if it’s the checkout page in a web shop.
CollectionPage if it’s a page about multiple things (e.g., a pagination page listing blog posts, a gallery, a product category, …).
ContactPage if it’s the contact page.
ItemPage if it’s about a single thing (e.g., a blog posting, a photograph, …).
ProfilePage e.g. for user profiles.
QAPage if it’s … well, this very page.
SearchResultsPage for the result pages of your search function.
…
Your example
Your three cases are:
<!-- A - only the topic -->
<div itemscope itemtype="http://schema.org/Thing">
<span itemprop="name">wireless technology</span>
</div>
<!-- B - the blog post + the topic -->
<div itemscope itemtype="http://schema.org/BlogPosting">
<div itemprop="about" itemscope itemtype="http://schema.org/Thing">
<span itemprop="name">wireless technology</span>
</div>
</div>
<!-- C - the web page + the blog post + the topic -->
<div itemscope itemtype="http://schema.org/ItemPage">
<div itemprop="mainEntity" itemscope itemtype="http://schema.org/BlogPosting">
<div itemprop="about" itemscope itemtype="http://schema.org/Thing">
<span itemprop="name">wireless technology</span>
</div>
</div>
</div>
A conveys: there is something with the name "wireless technology".
B conveys: there is a blog post about "wireless technology".
C conveys: there is a web page that contains a single blog post (as main content for that page) about "wireless technology".
While I wouldn’t recommend using A, using B is perfectly fine and probably sufficient for most use cases. C already provides more details than B (namely that the page is for a single thing, and that this thing is the blog post, and not some other item that might also be on the page), but it’s probably not needed for such a simple case. This changes as soon as you can provide more data, in which case I’d go with C.
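To make the "more data" case concrete, here is one possible extension of C. It is only a sketch: the headline, date, author name, and license URLs are invented placeholders, chosen to show page-level data (a license for the page) next to post-level data (a different license, an author, a publication date).

```html
<body itemscope itemtype="http://schema.org/ItemPage">

  <!-- page-level data: license of the page as a whole (placeholder URL) -->
  <link itemprop="license" href="https://example.com/licenses/page">

  <div itemprop="mainEntity" itemscope itemtype="http://schema.org/BlogPosting">
    <h1 itemprop="headline">An introduction to wireless technology</h1>

    <!-- post-level data, kept separate from the page-level data above -->
    <time itemprop="datePublished" datetime="2016-05-01">May 1, 2016</time>
    <span itemprop="author" itemscope itemtype="http://schema.org/Person">
      <span itemprop="name">Jane Doe</span>
    </span>
    <link itemprop="license" href="https://example.com/licenses/post">

    <div itemprop="about" itemscope itemtype="http://schema.org/Thing">
      <span itemprop="name">wireless technology</span>
    </div>
  </div>

</body>
```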
Related
I'm trying to improve the accessibility of a web app that is mostly used for reading RSS. It has a lot of complex functionality and was written quite some time ago (it doesn't use HTML5). I don't want to change anything about its layout and markup; I only want to specify the appropriate roles of the elements. I have a single container that holds articles coming from the RSS feed, so it sounds perfect to use the "feed" role. However, that's also the main content section. I can't wrap it in another parent container. So which role is best to use: "main" or "feed"?
EDIT: Articles load dynamically. Also, if I use the 'feed' role I will have no 'main' one (there's also no <main> tag in the markup).
The markup is similar to this:
<div role="main/feed">
<div role="article"></div>
<div role="article"></div>
<div role="article"></div>
</div>
Per the Mozilla (MDN) description of the feed role:
A feed is a dynamic scrollable list of articles in which articles are added to or removed from either end of the list as the user scrolls.
Based on dynamic content, feed is probably your best bet.
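A minimal sketch of what the container could look like with the feed role, keeping the existing div structure. The ids, label, and titles are placeholders; the aria-setsize value of -1 assumes the total number of articles is not known in advance, and aria-busy would be toggled by the app's own script while articles are being added or removed.

```html
<!-- aria-busy="true" while the script is loading or removing articles -->
<div role="feed" aria-label="Articles" aria-busy="false">

  <!-- each article is focusable and reports its position in the feed -->
  <div role="article" tabindex="0"
       aria-posinset="1" aria-setsize="-1"
       aria-labelledby="article-1-title">
    <div id="article-1-title">First article title</div>
  </div>

  <div role="article" tabindex="0"
       aria-posinset="2" aria-setsize="-1"
       aria-labelledby="article-2-title">
    <div id="article-2-title">Second article title</div>
  </div>

</div>
```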
I'm describing a site for distance education with schema.org and JSON. All pages use this:
<body itemscope itemtype="http://schema.org/WebPage">
but I have many pages, one for every course (German course, Economics course, Hairdressing course, etc.). These are product pages, and I described them with
<body itemscope itemtype="http://schema.org/ItemPage">
Which is the better way to describe them: with WebPage or with ItemPage?
In schema.org the description for ItemPage is "A page devoted to a single item, such as a particular product or hotel." I am not sure which is better. Please help.
Thanks
If the page is about a single course, then yes, you may use ItemPage.
As a general rule, always go with the most specific type. So if you have a WebPage, check its "More specific Types"; if one of the types applies to your case, select it; otherwise, keep WebPage.
Of course this type only represents the web page about your course, not the course itself. So you’d probably want to use mainEntity to reference an EducationEvent (or whichever type applies).
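A minimal sketch of that structure. It assumes that schema.org's Course type fits your pages (the answer above leaves the exact type open), and the course name, description, and provider are placeholders.

```html
<body itemscope itemtype="http://schema.org/ItemPage">

  <!-- the page itself is the ItemPage; the course is its main entity -->
  <div itemprop="mainEntity" itemscope itemtype="http://schema.org/Course">
    <h1 itemprop="name">German course</h1>
    <p itemprop="description">A distance-learning course for beginners.</p>

    <!-- provider: the organization offering the course (placeholder name) -->
    <div itemprop="provider" itemscope itemtype="http://schema.org/Organization">
      <span itemprop="name">Example Distance School</span>
    </div>
  </div>

</body>
```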
Microdata with Schema.org already describes any element better than HTML5 does, so isn't HTML5's semantic markup redundant? For example:
<nav itemscope itemtype="http://schema.org/SiteNavigationElement">
<!-- might as well just be... -->
<div itemscope itemtype="http://schema.org/SiteNavigationElement">
and
<article itemscope itemtype="http://schema.org/NewsArticle">
<!-- might as well just be... -->
<div itemscope itemtype="http://schema.org/NewsArticle">
Some elements create an "outline" for the webpage, but aside from that what's the point? Why not just use divs and forget about the semantic tags, and just use Microdata and Schema.org?
The schema.org definitions are specifically for applications such as search engines (From What is schema.org?):
This site provides a collection of schemas, i.e., html tags, that
webmasters can use to markup their pages in ways recognized by major
search providers. Search engines including Bing, Google, Yahoo! and
Yandex rely on this markup to improve the display of search results,
making it easier for people to find the right web pages.
Your mark-up needs to be understood by browsers and screen-readers as well as search engines (from the schema.org Getting started page):
Usually, HTML tags tell the browser how to display the information
included in the tag. For example, <h1>Avatar</h1> tells the browser to
display the text string "Avatar" in a heading 1 format. However, the
HTML tag doesn't give any information about what that text string
means—"Avatar" could refer to the hugely successful 3D movie, or it
could refer to a type of profile picture—and this can make it more
difficult for search engines to intelligently display relevant content
to a user.
So microdata allows you to add additional semantic meaning to your mark-up (using definitions provided by schema.org) which can be ignored by applications which don't need it, such as browsers, and read by applications which do, such as search engines.
Microdata is not a replacement for using the appropriate semantic HTML tags where available; it should be used to augment that information. So the simple reason to use nav and article tags along with the microdata is that these tags have meaning to browsers and screen readers, while the microdata does not.
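A small sketch of the two layers working together, with placeholder headline, date, and body text: the article and time elements carry meaning for browsers and assistive technology, while the itemprop attributes carry meaning for consumers of schema.org data.

```html
<!-- <article> and <time> are the HTML semantics; itemscope/itemprop add schema.org data -->
<article itemscope itemtype="http://schema.org/NewsArticle">
  <h1 itemprop="headline">Example headline</h1>
  <p>
    Published
    <time itemprop="datePublished" datetime="2014-03-10">10 March 2014</time>
  </p>
  <div itemprop="articleBody">
    <p>Example body text.</p>
  </div>
</article>
```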
Actually, your examples are fairly simplistic. I would suggest you have a look at some of the examples on the schema.org getting started page to see how microdata can be used more meaningfully.
To see microdata being used in practice, try googling yourself and inspecting the results. If I search for myself, the first three results (LinkedIn, GitHub and my portfolio page) all display information marked up using microdata, which Google can pull from the pages and present to the user to help provide more meaningful search results.
The vast majority of terms that we have in schema.org have no overlap with HTML terminology, since they represent kinds of real world thing such as places, processes, products etc.
The problem area highlighted here is the small set of terms around http://schema.org/WebPageElement . I am not aware that any current search engine features make specific use of these, and I would suggest that any publishers who do see value in their use should also employ the corresponding pure HTML markup as well.
So recently I've been reading a book called Adaptive Web Design, and I came across hCard and hCalendar and went to their respective documentation pages. My question is that I don't understand how this works. hCard is used to represent people, and the markup goes like this:
<div class="vcard">
<a class="url fn" href="http://tantek.com/">Tantek Çelik</a>
</div>
Now I know these classes have meanings: url indicates that a given link takes the user to a web page, fn signifies formatted name, and so on.
So do these classes tell search engines that the content is an hCard, or does it render differently, etc.? Can someone explain how this works, what the benefits are, whether this matters from an SEO point of view, and whether these classes are predefined?
Edit: So are these classes reserved? What if I use them for other elements? And is there any JavaScript which I can call on click of a button to save a vCard on the computer/user device?
This concept allows machines to get detailed information about content. It's quite simple: you know what a given name is. Machines do not... :)
So you need a way to tell a machine what kind of data your HTML contains.
For example, you could enrich your data as in the example below and allow, say, an address book application to get detailed information about which fields should be filled.
<div class="vcard">
<a class="url fn" href="http://tantek.com/">
<span class="given-name">Tantek</span>
<span class="family-name">Çelik</span>
</a>
</div>
This snippet allows the address book application to find the given name easily and set it to the correct field. Order doesn't matter here.
Test your "Rich Snippets": http://www.google.com/webmasters/tools/richsnippets
If you haven't declared that you're using the hCard syntax (by using the vcard class), then you're free to use whatever class names you'd like. Even if you did start using the hCard microformat, no styles will be applied implicitly, as microformats are not related to display style.
The purpose of using microformats is to open an interface for exposing metadata. By providing the data in a standardized microformat, anyone parsing your website can use the microformat to find relevant information.
Search engines in particular benefit from this as it allows them to provide more information about a particular resource on their results page.
vCard is a standard for an electronic business card. hCard takes these labels and uses them as class names around data in HTML. Every hCard starts inside a block that has class="vcard".
Some of these types have subproperties. For example, the 'tel' value contains 'type' and 'value'. This way you can specify separate home and business phone numbers. The 'adr' type has a lot of subproperties (post-office-box, extended-address, street-address, locality, region, postal-code, country-name, type, value).
<div class="vcard">
<div class="fn">xxxxx</div>
<div class="adr">
<span class="locality">yyyy</span>,
<span class="country-name">zzzzz</span>
</div>
</div>
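The 'tel' subproperties mentioned above could be used along these lines. This is only a sketch, with a placeholder name and phone numbers, showing separate home and work numbers on one card.

```html
<div class="vcard">
  <div class="fn">Jane Doe</div>

  <!-- each tel carries its own type and value -->
  <div class="tel">
    <span class="type">home</span>:
    <span class="value">+1-555-0100</span>
  </div>
  <div class="tel">
    <span class="type">work</span>:
    <span class="value">+1-555-0199</span>
  </div>
</div>
```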
The class names don't have to mean anything within your page. However, you can always take advantage of them to style your contact information. You could also style them in your browser's User Style Sheet, so that you can find them while you surf the web. (Original source)
Regarding the SEO aspects, please check out this article: Tips for Local Search Engine Optimization for Your Site
I don't know hCard and hCalendar well, but, for instance, if you look up a Stack Overflow question on Google, you'll see that the time it was posted appears next to the content; for many sites it also displays the name of the author.
In other words, Google will use these microformats to enhance the search experience, by providing meta-data for the search as it was parsed from the page.
You help Google, they help you.
I'd recommend using the http://schema.org/ vocabulary for this kind of markup. Google officially recommends using it, and it is also supported by Bing and many other search engines. When you use schema.org markup, search engine crawlers can extract data entities from it and display them in search results in a corresponding manner.
So yes, there are benefits of using microformats. By using them you can improve behavior of search engine crawlers, your content will be properly indexed and what is more important, it will be properly categorized, so it will appear in customized searches.
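For comparison, the hCard from the question could be expressed with the schema.org vocabulary in microdata roughly as follows. This is a sketch of one possible mapping (fn to name, url to url), not an official conversion.

```html
<!-- schema.org Person in microdata, mirroring the hCard from the question -->
<div itemscope itemtype="http://schema.org/Person">
  <a itemprop="url" href="http://tantek.com/">
    <span itemprop="name">Tantek Çelik</span>
  </a>
</div>
```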
Reading an article on the <article> tag on HTML5, I really think my biggest confusion is in the first question of this section:
Using <article> gives more semantic meaning to the content. By contrast <section> is only a block of related content, and <div> is only a block of content... To decide which of these three elements is appropriate, choose the first suitable option:
Would the content make sense on its own in a feed reader? If so, use <article>.
Is the content related? If so, use <section>.
Finally, if there’s no semantic relationship, use <div>.
So I guess my question is really: What types of content belong in a feed reader?
The spec answers this quite clearly:
The article element represents a self-contained composition in a
document, page, application, or site and that is, in principle,
independently distributable or reusable, e.g. in syndication. This
could be a forum post, a magazine or newspaper article, a blog entry,
a user-submitted comment, an interactive widget or gadget, or any
other independent item of content.
see: http://dev.w3.org/html5/spec/Overview.html#the-article-element
The W3C spec leaves a lot open to interpretation and it ultimately comes down to the author's opinion. Here is a short and simple answer in the form of a question:
What are the primary significant pieces of content you want to share on the page?
Here are a few examples:
On this very page, each answer could be an article.
On flickr each photo displayed in the photostream could be considered an article.
On dribbble each shot displayed on the page could be an article.
On google each search result listed could be an article.
On a blog each article.. well each article could be an article.
On a blog page with an article and a series of comments you could have two major sections. One with an article and another for comments in which each comment could be considered an article.
It's the author's discretion as to how far they want to go. Most blog authors have an RSS feed for their articles, but others may also provide feeds for comments, and shared links.
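The last example in the list above (a blog page with a post and its comments) could be marked up roughly like this. It is just one reading of that description, with placeholder headings and text.

```html
<!-- first major section: the blog post itself -->
<section>
  <article>
    <h1>Post title</h1>
    <p>Post body.</p>
  </article>
</section>

<!-- second major section: the comments, each one its own article -->
<section>
  <h2>Comments</h2>
  <article>
    <p>First comment.</p>
  </article>
  <article>
    <p>Second comment.</p>
  </article>
</section>
```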
A lot of people have written on this subject. For further information I highly recommend reading:
http://html5doctor.com/the-article-element/ (you've already shared this)
http://www.impressivewebs.com/html5-section/
http://www.iandevlin.com/blog/2011/04/html5/html5-section-or-article
You've brought up a good argument and yes the spec does rather clearly define <article> as a syndication-worthy collection of content. The way I see it, your article would be the composed blog post – what you as the content writer of the site produce. While comments on that section are related to the article, they are not, in fact, part of the article, and should be relegated to another block in the <section>, either a non-semantic <div> or simply <p>s with display:block set. This is a decision that's left to the designer, depending on how they semantically evaluate the worth of the commentary.
Remember too that you have the <aside> tag, which is almost tailor-made for commentary, whether from the author or from the reader.
Most feed readers can handle many types of content; it could include copy, images, videos, etc. The feed for your site will include the content on your site that is repeated or includes multiple versions. A question and answer site will have a feed of new questions. A video sharing site will have a feed of new videos. A software review site will have a feed of new software or new reviews.
I'd recommend considering what the typical consumer of your content would want to find easily in their feed reader. You get to define what types of content belong in a feed reader.
A feed reader, in general, should contain a list of stories. Look at http://google.com/elections/ - it's a good example of the sort of thing a feed reader might contain. The important part is that all the stories are self-contained, and in theory do not need to be related at all.
The markup for that document could look like the following:
<body>
<header>...</header>
<nav>...</nav>
<article>
<section>
...
</section>
</article>
<aside>...</aside>
<footer>...</footer>
</body>
You may find more information in this article on A List Apart.