How do I use Microdata fields correctly? - html

The process of generating a list of events in a page is a bit confusing to me.
In this example is the url supposed to represent the current page or a page you are referring to?
<div itemscope itemtype="http://data-vocabulary.org/Person">
Would that be different from this example where I assume it literally does refer to the href?
www.example.com
Can itemprop="locality" be used on zip codes or other postal codes?
Also, is there a way to specify you are referring to an Event and not a Person?

You definitely seem to have the right idea so far. I have used Schema.org before for setting up Microdata, and it supplies an Event type to hook into.
<div itemscope itemtype="http://schema.org/Event">
Also, navigate to the Event information page to get a full readout of what properties it has.
It does have the option for a location with itemprop="location". You can see which item types accept location on the property's page. One of the options is PostalAddress, which has some examples that include using postalCode as a property.
<div itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="name">Google Inc.</span>
P.O. Box<span itemprop="postOfficeBoxNumber">1234</span>
<span itemprop="addressLocality">Mountain View</span>,
<span itemprop="addressRegion">CA</span>
<span itemprop="postalCode">94043</span>
<span itemprop="addressCountry">United States</span>
</div>
There is also a full tree-view of all of the properties available to hook into.
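As a rough sketch of how these pieces could fit together for an event listing, the markup below nests a PostalAddress inside an Event through the location property. The event name, date, and address values are made-up placeholders, and only properties documented on Schema.org are used; treat it as an illustration, not a verified Rich Snippet recipe.

```html
<!-- Hypothetical event; every value here is a placeholder -->
<div itemscope itemtype="http://schema.org/Event">
  <span itemprop="name">Example Meetup</span>
  <time itemprop="startDate" datetime="2030-01-15T19:00">January 15, 7:00 PM</time>
  <!-- location holds a nested item, in this case a PostalAddress -->
  <div itemprop="location" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="addressLocality">Mountain View</span>,
    <span itemprop="addressRegion">CA</span>
    <span itemprop="postalCode">94043</span>
  </div>
</div>
```

Because the itemtype is Event, consumers read the outer item as an event and not as a person, and the postal code sits in its own postalCode property instead of being squeezed into the locality.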

The value 'locality' refers to the city of a delivery address, so if you want to specify the postal code you can use the attribute postal-code:
<span itemprop="postal-code">99999</span>
Source: http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#names:-the-itemprop-attribute

Related

Using XPATH to get text AFTER one closed tag and BEFORE the beginning of another specific tag?

I'm using XPATH to extract information from a website which generates data of the following structure:
<span class="classA">
<span class="classA1">aaa:</span> <strong>ccc</strong><br>
<span class="classA1">ddd:</span> eee<br>
<span class="classA1">fff:</span> <b>ggg gg </b><br>
...
<span class="classA1">hhh:</span>
jjj,
...
lll<br>
<br>
</span>
<span class="classB">mmm <b>nnn</b> ...
<br><br>
</span>
<span class="classA">
<span class="classA1">ooo:</span> ppp<br>
<span class="classA1">qqq:</span> rrr<br>
...
</span>
A few things to note first:
the exact number of <span class="classA1"> tags varies
the number of <a> tags after <span class="classA1">hhh:</span> varies
To extract what follows the individual classA1 spans, I use this XPATH definition:
//span[contains(text(),'aaa:')]//following::text()[1]
//span[contains(text(),'ddd:')]//following::text()[1]
//span[contains(text(),'fff:')]//following::text()[1]
...
And so on.
Trying to extract the text after <span class="classA1">hhh:</span>, that is, either the plain text "jjj" and "lll" or the whole html part (i.e. "jjj,...lll"), I keep running into problems.
Since, as I mention above, the number of tags there may vary greatly and is unpredictable, I cannot simply identify them by index number. And if I use the following, I also get everything that follows including the following classB span, which I definitely don't need or want.
//span[contains(text(),'hhh:')]//following::text()
Can you, please, suggest an XPATH solution?
Many thanks!
Since the indentation of your source html does not correspond to the parent/child relationships, it is not totally clear, but maybe this helps:
//span[contains(.,'mmm')]/preceding::span[contains(.,'hhh:')][1]/following-sibling::a[not(span[contains(.,'mmm')])]
If I understand correctly what you are asking for, this should give you all the a elements coming after the <span class="classA1">hhh:</span> element:
//span[@class='classA1' and text()='hhh:']/following-sibling::a
Now you can iterate over the list of resulting a elements and extract their texts.
Alternatively you can get their texts directly with this:
//span[@class='classA1' and text()='hhh:']/following-sibling::a/text()

How to implement Schema.org properties in meta data?

Schema.org describes how to implement object properties using the meta tag but the examples given are properties with primitive types such as Text or Boolean. Let's say I want to display a grid of images and each image is of type ImageObject. The copyrightHolder property itself is either an Organization or Person. If I want to include the organization legal name, how would I do that using only meta data?
With "regular" HTML elements I would write:
<span itemprop="copyrightHolder" itemscope itemtype="http://schema.org/Organization">
<span itemprop="legalName">ACME Inc.</span>
</span>
This obviously doesn't look right:
<meta itemprop="copyrightHolder" itemscope itemtype="http://schema.org/Organization">
<meta itemprop="legalName" content="ACME Inc.">
</meta>
The only thing that comes into mind is using a set of hidden spans or divs.
Using Microdata, if you want to provide structured data that is not visible on the page, you can make use of these elements:
link (with itemprop) for values that are URLs
meta (with itemprop) for values that aren’t URLs
div/span (with itemscope) for items
So your example could look like this:
<div itemscope itemtype="http://schema.org/ImageObject">
<div itemprop="copyrightHolder" itemscope itemtype="http://schema.org/Organization">
<meta itemprop="legalName" content="ACME Inc." />
</div>
</div>
If you want to provide the whole structured data in the head element (where div/span aren’t allowed), see this answer. If you only want to provide a few properties in the head element, you can make use of the itemref attribute.
That said, if you want to provide much data in that hidden way, you might want to consider using JSON-LD instead of Microdata (see a comparison).
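For comparison, a minimal JSON-LD sketch of the same ImageObject might look like the following. The contentUrl value is a hypothetical placeholder that does not appear in the question; the nested Organization mirrors the Microdata example above.

```html
<!-- Sketch only: contentUrl is a placeholder URL -->
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "ImageObject",
  "contentUrl": "https://example.com/image.jpg",
  "copyrightHolder": {
    "@type": "Organization",
    "legalName": "ACME Inc."
  }
}
</script>
```

A script element like this can be placed in the head or the body, so no hidden div or span elements are needed.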
I was reading Getting Started again and noticed 2b that states
When browsing the schema.org types, you will notice that many properties have "expected types". This means that the value of the property can itself be an embedded item (see section 1d: embedded items). But this is not a requirement—it's fine to include just regular text or a URL.
So I assume it would be fine to just use
<meta itemprop="copyrightHolder" content="ACME Inc.">

SDTT: "A value for the image field is required"

I have this snippet in a LocalBusiness listing (based on this example):
<div itemscope itemtype="http://schema.org/LocalBusiness">
<div itemprop="image" itemscope="" itemtype="http://schema.org/ImageObject">
<img itemprop="contentUrl" src="/images/trouwlocatiefotos/medium/315_24_83_Veranda-005.jpg">
</div>
</div>
But Google's structured data testing tool throws an error:
image
A value for the image field is required.
Why is it throwing the error?
Testing the URL directly: https://search.google.com/structured-data/testing-tool#url=https%3A%2F%2Fwww.wonderweddings.com%2Fweddingvenues%2F315%2Fbeachclub-sunrise
The markup snippet you posted doesn’t give the quoted error. So your actual markup is probably doing things differently.
It seems that your image property isn’t nested under the LocalBusiness item:
Line 396: <div itemscope itemtype="http://schema.org/LocalBusiness">
Line 372: <div itemprop='image' itemscope itemtype='http://schema.org/ImageObject'>
No itemref involved.
So your LocalBusiness item really doesn’t have an image property. Instead, the image property seems to be specified without any parent item (= itemscope), which is invalid.
Google’s SDTT probably ignores this error and parses the ImageObject as a top-level item, which is why it’s listed on its own (next to LocalBusiness and BreadcrumbList).
How to fix this?
If you can’t move the elements to nest them (like in your example snippet), you could make use of Microdata’s itemref attribute:
<div itemscope itemtype="http://schema.org/LocalBusiness" itemref="business-image"></div>
<div itemprop='image' itemscope itemtype='http://schema.org/ImageObject' id="business-image"></div>
Add the missing properties to your snippet.
For the LocalBusiness type, Google's tool reports on the image and priceRange fields (image is inherited from Thing):
image = A value for the image field is required.
priceRange = The priceRange field is recommended. Please provide a value if available.
To resolve both messages, add these properties to the LocalBusiness item:
image (also logo, photo): expects an ImageObject or a URL. An image of the item; this can be a URL or a fully described ImageObject.
priceRange: expects Text. The price range of the business, for example $$$.
According to the messages above, Google treats image as required and priceRange as recommended for LocalBusiness.
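A minimal sketch of a LocalBusiness item carrying both properties could look like this. The business name is a placeholder, the image path is taken from the question, and the priceRange value is only an example.

```html
<div itemscope itemtype="http://schema.org/LocalBusiness">
  <!-- Placeholder name, not taken from the actual page -->
  <span itemprop="name">Example Venue</span>
  <!-- On an img element, the itemprop value is the URL in src -->
  <img itemprop="image" src="/images/trouwlocatiefotos/medium/315_24_83_Veranda-005.jpg" alt="">
  <span itemprop="priceRange">$$$</span>
</div>
```

Both properties have to sit inside the element that carries the LocalBusiness itemscope (or be pulled in with itemref); otherwise they are not attached to the business item.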

Represent person/representative or writer/agent relationship with RDFa, schema.org and FOAF

I'm a little confused how to model a writer-agent relationship using RDFa (Lite), schema.org and FOAF. I'm not even sure if I need FOAF.
Let's say I publish a book, me being the writer and represented by an agent. So we have two Persons, one is me and one is the agent. To clarify, my intention is to link the agent as a contact point for the writer, while at the same time indicating that the writer is me, the subject of the page:
<!-- the agent representing me -->
<div resource="/Writecorp/Michael Stern" vocab="http://schema.org/" typeof="Person">
<span property="name">Michael Stern</span>
<div property="memberOf">
<div typeof="Organization">
<span property="name">Writecorp Inc. agency</span>
</div>
</div>
</div>
<!-- the writer, me -->
<div rel="me" vocab="http://schema.org/" typeof="Person">
<link rel="agent" property="contactPoint" href="/Writecorp/Michael Stern" />
<span property="name">H. P. Lovecraft</span>
</div>
The <link> solution I gleaned from https://stackoverflow.com/a/19389163/441662.
When I feed this to the RDFa 1.1 Distiller and Parser, it shows the following output:
@prefix ns1: <http://www.w3.org/ns/rdfa#> .
@prefix ns2: <http://schema.org/> .
<> ns2:me [ a ns2:Person;
ns2:contactPoint </Writecorp/Michael Stern>;
ns2:name "H. P. Lovecraft" ];
ns1:usesVocabulary ns2: .
</Writecorp/Michael Stern> a ns2:Person;
ns2:memberOf """
Writecorp Inc. agency
""";
ns2:name "Michael Stern" .
[] a ns2:Organization;
ns2:name "Writecorp Inc. agency" .
Did it recognize rel="me" properly? It is showing ns2:me, but I can't find anything about it in the referred namespace vocabulary, schema.org. Should I use a FOAF prefix and then use foaf:me? I can't find many examples on that either.
How do I model the agent as a contactPoint relationship? According to schema.org and Google's testing tool, a Person is not allowed to be a contactPoint.
Solution?
One solution proposed further down is to have an entity that is both a ContactPoint and a Person, but Google's validator doesn't seem to like it much.
Another possible solution is to have both agent and writer point to the same ContactPoint resource (see https://stackoverflow.com/a/30055747/441662).
Concerning rel="me", that came from a microformats example and is not possible with schema.org (yet, as @unor states in his answer) or foaf.
/edit 7-5-2015: I raised a GitHub issue for this problem. I'll update this post when I learn more...
While agent is a Schema.org property, its domain is Action (which you don’t seem to intend). And it’s neither a FOAF property nor a registered link type (so it must not be used in HTML5). So I guess you’d have to find an appropriate property instead.
me is a link type, not a Schema.org or FOAF property. But as you are using vocab, the RDFa parser assumes that it’s a property from the default vocabulary (Schema.org, in your case). I’m not sure if you really intend to use it as a link type (as you are using it in the RDFa way on non-link elements).
(If the use of link types is intended, a possible solution is to use prefix instead of vocab. This way, unprefixed values of rel are interpreted as link types, prefixed values as properties.)
If using Schema.org, the book would be of type Book. You would be the author of this Book.
You’d have to check the available properties for Book (if the agent is related to your work, not your person) or Person (or Organization if it’s your business) if Schema.org offers a suitable property for specifying your agent. Ah, I missed that the agent should be a ContactPoint. Now, I doubt if Schema.org intended that this type could also refer to organizations or persons, but I guess nothing is stopping you from stating that something is a ContactPoint and an Organization.
Regarding resource (or about): Yes, it’s usually better to provide URIs for your entities instead of using blank nodes. That way, you and others can make statements about these entities, in the same or a different document.
So ideally, you would give every entity a URI (including yourself, different from the document’s URI).
For example, on the web page http://example.com/lovecraft, you could have:
<body prefix="schema: http://schema.org/">
<div typeof="schema:Person" resource="#me"></div>
<div typeof="schema:Organization schema:ContactPoint" resource="#agent"></div>
<div typeof="schema:Book" resource="#book-1"></div>
</body>
Now your URI is http://example.com/lovecraft#me (this represents you, the person, not the page about you), your agent’s organization has the URI http://example.com/lovecraft#agent, your book has the URI http://example.com/lovecraft#book-1.
This allows you to make statements about these, in various ways, e.g.:
<body prefix="schema: http://schema.org/">
<div typeof="schema:Person" resource="#me">
<link property="schema:contactPoint" href="#agent" />
<link property="schema:author" href="#book-1" />
</div>
<div typeof="schema:Organization schema:ContactPoint" resource="#agent"></div>
<div typeof="schema:Book" resource="#book-1"></div>
</body>
To state that the page (http://example.com/lovecraft) is about you (http://example.com/lovecraft#me), you could wait for Schema.org’s mainEntity property (included in the next release), and/or use Schema.org’s about property, and/or use FOAF’s isPrimaryTopicOf property.
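As a sketch of the about option, the page itself could be made the subject and pointed at #me. The typeof="schema:WebPage" on body and the use of an empty resource value to refer to the current page are assumptions added for illustration; it is worth checking the result in an RDFa parser before relying on it.

```html
<!-- Assumption: resource="" resolves to the page's own URL (http://example.com/lovecraft) -->
<body prefix="schema: http://schema.org/" typeof="schema:WebPage" resource="">
  <!-- States that the page is about the person identified by #me -->
  <link property="schema:about" href="#me" />
  <div typeof="schema:Person" resource="#me">
    <span property="schema:name">H. P. Lovecraft</span>
  </div>
</body>
```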
One option is to have both agent and writer point to the same ContactPoint resource.
This seems to work somewhat. It allows proper markup of the agent and its contact details, and at the same time lets the writer point to the agent's contact details. However, this still does not properly relate the agent to the writer as a representative (i.e. don't deal with me, but with my agent). And I'm not sure how machine readers will handle this situation.
<!-- the agent representing me -->
<div resource="/Writecorp/MichaelStern" vocab="http://schema.org/" typeof="Person">
<span property="name">Michael Stern</span>
<link property="contactPoint" href="/Writecorp/MichaelStern#contact" />
<div resource="/Writecorp/MichaelStern#contact" vocab="http://schema.org/" typeof="ContactPoint">
<meta property="name" content="Michael Stern" />
<div>Phone:
<span property="telephone">(540) 961-4469</span>
</div>
<div>
<a property="email" href="mailto:michael.stern@writecorp.inc.agency.com">michael.stern@writecorp.inc.agency.com</a>
</div>
</div>
<div property="memberOf">
<div typeof="Organization">
<span property="name">Writecorp Inc. agency</span>
</div>
</div>
</div>
<!-- the writer, me -->
<div vocab="http://schema.org/" typeof="Person">
<link property="contactPoint" href="/Writecorp/MichaelStern#contact" />
<span property="name">H. P. Lovecraft</span>
</div>
Notice that the name property on ContactPoint is a meta tag instead of a normal span element. This is to prevent double name output, while giving machine readers a way to still populate their data model with a contact point name.
Turtle output in RDFa 1.1 Distiller and Parser:
@prefix ns1: <http://www.w3.org/ns/rdfa#> .
@prefix ns2: <http://schema.org/> .
<> ns1:usesVocabulary ns2: .
</Writecorp/MichaelStern> a ns2:Person;
ns2:contactPoint </Writecorp/MichaelStern#contact>;
ns2:memberOf """
Writecorp Inc. agency
""";
ns2:name "Michael Stern" .
</Writecorp/MichaelStern#contact> a ns2:ContactPoint;
ns2:email <mailto:michael.stern@writecorp.inc.agency.com>;
ns2:name "Michael Stern";
ns2:telephone "(540) 961-4469" .
[] a ns2:Person;
ns2:contactPoint </Writecorp/MichaelStern#contact>;
ns2:name "H. P. Lovecraft" .
[] a ns2:Organization;
ns2:name "Writecorp Inc. agency" .

UserComments in test: “dtstart required”, but not part of standard?

I put some effort in marking up an ancient message board with schema.org/UserComments microdata. Testing it in WMT yields an error message: Missing required field "dtstart".
Here’s an item, and apart from the table markup, I think it’s all fine:
<tr itemscope itemtype="http://schema.org/UserComments" itemprop="comment">
<td>
<meta content="2013-09-23T17:39:14+01:00" itemprop="commentTime">
<meta content="http://example.com/cmts/?id=321" itemprop="replyToUrl">
<meta content="comment’s title" itemprop="name">
<div itemscope itemtype="http://schema.org/Person" itemprop="creator">
<a itemprop="url" href="http://www.example.com/user/Nickname">
<img itemprop="image" src="http://cdn.example.com/pic.jpg">
<span itemprop="name">Nickname</span></a>
</div>
</td>
<td>
<p itemprop="commentText">the comment’s actual text</p>
</td>
</tr>
In UserComments, there’s no field named “dtstart”. In a similar, yet not helpful question, there’s another link to WMT, stating somewhat implicitly that startDate and dtstart are synonyms. This does not prove true, at least not for UserComments.
Is it a hitch at Google, so I can disregard it? Am I missing some point (datetime instead of content)?
Your Microdata and Schema.org usage is correct. They don’t define any required properties. So when the Google Structured Data Testing Tool reports "Missing required …" errors, it only means that Google (probably) won’t consider displaying a Rich Snippet when specific properties are missing.
When testing your snippet with a parent item for the comment property, no errors are reported, e.g.:
<article itemscope itemtype="http://schema.org/CreativeWork">
<table>
<!-- your tr here -->
</table>
</article>
Another solution: adding a startDate property (but Google might want to see a date from the future here.)
(The term "dtstart" probably comes from the data-vocabulary.org vocabulary, where Google required this property for the Event Rich Snippet. And Schema.org’s UserComments is also some kind of Event, see notes below.)
If you don’t care about Google’s Rich Snippets, you can keep it like that.
Notes about your snippet:
You might want to use Comment instead of UserComments (because the latter one is an Event, not a CreativeWork).
However, currently, the comment property expects UserComments, but this will most likely change in one of the next Schema.org updates.
For specifying replyToUrl, you must use link instead of meta.
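Applied to the snippet from the question, the replyToUrl change could look roughly like this (the creator block is omitted for brevity and the item type is left as UserComments):

```html
<tr itemscope itemtype="http://schema.org/UserComments" itemprop="comment">
  <td>
    <meta content="2013-09-23T17:39:14+01:00" itemprop="commentTime">
    <!-- replyToUrl is a URL, so it goes in link/href and not in meta/content -->
    <link href="http://example.com/cmts/?id=321" itemprop="replyToUrl">
    <meta content="comment’s title" itemprop="name">
  </td>
  <td>
    <p itemprop="commentText">the comment’s actual text</p>
  </td>
</tr>
```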