Using JSON-LD for on-site reviews - html

I read the article The Complete Guide to Creating On-Site Reviews + Testimonials Pages. I would like to build my own solution for collecting reviews on our website in a way that Google can find. I'm not 100% sure I understand this correctly.
So I would create a form with appropriate inputs and take that user input and create a JSON-LD object in a <script> tag and place that in the head of our /reviews/ page. So each review listed on our /reviews/ page would be in an array of JSON-LD objects, and that's how Google can find it?
Is it as simple as that? Placing the JSON-LD in the <head> with the correct data?
This site was used as an example on the article I linked. They use a third-party service that is basically doing what I am going to set out to do. I don't see the data in the head when viewing source, but I guess it's a good practice to hide the JSON-LD somewhere? I see a JSON-LD script, but it's empty.
Can someone help me understand this better?

The idea is to provide machine-readable structured data about the reviews, using the vocabulary Schema.org. Three syntaxes are supported: JSON-LD, Microdata, RDFa.
See a comparison. With Microdata and RDFa, you would add HTML attributes to the existing markup for the reviews. With JSON-LD, you would add the structured data in a separate script element and leave the review markup untouched.
This script element can be in the head or in the body. By default, it’s visually hidden no matter where it’s placed.
If you provide such structured data, consumers (like Google Search) may make use of it. For example, Google Search offers the Review rich result feature. Their documentation describes which Schema.org types/properties are needed to qualify for it.
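As a rough illustration of the JSON-LD approach, here is a minimal Python sketch that turns submitted form fields into a script element. The function names (review_to_jsonld, jsonld_script), the parameter names and the choice of Product for itemReviewed are assumptions made for this example; they do not come from the article or from Google's documentation, which should be checked for the properties that are actually required.

```python
import json


def review_to_jsonld(item_name, author_name, rating, body):
    """Build one Schema.org Review as a plain dict (hypothetical helper)."""
    return {
        "@context": "https://schema.org",
        "@type": "Review",
        # Assumption: the reviewed thing is a Product; use the type that fits.
        "itemReviewed": {"@type": "Product", "name": item_name},
        "author": {"@type": "Person", "name": author_name},
        "reviewRating": {"@type": "Rating", "ratingValue": rating, "bestRating": 5},
        "reviewBody": body,
    }


def jsonld_script(data):
    """Serialize a dict (or a list of dicts) into a JSON-LD script element."""
    payload = json.dumps(data, ensure_ascii=False)
    # Escape "<" so user input containing "</script>" cannot close the element.
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'


# All reviews shown on the page can go into one array in a single element.
reviews = [
    review_to_jsonld("Blend-O-Matic", "Alice", 5, "Works well."),
    review_to_jsonld("Blend-O-Matic", "Bob", 4, "Good, but loud."),
]
snippet = jsonld_script(reviews)
```

The resulting string can be written into the head or the body of the /reviews/ page by whatever templating the site already uses.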

Related

HTML element for importing semantic (linked) data

I want to include some semantic information of another website in my own site (for reusing the information instead of copying it). Is there a standardized HTML tag for this? (like it is possible with videos, images, etc.)
As an example, let's take some code from schema.org:
<div itemscope itemtype="http://schema.org/Offer">
<span itemprop="name">Blend-O-Matic</span>
<span itemprop="price">$19.95</span>
<link itemprop="availability" href="http://schema.org/InStock"/>Available today!
</div>
Now, I want to include the price information in my site. How can I do this? (I imagined using something like <information src="..." type="microdata" attributes="price" query="name=Blend-O-Matic" type="http://schema.org/Offer"/> but haven't found anything.)
HTML doesn’t offer something like this. The equivalent of img/video/etc. would be iframe, but this only allows displaying the whole HTML document, not just a specific part of it.
On the level of structured data (e.g., using Microdata, or RDF serializations like RDFa and JSON-LD), you can refer to another thing by referencing that thing’s URI (if the publisher defined one), but not to a property of that thing.
If you want to display the data on your page, you have to
get the data (scraping, API, SPARQL, …),
include the data (either on the client-side with JavaScript, or on the server-side with a programming language of your choice), and
regularly check the original source for updates.
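The first step (getting the data by scraping) could be sketched like this in Python, using only the standard library. The names ItempropCollector and extract_props are hypothetical, and the parser is deliberately naive: it assumes a single flat item like the Offer example above and ignores most of the Microdata processing rules.

```python
from html.parser import HTMLParser

SAMPLE = """<div itemscope itemtype="http://schema.org/Offer">
<span itemprop="name">Blend-O-Matic</span>
<span itemprop="price">$19.95</span>
<link itemprop="availability" href="http://schema.org/InStock"/>Available today!
</div>"""


class ItempropCollector(HTMLParser):
    """Collect itemprop values from flat (non-nested) Microdata markup."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._current = None  # itemprop whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        if "href" in attrs:
            # Elements such as link or a carry their value in href.
            self.props[name] = attrs["href"]
        else:
            self._current = name
            self.props[name] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.props[self._current] += data

    def handle_endtag(self, tag):
        # Flat markup assumed: any closing tag ends the current property.
        self._current = None


def extract_props(html):
    parser = ItempropCollector()
    parser.feed(html)
    parser.close()
    return parser.props


price = extract_props(SAMPLE)["price"]
```

In practice the HTML would be fetched from the other site on a schedule, which covers the "regularly check" step; a full Microdata parser would be a safer choice for anything beyond a quick experiment.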

Alternative to HTML standard for expressing static documents content

The content tends to be mixed with its form when expressed as an HTML+CSS+JS document. Almost every modern website requires CSS and/or JavaScript to be readable. Most of them are not easy to parse automatically because they rely on a web browser to render them. Sections of the document are defined using visual cues, colors and formatting. One can use HTML5 tags like <article>, but those are not part of any bigger structure as far as I know, and they can still contain non-content elements.
Websites are basically apps or clients.
Is there any standard with a well-defined schema that can be used to serve the content of a website? An API for websites that could be used to express content in a form that is easy to serve, parse, store, cryptographically sign...
I'm aware of formats like XML and JSON but I have not managed to find any standardized way to express a blog post as a JSON document.
An example of what I have in mind:
This question can be fetched as a JSON document using the Stack Exchange API. The result is machine-readable and easy to parse, but it is not standardized. It reflects details of Stack Exchange-specific data structures. Other Q&A websites will have different APIs, with different structures and formats, even though both have questions and answers.
There are two important standards that deal with the semantic aspect of a web page, like the one you are looking for: Microdata and RDFa. With their aid, you can pick an open vocabulary to describe your data, or create your own based on one.
JSON-LD is a further option. It is not a schema language in the way XML Schema is for XML documents, but it lets you express the same vocabulary-based data as a plain JSON document.
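As a small sketch of what a blog post as a JSON document could look like with JSON-LD, the following Python builds one using the Schema.org type BlogPosting. The function name blog_post_jsonld and the sample values are made up for illustration, and the set of properties is only one plausible selection.

```python
import json


def blog_post_jsonld(headline, author_name, date_published, body):
    """Describe a blog post with Schema.org terms (hypothetical helper)."""
    return {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,  # ISO 8601 date string
        "articleBody": body,
    }


post = blog_post_jsonld(
    "Hello, world",
    "Jane Doe",
    "2020-01-31",
    "This is the text of the post.",
)
document = json.dumps(post, indent=2)
```

Because the result is ordinary JSON, it can be served, parsed and stored like any other JSON document; two sites using the same vocabulary would produce documents with the same property names.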

Is Google microformats supposed to be visible on the web page?

I was trying to add microformats as following to my webpage:
<div itemscope itemtype="http://schema.org/Product">
<span itemprop="brand">Company Name</span>
<span itemprop="name">Product Name</span>
<span itemprop="description">Product Description</span>
Product #: <span itemprop="sku">12345</span>
</div>
I thought this microformat would only show up on a Google search result page. But after adding it, that information became visible on my webpage, and not in good shape.
Is there something wrong? Or should I use display:none to make it invisible on my webpage?
Microformats are meant to add machine readable meaning to existing content on the page. They're not invisible meta data, they augment content that's already there. So, yes, it'll show up. You can hide or style it via any of the usual ways in which you hide or style content.
You are using Microdata, not Microformats.
Microdata is a syntax to include structured data within HTML5. Ideally you would use your existing content (i.e., add the needed attributes like itemprop etc. to your already existing markup), and only if that’s not possible, the hidden elements meta and link (which are allowed in the body if used for Microdata).
If you don’t want to use your existing markup and the visible content, you could use an alternative syntax: JSON-LD. This gets included as a data block (using the script element), which is not visible by default.
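For comparison, the Product example from the question could be expressed as a JSON-LD data block roughly as follows. This Python sketch only builds the string; the variable names are arbitrary, and modelling the brand as a nested Brand item is one reasonable choice rather than the only one.

```python
import json

# Same values as in the Microdata example from the question.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "brand": {"@type": "Brand", "name": "Company Name"},
    "name": "Product Name",
    "description": "Product Description",
    "sku": "12345",
}

# The values here are fixed strings; data coming from users would need
# "<" escaped before being placed inside a script element.
data_block = (
    '<script type="application/ld+json">'
    + json.dumps(product)
    + "</script>"
)
```

Since browsers do not render the content of this script element, nothing has to be hidden with CSS.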
Don't try to hide this content with styling; it can have a bad impact on your site. You might get penalized for cloaking if you do this on all of your pages.
If you are trying to tell the bots about additional info that is not on your page, you can use the Data Highlighter in your Search Console (Webmaster Tools) for simple things, or JSON-LD on your pages for more complicated things.
Microformats are HTML, used to publish a standard API that is consumed by search engines, browsers, and other websites. Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. They are a way to enable "smart scraping" of web pages, so that you can create tools and scripts that losslessly extract machine-readable information from cleanly formatted, human-readable HTML. Structured data is the name given to content that is marked up in a specific way, for example with microformats, to explain what that content is all about.
It is always recommended to show the Microdata information rather than hide it. You can try to style it so that it looks good. It should also show up in the Google and Bing result pages, but you need to wait a little for that. There is nothing wrong with the markup you applied. SEO simply needs some patience.

Why use Schema.org microdata to mark up web page elements?

I understand why and how to use Schema.org to add microdata to your site, this is not a question about that. The question is why Schema.org has support for certain things that can be marked up with simple HTML5. Among these are
Types
WebPage and WebSite
I can see why WebPage and WebSite would be needed, for example, to reference the page/site of a certain organization in a link, but there's no need to mark up your own page with this—the <html> tag does this.
SiteNavigationElement
Why not just use <nav>?
Table
Just use <table>.
Properties
WebPage/mainContentOfPage
<main> element
WebPage/relatedLink
<link> element inside <head>
This answer is primarily about the WebPageElement types (like SiteNavigationElement).
For WebPage, see my answer to the question Implicity of web page structure in Schema.org (tl;dr: it can be useful to provide WebPage, even for the current page).
For WebSite, similar reasons from the answer above apply. HTML doesn’t allow you to state something about the whole site (and, by the way, a Google rich result makes use of this type).
Schema.org is not restricted to HTML5.
Schema.org is a vocabulary which can be used with various syntaxes (like JSON-LD, Microdata, RDFa, Turtle, …), stand-alone or in various host languages (like HTML 4.01, XHTML 1.0/1.1, (X)HTML5, XML, SVG, …). So having other ways to specify that something is (or: is about; or: represents) a site-wide navigation, a table etc. is the exception rather than the rule.
But there can be reasons to use these types even in HTML5 documents, for example:
The HTML5 markup and the annotations from Microdata/RDFa are two "different worlds": a Microdata/RDFa parser is only interested in the annotations, and after successfully parsing a document, the underlying markup is of no relevance anymore (e.g., the information that something was specified in a table element is lost in the Microdata/RDFa layer).
By using types like WebPageElement, you can specify metadata that is not possible to specify in plain HTML5. For example, the author/license/etc. of a table.
You can use these types to specify data about something which does not exist on the current document, e.g., you could say on your personal website that you are the author of a table in Wikipedia.
That said, these are not typical use cases relevant for a broad range of authors. Unless you have a specific reason for using them, you might want to omit them. They are not useful for typical websites. Using them can even be problematic in some cases.
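For illustration, the metadata case from the list above (the author and license of a table, possibly one published on another site) might be expressed as follows. This is a Python sketch that produces JSON-LD; the URL, the person's name and the license are placeholders, not real data.

```python
import json

# Hypothetical statement: "Jane Doe is the author of the table at this URL."
table_statement = {
    "@context": "https://schema.org",
    "@type": "Table",
    "url": "https://example.org/wiki/Some_article",  # placeholder URL
    "author": {"@type": "Person", "name": "Jane Doe"},  # placeholder name
    "license": "https://creativecommons.org/licenses/by-sa/4.0/",
}

serialized = json.dumps(table_statement, indent=2)
```

Plain HTML has no place for the author or license of a table, which is the gap these types are meant to fill.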
See also my Schema.org issue The purpose of WebPageElement and mainContentOfPage, where I suggested deprecating WebPageElement and the mainContentOfPage property.
Just use <table>.
You seem to be reading the title of the pages and no further. The <table> tag doesn't have the dozens of special properties listed on that page like isFamilyFriendly or license or timeRequired.
Schema.org microdata is intended to build a standard set of additional, semantic metadata that can be used by automated systems - search engine spiders, parser robots, etc. - to better understand the nature and features of the content.

HTML5 and Schema.org, why use both?

Microdata with Schema.org already describes any element better than HTML5 does, so using both seems redundant. For example:
<nav itemscope itemtype="http://schema.org/SiteNavigationElement">
<!-- might as well just be... -->
<div itemscope itemtype="http://schema.org/SiteNavigationElement">
and
<article itemscope itemtype="http://schema.org/NewsArticle">
<!-- might as well just be... -->
<div itemscope itemtype="http://schema.org/NewsArticle">
Some elements create an "outline" for the webpage, but aside from that what's the point? Why not just use divs and forget about the semantic tags, and just use Microdata and Schema.org?
The schema.org definitions are specifically for applications such as search engines (From What is schema.org?):
This site provides a collection of schemas, i.e., html tags, that
webmasters can use to markup their pages in ways recognized by major
search providers. Search engines including Bing, Google, Yahoo! and
Yandex rely on this markup to improve the display of search results,
making it easier for people to find the right web pages.
Your mark-up needs to be understood by browsers and screen-readers as well as search engines (from the schema.org Getting started page):
Usually, HTML tags tell the browser how to display the information
included in the tag. For example, <h1>Avatar</h1> tells the browser to
display the text string "Avatar" in a heading 1 format. However, the
HTML tag doesn't give any information about what that text string
means—"Avatar" could refer to the hugely successful 3D movie, or it
could refer to a type of profile picture—and this can make it more
difficult for search engines to intelligently display relevant content
to a user.
So microdata allows you to add additional semantic meaning to your mark-up (using definitions provided by schema.org) which can be ignored by applications which don't need it, such as browsers, and read by applications which do, such as search engines.
Microdata is not a replacement for using the appropriate semantic-HTML tags where available, it should be used to augment that information. So the simple reason to use nav and article tags along with the microdata is that these tags have meaning to browsers and screen-readers, while the microdata does not.
Actually, your examples are fairly simplistic. I would suggest you have a look at some of the examples on the schema.org getting started page to see how microdata can be used more meaningfully.
To see microdata being used in practice, try googling yourself and inspecting the results. If I search for myself, the first three results (LinkedIn, GitHub and my portfolio page) all display information marked up using microdata, which Google can pull from the pages and present to the user to help provide more meaningful search results.
The vast majority of terms that we have in schema.org have no overlap with HTML terminology, since they represent kinds of real-world things such as places, processes, products etc.
The problem area highlighted here is the small set of terms around http://schema.org/WebPageElement . I am not aware that any current search engine features make specific use of these, and I would suggest that any publishers who do see value in their use should also employ the corresponding pure HTML markup as well.