Schema.org WebPage markup on 'body' element with 'id' attribute - html

I would like to add WebPage markup to my code. The documentation says that itemscope="" itemtype="http://schema.org/WebPage" should be put on the body tag. But my body tag already has id="top-page".
When I copy the Microdata line to the body and test it with Google’s tool, it shows me http://xxxx.yy/top-page as #id.
How to avoid it?

This is most likely a bug in Google’s tool. You don’t have to worry about it. But if you do, there are two workarounds:
Specify itemid on body in addition. This is the correct attribute responsible for providing an ID in Microdata, not id. You should provide the canonical URL of the web page as value.
Specify itemscope itemtype="http://schema.org/WebPage" on a different element (one that doesn’t have an id attribute). While it’s often useful to specify WebPage on the body, this is not required.
The first solution is preferable, as it’s generally a good practice to provide IDs for your structured data items.
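A minimal sketch of both workarounds, assuming a hypothetical page whose canonical URL is http://example.com/some-page (the URL and the main element are placeholders, not taken from the question):

```html
<!-- Workaround 1: keep id on body, and give the Microdata item its own ID via itemid -->
<body id="top-page"
      itemscope
      itemtype="http://schema.org/WebPage"
      itemid="http://example.com/some-page">
  <!-- page content -->
</body>

<!-- Workaround 2 (an alternative document): put WebPage on an element that has no id -->
<body id="top-page">
  <main itemscope itemtype="http://schema.org/WebPage">
    <!-- page content -->
  </main>
</body>
```

In the first variant the id attribute stays available for CSS, scripts, and fragment links, while itemid carries the identifier for the structured data item.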

In semantic HTML does the class attribute mean anything in the absence of CSS or Javascript?

For example, does the class film_review mean anything in <article class="film_review"> (example from MDN) if there's no CSS or Javascript interacting with the page, or does it provide semantic information?
By itself, it doesn't provide any information that contemporary browsers would interpret or use without CSS or JavaScript.
However, it can carry semantic information; see, e.g., microformats. For example, you could put an hCard
<div id="hcard-John-Doe" class="vcard">
<span class="fn">John Doe</span>
<div class="org">Cool Institute, Inc.</div>
<div class="adr"><span class="locality">Prague</span></div>
</div>
on your page, and it carries semantic information. A search engine like Google could infer that "John Doe" is the name of a person located in "Prague". There are other microformats that can represent geo information, calendar events, etc.
Anyone can write their own processor of HTML documents that would interpret class attribute values, so the answer is yes, it provides semantic information.
Quoting from hcard microformat example:
Per the HTML4.01 specification, authors should be using the <address> element to indicate the "contact information for a document or a major part of a document." E.g.
<address>
Tantek Çelik</address>
By adding hCard to such existing semantic XHTML, you can explicitly indicate the name of the person, their URL, etc.:
<address class="vcard">
<a class="fn url" href="http://tantek.com/">Tantek Çelik</a>
</address>
It provides semantics purely in the sense that it semantically connects that element with other elements of the same class.
There's no rule which states that anything (specifically CSS and/or JavaScript in this case) must use that class. The class itself is simply part of the markup and is coincidentally being ignored by the current styling rules.
You might have other elements with the film_review class, and they are "semantically" connected in the sense that they represent "film reviews" in the markup. That's really all semantic information is... context about the thing being represented in the code. Well-named classes can provide such additional context.
But there's nothing special that the browser is going to do with this information. It's just there in case anybody (styling, code, or even just somebody looking at the markup) wants to know that this article belongs to a named class of elements.
Semantics in HTML5 are oriented more toward standardizing the most commonly used elements on the web. As described in HTML Semantic Elements:
With HTML4, developers used their own favorite attribute names to style page elements:
header, top, bottom, footer, menu, navigation, main, container, content, article, sidebar, topnav, ...
This made it impossible for search engines to identify the correct web page content.
With HTML5 elements like: <header> <footer> <nav> <section> <article>, this will become easier.
So an element as specific as a "Film Review" would not provide much semantic information at the HTML5 level.
That depends. Who and what else is processing your HTML?
For example, microformats sometimes use classes to add semantic information to elements which don't naturally possess rich semantics. In that case, neither ECMAScript nor CSS process that information, but a microformats parser might. film_review doesn't belong to any well-known microformat, however.
Everything on the page gets parsed (read) by a search engine, so the answer is yes, it does provide semantic information. However, different HTML tokens (elements, attribute names, attribute values) are given different weights.
How much weight an HTML token gets depends on the type of document you declare (HTML4 or HTML5). The <!DOCTYPE> declaration at the top of your page tells the search engine bot/parser what type of document it is, which in turn controls the bot's parsing schema (behavior) for reading your document.
A major purpose of HTML5 was to provide "semantics": it lets you use different tags to mark up and define your document, giving content more importance and allowing search engines to understand it better. This gives the search engine a much better way to supply the end user, who is searching for something, with content more relevant to their search term. If you're using HTML4 rather than HTML5, the bots rely mostly on HTML attributes to define the content within tags such as <div>, which provides no semantic meaning for the content inside it.

Why use Schema.org microdata to mark up web page elements?

I understand why and how to use Schema.org to add microdata to your site, this is not a question about that. The question is why Schema.org has support for certain things that can be marked up with simple HTML5. Among these are
Types:
WebPage and WebSite: I can see why WebPage and WebSite would be needed, for example, to reference the page/site of a certain organization in a link, but there's no need to mark up your own page with this; the <html> tag does this.
SiteNavigationElement: Why not just use <nav>?
Table: Just use <table>.
Properties:
WebPage/mainContentOfPage: the <main> element
WebPage/relatedLink: the <link> element inside <head>
This answer is primarily about the WebPageElement types (like SiteNavigationElement).
For WebPage, see my answer to the question Implicity of web page structure in Schema.org (tl;dr: it can be useful to provide WebPage, even for the current page).
For WebSite, similar reasons from the answer above apply. HTML doesn’t allow you to state something about the whole site (and, by the way, a Google rich result makes use of this type).
Schema.org is not restricted to HTML5.
Schema.org is a vocabulary which can be used with various syntaxes (like JSON-LD, Microdata, RDFa, Turtle, …), stand-alone or in various host languages (like HTML 4.01, XHTML 1.0/1.1, (X)HTML5, XML, SVG, …). So having other ways to specify that something is (or: is about; or: represents) a site-wide navigation, a table etc. is the exception rather than the rule.
But there can be reasons to use these types even in HTML5 documents, for example:
The HTML5 markup and the annotations from Microdata/RDFa are two "different worlds": a Microdata/RDFa parser is only interested in the annotations, and after successfully parsing a document, the underlying markup is of no relevance anymore (e.g., the information that something was specified in a table element is lost in the Microdata/RDFa layer).
By using types like WebPageElement, you can specify metadata that is not possible to specify in plain HTML5. For example, the author/license/etc. of a table.
You can use these types to specify data about something which does not exist on the current document, e.g., you could say on your personal website that you are the author of a table in Wikipedia.
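As a rough illustration of the metadata case above, here is a sketch of a table annotated with Microdata. The table content, the author name, and the license URL are made up for the example; only the Table type and the name, author, and license properties come from Schema.org:

```html
<table itemscope itemtype="http://schema.org/Table">
  <caption>
    <span itemprop="name">Monthly totals</span>,
    compiled by
    <!-- author is given as a nested Person item (hypothetical person) -->
    <span itemprop="author" itemscope itemtype="http://schema.org/Person">
      <span itemprop="name">Jane Doe</span>
    </span>
    (<a itemprop="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a>)
  </caption>
  <tr><th>Month</th><th>Total</th></tr>
  <tr><td>January</td><td>42</td></tr>
</table>
```

Plain HTML5 has no way to attach an author or a license to a specific table element, which is the gap this kind of markup fills.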
That said, these are not typical use cases relevant for a broad range of authors. Unless you have a specific reason for using them, you might want to omit them. They are not useful for typical websites. Using them can even be problematic in some cases.
See also my Schema.org issue The purpose of WebPageElement and mainContentOfPage, where I suggested deprecating WebPageElement and the mainContentOfPage property.
Just use <table>.
You seem to be reading the title of the pages and no further. The <table> tag doesn't have the dozens of special properties listed on that page like isFamilyFriendly or license or timeRequired.
Schema.org microdata is intended to build a standard set of additional, semantic metadata that can be used by automated systems - search engine spiders, parser robots, etc. - to better understand the nature and features of the content.

HTML-tag to annotate the origin of a section?

According to some, Google doesn't like it when you use the same content across multiple sites.
Is there any way to annotate/tag a block of content with its "source"?
Something like an attribute:
<div original-content="http://some.url">
The purpose is solely to let Google know that we have duplicated the content (i.e., not as part of a search ranking strategy). Search engines could then use this information somehow.
This might help you out:
http://searchengineland.com/google-creates-metatags-to-help-id-original-news-sources-56115
Looks like the meta tag you want is
<meta name="original-source" content="[url]">
However it looks like that is only for an entire page.
Use the canonical tag, which tells the search engine crawler that the text is duplicated from the original website.
Example:
Place this in the head of your HTML page (on the page with the duplicated content)
<link rel="canonical" href="http://www.original-website.com" />
Reference: Canonical URL Tag - The Most Important Advancement in SEO Practices Since Sitemaps
No, HTML has no such element or attribute.
If you quote the content (in a q or blockquote element), you could use the cite attribute. But you must not use these elements for anything other than quotes.
If the whole document is duplicated (or is a subset), you could use the canonical link type. But you must not use this if only part of the document is duplicated while the other parts are different.
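A short sketch of the quoting case, reusing the placeholder URL from the question (the quoted text is invented for illustration):

```html
<!-- cite points at the source the quotation was taken from -->
<blockquote cite="http://some.url">
  <p>This paragraph is quoted verbatim from the original page.</p>
</blockquote>
```

The cite attribute is not shown to users by browsers, so it is common to also include a visible link to the source next to the quote.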

Hide Microdata property value in 'content' attribute?

I work on a website that recently had Schema.org markup added to it, but I think it is being used wrong.
Schema.org gives the example of
<span itemprop="name">Generic Name Here</span>
Our website implemented it in the following way
<span itemprop="name" content="Generic Name Here"></span>
Is the second way, our way, considered cloaking? We display the data to the user but at a different point and it is not marked up with itemprop.
In HTML5, the content attribute is only allowed on the meta element. Microdata doesn’t define it as global attribute either. But RDFa extends HTML to make content a global attribute.
According to your example, you are using Microdata. So you shouldn’t use the content attribute for span.
Microdata defines a way to add name-value pairs without having to mark up visible content: Microdata extends HTML5 to allow meta and link in body (in the future, this will be defined in the HTML5 spec directly; see the "Contexts in which this element can be used" for link and meta in the HTML 5.1 Editor’s Draft).
So instead of
<span itemprop="name" content="Generic Name Here"></span>
you should use
<meta itemprop="name" content="Generic Name Here" />
For schema.org, see Missing/implicit information: use the meta tag with content:
This technique should be used sparingly. Only use meta with content for information that cannot otherwise be marked up.
If you want to stick with Microdata, then you need to switch to the meta tag, exactly as 'unor' has written and explained very well. However, you could go with JSON-LD and put everything in the head, eliminating the badly written Microdata, if you want to save time. JSON-LD uses the same Schema.org vocabulary as Microdata; only the syntax is different.
Technically, it matches the idea of cloaking in the sense that the spiders see something that the users don't. That is why I'm inclined to advise you to avoid such markup, but I'm not sure of Google's stance, as such markup isn't indicative of cloaking for SEO.
"Cloaking is a search engine optimization (SEO) technique in which the content presented to the search engine spider is different from that presented to the user's browser."
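For comparison, a minimal sketch of what the JSON-LD alternative could look like. The question does not say which Schema.org type the item has, so the generic Thing type here is an assumption:

```html
<head>
  <!-- JSON-LD block; "Thing" is a placeholder type, replace it with the item's real type -->
  <script type="application/ld+json">
  {
    "@context": "http://schema.org",
    "@type": "Thing",
    "name": "Generic Name Here"
  }
  </script>
</head>
```

Because the data lives in a script element, it does not need to be interleaved with the visible markup at all.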
Source - Wikipedia

The difference between two different HTML hyperlinks? (link & html tags)

I've been googling the internet and still can't seem to find an answer. I was wondering what the difference is between using something like:
<link rel="profile" href="http://gmpg.org/xfn/11" />
and
<html xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml">
I'm using an HTML5 doctype and would like to keep everything clean. Am I wrong in thinking that these are somehow similar? Thanks!
These two types of links have almost nothing in common, other than using HTTP URIs.
The profile link element links to another resource (often a web page), which should be relevant to the current page. Some browsers might show this link somehow in the user interface, or interpret it otherwise. Search engines might use it, too.
For some rel values (like rel="stylesheet"), there are definitions on how to interpret them in the relevant standards, others are only used by human readers.
The xmlns:... links define an XML namespace prefix (og or fb) for the current document, with a URI used simply as an identifier for the namespace. This means that you can now use elements in these namespaces, in addition to the normal HTML elements (by prefixing their names with og: or fb:).
The document at that URI will not be retrieved. The elements will either be already known by the XML processor reading the file, or simply ignored (if this is a simple browser interpreting this as HTML).
This is structural metadata about the current document (or element, in fact, as they are allowed on non-root elements, too, and only apply to the element they are on and its enclosed elements).
For your next question in the comment:
The Dublin Core metadata is information about the content of the current document. I can see no reason to use links (or URIs) here, so in fact neither of them fits. If you would put the metadata in a separate document, you could link to them (using a link element), but normally you would use a meta element with a name from the Dublin Core standard. (Inside the head element, of course.)
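A sketch of the meta element approach, assuming the common DC.-prefixed naming convention for Dublin Core in HTML; the values are invented for the example:

```html
<head>
  <title>Example page</title>
  <!-- Dublin Core metadata about this document (placeholder values) -->
  <meta name="DC.title" content="Example page">
  <meta name="DC.creator" content="Jane Doe">
  <meta name="DC.date" content="2012-01-31">
</head>
```

The DCMI guidelines for this encoding also describe declaring the DC prefix with a link element; that declaration is left out of this sketch.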
xmlns: is an XML attribute. HTML5 in its HTML (non-XML) syntax is not XML, so this attribute has no effect in your document.