Hide Microdata property value in 'content' attribute? - html

I work on a website that recently had Schema.org markup added to it, but I think it is being used wrong.
Schema.org gives the example of
<span itemprop="name">Generic Name Here</span>
Our website implemented it in the following way
<span itemprop="name" content="Generic Name Here"></span>
Is the second way, our way, considered cloaking? We display the data to the user but at a different point and it is not marked up with itemprop.

In HTML5, the content attribute is only allowed on the meta element. Microdata doesn’t define it as a global attribute either, although RDFa does extend HTML to make content a global attribute.
According to your example, you are using Microdata. So you shouldn’t use the content attribute for span.
Microdata defines a way to add name-value pairs without having to mark up visible content: Microdata extends HTML5 to allow meta and link in body (in the future, this will be defined in the HTML5 spec directly; see the "Contexts in which this element can be used" for link and meta in the HTML 5.1 Editor’s Draft).
So instead of
<span itemprop="name" content="Generic Name Here"></span>
you should use
<meta itemprop="name" content="Generic Name Here" />
For schema.org, see Missing/implicit information: use the meta tag with content:
This technique should be used sparingly. Only use meta with content for information that cannot otherwise be marked up.

If you want to stick with Microdata, you need to switch to the meta element, exactly as 'unor' has explained very well. However, if you want to save time, you could go with JSON-LD instead: put everything in a script element in the head and remove the badly written Microdata. JSON-LD uses the same Schema.org vocabulary as Microdata, but the syntax is different.
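As a rough sketch of the JSON-LD route, the name property from the question could be written as a data block like the one below. The question does not say which Schema.org type the item has, so Thing is used here purely as a placeholder assumption; substitute the real type.

```html
<!-- JSON-LD data block: not rendered by browsers, read by structured-data consumers -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Thing",
  "name": "Generic Name Here"
}
</script>
```

Because the data lives in a script element, no content attribute on span is needed at all.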

Technically, it fits the idea of cloaking in the sense that the spiders see something that the users don't, which is why I'm inclined to advise you to avoid such markup. That said, I'm not sure of Google's stance, as such markup isn't in itself indicative of cloaking for SEO.
"Cloaking is a search engine optimization (SEO) technique in which the content presented to the search engine spider is different from that presented to the user's browser."
Source - Wikipedia

Related

Does rel="home" on an anchor tag (<a>) do anything helpful?

I see this a lot on site title links in WordPress themes (probably because Underscores does it and everyone copies that):
<a href="…" rel="home">Some Site Title</a>
I cannot find even a semi-authoritative statement anywhere that rel="home" on an anchor tag is used meaningfully today by any browser, screen reader, or other user agent. The only "official" documentation I've located is this draft specification from 2005 on the microformats.org site.
That doc proposes home as a valid value on both <link> tags in the <head>, as well as <a> tags. Using it on a <link> has some pedigree from HTML v3, and there's reference to it in the wild from 2002. But I haven't seen anything about the <a> tag usage.
So, is including it helpful for anything/anyone? Would I do better to use <link rel="home"> in the <head>, or is that obsolete too in 2020?
The rel="page" was part of an initiative to create permalinks (see section 'Permalink detection') as part of a standard in HTML 4.
However with HTML 5 it now has no purpose and does not offer any accessibility or SEO value. It also might not validate using W3C validator anymore (not tested).
rel="something" should only be used on <link> elements, with the exception of rel="noopener", rel="nofollow" or rel="noreferrer" on anchors (<a> tags).
Note - There may be other rel="" values for hyperlinks, but the three stated are the only ones I can think of. It is no longer valid to use it for page locations, bookmarks, etc.
Update
Thanks to @Sean, who pointed out in the comments that other elements can accept rel="". However, microformats are not the preferred way of adding structured data according to Google, and their development is not as full-fledged as that of https://schema.org and JSON-LD.
“We currently prefer JSON-LD markup. I think most of the new structured data come out for JSON-LD first. So that’s what we prefer.” - John Mueller
I was obviously incorrect in what I said, as it is perfectly valid. However, personally I would not bother, and would stick with what Google prefers apart from the few items I listed.
See @Sean's answer for a bit more info on the subject.
For clarity, rel="" has no bearing on accessibility.
home isn't one of the keywords explicitly defined by the current HTML spec as allowed values for the rel attribute. However, the spec goes on to state that:
Types defined as extensions in the microformats wiki existing-rel-values page with the status "proposed" or "ratified" may be used with the rel attribute on link, a, and area elements in accordance to the "Effect on..." field.
On that microformats page, home has the "proposed" status—so it is valid to use according to the spec.
There's a specific rel-home page within that microformats site that goes into more detail about the usage with examples. It makes the statement—
Opera browser supports rel="home"
—which would imply that Opera has some functionality tied to that usage, but it doesn't provide any additional details.
Summary: rel="home" is valid to use on a elements. Its benefits aren't clear, but it doesn't hurt to use it. The draft spec for it has been around since 2005, so there are bound to be some technologies that make use of it.

Schema.org WebPage markup on 'body' element with 'id' attribute

I would like to add WebPage markup to my code. The instructions say that itemscope="" itemtype="http://schema.org/WebPage" should be put on the body tag, but my body tag already has id="top-page".
When I copy the Microdata line to the body and test it with Google’s tool, it shows me http://xxxx.yy/top-page as #id.
How can I avoid it?
This is most likely a bug in Google’s tool. You don’t have to worry about it. But if you do, there are two workarounds:
Specify itemid on body in addition. This is the correct attribute responsible for providing an ID in Microdata, not id. You should provide the canonical URL of the web page as value.
Specify itemscope itemtype="http://schema.org/WebPage" on a different element (one that doesn’t have an id attribute). While it’s often useful to specify WebPage on the body, this is not required.
The first solution is preferable, as it’s generally a good practice to provide IDs for your structured data items.
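The first workaround might look like the sketch below. The itemid URL is a placeholder (an assumption, not taken from the question); replace it with the page's actual canonical URL.

```html
<!-- id stays as-is for in-page links and CSS; itemid carries the Microdata identifier -->
<body id="top-page"
      itemscope
      itemtype="http://schema.org/WebPage"
      itemid="http://example.com/">
  <!-- page content -->
</body>
```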

In semantic HTML does the class attribute mean anything in the absence of CSS or Javascript?

For example, does the class film_review mean anything in <article class="film_review"> (example from MDN) if there's no CSS or Javascript interacting with the page, or does it provide semantic information?
It doesn't provide any information that contemporary browsers would interpret or use by themselves, without CSS or JavaScript.
However, it can carry semantic information; see, e.g., microformats. For example, you could put an hCard
<div id="hcard-John-Doe" class="vcard">
<span class="fn">John Doe</span>
<div class="org">Cool Institute, Inc.</div>
<div class="adr"><span class="locality">Prague</span></div>
</div>
on your page, and it carries semantic information. A search engine like Google could infer that "John Doe" is the name of a person located in "Prague". There are other microformats that can represent geo information, calendar events, etc.
Anyone can write their own processor of HTML documents that would interpret class attribute values, so the answer is yes, it provides semantic information.
Quoting from hcard microformat example:
Per the HTML4.01 specification, authors should be using the <address> element to indicate the "contact information for a document or a major part of a document." E.g.
<address>
Tantek Çelik</address>
By adding hCard to such existing semantic XHTML, you can explicitly indicate the name of the person, their URL, etc.:
<address class="vcard">
<a class="fn url" href="http://tantek.com/">Tantek Çelik</a>
</address>
It provides semantics purely in the sense that it semantically connects that element with other elements of the same class.
There's no rule which states that anything (specifically CSS and/or JavaScript in this case) must use that class. The class itself is simply part of the markup and is coincidentally being ignored by the current styling rules.
You might have other elements with the film_review class, and they are "semantically" connected in the sense that they represent "film reviews" in the markup. That's really all semantic information is... context about the thing being represented in the code. Well-named classes can provide such additional context.
But there's nothing special that the browser is going to do with this information. It's just there in case anybody (styling, code, or even just somebody looking at the markup) wants to know that this article belongs to a named class of elements.
Semantics in HTML5 are oriented more toward standardizing the most commonly used elements around the web. As described on HTML Semantic Elements:
With HTML4, developers used their own favorite attribute names to style page elements:
header, top, bottom, footer, menu, navigation, main, container, content, article, sidebar, topnav, ...
This made it impossible for search engines to identify the correct web page content.
With HTML5 elements like: <header> <footer> <nav> <section> <article>, this will become easier.
So something as specific as a "Film Review" would not provide that much semantic information at the HTML5 level.
That depends. Who and what else is processing your HTML?
For example, microformats sometimes use classes to add semantic information to elements which don't naturally possess rich semantics. In that case, neither ECMAScript nor CSS process that information, but a microformats parser might. film_review doesn't belong to any well-known microformat, however.
Everything on the page gets parsed (read) by a search engine, so the answer is yes, it does provide semantic information. However, different weights are associated with different HTML tokens (elements, attribute names, attribute values).
How much weight an HTML token gets depends on the type of document you declare (HTML4 or HTML5). The <!DOCTYPE> declaration at the top of your page tells the search-engine bot/parser what type of document it is, which in turn controls the bot's parsing schema (behavior), that is, how it reads your document.
The entire purpose of HTML5 was to provide "semantics": it lets you use different tags to mark up and define your document, giving content more importance and allowing search engines to understand it better. This in turn gives the search engine a much better way to supply the end user who is searching for something with more relevant content for their search term. If you're not using HTML5 and are using HTML4, then the bots rely mostly on HTML attributes to define the content within tags such as <div>, which provides no semantic meaning to the content inside it.

Is Google microformats supposed to be visible on the web page?

I was trying to add microformats as following to my webpage:
<div itemscope itemtype="http://schema.org/Product">
<span itemprop="brand">Company Name</span>
<span itemprop="name">Product Name</span>
<span itemprop="description">Product Description</span>
Product #: <span itemprop="sku">12345</span>
</div>
I thought this microformat would only show up on a Google search results page. But after adding it, the information became visible on my webpage, and it doesn't look good.
Is there something wrong? Or should I use display:none to make it invisible on my webpage?
Microformats are meant to add machine readable meaning to existing content on the page. They're not invisible meta data, they augment content that's already there. So, yes, it'll show up. You can hide or style it via any of the usual ways in which you hide or style content.
You are using Microdata, not Microformats.
Microdata is a syntax to include structured data within HTML5. Ideally you would use your existing content (i.e., add the needed attributes like itemprop etc. to your already existing markup), and only if that’s not possible, the hidden elements meta and link (which are allowed in the body if used for Microdata).
If you don’t want to use your existing markup and the visible content, you could use an alternative syntax: JSON-LD. This gets included as a data block (using the script element), which is not visible by default.
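As a sketch of that alternative, the Product from the question could be expressed in JSON-LD roughly as follows. The property values are copied from the question's Microdata; wrapping the brand in a Brand object is a choice made here for illustration, and a plain text value would mirror the original markup more literally.

```html
<!-- JSON-LD equivalent of the question's Microdata; nothing here is rendered on the page -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "brand": {
    "@type": "Brand",
    "name": "Company Name"
  },
  "name": "Product Name",
  "description": "Product Description",
  "sku": "12345"
}
</script>
```

The block can sit in the head or the body, and the visible page markup stays free of itemprop attributes.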
Don't try to hide your content with styling; it will have a bad impact on your site. You might get penalized for cloaking if you do it on all of your pages.
If you are trying to let the bots know about some more info that is not on your page, you can try using either the Data Highlighter in your Search Console (Webmaster Tools) for simple things, or JSON-LD on your pages for more complicated stuff.
Microformats are HTML, used to publish a standard API that is consumed and used by search engines, browsers, and other web sites. Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. Microformats are a way to enable "smart scraping" of web pages, so that you can create tools and scripts that losslessly extract machine-readable information from cleanly formatted, human-readable HTML. Structured data is the name given to content that is marked up in a specific way, for example using microformats, to explain what that content is all about.
It is always recommended to show the Microdata information and not to hide it. You can try to style it so that it looks good. It will show up in the Google and Bing result pages as well, but you need to wait a little for that. There is nothing wrong with the markup you applied. SEO just needs some more patience.

Have you ever seen usage of <span> like this?

<span content="2010-01-08 21:35:12" property="dc:date">
What does it mean?
It seems to be XHTML with Dublin Core metadata, a set of metadata field standards.
In HTML, Dublin Core info is used in meta and link elements only, and I cannot find any instance where the data is validly used in a span element. Also, the content attribute is not valid on a span in plain HTML.
See Expressing Dublin Core in HTML/XHTML meta and link elements.
The case is different with XHTML: As @tomlog points out in his comment, the notation you quote is used in this example on Wikipedia.
Those aren't standard attributes, but they are probably used by some JavaScript on the page that can search based on those properties, or they are akin to comments that the programmer is inserting in the HTML output.
I would say it appears to be meta-information for whatever goes within the span, or it's storing values for Javascript to use at a later time, or both.
Seeing the "dc" makes me think that there may be more crucial bits that aren't included in your example.
It's a kind of meta data implementation. "dc" stands for Dublin Core which is a meta data implementation standard.
The appropriate software that can read these meta tags will know to look for a span element and then use the property and content attributes to retrieve the relevant information.
property="dc:date" is a Dublin Core metadata term of type date. It makes the data in that span machine-readable using RDFa semantics. Google and other crawlers can read that info and index it appropriately for searching and relating to other documents. You can test a site's metadata here.
The inclusion of the DC tag in a span is very common.
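For context, here is a minimal sketch of how such a span might sit in a page using RDFa. The question shows only the span itself, so the wrapping div, the prefix declaration, and the visible text are assumptions added for illustration; the property and content values are the ones from the question.

```html
<!-- prefix maps "dc:" to a Dublin Core namespace (assumed; the original page may declare it differently) -->
<div prefix="dc: http://purl.org/dc/terms/">
  <!-- content holds the machine-readable value; the element text is what visitors see -->
  <span property="dc:date" content="2010-01-08 21:35:12">8 January 2010, 21:35</span>
</div>
```

An RDFa processor reads the content attribute instead of the element text, which is why the visible date can be formatted freely.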