Have you ever seen <span> used like this?

<span content="2010-01-08 21:35:12" property="dc:date">
What does it mean?

It seems to be XHTML with Dublin Core metadata, a set of metadata field standards.
In HTML, Dublin Core info is used in meta and link elements only, and I cannot find any instance where the data is validly used in a span element. Also, the content attribute is not valid on a span in HTML.
See Expressing Dublin Core in HTML/XHTML meta and link elements.
The case is different with XHTML: As #tomlog points out in his comment, the notation you quote is used in this example on Wikipedia.

Those aren't standard HTML attributes. They are probably used by some JavaScript on the page that can search based on those properties, or they are akin to comments that the programmer is inserting in the HTML output.

I would say it appears to be meta-information for whatever goes within the span, or it's storing values for JavaScript to use at a later time, or both.
Seeing the "dc" makes me think that there may be more crucial bits that aren't included in your example.

It's a kind of metadata implementation. "dc" stands for Dublin Core, which is a metadata standard.
Software that can read this markup looks for elements carrying a property attribute, such as this span, and uses the property and content attributes to retrieve the relevant information.

property="dc:date" is a Dublin Core metadata property of type date. It makes the data in that span machine-readable using RDFa semantics. Google and other crawlers can read that info and index it appropriately for searching and relating to other documents. You can test a site's metadata here.
The inclusion of the DC tag in a span is very common.
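To make "machine-readable" concrete, here is a very simplified Python sketch of what an RDFa-aware consumer does with a span like the one in the question. If an element has a property attribute, its value comes from the content attribute when present, and otherwise from the element's text. This is an illustration built on the standard library's html.parser, not a real RDFa processor: it ignores prefixes, about/resource, typeof and datatypes. The extract_properties helper and the sample sentence are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never get an end tag, so they cannot hold text.
VOID = {"meta", "link", "br", "img", "hr", "input"}


class PropertyExtractor(HTMLParser):
    """Very simplified RDFa-style reader producing (property, value) pairs."""

    def __init__(self):
        super().__init__()
        self.open = []   # one frame per open element: [property or None, text parts]
        self.pairs = []  # (property, value), in the order values are completed

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("property")
        if prop is not None and "content" in a:
            # The content attribute wins; the visible text is not the value.
            self.pairs.append((prop, a["content"]))
            prop = None
        if tag not in VOID:
            self.open.append([prop, []])

    def handle_data(self, data):
        # Text counts toward every enclosing element that still needs a value.
        for frame in self.open:
            if frame[0] is not None:
                frame[1].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.open:
            return
        prop, parts = self.open.pop()
        if prop is not None:
            self.pairs.append((prop, "".join(parts).strip()))


def extract_properties(html):
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


sample = ('<p>Posted <span content="2010-01-08 21:35:12" property="dc:date">'
          '8 January 2010</span> by <span property="dc:creator">Alice</span></p>')
print(extract_properties(sample))
# [('dc:date', '2010-01-08 21:35:12'), ('dc:creator', 'Alice')]
```

Note that the human-readable text "8 January 2010" never reaches the consumer. It only sees the value in content.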


Why use Schema.org microdata to mark up web page elements?

I understand why and how to use Schema.org to add microdata to your site; this is not a question about that. The question is why Schema.org has support for certain things that can be marked up with plain HTML5. Among these are:
Types
WebPage and WebSite
I can see why WebPage and WebSite would be needed, for example, to reference the page/site of a certain organization in a link, but there's no need to mark up your own page with these; the <html> element already does this.
SiteNavigationElement
Why not just use <nav>?
Table
Just use <table>.
Properties
WebPage/mainContentOfPage
<main> element
WebPage/relatedLink
<link> element inside <head>
This answer is primarily about the WebPageElement types (like SiteNavigationElement).
For WebPage, see my answer to the question Implicity of web page structure in Schema.org (tl;dr: it can be useful to provide WebPage, even for the current page).
For WebSite, similar reasons from the answer above apply. HTML doesn’t allow you to state something about the whole site (and, by the way, a Google rich result makes use of this type).
Schema.org is not restricted to HTML5.
Schema.org is a vocabulary which can be used with various syntaxes (like JSON-LD, Microdata, RDFa, Turtle, …), stand-alone or in various host languages (like HTML 4.01, XHTML 1.0/1.1, (X)HTML5, XML, SVG, …). So having other ways to specify that something is (or: is about; or: represents) a site-wide navigation, a table etc. is the exception rather than the rule.
But there can be reasons to use these types even in HTML5 documents, for example:
The HTML5 markup and the annotations from Microdata/RDFa are two "different worlds": a Microdata/RDFa parser is only interested in the annotations, and after successfully parsing a document, the underlying markup is of no relevance anymore (e.g., the information that something was specified in a table element is lost in the Microdata/RDFa layer).
By using types like WebPageElement, you can specify metadata that is not possible to specify in plain HTML5. For example, the author/license/etc. of a table.
You can use these types to specify data about something which does not exist on the current document, e.g., you could say on your personal website that you are the author of a table in Wikipedia.
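Here is a rough Python illustration of the first point. It uses a deliberately simplified Microdata reader; the MicrodataPairs class, the read helper and the sample table are hypothetical, while real parsers follow the full Microdata algorithm. A table and an equivalent div yield exactly the same itemtype and name-value pairs. So the fact that a table element was used does not survive in the Microdata layer, unless you state it with a type such as Table.

```python
from html.parser import HTMLParser

VOID = {"meta", "link", "br", "img", "hr", "input"}


class MicrodataPairs(HTMLParser):
    """Simplified Microdata reader: itemtype values plus (itemprop, text) pairs.

    Nothing here records which HTML element carried the data."""

    def __init__(self):
        super().__init__()
        self.types = []
        self.pairs = []
        self.open = []  # per open non-void element: [itemprop or None, text parts]

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "itemtype" in a:
            self.types.append(a["itemtype"])
        if tag not in VOID:
            self.open.append([a.get("itemprop"), []])

    def handle_data(self, data):
        for frame in self.open:
            if frame[0] is not None:
                frame[1].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.open:
            return
        prop, parts = self.open.pop()
        if prop is not None:
            self.pairs.append((prop, "".join(parts).strip()))


def read(html):
    parser = MicrodataPairs()
    parser.feed(html)
    parser.close()
    return parser.types, parser.pairs


as_table = ('<table itemscope itemtype="http://schema.org/Table">'
            '<caption><span itemprop="name">Planet sizes</span>, compiled by '
            '<span itemprop="author">Jane Doe</span></caption>'
            '<tr><td>Earth</td><td>12742 km</td></tr></table>')
as_div = ('<div itemscope itemtype="http://schema.org/Table">'
          '<p><span itemprop="name">Planet sizes</span>, compiled by '
          '<span itemprop="author">Jane Doe</span></p>'
          '<div>Earth: 12742 km</div></div>')

print(read(as_table))
print(read(as_div) == read(as_table))  # True: the table element left no trace
```

The author property also shows the second point: plain HTML5 has no attribute for "the author of this table", but the Microdata layer can carry it.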
That said, these are not typical use cases relevant for a broad range of authors. Unless you have a specific reason for using them, you might want to omit them. They are not useful for typical websites. Using them can even be problematic in some cases.
See also my Schema.org issue The purpose of WebPageElement and mainContentOfPage, where I suggested deprecating WebPageElement and the mainContentOfPage property.
"Just use <table>."
You seem to be reading the title of the pages and no further. The <table> tag doesn't have the dozens of special properties listed on that page like isFamilyFriendly or license or timeRequired.
Schema.org microdata is intended to build a standard set of additional, semantic metadata that can be used by automated systems - search engine spiders, parser robots, etc. - to better understand the nature and features of the content.

Hide Microdata property value in 'content' attribute?

I work on a website that recently had Schema.org markup added to it, but I think it is being used wrong.
Schema.org gives the example of
<span itemprop="name">Generic Name Here</span>
Our website implemented it in the following way
<span itemprop="name" content="Generic Name Here"></span>
Is the second way, our way, considered cloaking? We display the data to the user but at a different point and it is not marked up with itemprop.
In HTML5, the content attribute is only allowed on the meta element. Microdata doesn’t define it as a global attribute either. But RDFa extends HTML to make content a global attribute.
According to your example, you are using Microdata. So you shouldn’t use the content attribute for span.
Microdata defines a way to add name-value pairs without having to mark up visible content: Microdata extends HTML5 to allow meta and link in body (in the future, this will be defined in the HTML5 spec directly; see the "Contexts in which this element can be used" for link and meta in the HTML 5.1 Editor’s Draft).
So instead of
<span itemprop="name" content="Generic Name Here"></span>
you should use
<meta itemprop="name" content="Generic Name Here" />
For schema.org, see Missing/implicit information: use the meta tag with content:
This technique should be used sparingly. Only use meta with content for information that cannot otherwise be marked up.
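To see why the span version is a problem for Microdata consumers, here is a minimal Python sketch of the relevant value rules. For meta, the value is the content attribute. For an element like span, the value is the element's text, so a content attribute on a span is simply ignored. The ItempropReader class and itemprops helper are hypothetical and cover only these two cases. The Microdata spec also has rules for a, link, img, time, data and others.

```python
from html.parser import HTMLParser

VOID = {"meta", "link", "br", "img", "hr", "input"}


class ItempropReader(HTMLParser):
    """Two Microdata value rules only: meta -> content, other elements -> text."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self.open = []  # per open non-void element: [itemprop or None, text parts]

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("itemprop")
        if tag == "meta":
            if prop is not None:
                self.pairs.append((prop, a.get("content", "")))
            return  # meta is void: no text, no end tag
        if tag not in VOID:
            self.open.append([prop, []])

    def handle_data(self, data):
        for frame in self.open:
            if frame[0] is not None:
                frame[1].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.open:
            return
        prop, parts = self.open.pop()
        if prop is not None:
            # For a span, the value is its text; a content attribute is ignored.
            self.pairs.append((prop, "".join(parts)))


def itemprops(html):
    parser = ItempropReader()
    parser.feed(html)
    parser.close()
    return parser.pairs


print(itemprops('<span itemprop="name" content="Generic Name Here"></span>'))
# [('name', '')]
print(itemprops('<meta itemprop="name" content="Generic Name Here" />'))
# [('name', 'Generic Name Here')]
```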
If you want to stick with Microdata, then you need to switch to the meta element, exactly as 'unor' has written and explained very well. However, if you want to save time, you could use JSON-LD instead: put everything in the head and eliminate the badly written microdata. JSON-LD uses the same Schema.org vocabulary as Microdata, but the syntax is different.
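If you go the JSON-LD route, the markup is a script element of type application/ld+json, usually placed in the head. Here is a small Python sketch that generates one from a dictionary. The json_ld_script helper is hypothetical. The Organization type is also an assumption, since the question doesn't say what kind of thing "Generic Name Here" is.

```python
import json


def json_ld_script(data):
    """Serialize a Schema.org object as a JSON-LD script element for <head>."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # A literal "</" inside the script could close the element early;
    # "<\/" is an equivalent JSON escape for the same characters.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical item: the question doesn't say what type "Generic Name Here" is.
item = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Generic Name Here",
}
print(json_ld_script(item))
```

The visible page can keep showing the name wherever it likes. The structured data lives entirely in this one block instead of hidden spans.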
Technically it resembles cloaking, in the sense that the spiders see something the users don't. That is why I'm inclined to advise you to avoid such markup, although I'm not sure of Google's stance, as such markup isn't necessarily indicative of cloaking for SEO.
"Cloaking is a search engine optimization (SEO) technique in which the content presented to the search engine spider is different from that presented to the user's browser."
Source - Wikipedia

html5: how to markup proper names and common names?

<p>...the favourite color of Purple is purple...</p>
The first "Purple" is the name of a company; the second is a color name.
How should I mark this up according to the HTML5 spec?
You have a number of options:
Leave it as is. HTML isn't really concerned with semantics that aren't about describing document structure (paragraphs, headings, lists etc.). If you do want to express more detailed document or application semantics, look at WAI-ARIA.
If it's important for you to distinguish between the two uses of the word purple as part of your website or app, then use the class attribute or data-* attributes.
If the words have canonical machine-readable forms and you want the values to be parsed by a computer somehow, use the data element.
If distinguishing between the two uses is important to users or systems consuming your site content, use the semantic extensibility feature of HTML5: Microdata. (If you're using the XML dialect of HTML, see also: RDFa)
Combine any of the above approaches according to your immediate needs.
To decide between the approaches you should ask yourself:
For what purpose do I need to extend the semantic vocabulary of HTML?
Is it for my own uses, or am I trying to publish information to be used by others?
If I'm publishing for others, what shared vocabulary am I going to use?
Code examples:
Class attributes
Class attributes are for adding information to your markup; remember, the class attribute is defined in the HTML spec, not the CSS spec:
<p>...the favourite color of <span class="company">Purple</span>
is <span class="color">purple</span>...</p>
Having said that, of course, the obvious thing to do once you have things marked up in this way is to provide in-page tools for things like 'highlight all companies'. People have, however, used the class attribute as the basis for a general-purpose semantic extension mechanism; for this approach taken to the extreme, see microformats.
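Here is a sketch of the kind of in-page tool mentioned above: a Python version of "find all companies" using the standard library's html.parser. In a browser you would do the same thing with the DOM. The ClassCollector class and find_by_class helper are hypothetical. The code matches the class name as one token of the space-separated class list, not as a substring.

```python
from html.parser import HTMLParser

VOID = {"meta", "link", "br", "img", "hr", "input"}


class ClassCollector(HTMLParser):
    """Collects the text of every element whose class list contains `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.found = []
        self.open = []  # per open non-void element: [matches?, text parts]

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        self.open.append([self.wanted in classes, []])

    def handle_data(self, data):
        for matches, parts in self.open:
            if matches:
                parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.open:
            return
        matches, parts = self.open.pop()
        if matches:
            self.found.append("".join(parts))


def find_by_class(html, wanted):
    parser = ClassCollector(wanted)
    parser.feed(html)
    parser.close()
    return parser.found


page = ('<p>...the favourite color of <span class="company">Purple</span>\n'
        'is <span class="color">purple</span>...</p>')
print(find_by_class(page, "company"))  # ['Purple']
print(find_by_class(page, "color"))    # ['purple']
```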
Data attributes
The data-* attributes are to allow you to add custom attributes to your markup for processing with scripts in a way which guarantees you won't accidentally use a custom attribute which then gets used in a future version of HTML:
<p>...the favourite color of <span data-typeofthing="company">Purple</span>
is <span data-typeofthing="color">purple</span>...</p>
It's up to scripts on your page to do something useful with the data-* attributes; browsers and other web clients will ignore them.
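For comparison, here is a Python sketch of what such a script might do with the data-* attributes: group the marked-up words by their data-typeofthing value. The group_by_data_attribute helper is hypothetical. In browser JavaScript, the same attribute is exposed as element.dataset.typeofthing.

```python
from html.parser import HTMLParser

VOID = {"meta", "link", "br", "img", "hr", "input"}


class DataAttributeGrouper(HTMLParser):
    """Groups element text by the value of one data-* attribute."""

    def __init__(self, attribute):
        super().__init__()
        self.attribute = attribute
        self.groups = {}
        self.open = []  # per open non-void element: [attribute value or None, text parts]

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.open.append([dict(attrs).get(self.attribute), []])

    def handle_data(self, data):
        for kind, parts in self.open:
            if kind is not None:
                parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.open:
            return
        kind, parts = self.open.pop()
        if kind is not None:
            self.groups.setdefault(kind, []).append("".join(parts))


def group_by_data_attribute(html, attribute="data-typeofthing"):
    parser = DataAttributeGrouper(attribute)
    parser.feed(html)
    parser.close()
    return parser.groups


page = ('<p>...the favourite color of <span data-typeofthing="company">Purple</span>\n'
        'is <span data-typeofthing="color">purple</span>...</p>')
print(group_by_data_attribute(page))
# {'company': ['Purple'], 'color': ['purple']}
```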
Data elements
Data elements are for things that have an imprecise natural language expression but also a precise machine readable expression. Assuming that the company can be uniquely identified by a ticker symbol and RGB will do for the colour:
<p>...the favourite color of <data value="purp">Purple</data>
is <data value="rgb(128,0,128)">purple</data>...</p>
Browsers probably won't do anything special with the data element. It's most likely you'll use data elements in concert with microformats, RDFa or Microdata.
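Here is a minimal Python sketch of how a consumer might use data elements. It pairs each element's visible text with its value attribute, then interprets the values it understands, in this case the rgb(...) notation. The DataValues class and parse_rgb helper are hypothetical, and the parser assumes data elements are not nested inside one another.

```python
import re
from html.parser import HTMLParser


class DataValues(HTMLParser):
    """Pairs each <data> element's visible text with its value attribute.

    Assumes <data> elements are not nested inside one another."""

    def __init__(self):
        super().__init__()
        self.items = []     # (visible text, machine-readable value)
        self._value = None  # value of the <data> element currently open
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "data":
            self._value = dict(attrs).get("value")
            self._parts = []

    def handle_data(self, data):
        if self._value is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "data" and self._value is not None:
            self.items.append(("".join(self._parts), self._value))
            self._value = None


def parse_rgb(value):
    """Return (r, g, b) for 'rgb(r,g,b)' strings, else None."""
    match = re.fullmatch(r"rgb\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)", value)
    return tuple(int(n) for n in match.groups()) if match else None


page = ('<p>...the favourite color of <data value="purp">Purple</data>\n'
        'is <data value="rgb(128,0,128)">purple</data>...</p>')
parser = DataValues()
parser.feed(page)
parser.close()
for text, value in parser.items:
    print(text, value, parse_rgb(value))
# Purple purp None
# purple rgb(128,0,128) (128, 0, 128)
```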
Microdata
Using the Organization schema:
<p>...the favourite color of
<span itemscope itemtype="http://schema.org/Organization">
<span itemprop="name">Purple</span>
</span>
is purple...</p>
There isn't anything for colours that I'm aware of, but you could always publish your own schema for that if it's important to you. This approach only really benefits anyone if there is a shared vocabulary of some kind.
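Roughly, a Microdata consumer turns the example above into an item with a type and named properties. Here is a minimal Python sketch of that step. The ItemReader class is hypothetical and simplified: it handles only text values, and it does not support an element that carries both itemscope and itemprop (a nested item used as a property value). Such properties are simply dropped.

```python
from html.parser import HTMLParser

VOID = {"meta", "link", "br", "img", "hr", "input"}


class ItemReader(HTMLParser):
    """Minimal Microdata item reader: items with text-valued properties only."""

    def __init__(self):
        super().__init__()
        self.items = []   # finished items
        self.scopes = []  # items still being filled, innermost last
        self.open = []    # per open non-void element: [itemprop or None, text parts, opens item?]

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        a = dict(attrs)
        opens_item = "itemscope" in a
        if opens_item:
            self.scopes.append({"type": a.get("itemtype"), "properties": {}})
        self.open.append([a.get("itemprop"), [], opens_item])

    def handle_data(self, data):
        for prop, parts, _ in self.open:
            if prop is not None:
                parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.open:
            return
        prop, parts, opens_item = self.open.pop()
        if prop is not None and not opens_item and self.scopes:
            # The property belongs to the innermost item that is still open.
            values = self.scopes[-1]["properties"].setdefault(prop, [])
            values.append("".join(parts).strip())
        if opens_item:
            self.items.append(self.scopes.pop())


page = ('<p>...the favourite color of\n'
        '<span itemscope itemtype="http://schema.org/Organization">\n'
        '<span itemprop="name">Purple</span>\n'
        '</span>\n'
        'is purple...</p>')
reader = ItemReader()
reader.feed(page)
reader.close()
print(reader.items)
# [{'type': 'http://schema.org/Organization', 'properties': {'name': ['Purple']}}]
```

The second "purple" produces nothing here, which matches the point above: without a shared vocabulary for colours, there is nothing for a consumer to pick up.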
Which element?
The first task would be to decide which element should be used to enclose the "entities" (company name and color name). Most probably you want to use span here. If in doubt, use span. There are some cases (depends on content) where other elements could be used:
the b element might be used if the entity is some kind of keyword ("a span of text to which attention is being drawn for utilitarian purposes without conveying any extra importance and with no implication of an alternate voice or mood")
if the entity is the title of a work (book, film, song etc.), the cite element should be used.
the dfn element might be used if the entities are defined in the same paragraph or in the nearest ancestor sectioning element
in some cases the i element could be appropriate
If the term is an abbreviation/acronym, use the abbr element (instead of span, or in addition to b/dfn/i).
A good idea might be to use the a element to link the entity name to a relevant webpage. The rel attribute might give additional metadata (you can use the rel values listed in the HTML5 spec, or the registered rel values in the microformats wiki), depending on the content/context.
Which attributes?
Have a look at the global attributes.
The class attribute would be used if you'd like to use microformats. You could of course use other class names, but they would only be useful for yourself (documentation, CSS, JS) or other people that read your markup (documentation, scraping).
For entity names like person/company names you probably want to use the translate attribute with the no keyword, because such names should not be translated.
The title might give additional information (note that it has special semantics for dfn/abbr), but don't rely on it for important information.
Use lang if the entity names are in a foreign language.
How to annotate the content with meaning?
There are three popular choices, which can also be used together (see this answer (the "third step") for a short summary of the differences):
microformats
RDFa
microdata
You may need additional elements; if so, use span.

Do html tags have numbers in spec?

Just wondering: I need a numbered way of identifying certain HTML tags. I could do this in an arbitrary way myself, but would rather use an official table if one exists. So my question is: is there an official numbered list of HTML tags, like some periodic table of them (bad example?), in official or pseudo-official documentation?
The HTML4 spec has an official list of elements
The HTML5 spec also has an official list of elements
And, people have written an unofficial periodic table containing the HTML5 elements, but of course that's a little tongue in cheek
I haven't ever seen anything but the tag name used in the HTML 4 and HTML 5 specs. Occasionally various sections in a language spec will be referenced by number, but for HTML the tag name is a strong enough identifier in nearly any scenario. Why make it ambiguous by abstracting that?
http://www.w3.org/TR/2011/WD-html5-20110525/#auto-toc-4

What is the cite attribute for?

The cite attribute specifies the address of the source of the quoted text, I think, but who uses this information?
For example:
<blockquote cite="http://www.example.com/quote">
<p>“A quote”</p>
<footer>&mdash;Person quoted</footer>
</blockquote>
The source of the quoted text isn't visible to the end-user in a normal browser, so who does use this information, and how?
First, it's not only blockquote where you can use the cite attribute.
You can use the cite attribute on the following elements also:
<blockquote>
<del>
<ins>
<q>
Why would one use cite in the above elements?
To point to the source the content is taken from, or to a document explaining a change or deletion.
Here is what w3.org says:
User agents may allow users to follow such citation links, but they are primarily intended for private use (e.g., by server-side scripts collecting statistics about a site's edits), not for readers.
Now, the question: who uses it?
The cite attribute is used to identify the online source of the quotation in the form of a URI (for example, "http://sourcewebsite.doc/document.html").
The value of the cite attribute isn't rendered on screen (although this potentially useful meta data could be extracted and written back into the webpage through the magic of DOM Scripting).
As such, browser support for this attribute is marked as none, but because it has other potential uses (for search engine indexing, retrieval via DOM scripting, and more) and there is the likelihood of improved native support being provided for the attribute in future browser versions, you should use the cite attribute when you use the above elements.
So currently no one uses it, but in the future it may be used by user agents or search engines, so it's better to use it.
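As an example of the "DOM scripting" use mentioned above, here is a Python sketch using the standard library's html.parser. It lists the cite URLs on the four elements that accept the attribute, and a server-side script or a crawler could start from the same list. The CiteCollector class and the sample URLs are hypothetical.

```python
from html.parser import HTMLParser

# The elements that accept a cite attribute.
CITE_ELEMENTS = {"blockquote", "q", "del", "ins"}


class CiteCollector(HTMLParser):
    """Records (element name, cite value) for every element that has one."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in CITE_ELEMENTS:
            cite = dict(attrs).get("cite")
            if cite:
                self.found.append((tag, cite.strip()))


page = ('<blockquote cite="http://www.example.com/quote"><p>A quote</p></blockquote>'
        '<p>She said <q cite="http://www.example.com/interview">hello</q>.</p>'
        '<p><del cite="/changes/42">old text</del><ins>new text</ins></p>')
collector = CiteCollector()
collector.feed(page)
collector.close()
for element, url in collector.found:
    print(element, url)
# blockquote http://www.example.com/quote
# q http://www.example.com/interview
# del /changes/42
```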
Both the <cite> tag and the cite attribute are for semantic purposes, which means that they are simply for giving a website more "meaning". For example, I could use a <div> tag for a quote, rather than using a <blockquote> tag, but this provides less meaning to the browser, and hence using <blockquote> is recommended for quotes.
The same is true of the <cite> tag and the cite attribute. As per the MDN definition for the cite attribute:
Use the cite attribute on a <blockquote> or <q> element to reference an online resource for a source.
"So who does use this information, and how?" I believe that search engines (e.g. Google) would use this information to show potential links between documents. If you think about it, this is a major point.
For example, a Google search for "samsung" shows a "Samsung Group" information box on the right. The guys who work at Google don't write this information; rather, it is sourced from Wikipedia. However, this information would be of greater relevance to the search "samsung" when it is also written on other websites, with the cite attribute linking it to Wikipedia (hence increasing the relevancy of Wikipedia). This is why Wikipedia's information is used here, and not some primary school's website about Samsung phones.
The cite attribute simply provides more meaning to the website. Tim Berners-Lee has described the semantic web as a component of "Web 3.0"; in other words, many components of the evolving HTML language exist simply to provide more meaning to the webpage, as a step closer to Web 3.0.
TL;DR: in simpler terms, the cite attribute just provides more meaning to the web page, and may be used by search engines for better web linkage.
W3C has this to say:
The value of this attribute is a URI that designates a source document or message. This attribute is intended to give information about the source from which the quotation was borrowed.
It's not visible and I can't think of anywhere it's used except perhaps by search engines.
It is meant to be used by machines that collect and arrange data, e.g. search engines, but it can be used by any machine. It is meant to make webpages more systematic for machines to read, as they cannot tell from context alone which part of a text represents a citation or a quote.
You can look up Semantic Web for more information.
http://en.wikipedia.org/wiki/Semantic_Web
Yes, the source of the quotation isn't visible to end user. So it's just a reference to the source.
Definition from WHATWG.ORG:
Content inside a q element must be quoted from another source, whose address, if it has one, may be cited in the cite attribute. The source may be fictional, as when quoting characters in a novel or screenplay. If the cite attribute is present, it must be a valid URL potentially surrounded by spaces.
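A consumer of the cite attribute therefore has to strip the surrounding spaces and resolve relative URLs against the document's address. Here is a Python sketch of that step using urllib.parse.urljoin, which approximates, but is not identical to, the WHATWG URL parser that browsers use. The resolve_cite helper and the example URLs are hypothetical.

```python
from urllib.parse import urljoin

# ASCII whitespace as the HTML spec defines it (space, tab, LF, CR, FF).
ASCII_WHITESPACE = " \t\n\r\f"


def resolve_cite(cite_value, document_url):
    """Strip surrounding spaces from a cite value and resolve it against the page URL."""
    return urljoin(document_url, cite_value.strip(ASCII_WHITESPACE))


page_url = "https://example.com/reviews/page.html"
print(resolve_cite("  /sources/novel.html ", page_url))
# https://example.com/sources/novel.html
print(resolve_cite("notes.html", page_url))
# https://example.com/reviews/notes.html
print(resolve_cite("http://www.example.com/quote", page_url))
# http://www.example.com/quote
```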
Quoted from W3Schools:
The cite attribute is not supported by any of the major browsers.
However, search engines may use it to get more information about the quotation.
http://www.w3schools.com/tags/att_q_cite.asp
It's just another chunk of metadata that can be used by server-side scripts to collect statistics, or by front-end developers to add functionality (they can choose to print the source, give access to the original source, etc.).
It's just a good practice to have the original source written somewhere although it is actually not very useful for the end user.