What is the cite attribute for? - html

The cite attribute specifies the address of the source of the quoted text, I think, but who uses this information?
For example:
<blockquote cite="http://www.example.com/quote">
<p>“A quote”</p>
<footer>—Person quoted</footer>
</blockquote>
The source of the quoted text isn't visible to the end-user in a normal browser, so who does use this information, and how?

First, it's not only blockquote where you can use the cite attribute.
You can use the cite attribute on the following elements also:
<blockquote>
<del>
<ins>
<q>
Why would one use cite in above elements?
To point to the source the quoted content was taken from, or to a document that explains why a change or deletion was made.
Here is what w3.org says,
User agents may allow users to follow such citation links, but they
are primarily intended for private use (e.g: by server-side scripts
collecting statistics about a site's edits), not for readers
Now, the question, who uses it?
The cite attribute is used to identify the online source of the quotation in the form of a URI (for example, "http://sourcewebsite.doc/document.html").
The value of the cite attribute isn't rendered on screen (although this potentially useful meta data could be extracted and written back into the webpage through the magic of DOM Scripting).
As such, browser support for this attribute is marked as none, but because it has other potential uses (for search engine indexing, retrieval via DOM scripting, and more) and there is the likelihood of improved native support being provided for the attribute in future browser versions, you should use the cite attribute when you use the above elements.
So, currently no one uses it, but in the future it may be used by user agents or search engines, so it is better to use it.
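As a rough illustration of the "retrieval via scripting" idea mentioned above, here is a minimal sketch in Python that pulls cite values out of a page. It uses only the standard library's html.parser. The names CiteCollector, collect_citations and the sample markup are made up for this example; this is not how any particular browser or search engine does it.

```python
from html.parser import HTMLParser

# Elements on which HTML defines a cite attribute.
CITE_ELEMENTS = {"blockquote", "q", "ins", "del"}


class CiteCollector(HTMLParser):
    """Collects (tag, cite URL) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.citations = []

    def handle_starttag(self, tag, attrs):
        if tag in CITE_ELEMENTS:
            # A valueless attribute is reported as None, hence the "or".
            cite = (dict(attrs).get("cite") or "").strip()
            if cite:
                self.citations.append((tag, cite))


def collect_citations(html):
    """Return every non-empty cite attribute found in html, in document order."""
    parser = CiteCollector()
    parser.feed(html)
    parser.close()
    return parser.citations


# Hypothetical sample page, for illustration only.
sample = """
<blockquote cite="http://www.example.com/quote">
<p>A quote</p>
</blockquote>
<p>She said <q cite=" http://www.example.com/other ">hello</q>.</p>
<p>No source here: <q>anonymous</q></p>
"""

print(collect_citations(sample))
```

A server-side script could feed the collected URLs into statistics or render them as visible "source" links, which is roughly the kind of private use the spec text describes.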

Both the <cite> tag and the cite attribute are for semantic purposes, which means that they are simply for giving a website more "meaning". For example, I could use a <div> tag for a quote, rather than using a <blockquote> tag, but this provides less meaning to the browser, and hence using <blockquote> is recommended for quotes.
The same goes for the <cite> tag and the cite attribute. As per the MDN definition for the cite attribute:
Use the cite attribute on a <blockquote> or <q> element to reference
an online resource for a source.
"so who does use this information, and how?" - I believe that search engines (e.g. Google) could use this information to show potential links between documents. If you think about it, this is a major point. Consider a Google search for "samsung":
The results page shows a "Samsung Group" information box on the right. The people who work at Google don't write this information; rather, it is sourced from Wikipedia. This information would be of greater relevance to the search "samsung" when it is also quoted on other websites, with the cite attribute linking it back to Wikipedia (hence increasing the relevancy of Wikipedia). I suspect this is part of why Wikipedia's information is used here, and not some primary school's website about Samsung phones.
The cite attribute simply provides more meaning to the website. Tim Berners-Lee has described the semantic web as a component of "Web 3.0" - in other words, many components of the updating HTML language are simply to provide more meaning onto the webpage, as a step closer to Web 3.0.
TL;DR - in simpler terms, the cite attribute is just there to provide more meaning to the web page, and may be used by search engines for better web linkage.

W3C has this to say:
The value of this attribute is a URI that designates a source document or message. This attribute is intended to give information about the source from which the quotation was borrowed.
It's not visible and I can't think of anywhere it's used except perhaps by search engines.

It is meant to be used by machines that collect and arrange data, e.g. search engines, but it can be used by any machine. It makes web pages more systematic for machines to read, since they cannot tell which part of the text is a citation or a quote from context alone.
You can look up Semantic Web for more information.
http://en.wikipedia.org/wiki/Semantic_Web

Yes, the source of the quotation isn't visible to the end user. So it's just a reference to the source.
Definition from WHATWG.ORG:
Content inside a q element must be
quoted from another source, whose
address, if it has one, may be cited
in the cite attribute. The source may
be fictional, as when quoting
characters in a novel or screenplay.
If the cite attribute is present, it
must be a valid URL potentially
surrounded by spaces.

Quoted from W3Schools:
The cite attribute is not supported by any of the major browsers.
However, search engines may use it to get more information about the quotation.
http://www.w3schools.com/tags/att_q_cite.asp

It's just another chunk of metadata that can be used by server-side scripts to collect statistics or by front-end developers to add functionality (they can choose to print the source, provide access to the original source, etc.).
It's just a good practice to have the original source written somewhere although it is actually not very useful for the end user.

Related

Right element/style for words of buttons, options and so on in instructions

I read the difference between <b> and <strong>, <i> and <em> and some other sources, but am still not sure which element to choose when I write instructions like the following:
Go to the page > right-click Download > Save link as. What are right elements for Download and Save link as? Or should I simply use CSS to style them? Then should I use font-weight: bold or font-style: italic? I guess I should use <strong> because they are key words in my sentence, but I'm not sure. Here's a real-world example: Download a file.
In linguistics, italics are often used when we are using a word of the language to talk about the language, not to represent a meaning, as is often the case in ordinary speech. With this in mind, I think that you should mark with italics all the words in your instructions that are not part of the explanation but refer to words that the user will see on the screen. With this in mind I recommend that you add those marks in the HTML, that is, using <em> instead of a CSS class and properties since this practice is more accessible to accessibility tools.
An important part of semantics is context. For example, if your whole article could be replaced with this single line, you should probably use strong. If your article is not about downloading files and this line doesn't carry strong importance, but you still want to draw the reader's attention, you should probably use b.
According to MDN:
strong indicates that its contents have strong importance, seriousness, or urgency.
b is used to draw the reader's attention to the element's contents, which are not otherwise granted special importance.
Source: docs/Web/HTML/Element/strong and docs/Web/HTML/Element/b
In my opinion you should look at the use cases, read the formal definition of each element in web docs (for example MDN) or the web specs, and decide which fits you better. Keep in mind that everyone's case is different. There is no 100% correct answer or algorithm that you could use to determine whether you need strong, b, em, i or something else. What is the topic of the site and the context of the article? In which part of the article is this line placed? Basically, what I am trying to tell you is that you need to know which semantic meaning this text has.
Edit: The SO question you referenced is also a bit outdated (the answer was written in 2008, when HTML5 was not yet widely used). So it's better to consult web docs or web specs, as I mentioned above.

Does Google's data-nosnippet break the convention for the "data-" attribute prefix? Is it the first to do so?

Today, in a blog post entitled More options to help websites preview their content on Google Search, Google announced new behaviour for the Google search engine. The part that interests me is that Googlebot will now interpret the HTML attribute data-nosnippet like this:
A new way to help limit which part of a page is eligible to be shown as a snippet is the "data-nosnippet" HTML attribute on span, div, and section elements. With this, you can prevent that part of an HTML page from being shown within the textual snippet on the [Google search engine results page].
For example:
<p><span data-nosnippet>Harry Houdini</span> is undoubtedly the most famous magician ever to live.</p>
I am surprised that they chose to use an attribute beginning with the prefix data-. This is what the HTML living standard by WHATWG says about data- attributes (emphasis mine):
A custom data attribute is an attribute in no namespace whose name starts with the string "data-" [...]
Custom data attributes are intended to store custom data, state, annotations, and similar, private to the page or application, for which there are no more appropriate attributes or elements.
As a web developer, I always thought that the point of the data- prefix was to give web developers a namespace intended just for their CSS and scripts to manipulate. A custom HTML attribute without the data- prefix is not future-proof: it may suddenly have meaning in browsers of the future or in search engine bots of the future.
It looks like Googlebot is breaking this convention, and is now choosing to look for and interpret the data-nosnippet HTML attribute. As web developers, we can no longer be confident that data- attributes are "private to the page or application". Maybe Google will do this again for another data- attribute in the future!
Is my interpretation correct?
Is Googlebot the first to interpret data- attributes this way, or has the ship sailed and are browsers and bots interpreting data- attributes already?
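To make the described behaviour concrete, here is a minimal sketch in Python of how a crawler could skip text inside elements carrying data-nosnippet when building a snippet. It is only an illustration of the rule quoted above (span, div, and section elements); the names SnippetTextParser and snippet_text are hypothetical, and nothing here reflects how Googlebot is actually implemented.

```python
from html.parser import HTMLParser

# The quoted announcement limits data-nosnippet to these elements.
NOSNIPPET_TAGS = {"span", "div", "section"}


class SnippetTextParser(HTMLParser):
    """Collects page text, skipping anything inside a data-nosnippet element."""

    def __init__(self):
        super().__init__()
        self._stack = []   # one bool per open span/div/section: was it flagged?
        self._hidden = 0   # number of flagged ancestors currently open
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in NOSNIPPET_TAGS:
            flagged = any(name == "data-nosnippet" for name, _ in attrs)
            self._stack.append(flagged)
            self._hidden += flagged

    def handle_endtag(self, tag):
        if tag in NOSNIPPET_TAGS and self._stack:
            self._hidden -= self._stack.pop()

    def handle_data(self, data):
        if self._hidden == 0:
            self.parts.append(data)


def snippet_text(html):
    """Return the whitespace-normalised text that is eligible for a snippet."""
    parser = SnippetTextParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


example = (
    "<p><span data-nosnippet>Harry Houdini</span> is undoubtedly "
    "the most famous magician ever to live.</p>"
)
print(snippet_text(example))
# prints: is undoubtedly the most famous magician ever to live.
```

The sketch assumes well-formed markup in which every span, div and section is closed; a real crawler would need a more forgiving parser.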

Why use Schema.org microdata to mark up web page elements?

I understand why and how to use Schema.org to add microdata to your site; this is not a question about that. The question is why Schema.org has support for certain things that can be marked up with simple HTML5. Among these are
Types
WebPage and WebSite
I can see why WebPage and WebSite would be needed, for example, to reference the page/site of a certain organization in a link, but there's no need to mark up your own page with this—the <html> tag does this.
SiteNavigationElement
Why not just use <nav>?
Table
Just use <table>.
properties
WebPage/mainContentOfPage
<main> element
WebPage/relatedLink
<link> element inside <head>
This answer is primarily about the WebPageElement types (like SiteNavigationElement).
For WebPage, see my answer to the question Implicity of web page structure in Schema.org (tl;dr: it can be useful to provide WebPage, even for the current page).
For WebSite, similar reasons from the answer above apply. HTML doesn’t allow you to state something about the whole site (and, by the way, a Google rich result makes use of this type).
Schema.org is not restricted to HTML5.
Schema.org is a vocabulary which can be used with various syntaxes (like JSON-LD, Microdata, RDFa, Turtle, …), stand-alone or in various host languages (like HTML 4.01, XHTML 1.0/1.1, (X)HTML5, XML, SVG, …). So having other ways to specify that something is (or: is about; or: represents) a site-wide navigation, a table etc. is the exception rather than the rule.
But there can be reasons to use these types even in HTML5 documents, for example:
The HTML5 markup and the annotations from Microdata/RDFa are two "different worlds": a Microdata/RDFa parser is only interested in the annotations, and after successfully parsing a document, the underlying markup is of no relevance anymore (e.g., the information that something was specified in a table element is lost in the Microdata/RDFa layer).
By using types like WebPageElement, you can specify metadata that is not possible to specify in plain HTML5. For example, the author/license/etc. of a table.
You can use these types to specify data about something which does not exist on the current document, e.g., you could say on your personal website that you are the author of a table in Wikipedia.
That said, these are not typical use cases relevant for a broad range of authors. Unless you have a specific reason for using them, you might want to omit them. They are not useful for typical websites. Using them can even be problematic in some cases.
See also my Schema.org issue The purpose of WebPageElement and mainContentOfPage, where I suggested deprecating WebPageElement and the mainContentOfPage property.
Just use <table>.
You seem to be reading the title of the pages and no further. The <table> tag doesn't have the dozens of special properties listed on that page like isFamilyFriendly or license or timeRequired.
Schema.org microdata is intended to build a standard set of additional, semantic metadata that can be used by automated systems - search engine spiders, parser robots, etc. - to better understand the nature and features of the content.

Why use quote tags when quotation marks will do?

I'm new to coding and have a trivial question: Why is it necessary to use the quote tag when it's easier to just write out the quotation marks in the text?
It's All About Semantics
The purpose of HTML is to allow you to add semantic information to the resource. In other words, when you surround a quote with quote tags, you are telling the program that uses this resource what the content means. Programs aren't always just browsers that render the HTML into an image for us to view; they might be screen readers for the visually impaired or a program that reads information from a web page and inserts data into a database (such as the web crawler for a search engine).
Why Semantics?
A similar question to yours would be, "why use a header tag when I could use a tag with a custom style to make the font larger and bold?"
The reason is because by marking text with a header tag (h1, h2, etc.) you are telling the program reading the HTML document that the content has special meaning. The program can then do things with the document besides simply displaying it to the user; if the HTML document has header tags in place, the program could automatically create a table of contents of the document by simply listing out the contents of the header tags (similar to how a Wikipedia article can automatically create a table of contents on the top of the page).
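The table-of-contents idea can be sketched in a few lines of Python. This is a minimal illustration using the standard library's html.parser, not how Wikipedia or any browser builds its contents list; HeadingParser, table_of_contents and the sample markup are names invented for this example.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingParser(HTMLParser):
    """Records (level, text) for every h1..h6 element in document order."""

    def __init__(self):
        super().__init__()
        self._level = None   # level of the heading currently open, if any
        self._text = []
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            title = " ".join("".join(self._text).split())
            self.headings.append((self._level, title))
            self._level = None


def table_of_contents(html):
    """Return one indented line per heading, two spaces per nesting level."""
    parser = HeadingParser()
    parser.feed(html)
    parser.close()
    return ["  " * (level - 1) + title for level, title in parser.headings]


# Hypothetical sample document.
sample = (
    "<h1>Quotes</h1><p>intro</p>"
    "<h2>Why <em>semantics</em>?</h2><p>body</p>"
    "<h2>Conclusion</h2>"
)
for line in table_of_contents(sample):
    print(line)
```

The same text wrapped in styled div elements would give the program nothing to work with, which is the point being made about semantic tags.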
So, everything starts with adding semantic information. As others have pointed out, you can style the content of a quote tag, whereas you cannot style content between two quote characters. This is a by-product of adding semantics, however, and not necessarily the end goal. Of course, you could have styled the quote by surrounding it with <div class="quote">..</div>. However, by doing this you lose the ability to have the browser help you render the quote as you like (see the quotes CSS property), or even have the browser render the quotation marks using the default quotation marks of the user's locale.
Even after this, there is more to the quote element than styling. For example, the quote element lets you attach additional information with a cite attribute.
How could semantic information be used for quotes?
For example, let's say I create an HTML page called "http://example.com/MyThoughts". In that page, I have the following HTML...
<p>The W3C page <cite>About W3C</cite> says the W3C's mission is <q
cite="http://www.w3.org/Consortium/">To lead the World Wide Web to its
full potential by developing protocols and guidelines that ensure
long-term growth for the Web</q>. I disagree with this mission.</p>
Notice the cite attribute on the <q> element here (not to be confused with the <cite> element). If people added quote elements in this way, we could now create a web crawler that goes through the internet looking for pages that have a quote and citation. Then, using that data, we could create a database of documents and their citations. We could create a new site where the output might look like...
Resource: "About W3C"
Location: http://www.w3.org/Consortium/
Documents Citing This Resource:
Resource: "My Thoughts"
Location: http://example.com/MyThoughts
...or equally useful...
Resource: "My Thoughts"
Location: http://example.com/MyThoughts
Documents Cited in this Resource:
Resource: "About W3C"
Location: http://www.w3.org/Consortium/
As you can see, we have created an application that joins together data from other websites without the need for APIs or direct database access. That is the power of adding semantics to your documents.
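A minimal sketch of such an index in Python might look like the following. It skips the crawling itself: the pages dictionary stands in for documents a crawler has already fetched, and QuoteSourceParser and build_citation_index are hypothetical names chosen for this illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser


class QuoteSourceParser(HTMLParser):
    """Collects the cite URLs of all q and blockquote elements."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("q", "blockquote"):
            cite = (dict(attrs).get("cite") or "").strip()
            if cite:
                self.sources.append(cite)


def build_citation_index(pages):
    """pages maps a page URL to its HTML. Returns (cites, cited_by)."""
    cites = {}
    cited_by = defaultdict(list)
    for url, html in pages.items():
        parser = QuoteSourceParser()
        parser.feed(html)
        parser.close()
        cites[url] = parser.sources
        for source in parser.sources:
            cited_by[source].append(url)
    return cites, dict(cited_by)


# Stand-in for already-fetched pages (no network access in this sketch).
pages = {
    "http://example.com/MyThoughts": (
        "<p>The W3C page <cite>About W3C</cite> says the W3C's mission is "
        '<q cite="http://www.w3.org/Consortium/">To lead the World Wide Web '
        "to its full potential</q>. I disagree with this mission.</p>"
    ),
}

cites, cited_by = build_citation_index(pages)
print(cites)     # documents cited in each resource
print(cited_by)  # documents citing each resource
```

The two dictionaries correspond to the two listings above: one keyed by the citing page, the other keyed by the cited source.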
Conclusion
When doing day-to-day development work, the interesting things that can be done by introducing semantics into your HTML documents are typically ignored in favor of ensuring that the site "looks good in a browser". Adding semantic information can still help with things such as styling your quotes correctly for the specific user.
To be sure, if you used a quote character instead of a quote element Tim Berners-Lee is not going to come bust down your door. However, the built-in browser rendering of quotes specific to the user's locale is a nice carrot on the stick.
It certainly is not necessary, but there are a number of reasons why using <q> might be preferred:
The quote element is more semantically explicit than quotation marks
The quote element can be styled (as noted by @Juhana)
The quote element allows for characters other than '"' (the guillemet - « - for example)
The quote element can be used with the cite attribute to explicitly link the quote and the source

Have you ever seen usage of <span> like this?

<span content="2010-01-08 21:35:12" property="dc:date">
What does it mean?
It seems to be XHTML with Dublin Core metadata, a set of metadata field standards.
In HTML, Dublin Core info is used in meta and link elements only, and I cannot find any instance where the data is validly used in a span element. Also, the content attribute is not valid in HTML.
See Expressing Dublin Core in HTML/XHTML meta and link elements.
The case is different with XHTML: As @tomlog points out in his comment, the notation you quote is used in this example on Wikipedia.
Those aren't standard attributes, but they are probably used by some JavaScript on the page that can search based on those properties, or they are akin to comments that the programmer is inserting in the HTML output.
I would say it appears to be meta-information for whatever goes within the span, or it's storing values for JavaScript to use at a later time, or both.
Seeing the "dc" makes me think that there may be more crucial bits that aren't included in your example.
It's a kind of meta data implementation. "dc" stands for Dublin Core which is a meta data implementation standard.
The appropriate software that can read these meta tags will know to look for a span element and then use the property and content attributes to retrieve the relevant information.
property="dc:date" is a Dublin Core metadata term of type date. It makes the data in that span machine-readable using RDFa semantics. Google and other crawlers can read that info and index it appropriately for searching and relating to other documents. You can test a site's metadata here.
The inclusion of the DC tag in a span is very common.
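To show what "machine readable" can mean in practice, here is a minimal Python sketch that reads property/content pairs from markup like the span in the question. It is a deliberate simplification and not a real RDFa processor (it ignores prefix declarations and the case where the value comes from the element text). PropertyParser and extract_properties are hypothetical names, and the timestamp format is assumed from the example value.

```python
from datetime import datetime
from html.parser import HTMLParser


class PropertyParser(HTMLParser):
    """Collects (property, content) pairs from any element that has both."""

    def __init__(self):
        super().__init__()
        self.properties = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        prop = attr_map.get("property")
        content = attr_map.get("content")
        if prop and content is not None:
            self.properties.append((prop, content))


def extract_properties(html):
    parser = PropertyParser()
    parser.feed(html)
    parser.close()
    return parser.properties


# The span from the question, with made-up visible text inside it.
sample = '<span content="2010-01-08 21:35:12" property="dc:date">8 Jan 2010</span>'

for name, value in extract_properties(sample):
    if name == "dc:date":
        # Format assumed from the example value in the question.
        parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        print(name, parsed.isoformat())
```

The visible text can be formatted however the author likes, while the content attribute carries a value that software can parse without guessing.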