When is it appropriate to use semantic elements? - html

I do not really understand how (often) I should use semantic elements like time, header, footer.
Do article, nav, figure, time, etc. just replace div id="post", div id="navbar", div id="illustration", span id="time", so that I should use them only when I need to wrap some content for styling purposes, or are they something more than that?

In general: use them as often as you can.
If you are developing an intranet application, most of the time no one will care about it.
The benefit of semantic markup shows on the public internet:
A search engine wants to know the semantics so that it can better understand your page.
A screen reader can say "This is a blockquote", "This is a navigation", "This is a footer".
Semantics are not for styling a page; semantics are for understanding a page. Blind people don't see your CSS, for example, but a well-structured website works better with text-to-speech and helps blind people.
Also take a look at https://schema.org/
And what about the following example : there is a story with some
dates on a webpage. Do I have to put these dates inside time tags?
Yes. Take a look at the Mozilla documentation here: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/time
According to it, you can do this:
<time datetime="2001-05-15 19:00">May 15</time>
You can even omit the time of day (which is optional according to the documentation):
<time datetime="2001-05-15">May 15</time>

A <footer> typically contains the author of the document, copyright information, links to terms of use, contact information, etc. The same applies to a <header>. You can use multiple <footer> and <header> elements in your HTML file.
The <time> element can be used as a way to encode dates and times in a machine-readable way so that, for example, user agents can offer to add birthday reminders or scheduled events to the user's calendar, and search engines can produce smarter search results.
An example:
<p>We open at <time>10:00</time> every morning.</p>
<p>I have a date on <time datetime="2008-02-14">Valentines day</time>.</p>
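To illustrate what "machine-readable" means here, the following is a minimal Python sketch (not taken from any of the linked pages) of how a consuming program might read <time> elements: it prefers the datetime attribute and falls back to the element's text. The names TimeReader and read_times are made up for this illustration; only the standard html.parser module is used.

```python
from html.parser import HTMLParser


class TimeReader(HTMLParser):
    """Collect the machine-readable value of every <time> element."""

    def __init__(self):
        super().__init__()
        self.values = []      # one entry per <time> element, in document order
        self.in_time = False  # True while inside a <time> element
        self.attr = None      # value of the datetime attribute, if any
        self.text = []        # text content seen inside the current <time>

    def handle_starttag(self, tag, attrs):
        if tag == "time":
            self.in_time = True
            self.attr = dict(attrs).get("datetime")
            self.text = []

    def handle_data(self, data):
        if self.in_time:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "time" and self.in_time:
            # Prefer the datetime attribute; otherwise use the visible text.
            self.values.append(self.attr or "".join(self.text).strip())
            self.in_time = False


def read_times(html):
    """Return the date/time strings found in the <time> elements of html."""
    parser = TimeReader()
    parser.feed(html)
    parser.close()
    return parser.values


if __name__ == "__main__":
    page = (
        "<p>We open at <time>10:00</time> every morning.</p>"
        '<p>I have a date on <time datetime="2008-02-14">Valentines day</time>.</p>'
    )
    print(read_times(page))
```

Without the datetime attribute, the second value would be the text "Valentines day", which a program cannot reliably turn into a date. That is the practical reason for the attribute.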
Here is a link to the W3Schools page:
Footer page
Time page
I hope this answers your question :)

Related

<s> vs <del> in HTML

So I'm writing a list of todos in HTML. Some of these todos are, well, done.
<h1>TODO</h1>
<ul>
<li>I'm still to be done</li>
<li>I'm done</li>
</ul>
Now I'm wondering what the best way is to mark up and style these items. When it's done, I could mark up each item with <s>, which seems much more acceptable these days, as it's 'no longer relevant':
<li><s>I'm done</s></li>
I could go for <del> as, in some sense, the user has edited the list and set this item for removal (kinda):
<li><del>I'm done</del></li>
I could add a class to say what this item means:
<li class="todo done">I'm done</li>
Or some combination of the three. Or something else entirely.
My concerns are accessibility and semantics - I want the markup to convey the meaning of a 'done' item.
What's the best way of doing this?
Both answers are great. s and del are indeed semantic tags so it's good for accessibility. Unfortunately, no browsers surface those tags in the accessibility tree so screen readers cannot convey any information regarding the tags. But you can work around it with CSS. There's a simple blog that talks about the <mark> element, which is also semantic but does not convey info to screen readers but the blog gives a workaround. You can do something similar with s or del. That is, use either s or del but also use CSS to augment the tag for screen reader users.
A class has no semantic meaning, so if accessibility and semantics are important to you, then you have to use del or s.
Whether you should use s or del is not easy to say. For s the spec has this example:
<p>Buy our Iced Tea and Lemonade!</p>
<p><s>Recommended retail price: $3.99 per bottle</s></p>
<p><strong>Now selling for just $2.99 a bottle!</strong></p>
So you want to show the reader the old information, but also tell them that the information is not relevant anymore.
Your TODO example is covered in the spec, in section 4.6.2 The del element:
<h1>To Do</h1>
<ul>
<li>Empty the dishwasher</li>
<li><del datetime="2009-10-11T01:25-07:00">Watch Walter Lewin's lectures</del></li>
<li><del datetime="2009-10-10T23:38-07:00">Download more tracks</del></li>
<li>Buy a printer</li>
</ul>
I think the main difference is that you have more possibilities to add semantic information to del than to s. So del is the better fit when the fact that something was deleted is really important, e.g. for tracking changes (a diff tool), a TODO list, or marking that a part of a specification was removed. And s is more of an informal piece of additional information.
In plain HTML, class values don’t convey any meaning. You can make use of classes in addition to making use of semantic elements (classes can be useful for CSS, JavaScript, documentation purposes etc.), but you should not use classes instead of semantic elements.
With del and s, you found the two relevant elements that can make sense in this context. Which one to use? It, most likely, doesn’t make a practical difference.
The semantic differences are subtle:
With del, you convey that the content was removed from the document (semantically, it doesn’t matter if the content is still visible or if you visually hide it with CSS). It represents the actual edit to the document.
With s, you convey that the content is no longer relevant.
I guess the purpose why you show the done items can help in making the choice which element to use:
If the todo list could work as well without showing the done items (so showing them has the purpose of tracking changes, or detecting errors), go for del. In theory, a user agent could offer viewing the list at a specific point in time (making use of the datetime attribute), and a default view could only show the actual current content (i.e., without any content in del).
If it’s relevant for the meaning of the todo list to show what has already been done, go for s.
If there is a relevant difference in your case between removing an item (e.g., because it was added by accident, or didn’t make sense etc.) and marking an item as done, then you might want to use del for the former and s for the latter. You can ignore this if there is no relevant difference (e.g., if you would not keep showing items from the former case anyway).
(Side note: If using del, it would make sense to also use ins for adding new todo items.)

What is the purpose of the blockquote attribute ''cite'' in html?

I cannot comprehend this.
Using <cite> text </cite> separately like this just makes the text appear a little bit italic, but I cannot understand the purpose of cite being used as an attribute in blockquote.
For example:
<blockquote cite="http://www.example.com">
For 50 years, WWF has been protecting the future of nature.
</blockquote>
Now, where does this URL appear? Everywhere I look it just says "it's for reference", but reference where?
The link is not showing on the output unless I use href and <p> to make it appear.
So what exactly does the cite attribute do in this case? Where does this URL appear?
As per https://www.w3.org/TR/2011/WD-html5-20110525/text-level-semantics.html#attr-q-cite
Content inside a q element must be quoted from another source, whose address, if it has one, may be cited in the cite attribute. The source may be fictional, as when quoting characters in a novel or screenplay.
If the cite attribute is present, it must be a valid URL potentially surrounded by spaces. To obtain the corresponding citation link, the value of the attribute must be resolved relative to the element. User agents should allow users to follow such citation links.
<p>
... or better said by Frank,
<q cite="https://www.goodreads.com/author/show/22302.Frank_Zappa">
So many books, so little time.
</q>
</p>
Since it's not rendered as a link (not something a human can follow), it's clearly for SEO purposes, but mostly for indexing. So if you take a quotation from another resource, like a page on another website, a cite attribute pointing to the page you've taken that quote from helps search engines index the relationship between the two resources.
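As a rough illustration of how a crawler or other program could pick up these citation links even though browsers do not display them, here is a small Python sketch using only the standard library. The class CiteCollector, the function citation_links, and the base URL in the usage example are hypothetical; how real search engines treat the attribute is not something this sketch claims to reproduce.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CiteCollector(HTMLParser):
    """Collect the cite URLs of <blockquote> and <q> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # address of the page being read
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ("blockquote", "q"):
            cite = dict(attrs).get("cite")
            if cite:
                # The value may be relative and surrounded by spaces,
                # so strip it and resolve it against the page address.
                self.links.append(urljoin(self.base_url, cite.strip()))


def citation_links(html, base_url):
    """Return the absolute citation URLs found in html."""
    parser = CiteCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = (
        '<blockquote cite="http://www.example.com">'
        "For 50 years, WWF has been protecting the future of nature."
        "</blockquote>"
    )
    # The base URL below is a placeholder for the quoting page's address.
    print(citation_links(page, "https://blog.example.org/post"))
```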

HTML5 semantic element for Tip/Warning/Error pullouts?

I am using XML-safe HTML5 and would like to know the semantic way of documenting extra tip/warning/error boxes (like those often found in technical manuals):
<div class="info-tip" role="contentinfo">
<p><strong>Tip:</strong> Holding the control key when doing this will make life easier.</p>
</div>
Except if possible I would like to use a more appropriate element. I am not even sure if contentinfo is an appropriate choice here.
ADDED: I am after an HTML5 alternative to the <note> element in DITA.
A little context: I will be using stylesheets (both XSLT2 and CSS) to re-format the content for a number of outputs.
The semantically closest one seems to be the <details> element - usage

Correct use of the <small> tag, or how to markup "less important" text

Yet another tag that was given new meaning in HTML5, <small> apparently lives on:
http://www.w3.org/TR/html-markup/small.html#small
The small element represents so-called “fine print” or “small print”,
such as legal disclaimers and caveats.
This unofficial reference seems to take it a little further:
http://html5doctor.com/small-hr-element/
<small> is now for side comments, which are the inline equivalent of
<aside> — content which is not the main focus of the page. A common
example is inline legalese, such as a copyright statement in a page
footer, a disclaimer, or licensing information. It can also be used
for attribution.
I have a list of people I want to display, which includes their real name and nickname. The nickname is sort of an "aside", and I want to style it with lighter text:
<li>Laurence Tureaud <small>(Mr.T)</small></li>
I'll need to do something like this for several sections of the site (people, products, locations), so I'm trying to develop a sensible standard. I know I can use <span class="quiet"> or something like that, but I'm trying to avoid arbitrary class names and use the correct HTML element (if there is one).
Is <small> appropriate for this, or is there another element or markup structure that would be appropriate?
The spec you're looking at is old, you should look at the HTML5 spec:
https://html.spec.whatwg.org/multipage/
I suggest <em> here instead of small:
<p>Laurence Tureaud also called <em>Mr. T</em> is famous for his role
in the tv series A-TEAM.</p>
<small> is not used commonly in an article sentence, but like this:
<footer>
<p>
Search articles about Laurence Tureaud,
<small>or try articles about A-TEAM.</small>
</p>
</footer>
<footer>
<p>
Call the Laurence Tureaud's "life trainer chat line" at
555-1122334455 <small>($1.99 for 1 minute)</small>
</p>
</footer>
Article sentence:
<p>
My job is very interesting and I love it: I work in an office
<small>(123 St. Rome, Italy)</small> with a lot of funny guys that share
my exact interests.
</p>
Personally I would think <small> would not be the correct tag for this, as it suggests the text will be physically smaller, which doesn't seem to be the case with your example. I think using a <span> would be more appropriate, or possibly the HTML <aside>. http://dev.w3.org/html5//spec-author-view/the-aside-element.html
You should ask yourself how you would prefer the document to be displayed when style sheets are not applied. Select the markup according to this, instead of scholarly or scholastic theories about “semantic markup” (see my pragmatic guide to HTML).
If smaller size is what you want, then use <small> or <font size=2>. The former is more concise and easier to style, and it is more “resistant” (on some browsers, settings that tell the browser to ignore font sizes specified on web pages do not remove the effect of small). So it’s a rather simple choice.
On the other hand, font size variation inside a line of text is typographically questionable. In printed matter, it is much more often accidental, an error, rather than intentional. Putting something in parentheses is normally a sufficient indication of being somehow secondary.

Any ideas on how to identify the main content of the page?

If you had to identify the main text of a page (e.g. on a blog page, to identify the post's content), what would you do? What do you think is the simplest way to do it?
Get the page content with cURL
Maybe use a DOM parser to identify the elements of the page
That's a pretty hard task, but I would start by counting spaces inside of DOM elements. A telltale sign of human-readable content is spaces and periods. Most articles seem to encapsulate the content in paragraph tags, so you could look at all p tags with at least n spaces and at least one punctuation mark.
You could also use the number of paragraph tags grouped inside an element. So if a div has N paragraph children, it could very well be the content you want to extract.
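The heuristic above could be sketched roughly as follows in Python, using only the standard html.parser module. This is an illustration under assumptions, not a tested extractor: it assumes reasonably well-formed markup (every non-void element is closed), and the names ParagraphCounter and likely_content_container as well as the default threshold of 5 spaces are invented for the example.

```python
import re
from html.parser import HTMLParser

# Elements without a closing tag; they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ParagraphCounter(HTMLParser):
    """Count sentence-like <p> children for each parent element."""

    def __init__(self, min_spaces=5):
        super().__init__()
        self.min_spaces = min_spaces
        self.stack = []     # labels of the currently open elements
        self.p_text = None  # text pieces while inside a <p>, else None
        self.counts = {}    # parent label -> number of qualifying <p>
        self.seen = 0       # running number that keeps labels unique

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.seen += 1
        element_id = dict(attrs).get("id")
        label = f"{tag}#{element_id}" if element_id else f"{tag}[{self.seen}]"
        self.stack.append(label)
        if tag == "p":
            self.p_text = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        self.stack.pop()
        if tag == "p" and self.p_text is not None:
            text = " ".join(self.p_text)
            # "Human-readable" here means: enough spaces and some punctuation.
            if text.count(" ") >= self.min_spaces and re.search(r"[.!?]", text):
                parent = self.stack[-1] if self.stack else "(root)"
                self.counts[parent] = self.counts.get(parent, 0) + 1
            self.p_text = None

    def handle_data(self, data):
        if self.p_text is not None and data.strip():
            self.p_text.append(data.strip())


def likely_content_container(html, min_spaces=5):
    """Return the label of the element with the most sentence-like <p> children."""
    parser = ParagraphCounter(min_spaces)
    parser.feed(html)
    parser.close()
    if not parser.counts:
        return None
    return max(parser.counts, key=parser.counts.get)
```

Short paragraphs such as menu entries fail the space and punctuation test, so a navigation block built from <p> tags does not outvote the real article body.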
There are some frameworks that can achieve this; one of them is http://code.google.com/p/boilerpipe/, which uses some statistics.
Some features that can help detect the HTML block with the main content:
p, div tags
amount of text inside/outside
amount of links inside/outside (i.e. to remove menus)
some CSS class names and ids (frequently those blocks have classes or ids such as main, main_block, content, etc.)
relation between title and text inside content
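One of the features listed above, the amount of link text relative to all text, is easy to sketch. The following Python snippet is only an illustration and is not how Boilerpipe itself computes its features; the names LinkDensity and link_density are invented for the example.

```python
from html.parser import HTMLParser


class LinkDensity(HTMLParser):
    """Measure how much of a fragment's text sits inside <a> elements."""

    def __init__(self):
        super().__init__()
        self.in_link = 0      # nesting level of open <a> elements
        self.link_chars = 0   # characters of text inside links
        self.total_chars = 0  # characters of all text

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.in_link -= 1

    def handle_data(self, data):
        length = len(data.strip())
        self.total_chars += length
        if self.in_link:
            self.link_chars += length


def link_density(fragment):
    """Return the share of text inside links, between 0.0 and 1.0."""
    parser = LinkDensity()
    parser.feed(fragment)
    parser.close()
    if parser.total_chars == 0:
        return 0.0
    return parser.link_chars / parser.total_chars
```

A menu consists almost entirely of link text, so its density is close to 1.0, while an article paragraph with an occasional link stays near 0. A block above some cut-off (which would have to be tuned per site) can be discarded as navigation.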
You might consider:
Boilerpipe: "The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page. The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings."
Ruby Readability: "Ruby Readability is a tool for extracting the primary readable content of a webpage. It is a Ruby port of arc90's readability project."
The Readability API: "If you'd like access to the Readability parser directly, the Content API is available upon request. Contact us if you're interested."
It seems like the best answer is "it depends". As in, it depends on how the site in question is marked up.
If the author uses "common" tags, you could look for a container
element ID'd as "content" or "main."
If the author is using HTML5, you should in theory be able to query for the <article> element, if it's a page with only one "story" to tell.
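A minimal Python sketch of that "it depends" strategy, assuming reasonably well-formed markup: try the <article> element first, then fall back to a container whose id is "content" or "main". The names FirstMatchText and find_main_text are hypothetical, and only the standard html.parser module is used.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they must not affect the nesting count.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FirstMatchText(HTMLParser):
    """Collect the text of the first element accepted by `matches`."""

    def __init__(self, matches):
        super().__init__()
        self.matches = matches  # callable(tag, attrs_dict) -> bool
        self.depth = 0          # > 0 while inside the matched element
        self.done = False       # True once the matched element was closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif self.matches(tag, dict(attrs)):
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth and data.strip():
            self.parts.append(data.strip())


def find_main_text(html):
    """Return the text of <article>, else of #content/#main, else None."""
    strategies = [
        lambda tag, attrs: tag == "article",
        lambda tag, attrs: attrs.get("id") in ("content", "main"),
    ]
    for matches in strategies:
        parser = FirstMatchText(matches)
        parser.feed(html)
        parser.close()
        if parser.parts:
            return " ".join(parser.parts)
    return None
```

If neither strategy matches, the function returns None, which is the signal to fall back to a statistical approach.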
Recently I faced the same problem. I developed a news article scraper and I had to detect the main textual content of the article pages. Many news sites display lots of other textual content beside the "main article" (e.g. 'read next', 'you might be interested in'). My first approach was to collect all text between <p> tags. But this didn't work, because there were news sites that used <p> for other elements like navigation, 'read more', etc. too. Some time ago I stumbled on the Boilerpipe library.
The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings.
That sounded like the perfect solution for my problem, but it wasn't. It failed at many news sites, because it was often not able to parse the whole text of the news article. I don't know why, but think that the boilerpipe algorithm can't deal with badly written html. So in many cases it just returned an empty string and not the main content of the news article.
After this bad experience I tried to develop my own "article text extractor" algorithm. The main idea was to split the html into different depths, for example:
<html>
<!-- depth: 1 -->
<nav>
<!-- depth: 2 -->
<ul>
<!-- depth: 3 -->
<li>Site<!-- depth: 4 --></li>
<li>Site<!-- depth: 4 --></li>
</ul>
</nav>
<div id='text'>
<!-- depth: 2 -->
<p>That's the main content...<!-- depth: 3 --></p>
<p>main content, bla, bla bla ... <!-- depth: 3 --></p>
<p>bla bla bla interesting bla bla! <!-- depth: 3 --></p>
<p>whatever, bla... <!-- depth: 3 --></p>
</div>
</html>
As you can see, to filter out the surplus "clutter" with this algorithm, things like navigation elements, "you may like" sections, etc. must be at a different depth than the main content. Or in other words: the surplus "clutter" must be wrapped in more (or fewer) HTML tags than the main textual content.
Calculate the depth of every html element.
Find the depth with the highest amount of textual content.
Select all textual content with this depth
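The three steps above can be sketched in Python as follows. This is not the author's Ruby script, only an independent illustration of the same idea using the standard html.parser module; it assumes reasonably well-formed markup, and the names DepthTextCollector and extract_main_text are invented for the example.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DepthTextCollector(HTMLParser):
    """Step 1: group visible text by the nesting depth it appears at."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.skip = 0  # > 0 while inside <script> or <style>
        self.text_by_depth = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.depth += 1
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag in ("script", "style") and self.skip:
            self.skip -= 1
        self.depth = max(0, self.depth - 1)

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip:
            self.text_by_depth[self.depth].append(text)


def extract_main_text(html):
    """Return all text found at the depth that holds the most text."""
    parser = DepthTextCollector()
    parser.feed(html)
    parser.close()
    if not parser.text_by_depth:
        return ""
    # Step 2: find the depth with the largest amount of text.
    best = max(parser.text_by_depth,
               key=lambda d: sum(len(t) for t in parser.text_by_depth[d]))
    # Step 3: select all text at that depth.
    return "\n".join(parser.text_by_depth[best])
```

The sketch shares the limitation described above: clutter that happens to sit at the same depth as the article text is returned along with it.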
To prove this concept I wrote a Ruby script, which works well with most news sites. In addition to the Ruby script I also developed the textracto.com API, which you can use for free.
Greetings,
David
It depends very much on the page. Do you know anything about the page's structure beforehand? If you are in luck, it might provide an RSS feed that you could use or it might be marked up with some of the new HTML5 tags like <article>, <section> etc. (which carry more semantic power than pre-HTML5 tags).
I've ported the original Boilerpipe Java code into a pure Ruby implementation, Ruby Boilerpipe, and also made a JRuby version wrapping the original Java code, JRuby Boilerpipe.