Accessibility of using <sup> to indicate powers (such as 4^2)? - html

Is using the <sup> tag the proper way to indicate powers? In a screen reader, wouldn't 3<sup>2</sup> just read as three-two or thirty-two? It seems like there has to be a more accessible way to do this.

There is no other markup element in HTML to indicate powers. Even the sup element does not mean exponent; it means superscript, which in turn may mean different things. According to the HTML5 drafts, sup and sub should be used “only if the absence of those elements would change the meaning of the content.” But they still show examples like <span lang="fr"><abbr>M<sup>lle</sup></abbr></span>, where superscripting is stylistic.
The treatment of sup varies by browser, but we cannot really expect browsers to read 3<sup>2</sup> as “three superscript two” or “three to the power two”. If they do such things, M<sup>lle</sup> will sound really odd.
Similar remarks can be made about the use of superscript characters, such as “²”. Their inherent meaning is to be superscripted, not to denote exponentiation.
This also implies that you can reasonably consider using other markup, like 3<span class=sup>2</span>. The span element is semantically empty, but it’s not that much worse than sup, which has “presentational semantics.” The reason for doing so might be that there is no really satisfactory way to style sup elements; with span, you start from a clean slate.

In general, when there is a Unicode equivalent for the subscript/superscript, you should use that character directly instead of the sub/sup element (it's not a must, just a recommendation, so go on using whatever you like more).
You can find some of those subscripted/superscripted Unicode characters in these Wikipedia articles:
Unicode subscripts and superscripts
Latin-1 Supplement (Unicode block) (for ¹, ² and ³)
Phonetic symbols in Unicode
You could enter (or copy/paste) these chars directly in your document, if you set the charset to a Unicode encoding like UTF-8. Example for HTML5: <meta charset="utf-8">. Otherwise you'd use character entities.
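A minimal sketch of the two options just described, assuming the file really is saved as UTF-8; the footnote text is made up for illustration. The literal character and the character references should render the same way:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Superscript characters</title>
</head>
<body>
  <!-- Literal SUPERSCRIPT ONE (U+00B9); relies on the declared UTF-8 encoding -->
  <p>See the appendix¹ for details.</p>

  <!-- The same character written as named, decimal and hexadecimal references;
       these do not depend on the file's encoding -->
  <p>See the appendix&sup1; for details.</p>
  <p>See the appendix&#185; for details.</p>
  <p>See the appendix&#xB9; for details.</p>
</body>
</html>
```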
Exception: in a mathematical context, you should use MathML with msub/msup (so don't use these special subscripted/superscripted Unicode characters there). You can use MathML in HTML5 within the math element.
Source for these two "shoulds": Unicode in XML and other Markup Languages: 5.6 Superscripts and Subscripts (via blog post in German)
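For the mathematical case, the MathML recommendation might look roughly like this for the 3² from the question. This is only a sketch: how well it is rendered or spoken depends on the MathML support of the browser and of any assistive technology in use.

```html
<p>
  Three squared:
  <math>
    <!-- msup takes two children: the base, then the superscript -->
    <msup>
      <mn>3</mn>
      <mn>2</mn>
    </msup>
    <mo>=</mo>
    <mn>9</mn>
  </math>
</p>
```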
“It seems like there has to be a more accessible way to do this.”
By marking up these subscripts/superscripts with the sub/sup elements OR by using the relevant Unicode chars, you are doing exactly your job: telling user-agents the "meaning", so they can know how to process the text, e.g. for in-document search ('CTRL + f' in browsers), text-to-speech (screen readers), web search (search engines), …
If a screen reader doesn't correctly read/announce 3² or 3<sup>2</sup>, that would be a (serious) bug which its developers should fix.
The only problem for such user-agents would be if people misuse these characters/elements. It's not relevant in this case, but maybe for other readers: when the subscript/superscript is stylistic only (and carries no new meaning by being subscripted/superscripted), you should use CSS instead of the special Unicode characters or sub/sup elements: vertical-align with sub/super. You'd have to adjust the font size, too; see this answer to a different question.
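A rough sketch of the purely stylistic case, assuming an ordinal suffix that carries no extra meaning when raised. The class name raised and the 0.75em size are arbitrary choices for illustration, not recommended values:

```html
<style>
  /* Hypothetical class for superscripting that is only a matter of style */
  .raised {
    vertical-align: super; /* raise the text above the baseline */
    font-size: 0.75em;     /* shrink it, roughly as a sup element would appear */
  }
</style>

<p>She finished in 1<span class="raised">st</span> place.</p>
```

With CSS turned off, the text still reads as plain "1st", which is the point of keeping a stylistic superscript out of the markup.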

Related

Do I need to use HTML tags for en as well as em dashes?

I'm editing the HTML for an ebook. I'm using &mdash; (—) for when there is an em dash, but do I need to do the same for a regular hyphen, as in "micro-dot" or "over-sensitive"? Or can I just leave the "-" as-is in the text?
You do not need to use the entity &mdash;, since you can enter “—” as such, if the e-book is UTF-8 encoded, as it should be. Neither do you need to change &mdash; to the em dash itself, if you now have it in the data.
There is no need to escape the common hyphen, i.e. the ASCII hyphen (officially called HYPHEN-MINUS in Unicode), in any way in HTML.
Note that at least according to Merriam-Webster, the words “microdot” and “oversensitive” are written without a hyphen. If you would like to spell them that way but specify an allowed hyphenation point (for automatic hyphenation by a browser), you would use the SOFT HYPHEN character (U+00AD). It, too, can be written as such in HTML, but since it is normally invisible, you might find it more convenient to use the named character reference &shy; for it, e.g.
micro&shy;dot
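To illustrate the soft hyphen advice above, here is a small sketch. The sentences and the narrow width are made up, and the width is there only to force line breaks so that the optional break points have a chance to show:

```html
<!-- A narrow box, so the browser has to break lines inside the long words -->
<div style="width: 6em; border: 1px solid gray;">
  <!-- Named reference for SOFT HYPHEN (U+00AD) -->
  <p>A micro&shy;dot was hidden in the letter.</p>
  <!-- The same character as a numeric reference -->
  <p>The sensor is over&#173;sensitive.</p>
</div>
```

When a word fits on the line, the soft hyphen stays invisible; a hyphen is displayed only where the browser actually breaks the word at that point.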

What is the difference between outputting superscript using the two methods below in HTML?

In HTML, I can use both &trade; and <sup></sup> to output superscript. What, then, is the difference between '&trade;' and '<sup></sup>'?
&trade; produces a single character (™), whereas <sup></sup> changes the way "normal" characters are displayed.
Using <sup>TM</sup> is just displaying the two letters T and M using superscript text layout, whereas ™ is the Unicode Character 'TRADE MARK SIGN' (U+2122).
Practically speaking:
™ might not survive copy-and-paste across different documents (given how broken some applications are with respect to Unicode).
It's not necessarily available in every font you might use.
Anyone searching for "TM" won't find ™ (unless the software they're using is clever about that specific case).
With <sup> you can add any superscript. &trade; is for TM only.
http://jsfiddle.net/btevfik/4quqt/
&trade; is the entity for the trademark symbol ™. <sup> is for superscript in general. You can include any text at all in between the <sup> tags.
As far as ™ vs <sup>tm</sup> goes, the former should use the correct trademark glyph of the font you are using or the fallback. It was more than likely designed by the type designer to look optimal, while with <sup> the browser more than likely just scaled the t and m characters down. The former will have better spacing, look clearer at small sizes, and be generally more aesthetically pleasing.
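The variants discussed in these answers can be compared side by side with a sketch like the following; the product name Acme is a placeholder. The first four lines all produce the same single character, TRADE MARK SIGN (U+2122), while the last one is two ordinary letters in superscript layout:

```html
<ul>
  <li>Acme&trade; (named character reference)</li>
  <li>Acme&#8482; (decimal reference)</li>
  <li>Acme&#x2122; (hexadecimal reference)</li>
  <li>Acme™ (literal character, assuming a UTF-8 encoded file)</li>
  <li>Acme<sup>TM</sup> (the letters T and M inside a sup element)</li>
</ul>
```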

Is it advisable to have the ® symbol in an alt attribute?

Is it advisable to have the ® symbol in an <img> tag's alt attribute or not? I am interested in knowing if there are problems with using the Registered symbol, such as not rendering properly in some browsers.
It all depends on your character encoding for the file. If the encoding is set correctly, the browser should display it correctly. There is nothing in the spec that suggests otherwise.
By the specifications, an alt attribute value may contain any character (though some characters need to be escaped in HTML markup). In practice, old browsers had many limitations in this area, but this is hardly relevant these days.
The main questions are: 1) When the attribute value is rendered as text, will the fonts available contain the character? Most probably. The “®” character is present in all normal fonts. 2) When the attribute value is rendered in speech or Braille, what will happen? I would not be so sure of this. Speech browsers might not know what to do with special characters. And would you really like to have alt="ACME® and Foobar® products" read e.g. as “ACME registered sign and Foobar registered sign products”? What purpose would it serve?
There are also font quality issues. Many fonts contain “®” in a rather small size, whereas a few fonts like Cambria contain a rather large “®”. The font used to render alt attributes might be outside the author’s control. (In many browsers, it is affected by CSS, but e.g. IE 9 uses a font determined by system settings.)
The bottom line is: No, it is not advisable, but it is technically possible, with various risks.
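As a sketch of that conclusion (the file name products.png is hypothetical), both forms below are valid markup, and the second simply avoids the speech and font questions raised above:

```html
<!-- Technically allowed, but may be spoken as "registered sign" -->
<img src="products.png" alt="ACME® and Foobar® products">

<!-- Same information for most purposes, without the symbol -->
<img src="products.png" alt="ACME and Foobar products">
```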

Does HTML support alternative text for special characters (e.g., accented characters)?

I envision HTML support that might look like this:
<span alt="Antonin Dvorak">Antonín Dvořák</span>
where if a browser could not render any of the special characters, it could fall back to the plain-ASCII "alt" text. Another benefit could be that searching for "cafe" would match "café" (which my browsers don't, at least not at present).
Is there any way to achieve something like this, or am I just being paranoid about a non-existent problem?
Thanks.
No, there is no such markup in HTML. What comes closest is the title attribute, which is usually shown as a tooltip on mouseover (and spoken by speech synthesizers in some situations). But it’s a blunt tool, a feature with poor implementations; if you want something like that, use a CSS tooltip instead. And it’s not really an alternative but an “advisory title”.
The best you can do is to make a reasonable effort to ensure that the characters you use will be properly displayed, thanks to the use of suitable fonts. This isn’t usually a problem with Czech letters, for example, since they are normally present in fonts that web pages typically use, like Arial, Verdana, Georgia. But it could be a problem if you use a downloadable font, or if you use characters with more limited support. The general idea is to use a font-family list that contains only fonts that have all the characters used on the page, and to choose the list so that almost all computers have at least one of the font families. More on this: Guide to using special characters in HTML.
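A minimal sketch of such a font-family list, using fonts named in the answer above. Whether every listed font covers all the characters on a given page is an assumption that has to be checked per page:

```html
<style>
  /* Only widely installed fonts that normally contain Czech letters;
     the generic family at the end is the last-resort fallback */
  body {
    font-family: Arial, Verdana, sans-serif;
  }
</style>

<p><span lang="cs">Antonín Dvořák</span> was a Czech composer.</p>
```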

HTML tags for translation

What HTML markup and tags should I use if I write the following in an article?
This `foreign word` translated from foreign language as `this word in native reader language word`.
Use the most appropriate markup (using a generic element if nothing better presents itself) with a lang attribute.
<body lang="en">
<!-- etc -->
<p><span lang="de">unbekanntes Flugobjekt</span> is German for UFO.</p>
</body>
This won't generally provide automatic translation, but the option exists for browsers / browser extensions to provide such a mechanism. Translation tools such as Google Translate may use it as a hint to identify the "from" language. Text to speech software may use it to select a pronunciation guide. And so on.
There is no HTML markup specifically for such purposes. It really depends on the conventions of the human language used on the page, as well as presentation style. Typically, either quotation marks or italic is used when mentioning words or expressions, rather than using them in normal use. For these, there are different options in HTML. Quotation marks are best written as such, using proper characters as per language rules, though some people still think that q markup is useful. For italic, you can use i markup or CSS font-style: italic.
In any case, if it is relevant to your purposes somehow that translations are marked up, e.g. in order to style them uniformly later, the best shot is to use classes.
The use of lang markup is recommendable in principle, and it is gaining some practical importance (e.g., for automatic hyphenation). In the following example, the span markup is used only to indicate the language (because you need an element for that):
The French word “<span lang=fr>cheval</span>” means “horse”.
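Combining the lang attribute with the class suggestion above could look roughly like this. The class names foreign-term and translation are hypothetical, and the styling is only one possible choice:

```html
<style>
  /* Hypothetical class names, chosen only for this sketch */
  .foreign-term { font-style: italic; }
  .translation  { font-weight: bold; }
</style>

<p>The French word <span class="foreign-term" lang="fr">cheval</span> means
  “<span class="translation">horse</span>”.</p>
```

Keeping the language in lang and the presentation in classes means the styling can be changed later without touching the language information.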