Is it advisable to have the ® symbol in an <img> tag's alt attribute or not? I would like to know whether using the registered symbol causes problems, such as not rendering properly in some browsers.
It all depends on your character encoding for the file. If the encoding is set correctly, the browser should display it correctly. There is nothing in the spec that suggests otherwise.
By the specifications, an alt attribute value may contain any character (though some characters need to be escaped in HTML markup). In practice, old browsers had many limitations in this area, but this is hardly relevant these days.
The main questions are: 1) When the attribute value is rendered as text, will the fonts available contain the character? Most probably. The “®” character is present in all normal fonts. 2) When the attribute value is rendered in speech or Braille, what will happen? I would not be so sure of this. Speech browsers might not know what to do with special characters. And would you really like to have alt="ACME® and Foobar® products" read e.g. as “ACME registered sign and Foobar registered sign products”? What purpose would it serve?
There are also font quality issues. Many fonts contain “®” in a rather small size, whereas a few fonts like Cambria contain a rather large “®”. The font used to render alt attributes might be outside the author’s control. (In many browsers, it is affected by CSS, but e.g. IE 9 uses a font determined by system settings.)
The bottom line is: No, it is not advisable, but it is technically possible, with various risks.
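As a small illustration of the point about speech rendering, the two variants could look like this. The file name is hypothetical, and the product names are the ones from the example above; this is only a sketch, not a rule from any specification.

```html
<!-- Hypothetical image file name, for illustration only. -->

<!-- With the symbol: legal HTML, but a speech browser may read
     the "®" aloud as "registered sign" or handle it oddly. -->
<img src="acme-products.png" alt="ACME® and Foobar® products">

<!-- Without the symbol: the alt text reads naturally in speech
     and Braille, and the symbol can still appear in the image. -->
<img src="acme-products.png" alt="ACME and Foobar products">
```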
I would like to use the UTF-8 character ✖ on my site but I am not sure if this will be supported cross browser.
I am worried that:
a) Users will not have access to a font containing that character
b) IE will not find the character even if the user has a font that could display it. I am worried about this because of this info:
By the specifications, browsers should display a character if there is any font in the system that contains it. If the fonts specified by the author (in CSS font-family settings or, rarely these days, using font markup in HTML) do not contain the character, browsers are supposed to use fallback fonts. The same applies if no fonts are specified by the author; browsers should use primarily their default fonts, using alternate fonts for any character not covered by the primary font.
In practice, things don’t always work that way. Especially IE is notorious for its failures in this respect. It often fails to display a character, even though it could do that if it used all the fonts in the system. If a browser cannot render a character, it may show a small rectangle, possibly containing a question mark, ?, or some similar indicator. Here’s a quick test (character U+0840, which is probably not supported by any font on your computer): ࡀ.
Source.
c) Other issues that I have not thought of.
There is a resource called Unify that shows which devices support a character, but it currently (Sept 14, 2015) supports only 107 characters.
So to summarize, the question is: How can I determine if it is safe to use a utf-8 special character on my site? Is it safe to use ✖ specifically on my site?
It's always safe - your users' computers won't suddenly burst into flame.
From a technical perspective, your best bet is to use a web font that supports every Unicode character you want to use. That is not a catch-all (the user might have web fonts disabled, might be using a command-line browser, etc.), but it should cover the vast majority of computers.
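A minimal sketch of the web font approach might look like the following. The font family name "SymbolsSubset" and the font URL are hypothetical, and the fallback fonts listed after it are assumptions about fonts that are likely to contain the character, not something guaranteed on any system.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Web font sketch</title>
<style>
  /* "SymbolsSubset" and the URL below are hypothetical. */
  @font-face {
    font-family: "SymbolsSubset";
    src: url("/fonts/symbols-subset.woff2") format("woff2");
    /* Only use (and download) this font for U+2716. */
    unicode-range: U+2716;
  }
  /* Fallbacks after the web font are assumptions; the generic
     family at the end lets the browser pick something itself. */
  .close {
    font-family: "SymbolsSubset", "Segoe UI Symbol", "DejaVu Sans", sans-serif;
  }
</style>
</head>
<body>
<!-- Numeric character reference for the HEAVY MULTIPLICATION X. -->
<button class="close" aria-label="Close">&#x2716;</button>
</body>
</html>
```

If web fonts are disabled, the browser simply moves on to the next family in the list, so the rule degrades to ordinary font fallback.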
From there I would apply common sense. If the displaying of a character is absolutely crucial and lives depend on it, try to not use Unicode. Otherwise I'd say 'go ahead'.
This is as much a UX question as it is a technical one, so I will mention both.
As a comparison, on my IE11 browser, it looks like this: [image], but on my Firefox 31.8, it looks like this: [image]. A good user experience is generally associated with consistency, and this approach is not very portable. So from a UX perspective, this is not a great solution.
I would say using a tiny *.gif or *.bmp, or even *.png if you need transparency, is a better solution. Even better yet, go with *.svg so scaling will not be an issue. From a technical aspect, the overhead of something that small is generally insignificant.
The only problem you may face is that exotic symbols are not implemented in many fonts, so the user may see a dummy character (e.g. a square) instead. I personally like to use SVG symbols for this purpose.
An alternative solution would be to use a web font with those icons in it (although probably a subset version of it, so that it's less than 1 kB and doesn't weigh down your pages).
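For example, a cross similar to ✖ can be drawn as a small inline SVG, which does not depend on any font. The size, stroke width, and label here are illustrative choices, not taken from the question.

```html
<!-- Inline SVG cross; scales cleanly and needs no font support. -->
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"
     width="16" height="16" role="img" aria-label="Close">
  <!-- Two diagonal strokes; currentColor follows the text color. -->
  <path d="M3 3 L13 13 M13 3 L3 13"
        stroke="currentColor" stroke-width="3"
        stroke-linecap="round" fill="none"/>
</svg>
```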
HTML entities are not working in Chrome and IE (on Windows).
I have entered the following code in my page and it works fine in Chrome, Firefox, and Safari on Mac, but not on Windows.
<span class="font-family:Arial;">&lang; &rang; 〉 〈 </span>
This is primarily a font issue, though there is a nasty silent change in HTML specs involved.
Modern browsers interpret &lang; and &rang; as referring to U+27E8 MATHEMATICAL LEFT ANGLE BRACKET “⟨” and U+27E9 MATHEMATICAL RIGHT ANGLE BRACKET “⟩”, informally known as “bra” and “ket”. This interpretation is being made official in the named character references section of HTML5.
These characters are adequate for use in many mathematical notations, and the ISO 80000-2 standard explicitly specifies that they are used e.g. for certain scalar product notations. But support for them in fonts is rather limited. In old Windows systems, no font contains them. In newer Windows systems, from Windows Vista onwards, Cambria Math should be available. It is possible that you have been testing on an old Windows version, but it is also possible that Chrome is unable to find the right font. To give it a helping hand, use a CSS rule that suggests that font, e.g. with the attribute
style="font-family: Cambria Math"
You might consider adding some other fonts to the list, using fonts that are known to contain the characters. See my Guide to using special characters in HTML.
The nasty change is that in HTML 4.01, in the entities section, &lang; and &rang; are defined as referring to U+2329 LEFT-POINTING ANGLE BRACKET “〈” and U+232A RIGHT-POINTING ANGLE BRACKET “〉”. They are logically less satisfactory (and deprecated by the Unicode Standard), but they have somewhat wider font support.
So in addition to declaring fonts that contain the characters you use, you need to decide which pair of these characters you use or whether you use something else; it's a complicated question. If you use them, it is best to use them as such (in a UTF-8 encoded HTML document) or using numeric character references such as &#x27E8;. The reason is that &lang; and &rang; should not be expected to work consistently; they probably work the HTML5 way in all modern browsers, but there is hardly any reason to take the risk, when you can unambiguously indicate the characters you want.
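A short sketch of both choices written with numeric character references follows. Only Cambria Math comes from the answer above; any further fonts you add to the list would need to be checked for coverage, and the sample text "a, b" is made up for illustration.

```html
<!-- Mathematical angle brackets, U+27E8 and U+27E9,
     with a font suggestion that should exist on Vista and later. -->
<span style="font-family: 'Cambria Math', serif">&#x27E8;a, b&#x27E9;</span>

<!-- The older pair, U+2329 and U+232A, which HTML 4.01 assigned
     to the named entities; shown here with the default fonts. -->
<span>&#x2329;a, b&#x232A;</span>
```

Unlike the named entities, each numeric reference denotes exactly one code point in every browser, so the only remaining variable is font support.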
That particular character is simply a Unicode code point, which is an arbitrary number. There are a lot of Unicode code points that do not have an 'official' symbol. Even if they do have a symbol, it is not necessarily the case that your font has a glyph for that code point. If you choose a different font, you may end up with a different symbol.
I looked at the CSS for the page and it shows this character displaying in Arial (plus a bunch of other fonts that do not matter). Windows comes with Arial, so it should always pick up that font first. It looks like Arial does not have a glyph for that Unicode code point. Any time a font has no glyph for a code point, the browser puts in some form of a box indicating that there is no glyph.
It depends on the entity, and the fonts on the system your reader is using. The issue is that these characters are not in the MathJax web fonts, so MathJax has to fall back on system fonts to find them. Some browsers are better at that than others. Your configuration controls what fonts MathJax lists for the browser to look in, so you may want to modify that to include fonts where you know your entities can be found (and you may want to think about the fact that you may have people reading your site on Windows, Mac, and Linux, and also mobile devices, so such decisions are not always easy).
Notice that when you install STIX fonts, it works for you. This is because STIXGeneral is in the default list of fonts that MathJax uses for unknown characters. You want to add others to that list (it is stored in the undefinedFamily property of the HTML-CSS and SVG sections of your configuration). Note however, that IE will stop checking fonts once it encounters a font that is installed on the system, even if it doesn't include the needed character and later fonts in the list do, so you have to be careful about the order that you use.
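As a rough sketch, assuming a MathJax v2-style in-page configuration block, the font list could be extended like this. The extra family 'Cambria Math' is an assumption chosen for illustration; pick fonts you know contain your characters.

```html
<!-- Assumes MathJax v2 configuration syntax; place this before
     the script tag that loads MathJax itself. -->
<script type="text/x-mathjax-config">
  MathJax.Hub.Config({
    "HTML-CSS": {
      // Fonts tried for characters missing from the MathJax web fonts.
      // Order matters: IE stops at the first installed font in the
      // list, even if that font lacks the needed character.
      undefinedFamily: "STIXGeneral, 'Cambria Math', 'Arial Unicode MS', serif"
    },
    SVG: {
      undefinedFamily: "STIXGeneral, 'Cambria Math', 'Arial Unicode MS', serif"
    }
  });
</script>
```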
I envision HTML support that might look like this:
<span alt="Antonin Dvorak">Antonín Dvořák</span>
where if a browser could not render any of the special characters, it could fall back to the plain-ASCII "alt" text. Another benefit could be that searching for "cafe" would match "café" (which my browsers don't, at least not at present).
Is there any way to achieve something like this, or am I just being paranoid about a non-existent problem?
Thanks.
No, there is no such markup in HTML. What comes closest is the title attribute, which is usually shown as a tooltip on mouseover (and spoken by speech synthesizers in some situations). But it’s a dull weapon, a feature with poor implementations; if you want something like that, use a CSS tooltip instead. And it’s not really an alternative but “advisory title”.
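A CSS tooltip of the kind mentioned could be sketched as follows. The class name "tip" and the data-tip attribute are hypothetical names chosen for this example, and the result is still advisory text rather than a true fallback for characters that fail to render.

```html
<style>
  /* "tip" and data-tip are made-up names for this sketch. */
  .tip {
    position: relative;
    border-bottom: 1px dotted;
  }
  /* Show the attribute value below the text on hover or focus. */
  .tip:hover::after,
  .tip:focus::after {
    content: attr(data-tip);
    position: absolute;
    left: 0;
    top: 100%;
    padding: 0.2em 0.4em;
    white-space: nowrap;
    background: #333;
    color: #fff;
  }
</style>

<!-- tabindex makes the span focusable for keyboard users. -->
<span class="tip" tabindex="0" data-tip="Antonin Dvorak">Antonín Dvořák</span>
```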
The best you can do is to make a reasonable effort in ensuring that the characters you use will be properly displayed thanks to the use of suitable fonts. This isn’t usually a problem with Czech letters for example, since they are normally present in fonts that web pages typically use, like Arial, Verdana, Georgia. But it could be a problem if you use a downloadable font, or if you use characters with more limited support. The general idea is to use a font-family list that contains only fonts that have all the characters used on the page, and to use such a list that almost all computers have at least one of the font families. More on this: Guide to using special characters in HTML.
Is using the <sup> tag the proper way to indicate powers? In a screen reader, wouldn't 3<sup>2</sup> just read as three-two or thirty-two? It seems like there has to be a more accessible way to do this.
There is no other markup element in HTML to indicate powers. Even the sup element does not mean exponent; it means superscript, which in turn may mean different things. According to the HTML5 drafts, sup and sub should be used “only if the absence of those elements would change the meaning of the content.” But they still show examples like <span lang="fr"><abbr>M<sup>lle</sup></abbr>, where superscripting is stylistic.
The treatment of sup varies by browser, but we cannot really expect browsers to read 3<sup>2</sup> as “three superscript two” or “three to the power two”. If they do such things, M<sup>lle</sup> will sound really odd.
Similar remarks can be made about the use of superscript characters, such as “²”. Their inherent meaning is to be superscripted, not to denote exponentiation.
This also implies that you can reasonably consider using other markup, like 3<span class=sup>2</span>. The span element is semantically empty, but it’s not that much worse than sup, which has “presentational semantics.” The reason for doing so might be that there is no really satisfactory way to style sup elements; with span, you start from a clean desk.
In general, when there is a Unicode equivalent for the subscript/superscript, you should use that character directly instead of the sub/sup element (it's not a must, just a recommendation, so go on using whatever you like more).
You can find some of those subscripted/superscripted Unicode characters in these Wikipedia articles:
Unicode subscripts and superscripts
Latin-1 Supplement (Unicode block) (for ¹, ² and ³)
Phonetic symbols in Unicode
You could enter (or copy/paste) these chars directly in your document, if you set the charset to a Unicode encoding like UTF-8. Example for HTML5: <meta charset="utf-8">. Otherwise you'd use character entities.
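For instance, the superscript two (U+00B2) can be written in three equivalent ways. This sketch assumes the file itself is saved as UTF-8, which the direct form requires.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Superscript characters</title>
</head>
<body>
<!-- Typed directly; requires the file to be saved as UTF-8. -->
<p>3²</p>
<!-- Named character reference; works in any encoding. -->
<p>3&sup2;</p>
<!-- Numeric character reference for U+00B2; works in any encoding. -->
<p>3&#xB2;</p>
</body>
</html>
```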
Exception: in a mathematical context, you should use MathML with msub/msup (so don't use these special subscripted/superscripted Unicode characters there). You can use MathML in HTML5 within the math element.
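A minimal MathML sketch for "three to the power two" inside an HTML5 page could look like this; how it is rendered or read aloud depends on the browser and assistive technology.

```html
<!-- msup takes two children: the base, then the superscript. -->
<math>
  <msup>
    <mn>3</mn>
    <mn>2</mn>
  </msup>
</math>
```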
Source for these two "shoulds": Unicode in XML and other Markup Languages: 5.6 Superscripts and Subscripts (via blog post in German)
It seems like there has to be a more accessible way to do this.
By marking up these subscripts/superscripts equivalents with the sub/sup elements OR by using the relevant Unicode chars, you are doing exactly your job: telling user-agents the "meaning", so they can know how to process the text, e.g. for in-document search ('CTRL + f' in browsers), text-to-speech (screenreaders), web search (search engines), …
If a screenreader wouldn't correctly read/announce 3² or 3<sup>2</sup>, it would be a (serious) bug which they should fix.
The only problem for such user-agents would be if people misuse these characters/elements. It's not relevant in this case, but maybe for other readers: When the subscript/superscript is stylistic only (and carries no new meaning by being subscripted/superscripted), you should use CSS (instead of the special Unicode characters or sub/sup elements): vertical-align with sub/super (and you'd have to adjust the font size, too. See this answer to a different question.)
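For the purely stylistic case, a sketch along these lines might do. The class name "raised" and the font size are illustrative choices; the abbreviation is the French one used as an example elsewhere in this discussion.

```html
<style>
  /* "raised" is a made-up class name; 0.75em is a matter of taste. */
  .raised {
    vertical-align: super;
    font-size: 0.75em;
  }
</style>

<!-- The raised letters carry no extra meaning, so plain span
     plus CSS is used instead of sup or Unicode superscripts. -->
<p lang="fr">M<span class="raised">lle</span></p>
```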
Consider an HTML page which is encoded as UTF-8, and a bizarre Unicode character appears in it - from a rare language or some other Unicode idiosyncrasy.
Is there a standard behavior for such a scenario? Will the browser try to find an appropriate font? Can the browser behavior be configured using HTML parameters?
The CSS 2.1 font matching algorithm means that a browser shall select, for each character, a glyph from the fonts suggested in the applicable font-family declarations and, failing that, use a browser-dependent default font. If even it does not contain the character, then “the UA [= browser] may use other means to determine a suitable font for that character. The UA should map each character for which it has no suitable font to a visible symbol chosen by the UA, preferably a ‘missing character’ glyph from one of the font faces available to the UA.”
So it is pretty well defined, but with browser dependencies. The algorithm allows a browser to display a missing character symbol even if some of the fonts in the system contain a glyph for it. Modern browsers usually don’t do that, but IE isn’t particularly modern in this respect either. Moreover, there are quirks and oddities in browsers, partly because they sometimes fail to get proper information about a font from the font itself.
You can’t configure the basic behavior, but you can play by its rules. The thing that works best is the use of author-supplied font families. If you have an odd character, you should try and determine a set of fonts that contain it and write a suitable CSS rule. However, for very rare characters the options are really: 1) the use of a downloadable font for it, 2) the use of an image. More info: http://www.cs.tut.fi/~jkorpela/html/characters.html
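As a sketch of playing by those rules, here is a font suggestion scoped to one rare character, followed by the image alternative. The class name "rare", the font "Noto Sans Mandaic" (assumed to cover the character), and the image path are all assumptions for illustration.

```html
<style>
  /* "rare" is a made-up class name. The listed font is an
     assumption; verify that it really contains your character. */
  .rare {
    font-family: "Noto Sans Mandaic", serif;
  }
</style>

<!-- Option 1: the character itself (U+0840) with a font hint. -->
<p>Text with a rare letter: <span class="rare">&#x0840;</span></p>

<!-- Option 2: an image of the character; the path is hypothetical. -->
<p>Text with a rare letter:
  <img src="/img/u0840.png" alt="Mandaic letter halqa">
</p>
```

Keeping the rule on a span around the odd character avoids changing the font of the surrounding text.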
Yes, the browser will typically try to display it in some font as best it can. Some browsers and operating systems do a better job than others. Some may simply give up if the default font for the page doesn't contain the character, but most will try to find other installed fonts that contain the character. If none matches, the browser will display some placeholder, usually a square.
And that's all. Nothing bizarre about it, that's how font rendering works.