In French, typography requires a narrow non-breaking space (U+202F) in various places (“Comme ça !”).
Apparently every browser on Windows fails to support it, and they all display a weird character instead. It works in most browsers on Mac OS X as well as on Linux.
Does anyone know how to make Windows browsers render it correctly?
(I’m assuming it’s a Windows bug rather than a browser bug since Firefox and Safari both support it as long as it’s not on Windows).
Hmmm... no. If the only problem is that &thinsp; (U+2009) still allows a line break, I prefer to use:
<span style="white-space:nowrap">&thinsp;</span>
to correct the breaking behavior. Why?
Because the French fine has a nearly fixed width of between one sixth and one quarter of a cadratin (0.166 ca. to 0.25 ca., where the standard space is 0.5 ca.). The exact width depends on the approche (inter-letter spacing) already built into the glyphs of a given font.
Fonts adjust the width of their thin space (U+2009). In fonts made by French typographers (for rendering French), the approche, i.e. the gap between letters within words, is narrower than in fonts designed for English. This is because French texts generally contain more letters than equivalent English texts (for example, the printed Bible), so to avoid increasing the number of printed pages, the glyphs in French fonts were made a bit narrower and with a reduced approche; to compensate for this reduction, the French fine was made wider. (It is often said that U+2009 is one fifth of a cadratin, i.e. 0.2 ca., but this is wrong: that value is just a reasonable median, which fonts should adjust according to their design.)
In English typography, the inter-character gaps are already large enough that no thin space is used next to most double punctuation marks (; : ! ?). However, if French is rendered with a font using English typography (with larger inter-character gaps), the fine should be narrower, reduced to one sixth of a cadratin.
So yes, U+2009 (&thinsp; in the SGML repository) is slightly adjustable, depending on the font.
In addition, it may be partially stretched by justification: when full justification is used, not just the usual inter-word spaces are widened, but also all gaps between characters and both normal and “fine” spaces. The other quad spaces, however, MUST NOT have their width adjusted: they are truly fixed.
When you are rendering a document whose page layout is precomputed (with known fonts and exact metrics), the thin space (U+2009) is already what you want, because you don't have to worry about its line-breaking behavior.
Unfortunately, Unicode forgot to give these quad spaces U+2000..U+2006 (and the thin space U+2009) non-breaking behavior in their line-breaking properties.
The only way for Unicode to correct this (for plain-text documents only) was to provide another character, namely U+202F NARROW NO-BREAK SPACE (present since Unicode 3.0), which was later given the SGML symbolic name "nnbsp" for character references. (The mapping of this named character entity to U+202F is not part of any HTML or XML standard, so this named entity should not be used either, unless your document defines it explicitly in its embedded DTD!)
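Python's standard unicodedata module does not expose the Line_Break property itself, but the decomposition mappings reflect the same split: U+00A0 and U+202F are tagged as `<noBreak>` variants of SPACE, while the thin space and the six-per-em space are only `<compat>` variants. A small sketch to inspect them (`describe` is an illustrative name):

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str, str]:
    """Return (code point, name, general category, decomposition) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch) or "-",  # "-" when there is no mapping
    )


# SPACE, NO-BREAK SPACE, SIX-PER-EM SPACE, THIN SPACE, NARROW NO-BREAK SPACE
for ch in ["\u0020", "\u00a0", "\u2006", "\u2009", "\u202f"]:
    print(*describe(ch), sep=" | ")
```

A `<noBreak>` tag is only a hint; the authoritative line-breaking behavior is defined by the Line_Break property in the Unicode Character Database.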
Unfortunately, most browsers have overlooked this addition and the reason it was needed: they assume the character is present in fonts, which is clearly not always the case.
ALL browsers SHOULD treat U+202F as non-breaking (this is already the case, even if they don't know the character in their internal copy of the UCD).
However, browsers SHOULD NOT depend on U+202F being defined in a font; instead, they SHOULD fall back to U+2009 (THIN SPACE) when rendering it, whenever U+202F is not mapped in the current font but U+2009 is (which is the case for many fonts).
So this is a problem in HTML renderers (i.e. browsers). I also think it is more than a font problem: it is really a BUG in browsers (rather than a bug or limitation of fonts) if they don't provide such fallbacks for whitespace. Of course, all new fonts should map U+202F to the same glyph as U+2009.
Given that the thin space (U+2009, or &thinsp;) is very well supported in many fonts, and has the correct width both for rendering French text with fonts made with French typographic metrics and for rendering English text with fonts made with English typographic metrics, it should really be the fallback used whenever the narrow non-breaking space is not available!
You can emulate the desired behavior of U+202F in HTML perfectly well by using U+2009 and making it non-breaking with CSS "white-space:nowrap". This is always better than changing the font size to display a pseudo half-space, because that is still too wide with many fonts, and because it does not work as expected in spans of text with colored backgrounds: changing the font size modifies the line height.
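The fallback proposed here is simple enough to state as code. This is a sketch of the glyph-selection step only, not of how any browser actually works; `cmap` stands for the set of code points a font maps (with the third-party fontTools package, something like `set(TTFont(path).getBestCmap())` would provide it), and the function name is illustrative:

```python
NNBSP = 0x202F       # NARROW NO-BREAK SPACE
THIN_SPACE = 0x2009  # THIN SPACE


def glyph_source_for_nnbsp(cmap: set[int]) -> int:
    """Pick the code point whose glyph is drawn for U+202F in a given font.

    `cmap` is the set of code points the font maps. Line breaking is still
    decided on the original U+202F (no break), whichever glyph is drawn.
    """
    if NNBSP in cmap:
        return NNBSP       # the font has its own narrow no-break space
    if THIN_SPACE in cmap:
        return THIN_SPACE  # the proposed fallback: borrow the thin space glyph
    return NNBSP           # neither mapped: leave it to the normal font fallback


# Example: a font that maps U+2009 but not U+202F.
print(hex(glyph_source_for_nnbsp({0x20, 0xA0, 0x2009})))  # 0x2009
```

The key design point is that only the glyph is borrowed: the character in the text stays U+202F, so its non-breaking property is unaffected.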
So please use this code instead in your HTML or SVG documents (keep U+202F only for plain-text documents):
<span style="white-space:nowrap">&thinsp;</span>
You can save this sequence in a reusable template, for example one named Template:nnbsp in MediaWiki, and transclude it in your pages as {{nnbsp}}.
Note that it is still preferable to reference the thin space symbolically as &thinsp; rather than forcing an exact Unicode code point like &#x2009;: the named entity can be remapped by the renderer, or according to user preferences, to another working whitespace character.
Note that MS Word actually uses U+2009, not U+202F, to represent its own fine. This is correct, given that Word documents have a precomputed layout and that MS Word enforces the non-breaking behavior locally when computing the page layout. Word documents are not plain-text documents.
Example of rendering (using background colors to show that the line height is not modified; unfortunately, this site only allows background colors in <code> sections like this one, which use monospace fonts):
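If you generate HTML from plain French text, the substitution can be automated. A minimal sketch, assuming a hypothetical helper named `add_french_fines` that only handles ; ! and ? (guillemets and the colon would need their own rules): it emits &thinsp; inside a white-space:nowrap span. Unlike the single-space span above, this span also covers the preceding word and the punctuation mark, so that no break opportunity can fall at the span's edges.

```python
import html
import re

# Hypothetical pattern: a word, optional ordinary/no-break spaces, then ; ! or ?
_FINE_BEFORE = re.compile(r"(\S+?)[ \u00a0\u202f]*([;!?])")
_NOWRAP = '<span style="white-space:nowrap">{}</span>'


def add_french_fines(text: str) -> str:
    """Return HTML for plain `text` with a non-breaking thin space before ; ! and ?."""
    out = []
    pos = 0
    for m in _FINE_BEFORE.finditer(text):
        # Escape the untouched text between matches (&, <, >).
        out.append(html.escape(text[pos:m.start()], quote=False))
        word, punct = m.group(1), m.group(2)
        # &thinsp; (U+2009) plus nowrap emulates U+202F.
        out.append(_NOWRAP.format(html.escape(word, quote=False) + "&thinsp;" + punct))
        pos = m.end()
    out.append(html.escape(text[pos:], quote=False))
    return "".join(out)


print(add_french_fines("Exemple de fine insécable française ; correctement codée !"))
```

Escaping is done piece by piece rather than up front because an escaped entity such as &amp; ends in a semicolon, which the pattern would otherwise treat as punctuation.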
Exemple de « fine » insécable française correctement codée !
The same without the <code> container does not display the background color, but it uses the normal proportional font, so the thin space is actually rendered thin:
Exemple de « fine » insécable française ; correctement codée !
Example using U+202F (NNBSP, which is not supported in most fonts, but may work with your current browser and the fonts installed on your system, such as DejaVu Sans):
Exemple de « fine » insécable française ; correctement codée !
Example using U+2006 (SIX-PER-EM SPACE, i.e. one sixth of a cadratin; it may work, but may be too narrow for your fonts and may not have the non-breaking property):
Exemple de « fine » insécable française ; correctement codée ! (hmmm... not really)
Example using (which is almost always too large):
Exemple de « fine » insécable française ; correctement codée ! (hmm... not really)
I've done a bit more digging, and it does seem like a font problem. FileFormatInfo is very useful for dealing with Unicode issues in general, and it includes a page listing the fonts that support this particular character. There is even a Flash tool (click inside the blue box on the page listing the supported fonts to get to it - I can't make a correct URL for some reason) that lists all your locally installed fonts and shows this character for each one.
Addendum 2021:
Nowadays, all current browsers render U+202F (narrow no-break space) correctly.
These "hacks" aren't needed any more :-)
Why not just ?
You could do this, but it's not ideal.
<span style="font-size:50%;"> </span>
You can press Ctrl+shift+2 in a WYSIWYG editor like CKEditor and then go to source HTML view.
The worst of hacks:
<span style="margin:-0.08em"> </span>
Is there an HTML character that, on all (major) browsers (plus IE8, sadly), displays nothing and doesn't add any extra space?
So, an alternative to &nbsp; that doesn't add whitespace to the page, and that will never show up as an ugly "unrecognised character" marker (a box or a ?).
Why: in my case, I'm trying to work around a problem on an old, proprietary CMS that is removing empty but necessary HTML elements that are required because other parts of the system will fill them dynamically.
Imagine something like (simplified trivial example) <span class="placeholder" data-type="username"></span> which is populated with a user's username if a user is logged in - but this old-school CMS sees it as being empty and removes it.
There seem to be two options that mostly fit the bill. They seem to reliably not show anything when in a <span>, but they (particularly the second option) might have a minor effect on copy/paste and word breaking in some cases.
Zero-width space
&#8203;, which behaves the same as the (now in HTML5) <wbr>: it lets words break at certain points without changing how they are displayed.
<h1>This text is full<span>&#8203;</span> of spans with char<span>&#8203;</span>acte<span>&#8203;</span>rs that affe<span>&#8203;</span>ct word brea<span>&#8203;</span>king but don't show up</h1>
<h1>Especially in das super<span>&#8203;</span>doupercrazy<span>&#8203;</span>long<span>&#8203;</span>worden.</h1>
Seems to work fine on modern browsers and IE7+ (not tested on IE6).
Soft hyphen
&shy;: like a zero-width space, but (in theory) adds a hyphen when it breaks a word across lines.
<h1>This text is full<span>&shy;</span> of spans with char<span>&shy;</span>acte<span>&shy;</span>rs that affe<span>&shy;</span>ct word brea<span>&shy;</span>king but don't show up</h1>
<h1>Especially in das super<span>&shy;</span>doupercrazy<span>&shy;</span>long<span>&shy;</span>worden.</h1>
<h1>Example where das superdoupercrazylongword contains no spans.</h1>
Fine on modern browsers and IE7+ (not tested on IE6), though as some comments note, there are issues with these turning into regular hyphens when copied and pasted, for example from Chrome to Notepad on Windows 8.1.
Within a span, it seems to never add a hyphen (but still better to use zero-width spaces if possible).
Edit: I found an older SO answer discussing these as a solution to a different problem which suggests these are robust except for possible copy/paste quirks.
The only other issue with these I could find in research is that apparently some search engines may treat words containing these as being split (e.g. awesome might match searches for awe and some instead of awesome).
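If text containing these placeholders is later indexed, compared or copied elsewhere, you may want to strip them first, which also avoids the "awe"/"some" splitting mentioned above. A minimal sketch; the function name is illustrative and the set of characters removed is an assumption limited to the ones discussed here:

```python
# Zero-width placeholders discussed in this thread (illustrative selection).
INVISIBLES = {
    "\u200b",  # ZERO WIDTH SPACE (&#8203;)
    "\u00ad",  # SOFT HYPHEN (&shy;)
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (also used as a byte order mark)
}

# str.translate deletes every character mapped to None.
_DELETE = {ord(ch): None for ch in INVISIBLES}


def strip_invisibles(text: str) -> str:
    """Remove zero-width placeholder characters from `text`."""
    return text.translate(_DELETE)


print(strip_invisibles("awe\u00adsome"))
```

Characters such as U+200D ZERO WIDTH JOINER are deliberately left alone, since removing them can change how emoji and some scripts render.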
There are two invisible characters that are defined to be zero width: U+200B ZERO WIDTH SPACE and U+FEFF ZERO WIDTH NO-BREAK SPACE. The former acts like a space character, so it separates words and allows line breaks in formatting, whereas the latter explicitly forbids line breaks. Which one you should use depends on the purpose and context. They can be represented in HTML as &#x200B; and &#xFEFF;.
These characters work well in most browsing situations. However, in IE 6, they tend to be rendered as small rectangles, since IE 6 does not know these characters and tries to render them as if they were graphic characters (which lack glyphs).
There are also control characters that are allowed in HTML, such as U+200E LEFT-TO-RIGHT MARK and U+200D ZERO WIDTH JOINER. They have no rendering as such, though they may affect rendering of graphic characters, e.g. by setting writing direction, affecting ligature behavior, etc. Due to the possibility of such effects, it might be risky to use them as “dummy” characters.
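Python's unicodedata module shows how these characters are classified, and can also produce the hexadecimal references mentioned above. A small sketch (`classify` is an illustrative name):

```python
import unicodedata


def classify(ch: str) -> tuple[str, str, str, bool]:
    """Return (hex character reference, name, general category, isspace)."""
    return (f"&#x{ord(ch):X};", unicodedata.name(ch), unicodedata.category(ch), ch.isspace())


# ZWSP, ZWNBSP, LRM, ZWJ, and NO-BREAK SPACE for comparison
for ch in ["\u200b", "\ufeff", "\u200e", "\u200d", "\u00a0"]:
    print(*classify(ch), sep=" | ")
```

The four zero-width characters all come out as format characters (category Cf) rather than space separators, and `str.isspace()` is False for them, which fits the observation that they do not add visible whitespace.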
HTML entities are not working in Chrome and IE (on Windows).
I have entered the following code in my page, and it works fine in Chrome, Firefox and Safari on Mac, but not on Windows.
<span style="font-family:Arial;">〈 〉 〉 〈 </span>
This is primarily a font issue, though there is a nasty silent change in HTML specs involved.
Modern browsers interpret &lang; and &rang; as referring to U+27E8 MATHEMATICAL LEFT ANGLE BRACKET “⟨” and U+27E9 MATHEMATICAL RIGHT ANGLE BRACKET “⟩”, informally known as “bra” and “ket”. This interpretation is being made official in the named character references section of HTML5.
These characters are adequate for use in many mathematical notations, and the ISO 80000-2 standard explicitly specifies that they are used e.g. for certain scalar product notations. But font support for them is rather limited. In old Windows systems, no font contains them. In newer Windows systems, from Windows Vista onwards, Cambria Math should be available. It is possible that you have been testing on an old Windows version, but it is also possible that Chrome is unable to find the right font. To give it a helping hand, use a CSS rule that suggests that font, e.g. with the attribute
style="font-family: Cambria Math"
You might consider adding some other fonts to the list, using fonts that are known to contain the characters. See my Guide to using special characters in HTML.
The nasty change is that in HTML 4.01, in the entities section, &lang; and &rang; are defined as referring to U+2329 LEFT-POINTING ANGLE BRACKET “〈” and U+232A RIGHT-POINTING ANGLE BRACKET “〉”. They are logically less satisfactory (and deprecated by the Unicode Standard), but they have somewhat wider font support.
So in addition to declaring fonts that contain the characters you use, you need to decide which pair of these characters you use, or whether you use something else; it's a complicated question. If you use them, it is best to enter them directly (in a UTF-8 encoded HTML document) or as numeric character references such as &#x27E8;. The reason is that &lang; and &rang; should not be expected to work consistently; they probably work the HTML5 way in all modern browsers, but there is hardly any reason to take that risk when you can unambiguously indicate the characters you want.
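Python's standard library happens to carry both entity tables, which makes the silent change easy to see. A small sketch; `lang_mappings` and `numeric_ref` are illustrative names, not part of any library:

```python
import html
import html.entities
import unicodedata


def lang_mappings() -> dict[str, str]:
    """Return what &lang; means in the HTML 4.01 and HTML5 entity tables."""
    html4 = chr(html.entities.name2codepoint["lang"])  # HTML 4.01 table (name -> int)
    html5 = html.entities.html5["lang;"]               # HTML5 table (keys include ';')
    return {"HTML 4.01": html4, "HTML5": html5}


def numeric_ref(ch: str) -> str:
    """Build an unambiguous hexadecimal character reference for `ch`."""
    return f"&#x{ord(ch):X};"


for spec, ch in lang_mappings().items():
    print(f"{spec:9} &lang; -> U+{ord(ch):04X} {unicodedata.name(ch)}, write {numeric_ref(ch)}")
```

Writing the numeric reference (or the character itself in UTF-8) removes any dependence on which entity table a given browser follows.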
That particular character is simply a Unicode code point, which is an arbitrary number. Many Unicode code points do not have an 'official' glyph. Even if one does, your font does not necessarily contain a glyph for that code point, and if you choose a different font, you may end up with a different glyph.
I looked at the CSS for the page, and it shows this character displaying in Arial (plus a bunch of other fonts that do not matter). Windows comes with Arial, so it should always pick up that font first. It looks like Arial does not have a glyph for that Unicode code point. Whenever there is no glyph for a code point, some form of box is displayed to indicate the missing glyph.
It depends on the entity, and the fonts on the system your reader is using. The issue is that these characters are not in the MathJax web fonts, so MathJax has to fall back on system fonts to find them. Some browsers are better at that than others. Your configuration controls what fonts MathJax lists for the browser to look in, so you may want to modify that to include fonts where you know your entities can be found (and you may want to think about the fact that you may have people reading your site on Windows, Mac, and Linux, and also mobile devices, so such decisions are not always easy).
Notice that when you install STIX fonts, it works for you. This is because STIXGeneral is in the default list of fonts that MathJax uses for unknown characters. You want to add others to that list (it is stored in the undefinedFamily property of the HTML-CSS and SVG sections of your configuration). Note however, that IE will stop checking fonts once it encounters a font that is installed on the system, even if it doesn't include the needed character and later fonts in the list do, so you have to be careful about the order that you use.
Consider an HTML page encoded as UTF-8 in which a bizarre Unicode character appears, from a rare language or some other Unicode idiosyncrasy.
Is there a standard behavior for such scenario? Will the browser try to find an appropriate font? Can the browser behavior be configured using HTML parameters?
The CSS 2.1 font matching algorithm means that a browser shall select, for each character, a glyph from the fonts suggested in the applicable font-family declarations and, failing that, use a browser-dependent default font. If even it does not contain the character, then “the UA [= browser] may use other means to determine a suitable font for that character. The UA should map each character for which it has no suitable font to a visible symbol chosen by the UA, preferably a ‘missing character’ glyph from one of the font faces available to the UA.”
So it is pretty well defined, but with browser dependencies. The algorithm allows a browser to display a missing character symbol even if some of the fonts in the system contain a glyph for it. Modern browsers usually don’t do that, but IE isn’t particularly modern in this respect either. Moreover, there are quirks and oddities in browsers, partly because they sometimes fail to get proper information about a font from the font itself.
You can’t configure the basic behavior, but you can play by its rules. The thing that works best is the use of author-supplied font families. If you have an odd character, you should try and determine a set of fonts that contain it and write a suitable CSS rule. However, for very rare characters the options are really: 1) the use of a downloadable font for it, 2) the use of an image. More info: http://www.cs.tut.fi/~jkorpela/html/characters.html
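The per-character matching described above can be modelled in a few lines. This is a toy model, not any browser's implementation: the font names and their coverage sets are invented for illustration, step 3 is only one possible reading of "other means", and the `stop_at_first_installed` flag models the IE behavior described in the MathJax answer above.

```python
from typing import Optional

# Hypothetical coverage tables: the characters each installed font can draw.
# Real fonts have thousands of glyphs; these sets only model the idea.
INSTALLED_FONTS: dict[str, set[str]] = {
    "Arial": set("ABCabc \u2009"),
    "Cambria Math": set("ABCabc \u27e8\u27e9"),
    "DejaVu Sans": set("ABCabc \u2009\u202f\u27e8\u27e9"),
}


def pick_font(ch: str, font_family: list[str], ua_default: str,
              stop_at_first_installed: bool = False) -> Optional[str]:
    """Return the font used to draw `ch`, or None for a "missing character" glyph."""
    # 1. Try the families from the CSS font-family list, in order.
    for family in font_family:
        coverage = INSTALLED_FONTS.get(family)
        if coverage is None:
            continue  # not installed: move on to the next name
        if ch in coverage:
            return family
        if stop_at_first_installed:
            return None  # IE-like: give up at the first installed font
    # 2. Try the browser's default font.
    if ch in INSTALLED_FONTS.get(ua_default, set()):
        return ua_default
    # 3. CSS 2.1 lets the UA "use other means"; here: scan every installed font.
    for family, coverage in INSTALLED_FONTS.items():
        if ch in coverage:
            return family
    return None


print(pick_font("\u27e8", ["Arial", "Cambria Math"], "Arial"))  # Cambria Math
print(pick_font("\u27e8", ["Arial", "Cambria Math"], "Arial",
                stop_at_first_installed=True))                 # None
```

The second call shows why the order of the font list matters for browsers that stop early: putting a font known to contain the character first avoids the problem.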
Yes, the browser will typically try to display it in some font as best it can. Some browsers/operating systems do a better job than others. Some may simply give up if the default font for the page doesn't contain the character, but most will try to find other installed fonts that contain the character. If none matches, the browser will display some placeholder, usually a square.
And that's all. Nothing bizarre about it, that's how font rendering works.
I have a webpage that uses the special character ︾ in the HTML. In Firefox 4.0B12 this looks like a double downward-pointing chevron (︾). In Internet Explorer 8.0.7600.16385, however, it just looks like ︾.
What do I need to do to get this character to display in IE the way it does in FF?
Thanks,
PaulH
Explicitly specify a font that you know contains that character (eg in a font-family CSS rule), so that you're not relying on the font fallback functionality of the browser (which varies, but Firefox is typically better at it than IE).
U+FE3E is a character intended for use as a close bracket in vertical ideographic text (Chinese, Japanese etc). You shouldn't expect it to be available on a machine that doesn't have East Asian fonts installed, and using it to get a particular shape unrelated to parentheses is really a misuse. I would not use it on the web. There are a limited number of ‘symbol’ characters that generally render reliably across the main OS default installs and this isn't one of them.
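The character's Unicode properties confirm this. A short sketch with Python's standard unicodedata module:

```python
import unicodedata

CH = "\ufe3e"  # the character from the question

ch_name = unicodedata.name(CH)
decomp = unicodedata.decomposition(CH)
print(f"U+{ord(CH):04X} {ch_name} ({decomp})")

# NFKC applies compatibility mappings, folding the vertical form back to its base bracket.
base = unicodedata.normalize("NFKC", CH)
print(f"NFKC -> U+{ord(base):04X} {unicodedata.name(base)}")
```

The `<vertical>` compatibility tag marks U+FE3E as a vertical presentation form; its base character is U+300B RIGHT DOUBLE ANGLE BRACKET, an ordinary CJK bracket.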
Make sure to end the character reference with a semicolon, like &#65086;.
You need to include the semicolon after the HTML code: &#65086;.
Suppose you have an HTML document with non-breaking spaces (&nbsp;). In IE 6-8 running on Windows XP, when you select the non-breaking spaces and copy/paste them, they are pasted as "normal" spaces (U+0020).
Does anyone know of any systems, browsers, etc., or combinations of them, that do not exhibit this behavior? That is, where non-breaking spaces copy and paste as non-breaking spaces (U+00A0)?
EDIT: To provide a little more context: the application I'm working on has been localized. I suspect that most North/South American and European systems will behave similarly. I'm somewhat concerned about Asian languages and systems.
While I'm not aware of differences between browsers in terms of how they handle copied / pasted text, I would suggest that it is actually the operating system's clipboard that would be responsible for interpreting the character encoding of an HTML page's text (only guessing here, though).
Either way - I would suggest that your best option to ensure that your copied text is interpreted correctly would be to include the lang attribute in your page elements (ref: W3C Recommendations). This would explicitly set a locale for a given element if that wasn't immediately clear by your page's content type declaration in the <head> meta data.
Outside of making sure that your HTML is semantically correct, I can't see how else you would be able to accommodate or predict regional differences.
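One way to test a particular browser/OS combination is to paste the copied text into a string and list its space characters by code point. A minimal sketch (`describe_spaces` is an illustrative name); note that the editor you paste into may itself convert U+00A0, which would hide the result:

```python
import unicodedata


def describe_spaces(text: str) -> list[tuple[int, str, str]]:
    """List (index, code point, name) for every space separator (category Zs)."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Zs"
    ]


# Replace this with the text copied from the browser.
pasted = "a\u00a0b c"
for row in describe_spaces(pasted):
    print(*row)
```

If the no-break space survived the round trip, it shows up as U+00A0 NO-BREAK SPACE; if it was flattened, you only see U+0020 SPACE.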