Why do browsers render &nbsp; as a regular space character?

Consider the following HTML snippet:
<p>Space&nbsp;Test</p>
When this HTML is used in a web page and the page is rendered by a browser, the character actually rendered between "Space" and "Test" is a regular space character (U+0020), not a non-breaking space character (U+00A0).
(This can be observed by, for example, using the Firefox extension Character Identifier.)
I tried this in Firefox 5, Internet Explorer 8, and Chrome 12; all of them output U+0020 instead of U+00A0 on the rendered web page, even though the source document contained &nbsp; rather than a regular space character.
Why do browsers render a regular space character instead of a non-breaking space character in this way?

This is a relic of pre-Unicode times, when the NBSP character didn't exist in the standard character set. HTML defined the &nbsp; escape sequence simply as a space that shouldn't cause word wrapping.
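The difference between the two characters is only visible at the code-point level, which is what a tool such as Character Identifier reports. A minimal sketch in plain JavaScript (not from the answer; the helper name describeCodePoints is hypothetical) that lists the code point behind each character, so text decoded from Space&nbsp;Test can be told apart from Space Test:

```javascript
// Hypothetical helper: list each character's code point in U+XXXX form.
// Spreading a string iterates by code point, so astral characters stay whole.
function describeCodePoints(text) {
  return [...text].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// "\u00A0" is what the &nbsp; reference decodes to in the DOM.
console.log(describeCodePoints("Space\u00A0Test")[5]); // U+00A0
console.log(describeCodePoints("Space Test")[5]);      // U+0020
```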

Related

Making a 1-character HTML input field that accepts Emoji

I'm trying to make an input field of 1 character, but I'm finding it's impossible to paste Emoji or use the OSX emoji picker launched by the browser.
<input maxlength='1' value='👼'>
The emoji is shown, and it's possible to paste in a 3-byte Unicode character like ★, but not another emoji. It also works fine with a maxlength of 2.
Is this a browser bug, or is it compliant with the HTML spec? I'm seeing the same behavior in Chrome and Firefox.
This behaviour is compliant with the HTML spec. The definition of the maxlength attribute says:
Constraint validation: If an element has a maximum allowed value length, its dirty value flag is true, its value was last changed by a user edit (as opposed to a change made by a script), and the code-unit length of the element's value is greater than the element's maximum allowed value length, then the element is suffering from being too long.
(Emphasis on "code-unit length" mine.) As a general rule, the web platform treats strings as sequences of 16-bit code units rather than as sequences of Unicode characters, so characters such as emoji that lie outside the Basic Multilingual Plane are stored as a surrogate pair and treated as having length two.
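To make the code-unit rule concrete: ★ (U+2605) sits in the Basic Multilingual Plane, so it is one UTF-16 code unit even though it takes 3 bytes in UTF-8, while 👼 (U+1F47C) needs two. A small sketch (the helper name lengths is hypothetical) comparing the length maxlength uses with the number of code points:

```javascript
// Hypothetical helper: compare the length the web platform uses for
// maxlength (UTF-16 code units) with the number of Unicode code points.
function lengths(text) {
  return {
    codeUnits: text.length,       // what maxlength counts
    codePoints: [...text].length, // string iteration yields whole code points
  };
}

console.log(lengths("★"));          // { codeUnits: 1, codePoints: 1 }
console.log(lengths("\u{1F47C}")); // { codeUnits: 2, codePoints: 1 }
```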
The obvious workaround is to set maxlength=2 (or ditch it altogether) and do the validation in JavaScript.
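A sketch of that workaround, assuming an input such as <input id="emoji" maxlength="2"> (the id and the error message are hypothetical). It counts user-perceived characters with Intl.Segmenter where the runtime provides it and falls back to code points otherwise. Note that emoji built from several code points (flags, skin-tone or ZWJ sequences) exceed two code units, so accepting those means dropping maxlength entirely.

```javascript
// Count user-perceived characters (grapheme clusters) when Intl.Segmenter
// exists; otherwise fall back to counting code points.
function visibleLength(text) {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
    return [...segmenter.segment(text)].length;
  }
  return [...text].length;
}

// Hypothetical validation rule: at most `max` visible characters.
function fitsMaxChars(text, max) {
  return visibleLength(text) <= max;
}

// Browser wiring, skipped outside a DOM. The "#emoji" selector is hypothetical.
if (typeof document !== "undefined") {
  const input = document.querySelector("#emoji");
  if (input) {
    input.addEventListener("input", () => {
      input.setCustomValidity(
        fitsMaxChars(input.value, 1) ? "" : "Please enter a single character."
      );
    });
  }
}
```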

Is there an HTML character that is blank (including no whitespace) on all browsers?

Is there an HTML character that, in all (major) browsers (plus, sadly, IE8), displays nothing and doesn't add any extra space?
That is, an alternative to &nbsp; that doesn't add whitespace to the page and won't ever show up as an ugly "unrecognised character" marker or ?.
Why: in my case, I'm trying to work around a problem on an old, proprietary CMS that is removing empty but necessary HTML elements that are required because other parts of the system will fill them dynamically.
Imagine something like (simplified trivial example) <span class="placeholder" data-type="username"></span> which is populated with a user's username if a user is logged in - but this old-school CMS sees it as being empty and removes it.
There seem to be two options that mostly fit the bill. They seem to reliably not show anything when in a <span>, but they (particularly the second option) might have a minor effect on copy/paste and word breaking in some cases.
Zero-width space
&#8203; (also written &#x200B;), which behaves the same as the <wbr> tag (now in HTML5): it lets words break at certain points without changing how the words are displayed.
<h1>This text is full<span>&#8203;</span> of spans with char<span>&#8203;</span>acte<span>&#8203;</span>rs that affe<span>&#8203;</span>ct word brea<span>&#8203;</span>king but don't show up</h1>
<h1>Especially in das super<span>&#8203;</span>douper&#8203;crazy<span>&#8203;</span>long<span>&#8203;</span>worden.</h1>
Seems to work fine on modern browsers and IE7+ (not tested on IE6).
Soft hyphen
&shy;: like a zero-width space, but (in theory) adds a hyphen when it breaks a word across a line.
<h1>This text is full<span>&shy;</span> of spans with char<span>&shy;</span>acte<span>&shy;</span>rs that affe<span>&shy;</span>ct word brea<span>&shy;</span>king but don't show up</h1>
<h1>Especially in das super<span>&shy;</span>douper&shy;crazy<span>&shy;</span>long<span>&shy;</span>worden.</h1>
<h1>Example where das super&shy;douper&shy;crazy&shy;longword contains no spans.</h1>
Fine on modern browsers and IE7+ (not tested on IE6), though, as some comments note, these can turn into regular hyphens when copied and pasted, for example when pasting from Chrome into Notepad on Windows 8.1.
Within a span, it seems to never add a hyphen (but still better to use zero-width spaces if possible).
Edit: I found an older SO answer discussing these as a solution to a different problem which suggests these are robust except for possible copy/paste quirks.
The only other issue with these I could find in research is that some search engines may apparently treat words containing them as split (e.g. awe&shy;some might match searches for awe and some instead of awesome).
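If the copy/paste or search side effects matter, one possible mitigation (not part of the answer; a sketch assuming you control the code that exports, indexes, or copies the text) is to strip the two fillers before the text is used. The helper name is hypothetical; the copy handler uses the standard ClipboardEvent API and, because it calls preventDefault(), replaces the whole copied payload with plain text only.

```javascript
// Hypothetical helper: remove U+200B ZERO WIDTH SPACE and U+00AD SOFT HYPHEN,
// so that e.g. "awe\u00ADsome" compares equal to "awesome".
function stripInvisibleFillers(text) {
  return text.replace(/[\u200B\u00AD]/g, "");
}

// Browser-only: clean the plain-text flavor of whatever the user copies.
if (typeof document !== "undefined") {
  document.addEventListener("copy", (event) => {
    const selected = document.getSelection().toString();
    event.clipboardData.setData("text/plain", stripInvisibleFillers(selected));
    event.preventDefault(); // use our cleaned text instead of the default copy
  });
}
```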
There are two characters that are graphic characters but defined to be zero width: U+200B ZERO WIDTH SPACE and U+FEFF ZERO WIDTH NO-BREAK SPACE. The former acts like a space character, so it separates words and allows line breaks in formatting, whereas the latter explicitly forbids line breaks. Which one you should use depends on the purpose and context. They can be represented in HTML as &#x200B; and &#xFEFF;.
These characters work well in most browsing situations. However, IE 6 tends to render them as small rectangles, since it does not know these characters and tries to render them as graphic characters (which lack glyphs).
There are also control characters that are allowed in HTML, such as U+200E LEFT-TO-RIGHT MARK and U+200D ZERO WIDTH JOINER. They have no rendering as such, though they may affect rendering of graphic characters, e.g. by setting writing direction, affecting ligature behavior, etc. Due to the possibility of such effects, it might be risky to use them as “dummy” characters.
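Neither answer knows how the CMS decides that an element is empty. If, hypothetically, it trims whitespace the way JavaScript's String.prototype.trim() does, the choice of filler matters: trim() strips ECMAScript whitespace, which includes U+00A0 and U+FEFF, but not the format characters U+200B and U+00AD. A sketch of that assumed check (looksEmpty is a hypothetical name, not the CMS's real logic):

```javascript
// Hypothetical emptiness check modelled on String.prototype.trim(), which
// removes ECMAScript WhiteSpace (including U+00A0 and U+FEFF) and line terminators.
function looksEmpty(content) {
  return content.trim() === "";
}

const candidates = {
  "&nbsp; (U+00A0)": "\u00A0",
  "&#xFEFF; (U+FEFF)": "\uFEFF",
  "&#x200B; (U+200B)": "\u200B",
  "&shy; (U+00AD)": "\u00AD",
};

for (const [name, ch] of Object.entries(candidates)) {
  console.log(name, looksEmpty(ch) ? "treated as empty" : "survives");
}
```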

How to enable line break at hyphens on Firefox

Chrome, IE, and Safari break lines at hyphens but Firefox doesn't.
Is there any way to make Firefox break lines at hyphens, like other browsers?
Insert the <wbr> tag after the hyphen. This tag is not present in any HTML specification (yet: it is in the HTML5 drafts), but it has worked in browsers for a long time.
Firefox automatically treats a hyphen as allowing a line break after it when there are sufficiently many characters around the hyphen. But if you wish to allow line breaks more widely than that, use <wbr>, e.g. pre-<wbr>war.
Not easily. Try inserting a zero-width space (&#8203;) after each hyphen. For example:
a-&#8203;really-&#8203;long-&#8203;hyphenated-&#8203;phrase
This will make Firefox wrap as if there's a space, but it won't visually display that space.
It's easier to implement this if you have something processing your output server-side. Just run hyphens through a quick string replace.
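A sketch of that string replace in JavaScript (for example in a Node.js build or template step). It assumes the input is plain text that has already been HTML-escaped, with no tags or attributes that a blind replace could corrupt; the function name is hypothetical. Swapping "-&#8203;" for "-<wbr>" gives the <wbr> variant from the other answers.

```javascript
// Hypothetical helper: add a zero-width space after every hyphen so the
// browser may wrap there. Only safe on escaped text content, not on markup.
function allowBreaksAfterHyphens(text) {
  return text.replace(/-/g, "-&#8203;");
}

console.log(allowBreaksAfterHyphens("a-really-long-hyphenated-phrase"));
// a-&#8203;really-&#8203;long-&#8203;hyphenated-&#8203;phrase
```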

displaying special characters in IE

I have a webpage that uses the special character &#xFE3E in the HTML. In Firefox 4.0B12 this looks like a double downward-pointing chevron (︾). In Internet Explorer 8.0.7600.16385, however, it just looks like &#xFE3E.
What do I need to do to get this character to display in IE the way it does in FF?
Explicitly specify a font that you know contains that character (e.g. in a font-family CSS rule), so that you're not relying on the browser's font fallback (which varies, though Firefox is typically better at it than IE).
U+FE3E is a character intended for use as a close bracket in vertical ideographic text (Chinese, Japanese etc). You shouldn't expect it to be available on a machine that doesn't have East Asian fonts installed, and using it to get a particular shape unrelated to parentheses is really a misuse. I would not use it on the web. There are a limited number of ‘symbol’ characters that generally render reliably across the main OS default installs and this isn't one of them.
Make sure to end the character reference with a semicolon, like &#xFE3E;.
You need to include the semicolon after the HTML code: &#xFE3E;.
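Both answers come down to terminating the reference. (Modern HTML5 parsers do decode &#xFE3E without the semicolon, but report it as a parse error, and older engines such as IE8 may not.) A sketch with two hypothetical helpers: one that always emits a terminated numeric reference, and one that flags unterminated numeric references in existing markup. The negative lookaheads stop the regex from backtracking into a shorter digit run that would otherwise make "&#xFE3E;" look unterminated.

```javascript
// Hypothetical helper: build a numeric character reference, always ending in ";".
function charRef(codePoint) {
  return "&#x" + codePoint.toString(16).toUpperCase() + ";";
}

// Hypothetical helper: find hex or decimal references that lack a semicolon.
function findUnterminatedRefs(html) {
  return html.match(/&#(?:x[0-9a-f]+(?![0-9a-f;])|[0-9]+(?![0-9;]))/gi) ?? [];
}

console.log(charRef(0xfe3e));                               // &#xFE3E;
console.log(findUnterminatedRefs("chevron &#xFE3E here")); // [ '&#xFE3E' ]
```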

How to render narrow non-breaking spaces in HTML for Windows?

In French, typography requires that we use narrow non-breaking space (U+202F) at various places (“Comme ça !”).
Apparently every browser on Windows fails to support it, and they all display a weird character instead. It works in most browsers on Mac OS X as well as Linux.
Does anyone know how to make Windows browsers render it correctly?
(I’m assuming it’s a Windows bug rather than a browser bug since Firefox and Safari both support it as long as it’s not on Windows).
Hmmm... no. If the only problem is the fact that &thinsp; (U+2009) is still breaking, I'd rather use:
<span style="white-space:nowrap">&thinsp;</span>
to correct the breaking behavior. Why?
Because the French fine effectively has a nearly fixed width of between one sixth and one fourth of a cadratin (0.166 ca. to 0.25 ca., where the standard space is 0.5 ca.). The choice of width depends on the approach already built into the glyphs of a given font.
Fonts are designed to adjust the width of their thin space (U+2009). In fonts made by French typographers (for rendering French), the approach (the spacing between letters within words) is narrower than in fonts designed for English. This is because French texts generally contain more letters than equivalent English texts (for example, the printed Bible), so to avoid increasing the number of printed pages, the glyphs in French fonts were made slightly narrower, with a reduced approach; to compensate for this reduction, the French fine was made wider. (It is often said that U+2009 is one fifth of a cadratin, i.e. 0.2 ca., but this is only a reasonable median value, which fonts should adjust according to their design.)
In English with English typography, the inter-character gaps are already large enough to justify using no thin space at all next to most double punctuation signs. However, if French is rendered with a font using English typography (which has larger inter-character gaps), the fine should be narrower, reduced to about 1/6 cadratin.
So yes, U+2009 (&thinsp; in the SGML entity sets) is slightly adjustable, depending on the font.
In addition, it may be partially stretched under full justification, where not only the usual inter-word spaces are widened but also all gaps between characters and normal or "fine" spaces. (The other quad spaces, however, MUST NOT have their width adjusted: they really are fixed.)
When you are rendering a document whose page layout is already precomputed (with known fonts and exact metrics), the thin space (U+2009) is already what you want, because you don't have to worry about its breaking behavior.
Unfortunately, Unicode did not give these quad spaces (U+2000..U+2006) or the thin space (U+2009) non-breaking behavior in its line-breaking properties.
The only way for Unicode to correct this (for plain-text documents only) was to add another character, U+202F NARROW NO-BREAK SPACE (present since Unicode 3.0), which was later given the SGML symbolic name "nnbsp" for character references. However, the mapping of this named entity to U+202F is not part of any HTML or XML standard, so the named entity should not be used either, unless your document defines it explicitly in its embedded DTD!
Unfortunately, most browsers have overlooked this addition and the reason it was needed: they assume the character will be present in fonts, which is clearly not the case.
ALL browsers SHOULD treat U+202F as non-breaking (this is already the case, even if they don't know the character in their internal copy of the UCD).
However, browsers SHOULD NOT depend on the fact that U+202F is defined in a font, instead, they SHOULD provide a fallback to U+2009 (THIN SPACE) when rendering it, each time U+202F is not mapped in the current font, but U+2009 is mapped in the same font (this is generally the case with many fonts).
So this is a problem in HTML renderers (i.e. browsers); I also think that this is more than just a problem of fonts, it is really a BUG of browsers (rather than a bug or limit in fonts), if they don't provide such fallbacks for whitespaces. Of course, all new fonts should map U+202F to the same glyph as U+2009.
Given that the thin space (U+2009, or &thinsp;) is very well supported in many fonts, and has the correct width both for rendering French text with fonts made with French typographic metrics and for rendering English text with fonts made with English typographic metrics, it really should be the fallback used whenever the narrow non-breaking space is not available!
You can perfectly emulate the desired behavior of U+202F in HTML, by just using U+2009 and making it non-breaking using CSS's "white-space:nowrap". It will always be better than changing the font-size to display a pseudo half-space (because this is not correct with many fonts for which this will still be too large, and also because this does not work as expected in spans of text that have colored backgrounds: changing the font-size modifies the line-height).
So please use this code instead in your HTML or SVG documents (keep U+202F only for plain-text documents):
<span style="white-space:nowrap">&thinsp;</span>
You can save this sequence in a reusable template, that you can name Template:nnbsp in MediaWiki for example, for transcluding it in your pages as {{nnbsp}}.
Note that it is still preferable to reference the thin space symbolically as &thinsp; rather than forcing an exact Unicode code point with a numeric reference like &#x2009;: the named entity can be remapped by the renderer, or according to user preferences, to another working whitespace character.
Note that MS Word really uses U+2009, not U+202F, for its own fine. This is correct, given that Word documents have a precomputed layout and that MS Word enforces the non-breaking behavior locally when computing the page layout. Word documents are not plain-text documents.
Example of rendering (using background colors to show that the line height is not modified; unfortunately this site does not allow background colors except in <code> sections like this one, which use monospace fonts):
Exemple de « fine » insécable française correctement codée !
The same without the <code> container does not display the background color, but it does use the normal proportional fonts, so that the thin space is effectively rendered as thin:
Exemple de « fine » insécable française ; correctement codée !
Example using &#x202F; (NNBSP, which is generally not supported in most fonts, but may work with your current browser and the fonts installed on your system, such as DejaVu Sans):
Exemple de « fine » insécable française ; correctement codée !
Example using &#x2006; (SIXTH OF CADRATIN, i.e. SIX-PER-EM SPACE; may work, but may be too narrow for your fonts and may not exhibit the non-breaking property):
Exemple de « fine » insécable française ; correctement codée ! (hmmm... not really)
Example using &nbsp; (which is almost always too wide):
Exemple de « fine » insécable française ; correctement codée ! (hmm... not really)
I've done a bit more digging, and it does seem like a font problem. FileFormatInfo is very useful for dealing with Unicode issues in general, and it includes a page listing the fonts that support this particular character. There is even a Flash tool (click inside the blue box on the page listing the supported fonts to get to it - I can't make a correct URL for some reason) that lists all your locally installed fonts and shows this character for each one.
Addendum 2021:
Nowadays, all current browsers render &#x202F; (narrow non-breaking space) fine.
These "hacks" aren't needed any more :-)
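With U+202F now rendered everywhere, the remaining work is putting it in the right places. A simplified sketch (the function name is hypothetical, and the rules are an assumption: it only handles the space before ; ! ? » and after «, as in the examples above, and leaves the colon alone because French house styles usually give it a full non-breaking space):

```javascript
// Hypothetical helper: replace an ordinary or no-break space before ; ! ? »
// and after « with U+202F NARROW NO-BREAK SPACE. Simplified French rules.
function frenchNarrowSpaces(text) {
  return text
    .replace(/[ \u00A0]([;!?»])/g, "\u202F$1")
    .replace(/«[ \u00A0]/g, "«\u202F");
}

console.log(frenchNarrowSpaces("Comme ça !") === "Comme ça\u202F!"); // true
```

In HTML source, the same character can be written as &#x202F;.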
Why not just  ?
You could do this, but it's not ideal.
<span style="font-size:50%;">&nbsp;</span>
You can press Ctrl+shift+2 in a WYSIWYG editor like CKEditor and then go to source HTML view.
The worst of hacks:
<span style="margin:-0.08em">&nbsp;</span>