Browser support for UTF-8 symbols

Browser support for UTF-8 symbols - html

I would like to replace some of the image icons in one of my websites with UTF-8 symbols. It seems that by far not every font in every browser supports all UTF-8 characters. Is there an overview anywhere which UTF-8 symbols are safe to use with the standard browser fonts (Verdana, Arial, etc.)?

No font supports all Unicode characters (which is what you mean by “UTF-8 characters”). Verdana and Arial are not standard, just common, and a font may exist in different versions (with different character repertoires).
So there is no general answer. But as a rough guide, the Windows Glyph List 4 is an attempt at defining a useful pan-European character collection, something that is sufficient for normal needs of European languages, including many commonly used special characters. It was meant to provide a guideline to font designers, and many modern fonts actually cover it, or most of it.
Replacing image icons by characters is usually unproductive, since things commonly used as icons are either not encoded as characters at all or encoded only recently and have very limited (or no) font support. But this really depends on the kind of symbols used and on the requirements on their shapes.

Related

Why does chrome render different unicode characters ”你”(U+2F804)”和你”(U+4F60) are the same characters

In Windows 7 system, Chrome uses the "Microsoft YaHei" font to display characters U+2F804 (你) as U+4F60 (你)
but there is no U+2F804 corresponding character in this font.
The results found using fontCreator are shown below
In windows 10 System, because There is Yu Gothic font, so the result is correct.
What puzzles me is why Windows 7 will show up as U+4F60(你)
The code's URL is：http://yanglikun.github.io/encoding/code.html
I think it should display question mark、口、or other characters when there is no corresponding character in the font of Microsoft YaHei, but not the wrong character U+4F60(你)

A note: unicode code point and font glyphs are not directly related. The actual glyph depends on context, ligature, combining characters, language, and possibly other factors (see Unicode standard).
Unicode defines that U+2F804 is decomposable to U+4F60. Often Unicode texts are normalized by software. Either by decomposing them (so often splitting characters and accents e.g. for Latin languages), or by composing them. Such algorithms are described in Unicode. So in that case, it is considered U+4F60 semantically fully equal to U+2F804 (and preferred form). It is not frequent to see decomposition which contain the same number of code points (but also not unseen). And it is also not seldom to have decomposition in one direction, and no relation on the other direction.
This character is in CJK Compatibility Ideographs Supplement, so the important part is compatibility, and this is confirmed also by wikipedia article (https://en.wikipedia.org/wiki/CJK_Compatibility_Ideographs_Supplement).
Compatibility codepoint were introduced to simplify introduction of Unicode, by providing a lossless roundtrip conversion of other encoding. [In such manner, one could implement Unicode on different layers, without problems and fully transparent, and without requiring to change other layers (or worse: to change all stack in one step).

Why the HTML entity doesn't work on the Windows Chrome?

HTML entities are not working on chrome and IE (on windows).
I have entered the following code in my page and it works fine on mac chrome or firefox or safari, but not on windows.
<span class="font-family:Arial;">〈 〉 〉 〈 </span>

This is primarily a font issue, though there is a nasty silent change in HTML specs involved.
Modern browsers interpret 〈 and 〉 as referring to U+27E8 MATHEMATICAL LEFT ANGLE BRACKET “⟨” and U+27E9 MATHEMATICAL RIGHT ANGLE BRACKET “⟩”, informally known as “bra” and “ket”. This interpretation is being made official in the named character references section of HTML5.
These characters are adequate for use in many mathematical notations, and the ISO 80000-2 standard explicitly specifies that they are used e.g. for certain scalar product notations. But support to them in fonts is rather limited. In old Windows systems, no font contains them. In newer Windows systems, from Windows Vista onwards, Cambria Math should be available. It is possible that you have been testing on an old Windows version, but it is also possible that Chrome is unable to find the right font. To give it a helping hand, use a CSS rule that suggests that font, e.g. with the attribute
style="font-family: Cambria Math"
You might consider adding some other fonts to the list, using fonts that are known to contain the characters. See my Guide to using special characters in HTML.
The nasty change is that in HTML 4.01, in the entities section, 〈 and 〉 are defined as referring to U+2329 LEFT-POINTING ANGLE BRACKET “〈” and U+232A RIGHT-POINTING ANGLE BRACKET “〉”. They are logically less satisfactory (and deprecated by the Unicode Standard), but they have somewhat wider font support.
So in addition to declaring fonts that contain the characters you use, you need to decide which pair of these characters you use or whether you use something else; it's a complicated question. If you use them, it is best to use them as such (in a UTF-8 encoded HTML document) or using numeric character references such as ⟨. The reason is that 〈 and 〉 should not be expected to work consistently; they probably work the HTML5 way in all modern browsers, but there is hardly any reason to take the risk, when you can unambiguously indicate the characters you want.

That particular character is simply a unicode codepoint which is an arbitrary number. There are a lot of unicode codepoints that do not have an 'official' symbol. Even if they do have a symbol, it is not necessarily the case that your font has a symbol for that codepoint. If you choose a different font, you may end up with a different symbol.
I looked at the CSS for the page and it shows this character displaying in Arial (plus a bunch of other fonts that do not matter). Windows comes with Arial so it should always pick up that font first. It looks like Arial does not have a symbol for that unicode codepoint. Anytime you do not have a glyph for a codepoint, it puts in some form of a box indicating there is no glyph

It depends on the entity, and the fonts on the system your reader is using. The issue is that these characters are not in the MathJax web fonts, so MathJax has to fall back on system fonts to find them. Some browsers are better at that than others. Your configuration controls what fonts MathJax lists for the browser to look in, so you may want to modify that to include fonts where you know your entities can be found (and you may want to think about the fact that you may have people reading your site on Windows, Mac, and Linux, and also mobile devices, so such decisions are not always easy).
Notice that when you install STIX fonts, it works for you. This is because STIXGeneral is in the default list of fonts that MathJax uses for unknown characters. You want to add others to that list (it is stored in the undefinedFamily property of the HTML-CSS and SVG sections of your configuration). Note however, that IE will stop checking fonts once it encounters a font that is installed on the system, even if it doesn't include the needed character and later fonts in the list do, so you have to be careful about the order that you use.

'font-family: Symbol' and Windows-1252

I have a bunch of HTML documents that contain some simple text in Windows-1252 encoding, but throughout the text there are numerous appearances of span elements with font-family: Symbol.
For example:
<span style='font-family:Symbol'>Ñ</span>
Which appears as the greek delta - Δ in the browser.
Google told me that using the Symbol font might show different results on different systems, as it's not actually a well defined font.
Is this really true? Is it "unsafe" to use the Symbol font?
If so, is there any way to reliably convert (on my own system) such symbols in the Symbol font to their Windows-1252 counterparts?

It's been always unsafe to rely on having certain font installed on all the computers/smartphones/gadgets that visit your site. There're some font embedding techniques that work reasonably well in some modern browsers but you'd need to repack the Symbol font and I doubt the copyright owner allows you to do it.
Of course, most characters in the Symbol font are not in the Windows-1252 encoding but that should not be an issue. You can use the following map to obtain the appropriate HTML entities. However, you'll have to write a script or program using a programming language (HTML is just a markup language).

When using font-family, if neither of the listed font faces are found on the client, that is without the webfont embeds, may result in changing to default font of client hence a different font replacement for what you'd show to your users.
You may want to use UTF-8 encoding and put the delta (Δ) sign in your HTML content, or use webfont embeds to provide an option, "use the font I want from this".

The problem is that the greek letter you see is just the appearance, the actual letter is something completely different.
I can think of two ways to convert it:
1. Write a script (in your language of choice) that converts each letter to it's Greek counterpart. (Ñ => Δ)
2. Take a screenshot of the document/page and use an OCR-program to convert it to Greek text.

Prevent browsers from using default/fallback fonts

I have a web app in which a user can change the font family of an input text area in a WSIWYG-kind style. Now, let's say the user inputs some Chinese text in the text area, but selects a Font that has no support for Chinese characters. In my application, I'd like the user to see those nasty squares (or something like that) that are usually shown when the font doesn't support the character. That way, the user would know that the font doesn't support the language and could choose a different one. The issue I'm having is that the browsers (Firefox 17 and Chrome 23) seem to render the Chinese part of the text with fonts (as Arial) that do support those Chinese characters, making the user believe that the font he's trying to use works fine.
Is there a way (I'm guessing through CSS) to prevent this? Is there a way of making the browsers not to be so "nice" for only this time?
Thank you in advance.

As the other answer already explained, the solution is to use a fallback font which includes 'all' unicode codepoints. However the difficult part was to find or built one which doesn't weight a few MBs.
A few years later there is now a more lightweight solution for a fallback font, the NotDef font by Adobe. It shows a box with a cross for 1,111,998 Unicode code points, is only about 22Kb and is using the SIL OPEN FONT LICENSE Version 1.1.
If you don't want to show anything there is also the Adobe Blank font.

You can intercept the font substitution process by throwing in a catchall font, using some equivalent of font-family: userChoice, yourCatchAll where yourCatchAll is a font that has a generic glyph for all characters.
The problem is in finding such a font. The LastResort font distributed by the Unicode Consortium would be ideal, since it also visually indicates the category of the character in broad terms, but its EULA does not seem to allow modifications. It is debatable whether this applies to the construction of web font formats (like .eot and .woff).
The Unicode BMP Fallback Font appears to have more liberal rules of use, but it displays a character simply as its Unicode number in a box (and supports only Basic Multilingual Plane, though it contains all characters that most people ever heard of).

Unicode support in Web standard fonts

I need to decide whether to render geometric symbols in a web GUI (e.g. arrows and triangles for buttons, menus, etc.) as Unicode symbols (MUCH easier and color-independent) or GIF/PNG files (lots of hassle I would like to avoid).
However, I have seen clients that have trouble displaying even advanced punctuation symbols declared as unicode characters (Example).
Does anybody know from which version on, OSs / Service Packs / Applications ship with Unicode versions of the standard fonts? There is, for example, Microsoft's Arial unicode that ships with Office since 1999, however I do not have office installed and still my Arial has at least some of the Unicode range.
Also, what is the situation with Mac OS and Linux?
Could somebody point me towards some comprehensive resources on this - reports, lists, overviews?

There's not really such a thing as a “Unicode version” of a font(*). “Arial Unicode” is a misleading name: it's not materially different to normal “Arial”, it just has some more characters in it. It does not contain usable glyphs for every single one of the tens of thousands of characters defined so far, and indeed there is no one OS standard font that does.
The significant question is merely whether the characters you want to use have glyphs in the default fonts of commonly-deployed operating systems. You need to look to look at font support for particular characters you wish to use on an individual basis.
The character U+0360 Combining Double Tilde you mentioned is not really ‘advanced punctuation’, it's an curious and rarely-used diacritical mark for phonetics work. So it's not really surprising that font support for it is poor. On the other hand, Stack Overflow can get away with using U+25CF Black Circle (●) because lots of fonts have it. Some of the other characters from the Geometric Shapes block such as U+25B2 Black Up-pointing Triangle (▲) are also pretty common.
fileformat.info has a list of common fonts that support each character, so you can check there to get a feel of how widely supported a symbol is, and whether the default OS fonts you recognise are present, before using it as a replacement for an image. For example U+25CF is in many fonts, but U+0360 isn't that well-supported: none of the default Windows install fonts are there, and the ‘Libertine’ font renders it badly wrong.
(*: OK, there is sort of such a thing as a Unicode font, in that a font's internal character lookup tables may be denominated in Unicode or some other character set. However this makes no practical difference as the application will always be addressing it as Unicode; the OS will do the conversion on lookup transparently.)

This question may be a duplicate of Unicode and fonts where I posted a list of unicode font links:
http://en.wikipedia.org/wiki/Category%3AFree%5Fsoftware%5FUnicode%5Ftypefaces
http://en.wikipedia.org/wiki/Unicode%5Ftypefaces
http://unifoundry.com/unifont.html
http://www.fileformat.info/info/unicode/font/index.htm
http://www.alanwood.net/unicode/fontsbyrange.html
http://www.alanwood.net/unicode/fonts.html
http://www.unifont.org/fontguide/
http://www.wazu.jp/index.html
However, I am not sure how the Unicode standard defines how to render a stand-alone COMBINING DOUBLE character, as it is supposed to combine other characters.

Some random observations:
On OS X the Unicode support is perfect, at least for your needs.
On Windows the situation seems to depend on the browser. I don’t use many arcane characters, but the few I do (mostly punctuation) seem to display just fine in Firefox. The only problem is in Internet Explorer, as usual.
If you have some control over your clients you could distribute some good free fonts?
Even web fonts could work.
One drawback to Unicode charactes is that they are often quite ugly. Too big, too small, have wrong position, etc.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008