HTML - Missing Unicode characters


I have a page which contains some HTML-encoded Unicode characters: &#9650; (▲), &#9660; (▼), &#9668; (◄), &#9658; (►) and &#10003; (✓). Some users complain that some of these do not show up in their browser.
What is the best way to solve this without installing fonts on the users' machines? Do I have to make bitmaps for each Unicode character? If yes, is there a tool to convert characters to bitmaps? Or is there a better way?

As this is primarily a font problem, the alternatives are to use a downloadable font, also known as a web font, via @font-face, or to use images instead of characters. I wrote “primarily” because there is an additional difficulty: some browsers will not display some characters unless they can be found in the fonts listed in the applicable font-family list. For generalities on these issues, see my Guide to using special characters in HTML.
In this special case, the character “✓” may cause problems since it is present in relatively few fonts. I would expect the others to be OK in most cases, but my expectation was wrong: Android (2.x) shows ▲ and ▼ OK, but not ◄ and ►.
Using a downloadable font to get such characters rendered might be overkill, so it might be best to use images. You can write each character in some program using a suitable font in a large size, take a screen capture of each, and save the captures as images. Then you would use them via img tags, setting the height of the images in CSS using the em unit. This way the images adapt to the font size and look good at different sizes, because they are only ever scaled downwards, and scaling downwards tends to produce better results than scaling upwards.
If you go this way, it is best to represent all of ▲, ▼, ◄, ► as images so that they match stylistically.
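The image approach described above could look roughly like this. It is only a sketch: the file names and the class name are hypothetical, and it assumes the images were captured at a size larger than any font size used on the page, so that they are only ever scaled down.

```html
<style>
  /* Tie the image height to the current font size; the width follows the aspect ratio. */
  img.sym {
    height: 0.8em;
    width: auto;
    vertical-align: baseline;
  }
</style>

<!-- The same images scale with the surrounding text in a paragraph and in a heading. -->
<p>Sort <img class="sym" src="arrow-up.png" alt="ascending">
   or <img class="sym" src="arrow-down.png" alt="descending"></p>
<h2>Results <img class="sym" src="check.png" alt="checked"></h2>
```

The alt texts are words rather than the characters themselves, so that a user whose browser does not load the images does not run into the missing glyph again.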

Simply use UTF-8 as the encoding of your pages, and in case some fonts are missing those symbols, use Arial, because it contains them all.

Related

Is It Safe To Use Unicode Literals in HTML?

I am making an application, and I want to add a "HOME" button.
After much struggling with various icon libraries, I stumbled upon this site,
http://graphemica.com/%F0%9F%8F%A0, with this
🏠
A unicode symbol, which is more akin to a letter than an image.
I pasted it into my HTML, and it just worked™.
All this seems a little too easy, though. Are unicode symbols widely supported? Is there some kind of problem with them that leads people to use icon libraries instead?

It depends on what you mean by "safe".
Users need a font that contains the glyph, so you have to supply the relevant font yourself, and in several formats: there is not yet a single format recognized by all widely used browsers.
Additionally, multicolour fonts are not fully supported by all systems, so think about what you expect users to do with the symbol (click, select, copy, etc.).
Additionally, every font has its own design, so the symbol can look different across fonts (and therefore across browsers and operating systems). We do not yet have a "Helvetica 'Home'" or a "Times New Roman 'Home'".
All these points can be solved by using a web font with monochrome glyphs (but it could be huge if it includes all Unicode code points plus the usual combinations).
It seems that some recent browsers crash when a page uses many different glyphs, but usually this should not be a problem.
I also recommend adding ARIA attributes so that your page can also be used with, e.g., screen readers and braille displays.
Note: on the plus side, the few people who use text browsers can see the HOME symbol better than they could an image, if anybody still cares about this use case.
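As an illustration of the ARIA recommendation, one possible markup is sketched below. The link target and the label text are assumptions made for the example, not something the question specifies.

```html
<!-- The link gets an accessible name; the glyph itself is hidden from assistive technology,
     so a screen reader announces "Home" instead of the character's Unicode name. -->
<a href="/" aria-label="Home">
  <span aria-hidden="true">🏠</span>
</a>
```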

Some things you want to make sure you’re doing:
Save your HTML file as UTF-8. In fact, save all text files as UTF-8 unless there’s some reason you can’t.
Put the line <meta charset="utf-8" /> near the top of your HTML file.
Make sure your server isn’t misconfigured to tell all browsers that webpages are in the wrong encoding.
If, somehow, it is and you can’t fix it, fall back on &entities;.
Specify a font stack for your emoji in CSS with a set of fonts that cover nearly every system, perhaps including Apple Color Emoji, Noto Color Emoji, Segoe UI Emoji and Twemoji.
If a free font such as Noto or Symbola contains the emoji you use, you can package it as a WOFF to be sure it will always display the way you want. (As of 2018, Tor browser does not show most emoji correctly by default, but mainstream browsers do.)
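Put together, the checklist above might look something like the following page. This is a sketch under assumptions: the class name is made up, and the family names are the ones these fonts are commonly installed under ("Twemoji Mozilla" is assumed to be the name of the copy bundled with Firefox), so check them against the systems you target.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Declare the encoding early; the file itself must also be saved as UTF-8. -->
  <meta charset="utf-8">
  <title>Emoji font stack</title>
  <style>
    /* Try the platform emoji fonts in turn, then fall back to a generic family. */
    .emoji {
      font-family: "Apple Color Emoji", "Segoe UI Emoji", "Noto Color Emoji",
                   "Twemoji Mozilla", sans-serif;
    }
  </style>
</head>
<body>
  <a href="/"><span class="emoji">🏠</span> Home</a>
</body>
</html>
```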

I think using Unicode is good practice for development, because Unicode characters are essentially part of your operating system, so you don’t need any special library or plugin and you treat them like regular text.
The only problem is that the code can be difficult to read or understand. I think it is not easy to see that &#127968; (🏠) prints a home icon.
Even 8-bit PNGs are faster than font icons.
Image icons can be lightweight but still slow down your site with another HTTP request and the time the image takes to load. With images you have no flexibility over color and scaling. SVG vector image alternatives are still not faster than plain text (Unicode characters). Unicode doesn’t require additional HTTP requests and can be made to scale nicely.
If you are developing a website using only simple shapes, you can use Unicode symbols as a replacement for font icons.
I think almost every developer uses libraries for icons because they make the code more readable, are easy to use and offer more options.
Safe or Not
I can not say whether it is safe or not.
Because Unicode contains such a large number of characters and incorporates the varied writing systems of the world, incorrect usage can expose programs or systems to possible security attacks. This is especially important as more and more products are internationalized. This document describes some of the security considerations that programmers, system analysts, standards developers, and users should take into account, and provides specific recommendations to reduce the risk of problems.
Read about UNICODE SECURITY CONSIDERATIONS

Here are a few precautions to take when doing that. I did some research and found this to be the most helpful for your question; credit goes to Mr. GOY:
Displaying unicode symbols in HTML

html special character UTF-8

I want to design my own video player. I want to add this special character as a "volume" button: 🔉 http://www.fileformat.info/info/unicode/char/1f509/index.htm. But it always shows a weird rectangle with the number 01F509.
What's wrong?
greetings

When you see a weird rectangle with the number 01F509, it means that the browser has correctly recognized the character but cannot display it for lack of a glyph. Either the system has no font containing a glyph for the character, or the browser is unable to use such a font, due to a browser bug. For generalities, see my Guide to using special characters in HTML.
The Fileformat.info page cited has a link to a list of fonts that support the character. The list is short: Quivira, Segoe UI Emoji, Segoe UI Symbol, Symbola. (LastResort is not a real font.)
Segoe UI fonts are proprietary and available only in relatively new versions of Windows. Besides, these fonts exist in several versions, and this character seems to be a rather recent addition.
This means that you would need to use Quivira or Symbola as a web font (downloadable font). If you choose to do that, use e.g. http://www.fontsquirrel.com/ to generate the font files and the CSS code for using them. Note that both Quivira and Symbola are rather large fonts, so using them just to get one glyph is a bit disproportionate.
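For reference, the CSS for such a web font might look roughly like the sketch below. The family name, the class name and the font path are hypothetical, and the font file is assumed to have been converted to WOFF already. Note that unicode-range only controls when the browser downloads and uses the font; it does not make the file itself any smaller.

```html
<style>
  @font-face {
    font-family: "SymbolaWeb";                       /* arbitrary name chosen for this page */
    src: url("fonts/Symbola.woff") format("woff");   /* hypothetical path */
    unicode-range: U+1F509;                          /* use this font only for the speaker character */
  }
  .volume {
    font-family: "SymbolaWeb", sans-serif;
  }
</style>

<button class="volume" type="button" aria-label="Volume">&#x1F509;</button>
```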
At this point, it is rather obvious that some other approach is most probably better, e.g. using an image in sufficient size and reducing it according to font size by setting its height in em units.

To avoid encoding issues, try including the character in your HTML code using the hexadecimal entity notation: &#x1F509; then make sure that the font you are using contains this character, else you'll see a rectangle in place of the character.
You may create your own font with the desired character using tools such as http://icomoon.io
Also note that some browsers have issues displaying characters outside of the range 0000 to FFFF (plane 0, the Basic Multilingual Plane (BMP)). I have experienced the issue with Safari on Windows and IE <= 8. So try to avoid characters outside this range if you want to support all browsers.
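The notations discussed here can be compared side by side in a small test fragment. All three lines refer to the same character, U+1F509; the references only sidestep encoding problems, and a font containing the glyph is still needed in every case.

```html
<!-- Hexadecimal and decimal numeric character references to U+1F509 -->
<p>Hex: &#x1F509;</p>
<p>Decimal: &#128265;</p>
<!-- The literal character is equivalent when the page is correctly served as UTF-8 -->
<p>Literal: 🔉</p>
```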

'font-family: Symbol' and Windows-1252

I have a bunch of HTML documents that contain some simple text in Windows-1252 encoding, but throughout the text there are numerous appearances of span elements with font-family: Symbol.
For example:
<span style='font-family:Symbol'>Ñ</span>
Which appears as the Greek delta (Δ) in the browser.
Google told me that using the Symbol font might show different results on different systems, as it's not actually a well defined font.
Is this really true? Is it "unsafe" to use the Symbol font?
If so, is there any way to reliably convert (on my own system) such symbols in the Symbol font to their Windows-1252 counterparts?

It has always been unsafe to rely on a certain font being installed on all the computers, smartphones and gadgets that visit your site. There are some font embedding techniques that work reasonably well in some modern browsers, but you'd need to repack the Symbol font, and I doubt the copyright owner allows you to do that.
Of course, most characters in the Symbol font are not in the Windows-1252 encoding but that should not be an issue. You can use the following map to obtain the appropriate HTML entities. However, you'll have to write a script or program using a programming language (HTML is just a markup language).

When you use font-family and none of the listed fonts is found on the client (that is, without web font embedding), the browser falls back to the client's default font, so your users may see a different font than the one you intended.
You may want to use UTF-8 encoding and put the delta (Δ) sign in your HTML content, or use webfont embeds to provide an option, "use the font I want from this".

The problem is that the greek letter you see is just the appearance, the actual letter is something completely different.
I can think of two ways to convert it:
1. Write a script (in your language of choice) that converts each letter to its Greek counterpart. (Ñ => Δ)
2. Take a screenshot of the document/page and use an OCR-program to convert it to Greek text.

Prevent browsers from using default/fallback fonts

I have a web app in which a user can change the font family of an input text area in a WYSIWYG kind of style. Now, let's say the user inputs some Chinese text in the text area, but selects a font that has no support for Chinese characters. In my application, I'd like the user to see those nasty squares (or something like that) that are usually shown when the font doesn't support the character. That way, the user would know that the font doesn't support the language and could choose a different one. The issue I'm having is that the browsers (Firefox 17 and Chrome 23) seem to render the Chinese part of the text with fonts (such as Arial) that do support those Chinese characters, making the user believe that the font he's trying to use works fine.
Is there a way (I'm guessing through CSS) to prevent this? Is there a way of making the browsers not be so "nice" just this once?
Thank you in advance.

As the other answer already explained, the solution is to use a fallback font which includes 'all' Unicode code points. However, the difficult part was to find or build one which doesn't weigh a few MB.
A few years later there is now a more lightweight solution for a fallback font, the NotDef font by Adobe. It shows a box with a cross for 1,111,998 Unicode code points, is only about 22 KB and is licensed under the SIL Open Font License, Version 1.1.
If you don't want to show anything there is also the Adobe Blank font.
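A possible way to wire such a fallback font into the page is sketched below. The font path, the family names and the class name are hypothetical; "UserChoice" stands for whatever font the user picked in the editor.

```html
<style>
  @font-face {
    font-family: "NotDefFallback";                             /* arbitrary name */
    src: url("fonts/AdobeNotDef.otf") format("opentype");     /* hypothetical path */
  }
  /* The catch-all font comes last and covers every code point, so the browser
     never reaches its own system fallback. No generic family is listed. */
  textarea.preview {
    font-family: "UserChoice", "NotDefFallback";
  }
</style>

<textarea class="preview">Hello 你好</textarea>
```

With this list, any character that the chosen font lacks is drawn with the NotDef box instead of being silently rendered in another installed font.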

You can intercept the font substitution process by throwing in a catchall font, using some equivalent of font-family: userChoice, yourCatchAll where yourCatchAll is a font that has a generic glyph for all characters.
The problem is in finding such a font. The LastResort font distributed by the Unicode Consortium would be ideal, since it also visually indicates the category of the character in broad terms, but its EULA does not seem to allow modifications. It is debatable whether this applies to the construction of web font formats (like .eot and .woff).
The Unicode BMP Fallback Font appears to have more liberal rules of use, but it displays a character simply as its Unicode number in a box (and supports only Basic Multilingual Plane, though it contains all characters that most people ever heard of).

Unicode character and browsers

Consider an HTML page which is encoded as UTF-8, and a bizarre Unicode character appears in it, from a rare language or some other Unicode idiosyncrasy.
Is there a standard behavior for such scenario? Will the browser try to find an appropriate font? Can the browser behavior be configured using HTML parameters?

The CSS 2.1 font matching algorithm means that a browser shall select, for each character, a glyph from the fonts suggested in the applicable font-family declarations and, failing that, use a browser-dependent default font. If even it does not contain the character, then “the UA [= browser] may use other means to determine a suitable font for that character. The UA should map each character for which it has no suitable font to a visible symbol chosen by the UA, preferably a ‘missing character’ glyph from one of the font faces available to the UA.”
So it is pretty well defined, but with browser dependencies. The algorithm allows a browser to display a missing character symbol even if some of the fonts in the system contain a glyph for it. Modern browsers usually don’t do that, but IE isn’t particularly modern in this respect either. Moreover, there are quirks and oddities in browsers, partly because they sometimes fail to get proper information about a font from the font itself.
You can’t configure the basic behavior, but you can play by its rules. The thing that works best is the use of author-supplied font families. If you have an odd character, you should try and determine a set of fonts that contain it and write a suitable CSS rule. However, for very rare characters the options are really: 1) the use of a downloadable font for it, 2) the use of an image. More info: http://www.cs.tut.fi/~jkorpela/html/characters.html
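An author-supplied font list for a single odd character could be written along these lines. The class name is made up, and the listed fonts are only assumed to contain U+2713 (the check mark); verify the list against the systems you actually need to support.

```html
<style>
  /* Fonts assumed to contain U+2713; the browser uses the first one that is
     installed and has the glyph, then falls back to a generic family. */
  .check {
    font-family: "Segoe UI Symbol", "Arial Unicode MS", "DejaVu Sans", sans-serif;
  }
</style>

<span class="check">&#x2713;</span> Done
```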

Yes, the browser will typically try to display it in some font as best it can. Some browsers and operating systems do a better job than others. Some may simply give up if the default font for the page doesn't contain the character, but most will try to find other installed fonts that contain the character. If none matches, the browser will display some placeholder, usually a square.
And that's all. Nothing bizarre about it, that's how font rendering works.
