Is It Safe To Use Unicode Literals in HTML?

Is It Safe To Use Unicode Literals in HTML? - html

I am making an application, and I want to add a "HOME" button.
After much struggling with various icon libraries, I stumbled upon this site,
http://graphemica.com/%F0%9F%8F%A0, with this
🏠
A unicode symbol, which is more akin to a letter than an image.
I pasted it into my HTML, and it just workedTM.
All this seems a little too easy, though. Are unicode symbols widely supported? Is there some kind of problem with them that leads people to use icon libraries instead?

It depends on what do you mean for "safe".
User should have the fonts, so you must include the relative font, and in various formats: there is not yet a format recognized by most used web-browsers.
Additionally, font with multiple colours are not fully understood by various systems, so you should care about what do you expect from users (click, select, copy, etc.).
Additionally, every fonts has own design, so between different fonts (so browsers and operating system) things can look differently. We do not have yet a "Helvetica 'Home'", a "Times New Roman 'Home'".
All this points, could be solved by using a web font, with monochrome glyphs (but it could be huge, if it includes all Unicode code points (+ usual combinations).
It seems that various recent browser crashes if there are many different glyphs, but usually it should not be a problem.
I also recommend aria stuffs so that you page could be used also by e.g. readers (and braille screen).
Note: on the plus side, the few people that use text browser can better see the HOME (not the case in case of an image), if somebody still care about this use case.

Some things you want to make sure you’re doing:
Save your HTML file as UTF-8. In fact, save all text files as UTF-8 unless there’s some reason you can’t.
Put the line <meta charset="utf-8" /> near the top of your HTML file.
Make sure your server isn’t misconfigured to tell all browsers that webpages are in the wrong encoding.
If, somehow, it is and you can’t fix it, fall back on &entities;.
Specify a font stack for your emoji in CSS with a set of fonts that cover nearly every system, perhaps including Apple Color Emoji, Noto Color Emoji, Segoe UI Emoji and Twemoji.
If a free font such as Noto or Symbola contains the emoji you use, you can package it as a WOFF to be sure it will always display the way you want. (As of 2018, Tor browser does not show most emoji correctly by default, but mainstream browsers do.)

I think using unicode is a good practice for development. Beacause The unicodes are essentially part of your operating system so you don’t need any special library or plugin and you treat them like regular text.
The only problem is - code can be defficult to read or understand. I think it is not easy to understand that (&#12796 8;🏠) printing home icon.
Even the 8 bit PNGs are faster then the font icons.
Image icons can be lightweight but still slow down your site with another HTTP request and time for the image to load. With images you don’t have flexibility over the color and scaling. SVG vector image alternatives are still not faster than plain-text (Unicode characters). Unicode doesn’t require additional HTTP requests and can be made to scale nicely.
If you are developing a website using only simple shapes, you can use unicode UTF-8 symbols as replacement for font icons.
I think :
Almost every developer use libraries for icons because of readablility of code, Easy to use and get more options.
Safe or Not
I can not say whether it is safe or not.
Because Unicode contains such a large number of characters and incorporates the varied writing systems of the world, incorrect usage can expose programs or systems to possible security attacks. This is especially important as more and more products are internationalized. This document describes some of the security considerations that programmers, system analysts, standards developers, and users should take into account, and provides specific recommendations to reduce the risk of problems.
Read about UNICODE SECURITY CONSIDERATIONS

Here are few precautions to be taken while doing that, I did some research and found this to be more helpful for your question. Also I dont know how you can do but credits go to Mr.GOY
Displaying unicode symbols in HTML

Related

Is there any reason not to use a custom font for icons on a web site?

I was curious how Imgur was rendering their upvote/downvote arrows:
I assumed they were images, but I found something that I did not expect:
A custom font that contains glyphs for up and down arrows, mapped to the 'o' and 'x' characters, respectively:
Is this method considered acceptable these days? I have never considered using a custom font for something that doesn't semantically map into an alphabet. This approach is not even on my radar of best practices for web design.
I can imagine the reasons for:
Your site uses a standard icon set that can be mapped to single-character codes.
You only need control over foreground/background color for the icons.
You want icons that scale the same as text.
I want to know any specific reasons against using this method.
In particular, I'm looking for answers that address any of the following:
browser/platform compatibility
future maintenance implications
semantics
performance
standards compliance
The only thing I have come up with so far, is that, semantically, it does not make sense to map an upvote icon to the character 'o' and a downvote icon to the character 'x'. And, just to be specific, I'm not talking about keyboard mappings, but rather language mappings, character codes. It seems to me that raster images or SVG are much more preferable alternatives in this case.
I thought of one other possibility: language and encoding compatibility. Would the html lang attribute or character encoding of the page have any effect on the character mappings into the font in the CSS stylesheet (the stylesheet uses 'x' to represent a downvote icon)?
However, I'm certain Imgur has thought all of this through already. So, why am I wrong?

Modern browsers (e.g. IE9 above) support custom fonts.
Even Bootstrap also uses custom fonts for icons, known as Glyphicons! It is a nice way to beautify the websites icons without having to do it from Photoshop as an image which may cause responsive issues.
They are usually used by calling the class name which links to the CSS that call the icons from the font family. Html lang would not have any issues with it.

Many websites use "icon fonts". But yes, assigning language letters to them would be wrong. It would be best to assign an arrow icon to the Unicode character code for a similar arrow. Another option would be to use the Private Use Area of Unicode. In this case, if your font fails to load for any reason, you won't have a good fallback strategy. But if you choose meaningful char codes for your icons, you would.
Many people are in favor of using SVGs over icon fonts. But there are pros and cons to both icon fonts and SVGs. I think that it's great that as web developers, we get to choose among different implementations or solutions to the same problem.
To answer your question, I would say that if done right, there is nothing wrong with using fonts for implementing icons.
As Mike 'Pomax' Kamermans put it:
"Fonts are for encoding vector graphics that are to be used in
typesetting context. That can mean letters, or icons, or emoji"

One big reason is accessibility. There are many browser extensions which swap out a website's font for one that's more legible for people with different visual impairments. If you use fonts for your icons, these will be swapped out too, leaving your user looking at whatever string you placed in for your icons.

Is it safe to use UTF-8 characters (like ✖) on my web-page?

I would like to use the UTF-8 character ✖ on my site but I am not sure if this will be supported cross browser.
I am worried that:
a) Users will not have access to a font containing that character
b) IE will not find the character even if the user has a font that could display it. I am worried about this because of this info:
By the specifications, browsers should display a character if there is any font in the system that contains it. If the fonts specified by the author (in CSS font-family settings or, rarely these days, using font markup in HTML) do not contain the character, browsers are supposed to use fallback fonts. The same applies if no fonts are specified by the author; browsers should use primarily their default fonts, using alternate fonts for any character not covered by the primary font.
In practice, things don’t always work that way. Especially IE is notorious for its failures in this respect. It often fails to display a character, even though it could do that if it used all the fonts in the system. If a browser cannot render a character, it may show a small rectangle, possibly containing a question mark, ?, or some similar indicator. Here’s a quick test (character U+0840, which is probably not supported by any font on your computer): ࡀ.
Source.
c) Other issues that I have though of.
There is a resource called Unify, that will show what devices the character is supported on but it currently (Sept 14, 2015) only suport 107 characters.
So to summarize, the question is: How can I determine if it is safe to use a utf-8 special character on my site? Is it safe to use ✖ specifically on my site?

It's always safe - your user's computers won't suddenly burst into flame.
From a technical perspective, your best bet is to use a web font that has support for every Unicode character you want to use. That is not a catch-all (the user might have web fonts disabled or is using a command line browser, etc...), but it should support the vast majority of computers.
From there I would apply common sense. If the displaying of a character is absolutely crucial and lives depend on it, try to not use Unicode. Otherwise I'd say 'go ahead'.

This is as much a UX question as it is a technical one, so I will mention both.
As a comparison, on my IE11 browser, it looks like this: , but on my Firefox 31.8, it looks like this: . A good user experience is generally associated with consistency, and this approach is not very portable. So from a UX perspective, this is not a great solution.
I would say using a tiny *.gif or *.bmp, or even *.png if you need transparency, is a better solution. Even better yet, go with *.svg so scaling will not be an issue. From a technical aspect, the overhead of something that small is generally insignificant.

The only problem you can face is that exotic symbols are not implemented in many fonts, so the user can see a dummy character (e.g. square) instead of this. I personally like to use svg symbols for this purpose.

An alternative solution would be to use a web font with those icons in it (although probably a subset version of, so that it's less and 1 kb and doesn't weight down your pages).

Prevent browsers from using default/fallback fonts

I have a web app in which a user can change the font family of an input text area in a WSIWYG-kind style. Now, let's say the user inputs some Chinese text in the text area, but selects a Font that has no support for Chinese characters. In my application, I'd like the user to see those nasty squares (or something like that) that are usually shown when the font doesn't support the character. That way, the user would know that the font doesn't support the language and could choose a different one. The issue I'm having is that the browsers (Firefox 17 and Chrome 23) seem to render the Chinese part of the text with fonts (as Arial) that do support those Chinese characters, making the user believe that the font he's trying to use works fine.
Is there a way (I'm guessing through CSS) to prevent this? Is there a way of making the browsers not to be so "nice" for only this time?
Thank you in advance.

As the other answer already explained, the solution is to use a fallback font which includes 'all' unicode codepoints. However the difficult part was to find or built one which doesn't weight a few MBs.
A few years later there is now a more lightweight solution for a fallback font, the NotDef font by Adobe. It shows a box with a cross for 1,111,998 Unicode code points, is only about 22Kb and is using the SIL OPEN FONT LICENSE Version 1.1.
If you don't want to show anything there is also the Adobe Blank font.

You can intercept the font substitution process by throwing in a catchall font, using some equivalent of font-family: userChoice, yourCatchAll where yourCatchAll is a font that has a generic glyph for all characters.
The problem is in finding such a font. The LastResort font distributed by the Unicode Consortium would be ideal, since it also visually indicates the category of the character in broad terms, but its EULA does not seem to allow modifications. It is debatable whether this applies to the construction of web font formats (like .eot and .woff).
The Unicode BMP Fallback Font appears to have more liberal rules of use, but it displays a character simply as its Unicode number in a box (and supports only Basic Multilingual Plane, though it contains all characters that most people ever heard of).

Why shouldn't I use weird Characters in code/HTML documents?

I'm wondering if it's a bad idea to use weird characters in my code. I recently tried using them to create little dots to indicate which slide you're on and to change slides easily:
There are tons of these types of characters, and it seems like they could be used in place of icons/images in many cases, they are style-able and scale-able, and screen readers would be able to make sense of them.
But, I don't see anyone doing this, and I've got a feeling this is a bad idea, I just can't decide why. I guess it seems too easy to be true. Could someone tell me why this is or isn't okay? Here are some more examples of the characters i'm talking about:
↖ ↗ ↙ ↘ ㊣ ◎ ○ ● ⊕ ⊙ ○　 △ ▲ ☆ ★ ◇ ◆ ■ □ ▽ ▼ § ￥ 〒 ￠ ￡ ※ ♀ ♂ &⁂ ℡ ↂ░ ▣ ▤ ▥ ▦ ▧ ✐✌✍✡✓✔✕✖ ♂ ♀ ♥ ♡ ☜ ☞ ☎ ☏ ⊙ ◎ ☺ ☻ ► ◄ ▧ ▨ ♨ ◐ ◑ ↔ ↕ ♥ ♡ ▪ ▫ ☼ ♦ ▀ ▄ █ ▌ ▐ ░ ▒ ▬ ♦ ◊
PS: I would also welcome general information about these characters, what they're called and stuff (ASCII, Unicode)?

There are three things to deal with:
1. As characters in a sentence/text:
The problem is that some fonts simply do not have them. However since CSS can control font use you probably will not run into this problem. As long as you use a web safe font, and know that that character is available in that font, you should probably be okay.
You can also use an embedded font, though be sure to fall back on a web safe font that contains the character you need as many browser will not support embedded fonts.
However sometimes certain devices will not have multiple fonts to choose from. If that font does not support your character you will run into problems. However depending on what your site does and the audience you are targeting this may not be a problem for you. Not to mention that devices like that are very old, and uncommon.
All in all it was probably not a good idea a handful of years ago, but now you are not likely to have problems as long as you cover all your bases.
It is important however to point out that you should never hard code those characters, instead use HTML entities. Just inserting those characters into your code can lead to unpredictable results. I recently copied some text from Word directly into my code, Word used smart quotes (quote marks that curve inwards properly). They showed up fine in Notepad++, but when I viewed the page I did not get quotes, I got some weird symbol.
I could have either replaced them with normal quotes " or with HTML entities to keep the style “ and ” (“ and ”).
Any Unicode character can be inserted this way (even those without special names).
Wikipedia has a good reference:
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
2. As UI elements:
While it may be safe to use them in many cases, it is still better to use HTML elements where possible. You could simply style some div elements to be round and filled/not filled for your example.
As far as design goes they are really limiting, finding one that fits with the style of your page can be a hassle, and may mean that you will definitely need to embed a font, which is still only supported by the latest browsers.
Plus many devices do not support heavy font manipulation, and will often display them poorly. It works in the flow of your text, but as a vital part of the UI there can be major problems. Any possible issue one of those characters can bring will be multiplied by the fact that it is part of your UI.
From an artistic stand point they simply limit your abilities too much.
3. What are you doing?
Finaly you need to consider this:
Text is for telling
Image is for showing
HTML is for organizing
CSS is for making things look good while you show them
JavaScript is for functionality
Those characters are text, they are for telling someone something. So ask the question: "What am I doing?" and then use what was designed for that task. If you are telling use them, if you are showing use Image, or CSS.

I've seen this done before (the stars) and I think it's an awesome idea! It's also becoming quite popular to use a font (with #font-face) full of icons, like this one: http://fortawesome.github.com/Font-Awesome/
I can't see any downside to using a font like "font awesome" (only the upsides you mention like scalabilty and the ability to change color with CSS). Perhaps there's a downside to using the special characters you mention but none that I know of.

The problem with using those characters is that not all of them are available in all fonts used by all users, which means your application may look strange, or in the worst case be unusable. That said, it is becoming more common to assume the characters available in certain common fonts (Apple/Microsoft's Arial, Bitstream Vera). You can't even assume that you can download a font, as some users may capture content for offline reading with a service like Instapaper or Read It Later.

There are a number of problems:
Portability: using anything other than the 7-bit ASCII characters in code can make your code less portable, as recipients may use the wrong encoding. You can do a lot to mitigate this (eg. use UTF16 or at least UTF-8 encoded files). Most languages allow you to specify strings in characters using some form of escape notation (eg. "\u1234" in C#), which will avoid the problem, but loses some of the advantages.
Font-dependency: user interface elements that depend on special characters being available in a font may be harder to internationalize, since those glyphs might not be in the font that you want/need to use for a particular audience.
No color, limited choice of art: while font glyphs might seem useful to a coder, they probably look pretty poor to a UI designer.

The question is very broad; it could be split to literally thousands of questions of the type “why shouldn’t I use character ... in HTML documents?” This seems to be what the question is about—not really about code. And it’s about characters, seen as “weird” or “uncommon” or “special” from some perspective, not about character encodings. (None of the characters mentioned are encoded in ASCII. Some are encoded in ISO-8895-1. All are encoded in Unicode.)
The characters are used in HTML documents. There is no general reason against not using them, but loads of specific reasons why some specific characters might not be the best approach in a specific situation.
For example, the “little dots” you mention in your example (probably not dots at all but circles or bullets), when used as control elements as you describe, would mean poor usability and poor accessibility. Making them significantly larger would improve the situation, but this more or less proves that such text characters are not suitable for controls.
Screen readers could make sense of special characters if they used a database of various properties of characters. Well, they don’t, and they often fail to read properly even the most common special characters. Just reading the Unicode name of a character can be cryptic or outright misleading. The proper reading would generally depend on meaning and context.
The main issue, however, is that people do not generally recognize characters in the meanings that you would assign to them. How many people know what the circled plus symbol “⊕” stands for? Maybe 1 out of 1,000, optimistically thinking. It might be all right to use in on a page about advanced mathematics or physics, especially if the notation is defined there. But used in general text, it would be just… a weird character, and people would read different meanings into it, or just get puzzled.
So using special characters just because they look cool isn’t a good idea. Even when there is time and place for a special character, there are technical issues with them. How many fonts do you expect to contain “⊕”? How many of those fonts do you expect Joe Q. Public to have in his computer? In this specific case, you would find the font coverage reasonably good, but you would still have to analyze it and write a longish list of font names in your CSS code to cover most platforms. In the pile of poo case (♨), it would be unrealistic to expect most people to see anything but a symbol for unrepresentable character. Regarding the methods of finding out such things, check out my Guide to using special characters in HTML.

I've run into problems using unusual characters: the tools editor, compiler, interpreter etc.) often complain and report errors. In the end, it wasn't worth the hassle. Darn western hegemony, or homogeneity, or, well, something!

Unicode support in Web standard fonts

I need to decide whether to render geometric symbols in a web GUI (e.g. arrows and triangles for buttons, menus, etc.) as Unicode symbols (MUCH easier and color-independent) or GIF/PNG files (lots of hassle I would like to avoid).
However, I have seen clients that have trouble displaying even advanced punctuation symbols declared as unicode characters (Example).
Does anybody know from which version on, OSs / Service Packs / Applications ship with Unicode versions of the standard fonts? There is, for example, Microsoft's Arial unicode that ships with Office since 1999, however I do not have office installed and still my Arial has at least some of the Unicode range.
Also, what is the situation with Mac OS and Linux?
Could somebody point me towards some comprehensive resources on this - reports, lists, overviews?

There's not really such a thing as a “Unicode version” of a font(*). “Arial Unicode” is a misleading name: it's not materially different to normal “Arial”, it just has some more characters in it. It does not contain usable glyphs for every single one of the tens of thousands of characters defined so far, and indeed there is no one OS standard font that does.
The significant question is merely whether the characters you want to use have glyphs in the default fonts of commonly-deployed operating systems. You need to look to look at font support for particular characters you wish to use on an individual basis.
The character U+0360 Combining Double Tilde you mentioned is not really ‘advanced punctuation’, it's an curious and rarely-used diacritical mark for phonetics work. So it's not really surprising that font support for it is poor. On the other hand, Stack Overflow can get away with using U+25CF Black Circle (●) because lots of fonts have it. Some of the other characters from the Geometric Shapes block such as U+25B2 Black Up-pointing Triangle (▲) are also pretty common.
fileformat.info has a list of common fonts that support each character, so you can check there to get a feel of how widely supported a symbol is, and whether the default OS fonts you recognise are present, before using it as a replacement for an image. For example U+25CF is in many fonts, but U+0360 isn't that well-supported: none of the default Windows install fonts are there, and the ‘Libertine’ font renders it badly wrong.
(*: OK, there is sort of such a thing as a Unicode font, in that a font's internal character lookup tables may be denominated in Unicode or some other character set. However this makes no practical difference as the application will always be addressing it as Unicode; the OS will do the conversion on lookup transparently.)

This question may be a duplicate of Unicode and fonts where I posted a list of unicode font links:
http://en.wikipedia.org/wiki/Category%3AFree%5Fsoftware%5FUnicode%5Ftypefaces
http://en.wikipedia.org/wiki/Unicode%5Ftypefaces
http://unifoundry.com/unifont.html
http://www.fileformat.info/info/unicode/font/index.htm
http://www.alanwood.net/unicode/fontsbyrange.html
http://www.alanwood.net/unicode/fonts.html
http://www.unifont.org/fontguide/
http://www.wazu.jp/index.html
However, I am not sure how the Unicode standard defines how to render a stand-alone COMBINING DOUBLE character, as it is supposed to combine other characters.

Some random observations:
On OS X the Unicode support is perfect, at least for your needs.
On Windows the situation seems to depend on the browser. I don’t use many arcane characters, but the few I do (mostly punctuation) seem to display just fine in Firefox. The only problem is in Internet Explorer, as usual.
If you have some control over your clients you could distribute some good free fonts?
Even web fonts could work.
One drawback to Unicode charactes is that they are often quite ugly. Too big, too small, have wrong position, etc.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008