HTML: How do i debug why a language does not display correctly - html

i was recently asked why a tumblr theme of mine does not display Vietnamese correctly on this site. how do i debug whats the problem.
i wonder if its because of the use of
a custom font or cufon?
maybe its a character set issue? but
UTF-8 shld support most languages?

Debugging is difficult, especially if you don't read the language in question. There are some things you should check though:
1.) Fonts. This is the main cause of trouble. If you want to display a character you must have that character in the selected font. If you use standard fonts that may work on internationalised Windows but there are also "unicode" fonts (ie, Arial Unicode MS) you may want to specify explicitly.
2.) Encoding. Make sure the page is served in an appropriate character set. Check the HTTP and HTML headers "charset". UTF-8 is appropriate for most languages.
3.) Browser and OS Support. It's pretty much a given these days that browsers support non-latin character sets, however it's possible the client has a very old or unusual browser. Can't hurt to find out which browser/os combination they are using and what their "Regional Settings" are.

Related

Is It Safe To Use Unicode Literals in HTML?

I am making an application, and I want to add a "HOME" button.
After much struggling with various icon libraries, I stumbled upon this site,
http://graphemica.com/%F0%9F%8F%A0, with this
🏠
A unicode symbol, which is more akin to a letter than an image.
I pasted it into my HTML, and it just workedTM.
All this seems a little too easy, though. Are unicode symbols widely supported? Is there some kind of problem with them that leads people to use icon libraries instead?
It depends on what do you mean for "safe".
User should have the fonts, so you must include the relative font, and in various formats: there is not yet a format recognized by most used web-browsers.
Additionally, font with multiple colours are not fully understood by various systems, so you should care about what do you expect from users (click, select, copy, etc.).
Additionally, every fonts has own design, so between different fonts (so browsers and operating system) things can look differently. We do not have yet a "Helvetica 'Home'", a "Times New Roman 'Home'".
All this points, could be solved by using a web font, with monochrome glyphs (but it could be huge, if it includes all Unicode code points (+ usual combinations).
It seems that various recent browser crashes if there are many different glyphs, but usually it should not be a problem.
I also recommend aria stuffs so that you page could be used also by e.g. readers (and braille screen).
Note: on the plus side, the few people that use text browser can better see the HOME (not the case in case of an image), if somebody still care about this use case.
Some things you want to make sure you’re doing:
Save your HTML file as UTF-8. In fact, save all text files as UTF-8 unless there’s some reason you can’t.
Put the line <meta charset="utf-8" /> near the top of your HTML file.
Make sure your server isn’t misconfigured to tell all browsers that webpages are in the wrong encoding.
If, somehow, it is and you can’t fix it, fall back on &entities;.
Specify a font stack for your emoji in CSS with a set of fonts that cover nearly every system, perhaps including Apple Color Emoji, Noto Color Emoji, Segoe UI Emoji and Twemoji.
If a free font such as Noto or Symbola contains the emoji you use, you can package it as a WOFF to be sure it will always display the way you want. (As of 2018, Tor browser does not show most emoji correctly by default, but mainstream browsers do.)
I think using unicode is a good practice for development. Beacause The unicodes are essentially part of your operating system so you don’t need any special library or plugin and you treat them like regular text.
The only problem is - code can be defficult to read or understand. I think it is not easy to understand that (&#12796 8;🏠) printing home icon.
Even the 8 bit PNGs are faster then the font icons.
Image icons can be lightweight but still slow down your site with another HTTP request and time for the image to load. With images you don’t have flexibility over the color and scaling. SVG vector image alternatives are still not faster than plain-text (Unicode characters). Unicode doesn’t require additional HTTP requests and can be made to scale nicely.
If you are developing a website using only simple shapes, you can use unicode UTF-8 symbols as replacement for font icons.
I think :
Almost every developer use libraries for icons because of readablility of code, Easy to use and get more options.
Safe or Not
I can not say whether it is safe or not.
Because Unicode contains such a large number of characters and incorporates the varied writing systems of the world, incorrect usage can expose programs or systems to possible security attacks. This is especially important as more and more products are internationalized. This document describes some of the security considerations that programmers, system analysts, standards developers, and users should take into account, and provides specific recommendations to reduce the risk of problems.
Read about UNICODE SECURITY CONSIDERATIONS
Here are few precautions to be taken while doing that, I did some research and found this to be more helpful for your question. Also I dont know how you can do but credits go to Mr.GOY
Displaying unicode symbols in HTML

Is it safe to use UTF-8 characters (like âś–) on my web-page?

I would like to use the UTF-8 character âś– on my site but I am not sure if this will be supported cross browser.
I am worried that:
a) Users will not have access to a font containing that character
b) IE will not find the character even if the user has a font that could display it. I am worried about this because of this info:
By the specifications, browsers should display a character if there is any font in the system that contains it. If the fonts specified by the author (in CSS font-family settings or, rarely these days, using font markup in HTML) do not contain the character, browsers are sup­posed to use fallback fonts. The same applies if no fonts are specified by the author; brows­ers should use primarily their default fonts, using alternate fonts for any character not covered by the primary font.
In practice, things don’t always work that way. Especially IE is notorious for its failures in this respect. It often fails to display a character, even though it could do that if it used all the fonts in the system. If a browser cannot render a character, it may show a small rectangle, possibly containing a question mark, ?, or some similar indicator. Here’s a quick test (char­ac­ter U+0840, which is probably not supported by any font on your computer): ࡀ.
Source.
c) Other issues that I have though of.
There is a resource called Unify, that will show what devices the character is supported on but it currently (Sept 14, 2015) only suport 107 characters.
So to summarize, the question is: How can I determine if it is safe to use a utf-8 special character on my site? Is it safe to use âś– specifically on my site?
It's always safe - your user's computers won't suddenly burst into flame.
From a technical perspective, your best bet is to use a web font that has support for every Unicode character you want to use. That is not a catch-all (the user might have web fonts disabled or is using a command line browser, etc...), but it should support the vast majority of computers.
From there I would apply common sense. If the displaying of a character is absolutely crucial and lives depend on it, try to not use Unicode. Otherwise I'd say 'go ahead'.
This is as much a UX question as it is a technical one, so I will mention both.
As a comparison, on my IE11 browser, it looks like this: , but on my Firefox 31.8, it looks like this: . A good user experience is generally associated with consistency, and this approach is not very portable. So from a UX perspective, this is not a great solution.
I would say using a tiny *.gif or *.bmp, or even *.png if you need transparency, is a better solution. Even better yet, go with *.svg so scaling will not be an issue. From a technical aspect, the overhead of something that small is generally insignificant.
The only problem you can face is that exotic symbols are not implemented in many fonts, so the user can see a dummy character (e.g. square) instead of this. I personally like to use svg symbols for this purpose.
An alternative solution would be to use a web font with those icons in it (although probably a subset version of, so that it's less and 1 kb and doesn't weight down your pages).

Why the HTML entity doesn't work on the Windows Chrome?

HTML entities are not working on chrome and IE (on windows).
I have entered the following code in my page and it works fine on mac chrome or firefox or safari, but not on windows.
<span class="font-family:Arial;">〈 〉 〉 〈 </span>
This is primarily a font issue, though there is a nasty silent change in HTML specs involved.
Modern browsers interpret 〈 and 〉 as referring to U+27E8 MATHEMATICAL LEFT ANGLE BRACKET “⟨” and U+27E9 MATHEMATICAL RIGHT ANGLE BRACKET “⟩”, informally known as “bra” and “ket”. This interpretation is being made official in the named character references section of HTML5.
These characters are adequate for use in many mathematical notations, and the ISO 80000-2 standard explicitly specifies that they are used e.g. for certain scalar product notations. But support to them in fonts is rather limited. In old Windows systems, no font contains them. In newer Windows systems, from Windows Vista onwards, Cambria Math should be available. It is possible that you have been testing on an old Windows version, but it is also possible that Chrome is unable to find the right font. To give it a helping hand, use a CSS rule that suggests that font, e.g. with the attribute
style="font-family: Cambria Math"
You might consider adding some other fonts to the list, using fonts that are known to contain the characters. See my Guide to using special characters in HTML.
The nasty change is that in HTML 4.01, in the entities section, 〈 and 〉 are defined as referring to U+2329 LEFT-POINTING ANGLE BRACKET “〈” and U+232A RIGHT-POINTING ANGLE BRACKET “〉”. They are logically less satisfactory (and deprecated by the Unicode Standard), but they have somewhat wider font support.
So in addition to declaring fonts that contain the characters you use, you need to decide which pair of these characters you use or whether you use something else; it's a complicated question. If you use them, it is best to use them as such (in a UTF-8 encoded HTML document) or using numeric character references such as ⟨. The reason is that 〈 and 〉 should not be expected to work consistently; they probably work the HTML5 way in all modern browsers, but there is hardly any reason to take the risk, when you can unambiguously indicate the characters you want.
That particular character is simply a unicode codepoint which is an arbitrary number. There are a lot of unicode codepoints that do not have an 'official' symbol. Even if they do have a symbol, it is not necessarily the case that your font has a symbol for that codepoint. If you choose a different font, you may end up with a different symbol.
I looked at the CSS for the page and it shows this character displaying in Arial (plus a bunch of other fonts that do not matter). Windows comes with Arial so it should always pick up that font first. It looks like Arial does not have a symbol for that unicode codepoint. Anytime you do not have a glyph for a codepoint, it puts in some form of a box indicating there is no glyph
It depends on the entity, and the fonts on the system your reader is using. The issue is that these characters are not in the MathJax web fonts, so MathJax has to fall back on system fonts to find them. Some browsers are better at that than others. Your configuration controls what fonts MathJax lists for the browser to look in, so you may want to modify that to include fonts where you know your entities can be found (and you may want to think about the fact that you may have people reading your site on Windows, Mac, and Linux, and also mobile devices, so such decisions are not always easy).
Notice that when you install STIX fonts, it works for you. This is because STIXGeneral is in the default list of fonts that MathJax uses for unknown characters. You want to add others to that list (it is stored in the undefinedFamily property of the HTML-CSS and SVG sections of your configuration). Note however, that IE will stop checking fonts once it encounters a font that is installed on the system, even if it doesn't include the needed character and later fonts in the list do, so you have to be careful about the order that you use.

Detect UTF-16 support, and replace with images otherwise

As part of an HUD I'm designing, I'd like to use this symbol: 🏥
Since my site serves files in Windows-1252, any characters outside the 0-255 range are represented as &#x____;, with appropriate hex digits filling in the blank. In this case: 🏥
However, when testing this out I noticed that IE can render the symbol, but Chrome cannot. I guess this question is this: What does IE have that Chrome doesn't, is there any way to give Chrome the ability to render these symbols, and if not can I detect that and replace it with an image?
It is not only enough to properly encode the charset information inside your HTML document, but on the other hand, the browser must be able to use the right encoding (check the settings) and it must be able to display it. In the following example, the browser is properly configured, it is just that the operating system is not able to use UTF-8 characters:
As far as I (mis-)understood it also is a font issue, where smart browsers possibly substitute the missing char by one out of another font. The newer browsers can also be served fonts from the server. Maybe that helps.
I hope you get a better answer.

How to give tool tips (title) in locale language (UTF-8) in IE and Chrome?

I want to show tool tips (title) in locale specific language by UTF-8 formatted values.
I tried It's working in firefox but not working in IE and Chrome. What I have to do for this problem?
<div title='(some UTF-8 formatted value)'
above code is working perfectly in firefox.
Thanx in advance.
The font(s) used in tooltips depend(s) on the browser, which may or may not use settings made at the operating system level. Thus it may be controllable by the user, though few users know about this. In any case, it is outside the control of the author.
This implies that the repertoire of characters you can use there may vary. A plain square or rectangle in text typically indicates that there is a recognized character but it cannot be displayed because it is not present in the font(s) being used.
Partly for reasons like this, authors are more and more moving towards using other techniques than the title attribute, namely “CSS tooltips” (or maybe “JavaScript tooltips”). This lets you use the same fonts as in the textual content or, if preferred, to set some suitable other fonts.