Chinese text not displaying properly on web page - html

I am adding some Chinese text to a primarily English web page and am having trouble getting the characters to display properly. I've got the encoding set to UTF-8 in the meta content type tag, and I am copying/pasting the Chinese I was sent from a Word document. The text is still rendering as follows:
繁體中文版
rather than in Chinese characters:
繁體中文版
I'm sure it's an easy fix, but I'm lost as to how to make this happen.
Thanks very much for any help.

Just because the meta tag says the encoding is UTF-8 doesn't mean that the content (the file itself) is in UTF-8. If you have a file index.html, the file itself must be saved as UTF-8.
To change the encoding of a file on Linux, you can use this command:
iconv --from-code=ISO-8859-1 --to-code=UTF-8 ./index.html > ./newIndex.html
But I guess that you are working on Windows, and the only way I know to change the encoding there is with Notepad++.
Hope this helps.
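For anyone without iconv at hand, the same re-encoding can be sketched in Python. This is only a minimal sketch: it assumes the file really is ISO-8859-1, as in the command above, and the function name reencode_file is made up for illustration.

```python
from pathlib import Path


def reencode_file(src: str, dst: str,
                  from_enc: str = "iso-8859-1",
                  to_enc: str = "utf-8") -> None:
    """Read src as from_enc and write the same text to dst as to_enc."""
    # Decode the raw bytes using the encoding the file is actually saved in
    # (assumed here to be ISO-8859-1, matching the iconv example).
    text = Path(src).read_bytes().decode(from_enc)
    # Write the same text back out in the target encoding.
    Path(dst).write_bytes(text.encode(to_enc))
```

Calling reencode_file("index.html", "newIndex.html") would then play the role of the iconv command. If the source file is not actually in the assumed encoding, the output will be wrong, so it is worth checking the result in a browser.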

Related

Chinese text encoding missing characters when viewed in web browser

I have a HTML file which contains Chinese text. When I open the file in any web browser, there are characters which appear to be missing.
Here's an example copied from the browser window:
本函旨在邀請您參�� 定於
I know for a fact that all other characters seen here are correct aside from the missing ones (confirmed by a native Chinese speaker).
In the HTML header, I have a tag which signifies the file contains UTF-8 encoded characters:
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
I've already tried some other charsets in this META tag, but so far it seems any encoding method I try aside from UTF-8 ends up looking worse.
I also considered the possibility that it is a font issue, so I installed 3 different traditional Chinese fonts on my system and forced Chrome to use them. None of them made any difference - missing characters were still present.
If I open the HTML file with Notepad++, here's what I can see:
http://i.imgur.com/GoS07WX.png
If I select and copy-paste this text into regular MS Notepad, I get this:
本函旨在邀請您參劦nbsp;定於
So you can see here that the "xE5 x8A" visible in Notepad++ seems to have been replaced by 劦.
Is there any reason why the browser would be showing �� instead of 劦 in this scenario?
Look again at the HTML file.
I see the first 2 bytes of a character encoded in UTF-8, followed by ... let's imagine there was originally a \xA0, and this was mutated to &nbsp; when the file was created by applying global substitutions to the UTF-8-encoded data.
However, \xE5\x8A\xA0 UTF-8 decodes to U+52A0 which is not the same as the alien character which is U+52A6 ... not close enough to an answer.
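The byte arithmetic in this answer can be checked with a few lines of Python. This is just a sketch of the decoding step; the "truncated" byte string is my reconstruction of what the answer imagines is in the file, not something taken from the actual file.

```python
# The three bytes the answer imagines were originally in the file.
full = b"\xe5\x8a\xa0"
char = full.decode("utf-8")
codepoint = f"U+{ord(char):04X}"  # code point of the decoded character

# The same sequence with the last byte replaced by the text "&nbsp;":
# the two remaining bytes no longer form a complete UTF-8 character.
truncated = b"\xe5\x8a&nbsp;"
lenient = truncated.decode("utf-8", errors="replace")
has_replacement = "\ufffd" in lenient  # U+FFFD is the replacement character

print(codepoint, has_replacement)
```

A lenient decoder substitutes U+FFFD for the incomplete sequence, which would explain why the browser shows replacement characters where the last byte went missing.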

HTML Character Encoding tag do not work for me

Hello to everyone and sorry for my novice question.
I have an HTML document in which I would like to put some Greek Characters.
I have used the following:
<head>
<meta charset="UTF-8">
</head>
but the result is a sequence of such characters "Ξ Ξ±Ξ½Ξ±Ξ³ΞΉΟΟΞ·Ο ΞΞ­ΟΞ²Ξ±Ο"
Could you please advise a solution?
Thanks in advance and cheers to everyone!!
Maybe your file encoding is ANSI. You can change the file encoding in Notepad++ from the Encoding menu.
The recipe is:
Make sure you are writing your HTML code in the UTF-8/UTF-16 character encoding, according to the configuration of your HTML editor, Notepad, vim or whatever.
Set the value of the meta HTML tag to exactly the character encoding in which you have saved your HTML file.
This is optional, but highly recommended to ensure that the web server recognizes your HTML encoding and to avoid mistranslations.
To sum up: you have to be sure that the encoding in which you save the file containing your HTML code is the same character encoding you specify in the meta HTML tag. Otherwise the client web browser will decode the characters according to the meta tag and will probably render them unreadable.
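The mismatch described here can be reproduced in Python. This is only a sketch: the sample string is made up, and Windows-1253 (cp1253) as the wrongly applied encoding is my guess based on what the garbled text in the question looks like.

```python
original = "Πανα"  # a few Greek letters, as typed in the editor

# The editor saves the text as UTF-8 ...
utf8_bytes = original.encode("utf-8")

# ... but the bytes get decoded with a single-byte Greek encoding instead
# (assumed here: Windows-1253). Every letter turns into two characters.
garbled = utf8_bytes.decode("cp1253")

# Decoding with the encoding the file was really saved in recovers the text.
recovered = garbled.encode("cp1253").decode("utf-8")
```

Here garbled is twice as long as original and begins with "Ξ", much like the output in the question, while recovered equals the original text.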

Spanish characters are not displayed by browser

I need to support the Spanish language on a website which I am developing. I have created an XML file which contains the English text and the corresponding Spanish text. I read this XML file according to the user's choice (a language dropdown). Everything works fine except that there are some Spanish characters which the browser is not able to display properly.
The content of XML file is:
<Spanish>
<title></title>
<loginBoxHeader1>Login Panel -</loginBoxHeader1>
<loginBoxHeader2>Por favor, proporcione las credenciales siguientes!</loginBoxHeader2>
<username>Nombre de Usuario:</username>
<password>Contraseña:</password>
<LoginBtn>iniciar la sesión</LoginBtn>
<RememberCheckbox>Recordar mi usuario en este equipo</RememberCheckbox>
</Spanish>
The characters ñ and ó are not displayed in the browser. I set the encoding of the XML file to ISO-8859-1. I also added the following meta tag to the HTML page:
<meta http-equiv="Content-Type" content="text/html; charset=ISO 8859-1">
I also tried with UTF-8 encoding but problem persists. Any thoughts?
Thanks.
have you tried google fonts? http://www.google.com/fonts Click on "New to Google Fonts?" and follow the steps. I picked 1, just "Normal 400", followed the steps. I had the same problem, my spanish characters would show up as black diamonds with question marks on them.
Maybe the font doesn't support the special characters? I have had the same problem before, and it was just because the font didn't have the symbols in it.
Changed the font and problem solved.
Your page is being decoded as UTF-8 (but should be decoded as ISO...). The reason can be:
the page file itself is UTF-8 encoded (in that case meta tags are ignored)
the web server's Content-Type header says UTF-8 (and should be changed to ISO...)
Or go the other way and save the XML as UTF-8, using for example Notepad++.
As #pawlakppp mentioned, have you checked the encoding of your XML file?
Try this:
Open the XML file in an editor like Notepad++.
Go to the 'Encoding' menu.
Click on 'Encoding with UTF-8' or 'Encoding with UTF-8 without BOM'.
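To see why re-saving the file matters, here is a small Python sketch of what happens when ISO-8859-1 bytes are decoded as UTF-8. It uses one word from the XML in the question; the variable names are made up for illustration.

```python
word = "Contraseña"

# Bytes as stored in a file saved as ISO-8859-1: "ñ" is the single byte 0xF1.
latin1_bytes = word.encode("iso-8859-1")

# Decoding those bytes as UTF-8 hits an invalid sequence at 0xF1, and a
# lenient decoder substitutes U+FFFD, the replacement character.
shown = latin1_bytes.decode("utf-8", errors="replace")

# Saving the text as UTF-8 instead makes the bytes match a UTF-8 declaration.
utf8_bytes = word.encode("utf-8")
fixed = utf8_bytes.decode("utf-8")
```

U+FFFD is typically drawn as a black diamond with a question mark, which matches the symptom mentioned in one of the answers above.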

Why do symbols like apostrophes and hyphens get replaced with black diamonds on my website?

A website I've made has a few problems... On one of the pages, wherever there's an apostrophe (') or a dash (-), the symbol gets replaced with a weird black diamond with a question mark in the center of it
Here's what I mean
It seems this is happening all over the site wherever these symbols appear. I've never seen this before, can anyone explain it to me?
Suggestions on how to fix it would also be greatly appreciated.
See http://test.rfinvestments.co.za/index.php?c=team for a clear look at the problem.
It's an encoding problem. You have to set the correct encoding in the HTML head via meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
Replace "ISO-8859-1" with whatever your encoding is (e.g. 'UTF-8'). You must find out what encoding your HTML files are in. If you're on a Unix system, just type file file.html and it should show you the encoding. If this is not possible, you should be able to find out somewhere what encoding your editor produces.
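If the file command is not available, a very rough check can be sketched in Python. This is not how file works internally; guess_encoding is a hypothetical helper that can only tell ASCII and valid UTF-8 apart from "something else".

```python
def guess_encoding(data: bytes) -> str:
    """Very rough guess at how the given bytes are encoded."""
    # A UTF-8 byte order mark at the start is a strong hint.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8 (with BOM)"
    # Pure ASCII decodes the same way in UTF-8, ISO-8859-1 and Windows-1252.
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    # Non-ASCII bytes that form valid UTF-8 are very likely UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown 8-bit encoding (not valid UTF-8)"
```

It could be used as guess_encoding(open("file.html", "rb").read()). A single-byte encoding cannot be identified reliably this way, so the last result only says what the file is not.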
You need to change your text to 'Plain text' before pasting it into the HTML document. This looks like an error I've had before when pasting straight from MS Word.
MS Word and other rich text editors often place hidden or invalid characters into your code. Try using &mdash; for your dashes, or &rsquo; for apostrophes (etc.), so that you don't have to rely on your character encoding.
I had the same issue in my ASP.NET web application, and solved it with the help of this link.
I just replaced ' with &rsquo; as shown below, and my site now shows the apostrophe in the browser without the rectangle around it described in the question.
Original text in the HTML page:
Click the Edit button to change a field's label, width and type-ahead options
Replacement text in the HTML page:
Click the Edit button to change a field&rsquo;s label, width and type-ahead options
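Replacing such characters by hand gets tedious, so here is a sketch of doing it in bulk with Python's built-in xmlcharrefreplace error handler. Note that it produces numeric references (such as &#8217;) rather than named entities, and the sample text is just the sentence from this answer with a typographic apostrophe.

```python
text = "Click the Edit button to change a field\u2019s label"

# Encode to ASCII, turning every non-ASCII character into a numeric
# character reference instead of raising an error.
ascii_safe = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(ascii_safe)
```

The result contains only ASCII characters, so it displays the same way whatever encoding the page is served in.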
Look at your actual HTML code and check that the weird symbols are not originating there. This issue came up when I switched to Notepad++ halfway through a project I had started in Notepad. It seems to me that the older version of Notepad I was using may have used a different encoding from Notepad++'s UTF-8 encoding. After I transferred my code from Notepad to Notepad++, the apostrophes got replaced with weird symbols, so I simply had to remove the symbols from my Notepad++ code.
If you are editing HTML in Notepad, you should use "Save As" and change the default "Encoding:" selection at the bottom of the dialog to UTF-8.
You should also include:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
This unambiguously sets the correct character set and informs the browser.
I experienced the same problem when I copied a text that has an apostrophe from a Word document to my HTML code.
To resolve the issue, all I did was delete the particular word in my HTML and type it directly, including the apostrophe. This undid the original copy and paste action, and the newly typed apostrophe displayed correctly.
What I really don't understand with this kind of problem is that the html page I ran as a local file displayed perfectly in Chromium browser, but as soon as I uploaded it to my website, it produced this error.
Even stranger, it displayed perfectly in the Vivaldi browser whether displayed from the local or remote file.
Is this something to do with the way Chromium reads the character set? But why only with a remote file?
I fixed the problem by retyping the text in a simple text editor and making sure the single quote mark was the one I used.

Displaying unicode symbols in HTML

I want to simply display the tick (✔) and cross (✘) symbols in a HTML page but it shows up as either a box or goop ✔ - obviously something to do with the encoding.
I have set the meta tag to show utf-8 but obviously I'm missing something.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Edit/Solution: From comments made, using FireBug I found the headers being passed by my page were in fact "Content-Type: text/html" and not UTF-8. Looking at the file format using Notepad++ showed my file was formatted as "UTF-8 without BOM". Changing this to just UTF-8 the symbols now show correctly... but firebug still seems to indicate the same content-type.
You should ensure the HTTP server headers are correct.
In particular, the header:
Content-Type: text/html; charset=utf-8
should be present.
The meta tag is ignored by browsers if the HTTP header is present.
Also ensure that your file is actually encoded as UTF-8 before serving it. Check or try the following:
Ensure your editor saves it as UTF-8.
Ensure your FTP client or any other file transfer program does not mess with the file.
Try with HTML encoded entities, like &#uuu;.
To be really sure, hexdump the file and look at the character; for the ✔, it should be E2 9C 94.
Note: If you use a Unicode character for which your system can't find a glyph (no font with that character), your browser should display a question mark or some block-like symbol. But if you see multiple Roman characters, as you do, this indicates an encoding problem.
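Both the expected bytes and the "multiple characters" symptom can be reproduced with a short Python sketch. The choice of Windows-1252 as the wrongly applied encoding is an assumption on my part; the question does not say which encoding the browser fell back to.

```python
tick = "\u2714"  # HEAVY CHECK MARK, the tick from the question

# The bytes a correctly saved UTF-8 file should contain for this character.
utf8_bytes = tick.encode("utf-8")
hex_dump = " ".join(f"{b:02X}" for b in utf8_bytes)

# What comes out if those bytes are decoded as Windows-1252 instead:
# several Latin-looking characters rather than one symbol.
misread = utf8_bytes.decode("cp1252")

print(hex_dump, len(misread))
```

One symbol turning into three characters is the signature of UTF-8 bytes being read with a single-byte encoding, as opposed to a missing glyph.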
I know an answer has already been accepted, but wanted to point a few things out.
Setting the content-type and charset is obviously good practice, and doing it on the server is much better, because it ensures consistency across your application.
However, I would use UTF-8 only when the language of my application uses a lot of characters that are available only in the UTF-8 charset. If you want to show a Unicode character or symbol in one-off cases, you can do so without changing the charset of your page.
HTML renderers have always been able to display symbols which are not part of the encoding character set of the page, as long as you write the symbol as its numeric character reference (NCR). Sounds weird, but it's true.
So, even if your HTML has a header that states it has an encoding of ANSI or any of the ISO charsets, you can display a check mark by using its HTML character reference, in decimal (&#10003;) or in hex (&#x2713;).
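The decimal and hex forms of a numeric character reference can be derived from the code point. A small Python sketch, using the check mark U+2713 from the reference page linked in this answer:

```python
import html

check = "\u2713"  # CHECK MARK

# Build the two equivalent numeric character references from the code point.
decimal_ncr = f"&#{ord(check)};"
hex_ncr = f"&#x{ord(check):X};"

# Both references are plain ASCII, so they survive any page encoding,
# and both decode back to the same character.
assert html.unescape(decimal_ncr) == check
assert html.unescape(hex_ncr) == check

print(decimal_ncr, hex_ncr)
```

This prints the two reference strings, which can be pasted into a page regardless of its declared charset.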
So it's a little difficult to understand why you are facing this issue on your pages. Can you check whether the NCR value is correct? This is a good reference: http://www.fileformat.info/info/unicode/char/2713/index.htm
Make sure that you actually save the file as UTF-8; alternatively, use HTML entities (&#nnn;) for the special characters.
Contrary to what Nicolas suggested, the meta tag isn’t actually ignored by browsers. However, the Content-Type HTTP header always takes precedence over a meta tag in the document.
So make sure that you either send the correct encoding via the HTTP header, or don’t send this HTTP header at all (not recommended). The meta tag is mainly a fallback option for local documents which aren’t sent via HTTP traffic.
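The precedence rule can be written down as a tiny Python function. This is a deliberately simplified model: effective_charset is a hypothetical helper, and real browsers apply more rules than this (byte order marks and content sniffing, for example).

```python
from email.message import Message
from typing import Optional


def effective_charset(http_content_type: Optional[str],
                      meta_charset: Optional[str]) -> Optional[str]:
    """Charset from the HTTP header if it names one, else the meta tag's."""
    if http_content_type:
        # Let the standard library parse the header's parameters.
        msg = Message()
        msg["Content-Type"] = http_content_type
        header_charset = msg.get_content_charset()
        if header_charset:
            return header_charset
    # No charset in the header: fall back to the meta tag, if there is one.
    return meta_charset.lower() if meta_charset else None
```

For example, a header of "text/html; charset=ISO-8859-1" wins over a meta tag declaring UTF-8, whereas a bare "text/html" header, like the one found with Firebug in the question, leaves the decision to the meta tag.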
Using HTML entities should also be considered a workaround – that’s tiptoeing around the real problem. Configuring the web server properly prevents a lot of nuisance.
I think this is a file problem: you simply saved your file in a 1-byte encoding like Latin-1. Google your editor and how to set its files to UTF-8.
I wonder why there are editors that don't default to utf-8.