I need to support the Spanish language on a website I am developing. I have created an XML file that contains the English text and the corresponding Spanish text, and I read this XML file based on the user's choice (a language dropdown). Everything works fine except that the browser cannot display some Spanish characters properly.
The content of the XML file is:
<Spanish>
<title></title>
<loginBoxHeader1>Login Panel -</loginBoxHeader1>
<loginBoxHeader2>Por favor, proporcione las credenciales siguientes!</loginBoxHeader2>
<username>Nombre de Usuario:</username>
<password>Contraseña:</password>
<LoginBtn>iniciar la sesión</LoginBtn>
<RememberCheckbox>Recordar mi usuario en este equipo</RememberCheckbox>
</Spanish>
The characters ñ and ó are not displayed in the browser. I set the encoding of the XML file to ISO-8859-1, and I also added the following meta tag to the HTML page:
<meta http-equiv="Content-Type" content="text/html; charset=ISO 8859-1">
I also tried UTF-8 encoding, but the problem persists. Any thoughts?
Thanks.
Have you tried Google Fonts? http://www.google.com/fonts Click "New to Google Fonts?" and follow the steps. I picked one, just "Normal 400", and followed the steps. I had the same problem: my Spanish characters showed up as black diamonds with question marks in them.
Maybe the font doesn't support the special characters? I have had the same problem before, and it was just because the font didn't include those symbols.
I changed the font and the problem was solved.
Your page is being decoded as UTF-8 (but it should be decoded as ISO-8859-1). Possible reasons:
the page file itself is UTF-8 encoded (if it starts with a UTF-8 byte order mark, the meta tag is ignored)
the web server's Content-Type header specifies UTF-8 (and should be changed to ISO-8859-1)
Or go the other way and save the XML as UTF-8, for example using Notepad++.
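A quick way to see why the XML file's declared encoding has to match the bytes actually saved is to parse the same element both ways. This is a minimal sketch in Python using the standard library's ElementTree; Python is an assumption here, since the question doesn't say what reads the XML.

```python
import xml.etree.ElementTree as ET

# The <password> element from the question, saved as ISO-8859-1 bytes.
# In that encoding, ñ is the single byte 0xF1.
latin1_body = "<password>Contraseña:</password>".encode("iso-8859-1")

# With a matching XML declaration, the parser decodes the bytes correctly.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_body
print(ET.fromstring(declared).text)  # Contraseña:

# Without a declaration, the parser assumes UTF-8, and 0xF1 followed by
# "a" is not valid UTF-8, so parsing fails.
try:
    ET.fromstring(latin1_body)
except ET.ParseError as err:
    print("ParseError:", err)

# Saving the file as UTF-8 (and declaring it) avoids the mismatch.
utf8_doc = (b'<?xml version="1.0" encoding="UTF-8"?>'
            + "<password>Contraseña:</password>".encode("utf-8"))
print(ET.fromstring(utf8_doc).text)  # Contraseña:
```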
As @pawlakppp mentioned, have you checked the encoding of your XML file?
Try this:
Open the XML file in an editor like Notepad++.
Go to the 'Encoding' menu.
Click 'Encoding with UTF-8' or 'Encoding with UTF-8 without BOM'.
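The difference between the two UTF-8 options is only the three-byte byte order mark (BOM) at the start of the file. A small Python sketch, not part of the answer, shows it with the standard "utf-8-sig" and "utf-8" codecs:

```python
import codecs

text = "<Spanish><password>Contraseña:</password></Spanish>"

with_bom = text.encode("utf-8-sig")   # "UTF-8" with a byte order mark
without_bom = text.encode("utf-8")    # "UTF-8 without BOM"

print(with_bom[:3].hex(" "))   # ef bb bf
print(without_bom[:3])         # b'<Sp'

# Apart from the 3-byte marker, the bytes are identical.
assert with_bom.startswith(codecs.BOM_UTF8)
assert with_bom[3:] == without_bom
# A decoder that doesn't expect a BOM keeps it as an invisible U+FEFF.
assert with_bom.decode("utf-8") == "\ufeff" + text
```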
Related
Today I started my first HTML page. Where exactly is the page encoding stored?
At first, é turned into Ã©. Then I used my text editor to save the file with a specific encoding. "UTF-8" didn't work. Then I used "ISO 8859-1", which did work. How did my browser know it was encoded with "ISO 8859-1"?
I can't see it anywhere in my file, so I'm very curious about where the info is stored.
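The encoding is a property of how the bytes in the file itself were written; apart from an optional byte order mark (BOM) at the start of a UTF-8 or UTF-16 file, nothing in the file records it. Notepad++ and similar programs usually provide a number of options to view and change it.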
Additionally, you can declare the encoding with a meta tag:
<meta charset="UTF-8"> (HTML5)
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"> (HTML4)
Browsers use these tags to decide how to decode your file. However, the tags do not define the encoding of the file itself: if your file is saved in encoding A and the browser is told to read it as encoding B, the text comes out garbled, which seems to be what is happening in your case. Browsers can also ignore the tag, for example when the server sends a different charset in the HTTP Content-Type header.
The default encoding can also be defined (and overridden) by your server. A sample .htaccess encoding configuration:
AddDefaultCharset utf-8
AddType 'text/html; charset=utf-8' .html .htm .shtml
UTF-8 is the recommended encoding standard for the web.
The UTF-8 encoding for é is the two hex bytes C3A9.
C3 A9, when interpreted as ISO 8859-1, is two characters: Ã©.
Browsers usually guess the encoding correctly, or you can tell the browser explicitly how to interpret the bytes. Try that out: you will probably see the text change between é and Ã©.
A third case is when "double encoding" occurs: somehow the Ã© is itself encoded as UTF-8 again, giving hex C383 C2A9.
So, to be really sure of what is going on, you need to look at the hex bytes.
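The three cases above can be reproduced in a few lines of Python; this is an illustration added here, not code from the answer:

```python
text = "é"

utf8_bytes = text.encode("utf-8")
print(utf8_bytes.hex(" ").upper())   # C3 A9

# The same two bytes read as ISO 8859-1 become two characters.
mojibake = utf8_bytes.decode("iso-8859-1")
print(mojibake)                      # Ã©

# Double encoding: the mis-decoded text gets encoded as UTF-8 again.
double = mojibake.encode("utf-8")
print(double.hex(" ").upper())       # C3 83 C2 A9

# Single-level mojibake can be undone by reversing the wrong step.
print(mojibake.encode("iso-8859-1").decode("utf-8"))   # é
```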
I have an HTML file which contains Chinese text. When I open the file in any web browser, some characters appear to be missing.
Here's an example copied from the browser window:
本函旨在邀請您參�� 定於
I know for a fact that all other characters seen here are correct aside from the missing ones (confirmed by a native Chinese speaker).
In the HTML head, I have a tag which declares that the file contains UTF-8 encoded characters:
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
I've already tried some other charsets in this META tag, but so far it seems any encoding method I try aside from UTF-8 ends up looking worse.
I also considered the possibility that it is a font issue, so I installed 3 different traditional Chinese fonts on my system and forced Chrome to use them. None of them made any difference - missing characters were still present.
If I open the HTML file with Notepad++, here's what I can see:
http://i.imgur.com/GoS07WX.png
If I select and copy-paste this text into regular MS Notepad, I get this:
本函旨在邀請您參劦nbsp;定於
So you can see here that the "xE5 x8A" visible in Notepad++ seems to have been replaced by 劦.
Is there any reason why the browser would be showing �� instead of 劦 in this scenario?
Look again at the HTML file.
I see the first 2 bytes of a character encoded in UTF-8, followed by ... let's imagine there was originally a \xA0, and this was mutated to &nbsp; when the file was created by applying global substitutions to the UTF-8-encoded data.
However, \xE5\x8A\xA0 decodes (as UTF-8) to U+52A0 (加), which is not the same as the alien character, U+52A6 ... not close enough to an answer.
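The substitution hypothesis above can be checked with a short Python sketch. It assumes, as the answer does, that the lost character was U+52A0 and that a global substitution replaced every 0xA0 byte with &nbsp;; the sample text 參加 is a hypothetical reconstruction.

```python
# Hypothetical reconstruction: "參加" ("participate") as UTF-8 bytes.
original = "參加".encode("utf-8")
print(original.hex(" "))        # e5 8f 83 e5 8a a0

# A global substitution meant for Latin-1 non-breaking spaces (byte 0xA0)
# also rewrites the last UTF-8 continuation byte of 加.
corrupted = original.replace(b"\xa0", b"&nbsp;")
print(corrupted)                # b'\xe5\x8f\x83\xe5\x8a&nbsp;'

# E5 8A is now an incomplete sequence; a lenient decoder shows U+FFFD.
print(corrupted.decode("utf-8", errors="replace"))

# For comparison, the intact three bytes decode to U+52A0.
print(hex(ord(b"\xe5\x8a\xa0".decode("utf-8"))))   # 0x52a0
```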
OK, I think the title sums up the question nicely. Basically, I've written a help file in HTML on my Windows machine, so it includes characters like the following:
®, ’, ”, …
Obviously it displays fine on Windows, but when I copy the file to my Mac and view it, the characters above turn into gibberish and look foreign. I could retype them on my Mac and save the file, but I'm worried that I need to do something to prevent the same thing from happening on other computers and environments.
If anybody knows how I can stop this from happening, as easily as possible, I'd be grateful. Thanks in advance...
Make sure your HTML file is saved as UTF-8 and use the UTF-8 meta tag:
To save a file as UTF-8, open it in Notepad, choose "Save As", and make sure Encoding is set to UTF-8.
To add the UTF-8 meta tag to your HTML file, add the following line to the "head" section: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
UTF-8 is designed for backward compatibility with ASCII and to avoid the complications of endianness and byte order marks in UTF-16 and UTF-32. See: Wikipedia
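A small Python illustration of the ASCII-compatibility point; the sample string is arbitrary:

```python
import codecs

text = "Login Panel"

# Pure ASCII text has exactly the same bytes in ASCII and UTF-8.
assert text.encode("ascii") == text.encode("utf-8")

# UTF-16 needs two bytes per character here, plus a byte order mark
# (Python's "utf-16" codec writes one) to record the endianness.
utf16 = text.encode("utf-16")
print(utf16[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))   # True
print(len(text.encode("utf-8")), len(utf16))                     # 11 24
```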
My guess is that this is due either to the file encoding (maybe one machine uses UTF-8 and the other ISO-8859-1) or to differences between editors. Try pasting the code into Notepad or WordPad on the Windows machine, then sending that code to the Mac.
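To see how the same bytes turn into gibberish under a different encoding, here is a Python sketch. It assumes the Windows editor saved the file as windows-1252 (cp1252) and that the help file contained a curly apostrophe; both are assumptions about the asker's setup, and the sample text is hypothetical.

```python
# Hypothetical help-file text; the curly apostrophe is an assumption.
text = "Acme® Help: it’s easy"

windows_bytes = text.encode("cp1252")   # ® -> 0xAE, ’ -> 0x92
print(windows_bytes)                    # b'Acme\xae Help: it\x92s easy'

# Same codec on the way back in: the text survives.
assert windows_bytes.decode("cp1252") == text

# Read as UTF-8: each of those bytes is invalid on its own.
print(windows_bytes.decode("utf-8", errors="replace"))  # Acme� Help: it�s easy

# Read as Mac OS Roman: no error, but different characters appear.
print(windows_bytes.decode("mac_roman"))
```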
You can save it as Unicode and add the meta tag, as John Riche said, or replace the characters with their HTML entities:
® = &reg;
http://www.w3schools.com/tags/ref_entities.asp
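If you go the entity route, you don't have to look up each code by hand. This Python sketch, an addition rather than part of the answer, uses the standard "xmlcharrefreplace" error handler to turn every non-ASCII character into a numeric character reference, which survives any ASCII-compatible encoding:

```python
import html

text = "Acme® Help © • “quoted”"

# Replace every non-ASCII character with a numeric character reference.
ascii_safe = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(ascii_safe)
# Acme&#174; Help &#169; &#8226; &#8220;quoted&#8221;

# A browser (or html.unescape) turns the references back into characters.
assert html.unescape(ascii_safe) == text
```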
I used certain characters on my website, such as • — “ ” ‘ ’ º ©.
When I tested what my website looked like in different browsers (using BrowserLab), I found that the aforementioned characters were replaced with �.
I then changed the charset in the webpage header from:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
to
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Suddenly, on all the pages, the characters mentioned above are replaced with a ?.
Even more puzzling, this is not consistent across pages, or even within the same page: some sections display • and © correctly.
In particular, I need to replace the character • with one that will display in all browsers. Can anyone help? Thanks.
You should save your HTML source as UTF-8.
Alternatively, you can use HTML entities.
The source code needs to be saved in the same encoding as you're instructing the browser to parse it in. If you're saving your files in UTF-8, instruct the browser to parse it as UTF-8 by setting an appropriate HTTP header or HTML meta tag (headers preferable, your web server may be setting one without you knowing). Use a decent editor that clearly tells you what encoding you're saving the file as. If it doesn't display correctly, there's a discrepancy between what you're telling your browser the file is encoded in and what it's really encoded in.
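The • case from the question illustrates the mismatch described above. This Python sketch assumes the pages were saved as windows-1252 before the meta tag was switched to UTF-8:

```python
bullet = "•"

cp1252_bytes = bullet.encode("windows-1252")
utf8_bytes = bullet.encode("utf-8")
print(cp1252_bytes, utf8_bytes)     # b'\x95' b'\xe2\x80\xa2'

# Saved as windows-1252, declared as UTF-8: 0x95 is invalid UTF-8.
print(cp1252_bytes.decode("utf-8", errors="replace"))   # �

# Saved as UTF-8, declared as windows-1252: three wrong characters.
print(utf8_bytes.decode("windows-1252"))                 # â€¢

# Saved and declared consistently, either way works.
assert cp1252_bytes.decode("windows-1252") == bullet
assert utf8_bytes.decode("utf-8") == bullet
```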
Check whether Apache is set up to send a charset. Look for the "AddDefaultCharset" directive and set it to Off in .htaccess or your config file.
Most/all browsers will take what is sent in the HTTP headers over what is in the document.
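The precedence order can be sketched as a function. This is a simplified Python model of how browsers pick the encoding: a byte order mark wins, then the charset in the HTTP Content-Type header, then a meta tag near the start of the file, then a fallback. Real browsers do more (for example, locale-dependent defaults and content sniffing); the windows-1252 fallback is an assumption chosen for illustration, and `effective_charset` and `header_charset` are hypothetical helpers.

```python
import codecs
import re
from email.message import Message
from typing import Optional

# Byte order marks and the encodings they identify (checked first).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
]

# Rough match for both <meta charset="..."> and the http-equiv form.
META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([\w-]+)", re.IGNORECASE)


def header_charset(content_type: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, or None if absent


def effective_charset(body: bytes, content_type: Optional[str],
                      default: str = "windows-1252") -> str:
    """Simplified model: BOM, then HTTP header, then <meta>, then default."""
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    from_header = header_charset(content_type)
    if from_header:
        return from_header
    # Only the start of the document is pre-scanned for a <meta> charset.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return default


page = b'<meta charset="UTF-8"><p>\x95 item</p>'
print(effective_charset(page, "text/html; charset=ISO-8859-1"))  # iso-8859-1
print(effective_charset(page, "text/html"))                      # utf-8
print(effective_charset(codecs.BOM_UTF8 + page,
                        "text/html; charset=ISO-8859-1"))        # utf-8
```

This is why an Apache AddDefaultCharset setting can silently override the meta tag you edited.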
If you're using Notepad++, I suggest using the EditPlus editor to copy the text (which has the special characters) and paste it into your file. This should work.
Yes, I had this problem too: in Notepad++, copying and pasting didn't work with some symbols.
I think SLaks is right
HTML entities for the copyright symbol ©: &copy; or &#169;
I am adding some Chinese text to a primarily English web page and am having trouble getting the characters to display properly. I've got the encoding set to UTF-8 in the meta content type tag, and I am copying/pasting the Chinese I was sent from a Word document. The text is still rendering as follows:
繁體中文版
rather than in Chinese characters:
繁體中文版
I'm sure it's an easy fix, but I'm lost as to how to make this happen.
Thanks very much for any help.
Just because the meta tag says the encoding is UTF-8 doesn't mean that the content (the file) itself is UTF-8. If you have a file index.html, the file itself must be saved in UTF-8.
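One way to check whether a file really is UTF-8, whatever its meta tag says, is to decode its raw bytes strictly and report where decoding fails. A Python sketch; `first_invalid_utf8_offset` and `check_file` are hypothetical helper names:

```python
from pathlib import Path
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def check_file(path: str) -> str:
    """Describe whether a file's raw bytes are valid UTF-8."""
    data = Path(path).read_bytes()
    offset = first_invalid_utf8_offset(data)
    if offset is None:
        return f"{path}: valid UTF-8"
    # Show a few bytes around the problem, since the hex tells the story.
    context = data[max(0, offset - 8): offset + 8]
    return f"{path}: not UTF-8 at byte {offset}: {context.hex(' ')}"
```

For example, `check_file("index.html")` on a page saved in a legacy encoding points at the first offending byte.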
To change the encoding of a file on Linux, you can use this command:
iconv --from-code=ISO-8859-1 --to-code=UTF-8 ./index.html > ./newIndex.html
But I guess you are working on Windows, and the only way I know to change the encoding on Windows is Notepad++.
Hope this helps
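On Windows, Python (if it is installed) can do the same conversion as the iconv command above. A minimal sketch; `transcode` is a hypothetical helper, and its default encodings simply mirror the iconv example:

```python
from pathlib import Path


def transcode(src: str, dst: str,
              from_encoding: str = "iso-8859-1",
              to_encoding: str = "utf-8") -> None:
    """Re-encode a text file, like iconv --from-code=... --to-code=..."""
    # Work on raw bytes so line endings are left untouched.
    # Note: ISO-8859-1 maps every byte, so decoding from it never raises;
    # a wrong source encoding shows up as wrong characters instead.
    text = Path(src).read_bytes().decode(from_encoding)
    Path(dst).write_bytes(text.encode(to_encoding))


# Equivalent of:
#   iconv --from-code=ISO-8859-1 --to-code=UTF-8 ./index.html > ./newIndex.html
# transcode("index.html", "newIndex.html")
```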