UTF-8 html without BOM displays strange characters - html

I have some HTML which contains some foreign characters (€, ó, á). The HTML document is saved as UTF-8 without BOM. When I view the page in the browser, the foreign characters get replaced with strange character combinations (â‚¬, Ã³, Ã¡). It's only when I save my HTML document as UTF-8 with BOM that the characters display properly.
I'd really rather not have to include a BOM in my files. Has anybody got any idea why this happens, and a way to fix it (other than including a BOM)?

You are probably not specifying the correct character set in your HTML file. The BOM (thanks #Jukka) sends the browser into UTF-8 mode; in its absence, you need to use other means to declare the document as UTF-8.
If you have access to your server configuration, you may want to make sure the server isn't sending the wrong character set info. See e.g. How to change the default encoding to UTF-8 for Apache?
If you have access only to your HTML, adding this meta tag in your document's head should do the trick:
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
or as #Mathias points out, the new HTML 5
<meta charset="utf-8">
(valid only if you use a HTML 5 doctype, against which there is no good argument any more even if you don't use HTML 5 markup.)
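To see why the characters turn into those particular combinations, here is a minimal Python sketch. It assumes the browser falls back to Windows-1252 when no charset is declared (a common default, but not guaranteed), and the helper name misread_utf8 is made up for this illustration.

```python
def misread_utf8(text, wrong_encoding="cp1252"):
    """Encode text as UTF-8 (no BOM), then decode the bytes with the wrong charset.

    Assumption: the browser guesses Windows-1252 when nothing declares UTF-8.
    """
    raw = text.encode("utf-8")  # the bytes actually stored in the file
    return raw.decode(wrong_encoding, errors="replace")


if __name__ == "__main__":
    for ch in ("€", "ó", "á"):
        # Each character becomes two or three unrelated characters.
        print(ch, "->", misread_utf8(ch))
```

Each non-ASCII character takes two or three bytes in UTF-8, and a single-byte charset shows every one of those bytes as a separate character. Declaring UTF-8 in the meta tag or the HTTP header removes the guesswork.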

Insert <meta charset="utf-8"> in <head>.
Or set the header Content-Type: text/html;charset=utf-8 on the server-side.
You can also add this to .htaccess: AddDefaultCharset UTF-8. More info here: http://www.askapache.com/htaccess/setting-charset-in-htaccess.html
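If the pages are served by your own code rather than by Apache, the same header can be set there. The following is only a sketch using Python's standard WSGI interface; the names PAGE and app are invented for the example, and a real application would differ.

```python
# Hypothetical page content; any UTF-8 text would do.
PAGE = (
    "<!DOCTYPE html>\n"
    '<html><head><meta charset="utf-8"><title>Test</title></head>\n'
    "<body>€ ó á</body></html>\n"
)


def app(environ, start_response):
    """Minimal WSGI app that declares UTF-8 in the HTTP Content-Type header."""
    body = PAGE.encode("utf-8")  # plain UTF-8, no BOM
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    # Serve on localhost:8000 for a quick manual check in a browser.
    from wsgiref.simple_server import make_server

    make_server("127.0.0.1", 8000, app).serve_forever()
```

With the header in place the browser no longer has to guess, so the BOM is not needed.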

Related

How to make latin extended work?

I've been googling for some time but can't figure out how to make letters like č, ć, ž, š, đ work. I tried adding <body lang="sr"> because the text actually is Serbian (sr = Serbian), but it doesn't work. I get PoÄetna instead of Početna.
I tried adding <meta charset="ISO-8859-2"> into the head section but still nothing. What am I missing?
Pick a character encoding that supports the characters you want to use. ISO-8859-2 should do the job, but this isn't the 1990s any more. UTF-8 should be the default choice.
Ensure your editor is configured to save in that encoding.
Specify that you are using that encoding with document level meta data: <meta charset="utf-8">
Specify that you are using that encoding in your HTTP response (this takes priority over the document level): Content-Type: text/html;charset=UTF-8.
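The "save in that encoding" step can be checked outside the editor. Below is a small Python sketch, with hypothetical helper names save_html and has_bom, that writes a page as UTF-8 with or without a BOM and inspects the first bytes of the file.

```python
import codecs
from pathlib import Path

# Hypothetical sample page using the Serbian letters from the question.
HTML = (
    "<!DOCTYPE html>\n"
    '<html><head><meta charset="utf-8"><title>Početna</title></head>\n'
    "<body>č ć ž š đ</body></html>\n"
)


def save_html(path, text, bom=False):
    """Write text as UTF-8; 'utf-8-sig' prepends the BOM, 'utf-8' does not."""
    encoding = "utf-8-sig" if bom else "utf-8"
    Path(path).write_bytes(text.encode(encoding))


def has_bom(path):
    """Return True if the file starts with the UTF-8 byte order mark."""
    return Path(path).read_bytes().startswith(codecs.BOM_UTF8)


if __name__ == "__main__":
    save_html("index.html", HTML)
    print("BOM present:", has_bom("index.html"))
```

Either variant works as long as the meta tag and the HTTP header say UTF-8 as well.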

HTML Character Encoding tag do not work for me

Hello to everyone and sorry for my novice question.
I have an HTML document in which I would like to put some Greek Characters.
I have used the following:
<head>
<meta charset="UTF-8">
</head>
but the result is a sequence of such characters "Ξ Ξ±Ξ½Ξ±Ξ³ΞΉΟΟΞ·Ο ΞΞ­ΟΞ²Ξ±Ο"
Could you please advise a solution?
Thanks in advance and cheers to everyone!!
Maybe your file encoding is ANSI. You can change the file encoding in Notepad++ from the Encoding menu.
The recipe is:
Make sure you save your HTML code in UTF-8 (or UTF-16), using the encoding settings of your HTML editor, Notepad, Vim, or whatever you use.
Set the value of the meta tag to exactly the character encoding in which you saved your HTML file.
This step is optional, but strongly recommended, so that your HTML encoding is recognized correctly and mistranslations are avoided.
To sum up: the encoding in which you save the HTML file must be the same encoding you specify in the meta tag. Otherwise the browser will decode the characters according to the meta tag and will probably render them unreadable.
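The "declared encoding must match the saved encoding" rule can be checked mechanically. The sketch below is an illustration only: declared_charset and matches_declaration are invented names, the regular expression is a rough approximation of how a browser scans for the meta tag, and the check only catches mismatches that cause a decoding error (single-byte encodings such as ISO-8859-1 accept any byte sequence, so they never fail).

```python
import re

# Rough pattern for both <meta charset="..."> and the http-equiv form.
META_RE = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE
)


def declared_charset(raw):
    """Return the charset named in a meta tag within the first 1024 bytes, or None."""
    match = META_RE.search(raw[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def matches_declaration(raw):
    """Return True if the bytes decode cleanly with the charset they declare."""
    name = declared_charset(raw)
    if name is None:
        return False
    try:
        raw.decode(name)
    except (UnicodeDecodeError, LookupError):
        return False
    return True


if __name__ == "__main__":
    page = '<meta charset="UTF-8"><p>Ελληνικά</p>'
    print(matches_declaration(page.encode("utf-8")))   # saved as UTF-8
    print(matches_declaration(page.encode("cp1253")))  # saved as Greek "ANSI"
```

A file saved in the Greek ANSI code page but declared as UTF-8 fails this check, which is the situation described in the question.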

How do I display Unicode as text in HTML?

I can't find a way to do this. For example, I want ∞ (the infinity symbol) to display as text in an HTML document.
First check which Content-Type header your server returns. Is it Content-Type: text/html; charset=UTF-8? See Character_encodings_in_HTML. If the server sends a charset, either fix it or use it, because it overrides the encoding declared in the document (see HTML entities).
If your server does not provide charset, then add one in the document, as early as possible (should be in the first 1024 bytes entirely). Again, see Character_encodings_in_HTML. The following header should do:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
or for HTML 5:
<meta charset="utf-8">
or for XHTML (the first line):
<?xml version="1.0" encoding="UTF-8"?>
And if you do not or cannot use UTF-8 for your document, use HTML entities, as C Travel suggests.
You write the character, e.g. “∞”, in your authoring program, save the file as UTF-8 with BOM, and make sure that the fonts that you have declared for the page, or the relevant piece of text, contain the characters(s) you have included. For more information, see my Guide to using special characters in HTML. If problems remain, please post the code you have tried and specify how it fails (and on which browsers).
You can use a numeric character reference of the form &#number; (for example, &#8734; for ∞).
For codes: http://unicode-table.com/en/
And you have to use UTF-8 encoding for the file save, and you have to put UTF-8 meta tag in the header too. (If you didn't already have this.)
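If you go the character-reference route, the numbers do not have to be looked up by hand. This short Python sketch (to_char_refs and from_char_refs are names made up here) converts non-ASCII characters to decimal references and back using only the standard library.

```python
import html


def to_char_refs(text):
    """Replace every non-ASCII character with a decimal reference like &#8734;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_char_refs(markup):
    """Turn numeric and named references back into characters."""
    return html.unescape(markup)


if __name__ == "__main__":
    print(to_char_refs("∞"))
    print(from_char_refs("&infin; and &#8734;"))
```

The result is pure ASCII, so it displays the same regardless of which encoding the browser assumes.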

Why does a diamond with a questionmark in it � appear in my HTML?

I have an unordered list, and � often (but not always!) appears where I have two spaces between characters. What is causing this, and how do I prevent it?
This specific character � is usually the sign of an invalid (non-UTF-8) character showing up in an output (like a page) that has been declared to be UTF-8. It happens often when
a database connection is not UTF-8 encoded (even if the tables are)
a HTML or script source file is stored in the wrong encoding (e.g. Windows-1252 instead of UTF-8) - make sure it's saved as a UTF-8 file. The setting is often in the "Save as..." dialog.
an online source (like a widget or a RSS feed) is fetched that isn't serving UTF-8
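The effect is easy to reproduce. In this Python sketch (show_as_utf8 is a hypothetical helper), bytes written in a legacy single-byte encoding are decoded as UTF-8 with invalid sequences replaced, which is roughly what a browser does with a page declared as UTF-8. The non-breaking space in the example is a guess at what the "two spaces" in the question might contain.

```python
def show_as_utf8(raw):
    """Decode bytes as UTF-8, substituting U+FFFD for every invalid sequence."""
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # A bullet saved as Windows-1252 is the single byte 0x95, invalid in UTF-8.
    print(show_as_utf8("•".encode("cp1252")))
    # A non-breaking space saved as Latin-1 is the single byte 0xA0, also invalid.
    print(show_as_utf8("a\u00a0b".encode("latin-1")))
    # The same bullet saved as UTF-8 decodes correctly.
    print(show_as_utf8("•".encode("utf-8")))
```

Plain ASCII text survives either way, which is why only some characters on a page are affected.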
I had the same issue ....
You can fix it by adding the following line in your template !
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
It's a character-set issue. Get a tool that inspects the response headers of the server (like the Firebug extension if you're using Mozilla Firefox) to see what character set the server response is sending with the content. If the server's character-set and the HTML character set of the actual content don't match up, you will see some strange looking characters like those little black diamond squares.
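Once you have copied the Content-Type value from the response headers, the charset can also be extracted programmatically. A minimal Python sketch follows, assuming the header value is already available as a string; charset_from_content_type is an invented helper name.

```python
from email.message import Message


def charset_from_content_type(value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


if __name__ == "__main__":
    print(charset_from_content_type("text/html; charset=UTF-8"))
    print(charset_from_content_type("text/html"))
```

Compare the result with the encoding named in the page's meta tag and with the encoding the file was actually saved in; all three should agree.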
I had the same issue when getting an HTML output from an XSLT. Along with Pradip's solution I was also able to resolve the issue using UTF-32.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-32" />

Characters not displaying correctly in different browsers

I used certain characters in website such as • — “ ” ‘ ’ º ©.
I found that when testing to see what my website looked like under different browsers (BrowserLab)
the afore-mentioned characters are replaced with �.
I then changed the charset in the webpage header from:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
to
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Suddenly all the pages have the above mentioned characters replaced with a ?.
Even more puzzling is this is not always consistent across and even within the same page, as some sections display the character • and © correctly.
In particular, I need to replace the character • with one that will display across browsers, can anyone help me with the answer? Thanks.
You should save your HTML source as UTF-8.
Alternatively, you can use HTML entities instead.
The source code needs to be saved in the same encoding as you're instructing the browser to parse it in. If you're saving your files in UTF-8, instruct the browser to parse it as UTF-8 by setting an appropriate HTTP header or HTML meta tag (headers preferable, your web server may be setting one without you knowing). Use a decent editor that clearly tells you what encoding you're saving the file as. If it doesn't display correctly, there's a discrepancy between what you're telling your browser the file is encoded in and what it's really encoded in.
Check to see if Apache is setup to send the charset. Look for the directive "AddDefaultCharset" and set it to Off in .htaccess or your config file.
Most/all browsers will take what is sent in the HTTP headers over what is in the document.
If you're using Notepad++, I suggest using the Edit Plus editor instead to copy the text (which has the special characters) and paste it into your file. This should work.
Yes, I had this problem too in Notepad++: copying and pasting wasn't working with some symbols.
I think SLaks is right.
The HTML entity for the copyright symbol is &#169; (or &copy;).