HTML UTF-8 encoding - html

When I save an HTML file in an editor (Notepad++) as "UTF-8" encoded, the charset in the meta tag (ISO-8859-2) seems to be ignored by the browser: the charset is always set to "utf-8", no matter which encoding I set in the meta tag.
What's more interesting, when I save the document as an "ANSI" encoded file, changing the charset in the tag works.
Can you please explain this behaviour?

You can use META. Like this:
<head>
<meta charset="utf-8">
</head>

"UTF-8" in Notepad++ really means "UTF-8 with BOM". The leading BOM very likely triggers UTF-8, regardless of what anything else is saying, since no other document should start with that particular byte sequence. Try saving as "UTF-8 without BOM" to see the difference.
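To make the BOM explanation concrete, here is a minimal Python sketch (Python is not used in the original thread, and the helper name has_utf8_bom is made up for illustration). It checks whether a file's bytes start with the three-byte UTF-8 BOM, EF BB BF, which is what the browser sees before it ever reaches the meta tag.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

markup = '<meta charset="ISO-8859-2">'

# Python's "utf-8-sig" codec prepends the BOM, comparable to "UTF-8 with BOM".
with_bom = markup.encode("utf-8-sig")
# The plain "utf-8" codec does not, comparable to "UTF-8 without BOM".
without_bom = markup.encode("utf-8")

print(has_utf8_bom(with_bom))     # True
print(has_utf8_bom(without_bom))  # False
```

Running the same check on the bytes of a saved file (read in binary mode) should show which of the two Notepad++ options was actually used.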

Related

Utf-8 does not work in php file

I use Notepad++ for coding.
I have a test.php file which is encoded as UTF-8 without BOM. I have set the charset in the head as
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
However, if I open the file in a browser, special characters like "äüö" are not displayed correctly. If I go to Page Info in Firefox I get
Coding: windows-1252
Content-Type text/html; charset=utf-8
Why is the coding wrong? How can I change it?
You should use notepad.exe, which comes with Windows: click the Save As option and change the encoding there from ANSI to UTF-8. Save it and check.
I had similar issues.
Hope this helped
what about utf-8 without bom?
It happens with certain text editors. Try notepad++ or similar ones.

HTML5 Encoding & Cyrillic

Something that made me curious - supposedly the default character encoding in HTML5 is UTF-8. However if I have a plain simple HTML file with an HTML5 doctype like the code below, I get:
"hello" in Russian: "ЗдраÑтвуйте"
In Chrome 33+, Safari 6, IE11, etc.
<!DOCTYPE html>
<html>
<head></head>
<body>
<p>"hello" in Russian is "здраствуйте"</p>
</body>
</html>
What gives? Shouldn't the browser utilize the UTF-8 unicode standard and display the text correctly? I'm using Coda which is set to save html files with UTF-8 encoding by default so that's not the problem.
The text data in the example is UTF-8 encoded text misinterpreted as windows-1252 encoded. The reason is that the encoding has not been specified and browsers are forced to make a guess. To fix this, specify the encoding; see the W3C page Character encodings. Two simple ways that work independently of server settings, as long as the server does not send wrong encoding information in HTTP headers:
1) Save the file as UTF-8 with BOM (there is probably an option for this in your authoring program).
2) Add the following tag into the head part:
<meta charset=utf-8>
There is no single default encoding specified for HTML5. On the contrary, browsers are expected to make guesses when no encoding has been declared. This is a fairly complex process, described in 8.2.2.2 Determining the character encoding.
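The misinterpretation described in this answer can be reproduced outside the browser. The following is a rough Python sketch, not something from the thread: the function name misread is hypothetical, and "cp1252" is Python's name for windows-1252. It encodes text as UTF-8 and then decodes the same bytes with the wrong encoding, which is roughly what a browser does after a wrong guess.

```python
def misread(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong encoding."""
    # errors="replace" is needed because a few byte values are unassigned
    # in windows-1252 and would otherwise raise UnicodeDecodeError.
    return text.encode("utf-8").decode(wrong_encoding, errors="replace")

print(misread("äüö"))  # Ã¤Ã¼Ã¶

# Each Cyrillic letter is two bytes in UTF-8, so it shows up as two characters.
garbled = misread("привет")
print(len("привет"), len(garbled))  # 6 12
```

The doubling in length is a typical sign that UTF-8 bytes are being read with a single-byte encoding.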
If you want to be sure which charset the browser will use, you must have this in your page head:
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
Otherwise you are at the mercy of local settings and the browser's auto-detection.

HTML - charset windows 1255 works but utf-8

I wrote an HTML page that displays mixed Hebrew/English content. It works fine with charset "windows-1255"
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html dir="rtl" lang="he">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1255">
But I figured that people will have trouble if their machines don't support Hebrew. I changed the charset to utf-8 and got
HTML:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
View:
"��� ��� ������, ��� ����� �����, �� ������ ���� ��� ���� �� ������"
Read zohar ��� ����
....
Isn't UTF-8 supposed to support more characters than windows-1255?
I guess when you changed the tag, you didn't tell your editor to convert the file to UTF-8. So the file is still in Windows-1255 format, but the browser tries to read it as if it were UTF-8, so you get bad/unreadable characters.
I have no idea which editor you're using, so I can't tell you how to put it in UTF-8 mode. Try to find a setting in your options regarding the character set to use. Or open the file in Windows Notepad, and when saving it, make sure you select UTF-8 from the encoding drop-down box next to the Save button.
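What happens here can be sketched in a few lines of Python (an illustration only; Python and the sample word are not from the original question, and "cp1255" is Python's name for windows-1255). The bytes on disk stay windows-1255, reading them as UTF-8 yields replacement characters, and the fix is to convert the bytes themselves.

```python
# The Hebrew word "shalom", written with escapes to keep the source ASCII-only.
hebrew = "\u05e9\u05dc\u05d5\u05dd"

# What the editor wrote to disk: windows-1255 uses one byte per letter.
legacy_bytes = hebrew.encode("cp1255")

# What the browser does once the meta tag claims utf-8: the bytes are not
# valid UTF-8, so every one of them becomes U+FFFD, the replacement character.
as_seen = legacy_bytes.decode("utf-8", errors="replace")
print(as_seen)

# The actual fix: decode with the real encoding, then re-encode as UTF-8.
utf8_bytes = legacy_bytes.decode("cp1255").encode("utf-8")
print(utf8_bytes.decode("utf-8") == hebrew)  # True
```

Changing only the meta tag corresponds to the middle step; "Save as UTF-8" in an editor corresponds to the last one.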
Relation to Unicode
The Unicode Hebrew block (U+0590–U+05FF) follows Windows-1255 by encoding both letters and vowel-points in the same relative positions as Windows-1255. Unicode goes further in encoding cantillation marks in lower positions. Unicode Hebrew is always in logical order.
For modern applications UTF-8 or UTF-16 is a preferred encoding.
Source: http://en.wikipedia.org/wiki/Windows_1255
It seems to me that your encoding should still work if your characters are within the Unicode Hebrew block.

Character encoding failing

I'm trying to set the encoding of some files in PHP to ISO-8859-1. I tried using this:
<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=ISO-8859-1">
but it still isn't working. What can I do? Thank you.
You should be outputting this as a real HTTP header. <meta> elements are not a good substitute.
header('Content-Type: text/html; charset=ISO-8859-1');
You should ensure that the encoding of the files is actually ISO 8859-1 - all this does is tell the browser that the resource is in that encoding, it doesn't actually transcode or anything.
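Since the header only declares an encoding and does not convert anything, the file contents may need to be transcoded separately. Below is a minimal sketch in Python rather than PHP (an assumption for illustration; the function name transcode_file and its defaults are hypothetical). It assumes the file is currently UTF-8 and rewrites it as ISO-8859-1.

```python
from pathlib import Path

def transcode_file(path: str, source: str = "utf-8", target: str = "iso-8859-1") -> None:
    """Rewrite a text file in place, converting it from source to target encoding."""
    p = Path(path)
    # Work on raw bytes so that line endings are left untouched.
    text = p.read_bytes().decode(source)
    # Raises UnicodeEncodeError if a character has no equivalent in the target.
    p.write_bytes(text.encode(target))
```

After a conversion like this, the declared charset and the real bytes agree. Keep a backup first, because characters outside ISO-8859-1 cannot be represented and the call will fail on them.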

Charset UTF-8 doesn't work (stuck with these ?-marks)

I have a Unicode problem... I've done this before but for now, I cannot understand
why the Icelandic letters don't show up - I have those question marks again
Here is the url (very plain and short html5)
http://nicejob.is/new/
Everything I Google says: use the <meta charset="utf-8"> as I do.
Any suggestions?
Your page is already viewed as UTF-8. But your source code is not saved as UTF-8.
Please change the encoding of your source code file to UTF-8.
Not all browsers support the HTML5-style tag yet;
here you can see a compatibility table.
Try this instead:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
I can see a couple of issues.
The META should look like this:
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
The <html> element specifies lang="en", which might confuse some browsers.
When I view the HTML from the browser, the question marks are encoded as 0xEF 0xBF 0xBD, which is the UTF-8 encoding of U+FFFD, the replacement character (not the byte order mark, U+FEFF, which is 0xEF 0xBB 0xBF). So the page is transmitted as valid UTF-8, which suggests the Icelandic letters had already been replaced before it was sent, most likely when the file was saved or converted.
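The byte sequences for the BOM and for the replacement character look alike and are easy to mix up. A quick Python check (added here for illustration, not part of the original answer) shows the UTF-8 bytes of each:

```python
# U+FEFF is the byte order mark, U+FFFD is the replacement character.
bom = "\ufeff".encode("utf-8")
replacement = "\ufffd".encode("utf-8")

print(bom.hex(" "))          # ef bb bf
print(replacement.hex(" "))  # ef bf bd
```

Seeing EF BF BD in the page source means the original letters were already lost at some earlier step, so changing the declared charset alone cannot bring them back.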
Probably you are using some text editor like Notepad++,
and you didn't set the encoding to UTF-8 in that editor.
What you have to do is save the file with UTF-8 encoding using Notepad (the one that ships with Windows).
Steps:
Choose Save As...
In the options at the bottom you will find the Encoding option; choose UTF-8.
Save the file.
Then add the line <meta charset="UTF-8" /> inside your file.
And it will work.