IE 10 does not load page in UTF-8 - html

I've got simple HTML pages in Russian with a bit of js in it.
Every browser going well except IE10. Even IE9 is fine. Next code is included:
<html lang="ru">
<meta http-equiv="Cоntent-Type" content="text/html"; charset="utf-8">
Also I've added .htacess with
AddDefaultCharset UTF-8
Still IE10 loads page in Cyrillic encoding (cp-1251 I believe), the only way to display characters in a right way is to manually change it to UTF-8 inside of a browser (or chose auto-detect mode).
I don't understand why IE10 force load 1251 instead of UTF-8.
The website to check is http://btlabs.ru

What really causes the problem is that the HTTP headers sent by the server include
Content-Type: text/html; charset=windows-1251
This overrides any meta tags. You should of course fix the errors with the meta tag as pointed out in other answers, and run a markup validator to check your code, but to fix the actual problem, you need to fix the .htaccess file. Without seeing the file and other server-side issues, it is impossible to tell how to fix that (e.g., server settings might prevent the effect of a per-directory .htaccess file and apply one global file set by the server admin). Note that the file name must have two c's, not one (.htaccess, not `.htacess').
You can check what headers are e.g. using Rex Swain’s HTTP Viewer.
The reason why things work on other browsers is that they apply the modern HTML5 principle “BOM wins them all”. That is, an HTTP header wins a meta tag in specifying the character encoding, but if the actual data begins with three bytes that constitute the UTF-8 encoded form of the Byte Order Mark (BOM), then, no matter what, the data will be interpreted as UTF-8 encoded. For some unknown reason, IE 10 does not do that (and neither does IE 11).
But this won’t be a problem if you just make the server send an HTTP header that declares UTF-8.
If the server has been set to declare windows-1251 and you cannot possibly change that, then you just need to live with it. Transcode your HTML files to windows-1251 then, and declare windows-1251 in a meta tag. This means that if you need any characters outside the limited repertoire representable in windows-1251, you need to represent them using character references.

perhaps because your 'o' in 'content' is not an ascii 'o'. notice that it is not red in Stackoverflow? i then copied it to a good text editor and see that it is indeed not an o. because the 'o' is not really an ascii 'o', that whole line probably should get ignored in every web browser, which should then depend on what default charset it uses. Microsoft and IE is notorious for picking bad defaults, thus is my reason why it doesn't work in IE. ;)
but codingaround has good advice too. it's best to put quotes around your attribute values. but that should not break a web browser.
you should use a doctype at the start:
<!DOCTYPE html>
<html lang='ru'>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
but the real culprit is your content and charset problem. notice my line. mine is very different. ;) that's the problem. note that mine has two ascii 'o's, one in "Content-Type" and another in 'content='.

As Shawn pointed out, copy and paste this:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
This is a really good example of how non-Ascii letters that look like English Ascii letters can really mess things up!

Maybe you forgot changing cоntent=text/html; to cоntent="text/html";
As Shawn has already pointed out, it could also be content="text/html; charset=utf-8".
But as you have tried both things out, can you confirm if the IE10 output looks like this?
I can't really help further with this, as the only thing I have here is an IE 10 online emulator.
So far the possible problems are:
Different o character
I see, that the <meta> tag is still outside of <head>, put it in place
Problems with IE handling the content and charset attributes

Related

Should I still be using <meta charset="UTF-8"> in 2022?

<meta charset="UTF-8">
UTF-8 is default encoder in modern browsers. So all this code does is it adds support for browsers which don't automatically do this. I don't plan on supporting older browsers is there any other reason to add this line of code?
I have heard others saying that leaving it out could lead to some cross scripting attacks, bad things and such but never gave me any clear examples.
Also some old HTMl validator throws error when leaving <meta charset="UTF-8">. out
https://validator.w3.org/nu/#file
The character encoding was not declared
Then it does this
process with windows-1252.
This isn't great cause this could lead to error if the site has characters that windows-1252 doesn't support.
I'm guessing this only happens on browers that don't default to UTF-8 support this though. Should I be worried about this warning/error if I leave this out.
I have researched, about trying to understand why UTF-8 is used but I can't find a definite answer on to why to use or to not use it.
Thanks in advance.
For anyone looking back on this post, I did some more research, and found more reasons on why to include this line of code. I found this page from the Google Devs. That states this
https://web.dev/charset/#resources
Lighthouse flags pages that do not specify their character encoding:
So, not including <meta charset="UTF-8"> would make your Google Lighthouse score lower.
Here is why it is considered a best practice
Servers and browsers communicate with each other by sending bytes of data over the internet. If the server doesn't specify which character encoding format it's using when it sends an HTML file, the browser won't know what character each byte represents. The character encoding declaration specification solves this problem.
Here is the lighthouse doc
From my understanding this line of code isn't necessary anymore, but is considered a best practice. Hope this helped for anyone reading this.

How come the following characters are displayed in ISO-8859-1?

I have the following html:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
</head>
<body>
会意字 / 會意字 huìyìzì
</body>
When I run it in firefox, it displays the Chinese characters just fine. How come it works with the ISO-8859-1 characterset? I thought you needed UTF-8?
I can't reproduce your successful rendering:
… but HTML 5 defines a fairly complex character encoding detection method which doesn't pay any attention to <meta> until step 9.
In general, you should avoid encodings other than UTF-8 and definitely should not lie about the encoding of the document.
The most probable explanation is that the document is in fact UTF-8 encoded and the browser treats it that way, despite the meta tag. According to HTML5 encoding sniffing algorithm, which largely reflects browser behavior, the meta tag is ignored if any of the following is true:
The user has instructed (via e.g. a View → Encoding command) the browser to use a specific encoding.
The page starts with bytes that represent the Byte Order Mark in UTF-8 or UTF-16. In practice, it starts that way if the file was saved in an editor with a command like “Save as UTF-8 (with BOM)”.
HTTP headers specify an encoding in a Content-Type header.
You can find out which of these is the cause by using e.g. Rex Swain’s HTTP viewer. It lets you see both the HTTP response headers and the actual data as bytes. Developer Tools in browsers have similar features.

HTML5 Encoding & Cyrillic

Something that made me curious - supposedly the default character encoding in HTML5 is UTF-8. However if I have a plain simple HTML file with an HTML5 doctype like the code below, I get:
"hello" in Russian: "ЗдраÑтвуйте"
In Chrome 33+, Safari 6, IE11, etc.
<!DOCTYPE html>
<html>
<head></head>
<body>
<p>"hello" in Russian is "здраствуйте"</p>
</body>
</html>
What gives? Shouldn't the browser utilize the UTF-8 unicode standard and display the text correctly? I'm using Coda which is set to save html files with UTF-8 encoding by default so that's not the problem.
The text data in the example is UTF-8 encoded text misinterpreted as window-1252 encoded. The reason is that the encoding has not been specified and browsers are forced to make a guess. To fix this, specify the encoding; see the W3C page Character encodings. Two simple ways that work independently of server settings, as long as the server does not send wrong encoding information in HTTP headers:
1) Save the file as UTF-8 with BOM (there is probably an option for this in your authoring program.
2) Add the following tag into the head part:
<meta charset=utf-8>
There is no single default encoding specified for HTML5. On the contrary, browsers are expected to make guesses when no encoding has been declared. This is a fairly complex process, described in 8.2.2.2 Determining the character encoding.
If you want to be sure which charset will be used by browser you must have in your page head
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
otherwise you are at the mercy of local settings and browser automation.

Html charset and support for special (national) characters

I have a website in HTML5. Most of the content there is in Czech, which has some special symbols like "ř, č, š" etc...
I searched the internet for recommended charsets and I got these answers: UTF-8, ISO 8859-2 and Windows-1250.
<meta http-equiv="Content-Type" content="text/html; charset=ISO 8859-2" />
I tried UTF-8 which didnt work at all and then settled up with ISO 8859-2. I tested my website on my computer in the latest versions of Chrome, Firefox, IE and Opera. Everything worked fine but when I tested my website at http://browsershots.org/ , these characters were not displayed correctly (in the same browsers that I used for testing!).
How is that possible? How can I ensure, that all characters are displayed correctly in all web browsers. Is it possible that usage of HTML5 causes these problems (since its not fully supported by all browsers, but I am not using any advanced functions)?
Thanks for any hints and troubleshooting tips!
If you using HTML5, try this short declaration of charset:
<meta charset="UTF-8">
Additionally check you html file encoding. You can do it in Notepad++, menu Encoding -> Encode in UTF-8.
The important thing is that the actual encoding of the data coincides with the declared encoding. From the description, it seems that the actual encoding is ISO-8859-2, so you should declare it. Note that the name of the encoding has no space but hyphens. (I wonder whether you used it with a space – I would expect browsers to ignore the tag then.) The following is the simplest declaration:
<meta charset=ISO-8859-2>
I would not trust on browsershots.org getting things like this right. Testing on actual browsers is more useful.
UTF-8 is the best-supported character set for international usage. If it does not display correctly, you should ensure that your file is saved in UTF-8 format. Even Notepad has a "UTF-8" option in its save dialog.

€ symbol rendering as €2

Unusual problem here: I have an app that uses a text file which contains a few '€' symbols as well as other text in a text file to populate a mysql database. When rendered locally, the € symbol looks fine, but on the linux server and out on the web in html, it looks like this in some browsers:
€2
can anyone suggest a solution
Set the charset in the headers or a <META> element to UTF-8 so that it isn't processed as CP1250.
Use an UTF-8 encoding type on your file and make sure you add a content-type meta tag to your page:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
Hope this helps !
If you are viewing your text (.txt) file as a text and not HTML in browsers window, setting
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
will not do the job as you are dealing with text file,
so tags will not be "hidden", plus it may potentially (even most likely) send garbage to mysql database you are trying to populate (e.g. by auto-harvesting posted online file).
So, if in browser window instead of:
€ 123.39
you are seeing
€2 123.39
problem is not with quality of your text file, but with the way browser handles encoding.
If you need to copy and paste displayed file and "€2" is in the way,
try simply setting your browser default encoding to unicode (UTF-8).
In FF you want to do it here:
Tools-> Options-> Content (tab)-> Fonts&Colors-> Advanced-> Default Char. Encoding
Once there select UTF-8 encoding.
Remember thou that sometimes page reload may not be enough to see changes, due to browser cache. In such case, restart your browser.