IE loses automatic UTF-8 encoding in iframe form target

I have an odd problem in IE. It has to do with how IE detects the encoding of an iframe based on its parent content. My application wraps the content of a page in an iframe, and sets the encoding of the parent window to UTF-8 through the Content-Type header. The content of the iframe does not set the encoding through the Content-Type, and picks up the parent window's encoding on its initial load. This is the desired behavior - the content window requires the UTF-8 encoding for some language content, but for complicated reasons beyond my control, it cannot forcibly set its own encoding, so it relies on the parent window's encoding.
The problem arises when the content page is the target of a form action. When the form submits and the page loads in the content window, it auto-selects Western European (Windows) encoding. Does anyone know why? I've tried searching for any sort of documentation on related behavior, but the googles, they do nothing. Any sort of a lead (beyond sending a Content-Type header or a byte-order mark in the content) would be most helpful.
I unfortunately don't have a public place to host this, but copy-pasting these code samples to local files and saving each with UTF-8 encoding without a byte-order mark should consistently reproduce the behavior in all versions of IE.
frame1.html
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
<div>エンコード</div>
<iframe src="frame2.html"></iframe>
frame2.html
<form>
<input value="エンコード">
<input type="submit">
</form>
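If it helps to automate the setup, a short script like this (a sketch in Python 3; the file names match the samples above) writes both repro files as UTF-8 without a byte-order mark:

```python
# Writes the two repro files as UTF-8 without a byte-order mark.
frame1 = (
    '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">\n'
    '<div>エンコード</div>\n'
    '<iframe src="frame2.html"></iframe>\n'
)
frame2 = '<form>\n<input value="エンコード">\n<input type="submit">\n</form>\n'

for name, content in (("frame1.html", frame1), ("frame2.html", frame2)):
    # encoding="utf-8" (as opposed to "utf-8-sig") writes no BOM
    with open(name, "w", encoding="utf-8") as f:
        f.write(content)
```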
To recap with the example, if you load the page and check the encoding of both the parent and the iframe, you should see "Auto-Select" checked and "UTF-8" selected in both. If you hit Submit in the iframe, the frame will reload and the input text will be garbled. Checking the encoding of the iframe should still show "Auto-Select" checked, but now "Western European (Windows)" will be selected instead of "UTF-8". I need to know if there is anything else I can do to make it automatically preserve the UTF-8 encoding when the form action completes.
Thanks in advance!

When you say you cannot add a Content-Type header/BOM, are you able to add the Content-Type as a meta tag? Something like:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Recently I had very similar issues - IE auto-detecting Western European at all times, except when a certain popup window navigated to the page, which then caused IE to pick UTF-8. I was never able to track down exactly what caused it (the resulting page was identical, only the page that linked to it was different!), so we ended up fixing it by forcing UTF-8 across the entire application (with headers).
If you're really unable to modify the inner page in any way, is it possible you could "replace" this page with your own, and then send the content over to the "other" server via an API or HTTP POST where you wouldn't need to worry about IE's "auto-detecting"?

Related

How to make HTML character set take preference over browser text encoding?

My webpage has some Chinese characters. When the browser text encoding is "Unicode" everything is fine. But when I change it to "Western" the Chinese characters get garbled.
I want the page to display in UTF-8 irrespective of the browser encoding. How to do it?
The response header received for the JSP has Content-Type: "text/html;charset=UTF-8". When I check the response in the network tab, it is proper (in UTF-8). The JSP also has
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Even with all these charset mentions, the browser text encoding is taking preference. Can this be overridden? Can the page always be in "UTF-8" regardless of the browser encoding?
Note: The browser I checked is Firefox.
Text boxes are pre-populated with chinese characters from the server.
This is when the browser text encoding is "Unicode".
document.charset is "UTF-8"
This is when the browser text encoding is "Western".
document.charset is "windows-1252"
Please help.
I want the page to display in UTF-8 irrespective of the browser encoding. How to do it?
You can't.*
Manually selecting an encoding in the browser's encoding menu is supposed to override anything that the web site is saying about what the encoding should be.
You can't prevent this, nor should you.
Anyone forcing the browser to use an encoding that the web site doesn't support is acting on their own responsibility.
* well, apart from displaying all text in images. Or in a Flash movie. :)

Why don't emojis render in my HTML and/or PHP?

In an effort to learn more about font rendering/encoding, I'm curious why, when I copy and paste the emojis 😇🐵🙈 into a blank <html> page and simply save the .html file locally on my machine, or even start a local PHP server and serve files with the above emojis in them, they either show up as weird characters (😇ðŸµðŸ™ˆ) or blank, respectively. Yet I know that when I type them straight into this very Stack Overflow ask textarea, they render correctly in my browser and are displayed as intended when viewing this page.
My understanding is that since Mac OS X now ships with the correct emoji fonts, they should be rendered as just that. So where is the disconnect between the HTML page you're looking at right now and the local one I saved on my computer?
And recommended reading would be appreciated! :) errr.... 😀
When a web server sends a file to a browser, it sends a set of HTTP headers as well, relaying information about the content type, caching, etc. The Content-Type header also informs the browser which encoding was used:
Content-Type: text/html; charset=utf-8
If you open that file locally, your browser only gets the file itself and has to guess the encoding. You can declare the encoding in the HTML head:
<meta charset="utf-8">
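The "😇" garbage is the classic symptom of UTF-8 bytes being decoded as windows-1252. A quick sketch of the round trip (Python is used here just to illustrate the byte-level mechanics):

```python
emoji = "😇"                        # U+1F607
utf8_bytes = emoji.encode("utf-8")  # b'\xf0\x9f\x98\x87'
# A browser guessing windows-1252 turns those four bytes into four characters:
mojibake = utf8_bytes.decode("windows-1252")
print(mojibake)  # ðŸ˜‡
```

With `<meta charset="utf-8">` in place the browser no longer has to guess, and this misinterpretation never happens.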

IE 10 does not load page in UTF-8

I've got simple HTML pages in Russian with a bit of JS in them.
Every browser handles them fine except IE 10; even IE 9 is fine. The following code is included:
<html lang="ru">
<meta http-equiv="Cоntent-Type" content="text/html"; charset="utf-8">
Also I've added .htacess with
AddDefaultCharset UTF-8
Still, IE 10 loads the page in a Cyrillic encoding (cp-1251, I believe); the only way to display the characters correctly is to manually change it to UTF-8 inside the browser (or choose auto-detect mode).
I don't understand why IE 10 force-loads 1251 instead of UTF-8.
The website to check is http://btlabs.ru
What really causes the problem is that the HTTP headers sent by the server include
Content-Type: text/html; charset=windows-1251
This overrides any meta tags. You should of course fix the errors with the meta tag as pointed out in other answers, and run a markup validator to check your code, but to fix the actual problem, you need to fix the .htaccess file. Without seeing the file and other server-side issues, it is impossible to tell how to fix that (e.g., server settings might prevent the effect of a per-directory .htaccess file and apply one global file set by the server admin). Note that the file name must have two c's (.htaccess, not .htacess).
You can check what headers are actually sent using, for example, Rex Swain’s HTTP Viewer.
The reason why things work on other browsers is that they apply the modern HTML5 principle “BOM wins them all”. That is, an HTTP header wins a meta tag in specifying the character encoding, but if the actual data begins with three bytes that constitute the UTF-8 encoded form of the Byte Order Mark (BOM), then, no matter what, the data will be interpreted as UTF-8 encoded. For some unknown reason, IE 10 does not do that (and neither does IE 11).
But this won’t be a problem if you just make the server send an HTTP header that declares UTF-8.
If the server has been set to declare windows-1251 and you cannot possibly change that, then you just need to live with it. Transcode your HTML files to windows-1251 then, and declare windows-1251 in a meta tag. This means that if you need any characters outside the limited repertoire representable in windows-1251, you need to represent them using character references.
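For the transcoding route, characters outside the windows-1251 repertoire can be turned into numeric character references automatically; for example (a sketch in Python, not part of the original answer):

```python
# Transcode UTF-8 text to windows-1251; anything outside the
# windows-1251 repertoire becomes a numeric character reference.
text = "Привет, мир: 日本"
encoded = text.encode("windows-1251", errors="xmlcharrefreplace")
print(encoded)  # Cyrillic stays as single bytes; 日本 becomes &#26085;&#26412;
```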
perhaps because your 'o' in 'content' is not an ascii 'o'. notice that it is not red in Stack Overflow? i copied it into a good text editor and saw that it is indeed not an 'o'. because the 'o' is not really an ascii 'o', that whole line probably gets ignored in every web browser, which then falls back to its default charset. Microsoft and IE are notorious for picking bad defaults, and that is my guess as to why it doesn't work in IE. ;)
but codingaround has good advice too. it's best to put quotes around your attribute values. but that should not break a web browser.
you should use a doctype at the start:
<!DOCTYPE html>
<html lang='ru'>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
but the real culprit is your content and charset problem. notice my line. mine is very different. ;) that's the problem. note that mine has two ascii 'o's, one in "Content-Type" and another in 'content='.
As Shawn pointed out, copy and paste this:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
This is a really good example of how non-Ascii letters that look like English Ascii letters can really mess things up!
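If you ever suspect a lookalike character, a one-line ASCII check catches it. In this sketch the Cyrillic о is written as an escape (`\u043e`) so it is unambiguous (assumes Python 3.7+ for `str.isascii()`):

```python
good = "Content-Type"
bad = "C\u043entent-Type"  # U+043E CYRILLIC SMALL LETTER O instead of ASCII 'o'
print(good == bad)     # False, even though both render as "Content-Type"
print(good.isascii())  # True
print(bad.isascii())   # False: the lookalike gives itself away
```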
Maybe you forgot to change cоntent=text/html; to cоntent="text/html";
As Shawn has already pointed out, it could also be content="text/html; charset=utf-8".
But as you have tried both things out, can you confirm if the IE10 output looks like this?
I can't really help further with this, as the only thing I have here is an IE 10 online emulator.
So far the possible problems are:
Different o character
I see that the <meta> tag is still outside of <head>; move it inside <head>
Problems with IE handling the content and charset attributes

How to set the "Content-Type ... charset" in the request header using a HTML link

I have a simple HTML-page with a UTF-8 encoded link.
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<a charset='UTF-8' href='http://server/search?q=%C3%BC'>search for "ü"</a>
</body>
</html>
However, I don't get the browser to include Content-Type:application/x-www-form-urlencoded; charset=utf-8 into the request header. Therefore I have to configure the web server to assume all requests are UTF-8 encoded (URIEncoding="UTF-8" in the Tomcat server.xml file). But of course the admin won't let me do that in the production environment (WebSphere).
I know it's quite easy to achieve using Ajax, but how can I control the request header when using standard HTML links? The charset attribute doesn't seem to work for me (tested in Internet Explorer 8 and Firefox 3.5)
The second part of the required solution would be to set the URL encoding when changing an IFrame's document.location using JavaScript.
This is not possible from HTML alone.
The closest you can get is the accept-charset attribute of the <form>. Only Internet Explorer adheres to it, and even then it does so incorrectly (e.g., CP-1252 is actually used when it claims to have sent ISO-8859-1). Other browsers ignore it entirely and use the charset specified in the Content-Type header of the response.
Getting the character encoding right is essentially the server's responsibility; the client just sends data back in the same charset the server used for the response.
To the point: you should configure the character encoding entirely on the server side. To overcome the inability to edit the URIEncoding attribute, someone here on Stack Overflow wrote a (complex) filter: Detect the URI encoding automatically in Tomcat. You may find it useful as well (note: I haven't tested it).
Note that the meta tag as given in your question is ignored when the content is transferred over HTTP. Instead, the HTTP Content-Type response header is used to determine the content type and character encoding. You can inspect that header with, for example, Firebug, in the Net panel.
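The underlying problem is visible in how the same character percent-encodes differently per charset; here is the "ü" from the link above, sketched in Python:

```python
from urllib.parse import quote

# The same "ü" yields different bytes, hence different percent-encodings,
# so the server cannot decode the query string without knowing the charset.
print(quote("ü", encoding="utf-8"))    # %C3%BC, as in the link in the question
print(quote("ü", encoding="latin-1"))  # %FC
```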

€ symbol rendering as €2

Unusual problem here: I have an app that uses a text file containing a few '€' symbols, along with other text, to populate a MySQL database. When rendered locally, the € symbol looks fine, but on the Linux server, out on the web in HTML, it looks like this in some browsers:
€2
Can anyone suggest a solution?
Set the charset in the headers or a <META> element to UTF-8 so that it isn't processed as CP-1252.
Use an UTF-8 encoding type on your file and make sure you add a content-type meta tag to your page:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
Hope this helps !
If you are viewing your text (.txt) file as plain text, not HTML, in the browser window, setting
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
will not do the job, because you are dealing with a text file:
the tag will not be "hidden", and it may well send garbage to the MySQL database you are trying to populate (e.g. if the posted file is auto-harvested).
So, if in browser window instead of:
€ 123.39
you are seeing
€2 123.39
problem is not with quality of your text file, but with the way browser handles encoding.
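You can confirm which side is at fault: if the bytes decode cleanly as UTF-8, the file is fine and the browser is mis-guessing. A sketch (Python, with a hypothetical sample line):

```python
raw = "€ 123.39".encode("utf-8")   # hypothetical line from the file
raw.decode("utf-8")                # decodes cleanly, so the file is valid UTF-8
print(raw.decode("windows-1252"))  # the browser's single-byte misreading
```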
If you need to copy and paste the displayed file and "€2" is in the way,
try simply setting your browser's default encoding to Unicode (UTF-8).
In Firefox you do it here:
Tools -> Options -> Content (tab) -> Fonts & Colors -> Advanced -> Default Character Encoding
Once there, select UTF-8.
Remember though that a page reload is sometimes not enough to see the change, due to the browser cache; in that case, restart the browser.