How to make HTML character set take preference over browser text encoding? - html

My webpage has some Chinese characters. When the browser text encoding is "Unicode", everything is fine. But when I change it to "Western", the Chinese characters are garbled.
I want the page to display in UTF-8 irrespective of the browser encoding. How to do it?
The response header received for the JSP has Content-Type: "text/html;charset=UTF-8". When I check the response in the network tab, it is correct (in UTF-8). The JSP also has
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Even with all these charset mentions, the browser text encoding is taking preference. Can this be overridden? Can the page always be in "UTF-8" regardless of the browser encoding?
Note: The browser I checked is Firefox.
Text boxes are pre-populated with Chinese characters from the server.
This is when the browser text encoding is "Unicode".
document.charset is "UTF-8"
This is when the browser text encoding is "Western".
document.charset is "windows-1252"
Please help.

I want the page to display in UTF-8 irrespective of the browser encoding. How to do it?
You can't.*
Manually selecting an encoding in the browser's encoding menu is supposed to override anything that the web site is saying about what the encoding should be.
You can't prevent this, and neither should you.
Anyone forcing the browser to use an encoding that the web site doesn't support is acting on their own responsibility.
* well, apart from displaying all text in images. Or in a Flash movie. :)
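To see why the text turns to garbage once the user forces "Western", here is a minimal Python sketch. It assumes the page bytes are UTF-8 and that "Western" means windows-1252 (as document.charset reports in the question); the sample string "中文" and the helper name misread are made up for illustration.

```python
def misread(text, actual="utf-8", assumed="cp1252"):
    """Encode text as `actual`, then decode the same bytes as `assumed`."""
    return text.encode(actual).decode(assumed, errors="replace")


page_text = "中文"  # hypothetical sample, not taken from the asker's page

# What the user sees after forcing the "Western" encoding.
garbled = misread(page_text)
print(garbled)

# The bytes on the wire never changed: reversing the wrong decoding
# recovers the original text.
print(garbled.encode("cp1252").decode("utf-8"))
```

The server response is identical in both cases; only the browser's interpretation of the bytes differs, which is why nothing on the page can prevent it.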

Related

Why are Arabic and Chinese characters rendered correctly even without using meta charset="utf-8"?

I am very new to HTML. Recently I encountered the <meta charset="utf-8"> tag, which ensures letters and characters are rendered properly in a browser.
But I was wondering why even if I do not specify UTF-8 all letters and characters are displayed perfectly anyway?
The page you're sending to the browser uses a specific character encoding (e.g. UTF-8). The browser must interpret the page in the correct encoding to read it correctly (i.e. as intended) and display the correct characters. There are several ways in which the browser determines what encoding to use, which it falls back to successively:
HTTP Content-Type header
HTML meta tags
any built-in heuristics
the browser/system default encoding
If the page displays correctly without an HTML meta tag, that means one of the other mechanisms caused the browser to choose to interpret the page as UTF-8. Probably your web server is outputting an HTTP Content-Type header, or your browser/system's default is UTF-8.
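The fallback order listed above can be sketched roughly like this in Python. It is a simplification, not what any browser actually runs: the function name pick_encoding, the 1024-byte scan window and the windows-1252 default are assumptions, and the browser-specific heuristics step is left out.

```python
import re


def pick_encoding(content_type_header, html_bytes, default="windows-1252"):
    """Return (encoding, source) following the simplified fallback order."""
    # 1. HTTP Content-Type header, e.g. "text/html; charset=UTF-8"
    if content_type_header:
        match = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type_header, re.I)
        if match:
            return match.group(1).lower(), "http-header"

    # 2. A <meta> declaration near the start of the document
    head = html_bytes[:1024].decode("ascii", errors="replace")
    match = re.search(r'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', head, re.I)
    if match:
        return match.group(1).lower(), "meta"

    # 3. Heuristics are browser-specific and omitted in this sketch.
    # 4. Fall back to the browser/system default.
    return default, "default"


print(pick_encoding("text/html;charset=UTF-8", b"<p>hi</p>"))
print(pick_encoding(None, b'<meta charset="utf-8"><p>hi</p>'))
print(pick_encoding("text/html", b"<p>hi</p>"))
```

A page without a meta tag still resolves to UTF-8 in the first call because the header already named the charset.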
This is because the default character encoding for HTML5 is UTF-8.
See also this documentation:
The default character-set for HTML5 is UTF-8.
Example <meta charset="UTF-8">
The Unicode Consortium developed the UTF-8 and UTF-16 standards,
because the ISO-8859 character-sets are limited, and not compatible with a multilingual environment.
The Unicode Standard covers (almost) all the characters, punctuation, and symbols in the world.
All HTML5 and XML processors support UTF-8, UTF-16, Windows-1252, and
ISO-8859.

HTML5 Encoding & Cyrillic

Something that made me curious - supposedly the default character encoding in HTML5 is UTF-8. However if I have a plain simple HTML file with an HTML5 doctype like the code below, I get:
"hello" in Russian: "ЗдраÑтвуйте"
In Chrome 33+, Safari 6, IE11, etc.
<!DOCTYPE html>
<html>
<head></head>
<body>
<p>"hello" in Russian is "здраствуйте"</p>
</body>
</html>
What gives? Shouldn't the browser utilize the UTF-8 unicode standard and display the text correctly? I'm using Coda which is set to save html files with UTF-8 encoding by default so that's not the problem.
The text data in the example is UTF-8 encoded text misinterpreted as windows-1252 encoded. The reason is that the encoding has not been specified, so browsers are forced to guess. To fix this, specify the encoding; see the W3C page Character encodings. Two simple ways work independently of server settings, as long as the server does not send wrong encoding information in HTTP headers:
1) Save the file as UTF-8 with BOM (there is probably an option for this in your authoring program).
2) Add the following tag into the head part:
<meta charset=utf-8>
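If your editor has no "UTF-8 with BOM" option, the first approach can be done with a few lines of Python. This is only a sketch: the helper name encode_with_bom is made up, and it relies on Python's standard "utf-8-sig" codec, which prepends the byte-order mark when encoding and strips it when decoding.

```python
import codecs


def encode_with_bom(html_text):
    """Return the page as UTF-8 bytes prefixed with the byte-order mark."""
    return html_text.encode("utf-8-sig")


page = '<!DOCTYPE html>\n<p>"hello" in Russian is "здраствуйте"</p>\n'
data = encode_with_bom(page)

# The file now starts with the three BOM bytes EF BB BF.
print(data[:3] == codecs.BOM_UTF8)

# Decoding with the same codec strips the BOM and returns the original text.
print(data.decode("utf-8-sig") == page)
```

Writing data to disk in binary mode, for example with open(path, "wb"), gives a file whose encoding is declared by its own first bytes.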
There is no single default encoding specified for HTML5. On the contrary, browsers are expected to make guesses when no encoding has been declared. This is a fairly complex process, described in 8.2.2.2 Determining the character encoding.
If you want to be sure which charset the browser will use, you must have this in your page head:
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
otherwise you are at the mercy of local settings and browser auto-detection.

html looks weird in outlook, but ok in browser

I'm doing mailings, which contains html code.
If I check my HTML in IE or FF, everything looks great. But when I send the mail, the characters become very weird:
In browser: Information générale
In e-mail: Information g�n�rale
My HTML meta: <meta http-equiv="content-type" content="text/html; charset=UTF-8">
Clearly this has something to do with the encoding, but I don't get why it looks OK in a browser and not in the email...
I have other HTML emails (newsletters received from other persons) which use the same HTML meta, and those emails look just fine..
� is an indication that the browser/e-mail client is decoding the document as UTF-8 but encountered a byte sequence that is not valid UTF-8, typically text that was saved in a different encoding.
It isn't enough to set the content-type meta tag; your data actually needs to match the encoding you're declaring.
If it comes from a file, make sure the file is encoded as UTF-8 (usually, the editor will offer you a setting in the "Save as...." dialog.)
If it comes from a database, see UTF-8 all the way through
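The mismatch is easy to reproduce. The Python sketch below assumes the mail body was actually saved as Latin-1 (ISO-8859-1) while being declared as UTF-8; that is a guess about the asker's setup, and the helper name decode_as_utf8 is made up.

```python
def decode_as_utf8(raw):
    """Decode bytes the way a UTF-8 client would, marking invalid bytes."""
    return raw.decode("utf-8", errors="replace")


text = "Information générale"

wrong = text.encode("latin-1")  # data saved in a different encoding
right = text.encode("utf-8")    # data that matches the declared charset

print(decode_as_utf8(wrong))  # Information g�n�rale
print(decode_as_utf8(right))  # Information générale
```

In Latin-1 each é is the single byte 0xE9, which cannot stand alone in UTF-8, so the client substitutes the replacement character.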
In an email there are more places where the encoding has to be declared:
- the mail headers
- the MIME headers
- the HTML head, if the content is text/html
All of these need to be set right.
Sorry for short answer. Writing this on my cellphone.
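As an illustration of getting the mail side consistent, here is a hedged sketch using Python's standard email package. The asker's mailing tool is unknown, so treat this as one possible way to do it; the function name build_html_mail is made up.

```python
from email.message import EmailMessage


def build_html_mail(subject, html_body):
    """Build a message whose MIME headers declare text/html in UTF-8."""
    msg = EmailMessage()
    msg["Subject"] = subject  # non-ASCII subjects are encoded on serialization
    # Sets Content-Type: text/html; charset="utf-8" and encodes the body to match.
    msg.set_content(html_body, subtype="html", charset="utf-8")
    return msg


html_body = (
    '<html><head><meta http-equiv="content-type" '
    'content="text/html; charset=UTF-8"></head>'
    "<body>Information générale</body></html>"
)
msg = build_html_mail("Information générale", html_body)

print(msg["Content-Type"])
print(msg.get_content_charset())
```

Here the MIME header, the HTML meta tag and the actual bytes of the body all agree on UTF-8, which is the condition the answers above describe.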

€ symbol rendering as €2

Unusual problem here: I have an app that uses a text file, containing a few '€' symbols among other text, to populate a MySQL database. When rendered locally, the € symbol looks fine, but on the Linux server and out on the web in HTML, it looks like this in some browsers:
€2
Can anyone suggest a solution?
Set the charset in the headers or a <META> element to UTF-8 so that it isn't processed as CP1250.
Use a UTF-8 encoding type on your file and make sure you add a content-type meta tag to your page:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
Hope this helps!
If you are viewing your text (.txt) file as plain text rather than as HTML in the browser window, setting
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
will not do the job, because you are dealing with a text file:
the tags will not be "hidden", and they may (even most likely will) send garbage to the MySQL database you are trying to populate (e.g. by auto-harvesting the file posted online).
So, if in browser window instead of:
€ 123.39
you are seeing
€2 123.39
the problem is not with the quality of your text file, but with the way the browser handles encoding.
If you need to copy and paste the displayed file and "€2" is in the way,
try simply setting your browser's default encoding to Unicode (UTF-8).
In FF you want to do it here:
Tools-> Options-> Content (tab)-> Fonts&Colors-> Advanced-> Default Char. Encoding
Once there select UTF-8 encoding.
Remember, though, that a page reload may sometimes not be enough to see the changes because of the browser cache. In that case, restart your browser.

IE loses automatic UTF-8 encoding in iframe form target

I have an odd problem in IE. It has to do with how IE detects the encoding of an iframe based on its parent content. My application wraps the content of a page in an iframe, and sets the encoding of the parent window to UTF-8 through the Content-Type header. The content of the iframe does not set the encoding through the Content-Type, and picks up the parent window's encoding on its initial load. This is the desired behavior - the content window requires the UTF-8 encoding for some language content, but for complicated reasons beyond my control, it cannot forcibly set its own encoding, so it relies on the parent window's encoding.
The problem arises when the content page is the target of a form action. When the form submits and the page loads in the content window, it auto-selects Western European (Windows) encoding. Does anyone know why? I've tried searching for any sort of documentation on related behavior, but the googles, they do nothing. Any sort of a lead (beyond sending a Content-Type header or a byte-order mark in the content) would be most helpful.
I unfortunately don't have a public place to host this, but copy-pasting these code samples to local files and saving each with UTF-8 encoding without a byte-order mark should consistently reproduce the behavior in all versions of IE.
frame1.html
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
<div>エンコード</div>
<iframe src="frame2.html"></iframe>
frame2.html
<form>
<input value="エンコード">
<input type="submit">
</form>
To recap with the example, if you load the page and check the encoding of both the parent and the iframe, you should see "Auto-Select" checked and "UTF-8" selected in both. If you hit Submit in the iframe, the frame will reload and the input text will be garbled. Checking the encoding of the iframe should still show "Auto-Select" checked, but now "Western European (Windows)" will be selected instead of "UTF-8". I need to know if there is anything else I can do to make it automatically preserve the UTF-8 encoding when the form action completes.
Thanks in advance!
When you say you cannot add a Content-Type header/BOM, are you able to add the Content-Type as a meta tag? Something like:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Recently I had very similar issues - IE auto-detecting Western European at all times, except when a certain popup window navigated to the page, which then caused IE to pick UTF-8. I was never able to track down exactly what caused it (the resulting page was identical, only the page that linked to it was different!), so we ended up fixing it by forcing UTF-8 across the entire application (with headers).
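"Forcing UTF-8 with headers" across an application usually means adding the charset to every text response in one central place. The question does not say what the server stack is, so the following is only a Python WSGI sketch of the idea; force_utf8 and demo_app are made-up names.

```python
def force_utf8(app):
    """WSGI middleware: add charset=utf-8 to text/* responses that lack one."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            fixed = []
            for name, value in headers:
                if (name.lower() == "content-type"
                        and value.lower().startswith("text/")
                        and "charset=" not in value.lower()):
                    value += "; charset=utf-8"
                fixed.append((name, value))
            return start_response(status, fixed, exc_info)
        return app(environ, patched_start)
    return wrapped


def demo_app(environ, start_response):
    """Hypothetical inner page that forgets to declare its encoding."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return ["<input value='エンコード'>".encode("utf-8")]


application = force_utf8(demo_app)
```

With the charset in the header of every response, including the one returned after the form submit, the browser no longer has to guess.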
If you're really unable to modify the inner page in any way, is it possible you could "replace" this page with your own, and then send the content over to the "other" server via an API or HTTP POST where you wouldn't need to worry about IE's "auto-detecting"?