I have some content on a webpage which contains æ ø å, but my webview cant show them properly.
Does anyone know what the problem might be ?
In order to use UTF-8 characters inside an (X)HTML page you declare the encoding with this meta tag (in the head section of the page):
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
If that alone does not work you may be able to find more useful information here.
You need to ensure that the HTML file is saved as UTF-8 and that the Content-Type header in the HTTP response contains the proper charset. You can verify the headers by among others Firebug.
A <meta> tag for Content-Type would only work when the Content-Type header in the response is absent and this is usually not the case when the HTML file is served over HTTP. However, its presence is good for offline viewing and self-documentary purposes.
Related
My html document starts as follows:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
</head>
אבגד
If I encode my document as UTF-8, it appears correctly in the browser. If I encode as UTF-8 without BOM (which I understand is more standard) I get unusual characters.
What am I doing wrong?
Your web server is declaring that the encoding is ISO-8859-1, and the browser is respecting that. Ironically enough, using a byte order mark sends a stronger signal to the browser that the encoding must actually be UTF-8. (The exact reason for this is complicated and boring.)
Fixing your web server depends on what the server is. If this is a static resource on disk served by Apache httpd, then something like AddCharset UTF-8 .html will add the header.
If this resource is served dynamically, then you should make sure you add the proper HTTP headers when producing the response, something like self.send_header('Content-Type', 'text/html; charset=utf-8') for Python's basic http server.
I have the following html:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
</head>
<body>
会意字 / 會意字 huìyìzì
</body>
When I run it in firefox, it displays the Chinese characters just fine. How come it works with the ISO-8859-1 characterset? I thought you needed UTF-8?
I can't reproduce your successful rendering:
… but HTML 5 defines a fairly complex character encoding detection method which doesn't pay any attention to <meta> until step 9.
In general, you should avoid encodings other than UTF-8 and definitely should not lie about the encoding of the document.
The most probable explanation is that the document is in fact UTF-8 encoded and the browser treats it that way, despite the meta tag. According to HTML5 encoding sniffing algorithm, which largely reflects browser behavior, the meta tag is ignored if any of the following is true:
The user has instructed (via e.g. a View → Encoding command) the browser to use a specific encoding.
The page starts with bytes that represent the Byte Order Mark in UTF-8 or UTF-16. In practice, it starts that way if the file was saved in an editor with a command like “Save as UTF-8 (with BOM)”.
HTTP headers specify an encoding in a Content-Type header.
You can find out which of these is the cause by using e.g. Rex Swain’s HTTP viewer. It lets you see both the HTTP response headers and the actual data as bytes. Developer Tools in browsers have similar features.
I'm doing mailings, which contains html code.
If I check my HTML in IE or FF, everything looks great. But when I sent the mail, the characters become very weird:
In browser: Information générale
,In E-mail : Information g�n�rale
My HTML meta: <meta http-equiv="content-type" content="text/html; charset=UTF-8">
Clearly this has something to do with the encoding, but I don't get why it looks OK in a browser and not in the email...
I have other HTML emails (newsletters received from other persons) which use the same HTML meta, and those emails look just fine..
� is an indication that the browser/E-Mail client uses UTF-8 to render the document, but encountered an invalid character from a different encoding.
It isn't enough to set the content-type meta tag; your data actually needs to match the encoding you're declaring.
If it comes from a file, make sure the file is encoded as UTF-8 (usually, the editor will offer you a setting in the "Save as...." dialog.)
If it comes from a database, see UTF-8 all the way through
In the email you have one more place to add an encoding:
- The mail header
- the mine header
- if content is text/HTML in the HTML header
All of these need to be set right.
Sorry for short answer. Writing this on my cellphone.
I have a simple HTML-page with a UTF-8 encoded link.
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<a charset='UTF-8' href='http://server/search?q=%C3%BC'>search for "ü"</a>
</body>
</html>
However, I don't get the browser to include Content-Type:application/x-www-form-urlencoded; charset=utf-8 into the request header. Therefore I have to configure the web server to assume all requests are UTF-8 encoded (URIEncoding="UTF-8" in the Tomcat server.xml file). But of course the admin won't let me do that in the production environment (WebSphere).
I know it's quite easy to achieve using Ajax, but how can I control the request header when using standard HTML links? The charset attribute doesn't seem to work for me (tested in Internet Explorer 8 and Firefox 3.5)
The second part of the required solution would be to set the URL encoding when changing an IFrame's document.location using JavaScript.
This is not possible from HTML on.
The closest what you can get is the accept-charset attribute of the <form>. Only Internet Explorer adheres that, but even then it is doing it wrong (e.g., CP-1252 is actually been used when it says that it has sent ISO-8859-1). Other browsers are fully ignoring it and they are using the charset as specified in the Content-Type header of the response.
Setting the character encoding right is basically fully the responsibility of the server side. The client side should just send it back in the same charset as the server has sent the response in.
To the point, you should really configure the character encoding stuff entirely from the server side on. To overcome the inability to edit the URIEncoding attribute, someone here on Stack Overflow wrote a (complex) filter: Detect the URI encoding automatically in Tomcat. You may find it useful as well (note: I haven't tested it).
Noted should be that the meta tag as given in your question is ignored when the content is been transferred over HTTP. Instead, the HTTP response Content-Type header will be used to determine the content type and character encoding. You can determine the HTTP header with for example Firebug, in the Net panel.
i wrote some html with utf-8 charset.
in the head of the html there is also a
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
everything works fine in local, but when i upload files to the server, i see all my letters
àèìòù etc
distorted.
anybody know how could it be the problem? is possible that the server force a charset that isn't utf-8?
thanks a lot
Try saving the actual file with utf-8 encoding. That did the trick for me.
I use PHPStorm as editor: File->File Encoding->utf-8
Actually the META tag is not all you need for correct UTF-8 encoding. Your server might still send the page as Content-Type: text/html; charset=ISO-8859-1 in the header of the page.
You can check the headers e.g. with the Live HTTP Headers Firefox add-on.
There is a lot of secret sauce with UTF-8 encoding and making it work, you might want to go through this page (UTF-8: The Secret of Character Encoding) which explains everything you need to know and gives you advice on how to solve encoding problems.
To answer your question: Yes it is possible to force the server to use UTF-8, e.g. by using the PHP headers() function like so:
header('Content-Type:text/html; charset=UTF-8');