In an effort to learn more about font rendering and encoding, I'm curious why, when I copy and paste the emojis 😇🐵🙈 into a blank <html> page and simply save the .html file locally on my machine, or even start a local PHP server and serve files containing those emojis, they either show up as garbled characters (😇ðŸµðŸ™ˆ) or as blanks, respectively. Yet I know that when I type them straight into this very Stack Overflow ask textarea, they render correctly in my browser and are displayed as intended when viewing this page.
My understanding is that since Mac OS X now ships with the correct emoji fonts, they should render as such. So where is the disconnect between the HTML page you're looking at right now and the local one I saved on my computer?
Any recommended reading would be appreciated! :) errr.... 😀
When a web server sends a file to a browser, it also sends a set of HTTP headers relaying information about the content type, caching, and so on. The Content-Type header tells the browser which encoding was used:
Content-Type: text/html; charset=utf-8
If you open that file locally, your browser only gets the file itself and has to guess the encoding. You can declare the encoding in the HTML head:
<meta charset="utf-8">
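A minimal sketch of both signals together, assuming the page is served by PHP and the file itself is saved as UTF-8:
<?php
// The server announces the encoding in the HTTP response header.
header('Content-Type: text/html; charset=utf-8');
?>
<!DOCTYPE html>
<html>
<head>
<!-- Fallback for when the file is opened straight from disk, with no headers -->
<meta charset="utf-8">
</head>
<body>😇🐵🙈</body>
</html>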
Related
I have an old website in French that I want to preserve, whose HTML files were encoded in ISO-8859-1. All HTML files included
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
in the <head> element. However, the host of my website changed something in their configuration, and now pages are sent from their server with an HTTP header including
content-type: text/html; charset=UTF-8
and unfortunately someone decided this would override the <meta> information.
Do I have to transcode all my HTML files to UTF-8, or is there a faster solution?
Update
In fact, the charset was added to the Content-Type field of the HTTP header only for HTML content produced by PHP, not for plain .html files. I'll post the solution I adopted as an answer.
Your options:
Transcode the files
Persuade whoever changed the server configuration to change it back
Change servers
Run every request through a server-side script that outputs the correct Content-Type header and then outputs the HTML (while also accounting for cache-control headers); see the sketch below
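A rough sketch of that last option in PHP (the file path is a placeholder; a real version would derive it safely from the request):
<?php
// passthrough.php: re-serve a legacy file with the charset the files
// are actually encoded in, overriding the server-wide default.
header('Content-Type: text/html; charset=ISO-8859-1');
header('Cache-Control: no-cache'); // simplistic; tune caching as needed
readfile('old-site/page.html');    // placeholder path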
It took me a while to realize that the problem occurs only for .php files. The fix I chose was the following: I added the line
ini_set('default_charset', NULL);
at the beginning of every PHP file. A bit tedious, but it seems reasonable to me.
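So the top of each file ends up looking something like this (a sketch; emptying default_charset keeps PHP from appending a charset to the Content-Type header, so the <meta> declaration takes effect again):
<?php
// Stop PHP from adding "; charset=UTF-8" to the Content-Type header.
ini_set('default_charset', NULL);
?>
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
</head>
<body>Voilà</body>
</html>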
My html document starts as follows:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
</head>
אבגד
If I encode my document as UTF-8, it appears correctly in the browser. If I encode it as UTF-8 without a BOM (which I understand is more standard), I get unusual characters.
What am I doing wrong?
Your web server is declaring that the encoding is ISO-8859-1, and the browser is respecting that. Ironically enough, using a byte order mark sends a stronger signal to the browser that the encoding must actually be UTF-8. (The exact reason for this is complicated and boring.)
Fixing your web server depends on what the server is. If this is a static resource on disk served by Apache httpd, then something like AddCharset UTF-8 .html will add the header.
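If you cannot touch the main configuration, the same directive should also work from a per-directory .htaccess file, assuming overrides are allowed there:
# .htaccess next to the static pages
AddCharset UTF-8 .html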
If this resource is served dynamically, then you should make sure you add the proper HTTP headers when producing the response, something like self.send_header('Content-Type', 'text/html; charset=utf-8') for Python's basic http server.
I have a simple HTML page with a UTF-8-encoded link.
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<a charset='UTF-8' href='http://server/search?q=%C3%BC'>search for "ü"</a>
</body>
</html>
However, I don't get the browser to include Content-Type:application/x-www-form-urlencoded; charset=utf-8 into the request header. Therefore I have to configure the web server to assume all requests are UTF-8 encoded (URIEncoding="UTF-8" in the Tomcat server.xml file). But of course the admin won't let me do that in the production environment (WebSphere).
I know it's quite easy to achieve using Ajax, but how can I control the request header when using standard HTML links? The charset attribute doesn't seem to work for me (tested in Internet Explorer 8 and Firefox 3.5).
The second part of the required solution would be to set the URL encoding when changing an IFrame's document.location using JavaScript.
This is not possible from HTML alone.
The closest you can get is the accept-charset attribute of the <form>. Only Internet Explorer adheres to it, and even then it does it wrong (e.g., CP-1252 is actually used when it claims to have sent ISO-8859-1). Other browsers ignore it entirely and use the charset specified in the Content-Type header of the response.
Getting the character encoding right is essentially the server's responsibility. The client will simply send data back in the same charset the server used for the response.
In short, you should configure the character encoding entirely on the server side. To work around the inability to edit the URIEncoding attribute, someone here on Stack Overflow wrote a (complex) filter: Detect the URI encoding automatically in Tomcat. You may find it useful as well (note: I haven't tested it).
Note that the meta tag given in your question is ignored when the content is transferred over HTTP. The HTTP response Content-Type header is used instead to determine the content type and character encoding. You can inspect the HTTP headers with, for example, Firebug, in the Net panel.
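To see why this is the server's job, note that %C3%BC is just two raw bytes; which character they form depends entirely on the charset the server assumes when decoding them. A small illustration in PHP (PHP only for demonstration; the same principle applies on the Tomcat side):
<?php
header('Content-Type: text/plain; charset=UTF-8');
$bytes = rawurldecode('%C3%BC');  // the raw bytes 0xC3 0xBC
echo $bytes, "\n";                // interpreted as UTF-8: "ü"
// The same two bytes reinterpreted as ISO-8859-1 give the classic mojibake:
echo mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1'), "\n"; // "Ã¼"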
I wrote some HTML with the UTF-8 charset.
In the head of the HTML there is also a
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
Everything works fine locally, but when I upload the files to the server, I see all my letters
àèìòù etc.
distorted.
Does anybody know what the problem could be? Is it possible that the server forces a charset that isn't UTF-8?
Thanks a lot
Try saving the actual file with UTF-8 encoding. That did the trick for me.
I use PhpStorm as my editor: File -> File Encoding -> UTF-8
Actually, the META tag is not all you need for correct UTF-8 encoding. Your server might still send the page with Content-Type: text/html; charset=ISO-8859-1 in the HTTP response headers.
You can check the headers e.g. with the Live HTTP Headers Firefox add-on.
There is a lot of secret sauce to making UTF-8 encoding work; you might want to go through this page (UTF-8: The Secret of Character Encoding), which explains everything you need to know and gives advice on how to solve encoding problems.
To answer your question: yes, it is possible to force the server to use UTF-8, e.g. by using the PHP header() function like so:
header('Content-Type: text/html; charset=UTF-8');
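One caveat with that approach: header() must be called before the script produces any output at all (even whitespace before the opening <?php tag), or PHP will complain that headers were already sent. A minimal sketch of a fixed page:
<?php
// Nothing may be output before this call, not even a blank line,
// or PHP reports "headers already sent".
header('Content-Type: text/html; charset=UTF-8');
?>
<!DOCTYPE html>
<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/></head>
<body>àèìòù</body>
</html>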
I have an odd problem in IE. It has to do with how IE detects the encoding of an iframe based on its parent content. My application wraps the content of a page in an iframe, and sets the encoding of the parent window to UTF-8 through the Content-Type header. The content of the iframe does not set the encoding through the Content-Type, and picks up the parent window's encoding on its initial load. This is the desired behavior - the content window requires the UTF-8 encoding for some language content, but for complicated reasons beyond my control, it cannot forcibly set its own encoding, so it relies on the parent window's encoding.
The problem arises when the content page is the target of a form action. When the form submits and the page loads in the content window, it auto-selects Western European (Windows) encoding. Does anyone know why? I've tried searching for any sort of documentation on related behavior, but the googles, they do nothing. Any sort of a lead (beyond sending a Content-Type header or a byte-order mark in the content) would be most helpful.
I unfortunately don't have a public place to host this, but copy-pasting these code samples to local files and saving each with UTF-8 encoding without a byte-order mark should consistently reproduce the behavior in all versions of IE.
frame1.html
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
<div>エンコード</div>
<iframe src="frame2.html"></iframe>
frame2.html
<form>
<input value="エンコード">
<input type="submit">
</form>
To recap with the example, if you load the page and check the encoding of both the parent and the iframe, you should see "Auto-Select" checked and "UTF-8" selected in both. If you hit Submit in the iframe, the frame will reload and the input text will be garbled. Checking the encoding of the iframe should still show "Auto-Select" checked, but now "Western European (Windows)" will be selected instead of "UTF-8". I need to know if there is anything else I can do to make it automatically preserve the UTF-8 encoding when the form action completes.
Thanks in advance!
When you say you cannot add a Content-Type header/BOM, are you able to add the Content-Type as a meta tag? Something like:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Recently I had very similar issues - IE auto-detecting Western European at all times, except when a certain popup window navigated to the page, which then caused IE to pick UTF-8. I was never able to track down exactly what caused it (the resulting page was identical, only the page that linked to it was different!), so we ended up fixing it by forcing UTF-8 across the entire application (with headers).
If you're really unable to modify the inner page in any way, is it possible you could "replace" this page with your own, and then send the content over to the "other" server via an API or HTTP POST where you wouldn't need to worry about IE's "auto-detecting"?
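As a rough sketch of that last idea (the target URL and script name are hypothetical), the replacement page could accept the form POST itself and relay the data with an explicit charset using PHP's cURL bindings, so IE's auto-detection never enters into it:
<?php
// forward.php: receive the form POST, then relay it to the real server
// with an unambiguous charset on the forwarded request.
$ch = curl_init('http://other-server.example/handler'); // hypothetical URL
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($_POST));
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'Content-Type: application/x-www-form-urlencoded; charset=UTF-8',
]);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);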