What is the difference between these two, and should I use both? I want my website to be fully UTF-8.
In PHP:
<?php
header("Content-Type: text/html; charset=utf-8");
?>
And the meta tag in HTML:
<meta charset="utf-8">
The HTTP Content-Type header should always be set; it is the primary source the browser uses to figure out what kind of document it is dealing with. Only if this header is missing (or does not specify a charset) will the browser look for an HTML meta tag that gives it the same information, as a fallback.
It makes sense to set both flags though, since you may save the HTML document to disk, in which case the HTTP header will be gone for anyone needing it later.
You can find the exact rules for how a browser determines the document's charset here: http://www.w3.org/TR/html5/syntax.html#determining-the-character-encoding
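The lookup order described above can be sketched roughly as follows. This is a deliberately simplified illustration in Python, not the algorithm from the spec: the function name `pick_encoding` is hypothetical, only the UTF-8 byte order mark is handled, and the meta scan is a crude regular expression over the first 1024 bytes. It assumes the BOM is checked first, then the HTTP header, then the meta tag.

```python
import re
from typing import Optional

def pick_encoding(body: bytes, content_type_header: Optional[str]) -> Optional[str]:
    """Simplified sketch: BOM first, then HTTP header, then <meta>."""
    # 1. A UTF-8 byte order mark at the very start of the body wins.
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    # 2. Otherwise use the charset parameter of the Content-Type header, if any.
    if content_type_header:
        m = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type_header, re.I)
        if m:
            return m.group(1).lower()
    # 3. Fall back to a meta declaration near the top of the document.
    head = body[:1024].decode("ascii", errors="replace")
    m = re.search(r'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', head, re.I)
    if m:
        return m.group(1).lower()
    # Nothing declared: a real browser would go on to guess.
    return None
```

With this sketch, a page containing `<meta charset="iso-8859-1">` but served with `Content-Type: text/html; charset=UTF-8` resolves to `utf-8`, which is why setting both consistently is the safe choice.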
header("Content-Type: text/html; charset=utf-8");
is server-side: the PHP script must call it before any output is sent, so that the header goes out ahead of the page sent to the client.
<meta charset="utf-8">
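The meta element has two uses: either to emulate the use of an HTTP response header, or to embed additional metadata within the HTML document. So the meta tag is a convenient way to declare UTF-8 inside the document itself, although a charset given in the HTTP Content-Type header takes precedence over it.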
Related
I have an old website in French that I want to preserve and whose HTML files were encoded in ISO-8859-1. All HTML files included
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
in the <head> element; however, the host of my website changed something in the configuration, and now pages are sent from their server with an HTTP header including
content-type: text/html; charset=UTF-8
and unfortunately someone decided this would override the <meta> information.
Do I have to transcode all my HTML files to UTF-8, or is there a faster solution?
Update
In fact, the charset was added to the HTTP header's Content-Type field only for HTML content issued by PHP, not for pure HTML files. I'll put the solution I adopted as an answer.
Your options:
Transcode the files
Persuade whoever changed the server configuration to change it again
Change servers
Run every request through a server-side script that outputs a different Content-Type header and then outputs the HTML (while accounting for cache-control headers)
It took me a while to realize that the problem occurs only for .php files. The fix I chose is the following: I added the line
ini_set('default_charset', NULL);
at the beginning of every PHP file. A bit tedious, but it seems reasonable to me.
I need to add response headers like X-Frame, Cache-control, Pragma, etc. directly in the HTML code, perhaps using attributes on HTML elements.
This is for help pages that are served directly from a directory via an href link.
Is there any way to add headers to these HTML files?
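You can use meta to replicate some of these. This is normally not the ideal solution, but look into the http-equiv attribute of meta tags. Be aware that browsers do not honor every header this way: X-Frame-Options, for instance, is only enforced when sent as a real HTTP header, and the caching hints are not reliably respected.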
Examples:
<meta http-equiv="Cache-control" content="no-cache"/>
<meta http-equiv="X-Frame-Options" content="sameorigin"/>
<meta http-equiv="pragma" content="no-cache"/>
In short: no, you cannot. HTML files are the body of an HTTP response; the headers must come from the server. Anything you could embed in the HTML file would just become part of the body.
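To make the header/body split concrete, here is a small Python sketch that assembles a raw HTTP response by hand. The helper name `build_response` and the sample header values are assumptions chosen for illustration; the point is only that headers sit before the blank line, and everything in the HTML file lands after it.

```python
def build_response(headers: dict, html: str) -> bytes:
    """Assemble a minimal HTTP/1.1 response: status line, headers, blank line, body."""
    body = html.encode("utf-8")
    head_lines = ["HTTP/1.1 200 OK"]
    head_lines += [f"{name}: {value}" for name, value in headers.items()]
    head_lines.append(f"Content-Length: {len(body)}")
    # The empty line (CRLF CRLF) is the boundary between headers and body.
    return "\r\n".join(head_lines).encode("ascii") + b"\r\n\r\n" + body

raw = build_response(
    {"X-Frame-Options": "SAMEORIGIN", "Cache-Control": "no-cache"},
    '<meta http-equiv="pragma" content="no-cache">',
)
head, _, body = raw.partition(b"\r\n\r\n")
```

Here `head` contains the real `X-Frame-Options` and `Cache-Control` headers set by the server side, while the meta tag is just bytes inside `body`.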
You can add something like this, if php execution is enabled on your web server:
<?php
http_response_code(your_response_code); // placeholder: substitute an integer status code such as 200
?>
rest-of-your-html-code
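This will execute a PHP script that sets the response status code. Other response headers can be set the same way with header(), provided it is called before any output is sent.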
I tested my website with GTmetrix, and it says:
The following resources have a character set specified in a meta tag. Specifying a character set in a meta tag disables the lookahead downloader in IE8. To improve resource download parallelization, move the character set to the HTTP Content-Type response header.
Currently the tag is <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.
How can I solve this problem? I am using HTML 5 and CSS3.
If your server is capable of running PHP, put the following code right at the top of your html file and rename it to whatever.php:
<?php
header('Content-Type: text/html; charset=utf-8');
?>
Here is the start of my HTML5 web application:
<!DOCTYPE html>
<html>
<head>
<meta content='text/html; charset=utf-8' http-equiv='Content-Type'>
Is the meta content tag needed? Is HTML / UTF-8 a default?
I just removed the namespace in the html tag, as it is not needed.
I was wondering whether I can remove the meta tag here as well.
UTF-8
Yes; typically this is simply <meta charset='utf-8'> in HTML5, since the actual content-type is always determined by the corresponding HTTP header instead:
<!DOCTYPE html>
<html>
<head>
<meta charset='utf-8'>
You can continue using what you already have, but the content-type must be text/html followed by the character encoding for it to validate as HTML5. For simplicity, just go with the new recommended syntax. See the W3C HTML5 spec for details.
There are two distinct issues here: the content type (media type, MIME type), and the character encoding (“charset”). For the latter, see <meta charset="utf-8"> vs <meta http-equiv="Content-Type">. Note that there is no universal default for character encoding in HTML, and a meta tag is just one way of specifying the encoding and may be trumped by HTTP headers or BOM.
But the title of the question asks “Does HTML5 require content-type to be set?”, and the answer is that it does not require it to be set in the HTML document and it cannot be set in the HTML document. If some software parses a meta tag and interprets it as having a specific meaning, it has already decided to process the document as an HTML document.
General Internet protocols specify how clients are informed of content types (in HTTP headers, e-mail message headers, etc.), and for an HTML document transmitted over HTTP, the server should announce the content type as text/html (or as a content type defined for genuine XHTML, if you want Draconian XML error processing and other serious consequences). Without such information, browsers will have to guess the content type, and they may guess wrong.
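The distinction between the media type and the charset parameter can be seen by splitting a Content-Type header value into its two parts. The following Python sketch uses the standard library's `email.message.Message` for the parsing; the wrapper name `split_content_type` is hypothetical.

```python
from email.message import Message

def split_content_type(header_value: str):
    """Return (media_type, charset) from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() gives the lowercased type/subtype;
    # get_content_charset() gives the lowercased charset, or None if absent.
    return msg.get_content_type(), msg.get_content_charset()

with_charset = split_content_type("text/html; charset=UTF-8")
without_charset = split_content_type("text/html")
```

In the second case the media type is still known, but the charset is `None`, which is the situation where a browser falls back to other sources such as a meta tag.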
I have a simple HTML-page with a UTF-8 encoded link.
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<a charset='UTF-8' href='http://server/search?q=%C3%BC'>search for "ü"</a>
</body>
</html>
However, I can't get the browser to include Content-Type:application/x-www-form-urlencoded; charset=utf-8 in the request header. Therefore I have to configure the web server to assume all requests are UTF-8 encoded (URIEncoding="UTF-8" in the Tomcat server.xml file). But of course the admin won't let me do that in the production environment (WebSphere).
I know it's quite easy to achieve using Ajax, but how can I control the request header when using standard HTML links? The charset attribute doesn't seem to work for me (tested in Internet Explorer 8 and Firefox 3.5)
The second part of the required solution would be to set the URL encoding when changing an IFrame's document.location using JavaScript.
This is not possible from within HTML.
The closest you can get is the accept-charset attribute of the <form>. Only Internet Explorer adheres to it, and even then it does so incorrectly (e.g., CP-1252 is actually used when it says that it has sent ISO-8859-1). Other browsers ignore it entirely and use the charset specified in the Content-Type header of the response.
Getting the character encoding right is essentially the responsibility of the server side. The client side should just send data back in the same charset in which the server sent the response.
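Why the server's assumption matters can be shown with the "ü" from the question. This short Python sketch uses `urllib.parse` from the standard library; the variable names are only illustrative.

```python
from urllib.parse import quote, unquote

# The same character percent-encodes differently depending on the charset.
utf8_form = quote("ü", encoding="utf-8")        # two bytes: C3 BC
latin1_form = quote("ü", encoding="latin-1")    # one byte: FC

# A server that assumes ISO-8859-1 misreads a UTF-8 encoded query value.
decoded_correctly = unquote("%C3%BC", encoding="utf-8")
decoded_wrongly = unquote("%C3%BC", encoding="latin-1")
```

The link in the question carries `%C3%BC`, so it only decodes back to "ü" if the server treats the URI as UTF-8; under an ISO-8859-1 assumption it becomes two unrelated characters.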
In short, you should configure character encoding entirely on the server side. To overcome the inability to edit the URIEncoding attribute, someone here on Stack Overflow wrote a (complex) filter: Detect the URI encoding automatically in Tomcat. You may find it useful as well (note: I haven't tested it).
Note that the meta tag given in your question is ignored when the content is transferred over HTTP with a Content-Type header that specifies a charset. In that case, the HTTP response Content-Type header is used to determine the content type and character encoding. You can inspect the HTTP headers with, for example, Firebug, in the Net panel.