Setting two charsets? utf-8 and text/html

I'm new to HTML and am learning about metadata in a webpage. It seems people recommend setting the character set to utf-8, and people also say to use charset="text/html" so browsers know what kind of content they are receiving. How can I set both, since both seem to use the same attribute?

text/html is a media type, not a character set. The server can send a Content-Type: text/html; charset=utf-8 header to identify the content as HTML encoded as UTF-8. When you’re using a <meta> tag, though, the content is already known to be HTML and the only thing that matters is the charset, so HTML5 introduced the option to write <meta charset="utf-8"> as shorthand for <meta http-equiv="Content-Type" content="text/html; charset=utf-8">. (Older browsers also support this because people had a habit of forgetting quotes in the original.)
In short, if you’re using a <meta> tag, just write this:
<meta charset="utf-8">
and you’re done. The page is already HTML.
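To see that the media type and the charset are two separate pieces of a single Content-Type value, here is a small sketch using Python's standard `email.message.Message`, which parses MIME-style headers. Using it on an HTTP header value is purely an illustrative choice, not something the answer above prescribes.

```python
from email.message import Message

# Parse a Content-Type value as a server might send it.
msg = Message()
msg["Content-Type"] = "text/html; charset=utf-8"

media_type = msg.get_content_type()   # the media (MIME) type
charset = msg.get_content_charset()   # the character encoding parameter

print(media_type)  # text/html
print(charset)     # utf-8
```

The `charset` parameter rides along with the media type; it is never a replacement for it.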

Related

The best way to define the character set of an HTML5 Web Page?

So what is the best way to define it? Thanks!
<meta charset='utf-8'>
or
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
or
<meta http-equiv='charset' content='utf-8'>
The first option, <meta charset='utf-8'>, is preferred, primarily because it's shortest.
Note that the charset declaration should be the first child of the <head>, before any user-controlled content (in particular, before the <title>).
They are equivalent in HTML5, but I'd recommend using the first form, because it's shorter, easier to remember, and was designed for backwards compatibility with older browsers, even Internet Explorer 6.
The best, and recommended, way is to specify the character encoding (“charset”) both in a Content-Type HTTP header and in a meta tag or, in the case of UTF-8, using a Byte Order Mark at the start. Note that any conflict between the HTTP header and a meta tag is resolved in favor of the HTTP header.
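The precedence described above can be sketched as a small Python function. This is a simplified model under stated assumptions: it only checks a UTF-8 BOM (not UTF-16 BOMs), does no label normalization beyond lowercasing, and the `windows-1252` fallback is a hypothetical placeholder, since real browsers pick the fallback from the user's locale. `pick_encoding` is an illustrative name, not a browser API.

```python
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"

def pick_encoding(body: bytes,
                  header_charset: Optional[str],
                  meta_charset: Optional[str],
                  fallback: str = "windows-1252") -> str:
    """Simplified precedence: UTF-8 BOM, then HTTP header, then <meta>, then a guess."""
    if body.startswith(UTF8_BOM):
        return "utf-8"                   # a BOM wins over any declaration
    if header_charset:
        return header_charset.lower()    # the HTTP header beats the <meta> tag
    if meta_charset:
        return meta_charset.lower()
    return fallback                      # nothing declared: the browser must guess

# The header and the meta tag disagree, so the header wins.
print(pick_encoding(b"<p>hi</p>", "utf-8", "iso-8859-1"))  # utf-8
```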
It is much less relevant which of the two forms of meta you use, but the shorter one is safer (fewer opportunities for mistyping or copy errors).
References: “Specifying the document's character encoding” in the HTML5 CR, and the W3C page “Character encodings”.
The third tag mentioned in the question, <meta http-equiv='charset' content='utf-8'>, is invalid and has no effect. (The W3C validator says: “Bad value charset for attribute http-equiv on element meta.”)
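Why the third form has no effect can be illustrated with a small extractor built on Python's `html.parser.HTMLParser`. It recognizes only the two valid forms (`charset` and `http-equiv="Content-Type"`), so `http-equiv='charset'` yields nothing. This is a sketch, not how browsers work internally: they run a stricter byte-level prescan over the start of the document. `MetaCharsetFinder` and `find_meta_charsets` are hypothetical names.

```python
from html.parser import HTMLParser

class MetaCharsetFinder(HTMLParser):
    """Collects charset declarations from <meta> tags (simplified)."""

    def __init__(self):
        super().__init__()
        self.charsets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = {name.lower(): (value or "") for name, value in attrs}
        if "charset" in a:
            # <meta charset="...">
            self.charsets.append(a["charset"].strip().lower())
        elif a.get("http-equiv", "").lower() == "content-type":
            # <meta http-equiv="Content-Type" content="text/html; charset=...">
            for part in a.get("content", "").split(";"):
                key, _, val = part.strip().partition("=")
                if key.lower() == "charset" and val:
                    self.charsets.append(val.strip().strip("'\"").lower())
        # http-equiv="charset" is not a recognized pragma, so it is ignored.

def find_meta_charsets(html: str) -> list:
    finder = MetaCharsetFinder()
    finder.feed(html)
    finder.close()
    return finder.charsets

print(find_meta_charsets("<meta http-equiv='charset' content='utf-8'>"))  # []
```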
The best way to define the meta charset is as follows:
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
The tag above is the more complete form.
http://www.w3.org/International/O-HTTP-charset.pt-br.php

What meta tags does HTML5 require?

Here is the start of my HTML5 web application:
<!DOCTYPE html>
<html>
<head>
<meta content='text/html; charset=utf-8' http-equiv='Content-Type'>
Is the meta content tag needed? Are HTML and UTF-8 the defaults?
I just removed the namespace from the html tag, as it is not needed.
I was wondering if I can remove the meta tag here too.
Yes; typically this is simply <meta charset='utf-8'> in HTML5, since the actual content-type is always determined by the corresponding HTTP header instead:
<!DOCTYPE html>
<html>
<head>
<meta charset='utf-8'>
You can continue using what you already have, but the content-type must be text/html followed by the character encoding for it to validate as HTML5. For simplicity, just go with the new recommended syntax. See the W3C HTML5 spec for details.
There are two distinct issues here: the content type (media type, MIME type), and the character encoding (“charset”). For the latter, see <meta charset="utf-8"> vs <meta http-equiv="Content-Type">. Note that there is no universal default for character encoding in HTML, and a meta tag is just one way of specifying the encoding and may be trumped by HTTP headers or BOM.
But the title of the question asks “Does HTML5 require content-type to be set?”, and the answer is that it does not require it to be set in the HTML document and it cannot be set in the HTML document. If some software parses a meta tag and interprets it as having a specific meaning, it has already decided to process the document as an HTML document.
General Internet protocols specify how clients are informed of content types (in HTTP headers, e-mail message headers, etc.), and for an HTML document transmitted over HTTP, the server should announce the content type as text/html (or as a content type defined for genuine XHTML, if you want Draconian XML error processing and other serious consequences). Without such information, browsers will have to guess the content type, and they may guess wrong.
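As a sketch of where the content type actually comes from, here is a minimal Python WSGI application that announces `text/html; charset=utf-8` in the HTTP response header. The page content and the function name `app` are illustrative assumptions; any server or framework that sets the same header would do.

```python
def app(environ, start_response):
    """Minimal WSGI app: the Content-Type header, not the document, announces text/html."""
    page = (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n<meta charset='utf-8'>\n<title>Demo</title>\n</head>\n"
        "<body><p>Hello</p></body>\n</html>\n"
    )
    body = page.encode("utf-8")  # the bytes must really be UTF-8 to match the header
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve it.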

Meta tags and language

I have a website written in Greek.
<meta name="keywords" content="" /> Can I use Greek in the content attribute, or only English?
If I can write them using Greek characters, do I have to add anything to the meta tag?
Thank you
You can use any character you want from the character set you are using. Just make sure that your web server sends the correct character set in the HTTP response headers. You may also add a <meta> element specifying the charset, but it's not strictly necessary.
I use UTF-8 in these examples because I think it's a good charset and prefer to use it myself. It's on its way to becoming the standard charset on the web.
HTTP header which should be sent by the web server:
Content-Type: text/html; charset=utf-8
Optional <meta> element for your document:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
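"Any character from the character set you are using" can be checked directly: a string can go into the page only if it can be encoded in the declared charset. A minimal Python sketch; `can_encode` is a hypothetical helper and the Greek keywords are illustrative.

```python
def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of `text` exists in `charset`."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True

keywords = "ελληνικά, ιστοσελίδα"  # sample Greek keywords for <meta name="keywords">
print(can_encode(keywords, "utf-8"))       # True
print(can_encode(keywords, "iso-8859-7"))  # True (the legacy Greek charset)
print(can_encode(keywords, "iso-8859-1"))  # False (Latin-1 has no Greek letters)
```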

'charset=iso-8859-1' with <!DOCTYPE HTML> is throwing a warning

I just validated an HTML document using the W3C validator, and found that if I use:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
with:
<!DOCTYPE HTML>
It throws a warning Line 4, Column 72: Using windows-1252 instead of the declared encoding iso-8859-1.
However, it is fixed if I use:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
I don't really understand what is happening. Also, I don't even know how to use the DOCTYPE tag, I just copied and pasted one from around the web.
Why does this happen?
How should I use the DOCTYPE tag?
Changing the DOCTYPE is simply turning off the warning - it isn't actually fixing anything.
iso-8859-1 and windows-1252 are very similar encodings. They differ only in the characters associated with the 32 byte values from 0x80 to 0x9F, which in iso-8859-1 are mapped to control characters and in windows-1252 are mapped to some useful characters such as the Euro symbol.
The control characters are useless in HTML, and web authors often mistakenly declare iso-8859-1 while using one or more of those 32 values as if they were writing windows-1252. So when browsers see iso-8859-1 declared, they automatically treat it as windows-1252.
The validator is simply warning you that this will happen. If you're not using any of the 32 byte values, then you can simply ignore the warning - it's not an error. If you are, and you genuinely want the iso-8859-1 interpretation of the byte values and not the windows-1252 interpretation, you are doing something wrong.
Again, this switching happens in browsers for any DOCTYPE, it's just that the HTML5 validator is being more helpful about what it is telling you than the HTML4 validator is.
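The difference between the two encodings can be seen by decoding the same bytes both ways in Python. The sketch also includes a hypothetical helper, `uses_c1_range`, that checks whether a file contains any of the 32 byte values in question; if it returns False, the validator's warning changes nothing for that file.

```python
# The same byte decoded two ways.
euro_byte = bytes([0x80])
print(repr(euro_byte.decode("iso-8859-1")))  # '\x80' (a C1 control character)
print(euro_byte.decode("windows-1252"))      # € (the Euro sign)

def uses_c1_range(data: bytes) -> bool:
    """True if the data contains any byte in 0x80-0x9F, where the two encodings differ."""
    return any(0x80 <= b <= 0x9F for b in data)

# Outside 0x80-0x9F the two encodings agree byte for byte.
same = bytes(range(0xA0, 0x100))
print(same.decode("iso-8859-1") == same.decode("windows-1252"))  # True
```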
A couple of points:
Any HTML5 validation should be taken with a grain of salt. The specification is still under active development, and not everything is set in stone.
You're using the HTML4 syntax for that meta tag. Try <meta charset="iso-8859-1">
That said, HTML validators don't serve that much purpose in this day and age.
But apparently the default for HTML4 was ISO-8859-1. That said, the charset recommended for HTML5 is UTF-8.
More information about the HTML5 doctype can be found in this post by John Resig.
It throws a warning Line 4, Column 72: Using windows-1252 instead of the declared encoding iso-8859-1.
It means the file was saved with the encoding Windows-1252 on creation (AKA Western Windows 1252 or CP1252) and your charset declaration says "hey read this file with ISO 8859-1" when that's not the encoding the file has.
The meta charset exists for that reason: it declares the encoding of the file you are sending, reading, or using, so that when a browser, for example, reads the document, it knows which encoding the file uses.
In detail, you have this charset declared:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
But the file you are validating is actually encoded in Windows-1252. How? Why? Check the text editor you are using and what encoding it is using to save files. If the editor can be configured to change the encoding, choose the one you want to use.
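If you would rather convert an existing file than reconfigure the editor, re-encoding is just a decode followed by an encode. A minimal Python sketch, assuming the file really was saved as Windows-1252 and you want UTF-8 (after which the meta tag should declare utf-8). `reencode` is a hypothetical helper; for a real file you would pass in the bytes from `open(path, "rb").read()`.

```python
def reencode(data: bytes, source: str = "windows-1252", target: str = "utf-8") -> bytes:
    """Decode bytes saved in `source` and re-encode them in `target`."""
    text = data.decode(source)  # raises UnicodeDecodeError if data is not valid `source`
    return text.encode(target)

legacy = b"caf\xe9 costs 5 \x80"  # bytes as a Windows-1252 editor would save them
converted = reencode(legacy)
print(converted.decode("utf-8"))  # café costs 5 €
```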
About HTML5
Using
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
or
<meta charset="iso-8859-1">
are both valid for HTML5. See <meta charset="utf-8"> vs <meta http-equiv="Content-Type">
The W3C validator offers options for which encoding the validator uses. You have specified encoding in your document, so you should see "Encoding: iso-8859-1" in the top block of information once the validator has been run.
To the right of that, there is a pull-down menu. Change the choice from "(detect automatically)" to "iso-8859-1 (Western European)". The validator will then use ISO 8859-1 instead of its own choice, and you will not receive the error.
Use ISO 8859-15 instead. Yes, -15, and it will work.
Don't place too much stock in the validators. There are typically too many Internet Explorer workarounds, particularly in the CSS content, that will trip up the validator. If your pages work in all browsers and your client is happy, it doesn't matter what some validator says.
If you are specifying the HTML5 doctype, you should be consistent and use the HTML5 meta charset attribute as well. Try this for your pages:
<!DOCTYPE HTML>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
</body>
</html>

Different Language In Website

I'm trying to write a website in Slovak language (central Europe). What I have done is put these two meta tags into the header:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2" />
<meta http-equiv="Content-Language" content="sk" />
The problem is that all characters with diacritics are replaced with garbage characters (so the encoding is obviously not working). What should I do?
Here is the whole beginning of the page:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sk" lang="sk">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta http-equiv="Content-Language" content="sk" />
When you save the file you have to make sure that it's created with the same encoding that you have specified in the meta tag.
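The garbage characters are what you get when the bytes on disk and the declared encoding disagree. A minimal Python sketch, assuming for illustration that the editor saved UTF-8 while the meta tag says iso-8859-2; the Slovak sample text is illustrative.

```python
text = "Slovenčina, ďakujem"  # sample Slovak text

saved = text.encode("utf-8")              # what an editor saving as UTF-8 writes to disk
as_declared = saved.decode("iso-8859-2")  # what the browser shows when told iso-8859-2

print(as_declared)          # garbled: each two-byte UTF-8 sequence becomes two Latin-2 characters
print(as_declared == text)  # False

# When the declaration matches how the file was saved, the text survives.
print(text.encode("iso-8859-2").decode("iso-8859-2") == text)  # True
```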
I recommend that you use utf-8 instead of iso-8859-2. The Unicode character set supports practically every character in every language that exists (and even some that don't...).
There are two issues at work here.
Language
Character encoding
Language
The Content-Language HTTP header describes the natural language of the intended audience. This may not be the same as the language the document is actually written in. Use the lang attribute to describe that.
Character encoding
This allows you to represent the letters that you wish to use. You need to make sure that your text really does use the selected encoding and that the browser is informed that that is the encoding you are using.
Select a character encoding (UTF-8 is generally the best choice, it covers just about every character you could possibly want to use and saves you having to switch encodings for different languages), see http://www.w3.org/International/tutorials/tutorial-char-enc/
Ensure your editor saves using that encoding
Ensure your server specifies that it is using that encoding
Ensure that nothing mangles the encoding between the editor and server (such as by being inserted into a database that is configured to use a different encoding)
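For the "ensure your editor saves using that encoding" step, a quick sanity check is whether the saved bytes decode as UTF-8 at all. A minimal Python sketch; `is_valid_utf8` is a hypothetical helper. Note its limit: pure ASCII passes too, so it can only catch files that are definitely not UTF-8, not prove what the editor intended.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Check whether raw file bytes form valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

print(is_valid_utf8("Slovenčina".encode("utf-8")))       # True
print(is_valid_utf8("Slovenčina".encode("iso-8859-2")))  # False: byte 0xE8 starts a sequence that never completes
```

On a real file (hypothetical name), this would be `is_valid_utf8(open("index.html", "rb").read())`.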
HTTP headers
NB: Your question mentions <meta http-equiv>. Real HTTP headers are the better place to specify this information, and they will override whatever your document claims. Make sure your server is configured correctly.
XHTML
XHTML complicates matters…
Use xml:lang in addition to lang
Don't use anything except UTF-8. If you do, then you must specify this in the XML declaration (and adding an XML declaration will trigger Quirks mode in some browsers).
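The XML rule behind this advice can be seen with Python's ElementTree parser (an XML parser, not a browser): without an encoding in the XML declaration, the bytes are read as UTF-8, so an ISO-8859-2 byte for "č" is rejected, while naming the encoding in the declaration lets the parser decode it. The Slovak sample is illustrative.

```python
import xml.etree.ElementTree as ET

latin2_bytes = "<p>Slovenčina</p>".encode("iso-8859-2")

# With an XML declaration naming the encoding, the parser decodes correctly.
doc = b'<?xml version="1.0" encoding="iso-8859-2"?>' + latin2_bytes
print(ET.fromstring(doc).text)  # Slovenčina

# Without it, XML assumes UTF-8, and the Latin-2 byte for "č" is not well-formed.
try:
    ET.fromstring(latin2_bytes)
except ET.ParseError as err:
    print("parse error:", err)
```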