HTML5 page language, direction and encoding

What is the correct way of declaring an HTML5 page to be in Hebrew, RTL and UTF-8 encoded? I haven't done it in a while, but I remember that in HTML4 it involved 3 or 4 tags and attributes that seemed redundant. Is it still the same?

<html dir="rtl" lang="he">
<head>
<meta charset="utf-8">
...
</head>
...
</html>

You need the following:
A <!doctype html> to indicate your page is HTML5.
An <HTML> tag with the following attributes:
dir="rtl"
lang="he"
Note: you may omit the ", or use ' instead.
A <meta> tag to declare the character encoding. You can choose one of the following:
<meta charset="UTF-8">
Note: you may omit the ", or use ' instead.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
This is the "legacy" way of declaring character encoding. It's still allowed within HTML5, but all modern browsers support the first variant so there is no need for this.
Note: you may omit the " for the http-equiv attribute, or use ' instead for all attributes.
If the browser encounters a UTF-8 byte order mark, it will treat an HTML5 file as UTF-8. This happens regardless of any character encoding declared using meta tags.
None of the tags, attributes and attribute values used here, or the DOCTYPE, are case sensitive.
Note: if the browser encounters a character encoding declaration that differs from the encoding it has assumed so far, it may re-parse the document from the start using the declared encoding. Sending the encoding in a Content-Type HTTP header avoids this.
Note also that the browser will only look for a character encoding declaration in the first 1024 bytes of a document.
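Putting the pieces above together, a minimal sketch of a complete page might look like the following. The title and body text are placeholder Hebrew strings added for illustration, and the file itself is assumed to be saved as UTF-8.

```html
<!DOCTYPE html>
<html dir="rtl" lang="he">
<head>
<!-- Encoding declaration first, well inside the first 1024 bytes -->
<meta charset="utf-8">
<!-- Placeholder title ("Example") -->
<title>דוגמה</title>
</head>
<body>
<!-- Placeholder Hebrew text ("Hello, world") -->
<p>שלום עולם</p>
</body>
</html>
```

The dir and lang attributes sit on the root element so they apply to the whole document; individual elements can still override them.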

You need these to create an HTML5 page in Hebrew, with RTL direction and UTF-8 encoding:
<!DOCTYPE html> For declaring it as a HTML5 page
<html dir="rtl" lang="he"> For direction and language
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> For utf-8

<html dir="rtl" lang="he">
not: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Does not work on "Chrome" and "Firefox" browsers.

Related

<meta charset="utf-8"> or <meta http-equiv="Content-Type">, which should I use?

In order to define charset for HTML 5 doctype, which notation should I use?
Short:
<meta charset="utf-8">
Long:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
See the section Specifying the document’s character encoding in the HTML 5.1 spec:
[…] using a meta element with a charset attribute or a meta element with an http-equiv attribute in the encoding declaration state.
So both ways are fine. (But don’t use both in the same document.)

Turkish characters do not display correctly

I have the following code, which contains Turkish content, but the output shows a garbled special character. How can I fix this?
HTML code:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
<META HTTP-EQUIV="content-language" CONTENT="TR" />
<title>test</title>
</head>
<body>
Tarihçe
</body>
</html>
I get Tarih�e instead of Tarihçe.
If you can save the file in the Turkish encoding (ISO-8859-9), use this meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-9" />
Otherwise, write Tarihçe with a character reference:
Tarih&ccedil;e
Change the actual character encoding of the file to UTF-8, using whatever settings need to be used in the program you use to create and edit pages. The file is now in some 8-bit encoding, so the letter ç appears in the data as a byte that is not allowed in UTF-8; hence the � symbol (it indicates character-level data error).
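The mismatch described above can be reproduced in a browser. The sketch below assumes the page itself is saved as UTF-8 and uses the standard TextDecoder API; the element id "demo" and the variable names are made up for illustration. It decodes the ISO-8859-9 bytes of "Tarihçe" once as UTF-8 (the wrong encoding) and once as ISO-8859-9 (the right one).

```html
<!DOCTYPE html>
<html lang="tr">
<head>
<meta charset="utf-8">
<title>test</title>
</head>
<body>
<p>Tarihçe</p>
<pre id="demo"></pre>
<script>
// "Tarihçe" as ISO-8859-9 bytes: ç is the single byte 0xE7
var bytes = new Uint8Array([0x54, 0x61, 0x72, 0x69, 0x68, 0xE7, 0x65]);
// Wrong encoding: 0xE7 starts a multi-byte UTF-8 sequence that never
// completes, so the decoder substitutes U+FFFD (the � symbol)
var wrong = new TextDecoder("utf-8").decode(bytes);
// Right encoding: the same bytes decoded as they were written
var right = new TextDecoder("iso-8859-9").decode(bytes);
document.getElementById("demo").textContent = wrong + "\n" + right;
</script>
</body>
</html>
```

The first line of the output should show the � symbol in place of ç, matching the symptom in the question, while the paragraph above it renders correctly because the file's real encoding and its declaration agree.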

Which meta tags before title tag in html with doctype 4.01 Transitional?

So my question is about which meta tags are placed before the title tag when set in the head tag in html with doctype 4.01 Transitional.
Here I give an example:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Index</title>
<meta name="keywords" content="whatsoever">
<meta name="description" content="ask stackoverflow">
<meta name="author" content="gugol">
<link rel="stylesheet" type="text/css" href="../css/nicestyles.css">
</head>
<body></body>
</html>
I think I should have the charset attribute first so that everything in the html document is read under the character encoding for the HTML document. But have some doubts about the others.
What would be just the right order??
Modern browsers let you specify the character encoding in a meta tag so that it is applied even to elements preceding it. However, such a tag should appear as early as possible according to HTML 4.01, clause 5.2.2 Specifying the character encoding. HTML5 CR clarifies further, in clause 4.2.5.5 Specifying the document's character encoding, that “the element containing the character encoding declaration must be serialized completely within the first 1024 bytes of the document”.
The point here is that unless the encoding has been specified in HTTP headers or by data interpretable as a Byte Order Mark at the start of a document, the browser will scan some initial part, such as one kilobyte, of the document, then infer or guess the encoding from it, tentatively parsing it as ASCII data and recognizing a meta tag if there is one.
Other than this, no order restrictions are placed on the content of a head element, and there is no reason to expect that the order of meta elements would matter.
There is no right order. Your title and meta tags can be in any order and the results will be the same.

Why am I getting †character in plain HTML?

I have bound data to a repeater control in an ASP.NET application. When the data is displayed as plain HTML text in the browser, it comes with these characters: â€. I am unable to figure out the reason behind this.
What kind of HTML document type are you writing? If it's HTML5, check that you have <meta charset="utf-8" /> inside the <head> tag. Otherwise, check that you have <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> inside your <head> tag.
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
Have you included this in your code?
http://ask-leo.com/why_do_i_get_odd_characters_instead_of_quotes_in_my_documents.html
Also look at the tables in the database to see what collation they use, e.g. latin1_swedish_ci or utf8_general_ci.
Use the straight quote symbol " around attribute values. Do not use the curly quote ”.
Correct example:
<a href="http://www.mozilla.com">mozilla.com</a>
Incorrect:
<a href=”http://www.mozilla.com”>mozilla.com</a>

Why smart quotes look garbled in html

This is my HTML code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="en-us" http-equiv="Content-Language" />
<meta content="text/html; charset=windows-1252" http-equiv="Content-Type" />
</head>
<body>
“”
</body>
</html>
This is what I see in Firefox: â€œâ€
I wonder why.
To write characters outside the standard ASCII set (which is common among almost all encodings) without using HTML entities, you have to make sure that the file is actually saved using the encoding that you specify in the meta tag.
Your page is saved as UTF-8 (the default), which means that those characters will be wrong when the browser tries to decode them as Windows-1252.
You need to use the HTML entities for smart quotes: &ldquo; for the left, &rdquo; for the right.
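As a sketch of both fixes discussed in this answer, the page below keeps the question's XHTML doctype but declares UTF-8, on the assumption that the file really is saved as UTF-8. The title and the sample word "quoted" are placeholders added for illustration.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="en-us" http-equiv="Content-Language" />
<!-- Declared encoding now matches the bytes on disk (assumed UTF-8) -->
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<title>Smart quotes</title>
</head>
<body>
<!-- Literal characters: safe only because file and declaration agree -->
<p>“quoted”</p>
<!-- Entity form: pure ASCII, so it survives any declared encoding -->
<p>&ldquo;quoted&rdquo;</p>
</body>
</html>
```

If the declaration has to stay windows-1252, the alternatives are to re-save the file in that encoding or to use only the entity form.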