Character entities in XHTML 1.0 metadata? - html

For the title, keywords and description fields on a page with an XHTML 1.0 doctype, what is the correct way to write non-ASCII characters such as – or é? Do crawlers and social media sites read them better as:
<title>My webpage — a study in typesetting</title>
<meta name="description" content="A page about character entities in XHTML. Check my resumé if you don't believe me." />
OR:
<title>My webpage &mdash; a study in typesetting</title>
<meta name="description" content="A page about character entities in XHTML. Check my resum&eacute; if you don&apos;t believe me." />

You're best off sticking to the native characters, but specify the encoding. For international character sets, UTF-8 is the safe choice.

If you declared your character set correctly in the HTTP header (and not only the HTML head), then it should make absolutely no difference.
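To make the "no difference" point concrete, here is a minimal Python sketch using the standard library's html.unescape, which resolves named character references the way an HTML5 parser does. It assumes (not verified for any particular crawler) that crawlers and social media scrapers decode entities like a browser; under that assumption both spellings from the question yield identical text.

```python
import html

# The two spellings from the question: native characters vs. entity references.
native = "My webpage \u2014 a study in typesetting. Check my resum\u00e9 if you don't believe me."
entities = "My webpage &mdash; a study in typesetting. Check my resum&eacute; if you don&apos;t believe me."


def parsed_text(markup: str) -> str:
    """Return the text a parser sees after resolving character references."""
    return html.unescape(markup)


print(parsed_text(native) == parsed_text(entities))  # True
```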

Related

First site in Portuguese: meta tags?

I am wondering what the best way is to develop a site that is going to be targeted at Brazil. Being from the US, I have always used meta information like the following:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<meta http-equiv="content-language" content="en-US" />
Does any of this need to change?
I have seen in a few posts that you should do something like
<html lang="es">
and
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
Does anyone have experience with sites from outside the US, and what metadata is needed for all the special Spanish characters?
UPDATE
Taking some of the advice below, I have realized that this is actually Portuguese, not Spanish. After doing some research, it looks like most sites still use
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
And a combination of these
<html lang="pt-BR" xmlns="http://www.w3.org/1999/xhtml" xml:lang="pt-br">
<meta name="language" content="pt-br" />
<meta http-equiv="Content-Language" content="pt-br" />
You don’t need any special meta tags.
You can specify the language of the content and the character encoding of the document. But you can (and should) do this for every site, even those with English content.
Assuming HTML5:
You specify the language with the lang attribute. If the whole page is in a certain language, just add it to the html element:
<html lang="pt-BR"> <!-- for Brazilian Portuguese -->
You specify the character encoding with the meta element and its charset attribute:
<meta charset="utf-8"> <!-- for UTF-8 -->
(Of course you need to specify the actual encoding used; so only use the value utf-8 if your documents use UTF-8.)
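A quick way to honour that caveat is to confirm that the saved file really decodes as UTF-8 before declaring utf-8. The helper name is_valid_utf8 is hypothetical; to check a real page, pass it the bytes read with open(path, "rb"). A failure proves the file is not UTF-8, while a pass on pure-ASCII content proves little.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw file bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "São Paulo" saved two different ways.
utf8_bytes = "S\u00e3o Paulo".encode("utf-8")
latin1_bytes = "S\u00e3o Paulo".encode("iso-8859-1")

print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin1_bytes))  # False: 0xE3 is not followed by a continuation byte
```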
Moving this comment block to answer for credit, as it appears to be accepted.
Your first problem is that they don't speak Spanish in Brazil. They speak Portuguese.
Just look at the source of some pages hosted in Brazil or Portugal. I can pretty much guarantee that they have everything you are looking for already in place.
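The current answer (2022) for this question seems to be lang="pt".
The latest IANA Language Subtag Registry contains only one subtag per language (pt for Portuguese, rather than the three-letter por), so there is no ambiguity about which code to use. Region subtags are still allowed, so pt-BR remains valid if you specifically want to mark Brazilian Portuguese.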
Reference: https://www.w3.org/International/questions/qa-lang-2or3#answer
Registry: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
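Language tags are case-insensitive, so pt-BR and pt-br in the update above name the same language. The sketch below is a hypothetical helper, deliberately limited to language or language-region tags (it is not a full BCP 47 parser), that normalizes them to the conventional spelling of a lowercase language and an uppercase region.

```python
def normalize_lang_tag(tag: str) -> str:
    """Normalize a simple language[-region] tag to conventional case.

    Only handles a primary language subtag optionally followed by a
    2-letter region subtag; numeric regions, scripts, variants, etc.
    are rejected rather than guessed at.
    """
    parts = tag.strip().split("-")
    language = parts[0].lower()
    if len(parts) == 1:
        return language
    if len(parts) == 2 and len(parts[1]) == 2 and parts[1].isalpha():
        return f"{language}-{parts[1].upper()}"
    raise ValueError(f"unsupported tag shape: {tag!r}")


print(normalize_lang_tag("pt-br"))  # pt-BR
print(normalize_lang_tag("PT"))     # pt
```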

Korean is not recognized in HTML page

When I type Korean in my html code and open it through my browser, Korean is not recognized by the browser and prints some weird words. What shall I do?
There are probably a few mistakes you are making.
First, you should have a doctype specified on your HTML page. Use the HTML5 doctype:
<!DOCTYPE html>
Second, you should also specify the character encoding of the document. Add this to your head section:
<meta charset="utf-8" />
Or, for a longer version with better cross-browser compatibility, use:
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
Also, as Juhana said, your file must be saved with the same encoding (i.e. UTF-8) to be able to store Unicode characters and display them.
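The "weird words" are usually mojibake: UTF-8 bytes read back with a different encoding. This Python sketch assumes the browser fell back to ISO-8859-1 (the actual fallback depends on the browser and locale) and shows that the same bytes are garbage under that guess and correct once UTF-8 is declared.

```python
text = "한국어"  # "Korean language", saved by the editor as UTF-8
saved_bytes = text.encode("utf-8")

# No (or wrong) declaration: the browser guesses a legacy single-byte encoding.
mojibake = saved_bytes.decode("iso-8859-1")
# <meta charset="utf-8"> tells the browser the truth.
correct = saved_bytes.decode("utf-8")

print(len(saved_bytes))  # 9: each Hangul syllable takes 3 bytes in UTF-8
print(mojibake == text)  # False
print(correct == text)   # True
```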

Meta tags and language

I have a website written in Greek.
<meta name="keywords" content="" /> Can I use Greek in the content attribute, or only English?
If I can write the keywords using Greek characters, do I have to add anything to the meta tag?
Thank you
You can use any character from the character set you are using. Just make sure that your web server sends the correct character set header in the HTTP response. You may also add a <meta> element specifying the charset, but it's not strictly necessary.
I use UTF-8 in these examples because I think it's a good charset and prefer to use it myself. It's on its way to becoming the standard charset on the web.
HTTP header which should be sent by the web server:
Content-Type: text/html; charset=utf-8
Optional <meta> element for your document:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
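As a sketch of the server side, here is a minimal WSGI application (the standard Python web-server interface; the page content is made up) that sends the charset in the real Content-Type header and encodes the body to match, so the Greek keywords need nothing extra in the meta tag. To try it locally you could serve it with wsgiref.simple_server.make_server("", 8000, app).serve_forever(); the port is arbitrary.

```python
def app(environ, start_response):
    """Minimal WSGI app that declares the charset in the HTTP header."""
    page = (
        "<!DOCTYPE html>\n<html lang=\"el\">\n<head>\n"
        "<meta charset=\"utf-8\">\n"
        "<meta name=\"keywords\" content=\"Καλημέρα, Ελλάδα\">\n"
        "<title>Καλημέρα</title>\n</head>\n<body></body>\n</html>\n"
    )
    body = page.encode("utf-8")  # must match the charset in the header below
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),  # length in bytes, not characters
    ])
    return [body]
```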

'charset=iso-8859-1' with <!DOCTYPE HTML> is throwing a warning

I just validated an HTML document using the W3C validator, and found that if I use:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
with:
<!DOCTYPE HTML>
It throws a warning Line 4, Column 72: Using windows-1252 instead of the declared encoding iso-8859-1.
However, it is fixed if I use:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
I don't really understand what is happening. Also, I don't even know how to use the DOCTYPE tag, I just copied and pasted one from around the web.
Why does this happen?
How should I use the DOCTYPE tag?
Changing the DOCTYPE is simply turning off the warning - it isn't actually fixing anything.
iso-8859-1 and windows-1252 are very similar encodings. They differ only in the characters associated with the 32 byte values from 0x80 to 0x9F, which in iso-8859-1 are mapped to control characters and in windows-1252 are mapped to some useful characters such as the Euro symbol.
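The claim that the two encodings differ only in 0x80 to 0x9F can be checked directly with Python's codecs. Python's windows-1252 codec leaves five bytes in that range undefined and raises an error for them; the sketch below counts those as differences too.

```python
def differing_bytes() -> list:
    """Byte values that decode differently in ISO-8859-1 and windows-1252."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        latin1 = raw.decode("iso-8859-1")  # every byte is defined here
        try:
            win = raw.decode("windows-1252")
        except UnicodeDecodeError:
            win = None  # undefined in Python's windows-1252 codec
        if latin1 != win:
            diffs.append(value)
    return diffs


print(len(differing_bytes()))  # 32
```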
The control characters are useless in HTML, and web authors often mistakenly declare iso-8859-1 while using one or more of those 32 byte values as if they were writing windows-1252. So when browsers see iso-8859-1 declared, they automatically treat it as windows-1252.
The validator is simply warning you that this will happen. If you're not using any of the 32 byte values, then you can simply ignore the warning - it's not an error. If you are, and you genuinely want the iso-8859-1 interpretation of the byte values and not the windows-1252 interpretation, you are doing something wrong.
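To find out whether a page actually uses any of those 32 byte values, scan its raw bytes. The helper below is a hypothetical sketch; read the file with open(path, "rb") and pass the bytes in. It only makes sense for files stored in a single-byte encoding: in a UTF-8 file, multi-byte sequences routinely contain bytes in this range.

```python
def c1_byte_offsets(data: bytes) -> list:
    """Offsets of bytes in 0x80-0x9F, where ISO-8859-1 and windows-1252 disagree."""
    return [offset for offset, value in enumerate(data) if 0x80 <= value <= 0x9F]


# A euro sign saved as windows-1252 lands in the contested range...
print(c1_byte_offsets("price: \u20ac5".encode("windows-1252")))  # [7]
# ...while accented Latin letters do not.
print(c1_byte_offsets("caf\u00e9".encode("iso-8859-1")))  # []
```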
Again, this switching happens in browsers for any DOCTYPE, it's just that the HTML5 validator is being more helpful about what it is telling you than the HTML4 validator is.
A couple of points:
Any HTML5 validation should be taken with a grain of salt. The specification is still under active development, and not everything is set in stone.
You're using the HTML4 syntax for that meta tag. Try <meta charset="iso-8859-1">
That said, HTML validators don't serve that much purpose in this day and age.
But apparently the default for HTML4 was iso-8859-1. HTML5, by contrast, recommends UTF-8.
More information about the HTML5 doctype can be found in this post by John Resig.
It throws a warning Line 4, Column 72: Using windows-1252 instead of the declared encoding iso-8859-1.
It means the file was saved with the encoding Windows-1252 on creation (AKA Western Windows 1252 or CP1252) and your charset declaration says "hey read this file with ISO 8859-1" when that's not the encoding the file has.
The meta charset exists for exactly that reason: it declares the encoding of the file you are sending, reading or using, so that when a browser (for example) reads the document, it knows which encoding the file uses.
In detail, you have this charset declared:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
But the file you are validating is actually encoded in Windows-1252. How? Why? Check the text editor you are using and what encoding it is using to save files. If the editor can be configured to change the encoding, choose the one you want to use.
About HTML5
Using
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
or
<meta charset="iso-8859-1">
are both valid for HTML5. See <meta charset="utf-8"> vs <meta http-equiv="Content-Type">
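Both forms carry the same information, which a small sketch built on Python's html.parser makes visible. The CharsetSniffer class and declared_charset helper are hypothetical and much simpler than the real HTML5 encoding-sniffing algorithm (for example, they ignore the 1024-byte prescan limit); they just report the first charset declared by either kind of meta element.

```python
import re
from html.parser import HTMLParser


class CharsetSniffer(HTMLParser):
    """Record the first charset declared by a <meta> element."""

    def __init__(self):
        super().__init__()
        self.charset = None

    # HTMLParser's default handle_startendtag calls handle_starttag,
    # so XHTML-style <meta ... /> is covered too.
    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if "charset" in attributes:  # <meta charset="...">
            self.charset = attributes["charset"].strip().lower()
        elif attributes.get("http-equiv", "").lower() == "content-type":
            match = re.search(r"charset\s*=\s*([^\s;]+)",
                              attributes.get("content", ""), re.IGNORECASE)
            if match:
                self.charset = match.group(1).lower()


def declared_charset(markup: str):
    sniffer = CharsetSniffer()
    sniffer.feed(markup)
    sniffer.close()
    return sniffer.charset


print(declared_charset('<meta charset="iso-8859-1">'))  # iso-8859-1
print(declared_charset('<meta http-equiv="Content-Type" '
                       'content="text/html; charset=iso-8859-1">'))  # iso-8859-1
```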
The W3C validator offers options for which encoding the validator uses. You have specified encoding in your document, so you should see "Encoding: iso-8859-1" in the top block of information once the validator has been run.
To the right of that, there is a pull-down menu. Change the choice from "(detect automatically)" to "iso-8859-1 (Western European)". The validator will then use ISO 8859-1 instead of its own choice, and you will not receive the error.
Use ISO 8859-15 instead.
Yes, -15, and the warning goes away.
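One caveat, sketched in Python: declaring ISO 8859-15 silences the warning because browsers do not remap that label to windows-1252, but it is only correct if the file's bytes really are ISO 8859-15. The euro sign, for instance, is a different byte in each encoding, so a file saved as windows-1252 would still be misread.

```python
euro = "\u20ac"
print(euro.encode("windows-1252"))  # b'\x80'
print(euro.encode("iso-8859-15"))   # b'\xa4'

# The byte a windows-1252 editor writes for the euro sign
# is an invisible C1 control character in ISO 8859-15.
print(repr(b"\x80".decode("iso-8859-15")))  # '\x80'
```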
Don't place too much stock in the validators. There are typically too many Internet Explorer workarounds, particularly in the CSS content, that will trip up the validator. If your pages work in all browsers and your client is happy, it doesn't matter what some validator says.
If you are using the HTML5 doctype, be consistent and use the HTML5 meta charset attribute too. Try this for your pages:
<!DOCTYPE HTML>
<html>
<head>
<meta charset="UTF-8">
<title>Page title</title>
</head>
<body>
</body>
</html>

Different Language In Website

I'm trying to write a website in Slovak (a central European language). What I have done is put these two meta tags into the header:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2" />
<meta http-equiv="Content-Language" content="sk" />
The problem is that all characters with diacritics are replaced with garbage characters (so the encoding obviously isn't working). What should I do?
Here is the whole beginning of the page:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sk" lang="sk">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta http-equiv="Content-Language" content="sk" />
When you save the file you have to make sure that it's created with the same encoding that you have specified in the meta tag.
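A minimal Python sketch of that advice, assuming a hypothetical page template modelled on the question's head: the same charset name is written into the meta tag and handed to the file encoder, so the declaration and the bytes on disk cannot drift apart. Passing "utf-8" instead of "iso-8859-2" keeps the two in step just the same.

```python
from pathlib import Path

# Hypothetical page template modelled on the question's XHTML head.
PAGE_TEMPLATE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sk" lang="sk">\n'
    "<head>\n"
    '<meta http-equiv="Content-Type" content="text/html; charset={charset}" />\n'
    "<title>Dobr\u00fd de\u0148</title>\n"
    "</head>\n<body><p>\u010eakujem</p></body>\n</html>\n"
)


def save_page(path: Path, charset: str) -> None:
    """Write the page so the bytes on disk match the declared charset."""
    page = PAGE_TEMPLATE.format(charset=charset)
    # The same name goes into the <meta> tag and into the file encoder.
    path.write_text(page, encoding=charset)
```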
I recommend that you use UTF-8 instead of iso-8859-2. Unicode covers the characters of practically every language that exists (and even some that don't...).
There are two issues at work here.
Language
Character encoding
Language
The Content-Language HTTP header describes the natural language of the intended audience. This may not be the same as the language the document is actually written in. Use the lang attribute to describe that.
Character encoding
This allows you to represent the letters that you wish to use. You need to make sure that your text really does use the selected encoding and that the browser is informed that that is the encoding you are using.
Select a character encoding (UTF-8 is generally the best choice, it covers just about every character you could possibly want to use and saves you having to switch encodings for different languages), see http://www.w3.org/International/tutorials/tutorial-char-enc/
Ensure your editor saves using that encoding
Ensure your server specifies that it is using that encoding
Ensure that nothing mangles the encoding between the editor and server (such as by being inserted into a database that is configured to use a different encoding)
HTTP headers
NB: Your question mentions <meta http-equiv>. Real HTTP headers are the better place to specify this information, and they will override whatever your document claims. Make sure your server is configured correctly.
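The precedence can be sketched in Python. This is a simplified, hypothetical helper modelled loosely on the browser encoding-sniffing order: a byte order mark wins, then the charset from the real HTTP header, then a meta declaration near the top of the document, then a fallback. The windows-1252 fallback is an assumption; real browsers choose a locale-dependent default.

```python
import codecs
import re
from typing import Optional

# Matches both <meta charset="..."> and the http-equiv content="...; charset=..." form.
_META_CHARSET = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)


def effective_encoding(body: bytes, header_charset: Optional[str]) -> str:
    # 1. A byte order mark overrides everything else.
    for bom, name in ((codecs.BOM_UTF8, "utf-8"),
                      (codecs.BOM_UTF16_LE, "utf-16le"),
                      (codecs.BOM_UTF16_BE, "utf-16be")):
        if body.startswith(bom):
            return name
    # 2. The charset parameter of the real HTTP Content-Type header.
    if header_charset:
        return header_charset.lower()
    # 3. A <meta> declaration in the first 1024 bytes.
    match = _META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Fallback (assumed here; browsers use a locale-dependent default).
    return "windows-1252"


page = b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2" />'
print(effective_encoding(page, "utf-8"))  # utf-8: the header wins over the meta tag
print(effective_encoding(page, None))     # iso-8859-2
```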
XHTML
XHTML complicates matters…
Use xml:lang in addition to lang
Don't use anything except UTF-8. If you do, then you must specify this in the XML declaration (and adding an XML declaration will trigger Quirks mode in some browsers).
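To illustrate the XML declaration point, here is a Python sketch using xml.etree.ElementTree as a stand-in for a browser's XML parser (an assumption; browsers parsing XHTML served as text/html behave differently). Without a declaration, an XML parser assumes UTF-8 and rejects the Latin-2 bytes; with one, it decodes them correctly.

```python
import xml.etree.ElementTree as ET

latin2_text = "Dobr\u00fd de\u0148".encode("iso-8859-2")

with_decl = b'<?xml version="1.0" encoding="iso-8859-2"?><p>' + latin2_text + b"</p>"
without_decl = b"<p>" + latin2_text + b"</p>"

print(ET.fromstring(with_decl).text)  # Dobrý deň

try:
    ET.fromstring(without_decl)  # parser assumes UTF-8; byte 0xFD is invalid
except ET.ParseError as err:
    print("parse error:", err)
```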