Is there any new rule regarding the lang attribute in an HTML document?

I see that many big sites do not use <html lang="en-US"> or anything similar. There is also no lang in a <meta> tag. I checked a few websites: hub.tutsplus.com and ReadWrite.com do not use any, while a few others do.
So, I am wondering:
Is there any effect of the lang attribute on SEO?
Are there any benefits of using the lang attribute?
I read that screen readers and search engines can use it. Anything else? What is the impact of a missing lang attribute?
Which is better: <html lang="en-US"> or <html lang="en">?
Update:
I checked, and ReadWrite uses the server header "Content-Language". So, which is better?
Using the server side or using the lang attribute? From what I have read, lang takes precedence over meta.
I am Indian and I just write English (I mean, I cannot say whether I write US or UK English). In this case, would it be better to use only lang="en" instead of lang="en-US"?
Does either have any impact on performance?

Is there any effect of the lang attribute on SEO?
At least the W3C says that it is "assisting search engines" in its documentation of the lang attribute in HTML 4.01. That information is most likely still valid.
Which is better: <html lang="en-US"> or <html lang="en">?
Both are valid according to BCP 47. It is actually best practice to avoid additional subtags (such as a region) unless they really provide useful information for applications that read or display the content. For example, the regional variants of certain languages may call for different glyphs when they are displayed. In that case it would be helpful to tell the application which regional variant it is dealing with.
I am Indian and I just write English (I mean, I cannot say whether I write US or UK English). In this case, would it be better to use only lang="en" instead of lang="en-US"?
In your case it is certainly better to use lang="en".
I checked, and ReadWrite uses the server header "Content-Language". So, which is better?
Using lang="en" as an attribute of <html> is recommended if the content really is in English. If you are not so sure about the language of the site's content and merely want to specify the language of your intended audience, you might rather use a declaration in the HTTP header.
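To make the recommendation concrete, here is a minimal sketch of a page that declares its content language on the root element. The title and paragraph text are made up for illustration. The Content-Language header is not shown, because it is sent by the server rather than written in the markup.

```html
<!DOCTYPE html>
<!-- lang="en" states the language the text is written in; no region subtag is used. -->
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example page</title>
</head>
<body>
  <p>This text is written in English.</p>
</body>
</html>
```

If the server also sends a header such as Content-Language: en, that describes the intended audience, while the lang attribute describes the text itself.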
By the way, the W3C has an interesting article on Internationalization Best Practices, on which my answer is based.

Related

HTML5 html-tag and DOCTYPE

From what I've read, the correct way to start an HTML5 page is:
<!DOCTYPE html>
<html>
With nothing more in those lines. Is this true? (I'm asking because Visual Studio has more than that.)
(Also, I'm wondering if HTML5 is really the current standard or should I be using XHTML5 or some other version.)
According to the HTML Living Standard and the W3C spec, the DOCTYPE is a required preamble, but it is required only for legacy reasons. I quote:
A string that is an ASCII case-insensitive match for the string
"<!DOCTYPE".
One or more space characters.
A string that is an ASCII case-insensitive match for the string
"html".
Optionally, a DOCTYPE legacy string or an obsolete permitted DOCTYPE
string (defined below).
Zero or more space characters.
A U+003E GREATER-THAN SIGN character (>).
In other words, <!DOCTYPE html>, case-insensitively.
And <html></html> for a valid document
(Also, I'm wondering if HTML5 is really the current standard or should
I be using XHTML5 or some other version.)
IMHO it is not the current standard, because it is not finished yet. But this article explains 10 reasons for using it now very well.
Mostly yes. But the HTML5 spec for the <html> element says
Authors are encouraged to specify a lang attribute on the root html
element, giving the document's language. This aids speech synthesis
tools to determine what pronunciations to use, translation tools to
determine what rules to use, and so forth.
so better, for a page whose content is in American English, would be
<!DOCTYPE html>
<html lang="en-us">
Also if you are using XHTML5 served as application/xhtml+xml you will need to add the namespace, and also the XML equivalent of the lang attribute making it:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-us" xml:lang="en-us">
Yes it's true. No more complicated doctypes in HTML5. The new standard is simplified and there's only the one you said.
According to the HTML5 drafts, “A DOCTYPE is a required preamble”. The preamble <!DOCTYPE html> is recommended, but legacy doctypes are allowed as an alternative, though they “should not be used unless the document is generated from a system that cannot output the shorter string”. The only part that is required in addition to it is the title element, and even it may be omitted under certain conditions. The <html> tag is not required.
HTML5 is not a standard. It is not even a W3C recommendation (yet). What you should use depends on what you are doing. It does not really matter which version of HTML you think you are using. What matters is the markup you have and how browsers (and search engines etc.) process it.
Yes, that's correct as far as I know.

What is the difference between <html lang="en"> and <html lang="en-US">?

What is the difference between <html lang="en"> and <html lang="en-US">? What other values can follow the dash?
According to w3.org "Any two-letter subcode is understood to be a [ISO3166] country code." so does that mean any value listed under the alpha-2 code is an accepted value?
<html lang="en">
<html lang="en-US">
The first lang tag only specifies a language code. The second specifies a language code, followed by a country code.
What other values can follow the dash? According to w3.org "Any
two-letter subcode is understood to be a [ISO3166] country code." so
does that mean any value listed under the alpha-2 code is an accepted
value?
Yes, however the value may or may not have any real meaning.
<html lang="en-US"> essentially means "this page is in the US style of English." In a similar way, <html lang="en-GB"> would mean "this page is in the United Kingdom style of English."
If you really wanted to specify an unusual combination, you could. It wouldn't mean much, but <html lang="en-ES"> is valid according to the specification, as I understand it. However, that language/country combination won't do much, since English isn't commonly spoken in Spain.
I mean does this somehow further help the browser to display the page?
It doesn't help the browser to display the page, but it is useful for search engines, screen readers, and other things that might read and try to interpret the page, besides human beings.
This should help:
http://www.w3.org/International/articles/language-tags/
The golden rule when creating language tags is to keep the tag as short as possible. Avoid region, script or other subtags except where they add useful distinguishing information. For instance, use ja for Japanese and not ja-JP, unless there is a particular reason that you need to say that this is Japanese as spoken in Japan, rather than elsewhere.
The list below shows the various types of subtag that are available. We will work our way through these and how they are used in the sections that follow.
language-extlang-script-region-variant-extension-privateuse
You can use any country code, yes, but that doesn't mean a browser or other software will recognize it or do anything differently because of it. For example, a screen reader might deal with "en-US" and "en-GB" the same if they only support an American accent in English. Another piece of software that has two distinct voices, though, could adjust according to the country code.
RFC 3066 gives the details of the allowed values (emphasis and links added):
All 2-letter subtags are interpreted as ISO 3166 alpha-2 country codes
from [ISO 3166], or subsequently assigned by the ISO 3166 maintenance
agency or governing standardization bodies, denoting the area to which
this language variant relates.
I interpret that as meaning any valid (according to ISO 3166) 2-letter code is valid as a subtag. The RFC goes on to state:
Tags with second subtags of 3 to 8 letters may be registered with
IANA, according to the rules in chapter 5 of this document.
By the way, that looks like a typo, since chapter 3 seems to relate to the registration process, not chapter 5.
A quick search for the IANA registry reveals a very long list of all the available language subtags. Here's one example from the list (which would be used as en-scouse):
Type: variant
Subtag: scouse
Description: Scouse
Added: 2006-09-18
Prefix: en
Comments: English Liverpudlian dialect known as 'Scouse'
There are all sorts of subtags available; a quick scroll has already revealed fr-1694acad (17th century French).
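For illustration, here is a small sketch of how the tags mentioned in this thread would appear in markup. The sample sentences are placeholders I made up, and nothing here implies that a given browser or screen reader treats these tags differently.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Language tag examples</title>
</head>
<body>
  <!-- Language subtag only: the short form, when nothing more needs to be said. -->
  <p lang="ja">こんにちは</p>

  <!-- Language subtag plus an ISO 3166 region subtag. -->
  <p lang="en-US">What color is the truck?</p>
  <p lang="en-GB">What colour is the lorry?</p>

  <!-- Language subtag plus a variant subtag from the IANA registry. -->
  <p lang="en-scouse">Placeholder for text in the Scouse dialect.</p>
  <p lang="fr-1694acad">Placeholder for text in 17th century French.</p>
</body>
</html>
```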
The usefulness of some of these (I would say the vast majority of these) tags, when it comes to documents designed for display in the browser, is limited. The W3C Internationalization specification simply states:
Browsers and other applications can use information about the language
of content to deliver to users the most appropriate information, or to
present information to users in the most appropriate way. The more
content is tagged and tagged correctly, the more useful and pervasive
such applications will become.
I'm struggling to find detailed information on how browsers behave when encountering different language tags, but they are most likely going to offer some benefit to those users who use a screen reader, which can use the tag to determine the language/dialect/accent in which to present the content.
XML Schema requires that the xml namespace be declared and imported before using xml:lang (and other xml namespace values)
RELAX NG predeclares the xml namespace, as in XML, so no additional declaration is needed.
Well, the first question is easy. There are many kinds of English (en) but (mostly) only one US English. One would guess there are en-CN, en-GB and en-AU. There might even be an Austrian English (en-AT), but that's more "yes, you can" than "yes, there is".

Appropriate language tag for HTML page for language learners?

Suppose I'm writing a document about a certain language (Spanish, say) but it's written in English. The sort of thing you'd see in a "Teach Yourself Spanish" book.
I could see tagging the page as a whole with either:
<html lang=en>...</html>
or:
<html lang=es>...</html>
Is there a best practice for such material?
The primary content in your example is in English and is targeted at English speaking visitors, so I would do
<html lang=en>...</html>
The lang attribute is typically there to inform clients (browsers, search engines) of the language of the primary content. Search engines will use this to help target their results better.
The recommended way (see WCAG 2.0 Guideline 3.1 and associated technology documents) is to specify the default language of a document, here using <html lang=en> or <html lang=en-US>, and to specify any differing language of any part using lang attributes in other elements, e.g. <span lang=es>¡Comprendo!</span>.
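As a sketch of that approach for a language-learning page, the document default is English and each Spanish phrase carries its own lang attribute. The lesson title and sentences are invented for the example.

```html
<!DOCTYPE html>
<!-- Default language of the document: English. -->
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Teach Yourself Spanish: Lesson 1</title>
</head>
<body>
  <!-- Each Spanish phrase overrides the default for just its own span. -->
  <p>To say "I understand", use <span lang="es">¡Comprendo!</span></p>
  <p>A common morning greeting is <span lang="es">Buenos días</span>.</p>
</body>
</html>
```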
This is usually not very realistic in a case like this. You don’t want to add such markup by hand, and few authoring programs have handy tools for it. But if you e.g. write the text in MS Word, specify the language of parts there, using the tools of Word, to make spelling checks work well, then you could export the content into HTML format and edit it a lot, removing most of the markup generated by Word but preserving the lang attributes.
If you wish to style Spanish text differently from English text, then lang markup is useful and you might use it even if you need to type it by hand. But using class attributes would be slightly more cross-browser (some old browsers do not support language selectors in CSS).
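A minimal sketch of both styling options, assuming you want Spanish text in italics. The class name "spanish" is hypothetical, and in practice you would pick one rule rather than both.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Styling by language</title>
  <style>
    /* Language selector: matches elements whose language is Spanish,
       whether set directly or inherited from an ancestor's lang attribute. */
    :lang(es) { font-style: italic; }

    /* Class-based alternative for old browsers without :lang() support. */
    .spanish { font-style: italic; }
  </style>
</head>
<body>
  <p>The word for "house" is <span lang="es" class="spanish">casa</span>.</p>
</body>
</html>
```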
Other than that, there is not much you can gain by using lang attributes. It is questionable whether you should use it at all if you would only use it on the <html> element, knowing that a very large part of the content is not in the specified language.
The W3C recommends something along the lines of:
<p title="Swedish"><a xml:lang="sv" lang="sv" href="index.sv.html">svenska</a></p>
as per http://www.w3.org/TR/i18n-html-tech-lang/#ri20050128.175100333

Consequences of not using a lang attribute in an html5 html tag

If at the start of my html5 doc I simply use:
<!DOCTYPE html><html>
how will my site be at a disadvantage for not using something like:
<!DOCTYPE html><html lang="en">
Good question, but I think most of what you want to know is simple reference:
http://www.w3.org/TR/html4/struct/dirlang.html
Some highlights:
Language information specified via the lang attribute may be used by a user agent to control rendering in a variety of ways. Some situations where author-supplied language information may be helpful include:
Assisting search engines
Assisting speech synthesizers
Helping a user agent select glyph variants for high quality typography
Helping a user agent choose a set of quotation marks
Helping a user agent make decisions about hyphenation, ligatures, and spacing
Assisting spell checkers and grammar checkers
I don't believe there are any important differences between HTML4 and HTML5 regarding this.
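As a rough illustration of two items from that list (hyphenation and quotation marks), here is a small sketch. The text and the column width are made up, and whether a browser actually hyphenates or picks language-specific quotation marks depends on its own support and dictionaries, so treat this as something to try out rather than guaranteed behavior.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Rendering hints from lang</title>
  <style>
    /* Automatic hyphenation relies on the element's language being known. */
    p { width: 12em; hyphens: auto; }
  </style>
</head>
<body>
  <!-- The quotation marks generated for <q> may follow the declared language. -->
  <p lang="en">She said <q>internationalization matters</q> and left.</p>
  <p lang="fr">Elle a dit <q>l'internationalisation est importante</q>.</p>
</body>
</html>
```

With the lang attribute removed from the root element and the paragraphs, a browser has no declared language to base these decisions on.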

SEO Language information

I was wondering if defining your language in HTML is better for search engines. For example, I've got a French site, and I've got three options:
1.) have faith that google can say my site is french
2.) define language in the HTML tag
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="fr">
3.) define language in a meta tag
<meta http-equiv="content-language" content="FR-fr" />
Which option do you believe is best? Or which combination of options?
Language detection (language, e.g. French, is not the same as location or market, e.g. France) is done on a per-page basis, not per site.
Google detects the language via the words used on the page (and in the URL); it does not care about the HTML tag (2) or the meta tag (3).
(You can test which language Google thinks your page is using via the language detect API: http://code.google.com/apis/language/translate/v1/using_rest_langdetect.html )
I always go for the one page, one language approach: I make sure that I have only one language per page (translating all of the navigation, and making sure that content in other languages does not and cannot show up on the page).
It's probably better to define anything you can and have the search engine disregard it, than to not and leave it completely up to chance. The more information you give them the better.
Usually, I stick with the 3rd one, the content-language meta tag, as described here.
This should probably be on webmasters.stackexchange.com, but anyway:
You should combine 2 and 3. Faith does not work with bots/spiders.
2.) define language in the HTML tag
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="fr">
3.) define language in a meta tag
<meta http-equiv="content-language" content="FR-fr" />
good luck
On top of defining your language in the HTML, you can sign up for Google Webmaster Tools. On that site you can set your geographic target to France (Site configuration -> Settings).
http://www.google.com/webmasters/tools
andrewk is right: use 2 or 3, but take into account that in most cases the language will be detected automatically by the robot itself.