How should web sites deal with localization settings? (from “What are common UI misconceptions and annoyances?”) - language-agnostic

I’ve chosen to take this as a question in its own right since it was generating so much debate in the comments of the original post.
It’s interesting to see that a lot of people on SO (who are developers) just don't get localization. Here’s my take on how it should work:
In all browsers that I've looked at (and for the .NET developers out there too) when you look at a user's culture preferences it is in the following format:
language-Culture.
So we have:
en-GB - English language - UK culture
en-US - English language - US culture
en - English language - Invariant culture.
fr-FR - French language - French culture
fr-CH - French language - Swiss culture
de-CH - German language - Swiss culture
de-DE - German language - German culture
See MSDN for a complete list that the .NET framework supports.
When I go to a website it knows that I want the English language from the en part and it knows I’m interested in it being slanted to the UK (number formatting, date formatting). So when I go to google.com and it takes me to google.de (because of my IP address), that would be completely fine if google.de displayed everything to me in English, but it is completely wrong because google.de shows me German. I have little control over my IP address but complete control over my language and culture settings. If you’re interested, Microsoft’s new search engine (bing.com) handles things properly. Let's hope Microsoft can learn how to do search as well as Google or Google can learn to localize as well as Microsoft ;)
MSDN has another good article here for more information
So what are your recommendations for how sites should deal with localizations?

The solution here is so simple, it's annoying that devs do anything else.
1. Respect the browser setting. If it says English then by god it's English.
2. If you absolutely must, then simply add a button at the top to pick something else. Then, and ONLY then, do you override the browser.
3. If you think your way is better, stop and have someone slap you. It's not. Repeat as necessary.
4. Get rid of those web splash pages that ask for someone's country. Just show your normal page, based on the browser defaults, and see item 2 above. I have yet to run into a site where it actually matters. Update: a few years later, there is now a reason to do this. In 2013 the UK instituted policies surrounding cookies that website operators need to respect for sites based in that country that are serving pages to visitors from that country. So pay attention to the laws in the countries you are hosted in.
5. If you happen to have a site that really is served by multiple servers across multiple countries, then you can probably detect which one of your servers is closest to serve from. If you can't, just stop the redirecting madness and don't try to make a determination for the user.

If localization settings are available - including, but not limited to, the HTTP Accept-Language header - then websites absolutely should respect them.
The common argument against this is that "average users" aren't smart enough to find the language settings and configure them to match their own preferences, so these settings are, more often than not, incorrect (unless the user happens to be within the US).
That is the wrong solution.
If a substantial segment of the user population can't find (or can't be bothered to find) their browser's language settings, then the correct response is to make them easier to find, not for sites to ignore what they've been set to. Perhaps make language settings directly accessible from the program's top level menu instead of burying it inside an over-complicated "Preferences" dialog. Perhaps ask for language preferences the first time the program is run. Perhaps use the operating system's localization settings. Or maybe something completely different, if that's what it takes to make it near-certain that the browser will be sending correct information about the user's preferences. But don't just throw up your hands, say "it's useless and can't be fixed!", and ignore it.
Other answers have talked about letting the user choose a language or locale in their profile on the site, which is also important and absolutely should be standard, but that's just to provide a site-specific override to the user's normal settings. If the user has not overridden this on the site, though, the correct action is to default to the most-preferred available language/locale as specified in their browser settings, not to base it on geolocation of their IP address.

At one point in my career, I maintained parts of a TCP/IP stack. That puts me in the somewhat rare position of knowing very well that IP addresses should not be used as anything other than Network-layer addresses. Any association between an IP address and a location is all but coincidental - it's an artifact of the way addresses are distributed, not any fundamental part of what an IP address means.
(They're also not useful as the unique identifier of a computer, but that's a different story)
I suggest leaving geolocation out of it. The HTTP standard includes a way for a browser or other user agent to include the user's culture preferences with each request (and remember, it's a list of weighted preferences, not necessarily just one culture). Since the browser is closer to the user than you are, you should honor this request, at least as the default.
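To make the "list of weighted preferences" concrete, here is a minimal sketch of reading an Accept-Language value and picking a language the site offers. The function names and the sample language lists are made up for illustration, and the matching is deliberately simple (exact tag first, then the bare language); a production site would more likely rely on its framework's content negotiation.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language value, most preferred first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without an explicit q-value has weight 1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # assumption: treat a malformed weight as "not acceptable"
        if quality > 0:
            weighted.append((-quality, position, tag.strip()))
    # Highest weight first; ties keep the order the browser sent.
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(header, available, default):
    """Pick the first preferred tag the site supports, also trying the bare language."""
    supported = {tag.lower(): tag for tag in available}
    for tag in parse_accept_language(header):
        for candidate in (tag.lower(), tag.lower().split("-")[0]):
            if candidate in supported:
                return supported[candidate]
    return default
```

For example, a browser sending `fr-CH,de;q=0.7` to a site that offers only `en`, `de` and `fr` would get `fr` here, because the bare language of the most preferred tag is tried before moving on to the next entry.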
It's ok to then permit the user to change their preference for your site, either temporarily or permanently. It's even ok to allow the user to choose to view different content with different culture settings. A wild example would be a site that includes both political news and technical information. It's quite reasonable that someone would want the news in their "natural" language, but the technical information in English.
Finally, it's ok to have a fallback pattern. If, for instance, you have a site that services users based on their region (resellers, for instance), then it's possible that Japanese content only exists on your Asian regional sub-site. A Japanese-speaking user visiting your EMEA site might just be stuck seeing English content, which might very well be his last choice.

On the sites I create I usually follow this pattern:
Each page has a unique URL with the language in it somewhere, usually like /en/page or a different (sub)domain
If the user opens a URL with an unspecified language like /page I start to guess:
Is a cookie from a previous session available?
If not, is Accept-Language available and can I map it to a language available on the site?
If not, if it's a possibility, can I guess by IP?
If not, default to the site's default language.
I set a cookie with the guessed language and redirect the user to a site with the appropriate URL
I put a language switch on every page, so /en/page can easily be switched to /xx/page
The cookie gets updated if the user switches to a different language
Ideally I only have to guess once and from then on use the user's cookie, or the user visits the desired page directly.
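The guessing order above can be sketched roughly as follows. This is only an illustration of the pattern, not the poster's actual code: `AVAILABLE`, `DEFAULT_LANGUAGE` and all function names are hypothetical, and the IP-based guess is passed in as an already-computed value because how it is obtained is outside the scope of the sketch.

```python
AVAILABLE = ("en", "de", "fr")  # hypothetical set of languages the site offers
DEFAULT_LANGUAGE = "en"         # hypothetical site default


def from_accept_language(header):
    """Return the first supported primary language in the header, or None."""
    entries = []
    for position, item in enumerate((header or "").split(",")):
        tag, _, params = item.strip().partition(";")
        params = params.strip()
        quality = 1.0
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if tag and quality > 0:
            entries.append((-quality, position, tag.strip().split("-")[0].lower()))
    for _, _, language in sorted(entries):
        if language in AVAILABLE:
            return language
    return None


def guess_language(cookie_language, accept_language, ip_language):
    """Cookie first, then Accept-Language, then the IP guess, then the default."""
    candidates = (cookie_language, from_accept_language(accept_language), ip_language)
    for candidate in candidates:
        if candidate in AVAILABLE:
            return candidate
    return DEFAULT_LANGUAGE


def redirect_target(path, language):
    """Build the language-specific URL that an unspecific path redirects to."""
    return "/" + language + "/" + path.lstrip("/")
```

A request for `/page` with no cookie and `Accept-Language: fr-CH,en;q=0.5` would be redirected to `/fr/page` under these assumptions, and the guessed language would then be stored in the cookie.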

I agree, give the user the chance to override them with user preferences in your app. This is especially handy for things like timezone localization issues which you can't derive from browser settings.

I risk being considered impolite, but I think my post on this topic will have more informative answers, mostly because my post is really a question. I am sorry though that I did not find that post before.

There's a difference between smart defaults and the ability of users to override them. In big apps I've worked on, I've assumed the user's locale from browser settings, geolocation, etc. -- but always given users a way to easily switch.
I don't know how else one would do that. Not giving users a chance to correct your assumptions is deeply problematic, because you're going to get it wrong some of the time.
ADDITION:
I think your problem here is that while you can edit your locale settings, if they look basically identical to the default, there's no way for an application developer to tell if you left it as-is intentionally, or because you don't know how or why to change it.
I suggest honoring users' localization settings, except if the setting is the overwhelming default, which users may not change. For example, I believe the great majority (90+%) of users with an en-us setting geolocated in Vietnam would almost always be better served by seeing Vietnamese content, rather than US English content, as long as there's a trivial way to switch locales. On the flip side, if a user geolocated in the US has a Vietnamese setting, by all means give him or her Vietnamese content.
Is this irritating for US-English users in Vietnam? Sure. But it's also the greatest good for the greatest number, and helps ensure that average non-technical users get the best real-world experience. Until we can hold a gun to users' heads and force them to honestly declare their language/culture preferences before turning on a computer, we're going to need heuristics like this.

I have seen enough forceful bug reports from customers that, when investigated, turned out to be caused by one of their users having the browser's culture setting wrong, that we now let the customer override the browser's setting with a config setting. The browser's culture setting is wrong often enough that it is not very useful, and it is also too hard for most end users to find or change.

Are google's search results influenced by our data?

I have always wondered that.
For example, If I search for the term "composer" or "what is composer", it shows the php package manager. Why does it show programmer-related results? Obviously, it makes sense that it does that, since the results I get are much more relevant to me.
What if an aspiring composer googles that? What results will they get?
Another example is, if I enter the word "spring" to the search engine, it shows the spring framework, instead of, let's say, the season.
So, my question(s):
Does google actually use the data it collects to show relevant search results? (I am not talking about ads, but search results)
If yes, why doesn't incognito mode work?
How can I avoid google using other parameters, besides the very term I typed in, to affect the search results?
Yes. This is the very core of Google's business model. The same data that influences search results is also applied to ad placement (see their real-time bidding system); when you do searches, it's likely you will see ads about the same subjects fairly soon afterwards.
Incognito mode is a very limited form of anonymisation; it's really not very anonymous at all. If you visit a page in a browser that has some google-controlled element (e.g. Google Analytics, a CDN JS library, or a font), then shortly afterwards perform a google search, there will be very many points in common that allow google to match you as very likely the same person (e.g. your IP, time of day, recent similar requests, user agent string, window size, fonts available) even if the browser blocks cookies that would identify you explicitly. This form of fingerprinting is quite hard to avoid, though Safari is a lot better at resisting it than Chrome. Tor provides much more robust anonymisation by normalising many fingerprintable elements, as well as hiding your IP.
That's difficult because making use of all this information will indeed lead to generally more relevant search results, so it's in Google's interests to use whatever it can (within technical and mostly legal limits). Tor will disconnect the search results from you, but it may instead provide you with results linked to whoever else might have been using the same Tor exit node as you recently, which might not be pleasant! The same would apply to using VPN services.

Best practice for email links that will set a DB flag?

Our business wants to email our customers a survey after they work with support. For internal reasons, we want to ask them the first question in the body of the email. We'd like to have a link for each answer. The link will go to a web service, which will store the answer, then present the rest of the survey.
So far so good.
The challenge I'm running into: making a server-side change based on an HTTP GET is bad practice, but you can't do a POST from a link. Options seem to be:
Use an HTTP GET instead, even though that's not correct and could cause problems (https://twitter.com/rombulow/status/990684453734203392)
Embed an HTML form in the email and style some buttons to look like links (likely not compatible with a number of email platforms)
Don't include the first question in the email (not possible for business reasons)
Use HTTP GET, but have some sort of mechanism which prevents a link from altering the server state more than once
Does anybody have any better recommendations? Googling hasn't turned up much about this specific situation.
One thing to keep in mind is that HTTP is specifying semantics, not implementation. If you want to change the state of your server on receipt of a GET request, you can. See RFC 7231
This definition of safe methods does not prevent an implementation from including behavior that is potentially harmful, that is not entirely read-only, or that causes side effects while invoking a safe method. What is important, however, is that the client did not request that additional behavior and cannot be held accountable for it. For example, most servers append request information to access log files at the completion of every response, regardless of the method, and that is considered safe even though the log storage might become full and crash the server. Likewise, a safe request initiated by selecting an advertisement on the Web will often have the side effect of charging an advertising account.
Domain-agnostic clients are going to assume that GET is safe, which means your survey results could get distorted by web spiders crawling the links, browsers pre-loading resources to reduce the perceived latency, and so on.
Another possibility that works in some cases is to treat the path through the graph as the resource. Each answer link acts like a breadcrumb trail, encoding into itself the history of the client's answers. So a client that answered A and B to the first two questions is looking at /survey/questions/questionThree?AB where the user that answered C to both is looking at /survey/questions/questionThree?CC. In other words, you aren't changing the state of the server, you are just guiding the client through a pre-generated survey graph.
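As a rough sketch of that breadcrumb idea, the links for each question can be generated purely from the answers given so far, so following one of them changes nothing on the server. The question names follow the URL shape used above; `questionOne`, `questionTwo`, the `/survey/complete` endpoint and the function names are assumptions made up for this example.

```python
QUESTIONS = ("questionOne", "questionTwo", "questionThree")  # hypothetical survey


def question_url(history):
    """URL a client sees after giving the answers in `history` (e.g. "AB")."""
    if len(history) >= len(QUESTIONS):
        # Hypothetical final step: only here would the answers be submitted.
        return "/survey/complete?" + history
    url = "/survey/questions/" + QUESTIONS[len(history)]
    return url + "?" + history if history else url


def answer_links(history, choices="ABC"):
    """One link per possible answer to the current question."""
    return {choice: question_url(history + choice) for choice in choices}
```

The links in the email would then be `answer_links("")`, and each page in the survey renders `answer_links(history)` for the history encoded in its own URL.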

browser request header "Accept-Language" does not send country

I am implementing i18n in my webapp and am in the testing phase at the moment. I am using java.util.Locale on the server side to pass the locale to the relevant APIs (date time etc) that consume the information. Here is my setup:
browser language has been set to "Hindi"
operating System country has been set to "India"
I send a request to the server expecting the "Accept-Language" header to be hi-IN but the value remains hi regardless of the country setting on my OS ... actual value Accept-Language: hi,en-US;q=0.8,en;q=0.6
my webapp uses the incoming value in the request header and does i18n or l10n accordingly by loading the appropriate language translation from resource files
I have a test case where I manually send in new Locale("hi", "IN") to indicate language and country. This test case prints values in the correct language as I expect but since the value coming in from the request is only hi, I am unable to see the desired result.
Not sure why the browsers (Chrome and Firefox) do not support the language_country format for all entries in their selection. Is there anything I am missing?
Edit: I made a few corrections based on the answer by #pawel-dyda. To quote a part of his response:
Your language tag should be hi-IN, which I believe should explain the odd behaviour.
The crux of the issue (the reason I am raising this question here) is that I am unable to get my browser to send the value hi-IN to the server in the Accept-Language header.
I think you're missing a few things.
Regarding the second point, setting the operating system country usually doesn't affect what the web browser sends in its Accept-Language list. I say "usually" because I can give you a counterexample: Safari on Mac OS X.
There is a slight chance that it has some effect on mobile web browsers, but I haven't performed any tests myself.
In regards to points 3 and 5... Well, you gave an example of an Accept-Language list. Please take a closer look at it: it contains en-US, that is English (US). Your language tag should be hi-IN, which I believe should explain the odd behaviour.
I am not sure what you meant in point 4. Not knowing the implementation details, I can only guess that you're trying to load resource files (and judging by the locale format it would be Java properties...) as well as have some defaults for things like formatting.
For properties files usually (not always!) language alone is enough. But there is a problem with formatting.
Well, most of the time you will receive merely the language and you have no choice but to accept this fact. There are two ways to mitigate this issue:
You can implement a user profile and let the user choose his/her preferred UI language and formatting settings (it is best practice to keep those separated).
You can "guess" the most likely country. In the case of Hindi, it's quite obvious what the result of guessing will be. It is a bit more complicated in the case of, for example, German, which is used in Germany ("default"), Austria and Switzerland. There are obviously many more cases; if you want help with "guessing", CLDR is the best source of information.
The best approach is to actually implement locale settings in the user profile, but use smart guessing based on data taken from CLDR; basically you combine points 1 & 2.
And don't forget about fallback! That is locale fallback (going through the list in Accept-Language header until you find something that your application supports) and resource fallback (should you have messages_fr.properties, but no messages_fr_ca.properties, but the request came as fr-CA, it makes sense to return French translations from the prior file).
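The two kinds of fallback can be sketched together like this. It only mimics the general idea and is not Java's actual `ResourceBundle` lookup; the function names are hypothetical, the file names use the conventional upper-case region (`messages_fr_CA.properties`), and only plain `language` or `language-REGION` tags are handled (no script subtags).

```python
def resource_candidates(language_tag, base="messages"):
    """File names to try for one tag, most specific first."""
    parts = language_tag.replace("-", "_").split("_")
    language = parts[0].lower()
    names = []
    if len(parts) > 1 and parts[1]:
        names.append(f"{base}_{language}_{parts[1].upper()}.properties")
    names.append(f"{base}_{language}.properties")
    names.append(f"{base}.properties")
    return names


def pick_resource(preferred_tags, existing_files, base="messages"):
    """Locale fallback over the tag list, resource fallback within each tag."""
    for tag in preferred_tags:
        # Leave the untranslated base file until every tag has been tried.
        for name in resource_candidates(tag, base)[:-1]:
            if name in existing_files:
                return name
    return f"{base}.properties"
```

With only `messages_fr.properties` and `messages.properties` on disk, a request for `fr-CA` ends up with the French file, which is the behaviour described above.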
By the way: you can open the Firefox about:config page. It has a setting named intl.accept_languages. I bet that if you change its contents, you'll be able to send what you want. However, as I said, it is useless, because users won't change their settings...

Is it possible to let the client choose the right translation of a page without scripting?

I have written a website for a local Go meeting in Berlin. It is translated into German, English and Chinese. Currently, I use the naming scheme index.<lang>.html for the three translations and a navigation bar on top to let the user choose.
Is it possible to use meta tags on the index.html (which currently is just a symlink) to let the user agent automagically redirect to the site with the right language if possible? I am interested in solutions that neither involve reconfiguring the server nor need java script to be enabled although the first one might be possible.
You can use HTTP content negotiation to select a version that best matches the language preference information that the browser sends. So it is possible without scripting, but you need to set things up in the server for the negotiation.
However, this is not very practical, because the language preference information cannot be relied on. It is mostly based on browser defaults, since few users even know about the relevant settings in the browser, and still fewer set them appropriately.
Is it possible to use meta tags on the index.html (which currently is just a symlink) to let the user agent automagically redirect to the site with the right language if possible?
No.
If you want automatic selection, then you need to pay attention to the Accept-Language header in the request. That needs server configuration or scripting.
Without it, the best you can have is links to the translations of the document which the user can select manually.

Should I default my website to www.foo or not?

Notice how the default domain for stackoverflow is http://stackoverflow.com and if you try to go to http://www.stackoverflow.com it bounces you to http://stackoverflow.com ?
What is the reason for this? Not the tech reason (as in the http code, etc) but why would the site owners want to do this?
I know it's purely aesthetic and I always have host-headers for both www and not, but is there a reason to bounce a user to a single domain, subheaded or not?
Update 1
Not having a subdomain is called a bare domain. Thanks peeps! never knew it had a term :)
Update 2
Thanks for the answers so far - please note I understand that www.domain.com can point to domain.com. This is not a question about whether I should offer both or either/or, it's asking why some sites default to a bare domain instead of a www subdomain, or vice versa. Cheers.
Jeff Atwood actually HAS explained why he's gone for bare domains here and here. (Nod to Jonas Pegerfalk for the post :) )
Jeff's post (and others in this thread) also talk about the problems of a bare domain with cookies and static images. Basically, if you set cookies on a bare domain, then they are sent to all subdomains as well. The solution is to purchase another domain, as posted by the Yahoo Perf Team here.
Jeff Atwood has written a great article about The Great Dub-Dub-Dub Debate. There is also a blog entry in the Stack Overflow blog on why and how Stack Overflow has dropped the www prefix.
As far as I can tell, it doesn't really matter, but you should pick one or the other as the default, and forward to that.
The reason is that, depending on the browser implementation, www.example.com cookies are not always accessible to example.com (or is it the other way around?)
for more discussion on this, see:
in favor of www
http://faq.nearlyfreespeech.net/section/domainnameservice/baredomain#baredomain - This webhost lists several good reasons for anyone considering doing more than simple webhosting to consider (such as load balancing, subdomains with different content, etc.)
http://yes-www.org - This blog post from 2005 mainly proposed that most internet users needed the www prefix in order to recognize a URL. This is less important now that browsers have built-in searching. Most computer illiterates I know bypass the URL bar entirely.
in opposition to www
http://no-www.org/
and a miscellaneous related rant about why www should not be used as a CNAME, but only as an A RECORD.
http://member.dnsstuff.com/rc/index.php?option=com_myblog&task=view&id=62&Itemid=37
It is worth noting that you can't have CNAME and an NS record on the same (bare domain) name in DNS. So, if you use a CDN and need to set up a CNAME record for your web server, you can't do it if you are using a bare domain. You must use "www" or some other prefix.
Having said that, I prefer the look of URLs without the "www." prefix so I use a bare domain for all my sites. (I don't need a CDN.)
When I am mentioning URLs for the general public (eg. on a business card), I feel that one has to use either the www. prefix or the http:// prefix. Otherwise, just a bare domain name doesn't tell people they can necessarily type it into their browser. So, since I consider http:// an ugly wart on a business card, I do use the www. prefix there.
What a mess.
In some cases, www might indeed point to a completely separate subdomain in a large corporate environment. Especially on an internal network, having the explicit www can make split DNS easier if the Web site is hosted externally (say, at Rackspace in Texas, but everything else is in your office in Virginia.) In most cases, it doesn't matter.
The important thing is to pick one and add an IHttpModule, rewrite rule, or equivalent for your platform to permanently redirect requests from one to the other.
Having both can lead to scary certificate warnings when switching from http to https if you don't have a wildcard certificate and forget to explicitly redirect based on your site's name (which you probably don't because you want your code to work in both dev and production, so you're using some variable populated by the server).
Much more importantly, having both accepted results in search engines seeing duplicated content--you get dinged for having duplicated content, and you get dinged because your hits are getting split across two different URIs, hurting your rankings.
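Whatever the platform, the permanent redirect mentioned above boils down to rewriting the host and leaving the rest of the URL alone. A minimal sketch of that rewrite follows, assuming the bare domain is the canonical one; `CANONICAL_HOST` and `canonical_redirect` are hypothetical names, and a real site would normally do this in the web server's rewrite rules and answer with a 301 status.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "example.com"  # hypothetical canonical (bare) domain


def canonical_redirect(url):
    """Return the URL to redirect to permanently, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host != "www." + CANONICAL_HOST:
        return None  # already canonical, or not one of our hosts
    netloc = CANONICAL_HOST
    if parts.port:
        netloc += f":{parts.port}"  # keep a non-default port, e.g. on a dev box
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Keeping the scheme, path and query unchanged means deep links keep working, and search engines see a single URL per page.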
Actually, you can use both of them, so users can find your address either way. I mean, it doesn't really matter, though :)
But putting www as a prefix is more common in public, so I guess I'd prefer to use www in front of it.
It's easier to type google.com than www.google.com, so give the option of both. Remember, the www is just a subdomain.
Also, no www is commonplace these days, so maybe make www.foo.com redirect to foo.com.
I think one reason is to help with search rankings so that for each page only one page is getting rankings instead of being split between two domains.
I'm not sure why the Stack Overflow team decided to use only one, but if it were me, I'd do it for simplicity. You'd have to allow for both since a lot of people type www by default or out of habit (I'm sure less "techy" people have no idea that there's a difference).
Aside from that, there used to be a difference as far as search engines were concerned and so there was concern about having either a duplicate content penalty or having link reputation split. But this has long since been handled and so isn't much of a consideration at this point.
So I'd say it's pretty much personal preference to keep things simple.