Should I default my website to www.foo or not? - subdomain

Notice how the default domain for Stack Overflow is http://stackoverflow.com, and if you try to go to http://www.stackoverflow.com it bounces you to http://stackoverflow.com?
What is the reason for this? Not the technical reason (the HTTP status code, etc.) but why would the site owners want to do this?
I know it's purely aesthetic, and I always set up host headers for both the www and bare versions, but is there a reason to bounce users to a single canonical domain, with or without the subdomain?
Update 1
Not having a subdomain is called a bare domain. Thanks, peeps! Never knew it had a term :)
Update 2
Thanks for the answers so far - please note I understand that www.domain.com can point to domain.com. This is not a question about whether I should offer both or either/or; it's asking why some sites default to a bare domain instead of the www subdomain, or vice versa. Cheers.
Jeff Atwood actually HAS explained why he's gone for bare domains here and here. (Nod to Jonas Pegerfalk for the post :) )
Jeff's post (and others in this thread) also talks about the problems a bare domain causes with cookies and static images. Basically, cookies set on a bare domain are sent to all of its subdomains as well, so requests for static assets carry needless cookie overhead. The solution is to purchase a separate, cookie-free domain for static content, as posted by the Yahoo Performance Team here.

Jeff Atwood has written a great article, The Great Dub-Dub-Dub Debate. There is also an entry on the Stack Overflow blog about why and how Stack Overflow dropped the www prefix.

As far as I can tell, it doesn't really matter which, but you should pick one or the other as the default and forward to it.
The reason is cookie scoping: cookies set for www.example.com are not accessible to example.com, while cookies set for the bare example.com are sent to every subdomain, including www.
For more discussion on this, see:
in favor of www
http://faq.nearlyfreespeech.net/section/domainnameservice/baredomain#baredomain - This webhost lists several good reasons for anyone considering doing more than simple webhosting to consider (such as load balancing, subdomains with different content, etc.)
http://yes-www.org - This blog post from 2005 mainly proposed that most internet users needed the www prefix in order to recognize a URL. This is less important now that browsers have built-in searching. Most computer illiterates I know bypass the URL bar entirely.
in opposition to www
http://no-www.org/
and a miscellaneous related rant about why www should not be used as a CNAME, but only as an A record.
http://member.dnsstuff.com/rc/index.php?option=com_myblog&task=view&id=62&Itemid=37

It is worth noting that you can't have a CNAME and the NS records that must exist at a bare domain on the same name in DNS. So, if you use a CDN and need to set up a CNAME record for your web server, you can't do it if you are using a bare domain. You must use "www" or some other prefix.
Having said that, I prefer the look of URLs without the "www." prefix so I use a bare domain for all my sites. (I don't need a CDN.)
When I am mentioning URLs for the general public (eg. on a business card), I feel that one has to use either the www. prefix or the http:// prefix. Otherwise, just a bare domain name doesn't tell people they can necessarily type it into their browser. So, since I consider http:// an ugly wart on a business card, I do use the www. prefix there.
What a mess.

In some cases, www might indeed point to a completely separate subdomain in a large corporate environment. Especially on an internal network, having the explicit www can make split DNS easier if the Web site is hosted externally (say, at Rackspace in Texas, but everything else is in your office in Virginia.) In most cases, it doesn't matter.
The important thing is to pick one and add an IHttpModule, rewrite rule, or equivalent for your platform to permanently redirect requests from one to the other.
Having both can lead to scary certificate warnings when switching from http to https if you don't have a wildcard certificate and forget to redirect explicitly based on your site's name (which you probably don't, because you want your code to work in both dev and production, so you're using some variable populated by the server).
Much more importantly, having both accepted means search engines see duplicate content: you get dinged for the duplication, and your inbound links get split across two different URIs, hurting your rankings.
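To make the "pick one and redirect" advice concrete, here's a minimal .htaccess sketch, assuming Apache with mod_rewrite and example.com as the chosen canonical host (swap in your own domain):
RewriteEngine On
# Permanently (301) redirect any www request to the bare domain,
# preserving the requested path; the query string is kept automatically.
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
The 301 status is the important part: it tells search engines to consolidate both hosts onto one, which addresses the duplicate-content problem above. On IIS the equivalent can be done in an IHttpModule or with the URL Rewrite module.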

Actually, you can use both of them, so users can find your address either way. I mean, it doesn't really matter, though :)
But the www prefix is more common with the general public, so I guess I'd prefer to default to www.

It's easier to type google.com than www.google.com, so accept both. Remember, www is just a subdomain.
Also, going without www is commonplace these days, so maybe make www.foo.com redirect to foo.com.

I think one reason is to help with search rankings: that way each page accumulates its ranking at a single URL instead of having it split between two domains.

I'm not sure why the Stack Overflow team decided to use only one, but if it were me, I'd do it for simplicity. You'd have to allow for both, since a lot of people type www by default or out of habit (I'm sure less "techy" people have no idea that there's a difference).
Aside from that, there used to be a difference as far as search engines were concerned and so there was concern about having either a duplicate content penalty or having link reputation split. But this has long since been handled and so isn't much of a consideration at this point.
So I'd say it's pretty much personal preference to keep things simple.

Related

What is the best way to point a top-level domain to a URL?

I have the following situation: due to the IT department at our university, I had three choices to point a top-level domain to the content hosted on the university server:
Redirect
Use frames
Use a reverse proxy
I know frames are deprecated and suck; however, getting a server where I can set up a reverse proxy sounded like a bit of overkill, and a redirect was not an option, as the dirty URL of the webapp server would appear in the address bar.
So, when I looked up the site in Chrome, I got a message that the site contains unsafe content; opening the console told me that the "unsafe content" was the Google Web Fonts I included in the page. All other browsers worked just fine...
Does anyone have an elegant solution for this? I'm not really happy with using frames in the first place.
Thank you guys in advance, cheers!
I will of course provide all the config files/code snippets needed!!
The best and cleanest option in this case would be a reverse proxy with URL rewriting (if you don't like the webapp's URLs). If you post your endpoints, we'd be able to prepare the rules for you. Or check one of the many tutorials out there (e.g. this, this or this).
One important thing nobody mentions is the ProxyPreserveHost directive, which you'll want if the other end does any processing based on the Host header.
You may also consider a forward proxy instead of a reverse one; it's easier to configure.
Complete reference here.
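As a sketch of what such a setup looks like on Apache (mod_proxy and mod_proxy_http, configured in the VirtualHost for your domain; the internal host name is a placeholder for your real endpoint):
ProxyRequests Off            # reverse proxy only; never run an open forward proxy
ProxyPreserveHost On         # pass the original Host header to the backend
ProxyPass        / http://webapp.internal.example.edu/
ProxyPassReverse / http://webapp.internal.example.edu/
ProxyPassReverse rewrites the Location headers in the backend's redirects, so the dirty webapp URL never shows up in the visitor's address bar.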
Mixing http and https may be the cause of the unsafe-content error. Be sure you are loading the pages and the fonts over the same protocol.
As for pointing your domain, I like the virtual host solution above. If your IT department says that everything else is "impossible", you might be stuck with frames. :)

What's the point of oEmbed API endpoints and URL schemes vs. link tags?

The oEmbed specification mentions 2 different ways of finding the oEmbed content of an URL:
Knowing the API endpoint of the website and passing it, through a GET parameter, the URL you want info about, if it matches the URL pattern it declared.
Discovering the URL of the oEmbed version thanks to a <link rel="alternate" type="application/json+oembed" ... /> (or text/xml+oembed) HTML header.
The 2nd way seems more generic, as you don't have to store and maintain a whole list of providers. Moreover, lists of providers are a sign of a centralized internet, where only a few actors exist. That approach is hardly scalable.
I can see a use for the 1st approach, though, for websites that can parse resources made available by someone else. For example, I can provide an oEmbed version of video pages from website Foo. However, for several reasons, mainly security-related, I wouldn't trust someone who says "I can parse resource X for you" unless X's author is OK with that, which brings us back to approach 2.
So my question is: what did I miss here? What's the use of the 1st method of dealing with oEmbed? For instance, why store (and maintain up-to-date) a whole list of endpoints and patterns like oohEmbed does if you have a generic way of discovering it on-the-fly and for virtually any resource on the internet?
As a very closely related question, which I think may be asked at the same time (please correct me if I'm wrong): what happens if one doesn't provide a central endpoint for oEmbed contents, but rather, say, expect a '?version=oembed' parameter on each URL, that returns the oEmbed version instead of the standard one?
If I recall correctly, supporting both mechanisms was a compromise that we figured would help drive adoption. It's much easier to persuade large web properties to add a single endpoint vs. adding markup (that's irrelevant to most clients) to every response body. It was a pragmatic choice.
Longer term we planned to leverage some of the work Eran Hammer-Lahav was doing around discovery rather than re-inventing it (poorly, again). Unfortunately, his ideas still haven't gotten much traction and the web still lacks a good, standardized way to do this sort of thing.
I was hoping to find an answer here, but it looks like everyone else is as confused as we are. The advantage of using option 1, in my opinion, is that it takes only one JSON request instead of a potentially expensive HTML request followed by the JSON request. You can always use option 2 as a fallback in case you can't match a pattern in your pre-baked list of oEmbed providers.
oEmbed discovery is a major security concern. WordPress, for example, has a whitelist of supported oEmbed providers.
Suppose that any random URL on the internet could trigger an oEmbed fetch. That would mean everyone could hack your site.
Steps:
Create a new site and add oEmbed discovery to it.
Post the URL in a form on your site. Now your site performs the oEmbed request on my behalf.
Exploit:
by denial of service (DoS): e.g. redirect the URL to a tarpit, or feed it a 1 GB JSON response.
by cross-site scripting (XSS): inject arbitrary HTML into pages that other people can see.
by stealing the admin's session cookie via XSS: now the attacker can log in to your CMS to upload files, and exploit even more.
It's XSS to the max, with little to stop it. The only sane thing to do is whitelist proper endpoints. That's why oEmbed endpoints are explicitly listed.
If you want something scalable, you might like www.noembed.com and www.embedly.com. They provide oEmbed support for various sites which don't do oEmbed themselves.

How should web sites deal with localization settings? (from “What are common UI misconceptions and annoyances?”)

I’ve chosen to take this as a question in its own right since it was generating so much debate in the comments of the original post.
It’s interesting to see that a lot of people on SO (who are developers) just don't get localization. Here’s my take on how it should work:
In all browsers that I've looked at (and for the .NET developers out there too) when you look at a user's culture preferences it is in the following format:
language-Culture.
So we have:
en-GB - English language - UK culture
en-US - English language - US culture
en - English language - Invariant culture.
fr-FR - French language - French culture
fr-CH - French language - Swiss culture
de-CH - German language - Swiss culture
de-DE - German language - German culture
See MSDN for a complete list that the .NET framework supports.
When I go to a website it knows that I want the English language from the en part and it knows I’m interested in it being slanted to the UK (number formatting, date formatting). So when I go to google.com and it takes me to google.de (because of my IP address) that’s completely fine if google.de displays everything to me in English but completely wrong since google.de is in German. I have little control over my IP address but complete control over my language and culture settings. If you’re interested Microsoft’s new search engine (bing.com) handles things properly. Let's hope Microsoft can learn how to do search as well as Google or Google can learn to localize as well as Microsoft ;)
MSDN has another good article here for more information
So what are your recommendations for how sites should deal with localizations?
The solution here is so simple, it's annoying that devs do anything else.
Respect the browser setting. If it says English then by god it's English.
If you absolutely must, then simply add a button at the top to pick something else. Then, and ONLY then, do you override the browser.
If you think your way is better. Stop, have someone slap you. It's not. Repeat as necessary.
Get rid of those web splash pages that ask for someone's country. Just show your normal page, based off of the browser defaults, and see item 2 above. I have yet to run into a site where it actually matters. update: a few years later and there is now a reason to do this. In 2013 the UK instituted policies surrounding cookies that website operators need to respect for sites based in that country that are serving pages to visitors from that country. So pay attention to the laws in the countries you are hosted in.
IF you happen to have a site that really is served by multiple servers across multiple countries, then you can probably detect which one of your servers is really closer to serve from. If you can't, just stop the redirecting madness and then don't try and make a determination for them.
If localization settings are available - including, but not limited to, the HTTP Accept-Language header - then websites absolutely should respect them.
The common argument against this is that "average users" aren't smart enough to find the language settings and configure them to match their own preferences, so these settings are, more often than not, incorrect (unless the user happens to be within the US).
That is the wrong solution.
If a substantial segment of the user population can't find (or can't be bothered to find) their browser's language settings, then the correct response is to make them easier to find, not for sites to ignore what they've been set to. Perhaps make language settings directly accessible from the program's top level menu instead of burying it inside an over-complicated "Preferences" dialog. Perhaps ask for language preferences the first time the program is run. Perhaps use the operating system's localization settings. Or maybe something completely different, if that's what it takes to make it near-certain that the browser will be sending correct information about the user's preferences. But don't just throw up your hands, say "it's useless and can't be fixed!", and ignore it.
Other answers have talked about letting the user choose a language or locale in their profile on the site, which is also important and absolutely should be standard, but that's just to provide a site-specific override to the user's normal settings. If the user has not overridden this on the site, though, the correct action is to default to the most-preferred available language/locale as specified in their browser settings, not to base it on geolocation of their IP address.
At one point in my career, I maintained parts of a TCP/IP stack. That puts me in the somewhat rare position of knowing very well that IP addresses should not be used as anything other than Network-layer addresses. Any association between an IP address and a location is all but coincidental - it's an artifact of the way addresses are distributed, not any fundamental part of what an IP address means.
(They're also not useful as the unique identifier of a computer, but that's a different story)
I suggest leaving geolocation out of it. The HTTP standard includes a way for a browser or other user agent to include the user's culture preferences with each request (and remember, it's a list of weighted preferences, not necessarily just one culture). Since the browser is closer to the user than you are, you should honor this request, at least as the default.
It's ok to then permit the user to change their preference for your site, either temporarily or permanently. It's even ok to allow the user to choose to view different content with different culture settings. A wild example would be a site that includes both political news and technical information. It's quite reasonable that someone would want the news in their "natural" language, but the technical information in English.
Finally, it's ok to have a fallback pattern. If, for instance, you have a site that services users based on their region (resellers, for instance), then it's possible that Japanese content only exists on your Asian regional sub-site. A Japanese-speaking user visiting your EMEA site might just be stuck seeing English content, which might very well be his last choice.
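If you want the web server itself to honor the header, Apache's mod_negotiation can pick among per-language variants; a minimal sketch, assuming files named like page.html.en and page.html.fr (the file layout and priority list are illustrative):
Options +MultiViews              # negotiate between page.html.en, page.html.fr, ...
AddLanguage en .en
AddLanguage fr .fr
# Tie-breakers for ambiguous headers such as: Accept-Language: fr-CH,fr;q=0.9,en;q=0.7
LanguagePriority en fr
ForceLanguagePriority Fallback   # serve the best match from the list instead of a 406 error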
On the sites I create I usually follow this pattern:
Each page has a unique URL with the language in it somewhere, usually like /en/page or a different (sub)domain
If the user opens a URL with an unspecified language like /page I start to guess:
Is a cookie from a previous session available?
If not, is Accept-Language available, and can I map it to a language available on the site?
If not, and if it's feasible, can I guess by IP?
If not, default to the site's default language.
I set a cookie with the guessed language and redirect the user to the appropriate URL
I put a language switch on every page, so /en/page can easily be switched to /xx/page
Cookie gets updated if the user switches to a different page
Ideally I only have to guess once and from then on use the user's cookie, or the user visits the desired page directly.
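A hedged mod_rewrite sketch of steps 1, 2, and 4 of that chain, assuming English and German content under /en/ and /de/ and a cookie named lang set by the language switcher (all names are examples; step 3 would additionally need a GeoIP module):
RewriteEngine On
# 1. A language cookie from a previous visit wins.
RewriteCond %{HTTP_COOKIE} (^|;\s*)lang=de
RewriteRule ^([a-z0-9-]+)$ /de/$1 [R=302,L]
RewriteCond %{HTTP_COOKIE} (^|;\s*)lang=en
RewriteRule ^([a-z0-9-]+)$ /en/$1 [R=302,L]
# 2. No usable cookie: consult the Accept-Language header.
RewriteCond %{HTTP:Accept-Language} ^de [NC]
RewriteRule ^([a-z0-9-]+)$ /de/$1 [R=302,L]
# 4. Fall back to the site default.
# (A production rule set would also exclude the bare "en"/"de" paths themselves.)
RewriteRule ^([a-z0-9-]+)$ /en/$1 [R=302,L]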
I agree, give the user the chance to override them with user preferences in your app. This is especially handy for things like timezone localization issues which you can't derive from browser settings.
I risk being considered impolite, but I think my post on this topic will have more informative answers, mostly because my post is really a question. I am sorry though that I did not find that post before.
There's a difference between smart defaults and the ability of users to override them. In big apps I've worked on, I've assumed the user's locale from browser settings, geolocation, etc. -- but always given users a way to easily switch.
I don't know how else one would do that. Not giving users a chance to correct your assumptions is deeply problematic, because you're going to get it wrong some of the time.
ADDITION:
I think your problem here is that while you can edit your locale settings, if they look basically identical to the default, there's no way for an application developer to tell if you left it as-is intentionally, or because you don't know how or why to change it.
I suggest honoring users' localization settings, except if the setting is the overwhelming default, which users may not change. For example, I believe the great majority (90+%) of users with an en-us setting geolocated in Vietnam would almost always be better served by seeing Vietnamese content, rather than US English content, as long as there's a trivial way to switch locales. On the flip side, if a user geolocated in the US has a Vietnamese setting, by all means give him or her Vietnamese content.
Is this irritating for US-English users in Vietnam? Sure. But it's also the greatest good for the greatest number, and helps ensure that average non-technical users get the best real-world experience. Until we can hold a gun to users' heads and force them to honestly declare their language/culture preferences before turning on a computer, we're going to need heuristics like this.
I have seen enough forceful bug reports from customers that, when investigated, turned out to be one of their users having the browser's culture setting wrong that we now let the customer override the browser with a config setting. The browser's culture setting is wrong often enough that it is not very useful, and it is also too hard for most end users to find or change.

Safe to have page resources without file extensions?

I need to decide on naming conventions for a new website.
I can use mod_rewrite at will.
My favourite solution would be to work with no file extension at all.
www.exampledomain.com/language/pagename
This would lead to "pagename" being treated as a directory, and I would have to take that into account when using relative links.
Are there any other pitfalls I need to be aware of when doing this?
Is this legal, or are resources supposed to have a "name.extension" structure?
Do you know of any clients that can't deal with this and start looking for /index.htm or .html?
Can you think of any SEO problems to be expected?
Unless you have a very good reason to add an extension, drop it.
are resources supposed to have a "name.extension" structure?
Not that I know of. Normally, no. Resources are just a concept; a particular resource format may require an extension, another may not. It depends.
As for SEO, the shorter a link is, the better: it increases the relative weight of your keywords, and an extension makes links longer by four characters or more.
Do you know of any clients that can't deal with this and start looking for /index.htm or .html?
A problem may arise if you decide to support multiple entry points.
www.exampledomain.com
www.exampledomain.com/index.html
www.exampledomain.com/index.htm
www.exampledomain.com/index
These are all different URLs to search engines. Some people will link to you with the shortest name; others will use the other versions. Ultimately, different inbound links will point to your start page even though they are essentially the same page. Search engines will detect this and treat it as content duplication, so your page rank will be divided between the several URL versions; in the end, all except one will likely be dropped from the index entirely. To deal with this situation, decide on one "true" URL and have the others perform a 301 redirect (moved permanently) to the "correct" URL.
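On Apache, that canonicalization is a couple of mod_rewrite lines; a sketch assuming the bare / is the chosen "true" URL:
RewriteEngine On
# Match the index variants in THE_REQUEST (the raw request line) rather than
# the rewritten path, so the internal DirectoryIndex mapping doesn't loop.
RewriteCond %{THE_REQUEST} \s/index(\.html?)?[?\s] [NC]
RewriteRule ^index(\.html?)?$ / [R=301,L]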
Dropping extensions actually has the significant benefit of not tying you to a specific language. If your URLs are http://example.com/page.php and you switch to another language, you'll either lose the existing URLs (bad!) or have to fake the PHP extension (clunky).
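For serving the extensionless URLs themselves, here's a hedged mod_rewrite sketch of a front controller for the /language/pagename scheme from the question; index.php and its lang/page parameters are hypothetical names, not anything prescribed by the question:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f   # leave real files (images, CSS, ...) alone
RewriteCond %{REQUEST_FILENAME} !-d   # and real directories
RewriteRule ^([a-z]{2})/([a-z0-9-]+)$ index.php?lang=$1&page=$2 [L,QSA]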

Unlinked web pages on a server - security hole?

On my website, I have several html files I do not link off the main portal page. Without other people linking to them, is it possible for Jimmy Evil Hacker to find them?
If anyone accesses the pages with advanced options turned on in their Google Toolbar, then the address will be sent to Google. This is the only reason I can figure out for why some of my pages are on Google.
So, the answer is yes. Ensure you have a robots.txt or even .htaccess or something.
Hidden pages are REALLY hard to find.
First, be absolutely sure that your web server does not return any default index pages ever. Use the following everywhere in your configuration and .htaccess files. There's probably something similar for IIS.
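# Disable automatic directory listings so unlinked files can't be enumerated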
Options -Indexes
Second, make sure the file name isn't a dictionary word; the odds of guessing a non-dictionary word are astronomically small. Non-zero: there's a theoretical possibility that someone, somewhere might patiently guess every possible file name until they find yours. [I hate these theoretical attacks. Yes, they exist. No, they'll never happen in your lifetime, unless you've given someone a reason to search for your hidden content.]
You're talking about security through obscurity (google it), and it's never a good idea to rely on it.
Yes, it is.
It's unlikely they will be found, but still a possibility.
The term "security through obscurity" comes to mind