When should one use a 'www' subdomain? - subdomain

When browsing through the internet for the last few years, I'm seeing more and more pages getting rid of the 'www' subdomain.
Are there any good reasons to use or not to use the 'www' subdomain?

There are a ton of good reasons to include it, the best of which is here:
Yahoo Performance Best Practices
Due to the dot rule with cookies, if you don't have the 'www.' then you can't set two-dot cookies or cross-subdomain cookies a la *.example.com. There are two pertinent impacts.
First it means that any user you're giving cookies to will send those cookies back with requests that match the domain. So even if you have a subdomain, images.example.com, the example.com cookie will always be sent with requests to that domain. This creates overhead that wouldn't exist if you had made www.example.com the authoritative name. Of course you can use a CDN, but that depends on your resources.
Also, you then don't have the ability to set a cross-subdomain cookie. This seems evident, but this means allowing authenticated users to move between your subdomains is more of a technical challenge.
So ask yourself some questions. Do I set cookies? Do I care about potentially needless bandwidth expenditure? Will authenticated users be crossing subdomains? If you're really concerned with inconveniencing the user, you can always configure your server to take care of the www/no www thing automatically.
See dropwww and yes-www (saved).

Just after asking this question I came over the no-www page which says:
...Succinctly, use of the www subdomain
is redundant and time consuming to
communicate. The internet, media, and
society are all better off without it.

Take it from a domainer, Use both the www.domainname.com and the normal domainname.com
otherwise you are just throwing your traffic away to the browers search engine (DNS Error)
Actually it is amazing how many domains out there, especially amongst the top 100, correctly resolve for www.domainname.com but not domainname.com

There are MANY reasons to use the www sub-domain!
When writing a URL, it's easier to handwrite and type "www.stackoverflow.com", rather than "http://stackoverflow.com". Most text editors, email clients, word processors and WYSIWYG controls will automatically recognise both of the above and create hyperlinks. Typing just "stackoverflow.com" will not result in a hyperlink, after all it's just a domain name.. Who says there's a web service there? Who says the reference to that domain is a reference to its web service?
What would you rather write/type/say.. "www." (4 chars) or "http://" (7 chars) ??
"www." is an established shorthand way of unambiguously communicating the fact that the subject is a web address, not a URL for another network service.
When verbally communicating a web address, it should be clear from the context that it's a web address so saying "www" is redundant. Servers should be configured to return HTTP 301 (Moved Permanently) responses forwarding all requests for #.stackoverflow.com (the root of the domain) to the www subdomain.
In my experience, people who think WWW should be omitted tend to be people who don't understand the difference between the web and the internet and use the terms interchangeably, like they're synonymous. The web is just one of many network services.
If you want to get rid of www, why not change the your HTTP server to use a different port as well, TCP port 80 is sooo yesterday.. Let's change that to port 1234, YAY now people have to say and type "http://stackoverflow.com:1234" (eightch tee tee pee colon slash slash stack overflow dot com colon one two three four) but at least we don't have to say "www" eh?

There are several reasons, here are some:
1) The person wanted it this way on purpose
People use DNS for many things, not only the web. They may need the main dns name for some other service that is more important to them.
2) Misconfigured dns servers
If someone does a lookup of www to your dns server, your DNS server would need to resolve it.
3) Misconfigured web servers
A web server can host many different web sites. It distinguishes which site you want via the Host header. You need to specify which host names you want to be used for your website.
4) Website optimization
It is better to not handle both, but to forward one with a moved permanently http status code. That way the 2 addresses won't compete for inbound link ranks.
5) Cookies
To avoid problems with cookies not being sent back by the browser. This can also be solved with the moved permanently http status code.
6) Client side browser caching
Web browsers may not cache an image if you make a request to www and another without. This can also be solved with the moved permanently http status code.

There is no huge advantage to including-it or not-including-it and no one objectively-best strategy. “no-www.org” is a silly load of old dogma trying to present itself as definitive fact.
If the “big organisation that has many different services and doesn't want to have to dedicate the bare domain name to being a web server” scenario doesn't apply to you (and in reality it rarely does), which address you choose is a largely cultural matter. Are people where you are used to seeing a bare “example.org” domain written on advertising materials, would they immediately recognise it as a web address without the extra ‘www’ or ‘http://’? In Japan, for example, you would get funny looks for choosing the non-www version.
Whichever you choose, though, be consistent. Make both www and non-www versions accessible, but make one of them definitive, always link to that version, and make the other redirect to it (permanently, status code 301). Having both hostnames respond directly is bad for SEO, and serving any old hostname that resolves to your server leaves you open to DNS rebinding attacks.

Apart from the load optimization regarding cookies, there is also a DNS related reason for using the www subdomain. You can't use CNAME to the naked domain. On yes-www.org (saved) it says:
When using a provider such as Heroku or Akamai to host your web site, the provider wants to be able to update DNS records in case it needs to redirect traffic from a failing server to a healthy server. This is set up using DNS CNAME records, and the naked domain cannot have a CNAME record. This is only an issue if your site gets large enough to require highly redundant hosting with such a service.

As jdangel points out the www is good practice in some cookie situations but I believe there is another reason to use www.
Isn't it our responsibility to care for and protect our users. As most people expect www, you will give them a less than perfect experience by not programming for it.
To me it seems a little arrogant, to not set up a DNS entry just because in theory it's not required. There is no overhead in carrying the DNS entry and through redirects etc they can be redirected to a non www dns address.
Seriously don't loose valuable traffic by leaving your potential visitor with an unnecessary "site not found" error.
Additionally in a windows only network you might be able to set up a windows DNS server to avoid the following problem, but I don't think you can in a mixed environment of mac and windows. If a mac does a DNS query against a windows DNS mydomain.com will return all the available name servers not the webserver. So if in your browser you type mydomain.com you will have your browser query a name server not a webserver, in this case you need a subdomain (eg www.mydomain.com ) to point to the specific webserver.

Some sites require it because the service is configured on that particular set up to deliver web content via the www sub-domain only.
This is correct as www is the conventional sub-domain for "World Wide Web" traffic.
Just as port 80 is the standard port. Obviously there are other standard services and ports as well (http tcp/ip on port 80 is nothing special!)
Imagine mycompany...
mx1.mycompany.com 25 smtp, etc
ftp.mycompany.com 21 ftp
www.mycompany.com 80 http
Sites that don't require it basically have forwarding in dns or redirection of some-kind.
e.g.
*.mycompany.com 80 http
The onlty reason to do it as far as I can see is if you prefer it and you want to.

Related

best method of linking to outside of secured script (using SSL)

I have shopping cart script in our site that is setup to be secure (https with SSL certificate). I have links in the script leading to other parts of my site that are not secure (WordPress blog, etc).
In the secure site, if I have links that are not secure ( http ), it triggers a message to user in browser, alerting of unsecured links. If I put the outgoing links in the script as relative links, when the user clicks on them and goes outside of script, it keeps them in secure mode (which we don't want for other parts of our site).
Years ago, I remember having this issue. I think I got around it by using a HTTP Redirect for every outgoing link in the secure site. Using a HTTP Redirect, I would have https://www.example.com/outgoinglink1a redirect to http://www.example.com/outgoinglink1b in the HTTP Redirect. This way, I could put https://www.example.com/outgoinglink1a in the secure site, and when it was clicked, it would lead to http://www.example.com/outgoinglink1b
In modern times, how do I have links in the secure site that lead to other parts of the site that aren't secure, without triggering SSL Error Message to user when they are in Secure part of site? Is using some type of 301 redirect in .htaccess better? Is there another preferred or easier method (than using HTTP Redirects) for accomplishing this?
Thank you for any guidance.
You can use https-2-http redirects to the unsecured site to avoid browser warnings.
But for multiple reasons, safety being one of them, I would really advice against using http and https for the same domain, even if lot of big sites still do it. You would ether have to use different cookies for the secure and the normal site, or the one cookie u use for your shopping cart can't have the secure flag, in which case you really don't need https in my opinion. Also, you will never be able to implement HSTS.
You've already gone to the lengths bought a certificate and set up an https-server, now why not secure the whole site?
Update to answer your question in the comment:
That is of course a deal-breaker, if you rely on those and the hosts haven't implemented https yes (which they probably will sooner or later, or they are going to be out of business)
Depending on what they actually do, you maybe could proxy the request to those scripts and serve them from you https-enabled server. But I would really consider this last a resort.
The slowing down part is mostly just the handshake. If you enable session resumption there shouldn't be too much overhead to actually slow down your site. Make sure your TLS session cache is big enough and that the ticket lifetime is ample.
Of course, your mileage may vary. So make sure you test your https site before going online.
I heard of such horror stories as well, but I think most of the time it's probably due to faulty or at least sub-standard implementation. Make sure you redirect EVERY single http-request to https with the 301 status and you should be fine. For some months now enabling https should actually help with your Google pagerank.
To link to an external site (differnt FQDN) you don't have to implement any trickery to avoid browser warnings - that's just linking to a different site and has nothing to do with mixed content policies.

Make full site HTTPS / SSL? What performance / SEO issues & best practices still apply in 2012? [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
Note: There are existing question that look like duplicates (linked below) but most of them are from a few years ago. I'd like to get a clear and definitive answer that proves things either way.
Is making an entire website run in HTTPS not an issue today from a best practice and performance / SEO perspective?
UPDATE: Am looking for more information with sources, esp. around impact to SEO. Bounty added
Context:
The conversation came up when we wanted to introduce some buttons that spawn lightboxes with forms in them that collect personal information (some of them even allow users to login). This is on pages that make up a big portion of the site. Since the forms would need to collect and submit information securely and the forms are not on pages of their own, the easiest way we could see to make this possible was to make the pages themselves be HTTPS.
What I would like is for an answer that covers issues with switching a long running popular site to HTTPS such as the ones listed below:
Would a handshake be negotiated on every request?
Will all assets need to be encrypted?
Would browsers not cache HTTPS content, including assets?
Is downstream transparent proxies not caching HTTPS content, including assets (css, js etc.) still an issue?
Would all external assets (tracking pixels, videos, etc) need to have HTTPS version?
HTTPS and gzip might not be happy together?
Backlinks and organic links will always be HTTP so you will be 301'ing all the time, does this impact SEO / performance? Any other SEO impact of changing this sitewide?
There's a move with some of the big players to always run HTTPS, see Always on SSL, is this setting a precedent / best practice?
Duplicate / related questions:
Good practice or bad practice to force entire site to HTTPS?
Using SSL Across Entire Site
SSL on entire site or just part of it?
Not sure I can answer all points in one go with references, but here goes. Please edit as appropriate:
Would a handshake must be negotiated on every request?
No, SSL connections are typically reused for a number of consecutive requests. The overhead once associated with SSL is mostly gone these days. Computers have also gotten a lot faster.
Will all assets need to be encrypted?
Yes, otherwise the browser will not consider the entire site secure.
Would browsers not cache HTTPS content, including assets?
I do not think so, caching should work just fine.
Is downstream transparent proxies not caching HTTPS content, including assets (css, js etc.) still an issue?
For the proxy to cache SSL encrypted connections/assets, the proxy would need to decrypt the connection. That largely negates the advantage of SSL. So yes, proxies would not cache content.
It is possible for a proxy to be an SSL endpoint to both client and server, so it has separate SSL sessions with each and can see the plaintext being transmitted. One SSL connection would be between the proxy and the server, the proxy and the client would have a separate SSL connection signed with the certificate of the proxy. That requires that the client trusts the certificate of the proxy and that the proxy trusts the server certificate. This may be set up this way in corporate environments.
Would all external assets (tracking pixels, videos, etc) need to have HTTPS version?
Yes.
HTTPS and gzip might not be happy together?
Being on different levels of protocols, it should be fine. gzip is negotiated after the SSL layer is put over the TCP stream. For reasonably well behaved servers and clients there should be no problems.
Backlinks and organic links will always be HTTP so you will be 301'ing all the time, does this impact SEO?
Why will backlinks always be HTTP? That's not necessarily a given. How it impacts SEO very much depends on the SE in question. An intelligent SE can recognize that you're simply switching protocols and not punish you for it.
1- Would a handshake be negotiated on every request?
There are two issues here:
Most browsers don't need to re-establish a new connection between requests to the same site, even with plain HTTP. HTTP connections can be kept alive, so, no, you don't need to close the connection after each HTTP request/response: you can re-use a single connection for multiple requests.
You can also avoid to perform multiple handshake when parallel or subsequent SSL/TLS connections are required. There are multiple techniques explained in ImperialViolet - Overclocking SSL (definitely relevant for this question), written by Google engineers, in particular session resumption and false start. As far as I know, most modern browsers support at least session resumption.
These techniques don't get rid of new handshakes completely, but reduce their cost. Apart from session-reuse, OCSP-stapling (to check the certificate revocation status) and elliptic curves cipher suites can be used to reduce the key exchange overhead during the handshake, when perfect forward-secrecy is required. These techniques also depend on browser support.
There will still be an overhead, and if you need massive web-farms, this could still be a problem, but such a deployment is possible nowadays (and some large companies do it), whereas it would have been considered inconceivable a few years ago.
2- Will all assets need to be encrypted?
Yes, as always. If you serve a page over HTTPS, all the resources it uses (iframe, scripts, stylesheets, images, any AJAX request) need to be using HTTPS. This is mainly because there is no way to show the user which part of the page can be trusted and which can't.
3- Would browsers not cache HTTPS content, including assets?
Yes, they will, you can either use Cache-Control: public explicitly to serve your assets, or assume that the browser will do so. (In fact, you should prevent caching for sensitive resources.)
4- Is downstream transparent proxies not caching HTTPS content, including assets (css, js etc.) still an issue?
HTTP proxy servers merely relay the SSL/TLS connection without looking into them. However, some CDNs also provide HTTPS access (all the links on Google Libraries API are available via https://), which, combined with in-browser caching, allows for better performance.
5- Would all external assets (tracking pixels, videos, etc) need to have HTTPS version?
Yes, this goes with point #3. The fact that YouTube supports HTTPS access helps.
6- HTTPS and gzip might not be happy together?
They're independent. HTTPS is HTTP over TLS, the gzip compression happens at the HTTP level. Note that you can compress the SSL/TLS connection directly, but this is rarely used: you might as well use gzip compression at the HTTP level if you need (there's little point compressing twice).
7- Backlinks and organic links will always be HTTP so you will be 301'ing all the time, does this impact SEO?
I'm not sure why these links should use http://. URL shortening services are a problem generally speaking for SEO if that's what you're referring to.
I think we'll see more and more usage of HTTP Strict Transport Security, so more https:// URLs by default.

Is there any globaly unique identifier for a client machine accessible through the web browser?

Is there any way to identify a users machine through a browser without previously putting cookies in? Probably no access to Mac Address through the web right? Just thought I'd ask...
There is no such identity element, and even if there were, the nature of the HTTP protocol would not prevent it from being spoofed.
In short: No.
This was partly why Intel tried to have unique processor IDs a few years back, but that didn't ever take off. (Which is good as now we have multi-core machines.)
Just install a cookie on the box. IP address is no good because of Natting. Someday we'll have IPv6 to do this correctly.
You could retrieve an IP address, but it frequently wouldn't mean much (if anything). If you retrieve the IP address the client is using, you'll get a whole lot of them that are 192.168.*. If you retrieve the address your server sees, it won't match that, and you might easily see several (possibly hundreds or even thousands) of machines with the 'same' IP address.
If you put those two together, you'll get something that's unique for the moment, but is subject to change at any time. The client's local IP address may change when their DHCP lease expires and their global IP address may change anytime they reboot their router (unless they have a static IP address, which you mostly don't have any way of knowing).

Pros and Cons of a separate image server (e.g. images.mydomain.com)?

We have several images and PDF documents that are available via our website. These images and documents are stored in source control and are copied content on deployment. We are considering creating a separate image server to put our stock images and PDF docs on - thus significantly decreasing the bulk of our deployment package.
Does anyone have experience with this approach?
I am wondering about any "gotchas" - like XSS issues and/or browser issues delivering content from the alternate sub-domain?
Pro:
Many browsers will only allocate two sockets to downloading assets from a single host. So if index.html is downloaded from www.domain.com and it references 6 image files, 3 javascript files, and 3 CSS files (all on www.domain.com), the browser will download them 2 at a time, with the other blocking until a socket is free.
If you pull the 6 image files off onto a separate host, say images.domain.com, you get an extra two sockets dedicated to download your images. This parallelizes the asset download process so, in theory, your page could render twice as fast.
Con:
If you're using SSL, you would need to either get an additional single-host SSL certificate for images.domain.com or a wildcard SSL certificate for *.domain.com (matches any subdomain). Failure to do so will generate a warning in the browser saying the page contains mixed secure and insecure content.
You will also, with a different domain, not send the cookies data with every request. This can increase performance.
Another thing not yet mentioned is that you can use different web servers to serve different sorts of content. For example, your static content could be served via lighttpd or nginx while still serving your dynamic content off Apache.
Pros:
-load balancing
-isolating a different functionality
Cons:
-more work (when you create a page on the main site you would have to maintain the resources on the separate server)
Things like XSS is a problem of code not sanitizing input (or output for that matter). The only issue that could arise is if you have sub-domain specific cookies that are used for authentication.. but that's really a trivial fix.
If you're serving HTTPS and you serve an image from an HTTP domain then you'll get browser security alert warnings pop up when you use it.
So if you do HTTPS, you'll need to buy HTTPS for your image domain awell if you don't want to annoy the hell out of your users :)
There are other ways around this, but it's not particularly in the scope of this answer - it was just a warning!

Blocking access for a given geographic location

What is the most reliable way to prevent users from a geographic location to access a web available application?
I understand that IPs are related to geo positioning and I also understand that the most naive way is to get the HTTP request header IP address and take it from there.
It's obvious that naive methods, like the one described are extremely easy to bypass, specially using Proxies or VPNs.
So the question is: is there a 100% reliable way of determining a web user geo location? If not, what are the available options and what are the pros and cons on each of them?
The short answer is no. There is no way to 100% lock down the people from a specific geographic location because you can't guarantee the location of a user that reliably using an IP address. Even if you could, it can be faked through redirects.
There are ways to make it more difficult for people in a region to access the site, but the more restrictive you get with those approaches the more legitimate users you are likely to lock out. For example, turning off the server would give you 100% assurance that no one from China could hit it, but it would also give you 100% assurance that no one in the US could either.
Nothing in TCP/IP includes location data (other than what you can infer from routing tables or look up in a database), and nothing indicates whether a machine is acting "on behalf of" someone in another location.
So as you say, proxies and VPN, SSH port-forwarding, TOR, etc, can completely prevent your web app from knowing the physical location of the human being who's using your site. All you can look up, is the IP address of that last hop which is the TCP/IP connection and HTTP request you actually see.
The above techniques won't work if anyone is trying to hide their location from you by redirecting through relays in other countries.
I found this script to be an easy way to implement this:
https://www.blocked.com/
Country blocking is included in the free version, as is blocking of open proxy servers, anonymity networks, etc.
There is a database somewhere on the tubes named IP 2 Country which can tell where an IP is from.
It is of course not perfect but it can give you the country where the ip comes from.
There is also a method called SSN which is related to ip addresses. I don't know how it works however, and seems to be rather complicated. It is comonly used in ads to send you localised spam. For example if you live in Montreal, Canada, then the ad will display "Find singles from Montreal!". The ISP behind the person does have to support this service.
first - figure out what ip groups are assigned to the region then you could check with every request for the user's ip address. If it matches part of the region you want to block then send them to disney.com.
See if this helps you: IP Address Info
No, there's no fool-proof way of doing this.
There's plenty of related work going on at the IETF in the GeoPriv working group, where protocols are being designed (e.g. HELD) to allow entities to ask the network their own location, and also allow other authorised entities to request that information.
However the VPN issue still causes problems, to the extent that clients with VPN capability need to request their location information before the VPN is established.