Make full site HTTPS / SSL? What performance / SEO issues & best practices still apply in 2012? [closed] - html

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
Note: There are existing questions that look like duplicates (linked below), but most of them are from a few years ago. I'd like a clear and definitive answer that settles things either way.
Is making an entire website run in HTTPS not an issue today from a best practice and performance / SEO perspective?
UPDATE: Am looking for more information with sources, esp. around impact to SEO. Bounty added
Context:
The conversation came up when we wanted to introduce some buttons that spawn lightboxes with forms in them that collect personal information (some of them even allow users to login). This is on pages that make up a big portion of the site. Since the forms would need to collect and submit information securely and the forms are not on pages of their own, the easiest way we could see to make this possible was to make the pages themselves be HTTPS.
What I would like is for an answer that covers issues with switching a long running popular site to HTTPS such as the ones listed below:
Would a handshake be negotiated on every request?
Will all assets need to be encrypted?
Would browsers not cache HTTPS content, including assets?
Is it still an issue that downstream transparent proxies don't cache HTTPS content, including assets (CSS, JS, etc.)?
Would all external assets (tracking pixels, videos, etc) need to have HTTPS version?
HTTPS and gzip might not be happy together?
Backlinks and organic links will always be HTTP so you will be 301'ing all the time, does this impact SEO / performance? Any other SEO impact of changing this sitewide?
There's a move with some of the big players to always run HTTPS, see Always on SSL, is this setting a precedent / best practice?
Duplicate / related questions:
Good practice or bad practice to force entire site to HTTPS?
Using SSL Across Entire Site
SSL on entire site or just part of it?

Not sure I can answer all points in one go with references, but here goes. Please edit as appropriate:
Would a handshake be negotiated on every request?
No, SSL connections are typically reused for a number of consecutive requests. The overhead once associated with SSL is mostly gone these days. Computers have also gotten a lot faster.
Will all assets need to be encrypted?
Yes, otherwise the browser will not consider the entire site secure.
Would browsers not cache HTTPS content, including assets?
I do not think so, caching should work just fine.
Is it still an issue that downstream transparent proxies don't cache HTTPS content, including assets (CSS, JS, etc.)?
For the proxy to cache SSL encrypted connections/assets, the proxy would need to decrypt the connection. That largely negates the advantage of SSL. So yes, proxies would not cache content.
It is possible for a proxy to be an SSL endpoint to both client and server, so it has separate SSL sessions with each and can see the plaintext being transmitted. One SSL connection would be between the proxy and the server, the proxy and the client would have a separate SSL connection signed with the certificate of the proxy. That requires that the client trusts the certificate of the proxy and that the proxy trusts the server certificate. This may be set up this way in corporate environments.
Would all external assets (tracking pixels, videos, etc) need to have HTTPS version?
Yes.
HTTPS and gzip might not be happy together?
Being on different levels of protocols, it should be fine. gzip is negotiated after the SSL layer is put over the TCP stream. For reasonably well behaved servers and clients there should be no problems.
Backlinks and organic links will always be HTTP so you will be 301'ing all the time, does this impact SEO?
Why will backlinks always be HTTP? That's not necessarily a given. How it impacts SEO very much depends on the SE in question. An intelligent SE can recognize that you're simply switching protocols and not punish you for it.

1- Would a handshake be negotiated on every request?
There are two issues here:
Most browsers don't need to re-establish a new connection between requests to the same site, even with plain HTTP. HTTP connections can be kept alive, so, no, you don't need to close the connection after each HTTP request/response: you can re-use a single connection for multiple requests.
You can also avoid performing multiple handshakes when parallel or subsequent SSL/TLS connections are required. There are multiple techniques explained in ImperialViolet - Overclocking SSL (definitely relevant for this question), written by Google engineers, in particular session resumption and false start. As far as I know, most modern browsers support at least session resumption.
These techniques don't get rid of new handshakes completely, but reduce their cost. Apart from session-reuse, OCSP-stapling (to check the certificate revocation status) and elliptic curves cipher suites can be used to reduce the key exchange overhead during the handshake, when perfect forward-secrecy is required. These techniques also depend on browser support.
There will still be an overhead, and if you need massive web-farms, this could still be a problem, but such a deployment is possible nowadays (and some large companies do it), whereas it would have been considered inconceivable a few years ago.
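Session resumption can be exercised directly from Python's standard library. This is a minimal sketch, not a benchmark; the host name is illustrative and the function assumes the server supports session tickets or IDs:

```python
import socket
import ssl

def handshake_twice(host, port=443):
    """Connect to `host` twice, offering the first connection's TLS session
    on the second attempt; returns True if the server resumed it (i.e. the
    second connection used an abbreviated handshake)."""
    ctx = ssl.create_default_context()
    with ctx.wrap_socket(socket.create_connection((host, port)),
                         server_hostname=host) as first:
        session = first.session            # ticket/ID from the full handshake
    with ctx.wrap_socket(socket.create_connection((host, port)),
                         server_hostname=host,
                         session=session) as second:
        return second.session_reused       # True => key exchange was skipped
```

Calling `handshake_twice("example.com")` from a machine with network access shows whether that particular server honours resumption.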
2- Will all assets need to be encrypted?
Yes, as always. If you serve a page over HTTPS, all the resources it uses (iframe, scripts, stylesheets, images, any AJAX request) need to be using HTTPS. This is mainly because there is no way to show the user which part of the page can be trusted and which can't.
3- Would browsers not cache HTTPS content, including assets?
Yes, they will. You can either serve your assets with Cache-Control: public explicitly, or rely on the browser's defaults. (In fact, you should prevent caching for sensitive resources.)
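As a sketch, the caching policy can be made explicit per path. The path prefixes and lifetimes below are illustrative assumptions, not a standard:

```python
def cache_control_for(path):
    """Pick a Cache-Control header value for a request path (illustrative)."""
    if path.startswith("/static/"):
        # Shared, non-sensitive assets: cacheable by the browser for a day.
        return "public, max-age=86400"
    if path.startswith("/account/"):
        # Per-user, sensitive pages: never store.
        return "no-store"
    # Everything else: cache in this browser only, revalidate each time.
    return "private, max-age=0"

print(cache_control_for("/static/app.css"))   # public, max-age=86400
```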
4- Is it still an issue that downstream transparent proxies don't cache HTTPS content, including assets (CSS, JS, etc.)?
HTTP proxy servers merely relay the SSL/TLS connection without looking into it, so they cannot cache the content. However, some CDNs also provide HTTPS access (all the links on Google Libraries API are available via https://), which, combined with in-browser caching, allows for better performance.
5- Would all external assets (tracking pixels, videos, etc) need to have HTTPS version?
Yes, this goes with point #3. The fact that YouTube supports HTTPS access helps.
6- HTTPS and gzip might not be happy together?
They're independent. HTTPS is HTTP over TLS, the gzip compression happens at the HTTP level. Note that you can compress the SSL/TLS connection directly, but this is rarely used: you might as well use gzip compression at the HTTP level if you need (there's little point compressing twice).
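A quick sketch of the layering: gzip is applied to the HTTP body before TLS ever sees the bytes, so the two don't interact. The body and headers below are illustrative:

```python
import gzip

# A repetitive HTML body, standing in for a real page.
body = b"<html><body>" + b"hello world " * 200 + b"</body></html>"

# HTTP layer: compress the body and advertise it in the response headers.
compressed = gzip.compress(body)
headers = {
    "Content-Encoding": "gzip",              # negotiated via Accept-Encoding
    "Content-Length": str(len(compressed)),
}

# TLS would now encrypt `compressed` as-is; decompression on the client
# side is lossless regardless.
assert gzip.decompress(compressed) == body
assert len(compressed) < len(body)           # repetitive markup shrinks a lot
```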
7- Backlinks and organic links will always be HTTP so you will be 301'ing all the time, does this impact SEO?
I'm not sure why these links should use http://. URL shortening services are a problem generally speaking for SEO if that's what you're referring to.
I think we'll see more and more usage of HTTP Strict Transport Security, so more https:// URLs by default.
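A sketch of the two response-header pieces involved: a permanent redirect for plain-HTTP requests, and an HSTS header on HTTPS responses. The one-year max-age is a typical choice, not a requirement:

```python
def redirect_to_https(host, path):
    """Answer any plain-HTTP request with a 301 to the https:// equivalent."""
    return 301, {"Location": "https://%s%s" % (host, path)}

def hsts_header():
    """HSTS: tell browsers to use https:// for this host from now on."""
    return {"Strict-Transport-Security": "max-age=31536000; includeSubDomains"}

status, headers = redirect_to_https("example.com", "/page?q=1")
print(status, headers["Location"])   # 301 https://example.com/page?q=1
```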

Related

User submitted images with http:// urls are causing browsers to warn that the page is not secure

I'm working for a forum owner who allows users to submit hotlinked images from other domains in their posts. If they choose to use an http version of the URL, the otherwise clean page becomes insecure in the eyes of a browser, which some percentage of the time triggers a worried email from certain users.
I can't rewrite the urls, since I can't code against the assumption that future off site images will have https available. For the same reason, I can't use protocol relative src attributes. I'm unwilling to fetch and cache the images on our server just so that they can be served over https, because of the computational expense involved.
What can I do? Is there some piece of HTML syntax or some similar which I can use to tell the browser "This image doesn't matter, and doesn't constitute a security threat"?
This isn't possible. The image may not constitute a security threat but MITM attacks could still lead to images other than the intended one being loaded over the network, and who knows what an attacker may want to supplant that image with. My suggestion would be to pass the annoyance on to your users and tell them they can only use https URLs.
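If you do push the requirement onto your users, the validation itself is a one-liner. A sketch with the standard library, where the https-only rule is the policy suggested above:

```python
from urllib.parse import urlparse

def is_acceptable_image_url(url):
    """Accept only https:// hotlinks, keeping pages free of mixed content."""
    return urlparse(url).scheme == "https"

print(is_acceptable_image_url("https://img.example.com/cat.png"))  # True
print(is_acceptable_image_url("http://img.example.com/cat.png"))   # False
```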

With CORS, why do servers declare which clients may trust it, instead of clients declaring what servers they trust? [duplicate]

This question already has answers here:
XMLHttpRequest cannot load XXX No 'Access-Control-Allow-Origin' header
(11 answers)
Closed 6 years ago.
There is something about Cross Origin Resource Sharing (CORS) that I have never truly understood, namely that with a cross-origin HTTP request, it is not the client that gets to decide which server(s) it wants to trust; instead, the server declares (in the Access-Control-Allow-Origin response header) that one or more particular clients (origins) trust it. A CORS-enabled browser will only deliver the server's response to the application if the server says that the client trusts the server. This seems like a reverse way of establishing a trust relationship between two HTTP parties.
What would make more sense to me is a mechanism similar to the following: The client declares a list of origins that it trusts; for example, via some fictional <meta allow-cross-origin="https://another-site:1234"/> element in the <head>. (Of course a browser would have to ensure that these elements are read-only and cannot be removed, modified, or augmented via scripts.)
What am I misunderstanding about CORS? Why would a client-side declaration of trusted origins not work? Why is it that the servers get to confirm which clients (origins) may trust its responses? Who is actually protected from whom by CORS? Does it protect the server, or the client?
(These are a lot of questions. I hope it's clear that I am not expecting an answer to each of these, but rather just an answer that points out my fundamental misunderstanding.)
The client has nothing to do with it. With a CORS header, the server tells the browser which other origins it trusts; pages from those origins can then use its resources, and the browser won't mind.
For example, if you have two domains, the server tells the browser to let its resources be used by your second website; it doesn't say "I trust you" to the client.
So you're protecting the server, not the client. You don't want your AJAX API endpoints to be accessible to scripts hosted anywhere in the world.
The client has nothing to gain or lose from this. It's only a protection for servers, because with AJAX all the URLs are clearly visible to anyone, and had it not been for this protection, anybody could run their own front end against your API. Only servers have something to lose here, so they get to decide who can use their resources.
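Server-side, the allow-list logic is tiny. A sketch, where the origin set is a hypothetical example:

```python
# Hypothetical set of origins whose scripts may read our responses.
ALLOWED_ORIGINS = {"https://app.example.com"}

def cors_headers(request_origin):
    """Echo the Origin header back only if it is on the allow-list."""
    if request_origin in ALLOWED_ORIGINS:
        return {"Access-Control-Allow-Origin": request_origin,
                "Vary": "Origin"}   # responses differ per Origin, so say so
    # No header at all: the browser withholds the response from
    # cross-origin scripts.
    return {}

print(cors_headers("https://app.example.com"))
print(cors_headers("https://evil.example.net"))  # {}
```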

Why use protocol-relative URLs at all?

It's been an oft-discussed question on StackOverflow what this means:
<script src="//cdn.example.com/somewhere/something.js"></script>
This gives the advantage that if you're accessing it over HTTPS, you get HTTPS automatically, instead of that scary "Insecure elements on this page" warning.
But why use protocol-relative URLs at all? Why not simply use HTTPS always in CDN URLs? After all, an HTTP page has no reason to complain if you decide to load some parts of it over HTTPS.
(This is more specifically for CDNs; almost all CDNs have HTTPS capability. Whereas, your own server may not necessarily have HTTPS.)
As of December 2014, Paul Irish's blog on protocol-relative URLs says:
2014.12.17: Now that SSL is encouraged for everyone and doesn’t have performance concerns, this technique is now an anti-pattern. If the asset you need is available on SSL, then always use the https:// asset.
Unless you have specific performance concerns (such as the slow mobile network mentioned in Zakjan's answer) you should use https:// to protect your users.
Because of performance. Establishing an HTTPS connection takes much longer than HTTP: the TLS handshake adds up to 2 RTTs of latency. You can notice it on mobile networks. So it is better not to use HTTPS asset URLs if you don't need them.
There are a number of potential reasons, though none of them is particularly crucial:
How about the next time every business with an agenda pushes a new protocol? Are we going to have to swap out thousands of strings again then? No thanks.
HTTPS is slower than HTTP of the same version
If any of the notes listed at caniuse.com for HTTP/2 are a problem
Conceptually, if the server enforces the protocol, there is no reason to be specific about it in the first place. Agnosticism is what it is. It's covering all your bases.
One thing to note, if you are using CSP's upgrade-insecure-requests, you can safely use protocol-agnostic URLs (//example.com).
Protocol-relative URLs sometimes break JS code that tries to detect location.protocol. They are also not understood by extremely old browsers. If you are developing web services that require maximum backward compatibility (i.e. serving crucial emergency information that can be received/sent on slow connections and/or old devices), do not use PRURLs.
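The resolution rule itself is trivial, which is part of the appeal. A sketch:

```python
def resolve_protocol_relative(url, page_scheme):
    """A //host/path URL inherits the scheme of the page embedding it."""
    if url.startswith("//"):
        return "%s:%s" % (page_scheme, url)
    return url

print(resolve_protocol_relative("//cdn.example.com/x.js", "https"))
# https://cdn.example.com/x.js
```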

best method of linking to outside of secured script (using SSL)

I have a shopping cart script on our site that is set up to be secure (https with an SSL certificate). I have links in the script leading to other parts of my site that are not secure (WordPress blog, etc).
In the secure site, if I have links that are not secure (http), it triggers a message in the browser alerting the user to unsecured links. If I make the outgoing links relative, then when the user clicks on them and leaves the script, they stay in secure mode (which we don't want for other parts of our site).
Years ago, I remember having this issue. I think I got around it by using a HTTP Redirect for every outgoing link in the secure site. Using a HTTP Redirect, I would have https://www.example.com/outgoinglink1a redirect to http://www.example.com/outgoinglink1b in the HTTP Redirect. This way, I could put https://www.example.com/outgoinglink1a in the secure site, and when it was clicked, it would lead to http://www.example.com/outgoinglink1b
In modern times, how do I have links in the secure site that lead to other parts of the site that aren't secure, without triggering SSL Error Message to user when they are in Secure part of site? Is using some type of 301 redirect in .htaccess better? Is there another preferred or easier method (than using HTTP Redirects) for accomplishing this?
Thank you for any guidance.
You can use https-2-http redirects to the unsecured site to avoid browser warnings.
But for multiple reasons, safety being one of them, I would really advise against using http and https for the same domain, even if a lot of big sites still do it. You would either have to use different cookies for the secure and the normal site, or the one cookie you use for your shopping cart can't have the secure flag, in which case you really don't need https in my opinion. Also, you will never be able to implement HSTS.
You've already gone to the lengths of buying a certificate and setting up an https server - now why not secure the whole site?
Update to answer your question in the comment:
That is of course a deal-breaker if you rely on those and the hosts haven't implemented https yet (which they probably will sooner or later, or they are going to be out of business).
Depending on what they actually do, you could perhaps proxy the requests to those scripts and serve them from your https-enabled server. But I would really consider this a last resort.
The slowing down part is mostly just the handshake. If you enable session resumption there shouldn't be too much overhead to actually slow down your site. Make sure your TLS session cache is big enough and that the ticket lifetime is ample.
Of course, your mileage may vary. So make sure you test your https site before going online.
I've heard such horror stories as well, but I think most of the time it's probably due to a faulty or at least sub-standard implementation. Make sure you redirect EVERY single http request to https with the 301 status and you should be fine. For some months now, enabling https should actually help with your Google ranking.
To link to an external site (different FQDN) you don't have to implement any trickery to avoid browser warnings - that's just linking to a different site and has nothing to do with mixed content policies.

When should one use a 'www' subdomain?

When browsing through the internet for the last few years, I'm seeing more and more pages getting rid of the 'www' subdomain.
Are there any good reasons to use or not to use the 'www' subdomain?
There are a ton of good reasons to include it, the best of which is here:
Yahoo Performance Best Practices
Due to the dot rule with cookies, if you don't have the 'www.' then you can't set two-dot cookies or cross-subdomain cookies a la *.example.com. There are two pertinent impacts.
First it means that any user you're giving cookies to will send those cookies back with requests that match the domain. So even if you have a subdomain, images.example.com, the example.com cookie will always be sent with requests to that domain. This creates overhead that wouldn't exist if you had made www.example.com the authoritative name. Of course you can use a CDN, but that depends on your resources.
Also, you then don't have the ability to set a cross-subdomain cookie. This seems evident, but this means allowing authenticated users to move between your subdomains is more of a technical challenge.
So ask yourself some questions. Do I set cookies? Do I care about potentially needless bandwidth expenditure? Will authenticated users be crossing subdomains? If you're really concerned with inconveniencing the user, you can always configure your server to take care of the www/no www thing automatically.
See dropwww and yes-www (saved).
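The dot rule shows up concretely in the cookie's Domain attribute. A minimal sketch with the standard library (names and values are illustrative):

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "abc123"
cookie["session"]["domain"] = ".example.com"  # sent to every subdomain
cookie["session"]["secure"] = True            # only over HTTPS

# Scoped to the registrable domain, this cookie rides along on requests to
# www.example.com, images.example.com, and so on. Scoping it to
# www.example.com instead would keep it off the asset subdomains.
header = cookie.output()
print(header)
```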
Just after asking this question I came over the no-www page which says:
...Succinctly, use of the www subdomain is redundant and time consuming to communicate. The internet, media, and society are all better off without it.
Take it from a domainer: use both www.domainname.com and the plain domainname.com,
otherwise you are just throwing your traffic away to the browser's search engine (DNS error).
Actually it is amazing how many domains out there, especially amongst the top 100, correctly resolve www.domainname.com but not domainname.com.
There are MANY reasons to use the www sub-domain!
When writing a URL, it's easier to handwrite and type "www.stackoverflow.com" than "http://stackoverflow.com". Most text editors, email clients, word processors and WYSIWYG controls will automatically recognise both of the above and create hyperlinks. Typing just "stackoverflow.com" will not result in a hyperlink; after all, it's just a domain name. Who says there's a web service there? Who says the reference to that domain is a reference to its web service?
What would you rather write/type/say: "www." (4 chars) or "http://" (7 chars)?
"www." is an established shorthand way of unambiguously communicating the fact that the subject is a web address, not a URL for another network service.
When verbally communicating a web address, it should be clear from the context that it's a web address so saying "www" is redundant. Servers should be configured to return HTTP 301 (Moved Permanently) responses forwarding all requests for #.stackoverflow.com (the root of the domain) to the www subdomain.
In my experience, people who think WWW should be omitted tend to be people who don't understand the difference between the web and the internet and use the terms interchangeably, like they're synonymous. The web is just one of many network services.
If you want to get rid of www, why not change your HTTP server to use a different port as well? TCP port 80 is sooo yesterday... let's change that to port 1234. YAY, now people have to say and type "http://stackoverflow.com:1234" ("aitch tee tee pee colon slash slash stack overflow dot com colon one two three four") - but at least we don't have to say "www", eh?
There are several reasons, here are some:
1) The person wanted it this way on purpose
People use DNS for many things, not only the web. They may need the main dns name for some other service that is more important to them.
2) Misconfigured dns servers
If someone does a lookup of www to your dns server, your DNS server would need to resolve it.
3) Misconfigured web servers
A web server can host many different web sites. It distinguishes which site you want via the Host header. You need to specify which host names you want to be used for your website.
4) Website optimization
It is better to not handle both, but to forward one with a moved permanently http status code. That way the 2 addresses won't compete for inbound link ranks.
5) Cookies
To avoid problems with cookies not being sent back by the browser. This can also be solved with the moved permanently http status code.
6) Client side browser caching
Web browsers may not cache an image if you make a request to www and another without. This can also be solved with the moved permanently http status code.
There is no huge advantage to including-it or not-including-it and no one objectively-best strategy. “no-www.org” is a silly load of old dogma trying to present itself as definitive fact.
If the “big organisation that has many different services and doesn't want to have to dedicate the bare domain name to being a web server” scenario doesn't apply to you (and in reality it rarely does), which address you choose is a largely cultural matter. Are people where you are used to seeing a bare “example.org” domain written on advertising materials, would they immediately recognise it as a web address without the extra ‘www’ or ‘http://’? In Japan, for example, you would get funny looks for choosing the non-www version.
Whichever you choose, though, be consistent. Make both www and non-www versions accessible, but make one of them definitive, always link to that version, and make the other redirect to it (permanently, status code 301). Having both hostnames respond directly is bad for SEO, and serving any old hostname that resolves to your server leaves you open to DNS rebinding attacks.
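The "make one definitive, redirect the other" rule is only a few lines of server logic. A sketch, where the canonical host and scheme are whichever forms you picked:

```python
CANONICAL_HOST = "www.example.com"   # the definitive form chosen for the site

def canonical_redirect(host, path, scheme="http"):
    """Return a 301 to the canonical host, or None if already canonical."""
    if host != CANONICAL_HOST:
        return 301, {"Location": "%s://%s%s" % (scheme, CANONICAL_HOST, path)}
    return None   # serve the page normally

print(canonical_redirect("example.com", "/about"))
print(canonical_redirect("www.example.com", "/about"))  # None
```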
Apart from the load optimization regarding cookies, there is also a DNS-related reason for using the www subdomain: you can't use a CNAME on the naked domain. On yes-www.org (saved) it says:
When using a provider such as Heroku or Akamai to host your web site, the provider wants to be able to update DNS records in case it needs to redirect traffic from a failing server to a healthy server. This is set up using DNS CNAME records, and the naked domain cannot have a CNAME record. This is only an issue if your site gets large enough to require highly redundant hosting with such a service.
As jdangel points out the www is good practice in some cookie situations but I believe there is another reason to use www.
Isn't it our responsibility to care for and protect our users. As most people expect www, you will give them a less than perfect experience by not programming for it.
To me it seems a little arrogant, to not set up a DNS entry just because in theory it's not required. There is no overhead in carrying the DNS entry and through redirects etc they can be redirected to a non www dns address.
Seriously, don't lose valuable traffic by leaving your potential visitor with an unnecessary "site not found" error.
Additionally, in a Windows-only network you might be able to set up a Windows DNS server to avoid the following problem, but I don't think you can in a mixed environment of Mac and Windows. If a Mac does a DNS query against a Windows DNS server, mydomain.com will return all the available name servers, not the web server. So if you type mydomain.com in your browser, the browser will query a name server rather than a web server; in this case you need a subdomain (e.g. www.mydomain.com) pointing to the specific web server.
Some sites require it because the service is configured on that particular set up to deliver web content via the www sub-domain only.
This is correct as www is the conventional sub-domain for "World Wide Web" traffic.
Just as port 80 is the standard port. Obviously there are other standard services and ports as well (http tcp/ip on port 80 is nothing special!)
Imagine mycompany...
mx1.mycompany.com 25 smtp, etc
ftp.mycompany.com 21 ftp
www.mycompany.com 80 http
Sites that don't require it basically have forwarding in DNS or redirection of some kind.
e.g.
*.mycompany.com 80 http
The only reason to do it, as far as I can see, is if you prefer it and you want to.