should rel-canonical also include protocol (http/https)? - html

I'm migrating my website from http to https (although it will still support access via http)
Currently all of my pages have accurate rel-canonical meta tags set in the HTML, but obviously they all point to the canonical http:// url.
Should I now be updating those to https:// too, or is it ok to leave them as http?
I'm wondering whether Google will penalise me, or start detecting duplicate content, if I start mixing them

Yes Google sees http and https as different sites so you should update them.
A redirect on the server might be sufficient in the short term but personally I would be looking to update the pages as soon as you can.

Related

Looking for a script or program to convert all HTTP links to HTTPS

I am trying to find a script or program to convert my html website links from http to https.
I have looked all over hundreds of search results and web articles and I used the Word Press SSL plugin but it missed numerous pages with http links.
Below is one of thousands of my links I need to convert:
http://www.robert-b-ritter-jr.com/2015/11/30/blog-121-we-dont-need-the-required-minimum-distributions-rmds
I am looking for a way to do this quickly instead of one at a time.
The HTTPS Everywhere extension will automatically rewrite unsecure HTTP requests to HTTPS. Keep in mind not all websites offer a secure and encrypted connection.

Does the browser (Chrome/Firefox) automatically use https even when you try to use http?

I store urls in a database for the users of my webapp and I am not sure whether I need to store whether it was a "http" or a "https" request.
If I don't store the type of the connection and just echo to the users a link with "http", will it in 100% of the cases use a "https" connection automatically (when it is possible)? I don't want to be responsible for a user not using a https connection even though it is possible.
Does the browser (Chrome/Firefox) automatically use https even when you try to use http?
No. If you tell the browser to use HTTP, then it will use HTTP.
Schemes will only be added to a URL automatically under two circumstances:
When it is a relative URL, in which case the scheme will be the same as the one used to load the current document.
When the user types the URL into the browser's address bar and omits the scheme, in which case it will default to HTTP (not HTTPS).
A web server might provide HTTP and HTTPS versions of the same URL with the HTTP version containing a redirect to the HTTPS version and the HTTPS version hosting the content.
A web server might, for that matter, not provide an HTTP version at all… but that is very uncommon.
I am not sure whether I need to store whether it was a "http" or a "https" request.
You should store the full URL. You shouldn't omit bits and hope that you can fill them in by guesswork.
It won't automatically do that, but there are ways to help out:
some users may have the "HTTPSeverywhere" extension, which will attempt to redirect to HTTPS
you can serve HSTS headers, which will make the browser automatically stick to HTTPS if the user has at least once been on HTTPS with your site
Now there are a few problems with these points:
not everyone use the extension
HSTS only works once the user was visiting the URL with HTTPS and it will only work on site with HSTS headers set up, so if links are external, this might not be the case.
That being said: Are the links you store links to your own domain or external links to any web site?

Are push notifications possible in html5 without fully https site?

Looks like Push notifications are finally usable for web-apps! Unfortunately, this requires https for ServiceWorker, which not all sites may have.
One thing I noticed in the spec it mentions:
if r's url's scheme is not one of "http" and "https", then:
Throw a TypeError."
So I'm confused - can the site be http, as long as it includes a serviceworker that is from https? For example, mydomain.com could include an https serviceworker from https://anotherdomain.com?
Another standard, web-api simple-push, doesn't mention requiring https (likely an omission in the documentation?), and "The user experience on Firefox Desktop has not been drawn out yet". Is the documentation on this outdated, or is push really only supported in FirefoxOS??
Simple-push, that is the current push solution in Firefox OS doesn't have anything to do with ServiceWorkers.
The next generation of push, implemented by both Google and Mozilla will be done through ServiceWorkers:
Push API spec
In that case yes, your content will need to be served over HTTPS.
Probably you will be interested in the LetsEncrypt initiative:
letsencrypt.org
A new certification authority that will help developers to transition their content over HTTPS.
Also just for development purposes, both Google and Mozilla implementations of ServiceWorkers allow you to bypass the check of the secure content, if you develop against localhost.
In the case of Mozilla you will need to enable the flag:
devtools.serviceWorkers.testing.enabled: true
But again this will be just for development, and AFAIK, Mozilla push landed or is about to land, and will be available in the nightly builds, you can follow the work here:
https://bugzilla.mozilla.org/show_bug.cgi?id=1038811
No, the new generation of push notifications (i.e. Push API) requires HTTPS.
If you need to add push notifications to a website without HTTPS you can use a third-party service like Pushpad (I am the founder) that delivers notifications on your behalf.
The text you cited from the spec is from the Cache.addAll() section (5.4).
Here's the summary of addAll() on MDN:
The addAll() method of the Cache interface takes an array of URLS, retrieves them, and adds the resulting response objects to the given cache. The request objects created during retrieval become keys to the stored response operations.
Service workers can request & cache URLs that are either HTTP or HTTPS, but a Service Worker itself can only work in its registered Scope (which must be HTTPS).
simple-push is not related to Service Workers; it seems comparable to the approaches other platforms have taken:
Apple Push Notifications
Google Cloud Messaging
I found a nice bypass workaround to allow notifications from websites and domains without SSL, hence http:// and not https:// for Firefox.
Firefox holds a file inside the Mozilla directory called permissions.sqlite which is a sqlite database file that holds the permissions for domains. You can add your domain there http://yourdomainname with permissions for notifications and it will work.
I have created a demonstration for Windows here https://gist.github.com/caviv/8df5fa11a98e0e33557f75215f691d54 in golang

Link to http or https

While adding a hyperlink to another site (which has SSL), the site documentation sometimes say to link to the http:// link instead of the https:// (e.g. Play store, which is a site that uses SSL but it does not tell you to link to https; instead, it says to link to http). They do not matter (as they function normally), but would there be a reason to link to the http:// instead the https://?
Maybe they don't want extra encryption and lowering down the site speed as SSL may decrease performance somewhat.
If users are downloading large, public files, there may be a system burden to encrypt these each time.
Some browsers may not support SSL.
You will probably want the home page accessible via HTTP, so that users don't have to remember to type https to get to it.
Your specific portion of page needs secure http(https) not whole site.
Your site is indexed mainly on http on Search engines.

How do I stop search engines indexing a maintenance page

I need to setup a maintenance page for a website I'm running, e.g. for display when I'm performing site maintenance (scheduled downtime) or if something really breaks and I need to put up a holding page.
Is there anything special I need to do to ensure that search engine crawlers don't index it and think that it's my site. Or should I do a 404, add a temporary robots.txt file or something? I basically don't want them to index it as my site, but I also don't want them to think my site is dead and not come back.
Edit: Here's what I did in Apache: ErrorDocument 503 /.server-maintenance.html RewriteEngine On RewriteRule !^.server-maintenance.html /server-maintenance Redirect 503 /server-maintenancestrong text
You should send a 503 Service Unavailable HTTP status code, and not a 404. Use this in conjunction with a Retry-After header to tell the robots when to come back.
You may use a robots.txt
http://www.robotstxt.org/
Also, google has a validator in their webmasters tools.
https://www.google.com/webmasters/tools/
Returning 503 Service Unavailable tells Google bots to come back later. There's a Google support page describing the HTTP error codes and how they are interpreted by them.
You can also use Retry-After response header to suggest the minimum time after which your site is re-checked for availability.
Another approach would be to not link the maintenance page from any other page on your website (or any other website).