Can I use a "protocol free" URL with rel=canonical? - html

SEO-wise, is it okay to use a protocol-free URL like this?
<link rel="canonical" href="//example.com" />
I redirect all users to HTTPS anyway.
By "protocol free" I mean using // instead of either http:// or https://.

If the spec co-written by Google employees (https://www.rfc-editor.org/rfc/rfc6596) is correct, then yes, any relative reference is ok.

Yes, you can: both the spec and Google allow a relative URI as a canonical href.
This does present a risk if your page is reached via the wrong protocol, e.g. the http page is visited when you want https canonical URLs. In that case the protocol-relative canonical value is interpreted as an http URL.
However, if you have your 301 redirects correctly set up to go from http to https, you will not have an issue, and it may actually be preferable in some cases to use a relative canonical URL.
Case in point: switching from an http site to https will lose you all the Facebook likes accumulated on your http URLs. In this case you may want Facebook to still crawl your http site, whilst redirecting all other user agents to https.
Facebook will then reinstate your old http page likes on both your http and https pages, but not if your page's canonical URL points to an absolute https URL. In this instance a protocol-relative canonical URL, such as //www.mysite.com, is very useful.
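To make the risk described above concrete, here is a minimal sketch of how a protocol-relative canonical resolves against the URL the page was fetched from. It uses Python's standard urllib.parse.urljoin as a stand-in for what a crawler does; the function name resolve_canonical and the example.com URLs are assumptions for illustration, not anything Google or Facebook documents.

```python
from urllib.parse import urljoin

def resolve_canonical(page_url: str, canonical_href: str) -> str:
    """Resolve a canonical href against the URL the page was fetched from."""
    return urljoin(page_url, canonical_href)

href = "//example.com"  # the protocol-relative value from the question

# The same href yields a different absolute URL depending on how the page was reached.
print(resolve_canonical("http://example.com/page", href))   # http://example.com
print(resolve_canonical("https://example.com/page", href))  # https://example.com
```

So the canonical only points at https if the page itself was fetched over https, which is why the http-to-https 301 redirects matter.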

Yes
The href attribute on a canonical link is like every href attribute on <link>: it accepts URIs, and URIs can be absolute or relative.
Moreover, the Canonical Link Relation spec confirms that.
So of course you can use a relative URL such as a protocol-free one.
But don't
I would still recommend always using full URLs: scheme, host, path...
Why? Because the canonical URL exists to keep robots from using the wrong URL.
A relative URL might still let bots end up with a wrong URL, whereas with a full URL you can be certain it is the right one.
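If you follow that advice, one way to check your own pages is a small script that pulls out the canonical href and tests whether it carries both a scheme and a host. This is only a sketch using Python's standard library; the names CanonicalFinder, canonical_hrefs and is_absolute are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])

def canonical_hrefs(html: str) -> list:
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.hrefs

def is_absolute(url: str) -> bool:
    """True only when the URL states both a scheme and a host."""
    parts = urlsplit(url)
    return bool(parts.scheme) and bool(parts.netloc)

for href in canonical_hrefs('<link rel="canonical" href="//example.com" />'):
    print(href, "absolute" if is_absolute(href) else "relative")
```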

I believe you can't, according to this website.
Make them 100% specific. For various reasons, a ton of sites use protocol relative links, meaning they leave the http / https bit from their URLs. Don’t do this for your canonicals. You have a preference. Show it.
Also, I'd recommend the https, because Google is using it as a ranking signal.

Related

Form Post HTTPS

I'm trying to submit a form back to my server using POST, and the target is at the same domain (which is HTTPS); however, when I submit I get a Mixed Content error. Does the form post not follow the same protocol as the hosting page? If so, what is the best way to fix it without always specifying the full URL (I use subdomains for different companies)?
Does the form post not follow the same protocol as the hosting page?
It does if a relative URL is specified.
e.g. path relative
action="/form/foo"
or protocol relative
action="//example.com/form/foo"
It appears something on your action page, or on the page it redirects to, is loading over plain HTTP. Use developer tools to hunt down this reference.
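Besides the developer tools, a rough way to hunt for such references is to scan the saved HTML for attributes that point at plain http:// URLs. The sketch below is an assumption-laden helper (the names InsecureReferenceFinder and find_insecure_references are invented here), it only looks at src, href and action attributes, and it will not see URLs built by scripts or referenced from inside CSS files.

```python
from html.parser import HTMLParser

URL_ATTRS = ("src", "href", "action")

class InsecureReferenceFinder(HTMLParser):
    """Records (tag, attribute, url) for references that use plain http://."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Plain links to other pages are not loaded into this page, so skip them.
            return
        for name, value in attrs:
            if name in URL_ATTRS and value and value.lower().startswith("http://"):
                self.found.append((tag, name, value))

def find_insecure_references(html: str) -> list:
    finder = InsecureReferenceFinder()
    finder.feed(html)
    return finder.found

sample = (
    '<form action="/form/foo" method="post"></form>'
    '<script src="http://cdn.example.com/a.js"></script>'
    '<a href="http://example.org/">elsewhere</a>'
)
for tag, name, url in find_insecure_references(sample):
    print(f"<{tag} {name}> loads {url}")
```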

What does // mean in an <a> tag

I'm writing a web crawler and I'm testing it out by starting at Wikipedia. However, I noticed that many of wikipedia's links are prefaced with //, so the link from wikipedia.org to en.wikipedia.org is a link to //en.wikipedia.org. What exactly does this // mean in practice? Does it say "use whatever scheme you were using before and then redirect to this url?" or does it mean something entirely different?
The link will use the same protocol (http or https) as the page that contains it. For example, if https://stackoverflow.com/ contains that link, it will lead to https://en.wikipedia.org
It keeps the protocol that is being used for the webpage: HTTP or HTTPS.
It's particularly useful for external script and CSS tags, when you don't know which protocol your site will be served over.
That's why on Google's hosted libraries page (https://developers.google.com/speed/libraries/devguide#jquery) you find something like this:
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"></script>
Just while writing this I found a duplicate: Two forward slashes in a url/src/href attribute
Take a look at it.
Yes, the browser will request that URL using the scheme of the current location.
In order for this to work, the resource this URL points to must be available under every scheme it's expected to be reached from (usually, both http and https).
It is a protocol-relative URL. It keeps http or https.
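For the crawler itself, the practical upshot is that every href should be resolved against the URL of the page it was found on before it is queued. A minimal sketch with Python's standard library follows; LinkCollector and extract_links are hypothetical names, and the sample markup is made up rather than copied from Wikipedia.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects <a href> values, resolved against the page's own URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # urljoin fills in the missing scheme (and host, if absent) from base_url.
                self.links.append(urljoin(self.base_url, href))

def extract_links(base_url: str, html: str) -> list:
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links

page = '<a href="//en.wikipedia.org">English</a> <a href="/wiki/Main_Page">Main</a>'
print(extract_links("https://www.wikipedia.org/", page))
```

Crawling the same markup from an http:// page would give http:// links instead, which matches the answers above.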

http content within https page

On my website, which is served from an SSL-certified URL, I need to include http content for static assets: JS, CSS, images, etc.
E.g. the page at https://www.example.com will refer to http://subdomain.example.com/images/a.jpg.
Is there a way to include HTTP element within HTTPS page, avoiding any security alerts in the browser?
No. The security warnings are there for a reason and cannot be bypassed.
What if someone altered the JS en-route with a man-in-the-middle attack? They could add their own code and have full access to the DOM, making the SSL worthless.
To be secure, you need to load the entire page over HTTPS, not just the HTML document. If you are secure, then the warnings about being insecure will go away.
You don't need to specify http:// when referencing your static assets. You can use a protocol-relative URL; basically you can do this:
<img src="//subdomain.abc.com/images/a.jpg" alt="">
It will get the image through the same protocol as the page you're on.
Paul Irish will explain that better than me on his blog
I don’t think you can bypass security alerts (on browsers that give them) if your https page includes http content. But check whether the references work with https as well (depends on the server of course). If they do, you can use URLs like //subdomain.example.com/images/a.jpg which means that the protocol (http or https) of the page itself will be used.

What URLs on an https page need to be https?

I did a search and could not find an answer on here to my question. What I am confused about is what URLs on an https page need to be https, as opposed to http.
I am making a series of pages on my website that must be accessed over https. At the top of the pages is a menu. I accidentally included a style sheet into the page using http, instead of https, and all the browsers I tried gave me a warning about insecure content. But, I can leave the menu links at the top of the page http, and there's no problem.
So, am I correct in saying that things that are being loaded onto the page, such as style sheets and images, need to have https in the link, but that plain old href links can just have http in them?
Thanks for your advice.
Generally, secure pages such as purchase or credit card processing pages are set to https; sometimes all pages are, as on websites for banks or other financial institutions, or even just login pages.
You can leave it to the browser to decide the http or https part by using what are called protocol-relative URLs, in which you simply do not specify either http or https and the browser will still be able to figure it out. An example:
//example.com
//google.com
Let's say your domain is foo.com, you would specify all URLs like:
//foo.com/page1.html
//foo.com/otherpage
So you simply leave out the http or https part in your URLs.
To learn more about protocol-relative URLs, see:
http://paulirish.com/2010/the-protocol-relative-url/
Yes, all links that are used to create the page itself (the HTML, the CSS, JavaScript, the images) need to be served over https. That means all URLs of that domain need to be served over https.
Links to other websites can be http just fine. You may want to check whether those links can be visited over https as well, because then the user will use a secure connection to visit those websites too.

html - links without http protocol

Is there a reason we include the http / https protocol on the href attribute of links?
Would it be fine to just leave it off:
my site
The inclusion of the “http:” or “https:” part is partly just a matter of tradition, partly a matter of actually specifying the protocol. If it is omitted, the protocol of the current page is used; e.g., //www.example.com becomes http://www.example.com or https://www.example.com depending on the URL of the referring page. If a web page is saved on a local disk and then opened from there, it has no protocol (just the file: pseudo-protocol), so URLs like //www.example.com won’t work; so here’s one reason for including the “http:” or “https:” part.
Omitting also the “//” part is a completely different issue altogether, turning the URL to a relative URL that will be interpreted as relative to the current base URL.
The reason why www.example.com works when typed or pasted on a browser’s address line is that relative URLs would not make sense there (there is no base URL to relate to), so browser vendors decided to imply the “http://” prefix there.
URLs in href are not restricted to HTTP documents. They support all the protocols supported by browsers: ftp, mailto, file, etc.
Also, you can precede a name with '#' to link to an HTML id within the page. You can give just a name or directory path, without a protocol, and it will be taken as a relative URL.
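The different cases mentioned in these answers can be tried out with Python's urllib.parse.urljoin, which follows the same general resolution rules that browsers apply to href values. This is just a sketch: the base URL and the hrefs are made-up examples, and resolve is a hypothetical helper name.

```python
from urllib.parse import urljoin

BASE = "https://example.org/dir/index.html"  # hypothetical page containing the links

def resolve(href: str, base: str = BASE) -> str:
    """Resolve an href the way a link on the page at `base` would be resolved."""
    return urljoin(base, href)

examples = [
    "//www.example.com",           # protocol-relative: scheme comes from the page
    "www.example.com",             # no "//": treated as a relative path, not a host
    "other.html",                  # ordinary relative path
    "#section",                    # fragment within the current page
    "mailto:someone@example.com",  # has its own scheme, left untouched
]
for href in examples:
    print(f"{href!r:32} -> {resolve(href)}")

# Opened from local disk, the page's scheme is file:, so "//host" no longer means http(s).
print(resolve("//www.example.com", base="file:///home/user/page.html"))
```

Note how www.example.com without the slashes ends up as a path under the current directory, which is the "completely different issue" described above.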
My solution was to trick the browser with a redirect service such as bit.ly or goo.gl (which will be discontinued soon), among others.
When the browser sees that the shortened URL is https, it lets the image link through and then displays the http image, without exposing the original link.
The annoying part is that the redirector's control panel will show thousands of "clicks" that are actually just displays.
With this experience I'm going to look for a WordPress plugin for redirection and create my own redirect links, so that https://mysite.com/id redirects to the http link.