Do canonical links require a full domain? - html

I want to add canonical links to my pages, but do I need to specify the domain, or will a relative URL do?
In other words, is:
<link rel="canonical" href="/item/1">
good enough, or do I need to use:
<link rel="canonical" href="http://mydomain.com/item/1">

Directly from Google:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
Can the link be relative or absolute?
rel="canonical" can be used with relative or absolute links, but we recommend using absolute links to minimize potential confusion or difficulties. If your document specifies a base link, any relative links will be relative to that base link.

Again, Google says this:
https://support.google.com/webmasters/answer/139066?hl=en
Avoid errors: use absolute paths rather than relative paths with the rel="canonical" link element.
Use this structure: https://www.example.com/dresses/green/greendress.html
Not this structure: /dresses/green/greendress.html
For example’s sake, these are their URLs:
http://example.com/wordpress/seo-plugin/
http://example.com/wordpress/seo/seo-plugin/
This is what rel=canonical was invented for. This (unfortunately) happens fairly often, especially in e-commerce systems, where a product has several different URLs depending on how you got there. You would apply rel=canonical as follows:
You pick one of your two pages as the canonical version. It should be the version you think is the most important one. If you don’t care, pick the one with the most links or visitors. If all of that’s equal: flip a coin. You need to choose.
Add a rel=canonical link from the non-canonical page to the canonical one. So if we picked the shortest URL as our canonical URL, the other URL would link to the shortest URL like so in the <head> section of the page:
<link rel="canonical" href="http://example.com/wordpress/seo-plugin/">
That’s it. Nothing more, nothing less.

All href attributes are hypertext references - that's what it stands for. As such, they are always URI-References, not URIs, and can be relative.
In this case though, there's a benefit in putting in the full URI if you can, because it will survive anything that migrates it onto another domain in the future (assuming you will still want the domain listed to be the canonical one), and can even survive some of the cruder automated plagiarisms :)
That benefit is pretty slight if you aren't actively using non-canonical versions on other domains though, so I wouldn't expend much effort on it.

There is nothing special about canonical. It’s a standard link type, for use with standard ways to provide links (e.g., the link element), so you can specify any kind of URL reference (absolute, relative, protocol-relative, in combination with the base element, empty, …).
RFC 6596 (The Canonical Link Relation) explicitly says:
The target (canonical) IRI MAY:
Specify a relative IRI (see [RFC3986], Section 4.2).
One of the examples:
[…] or as a relative IRI:
<link rel="canonical" href="page.php?item=purse">
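To see what each kind of reference ends up pointing at, here is a minimal Python sketch using the standard library's urllib.parse.urljoin, which performs RFC 3986 style reference resolution. The helper name resolve_canonical and the page URL are made up for illustration, and a real crawler may apply further normalization on top of this.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_canonical(page_url: str, href: str, base_href: Optional[str] = None) -> str:
    """Resolve a canonical href against the <base> URL if given, else the page URL."""
    # A <base href> may itself be relative, so resolve it against the page first.
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, href)


# Hypothetical page URL, used only for this illustration.
page = "http://mydomain.com/shop/item/1?ref=home"

print(resolve_canonical(page, "/item/1"))                     # http://mydomain.com/item/1
print(resolve_canonical(page, "http://mydomain.com/item/1"))  # http://mydomain.com/item/1
print(resolve_canonical(page, "//mydomain.com/item/1"))       # http://mydomain.com/item/1
print(resolve_canonical(page, "page.php?item=purse"))         # http://mydomain.com/shop/item/page.php?item=purse
print(resolve_canonical(page, "item/1", base_href="http://mydomain.com/"))  # http://mydomain.com/item/1
```

Note that the path-relative form depends on where the page itself lives, which is one reason the absolute form is the less surprising choice.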

Update on canonical best practices: rel="canonical" has cross-domain support. Google's source: https://webmasters.googleblog.com/2009/12/handling-legitimate-cross-domain.html
Moreover, the introduction of structured data makes getting canonicals right even more important, as Google will not pick up the JSON markup from non-canonical sources (a mistake I happen to have made!).

Relative canonical paths are accepted. This one works best:
<link rel="canonical" href="#"/>
It points to the current document's URL – including queries – sans the hash part.

If you only have one domain for that website, it is OK to use an absolute path without the domain:
<link rel="canonical" href="/item/1">

Related

What does "vr:canonical" mean?

What namespace/scheme/whatever does this "vr:canonical" come from?
<meta property="vr:canonical" content="URL_FROM_OTHER_PAGE" />
(edit: I know canonical links, just curious about that "vr:" thing)
Judging from the page (and the code of the trackers, counters, and advertising scripts it uses), this seems to be a property used by an advertising tool named "VisualRevenue", a product of the company Outbrain.
For example:
<link rel="canonical" href="https://moz.com/blog" />
This would tell Google that the page in question should be treated as though it were a copy of the URL above and that all of the link & content metrics the engines apply should technically flow back to that URL.
The Canonical URL tag attribute is similar in many ways to a 301 redirect from an SEO perspective. In essence, you're telling the engines that multiple pages should be considered as one (which a 301 does), without actually redirecting visitors to the new URL.

Having links relative to path (i.e. http://domain/path/)

Are there commonly accepted ways to have all links and references to images, scripts, stylesheets be relative to some path regardless of current document's URL?
Let's start from the very beginning. I am developing a custom content management system in PHP. I am using mod_rewrite to redirect all requests like http://domain.com/path/artist/edit/25 to http://domain.com/path/index.php?url=/artist/edit/25. So the part of the URL following http://domain.com/path/ is actually virtual.
I would like all links to be in the format like ... and references to images, scripts, etc. in the format like <link href="ui/css/style.css"...>.
Well, it seems to be possible with:
...
<base href="http://domain.com/path/" />
...
This way I can link to scripts and stylesheets in a way like below:
...
<!-- Custom page style CSS -->
<link href="ui/css/style.css" rel="stylesheet" type='text/css'>
<!-- Support for CSS3 media query in IE8 -->
<script type="text/javascript" src="ui/js/respond.js"></script>
<!-- MooTools 1.6.0 -->
<script type="text/javascript" src="ui/js/MooTools-Core-1.6.0.js"></script>
...
However, AFAIK the <base href=...> should match the current page request (which is http://domain.com/path/artist/edit/25). And it ruins the whole concept.
That's why I need you to clarify:
Is it a commonly accepted practice to have <base href=...> pointing to a directory and not to the current document URL?
Does this practice comply with the requirements for the usage of HTML <base> element?
Will it in any way affect crawlers like Googlebot? Do they require the <base href=...> to match each particular document URL?
I would also like to know how you solve the problem of relative links and references to resources when some part of the URL is virtual. I have discovered that projects like WordPress tend to completely avoid relative links and go the "absolute links way".
The whole point of the base element is to specify an arbitrary base URL to be used to resolve relative links instead of the current-document URL. Otherwise the element would not make sense since current-document URL is used as the base url by default anyway.
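The effect on the URLs from the question can be checked with a short Python sketch. It uses urllib.parse.urljoin from the standard library as a stand-in for the browser's resolution step; the variable names are illustrative only.

```python
from urllib.parse import urljoin

# URLs taken from the question.
document_url = "http://domain.com/path/artist/edit/25"
base_href = "http://domain.com/path/"
asset = "ui/css/style.css"

# Without <base>, the relative reference resolves against the document URL.
without_base = urljoin(document_url, asset)
# With <base href="http://domain.com/path/">, it resolves against the base instead.
with_base = urljoin(base_href, asset)

print(without_base)  # http://domain.com/path/artist/edit/ui/css/style.css
print(with_base)     # http://domain.com/path/ui/css/style.css
```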
Major crawlers support both absolute and relative URLs as well as the base element. Some shake-and-bake crawlers don’t understand relative URLs and/or don’t support the base element (thus resulting in multiple 404 lines in your server logs, though this is a minor thing).
I would recommend not using the base element. Relative links tend to be error-prone, resulting in wrongly resolved URLs while not providing any serious benefits. It's generally more reasonable and easier to always use absolute URLs.
Is it a commonly accepted practice to have <base href=...> pointing to a directory and not to the current document URL?
No, it's not common. In fact I'd say it's very uncommon, because there are better ways to create a logical information architecture for your site without it.
Will it in any way affect crawlers like Googlebot? Do they require the <base href=...> to match every particular document URL?
It's hard to get the base tag correct and there are ways to do what you want using better methods that are transparent to googlebot etc.
Note that absolute links are what you're seeing in the source, but that does not mean the links physically map to directories and files. Using tools like mod_rewrite on Apache, you can structure your site in as many ways as you please on top of practically any physical filesystem. This is also what I'd recommend, because as things change you're not tied to a particular solution. This is also why most PHP apps send everything through an index.php script: the application then controls the information architecture, not the filesystem.
"base href" can be used without problems, but it is not always the best solution. It is fine if your server will answer requests with different server names and paths (e.g. "http://www.example.com/companysection/especificservice" and "http://service.internalnetwork.dev/").
IMHO it's not the best solution for your case.
In the URL "http://example.com/path/index.php?url=/artist/edit/25" you want to transform part of the query into a path ( base example.com/path/index.php ?url= )... and this can be a big problem. How are you going to handle URLs that also have a query of their own? (receiving a search term or a form GET, for example)
Apache mod_rewrite would be a better option, as Harry's answer suggests (or nginx rewrite rules). With it you can easily "transform" a request like http://example.com/path/artist/edit/25?search=something&order=ASC into http://example.com/path/index.php?url=artist/edit/25&search=something&order=ASC
This will give you fewer problems in the long term.
Check the last example in https://wiki.apache.org/httpd/RewriteQueryString , it's really close to fulfilling all your rewriting needs
(you will just need to ensure you handle the rest of query properly)
Take a URL of the form http://example.com/path/var/val and transform
it into a var=val query http://example.com/path?var=val. Essentially
the reverse of the above recipe. This example will work for any valid
three level URL. http://example.com/path/var/val will be transformed
into http://example.com/path?var=val.
RewriteRule ^/path/([^/]+)/([^/]+) /path?$1=$2
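For the mapping this answer is after (virtual path plus the original query string into index.php?url=...), the intended result can be sketched in Python. This is only an illustration of the URL transformation, not Apache configuration; the function name to_front_controller and the PREFIX constant are assumptions made for the example. In mod_rewrite itself, the QSA flag is the usual way to keep the original query string appended.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

PREFIX = "/path/"  # mount point taken from the example URLs


def to_front_controller(url: str) -> str:
    """Map /path/<virtual>?<query> to /path/index.php?url=<virtual>&<query>."""
    parts = urlsplit(url)
    if not parts.path.startswith(PREFIX):
        return url  # outside the mount point: leave untouched
    virtual = parts.path[len(PREFIX):]
    query = "url=" + virtual
    if parts.query:
        # Keep the caller's own parameters after the url parameter.
        query += "&" + parts.query
    return urlunsplit((parts.scheme, parts.netloc, PREFIX + "index.php", query, ""))


rewritten = to_front_controller(
    "http://example.com/path/artist/edit/25?search=something&order=ASC"
)
print(rewritten)
# http://example.com/path/index.php?url=artist/edit/25&search=something&order=ASC

# What the front controller would then see as parameters.
params = parse_qs(urlsplit(rewritten).query)
print(params["url"], params["search"], params["order"])
```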

Can the canonical tag be used on all pages?

I'm working on a site that has had an SEO expert review it. They have advised me that we should apply canonical tags on every other page
<!-- http://www.example.com/detail/table&r=dining-room -->
<link rel="canonical" href="http://www.example.com/detail/table"/>
Is it really required that the canonical tag only appears on every other page, or will it play nicely if it appears on the canonical page itself?
The reason I ask this is: isn't the link also telling Google that it is in fact on the right page?
RFC 6596: The Canonical Link Relation specifies:
The target (canonical) IRI MAY:
o […]
o Be self-referential (context IRI identical to target IRI).
So, yes, you can use rel-canonical even on the canonical page.
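A quick way to see whether a given page's canonical is self-referential is to extract it and compare it with the page URL. The following Python sketch uses only the standard library; the class and function names (CanonicalFinder, canonical_of) are invented for this example, and the comparison is a plain string match with no URL normalization.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens and self.href is None:
            self.href = attributes.get("href")


def canonical_of(page_url: str, html: str) -> Optional[str]:
    """Return the canonical URL declared in html, resolved against page_url."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.href is None:
        return None
    return urljoin(page_url, finder.href)


# Markup and URLs from the question.
markup = '<head><link rel="canonical" href="http://www.example.com/detail/table"/></head>'
variant = "http://www.example.com/detail/table&r=dining-room"
canonical_page = "http://www.example.com/detail/table"

print(canonical_of(variant, markup) == variant)                # False: points elsewhere
print(canonical_of(canonical_page, markup) == canonical_page)  # True: self-referential
```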
It seems to be an argument between Google and Bing. Google doesn't mind if you have the canonical tag pointing to itself. Bing does mind and you lose their trust for the use of canonical tags. Look at this article: http://www.northsideseo.com/google-vs-bing-canonical-tag/

What is <link rel="image_src">

Today I came across a <link rel="image_src"> tag. I didn't know about it, so I searched Google, which told me that this tag is similar to og:image. So I went to the Open Graph site, http://ogp.me/, to read about it, but I found nothing about link rel="image_src". Is this tag a replacement for meta property="og:image", or is it a special tag from another specification? How is this tag used, and what is it used for?
The rel attribute specifies the type of the link, i.e. the kind of the relationship between the document and the linked resource. Usually just a few keywords, like stylesheet and icon, are used. Although many other keywords have been proposed and registered, most of them are write-only: they are meant to express something, but nobody cares (no software uses the information).
The extension mechanisms of HTML5 include, in the description of link types, a somewhat obscure mechanism that allows, in theory, anyone to register their favorite keyword in the existing rel values wiki to make documents using it as a rel value “conforming”.
And image_src has indeed been registered there, with the information that it is used to “specify a Webpage Icon for use by Facebook, Yahoo, Digg, etc.”, no specification has been identified but an article about it is linked to, and it is “probably redundant with rel=icon”.
You can use this tag to choose the image shown as the thumbnail when a link to your page is shared.
When someone posts a link to your site on social media, such as Facebook, the image that is displayed with your link is usually the first one in your code. This may not be the image that best represents your site, and it may not fit well in the small box that Facebook displays. The link rel="image_src" tag lets you control what image (or images, you can have more than one by stacking separate references) is displayed alongside your link.

Is it a bad practice to use relative urls with an explicit base?

Is it a bad practice to use relative urls with explicit base for dynamic site?
For example, like this one:
<base href="http://my-site.com/mount-point-of-site">
...
<img src='/my-page/my-image.jpg'></img>
I need it because mount point of site can be changed over time, and I need to preserve referential integrity of wiki-like content produced by users (links to relative pages, relative image paths, ...).
But I never saw such technique in use for dynamic web applications, usually it's handled on the server-side.
Is there any specific disadvantages of such technique, that may bite me later? SEO, cross-browser / mobile compatibility, some other aspects?
I get what you're saying about applications not using absolute urls. You'll typically set the base url in a config file, not as a meta tag in that instance.
Best practice? Always use absolute URLs, so that if anyone links to your stuff or scrapes your links, things will still point to your site instead of theirs.
SEO folks will agree with the absolute url rule.
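Independently of the SEO question, the markup in the question has a resolution pitfall worth checking: a reference with a leading slash ignores the path of the base URL, and a base without a trailing slash loses its last segment. A minimal Python sketch with urllib.parse.urljoin (standing in for the browser's resolution step; the variable names are illustrative) shows the four combinations.

```python
from urllib.parse import urljoin

# Base URL from the question, with and without a trailing slash.
base_no_slash = "http://my-site.com/mount-point-of-site"
base_with_slash = "http://my-site.com/mount-point-of-site/"

rooted = "/my-page/my-image.jpg"    # leading slash, as in the question
relative = "my-page/my-image.jpg"   # no leading slash

# A leading slash replaces the whole path of the base, mount point included.
print(urljoin(base_no_slash, rooted))      # http://my-site.com/my-page/my-image.jpg
print(urljoin(base_with_slash, rooted))    # http://my-site.com/my-page/my-image.jpg

# Without the trailing slash, the last segment of the base is dropped.
print(urljoin(base_no_slash, relative))    # http://my-site.com/my-page/my-image.jpg

# Only this combination stays under the mount point.
print(urljoin(base_with_slash, relative))  # http://my-site.com/mount-point-of-site/my-page/my-image.jpg
```

So if the goal is for user content to follow the mount point, the base would need the trailing slash and the references would need to drop the leading one.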