Protocol-less absolute URIs, with host, in HTML? - html

I have seen some pages that refer to what appear to be absolute URIs, with a host, but without a protocol. For example:
<script src="//mc.yandex.ru/metrika/watch.js" type="text/javascript"></script>
My assumption is that this means 'use the same protocol as what we are on now', so the parent page will request https://mc.yandex.ru/metrika/watch.js if its own protocol is https.
Is this syntax even correct? Part of a standard? What does it mean?

It's called a "network path reference". The documentation for this can be found in RFC 3986. Specifically, see section 4.2:
A relative reference that begins with two slash characters is termed
a network-path reference; such references are rarely used.
And section 5.4:
Within a representation with a well defined base URI of
http://a/b/c/d;p?q
a relative reference is transformed to its target URI as follows...
"g:h" = "g:h"
...
"//g" = "http://g"
...
So a URI starting with a double slash is transformed to match the base URI. One use of this that I know of (in fact, the only use I've ever seen) is when using a CDN (for example, when including jQuery via the Google CDN). Google hosts a version on the http protocol, and another on the https protocol, and using this URI format will cause the correct version to be loaded, no matter which protocol you are using.
Update (having just found and read this article)
It appears that using this URI format throughout a page can prevent the "This Page Contains Both Secure and Non-Secure Items" error in IE. However, it's worth noting that this format causes files included via a link element, or an #import directive cause the included file to be requested twice. All other resources (such as images and anchors) should work as expected.

Related

What's the difference between the two css link below, and what are the purposes of the two links? [duplicate]

I just learned from a colleague that omitting the "http | https" part of a URL in a link will make that URL use whatever scheme the page it's on uses.
So for example, if my page is accessed at http://www.example.com and I have a link (notice the '//' at the front):
Google
That link will go to http://www.google.com.
But if I access the page at https://www.example.com with the same link, it will go to https://www.google.com
I wanted to look online for more information about this, but I'm having trouble thinking of a good search phrase. If I search for "URLs without HTTP" the pages returned are about urls with this form: "www.example.com", which is not what I'm looking for.
Would you call that a schemeless URL? A protocol-less URL?
Does this work in all browsers? I tested it in FF and IE 8 and it worked in both. Is this part of a standard, or should I test more browsers?
Protocol relative URL
You may receive unusual security warnings in some browsers.
See also, Wikipedia Protocol-relative URLs for a brief definition.
At one time, it was recommended; but going forward, it should be avoided.
See also the Stack Overflow question Why use protocol-relative URLs at all?.
It is called network-path reference (the part that is missing is called scheme or protocol) defined in RFC3986 Section 4.2
4.2 Relative Reference
A relative reference takes advantage of the hierarchical syntax
(Section 1.2.3) to express a URI reference relative to the name space
of another hierarchical URI.
relative-ref = relative-part [ "?" query ] [ "#" fragment ]
relative-part = "//" authority path-abempty
/ path-absolute
/ path-noscheme
/ path-empty
The URI referred to by a relative reference, also known as the target URI, is obtained by applying the reference resolution
algorithm of Section 5.
A relative reference that begins with two slash characters is
termed a network-path reference (emphasis mine); such references are rarely used.
A relative reference that begins with a single slash character is termed an absolute-path reference. A relative reference that does not begin with a slash character is termed a relative-path reference.
A path segment that contains a colon character (e.g., "this:that") cannot be used as the first segment of a relative-path reference, as it would be mistaken for a scheme name. Such a segment must be preceded by a dot-segment (e.g., "./this:that") to make a relative- path reference.

Images with protocol-relative source URLs don't load in Firefox

I use the following syntax in my application so that images are loaded over https or http depending upon how the page was loaded.
<img src="//path_to_image.jpg">
This works fine in Chrome but firefox does not display any images.
What can I do to fix this ?
The // use to support multiple protocol (i.e. http or https), these type of URL is known as "protocol relative URLs" and use with complete domain name. Mostly CDN url are used with //.
If you are planning to use // make sure you use full domain url (i.e. //xyz.com/images/path_to_image.jpg). If you just want to use relative path from the root then use single slash (i.e. /)
Following like help you to understand // usage
Two forward slashes in a url/src/href attribute
Using // will only work with a full URL ('//yourhost.tld/directory/path_to_image.jpg'). In your case one slash /, should be enough!

How do user agents distinguish domains from file extensions in relative urls?

Let's say a browser encounters a link like this:
<a href='stackoverflowhome.html'>home</a>
This is clearly a relative url to an html file in the current directory, but how does the browser know that the .html is a file extension, and not a TLD (top level domain)? Does it have a list of common file extensions, or a list of TLDs? And if so, is it manually updated whenever a new file format becomes commonly used, or when the list of accepted TLDs change, for example with brand tlds?
It's because that is how RFC 3986 specified that URIs should be parsed. If the URI does not have a scheme (a set of characters followed by a colon - e. g. http: or gopher:) then it must be treated as a relative URI. Quoting from the RFC:
A URI-reference is either a URI or a relative reference. If the
URI-reference's prefix does not match the syntax of a scheme followed
by its colon separator, then the URI-reference is a relative
reference.
User-agents are allowed to make their best guess about what the user meant (see section 4.5) especially in cases where the context is ambiguous (such as URL bars in browsers) but the RFC recommends against it where the URI will be around for a long time as the best guess of user-agents will change over time, thus leading to URIs that don't resolve to the same resource depending on the time they are accessed or the user-agent they are accessed with.
Relative URLs are never domain names.
A URL is only parsed as containing a domain name if it has a protocol. (or is protocol-relative).
The URL does not start with a protocol specifier - no http:// or https://, so is interpreted as a relative URL.

Absolute Paths beginning with two slashes

I noticed that Wikipedia links pointing to a path on a different Wikipedia subdomain use a link with the following syntax: //<SERVER_NAME>/<REQUEST_URI>. For example, a link from a file page to the file appears (for example) as //upload.wikimedia.org/wikipedia/en/9/95/Stack_Overflow_website_logo.png. I am familiar with absolute paths (thinking twice about that now) and relative paths and how to use them. However, I have never seen this use. I assume this points to a new server name using the current protocol. Is this correct? And is there an official name (or widely accepted name) for this?
You are absolutely right. A link to //some/path is a protocol relative path.
Namely, if you are currently on http://something.example.com, a link to //google.com would point to http://google.com.
If you are currently on https://something.example.com, a link to //google.com would point to https://google.com.
The most common use of this can be seen in the html5 boilerplate.
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.5.1/jquery.js"></script>
Kindly google provides its javascript cdn over both http and https.
Thereby to avoid security warnings, we load it over https if we are on https, or http if we are on http.
note:
Unfortunately, you can't do the same thing for google analytics.
they use the domains ssl.google-analytics.com and www.google-analytics.com for https and http.
It looks like these //example.com URIs are called "Scheme Relative" or "Protocol Relative", and there is more information about it at this question:
Network-Path Reference URI / Scheme relative URLs
EDIT:
Apparently this might actually be called a "network-path reference" as seen here:
https://www.rfc-editor.org/rfc/rfc3986#section-4.2
Quote:
A relative reference that begins with two slash characters is termed
a network-path reference; such references are rarely used. A
relative reference that begins with a single slash character is
termed an absolute-path reference. A relative reference that does
not begin with a slash character is termed a relative-path reference.

what is the # symbol in the url

I went to some photo sharing site, so when I click the photo, it direct me to a url like
www.example.com/photoshare.php?photoid=1234445
. and when I click the other photo in this page the url become
www.example.com/photoshare.php?photoid=1234445#3338901
and if I click other photos in the same page, the only the number behind # changes. Same as the pretty photo like
www.example.com/photoshare.php?album=holiday#!prettyPhoto[gallery2]/2/
.I assume they used ajax because the whole page seems not loaded, but the url is changed.
The portion of a URL (including and) following the # is the fragment identifier. It is special from the rest of the URL. The key to remember is "client-side only" (of course, a client could choose to send it to the server ... just not as a fragment identifier):
The fragment identifier functions differently than the rest of the URI: namely, its processing is exclusively client-side with no participation from the server — of course the server typically helps to determine the MIME type, and the MIME type determines the processing of fragments. When an agent (such as a Web browser) requests a resource from a Web server, the agent sends the URI to the server, but does not send the fragment. Instead, the agent waits for the server to send the resource, and then the agent processes the resource according to the document type and fragment value.
This can be used to navigate to "anchor" links, like: http://en.wikipedia.org/wiki/Fragment_identifier#Basics (note how it goes the "Basics" section).
While this used to just go to "anchors" in the past, it is now used to store navigatable state in many JavaScript-powered sites -- gmail makes heavy use of it, for instance. And, as is the case here, there is some "photoshare" JavaScript that also makes use of the fragment identifier for state/navigation.
Thus, as suspected, the JavaScript "captures" the fragment (sometimes called "hash") changing and performs AJAX (or other background task) to update the page. The page itself is not reloaded when the fragment changes because the URL still refers to the same server resource (the part of the URL before the fragment identifier).
Newer browsers support the onhashchange event but monitoring has been supported for a long time by various polling techniques.
Happy coding.
It's called the fragment identifier. It identifies a "part" of the page. If there is an element with a name or id attribute equal to the fragment text, it will cause the page to scroll to that element. They're also used by rich JavaScript apps to refer to different parts of the app even though all the functionality is located on a single HTML page.
Lately, you'll often see fragments that start with "#!". Although these are still technically just fragments that start with the ! character, that format was specified by Google to help make these AJAXy pseudo-pages crawlable.
The '#' symbol in the context of a url (and other things) is called a hash, what comes after the hash is called a fragment. Using JavaScript you can access the fragment and use its contents.
For example most browsers implement a onhashchange event, which fires when the hash changes. Using JavaScript you can also access the hash from location.hash. For example, with a url like http://something.com#somethingelse
var frag = location.hash.substr(1);
console.log(frag);
This would print 'somethingelse' to the console. If we didn't use substr to remove the first character, it frag would be: '#somethingelse'.
Also, when you navigate to a URL with a hashtag, the browser will try and scroll down to an element which has an id corresponding to the fragment.
http://en.wikipedia.org/wiki/Fragment_identifier
It is the name attribute of an anchor URL: http://www.w3schools.com/HTML/html_links.asp
It is used to make a bookmark within an HTML page (and not to be confused with bookmarks in toolbars, etc.).
In your example, if you bookmarked the page with the # symbol in the URL, when you visit that bookmark again it will display the last image that you viewed, most likely an image that has the id of 3338901.
hey i used sumthing like this .... simple but useful
location.href = data.url.replace(/%2523/, '%23');
where data.url is my original url . It substitutes the # in my url