How do browsers interpret hrefs that start with "http:/"? - html

Some of my users are creating links that look like
<a href='http:/some_local_path'>whatever</a>
I've noticed that Firefox interprets this as
<a href='/some_local_path'>whatever</a>
Can I count on this occurring in all browsers? Or should I remove the http:/s myself?

This is an unusual URL, but it is not invalid. The URL spec says that omitted components are defaulted from the base URL, which can be provided explicitly in a <base> tag or, absent that, is the current URL of the page.
When a browser sees /some_local_path, it is missing a scheme and a host, so it takes them from the base URL. When your users put http:/some_local_path, it has an explicit scheme but is missing a host, so the host defaults to the one in the base URL. If your page is an http: page, then the two URLs will be interpreted identically. If the page is served over a different scheme, such as https:, the schemes no longer match and you should not expect the same result.
All that said, those URLs are almost certainly not what your users intended. You'll be helping them if you point out their error.
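The resolution described above can be sketched with Python's urllib.parse.urljoin, used here only as a stand-in for a lenient resolver. Browsers follow their own URL parsing rules and may differ in detail, and the example.com page addresses are assumptions made for illustration.

```python
from urllib.parse import urljoin

# Hypothetical page addresses; example.com stands in for the real site.
HTTP_PAGE = "http://example.com/docs/index.html"
HTTPS_PAGE = "https://example.com/docs/index.html"


def resolve(base: str, href: str) -> str:
    """Resolve href against base with Python's lenient (non-strict) resolver."""
    return urljoin(base, href)


# Same scheme as the page: both forms end up on the page's own host.
same_scheme_plain = resolve(HTTP_PAGE, "/some_local_path")
same_scheme_odd = resolve(HTTP_PAGE, "http:/some_local_path")

# Different scheme from the page: the href is no longer treated as relative.
other_scheme_odd = resolve(HTTPS_PAGE, "http:/some_local_path")

print(same_scheme_plain)
print(same_scheme_odd)
print(other_scheme_odd)
```

If the first two results match and the third does not point at the https: site, that is a reasonable hint that stripping the stray http:/ prefix yourself is the safer option.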

It's always best to validate data entered by users. Inevitably, you'll get something unexpected.

Related

What is the difference between these URL syntaxes?

I was sent a hyperlink to a Tableau Public link by a client. When I tried opening it, I got a 404 error. I wrote back to the client, who told me the link was working fine. I visited his profile page and was able to open the presentation there, but the URL that ended up working was slightly different from the one behind the original, non-functioning link.
Here's the anonymized URL behind the original link
https://public.tableau.com/profile/[client_name]%23!/vizhome/Project-AirportDelay/FlightPerformancesinUSA?publish=yes
And here's the URL via the profile page:
https://public.tableau.com/profile/[client_name]#!/vizhome/Project-AirportDelay/FlightPerformancesinUSA
The only differences I see are ?publish=yes and %23!. I tried appending the former, ?publish=yes, to the working URL, and it was still functional. So I suspect that it has to do with the other difference, %23! vs. #!. Could the first work for him because he is opening it from his computer, where he is likely logged in to Tableau Public? What's the difference between these syntaxes? Any ideas about why the original hyperlink might not be functional?
For obvious privacy reasons, I can't provide the whole URL.
It looks like ?publish=yes follows the basic URL pattern for passing filters, and %23 is the URL-encoded representation of #.
The first # after the authority component starts the fragment component. If the # should be part of the path component or the query component, it has to be percent-encoded as %23.
As # is a reserved character, these URIs aren’t equivalent:
http://example.com/foo#bar
http://example.com/foo%23bar
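To make the difference concrete, here is a small sketch using Python's urllib.parse (one generic URL parser among many, not anything Tableau-specific) that splits both example URIs into their components.

```python
from urllib.parse import urlsplit, unquote

with_fragment = urlsplit("http://example.com/foo#bar")
with_encoded_hash = urlsplit("http://example.com/foo%23bar")

# A literal "#" ends the path and starts the fragment.
print(with_fragment.path, with_fragment.fragment)

# "%23" stays inside the path, so the fragment is empty.
print(with_encoded_hash.path, with_encoded_hash.fragment)

# Decoding the path shows that "%23" stands for a "#" that belongs to the path.
decoded_path = unquote(with_encoded_hash.path)
print(decoded_path)
```

A server receiving the second URI is asked for a resource whose path literally contains "#", which is a plausible reason for a 404.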
There are countless ways a URI reference could become erroneous. The culprit is often a piece of software, like a word processor, where someone pastes the correct URI and the software incorrectly percent-encodes it (maybe assuming that the user didn’t paste the real/correct URI).
Copy-pasting the URI from the browser address bar into a plain text document should always work correctly.

Forward slashes in http protocol declaration in URL

I have just noticed that HTML form validation for an input of type url does not require the double forward slash '//' after the protocol:. I tried entering URLs without the forward slashes into many browsers and they all work, e.g. http:www.web-dewd.com works in Chrome, Firefox, Edge, Opera, and dare I say it, even IE11.
I cannot find any specific definition which states whether they are required or not. I spent a good few minutes on https://www.w3.org/standards/ without any luck. The best I could find was an interview with Tim Berners-Lee stating they are not required: http://www.dailymail.co.uk/sciencetech/article-1220286/Sir-Tim-Berners-Lee-admits-forward-slashes-web-address-mistake.html :
But with the colon in there as well, it turns out people never use the slash slash...
This article from ZDNet states:
there is practically no reference to the double forward-slash on the web
I would argue that slashes are recommended, but does anyone know and is able to provide evidence of what the correct standard is?
Somewhat ironically, Stack Overflow does require // when entering a link, as do other editors when deciding whether to convert text to a link, e.g. Microsoft Outlook.
Source
Prefix: To be a Uniform Resource Locator as currently defined by the URI
working group, the whole string must start with a constant prefix
"URL:"
This part says that a valid URL starts with protocol: and does not state anything about //.
Internet protocol parts Those schemes which refer to internet protocols mostly have a
common syntax for the rest of the object name. This starts with a
double slash "//" to indicate its presence, and continues until the
following slash "/".
In other words, a URL string must start with protocol:, and // is just the common syntax that marks where the domain name starts.
When matching URLs for replacement, you usually look for http[s]:// rather than http[s]:. That is just common practice, and does not mean that all web developers will write URLs that way.
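As a rough illustration of what the double slash signals, the sketch below feeds both spellings to Python's urllib.parse.urlsplit, a generic parser that does not apply the extra leniency browsers add for http: URLs. Only with "//" does it treat what follows as the host.

```python
from urllib.parse import urlsplit

with_slashes = urlsplit("http://www.web-dewd.com")
without_slashes = urlsplit("http:www.web-dewd.com")

# With "//", the text after it is parsed as the network location (host).
print(with_slashes.scheme, repr(with_slashes.netloc), repr(with_slashes.path))

# Without "//", the scheme is still recognized, but the rest is just a path.
print(without_slashes.scheme, repr(without_slashes.netloc), repr(without_slashes.path))
```

Browsers go further and repair the second form for http: and https:, which is why it appears to work when typed into an address bar.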

Chrome file download name incorrect

I have the following line
<a id="export" href="data:text/plain;base64,MDow" download="fname">Download</a>
I'd expect this to save a file as fname.txt; however, Chrome always saves it as download.txt. I tested this on Firefox, and it gives the expected behavior: fname.txt.
This suggests that this behavior is intentional and won't be fixed, so my question is, is there a way to download it with the correct filename?
It's not just a security concern.
The pseudo-specification says:
The attribute can furthermore be given a value, to specify the file
name that user agents are to use when storing the resource in a file
system. This value can be overridden by the Content-Disposition HTTP
header's filename parameters. [RFC6266] In cross-origin situations,
the download attribute has to be combined with the Content-Disposition
HTTP header, specifically with the attachment disposition type, to
avoid the user being warned of possibly nefarious activity. (This is
to protect users from being made to download sensitive personal or
confidential information without their full understanding.)
Of course, if the browser detects a header that contradicts your value, it will not use your definition.
That is not your case, though.
Your base64 text is badly formatted.
This should work:
<a id="export" href="data:text/plain;base64,MDow" download="fname.txt"> Download</a>
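One way to check whether a data URI's payload is well-formed base64 is to build and decode it programmatically. The sketch below uses Python's standard base64 module; the helper names make_data_uri and decode_payload are hypothetical and exist only for this illustration.

```python
import base64


def make_data_uri(text: str, mime: str = "text/plain") -> str:
    """Build a base64 data URI from text (hypothetical helper)."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f"data:{mime};base64,{payload}"


def decode_payload(data_uri: str) -> bytes:
    """Decode the part after the first comma; raises if it is not valid base64."""
    _header, _sep, payload = data_uri.partition(",")
    return base64.b64decode(payload, validate=True)


uri = make_data_uri("0:0")
print(uri)
print(decode_payload("data:text/plain;base64,MDow"))
```

If decode_payload raises, the payload is malformed; if it returns the expected bytes, the problem lies elsewhere, for example in how the browser picks the file name.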
Also, if you need a character set other than UTF-8, try a different encoding, for example ISO-8859-1.
example 2
<a download="fname" href="" >image code</a>
Note: the ".txt" extension in the download attribute is not necessary, but it is good practice.
Edit 2: Tested in 37.0.2058.2
Fiddle:
http://jsfiddle.net/yL8UZ/

Why is IE10 removing URL hash marks on external redirect links

I have a basic link:
Free Pie Here
but when I click on it, I'm redirected to https://pieworld.com/apple
Everything after the hash mark, as well as the hash itself, is dropped. This is only happening in IE10. I've tested without the target="_blank" as well, but the link still breaks at the hash.
Can't seem to find any documentation on this. The closest I've come to is this SO question, but it doesn't help.
Some background info that might help:
This is a .Net site
I'm redirecting from a http: to a https: site.
According to RFC 3986 (https://www.rfc-editor.org/rfc/rfc3986), it is not OK to use this format. You should remove the trailing slash. If you have a trailing slash, it points to a directory within the server. Without it, you point to a document, and with the hash mark you are allowed to point to a segment of the document. See example here.
A hash character is used for bookmarks in a URL. To use a hash character as part of the URL itself, you need to URL-encode it using %23:
Free Pie Here
Why do you have a trailing slash after the hash?
Try https://pieworld.com/apple/#1
That would be more standard. I've never heard of anyone putting trailing slashes after hash links.
I think, as the other folks suggested, that the website you are trying to navigate to may interpret the /#1 as a folder or page inside the parent page or document. Try removing the forward slash before the #1, or look inside the HTML for the header's id/name attribute so you can link to it directly.
May also be a bug in IE10.
Any URL that contains a # character is a fragment URL. The portion of the URL to the left of the # identifies a resource that can be downloaded by a browser and the portion on the right, known as the fragment identifier, specifies a location within the resource.
http://www.httpwatch.com/features.htm#print
In HTML documents, the browser looks for an element with id attribute matching the fragment. For example, in the URL shown above the browser finds a matching tag in the Printing Support heading:
<h3 id="print">Printing Support</h3>
and scrolls the page to display that section.
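The split between the downloadable resource and the fragment identifier can be sketched with Python's urllib.parse. This only illustrates the URL structure, not how IE10 behaves; the second URL is the one suggested earlier in this thread.

```python
from urllib.parse import urldefrag, urlsplit

# urldefrag separates the part the browser requests from the fragment.
resource_url, fragment = urldefrag("http://www.httpwatch.com/features.htm#print")
print(resource_url)  # what is fetched from the server
print(fragment)      # what the browser looks up inside the fetched document

# A slash before the hash belongs to the path; the fragment is what follows "#".
pie = urlsplit("https://pieworld.com/apple/#1")
print(pie.path, pie.fragment)
```

Because the fragment is resolved by the browser after the document has loaded, a redirect has to carry the fragment explicitly or rely on the browser to re-attach it.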
I am not sure if the slash after the hash is supported. If you didn't mean to use it as a fragment URL, you should remove the hash or replace it.
The syntax of the Location header field has been changed to allow all URI references, including relative references and fragments,
along with some clarifications as to when use of fragments would not be appropriate. (Section 7.1.2)
For more information, check this thorough post:
Hash removed from URL when back button clicked in IE9, IE10, IE11
In IE10, clicking the link for the first time opens the correct URL: http://www.example.com/yy/zz/ff/paul.html#20007_14
If you then click the back button and click the link again, IE10 opens this URL instead: http://www.example.com/yy/zz/ff/paul.html
Solution:
Change your URL to use https.
It worked for me.

Redirect from HTML deprecated?

I've heard that the usual way I redirect from an HTML page, like
<meta http-equiv="REFRESH" content="0;url=page.html">
is deprecated by the latest HTML. Is it true or not, and if so, what other ways are there to redirect?
The proper way to redirect is to send redirect headers.
You need to change status from 200 OK to appropriate 3xx status. Then you also need to include Location: http://yourRedirectURL header. The implementation depends on what programming language you are using in the back-end.
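As a minimal sketch of such a server-side redirect, here is a tiny WSGI application in Python. The target URL, the choice of 301 (permanent) rather than another 3xx code, and the name redirect_app are all assumptions made for illustration; the same idea applies in any back-end language.

```python
# Hypothetical destination; replace with the real address.
TARGET = "http://example.com/page.html"


def redirect_app(environ, start_response):
    """WSGI app that answers every request with a 3xx status and a Location header."""
    headers = [
        ("Location", TARGET),     # where the client should go next
        ("Content-Length", "0"),  # no body is needed for a redirect
    ]
    # 301 is assumed here; use 302, 307, or 308 when they fit the situation better.
    start_response("301 Moved Permanently", headers)
    return [b""]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("127.0.0.1", 8000, redirect_app).serve_forever() should be enough; the port number is arbitrary.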
Using the Location header is both seamless and a more efficient way to redirect someone to another page, assuming you're just using a zero timeout anyways.
Unless you're placing them on a landing page first then redirecting them, use the Location header.
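I should also note that the original HTTP/1.1 specification (RFC 2616) said the Location header should be given a fully qualified address to land on, not an absolute or relative site-based path. RFC 7231 later relaxed this to allow relative references, but a fully qualified URL remains the most widely compatible choice. E.g.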
Location: http://www.google.com/
Instead of:
Location: /login
Location: ../../home
If you are using PHP, you can use the following code (prior to any other output to the browser); the exit stops the rest of the script from running after the header is sent:
<?php header('Location: http://example.com'); exit; ?>
It is technically not deprecated, but that’s just because the pseudo-term “deprecated” is sloppily used in the “spec”. The meta redirect mechanism is described as “should not” in HTML 4.01:
“Note. Some user agents support the use of META to refresh the current page after a specified number of seconds, with the option of replacing it by a different URI. Authors should not use this technique to forward users to different pages, as this makes the page inaccessible to some users. Instead, automatic page forwarding should be done using server-side redirects.”
The HTML5 drafts, though, describe the meta refresh mechanism without saying such things, though the examples are about a different use. This does not make it a better idea. It should not be used for redirecting an address to a new one, except when you have no way of affecting server behavior so that an appropriate HTTP redirect takes place. In that case, it is advisable to add a normal link to the new address in the document body, for situations where the meta redirect does not work.