What is the difference between these URL syntax? - html

I was sent a hyperlink to a Tableau Public link by a client. When I tried opening it, I got a 404 exception. I wrote back to the client but was told by the same that the link was working fine. I visited his profile page and was able to open the presentation there, but the URL that ended up working was slightly different than the one behind the original, non-functioning link.
Here's the anonymized URL behind the original link
https://public.tableau.com/profile/[client_name]%23!/vizhome/Project-AirportDelay/FlightPerformancesinUSA?publish=yes
And here's the URL via the profile page:
https://public.tableau.com/profile/[client_name]#!/vizhome/Project-AirportDelay/FlightPerformancesinUSA
The only differences I see are ?publish=yes and %23!. I tried appending the former, ?publish=yes, to the working URL, and it was still functional. So I suspect that it has to do with the other difference %23! vs. #!. Could the first work because he is opening it from his computer where he is likely logged onto Tableau Public? What's the difference between these syntax? Any ideas about why the original hyperlink might not be functional?
For obvious privacy reasons, I can't provide the whole URL.

It looks like the basic URL pattern for passing filters ?publish=yes
and
%23 is the URL encoded representation of #

The first # after the authority component starts the fragment component. If the # should be part of the path component or the query component, it has to be percent-encoded as %23.
As # is a reserved character, these URIs aren’t equivalent:
http://example.com/foo#bar
http://example.com/foo%23bar
There are countless ways how a URI reference could become erroneous. The culprit is often a software, like a word processor, where someone pastes the correct URI, and the software incorrectly percent-encodes it (maybe assuming that the user didn’t paste the real/correct URI).
Copy-pasting the URI from the browser address bar into a plain text document should always work correctly.

Related

International characters in website file names

I need to create a website (in PHP) that has filenames that include international characters.
For example: transportører.php (notice the 'o' with the diagonal line through it).
So I happily create the file, save it, and upload it to the web server. Whenever I LINK to this file, however, it all goes wrong. I'll have the usual link syntax:
My Link Text
Upon clicking such a link, the web browser attempts to navigate to a non-existent page:
The requested URL /transportører.php was not found on this server.
Notice how the filename has been mutated? The "ø" character in "transportører.php" has been changed into the bizarre "ø" symbol (that's not a comma after the "A", by the way, but an actual component of the symbol itself).
There's obviously some sort of translation going on here, but what, why, and how do I prevent it?
I think, it's two possible reasons:
html encoding
Possibly the encoding of the html file is wrong, so the link is actually pointing to a wrong path. Add
<meta charset="UTF-8">
in the head section of your file.
server settings
If the server is resolving the link wrongly (you can check this by typing the address of your norwegian-named.php in the browser and see if it is replaced), you need to know which server you are using and investigate in this direction. For apache, How to change the default encoding to UTF-8 for Apache? looks promising.
As the URL isn’t percent-encoded in the hyperlink, browsers assume¹ UTF-8 for percent-encoding it, where ø becomes %C3%B8.
However, your server seems to expect/use ISO 8859-1 (instead of UTF-8), where ø becomes %F8.
A quick fix would be to link to the ISO 8859-1 percent-encoded URL:
transportører
(A better fix would be to let your server use UTF-8 for everything, and then to use the UTF-8 percent-encoded URL in the hyperlink.)
¹ Either by default, or because the linking page seems to use UTF-8 (at least according to the HTTP header Content-Type: text/html; charset=UTF-8).
Well, this is embarrassing. Everything was - in actual fact - working correctly. The 404 error made the filename LOOK "wrong" - e.g. transportører.php. However, this is actually correct. That is how HTML seems to reference the file "behind the scenes". So to the browser, "transportører.php" is synonymous with "transportører.php"
What was happening was that FileZilla (my FTP client) objects to international characters. It was changing the filename during upload.... replacing the international characters with "something else". The filenames LOOKED correct on the screen (when I viewed the website folder with Linux Mint's native FTP client), but the underlying character coding was NOT correct. The web-browsers could tell the difference, and hence didn't associated my links with the (mutated) file names, hence triggering an error 404.
The solution in a nutshell: I used Linux Mint native FTP to upload my files, overwriting the ones uploaded by FileZilla, and everything just sprang into life.
Thanks to everyone who offered advice... it was all good stuff, just not the solution in this particular case.

Why is IE10 removing URL hash marks on external redirect links

I have a basic link:
Free Pie Here
but when I click on it, I'm redirected to https://pieworld.com/apple
Everything after the hash mark, as well as the hash, are not included. This is only happening in IE10. I've tested without the target="_blank" as well, but the link still breaks at the hash.
Can't seem to find any documentation on this. The closest I've come to is this SO question, but it doesn't help.
Some background info that might help:
This is a .Net site
I'm redirecting from a http: to a https: site.
According to the RFC3986 https://www.rfc-editor.org/rfc/rfc3986 it is not OK to use this format. You should remove the trailing slash. If you have a trailing slash, it points to a directory within the server. Without it, you point to a document and with the hashmark you are allowed to point to a segment of the document. See example here.
A hash character is used for bookmarks in an URL. To use a hash character as part of the URL itself, you need to URL encode it using %23:
Free Pie Here
Why do you have a trailing slash after the hash?
Try https://pieworld.com/apple/#1
That would be more standard. I've never heard of anyone putting trailing slashes after hash links.
I Think, as the other folks suggested, that the website that you are trying to navigate to may interpret the /#1 as a folder/page inside the parent-page/document. Try removing the forward-slash before the #1 or look inside the html for the header's id/name tag so you can link it directly.
May also be a bug in IE10.
-Phantom
Any URL that contains a # character is a fragment URL. The portion of the URL to the left of the # identifies a resource that can be downloaded by a browser and the portion on the right, known as the fragment identifier, specifies a location within the resource.
http://www.httpwatch.com/features.htm#print
In HTML documents, the browser looks for an element with id attribute matching the fragment. For example, in the URL shown above the browser finds a matching tag in the Printing Support heading:
<h3 id="print">Printing Support</h3>
and scrolls the page to display that section.
I am not sure if the slash after the hash is supported. If you didn't mean to use it as a fragmented url, you should remove the hash or replace it.
The syntax of the Location header field has been changed to allow all URI references, including relative references and fragments,
along with some clarifications as to when use of fragments would not be appropriate. (Section 7.1.2)
for more information check this thorough post.
Hash removed from URL when back button clicked IE9 , IE10 IE11
In IE10 browser, first time on clicking the HREF link, then it comes to the correct below url: http://www.example.com/yy/zz/ff/paul.html#20007_14
If back button is clicked from the IE10 browser and again clicked the HREF link , then it comes to the below url: http://www.example.com/yy/zz/ff/paul.html
Solution :
Please change your url with https
It works for mine

what is the # symbol in the url

I went to some photo sharing site, so when I click the photo, it direct me to a url like
www.example.com/photoshare.php?photoid=1234445
. and when I click the other photo in this page the url become
www.example.com/photoshare.php?photoid=1234445#3338901
and if I click other photos in the same page, the only the number behind # changes. Same as the pretty photo like
www.example.com/photoshare.php?album=holiday#!prettyPhoto[gallery2]/2/
.I assume they used ajax because the whole page seems not loaded, but the url is changed.
The portion of a URL (including and) following the # is the fragment identifier. It is special from the rest of the URL. The key to remember is "client-side only" (of course, a client could choose to send it to the server ... just not as a fragment identifier):
The fragment identifier functions differently than the rest of the URI: namely, its processing is exclusively client-side with no participation from the server — of course the server typically helps to determine the MIME type, and the MIME type determines the processing of fragments. When an agent (such as a Web browser) requests a resource from a Web server, the agent sends the URI to the server, but does not send the fragment. Instead, the agent waits for the server to send the resource, and then the agent processes the resource according to the document type and fragment value.
This can be used to navigate to "anchor" links, like: http://en.wikipedia.org/wiki/Fragment_identifier#Basics (note how it goes the "Basics" section).
While this used to just go to "anchors" in the past, it is now used to store navigatable state in many JavaScript-powered sites -- gmail makes heavy use of it, for instance. And, as is the case here, there is some "photoshare" JavaScript that also makes use of the fragment identifier for state/navigation.
Thus, as suspected, the JavaScript "captures" the fragment (sometimes called "hash") changing and performs AJAX (or other background task) to update the page. The page itself is not reloaded when the fragment changes because the URL still refers to the same server resource (the part of the URL before the fragment identifier).
Newer browsers support the onhashchange event but monitoring has been supported for a long time by various polling techniques.
Happy coding.
It's called the fragment identifier. It identifies a "part" of the page. If there is an element with a name or id attribute equal to the fragment text, it will cause the page to scroll to that element. They're also used by rich JavaScript apps to refer to different parts of the app even though all the functionality is located on a single HTML page.
Lately, you'll often see fragments that start with "#!". Although these are still technically just fragments that start with the ! character, that format was specified by Google to help make these AJAXy pseudo-pages crawlable.
The '#' symbol in the context of a url (and other things) is called a hash, what comes after the hash is called a fragment. Using JavaScript you can access the fragment and use its contents.
For example most browsers implement a onhashchange event, which fires when the hash changes. Using JavaScript you can also access the hash from location.hash. For example, with a url like http://something.com#somethingelse
var frag = location.hash.substr(1);
console.log(frag);
This would print 'somethingelse' to the console. If we didn't use substr to remove the first character, it frag would be: '#somethingelse'.
Also, when you navigate to a URL with a hashtag, the browser will try and scroll down to an element which has an id corresponding to the fragment.
http://en.wikipedia.org/wiki/Fragment_identifier
It is the name attribute of an anchor URL: http://www.w3schools.com/HTML/html_links.asp
It is used to make a bookmark within an HTML page (and not to be confused with bookmarks in toolbars, etc.).
In your example, if you bookmarked the page with the # symbol in the URL, when you visit that bookmark again it will display the last image that you viewed, most likely an image that has the id of 3338901.
hey i used sumthing like this .... simple but useful
location.href = data.url.replace(/%2523/, '%23');
where data.url is my original url . It substitutes the # in my url

How do browsers interpret hrefs that start with "http:/"?

Some of my users are creating links that look like
<a href='http:/some_local_path'>whatever</a>
I've noticed that Firefox interprets this as
<a href='/some_local_path'>whatever</a>
Can I count on this occurring in all browsers? Or should I remove the http:/s myself?
This is an unusual URL, but is not invalid. The URL spec says that omitted components are defaulted from the base url, which can be provided explicitly in a <base> tag, or absent that, the current URL of the page.
When a browser sees /some_local_path, it is missing a scheme and a host, so it takes them from the base url. When your users put http:/some_local_path, it has an explicit scheme, but is missing a host, so the host defaults to the base url. If your page is an http: page, then the two URLs will be interpreted identically.
All that said, those URLs are almost certainly not what your users intended. You'll be helping them if you point out their error.
It's always best to validate data entered by users. Inevitably, you'll get something unexpected.

using encodeURI to display an entire page

Hi I am making a chrome extension. Where I save a page to the database as a string and then open it later as a dataURI scheme like:
d = 'data:text/html;charset=utf-8'+encodeURI('HTML TEXT')
location.reload(d);
The problem with this is that the page, say its name is http://X/, in which I executed the above command loses the javascript files in its head.
I considered using the document.write(d), if d has a string appeneded to it with the <head>...</head> of http://X/.
But this opens a big vulnerability problem for XSS. At this point I am trying to think of white listing tags when I save the original page... is there another way?
I'm not sure what you mean by http://X/, but if want copied website to retain its origin (i.e. have code you give run exactly as if it were downloaded from http://X/), then I'm afraid it's not possible with standard DOM methods (it would be a security vulnerability that bypasses same-origin security policy).
If you want to run 3rd party sourcecode safely, then use this:
<iframe sandbox src="data:…"></iframe>
You could modify the source and insert <base href="http://X/"> in there to make relative URLs work properly.