Absolute href address resolves with prepended relative link? - html

I'm a bit baffled about this, so I'm guessing I must be overlooking something very simple or obvious.
I'm trying to provide some external links on a page advertising a bicycle, but the absolute links are resolving with the relative path prepended to them. Here is the page: https://dl.dropboxusercontent.com/u/9972753/rambler.htm
Click on any link and you'll see they resolve as:
location of HTML file (Dropbox)/"absolute link"
Any help would be much appreciated.
Thanks!
Rory

You're not using the correct double-quote characters. You must use " instead of ”. Otherwise the browser does not recognize ” as an attribute delimiter. It treats the attribute as unquoted, so the ” characters become part of the href value, and it uses the document base of the page (i.e. the Dropbox link) as a prefix for that value. In your case that includes the whole URL with the wrong double quotes. The browser thinks this is a relative URL, because it starts with neither a URL scheme (e.g. http://) nor a forward slash.
PS: Is it possible that you've copy-pasted these links from Word or something?
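Python's `urllib.parse.urljoin` follows the same RFC 3986 relative-resolution rules that browsers use, so it can illustrate the answer above. This is only an approximation. Browsers also percent-encode the curly quotes and keep empty path segments, and `urljoin` does neither. `http://www.example.com/` is a placeholder for one of the page's external links. `straighten_quotes` is a hypothetical helper for cleaning up text pasted from Word.

```python
from urllib.parse import urljoin, urlsplit

# Base URL of the page from the question.
BASE = "https://dl.dropboxusercontent.com/u/9972753/rambler.htm"

# Placeholder external link (the real links are not reproduced here).
straight = "http://www.example.com/"
# With curly quotes the attribute is parsed as unquoted, so the quote
# characters end up inside the href value itself.
curly = "\u201dhttp://www.example.com/\u201d"

# "”http" is not a valid scheme name, so the value has no scheme at all...
print(repr(urlsplit(curly).scheme))
# ...and is therefore resolved relative to the page's directory.
print(urljoin(BASE, straight))
print(urljoin(BASE, curly))

# Hypothetical helper: map typographic quotes back to ASCII quotes.
SMART_QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})

def straighten_quotes(html: str) -> str:
    return html.translate(SMART_QUOTES)

print(straighten_quotes("<a href=\u201dhttp://www.example.com/\u201d>Link</a>"))
```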

Related

HTML Hrefs in Tomcat

I'm building an HTML string in Tomcat and I notice that in my JSON object, my clickable href link is something like:
http://localhost/%22/https://myLinkHere.com/%22
This is a two-part question. First, should it contain the http://localhost in front? And second, why is the %22 there?
Here is what my JSON href looks like in text:
linkDisplayName
This looks right to me, but I can't tell why the last %22 is there.
I think you won't need the localhost as long as you are supplying the relative path.
%22 is the URL encoding of ASCII 0x22, the double quote ("), which is the character that appears in your link.
HTML parsers are very lenient, which often leads to confusing behavior. Without the exact JSON it's hard to say for sure but there are a couple of obvious issues. Ultimately the issue is your HTML is malformed and/or mis-escaped.
%22 is " URL-encoded, which means that the quotes you've \-escaped are being included in the URL rather than surrounding it. That likely means they are double-escaped in the JSON, e.g. \\" or something similar. Try just a single backslash (\") or no backslash at all (").
Notice that the protocol (https:/) in your URL is also wrong; a URL starts with a protocol (like https) followed by a :, and generally followed by two slashes (//). Your URL follows the protocol with just a single slash, which makes it look like a relative URL rather than an absolute one. Browsers will prefix relative URLs with whatever they infer the current host to be, which in your context appears to be localhost.
The HTML should look like this:
linkDisplayName
So in summary no, the URL should probably not contain http://localhost, and it should not contain those %22s either. They're showing up because your JSON is malformed.
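The double-escaping explanation can be checked with Python's `json` module. Here it stands in for whatever JSON library the Tomcat application uses; the question does not say which one, so this is an assumption. The anchor string is a placeholder built from the link text and URL in the question, because the original markup was not preserved.

```python
import json
from urllib.parse import quote, unquote

# Placeholder anchor: the real markup from the question was not preserved.
anchor = '<a href="https://myLinkHere.com/">linkDisplayName</a>'

# Serializing once: json.dumps escapes each " as \" inside the JSON text.
once = json.dumps({"link": anchor})
print(once)
# Parsing once gives back the original HTML with plain quotes.
print(json.loads(once)["link"] == anchor)

# Serializing twice (a string that is already JSON gets encoded again):
# after one json.loads the backslashes survive and an extra pair of quotes
# wraps the value, so the HTML handed to the browser is malformed.
twice = json.dumps({"link": json.dumps(anchor)})
decoded = json.loads(twice)["link"]
print(decoded)

# %22 is just the percent-encoding of the double-quote character.
print(quote('"'), unquote("%22"))
```

If the browser receives `decoded` instead of `anchor`, the `href` value begins with `\"` instead of a real attribute quote. That is consistent with the stray quotes ending up inside the resolved URL.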

Urlize escaping %23 in href of text

I am trying to render text from a user with a url like this in it: https://example.com/%20%23654
I pass the url to urlize and I get this:
In[1]: outp = urlize('https://example.com/%20%23654'); print outp
Out[1]: u'https://example.com/%20%23654'
I understand that %20 decodes to a space and %23 to a hash, but why is urlize only unescaping the hash in the href? Is this a bug? If it were intended, why is it not unescaping %20 to a space?
I don't think this is a bug.
I see two parts to this question:
Why is it only unescaping the hash and not the space?
Why is it only doing the unescaping in the href and not in the visible linked text?
Here are my thoughts on the first:
A hash is a perfectly legal URL character: it is the delimiter that introduces the fragment identifier, most often used to go to anchors in HTML (example and link to docs in one!):
http://www.w3.org/TR/html4/struct/links.html#h-12.2
urlize realizes this and unescapes the hash in the href. It does the same with any character that may appear literally in a URL. Here is an example with the letter f:
>>> urlize('https://example.com/%66')
u'https://example.com/%66'
A space, on the other hand, is not a legal URL character (although it is often tolerated). Therefore, it remains encoded as %20 both in the link and in the visible link text.
The second part of the question is why is it only unescaping in the link but not in the visible depiction. That also makes sense. In the href, it does not matter whether you pass in https://example.com/%66 or https://example.com/f. The effect is the same, and the depiction is "under the hood." So urlize uses the simplest form, without the unnecessary encoding. On the other hand, the visible part is presented to the user. Therefore, urlize tries to preserve the exact depiction which it was passed in originally, as that is the least surprising thing to do.
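The href behaviour described above can be approximated with the standard library. First decode every percent-escape, then re-encode only the characters that may not appear literally. This sketch is similar in spirit to Django's internal `smart_urlquote` helper but is not its actual code. The safe set (RFC 3986 reserved characters plus `~`) is an assumption, and `requote` is a hypothetical name.

```python
from urllib.parse import quote, unquote

# RFC 3986 gen-delims + sub-delims, plus "~". Letters, digits and "_.-~"
# are never escaped by quote() regardless of this set.
SAFE = ":/?#[]@" + "!$&'()*+,;=" + "~"

def requote(url: str) -> str:
    """Hypothetical helper: normalize the percent-escapes in a URL."""
    # Step 1: decode every %XX escape (%23 -> "#", %20 -> " ", %66 -> "f").
    decoded = unquote(url)
    # Step 2: re-encode only what is not allowed literally. The space is not
    # in SAFE, so it goes back to %20, while "#" and "f" stay decoded.
    return quote(decoded, safe=SAFE)

print(requote("https://example.com/%20%23654"))  # https://example.com/%20#654
print(requote("https://example.com/%66"))        # https://example.com/f
```

Turning `%23` into a literal `#` does change the URL's meaning: everything after the `#` becomes a fragment and is no longer part of the path.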

Why is IE10 removing URL hash marks on external redirect links

I have a basic link:
Free Pie Here
but when I click on it, I'm redirected to https://pieworld.com/apple
Neither the hash mark nor anything after it is included. This only happens in IE10. I've also tested without target="_blank", but the link still breaks at the hash.
Can't seem to find any documentation on this. The closest I've come to is this SO question, but it doesn't help.
Some background info that might help:
This is a .Net site
I'm redirecting from an http: site to an https: site.
According to RFC 3986 (https://www.rfc-editor.org/rfc/rfc3986), it is not OK to use this format. You should remove the trailing slash. With a trailing slash, the URL points to a directory on the server. Without it, you point to a document, and with the hash mark you are allowed to point to a segment of that document. See example here.
A hash character is used for bookmarks in a URL. To use a hash character as part of the URL itself, you need to URL-encode it as %23:
Free Pie Here
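Python's `urllib.parse` shows how such a link is split, and what changes once the hash is encoded. The original link markup was lost, so `https://pieworld.com/apple#1/` is a hypothetical URL in the shape this thread discusses (a hash followed by a slash).

```python
from urllib.parse import quote, urlsplit

# Hypothetical link: hash followed by a trailing slash.
link = "https://pieworld.com/apple#1/"

parts = urlsplit(link)
# Everything after the first "#" is the fragment; it is never sent to the server.
print(parts.path)      # /apple
print(parts.fragment)  # 1/

# If "#" is meant to be part of the path, percent-encode it as %23.
encoded = "https://pieworld.com/" + quote("apple#1/")
print(encoded)                           # https://pieworld.com/apple%231/
print(repr(urlsplit(encoded).fragment))  # ''
```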
Why do you have a trailing slash after the hash?
Try https://pieworld.com/apple/#1
That would be more standard. I've never heard of anyone putting trailing slashes after hash links.
I think, as the other folks suggested, that the website you are trying to navigate to may interpret the /#1 as a folder/page inside the parent page/document. Try removing the forward slash before the #1, or look in the HTML for the heading's id/name attribute so you can link to it directly.
May also be a bug in IE10.
-Phantom
Any URL that contains a # character is a fragment URL. The portion of the URL to the left of the # identifies a resource that can be downloaded by a browser and the portion on the right, known as the fragment identifier, specifies a location within the resource.
http://www.httpwatch.com/features.htm#print
In HTML documents, the browser looks for an element with id attribute matching the fragment. For example, in the URL shown above the browser finds a matching tag in the Printing Support heading:
<h3 id="print">Printing Support</h3>
and scrolls the page to display that section.
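The lookup described above can be sketched with Python's `html.parser`. This is a simplified model. It only matches `id` attributes, while browsers also fall back to `<a name="...">` and percent-decode the fragment first. `FragmentTargetFinder` is a hypothetical name, and the HTML snippet reuses the heading from this answer.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class FragmentTargetFinder(HTMLParser):
    """Hypothetical helper: find the first element whose id equals the fragment."""

    def __init__(self, fragment: str):
        super().__init__()
        self.fragment = fragment
        self.target = None  # tag name of the matching element, if any

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if self.target is None and dict(attrs).get("id") == self.fragment:
            self.target = tag

url = "http://www.httpwatch.com/features.htm#print"
fragment = urlsplit(url).fragment  # the part after "#"

finder = FragmentTargetFinder(fragment)
finder.feed('<h2>Features</h2><h3 id="print">Printing Support</h3>')
print(fragment, finder.target)  # print h3
```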
I am not sure whether a slash after the hash is supported. If you didn't mean to use it as a fragment URL, you should remove or replace the hash.
RFC 7231 notes: "The syntax of the Location header field has been changed to allow all URI references, including relative references and fragments, along with some clarifications as to when use of fragments would not be appropriate. (Section 7.1.2)"
For more information, check this thorough post.
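Section 7.1.2 of RFC 7231 also states that if the Location value of a 3xx response has no fragment, the user agent must treat the redirect as inheriting the fragment of the original request. If IE10 skips that step on the http-to-https redirect described in the question, the lost hash would follow. That is a hypothesis, not something confirmed above. The sketch below applies the rule with the standard library. `resolve_redirect` is a hypothetical name, and the pieworld URLs are illustrative.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

def resolve_redirect(request_url: str, location: str) -> str:
    """Hypothetical helper: compute where a 3xx redirect should land."""
    # Location may be absolute or relative; resolve it against the request URL.
    target = urlsplit(urljoin(request_url, location))
    # RFC 7231 section 7.1.2: no fragment in Location -> inherit the original one.
    # (Simplification: urlsplit cannot tell "no fragment" from an empty "#".)
    if not target.fragment:
        target = target._replace(fragment=urlsplit(request_url).fragment)
    return urlunsplit(target)

# http -> https redirect with no fragment in Location keeps "#1".
print(resolve_redirect("http://pieworld.com/apple#1", "https://pieworld.com/apple"))
# A fragment supplied in Location takes precedence.
print(resolve_redirect("http://pieworld.com/apple#1", "https://pieworld.com/apple#2"))
```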
Hash removed from URL when back button clicked (IE9, IE10, IE11)
In IE10, the first time the HREF link is clicked, the browser goes to the correct URL: http://www.example.com/yy/zz/ff/paul.html#20007_14
If the back button is clicked and the HREF link is clicked again, it goes to this URL instead: http://www.example.com/yy/zz/ff/paul.html
Solution:
Change your URL to use https.
It worked for me.

Unwanted characters being added to url in HTML

I'm trying to include a simple hyperlink in a website:
...Engineers (IEEE) projects:
So that it ends up looking like "...Engineers (IEEE) projects:" with "IEEE" being the hyperlink.
When I click on copy link address and paste the address, instead of getting
http://www.ieee.ucla.edu/
I get
http://www.ieee.ucla.edu/%C3%A2%E2%82%AC%C5%BD
and when I click on the link, it takes me to a 404 page.
Check the link. These special characters are added automatically by the browser (see URL Encoding).
Use this code and it will work:
IEEE
The proper format for adding a hyperlink to an HTML page is as follows:
(texts to be hyperlink)
For a better understanding, go through this link: http://www.w3schools.com/html/html_links.asp
%C3%A2%E2%82%AC%C5%BD represents "â€Ž". That is what you get when an invisible Unicode left-to-right mark (U+200E, UTF-8 bytes E2 80 8E) is parsed as Windows-1252 data.
Use straight quotes to delimit attribute values in your real code. You are doing this in the code you have included in the question, but that code won't have the effect you are seeing. Presumably your quotes are being transformed at some point in your real code.
Add appropriate HTTP headers and <meta> data to tell the browser what encoding your file is really using.
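To see which character produced the suffix, the mojibake can be reversed step by step in Python. This is a diagnostic sketch. The last step assumes the stray character sits at the end of the href value, which the copied URL suggests.

```python
from urllib.parse import quote, unquote

suffix = "%C3%A2%E2%82%AC%C5%BD"

# Undo the browser's percent-encoding: three visible characters come out.
garbled = unquote(suffix)
print(garbled)  # â€Ž

# Undo the mojibake: re-create the Windows-1252 bytes those characters
# came from, then decode the bytes as UTF-8 as originally intended.
original = garbled.encode("cp1252").decode("utf-8")
print(hex(ord(original)))  # 0x200e, i.e. U+200E LEFT-TO-RIGHT MARK

# Forward direction: UTF-8 bytes misread as cp1252, then URL-encoded.
print(quote("\u200e".encode("utf-8").decode("cp1252")) == suffix)  # True

# Assumed location of the invisible mark: at the end of the href value.
href = "http://www.ieee.ucla.edu/\u200e"
print(href.replace("\u200e", ""))  # http://www.ieee.ucla.edu/
```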

Absolutizing an image url with a ".."

I have an HTML document I'm transforming with an image whose source url looks like this:
"../foo/bar/baz.png"
I'm using a tritium function to absolutize image source urls, but the ".." seems to be stumping it. It's prepending the hostname, etc, but when it does, it adds one too many layers.
So for example, the correct URL of the image is:
"www.host.com/foo/bar.png"
But the page on which it appears is at "www.host.com/site/baz/page.html"
The source of the image in the original html is therefore "../foo/bar.png"
But the absolutized result I'm getting is: "www.host.com/site/foo/bar.png"
In other words it's going up the file tree to "/site/", but it needs to be going up one more. I don't really see how it even works on the original page without another ".." How should I be handling the ".." in the url?
.. means "go up one directory level". You are using a relative path rather than a root-relative one, as you should be. Drop the dots:
<img src="/foo/bar.png"> will load the image from the root of the domain.
There is a huge difference between src="/foo/bar.png" and src="foo/bar.png" (notice the slash after the first double quote).
The first one points to http://example.com/foo/bar.png no matter what.
The second one (without the leading slash) is a relative URL, so the resolved path depends on the file in which the image appears.
That is why you were getting "www.host.com/site/foo/bar.png" (one level up relative to the file path).
Two solutions:
1) src="/foo/bar.png" OR
2) src="../../foo/bar.png"
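Python's `urllib.parse.urljoin` implements the RFC 3986 reference-resolution algorithm that browsers also follow, so it can check these forms. The page URL comes from the question; the `http://` scheme is an assumption.

```python
from urllib.parse import urljoin

# Page from the question (scheme assumed).
PAGE = "http://www.host.com/site/baz/page.html"

for src in ["../foo/bar.png", "foo/bar.png", "/foo/bar.png", "../../foo/bar.png"]:
    # Relative references resolve against the page's directory (/site/baz/);
    # a leading "/" resolves against the root of the host instead.
    print(f"{src:20} -> {urljoin(PAGE, src)}")
```

With the page at /site/baz/page.html, `../foo/bar.png` resolves to /site/foo/bar.png. That matches what the absolutize step produced, so reaching /foo/bar.png takes either two `..` segments or a leading `/`.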
I always recommend the first approach: even after you move files around, you won't have to change the root-relative URL. (I learned this the hard way.)
P.S. This rule applies to CSS files as well (for example, when specifying a background image URL). If you use root-relative paths, you won't have to bang your head against the wall when you change the directory of the CSS file.
As you're in a Moovweb project, I would suggest manipulating the problematic src before you use the absolutize() function.
Is there an easy way you can select the image using Tritium? I'd suggest doing that, then manipulating the src attribute:
$("./img[@id='']") {
  attribute("src", "/foo/bar.png")
}
After this, you should be able to use the absolutize() function and the src will be rendered correctly.