I noticed in webpages such as Google maps there is an # in the URL. What does it do? For example https://www.google.com/maps/place/Vancouver+City+Hall/#49.260404,-123.113799,3a,75y,349.48h,90t/data=!3m4!1e1!3m2!1sUeoHwwwaQPVvyH1amrQAAQ!2e0!4m9!1m6!2m5!1sgoogle+maps+vancouver+city+hall!3m3!1scity+hall!2sVancouver,+BC,+Canada!3s0x548673f143a94fb3:0xbb9196ea9b81f38b!3m1!1s0x548673e7b8d4609d:0x9823432c0c571e10!6m1!1e1
A URL does not necessarily consist of directories followed by a file. Before it is eventually mapped to a physical directory and path, it is just a string, nothing more, nothing less. There is only one truly special character in a URL that is not part of the string sent to the server: # (the hash, which introduces the fragment identifier).
Basically you can map any string after //yourdomain.com/ (and before #) to anything you want.
Therefore, the # and everything after it has no meaning to the server: the browser does not send it with the request, so it only matters client-side (for example, to scroll to an anchor or to be read by JavaScript).
The ? and & have a special meaning in that the server can use these symbols to identify parameters. But it does not have to do so. It is in fact possible to map a URI like //yourdomain.com/&&& to a completely different resource than //yourdomain.com/&&.
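As a minimal sketch of the point above, the routing table below is purely hypothetical (the `routes` map and `resolve` function are illustrative names, not part of any framework). It treats the path and query as an opaque string and ignores the fragment, which a real server would never receive.

```javascript
// Hypothetical routing table: keys are the raw path-and-query strings.
const routes = new Map([
  ["/&&&", "resource A"],
  ["/&&", "resource B"],
]);

// Resolve a URL to a resource using only the part a server would see.
function resolve(urlString) {
  const url = new URL(urlString);
  // url.hash is deliberately ignored: the fragment is not sent to the server.
  return routes.get(url.pathname + url.search);
}

console.log(resolve("http://yourdomain.com/&&&"));
console.log(resolve("http://yourdomain.com/&&#anything"));
```

The two URLs resolve to different resources even though they differ only by one &, and appending a fragment does not change the lookup.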
Related
Our new URL structure is like below
http://domain.com/#/test?utm_source=test&utm_medium=test
We need to keep the # (hash sign) in the URL because the application depends on it, but at the same time we also need the query string to work. The problem is that browsers drop the query string from the request if the URL contains a #, so the application/server never receives it.
You can't do that:
https://en.wikipedia.org/wiki/Fragment_identifier
The fragment identifier introduced by a hash mark # is the optional last part of a URL for a document. It is typically used to identify a portion of that document. The generic syntax is specified in RFC 3986. The hash mark separator in URIs does not belong to the fragment identifier.
Solutions:
Drop the query string and always carry these values in the hash-based route instead.
Pass the values as a $_GET parameter and urldecode them.
Read this: Usage of Hash(#) in URL
First of all, it will not work. One thing you can do, though, is put JavaScript code on the page that compares the route in the hash and makes an AJAX request to an API (one that returns only the data needed). Pseudo code could be:
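window.onload = function () {
  // location.hash includes the leading "#"; "#/needed-route" is a placeholder.
  if (window.location.hash === "#/needed-route") {
    // Placeholder URL: an API endpoint that returns only the needed data as JSON.
    fetch("/api/needed-data.json")
      .then(function (response) { return response.json(); })
      .then(function (data) { console.log(data); });
  }
};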
NOTE: the downside is that you may need to keep the routes in client-side JS; otherwise, move away from hash-based URL routing.
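To make the behaviour concrete, here is a small sketch using the standard URL and URLSearchParams classes. The helper name `parseHashRoute` is made up for illustration. It shows that the server-visible query string is empty for the URL in the question, and that the utm values can still be recovered client-side from the fragment.

```javascript
// Split a hash-based route such as "#/test?utm_source=test" into its parts.
function parseHashRoute(urlString) {
  const url = new URL(urlString);
  const fragment = url.hash.slice(1); // drop the leading "#"
  const q = fragment.indexOf("?");
  const route = q === -1 ? fragment : fragment.slice(0, q);
  const params = new URLSearchParams(q === -1 ? "" : fragment.slice(q + 1));
  // url.search is what the server would actually receive as the query string.
  return { serverQuery: url.search, route: route, params: params };
}

const parsed = parseHashRoute(
  "http://domain.com/#/test?utm_source=test&utm_medium=test"
);
console.log(parsed.serverQuery === "");       // nothing reaches the server
console.log(parsed.route);                    // the client-side route
console.log(parsed.params.get("utm_source")); // value read from the fragment
```

In a browser, the same function can be called with window.location.href, and the extracted values can then be forwarded to the server in an AJAX request if they are needed there.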
I'm inserting untrusted data into the href attribute of an <a> tag.
Based on the OWASP XSS Prevention Cheat Sheet, I should URI encode the untrusted data before inserting it into the href attribute.
But would HTML encoding also prevent XSS in this case? I know that it's a URI context and therefore I should use URI encoding, but are there any security advantages of URI encoding over HTML encoding in this case?
The browser will render the link properly in both cases as far as I know.
I'm assuming this is Rule #5:
URL Escape Before Inserting Untrusted Data into HTML URL Parameter Values
(Not rule #35.)
This is referring to individual parameter values:
<a href="http://www.example.com?test=...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...">link</a>
URL and HTML encoding protect against different things.
URL encoding prevents a parameter breaking out of a URL parameter context:
e.g. ?firstname=john&lastname=smith&salary=20000
Say this is a back-end request made by an admin user. If john and smith aren't correctly URL encoded then a malicious front-end user might enter their name as john&salary=40000 which would render the URL as
?firstname=john&salary=40000&lastname=smith&salary=20000
and say the back-end application takes the first parameter value in the case of duplicates. The user has successfully doubled their salary. This attack is known as HTTP Parameter Pollution.
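The salary example can be reproduced with a short sketch. The `buildQuery` helper is hypothetical, and URLSearchParams stands in for a back end that takes the first value of a duplicated parameter (its get method returns the first match).

```javascript
// Build the query string with every value URL-encoded.
function buildQuery(firstname, lastname, salary) {
  return "?firstname=" + encodeURIComponent(firstname) +
    "&lastname=" + encodeURIComponent(lastname) +
    "&salary=" + encodeURIComponent(salary);
}

const maliciousName = "john&salary=40000";

// Naive concatenation: the attacker's "&salary=40000" becomes its own parameter.
const unsafe = "?firstname=" + maliciousName + "&lastname=smith&salary=20000";
console.log(new URLSearchParams(unsafe).get("salary"));

// With encoding, the & and = stay inside the firstname value.
const safe = buildQuery(maliciousName, "smith", "20000");
console.log(safe);
console.log(new URLSearchParams(safe).get("salary"));
```

The first lookup picks up the injected salary, while the second sees only the intended one, because the & and = in the name were encoded as %26 and %3D.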
So if you're inserting a parameter into a URL which is then inserted into an HTML document, you technically need to URL encode the parameter, then HTML encode the whole URL. However, if you follow the OWASP recommendation to the letter:
Except for alphanumeric characters, escape all characters with ASCII
values less than 256 with the %HH escaping format.
then this will ensure no characters with special meaning to HTML will be output, therefore you can skip the HTML encoding part, making it simpler.
Example - If user input is allowed to build a relative link (to http://server.com/), and javascript:alert(1) is provided by the user.
URL-encoding: <a href="javascript%3Aalert%281%29"> - Link will lead to http://server.com/javascript%3Aalert%281%29
Entity-encoding only: <a href="javascript&#x3A;alert&#x28;1&#x29;"> - The browser decodes the entities back to javascript:alert(1), so a click leads to JavaScript execution
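The URL-encoding case can be checked with the sketch below. Note that the built-in encodeURIComponent leaves ( and ) untouched, so the sketch uses a hypothetical strict encoder, `percentEncodeStrict`, that follows the quoted OWASP wording and escapes every byte that is not an ASCII letter or digit.

```javascript
// Percent-encode every byte except ASCII letters and digits.
function percentEncodeStrict(input) {
  const bytes = new TextEncoder().encode(input); // UTF-8 bytes
  let out = "";
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    out += /[A-Za-z0-9]/.test(ch)
      ? ch
      : "%" + b.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

const userInput = "javascript:alert(1)";
const encoded = percentEncodeStrict(userInput);

// Resolve both forms as a relative link against the site, as a browser would.
const raw = new URL(userInput, "http://server.com/");
const resolved = new URL(encoded, "http://server.com/");

console.log(raw.protocol);  // the raw input is parsed as a javascript: URL
console.log(resolved.href); // the encoded input is just a path on the site
```

Once the colon is encoded, the parser no longer sees a scheme, so the value can only ever be a relative path under http://server.com/.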
I can't find an answer out there.
Can you tell me what #? means in a URL?
# is for an internal shortcut / anchor
? is for GET parameters
Example : http://www.roxy.fr/vestes-snowboard-femme/#?camp=da:rx_fr_Cooler_dryflight_bn&ectrans=1
Is it for "no anchor, with these parameters" ?
It seems nonsense to put a # with no anchor name.
Everything following the # is the fragment, or "anchor" as you call it. Your URL has a fragment value of ?camp=da:rx_fr_Cooler_dryflight_bn&ectrans=1. That's right, all of this is the fragment. It's styled like a query parameter, and if it came before the # it would be a query parameter, but as it is, it's simply the value of the fragment.
This is likely read by JavaScript on the page and evaluated there, and the JavaScript will fetch some data via AJAX or do something else based on the information in this string. This is typically done when developing a single-page application or otherwise moving a lot of code to the client side. The server doesn't receive the fragment and doesn't have to worry about it; it's all done client-side.
In URL syntax, anything after # is a fragment identifier. How it will be used is a different matter and depends on the software that processes the URL. The use of a fragment part in links is just one of the possible uses.
I am trying to give users of my website the ability to download files from Amazon S3. The URLs are digitally signed with my AWS private key on my web server, then sent to the client via AJAX and embedded in the action attribute of an HTML form.
The problem arises when the form is submitted. The action attribute of the form contains a URL that has a digital signature. This signature often contains + symbols, which get percent-encoded. This completely invalidates the signature. How can I keep forms from percent-encoding my URLs?
I (respectfully) suggest that you need to more carefully identify the precise nature of the problem, where in the process flow it breaks down, and identify precisely what it is that you actually need to fix. URLEncoding of "+" is the correct thing for the browser to do, because the literal "+" in a query string is correctly interpreted by the server as " " (space).
Your question prompted me to review code I've written that generates signed urls for S3 and my recollection was correct -- I'm changing '+' to %2B, '=' to %3D, and '/' to %2F in the signature... so that is not invalid. This is assuming we are talking about the same thing, such that the "digital signature" you mention in the question is the signature discussed here:
http://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html#RESTAuthenticationQueryStringAuth
Note the signature in the example has a urlencoded '+' in it: Signature=vjbyPxybdZaNmGa%2ByT272YEAiv4%3D
I will speculate that the problem you are having might not be '+' → '%2B' (which should be not only valid, but required)... but perhaps it's a double-encoding, such that you are, at some point, double-encoding it so that '+' → '%2B' → '%252B' ... with the percent sign being encoded as a literal, which would break the signature.
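The difference between correct single encoding and the suspected double encoding can be seen in a few lines. The signature value is the one from the AWS example quoted above, in its decoded form. URLSearchParams is used here only as a stand-in for how a server parses a query string.

```javascript
const signature = "vjbyPxybdZaNmGa+yT272YEAiv4=";

// Encoded once: "+" becomes %2B and "=" becomes %3D, as in the AWS example.
const once = encodeURIComponent(signature);

// Encoded twice: the "%" itself is escaped to %25, which breaks the signature.
const twice = encodeURIComponent(once);

console.log(once);
console.log(twice);

// Sent unencoded, the literal "+" is read back as a space.
const unencoded = new URLSearchParams("Signature=" + signature).get("Signature");
console.log(unencoded);

// Decoding once recovers the signature only from the singly encoded form.
console.log(decodeURIComponent(once) === signature);
console.log(decodeURIComponent(twice) === signature);
```

If the value arriving at S3 looks like the second form, something in the chain is encoding an already-encoded URL.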
As with any user supplied data, the URLs will need to be escaped and filtered appropriately to avoid all sorts of exploits. I want to be able to
Put user supplied URLs in href attributes. (Bonus points if I don't get screwed if I forget to write the quotes)
...
Forbid malicious URLs such as javascript: stuff or links to evil domain names.
Allow some leeway for the users. I don't want to raise an error just because they forgot to add an http:// or something like that.
Unfortunately, I can't find any "canonical" solution to this sort of problem. The only thing I could find as inspiration is the encodeURI function from JavaScript, but that doesn't help with my second point, since it only does simple URL encoding and leaves special characters such as : and / alone.
OWASP provides a list of regular expressions for validating user input, one of which is used for validating URLs. This is as close as you're going to get to a language-neutral, canonical solution.
More likely you'll rely on the URL parsing library of the programming language in use. Or, use a URL parsing regex.
The workflow would be something like:
Verify the supplied string is a well-formed URL.
Provide a default protocol such as http: when no protocol is specified.
Maintain a whitelist of acceptable protocols (http:, https:, ftp:, mailto:, etc.)
The whitelist will be application-specific. For an address-book app the mailto: protocol would be indispensable. It's hard to imagine a use case for the javascript: and data: protocols.
Enforce a maximum URL length - ensures cross-browser URLs and prevents attackers from polluting the page with megabyte-length strings. With any luck your URL-parsing library will do this for you.
Encode a URL string for the usage context. (Escaped for HTML output, escaped for use in an SQL query, etc.).
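The workflow above can be sketched as follows. Everything here is an assumption made for illustration: the function names `sanitizeUserUrl` and `escapeHtmlAttribute`, the whitelist contents, and the 2000-character limit are not taken from OWASP or any library. Parsing relies on the standard URL class.

```javascript
// Assumed, application-specific settings.
const ALLOWED_PROTOCOLS = new Set(["http:", "https:", "ftp:", "mailto:"]);
const MAX_URL_LENGTH = 2000;

// Returns a normalized URL string, or null if the input is rejected.
function sanitizeUserUrl(input) {
  const trimmed = String(input).trim();
  if (trimmed.length === 0 || trimmed.length > MAX_URL_LENGTH) return null;

  // Provide a default protocol when none is given. Caveat: "host:8080/x"
  // looks like it has a scheme and is therefore rejected by the whitelist.
  const hasScheme = /^[A-Za-z][A-Za-z0-9+.-]*:/.test(trimmed);
  const candidate = hasScheme ? trimmed : "http://" + trimmed;

  let url;
  try {
    url = new URL(candidate); // throws if the string is not a well-formed URL
  } catch (e) {
    return null;
  }
  // Whitelist check on the parsed protocol, not on the raw string.
  return ALLOWED_PROTOCOLS.has(url.protocol) ? url.href : null;
}

// Encode for the usage context: here, a quoted HTML attribute value.
function escapeHtmlAttribute(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#x27;");
}

console.log(sanitizeUserUrl("example.com/page"));
console.log(sanitizeUserUrl("javascript:alert(1)"));
console.log(escapeHtmlAttribute(sanitizeUserUrl("https://example.com/?a=1&b=2")));
```

Checking the protocol after parsing, rather than matching on the raw string, avoids having to anticipate every way a scheme can be disguised. The escaping step is separate because it depends on where the URL is output.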
Forbid malicious URLs such as javascript: stuff or links to evil domain names.
You can utilize the Google Safe Browsing API to check a domain for spyware, spam or other "evilness".
For the first point, regular attribute encoding works just fine: escape characters into HTML entities. Escaping quotes, the ampersand and angle brackets is enough if attributes are guaranteed to be quoted. Escaping all other non-alphanumeric characters as well will make the attribute safe even if it's accidentally left unquoted.
The second point is vague and depends on what you want to do. Just remember to use a whitelist approach instead of a blacklist one; it's possible to use HTML entity encoding and other tricks to get around most simple blacklists.