How can I do this without getting "forbidden"? Other sites do it; for example, http://twitter.com?status=http://somesite.com works just fine. I've been looking everywhere for an answer. Can somebody please help? Please note that my example is automatically encoded (imagine it without the %3A).
You will need to encode the URL. A query string containing an unencoded URL is going to be a problem.
If you don't encode URLs inside URLs, then whoever is interpreting the result may not see it as a valid URL. Take your example:
http://twitter.com?status=http%3A//somesite.com
The %3A is a colon. According to the URI specification, the colon is the scheme delimiter (http, ftp, irc, whatever), so a second, unencoded colon can confuse whoever parses the URL. And if I've read enough of these specs, I'm guessing it says the equivalent of "servers receiving a badly formed URL should return an error message" or "...try to interpret it without guaranteeing a positive response".
Technically the // should also be escaped, since they are path delimiters, but only a server serving static content would react to that.
For the URI specification, see http://labs.apache.org/webarch/uri/rfc/rfc3986.html
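If you are asking how to do this in JavaScript, use encodeURIComponent (and decodeURIComponent to reverse it). The older escape/unescape functions are deprecated and leave the / character alone, so with them you would have to handle / as a special case.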
Take a look at this reference.
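For readers not working in JavaScript, here is a minimal Python sketch of the same idea, using the standard library's urllib.parse. The URLs are the ones from the question; the variable names are only illustrative.

```python
from urllib.parse import quote, urlencode, urlsplit, parse_qs

target = "http://somesite.com"

# Percent-encode every reserved character, including ":" and "/".
encoded = quote(target, safe="")
link = "http://twitter.com?status=" + encoded

# urlencode builds the whole query string and encodes each value for you.
same_link = "http://twitter.com?" + urlencode({"status": target})

# The receiving side decodes the value back to the original URL.
decoded = parse_qs(urlsplit(link).query)["status"][0]

print(link)
```

Passing safe="" matters here: by default quote leaves "/" unencoded, which is fine for paths but not for a URL embedded in a query value.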
I am trying to use the following command to get the page's source code
requests.get("website").text
But I get an error:
UnicodeEncodeError: 'gbk' codec can't encode character '\xe6' in position 356: illegal multibyte sequence
Then I tried to change the page encoding to utf-8
requests.get("website").text.encode('utf-8')
But then everything except the English text turns into the following:
\xe6°\xb8\xe4\xb9\x85\xe6\x8f\x90\xe4\xbe\x9b\xe5\x85\x8dè\xb4\xb9VPN\xe5\xb8\x90\xe5\x8f·\xe5\x92\x8c\xe5\x85\x8dè\xb4\xb
What can I do?
Thank you for your help
You can query the encoding of the requests.Response object by accessing its Response.encoding attribute.
Whenever you call requests.Response.text, the response object uses requests.Response.encoding to decode the bytes.
This may not always be the correct encoding, however, so you sometimes have to set it manually by looking it up in the content attribute: websites usually declare their encoding there if it is not UTF-8 or similar (this is from experience; I'm not sure whether it is actual standard behavior).
See more on requests.Response contents here
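As a rough sketch of "looking it up in the content", the helper below searches the first bytes of a page for a charset declaration. sniff_charset is a hypothetical name, not part of requests, and the sample bytes stand in for Response.content so the example runs without a network call.

```python
import re

def sniff_charset(raw: bytes, default: str = "utf-8") -> str:
    """Return the charset named in an HTML <meta> tag, or `default`."""
    # Only the head of the document is inspected; the declaration is ASCII.
    head = raw[:2048].decode("ascii", errors="ignore")
    match = re.search(r"charset\s*=\s*[\"']?\s*([\w.:-]+)", head, re.IGNORECASE)
    return match.group(1).lower() if match else default

# Stand-in for `requests.get(url).content` (raw bytes, not yet decoded).
# "\u6c38\u4e45" is the first two characters of the text in the question.
raw = '<html><head><meta charset="utf-8"></head><body>\u6c38\u4e45</body></html>'.encode("utf-8")

text = raw.decode(sniff_charset(raw))

# With requests, the equivalent would be roughly:
#   resp = requests.get(url)
#   resp.encoding = sniff_charset(resp.content)
#   text = resp.text
```

requests also exposes Response.apparent_encoding, a guess made from the bytes themselves, which may be enough when the page declares nothing. Note that the 'gbk' error in the question comes from encoding the text again for output, which is a separate step from decoding the response.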
I have a JSON object whose values include some special characters. For example: {"UserName":"UserTest","Password":"OImqNlK/tLwUzKnt1rA1OA=="}
I pass it as an object parameter when calling a Web API, but it never arrives on the API server.
It works fine if I remove the special characters. For example:
{"UserName":"UserTest","Password":"OImqNlKtLwUzKnt1rA1OA"}
Please help me fix it. Many thanks!
The first result when I put "JSON special characters" into Google: here
Specifically for your question, this is the way:
{"UserName":"UserTest","Password":"OImqNlK\/tLwUzKnt1rA1OA=="}
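If the JSON is being assembled by hand, it may be safer to let a serializer produce it. The sketch below uses Python's json module purely as an illustration (the thread does not say which language the client is written in). It also shows that the escaped "\/" and the plain "/" decode to the same string, so the slash itself may not be what the server rejects.

```python
import json

payload = {"UserName": "UserTest", "Password": "OImqNlK/tLwUzKnt1rA1OA=="}

# json.dumps quotes keys and values correctly; "/" and "=" are left as-is
# because JSON does not require them to be escaped.
body = json.dumps(payload)

# The escaped form "\/" is also legal JSON and decodes to a plain "/".
escaped = '{"UserName":"UserTest","Password":"OImqNlK\\/tLwUzKnt1rA1OA=="}'

print(body)
print(json.loads(escaped) == payload)
```

If the value still gets lost, the transport is worth checking: when JSON travels inside a URL or a form field, characters such as "/" and "=" need URL encoding on top of the JSON encoding.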
I'm inserting untrusted data into the href attribute of an <a> tag.
Based on the OWASP XSS Prevention Cheat Sheet, I should URI encode the untrusted data before inserting it into the href attribute.
But would HTML encoding also prevent XSS in this case? I know that it's an URI context and therefore I should use URI encoding, but are there any security advantages of URI encoding over using HTML encoding in this case?
The browser will render the link properly in both cases as far as I know.
I'm assuming this is Rule #5:
URL Escape Before Inserting Untrusted Data into HTML URL Parameter Values
(Not rule #35.)
This is referring to individual parameter values:
<a href="http://www.example.com?test=...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...">link</a>
URL and HTML encoding protect against different things.
URL encoding prevents a parameter breaking out of a URL parameter context:
e.g. ?firstname=john&lastname=smith&salary=20000
Say this is a back-end request made by an admin user. If john and smith aren't correctly URL encoded then a malicious front-end user might enter their name as john&salary=40000 which would render the URL as
?firstname=john&salary=40000&lastname=smith&salary=20000
and say the back-end application takes the first parameter value in the case of duplicates. The user has successfully doubled their salary. This attack is known as HTTP Parameter Pollution.
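The salary example can be reproduced in a few lines. This is a sketch in Python using urllib.parse; the parameter names come from the example above, and the variable names are illustrative.

```python
from urllib.parse import urlencode, parse_qs

first, last = "john&salary=40000", "smith"

# Naive concatenation lets the "&" in the name start a new parameter.
naive = f"firstname={first}&lastname={last}&salary=20000"

# urlencode percent-encodes each value, so "&" and "=" stay inside firstname.
safe = urlencode({"firstname": first, "lastname": last, "salary": 20000})

naive_params = parse_qs(naive)
safe_params = parse_qs(safe)

print(naive_params["salary"])     # two values: the injected one comes first
print(safe_params["firstname"])   # the whole input is kept as one value
```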
So if you're inserting a parameter into a URL which is then inserted into an HTML document, you technically need to URL encode the parameter, then HTML encode the whole URL. However, if you follow the OWASP recommendation to the letter:
Except for alphanumeric characters, escape all characters with ASCII values less than 256 with the %HH escaping format.
then this will ensure no characters with special meaning to HTML will be output, therefore you can skip the HTML encoding part, making it simpler.
Example - If user input is allowed to build a relative link (to http://server.com/), and javascript:alert(1) is provided by the user.
URL-encoding: <a href="javascript%3Aalert%281%29"> - Link will lead to http://server.com/javascript%3Aalert%281%29
Entity-encoding only: <a href="javascript&#x3A;alert&#x28;1&#x29;"> - The browser decodes the entities before following the link, so a click leads to javascript execution
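The difference between the two encodings can be checked with Python's standard library. This is only a sketch: html.unescape stands in for the entity decoding a browser performs on attribute values, and the hex entities are one possible way an encoder might write the colon and parentheses.

```python
import html
from urllib.parse import quote

user_input = "javascript:alert(1)"

# URL encoding replaces ":" "(" and ")", so no scheme survives.
url_encoded = quote(user_input, safe="")

# Typical HTML escaping only touches & < > and quotes: nothing changes here.
html_encoded = html.escape(user_input, quote=True)

# Even aggressive entity encoding is undone when the attribute is parsed.
entity_encoded = "javascript&#x3A;alert&#x28;1&#x29;"
after_parsing = html.unescape(entity_encoded)

print(url_encoded)
print(html_encoded)
print(after_parsing)
```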
Good day! How are HTML parameters expressed in the routes file? I am trying to pass HTML, but I don't know how. All I know is how to pass integers ((id: Integer)) and a few other data types. I tried (content: Html) and (content: Html). I also tried javax.swing.text.html.HTML, but it says something about QueryStringBindable. Please help me. Thanks a lot.
Remember that everything you pass via route params will be included in the URL, so what is the advantage of using HTML in this place? GET params should use only simple data types such as numeric types, booleans and strings, so you can pass an HTML fragment as a String (preferably URL-encoded, or even better, Base64-encoded).
A much better option is sending it via POST: your URLs won't be terribly long, you won't hit any URL length limit, and after common serialization it won't break on special HTML characters.
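To illustrate the String-plus-Base64 suggestion, here is a small language-neutral sketch written in Python rather than in the framework from the question. The fragment is made up, and the URL-safe Base64 alphabet is an assumption chosen so the token needs little further escaping.

```python
import base64

fragment = "<p>Hello & <b>welcome</b></p>"

# URL-safe Base64 uses "-" and "_" instead of "+" and "/".
token = base64.urlsafe_b64encode(fragment.encode("utf-8")).decode("ascii")

# The receiving side reverses the two steps to get the HTML back.
restored = base64.urlsafe_b64decode(token).decode("utf-8")

print(token)
```

The token can still end in "=" padding, so it is worth URL-encoding it like any other query value.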
As with any user-supplied data, the URLs will need to be escaped and filtered appropriately to avoid all sorts of exploits. I want to be able to:
Put user supplied URLs in href attributes. (Bonus points if I don't get screwed if I forget to write the quotes)
...
Forbid malicious URLs such as javascript: stuff or links to evil domain names.
Allow some leeway for the users. I don't want to raise an error just because they forgot to add an http:// or something like that.
Unfortunately, I can't find any "canonical" solution to this sort of problem. The only thing I could find as inspiration is the encodeURI function from JavaScript, but that doesn't help with my second point, since it just does simple URL encoding and leaves special characters such as : and / alone.
OWASP provides a list of regular expressions for validating user input, one of which is used for validating URLs. This is as close as you're going to get to a language-neutral, canonical solution.
More likely you'll rely on the URL parsing library of the programming language in use. Or, use a URL parsing regex.
The workflow would be something like:
Verify the supplied string is a well-formed URL.
Provide a default protocol such as http: when no protocol is specified.
Maintain a whitelist of acceptable protocols (http:, https:, ftp:, mailto:, etc.)
The whitelist will be application-specific. For an address-book app the mailto: protocol would be indispensable. It's hard to imagine a use case for the javascript: and data: protocols.
Enforce a maximum URL length - ensures cross-browser URLs and prevents attackers from polluting the page with megabyte-length strings. With any luck your URL-parsing library will do this for you.
Encode a URL string for the usage context. (Escaped for HTML output, escaped for use in an SQL query, etc.).
Forbid malicious URLs such as javascript: stuff or links to evil domain names.
You can utilize the Google Safe Browsing API to check a domain for spyware, spam or other "evilness".
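The workflow above could be sketched roughly as follows in Python, relying on the standard library's URL parser. clean_url, ALLOWED_SCHEMES and MAX_URL_LENGTH are hypothetical names, the 2048 limit is an assumed value, and the reputation lookup (Safe Browsing or similar) is left out.

```python
import html
from typing import Optional
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https", "ftp", "mailto"}  # application-specific
MAX_URL_LENGTH = 2048  # assumed limit, tune per application

def clean_url(raw: str) -> Optional[str]:
    """Return a URL escaped for an href attribute, or None if rejected."""
    url = raw.strip()
    if not url or len(url) > MAX_URL_LENGTH:
        return None
    parts = urlsplit(url)
    if not parts.scheme:
        # No protocol given: assume http and parse again.
        url = "http://" + url
        parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        return None
    if parts.scheme != "mailto" and not parts.netloc:
        return None
    # Final step: encode for the output context (HTML attribute here).
    return html.escape(url, quote=True)

print(clean_url("example.com/page"))
print(clean_url("javascript:alert(1)"))
```

The scheme check runs on the parsed, lowercased scheme rather than on the raw string, which is what makes it a whitelist. Inputs such as host:port without a protocol are a known rough edge of this sketch and would need extra handling.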
For the first point, regular attribute encoding works just fine: escape characters into HTML entities. Escaping quotes, the ampersand and brackets is OK if attributes are guaranteed to be quoted. Escaping all other non-alphanumeric characters as well will keep the attribute safe even if it's accidentally left unquoted.
The second point is vague and depends on what you want to do. Just remember to use a whitelist approach instead of a blacklist one, since it's possible to use HTML entity encoding and other tricks to get around most simple blacklists.
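A minimal sketch of the stricter attribute encoding, in Python. escape_attribute is a hypothetical helper, not a library function; it turns every character other than ASCII letters and digits into a numeric entity, so spaces and quotes cannot end an unquoted attribute.

```python
import html

def escape_attribute(value: str) -> str:
    """Encode every non-alphanumeric character as a numeric HTML entity."""
    return "".join(
        ch if ch.isascii() and ch.isalnum() else f"&#x{ord(ch):X};"
        for ch in value
    )

risky = 'x onmouseover="alert(1)"'
encoded = escape_attribute(risky)

# A parser decodes the entities back to the original text, but only as
# attribute data: the space no longer starts a new attribute.
round_trip = html.unescape(encoded)

print(encoded)
```

This only protects the attribute boundary. It does not make a javascript: URL harmless, which is why the scheme whitelist is still needed.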