If I have XML/HTML data to post, we need to encode it to get past the XSS validation. So should we use HTML encoding or URI encoding for this?
If URI encoding is used, will it cause issues, since a form POST automatically URI-encodes all the data before sending it?
XSS is a problem caused by giving tainted data to the client. It can't be solved at the point where data is posted.
To protect against it, HTML encode the data (immediately) before placing it in an HTML document.
Remember: filter input, escape output.
Always filter input before placing it in a database (to avoid SQL injection, etc.).
Escape output before sending it to the client by filtering / encoding any HTML in the dynamic content.
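A minimal sketch of "escape output" in plain JavaScript, for an HTML text or quoted-attribute context. The helper name escapeHtml is hypothetical, not a library API; in practice a templating engine's auto-escaping or a vetted library is usually preferable.

```javascript
// Hypothetical helper: escape the five characters that are special in HTML
// text and quoted attribute values. Apply it at output time, not on input.
function escapeHtml(value) {
  const map = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Untrusted data, e.g. a value that was posted earlier and stored as-is.
const posted = "<script>alert(1)</script>";

// Escaped immediately before it is placed in the HTML document.
const html = "<p>" + escapeHtml(posted) + "</p>";
console.log(html); // <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The data is stored in its original form and only transformed for the context where it is finally written out.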
I have a REST API that accepts and returns JSON data.
A sample request and response are as follows:
Request
{
"repos": [
"some-repo",
"test-repo<script>alert(1)</script>"
]
}
Response
{
"error": "Error Message",
"repos": [
"test-repo<script>alert(1)</script>"
]
}
Is my API vulnerable to XSS? From what I understand, since the Content-Type is set to application/json, the API as such is safe from XSS. The client needs to ensure that the output is encoded to prevent any XSS attacks.
To add an additional layer of security, I can add some input encoding/validation in the API layer.
Please let me know whether my assessment is right, and any other gotchas that I need to be aware of.
I think it's right that any XSS issue here is a vulnerability of the client. If the client inserts HTML into a document, then it is its responsibility to apply any necessary encoding.
The client knows what encoding is required, not the server. Different encoding, or no encoding, may be needed in different places for the same data. For example:
If a client did something like:
$(div).html("<b>" + repos + "</b>");
then it would be vulnerable to XSS, so repos would need to be HTML encoded here.
But if it did something like:
$(div).append($("<b>").text(repos));
then HTML encoding would have resulted in HTML entity codes being wrongly displayed to the user.
Or if the client wanted to do some processing of the data, it may want the plaintext data first to do the processing, and then encode it later to output it.
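A DOM-free sketch of that point in plain JavaScript; the jQuery calls appear only in comments, since this runs outside a browser. escapeHtml is a hypothetical helper, not a jQuery or library API. The raw value is escaped exactly once, by the client, when it is concatenated into markup; if the server had already encoded it, a text sink would show entity codes and a markup sink would encode it twice.

```javascript
// Hypothetical helper for HTML text contexts (not a library API).
function escapeHtml(value) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

const repos = "test-repo<script>alert(1)</script>";

// Case 1: markup built by concatenation, as with
// $(div).html("<b>" + repos + "</b>"): the client must escape the value.
const markup = "<b>" + escapeHtml(repos) + "</b>";

// Case 2: a text sink, as with $("<b>").text(repos): the browser shows the
// string exactly as given, so it needs the raw value. A value pre-encoded by
// the server would be displayed to the user like this:
const displayedIfPreEncoded = escapeHtml(repos);

// Case 3: the server pre-encodes and the client also escapes for markup.
// The value is encoded twice, so the page shows "&lt;" instead of "<".
const doubleEncoded = escapeHtml(escapeHtml(repos));

console.log(markup);
console.log(displayedIfPreEncoded);
console.log(doubleEncoded);
```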
Input validation can help too, but the rules for what is valid input may not align with what is safe to use without encoding. Things like ampersands, quotes and brackets can appear in valid text data too. But if your data can't contain these characters, you can reject the input as invalid.
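A minimal sketch of such input validation for the repo-name example, assuming (hypothetically) that repository names may only contain letters, digits, dots, underscores and hyphens. The pattern and the function name invalidRepoNames are illustrations, not rules from any real API, and they complement output encoding rather than replace it.

```javascript
// Hypothetical allowlist: the real rule depends on what a repo name may contain.
const REPO_NAME = /^[A-Za-z0-9._-]{1,100}$/;

// Returns the names that fail validation, so the API can reject the request
// (e.g. with a 400 response) instead of storing or echoing them.
function invalidRepoNames(repos) {
  return repos.filter((name) => typeof name !== "string" || !REPO_NAME.test(name));
}

const request = { repos: ["some-repo", "test-repo<script>alert(1)</script>"] };
const rejected = invalidRepoNames(request.repos);
console.log(rejected);
```

If names legitimately could contain characters like & or <, this kind of allowlist would not apply, and the client-side encoding described above is still needed.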
Your API will not be vulnerable to XSS unless it also provides a UI that consumes it. It will be the clients of your API which could be vulnerable - they will need to make sure they correctly encode any data from your API before they use it in any UI.
I think your API is vulnerable, even though the script may not be executed. When it comes to XSS prevention, we always recommend encoding/validating dangerous characters when the API handles input. A common requirement is to "clean the data before it is stored in the DB".
In your situation, the API just returns JSON, but we don't know where the data in it will be used. Frontends usually accept the data without any encoding/validation, and if so, there will be an XSS.
You say that the client should encode the data it gets from your API. Yes, I agree, but frontends tend to "trust" the backend, so they won't encode the data. The API server, however, should not trust any input from the frontend, because it can be controlled or changed by (malicious) users.
I'm inserting untrusted data into the href attribute of an <a> tag.
Based on the OWASP XSS Prevention Cheat Sheet, I should URI encode the untrusted data before inserting it into the href attribute.
But would HTML encoding also prevent XSS in this case? I know that it's a URI context and therefore I should use URI encoding, but are there any security advantages of URI encoding over HTML encoding in this case?
The browser will render the link properly in both cases as far as I know.
I'm assuming this is Rule #5:
URL Escape Before Inserting Untrusted Data into HTML URL Parameter Values
(Not rule #35.)
This is referring to individual parameter values:
<a href="http://www.example.com?test=...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...">link</a >
URL and HTML encoding protect against different things.
URL encoding prevents a parameter breaking out of a URL parameter context:
e.g. ?firstname=john&lastname=smith&salary=20000
Say this is a back-end request made by an admin user. If john and smith aren't correctly URL encoded, a malicious front-end user might enter their first name as john&salary=40000, which would render the URL as
?firstname=john&salary=40000&lastname=smith&salary=20000
and say the back-end application takes the first parameter value in the case of duplicates. The user has successfully doubled their salary. This attack is known as HTTP Parameter Pollution.
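The salary example can be reproduced in JavaScript with the WHATWG URLSearchParams parser (available in browsers and Node.js), whose get() returns the first value when a name is repeated, matching the "first value wins" back end assumed above. encodeURIComponent is the standard built-in for escaping a single parameter value.

```javascript
// Value typed by a malicious front-end user.
const firstname = "john&salary=40000";

// Naive concatenation: the "&" and "=" in the value become URL syntax.
const naive = "?firstname=" + firstname + "&lastname=smith&salary=20000";

// Encoded: the whole value stays inside the firstname parameter.
const encoded =
  "?firstname=" + encodeURIComponent(firstname) + "&lastname=smith&salary=20000";

// URLSearchParams.get() returns the first value when a name is repeated.
const naiveSalary = new URLSearchParams(naive).get("salary");
const encodedSalary = new URLSearchParams(encoded).get("salary");
const encodedName = new URLSearchParams(encoded).get("firstname");

console.log(naiveSalary, encodedSalary, encodedName); // 40000 20000 john&salary=40000
```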
So if you're inserting a parameter into a URL which is then inserted into an HTML document, you technically need to URL encode the parameter, then HTML encode the whole URL. However, if you follow the OWASP recommendation to the letter:
Except for alphanumeric characters, escape all characters with ASCII values less than 256 with the %HH escaping format.
then this will ensure no characters with special meaning to HTML will be output, therefore you can skip the HTML encoding part, making it simpler.
Example: suppose user input is allowed to build a relative link (relative to http://server.com/), and the user provides javascript:alert(1).
URL-encoding: <a href="javascript%3Aalert%281%29"> - Link will lead to http://server.com/javascript%3Aalert%281%29
Entity-encoding only: <a href="javascript&#58;alert&#40;1&#41;"> - The browser decodes the entities, so clicking the link executes the JavaScript.
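A sketch of the OWASP rule quoted above in JavaScript, percent-encoding every byte that is not an ASCII letter or digit. Two assumptions: the value is first encoded as UTF-8 (the usual choice for URLs), and the helper name escapeUrlParam is hypothetical. The built-in encodeURIComponent is slightly less strict than this rule, since it leaves characters such as ( ) ' ! * unescaped.

```javascript
// Hypothetical helper implementing "escape everything except alphanumerics
// with %HH", applied to the UTF-8 bytes of the value.
function escapeUrlParam(value) {
  const bytes = new TextEncoder().encode(String(value));
  let out = "";
  for (const b of bytes) {
    const isAlnum =
      (b >= 0x30 && b <= 0x39) || // 0-9
      (b >= 0x41 && b <= 0x5a) || // A-Z
      (b >= 0x61 && b <= 0x7a); // a-z
    out += isAlnum
      ? String.fromCharCode(b)
      : "%" + b.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

const userInput = "javascript:alert(1)";
const escaped = escapeUrlParam(userInput);

// The output contains only alphanumerics and "%", so nothing in it has a
// special meaning in HTML and it can go straight into the href attribute.
const link = '<a href="' + escaped + '">link</a>';
console.log(link); // <a href="javascript%3Aalert%281%29">link</a>
```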
Is it necessary to percent-encode a URI before using it in the browser? That is, when we type a URI into a browser, should it already be percent-encoded, or is it the browser's responsibility to encode the URI before sending the request to the server?
You'll find that most modern browsers will accept a non-encoded URL and they will generally be able to encode reserved characters themselves.
However, it is bad practice to rely on this because you can end up with unpredictable results. For instance, if you were sending form data to a server using a GET request and someone had typed in a # symbol, the browser would interpret it differently depending on whether it was encoded: an unencoded # starts the URL fragment, which is not sent to the server.
In short, it's always best to encode data manually to get predictable results if you're expecting reserved characters in a request. Fortunately, most programming languages used on the web have built-in functions for this.
Just to add, you don't need to encode the whole URL - it's usually the data you're sending in a GET request which gets encoded. For example:
http://www.foo.com?data=This%20is%20my%20encoded%20string%20%23
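In JavaScript, for example, the built-in encodeURIComponent encodes a single value, and the WHATWG URL parser shows what happens to an unencoded #. The host www.foo.com is the one from the example above.

```javascript
const data = "This is my encoded string #";

// Encode only the data value; the scheme, host and "?data=" stay as written.
const encodedUrl = "http://www.foo.com?data=" + encodeURIComponent(data);

// Without encoding, "#" starts the fragment, which is never sent to the server.
const rawUrl = "http://www.foo.com?data=" + data;

const fromEncoded = new URL(encodedUrl).searchParams.get("data");
const fromRaw = new URL(rawUrl).searchParams.get("data");

console.log(encodedUrl); // http://www.foo.com?data=This%20is%20my%20encoded%20string%20%23
console.log(fromEncoded); // This is my encoded string #
console.log(JSON.stringify(fromRaw)); // the "#" is missing from this value
```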
So there's a small debate on my team, and I'm sure this is answered in many places, but I couldn't find any definitive answers.
Right now we have a server that tosses up JSON data (think REST, sorta). The client is a complete JavaScript client that uses $.ajax to grab the data and render it appropriately.
The client is using UnderscoreJS templates to render data within the HTML:
<%- something %>
So if the server sends down a JSON block (not HTML-encoded):
{
"username": "Joe's Crab & Cookies"
}
Should the server be HTML or JavaScript encoding this value? Or should that still be left up to the client?
What if a bit of data from the server needs to be an attribute of an element:
<li data-item-id="<%= userId %>">something</li>
I realize that I shouldn't need to encode anything that's generated by the server; the concern is data that is entered by the user. So imagine the "userId" above being set by a user, not generated.
So if we encode on both the server and the client, the rendered page shows the HTML entity codes (such as &amp;) as literal text instead of:
Joe's Crab & Cookies
If you're sending JSON data somewhere, the only encoding that should be done to it is JSON encoding. You don't necessarily know whether the values are going to end up in SQL, JavaScript, XML, HTML attributes, a WinForms app, etc.
On the other hand, if some of your JSON values are meant to contain HTML, that value should be properly encoded, ready-to-display HTML. It depends on context.
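A small JavaScript illustration that JSON encoding alone is lossless: JSON.stringify escapes only what JSON syntax requires, so the apostrophe and ampersand come back unchanged after JSON.parse, and each consumer can then apply whatever encoding its own output context needs.

```javascript
const payload = { username: "Joe's Crab & Cookies" };

// JSON encoding escapes only what JSON syntax requires (", \ and control
// characters); the apostrophe and ampersand are left as they are.
const body = JSON.stringify(payload);
console.log(body); // {"username":"Joe's Crab & Cookies"}

// The client gets back exactly the original text and decides how to encode it
// (HTML text, attribute, URL, SQL parameter, ...) at the point where it is used.
const received = JSON.parse(body);
console.log(received.username === payload.username); // true
```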
First of all, you should be escaping the value if it can be set by a user.
You should use as much escaping and validation as possible -- both on the input fields, when capturing the data on the server, when inputting into DB and, finally, when rendering it back.
One of the milder consequences of not escaping the data is that it can break your HTML when you output it into data-item-id.
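A sketch of that breakage for the data-item-id example, assuming a user-supplied userId that contains a double quote. The escapeHtml helper is hypothetical, similar in purpose to Underscore's <%- %>; raw interpolation, which is what <%= %> does, lets the quote close the attribute early.

```javascript
// Hypothetical helper: escape characters that are special in HTML attributes.
function escapeHtml(value) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

// A userId chosen by the user rather than generated by the server.
const userId = '42" onmouseover="alert(1)';

// Raw interpolation: the quote ends the attribute and injects a handler.
const unsafe = '<li data-item-id="' + userId + '">something</li>';

// Escaped interpolation: the whole value stays inside data-item-id.
const safe = '<li data-item-id="' + escapeHtml(userId) + '">something</li>';

console.log(unsafe); // <li data-item-id="42" onmouseover="alert(1)">something</li>
console.log(safe); // <li data-item-id="42&quot; onmouseover=&quot;alert(1)">something</li>
```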
I am designing a Web API which returns JSON as the content type. The response body could contain characters like ', ", < and >, which are valid characters in JSON. So my question is: should I HTML-encode my Web API response body, or should I leave this task to the HTML client that consumes my Web API?
No; you must not.
You must only escape data if and when you concatenate it into a structured format.
If you return JSON like { "text": "Content by X &amp; Y" }, anyone who reads that JSON will see the literal text &amp;.
It will only work correctly for extremely broken clients who concatenate it directly into their HTML without escaping.
In short:
Never escape text except when you're about to display it
What platform are you using? On Node.js, for example, you can use restify, which handles this well; you don't need to encode the data explicitly. So look for a RESTful framework or component to help you out.