So there's a small debate on my team, and I'm sure this is answered in many places, but I couldn't find any definitive answers.
Right now we have a server that tosses up JSON data (think REST, sorta). The client is a complete JavaScript client that uses $.ajax to grab the data and render it appropriately.
The client is using UnderscoreJS templates to render data within the HTML:
<%- something %>
So if the server sends down a JSON block (not HTML-encoded):
{
"username": "Joe's Crab & Cookies"
}
Should the server be HTML or JavaScript encoding this value? Or should that still be left up to the client?
What if a bit of data from the server needs to be an attribute of an element:
<li data-item-id="<%= userId %>">something</li>
I realize that I shouldn't need to encode anything generated by the server; the concern is data entered by the user. So imagine the "userId" above being set by a user, not generated.
So if we encode on both the server and the client, the rendered page shows the entity codes instead of the text, for example:
Joe&#39;s Crab &amp; Cookies
If you're sending json data somewhere, the only encoding that should be done to it is json encoding. You don't necessarily know if the values are going to end up in sql, javascript, xml, html attributes, a winforms app, etc.
Now, on the other hand, if some of your json values were to contain html, that html value should be encoded, ready-to-display html. It depends on context.
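A minimal sketch of that division of labour in plain JavaScript (runnable in Node.js): the server only JSON-encodes the value, and the client escapes it for the HTML context at the moment of output. escapeHtml is a hypothetical helper written for this example; it is roughly what Underscore's <%- %> tag does (via _.escape), not a replacement for it.

```javascript
// Hypothetical helper: HTML-escape a value for element content.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first, or the entities added below get escaped again
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#x27;");
}

// Server side: plain JSON encoding, no HTML encoding.
const wire = JSON.stringify({ username: "Joe's Crab & Cookies" });

// Client side: parse, then escape only where the value is written into HTML.
const data = JSON.parse(wire);
const html = "<span>" + escapeHtml(data.username) + "</span>";

console.log(wire); // {"username":"Joe's Crab & Cookies"}
console.log(html); // <span>Joe&#x27;s Crab &amp; Cookies</span>
```

Escaping twice (once on the server, once in the template) is exactly what produces the visible entity codes in the question: escapeHtml(escapeHtml("&")) yields "&amp;amp;".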
First of all, you should be escaping the value, if it can be set by a user.
You should use as much escaping and validation as possible: on the input fields, when capturing the data on the server, when inserting it into the DB and, finally, when rendering it back.
Among the mild consequences of not escaping data is that it can break your HTML when you output it into data-item-id.
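A sketch of how an unescaped value breaks out of data-item-id, assuming a made-up malicious userId. In Underscore templates, <%= %> interpolates without escaping while <%- %> HTML-escapes, so the question's <li data-item-id="<%= userId %>"> corresponds to the unsafe string below. escapeAttr is a hypothetical helper.

```javascript
// Hypothetical helper: escape a value for use inside a double-quoted attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#x27;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// A user-controlled id (made up for this example).
const userId = '42" onmouseover="alert(1)';

// Unescaped, like <%= userId %>: the quote closes the attribute early and
// injects an event-handler attribute.
const unsafe = '<li data-item-id="' + userId + '">something</li>';

// Escaped, like <%- userId %>: the whole value stays inside the attribute.
const safe = '<li data-item-id="' + escapeAttr(userId) + '">something</li>';

console.log(unsafe); // <li data-item-id="42" onmouseover="alert(1)">something</li>
console.log(safe);   // <li data-item-id="42&quot; onmouseover=&quot;alert(1)">something</li>
```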
I have a REST API that accepts and returns JSON data.
A sample request and response are as follows:
Request
{
"repos": [
"some-repo",
"test-repo<script>alert(1)</script>"
]
}
Response
{
"error": "Error Message",
"repos": [
"test-repo<script>alert(1)</script>"
]
}
Is my API vulnerable to XSS? From what I understand, since the Content-Type is set to application/json, the API as such is safe from XSS. The client needs to ensure that the output is encoded to prevent any XSS attacks.
To add an additional layer of security, I can add some input encoding/validation in the API layer.
Please let me know if my assessment is right, and of any other gotchas that I need to be aware of.
I think it's right that any XSS issue here is a vulnerability of the client. If the client inserts HTML into a document, then it is its responsibility to apply any necessary encoding.
The client knows what encoding is required, not the server. Different encoding, or no encoding, may be needed in different places for the same data. For example:
If a client did something like:
$(div).html("<b>" + repos + "</b>");
then it would be vulnerable to XSS, so repos would need to be HTML encoded here.
But if it did something like:
$(div).append($("<b>").text(repos));
then HTML encoding would have resulted in HTML entity codes being wrongly displayed to the user.
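The same two cases as plain strings, so the sketch runs in Node.js without a DOM. escapeHtml is a hypothetical helper, not a jQuery API; in a browser, the first string is what .html() would parse, and the last one is what the user would see if already-encoded data were passed to .text().

```javascript
// Hypothetical helper: HTML-escape a value for element content.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#x27;");
}

const repos = ["test-repo<script>alert(1)</script>"];

// Case 1: markup built by string concatenation, as passed to $(div).html(...).
const vulnerable = "<b>" + repos + "</b>"; // the <script> tag survives as markup
const encoded = "<b>" + escapeHtml(repos) + "</b>"; // the tag becomes inert text

// Case 2: $(div).append($("<b>").text(repos)) inserts the value as text, so it needs
// the raw value. If the API had HTML-encoded it already, .text() would display the
// entity codes literally:
const shownIfPreEncoded = escapeHtml(repos[0]);

console.log(vulnerable);        // <b>test-repo<script>alert(1)</script></b>
console.log(encoded);           // <b>test-repo&lt;script&gt;alert(1)&lt;/script&gt;</b>
console.log(shownIfPreEncoded); // test-repo&lt;script&gt;alert(1)&lt;/script&gt;
```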
Or if the client wanted to do some processing of the data, it may want the plaintext data first to do the processing, and then encode it later to output it.
Input validation can help too, but the rules for what is valid input may not align with what is safe to use without encoding. Things like ampersands, quotes and brackets can appear in valid text data too. But if your data can't contain these characters, you can reject the input as invalid.
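A sketch of allow-list validation for the repos field, in Node.js. The rule (letters, digits, ".", "_", "-", 1 to 100 characters) and the validateRepos name are assumptions for illustration, not rules from the original API; it rejects the sample payload, but clients still need to encode on output for fields whose valid values can contain special characters.

```javascript
// Assumed naming rule for repositories; adjust to whatever your API really allows.
const REPO_NAME = /^[A-Za-z0-9._-]{1,100}$/;

// Hypothetical validator: returns which entries fail the allow-list.
function validateRepos(body) {
  if (!body || !Array.isArray(body.repos)) {
    return { ok: false, invalid: [] };
  }
  const invalid = body.repos.filter(
    (name) => typeof name !== "string" || !REPO_NAME.test(name)
  );
  return { ok: invalid.length === 0, invalid };
}

const result = validateRepos({
  repos: ["some-repo", "test-repo<script>alert(1)</script>"],
});
// result.ok is false; the script-bearing name is listed in result.invalid,
// so the API can answer 400 instead of echoing it back.
console.log(result.ok, result.invalid);
```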
Your API will not be vulnerable to XSS unless it also provides a UI that consumes it. It will be the clients of your API which could be vulnerable - they will need to make sure they correctly encode any data from your API before they use it in any UI.
I think your API is vulnerable, even though the script may not be executed by the API itself. For XSS prevention, we always suggest encoding or validating dangerous characters when the API handles the input. A common requirement is to "clean the data before it is stored in the DB".
In your situation, the API just returns JSON, but we don't know where the data in that JSON will be used. Usually the frontend accepts the data without any encoding or validation, and if so, there will be an XSS.
You say that the client should encode the data it gets from your API. Yes, I agree, but the frontend usually "trusts" the backend and so won't encode the data. The API server, however, should not trust any input from the frontend, because it can be controlled or changed by (malicious) users.
I have a situation where I have to write an API to create a resource, and among the data fields I need to accept is a string that is basically the contents of an HTML file. As I see it, I have a choice between structuring the entire thing as a JSON object where this field is a string field holding a URL-encoded HTML string, and using the Content-Type multipart/form-data, where each of the fields and the HTML string (UTF-8 encoded) is a part in the message.
I am not comfortable with dropping JSON: I feel it violates REST standards not to structure the content of the entity I am about to create, so there is a loss of information for consumers, who can't tell immediately from my API definition what data to feed it. But in practice multipart/form-data handles things like HTML file content better and more efficiently, as I will not have to URL-encode it and can also control the character encoding.
What would be the better approach in this context while upholding RESTful principles? Are there other trade-offs I should be aware of? What about parsing JSON with a huge string field (~200 KB) embedded?
EDIT: I was reading some similar questions on SO, and one approach that stood out was a two-step approach: a first call with the metadata creates the entity, and then the file is uploaded as an UPDATE to the created entity using multipart/form-data. In that context, I guess what I am asking is how sound an approach is where I send both the metadata and the file in a single API call as multipart data, where each metadata field is a part in the multipart message, as is the file.
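For comparison, a Node.js sketch of the JSON option: HTML content can go into an ordinary JSON string field without URL-encoding, because JSON.stringify escapes only a few characters, such as double quotes, backslashes and control characters like newlines. The field names title and content, and the generated ~180 KB document, are made up for illustration; whether parsing a string of that size is fast enough on your server is something to measure rather than assume.

```javascript
// Made-up HTML document of roughly 180 KB, including quotes, "&" and a non-ASCII char.
const html =
  '<!DOCTYPE html>\n<html><body><p class="x">Caf\u00e9 & more</p></body></html>\n'.repeat(2500);

// Embed it as a plain JSON string field: no URL encoding involved.
const payload = JSON.stringify({ title: "landing page", content: html });

// JSON escaping is reversible: parsing gives back exactly the same string.
const roundTripped = JSON.parse(payload).content;

console.log(roundTripped === html); // true
console.log(Buffer.byteLength(html, "utf8"), Buffer.byteLength(payload, "utf8"));
```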
The canonical way to upload files to a REST API is using multipart/form-data. As the W3C recommendation says:
The content type "multipart/form-data" should be used for submitting forms that contain files, non-ASCII data, and binary data.
Multipart/form-data has advantages over base64 for representing binary data. It sticks to the REST/HTTP philosophy and simplifies the development of API clients.
Returning values from Forms: multipart/form-data (W3C Recommendation)
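A minimal Node.js sketch of the single-call variant from the question's EDIT: one multipart/form-data body carrying both metadata fields and the HTML file, laid out as described in RFC 7578. buildMultipart, the field names, the file name and the boundary are all hypothetical, and the sketch does not escape quotes or guard against the boundary appearing in the content; in practice an HTTP client library or the browser's FormData builds this for you.

```javascript
// Hypothetical builder: each metadata field and the file become separate parts.
function buildMultipart(boundary, fields, file) {
  const CRLF = "\r\n";
  let body = "";
  for (const [name, value] of Object.entries(fields)) {
    body += "--" + boundary + CRLF;
    body += 'Content-Disposition: form-data; name="' + name + '"' + CRLF + CRLF;
    body += value + CRLF;
  }
  // The file part carries its own filename and Content-Type (including charset).
  body += "--" + boundary + CRLF;
  body += 'Content-Disposition: form-data; name="' + file.field + '"; filename="' + file.filename + '"' + CRLF;
  body += "Content-Type: " + file.contentType + CRLF + CRLF;
  body += file.content + CRLF;
  body += "--" + boundary + "--" + CRLF; // closing delimiter
  return body;
}

const boundary = "sketchBoundary7MA4YWxk";
const body = buildMultipart(
  boundary,
  { title: "landing page", owner: "joe" },
  { field: "content", filename: "page.html", contentType: "text/html; charset=UTF-8", content: "<p>Hello</p>" }
);
// Sent with the header: Content-Type: multipart/form-data; boundary=sketchBoundary7MA4YWxk
console.log(body);
```

The metadata parts remain individually named, so an API definition can still list them field by field, which speaks to the question's worry about consumers not knowing what to send.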
The good practice is to use multipart/form-data whenever files are uploaded to the server along with database fields. Do not send a base64 JSON string as the request to your REST API: base64 inflates the payload by about a third, and mishandled encoding or decoding can corrupt the file, both of which can degrade the performance and reliability of your application.
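A quick Node.js check of the size cost of base64 inside JSON: base64 produces 4 output characters for every 3 input bytes, so a 300,000-byte file becomes a 400,000-character string. The file contents and field names are made up. A correct decode restores the bytes exactly, so the main costs are payload size and the extra encode/decode work.

```javascript
// Stand-in for an uploaded binary file: 300,000 bytes (content is arbitrary).
const file = Buffer.alloc(300000, 0xab);

// Embedding it in JSON requires a text encoding such as base64.
const asBase64 = file.toString("base64");
const jsonBody = JSON.stringify({ name: "photo.jpg", data: asBase64 });

// 4 characters per 3 bytes: 4 * (300000 / 3) = 400000.
console.log(asBase64.length); // 400000
console.log((asBase64.length / file.length).toFixed(2)); // 1.33

// A correct decode round-trips the bytes exactly.
const restored = Buffer.from(JSON.parse(jsonBody).data, "base64");
console.log(restored.equals(file)); // true
```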
As far as documenting a multipart/form-data REST API for your consumers is concerned, you have to require your API consumers to use the same form fields that you have predefined in your web service.
Returning Values from Forms: multipart/form-data
I started using FormData objects everywhere on the client-side, in lieu of regular form input fields, for dynamic REST posts. FormData is presented in a positive light in various tutorials, so I went with it.
However, down the line, this caused me problems when decoding the form data into my Go structs. FormData objects are sent as "multipart/form-data" (regardless of files being sent) and I believe my decoder in Go didn't convert the raw data back to string form. Eventually my SQL queries were throwing panics, as hex data was being sent in where strings should have been.
So with some adjustment, I could use FormData however I've decided to revert to the simple universal recommendation: Use "multipart/form-data" only for special cases like when sending files. Otherwise, just use regular "application/x-www-form-urlencoded".
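A sketch of the "regular" alternative on the client: URLSearchParams serializes plain fields as application/x-www-form-urlencoded, and in a browser fetch(url, { method: "POST", body: params }) sets that Content-Type automatically. The field names are hypothetical. On the Go side, the standard library's (*http.Request).ParseForm decodes this format back into plain strings.

```javascript
// Hypothetical form fields, sent as application/x-www-form-urlencoded instead of FormData.
const params = new URLSearchParams({ name: "Joe's Crab & Cookies", qty: "3" });

// Spaces become "+", other reserved characters are percent-encoded.
const body = params.toString();
console.log(body); // name=Joe%27s+Crab+%26+Cookies&qty=3

// Parsing the same format gives the original strings back.
const parsed = new URLSearchParams(body);
console.log(parsed.get("name")); // Joe's Crab & Cookies
```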
A REST API can have arguments in several places:
1. In the request body - As part of a json body, or other MIME type
2. In the query string - e.g. /api/resource?p1=v1&p2=v2
3. As part of the URL-path - e.g. /api/resource/v1/v2
What are the best practices and considerations of choosing between 1 and 2 above?
2 vs 3 is covered in a separate question.
Usually the content body is used for the data that is to be uploaded/downloaded to/from the server and the query parameters are used to specify the exact data requested. For example when you upload a file you specify the name, mime type, etc. in the body but when you fetch list of files you can use the query parameters to filter the list by some property of the files. In general, the query parameters are property of the query not the data.
Of course this is not a strict rule - you can implement it in whatever way you find more appropriate/working for you.
You might also want to check the Wikipedia article about the query string, especially the first two paragraphs.
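A sketch of that convention in JavaScript: query parameters describe which data you want, while the body carries the data itself. The endpoint https://api.example.com/api/files and the field names are hypothetical; the functions only build request descriptions and send nothing.

```javascript
// Listing/filtering: the filters are properties of the query, so they go in the URL.
function listFilesRequest(filters) {
  const url = new URL("https://api.example.com/api/files"); // hypothetical endpoint
  for (const [key, value] of Object.entries(filters)) {
    url.searchParams.set(key, value);
  }
  return { method: "GET", url: url.toString() };
}

// Uploading: name, MIME type and content are the data, so they go in the body.
function uploadFileRequest(file) {
  return {
    method: "POST",
    url: "https://api.example.com/api/files",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ name: file.name, mimeType: file.mimeType, content: file.content }),
  };
}

console.log(listFilesRequest({ type: "text/plain", owner: "joe" }).url);
// https://api.example.com/api/files?type=text%2Fplain&owner=joe
console.log(uploadFileRequest({ name: "notes.txt", mimeType: "text/plain", content: "hi" }).body);
```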
I'll assume you are talking about POST/PUT requests. Semantically the request body should contain the data you are posting or patching.
The query string, as part of the URL (a URI), is there to identify which resource you are posting or patching.
You asked for best practices; the following semantics are mine. Of course, using your rules of thumb should work, especially if the web framework you use abstracts this into parameters.
You should know:
Some web servers have limits on the length of the URI.
You can send parameters inside the request body with curl.
Where you send the data shouldn't have effect on debugging.
The following are my rules of thumb...
When to use the body:
When the arguments don't have a flat key:value structure
If the values are not human readable, such as serialized binary data
When you have a very large number of arguments
When to use the query string:
When the arguments are such that you want to see them while debugging
When you want to be able to call them manually while developing the code e.g. with curl
When arguments are common across many web services
When you're already sending a different content-type such as application/octet-stream
Notice you can mix and match: put the common ones, the ones that should be debuggable, in the query string, and throw all the rest in the JSON.
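A sketch of that mix-and-match rule in JavaScript: a small, assumed set of common or debuggable keys goes into the query string, and everything else (including nested, non-flat values) goes into a JSON body. QUERY_KEYS, splitArgs and the /api/resource path are hypothetical.

```javascript
// Assumption: these keys are common across services and handy to see in logs/curl.
const QUERY_KEYS = new Set(["version", "dryRun"]);

function splitArgs(path, args) {
  const query = new URLSearchParams();
  const body = {};
  for (const [key, value] of Object.entries(args)) {
    if (QUERY_KEYS.has(key)) {
      query.set(key, String(value)); // flat, human-readable values only
    } else {
      body[key] = value; // everything else stays structured in JSON
    }
  }
  const qs = query.toString();
  return { url: qs ? path + "?" + qs : path, body: JSON.stringify(body) };
}

const req = splitArgs("/api/resource", {
  version: 2,
  dryRun: true,
  items: [{ id: 1, tags: ["a", "b"] }], // nested: not a flat key:value pair
});
console.log(req.url);  // /api/resource?version=2&dryRun=true
console.log(req.body); // {"items":[{"id":1,"tags":["a","b"]}]}
```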
The reasoning I've always used is this: because POST, PUT, and PATCH presumably have payloads containing information that customers might consider proprietary, the best practice is to put all payloads for those methods in the request body and not in the URL parameters. It's very likely that somewhere, somehow, URL text is being logged by your web server, and you don't want customer data getting splattered as plain text into your log filesystem.
That potential exposure via the URL isn't an issue for GET or DELETE or any of the other REST operations.
Which is faster: returning JSON via Ajax and then processing the JSON response to render the HTML, or having the Ajax response return the raw HTML as a bunch of <li></li>'s?
Depends. In both cases, the server is simply returning a response with text. If the JSON version of the response requires more characters than the HTML version, that response will take longer to be transmitted back to the client, and vice versa.
But of course there is also the server-side script which must do its work. Perhaps in your case generating JSON is faster than HTML from your server-side script. No way for me to know.
And then there is the client-side processing. You'd have to parse the response to turn it into a true object, and then you'd need to iterate over the resulting object in order to generate the HTML. This will definitely take longer than just taking an HTML response and injecting it into the DOM.
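A sketch of that client-side work for the JSON option in JavaScript: parse the response, iterate, and generate the <li> markup (escaping each value). The response shape (an array of objects with a name field) and escapeHtml are assumptions; with the HTML option, the server does this step and the client only injects the result.

```javascript
// Hypothetical helper: HTML-escape a value for element content.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#x27;");
}

function renderList(jsonText) {
  const items = JSON.parse(jsonText); // step 1: turn the text into objects
  return items // step 2: iterate and generate the HTML
    .map((item) => "<li>" + escapeHtml(item.name) + "</li>")
    .join("");
}

const jsonResponse = '[{"name":"Apples"},{"name":"Fish & Chips"}]';
console.log(renderList(jsonResponse)); // <li>Apples</li><li>Fish &amp; Chips</li>
```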
However, I doubt that the performance difference will be noticeable, meaning that your decision about providing a JSON response vs. HTML response should be based on other factors.
As already mentioned, that depends. From a server-side point of view it makes a lot of sense to let the client generate the HTML, because just serializing JSON is faster and takes a lot of strain off the server, which doesn't have to deal with all the HTML generation. An additional benefit is that, when returning JSON, you offer an API that can be used for more than just outputting HTML.
If you want to take the work off the client it makes sense to generate the HTML on the server side.
In the end the speed of it depends a lot on the technologies used. Both ways can perform extremely well but when done wrong either one will be slow.
Here I produced the same response as both HTML and JSON (the original screenshots of the timings are not included). The JSON response weighs about half as many kilobytes as the HTML response, which means a faster server-side response. But in this case you have to rebuild the HTML from the JSON, so let's measure the rebuild time as well.
In the server-response comparison, the HTML version took more time.
Next, appending the result to the HTML document:
Again, the HTML processing took longer than the JSON processing.
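A rough JavaScript sketch for reproducing the size part of that comparison: the same list serialized as JSON and as pre-rendered <li> markup. The item names and the class attribute are made up; real ratios depend entirely on your markup, and timings should be measured in your own browser and server.

```javascript
// Made-up data; the same three items in both formats.
const names = ["Apples", "Bananas", "Cherries"];

const asJson = JSON.stringify(names);
const asHtml = names.map((n) => '<li class="item">' + n + "</li>").join("");

console.log(asJson.length, asHtml.length); // 31 87
```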
RoR 3 automagically sanitizes ERB templates (when done correctly). However, I've got a little project where I'm using RoR for the application tier only and JavaScript for the presentation. So a typical request is an Ajax call to a Rails route, followed by rendering the returned JSON. The issue is that it is currently possible for me to inject JS: I can create a new product with the title <script>alert('hello')</script>, and this is returned as-is on the next request, and the browser happily interprets the script.
Is it best to
sanitize the inputs on post?
sanitize the json response on the server? (override to_json?)
sanitize the json response on the client?
I appreciate any input.
You should encode HTML entities on the content client-side as you're appending data to the page.
The question is, do your other product fields intentionally contain markup like links or paragraph tags that would also be encoded? If this is the case, and you intend to render some parts of the JSON response as HTML on the page, then you should be sanitizing your input at the point when new products are created: limit the HTML tags you allow to a specific subset, and then scrub away their attributes. There are libraries to automate this process, like the sanitize gem.
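A toy JavaScript sketch of the allow-list idea: escape everything, then restore only a few bare tags with no attributes. It is not a substitute for a maintained sanitizer such as the sanitize gem mentioned above; ALLOWED_TAGS and sanitizeToy are assumptions, and the output can contain unbalanced tags.

```javascript
// Hypothetical helper: HTML-escape a value for element content.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#x27;");
}

// Assumed allow-list of harmless formatting tags.
const ALLOWED_TAGS = ["b", "i", "em", "strong", "p"];
const allowedPattern = new RegExp("&lt;(/?)(" + ALLOWED_TAGS.join("|") + ")&gt;", "g");

function sanitizeToy(input) {
  // Only an exact "<tag>" or "</tag>" is restored; a tag carrying attributes
  // (onclick=..., href=...) or any other tag name stays escaped as plain text.
  return escapeHtml(input).replace(allowedPattern, "<$1$2>");
}

console.log(sanitizeToy('<p>Great <b>deal</b></p><script>alert("hello")</script>'));
// <p>Great <b>deal</b></p>&lt;script&gt;alert(&quot;hello&quot;)&lt;/script&gt;
console.log(sanitizeToy('<b onclick="x()">hi</b>'));
// &lt;b onclick=&quot;x()&quot;&gt;hi</b>
```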