I am designing a Web API which returns JSON as the content-type, the response body could contain characters like ', ", < and >, they are valid characters in JSON. So, my question is should I do HTML encode for my Web API response body or should I leave this task to HTML client who is consuming my Web API?
No; you must not.
You must only escape data if and when you concatenate it into a structured format.
If you return JSON like { "text": "Content by X & Y" }, anyone who reads that JSON will see the literal text &.
It will only work correctly for extremely broken clients who concatenate it directly into their HTML without escaping.
In short:
Never escape text except when you're about to display it
What platform are you using? For example, Node.js, you can use restify to handle that very well. You don't need to explicitly encode the data. Therefore, please find a restful framework or component to help you out.
Related
I have a REST API that accepts and returns JSON data.
A sample request response is a follows
Request
{
"repos": [
"some-repo",
"test-repo<script>alert(1)</script>"
]
}
Response
{
"error": "Error Message",
"repos": [
"test-repo<script>alert(1)</script>"
]
}
Is my API vulnerable for XSS?. From what I understand, since the Content-Type is set to application/json, the API as such is safe from XSS. The client needs to ensure that the output is encoded to prevent any XSS attacks.
To add an additional layer of security, I can add some input encoding/validation in the API layer.
Please let me know if my assessment is right and any other gotchas that I need to be aware of
I think it's right that any XSS issue here is a vulnerability of the client. If the client inserts HTML into a document, then it is its responsibility to apply any neccessary encoding.
The client knows what encoding is required not the server. Different encoding, or no encoding may be needed in different places for the same data. For example:
If a client did something like:
$(div).html("<b>" + repos + "</b>");
then it would be vulnerable to XSS, so repos would need to be HTML encoded here.
But if it did something like:
$(div).append($("<b>").text(repos));
then HTML encoding would have resulted in HTML entity codes being wrongly displayed to the user.
Or if the client wanted to do some processing of the data, it may want the plaintext data first to do the processing, and then encode it later to output it.
Input validation can help too, but the rules for what is valid input may not align with what is safe to use without encoding. Things like ampersands, quotes and brackets can appear in valid text data too. But if your data can't contain these characters, you can reject the input as invalid.
Your API will not be vulnerable to XSS unless it also provides a UI that consumes it. It will be the clients of your API which could be vulnerable - they will need to make sure they correctly encode any data from your API before they use it in any UI.
I think your api is vulnerable, though the script may not be execute. Talking to XSS prevention, we always suggest decode/validation dangerous characters when the api deals the input. There is a common requirement "clean the data before it store in the DB".
As for your situation, the api just response a json, but we don't know where the data in json will be used to. usually the frontend accept the data without any decode/validation, if that, there will be a xss.
you talk about that the client use decode the data they get from your api. Yes I agree, but frontend always "trust" the backend so that they won't decode the data. but the api server should not trust any input from frontend due to this can be controlled/changed by (malicious)users.
I have a situation where I have to write a api to create a resource and amongst datafields that I need to accept is a string that is basically contents of a html file. As I see it I have a choice between structuring the entire thing as a json object where this field is a string field with urlencoded html string , and having the Content Type as multipart/form-data where each of the fields and the html string (UTF-8 encoded) is a part in the message.
Not using json is something I am not comfortable with as I feel violating the REST standards in not structuring the content of the entity I am about to create thus there is a loss of information for the consumers as they can't tell immediately looking at my api definition about what data to feed to it. But practically multipart/form-data handles stuff like html file content better and more efficient as I will not have to urlencode it and can control the char-encoding also.
What will be a better approach in current context and upholding RESTful principles ? Also are there other trade-offs i should be aware of ? what about parsing a json with a huge string field (~ 200 Kb)embedded?
EDIT :- I was reading some similar questions on SO and one approach that stood out was the 2-step approach of making a first call with metadata to create the entity and then upload the file as an UPDATE process to the created entity wherein we use multipart/form-data. In that context, I guess , what I am asking is how sound is an approach where I send both metadata and the file in a single api call as multipart data , where each metadata field is actually a part in the multipart message as is the file.
The canonical way to upload files to REST API is using the multipart/form-data. As W3 recommendation guide says:
The content type "multipart/form-data" should be used for submitting
forms that contain files, non-ASCII data, and binary data.
Multipart/form-data has advantages over base64 to represent binary data. Is sticked to REST/Http philosophy, and simplify the develop of API clients.
Returning values from Forms: multipart/form-data
W3 Recommendation guide
The good practice is to use multipart/form-data whenever files are uploaded to the server along with database fields. Do not send a base64 JSON string as the request to your Rest API as it might corrupt the file or degrade the performance of your application.
As far as documenting multipart/form-data Rest API for your consumers is concerned you have to force your API consumers to use the same form fields which you have predefined in your web service.
Returning Values from Forms: multipart/form-data
I started using FormData objects everywhere on the client-side, in lieu of regular form input fields, for dynamic REST posts. FormData is presented in a positive light in various tutorials, so I went with it.
However, down the line, this caused me problems when decoding the form data into my Go structs. FormData objects are sent as "multipart/form-data" (regardless of files being sent) and I believe my decoder in Go didn't convert the raw data back to string form. Eventually my SQL queries were throwing panics, as hex data was being sent in where strings should have been.
So with some adjustment, I could use FormData however I've decided to revert to the simple universal recommendation: Use "multipart/form-data" only for special cases like when sending files. Otherwise, just use regular "application/x-www-form-urlencoded".
I've been working with ServiceStack for a while now and recently a new need came up that requires receiving of some html templates inside a JSON request body. I'm obviously thinking about escaping this HTML and it seems to work but I would prefer consumers not to bother escaping a potentially large template. I know ASP.NET is capable of handling this but I wonder if it can be done with ServiceStack.
The following JSON body is received incomplete at the REST endpoint because of the second double quote ...="test...
{
"template" : "<html><body><div name="test"></div></body></html>"
}
"I would prefer consumers not to bother escaping a potentially large template"
I'm not sure I follow. Why would the consumers know anything about escaping a template? Shouldn't that be transparent to the consumer? Any call to JSON.stringify(sourceString) or sourceString.toJson() will automatically escape embedded double-quotes.
"I know ASP.NET is capable of handling this"
Embedded double-quotes must be escaped in valid json syntax. I don't see how ASP.NET wouldn't have the same problem. Am I missing something?
Recently I run into some weird issue with http header usage ( Adding multiple custom http request headers mystery) To avoid the problem at that time, I have put the fields into json string and add that json string into header instead of adding those fields into separate http headers.
For example, instead of
request.addHeader("UserName", mUserName);
request.addHeader("AuthToken", mAuthorizationToken);
request.addHeader("clientId","android_client");
I have created a json string and add it to the single header
String jsonStr="{\"UserName\":\"myname\",\"AuthToken\":\"123456\",\"clientId\":\"android_client\"}";
request.addHeader("JSonStr",jsonStr);
Since I am new to writing Rest and dealing with the Http stuff, I don't know if my usage is proper or not. I would appreciate some insight into this.
Some links
http://lists.w3.org/Archives/Public/ietf-http-wg/2011OctDec/0133.html
Yes, you may use JSON in HTTP headers, given some limitations.
According to the HTTP spec, your header field-body may only contain visible ASCII characters, tab, and space.
Since many JSON encoders (e.g. json_encode in PHP) will encode invisible or non-ASCII characters (e.g. "é" becomes "\u00e9"), you often don't need to worry about this.
Check the docs for your particular encoder or test it, though, because JSON strings technically allow most any Unicode character. For example, in JavaScript JSON.stringify() does not escape multibyte Unicode, by default. However, you can easily modify it to do so, e.g.
var charsToEncode = /[\u007f-\uffff]/g;
function http_header_safe_json(v) {
return JSON.stringify(v).replace(charsToEncode,
function(c) {
return '\\u'+('000'+c.charCodeAt(0).toString(16)).slice(-4);
}
);
}
Source
Alternatively, you can do as #rocketspacer suggested and base64-encode the JSON before inserting it into the header field (e.g. how JWT does it). This makes the JSON unreadable (by humans) in the header, but ensures that it will conform to the spec.
Worth noting, the original ARPA spec (RFC 822) has a special description of this exact use case, and the spirit of this echoes in later specs such as RFC 7230:
Certain field-bodies of headers may be interpreted according to an
internal syntax that some systems may wish to parse.
Also, RFC 822 and RFC 7230 explicitly give no length constraints:
HTTP does not place a predefined limit on the length of each header field or on the length of the header section as a whole, as described in Section 2.5.
Base64encode it before sending. Just like how JSON Web Token do it.
Here's a NodeJs Example:
const myJsonStr = JSON.stringify(myData);
const headerFriendlyStr = Buffer.from(myJsonStr, 'utf8').toString('base64');
res.addHeader('foo', headerFriendlyStr);
Decode it when you need reading:
const myBase64Str = req.headers['foo'];
const myJsonStr = Buffer.from(myBase64Str, 'base64').toString('utf8');
const myData = JSON.parse(myJsonStr);
Generally speaking you do not send data in the header for a REST API. If you need to send a lot of data it best to use an HTTP POST and send the data in the body of the request. But it looks like you are trying to pass credentials in the header, which some REST API's do use. Here is an example for passing the credentials in a REST API for a service called SMSIfied, which allows you to send SMS text message via the Internet. This example is using basic authentication, which is a a common technique for REST API's. But you will need to use SSL with this technique to make it secure. Here is an example on how to implement basic authentication with WCF and REST.
From what I understand using a json string in the header option is not as much of an abuse of usage as using http DELETE for http GET, thus there has even been proposal to use json in http header. Of course more thorough insights are still welcome and the accepted answer is still to be given.
In the design-first age the OpenAPI specification gives another perspective:
an HTTP header parameter may be an object,
e.g. object X-MyHeader is
{"role": "admin", "firstName": "Alex"}
however, within the HTTP request/response the value is serialized, e.g.
X-MyHeader: role=admin,firstName=Alex
G'day gurus,
I'm calling the REST APIs of an enterprise application that shall remain nameless, and they return JSON such as the following:
throw 'allowIllegalResourceCall is false.';
{
"data": ... loads of valid JSON stuff here ...
}
Is this actually valid JSON? If (as I suspect) it isn't, is there any compelling reason for these kinds of shenanigans?
The response I received from the application vendor is that this is done for security purposes, but I'm struggling to understand how this improves security much, if at all.
Thanks in advance!
Peter
According to
http://jsonlint.com/
It is not.
Something like the below is.
{
"data": "test"
}
Are they expecting you to pull the JSon load out of the message above?
Its not a JSON format at all. From your question it seems you are working with enterprise systems like JIVE :). I am also facing same issue with JIVE api. This is the problem with their V3 API. Not standard , but following thing worked for me. (I am not sure if you are talking about JIVE or not)
//invalid jason response... https://developers.jivesoftware.com/community/thread/2153
jiveResponse = jiveResponse.Replace
("throw 'allowIllegalResourceCall is false.';",String.Empty);
There is a valid reason for this: it protects against CSRF attacks. If you include a JSON url as the target of a <script> tag, then the same-origin policy doesn't apply. This means that a malicious site can include the URL of a JSON API, and any authenticated users will successfully request that data.
By appropriately overriding Object.prototype and/or Array.prototype, the malicious site can get any data parsed as an object literal or array literal (and all valid JSON is also valid javascript). The throw statement protects against this by making it impossible to parse javascript included on a page via <script> tags.
Definitely NOT valid JSON. Maybe there's an error in the implementation that is mixing some kind of debug output with the correct output?
And, by no means this is for security reasons. Seems to me this is a plain bug.
throw 'allowIllegalResourceCall is false.'; is certainly not valid JSON.
What MIME type is reported?
It seems they have added that line to prevent JSON Hijacking. Something like that line is required to prevent JSON Hijacking only if you return a JSON array. But they may have added that line above all of their JSON responses for easier implementation.
Before using it, you have to strip out the first line, and then parse the remaining as JSON.