REST interface: design for many parameters and big data - json

I have a RESTful interface with essentially one function. The function takes one big binary file as input and returns the modified file. Let's assume it's an encryption function (it's not, but it's similar). Nothing gets stored on the server, so repeating the call is no problem. I also need to pass many parameters, like the type of encryption, some information about the caller, whether the file needs to be converted first, etc.
I was thinking of implementing this with a URL like //server/v1/encrypt, a POST request, and all parameters, including the file (base64 encoded), in a JSON body.
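Concretely, I was picturing a call roughly like this (the parameter names, the requests library, and the response shape are just placeholders for illustration, not a finished design):

import base64
import requests  # third-party HTTP client, used only to sketch the call

with open("input.bin", "rb") as f:
    file_b64 = base64.b64encode(f.read()).decode("ascii")

body = {
    "encryptionType": "aes-256",   # placeholder parameter names
    "caller": "my-app",
    "convertFirst": False,
    "file": file_b64,              # the big binary file, base64 encoded
}
resp = requests.post("https://server/v1/encrypt", json=body, timeout=120)
resp.raise_for_status()

# assuming the modified file comes back the same way, base64 encoded in JSON
with open("output.bin", "wb") as f:
    f.write(base64.b64decode(resp.json()["file"]))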
Now the HTTP specification says POST should be used for creation requests and that it cannot be cached. Caching is not really important for me (as the file is always different), but I would like to follow standards, recommendations, and best practice.
I would assume that for this type of request the best fit would be GET, but GET cannot have a file in the body (as per question 978061), and passing the parameters in the body as JSON is probably not a good idea either. But is it really better to have 50 parameters in the URL (either as part of the path or as query parameters)?
How would you implement this and why?

Related

Non-standard JSON and Azure Logic Apps

I have an API that produces JSON like this:
)]}',
{
//JSON DATA
}
The //JSON DATA is valid JSON, but the )]}', up top is not.
When I try to GET this data via a Logic App, I get:
BadRequest. Http request failed: the content was not a valid JSON.
So, a few related questions:
1) Can I tell the logic app to return the invalid JSON anyway?
2) How can I debug the issue better? I happen to know that the response is invalid, but what if I didn't? Can I see the raw data somewhere?
3) This is all done via the Azure web portal. Are there better tools? Visual Studio?
I should also mention that if I call a route on the same API that returns XML instead of JSON, then the Logic App works fine. So it definitely doesn't like the JSON response in particular.
Thanks!
First of all, please do not post three questions as a single question.
Question 1). The best thing you can do is make the API return a valid JSON object. This is good for a million reasons. Here are a few:
it's pretty much a standard (either valid JSON or XML -- yeah, the old-school way);
therefore, no users of this API (including you) will need to struggle and guess what's going on and why;
your Logic App's step will just work without adding extra complexity;
you will make this world and your karma better.
If API-side changes are not within your reach, I don't think you can do much. If you're lucky and the HTTP action is successful (status code 2xx), you can try to use a Query action with a function that strips the leading characters. It will look something like this (I don't know the exact syntax): #Substring(body('myHttpGet'), 4, length(body('myHttpGet')) - 4), where myHttpGet is the id of the HTTP GET action.
However, once again, if possible, I strongly recommend fixing up the API, which is the root cause of the problem, instead of dealing with the garbage response afterwards.
UPDATE Another thing you can do is wrap the dirty API. For example, you could create a trivial Azure Function that invokes the API you don't directly control and sanitizes the response for your consumption requirements. This Azure Function should be easy to call from the Logic App. It costs almost nothing (unless we're talking millions of requests per month). The only drawback is the added latency, which may not be an issue at all -- test it and see whether it adds less than 100ms or so... Oh, and don't forget to file a ticket with the API owner; they are making our world a worse place!
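To make the wrapper idea concrete, here is a rough sketch of such a sanitizing Azure Function using the Python programming model (the upstream URL and the exact prefix handling are my assumptions, not something taken from your API):

import json
import urllib.request

import azure.functions as func

UPSTREAM_URL = "https://example.com/dirty-api/items"  # hypothetical upstream endpoint

def main(req: func.HttpRequest) -> func.HttpResponse:
    raw = urllib.request.urlopen(UPSTREAM_URL).read().decode("utf-8")
    # Drop the first line if it is the non-JSON ")]}'," prefix.
    clean = raw.split("\n", 1)[1] if raw.startswith(")]}'") else raw
    # Round-trip through the json module to be sure we forward valid JSON.
    return func.HttpResponse(json.dumps(json.loads(clean)),
                             mimetype="application/json")

The Logic App then calls this function instead of the original API.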
Question 2) In the Azure Logic Apps web UI you can look into the execution details, and the error will definitely be there.
Question 3) You're asking for a tool recommendation, which is by definition highly subjective and off-topic on Stack Overflow.
TL;DR: The other app is not producing valid JSON.
Meaning, this is not a problem for you to solve. The other app has to return valid JSON if the owner claims it should.
If they cannot or will not produce valid JSON, then the first thing you need to do is inform your management that you will have to spend a lot of extra time accommodating their non-standard format.

Is it ever acceptable for REST API to return something other than JSON/XML?

I am currently trying to build a REST endpoint whereby an authenticated user can download a PDF. In researching the proper way to do this, I have mostly seen that JSON or XML are the proper response bodies to give. However, this site explains that the response can be something other than JSON as long as it's some human-readable document.
So, is it ever okay for a REST API to return application/pdf as the response type instead of application/json or application/xml?
Yes, definitely, a RESTful API can return whatever it wants. There is no constraint to be human-readable (although I think the linked article tries to argue exactly the opposite). Just think of the Web, which is REST-based, returning images, movies, sometimes even runnable code.
There are, however, some constraints. Any representation returned should be 'self-contained', meaning it has to have every piece of information necessary for the client to make sense of it. In this case, that basically means just setting the Content-Type 'application/pdf' properly on the response.
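For example, a minimal sketch with Flask (the framework and route are my own assumptions, not something from the question) that returns a PDF with the correct media type:

from flask import Flask, send_file

app = Flask(__name__)

@app.route("/reports/<int:report_id>")
def get_report(report_id: int):
    # The representation is a PDF, so label it as such; the client needs
    # nothing more to make sense of the response.
    return send_file(f"reports/{report_id}.pdf", mimetype="application/pdf")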

REST API Best practices: args in query string vs in request body

A REST API can have arguments in several places:
1) In the request body - as part of a JSON body, or other MIME type
2) In the query string - e.g. /api/resource?p1=v1&p2=v2
3) As part of the URL path - e.g. /api/resource/v1/v2
What are the best practices and considerations of choosing between 1 and 2 above?
2 vs 3 is covered here.
What are the best practices and considerations of choosing between 1 and 2 above?
Usually the content body is used for the data that is to be uploaded/downloaded to/from the server, and the query parameters are used to specify the exact data requested. For example, when you upload a file you specify the name, MIME type, etc. in the body, but when you fetch a list of files you can use the query parameters to filter the list by some property of the files. In general, the query parameters are a property of the query, not the data.
Of course this is not a strict rule - you can implement it in whatever way you find more appropriate/working for you.
You might also want to check the wikipedia article about query string, especially the first two paragraphs.
I'll assume you are talking about POST/PUT requests. Semantically the request body should contain the data you are posting or patching.
The query string, as part of the URL (a URI), is there to identify which resource you are posting or patching.
You asked for best practices; the semantics below are mine. Of course using your own rules of thumb should work, especially if the web framework you use abstracts this into parameters.
You must know:
Some web servers have limits on the length of the URI.
You can send parameters inside the request body with curl.
Where you send the data shouldn't have an effect on debugging.
The following are my rules of thumb...
When to use the body:
When the arguments don't have a flat key:value structure
If the values are not human readable, such as serialized binary data
When you have a very large number of arguments
When to use the query string:
When the arguments are such that you want to see them while debugging
When you want to be able to call them manually while developing the code e.g. with curl
When arguments are common across many web services
When you're already sending a different content-type such as application/octet-stream
Notice you can mix and match - put the common ones, the ones that should be debuggable, in the query string, and throw all the rest into the JSON.
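As an illustration of that mix-and-match approach, here is a minimal sketch using Python's requests library (the endpoint and argument names are made up):

import requests

resp = requests.post(
    "https://api.example.com/files",              # hypothetical endpoint
    params={"apiVersion": "2", "debug": "true"},  # small, common, easy to read while debugging
    json={                                        # everything else goes into the JSON body
        "name": "report.pdf",
        "metadata": {"pages": 12, "tags": ["q3", "final"]},
    },
)
resp.raise_for_status()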
The reasoning I've always used is that because POST, PUT, and PATCH presumably have payloads containing information that customers might consider proprietary, the best practice is to put all payloads for those methods in the request body, and not in URL params, because it's very likely that somewhere, somehow, URL text is being logged by your web server, and you don't want customer data splattered as plain text into your log filesystem.
That potential exposure via the URL isn't an issue for GET or DELETE or any of the other REST operations.

Why use a JSON object to pass data with POST versus a Query String in Perl?

I'm looking to run a script using Stripe.pm, basically to do credit card processing; the credit card number itself is not being passed at all. All the examples I see use a JSON object passed in a POST call, but I have a lot of experience using query strings, i.e.
http://www.example.com/cgi-bin/processingscript.pl?param1=XXXX&param2=YYYYY&param3=ZZZZZ
Is this a security risk? What is the advantage or disadvantage of posting using JSON versus a query string like I'm used to using?
From a purely technical point of view, there is no difference between POST and GET if you pass a reasonably short parameter. You can also just pass JSON as a GET parameter no problem:
GET foo.pl?json={"foo":"bar"}
It would make sense to url-encode the data in this case. You can also send the same request using POST.
If you do not want to use query params at all, you need POST and have to put your JSON into the request body. Depending on which option you choose, there are differences in how to deal with it in Perl. Let's say you are using the CGI module... it makes no distinction between POST and GET params.
For the query string GET or POST, you need to do:
use CGI;
my $cgi = CGI->new;
my $json = $cgi->param('json');
If you put the payload directly into the request body, you will instead need to do:
use CGI;
my $cgi = CGI->new;
my $json = $cgi->param('POSTDATA');  # the raw request body when it is not url-encoded
This is documented in CGI under "handling non url-encoded ...".
For JSON, there is also of course the time it takes to parse it, but that should be negligible.
The advantage JSON has over query strings without JSON inside them is that you can encode arbitrarily complex data structures in JSON, while plain-text query strings are just one level deep.
From a security point of view, pretty much everything has been said. I'll recap my own ideas:
use SSL
do not put sensitive stuff into log files
if you are dealing with CC data (even if it is not the number itself), take extra care; read up on PCI DSS and encrypt stuff during transmission
NEVER store a cvc!
if you want to learn more about that topic, there is a Stack Exchange site called Information Security.
Security
- Making sure no one other than the user and the server can see the data: GET vs. POST has no influence here. Use SSL to ensure this.
- Stopping the user from passing "interesting" arguments to your script: POST has a certain amount of security-through-obscurity in that most users won't see the parameters, but it's no real security. You should check the parameters in your application to cover that.
Generally POST with a nice data package (e.g. JSON) makes for a much more flexible and maintainable application interface, and has the advantage that you don't need to worry about encoding and length of parameters the same way you do when using GET.
POST:
Generally safer for important data
Parameters not shown in url
You can pass longer strings/structures (e.g. JSON) than with GET, which has a length limit on the parameters you send

How would you pass HTTP Headers using a standard anchor tag?

According to the HTML4 reference, there's no attribute for passing HTTP headers with an anchor tag.
I would like to offer a link that requests a specific file type using the Accept header.
The only way I can see is to simply let it be and pass a GET parameter instead.
You may ask why I would want to do this... I intend to expose a bunch of methods as a public API, serving the results as JSON. And when doing requests using JavaScript, or another programming language, using the Accept header to request a specific response format is "The Right Way" to do it. But that would mean that I need to accommodate both the Accept header and the GET parameter in my code, which smells like a duplication of logic.
This topic is largely debatable, as such links may not be possible to bookmark in the browser... still... I'd like to know if it is possible without too much magic...
I don't see another way than using a GET parameter or an extension, like
http://myurl/page?format=json
or better
http://myurl/page.json
which overrides the Accept header (since the browser will only send its default Accept header). Then you just need to initialize a format-to-Accept-header mapping like this (which I don't find to be duplicated logic at all):
{
"json" : "application/json",
"html" : "text/html"
}
You can't.
I intend to expose a bunch of methods as a public API, serving the results as JSON. And when doing requests using JavaScript, or another programming language, using the Accept header to request a specific response format is "The Right Way" to do it. But that would mean that I need to accommodate both the Accept header and the GET parameter in my code, which smells like a duplication of logic.
If I understand you correctly, you don't have to do this anyway. Browsers already supply an Accept header.
Hmm, seems like if your results are JSON, you will be sending / receiving from script anyway, which can provide any header you want. Just have your link call a script function and you're done.
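For what it's worth, any non-browser client can set the Accept header explicitly; here is a tiny Python sketch purely for illustration (the URL is just the example from above):

import requests

resp = requests.get("http://myurl/page", headers={"Accept": "application/json"})
print(resp.json())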