REST API Accept header json/pdf - json

I'm currently working with an API, which I would think is doing a wrong implementation of a REST API.
We have the following endpoints
GET v1/{organizationId}/invoices
GET v1/{organizationId}/invoices/{guid}
When you GET the first, you get all invoices, but if you get the second endpoint, without the Accept: Application/json header, you get the response from the first endpoint instead.
The company, which provides the API, says the reason for this is because the second endpoint can give both JSON and a PDF output, which would seem as a PDF would be
GET v1/{organizationId}/invoices/{guid}/pdf
Besides that, if send a malformed guid, you get a full blown HTML 404 error page instead of, e.g., a blank page or an error message.
To sum it up
Is the 2. endpoint handled properly?
Is it valid to make PDF an output like JSON/XML?
Shouldn't it be a new endpoint instead?
Is HTML allowed as an error response, even for a malformed request?

Some people don't like to categorize REST-based questions "right" and "wrong", but I will go ahead anyway:
No, the second resource should always return a representation of the second resource, that is, an invoice, never a collection of invoices.
Yes, it is completely OK to potentially return multiple representations of the same "thing". An invoice can conceivably return both json and pdf, and can select the correct one based on the "Accept" header.
No, it shouldn't be a new "endpoint", because these "things" are not "endpoints", but resources. An invoice is a resource, the json document is just a "representation" of the resource.
Yes, an error response can contain a body, with whatever representation the server is comfortable sending. The server should of course orient itself using the "Accept" header from the client, but is not required to do so.
A few remarks: It should not be possible to GET an invoice with a wrong guid, since these resources should be linked from the collection resource. When the client follows links (as it should), there should be no way to follow a link with a wrong guid.
It is wrong to return a completely different resource (invoice collection instead of an invoice), but it is OK to have multiple representations of the same resource. If there is no "Accept" header, the server should pick one, or indicate 406 Not Acceptable.
One small point: The mime-type should not be application/json, because how do you know it's an invoice then? It should be something like: application/vnd.someprovider.invoice+json. No one bothers with mime-types these days, but I mention it just for completeness sake.

Related

Is it ever acceptable for REST API to return something other than JSON/XML?

I am currently trying to build a REST endpoint whereby an authenticated user can download a PDF. In researching the proper way to do this, I have mostly seen that JSON or XML are the proper response bodies to give. However, this site explains that the response can be something other than JSON as long as it's some human-readable document.
So, is it ever okay for a REST API to return application/pdf as the response type instead of application/json or application/xml?
Yes, definitely, a RESTful API can return whatever it wants. There is no constraint to be human-readable (although I think the linked article tries to argue exactly the opposite). Just think of the Web, which is REST-based, returning images, movies, sometimes even runnable code.
There are however some constraints. Any representation returned should be 'self-contained', meaning it has to has every piece of information necessary for the client to make sense of it. In this case, it would basically mean to just set the type 'application/pdf' properly on the response.

REST API Best practices: args in query string vs in request body

A REST API can have arguments in several places:
In the request body - As part of a json body, or other MIME type
In the query string - e.g. /api/resource?p1=v1&p2=v2
As part of the URL-path - e.g. /api/resource/v1/v2
What are the best practices and considerations of choosing between 1 and 2 above?
2 vs 3 is covered here.
What are the best practices and considerations of choosing between 1
and 2 above?
Usually the content body is used for the data that is to be uploaded/downloaded to/from the server and the query parameters are used to specify the exact data requested. For example when you upload a file you specify the name, mime type, etc. in the body but when you fetch list of files you can use the query parameters to filter the list by some property of the files. In general, the query parameters are property of the query not the data.
Of course this is not a strict rule - you can implement it in whatever way you find more appropriate/working for you.
You might also want to check the wikipedia article about query string, especially the first two paragraphs.
I'll assume you are talking about POST/PUT requests. Semantically the request body should contain the data you are posting or patching.
The query string, as part of the URL (a URI), it's there to identify which resource you are posting or patching.
You asked for a best practices, following semantics are mine. Of course using your rules of thumb should work, specially if the web framework you use abstract this into parameters.
You most know:
Some web servers have limits on the length of the URI.
You can send parameters inside the request body with CURL.
Where you send the data shouldn't have effect on debugging.
The following are my rules of thumb...
When to use the body:
When the arguments don't have a flat key:value structure
If the values are not human readable, such as serialized binary data
When you have a very large number of arguments
When to use the query string:
When the arguments are such that you want to see them while debugging
When you want to be able to call them manually while developing the code e.g. with curl
When arguments are common across many web services
When you're already sending a different content-type such as application/octet-stream
Notice you can mix and match - put the the common ones, the ones that should be debugable in the query string, and throw all the rest in the json.
The reasoning I've always used is that because POST, PUT, and PATCH presumably have payloads containing information that customers might consider proprietary, the best practice is to put all payloads for those methods in the request body, and not in the URL parms, because it's very likely that somewhere, somehow, URL text is being logged by your web server and you don't want customer data getting splattered as plain text into your log filesystem.
That potential exposure via the URL isn't an issue for GET or DELETE or any of the other REST operations.

Is there any action I could do with POST, but not with GET?

I know differences and advantages of each command, my question is could I replace POST requests with GET everywhere? And which these commands calls by default while sending request from html form?
could I replace POST requests with GET everywhere
No (and it would be a terrible idea to try).
Things that a form can do with POST that you can't do with GET include:
Sending lots of data
Sending files
There are other things that would simply be stupid to do with GET.
From http://www.w3.org/TR/html5/forms.html#attr-fs-method :
The method and formmethod content attributes are enumerated attributes
with the following keywords and states:
The keyword get, mapping to the state GET, indicating the HTTP GET
method. The keyword post, mapping to the state POST, indicating the
HTTP POST method. The invalid value default for these attributes is
the GET state. (There is no missing value default.)
When using GET to transfer data from the client to the server, the data is added to the URL, there is not BODY of the request. There is usually a limit on how long a URL can be, in the old days this was 1024 characters but that really depends on the server software and server middleware and even the browser.
This means if you want to transfer loads or data or upload a file to the server, you can not do it with GET.

Why should JSON have a status property

I stumbled over a practice that I found to be quite widespread. I even found a web page that gave this a name, but I forgot the name and am not able to find that page on google anymore.
The practice is that every JSON response from a REST service should have the following structure:
{
"status": "ok",
"data": { ... }
}
or in an error case:
{
"status": "error",
"message": "Something went wrong"
}
My question: What is the point why such a "status" property should be required in the JSON? In my opinion that is what HTTP status codes were made for.
REST uses the HTTP means of communication between client and server, for example the "DELETE" verb should be used for deleting. In the same way, 404 should be used if a resource is not found, etc. So inline with that thinking, any error cases should be encoded properly in the HTTP status.
Are there specific reasons to return a HTTP 200 status code in an error case and have the error in the JSON instead? It just seems to make the javascript conditional branches more complex when processing the response.
I found some cases where status could be "redirect" to tell the application to redirect to a certain URL. But if the proper HTTP status code was used, the browser would perform the redirection "for free", maintaining the browsing history properly.
I picture mainly two possible answers from you:
Either there are two quarreling communities with their favorite approach each (use HTTP status always vs. use HTTP status never)
or I am missing an important point and you'll tell me that although the HTTP status should be used for some cases, there are specific cases where a HTTP status does not fit and the "status" JSON property comes into play.
You are right. I think what you are seeing is a side-effect of people not doing REST correctly. Or just not doing REST at all. Using REST is not a pre-requisite for a well-designed application; there is no rule that webapps have to be REST-ful.
On the other hand, for the error condition, sometimes apps want to return a 200 code but an error to represent a business logic failure. The HTTP error codes don't always match the semantics of application business errors.
You are mixing two different Layers here:
HTTP is for establishing (high-level) connections and transferring data. The HTTP status codes thus informs you if and how the connection was established or why it was not. On a successful connection the body of the HTTP request could then contain anything (e.g. XML, JSON, etc.), thus these status code have to define a general meaning. It does not inform you about the correctness or type (e.g. error message or data) of the response.
When using JSON for interchanging data you could certainly omit the status property, however it is easier for you to parse the JSON, if you know if it includes the object you were requesting or an error message by just reading one property.
So, yes, it is perfectly normal to return a 200 status code and have a "status": "error" property in your JSON.
HTTP status codes can be caused by a lot of things, including load balancers, proxies, caches, firewalls, etc. None of these are going to modify your JSON output, unless they completely break it, which can also be treated as an error.
Bottom line: it's more reliable to do it via JSON.

Do HTTP POST methods send data as a QueryString?

I'd like to know if the POST method on HTTP sends data as a QueryString, or if it use a special structure to pass the data to the server.
In fact, when I analyze the communication with POST method from client to server (with Fiddler for example), I don't see any QueryString, but a Form Body context with the name/value pairs.
The best way to visualize this is to use a packet analyzer like Wireshark and follow the TCP stream. HTTP simply uses TCP to send a stream of data starting with a few lines of HTTP headers. Often this data is easy to read because it consists of HTML, CSS, or XML, but it can be any type of data that gets transfered over the internet (Executables, Images, Video, etc).
For a GET request, your computer requests a specific URL and the web server usually responds with a 200 status code and the the content of the webpage is sent directly after the HTTP response headers. This content is the same content you would see if you viewed the source of the webpage in your browser. The query string you mentioned is just part of the URL and gets included in the HTTP GET request header that your computer sends to the web server. Below is an example of an HTTP GET request to http://accel91.citrix.com:8000/OA_HTML/OALogout.jsp?menu=Y, followed by a 302 redirect response from the server. Some of the HTTP Headers are wrapped due to the size of the viewing window (these really only take one line each), and the 302 redirect includes a simple HTML webpage with a link to the redirected webpage (Most browsers will automatically redirect any 302 response to the URL listed in the Location header instead of displaying the HTML response):
For a POST request, you may still have a query string, but this is uncommon and does not have anything to do with the data that you are POSTing. Instead, the data is included directly after the HTTP headers that your browser sends to the server, similar to the 200 response that the web server uses to respond to a GET request. In the case of POSTing a simple web form this data is encoded using the same URL encoding that a query string uses, but if you are using a SOAP web service it could also be encoded using a multi-part MIME format and XML data.
For example here is what an HTTP POST to an XML based SOAP web service located at http://192.168.24.23:8090/msh looks like in Wireshark Follow TCP Stream:
Post uses the message body to send the information back to the server, as opposed to Get, which uses the query string (everything after the question mark). It is possible to send both a Get query string and a Post message body in the same request, but that can get a bit confusing so is best avoided.
Generally, best practice dictates that you use Get when you want to retrieve data, and Post when you want to alter it. (These rules aren't set in stone, the specs don't forbid altering data with Get, but it's generally avoided on the grounds that you don't want people making changes just by clicking a link or typing a URL)
Conversely, you can use Post to retrieve data without changing it, but using Get means you can bookmark the page, or share the URL with other people, things you couldn't do if you'd used Post.
http://en.wikipedia.org/wiki/GET_%28HTTP%29
http://en.wikipedia.org/wiki/POST_%28HTTP%29
As for the actual format of the data sent in the message body, that's entirely up to the sender and is specified with the Content-Type header. If not specified, the default content-type for HTML forms is application/x-www-form-urlencoded, which means the server will expect the post body to be a string encoded in a similar manner to a GET query string. However this can't be depended on in all cases. RFC2616 says the following on the Content-Type header:
Any HTTP/1.1 message containing an entity-body SHOULD include a
Content-Type header field defining the media type of that body. If
and only if the media type is not given by a Content-Type field, the
recipient MAY attempt to guess the media type via inspection of its
content and/or the name extension(s) of the URI used to identify the
resource. If the media type remains unknown, the recipient SHOULD
treat it as type "application/octet-stream".
A POST request can include a query string, however normally it doesn't - a standard HTML form with a POST action will not normally include a query string for example.
GET will send the data as a querystring, but POST will not. Rather it will send it in the body of the request.
If your post try to reach the following URL
mypage.php?id=1
you will have the POST data but also GET data.