Where can I find explicit documentation on what's NOT covered by Status Code 200?

We had a discussion today about a transfer operation resulting in status code 200, OK. There were two objects returned looking like this.
First one being fairly graspable (and following the expected contract).
{ name: "john", age: 34, city: "stockholm" }
Second one, following the contract but with unquestionably wrong data.
{ name: null, age: -3.141526, city: "http://some.com/address/poof" }
One party claimed that the status code 200 is incorrect because the values are wrong. The other side argued that the status code describes the operation as such and the format of the request/response, which went well because the transfer agrees with the contract.
It's fairly obvious that the REST endpoint gets an exception from the sources it fetches the data from. And so, the first party wanted the result to be either 404 not found or 500 internal error. The other side was open to it under the condition that the object structure is empty (nulls all the way) in the former case and that it doesn't attempt to follow the agreed format in the latter case.
Checking out the Kamasutra it's said that:
The request has succeeded. The information returned with the response is dependent on the method used in the request.
Now, technically speaking, we can't know for sure if the resource requested has a name, might be planned to be born in PI years and happens to reside in a city that changed its name to a URL. That is actually possible, although highly unlikely. However, I'd like to see an explicit statement of what isn't included in status code 200.
The question: is it valid to demand status code 400 or higher because the values are seemingly (or even obviously) wrong?

Don't use the RFC 2616
The RFC 2616 is completely irrelevant nowadays since it has been replaced by a set of new RFCs that together define the HTTP/1.1 protocol:
RFC 7230: HTTP/1.1: Message Syntax and Routing
RFC 7231: HTTP/1.1: Semantics and Content
RFC 7232: HTTP/1.1: Conditional Requests
RFC 7233: HTTP/1.1: Range Requests
RFC 7234: HTTP/1.1: Caching
RFC 7235: HTTP/1.1: Authentication
Status codes
For the HTTP status codes, refer to the RFC 7231. That document defines what each status code indicates. Pick the one that best gives the result of the attempt to understand and satisfy the request.
This document also defines the classes of the status codes, which help to determine the most suitable status for the response:
The first digit of the status-code defines the class of response. The last two digits do not have any categorization role. There are five values for the first digit:
1xx (Informational): The request was received, continuing process
2xx (Successful): The request was successfully received, understood, and accepted
3xx (Redirection): Further action needs to be taken in order to complete the request
4xx (Client Error): The request contains bad syntax or cannot be fulfilled
5xx (Server Error): The server failed to fulfill an apparently valid request
Just bear in mind that HTTP status codes are extensible. The RFC 7231 does not include extension status codes defined in other specifications. The complete list of status codes is maintained by IANA.
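As a small illustration of the class rule quoted above, the first digit alone decides the class, so even an extension code that a client does not recognise can still be treated by its class. This is a minimal sketch; the function name `status_class` is made up for this example.

```python
def status_class(code: int) -> str:
    """Return the class name for a three-digit HTTP status code.

    Only the first digit matters; the last two digits play no
    categorization role.
    """
    classes = {
        1: "Informational",
        2: "Successful",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return classes[code // 100]


print(status_class(200))  # Successful
print(status_class(422))  # Client Error
```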
Unprocessable entity
The 2xx class of status code indicates that the request was successfully received, understood, and accepted. Since you won't accept invalid data, that is, an entity that cannot be processed by the server, the 200 status code is not suitable for this situation.
The 422 status code is what you are looking for: the syntax of the request payload is valid but it cannot be processed due to invalid data. Have a look:
11.2. 422 Unprocessable Entity
The 422 (Unprocessable Entity) status code means the server
understands the content type of the request entity (hence a
415 (Unsupported Media Type) status code is inappropriate), and the
syntax of the request entity is correct (thus a 400 (Bad Request)
status code is inappropriate) but was unable to process the contained
instructions. For example, this error condition may occur if an XML
request body contains well-formed (i.e., syntactically correct), but
semantically erroneous, XML instructions.
For your situation, just read JSON instead of XML.
The 422 is registered in IANA and defined in the RFC 4918, the document that defines WebDAV, an extension for the HTTP protocol.
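To make the 400 versus 422 distinction concrete, here is a rough sketch of a check on a JSON request body. It assumes the object from the question arrives as a request payload; the function name `check_person_payload` and the validation rules (non-empty name, non-negative integer age) are assumptions made for illustration, not part of any specification.

```python
import json


def check_person_payload(raw: str) -> tuple[int, str]:
    """Return a (status code, reason) pair for a JSON request body."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # The syntax itself is broken: 400 Bad Request.
        return 400, "body is not valid JSON"
    # From here on the syntax is fine, so problems are semantic: 422.
    if not isinstance(data, dict):
        return 422, "expected a JSON object"
    name, age = data.get("name"), data.get("age")
    if not isinstance(name, str) or not name:
        return 422, "name must be a non-empty string"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(age, bool) or not isinstance(age, int) or age < 0:
        return 422, "age must be a non-negative integer"
    return 200, "OK"


print(check_person_payload('{"name": null, "age": -3.141526}'))
```

Whether a negative or fractional age is really "wrong" is a business decision; the sketch only shows where such a rule would sit relative to the syntax check.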
Decision charts
Michael Kropat put together a set of decision charts that help determine the best status code for each situation. The status codes are grouped into three rough categories:
[Charts not reproduced here: "Start here", "Choosing 2xx and 3xx status codes", "Choosing 4xx status codes", "Choosing 5xx status codes"]

Related

Error code pattern for API

What is a good choice for an API error code response pattern?
Instead of using different codes indicating different types of error
100001 // username not provided
100002 // password not provided
100003 // password too short
...
I see some other use patterns like the following (non-sequential) ...
20000
20001
20004
20015
Are there any other recommendations?
In my experience developing and using web services, I have found that a strategy of using a combination of top-level HTTP status codes and lower-level API error codes works reasonably well. Note that the lower-level API error codes don't need to be integers, but can be any enumeration. For a well-known public example, AWS Simple Email Service (SES) uses this strategy of using both HTTP status codes and API-level error codes. You can see a sample error code response for SES here. Note that although SES uses XML response error payloads, this strategy works equally well for JSON response payloads.
In my experience, there are a few things that you need to keep in mind when using this strategy:
Strive to return the correct HTTP response code: HTTP is a ubiquitous protocol and is no doubt understood by your web container. Its response codes fit naturally into REST web services. As such, leverage it! If your web service encounters an error condition, you should do your best to return the correct HTTP status code, in whose context the API error code has meaning. One of my biggest headaches in debugging issues with web services occurs when developers just unconditionally throw arbitrary (usually runtime) exceptions back up the stack. The result is that everything gets returned to the caller as an HTTP 500 (Internal Server Error) status code even when that's not the case (e.g. the client sends garbage data and the server just can't process it). Some common HTTP status codes you might want to design for include:
400 Bad Request: There is an issue with the client's request. Note this error isn't just used for things like broken JSON syntax in a POST request, but it is also a legitimate response code for semantic issues as well (i.e. the JSON request payload conformed to the prescribed schema, but there was an issue with the data in the payload, such as a number being negative when it is supposed to be only positive).
401 Unauthorized: The caller's credentials were missing or invalid (i.e. authentication error).
403 Forbidden: The caller's credentials were valid, but their access level isn't sufficient to access the resource (i.e. authorization error).
404 Not Found: The resource of the URL doesn't exist.
500 Internal Server Error: Something bad happened inside the server itself, this error could be anything.
502 Bad Gateway: An error occurred when calling downstream service.
503 Service Unavailable: A useful response code for when you get hammered with a ton of "happy" customers who are inadvertently DDOS'ing your service.
504 Gateway Timeout: Like the 502 status code, but indicates a timeout instead of an actual error with the downstream service, per se.
HTTP response codes are the top-level codes, and API error codes only have meaning within that context: By this, I mean that your API error codes are only meaningful for certain HTTP response codes. For example, in the table of SES error codes, each error code is only tied to a single HTTP(S) response code. The error codes ConfigurationSetDoesNotExist and InvalidParameterValue only make sense when a 400 Bad Request is returned by SES - it wouldn't make sense to return these error codes when a 500 Internal Server Error is returned. Similarly, if you were writing a web service that called downstream services and databases, you might have a FooDownstreamServiceTimedOut error code that you would return with a 504 Gateway Timeout HTTP status code when a downstream web service call timed out to the "Foo" web service. You might also have a MyDatabaseError error code that you would return with a 500 Internal Server Error HTTP status code when your query to the internal DB fails.
Have a uniform error code schema irrespective of status codes: Your clients need to be able to process your error content programmatically. As such, it needs to conform to a certain schema. Ideally, your API error code schema should include the error code (i.e. name or ID, etc.). You also probably want to include a natural language description of the error code and the ID/GUID of the request that you are responding to. For an example of an error schema, see this sample AWS SES response and schema. Additionally, you might also want to consider returning a client ID in the response. This is as much for your own benefit as the client's since it can help you drill down into the data to see if one particular client is getting a glut of particular errors vs. your other clients.
Consider returning natural language descriptions of the error codes in the response: To make things easier on your clients, you might want to consider not just returning the error code in the error payload, but a natural language description as well. This kind of behavior can immediately help confused and busy engineers who really don't care that much about your service quickly diagnose what's happening so that they can resolve the issue ASAP. btw, enabling engineers to quickly diagnose issues with your service increases the all-important "uptime" metric that your customers and managers will no doubt care about.
Don't feel obliged to use integers, use enumerations instead: The notion of "error codes" conjures up images of outdated technologies and codebooks where you had to look up what an error meant. It arose from the programming dark ages when engineers needed to fit all possible errors into a byte of space, or a nibble or whatever. Those days are gone, and your error code can be a string, likely without any meaningful impact on performance. You might as well take advantage and make the error code meaningful, as a means of keeping things simple.
Return info to clients that they might need to debug, but be mindful of security: If possible, return whatever debug info your clients may need. However, if your service potentially deals with sensitive information such as credit card numbers and the like, you probably don't want to pass that info around for obvious reasons.
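The points above can be sketched roughly as follows: string-valued error codes held in an enumeration, each tied to exactly one HTTP status, and one uniform payload shape for every error. The member names echo the examples in this answer, but the class `ApiError`, the helper `error_response` and the field names `code`, `message` and `requestId` are assumptions made for this sketch, not the SES schema.

```python
import json
import uuid
from enum import Enum


class ApiError(Enum):
    """Hypothetical API error codes, each tied to one HTTP status."""

    INVALID_PARAMETER_VALUE = (400, "A request parameter has an invalid value.")
    FOO_DOWNSTREAM_SERVICE_TIMED_OUT = (504, "The call to the Foo service timed out.")
    MY_DATABASE_ERROR = (500, "An internal database query failed.")

    @property
    def http_status(self) -> int:
        return self.value[0]

    @property
    def description(self) -> str:
        return self.value[1]


def error_response(error: ApiError, request_id: str) -> tuple[int, str]:
    """Build (HTTP status, JSON body) using the same schema for every error."""
    body = {
        "code": error.name,            # meaningful string, not a magic number
        "message": error.description,  # natural language description
        "requestId": request_id,       # lets both sides trace the request
    }
    return error.http_status, json.dumps(body)


status, body = error_response(
    ApiError.FOO_DOWNSTREAM_SERVICE_TIMED_OUT, str(uuid.uuid4())
)
print(status, body)
```

Keeping the status and description next to the code in one place makes it harder to return an error code under an HTTP status where it has no meaning.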
Hope that helps.
A recommendation by the IETF (internet standards body) is using the application/problem+json mediatype.
Notable is that they don't use random numbers; they use strings (specifically URIs) to identify errors.
This is a subjective question, but even if you don't use their format, I'd argue that username-not-provided is better in almost every way to 100001.
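For illustration, a problem document in that media type could be built like this. The members `type`, `title`, `status` and `detail` are the standard ones; the helper name `problem_details` and the example.com URI are hypothetical.

```python
import json


def problem_details(status: int, type_uri: str, title: str, detail: str):
    """Return (status, headers, body) for an application/problem+json reply."""
    headers = {"Content-Type": "application/problem+json"}
    body = {
        "type": type_uri,   # a URI identifying the kind of problem
        "title": title,     # short, human-readable summary
        "status": status,   # mirrors the HTTP status code
        "detail": detail,   # explanation specific to this occurrence
    }
    return status, headers, json.dumps(body)


status, headers, body = problem_details(
    400,
    "https://example.com/problems/username-not-provided",  # hypothetical URI
    "Username not provided",
    "The request did not include a username.",
)
print(body)
```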
I would say this heavily depends on what kind of API you're providing.
I would always include a field called ack or something similar in every response that has three states: failure, warning, success. Success obviously means everything went well. On warning, the request went through and the JSON will contain the expected output, but it will also include a warning string or, even better in case multiple warnings could occur, an array called errors which consists of multiple objects containing code, string and type. This array will also be returned in case of failure, and nothing else but this array.
The array contains one object per error or warning, having a code (I would suggest going with your initial idea of 10001, 10002, ...) and a string explaining the error in a very short phrase (e.g. Username contains invalid characters). The type is either error or warning, which is useful in case of a failure ack that contains not only errors but also warnings.
This makes it easy to look up errors by their code (I would provide a page, also with an API, that contains all the error codes in a table along with their short and long description plus common causes/fixes/etc. - All this information should also be available via an API where they can be accessed by providing the error code) while still having a quick short text response so the user can tell what's wrong in most cases without having to look up the error.
This also allows for easy output of warnings and errors to the end user, not just the developers. Using my idea with the API call to get information about an error, developers using your API could easily provide full information about errors to end-users when needed (including causes/fixes/whatever you see fit).
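A rough sketch of the envelope described in this answer might look like the following. The keys `ack`, `errors`, `code`, `string` and `type` come from the description; the function name `build_envelope` and nesting the normal output under a `data` key are assumptions.

```python
def build_envelope(data=None, issues=()):
    """Wrap a result in an ack envelope.

    issues is an iterable of (code, string, type) tuples, where type is
    either "error" or "warning".
    """
    errors = [{"code": c, "string": s, "type": t} for c, s, t in issues]
    if any(e["type"] == "error" for e in errors):
        # On failure only the ack and the errors array are returned.
        return {"ack": "failure", "errors": errors}
    envelope = {"ack": "warning" if errors else "success"}
    if data is not None:
        envelope["data"] = data
    if errors:
        envelope["errors"] = errors
    return envelope


print(build_envelope({"id": 1}, [(10003, "Password too short", "warning")]))
```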
Instead of writing your own API standard from scratch, adopt one of those already available, for example the JSON API standard:
If you’ve ever argued with your team about the way your JSON responses should be formatted, JSON API can be your anti-bikeshedding tool.
By following shared conventions, you can increase productivity, take advantage of generalized tooling, and focus on what matters: your application.
Clients built around JSON API are able to take advantage of its features around efficiently caching responses, sometimes eliminating network requests entirely.
If you decide to go with JSON API it has a section dedicated to errors and a few error examples.
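As a minimal sketch of what such an error document looks like, the snippet below builds a top-level `errors` array with one error object. The member names (`status`, `code`, `title`, `detail`, `source.pointer`) follow the JSON API error object format as far as I know it; the helper name `jsonapi_errors` and the concrete values are made up for this example.

```python
import json


def jsonapi_errors(*errors: dict) -> str:
    """Serialize error objects under the top-level "errors" member."""
    return json.dumps({"errors": list(errors)})


document = jsonapi_errors(
    {
        "status": "422",                   # HTTP status, expressed as a string
        "code": "username-not-provided",   # application-specific error code
        "title": "Invalid Attribute",
        "detail": "A username must be provided.",
        "source": {"pointer": "/data/attributes/username"},
    }
)
print(document)
```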
For many years, many development companies have created things like bitmasks for errors, so they can encode multiple variables inside the error:
000 - all ok
001 - something failed with X
010 - something failed with Y
011 - something failed with X and Y
100 - something failed with Z
101 - something failed with X and Z
The limitation is that this restricts the error space to however many bits you choose for the encoding, such as 16 or 32 independent flags, which may or may not be enough for you.
You see this being common in COM+
https://learn.microsoft.com/en-us/windows/desktop/com/com-error-codes-1
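The X/Y/Z table above maps directly onto a flag enumeration. This is only a sketch of the bitmask idea, with the hypothetical names `Failure` and `decode`; it is not how COM itself defines its codes.

```python
from enum import IntFlag


class Failure(IntFlag):
    """One bit per independent failure, as in the table above."""

    NONE = 0b000
    X = 0b001
    Y = 0b010
    Z = 0b100


def decode(code: int) -> list[str]:
    """List the names of the failures encoded in an integer error code."""
    flags = Failure(code)
    return [f.name for f in (Failure.X, Failure.Y, Failure.Z) if f in flags]


combined = Failure.X | Failure.Z
print(int(combined))      # 5, i.e. binary 101
print(decode(0b101))      # ['X', 'Z']
```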
I hope this helps.

Should I respond with a 400 error if a form submit contains validation errors?

In a classic form-based webapp, if a user submits a HTML form that contains validation errors, assuming no JavaScript, what's the correct thing to do?
Respond with the HTTP 200 + the page content (including error info for the user)
Respond with the HTTP 400 + the page content (including error info for the user)
Does it matter?
Your app is talking to human beings, not other machines. Therefore you should do the right thing and handle exceptions in a user-friendly manner.
Your user doesn't care about HTTP return codes, and so it should not even be a consideration for you either. You are confusing business-logic problems with HTTP protocol problems.
In fact, by throwing a 400 error at a web browser, you are only likely to encounter the web browser throwing up an ugly message to the user.
If you were coding a REST api, then the answer would be different. But you're not.
1) would be the correct approach because you want to display a page of content to the user that highlights the invalid input values.
The trouble with 2) is that some browsers may display their own 'friendly' error page that is designed to help users understand 4xx errors. Here's some information about when IE displays 'friendly' error pages:
http://support.microsoft.com/kb/294807
On the one hand, if it is a web app for human consumption, a 200 with a useful error message will work. Making web sites for humans is easier in that sense because they can read and understand the content and do not have to depend on the status code to interact with the application.
On the other hand, if you are thinking of a REST API, it would be more appropriate to return a 4xx error because it is a client-side error. In that case, you have several options.
According to RFC 2616, a 400 means
The request could not be understood by the server due to malformed
syntax. The client SHOULD NOT repeat the request without
modifications.
This doesn't seem to be appropriate as it's not due to malformed syntax.
However, RFC2616 is now obsoleted by RFC7230-7235. The new RFC7231 defines the meaning of 400 more broadly.
Client Error 4xx
The 4xx (Client Error) class of status code indicates that the client seems to have erred. Except when responding to a HEAD request, the server SHOULD send a representation containing an explanation of the error situation, and whether it is a temporary or permanent condition.
400 Bad Request
The 400 (Bad Request) status code indicates that the server cannot or will not process the request due to something that is perceived to be a client error (e.g., malformed request syntax, invalid request message framing, or deceptive request routing)
So this seems acceptable even though still generic. Another option would be to use 422 status code defined by RFC4918 (WebDAV).
422 Unprocessable Entity
The 422 (Unprocessable Entity) status code means the server understands the content type of the request entity (hence a 415 (Unsupported Media Type) status code is inappropriate), and the syntax of the request entity is correct (thus a 400 (Bad Request) status code is inappropriate) but was unable to process the contained instructions. For example, this error condition may occur if an XML request body contains well-formed (i.e., syntactically correct), but semantically erroneous, XML instructions.

Why should JSON have a status property

I stumbled over a practice that I found to be quite widespread. I even found a web page that gave this a name, but I forgot the name and am not able to find that page on google anymore.
The practice is that every JSON response from a REST service should have the following structure:
{
"status": "ok",
"data": { ... }
}
or in an error case:
{
"status": "error",
"message": "Something went wrong"
}
My question: What is the point why such a "status" property should be required in the JSON? In my opinion that is what HTTP status codes were made for.
REST uses the HTTP means of communication between client and server, for example the "DELETE" verb should be used for deleting. In the same way, 404 should be used if a resource is not found, etc. So inline with that thinking, any error cases should be encoded properly in the HTTP status.
Are there specific reasons to return a HTTP 200 status code in an error case and have the error in the JSON instead? It just seems to make the javascript conditional branches more complex when processing the response.
I found some cases where status could be "redirect" to tell the application to redirect to a certain URL. But if the proper HTTP status code was used, the browser would perform the redirection "for free", maintaining the browsing history properly.
I picture mainly two possible answers from you:
Either there are two quarreling communities with their favorite approach each (use HTTP status always vs. use HTTP status never)
or I am missing an important point and you'll tell me that although the HTTP status should be used for some cases, there are specific cases where a HTTP status does not fit and the "status" JSON property comes into play.
You are right. I think what you are seeing is a side-effect of people not doing REST correctly. Or just not doing REST at all. Using REST is not a pre-requisite for a well-designed application; there is no rule that webapps have to be REST-ful.
On the other hand, for the error condition, sometimes apps want to return a 200 code but an error to represent a business logic failure. The HTTP error codes don't always match the semantics of application business errors.
You are mixing two different Layers here:
HTTP is for establishing (high-level) connections and transferring data. The HTTP status codes thus inform you if and how the connection was established or why it was not. On a successful connection the body of the HTTP response could then contain anything (e.g. XML, JSON, etc.), thus these status codes have to define a general meaning. They do not inform you about the correctness or type (e.g. error message or data) of the response.
When using JSON for interchanging data you could certainly omit the status property, however it is easier for you to parse the JSON, if you know if it includes the object you were requesting or an error message by just reading one property.
So, yes, it is perfectly normal to return a 200 status code and have a "status": "error" property in your JSON.
HTTP status codes can be caused by a lot of things, including load balancers, proxies, caches, firewalls, etc. None of these are going to modify your JSON output, unless they completely break it, which can also be treated as an error.
Bottom line: it's more reliable to do it via JSON.
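Whichever side of this debate you take, a client that talks to such a service ends up checking both layers. The sketch below assumes the `status` / `data` / `message` envelope from the question; the function name `interpret` and the use of `RuntimeError` are arbitrary choices for this example.

```python
import json


def interpret(http_status: int, raw_body: str) -> dict:
    """Return the payload, or raise if either layer reports a failure."""
    # Layer 1: the HTTP status, which proxies and gateways may also set.
    if http_status >= 400:
        raise RuntimeError(f"HTTP error {http_status}")
    # Layer 2: the application envelope inside the body.
    try:
        envelope = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        raise RuntimeError("body is not valid JSON") from exc
    if not isinstance(envelope, dict) or envelope.get("status") != "ok":
        message = "unknown application error"
        if isinstance(envelope, dict):
            message = envelope.get("message", message)
        raise RuntimeError(message)
    return envelope["data"]


print(interpret(200, '{"status": "ok", "data": {"id": 7}}'))
```

The second branch is exactly the extra conditional the question complains about: it only exists because errors can arrive with a 200.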

Choosing the right HTTP response code for incorrect POST data

I suspect this is a very trivial question. I'm writing a PHP script to respond to an AJAX query. The query should include some XML data, which the PHP script processes and then returns a response to. There are two error cases I want to consider:
No POST data in the request; or
Bad data in the XML (either not well-formed XML, or XML that fails some schema checks)
In such cases I believe I should be returning a 4xx response code. Is there anything more appropriate than 400?
More Details
To illustrate the problem further: The client JavaScript application is a diagram editor for educational purposes. The user is required to create a diagram that correctly models a given situation. The student can then submit the diagram, whereby an XML serialization of the diagram is POSTed via an AJAX call to the server. A PHP script analyses the diagram XML and constructs an XML report that is sent as the AJAX response to the client. The two situations I originally described (no XML POST data or invalid XML therein) should not happen when requested by the client, but I think it prudent to correctly capture and deal with these situations. Hence my belief that a 4xx response code is appropriate. The XML report structure doesn't cater for these situations, and an empty report would amount to a perfect diagram, which clearly is not appropriate.
Based upon the meanings of the codes in the TCP/IP Guide it seems like 400 is your best choice. Nothing there seems to meet your example.
I think the two error cases you mentioned actually would be served with different HTTP status codes. From the W3C's Status Code Definitions:
400 Bad Request - The request could not be understood by the server due to malformed syntax. The client SHOULD NOT repeat the request without modifications.
409 Conflict - The request could not be completed due to a conflict with the current state of the resource.... For example, if versioning were being used and the entity being PUT included changes to a resource which conflict with those made by an earlier (third-party) request, the server might use the 409 response to indicate that it can't complete the request....
So the 400 is for cases when the request body can't even be parsed due to syntax problems. The 409, in contrast, seems to be for cases when the request body is parsed and the server understands the request all right, but is refusing to fulfill it because of business rules.
In the case of failing schema validation or bad XML syntax, I agree with the other posters, 400 is appropriate. But in the case of no POST data, which you say is a valid diagram but not acceptable for other reasons, 409 seems more appropriate to me.
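The checks discussed in these answers can be sketched as below. The question uses PHP; this sketch is in Python only to show the order of the checks. The function name `check_diagram_post` and the rule that the root element must be called `diagram` are stand-ins for real schema validation, and the status codes follow the "400 for everything" suggestion.

```python
import xml.etree.ElementTree as ET


def check_diagram_post(body: bytes) -> tuple[int, str]:
    """Return (status code, reason) for a POSTed XML diagram."""
    if not body.strip():
        # One answer above suggests 409 here instead of 400.
        return 400, "no POST data"
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return 400, "XML is not well-formed"
    if root.tag != "diagram":  # hypothetical stand-in for schema checks
        return 400, "XML fails schema checks"
    return 200, "OK"


print(check_diagram_post(b"<diagram><node/></diagram>"))
print(check_diagram_post(b"<diagram>"))
```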

Syntax of HTTP-status headers

There're many ways to write an HTTP-status header:
HTTP/1.1 404 Not Found
Status: 404
Status: 404 Not Found
but which is the semantically-correct and spec-compliant way?
Edit: By status headers I mean this, using a function such as PHP's header().
Adding some information some time later, since I came across this question whilst researching something related.
I believe the Status header field was originally invented as part of the CGI specification, RFC 3875:
https://www.rfc-editor.org/rfc/rfc3875#section-6.3.3
To quote:
The Status header field contains a 3-digit integer result code that
indicates the level of success of the script's attempt to handle the
request.
Status = "Status:" status-code SP reason-phrase NL
status-code = "200" | "302" | "400" | "501" | extension-code
extension-code = 3digit
reason-phrase = *TEXT
It allows a CGI script to return a status code to the web server that overrides the default seen in the HTTP status line. Usually the server buffers the result from the script and emits a new header for the client. This one is a valid HTTP header which starts with an amended HTTP status line and omits the script's "Status:" header field (plus some other transformations mandated by the RFC).
So all of your examples are valid from a CGI script, but only the first is really valid in a HTTP header. The latter two are only valid coming from a CGI script (or perhaps a FastCGI application).
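To illustrate the two forms, here is a sketch of what a CGI script prints and how a server might turn that into the status line the client sees. Both function names are made up, and the parsing is deliberately simplified compared with what a real web server does.

```python
def cgi_headers(status_code: int, reason: str, content_type: str = "text/html") -> str:
    """Header block a CGI script prints before its body (Status field form)."""
    return f"Status: {status_code} {reason}\nContent-Type: {content_type}\n\n"


def http_status_line(cgi_header_block: str, default: str = "200 OK") -> str:
    """Status line a server could send to the client for that CGI output."""
    for line in cgi_header_block.splitlines():
        if not line:
            break  # a blank line ends the header block
        name, _, value = line.partition(":")
        if name.strip().lower() == "status":
            return f"HTTP/1.1 {value.strip()}"
    # No Status field: the server falls back to its default.
    return f"HTTP/1.1 {default}"


print(http_status_line(cgi_headers(404, "Not Found")))  # HTTP/1.1 404 Not Found
```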
A CGI script can also operate in "non-parsed header" (NPH) mode, when it generates a complete and valid HTTP header which the web server passes to the client verbatim. As such this shouldn't include a Status: header field.
Note, what I am interested in is which status should win if an NPH script gets it a bit wrong and emits the Status: header field, possibly in addition to the HTTP status line. I can't find any clear indication, so I suspect it is left to the implementation of whatever is parsing the output, either the client or the server.
Since https://www.rfc-editor.org/rfc/rfc2616#section-6 and more specifically https://www.rfc-editor.org/rfc/rfc2616#section-6.1 does not mention use of "Status:" when indicating a status code, and since the official list of headers at http://www.iana.org/assignments/message-headers/message-headers.xml does not mention "Status", I'd be inclined to believe it should not be served with it as a header.
The closest thing I've found to an answer is the Fast CGI spec, which states to set status codes through Status and Location headers.
A lot of them are pretty much arbitrary strings, but here is the W3C's spec for the commonly used ones
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html