REST service semantics; include properties not being updated? - json

Suppose I have a resource called Person. I can update Person entities by doing a POST to /data/Person/{ID}. Suppose for simplicity that a person has three properties, first name, last name, and age.
GET /data/Person/1 yields something like:
{ id: 1, firstName: "John", lastName: "Smith", age: 30 }.
My question is about updates to this person and the semantics of the services that do this. Suppose I wanted to update John, he's now 31. In terms of design approach, I've seen APIs work two ways:
Option 1:
POST /data/Person/1 with { id: 1, age: 31 } does the right thing. Implicitly, any property that isn't mentioned isn't updated.
Option 2:
POST /data/Person/1 with the full object that would have been received by GET -- all properties must be specified, even if many don't change, because the API (in the presence of a missing property) would assume that its proper value is null.
Which option is correct from a recommended design perspective? Option 1 is attractive because it's short and simple, but has the downside of being ambiguous in some cases. Option 2 has you sending a lot of data back and forth even if it's not changing, and doesn't tell the server what's really important about this payload (only the age changed).

Option 1 - updating a subset of the resource - is now formalised in HTTP as the PATCH method. Option 2 - updating the whole resource - is the PUT method.
In real-world scenarios, it's common to want to upload only a subset of the resource. This is better for performance of the request and modularity/diversity of clients.
For that reason, PATCH is now more useful than PUT in a typical API (imo), though you can support both if you want to. There are a few corner cases where a platform may not support PATCH, but I believe they are rare now.
If you do support both, don't just make them interchangeable. The difference with PUT is, if it receives a subset, it should assume the whole thing was uploaded, so should then apply default properties to those that were omitted, or return an error if they are required. Whereas PATCH would just ignore those omitted properties.

Related

Returning different JSON results for the same request - is this a violation of REST?

Note the following from Roy Fielding concerning REST design, guidelines & principals.
5.2.1.1 Resources and Resource Identifiers
The key abstraction of information in REST is a resource. Any
information that can be named can be a resource: a document or image,
a temporal service (e.g. "today's weather in Los Angeles"), a
collection of other resources, a non-virtual object (e.g. a person),
and so on. In other words, any concept that might be the target of an
author's hypertext reference must fit within the definition of a
resource.
A resource is a conceptual mapping to a set of entities, not the
entity that corresponds to the mapping at any particular point in
time.
More precisely, a resource R is a temporally varying membership
function MR(t), which for time t maps to a set of entities, or values,
which are equivalent. The values in the set may be resource
representations and/or resource identifiers. A resource can map to the
empty set, which allows references to be made to a concept before any
realization of that concept exists -- a notion that was foreign to
most hypertext systems prior to the Web [61]. Some resources are
static in the sense that, when examined at any time after their
creation, they always correspond to the same value set. Others have a
high degree of variance in their value over time.
The only thing that is required to be static for a resource is the
semantics of the mapping, since the semantics is what distinguishes
one resource from another.
The key points have been bolded, the rest of the paragraph I have included is for context.
Here is the scenario.
I have a web api that has a endpoint: http://www.myfakeapi.com/people
When a client does a GET request to this endpoint, they receive back a list of people.
Person
{
"Name": "John Doe",
"Age": "23",
"Favorite Color": "Green"
}
Ok, well that's cool.
But is it against REST design practices and principles if I have a 'Person' who does not have a Favorite Color and I want to return them like this:
Person
{
"Name": "Bob Doe",
"Age": "23",
}
Or should I return them like this:
Person
{
"Name": "Bob Doe",
"Age": "23",
"Favorite Color": null
}
The issue is that the client requesting the resource has to do extra work to see if the property even exist in the first place. Some 'Person's' have favorite colors and some don't. Is it against REST principals to just omit the json property of 'Favorite Color' if they don't exist - or should that property be given a 'null' or blank value?
What does REST say about this? I am thinking that I should give back a null and not change the representation of the resource the client is requesting by omitting properties.
Off the top of my head I can't think of any REST constraints that this violates (here's a link to a brief overview if you're interested). It also doesn't violate idempotency for a GET request. However, it is still bad practice.
The consumer of your API should know what to expect and ideally this should be well documented (I like using Swagger a lot for this). Any changes in what to expect should be communicated to consumers, possibly in the form of release notes. Changes that could potentially be breaking for your consumer should be delivered in a new version of your API.
Since your Person1 and Person2 are technically different object structures, that could be breaking in itself (let's face it, we don't always find the edge cases as devs). You don't just want your API to work on a basic level and to hell with the end users - you want to design it with the end-consumer in mind so that their lives are made easier.
There are various ways we can deal with this, depends upon the use case, I'll list them only by one
1) Prefer enums (only if it makes sense to your use case)
{
"Name": "Bob Doe",
"Age": "23",
"Favorite Color": NO_COLOR
}
When you know the values for your property at the beginning, define a set of enum constants, and assign a default value if the property does not apply to the user. This helps in a few ways:
Your client knows what are the possible values so they can prepare their client system accordingly.
By giving default enum constant, we convey that value of the particular field is successfully retrieved from either persistent storage or maybe from another remote service, but it has default value because the property may not apply to the user OR user doesn't have any value for this property.
By avoiding NULL pattern, your client code will be resilient and the client can prepare their code for default enum constant.
When you start to serve more users, you may need to add a few more enum constants which may not apply to every client of yours. When you add new enums which they don't know, they can easily handle this in their parsing libraries and convert into something as per client application design. In Jackson, we can use DeserializationFeature.READ_UNKNOWN_ENUM_VALUES_AS_NULL for this.
2) Use Null - Do not create enum constants for everything
There are cases perfectly valid to have a NULL object. For instance, in the below example, it makes sense to use null if there is no favourite quote.
{
"Name": "Bob Doe",
"Age": "23",
"Favorite Quote": null
}
3) Document your required properties clearly
If you use swagger for your rest API documentation, you can mark mandatory properties as required. The ones not marked are optional. In that way, the client will be prepared to handle if they are NULL or empty string. (It should apply to other API documentation tools as well)
Bad practice:
I notice a few users code in such a way, they send errors in the same response model they send their success response 200. Refer this question & answer. This is definitely a bad practice. Don't mix two different responses and mark one property as optional - use status codes to convey any problems. I'm not talking about partial response here.
4) Add/Modify properties (as long as you're not breaking a contract with the client)
Say the Favorite Color property is added later and currently you're sending the following response to your client. You will publish your new contract to your clients when you add Favorite Color, but your clients should have fail-safe code and they should handle the unknown properties. In Jackson, we will use DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES for this. Non-breaking changes do not necessarily require v2.
Person
{
"Name": "Bob Doe",
"Age": "23",
}
So, answer to your question is, you should start looking at the first three options while you design your rest API, you don't require to omit any properties. But, you may be required to add a few properties later(covered at #4), which is perfectly fine.

Composition in REST and consistence of the inserted data

How to properly design REST if you have a composition? I have a TestResult entity, which has TestCaseResults entities. Both support full set of REST methods. The important fact about this (which I believe differs from many examples I found on a web) is that TestResult is not consistent if it doesn't have all of TestCaseResults How do I properly design this in REST?
Let's say I create it as separate but dependent resources: api\testresults\ and api\testresults\1\testcaseresults. When the client wants to create a test result, he needs to POST to api\testresults, then retrieve URL api\testresults\1\testcaseresutls by a link from the response, and POST all of test case results to it. This means that at some point in time the test result is not consistent until the user finishes its operation. Basically, there is no concept of the transaction here.
Let's say I create only api\testresults resource, and embed an array of test case results inside, like this:
{
"Name": "Test A"
"Results": [
{
"Measured": "BB",
...
},
...
]
...
}
Then it is easier to insert, but it still hard to work with. Simple GET to api\testresults\1\ will retrieve test result with a big amount of test case results. GET to api\testresults\ will retrieve much more! The structure of this becomes complex. Furthermore, in the real word I have a few entities like TestCaseResults belong to TestResults, so there will be a few arrays, and each could have 100-200 elements.
I could try to combine the approaches. Embed the array, but also provide links to api\testresults\1\testcaseresults and support operations there as well. Maybe on GET api\testresults\1\ I could provide TestResult without it's TestCaseResults but only with a link pointing to a resource, but on POST I could accept an array of TestCaseResults embedded (not sure though it is allowed to have different return types for POST and GET in REST) But now there are two approaches for inserting information, it is confusing and I'm still not sure it solves anything.
your approach with api\testresults\1 and api\testresults\1\testcaseresults seems promising.
As JSON does not have a fixed structure, you can add query parameters to your URL to control if results are inserted or not.
api\testresults\1?with_results=true would mean that your caller want to see the test cases in addition to the test results.
api\testresults\1\testcaseresults would still return the test case results for your test 1.
If you fear that the number of test case results is too large, you can add pagination parameters, that would be reuse in the testcaseresults call.
api\testresults\1?with_results=true&per_page=10 would include the only the 10 first results. To get more, use api\testresults\1\testcaseresults?per_page=10&page=2 and so on, as it is the dedicated endpoint.
Cheers
Note: if you want a flexible API still returning JSON data, you can give a look to GraphQL, the trendy approach.

Rest API design with multiple unique ids

Currently, we are developing an API for our system and there are some resources that may have different kinds of identifiers.
For example, there is a resource called orders, which may have an unique order number and also have an unique id. At the moment, we only have URLs for the id, which are these URLs:
GET /api/orders/{id}
PUT /api/orders/{id}
DELETE /api/orders/{id}
But now we need also the possibility to use order numbers, which normally would result into:
GET /api/orders/{orderNumber}
PUT /api/orders/{orderNumber}
DELETE /api/orders/{orderNumber}
Obviously that won't work, since id and orderNumber are both numbers.
I know that there are some similar questions, but they don't help me out, because the answers don't really fit or their approaches are not really restful or comprehensible (for us and for possible developers using the API). Additionally, the questions and answers are partially older than 7 years.
To name a few:
1. Using a query param
One suggests to use a query param, e.g.
GET /api/orders/?orderNumber={orderNumber}
I think, there are a lot of problems. First, this is a filter on the orders collections, so that the result should be a list as well. However, there is only one order for the unique order number which is a little bit confusing. Secondly, we use such a filter to search/filter for a subset of orders. Additionally, a query params is some kind of a second-class parameter, but should be first-class in this case. This is even a problem, if I the object does not exist. Normally a get would return a 404 (not found), but a GET /api/orders/?orderNumber=1234 would be an empty array, if the order 1234 does not exist.
2. Using a prefix
Some public APIs use some kind of a discriminator to distinguish between different types, e.g. like:
GET /api/orders/id_1234
GET /api/orders/ordernumber_367652
This works for their approach, because id_1234 and ordernumber_367652 are their real unique identifiers that are also returned by other resources. However, that would result in a response object like this:
{
"id": "id_1234",
"ordernumber": "ordernumber_367652"
//...
}
This is not very clean, because the type (id or order number) is modelled twice. And apart from the problem of changing all identifiers and response objects, this would be confusing, if you e.g. want to search for all order numbers greater than 67363 (thus, there is also a string/number clash). If the response does not add the type as a prefix, a user have to add this for some request, which would also be very confusing (sometime you have to add this and sometimes not...)
3. Using a verb
This is what e.g. Twitter does: their URL ends with show.json, so you can use it like:
GET /api/orders/show.json?id=1234
GET /api/orders/show.json?number=367652
I think, this is the most awful solution, since it is not restful. Furthermore, it has some of the problems that I mentioned in the query param approach.
4. Using a subresource
Some people suggest to model this like a subresource, e.g.:
GET /api/orders/1234
GET /api/orders/id/1234 //optional
GET /api/orders/ordernumber/367652
I like the readability of this approach, but I think the meaning of /api/orders/ordernumber/367652 would be "get (just) the order number 367652" and not the order. Finally, this breaks some best practices like using plurals and only real resources.
So finally, my questions are: Did we missed something? And are there are other approaches, because I think that this is not an unusual problem?
to me, the most RESTful way of solving your problem is using the approach number 2 with a slight modification.
From a theoretical point of view, you just have valid identification code to identify your order. At this point of the design process, it isn't important whether your identification code is an id or an order number. It's something that uniquely identify your order and that's enough.
The fact that you have an ambiguity between ids and numbers format is an issue belonging to the implementation phase, not the design phase.
So for now, what we have is:
GET /api/orders/{some_identification_code}
and this is very RESTful.
Of course you still have the problem of solving your ambiguity, so we can proceed with the implementation phase. Unfortunately your order identification_code set is made of two distinct entities that share the format. It's trivial it can't work. But now the problem is in the definition of these entity formats.
My suggestion is very simple: ids will be integers, while numbers will be codes such as N1234567. This approach will make your resource representation acceptable:
{
"id": "1234",
"ordernumber": "N367652"
//...
}
Additionally, it is common in many scenarios such as courier shipments.
Here is an alternate option that I came up with that I found slightly more palatable.
GET /api/orders/1234
GET /api/orders/1234?idType=id //optional
GET /api/orders/367652?idType=ordernumber
The reason being it keeps the pathing consistent with REST standards, and then in the service if they did pass idType=orderNumber (idType of id is the default) you can pick up on that.
I'm struggling with the same issue and haven't found a perfect solution. I ended up using this format:
GET /api/orders/{orderid}
GET /api/orders/bynumber/{orderNumber}
Not perfect, but it is readable.
I'm also struggling with this! In my case, i only really need to be able to GET using the secondary ID, which makes this a little easier.
I am leaning towards using an optional prefix to the ID:
GET /api/orders/{id}
GET /api/orders/id:{id}
GET /api/orders/number:{orderNumber}
or this could be a chance to use an obscure feature of the URI specification, path parameters, which let you attach parameters to particular path elements:
GET /api/orders/{id}
GET /api/orders/{id};id_type=id
GET /api/orders/{orderNumber};id_type=number
The URL using an unqualified ID is the canonical one. There are two options for the behaviour of non-canonical URLs: either return the entity, or redirect to the canonical URL. The latter is more theoretically pure, but it may be inconvenient for users. Or it may be more useful for users, who knows!
Another way to approach this is to model an order number as its own thing:
GET /api/ordernumbers/{orderNumber}
This could return a small object with just the ID, which users could then use to retrieve the entity. Or even just redirect to the order.
If you also want a general search resource, then that can also be used here:
GET /api/orders?number={orderNumber}
In my case, i don't want such a resource (yet), and i could be uncomfortable adding what appears to be a general search resource that only supports one field.
So basically, you want to treat all ids and order numbers as unique identifiers for the order records. The thing about unique identifiers is, of course, they have to be unique! But your ids and order numbers are all numeric; do their ranges overlap? If, say, "1234" could be either an id or an order number, then obviously /api/orders/1234 is not going to reference a unique order.
If the ranges are unique, then you just need discriminator logic in the handler code for /api/orders/{id}, that can tell an id from an order number. This could actually work, say if your order numbers have more digits than your ids ever will. But I expect you would have done this already if you could.
If the ranges might overlap, then you must at least force the references to them to have unique ranges. The simplest way would be to add a prefix when referring to an order number, e.g. the prefix "N". So that if the order with id 1234 has order number 367652, it could be retrieved with either of these calls:
/api/orders/1234
/api/orders/N367652
But then, either the database must change to include the "N" prefix in all order numbers (you say this is not possible) or else the handler code would have to strip off the "N" prefix before converting to int. In that case, the "N" prefix should only be used in the API calls - user facing data-entry forms should not expose it! You can't have a "lookup by any identifier" field where users can enter either id or order number (this would have a non-uniqueness problem anyway.) Instead, you must have separate "lookup by id" and "lookup by order number" options. Then, you should be able to have the order number input handler automatically add the "N" prefix before submitting to the API.
Fundamentally, this is a problem with the database design - if this (using values from both fields as "unique identifiers") was a requirement, then the database fields should have been designed with this in mind (i.e. with non-overlapping ranges) - if you can't change the order number format, then the id format should have been different.

What are the actual advantages of the visitor pattern? What are the alternatives?

I read quite a lot about the visitor pattern and its supposed advantages. To me however it seems they are not that much advantages when applied in practice:
"Convenient" and "elegant" seems to mean lots and lots of boilerplate code
Therefore, the code is hard to follow. Also 'accept'/'visit' is not very descriptive
Even uglier boilerplate code if your programming language has no method overloading (i.e. Vala)
You cannot in general add new operations to an existing type hierarchy without modification of all classes, since you need new 'accept'/'visit' methods everywhere as soon as you need an operation with different parameters and/or return value (changes to classes all over the place is one thing this design pattern was supposed to avoid!?)
Adding a new type to the type hierarchy requires changes to all visitors. Also, your visitors cannot simply ignore a type - you need to create an empty visit method (boilerplate again)
It all just seems to be an awful lot of work when all you want to do is actually:
// Pseudocode
int SomeOperation(ISomeAbstractThing obj) {
switch (type of obj) {
case Foo: // do Foo-specific stuff here
case Bar: // do Bar-specific stuff here
case Baz: // do Baz-specific stuff here
default: return 0; // do some sensible default if type unknown or if we don't care
}
}
The only real advantage I see (which btw i haven't seen mentioned anywhere): The visitor pattern is probably the fastest method to implement the above code snippet in terms of cpu time (if you don't have a language with double dispatch or efficient type comparison in the fashion of the pseudocode above).
Questions:
So, what advantages of the visitor pattern have I missed?
What alternative concepts/data structures could be used to make the above fictional code sample run equally fast?
For as far as I have seen so far there are two uses / benefits for the visitor design pattern:
Double dispatch
Separate data structures from the operations on them
Double dispatch
Let's say you have a Vehicle class and a VehicleWasher class. The VehicleWasher has a Wash(Vehicle) method:
VehicleWasher
Wash(Vehicle)
Vehicle
Additionally we also have specific vehicles like a car and in the future we'll also have other specific vehicles. For this we have a Car class but also a specific CarWasher class that has an operation specific to washing cars (pseudo code):
CarWasher : VehicleWasher
Wash(Car)
Car : Vehicle
Then consider the following client code to wash a specific vehicle (notice that x and washer are declared using their base type because the instances might be dynamically created based on user input or external configuration values; in this example they are simply created with a new operator though):
Vehicle x = new Car();
VehicleWasher washer = new CarWasher();
washer.Wash(x);
Many languages use single dispatch to call the appropriate function. Single dispatch means that during runtime only a single value is taken into account when determining which method to call. Therefore only the actual type of washer we'll be considered. The actual type of x isn't taken into account. The last line of code will therefore invoke CarWasher.Wash(Vehicle) and NOT CarWasher.Wash(Car).
If you use a language that does not support multiple dispatch and you do need it (I can honoustly say I have never encountered such a situation though) then you can use the visitor design pattern to enable this. For this two things need to be done. First of all add an Accept method to the Vehicle class (the visitee) that accepts a VehicleWasher as a visitor and then call its operation Wash:
Accept(VehicleWasher washer)
washer.Wash(this);
The second thing is to modify the calling code and replace the washer.Wash(x); line with the following:
x.Accept(washer);
Now for the call to the Accept method the actual type of x is considered (and only that of x since we are assuming to be using a single dispatch language). In the implementation of the Accept method the Wash method is called on the washer object (the visitor). For this the actual type of the washer is considered and this will invoke CarWasher.Wash(Car). By combining two single dispatches a double dispatch is implemented.
Now to eleborate on your remark of the terms like Accept and Visit and Visitor being very unspecific. That is absolutely true. But it is for a reason.
Consider the requirement in this example to implement a new class that is able to repair vehicles: a VehicleRepairer. This class can only be used as a visitor in this example if it would inherit from VehicleWasher and have its repair logic inside a Wash method. But that ofcourse doesn't make any sense and would be confusing. So I totally agree that design patterns tend to have very vague and unspecific naming but it does make them applicable to many situations. The more specific your naming is, the more restrictive it can be.
Your switch statement only considers one type which is actually a manual way of single dispatch. Applying the visitor design pattern in the above way will provide double dispatch.
This way you do not necessarily need additional Visit methods when adding additional types to your hierarchy. Ofcourse it does add some complexity as it makes the code less readable. But ofcourse all patterns come at a price.
Ofcourse this pattern cannot always be used. If you expect lots of complex operations with multiple parameters then this will not be a good option.
An alternative is to use a language that does support multiple dispatch. For instance .NET did not support it until version 4.0 which introduced the dynamic keyword. Then in C# you can do the following:
washer.Wash((dynamic)x);
Because x is then converted to a dynamic type its actual type will be considered for the dispatch and so both x and washer will be used to select the correct method so that CarWasher.Wash(Car) will be called (making the code work correctly and staying intuitive).
Separate data structures and operations
The other benefit and requirement is that it can separate the data structures from the operations. This can be an advantage because it allows new visitors to be added that have there own operations while it also allows data structures to be added that 'inherit' these operations. It can however be only applied if this seperation can be done / makes sense. The classes that perform the operations (the visitors) do not know the structure of the data structures nor do they have to know that which makes code more maintainable and reusable. When applied for this reason the visitors have operations for the different elements in the data structures.
Say you have different data structures and they all consist of elements of class Item. The structures can be lists, stacks, trees, queues etc.
You can then implement visitors that in this case will have the following method:
Visit(Item)
The data structures need to accept visitors and then call the Visit method for each Item.
This way you can implement all kinds of visitors and you can still add new data structures as long as they consist of elements of type Item.
For more specific data structures with additional elements (e.g. a Node) you might consider a specific visitor (NodeVisitor) that inherits from your conventional Visitor and have your new data structures accept that visitor (Accept(NodeVisitor)). The new visitors can be used for the new data structures but also for the old data structures due to inheritence and so you do not need to modify your existing 'interface' (the super class in this case).
In my personal opinion, the visitor pattern is only useful if the interface you want implemented is rather static and doesn't change a lot, while you want to give anyone a chance to implement their own functionality.
Note that you can avoid changing everything every time you add a new method by creating a new interface instead of modifying the old one - then you just have to have some logic handling the case when the visitor doesn't implement all the interfaces.
Basically, the benefit is that it allows you to choose the correct method to call at runtime, rather than at compile time - and the available methods are actually extensible.
For more info, have a look at this article - http://rgomes-info.blogspot.co.uk/2013/01/a-better-implementation-of-visitor.html
By experience, I would say that "Adding a new type to the type hierarchy requires changes to all visitors" is an advantage. Because it definitely forces you to consider the new type added in ALL places where you did some type-specific stuff. It prevents you from forgetting one....
This is an old question but i would like to answer.
The visitor pattern is useful mostly when you have a composite pattern in place in which you build a tree of objects and such tree arrangement is unpredictable.
Type checking may be one thing that a visitor can do, but say you want to build an expression based on a tree that can vary its form according to a user input or something like that, a visitor would be an effective way for you to validate the tree, or build a complex object according to the items found on the tree.
The visitor may also carry an object that does something on each node it may find on that tree. this visitor may be a composite itself chaining lots of operations on each node, or it can carry a mediator object to mediate operations or dispatch events on each node.
You imagination is the limit of all this. you can filter a collection, build an abstract syntax tree out of an complete tree, parse a string, validate a collection of things, etc.

OOP Design Question - Linking to precursors

I'm writing a program to do a search and export the output.
I have three primary objects:
Request
SearchResults
ExportOutput
Each of these objects links to its precursor.
Ie: ExportOutput -> SearchResults -> Request
Is this ok? Should they somehow be more loosely coupled?
Clarification:
Processes later on do use properties and methods on the precursor objects.
Ie:
SendEmail(output.SearchResults.Request.UserEmail, BODY, SUBJECT);
This has a smell even to me. The only way I can think to fix it is have hiding properties in each one, that way I'm only accessing one level
MailAddress UserEmail
{
get { return SearchResults.UserEmail; }
}
which would yeild
SendEmail(output.UserEmail, BODY, SUBJECT);
But again, that's just hiding the problem.
I could copy everything out of the precursor objects into their successors, but that would make ExportOutput really ugly. Is their a better way to factor these objects.
Note: SearchResults implements IDisposable because it links to unmanaged resources (temp files), so I really don't want to just duplicate that in ExportOutput.
If A uses B directly, you cannot:
Reuse A without also reusing B
Test A in isolation from B
Change B without risking breaking A
If instead you designed/programmed to interfaces, you could:
Reuse A without also reusing B - you just need to provide something that implements the same interface as B
Test A in isolation from B - you just need to substitute a Mock Object.
Change B without risking breaking A - because A depends on an interface - not on B
So, at a minimum, I recommend extracting interfaces. Also, this might be a good read for you: the Dependency Inversion Principle (PDF file).
Without knowing your specifics, I would think that results in whatever form would simply be returned from a Request's method (might be more than one such method from a configured Request, like find_first_instance vs. find_all_instances). Then, an Exporter's output method(s) would take results as input. So, I am not envisioning the need to link the objects at all.