Fiware Orion batch duplicate - fiware

I'm using APPEND_STRICT, but having trouble understanding a certain concept.
For example, I have a single entity in Fiware Orion (already created) and want to create, let's say, 1000 entities in a batch using APPEND_STRICT (v2/op/update).
Among those 1000 entities there is 1 duplicate (an entity that, as mentioned, already exists in Orion).
Is it correct that Orion will return a 422 error without saying which entity ID already exists? The error only talks about the attributes of the entity. I understand why, given the concept of APPEND_STRICT, but showing the ID would really help.
The other part: if the duplicate entity is at position 400, Orion sends the error but continues writing the remaining entities. This is really hard to manage, because I cannot know when the whole write is done, and I have to show some response while Orion is still working on them in the background.
Are my assumptions correct, and can something be done to avoid this? Is there something I failed to notice?
Thanks.
Edit
Error message:
{ error: 'Unprocessable',
description: 'one or more of the attributes in the request already exist:
[ family, serialNumber, refSortingType, description, refType, storedWasteOrigin, location, address, fillingLevel, cargoWeight, temperature, methaneConcentration, regulation, responsible, owner, dateServiceStarted, dateLastEmptying, nextActuationDeadline, actuationHours, openingHours, dateLastCleaning, nextCleaningDeadline, refDepositPointIsle, status, color, image, annotations, areaServed, dateModified, refDevice ]' } } }
Example request:
{ method: 'POST',
headers:
{ 'Content-Type': 'application/json',
'Fiware-Service': 'waste4think',
'Fiware-ServicePath': '/d',
'X-Auth-Token': 'DssfKZe82e1dyJof416EmrQPdFQ3QK1' },
uri: 'http://localhost:1026/v2/op/update',
body: { actionType: 'APPEND_STRICT', entities: [Array] }
{"actionType":"APPEND_STRICT","entities":[{"id":"xxx","type":"xxx","family":{"value":"Agent","type":"String","metadata":{}},"serialNumber":{"value":"","type":"String","metadata":{}},"refSortingType":{"value":"SortingType:2","type":"String","metadata":{}},"description":{"value":"","type":"String","metadata":{}},"refType":{"value":"DepositPointType:0","type":"String","metadata":{}},"storedWasteOrigin":{"value":"","type":"String","metadata":{}},"location":{"value":{"type":"Point","coordinates":[xxx]},"type":"geo:json"},"address":{"value":"xxxxx.","type":"String","metadata":{}},"fillingLevel":{"value":0,"type":"Float","metadata":{"unit":{"value":"C62","type":"String"}}},"cargoWeight":{"value":0,"type":"Float","metadata":{"unit":{"value":"KGM","type":"String"}}},"temperature":{"value":0,"type":"Float","metadata":{"unit":{"value":"CEL","type":"String"}}},"methaneConcentration":{"value":0,"type":"Float","metadata":{"unit":{"value":"59","type":"String"}}},"regulation":{"value":"Municipal association","type":"String","metadata":{}},"responsible":{"value":"","type":"String","metadata":{}},"owner":{"value":"xxx","type":"String","metadata":{}},"dateServiceStarted":{"value":"","type":"String","metadata":{}},"dateLastEmptying":{"value":"","type":"String","metadata":{}},"nextActuationDeadline":{"value":"","type":"String","metadata":{}},"actuationHours":{"value":[],"type":"List","metadata":{}},"openingHours":{"value":[],"type":"List","metadata":{}},"dateLastCleaning":{"value":"","type":"String","metadata":{}},"nextCleaningDeadline":{"value":"","type":"String","metadata":{}},"refDepositPointIsle":{"value":"","type":"String","metadata":{}},"status":{"value":"ok","type":"String","metadata":{}},"color":{"value":"","type":"String","metadata":{}},"image":{"value":"","type":"String","metadata":{}},"annotations":{"value":"","type":"String","metadata":{}},"areaServed":{"value":"","type":"String","metadata":{}},"dateModified":{"value":"","type":"String","metadata":{}},"refDevice":{"value":"","type":"String","metadata":{}}}]}
As for the request, I split it into the POST part and the body part. As you can see from the error message, it is not possible to know which entity caused this.

I think the functionality is as you describe. Orion responds with a list of the attributes that already exist, but not with the entities they belong to. A response like this would probably be more useful:
'one or more of the attributes in the request already exist:
entity23: [ family, serialNumber], entity42: [refSortingType, description]'
with some capping (e.g. at most 20 entities) to prevent overly large responses.
If you think implementing something like that would be interesting, please create a new issue about it in the Orion repository.
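Until Orion reports which entity each conflicting attribute belongs to, a client can at least narrow things down before sending. The sketch below is a minimal, hedged example: it assumes you already have a set of (id, type) pairs known to exist in Orion (how you fetch that set is left out), and it also flags IDs repeated inside the batch itself, on the assumption that a second occurrence could collide in the same way. The function name, the entity IDs and the DepositPoint type are hypothetical.

```python
from collections import Counter


def find_conflicting_ids(entities, existing_ids):
    """Return (already_in_orion, repeated_in_batch) for an appendStrict batch.

    `entities` is the list you would send in the "entities" field of
    POST /v2/op/update; `existing_ids` is a set of (id, type) pairs that is
    assumed to have been fetched from Orion beforehand by some other means.
    """
    keys = [(e["id"], e.get("type")) for e in entities]
    # Entities that appendStrict would reject because they already exist.
    already_in_orion = [k for k in keys if k in existing_ids]
    # Entities that appear more than once within this same batch.
    counts = Counter(keys)
    repeated_in_batch = [k for k, n in counts.items() if n > 1]
    return already_in_orion, repeated_in_batch


batch = [
    {"id": "DepositPoint:1", "type": "DepositPoint"},
    {"id": "DepositPoint:2", "type": "DepositPoint"},
    {"id": "DepositPoint:2", "type": "DepositPoint"},
]
existing = {("DepositPoint:1", "DepositPoint")}
print(find_conflicting_ids(batch, existing))
```

Entities flagged this way could be removed from the batch, or sent separately, so that a 422 can be tied to a known ID.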
Some additional comments:
APPEND_STRICT is deprecated. The right keyword is appendStrict.
Regarding "Orion send error but continue to write remaining entities, this is really hard to manage because I cannot know when a total write is done and have to show some response while Orion still works on them in the background": Orion doesn't respond until it has finished processing the whole batch in the POST /v2/op/update request. So your REST client can be sure that, when the response is received, everything has been processed (although that processing could involve errors due to duplicated attributes in some entities, as we have been discussing). Have you experienced a different behaviour? In that case, how did you get it? (If Orion is behaving that way, it could be a bug, and I'd like to know about it in order to debug it.)
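To make the synchronous behaviour concrete, here is a small Python sketch that only builds the batch request, using the appendStrict keyword and the headers from the question. The function name is hypothetical and the token is optional. Actually sending it with urllib.request.urlopen is shown only as a comment, because that call blocks until Orion answers and raises urllib.error.HTTPError for an error status such as 422.

```python
import json
import urllib.request


def build_append_strict_request(base_url, entities, service, service_path, token=None):
    """Build (but do not send) a POST /v2/op/update request for Orion."""
    body = {"actionType": "appendStrict", "entities": entities}
    headers = {
        "Content-Type": "application/json",
        "Fiware-Service": service,
        "Fiware-ServicePath": service_path,
    }
    if token is not None:
        headers["X-Auth-Token"] = token
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v2/op/update",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_append_strict_request(
    "http://localhost:1026", [{"id": "xxx", "type": "xxx"}], "waste4think", "/d"
)
# urllib.request.urlopen(req) would block until Orion has processed the whole
# batch, or raise urllib.error.HTTPError (e.g. 422 Unprocessable).
```

Note that urllib normalizes header names (e.g. to "Fiware-service"); HTTP header names are case-insensitive, so this does not change what Orion receives.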

Related

REST API Design query

I'm having trouble deciding what to do in this case in my REST API design.
Here is my problem:
I have a resource domain model which has a nested object that is also a domain model.
You can imagine something like this:
{
"name":"abc",
"type":{
"name":"typeName",
"description":"description"
}
}
Now, I want to be able to fetch the outer model resources based on the inner model and a few more parameters.
For example, I want to fetch all outer model resources which have a given type, with parameters like page number, size, etc.
So my questions are:
1. The API should accept the inner model in a POST and return the outer model. Is this good REST design?
2. How do I pass the extra parameters? It's a POST, so I can't put them in the URL, and I can't put them in the inner model.
3. Should I create a new model which contains these extra parameters and the type resource as well?
Like:
{
"page":"10",
"type":{
"name":"typeName",
"description":"description"
}
}
If you are making a generic HTTP service, it's acceptable to use POST to send a complex query, and to get a response.
If you are trying to be RESTful, then this is a bad practice. You have two options. Option 1 is to find a way to encode your query in the URL, so you can use a GET request.
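For option 1, a minimal sketch of encoding the type filter and paging parameters as a query string, so that a plain GET can carry them. The /resources path, the function name and the parameter names are assumptions for illustration; only the type's name is sent, on the assumption that it identifies the type.

```python
from urllib.parse import urlencode


def build_search_url(base, type_name, page, size):
    """Encode the type filter and paging parameters in a GET query string."""
    params = {"type": type_name, "page": page, "size": size}
    # urlencode percent-encodes values (spaces become '+').
    return f"{base}?{urlencode(params)}"


print(build_search_url("/resources", "typeName", 10, 20))
# /resources?type=typeName&page=10&size=20
```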
Option 2 is more involved. I wouldn't necessarily say that I would suggest this, but it's a method to get around the constraints of REST while having complex queries.
The idea is that you use POST to create a 'query' resource, almost as if you were doing a server-side prepared statement, and then later use GET to get the result of the query.
Example of the client->server conversation:
POST /queries
Content-Type: application/json
...
A response to this might be:
HTTP/1.1 201 Created
Location: /queries/1234
Link: </queryresults/1234>; rel="some-relationship-identifier"
Then after that you could do a GET on /queries/1234 to see the query you 'prepared' and a GET on /queryresults/1234 to see the actual report.
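A toy, in-memory model of option 2 (with no HTTP layer) may help show the moving parts. The class name, the 1-based paging and matching on the type's name are all assumptions; the IDs start at 1234 only to mirror the example response above.

```python
import itertools


class QueryStore:
    """In-memory sketch of the 'query resource' pattern (no HTTP layer)."""

    def __init__(self, items):
        self._items = items            # outer-model resources
        self._queries = {}             # query id -> stored query document
        self._ids = itertools.count(1234)

    def create_query(self, query):
        """POST /queries: store the query, return (Location, results link)."""
        qid = next(self._ids)
        self._queries[qid] = query
        return f"/queries/{qid}", f"/queryresults/{qid}"

    def get_query(self, qid):
        """GET /queries/{qid}: the query as it was 'prepared'."""
        return self._queries[qid]

    def get_results(self, qid):
        """GET /queryresults/{qid}: run the stored query, then page the matches."""
        query = self._queries[qid]
        wanted = query["type"]["name"]
        matches = [item for item in self._items if item["type"]["name"] == wanted]
        start = (query["page"] - 1) * query["size"]  # pages are 1-based here
        return matches[start:start + query["size"]]


store = QueryStore([
    {"name": "abc", "type": {"name": "typeName", "description": "description"}},
    {"name": "xyz", "type": {"name": "otherType", "description": ""}},
])
location, results_link = store.create_query(
    {"page": 1, "size": 10, "type": {"name": "typeName"}}
)
print(location, results_link)   # /queries/1234 /queryresults/1234
print(store.get_results(1234))  # only the "abc" resource
```

Because results are computed on demand from the stored query, a server built this way could also cache them or compute them in the background.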
The benefits are that it stays within the constraints of REST, that you could potentially reuse queries, and that the server can take longer to generate the results.
The obvious drawback is that it's harder to explain this to a consumer of your API, as not everyone might be familiar with this pattern and it's an extra HTTP request.
So you have to decide:
Is it worth doing this?
Can you encode the query in the URI instead, to avoid this altogether?
Maybe you don't care enough about being RESTful and you might just want to break the rules and use POST for some complex queries.

kafka-python 1.3.3: KafkaProducer.send with explicit key fails to send message to broker

(Possibly a duplicate of Can't send a keyedMessage to brokers with partitioner.class=kafka.producer.DefaultPartitioner, although the OP of that question didn't mention kafka-python. And anyway, it never got an answer.)
I have a Python program that has been successfully (for many months) sending messages to the Kafka broker, using essentially the following logic:
import json
import kafka

producer = kafka.KafkaProducer(bootstrap_servers=[some_addr],
                               retries=3)
...
msg = json.dumps(some_message)
res = producer.send(some_topic, value=msg)
Recently, I tried to upgrade it to send messages to different partitions based on a definite key value extracted from the message:
producer = kafka.KafkaProducer(bootstrap_servers=[some_addr],
key_serializer=str.encode,
retries=3)
...
try:
key = some_message[0]
except:
key = None
msg = json.dumps(some_message)
res = producer.send(some_topic, value=msg, key=key)
However, with this code, no messages ever make it out of the program to the broker. I've verified that the key value extracted from some_message is always a valid string. Presumably I don't need to define my own partitioner, since, according to the documentation:
The default partitioner implementation hashes each non-None key using the same murmur2 algorithm as the java client so that messages with the same key are assigned to the same partition.
Furthermore, with the new code, when I try to determine what happened to my send by calling res.get (to obtain a kafka.FutureRecordMetadata), that call throws a TypeError exception with the message descriptor 'encode' requires a 'str' object but received a 'unicode'.
(As a side question, I'm not exactly sure what I'd do with the FutureRecordMetadata if I were actually able to get it. Based on the kafka-python source code, I assume I'd want to call either its succeeded or its failed method, but the documentation is silent on the point. The documentation does say that the return value of send "resolves to" RecordMetadata, but I haven't been able to figure out, from either the documentation or the code, what "resolves to" means in this context.)
Anyway: I can't be the only person using kafka-python 1.3.3 who's ever tried to send messages with a partitioning key, and I have not seen anything on teh Intertubes describing a similar problem (except for the SO question I referenced at the top of this post).
I'm certainly willing to believe that I'm doing something wrong, but I have no idea what that might be. Is there some additional parameter I need to supply to the KafkaProducer constructor?
The fundamental problem turned out to be that my key value was a unicode, even though I was quite convinced that it was a str. Hence the selection of str.encode for my key_serializer was inappropriate, and was what led to the exception from res.get. Omitting the key_serializer and calling key.encode('utf-8') was enough to get my messages published, and partitioned as expected.
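A key serializer that tolerates both text and bytes avoids depending on whether the key happens to be a str or a unicode. This is a hedged sketch, not part of kafka-python, and the function name is hypothetical. It returns None unchanged so that the key = None fallback in the question does not turn into a serialization error.

```python
def serialize_key(key):
    """Turn a partition key into bytes for KafkaProducer(key_serializer=...).

    Accepts text (Python 3 str, Python 2 unicode) or bytes. None is returned
    unchanged so an unkeyed message stays unkeyed.
    """
    if key is None:
        return None
    if isinstance(key, bytes):   # already encoded (a Python 2 str is bytes)
        return key
    return key.encode("utf-8")   # text becomes UTF-8 bytes


print(serialize_key(u"Resource A"))
```

It would be passed as key_serializer=serialize_key when constructing kafka.KafkaProducer.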
A large contributor to the obscurity of this problem (for me) was that the kafka-python 1.3.3 documentation does not go into any detail on what a FutureRecordMetadata really is, nor what one should expect in the way of exceptions its get method can raise. The sole usage example in the documentation:
# Asynchronous by default
future = producer.send('my-topic', b'raw_bytes')
# Block for 'synchronous' sends
try:
record_metadata = future.get(timeout=10)
except KafkaError:
# Decide what to do if produce request failed...
log.exception()
pass
suggests that the only kind of exception it will raise is KafkaError, which is not true. In fact, get can and will (re-)raise any exception that the asynchronous publishing mechanism encountered in trying to get the message out the door.
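To see why a blocking get can re-raise anything, it helps to think of the return value of send as a future that "resolves to" either a result or an exception. The sketch below uses the standard library's concurrent.futures.Future as a stand-in; it is an analogy under that assumption, not kafka-python's FutureRecordMetadata, and fake_async_send and wait_for are hypothetical names. (The standard library documents creating a Future directly as something meant for testing, which is all this is.)

```python
from concurrent.futures import Future


def fake_async_send(key_serializer, key):
    """Toy stand-in for an asynchronous send (NOT kafka-python's code).

    The returned Future 'resolves to' a metadata-like dict on success, or
    holds whatever exception the background step raised.
    """
    future = Future()
    try:
        key_bytes = key_serializer(key)
        future.set_result({"key": key_bytes})  # hypothetical metadata
    except Exception as exc:  # any exception type, not only a Kafka error
        future.set_exception(exc)
    return future


def wait_for(future, timeout=10):
    """Block like future.get(timeout=...) and report the outcome."""
    try:
        return "ok", future.result(timeout=timeout)
    except Exception as exc:  # result() re-raises the stored exception
        return "error", type(exc).__name__


print(wait_for(fake_async_send(lambda k: k.encode("utf-8"), "abc")))
print(wait_for(fake_async_send(lambda k: k.encode("utf-8"), 42)))
```

The second call fails inside the "background" step with an AttributeError, and that exact exception surfaces from the blocking wait, which mirrors the TypeError the question saw coming out of res.get.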
I also faced the same error. Once I added json.dumps while sending the key, it worked.
producer.send(
    topic="first_topic",
    key=json.dumps(key).encode('utf-8'),
    value=json.dumps(msg).encode('utf-8'),
).add_callback(on_send_success).add_errback(on_send_error)

How to do a RESTful GET on an indefinite number of parameters?

I have a collection of IDs of RESTful resources (all the same type of resource), the number of which can be indefinitely large. I want to make a REST call to get the names of these resources. Something like this:
Send:
['005fc983-fe41-43b5-8555-d9a2310719cd', '4c6e6898-e519-4bac-b03e-e8873d3fa3f0',...]
Receive:
['Resource A', 'Resource B',...]
What is the best way to retrieve the names of these resources RESTfully?
Here are the ideas I have had and the problems I see with each approach:
The naive approach is to iterate through all IDs in my collection and do a 'GET /resource/:id' for each ID. This would be prohibitively slow and resource intensive because of the large number of HTTP calls I would have to make.
The next approach I thought of is to pass the IDs as parameters to a single GET call. The problem here is that most servers have a limit on the URL length, which would be quickly exceeded.
Next, I thought that putting the IDs in the body of a GET would work, but according to Roy Fielding, data in the GET body should not affect the results of a REST call (see the question "HTTP GET with request body").
I could use a POST request and put the data on the POST body, but POST is intended for creating and modifying resources, which is not what I'm doing. Maybe I should ignore the intent of the verb and use it anyway?
I could split the request into multiple GET requests to avoid exceeding the max URL length. The problem here is that I have to combine the results after all calls have returned, which is potentially slow.
I could create a collection resource within my main resource by posting my list of IDs to 'POST /resource/collection', then use a 'GET /resource/collection/:id' call to retrieve the results. This actually works, but then I have to do a 'DELETE /resource/collection/:id' to clean up. It takes multiple calls, requires cleanup, and seems a bit clunky overall, so it's okay, but not ideal.
Is there a better way to do this?
Your last approach is RESTful and the one I recommend. I'd do this:
Step 1:
Request:
POST /resource/collection
Content-Type: application/json
{
"ids": [
"005fc983-fe41-43b5-8555-d9a2310719cd",
"4c6e6898-e519-4bac-b03e-e8873d3fa3f0"
]
}
Response:
201 Created
Location: /resource/collection/89AB8902-FDF1-11E4-ADDF-CD4FB664A5DC
Step 2:
Request:
GET /resource/collection/89AB8902-FDF1-11E4-ADDF-CD4FB664A5DC
Response:
200 OK
Content-Type: application/json
{
"resources": [ ... ]
}
but then I have to do a 'DELETE /resource/collection/:id' to clean up.
No, that is not necessary. The server could run a job that removes all collections older than a specific age. It is not the client that has to do this.
If a client later accesses the collection again, the server would respond with
410 Gone
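A minimal in-memory sketch of this design, assuming a fixed time-to-live and a periodic cleanup job. The class and method names, the TTL value, and the choice to remember purged IDs (so that the server can answer 410 rather than 404) are assumptions; status codes are returned as plain integers instead of going through a web framework.

```python
import time
import uuid


class IdCollectionStore:
    """Sketch of POST /resource/collection + GET /resource/collection/{cid}
    with server-side expiry instead of a client-side DELETE."""

    def __init__(self, names_by_id, ttl_seconds=3600, clock=time.time):
        self._names = names_by_id      # resource id -> resource name
        self._ttl = ttl_seconds
        self._clock = clock            # injectable for testing
        self._collections = {}         # cid -> (created_at, ids)
        self._expired = set()          # cids removed by the cleanup job

    def create(self, ids):
        """POST: store the IDs and return 201 with a Location header."""
        cid = str(uuid.uuid4()).upper()
        self._collections[cid] = (self._clock(), list(ids))
        return 201, {"Location": f"/resource/collection/{cid}"}

    def get(self, cid):
        """GET: 200 with names, 410 if purged, 404 if never created."""
        if cid in self._expired:
            return 410, None
        if cid not in self._collections:
            return 404, None
        _, ids = self._collections[cid]
        return 200, {"resources": [self._names[i] for i in ids]}

    def purge_expired(self):
        """Periodic server-side job: drop collections older than the TTL."""
        now = self._clock()
        for cid, (created, _) in list(self._collections.items()):
            if now - created > self._ttl:
                del self._collections[cid]
                self._expired.add(cid)
```

Remembering every purged ID grows without bound; a real server might keep them only for a limited period and fall back to 404 afterwards.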

Use of PUT vs PATCH methods in REST API real life scenarios

First of all, some definitions:
PUT is defined in Section 9.6 of RFC 2616:
The PUT method requests that the enclosed entity be stored under the supplied Request-URI. If the Request-URI refers to an already existing resource, the enclosed entity SHOULD be considered as a modified version of the one residing on the origin server. If the Request-URI does not point to an existing resource, and that URI is capable of being defined as a new resource by the requesting user agent, the origin server can create the resource with that URI.
PATCH is defined in RFC 5789:
The PATCH method requests that a set of changes described in the request entity be applied to the resource identified by the Request-URI.
Also, according to RFC 2616 Section 9.1.2, PUT is idempotent while PATCH is not.
Now let us take a look at a real example. When I do POST to /users with the data {username: 'skwee357', email: 'skwee357#domain.example'} and the server is capable of creating a resource, it will respond with 201 and the resource location (let's assume /users/1), and any subsequent call to GET /users/1 will return {id: 1, username: 'skwee357', email: 'skwee357#domain.example'}.
Now let us say I want to modify my email. Email modification is considered "a set of changes", and therefore I should PATCH /users/1 with a "patch document". In my case it would be the JSON document: {email: 'skwee357#newdomain.example'}. The server then returns 200 (assuming permissions are OK). This brings me to my first question:
PATCH is NOT idempotent. It said so in RFC 2616 and RFC 5789. However if I issue the same PATCH request (with my new email), I will get the same resource state (with my email being modified to the requested value). Why is PATCH not then idempotent?
PATCH is a relatively new verb (RFC introduced in March 2010), and it was introduced to solve the problem of "patching", or modifying a set of fields. Before PATCH was introduced, everybody used PUT to update resources. But now that PATCH exists, I am confused about what PUT is used for. And this brings me to my second (and main) question:
What is the real difference between PUT and PATCH? I have read somewhere that PUT might be used to replace the entire entity under a specific resource, so one should send the full entity (instead of a set of attributes, as with PATCH). What is the real practical use for such a case? When would you want to replace / overwrite an entity at a specific resource URI, and why is such an operation not considered updating / patching the entity? The only practical use case I see for PUT is issuing a PUT on a collection, i.e. /users, to replace the entire collection. Issuing PUT on a specific entity makes no sense now that PATCH exists. Am I wrong?
NOTE: When I first spent time reading about REST, idempotence was a confusing concept to try to get right. I still didn't get it quite right in my original answer, as further comments (and Jason Hoetger's answer) have shown. For a while, I have resisted updating this answer extensively, to avoid effectively plagiarizing Jason, but I'm editing it now because, well, I was asked to (in the comments).
After reading my answer, I suggest you also read Jason Hoetger's excellent answer to this question, and I will try to make my answer better without simply stealing from Jason.
Why is PUT idempotent?
As you noted in your RFC 2616 citation, PUT is considered idempotent. When you PUT a resource, these two assumptions are in play:
You are referring to an entity, not to a collection.
The entity you are supplying is complete (the entire entity).
Let's look at one of your examples.
{ "username": "skwee357", "email": "skwee357#domain.example" }
If you POST this document to /users, as you suggest, then you might get back an entity such as
## /users/1
{
"username": "skwee357",
"email": "skwee357#domain.example"
}
If you want to modify this entity later, you choose between PUT and PATCH. A PUT might look like this:
PUT /users/1
{
"username": "skwee357",
"email": "skwee357#gmail.com" // new email address
}
You can accomplish the same using PATCH. That might look like this:
PATCH /users/1
{
"email": "skwee357#gmail.com" // new email address
}
You'll notice a difference right away between these two. The PUT included all of the parameters on this user, but PATCH only included the one that was being modified (email).
When using PUT, it is assumed that you are sending the complete entity, and that complete entity replaces any existing entity at that URI. In the above example, the PUT and PATCH accomplish the same goal: they both change this user's email address. But PUT handles it by replacing the entire entity, while PATCH only updates the fields that were supplied, leaving the others alone.
Since PUT requests include the entire entity, if you issue the same request repeatedly, it should always have the same outcome (the data you sent is now the entire data of the entity). Therefore PUT is idempotent.
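The two update styles can be modeled as plain functions over an in-memory store. This is a simplified sketch under the assumption that "PATCH" means a shallow field merge, which is only one possible patch format; the function names are hypothetical.

```python
import copy


def put(store, uri, body):
    """PUT: the request body becomes the complete new state stored at `uri`."""
    store[uri] = copy.deepcopy(body)
    return store[uri]


def patch_fields(store, uri, changes):
    """Field-merge PATCH: only the supplied fields change; others are kept."""
    store[uri] = {**store[uri], **copy.deepcopy(changes)}
    return store[uri]


users = {"/users/1": {"username": "skwee357", "email": "skwee357#domain.example"}}
full = {"username": "skwee357", "email": "skwee357#gmail.com"}

first = copy.deepcopy(put(users, "/users/1", full))
second = copy.deepcopy(put(users, "/users/1", full))
print(first == second)  # True: repeating the same PUT gives the same state
```

The same put function, given only {"email": ...}, stores just that field and drops the username, which is the data-loss case described next.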
Using PUT wrong
What happens if you use the above PATCH data in a PUT request?
GET /users/1
{
"username": "skwee357",
"email": "skwee357#domain.example"
}
PUT /users/1
{
"email": "skwee357#gmail.com" // new email address
}
GET /users/1
{
"email": "skwee357#gmail.com" // new email address... and nothing else!
}
(I'm assuming for the purposes of this question that the server doesn't have any specific required fields, and would allow this to happen... that may not be the case in reality.)
Since we used PUT, but only supplied email, now that's the only thing in this entity. This has resulted in data loss.
This example is here for illustrative purposes -- don't ever actually do this (unless your intent is to drop the omitted fields, of course... then you are using PUT as it should be used). This PUT request is technically idempotent, but that doesn't mean it isn't a terrible, broken idea.
How can PATCH be idempotent?
In the above example, PATCH was idempotent. You made a change, but if you made the same change again and again, it would always give back the same result: you changed the email address to the new value.
GET /users/1
{
"username": "skwee357",
"email": "skwee357#domain.example"
}
PATCH /users/1
{
"email": "skwee357#gmail.com" // new email address
}
GET /users/1
{
"username": "skwee357",
"email": "skwee357#gmail.com" // email address was changed
}
PATCH /users/1
{
"email": "skwee357#gmail.com" // new email address... again
}
GET /users/1
{
"username": "skwee357",
"email": "skwee357#gmail.com" // nothing changed since last GET
}
My original example, fixed for accuracy
I originally had examples that I thought were showing non-idempotency, but they were misleading / incorrect. I am going to keep the examples, but use them to illustrate a different thing: that multiple PATCH documents against the same entity, modifying different attributes, do not make the PATCHes non-idempotent.
Let's say that at some past time, a user was added. This is the state that you are starting from.
{
"id": 1,
"name": "Sam Kwee",
"email": "skwee357#olddomain.example",
"address": "123 Mockingbird Lane",
"city": "New York",
"state": "NY",
"zip": "10001"
}
After a PATCH, you have a modified entity:
PATCH /users/1
{"email": "skwee357#newdomain.example"}
{
"id": 1,
"name": "Sam Kwee",
"email": "skwee357#newdomain.example", // the email changed, yay!
"address": "123 Mockingbird Lane",
"city": "New York",
"state": "NY",
"zip": "10001"
}
If you then repeatedly apply your PATCH, you will continue to get the same result: the email was changed to the new value. A goes in, A comes out, therefore this is idempotent.
An hour later, after you have gone to make some coffee and take a break, someone else comes along with their own PATCH. It seems the Post Office has been making some changes.
PATCH /users/1
{"zip": "12345"}
{
"id": 1,
"name": "Sam Kwee",
"email": "skwee357#newdomain.example", // still the new email you set
"address": "123 Mockingbird Lane",
"city": "New York",
"state": "NY",
"zip": "12345" // and this change as well
}
Since this PATCH from the post office doesn't concern itself with email, only zip code, if it is repeatedly applied, it will also get the same result: the zip code is set to the new value. A goes in, A comes out, therefore this is also idempotent.
The next day, you decide to send your PATCH again.
PATCH /users/1
{"email": "skwee357#newdomain.example"}
{
"id": 1,
"name": "Sam Kwee",
"email": "skwee357#newdomain.example",
"address": "123 Mockingbird Lane",
"city": "New York",
"state": "NY",
"zip": "12345"
}
Your patch has the same effect it had yesterday: it set the email address. A went in, A came out, therefore this is idempotent as well.
What I got wrong in my original answer
I want to draw an important distinction (something I got wrong in my original answer). Many servers will respond to your REST requests by sending back the new entity state, with your modifications (if any). So, when you get this response back, it is different from the one you got back yesterday, because the zip code is not the one you received last time. However, your request was not concerned with the zip code, only with the email. So your PATCH document is still idempotent - the email you sent in PATCH is now the email address on the entity.
So when is PATCH not idempotent, then?
For a full treatment of this question, I again refer you to Jason Hoetger's answer which already fully answers that.
Though Dan Lowe's excellent answer very thoroughly answered the OP's question about the difference between PUT and PATCH, its answer to the question of why PATCH is not idempotent is not quite correct.
To show why PATCH isn't idempotent, it helps to start with the definition of idempotence (from Wikipedia):
The term idempotent is used more comprehensively to describe an operation that will produce the same results if executed once or multiple times [...] An idempotent function is one that has the property f(f(x)) = f(x) for any value x.
In more accessible language, an idempotent PATCH could be defined as: After PATCHing a resource with a patch document, all subsequent PATCH calls to the same resource with the same patch document will not change the resource.
Conversely, a non-idempotent operation is one where f(f(x)) != f(x), which for PATCH could be stated as: After PATCHing a resource with a patch document, subsequent PATCH calls to the same resource with the same patch document do change the resource.
To illustrate a non-idempotent PATCH, suppose there is a /users resource, and suppose that calling GET /users returns a list of users, currently:
[{ "id": 1, "username": "firstuser", "email": "firstuser#example.org" }]
Rather than PATCHing /users/{id}, as in the OP's example, suppose the server allows PATCHing /users. Let's issue this PATCH request:
PATCH /users
[{ "op": "add", "username": "newuser", "email": "newuser#example.org" }]
Our patch document instructs the server to add a new user called newuser to the list of users. After calling this the first time, GET /users would return:
[{ "id": 1, "username": "firstuser", "email": "firstuser#example.org" },
{ "id": 2, "username": "newuser", "email": "newuser#example.org" }]
Now, if we issue the exact same PATCH request as above, what happens? (For the sake of this example, let's assume that the /users resource allows duplicate usernames.) The "op" is "add", so a new user is added to the list, and a subsequent GET /users returns:
[{ "id": 1, "username": "firstuser", "email": "firstuser#example.org" },
{ "id": 2, "username": "newuser", "email": "newuser#example.org" },
{ "id": 3, "username": "newuser", "email": "newuser#example.org" }]
The /users resource has changed again, even though we issued the exact same PATCH against the exact same endpoint. If our PATCH is f(x), f(f(x)) is not the same as f(x), and therefore, this particular PATCH is not idempotent.
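The same experiment can be replayed in a few lines of Python. The sketch assumes the server assigns the next id as the current maximum plus one and supports only the "add" operation from the example; both are assumptions made for illustration, and the function name is hypothetical.

```python
def apply_collection_patch(users, patch_doc):
    """Apply a list of {"op": "add", ...} operations to a /users list.

    Returns a new list; the input list is not modified.
    """
    result = [dict(u) for u in users]
    for op in patch_doc:
        if op["op"] != "add":
            raise ValueError(f"unsupported op: {op['op']}")
        next_id = max((u["id"] for u in result), default=0) + 1
        new_user = {k: v for k, v in op.items() if k != "op"}
        result.append({"id": next_id, **new_user})
    return result


users = [{"id": 1, "username": "firstuser", "email": "firstuser#example.org"}]
patch_doc = [{"op": "add", "username": "newuser", "email": "newuser#example.org"}]
once = apply_collection_patch(users, patch_doc)
twice = apply_collection_patch(once, patch_doc)
print(len(once), len(twice))  # 2 3, so f(f(x)) != f(x)
```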
Although PATCH isn't guaranteed to be idempotent, there's nothing in the PATCH specification to prevent you from making all PATCH operations on your particular server idempotent. RFC 5789 even anticipates advantages from idempotent PATCH requests:
A PATCH request can be issued in such a way as to be idempotent, which also helps prevent bad outcomes from collisions between two PATCH requests on the same resource in a similar time frame.
In Dan's example, his PATCH operation is, in fact, idempotent. In that example, the /users/1 entity changed between our PATCH requests, but not because of our PATCH requests; it was actually the Post Office's different patch document that caused the zip code to change. The Post Office's different PATCH is a different operation; if our PATCH is f(x), the Post Office's PATCH is g(x). Idempotence states that f(f(f(x))) = f(x), but makes no guarantees about f(g(f(x))).
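Treating each patch document as a function makes the distinction easy to check. In this sketch f is your email PATCH and g is the Post Office's zip PATCH, both modeled as shallow field merges (an assumption), applied to a trimmed-down version of the example user.

```python
def merge(doc, changes):
    """Field-merge PATCH used to model both f (email) and g (zip)."""
    return {**doc, **changes}


def f(x):
    return merge(x, {"email": "skwee357#newdomain.example"})


def g(x):
    return merge(x, {"zip": "12345"})


x = {"id": 1, "email": "skwee357#olddomain.example", "zip": "10001"}

print(f(f(f(x))) == f(x))     # True: f is idempotent
print(f(g(f(x))) == f(x))     # False: the state differs because g also ran
print(f(g(f(x))) == f(g(x)))  # True: f still only pins the email
```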
TLDR - Dumbed Down Version
PUT => Set all new attributes for an existing resource.
PATCH => Partially update an existing resource (not all attributes required).
I was curious about this as well and found a few interesting articles. I may not answer your question to its full extent, but this at least provides some more information.
http://restful-api-design.readthedocs.org/en/latest/methods.html
The HTTP RFC specifies that PUT must take a full new resource representation as the request entity. This means that if for example only certain attributes are provided, those should be removed (i.e. set to null).
Given that, then a PUT should send the entire object. For instance,
/users/1
PUT {id: 1, username: 'skwee357', email: 'newemail#domain.example'}
This would effectively update the email. The reason PUT may not be too effective is that you're only really modifying one field, and including the username is kind of useless. The next example shows the difference.
/users/1
PUT {id: 1, email: 'newemail#domain.example'}
Now, if the PUT was designed according to the spec, then the PUT would set the username to null and you would get the following back.
{id: 1, username: null, email: 'newemail#domain.example'}
When you use a PATCH, you only update the field you specify and leave the rest alone as in your example.
The following take on PATCH is a little different from anything I had seen before.
http://williamdurand.fr/2014/02/14/please-do-not-patch-like-an-idiot/
The difference between the PUT and PATCH requests is reflected in the way the server processes the enclosed entity to modify the resource identified by the Request-URI. In a PUT request, the enclosed entity is considered to be a modified version of the resource stored on the origin server, and the client is requesting that the stored version be replaced. With PATCH, however, the enclosed entity contains a set of instructions describing how a resource currently residing on the origin server should be modified to produce a new version. The PATCH method affects the resource identified by the Request-URI, and it also MAY have side effects on other resources; i.e., new resources may be created, or existing ones modified, by the application of a PATCH.
PATCH /users/123
[
{ "op": "replace", "path": "/email", "value": "new.email#example.org" }
]
You are more or less treating the PATCH as a way to update a field. So instead of sending over the partial object, you're sending over the operation. i.e. Replace email with value.
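A deliberately tiny sketch of applying such an operation list, supporting only top-level "replace" operations. JSON Patch (RFC 6902) defines more operations and full JSON Pointer paths (including escaping); this subset and the function name are assumptions for illustration. As in JSON Patch, replacing a member that does not exist is treated as an error.

```python
def apply_json_patch_replace(doc, operations):
    """Apply top-level "replace" operations in the style of JSON Patch.

    Only paths of the form "/field" are supported, and the field must
    already exist. Returns a new dict; the input is not modified.
    """
    result = dict(doc)
    for op in operations:
        if op["op"] != "replace":
            raise ValueError(f"unsupported op: {op['op']}")
        path = op["path"]
        if not path.startswith("/") or "/" in path[1:]:
            raise ValueError(f"only top-level paths are supported: {path}")
        key = path[1:]
        if key not in result:
            raise KeyError(f"cannot replace missing member: {key}")
        result[key] = op["value"]
    return result


user = {"id": 123, "email": "old.email#example.org"}
ops = [{"op": "replace", "path": "/email", "value": "new.email#example.org"}]
print(apply_json_patch_replace(user, ops))
```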
The article ends with this.
It is worth mentioning that PATCH is not really designed for truly REST APIs, as Fielding's dissertation does not define any way to partially modify resources. But, Roy Fielding himself said that PATCH was something [he] created for the initial HTTP/1.1 proposal because partial PUT is never RESTful. Sure you are not transferring a complete representation, but REST does not require representations to be complete anyway.
Now, I don't know if I particularly agree with the article as many commentators point out. Sending over a partial representation can easily be a description of the changes.
For me, I am mixed on using PATCH. For the most part, I will treat PUT as a PATCH since the only real difference I have noticed so far is that PUT "should" set missing values to null. It may not be the 'most correct' way to do it, but good luck coding perfect.
tl;dr version
POST: is used to create an entity
PUT: is used to update/replace an existing entity where you must send the entire representation of the entity as you wish for it to be stored
PATCH: is used to update an entity where you send only the fields that need to be updated
The difference between PUT and PATCH is that:
PUT is required to be idempotent. In order to achieve that, you have to put the entire complete resource in the request body.
PATCH can be non-idempotent, which implies it can also be idempotent in some cases, such as the cases you described.
PATCH requires some "patch language" to tell the server how to modify the resource. The caller and the server need to define some "operations" such as "add", "replace", "delete". For example:
GET /contacts/1
{
"id": 1,
"name": "Sam Kwee",
"email": "skwee357#olddomain.example",
"state": "NY",
"zip": "10001"
}
PATCH /contacts/1
[{"operation": "add", "field": "address", "value": "123 main street"},
 {"operation": "replace", "field": "email", "value": "abc#myemail.example"},
 {"operation": "delete", "field": "zip"}]
GET /contacts/1
{
"id": 1,
"name": "Sam Kwee",
"email": "abc#myemail.example",
"state": "NY",
"address": "123 main street"
}
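To make the operation-based approach concrete, here is a minimal Python sketch that applies a list of operations in the `{"operation", "field", "value"}` format used above to a plain dict. That format is this answer's illustration, not a standard, and `apply_operations` is a hypothetical helper. Deleting a missing field is treated as a no-op here, which is one choice that keeps these three operations idempotent.

```python
from typing import Any


def apply_operations(resource: dict[str, Any], ops: list[dict[str, Any]]) -> dict[str, Any]:
    """Return a patched copy of `resource`, leaving the input untouched."""
    result = dict(resource)
    for op in ops:
        kind, field = op["operation"], op["field"]
        if kind in ("add", "replace"):
            # This sketch does not distinguish the two: both set the field.
            result[field] = op["value"]
        elif kind == "delete":
            # Deleting a field that is already gone is silently ignored.
            result.pop(field, None)
        else:
            raise ValueError(f"unknown operation: {kind!r}")
    return result


contact = {"id": 1, "name": "Sam Kwee", "email": "skwee357#olddomain.example",
           "state": "NY", "zip": "10001"}
ops = [
    {"operation": "add", "field": "address", "value": "123 main street"},
    {"operation": "replace", "field": "email", "value": "abc#myemail.example"},
    {"operation": "delete", "field": "zip"},
]
patched = apply_operations(contact, ops)
print(patched)
```

Applying `ops` a second time to `patched` returns the same dict, which matches the claim below that these operations keep PATCH idempotent.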
Instead of using explicit "operation" fields, the patch language can make it implicit by defining conventions like:
In the PATCH request body:
The existence of a field means "replace" or "add" that field.
If the value of a field is null, it means delete that field.
With the above convention, the PATCH in the example can take the following form:
PATCH /contacts/1
{
"address": "123 main street",
"email": "abc#myemail.example",
"zip": null
}
This looks more concise and user-friendly, but users need to be aware of the underlying convention.
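A minimal Python sketch of that implicit convention, handling top-level fields only. The convention closely resembles JSON Merge Patch (RFC 7396), which additionally recurses into nested objects; `merge_patch` is a hypothetical helper name, and `json.loads` turns the JSON `null` into Python's `None`.

```python
import json
from typing import Any


def merge_patch(resource: dict[str, Any], patch: dict[str, Any]) -> dict[str, Any]:
    """Apply the implicit convention to top-level fields only:
    a present field means add/replace, a null value means delete."""
    result = dict(resource)
    for field, value in patch.items():
        if value is None:
            result.pop(field, None)  # null -> delete the field
        else:
            result[field] = value    # present -> add or replace
    return result


contact = {"id": 1, "name": "Sam Kwee", "email": "skwee357#olddomain.example",
           "state": "NY", "zip": "10001"}
body = json.loads('{"address": "123 main street", '
                  '"email": "abc#myemail.example", "zip": null}')
once = merge_patch(contact, body)
twice = merge_patch(once, body)
print(once == twice)  # True
```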
With the operations I mentioned above, the PATCH is still idempotent. But if you define operations like "increment" or "append", you can easily see it won't be idempotent anymore.
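The difference shows up as soon as an operation depends on the current value. In this sketch, `increment` is a hypothetical operation applied to a hypothetical `loginCount` field: repeating a `replace` leaves the resource unchanged, while repeating an `increment` does not.

```python
from typing import Any


def apply(resource: dict[str, Any], op: dict[str, Any]) -> dict[str, Any]:
    result = dict(resource)
    if op["operation"] == "replace":
        result[op["field"]] = op["value"]
    elif op["operation"] == "increment":  # hypothetical, value-dependent op
        result[op["field"]] = result.get(op["field"], 0) + op["value"]
    else:
        raise ValueError(f"unknown operation: {op['operation']!r}")
    return result


user = {"id": 1, "loginCount": 3}
replace = {"operation": "replace", "field": "loginCount", "value": 10}
increment = {"operation": "increment", "field": "loginCount", "value": 1}

# Same request applied once vs. twice:
print(apply(user, replace) == apply(apply(user, replace), replace))        # True
print(apply(user, increment) == apply(apply(user, increment), increment))  # False
```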
In my humble opinion, idempotence means:
PUT:
I send a complete resource definition, so the resulting resource state is exactly as defined by the PUT params. Each and every time I update the resource with the same PUT params, the resulting state is exactly the same.
PATCH:
I send only part of the resource definition, so other users might update this resource's OTHER parameters in the meantime. Consequently, consecutive patches with the same parameters and values might result in different resource states. For instance:
Presume an object defined as follows:
CAR:
- color: black,
- type: sedan,
- seats: 5
I patch it with:
{color: 'red'}
The resulting object is:
CAR:
- color: red,
- type: sedan,
- seats: 5
Then some other user patches this car with:
{type: 'hatchback'}
so, the resulting object is:
CAR:
- color: red,
- type: hatchback,
- seats: 5
Now, if I patch this object again with:
{color: 'red'}
the resulting object is:
CAR:
- color: red,
- type: hatchback,
- seats: 5
which is DIFFERENT from what I got previously!
This is why PATCH is not idempotent while PUT is idempotent.
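The sequence above can be replayed with a plain dict merge standing in for PATCH. The merge rule is an assumption, since the answer does not specify one. The full resulting objects differ between the first and second patch, as described, while the `color` field set by the patch is `red` both times.

```python
def patch(resource: dict, changes: dict) -> dict:
    """Stand-in for PATCH: fields in `changes` overwrite, others are kept."""
    return {**resource, **changes}


car = {"color": "black", "type": "sedan", "seats": 5}

first = patch(car, {"color": "red"})            # my first patch
shared = patch(first, {"type": "hatchback"})    # another user's patch
second = patch(shared, {"color": "red"})        # my identical patch, again

print(first)   # {'color': 'red', 'type': 'sedan', 'seats': 5}
print(second)  # {'color': 'red', 'type': 'hatchback', 'seats': 5}
```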
I might be a bit off topic considering your questions about idempotency, but I'd like you to consider evolvability.
Suppose you have the following element:
{
"username": "skwee357",
"email": "skwee357#domain.example"
}
If you modify it with PUT, you have to give the whole representation of the object:
PUT /users/1
{
"username": "skwee357",
"email": "skwee357#newdomain.example"
}
Now you update the schema and add a field phone:
PUT /users/1
{
"username": "skwee357",
"email": "skwee357#newdomain.example",
"phone": "123-456-7890"
}
Now, if a component that was written before the schema change updates the element again with PUT the same way, it will set phone to null. To avoid that bad side effect, you have to update all the components that modify elements every time you update your schema. Lame.
By using PATCH, you do not have this problem, because PATCH only updates the given fields. So, in my opinion, you should use PATCH to modify an element (whether it is really idempotent or not). That's real-life experience.
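A small Python sketch of that scenario. It assumes a server whose PUT sets every field missing from the body to `null` (as the answer describes) and whose PATCH only touches the fields sent; the "old client" and its new email value are hypothetical.

```python
from typing import Any


def put(stored: dict[str, Any], body: dict[str, Any]) -> dict[str, Any]:
    """PUT as described above: fields missing from the body become None (null)."""
    result = {field: None for field in stored}
    result.update(body)
    return result


def patch(stored: dict[str, Any], body: dict[str, Any]) -> dict[str, Any]:
    """PATCH: only the fields present in the body change."""
    return {**stored, **body}


stored = {"username": "skwee357", "email": "skwee357#newdomain.example",
          "phone": "123-456-7890"}
# A component written before the schema gained "phone":
old_client_body = {"username": "skwee357", "email": "skwee357#otherdomain.example"}

print(put(stored, old_client_body)["phone"])    # None: the phone number is lost
print(patch(stored, old_client_body)["phone"])  # 123-456-7890
```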
Everyone else has answered the PUT vs PATCH question. I was just going to answer what the title of the original question asks: "... in REST API real life scenarios". In the real world, this happened to me with an internet application that had a RESTful server and a relational database with a Customer table that was "wide" (about 40 columns). I mistakenly used PUT, having assumed it was like a SQL UPDATE command, and had not filled out all the columns. Problems: 1) some columns were optional (so blank was a valid answer), 2) many columns rarely changed, 3) some columns the user was not allowed to change, such as the Last Purchase Date timestamp, 4) one column was a free-form text "Comments" column that users diligently filled with half-page customer service comments, like a spouse's name to ask about or the usual order, 5) I was working on an internet app at the time and there was worry about packet size.
The disadvantage of PUT is that it forces you to send a large packet of info (all columns, including the entire Comments column, even though only a few things changed) AND it has a multi-user issue when 2+ users edit the same customer simultaneously (the last one to press Update wins). The disadvantage of PATCH is that you have to keep track on the view/screen side of what changed and have some intelligence to send only the parts that changed. PATCH's multi-user issue is limited to editing the same column(s) of the same customer.
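The "keep track of what changed" part on the client can be as small as a diff between the record as loaded and the record as edited. A minimal sketch with hypothetical column names and a hypothetical `changed_fields` helper; it only detects changed or added fields, not removed ones.

```python
from typing import Any


def changed_fields(loaded: dict[str, Any], edited: dict[str, Any]) -> dict[str, Any]:
    """Return only the fields whose value differs from what was loaded.
    This is what a PATCH body would carry; fields removed in `edited` are ignored."""
    return {field: value for field, value in edited.items()
            if field not in loaded or loaded[field] != value}


loaded = {"name": "Acme Corp", "phone": "555-0100",
          "comments": "Half a page of customer service notes...",
          "last_purchase_date": "2020-01-15"}
edited = dict(loaded, phone="555-0199")  # the user only changed the phone number

print(changed_fields(loaded, edited))  # {'phone': '555-0199'}
```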
Let me quote RFC 7231, section 4.2.2, already cited in earlier comments, and comment on it more closely:
A request method is considered "idempotent" if the intended effect on
the server of multiple identical requests with that method is the same
as the effect for a single such request. Of the request methods
defined by this specification, PUT, DELETE, and safe request methods
are idempotent.
(...)
Idempotent methods are distinguished because the request can be
repeated automatically if a communication failure occurs before the
client is able to read the server's response. For example, if a
client sends a PUT request and the underlying connection is closed
before any response is received, then the client can establish a new
connection and retry the idempotent request. It knows that repeating
the request will have the same intended effect, even if the original
request succeeded, though the response might differ.
So, what should be "the same" after a repeated request of an idempotent method? Not the server state, nor the server response, but the intended effect. In particular, the method should be idempotent "from the point of view of the client". Now, I think that this point of view shows that the last example in Dan Lowe's answer, which I don't want to plagiarize here, indeed shows that a PATCH request can be non-idempotent (in a more natural way than the example in Jason Hoetger's answer).
Indeed, let's make the example slightly more precise by making explicit one possible intent of the first client. Let's say that this client goes through the list of users to check their emails and zip codes. He starts with user 1 and notices that the zip is right but the email is wrong. He decides to correct this with a PATCH request, which is fully legitimate, and sends only
PATCH /users/1
{"email": "skwee357#newdomain.example"}
since this is the only correction. Now, the request fails because of some network issue and is re-submitted automatically a couple of hours later. In the meanwhile, another client has (erroneously) modified the zip of user 1. Then, sending the same PATCH request a second time does not achieve the intended effect of the client, since we end up with an incorrect zip. Hence the method is not idempotent in the sense of the RFC.
If instead the client uses a PUT request to correct the email, sending to the server all properties of user 1 along with the email, his intended effect will be achieved even if the request has to be re-sent later and user 1 has been modified in the meanwhile --- since the second PUT request will overwrite all changes since the first request.
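The retry scenario can be simulated in a few lines. This sketch assumes PATCH merges the sent fields and PUT replaces the whole representation; the erroneous zip value written by the other client is made up for illustration.

```python
def patch(stored: dict, body: dict) -> dict:
    return {**stored, **body}   # only the sent fields change


def put(stored: dict, body: dict) -> dict:
    return dict(body)           # the body becomes the whole representation


user = {"email": "skwee357#olddomain.example", "zip": "10001"}
# What the first client wants user 1 to look like: email fixed, zip as it was.
intended = {"email": "skwee357#newdomain.example", "zip": "10001"}

# Hours later, before the retry, another client wrongly changes the zip.
modified = {**user, "zip": "99999"}

after_patch_retry = patch(modified, {"email": intended["email"]})
after_put_retry = put(modified, intended)

print(after_patch_retry == intended)  # False: the wrong zip survives
print(after_put_retry == intended)    # True
```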
To conclude the discussion on idempotency, I should note that one can define idempotency in the REST context in two ways. Let's first formalize a few things:
A resource is a function whose domain is (a subset of) the class of strings. In other words, a resource is a subset of String × Any where all the keys are unique. Let's call the class of resources Res.
A REST operation on resources is a function f(x: Res, y: Res): Res. Two examples of REST operations are:
PUT(x: Res, y: Res): Res = x, and
PATCH(x: Res, y: Res): Res, which works like PATCH({a: 2}, {a: 1, b: 3}) == {a: 2, b: 3}.
(This definition is specifically designed to argue about PUT and PATCH, and e.g. doesn't make much sense for GET and POST, as it doesn't care about persistence.)
Now, by fixing x: Res (in programming terms, using currying), PUT(x: Res) and PATCH(x: Res) are univariate functions of type Res → Res.
A function g: Res → Res is called globally idempotent when g ∘ g == g, i.e. for any y: Res, g(g(y)) = g(y).
Let x: Res be a resource and k = x.keys. A function g = f(x) is called left idempotent when, for each y: Res, we have g(g(y))|ₖ == g(y)|ₖ. It basically means that the result should be the same if we look only at the applied keys.
So, PATCH(x) is not globally idempotent, but is left idempotent. And left idempotency is the thing that matters here: if we patch a few keys of the resource, we want those keys to be same if we patch it again, and we don't care about the rest of the resource.
And when the RFC talks about PATCH not being idempotent, it is talking about global idempotency. Well, it's good that it's not globally idempotent; otherwise it would be a broken operation.
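The definitions above translate almost directly into Python, with a dict standing in for Res. `patch(x, y)` lets the keys of `x` win, which reproduces PATCH({a: 2}, {a: 1, b: 3}) == {a: 2, b: 3}. The helper names are hypothetical, and `left_idempotent_at` checks the property for one concrete `y` only; it is an illustration, not a proof for all resources.

```python
from typing import Any, Callable

Res = dict[str, Any]  # unique string keys mapped to arbitrary values


def put(x: Res, y: Res) -> Res:
    return dict(x)          # PUT(x, y) = x


def patch(x: Res, y: Res) -> Res:
    return {**y, **x}       # keys of x override the same keys in y


def restrict(r: Res, keys) -> Res:
    """r|k: the part of r whose keys are in k."""
    return {k: r[k] for k in keys if k in r}


def left_idempotent_at(f: Callable[[Res, Res], Res], x: Res, y: Res) -> bool:
    def g(r: Res) -> Res:   # curry: fix the first argument to x
        return f(x, r)
    k = x.keys()
    return restrict(g(g(y)), k) == restrict(g(y), k)


print(patch({"a": 2}, {"a": 1, "b": 3}))                      # {'a': 2, 'b': 3}
print(left_idempotent_at(patch, {"a": 2}, {"a": 1, "b": 3}))  # True
```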
Now, Jason Hoetger's answer is trying to demonstrate that PATCH is not even left idempotent, but it's breaking too many things to do so:
First of all, PATCH is used on a set, although PATCH is defined to work on maps / dictionaries / key-value objects.
If someone really wants to apply PATCH to sets, then there is a natural translation that should be used: t: Set<T> → Map<T, Boolean>, defined with x in A iff t(A)(x) == True. Using this definition, patching is left idempotent.
In the example, this translation was not used, instead, the PATCH works like a POST. First of all, why is an ID generated for the object? And when is it generated? If the object is first compared to the elements of the set, and if no matching object is found, then the ID is generated, then again the program should work differently ({id: 1, email: "me#site.example"} must match with {email: "me#site.example"}, otherwise the program is always broken and the PATCH cannot possibly patch). If the ID is generated before checking against the set, again the program is broken.
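The set-to-map translation t can be sketched as follows, treating a set of email addresses as a map from each address to `True`. Patching then means sending `{item: True}` to add or `{item: False}` to remove, using the key-wise PATCH(x, y) defined in this answer, where the keys of x win. Applying the same patch twice changes nothing further. The helper names and addresses are placeholders.

```python
def to_map(items: set[str]) -> dict[str, bool]:
    """t(A): x is in A iff t(A)[x] is True."""
    return {item: True for item in items}


def from_map(m: dict[str, bool]) -> set[str]:
    return {item for item, present in m.items() if present}


def patch(x: dict[str, bool], y: dict[str, bool]) -> dict[str, bool]:
    return {**y, **x}  # key-wise PATCH(x, y): the keys of x win


emails = to_map({"me#site.example"})
add_other = {"you#site.example": True}

once = patch(add_other, emails)
twice = patch(add_other, once)
print(from_map(once) == from_map(twice) == {"me#site.example", "you#site.example"})
```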
One can make examples of PUT being non-idempotent by breaking half of the things that are broken in this example:
An example with generated additional features would be versioning. One may keep record of the number of changes on a single object. In this case, PUT is not idempotent: PUT /user/12 {email: "me#site.example"} results in {email: "...", version: 1} the first time, and {email: "...", version: 2} the second time.
Messing with the IDs, one may generate a new ID every time the object is updated, resulting in a non-idempotent PUT.
All the above examples are natural examples that one may encounter.
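The versioning example can be sketched with a hypothetical in-memory store that stamps a counter on every PUT; the same body sent twice produces two different stored states.

```python
from typing import Any


class VersionedStore:
    """Hypothetical store that counts how many times each resource was written."""

    def __init__(self) -> None:
        self._resources: dict[str, dict[str, Any]] = {}

    def put(self, path: str, body: dict[str, Any]) -> dict[str, Any]:
        version = self._resources.get(path, {}).get("version", 0) + 1
        self._resources[path] = {**body, "version": version}
        return self._resources[path]


store = VersionedStore()
first = store.put("/user/12", {"email": "me#site.example"})
second = store.put("/user/12", {"email": "me#site.example"})
print(first)   # {'email': 'me#site.example', 'version': 1}
print(second)  # {'email': 'me#site.example', 'version': 2}
```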
My final point is that PATCH should not be globally idempotent, otherwise it won't give you the desired effect. You want to change the email address of your user without touching the rest of the information, and you don't want to overwrite the changes of another party accessing the same resource.
A very nice explanation is here:
https://blog.segunolalive.com/posts/restful-api-design-%E2%80%94-put-vs-patch/#:~:text=RFC%205789,not%20required%20to%20be%20idempotent.
A normal payload:
// House on plot 1
{
address: 'plot 1',
owner: 'segun',
type: 'duplex',
color: 'green',
rooms: '5',
kitchens: '1',
windows: 20
}
PUT for an update:
// PUT request payload to update windows of House on plot 1
{
address: 'plot 1',
owner: 'segun',
type: 'duplex',
color: 'green',
rooms: '5',
kitchens: '1',
windows: 21
}
Note: in the above payload we are trying to update windows from 20 to 21.
Now see the PATCH payload:
// Patch request payload to update windows on the House
{
windows: 21
}
Since PATCH is not idempotent, failed requests are not automatically re-attempted on the network. Also, if a PATCH request is made to a non-existent URL, e.g. attempting to replace the front door of a non-existent building, it should simply fail without creating a new resource, unlike PUT, which would create a new one using the payload. Come to think of it, it'd be odd to have a lone door at a house address.
The PUT method is ideal for updating data in tabular form, as in a relational DB or entity-like storage. Depending on the use case, it can be used to update data partially or to replace the entity as a whole. This will always be idempotent.
The PATCH method can be used to update (or restructure) data in JSON or XML format that is stored in a local file system or a NoSQL database. This is done by specifying the action/operation to perform in the request, such as adding, removing, or moving a key-value pair in a JSON object. The remove operation can be used to delete a key-value pair, and a duplicate request will result in an error because the key was already deleted, which makes the method non-idempotent. Refer to RFC 6902 for JSON data patching requests.
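A hypothetical in-memory server illustrating the behaviour described: PUT to a missing URL creates the resource (201 Created), while PATCH to a missing URL fails with 404 instead of creating anything. The URLs, field names and status-code choices are assumptions for the sketch.

```python
from typing import Any


class HouseStore:
    def __init__(self) -> None:
        self.resources: dict[str, dict[str, Any]] = {}

    def put(self, url: str, body: dict[str, Any]) -> int:
        created = url not in self.resources
        self.resources[url] = dict(body)   # full replacement (or creation)
        return 201 if created else 200

    def patch(self, url: str, body: dict[str, Any]) -> int:
        if url not in self.resources:
            return 404                     # nothing to patch: do not create
        self.resources[url].update(body)   # change only the sent fields
        return 200


store = HouseStore()
print(store.patch("/houses/plot-2", {"front_door": "oak"}))               # 404
print(store.put("/houses/plot-1", {"address": "plot 1", "windows": 20}))  # 201
print(store.patch("/houses/plot-1", {"windows": 21}))                     # 200
print(store.resources["/houses/plot-1"])  # {'address': 'plot 1', 'windows': 21}
```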
This article has detailed information related to the PATCH method.
I will try to summarize in layman's terms what I understood (maybe it helps):
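To see why a duplicate remove fails, here is a deliberately minimal Python sketch of RFC 6902's `add` and `remove` operations, under which `remove` requires the target location to exist. It only handles single-segment paths such as `/zip` (no nesting, arrays, or `~0`/`~1` escapes), and `apply_patch` is a hypothetical name; a real JSON Patch library would be used in practice.

```python
from typing import Any


def _top_level_key(path: str) -> str:
    # Only "/name"-style pointers are supported in this sketch.
    if not path.startswith("/") or "/" in path[1:]:
        raise NotImplementedError(f"unsupported path: {path!r}")
    return path[1:]


def apply_patch(doc: dict[str, Any], ops: list[dict[str, Any]]) -> dict[str, Any]:
    result = dict(doc)
    for op in ops:
        key = _top_level_key(op["path"])
        if op["op"] == "add":
            result[key] = op["value"]
        elif op["op"] == "remove":
            if key not in result:
                # The target must exist, so a repeated remove is an error.
                raise KeyError(f"path {op['path']!r} does not exist")
            del result[key]
        else:
            raise NotImplementedError(f"unsupported op: {op['op']!r}")
    return result


doc = {"email": "abc#myemail.example", "zip": "10001"}
remove_zip = [{"op": "remove", "path": "/zip"}]

once = apply_patch(doc, remove_zip)   # {'email': 'abc#myemail.example'}
try:
    apply_patch(once, remove_zip)     # the duplicate request
except KeyError as err:
    print("second remove failed:", err)
```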
Patch is not fully idempotent (it can be in an ideal situation where nobody changes another field of your entity).
In a non-ideal (real life) situation, somebody modifies another field of your object with another PATCH operation, and then neither operation is idempotent (meaning that the resource you are both modifying comes back "wrong" from either point of view).
So you cannot call it Idempotent if it does not cover 100% of the situations.
Maybe this is not that important to some, but to others it is.
One additional piece of information I just want to add is that a PATCH request uses less bandwidth than a PUT request, since only part of the data is sent, not the whole entity. So just use a PATCH request for updates of specific records (like 1-3 records) and a PUT request for updating a larger amount of data. That's it; don't think or worry about it too much.

REST resource returning different object depending on state

I'm trying to define a REST API and I'm having trouble with one requirement.
I have an action that the API user can perform in two different ways.
For example, say my user uses my API to change the intensity of a light. I will have a URL something like
api/light/intensity
One option the user has is to set the intensity as a percentage of the maximum luminosity; the other option is to set the intensity as an exact value in lumens (there is a detector for that), in which case he can pass a "precision" that can be low, medium or high (it changes the time it takes to reach the correct intensity).
I want the user to be able to GET the current intensity, meaning which mode it is in and, depending on the mode, the % or the value in lumens and the precision.
This is where I'm lost, my GET will return a JSON object for example, is it OK to send something like
{
"Mode": "Percent",
"Percent": 50.5
}
when I'm in "percentage" mode and
{
"Mode": "Exact",
"Lumens": 200,
"Precision": "High"
}
When I'm in "lumens" mode?
If that seems OK, how would I tell the user which type of "object" he should parse?
What would be the best way to let the user send his changes? I was thinking about having two URL, one for each mode, like
PUT /api/light/intensity/exact and PUT /api/light/intensity/percent
Both would expect JSON objects similar to the ones above, without the Mode.
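If the "Mode" field is kept, a client can use it as a discriminator and pick the type to parse from it. A minimal Python sketch, assuming the bodies are valid JSON (with `:` separators) and using hypothetical class and function names:

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class PercentIntensity:
    percent: float


@dataclass
class ExactIntensity:
    lumens: float
    precision: str  # "Low", "Medium" or "High"


Intensity = Union[PercentIntensity, ExactIntensity]


def parse_intensity(text: str) -> Intensity:
    """Choose the concrete type from the "Mode" discriminator."""
    data = json.loads(text)
    mode = data.get("Mode")
    if mode == "Percent":
        return PercentIntensity(percent=float(data["Percent"]))
    if mode == "Exact":
        return ExactIntensity(lumens=float(data["Lumens"]), precision=data["Precision"])
    raise ValueError(f"unknown Mode: {mode!r}")


print(parse_intensity('{"Mode": "Percent", "Percent": 50.5}'))
# PercentIntensity(percent=50.5)
```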
Use HTTP Content negotiation. This allows:
the client to tell the server what representation of a resource it wants to GET,
the server to tell the client what representation of a resource it returns to the client,
the client to tell the server what representation of a resource it is PUTting to the server.
Define two vendor content types:
application/vnd.com.example.light.intensity.percentage+json
application/vnd.com.example.light.intensity.lumens+json
The client tells the server which of the two it wants:
GET /api/light/intensity/
Accept: application/vnd.com.example.light.intensity.percentage+json
The server responds:
200 OK
Content-Type: application/vnd.com.example.light.intensity.percentage+json
{
"Percent": 50.5
}
The client wants to change the intensity:
PUT /api/light/intensity/
Content-Type: application/vnd.com.example.light.intensity.percentage+json
{
"Percent": 42.7
}
The server knows from the Content-Type header how to interpret the JSON body. In this example it handles the request as in 'Percent' mode.
If the second content type were used, client and server would know to interpret the request/response as in 'Lumens' mode.
Edit: Note that the GET and PUT request use the same URL because the requests are about the same resource: the light intensity. All that differs is the representation of this resource. The proper way to handle this are content types.
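On the server side, content negotiation for the PUT boils down to choosing a parser from the Content-Type header. A minimal Python sketch using the two vendor types defined in this answer; the field names inside the lumens body (`Lumens`, `Precision`) are borrowed from the question, since the answer does not show that body, and the handler name is hypothetical. Anything else gets 415 Unsupported Media Type.

```python
import json
from typing import Any

PERCENT_TYPE = "application/vnd.com.example.light.intensity.percentage+json"
LUMENS_TYPE = "application/vnd.com.example.light.intensity.lumens+json"


def handle_put_intensity(content_type: str, body: str) -> tuple[int, dict[str, Any]]:
    """Interpret the same URL's PUT body according to its media type."""
    media_type = content_type.split(";")[0].strip().lower()  # drop e.g. "; charset=utf-8"
    if media_type == PERCENT_TYPE:
        data = json.loads(body)
        return 200, {"mode": "Percent", "percent": float(data["Percent"])}
    if media_type == LUMENS_TYPE:
        data = json.loads(body)
        return 200, {"mode": "Exact", "lumens": float(data["Lumens"]),
                     "precision": data["Precision"]}
    return 415, {}  # Unsupported Media Type


print(handle_put_intensity(PERCENT_TYPE, '{"Percent": 42.7}'))
# (200, {'mode': 'Percent', 'percent': 42.7})
```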
The specifics will depend a bit on your API, and the needs of your users. The same GET method call to a RESTful API should always return the same value: a representation of the resource as defined by the information in the URL and nothing else. If you're maintaining state in the system, you're violating a precept of REST. (Edit: as pointed out by Gimly, that statement is unclear. It's not a violation of RESTful design for the system to maintain its own internal state, especially if a request changes the state of the system with a PUT, POST or DELETE. It's a violation for a request to rely on that state to return a representation of the resource, or to request a state change. Each request should be self-contained.)
I'd use a query string to change the format of the representation:
GET /api/light/intensity
GET /api/light/intensity?f=percent
That way /api/light/intensity always refers to the same resource (defaulting to the "exact" representation, which has the most data), and the query string "filters" the representation, similarly to a search query. It removes some data (in this case, the exact luminosity and precision) in favor of a relative representation in percent of some maximum value. Alternately, you could think of it as controlling the output format: GET /foo.json vs GET /foo.xml. The resource is the same, but the representation differs.
For updating a resource, you can take an object as you've described. Your server will have to understand the different formats, but you could either PUT to the bare URL, or again use a query parameter to control the format expected by the server, and then let your payload be more abstract, using value instead of lumens or percentage:
PUT /api/light/intensity
Payload: {"value": 200, "precision": "high"}
PUT /api/light/intensity?f=percent
Payload: {"value": 50.5}
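A sketch of how a server might read the `f` query parameter and normalize both payloads to an absolute intensity. The maximum of 400 is borrowed from the `max-intensity` value in this answer's light example and, like the hypothetical `handle_put` function itself, is an assumption.

```python
from typing import Any
from urllib.parse import parse_qs, urlsplit

MAX_INTENSITY = 400.0  # assumed maximum, taken from the example light resource


def handle_put(url: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Store intensity as an absolute value, whatever format the client used."""
    fmt = parse_qs(urlsplit(url).query).get("f", ["exact"])[0]
    if fmt == "percent":
        return {"intensity": MAX_INTENSITY * float(payload["value"]) / 100}
    if fmt == "exact":
        return {"intensity": float(payload["value"]),
                "precision": payload["precision"]}
    raise ValueError(f"unsupported format: {fmt!r}")


print(handle_put("/api/light/intensity", {"value": 200, "precision": "high"}))
# {'intensity': 200.0, 'precision': 'high'}
print(handle_put("/api/light/intensity?f=percent", {"value": 50.5}))
# {'intensity': 202.0}
```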
That allows you to structure the API for your light resource in such a way that intensity is one property of the resource. "Percent" then becomes a convenience representation in the output, so when you return the entire light resource, it would read something like:
"light": {
"name": "the light",
"id": 12345,
"intensity": 200,
"max-intensity": 400,
...
}
So the API user could calculate the current percentage from intensity and max-intensity. (You could of course substitute "percent" for "max-intensity" and let the user do the math the other way, but it feels more natural to me to provide absolute values and let the client calculate relative values.)
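The client-side calculation is a one-liner; this sketch uses the field names from the light example above and a hypothetical `intensity_percent` helper.

```python
def intensity_percent(light: dict) -> float:
    """Relative intensity computed from the absolute values the API returns."""
    return 100 * light["intensity"] / light["max-intensity"]


light = {"name": "the light", "id": 12345, "intensity": 200, "max-intensity": 400}
print(intensity_percent(light))  # 50.0
```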
Edit
Please see Tichodroma's answer for the better way of handling this. I'm leaving the answer because the discussion in the comments was useful to me, and may be useful to others in the future.