HTTP ReST: update large collections: better approach than JSON PATCH? - json

I am designing a web service to regularly receive updates to lists. At this point, a list can still be modeled as a single entity (/lists/myList) or an actual collection with many resources (/lists/myList/entries/<ID>). The lists are large (millions of entries) and the updates are small (often less than 10 changes).
The client will get web service URLs and lists to distribute, e.g.:
http://hostA/service/lists: list1, list2
http://hostB/service/lists: list2, list3
http://hostC/service/lists: list1, list3
It will then push lists and updates as configured. It is likely but undetermined if there is some database behind the web service URLs.
I have been researching and it seems a HTTP PATCH using the JSON patch format is the best approach.
Context and examples:
Each list has an identifying name, a priority and millions of entries. Each entry has an ID (determined by the client) and several optional attributes. Example to create a list "requiredItems" with priority 1 and two list entries:
PUT /lists/requiredItems
Content-Type: application/json
{
"priority": 1,
"entries": {
"1": {
"color": "red",
"validUntil": "2016-06-29T08:45:00Z"
},
"2": {
"country": "US"
}
}
}
For updates, the client would first need to know what the list looks like now on the server. For this I would add a property "revision" to the list entity.
Then, I would query this attribute:
GET /lists/requiredItems?property=revision
Then the client would see what needs to change between the revision on the server and the latest revision known by the client and compose a JSON patch. Example:
PATCH /list/requiredItems
Content-Type: application/json-patch+json
[
{ "op": "test", "path": "revision", "value": 3 },
{ "op": "add", "path": "entries/3", "value": { "color": "blue" } },
{ "op": "remove", "path": "entries/1" },
{ "op": "remove", "path": "entries/2/country" },
{ "op": "add", "path": "entries/2/color", "value": "green" },
{ "op": "replace", "path": "revision", "value": 10 }
]
Questions:
This approach has the drawback of slightly less client support due to the not-often-used HTTP verb PATCH. Is there a more compatible approach without sacrificing HTTP compatibility (idempotency et cetera)?
Modelling the individual list entries as separate resources and using PUT and DELETE (perhaps with ETag and/or If-Match) seems an option (PUT /lists/requiredItems/entries/3, DELETE /lists/requiredItems/entries/1 PUT /lists/requiredItems/revision), but how would I make sure all those operations are applied when the network drops in the middle of an update chain? Is a HTTP PATCH allowed to work on multiple resources?
Is there a better way to 'version' the lists, perhaps implicitly also improving how they are updated? Note that the client determines the revision number.
Is it correct to query the revision number with GET /lists/requiredItems?property=revision? Should it be a separate resource like /lists/requiredItems/revision? If it should be a separate resource, how would I update it atomically (i.e. the list and revision are both updated or both not updated)?
Would it work in JSON patch to first test the revision value to be 3 and then update it to 10 in the same patch?

This approach has the drawback of slightly less client support due to the not-often-used HTTP verb PATCH.
As far as I can tell, PATCH is really only appropriate if your server is acting like a dumb document store, where the action is literally "please update your copy of the document according to the following description".
So if your resource really just is a JSON document that describes a list with millions of entries, then JSON-Patch is a great answer.
But if you are expecting that the patch will, as a side effect, update an entity in your domain, then I'm suspicious.
Is a HTTP PATCH allowed to work on multiple resources?
RFC 5789
The PATCH method affects the resource identified by the Request-URI, and it also MAY have side effects on other resources
I'm not keen on querying the revision number; it doesn't seem to have any clear advantage over using an ETag/If-Match approach. Some obvious disadvantages - the caches between you and the client don't know that the list and the version number are related; a cache will happily tell a client that version 12 of the list is version 7, or vice versa.

Answering my own question. My first bullet point may be opinion-based and, as has been pointed out, I've asked many questions in one post. Nevertheless, here's a summary of what was answered by others (VoiceOfUnreason) and my own additional research:
ETags are HTTP's resource 'hashes'. They can be combined with If-Match headers to have a versioning system. However, ETag-headers are normally not used to declare the ETag of a resource that is being created (PUT) or updated (POST/PATCH). The server storing the resource usually determines the ETag. I've not found anything explicitly forbidding this, but many implementations may assume that the server determines the ETag and get confused when it is provided with PUT or PATCH.
A separate revision resource is a valid alternative to ETags for versioning. This resource must be updated at the same time as the resource it is the revision of.
It is not semantically enforceable on a HTTP level to have commit/rollback transactions, unless by modelling the transaction itself as a ReST resource, which would make things much more complicated.
However, some properties of PATCH allow it to be used for this:
A HTTP PATCH must be atomic and can operate on multiple resources. RFC 5789:
The server MUST apply the entire set of changes atomically and never provide (e.g., in response to a GET during this operation) a partially modified representation. If the entire patch document cannot be successfully applied, then the server MUST NOT apply any of the changes.
The PATCH method affects the resource identified by the Request-URI, and it also MAY have side effects on other resources; i.e., new resources may be created, or existing ones modified, by the application of a PATCH. PATCH is neither safe nor idempotent
JSON PATCH can consist of multiple operations on multiple resources and all must be applied or none must be applied, making it an implicit transaction. RFC 6902: Operations are applied sequentially in the order they appear in the array.
Thus, the revision can be modeled as a separate resource and still be updated at the same time. Querying the current revision is a simple GET. Committing a transaction is a single PATCH request containing first a test of the revision, then the operations on the resource(s) and finally the operation to update the revision resource.
The server can still choose to publish the revision as ETag of the main resource.

Related

REST API which needs multiple different resources?

I'm designing a REST api for running jobs on virtual machines in different domains (Active Directory domains, the virtual machines with the same name can exist in different domains).
/domains
/domains/{dname}
/domains/{dname}/vms
/domains/{dname}/vms/{cname}
And for jobs, which will be stored in a database
/jobs
/jobs/{id}
Now I need to add a new API for the following user stories.
As a user, I want to run a job (just job definition, not the stored job) on an existing VM.
As a user, I want to run a job (just job definition, not the stored job) on VM named x, which may or may not exist. The system should create the VM if x doesn't exist.
How should the api be designed?
Approach 1:
PUT /domains/{dname}
{ "state": "running_job", "vm": "vm_name", "job_definition": { .... } }
Approach 2:
PUT /domains/{dname}/vms/{vm_name}
{ "state": "running_job", "job_definition": { .... } }
Approach 3:
PUT /jobs
{ "state": "running", "domain": "name", "vm": "vm_name", "job_definition": { .... } }
Approach 4: create a new resource, saying scheduler,
PUT /scheduler
{ "domain": "name", "vm": "vm_name", "job_definition": { .... } }
(what if I need to update some attributes of scheduler in the future?)
In general, hwo to design the REST API url which needs multiple resources?
How should the api be designed?
How would you design this on the web?
There would be an HTML form, right? With a bunch of input controls to collect information from the operator about what job to use, and which VM to target, and so on. Operator would fill in the details, submit the form. The browser would then use the form to create the appropriate HTTP request to send to the server (the request-target being computed from the form meta data).
Since the server gets to decide what the request-target should be (benefits of using hypertext), it can choose any resource identifier it wants. In HTTP, a successful unsafe request happens to invalidate previously cached responses with the same request target, so one possible strategy is to consider which is the most important resource changed by successfully handling the request, and use that resource as the target.
In this specific case, we might have a resource that represents the job queue (ex /jobs), and what we are doing here is submitting a new entry in the queue, so we might expect
POST /jobs HTTP/1.1
....
If the server, in its handling of the request, also creating new resources for the specific job, then those would be indicated in the response
HTTP/1.1 201 Created
Location: /jobs/931a8a02-1a87-485a-ba5b-dd6ee716c0ef
....
Could you instead just use PUT?
PUT /jobs/931a8a02-1a87-485a-ba5b-dd6ee716c0ef HTTP/1.1
???
Yes, if (a) the client knows what spelling to use for the request-target and (b) is the client knows what the representation of the resource should look like.
Which unsafe HTTP method you use in the messages that trigger you business activities doesn't actually matter very much. You need to use the methods correctly (so that general purpose HTTP connectors don't get misled).
In particular, the important thing to remember about PUT is that the request body should be a complete representation of the resource - in other words, the request body for a PUT should match the response body of a GET. Think "save file"; we've made local edits to our copy of a resource, and we send back a copy of the entire document.

FIWARE's subscription do not notify

I have multiple subscriptions to different entities, and at some random time one of the subscriptions stops being notified.
This is because the lastNotification attribute is set in the future. Here is an example :
curl 'http://localhost:1026/v2/subscriptions/xxxxxxxxxxxxxxx'
{
"id": "xxxxxxxxxxxxxxx",
...
"status": "active",
...
"notification": {
"timesSent": 1316413,
"lastNotification": "2021-01-20T18:33:39.000Z",
...
"lastFailure": "2021-01-18T12:11:26.000Z",
"lastFailureReason": "Timeout was reached",
"lastSuccess": "2021-01-20T17:12:09.000Z",
"lastSuccessCode": 204
}
}
In this example, lastNotification is ahead of time. It is not until 2021-01-20T18: 33: 39.000Z for the subscription to be notified again.
I tried to modify lastNotification in the mongo database but that does not change it. Looks like the value 2021-01-20T18: 33: 39.000Z is cached.
The field lastFailureReason specifies the reason of the last notification failure associated to that subscription. The diagnose notification reception problems section in the documentation explains possible causes for "Timeout was reached". It makes sense to have this fail randomly if your network connection is somehow unstable.
With regards to timestamping in the future (no matter if it happens for lastNotification, lastFailure or lastSuccess) is pretty weird and probably not associated to Orion Context Broker operation. Orion takes timestamp from internal clock of the system where it is running, so maybe your system clock is set in the future.
Except if you run Orion Context Broker with -noCache (which in general is not recommended), subscriptions are cached. Thus, if you "hack" them in the DB you will not see the effect until next cache refresh (refreshes takes place at regular intervals, defined by -subCacheIval parameter).

Testing azure inline functions in policy definitions

I am currently working on implementing our naming policies using azure policy. I am having some issues with the match / equals / like operators. They seem to match even though I think they should not. For instance
"value": "[substring(field('name'), sub(length(field('name')), 6), 6)]",
"match": "Prod##"
matches autorb-PermProd-01 , and as far as I can understand, the index here starts at number six from the right, which means "rod-01" == "Prod##" . This just does not seem right to me? Also I wonder if there are ways to test these functions locally, as it takes forever to upload them and test in my sandbox environment.
Ok, so what I did wrong was; I misunderstood the effect. It does not match, but for some reason it lists as compliant when the resource is already made, and denies me making new resources when the effect is set to deny. Some more details
"value": "[substring(field('name'), sub(length(field('name')), 5), 5)]",
"match": "Dev##"
---- snipped for clarity ---
"then": {
"effect": "deny"
Will not trigger an alarm for autorb-permdev-01 when it is already made, but will deny it's creation.

Context Broker, ONTIMEINTERVAL subscribe immediatelly sends request to reference

The problem is even if I put condValues to PT10S, when I send request to contextBroker it requests back the reference url rigth away, not after 10 sec, and then it continues to send requests at 10 sec.
My question: is there a way to avoid the first initial request?
Here is a body of the request that I send to server where contextBroker is installed.
{
"entities": [{
"type": "Cycle",
"isPattern": "false",
"id": "someid"
}],
"attributes": [
...
],
"reference": "someurl"
"duration": "P1M",
"notifyConditions": [{
"type": "ONTIMEINTERVAL",
"condValues": [
"PT10S"
]
}]
}
At the present moment (Orion 1.1) initial notification cannot be avoided. However, being able to configure that behaviour would be an interesting feature to develop in the future and, consecuently, a github issue was created time ago about it.
In addition, note that ONTIMEINTERVAL subscriptions are no longer supported so you should avoid to use them:
ONTIMEINTERVAL subscriptions have several problems (introduce state in CB, thus making horizontal scaling configuration much harder, and makes it difficult to introduce pagination/filtering). Actually, they aren't really needed, as any use case based on ONTIMEINTERVAL notification can be converted to an equivalent use case in which the receptor runs queryContext at the same frequency (and taking advantage of the features of queryContext, such as pagination or filtering).
EDIT: the posibility of avoiding initial notification has been finally implemented at Orion. Details are at this section of the documentation. It is now in the master branch (so if you use fiware/orion:latest docker you will get it) and will be include in next Orion version (2.2.0).

PATCH multiple resources

Short: Is it standard-compliant, RESTful and otherwise good idea to enable PATCH requests to update a collection of resources, not just a single one, but still individually?
Long:
I'm considering exposing a method for enabling batch, atomic updates to my collection of resources. Example:
PATCH /url/myresources
[
{
"op": "add",
"path": "/1", // ID if the individual resource
"value":
{
... full resource representation ...
}
},
{
"op": "remove",
"path": "/2"
},
{
"op": "replace",
"path": "/3/name",
"value": "New name"
}
]
The context is a public API of a commercial solution. The benefits of allowing such PATCHes is the atomicity as well as batch-friendliness without spamming requests, handling failures individually etc.
I've consulted https://www.rfc-editor.org/rfc/rfc6902 and https://www.rfc-editor.org/rfc/rfc5789 but couldn't find a definitive answer if this is compliant. The RFCs mostly refer to "a resource", but a collection of resources could also be treated as such.
Is this a good idea? Are there better alternatives?
I like this idea. A collection is a resource, too. So acting on it is perfectly good REST.
The semantic of your PATCH request would be that every subresource not listed in the request body is to be left as it is. Every subresource that is listed is to be changed as described. Yes, that sounds good to me.
As long as every segment of the request can be executed in a single request, I see no problems. Both your "all in one" request and single requests like this would be fine.
PATCH /url/myresources/1
{
"op": "add",
"value":
{
... full resource representation ...
}
}