I have (JSON for) a resource like this at /api/foo/1/
{name: "foo", bar_pks: [10, 11, 12]}
Now I want to add an API for appending to bar_pks. There is no HTTP verb for appending I can find. If I do a patch to /api/foo/1/ with {bar_pks: [13]}, it will overwrite, not append.
Is there a verb more consistent with appending?
If I modify patch for this resource to always do an append, will it come to bite me later?
(I am using Django and Tastypie, but would prefer a language agnostic answer.)
Is there a compelling reason not to do the append on the client side and use a PUT/PATCH to send the updated value back to the server?
I see a couple of options if you're dead-set on doing this:
A POST to a new URI which gets interpreted as a an append to the
list.
A query parameter on the PATCH which indicates that the
changes should be appended rather than overwrite.
Neither of these are good options, and I do not endorse using them.
Related
In my karate tests i need to write response id's to txt files (or any other file format such as JSON), was wondering if it has any capability to do this, I haven't seen otherwise in the documentation. In the case of no, is there a simple JavaScript function to do so?
Try the karate.write(value, filename) API but we don't encourage it. Also the file will be written only to the current "build" directory which will be target for Maven projects / stand-alone JAR.
value can be any data-type, and Karate will write the bytes (or plain-text) out. There is no built-in support for any other format.
Here is an example.
EDIT: for others coming across this answer in the future the right thing to do is:
don't write files in the first place, you never need to do this, and this question is typically asked by inexperienced folks who for some reason think that the only way to "save" a response before validation is to write it to a file. No, please don't waste your time - and please just match against the response. You can save it (or parts of it) to variables while you make other HTTP requests. And do not write your tests so that scenarios (or features) depend on other scenarios, this is a very bad practice. Also note that by default, Karate will dump all HTTP requests and responses in the log file (typically in target/karate.log) and also in the HTML report.
see if karate.write() works for you as per this answer
write a custom Java (or JS function that uses the JVM) to do what you want using Java interop
Also note that you can use karate.toCsv() to convert JSON into CSV if needed.
My justification for writing to a file is a different one. I am using karate explicitly to implement a mock. I want to expose an endpoint wherein the upstream system will send some basic data through json payload using POST/PUT method and karate will construct the subsequent payload file and stores it the specific folder, and this newly created payload file will be exposed through another GET call.
A simple REST API:
GET: items/{id} - Returns a description of the item with the given id
PUT: items/{id} - Updates or Creates the item with the given id
DELETE: items/{id} - Deletes the item with the given id
Now the API-extension in question:
GET: items?filter - Returns all item ids matching the filter
PUT: items - Updates or creates a set of items as described by the JSON payload
[[DELETE: items - deletes a list of items described by JSON payload]] <- Not Correct
I am now being interested in the DELETE and PUT operation recycling functionality that can be easily accessed by PUT/DELETE items/{id}.
Question: Is it common to provide an API like this?
Alternative: In the age of Single Connection Multiple Requests issuing multiple requests is cheap and would work more atomic since a change either succeeds or fails but in the age of NOSQL database a change in the list might already have happend even if the request processing dies with internal server or whatever due to whatever reason.
[UPDATE]
After considering White House Web Standards and Wikipedia: REST Examples the following Example API is now purposed:
A simple REST API:
GET: items/{id} - Returns a description of the item with the given id
PUT: items/{id} - Updates or Creates the item with the given id
DELETE: items/{id} - Deletes the item with the given id
Top-resource API:
GET: items?filter - Returns all item ids matching the filter
POST: items - Updates or creates a set of items as described by the JSON payload
PUT and DELETE on /items is not supported and forbidden.
Using POST seems to do the trick as being the one to create new items in an enclosing resource while not replacing but appending.
HTTP Semantics POST Reads:
Extending a database through an append operation
Where the PUT methods would require to replace the complete collection in order to return an equivalent representation as quoted by HTTP Semantics PUT:
A successful PUT of a given representation would suggest that a subsequent GET on that same target resource will result in an equivalent representation being returned in a 200 (OK) response.
[UPDATE2]
An alternative that seems even more consistent for the update aspect of multiple objects seems to be the PATCH method. The difference between PUT and PATCH is described in the Draft RFC 5789 as being:
The difference between the PUT and PATCH requests is reflected in the way the server processes the enclosed entity to modify the resource identified by the Request-URI. In a PUT request, the enclosed entity is considered to be a modified version of the resource stored on the origin server, and the client is requesting that the stored version be replaced. With PATCH, however, the enclosed entity contains a set of instructions describing how a resource currently residing on the origin server should be modified to produce a new version. The PATCH method affects the resource identified by the Request-URI, and it also MAY have side effects on other resources; i.e., new resources may be created, or existing ones modified, by the application of a PATCH.
So compared to POST, PATCH may be also a better idea since PATCH allows an UPDATE where as POST only allows appending something meaning adding without the chance of modification.
So POST seems to be wrong here and we need to change our proposed API to:
A simple REST API:
GET: items/{id} - Returns a description of the item with the given id
PUT: items/{id} - Updates or Creates the item with the given id
DELETE: items/{id} - Deletes the item with the given id
Top-resource API:
GET: items?filter - Returns all item ids matching the filter
POST: items - Creates one or more items as described by the JSON payload
PATCH: items - Creates or Updates one or more items as described by the JSON payload
You could PATCH the collection, e.g.
PATCH /items
[ { id: 1, name: 'foo' }, { id: 2, name: 'bar' } ]
Technically PATCH would identify the record in the URL (ie PATCH /items/1 and not in the request body, but this seems like a good pragmatic solution.
To support deleting, creating, and updating in a single call, that's not really supported by standard REST conventions. One possibility is a special "batch" service that lets you assemble calls together:
POST /batch
[
{ method: 'POST', path: '/items', body: { title: 'foo' } },
{ method: 'DELETE', path: '/items/bar' }
]
which returns a response with status codes for each embedded requests:
[ 200, 403 ]
Not really standard, but I've done it and it works.
Updating Multiple Resources With One Request - Is it standard or to be avoided?
Well, sometimes you simply need to perform atomic batch operations or other resource-related operations that just do not fit the typical scheme of a simple REST API, but if you need it, you cannot avoid it.
Is it standard?
There is no universally accepted REST API standard so this question is hard to answer. But by looking at some commonly quoted api design guidelines, such as jsonapi.org, restfulapi.net, microsoft api design guide or IBM's REST API Conventions, which all do not mention batch operations, you can infer that such operations are not commonly understood as being a standard feature of REST APIs.
That said, an exception is the google api design guide which mentions the creation of "custom" methods that can be associated via a resource by using a colon, e.g. https://service.name/v1/some/resource/name:customVerb, it also explicitly mentions batch operations as use case:
A custom method can be associated with a resource, a collection, or a service. It may take an arbitrary request and return an arbitrary response, and also supports streaming request and response. [...] Custom methods should use HTTP POST verb since it has the most flexible semantics [...] For performance critical methods, it may be useful to provide custom batch methods to reduce per-request overhead.
So in the example case you provided you do the following according to google's api guide:
POST /api/items:batchUpdate
Also, some public APIs decided to offer a central /batch endpoint, e.g. google's gmail API.
Moreover, as mentioned at restfulapi.net, there is also the concept of a resource "store", in which you store and retrieve whole lists of items at once via PUT – however, this concept does not count for server-managed resource collections:
A store is a client-managed resource repository. A store resource lets an API client put resources in, get them back out, and decide when to delete them. A store never generates new URIs. Instead, each stored resource has a URI that was chosen by a client when it was initially put into the store.
Having answered your original questions, here is another approach to your problem that has not been mentioned yet. Please note that this approach is a bit unconventional and does not look as pretty as the typical REST API endpoint naming scheme. I am personally not following this approach but I still felt it should be given a thought :)
The idea is that you could make a distinction between CRUD operations on a resource and other resource-related operations (e.g. batch operations) via your endpoint path naming scheme.
For example consider a RESTful API that allows you to perform CRUD operations on a "company"-resource and you also want to perform some "company"-related operations that do not fit in the resource-oriented CRUD-scheme typically associated with restful api's – such as the batch operations you mentioned.
Now instead of exposing your resources directly below /api/companies (e.g. /api/companies/22) you could distinguish between:
/api/companies/items – i.e. a collection of company resources
/api/companies/ops – i.e. operations related to company resources
For items the usual RESTful api http-methods and resource-url naming-schemes apply (e.g. as discussed here or here)
POST /api/companies/items
GET /api/companies/items
GET /api/companies/items/{id}
DELETE /api/companies/items/{id}
PUT /api/companies/items/{id}
Now for company-related operations you could use the /api/companies/ops/ route-prefix and call operations via POST.
POST /api/companies/ops/batch-update
POST /api/companies/ops/batch-delete
POST /api/companies/ops/garbage-collect-old-companies
POST /api/companies/ops/increase-some-timestamps-just-for-fun
POST /api/companies/ops/perform-some-other-action-on-companies-collection
Since POST requests do not have to result in the creation of a resource, POST is the right method to use here:
The action performed by the POST method might not result in a resource that can be identified by a URI.
https://www.rfc-editor.org/rfc/rfc2616#section-9.5
As far as I understand the REST concept it covers an update of multiple resources with one request. Actually the trick here is to assume a container around those multiple resources and take it as one single resource. E.g. you could just assume that list of IDs identifies a resource that contains several other resources.
In those examples in Wikipedia they also talk about resources in Plural.
I'm brand new to Pentaho and I'm trying to do the following workflow:
read a bunch of lines out of a DB
do some transformations
POST them to a REST web service in JSON
I've got the first two figured out using an input step and the Json Output step.
However I have two problems doing the final step:
1) I can't get the JSON formatted how I want. It insists on doing {""=[{...}]} when I just want {...}. This isn't a big deal - I can work around this since I have control over the web service and I could relax the input requirements a bit. (Note: this page http://wiki.pentaho.com/display/EAI/JSON+output gives an example for the output I want by setting no. rows in a block=1 and an empty JSON block name, but it doesn't work as advertised.)
2) This is the critical one. I can't get the data to POST as JSON. It posts as key=value, where the key is the name I specify in the HTTP Post field name (on the 'Fields' tab) and the value is the encoded JSON. I just want to post the JSON as the request body. I've tried googling on this but can't find anyone else doing it, leading me to believe that I'm just approaching this wrong. Any pointers in the right direction?
Edit: I'm comfortable scripting (in Javascript or another language) but when I tried to use XmlHttpRequest in a custom javascript snippet I got an error that XmlHttpRequest is not defined.
Thanks!
This was trivial...just needed to use the REST Client (http://wiki.pentaho.com/display/EAI/Rest+Client) instead of the HTTP Post task. Somehow all my googling didn't discover that, so I'll leave this answer here in case someone else has the same problem as me.
You need to parse the JSON using a Modified JavaScript step. e.g. if the Output Value from the JSON Output is called result and its contents are {"data"=[{...}]}, you should call var plainJSON = JSON.stringify(JSON.parse(result).data[0]) to get the JSON.
In the HTTP Post step, the Request entity field should be plainJSON. Also, don't forget to add a header for Content-Type as application/json (you might have to add that as a constant)
I have a collection resource called Columns. A GET with Accept: application/json can't directly return a collection, so my representation needs to nest it in a property:-
{ "propertyName": [
{ "Id": "Column1", "Description": "Description 1" },
{ "Id": "Column2", "Description": "Description 2" }
]
}
Questions:
what is the best name to use for the identifier propertyName above? should it be:
d (i.e. is d an established convention or is it specific to some particular frameworks (MS WCF and MS ASP.NET AJAX ?)
results (i.e. is results an established convention or is it specific to some particular specifications (MS OData)?)
Columns (i.e. the top level property should have a clear name and it helps to disambiguate my usage of generic application/json as the Media Type)
NB I feel pretty comfortable that there should be something wrapping it, and as pointed out by #tuespetre, XML or any other representation would force you to wrap it to some degree anyway
when PUTting the content back, should the same wrapping in said property be retained [given that it's not actually necessary for security reasons and perhaps conventional JSON usage idioms might be to drop such nesting for PUT and POST given that they're not necessary to guard against scripting attacks] ?
my gut tells me it should be symmetric as for every other representation but there may be prior art for dropping the d/*results** [assuming that's the answer to part 1]*
... Or should a PUT-back (or POST) drop the need for a wrapping property and just go with:-
[
{ "Id": "Column1", "Description": "Description 1" },
{ "Id": "Column2", "Description": "Description 2" }
]
Where would any root-level metadata go if one wished to add that?
How/would a person crafting a POST Just Know that it needs to be symmetric?
EDIT: I'm specifically interested in an answer that with a reasoned rationale that specifically takes into account the impacts on client usage with JSON. For example, HAL takes care to define a binding that makes sense for both target representations.
EDIT 2: Not accepted yet, why? The answers so far don't have citations or anything that makes them stand out over me doing a search and picking something out of the top 20 hits that seem reasonable. Am I just too picky? I guess I am (or more likely I just can't ask questions properly :D). Its a bit mad that a week and 3 days even with an )admittedly measly) bonus on still only gets 123 views (from which 3 answers ain't bad)
Updated Answer
Addressing your questions (as opposed than going off on a bit of a tangent in my original answer :D), here's my opinions:
1) My main opinion on this is that I dislike d. As a client consuming the API I would find it confusing. What does it even stand for anyway? data?
The other options look good. Columns is nice because it mirrors back to the user what they requested.
If you are doing pagination, then another option might be something like page or slice as it makes it clear to the client, that they are not receiving the entire contents of the collection.
{
"offset": 0,
"limit": 100,
"page" : [
...
]
}
2) TBH, I don't think it makes that much difference which way you go for this, however if it was me, I probably wouldn't bother sending back the envelope, as I don't think there is any need (see below) and why make the request structure any more complicated than it needs to be?
I think POSTing back the envelope would be odd. POST should let you add items into the collection, so why would the client need to post the envelope to do this?
PUTing the envelope back could make sense from a RESTful standpoint as it could be seen as updating metadata associated with the collection as a whole. I think it is worth thinking about the sort of meta data you will be exposing in the envelope. All the stuff I think would fit well in this envelope (like pagination, aggregations, search facets and similar meta data) is all read only, so it doesn't make sense for the client to send this back to the server. If you find yourself with a lot of data in the envelope that the client is able to mutate - then it could be a sign to break that data out into a separate resource with the list as a sub collection. Rubbish example:
/animals
{
"farmName": "farm",
"paging": {},
"animals": [
...
]
}
Could be broken up into:
/farm/1
{
"id": 1,
"farmName": "farm"
}
and
/farm/1/animals
{
"paging": {},
"animals": [
...
]
}
Note: Even with this split, you could still return both combined as a single response using something like Facebook's or LinkedIn's field expansion syntax. E.g. http://example.com/api/farm/1?field=animals.offset(0).limit(10)
In response, to your question about how the client should know what the JSON payload they are POSTing and PUTing should look like - this should be reflected in your API documentation. I'm not sure if there is a better tool for this, but Swagger provides a spec that allows you to document what your request bodies should look like using JSON Schema - check out this page for how to define your schemas and this page for how to reference them as a parameter of type body. Unfortunately, Swagger doesn't visualise the request bodies in it's fancy web UI yet, but it's is open source, so you could always add something to do this.
Original Answer
Check out William's comment in the discussion thread on that page - he suggests a way to avoid the exploit altogether which means you can safely use a JSON array at the root of your response and then you need not worry about either of you questions.
The exploit you link to relies on your API using a Cookie to authenticate a user's session - just use a query string parameter instead and you remove the exploit. It's probably worth doing this anyway since using Cookies for authentication on an API isn't very RESTful - some of your clients may not be web browsers and may not want to deal with cookies.
Why Does this fix work?
The exploit is a form of CSRF attack which relies on the attacker being able to add a script tag on his/her own page to a sensitive resource on your API.
<script src="http://mysite.com/api/columns"></script>
The victims web browser will send all Cookies stored under mysite.com to your server and to your servers this will look like a legitimate request - you will check the session_id cookie (or whatever your server-side framework calls the cookie) and see the user is authenticated. The request will look like this:
GET http://mysite.com/api/columns
Cookie: session_id=123456789;
If you change your API you ignore Cookies and use a session_id query string parameter instead, the attacker will have no way of tricking the victims web browser into sending the session_id to your API.
A valid request will now look like this:
GET http://mysite.com/api/columns?session_id=123456789
If using a JavaScript client to make the above request, you could get the session_id from a cookie. An attacker using JavaScript from another domain will not be able to do this, as you cannot get cookies for other domains (see here).
Now we have fixed the issue and are ignoring session_id cookies, the script tag on the attackers website will still send a similar request with a GET line like this:
GET http://mysite.com/api/columns
But your server will respond with a 403 Forbidden since the GET is missing the required session_id query string parameter.
What if I'm not authenticating users for this API?
If you are not authenticating users, then your data cannot be sensitive and anyone can call the URI. CSRF should be a non-issue since with no authentication, even if you prevent CSRF attacks, an attacker could just call your API server side to get your data and use it in anyway he/she wants.
I would go for 'd' because it clearly separates the 'envelope' of your resource from its content. This would also make it easier for consumers to parse your responses, as opposed to 'guessing' the name of the wrapping property of a given resource before being able to access what it holds.
I think you're talking about two different things:
POST request should be sent in application/x-www-form-urlencoded. Your response should basically mirror a GET if you choose to include a representation of the newly created resource in your reply. (not mandatory in HTTP).
PUTs should definitely be symmetric to GETs. The purpose of a PUT request is to replace an existing resource representation with another. It just makes sense to have both requests share the same conventions, doesn't it?
Go with 'Columns' because it is semantically meaningful. It helps to think of how JSON and XML could mirror each other.
If you would PUT the collection back, you might as well use the same media type (syntax, format, what you will call it.)
I am just confused with the these terms. Can anybody please provide/explain me brief with an example?
Ajax - "Asynchronous Javascript and XML". Ajax loosely defines a set of technologies to help make web applications present a richer user experience. Data updating and refreshing of the screen is done asynchronously using javascript and xml (or json or just a normal http post).
JSON - "Javascript Object Notation". JSON is like xml in that it can be used to describe objects, but it's more compact and has the advantage of being actual javascript. An object expressed in JSON can be converted into an actual object to be manipulated in javascript code.
By default, Ajax requests have to occur in the same domain of the page where the request originates. JSONP - "JSON with padding" - was created to allow you to request JSON resources from a different domain. (CORS is a newer and better alternative to JSONP.)
REST - "Representational State Transfer". Applications using REST principles have a Url structure and a request/response pattern that revolve around the use of resources. In a pure model, the HTTP Verbs Get, Post, Put and Delete are used to retrieve, create, update and delete resources respectively. Put and Delete are often not used, leaving Get and Post to map to select (GET) and create, update and delete (POST)
Ajax, or more properly, AJAX, stands for Asynchronous Javascript And Xml. Technically it refers to any asynchronous request made by the browser (anything that uses an XmlHttpRequest) on behalf of some script running on the current page, regardless of what content-type is returned. It can also be used to describe a certain pattern of constructing a page/site where most/all of the content is fetched/updated dynamically on the page. When used to describe a data format, "ajax" typically means "xml".
JSON is a data encoding format. The name itself is an acronym for "JavaScript Object Notation". JSON-formatted data looks like:
{"key": "value1", "key2": {"number": 1, "array": [0, 1, 2]}}
JSON data may be fetched by an AJAX request, though it is quite commonly used in other contexts as a lightweight, extensible, and easy-to-parse data exchange format.
JSONP is simply JSON-formatted data wrapped in a callback function. The "P" stands for "with Padding", which is kind of stupid unless you like to think of function calls as "padding". In any case, JSONP data will look like:
someFunction({"key": "value1", "key2": {"number": 1, "array": [0, 1, 2]}});
As such, JSONP is really just a JavaScript snippet, and unlike JSON is not used outside of the context of JavaScript, browsers (or other JavaScript-capable clients), and AJAX requests. The reason for using JSONP is that it allows the same-origin policy to be subverted. A script that was sourced in from site X cannot make a direct request to site Y if site Y is on a different domain from site X. But if site Y's server can send JSONP-formatted responses, then the script from site X can add a new <script> tag to the document that references a URL on site Y, and when the response from site Y is loaded it will invoke some callback function that script X has defined in the document, thus giving script X access to data that was loaded dynamically from site Y.
Note that JSONP data is not (typically) requested using an XmlHttpRequest. It can be done this way, subject to the standard caveats of the same-origin policy, but then you lose the cross-domain magic that makes JSONP useful in the first place.
REST is simply the formal spec/description of how HTTP actually works/is intended to be used. If you understand the concept of a URL being used to request a corresponding resource from a server and the difference between Get and Post then you really know all you need to about REST.
Ajax stand for Asynchronous JavaScript and Xml/XhttpRequet (the X depends and changed since mostly json is used today.
It's a way to execute a request from the page using javascript to the server and receive some response. This response can be anything, json, xml, text, html, etc...
This makes pages look very responsive without having to reload the complete page to execute actions on it. For example posting this answer to your questions. :-)
Json is a data format stands for JavaScrip Object Notation. It's a lighter serialization format than xml and has the advantage to be JavaScript.
JsonP is the next and logical step to using Ajax with Json.
A server will response with JSONP wrapping the Json object in a callback function. The name of the function is passed by the client to the server, usually as a parameter in the querystring.
The P stands for padding, since the server surrounds the json object with the name of the function and the object as an argument.
callback({"name":"my name"});
See: http://en.wikipedia.org/wiki/JSONP for a more detailed explanation.