I have a problem deciding what to do in this case of REST API design.
Here is my problem: I have a resource domain model which has a nested object that is also a domain model.
You can imagine something like this:
{
  "name": "abc",
  "type": {
    "name": "typeName",
    "description": "description"
  }
}
Now I want to be able to fetch the outer model resources based on the inner model and a few more params.
For example, I want to fetch all outer model resources that have a given type, plus some params like page number, size, etc.
So my questions:
1. Should the API accept the inner model in a POST and return outer model resources? Is that good REST design?
2. How do I pass the extra params? It's a POST, so I can't put them in the URL, and I can't put them in the inner model.
3. Should I create a new model which contains these extra params as well as the type resource, like this?
{
  "page": "10",
  "type": {
    "name": "typeName",
    "description": "description"
  }
}
If you are making a generic HTTP service, it's acceptable to use POST to send a complex query, and to get a response.
If you are trying to be RESTful, then this is a bad practice. You have two options. Option 1 is to find a way to encode your query in the URL, so you can use a GET request.
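For option 1, with your example model, that could look something like the request below (the parameter names are just a suggestion, not an established convention): flatten the part of the inner model you actually filter on into query parameters, and put the paging parameters next to them.
GET /outer-models?typeName=typeName&page=10&size=20
Accept: application/json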
Option 2 is more involved. I wouldn't necessarily say I recommend it, but it's a way to get around the constraints of REST while still supporting complex queries.
The idea is that you use POST to create a 'query' resource, almost as if you were doing a server-side prepared statement, and then later use GET to get the result of the query.
Example of the client->server conversation:
POST /queries
Content-Type: application/json
...
A response to this might be:
HTTP/1.1 201 Created
Location: /queries/1234
Link: </queryresults/1234>; rel="some-relationship-identifier"
Then after that you could do a GET on /queries/1234 to see the query you 'prepared' and a GET on /queryresults/1234 to see the actual report.
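To make that second step concrete, the result fetch could look something like this (the response body and its field names are assumptions for illustration, not part of the pattern itself):
GET /queryresults/1234
Accept: application/json

HTTP/1.1 200 OK
Content-Type: application/json

{
  "items": [
    {
      "name": "abc",
      "type": { "name": "typeName", "description": "description" }
    }
  ],
  "page": 1
}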
The benefits of this are that it stays within the constraints of REST, and that you could potentially re-use queries and take longer to generate the results.
The obvious drawback is that it's harder to explain to a consumer of your API, as not everyone is familiar with this pattern, and it costs an extra HTTP request.
So you have to decide:
Is it worth doing this?
Can you encode the query in the URI instead, to avoid this altogether?
Maybe you don't care enough about being RESTful and you might just want to break the rules and use POST for some complex queries.
How to properly design REST if you have a composition? I have a TestResult entity, which has TestCaseResults entities. Both support the full set of REST methods. The important fact about this (which I believe differs from many examples I found on the web) is that a TestResult is not consistent if it doesn't have all of its TestCaseResults. How do I properly design this in REST?
Let's say I create it as separate but dependent resources: api\testresults\ and api\testresults\1\testcaseresults. When the client wants to create a test result, it needs to POST to api\testresults, then retrieve the URL api\testresults\1\testcaseresults from a link in the response, and POST all of the test case results to it. This means that at some point in time the test result is not consistent, until the client finishes its operation. Basically, there is no concept of a transaction here.
Let's say I create only the api\testresults resource and embed an array of test case results inside, like this:
{
  "Name": "Test A",
  "Results": [
    {
      "Measured": "BB",
      ...
    },
    ...
  ],
  ...
}
Then it is easier to insert, but it is still hard to work with. A simple GET to api\testresults\1\ will retrieve the test result with a large number of test case results. A GET to api\testresults\ will retrieve much more! The structure becomes complex. Furthermore, in the real world I have a few entities like TestCaseResults that belong to TestResults, so there will be a few such arrays, and each could have 100-200 elements.
I could try to combine the approaches: embed the array, but also provide links to api\testresults\1\testcaseresults and support operations there as well. Maybe on GET api\testresults\1\ I could provide the TestResult without its TestCaseResults, only with a link pointing to that resource, but on POST I could accept an array of TestCaseResults embedded (though I'm not sure it is allowed to have different representations for POST and GET in REST). But now there are two ways of inserting information, which is confusing, and I'm still not sure it solves anything.
Your approach with api\testresults\1 and api\testresults\1\testcaseresults seems promising.
As JSON does not have a fixed structure, you can add query parameters to your URL to control whether the test case results are embedded or not.
api\testresults\1?with_results=true would mean that your caller wants to see the test case results in addition to the test result.
api\testresults\1\testcaseresults would still return the test case results for your test 1.
If you fear that the number of test case results is too large, you can add pagination parameters, which would be reused in the testcaseresults call.
api\testresults\1?with_results=true&per_page=10 would include only the first 10 results. To get more, use api\testresults\1\testcaseresults?per_page=10&page=2 and so on, as it is the dedicated endpoint.
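To illustrate, a sketch of what the embedded variant could return (the field names are assumptions; use whatever convention your API already follows). The Results array would hold only the first per_page items, and a link (for example in a Link header or a dedicated field) would point at api\testresults\1\testcaseresults?per_page=10&page=2 for the rest:
GET api\testresults\1?with_results=true&per_page=10

{
  "Name": "Test A",
  "Results": [
    { "Measured": "BB" }
  ]
}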
Cheers
Note: if you want a flexible API that still returns JSON data, you can take a look at GraphQL, the trendy approach.
This issue has been bugging me for some time now. To test it I just installed a fresh Apigility, set up the db (PDO:mysql) and added a DB-Connected service. In the table I have 40+ records. When I make a GET collection request the response looks OK (with the default HAL content negotiation). Then I change the content negotiation to JSON. Now when I make a GET collection request my response contains only 10 elements.
So my question is: where do I set/change this limit?
You can set the page size manually, like so:
$paginator = $this->getAlbumTable()->fetchAll(true);
// set the current page to what has been passed in query string, or to 1 if none set
$paginator->setCurrentPageNumber((int) $this->params()->fromQuery('page', 1));
// set the number of items per page to 10
$paginator->setItemCountPerPage(10);
http://framework.zend.com/manual/current/en/tutorials/tutorial.pagination.html
Could you please post the page_size and total_items part at the end of the JSON output?
It looks like:
"page_count": 140002,
"page_size": 25,
"total_items": 3500035,
"page": 1
This is not an ideal fix, because it requires you to go into the source code rather than using the page size given in the UI.
The collection class that is auto-generated for you by the DB-Connected style derives from Zend\Paginator\Paginator. This class defines the static protected member $defaultItemCountPerPage, which defaults to 10. That's why you're only getting 10 results. If you open up the auto-generated collection class for your entity and add protected static $defaultItemCountPerPage = 100; to the otherwise empty class, you will see that you now get up to 100 results in the response. You can look at the other Paginator class variables and methods that you could override in your derived class to get your desired behavior.
This is not an ideal solution. I'd prefer that the generated code automatically used the same configured page size that the HalJson strategy uses. Maybe I'll contribute a PR to change that. Or maybe I'll just use the HalJson approach; it does seem like the better way to go. You should have some limit on how much data you load from the DB at a time, so you don't end up with an overly long-running query or an overly large collection of data coming back that you have to deal with. And whatever limit you set, what do you do when you hit it? With the simple JSON method, you can't ever get "page 2" of data. So, if you are going to work with a sizeable amount of data, it might be better to use HalJson and then have some logic on the client side to grab pages of data as needed. The returned JSON structure is a little more complicated, but not terribly so.
I'm probably in the same spot you are: I'm trying to build a simple little API to play with, and I wanted to keep everything simple, so I didn't want the client to have to deal with the extra structure in HalJson. But it's probably better to deal with that complexity and have a smooth way to page through data if you're going to use this with a real data set. At least, that's the pep talk I'm giving myself right now. :-)
I have a collection of IDs of RESTful resources (all the same type of resource), the number of which can be indefinitely large. I want to make a REST call to get the names of these resources. Something like this:
Send:
['005fc983-fe41-43b5-8555-d9a2310719cd', '4c6e6898-e519-4bac-b03e-e8873d3fa3f0',...]
Receive:
['Resource A', 'Resource B',...]
What is the best way to retrieve the names of these resources RESTfully?
Here are the ideas I have had and the problems I see with each approach:
The naive approach is to iterate through all IDs in my collection and do a 'GET /resource/:id' for each ID. This would be prohibitively slow and resource intensive because of the large number of HTTP calls I would have to make.
The next approach I thought of is to pass the IDs as parameters to a single GET call. The problem here is that most servers have a limit on the URL length, which would be quickly exceeded.
Next, I thought that putting the IDs in the body of a GET would work, but according to Roy Fielding, data in the GET body should not affect the results of a REST call: HTTP GET with request body
I could use a POST request and put the data on the POST body, but POST is intended for creating and modifying resources, which is not what I'm doing. Maybe I should ignore the intent of the verb and use it anyway?
I could split the request into multiple GET requests to avoid exceeding the max URL length. The problem here is that I have to combine the results after all calls have returned, which is potentially slow.
I could create a collection resource within my main resource by posting my list of IDs to 'POST /resource/collection', then use a 'GET /resource/collection/:id' call to retrieve the results. This actually works, but then I have to do a 'DELETE /resource/collection/:id' to clean up. It takes multiple calls, requires cleanup, and seems a bit clunky overall, so it's okay, but not ideal.
Is there a better way to do this?
Your last approach is RESTful and the one I recommend. I'd do this:
Step 1:
Request:
POST /resource/collection
Content-Type: application/json
{
  "ids": [
    "005fc983-fe41-43b5-8555-d9a2310719cd",
    "4c6e6898-e519-4bac-b03e-e8873d3fa3f0"
  ]
}
Response:
201 Created
Location: /resource/collection/89AB8902-FDF1-11E4-ADDF-CD4FB664A5DC
Step 2:
Request:
GET /resource/collection/89AB8902-FDF1-11E4-ADDF-CD4FB664A5DC
Response:
200 OK
Content-Type: application/json
{
"resources": [ ... ]
}
but then I have to do a 'DELETE /resource/collection/:id' to clean up.
No, that is not necessary. The server could implement a job that removes all collections older than a specific timestamp. It is not the client that has to do this.
If a client later accesses the collection again, the server would respond with:
410 Gone
When handling POST, PUT, and PATCH requests on the server-side, we often need to process some JSON to perform the requests.
It is obvious that we need to validate these JSONs (e.g. structure, permitted/expected keys, and value types) in some way, and I can see at least two ways:
Upon receiving the JSON, validate it upfront as it is, before doing anything else with it to complete the request.
Take the JSON as it is, start processing it (e.g. access its various key-values) and try to validate it on the go while performing the business logic, possibly using some exception handling to deal with invalid data.
The first approach seems more robust than the second, but probably more expensive in time, because every request will be validated (and hopefully most requests are valid, so the validation is mostly redundant).
The second approach may save the compulsory validation on valid requests, but mixing the checks into the business logic might be buggy or even risky.
Which of the two above is better? Or, is there yet a better way?
What you are describing with POST, PUT, and PATCH sounds like you are implementing a REST API. Depending on your back-end platform, you can use libraries that map JSON to objects, which is very powerful and performs much of that validation for you. In Java, you can use Jersey, Spring, or Jackson. If you are using .NET, you can use Json.NET.
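For example, a minimal sketch of the mapping approach with Jackson (the class and field names are made up for illustration; Jersey and Spring can do this binding for you as part of request handling):
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

public class CustomerRequest {
    public String name;
    public int age;

    // Binds the raw JSON body to a typed object, or returns null if it is invalid.
    public static CustomerRequest parse(String jsonBody) {
        try {
            // readValue fails if the JSON is malformed or the value types don't match
            return new ObjectMapper().readValue(jsonBody, CustomerRequest.class);
        } catch (JsonProcessingException e) {
            return null; // reject the request here, e.g. with 400 Bad Request
        }
    }
}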
If efficiency is your goal and you want to validate every single request, it would be ideal if you could also validate on the front end; if you are using JavaScript, you can use json2.js.
In regards to comparing your methods, here is a pros and cons list.
Method #1: Upon Request
Pros
The business logic integrity is maintained. As you mentioned, trying to validate while processing the business logic could flag valid data as invalid (and vice versa), and the validation could inadvertently impact the business logic negatively.
As Norbert mentioned, catching the errors beforehand will improve efficiency. The logical question this poses is: why spend the time processing if the data has errors in the first place?
The code will be cleaner and easier to read. Having validation and business logic separated will result in cleaner code that is easier to read and maintain.
Cons
It could result in redundant processing, meaning longer computing time.
Method #2: Validation on the Go
Pros
Theoretically it's efficient, since validation and processing happen at the same time, saving some compute time.
Cons
In reality, the processing time that is saved is likely negligible (as mentioned by Norbert). You are still doing the validation check either way. In addition, processing time is wasted if an error is found.
The data integrity can be compromised. It is possible that the data ends up corrupted when it is processed this way.
The code is not as clear. When reading the business logic, it may not be as apparent what is happening because validation logic is mixed in.
What it really boils down to is accuracy vs. speed. They generally have an inverse relationship: as you become more accurate and validate your JSON more thoroughly, you may have to compromise some on speed. This is really only noticeable on large data sets, as computers are very fast these days. It is up to you to decide what is more important, given how accurate you think your data will be when you receive it and whether that extra second or so is crucial. In some cases it does matter (e.g. in stock market and healthcare applications, milliseconds matter) and both are highly important. In those cases, as you increase one (for example accuracy), you may have to recover the other (speed) by getting a more performant machine.
Hope this helps.
The first approach is more robust, but it does not have to be noticeably more expensive. It becomes even cheaper when you are able to abort processing on errors: your business logic usually takes >90% of the resources in a request, so if around 10% of requests are invalid, you are already resource neutral. If you optimize the validation so that the checks the business process would perform anyway are moved upfront, your error rate can be much lower (say 1 in 20 to 1 in 100) and you still stay resource neutral.
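To put rough numbers on that (these figures just restate the assumption above, they are not measurements): if validation costs about 10% of a request and the business logic about 90%, validating everything upfront adds at most ~10% to every valid request, while every invalid request rejected upfront skips the ~90% business-logic cost. Break-even is therefore roughly where 0.10 = error rate × 0.90, i.e. an error rate of about 10-11%.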
For an example of an implementation that assumes upfront data validation, look at GSON (https://code.google.com/p/google-gson/):
GSON works as follows: every part of the JSON can be mapped onto an object, and this object is typed or contains typed data.
Sample object (Java used as the example language):
public class someInnerDataFromJSON {
    String name;
    String address;
    int housenumber;
    String buildingType;

    // Getters and setters
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    // etc.
}
Because GSON parses the data into the model provided, the data is already type checked.
This is the first point where your code can abort.
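A minimal sketch of that abort point, assuming the model class above, GSON on the classpath, and a wrapper class name of my own choosing:
import com.google.gson.Gson;
import com.google.gson.JsonSyntaxException;

public class JsonEntryPoint {
    // Returns the typed object, or null if the JSON does not fit the model.
    static someInnerDataFromJSON parse(String jsonString) {
        try {
            return new Gson().fromJson(jsonString, someInnerDataFromJSON.class);
        } catch (JsonSyntaxException e) {
            // malformed JSON or a type mismatch (e.g. housenumber is not a number): abort here
            return null;
        }
    }
}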
After this exit point, assuming the data conformed to the model, you can validate whether the data is within certain limits. You can also write that into the model.
Assume for this example that buildingType is restricted to a list of values:
Single family house
Multi family house
Apartment
You can check the data during parsing by creating a setter which checks it, or you can check it after parsing, as a first step of applying your business rules. The benefit of checking the data first is that your later code needs less exception handling, so there is less code and it is easier to understand.
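As a sketch of the 'check it after parsing' variant for buildingType (the method and constant names are just for illustration, added to the model class above):
// additions to someInnerDataFromJSON (java.util.Arrays and java.util.List imported at the top of the file):
private static final List<String> ALLOWED_BUILDING_TYPES =
        Arrays.asList("Single family house", "Multi family house", "Apartment");

public void validateBuildingType() {
    // first business-rule check, run right after GSON has produced the typed object
    if (!ALLOWED_BUILDING_TYPES.contains(buildingType)) {
        throw new IllegalArgumentException("Unknown buildingType: " + buildingType);
    }
}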
I would definitely go for validation before processing.
Let's say you receive some JSON data with 10 variables, of which you expect:
the first 5 variables to be of type string
6 and 7 are supposed to be integers
8, 9 and 10 are supposed to be arrays
You can do a quick variable type validation before you start processing any of this data and return a validation error response if one of the ten fails.
foreach ($data as $varName => $varValue) {
    $varType = gettype($varValue);
    if (!$this->isTypeValid($varName, $varType)) {
        // return a validation error response
    }
}
// continue processing
Think of the scenario where you are directly processing the data and the 10th value turns out to be of an invalid type. The processing of the previous 9 variables was a waste of resources, since you end up returning a validation error response anyway. On top of that, you have to roll back any changes already persisted to your storage.
I only check variable types in my example, but I would suggest full validation (length, max/min values, etc.) of all variables before processing any of them.
In general, the first option would be the way to go. The only reason you might need to consider the second option is if you were dealing with JSON data of tens of MB or more.
In other words, only if you are trying to stream JSON and process it on the fly do you need to think about the second option.
Assuming that you are dealing with a few hundred KB at most per JSON document, you can just go for option one.
Here are some steps you could follow:
1. Go for a JSON parser like GSON that converts your entire JSON input into the corresponding Java domain model object. (If GSON doesn't throw an exception, you can be sure that the JSON is syntactically valid.)
2. Of course, the objects constructed by GSON in step 1 may not be in a functionally valid state. For example, functional checks like mandatory fields and limit checks still have to be done. For this, you could define a validateState method which recursively validates the state of the object itself and its child objects.
Here is an example of a validateState method:
public void validateState() {
    // Assume this validateState is part of the Customer class.
    if (age < 12 || age > 150)
        throw new IllegalArgumentException("Age should be in the range 12 to 150");
    if (age < 18 && (guardianId == null || guardianId.trim().equals("")))
        throw new IllegalArgumentException("Guardian id is mandatory for minors");
    for (Account a : getAccounts()) {
        a.validateState(); // Throws appropriate exceptions if there is any inconsistency in state
    }
}
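Putting the two steps together might look like this (the Customer class and the jsonBody variable are assumed from the snippets above):
Customer customer = new Gson().fromJson(jsonBody, Customer.class); // step 1: structural/type validation
customer.validateState();                                          // step 2: functional validation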
The answer depends entirely on your use case.
If you expect all calls to originate from trusted clients, then the upfront schema validation should be implemented so that it is activated only when you set a debug flag.
However, if your server delivers public api services then you should validate the calls upfront. This isn't just a performance issue - your server will likely be scrutinized for security vulnerabilities by your customers, hackers, rivals, etc.
If your server delivers private api services to non-trusted clients (e.g., in a closed network setup where it has to integrate with systems from 3rd party developers), then you should at least run upfront those checks that will save you from getting blamed for someone else's goofs.
It really depends on your requirements. But in general I'd always go for #1.
A few considerations:
For consistency I'd use method #1, for performance #2. However, when using #2 you have to take into account that rolling back in the case of invalid input may become complicated in the future, as the logic changes.
JSON validation should not take that long. In Python you can use ujson for parsing JSON strings, which is an ultra-fast C implementation of the json Python module.
For validation, I use the jsonschema Python module, which makes JSON validation easy.
Another approach:
If you use jsonschema, you can validate the JSON request in steps. I'd perform an initial validation of the most common/important parts of the JSON structure, and validate the remaining parts along the business logic path. This allows you to write simpler and therefore more lightweight JSON schemas.
The final decision:
If (and only if) this decision is critical, I'd implement both solutions, time-profile them under correct and wrong input conditions, and weight the results by the frequency of wrong input. Therefore:
m1c = average time spent with method 1 on correct input
m1w = average time spent with method 1 on wrong input
m2c = average time spent with method 2 on correct input
m2w = average time spent with method 2 on wrong input
CR  = correct input rate (or frequency)
WR  = wrong input rate (or frequency)

if (m1c * CR) + (m1w * WR) <= (m2c * CR) + (m2w * WR):
    choose method 1
else:
    choose method 2
Hi, I'm new to CouchDB. It looks great so far, but I'm really struggling with what must be something simple to do!
I have documents structured as:
{
  "_id": "245431e914ce42e6b2fc6e09cb00184d",
  "_rev": "3-2a69f0325962b93c149204aa3b1fa683",
  "type": "student",
  "studentID": "12345678",
  "Name": "Test",
  "group": "A"
}
And I would like to access them with queries such as http://couchIP/student?group=A or something like that. Are views what I need here? I don't understand how to use the query parameter inside the map functions of views. For example:
function(doc, req) {
  if (req.group === 'A') {
    emit(doc.id, doc.name);
  }
}
Is my understanding of how Couch works wrong, or what is my problem here? Thanks in advance; I'm sure this is Couch 101.
I've already read through http://guide.couchdb.org/ but it didn't really answer the question!
You need views to achieve the desired results.
Define the following map function inside a view of a design document (let's name the view "byGroup" and assume it lives in a design document named "_design/students"):
function(doc) {
  if (doc.group) {
    emit(doc.group, null);
  }
}
Results can be obtained from the following URL:
http://couchIP:5984/dbname/_design/students/_view/byGroup?startkey="A"&endkey="A"&include_docs=true
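As an aside, for an exact match on a single group you can also use the key parameter instead of the startkey/endkey pair (a standard CouchDB view query option):
http://couchIP:5984/dbname/_design/students/_view/byGroup?key="A"&include_docs=true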
To get friendlier URLs, CouchDB also provides URL rewriting options.
You need to do some further reading about views and the fact that they return key/value pairs.
It's not clear what you want to return from the view, so I'll guess. If you want to return the whole document, you'd create a view like:
function (doc) { emit(doc.group, doc); }
This will emit the group name as a key which you can look up against; the whole doc will be returned as the value when you look it up.
If you just want access to the names of those users, you'd do something like:
function (doc) { emit(doc.group, doc.Name); }
Your question arises from a misconception about what a view does. Views use map/reduce to generate a representation of your data. You have no control over the output of your view at query time, because the view is updated only according to changes in your DB documents.
Using a list function is also not a good option. It may seem that you can use knowledge of the request in your list to generate different output depending on the query parameters, but this is wrong: CouchDB uses ETags for caching, which means that most of the time you will get the same answer regardless of your list parameters, since the underlying documents won't have changed. There is a trick to fool CouchDB in this case (it involves using two different alternating users), but I wouldn't even try it, because there are surely easier ways to achieve your objective; you can probably solve your problem by using group as a key in your map function.