I have a collection of IDs of RESTful resources (all the same type of resource), the number of which can be indefinitely large. I want to make a REST call to get the names of these resources. Something like this:
Send:
['005fc983-fe41-43b5-8555-d9a2310719cd', '4c6e6898-e519-4bac-b03e-e8873d3fa3f0',...]
Receive:
['Resource A', 'Resource B',...]
What is the best way to retrieve the names of these resources RESTfully?
Here are the ideas I have had and the problems I see with each approach:
The naive approach is to iterate through all IDs in my collection and do a 'GET /resource/:id' for each ID. This would be prohibitively slow and resource intensive because of the large number of HTTP calls I would have to make.
The next approach I thought of is to pass the IDs as parameters to a single GET call. The problem here is that most servers have a limit on the URL length, which would be quickly exceeded.
Next, I thought that putting the IDs in the body of a GET would work, but according to Roy Fielding, data in the GET body should not affect the results of a REST call: HTTP GET with request body
I could use a POST request and put the data on the POST body, but POST is intended for creating and modifying resources, which is not what I'm doing. Maybe I should ignore the intent of the verb and use it anyway?
I could split the request into multiple GET requests to avoid exceeding the max URL length. The problem here is that I have to combine the results after all calls have returned, which is potentially slow.
I could create a collection resource within my main resource by posting my list of IDs to 'POST /resource/collection', then use a 'GET /resource/collection/:id' call to retrieve the results. This actually works, but then I have to do a 'DELETE /resource/collection/:id' to clean up. It takes multiple calls, requires cleanup, and seems a bit clunky overall, so it's okay, but not ideal.
Is there a better way to do this?
Your last approach is RESTful and the one I recommend. I'd do this:
Step 1:
Request:
POST /resource/collection
Content-Type: application/json
{
"ids": [
"005fc983-fe41-43b5-8555-d9a2310719cd",
"4c6e6898-e519-4bac-b03e-e8873d3fa3f0"
]
}
Response:
201 Created
Location: /resource/collection/89AB8902-FDF1-11E4-ADDF-CD4FB664A5DC
Step 2:
Request:
GET /resource/collection/89AB8902-FDF1-11E4-ADDF-CD4FB664A5DC
Response:
200 OK
Content-Type: application/json
{
"resources": [ ... ]
}
but then I have to do a 'DELETE /resource/collection/:id' to clean up.
No, that is not necessary. The server could implement a job that removes all collections that are older than a specific timestamp. It is not the client who has to do this.
If a client later accesses a collection that has already been removed, the server would respond with
410 Gone
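To make the flow above concrete, here is a minimal in-memory sketch of the server side, assuming a one-hour retention period. The CollectionStore class, its method names, and the TTL value are all hypothetical and only illustrate the POST / GET / server-side purge / 410 sequence; a real service would sit behind an HTTP framework and a persistent store.

```python
import time
import uuid

COLLECTION_TTL_SECONDS = 3600  # assumed retention period, not from the answer


class CollectionStore:
    """In-memory stand-in for the server side of /resource/collection."""

    def __init__(self, names_by_id, ttl=COLLECTION_TTL_SECONDS):
        self.names_by_id = names_by_id  # resource id -> resource name
        self.ttl = ttl
        self.collections = {}           # collection id -> (created_at, ids)
        self.expired = set()            # ids of collections already purged

    def post(self, ids, now=None):
        """POST /resource/collection -> (201, Location)."""
        now = time.time() if now is None else now
        collection_id = str(uuid.uuid4())
        self.collections[collection_id] = (now, list(ids))
        return 201, f"/resource/collection/{collection_id}"

    def purge(self, now=None):
        """Server-side cleanup job: drop collections older than the TTL."""
        now = time.time() if now is None else now
        for cid, (created_at, _) in list(self.collections.items()):
            if now - created_at > self.ttl:
                del self.collections[cid]
                self.expired.add(cid)

    def get(self, location):
        """GET /resource/collection/:id -> (status, body)."""
        cid = location.rsplit("/", 1)[-1]
        if cid in self.expired:
            return 410, None            # removed by the cleanup job
        if cid not in self.collections:
            return 404, None            # never existed
        _, ids = self.collections[cid]
        return 200, {"resources": [self.names_by_id[i] for i in ids]}
```

The client only ever calls post and get; purge would run on a schedule on the server, which is why no DELETE from the client is needed.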
I am using the REST API of Google Fit. I want to list sessions with the fitness.users.sessions.list method. This gives me a few dozen of results.
Now I would like to get more results, and for this I set the pageToken to the value I got from the previous response. But the new results do not contain any data points, just yet another pageToken:
{
"session": [
],
"deletedSession": [
],
"nextPageToken": "1541027616563"
}
The same happens when I use the pagination function of the Google Python API Client: I iterate on results but never get any new data.
request = self.service.users().sessions().list(userId='me')
while request is not None:
    response = request.execute()
    for ds in response['session']:
        yield ds
    request = self.service.users().sessions().list_next(request, response)
I am sure there is much(!) more session data in Google Fit for my account. Am I missing something regarding pagination?
Thanks
I think that the description of the pageToken parameter is actually rather confusing in the documentation (this answer was written prior to the documentation being updated).
The continuation token, which is used to page through large result sets. To get the next page of results, set this parameter to the value of nextPageToken from the previous response.
This is conflating two concepts: continuation, and paging. There isn't actually any paging in the implementation of Users.sessions.
Sessions are indexed by their modification timestamp. There are two (or three, depending on how you count) ways to interact with the API:
Pass a start and/or end time. Omitted start and end times are taken to be the start and end of time respectively. In this case, you will get back all sessions falling between those times.
Pass neither start nor end times. In this case, you will receive all sessions between some time in the past and now. That time is:
pageToken, if provided
Otherwise, it's 7 days ago (this doesn't actually appear in the documentation, but it is the behavior)
In any of these cases, you receive a nextPageToken back which is just after the most recent session in the results. As such, nextPageToken is really a continuation token, because what it is saying is that you have been told about all sessions modified up to now: pass that token back to be told about anything modified between nextPageToken and "current time" to get updates.
As such, if you issue a request that fetches all sessions for the last 7 days (no start/end time, no page token) and get a nextPageToken, you will only get something back in a request using that nextPageToken if any sessions have been modified in between the first and second requests.
So, if you're making these requests in quick succession, it is expected that you won't see anything in the second response.
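The continuation behavior described above can be illustrated with a toy model. This is not the Google Fit implementation: list_sessions, its parameters, and the rule for the token when nothing matches are assumptions made only to show why an immediate second request comes back empty.

```python
def list_sessions(sessions, now_ms, page_token=None,
                  default_window_ms=7 * 24 * 3600 * 1000):
    """Toy model of the "no start time, no end time" case.

    `sessions` is a list of (modified_ms, name) tuples.
    """
    if page_token is not None:
        since = int(page_token)
    else:
        since = now_ms - default_window_ms  # the 7-day default
    matched = sorted(s for s in sessions if since <= s[0] <= now_ms)
    # The next token sits just after the most recent session returned.
    # With no results it stays where it was (an assumption of this sketch).
    next_token = matched[-1][0] + 1 if matched else since
    return {
        "session": [name for _, name in matched],
        "nextPageToken": str(next_token),
    }
```

Calling it twice in a row with the returned token yields an empty session list the second time; a session only shows up once something has been modified after the token.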
In terms of the validity of the startTime you were passing in your comment, that's a bug. RFC3339 defines that fractional seconds should be optional.
I'll see about getting that fixed; but in the interim, just make sure you pass a fractional number of seconds (even if it is just .0, e.g. 2018-10-18T00:00:00.0+00:00).
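As a small sketch of that workaround, the standard library can be asked to always emit fractional seconds. The helper name to_rfc3339_with_fraction is made up for this example.

```python
from datetime import datetime, timezone


def to_rfc3339_with_fraction(dt):
    """Format an aware datetime as RFC 3339, always including fractional seconds."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # timespec="milliseconds" forces the fractional part even when it is zero.
    return dt.isoformat(timespec="milliseconds")


start = datetime(2018, 10, 18, tzinfo=timezone.utc)
print(to_rfc3339_with_fraction(start))  # 2018-10-18T00:00:00.000+00:00
```

When placing the value in a query string, percent-encode it (for example with urllib.parse.quote) so that the + in the offset is not read as a space.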
It may be because the format of the URL you're using is different from the example in the documentation.
You are using:
startTime=2018-10-18T00:00:00+00:00
Whereas the one in the documentation has it as:
startTime=2014-04-01T00:00:00.00Z
The documentation also states that both the startTime and endTime query parameters are required.
How do I properly design a REST API around a composition? I have a TestResult entity, which has TestCaseResults entities. Both support the full set of REST methods. The important fact about this (which I believe differs from many examples I found on the web) is that a TestResult is not consistent unless it has all of its TestCaseResults. How do I properly design this in REST?
Let's say I create them as separate but dependent resources: api\testresults\ and api\testresults\1\testcaseresults. When the client wants to create a test result, it needs to POST to api\testresults, then retrieve the URL api\testresults\1\testcaseresults from a link in the response, and POST all of the test case results to it. This means that the test result is inconsistent until the client finishes the operation. Basically, there is no concept of a transaction here.
Let's say I create only api\testresults resource, and embed an array of test case results inside, like this:
{
"Name": "Test A",
"Results": [
{
"Measured": "BB",
...
},
...
]
...
}
Then it is easier to insert, but it is still hard to work with. A simple GET to api\testresults\1\ will retrieve the test result with a large number of test case results. A GET to api\testresults\ will retrieve many more! The structure becomes complex. Furthermore, in the real world I have a few entities like TestCaseResults that belong to TestResults, so there will be a few arrays, and each could have 100-200 elements.
I could try to combine the approaches: embed the array, but also provide links to api\testresults\1\testcaseresults and support operations there as well. Maybe on GET api\testresults\1\ I could return the TestResult without its TestCaseResults, with only a link pointing to that resource, but on POST I could accept an embedded array of TestCaseResults (though I'm not sure REST allows different representations for POST and GET). But now there are two ways to insert information, which is confusing, and I'm still not sure it solves anything.
Your approach with api\testresults\1 and api\testresults\1\testcaseresults seems promising.
Since JSON does not impose a fixed structure, you can add query parameters to your URL to control whether the test case results are included.
api\testresults\1?with_results=true would mean that your caller wants to see the test case results in addition to the test result.
api\testresults\1\testcaseresults would still return the test case results for your test 1.
If you fear that the number of test case results is too large, you can add pagination parameters, which would be reused in the testcaseresults call.
api\testresults\1?with_results=true&per_page=10 would include only the first 10 results. To get more, use api\testresults\1\testcaseresults?per_page=10&page=2 and so on, as it is the dedicated endpoint.
Cheers
Note: if you want a flexible API that still returns JSON data, you can take a look at GraphQL, the trendy approach.
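Here is a rough sketch of how the with_results and per_page parameters described above might be handled on the server. The store dictionary, the function names, and the "links" field are hypothetical; only the parameter names come from the answer.

```python
def paginate(items, per_page=None, page=1):
    """Return one page of items; with no per_page, return everything."""
    if per_page is None:
        return list(items)
    start = (page - 1) * per_page
    return items[start:start + per_page]


def get_test_result(store, test_id, with_results=False, per_page=None, page=1):
    """Build the body for GET api/testresults/<id>[?with_results=true&per_page=N]."""
    result = store[test_id]
    body = {
        "Name": result["Name"],
        # Always advertise the dedicated endpoint for the nested collection.
        "links": {"testcaseresults": f"api/testresults/{test_id}/testcaseresults"},
    }
    if with_results:
        body["Results"] = paginate(result["Results"], per_page, page)
    return body
```

The same paginate helper could back the dedicated testcaseresults endpoint, so both routes share one notion of per_page and page.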
I have a problem in deciding what to do in this case in REST API design.
Here is my problem.
I have a resource domain model that has a nested object, which is also a domain model.
You can imagine something like this:
{
"name":"abc",
"type":{
"name":"typeName",
"description":"description"
}
}
Now, I want to be able to fetch the outer model resources based on the inner model and a few more params.
For example, I want to fetch all outer model resources that have a given type, along with some params like page number, size, etc.
So my questions are:
1. The API would accept the inner model in a POST and return the outer model. Is that a good REST design?
2. How do I pass the extra params? It's a POST, so I can't put them in the URL, and I can't put them in the inner model.
3. Should I create a new model that contains these extra params as well as the type resource?
like
{
"page":"10",
"type":{
"name":"typeName",
"description":"description"
}
}
If you are making a generic HTTP service, it's acceptable to use POST to send a complex query, and to get a response.
If you are trying to be RESTful, then this is a bad practice. You have two options. Option 1 is to find a way to encode your query in the URL, so you can use a GET request.
Option 2 is more involved. I wouldn't necessarily say that I would suggest this, but it's a method to get around the constraints of REST while having complex queries.
The idea is that you use POST to create a 'query' resource, almost as if you were doing a server-side prepared statement, and then later use GET to get the result of the query.
Example of the client->server conversation:
POST /queries
Content-Type: application/json
...
A response to this might be:
HTTP/1.1 201 Created
Location: /queries/1234
Link: </queryresults/1234>; rel="some-relationship-identifier"
Then after that you could do a GET on /queries/1234 to see the query you 'prepared' and a GET on /queryresults/1234 to see the actual report.
The benefit of this is that it stays within the constraints of REST, and that you could potentially re-use queries and take a longer time to generate the results.
The obvious drawback is that it's harder to explain this to a consumer of your API, as not everyone might be familiar with this pattern and it's an extra HTTP request.
So you have to decide:
Is it worth doing this?
Can you encode the query in the URI instead to avoid this altogether?
Maybe you don't care enough about being RESTful and you might just want to break the rules and use POST for some complex queries.
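For the first option, encoding the query in the URI, a minimal sketch could look like this. The /resources path and the type.name, page and size parameter names are assumptions chosen for illustration, not part of any existing API.

```python
from urllib.parse import urlencode


def build_search_url(base, type_name, page, size):
    """Flatten the nested 'type' filter and the paging params into a query string."""
    query = urlencode({"type.name": type_name, "page": page, "size": size})
    return f"{base}?{query}"


print(build_search_url("/resources", "typeName", 10, 20))
# /resources?type.name=typeName&page=10&size=20
```

This keeps the request a plain GET; it only stops being practical when the filter grows too large for a URL.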
This issue is bugging me for some time now. To test it I just installed a fresh Apigility, set the db (PDO:mysql) and added a DB-Connected service. In the table I have 40+ records. When I make a GET collection request the response looks OK (with the default HAL content negotiation). Then I change the content negotiation to JSON. Now when I make a GET collection request my response contains only 10 elements.
So my question is: where do I set/change this limit?
You can set the page size manually, like so:
$paginator = $this->getAlbumTable()->fetchAll(true);
// set the current page to what has been passed in query string, or to 1 if none set
$paginator->setCurrentPageNumber((int) $this->params()->fromQuery('page', 1));
// set the number of items per page to 10
$paginator->setItemCountPerPage(10);
http://framework.zend.com/manual/current/en/tutorials/tutorial.pagination.html
Could you please send the page_size, total_items part at the end of the json output?
it's like:
"page_count": 140002,
"page_size": 25,
"total_items": 3500035,
"page": 1
This is not an ideal fix, because it requires you to go into the source code rather than using the page size given in the UI.
The collection class that is auto generated for you by the DB-Connected style derives off of Zend/Paginator/Paginator. This class defines the $defaultItemCountPerPage static protected member which is defaulted to 10. That's why you're only getting 10 results. If you open up the auto-generated collection class for your entity and add: protected static $defaultItemCountPerPage = 100; in the otherwise empty class, you will see that you now get up to 100 results in the response. You can look at other Paginator class variables and methods that you could replace in your derived class to get your desired behavior.
This is not an ideal solution. I'd prefer that the generated code automatically used the same configured page size that the HalJson strategy uses. Maybe I'll contribute a PR to change that. Or, maybe I'll just use the HalJson approach. It does seem like the better way to go. You should have some limit on how much data you load from the DB at a time, so that you don't end up with an overly long-running query or an overly large collection of data to deal with. And, whatever limit you set, what do you do when you hit that limit? With the simple Json method, you can't ever get "page 2" of data. So, if you are going to work with some sizeable amount of data, it might be better to use HalJson and then have some logic on the client side to grab pages of data at a time as needed. The returned JSON structure is a little more complicated, but not terribly so.
I'm probably in the same spot you are -- I'm trying to do a simple little api to play with while keeping everything simple and so I didn't want the client to have to deal with the other stuff in HalJson, but probably better to deal with that complexity and have a smooth way to page through data if you're going to use this with some real set of data. At least, that's the pep talk I'm giving myself right now. :-)
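The client-side logic for grabbing pages from a HAL response can be fairly small. The sketch below assumes the usual HAL layout with a next link under _links and the items under _embedded; the fetch callable and the embedded key are placeholders you would supply (for instance a function that performs the GET and returns the decoded JSON).

```python
def iter_hal_collection(fetch, url, embedded_key):
    """Yield every item of a paginated HAL collection by following `next` links.

    `fetch` is any callable that takes a URL and returns the decoded JSON body.
    """
    while url is not None:
        page = fetch(url)
        yield from page.get("_embedded", {}).get(embedded_key, [])
        # The last page has no `next` link, which ends the loop.
        url = page.get("_links", {}).get("next", {}).get("href")
```

Because it is a generator, the caller can stop early and no further pages are requested.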
I'm using the Drive API to list files from a collection which do not contain a certain string in their title.
My query looks something like this:
files().list(q="'xxxxx' in parents and not title contains 'toto'")
In my drive collection, I have 100 files, all contain the string "toto" in their title except for let's say 10 files.
I'm using pagination to retrieve the results 20 by 20, so I'm expecting to get only one page with the 10 files corresponding to my request. Surprisingly, the API returns 5 pages: the first 4 have no results but do carry a next page token, and the files that match my request only come with the fifth page.
I'm still trying some use cases here, but it seems to have something to do with the "not" operator. It is as if the request were made without it, therefore returning 5 pages, with the results that do not match the request then being removed from the response. It's very disturbing for me as I'm looking for the best performance here, and obviously having to make 5 requests to Drive instead of a single one is not good for me. I'm also noticing that the results don't always come in the last page. I ran the test with another collection: the results showed up in the second page, but I still got 3 empty pages after that.
Am I missing something here ? Is this kind of behaviour "normal" ? I mean imagine if I had 1000 documents in my collection, having to make 50 requests to find only a few is not what I expect.
I had a similar problem with the files.list API. I tried to retrieve all three folders under the root folder, and received the result only on the 342nd page. After several hours of research I found some regularity in this strange behavior.
As I understood, the Drive API works in this way:
1. Picks something like an index that best matches your query.
2. Selects the first 20 records using the index from step 1.
3. Applies your filter: removes records that do not match your query.
4. Returns what is left (possibly nothing) to you, along with the next page token.
The nextPageToken looks like just an OFFSET for the first record of the next page in the chosen index; it may also contain some information about the query or index.
After base64-decoding the token, I found the record number for the next result at the 121st position of the decoded token.
I had previously built an index of tokens using maxResults=1.
This is crazy, but I have no other explanation for the observed behavior.
This is convenient for the server, because it does very little work per search. On the other hand, this algorithm forces the client to make a lot of requests to paginate through the whole list, but the limit on requests per second takes care of that problem.
All you can do is paginate and skip the empty results. Do not forget about the limit on the number of requests.
Do not try to find errors on your side. This is how Google Drive API works.
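A paging loop that tolerates empty pages might look like the sketch below. The list_page callable is a placeholder for whatever performs the actual files.list request and returns the decoded response; the items and nextPageToken field names are assumed to match the response format used in the question.

```python
def iter_matching_files(list_page, query, page_size=20):
    """Yield files across all pages, tolerating empty intermediate pages."""
    token = None
    while True:
        response = list_page(q=query, maxResults=page_size, pageToken=token)
        # An empty page is not the end: keep going while a token is returned.
        yield from response.get("items", [])
        token = response.get("nextPageToken")
        if not token:
            break
```

The loop stops only when no token comes back, never because a page happened to be empty. Any throttling needed to stay under the request limit would go inside list_page.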
The contains operator currently works as a prefix matcher. title contains 'toto' will match "totolong" and "toto", but not "blahtoto".