Pagination yields no results in Google Fit

I am using the REST API of Google Fit. I want to list sessions with the fitness.users.sessions.list method. This gives me a few dozen results.
Now I would like to get more results, so I set the pageToken to the value I got from the previous response. But the new response does not contain any data points, just yet another pageToken:
{
  "session": [
  ],
  "deletedSession": [
  ],
  "nextPageToken": "1541027616563"
}
The same happens when I use the pagination function of the Google Python API Client: I iterate over the results but never get any new data.
request = self.service.users().sessions().list(userId='me')
while request is not None:
    response = request.execute()
    for ds in response['session']:
        yield ds
    request = self.service.users().sessions().list_next(request, response)
I am sure there is much(!) more session data in Google Fit for my account. Am I missing something regarding pagination?
Thanks

I think that the description of the pageToken parameter is actually rather confusing in the documentation (this answer was written prior to the documentation being updated).
The continuation token, which is used to page through large result sets. To get the next page of results, set this parameter to the value of nextPageToken from the previous response.
This is conflating two concepts: continuation, and paging. There isn't actually any paging in the implementation of Users.sessions.
Sessions are indexed by their modification timestamp. There are two (or three, depending on how you count) ways to interact with the API:
Pass a start and/or end time. Omitted start and end times are taken to be the start and end of time respectively. In this case, you will get back all sessions falling between those times.
Pass neither start nor end times. In this case, you will receive all sessions between some time in the past and now. That time is:
pageToken, if provided
Otherwise, it's 7 days ago (this doesn't actually appear in the documentation, but it is the behavior)
In any of these cases, you receive back a nextPageToken that sits just after the most recent session in the results. As such, nextPageToken is really a continuation token: it says you have been told about all sessions modified up to that point, and passing it back gets you updates, i.e. anything modified between nextPageToken and the current time.
As such, if you issue a request that fetches all sessions for the last 7 days (no start/end time, no page token) and get a nextPageToken, you will only get something back in a request using that nextPageToken if any sessions have been modified in between the first and second requests.
So, if you're making these requests in quick succession, it is expected that you won't see anything in the second response.
In terms of the validity of the startTime you were passing in your comment, that's a bug. RFC3339 defines that fractional seconds should be optional.
I'll see about getting that fixed; but in the interim, just make sure you pass a fractional number of seconds (even if it is just .0, e.g. 2018-10-18T00:00:00.0+00:00).
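As a minimal sketch of the two access patterns described above (assuming the google-api-python-client and an already-authorized service object, as in the question):
import time

# Pattern 1: explicit time range -- returns all sessions between the bounds.
# Note the fractional seconds ('.0'), working around the RFC3339 parsing
# issue described above.
response = service.users().sessions().list(
    userId='me',
    startTime='2018-10-18T00:00:00.0+00:00',
    endTime='2018-11-01T00:00:00.0+00:00',
).execute()
for session in response.get('session', []):
    print(session.get('name'))

# Pattern 2: continuation -- pass nextPageToken back later to learn about
# sessions modified since the previous request. Issued in quick succession,
# the second request will usually come back empty, as explained above.
token = response.get('nextPageToken')
time.sleep(60)  # give something a chance to change
updates = service.users().sessions().list(userId='me', pageToken=token).execute()
print(updates.get('session', []))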

It may be because the format of the URL you're using is different from the example in the documentation.
You are using:
startTime=2018-10-18T00:00:00+00:00
whereas the one in the documentation has it as:
startTime=2014-04-01T00:00:00.00Z
The documentation also states that both the startTime and endTime query parameters are required.

How to fix a query in Functions within Foundry which is hitting ObjectSet:PagingAboveConfiguredLimitNotAllowed?

I have a Phonograph object with billions of rows and we are querying it through the object set service.
For example, I want to get all DriverLicences from a certain city.
@Function()
public getDriverLicences(city: string): ObjectSet<DriverLicences> {
    let drivers = Objects.search().DriverLicences().filter(row => row.city.exactMatch(city));
    return drivers;
}
I am facing this error when I try to query it from Slate:
ERROR 400: {"errorCode":"INVALID_ARGUMENT","errorName":"ObjectSet:PagingAboveConfiguredLimitNotAllowed","errorInstanceId":"0000-000","parameters":{}}
I understand that I am probably retrieving more than 100,000 results, but I need all of them, because the logic implemented on the front end is a complex Slate dashboard built by another team that we cannot refactor.
The issue here is that, specifically in the Slate <> Function connector, there is a "translation layer" that serializes the contents of the object set and provides a response data structure that materializes the property:value pairs for each object in the set.
This clearly doesn't work for large object sets where throwing so much data into the browser is likely to overwhelm the resources allocated to the tab.
From context it seems like you might be migrating an existing Slate app over to Functions; in the current version, how does the query limit the number of results returned? Surely it isn't returning several hundred thousand results for further processing on the front end? (And if so, that might be an anti-pattern worth addressing.)
As for options that you could currently explore, you can sort your object set and then specify a smaller limit to return:
Objects.search().DriverLicences().filter(row => row.city.exactMatch(city)).orderBy(date_of_issue).take(100)
You'll find a few more details in the Functions documentation Reference entry on Ontology API: Object Sets in the section on Ordering and limiting.
You can even work around the (current) lack of paging when returning an ObjectSet to Slate by using the last value of the property you ordered on (i.e. date_of_issue) as a filter in the subsequent request, returning the next N objects.
This works well if you need a Slate table or HTML widget that renders one set of results and then, on a user action, fetches the next page; the sketch below illustrates the idea.
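To illustrate the keyset idea in language-agnostic terms (a Python sketch; fetch_licences is a hypothetical stand-in for the Function above, not Foundry's actual API):
def fetch_licences(city, after_date=None, limit=100):
    """Hypothetical: return up to `limit` DriverLicences for `city`,
    ordered by date_of_issue, restricted to rows issued after `after_date`."""
    return []  # replace with the real Function / endpoint call

def all_licences(city):
    after = None
    while True:
        page = fetch_licences(city, after_date=after, limit=100)
        if not page:
            break  # empty page: nothing left
        yield from page
        after = page[-1]['date_of_issue']  # keyset cursor for the next request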

Request.post arguments in python

Is there a way to determine the max value of an argument in a requests.post call to a website if we don't know the amount of data in the website's dataset? I'm trying to execute the following code to get specific information on all daycares from this website, but I don't know the value of the last argument (length). Currently, I'm assuming this value is 20, but it is subject to change from time to time. How do I keep it open-ended so I don't have to guess the max value for length? Code as follows:
import requests

# `data` is the request payload, defined elsewhere.
data_requested = requests.post(
    "https://data.nj.gov/views/INLINE/rows.json?"
    "accessType=WEBSITE&method=getByIds&asHashes=true&start=0&length=20",
    json=data)
njcc_data = data_requested.json()
Notice that this has nothing to do with requests.post - the range of values length can take is determined by the creator of that API and is an unknown quantity both to you and to requests.
You can try to reason about what possible values it could take; is it the length of a person? If so, it's probably not going to be more than 250 cm.
You can also use trial and error and see how high you can make it before the API endpoint gives back an error, but I guess this is what you were trying to avoid.
If length is the number of items returned (the length of the returned json array) then you could just try setting it to a high number like 1000 and see if you can get away with it.
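For example, a hedged sketch of that open-ended loop (assuming the endpoint honors start/length as offset/limit, which its URL suggests but I have not verified, and that the row payload lives under a 'data' key; inspect the real response to be sure):
import requests

URL = ("https://data.nj.gov/views/INLINE/rows.json?"
       "accessType=WEBSITE&method=getByIds&asHashes=true")
PAGE = 100  # any page size the server accepts

def fetch_all(data):
    rows, start = [], 0
    while True:
        resp = requests.post(f"{URL}&start={start}&length={PAGE}", json=data)
        resp.raise_for_status()
        batch = resp.json().get('data', [])  # assumed key
        rows.extend(batch)
        if len(batch) < PAGE:  # a short page means no more data
            break
        start += PAGE
    return rows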

MediaWiki not returning continue parameter

I'm using this API request:
https://en.wikipedia.org/w/api.php?action=query&list=geosearch&gsradius=10000&gscoord=51.540951897949|-0.051086739997922&format=json&gslimit=50&continue=
which delivers 50 results. I want to use the 'continue' parameter to get the next page of results. According to the documentation I should get a continue field back in the results, but I don't get any such field, so I can't get the next page.
Does anyone have any suggestions?
Dave, as @svick says, it seems list=geosearch (which is part of Extension:GeoData) does not support continuation; indeed, it actually returns a "batchcomplete" element to indicate there are no more results (see it in human-readable form).
I think you should either just get the maximum number of results (500 for users, 5000 for bots on Wikipedia), or if that's not satisfactory for your use case (which is?), pipe in at task T78703.
(Or, if you believe it to be a separate issue, report a new bug.)
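For list modules that do support continuation, the standard loop looks like this (a sketch using requests; list=allpages is just an example of a module that does continue):
import requests

API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "list": "allpages",  # example module that does support continuation
    "aplimit": 50,
    "format": "json",
}

while True:
    data = requests.get(API, params=params).json()
    for page in data["query"]["allpages"]:
        print(page["title"])
    if "continue" not in data:
        break  # batchcomplete: no more results
    params.update(data["continue"])  # merge continue params into the next request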

How to do a RESTful GET on an indefinite number of parameters?

I have a collection of IDs of RESTful resources (all the same type of resource), the number of which can be indefinitely large. I want to make a REST call to get the names of these resources. Something like this:
Send:
['005fc983-fe41-43b5-8555-d9a2310719cd', '4c6e6898-e519-4bac-b03e-e8873d3fa3f0',...]
Receive:
['Resource A', 'Resource B',...]
What is the best way to retrieve the names of these resources RESTfully?
Here are the ideas I have had and the problems I see with each approach:
The naive approach is to iterate through all IDs in my collection and do a 'GET /resource/:id' for each ID. This would be prohibitively slow and resource intensive because of the large number of HTTP calls I would have to make.
The next approach I thought of is to pass the IDs as parameters to a single GET call. The problem here is that most servers have a limit on the URL length, which would be quickly exceeded.
Next, I thought that putting the IDs in the body of a GET would work, but according to Roy Fielding, data in the GET body should not affect the results of a REST call: HTTP GET with request body
I could use a POST request and put the data on the POST body, but POST is intended for creating and modifying resources, which is not what I'm doing. Maybe I should ignore the intent of the verb and use it anyway?
I could split the request into multiple GET requests to avoid exceeding the max URL length. The problem here is that I have to combine the results after all calls have returned, which is potentially slow.
I could create a collection resource within my main resource by posting my list of IDs to 'POST /resource/collection', then use a 'GET /resource/collection/:id' call to retrieve the results. This actually works, but then I have to do a 'DELETE /resource/collection/:id' to clean up. It takes multiple calls, requires cleanup, and seems a bit clunky overall, so it's okay, but not ideal.
Is there a better way to do this?
Your last approach is RESTful and the one I recommend. I'd do this:
Step 1:
Request:
POST /resource/collection
Content-Type: application/json
{
  "ids": [
    "005fc983-fe41-43b5-8555-d9a2310719cd",
    "4c6e6898-e519-4bac-b03e-e8873d3fa3f0"
  ]
}
Response:
201 Created
Location: /resource/collection/89AB8902-FDF1-11E4-ADDF-CD4FB664A5DC
Step 2:
Request:
GET /resource/collection/89AB8902-FDF1-11E4-ADDF-CD4FB664A5DC
Response:
200 OK
Content-Type: application/json
{
  "resources": [ ... ]
}
but then I have to do a 'DELETE /resource/collection/:id' to clean up.
No, that is not necessary. The server could implement a job that removes all collections older than a specific timestamp. It is not the client who has to do this.
If a client later accesses the collection again, the server would respond with
410 Gone
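A client-side sketch of those two steps using Python's requests (the host and the /resource/collection endpoint are the hypothetical ones from this answer):
import requests

BASE = "https://api.example.com"  # hypothetical host
ids = ["005fc983-fe41-43b5-8555-d9a2310719cd",
       "4c6e6898-e519-4bac-b03e-e8873d3fa3f0"]

# Step 1: create the (temporary) collection resource.
created = requests.post(f"{BASE}/resource/collection", json={"ids": ids})
created.raise_for_status()
location = created.headers["Location"]

# Step 2: fetch the resolved resources; 410 Gone means the server has
# already expired the collection and it must be re-created.
result = requests.get(f"{BASE}{location}")
if result.status_code == 410:
    raise RuntimeError("collection expired; re-POST the ids")
names = result.json()["resources"]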

Drive API files.list query with 'not' parameter returns empty pages

I'm using the Drive API to list files from a collection which do not contain a certain string in their title.
My query looks something like this:
files().list(q="'xxxxx' in parents and not title contains 'toto'")
In my drive collection, I have 100 files, all contain the string "toto" in their title except for let's say 10 files.
I'm using pagination to retrieve the results 20 by 20, so I'm expecting to get only one page with the 10 files corresponding to my request. Surprisingly, the API returns 5 pages, with the first 4 having no results but each carrying a nextPageToken, and the files matching my request only arrive with the fifth page.
I'm still trying some use cases here, but it seems to have something to do with the "not" operator: as if the request were made without it, returning 5 pages, and the results not matching the request were then removed from the response. It's very frustrating, as I'm looking for the best performance here, and obviously having to make 5 requests to Drive instead of a single one is not good. I'm also noticing that the results don't always come in the last page. I made the test with another collection; the results showed up in the second page, but I still got 3 empty pages after that.
Am I missing something here? Is this kind of behaviour "normal"? Imagine if I had 1000 documents in my collection: having to make 50 requests to find only a few is not what I expect.
I had a similar problem with the files.list API. I tried to retrieve the three folders under my root folder, and only received a result on the 342nd page. After several hours of research I found some regularity in this strange behavior.
As I understand it, the Drive API works this way:
It detects something like an index that best matches your query
It selects the first 20 records using the index from step 1
It applies your filter, removing records that do not match your query
The rest is returned to you (maybe empty) with a next-page token.
The nextPageToken looks like just an OFFSET to the first record of the next page in the chosen index; maybe it also contains some information about the query or the index.
After base64-decoding this token, I found the record number of the next result at the 121st position in the decoded token.
(Previously, I had built an index of tokens using maxResults=1.)
This is crazy, but I have no other explanation for the observed behavior.
It is very convenient for the server, because the server does very little work per search. On the other hand, this algorithm forces a lot of requests to paginate the whole list, although the per-second request limit keeps that in check.
The only thing you can do is paginate and skip the empty results; do not forget about the limit on the number of requests.
Do not try to find errors on your side. This is how the Google Drive API works; a sketch of such a loop follows.
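In practice that means something like this (Drive API v2 via the Python client, assuming an authorized service object; it simply follows nextPageToken and ignores the empty pages):
def list_matching_files(service, query):
    """Yield every file matching `query`, skipping the empty pages the
    Drive API may return before (and between) the actual results."""
    token = None
    while True:
        response = service.files().list(
            q=query, maxResults=20, pageToken=token).execute()
        for item in response.get('items', []):  # often empty; just skip
            yield item
        token = response.get('nextPageToken')
        if not token:
            break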
The contains operator works as a prefix matcher at the moment. title contains 'toto' will match "totolong" and "toto", but not "blahtoto".