List request page size being ignored - google-drive-api

I've been working with the Drive SDK for the last few days, and previously I was able to request 1000 items per page with a list request. Now I'm only getting 100 files no matter how high I set the parameter, although if I set it below 100 it returns exactly that many. Did they change the maximum number of results overnight or something?

I'm facing the same problem, so I've been doing some tests, and this is what I've found:
If I try files.list with fields: "files, nextPageToken", I get 100 files only.
If I try files.list with fields: "files(name, id, etc, parents), nextPageToken", I get 460 files.
If I try files.list with fields: "files(name, id, etc), nextPageToken" without parents field, I get 1000 files.
If I try files.list with fields: "files(parents), nextPageToken", I get 460 files.
So it seems to depend on how many fields you request, and also on whether the parents field is one of them.

You've misunderstood what pageSize does. It is a maximum, not a guaranteed count. You should always keep iterating the list results until nextPageToken is null.
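A minimal sketch of the loop this answer describes. It assumes a hypothetical `list_page(page_token)` callable that performs one `files.list` request and returns the decoded JSON response as a dict. The loop never assumes how many items a page holds; it stops only when `nextPageToken` is missing.

```python
from typing import Callable, Iterator, Optional

# Hypothetical signature: takes the token of the page to fetch (None for the
# first page) and returns one decoded files.list response as a dict.
ListPage = Callable[[Optional[str]], dict]


def iter_all_files(list_page: ListPage) -> Iterator[dict]:
    """Yield every file across all pages, whatever size each page turns out to be."""
    token: Optional[str] = None
    while True:
        response = list_page(token)
        # A page may hold fewer items than pageSize, or even none at all.
        yield from response.get("files", [])
        token = response.get("nextPageToken")
        if not token:
            # No nextPageToken means the listing is complete.
            break
```

With google-api-python-client, `list_page` could wrap something like `service.files().list(pageSize=1000, fields="nextPageToken, files(id, name)", pageToken=token).execute()`; that wiring and the `fields` value are assumptions for illustration, not taken from the thread.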

Ok, never mind, I figured this out. I was requesting all the metadata for each file, which seems to limit each page to 100. I was able to get 1000 per page by requesting only three fields.
EDIT by pinoyyid: This is not the answer. There is nothing that you can do to guarantee that the page will have exactly pageSize items.

Related

How to Retrieve Large URL Json Data Set?

I am trying to obtain a data set as JSON from a URL, using SODA's API. The issue is that the data set has more than 50K records, and I need to sort it by multiple keys, which SODA's API does not permit. How can I get around that?
Example (this table is small, but I have included it for illustrative purposes):
https://data.medicare.gov/resource/apyc-v239.json?$Limit=1000&$Order=measure_id
But as soon as I add state to the order, the API returns an error.
The data set above has only 3,800 records, but other datasets have over 250,000 records and require the same approach: sort, then page through the results.
Any assistance would be greatly appreciated.
Try the new API endpoint for that dataset: http://dev.socrata.com/foundry/#/data.medicare.gov/apyc-v239
It'll allow you to sort by multiple columns and there's no maximum for $limit on the new endpoints, so you can do stuff like this:
https://data.medicare.gov/resource/q7p2-jxeh.json?$order=state,measure_name&$limit=100000000
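For the larger datasets, sorting on several columns and then paging might look like the sketch below. It assumes SoQL's `$order`, `$limit` and `$offset` query parameters, reuses the dataset ID `q7p2-jxeh` and the column names from the answer, and assumes the total row count is already known; the page size is an arbitrary choice, not a documented limit. Sorting on the same columns for every page keeps the pages consistent with each other.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.medicare.gov/resource/q7p2-jxeh.json"


def page_urls(total_rows: int, page_size: int, order: list[str]) -> list[str]:
    """Build one URL per page, all sorted by the same columns."""
    urls = []
    for offset in range(0, total_rows, page_size):
        params = {
            "$order": ",".join(order),  # multi-column sort, e.g. "state,measure_name"
            "$limit": page_size,
            "$offset": offset,
        }
        # safe="$," leaves the SoQL parameter names and the comma unescaped.
        urls.append(f"{BASE_URL}?{urlencode(params, safe='$,')}")
    return urls
```

For example, `page_urls(3800, 1000, ["state", "measure_name"])` yields four URLs with offsets 0, 1000, 2000 and 3000.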

Google Bigquery json API, pageToken has no effect

I'm trying to implement the JSON API (v2) of BigQuery. In my code I get the same behaviour as on the documentation page for tabledata.list.
My table has about 11,000 rows. On the documentation page I fill in the following parameters:
ProjectId = X
DatasetId = Y
TableId = Z
MaxResults = 10000 #I want to paginate my results
This returns 10,000 rows and a pageToken. So I make the same request again, now setting the page token so that I get the next page of results.
That returns the same 10,000 rows as before. I expected this to paginate as described on this page:
All collection.list methods return paginated results under certain circumstances. The number of results per page is controlled by the maxResults property
A page is a subset of the total number of rows. If your results are more than one page of data, the result data will have a nextPageToken property. To retrieve the next page of results, make another list call and include the token value as a URL parameter named pageToken.
Where do I go wrong?
EDIT:
My colleague pointed out that on the other documentation pages the result contains a nextPageToken, whereas this response contains a pageToken. The difference would be that pageToken refers to the current page, while nextPageToken refers to the next page.
However, the documentation states that it should return a nextPageToken (except when there is no more data), and len(table) > len(result).
On the same page it's mentioned that there is a difference for the TableData.List() call:
The bigquery.tabledata.list method, which is used to page through table data, uses a row offset value or a page token.
So for TableData.List() you must use the row offset value to paginate, and to access previous pages you can use the hashes stored in your session. It is designed this way because, with large volumes of data, you cannot pre-cache the next set of data from your worker pool.
You can help improve the documentation by using the link at the top right of each page labelled "Feedback on this document"; feel free to use it to suggest improvements.
Also you can submit issues to https://code.google.com/p/google-bigquery/issues/list
Unfortunately, the field returned for TableData.List() that contains the logical "next page token" is literally named "pageToken", rather than "nextPageToken".
Other APIs, like Datasets.List(), return a field literally named "nextPageToken" which contains the logical "next page token".
It's a case of inconsistent naming, but hopefully this helps clear up some confusion.
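A sketch of the loop the last answer implies: read the next-page token from the response field literally named `pageToken`, and send it back as the `pageToken` request parameter. `fetch_page` is a hypothetical callable wrapping one tabledata.list request (in google-api-python-client, perhaps `service.tabledata().list(projectId=..., datasetId=..., tableId=..., maxResults=10000, pageToken=token).execute()`, which is an assumption about your setup).

```python
from typing import Callable, Iterator, Optional

# Hypothetical: fetches one tabledata.list page given a token (None for the first page).
FetchPage = Callable[[Optional[str]], dict]


def iter_table_rows(fetch_page: FetchPage, token_field: str = "pageToken") -> Iterator[dict]:
    """Yield all rows, reading the next-page token from `token_field`.

    tabledata.list names the token "pageToken"; list methods such as
    datasets.list name it "nextPageToken".
    """
    token: Optional[str] = None
    while True:
        response = fetch_page(token)
        yield from response.get("rows", [])
        token = response.get(token_field)
        if not token:
            # No token in the expected field: treat the table as fully read.
            break
```

If this loop read `nextPageToken` from tabledata.list responses, which per the answer above carry the token under `pageToken`, it would stop after the first page.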

largestChangeId smaller than the last item's id?

I moved three files in My Drive to the trash, then retrieved changes with the Changes: list API (https://developers.google.com/drive/v2/reference/changes/list). It returned three changes, with IDs 11607, 11608 and 11609. However, the largestChangeId field was 11608. When I made the API request again with startChangeId: 11608, it returned the last two changes. When I made the request with startChangeId: 11609, it returned no results.
Is this expected? Or is relying on change IDs in this way not right?
In your case, it looks like you've hit a bug. Here are details of how changes.list should work under normal circumstances. Let's assume your latest change ID was 11606 and you made three trashing operations.
GET changes?startChangeId=11607 should list:
11607
11608
11609
The next time you call changes.list, you should always increment the latest largestChangeId by 1.
In this case, you need to request the following on your next poll:
GET changes?startChangeId=11610, which will return an empty list.
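A minimal polling sketch following this answer's rule of starting the next poll at largestChangeId + 1. `list_changes` is a hypothetical callable that performs one changes.list request for a given startChangeId and returns the decoded response; following nextPageToken within a single poll is omitted for brevity. The code converts largestChangeId with `int()` because it may arrive as a JSON string.

```python
from typing import Callable, List, Tuple

# Hypothetical: one changes.list call for the given startChangeId,
# returning the decoded response (treated as a single page here).
ListChanges = Callable[[int], dict]


def poll_changes(list_changes: ListChanges, start_change_id: int) -> Tuple[List[dict], int]:
    """Return the new change items and the startChangeId to use on the next poll."""
    response = list_changes(start_change_id)
    items = response.get("items", [])
    largest = int(response["largestChangeId"])
    # Next poll starts one past the largest change ID reported by the server.
    return items, largest + 1
```

With the example above, a poll from 11607 that reports largestChangeId 11609 returns the three changes and 11610 as the next start value.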

largestChangeId being returned inconsistent?

I'm using the Drive changes feed to find changes to files in Google Drive, and I have been seeing some inconsistency. I just tested using the API Explorer found here:
https://developers.google.com/drive/v2/reference/changes/list
For one user, the largestChangeId returned was 1208, yet when I set startChangeId to 1208, the request returned 6 entries: the change with ID 1208 and five others, each higher than 1208. I also have the flags includeDeleted = true, includeSubscribed = false.
I then tried another user; the returned ID was 136023. Using this value with the same parameters as above, Drive returned 0 entries, which is what I expected.
Then I tried a third user; the returned ID was 8267. Making the call again with the same parameters as above, Drive returned 1 entry, which had the same change ID, 8267.
Note that all three accounts are test accounts on a test domain in which no documents are being modified or shared.
So: three users, the same call, three different results. Is this expected? Or, a better question, what should I expect when making this call? The documentation says: "Change ID to start listing changes from. (string)". Since it says "from", and the value returned is the highest ID, I would expect 0 results to be returned.
Thanks.

Clarification on maxResults and nextPageToken using Google Drive API v2

I just wanted clarification with regard to the Files: list feature of the Google Drive API here:
https://developers.google.com/drive/v2/reference/files/list
What is the maximum value that can be specified for maxResults? I assume this value determines the number of results on the next page of results?
Also, is nextPageToken simply part of the query string that has to be passed with nextLink to get the next page of results?
Thanks!
The maxResults query parameter can be used to limit (or increase) the number of items returned in a list request. There is a default value and a hard limit, both set by our server.
Unfortunately, we don't usually document those numbers, as they can easily change; we recommend that developers look for a nextPageToken and/or nextLink in the resulting collection to know whether all items have been returned.
The nextPageToken attribute is to be used as the pageToken query parameter on a list request. If you are using the nextLink from the resulting collection, you do not need to specify the pageToken query parameter as it should already be included.
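The two options in this answer can be sketched as a single helper: use nextLink unchanged when it is present, otherwise append nextPageToken as the pageToken query parameter. `next_request_url` is a hypothetical helper, and `base_url` is assumed to be the original first-page request URL without any pageToken already in it.

```python
from typing import Optional
from urllib.parse import urlencode, urlparse


def next_request_url(base_url: str, response: dict) -> Optional[str]:
    """Return the URL for the next page, or None when the list is complete."""
    next_link = response.get("nextLink")
    if next_link:
        # nextLink already carries the page token, so it is used unchanged.
        return next_link
    token = response.get("nextPageToken")
    if token:
        # Otherwise pass nextPageToken back as the pageToken query parameter.
        sep = "&" if urlparse(base_url).query else "?"
        return f"{base_url}{sep}{urlencode({'pageToken': token})}"
    return None
```

Returning None when neither field is present matches the advice above: their absence is the signal that all items have been returned.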
maxResults cannot be larger than 1000 according to this page: https://developers.google.com/drive/v2/reference/files/list