Google Drive API /files slow response - google-drive-api

I'd like help or ideas on the issue described below.
Our iOS app allows users to access their Google Drive files.
We use the Changes API (https://developers.google.com/drive/api/v3/reference/changes). The main precondition for using this API is a local DB that holds a snapshot of the user's Drive file tree and the token. To fill the DB initially, we must request the list of all files from the user's Drive. Getting that list (with metadata) takes too long for many of our users. This is the issue I want to address.
We request files with a series of files.list requests (https://developers.google.com/drive/api/v3/reference/files/list). Most requests are plain files?q=trashed%20%3D%20false.
For example, on my own private Google Drive:
69K files
initial request of all files takes 5+ minutes with my current network speed (Download 527 Mbps, Upload 417 Mbps; ping www.googleapis.com – 40–45 ms)
~150 requests
each request brings information about ~460 files
each request takes around 2-2.5 seconds
Sometimes I observed requests taking up to 6 seconds, which means that getting the full file list took about 15 minutes on my account.
In the Developer Console, the reported latency is below 0.1 s.
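As a rough sanity check of the figures above, here is a minimal sketch of the arithmetic. It assumes the requests run strictly one after another, because each page token comes from the previous response; the function name is only illustrative.

```python
import math

def listing_estimate(total_files, files_per_page, seconds_per_request):
    """Return (request_count, total_minutes) for a strictly sequential paged listing."""
    request_count = math.ceil(total_files / files_per_page)
    return request_count, request_count * seconds_per_request / 60

# Figures from the post: 69K files, ~460 files per page.
print(listing_estimate(69_000, 460, 2.0))  # (150, 5.0)   typical request time
print(listing_estimate(69_000, 460, 6.0))  # (150, 15.0)  slowest observed request time
```

With sequential paging the total time scales with the number of pages, so anything that raises the number of files returned per page lowers the total, provided the per-request time does not grow in proportion.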
Many of our users have Drives far bigger than mine. A standard iOS app session is not long enough to complete the initial request. We do save every intermediate page token, so data received during a single app session is not lost if the user leaves the app; in the next session we continue downloading from the last saved token. But there are still cases where our app needs the DB to be filled before it can start some operations. In those cases our users see a "Pending..." progress indicator and complain that our app is slow.
So, questions:
Is it possible to improve the speed/latency of the requests described above?
Maybe there is some quota that we are missing and that can be changed?
Maybe someone can advise a more effective way of getting the full file list?
P.S. We could potentially reduce the number of requests. We have to perform some double checks for Shared with Me folders, as we observed that a request for all files sometimes doesn't list all files from Shared folders. That's a bit of a side story, and I don't think it would dramatically improve the situation for us. I can provide more details on the actual set of requests we perform if necessary.

Are you returning all the fields? I would assume so, since trashed=false is the only query param provided. Do you need all of them? Can you try reducing the response to only the fields you really care about (using a field mask) and see if that improves your performance?
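A minimal sketch of what such a request could look like, assuming the app only needs the few fields listed here (that field list is a guess; substitute whatever the local DB actually stores). The `fields`, `pageSize`, `q` and `pageToken` parameters are standard files.list parameters; the helper name is hypothetical.

```python
from urllib.parse import urlencode

FILES_LIST_URL = "https://www.googleapis.com/drive/v3/files"

def build_files_list_url(page_token=None):
    """Build a files.list URL that requests only a small set of fields."""
    params = {
        "q": "trashed = false",
        # 1000 is the maximum page size files.list accepts.
        "pageSize": 1000,
        # Field mask: assumed set of fields, adjust to what the app really needs.
        "fields": "nextPageToken,files(id,name,mimeType,parents,modifiedTime)",
    }
    if page_token:
        # Continue from the token returned by the previous page.
        params["pageToken"] = page_token
    return FILES_LIST_URL + "?" + urlencode(params)

print(build_files_list_url())
```

The request still needs the usual OAuth authorization header; only the URL construction is shown here.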

Related

Autodesk Forge Data Management API. Too many requests on search

We use Forge Data Management API to access project documents.
We execute the following set of requests:
project/v1/hubs to get hubs (hubId)
project/v1/hubs/:hubId/projects to get projects (projectId and rootFolder)
data/v1/projects/:projectId/folders/:folderId/search?filter[fileType]=rvt,nwc,nwd to get files filtered by extension
On search we often get a Too Many Requests status code.
We have about 20 files in the project.
Sometimes this request is processed instantly, sometimes it takes 8 seconds, sometimes 20 seconds.
Response body:
{"jsonapi":{"version":"1.0"},"errors":[{"id":"GUID","status":"429","detail":"Too Many Requests"}]}
On this page we can see that there is a limit of 300 requests/minute. But we get the Too Many Requests status after making no more than 10 requests/minute.
Also here we see that the response must contain a Retry-After header. But there is no such header.
So I have the following questions:
What is the best practice for searching project files by file extension?
Why do we get Too Many Requests even though we don't exceed the limit?
What should we do to avoid getting the Too Many Requests status?
Why is there no Retry-After header in the Too Many Requests response?

Best way to refresh the thumbnail/base url of Google Photos/Google Drive API

Google suggests retrieving the base URL again when it is needed more than 60 minutes after the original query, because the URLs expire.
So far, so good. But what if I'm developing a photo gallery and I'm displaying 5000 of them in a grid? Should I query the API again and again? They use a maximum page size of 100 (instead of 1000 for Google Drive), so we would be issuing many requests if that's true.
I'm already caching the photos locally, but when the user scrolls to another section, the URLs will have expired after one hour.
What is the best solution for that?
Google has a batch request for that use case:
https://developers.google.com/photos/library/reference/rest/v1/mediaItems/batchGet
The maximum number of items you can request per call is 50, though, so you will still have to queue several of those requests.
Personally, I made it automatic in my code. When the baseUrl I'm loading gives me a 403, the code automatically fetches the updated mediaItem object and retries.
Also, you should not cache base URLs, or media objects in general, across app launches (assuming you're writing an app). You should retain only the mediaItem id and reload the item or the entire collection when needed.
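To make the queuing concrete, here is a small sketch that splits a list of stored mediaItem ids into groups of at most 50 and builds one batchGet URL per group. The helper names are hypothetical, and sending the requests (with the OAuth header) is left out.

```python
from urllib.parse import urlencode

BATCH_GET_URL = "https://photoslibrary.googleapis.com/v1/mediaItems:batchGet"
MAX_IDS_PER_CALL = 50  # per-call limit mentioned above

def chunk_ids(media_item_ids, size=MAX_IDS_PER_CALL):
    """Split the id list into consecutive chunks of at most `size` ids."""
    return [media_item_ids[i:i + size] for i in range(0, len(media_item_ids), size)]

def batch_get_urls(media_item_ids):
    """Build one batchGet URL per chunk; mediaItemIds is a repeated query parameter."""
    return [
        BATCH_GET_URL + "?" + urlencode({"mediaItemIds": chunk}, doseq=True)
        for chunk in chunk_ids(media_item_ids)
    ]

ids = [f"id{n}" for n in range(120)]
print([len(c) for c in chunk_ids(ids)])  # [50, 50, 20]
```

For a grid of 5000 photos it is probably enough to refresh only the ids in the section currently on screen rather than all 100 chunks at once.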

Google Drive Rest API - Create File - Quota

I have a program that has to copy about 500,000 files to different folders on Google Drive. I use the Google Drive v3 Node.js API. I issue about 2 uploads per second (one every 450 ms). After a while, I get ECONNRESET or socket hang up from the API.
When I look at the quota on console.cloud.google.com, I am nowhere near my limit. Why is it failing?
For kicks, I have tried google filestream and it has no problems pushing into the drive under my user account. It's about 5 times faster.
Did anyone run into this problem?
I think your quota per se is not the problem here. This happens when you write too much data within a short time frame. Try slowing down and sharding the requests across different user accounts; that should help with the heavy lifting of so many requests. Also, don't forget to implement exponential backoff for retries of 4xx errors. My two cents.
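A minimal sketch of exponential backoff with jitter, for illustration only. `RateLimitError` and the `is_retryable` callback are hypothetical stand-ins for whatever error your client library raises; which errors are worth retrying is left to the caller.

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical error type standing in for a rate-limit response."""

def backoff_delays(max_retries=5, base=1.0, cap=32.0):
    """Yield delays of roughly 1s, 2s, 4s, ... (capped), each plus up to 1s of jitter."""
    for attempt in range(max_retries):
        yield min(cap, base * 2 ** attempt) + random.uniform(0, 1)

def call_with_backoff(operation, is_retryable, max_retries=5, sleep=time.sleep):
    """Run operation(), retrying with growing delays while is_retryable(error) is true."""
    for delay in backoff_delays(max_retries):
        try:
            return operation()
        except Exception as exc:
            if not is_retryable(exc):
                raise  # not a rate-limit style failure: give up immediately
            sleep(delay)
    return operation()  # final attempt; any error now propagates to the caller
```

Usage would look like `call_with_backoff(upload_one_file, lambda e: isinstance(e, RateLimitError))`, where `upload_one_file` is your own upload function. The jitter keeps many parallel uploads from retrying at the same instant.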
This happens when I pass a stream in the call. There is no warning on developers.google.com, but there is one in their GitHub repository:
You can also upload media by specifying media.body as a Readable stream. This can allow you to upload very large files that cannot fit into memory.
Note: Your readable stream may be unstable. Use at your own risk.
Once I changed the code not to use streams, I started getting proper error messages, such as status code 403 for going over the rate limit.
I simply changed my code to use a plain buffer, which is read via fs.readFileSync before the call.
media: {
  mimeType: 'text/plain',
  body: buf
}

Google Drive multiple files download

We have a client-server architecture that uses Google Drive for sharing files between the client and the server, without having to actually send them.
The client uses the Google Drive API to get a list of file IDs of all files it wants to share with the server.
The server then downloads the files with the appropriate authorization token.
Server response time is crucial for user experience.
We tried a few approaches:
First, we used the webContentLink. This worked until we started receiving large files from the client. Instead of getting the file's content, we got an HTML page with the warning "exceeds the maximum size that Google can scan". We could not find a header we could use to skip this check.
Second, we switched to the Google API resource URL with the alt=media query param. This works, but we then hit API quota errors (User Rate Limit Exceeded). Since this is server code, all requests were attributed to a single user.
Then we added the quotaUser param to indicate which user each request is made on behalf of. We still got many 403 responses.
In addition, we implemented exponential backoff for the failed requests.
We also added a cache for the successful requests.
Our current solution is a combination of the two: we use the webContentLink whenever possible (which appears not to affect the Google API quota). If the response is not as expected (i.e. HTML, wrong size, etc.), we try the Google API resource URL (with exponential backoff).
(Most of the files are small enough to not exceed the scan size limit)
Both client and server use the same OAuth 2.0 client ID.
Here are my questions:
1. Is it possible to skip the virus scan, so that all files can be downloaded using the webContentLink?
2. Is the size threshold for the virus scan documented? Assuming we know the file size we can then save the round-trip of the first request (using the webContentLink)
3. Is there anything else we can do other than applying for a higher quota?
Is it possible to skip the virus scan, so that all files can be downloaded using the webContentLink?
If the file is larger than 25 MB, this is not possible with webContentLink, but since you are making authorized requests, use files.get with alt=media. Apply appropriate error handling (which you have done with exponential backoff). The next step is to check whether your code is optimized. If you have applied the recommended optimizations and still receive Error 403 Limit Exceeded, it is time to apply for a higher quota.
Is the size threshold for the virus scan documented? Assuming we know the file size we can then save the round-trip of the first request (using the webContentLink)
To answer this, you can refer to the Google Drive Help Forum thread "How can I successfully download large files from google drive without network errors at the most end of the download":
Only files smaller than 25 MB can be scanned for viruses.
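If the file size is known up front (for example from the file metadata the client already lists), the first round-trip can be skipped. This is a sketch under two assumptions: the 25 MB figure quoted above is the real cutoff, and it is read conservatively as 25,000,000 bytes. The function name is hypothetical.

```python
# Assumption: "25 MB" from the forum quote, taken as 25 * 10^6 bytes to stay on the safe side.
SCAN_LIMIT_BYTES = 25 * 1000 * 1000

def pick_download_url(file_id, size_bytes, web_content_link=None):
    """Use webContentLink for small files, otherwise the authorized alt=media URL."""
    if web_content_link and size_bytes is not None and size_bytes < SCAN_LIMIT_BYTES:
        return web_content_link
    # Large or unknown-size files: go straight to files.get with alt=media.
    return f"https://www.googleapis.com/drive/v3/files/{file_id}?alt=media"

print(pick_download_url("FILE_ID", 30_000_000, "https://example.invalid/link"))
```

The existing fallback (retrying through the API URL when the response is HTML or has the wrong size) is still worth keeping, since the threshold itself is not confirmed by official API documentation here.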
Is there anything else we can do other than applying for a higher quota?
You can do the following before applying for a higher quota:
Performance Tips
Drive Platform Best Practices
Handling API Errors
Once all optimizations are done, the only remaining option is to apply for a higher quota limit.
Hope this helps!

Add analytics to a desktop application

I have developed a desktop application using HTML5 and node-webkit.
I would like to track parts of the app, such as how long it is used, clicks, etc.
I would like the analytics system to work both on and offline (storing data until its on-line).
Is there anything that I could use to do this?
The Google Measurement Protocol allows you to track anything that can send an HTTP request. You need to generate a unique client ID to group pageviews into sessions (this part is usually done by the JavaScript tracker, which does not help you here) and can then choose between various interaction types and add their related data as parameters in a request to the Google Analytics server.
As for offline capabilities, there is a "queue time" parameter that allows you to send delayed calls to GA. However, per the documentation that delay is 4 hours at most (it is intended for smartphones and tablets that temporarily lose connection rather than for working permanently offline).
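A rough sketch of how an offline queue could use the queue time parameter. The parameter names (`v`, `tid`, `cid`, `t`, `ec`, `ea`, `qt`) are those of the Universal Analytics Measurement Protocol; the class itself, the in-memory storage and the tracking ID are illustrative assumptions, and a real app would persist the queue to disk.

```python
import time
from urllib.parse import urlencode

COLLECT_URL = "https://www.google-analytics.com/collect"  # hits are POSTed here
MAX_QUEUE_TIME_MS = 4 * 60 * 60 * 1000  # the 4-hour limit mentioned above

class OfflineHitQueue:
    """Hypothetical helper: records events while offline, builds hit payloads later."""

    def __init__(self, tracking_id, client_id, clock=time.time):
        self._tid = tracking_id
        self._cid = client_id
        self._clock = clock
        self._hits = []  # (recorded_at_seconds, hit_params)

    def record_event(self, category, action):
        self._hits.append((self._clock(), {"t": "event", "ec": category, "ea": action}))

    def drain(self):
        """Return one URL-encoded body per hit still inside the 4-hour window."""
        now = self._clock()
        bodies = []
        for recorded_at, params in self._hits:
            queue_ms = int((now - recorded_at) * 1000)
            if queue_ms > MAX_QUEUE_TIME_MS:
                continue  # too old to be sent with queue time; dropped here
            bodies.append(urlencode({"v": 1, "tid": self._tid, "cid": self._cid,
                                     **params, "qt": queue_ms}))
        self._hits.clear()
        return bodies
```

Each returned body would be sent as an HTTP POST to `COLLECT_URL` once the app is back online. Hits older than four hours are simply discarded in this sketch; logging them to your own server instead is the alternative described next.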
In the end it depends what data you need - you might just as well send calls to your own server and log them in a csv file and feed that to Klipfolio or some other dashboard solution (or even use Excel if you expect a low data volume).