Q: Google Drive API - Getting modified files - google-drive-api

When requesting a list of files modified since a certain time, does anyone know how long it takes for the response to show files after they have been modified?
I modified a file a little after 12:30am on 6/24 (which was a few minutes ago). If I request a list of files that have been modified since 9:30pm on the previous day, the file shows up:
REQUEST (modifiedTime > "2017-06-23T21:30:00.000Z")
GET https://www.googleapis.com/drive/v3/files?corpora=teamDrive&includeTeamDriveItems=true&orderBy=modifiedTime+desc&q=(trashed+!%3D+true)+AND+(NOT+(mimeType+contains+%22.folder%22))+AND+(modifiedTime+%3E+%222017-06-23T21%3A30%3A00.000Z%22)&supportsTeamDrives=true&teamDriveId=0AF36YeSWsu3dUk9PVA&fields=files(name%2Cid%2CfileExtension%2CmimeType%2CcreatedTime%2CmodifiedTime%2Csize%2CimageMediaMetadata(height%2Cwidth)%2Cparents%2CwebContentLink%2CheadRevisionId)&key={YOUR_API_KEY}
RESPONSE
{
"files": [
{
"id": "1gc9ooedN1YNQkMHqFuI-keekHvuN9h57ssz8Dn8cpU0",
"name": "2017 Men's NCAA Wrap-Up",
"mimeType": "application/vnd.google-apps.spreadsheet",
"parents": [
"0B4jAnSzS-VxlLVpBQ21KMjVMSE0"
],
"createdTime": "2017-06-16T12:38:55.364Z",
"modifiedTime": "2017-06-24T00:31:46.251Z"
}
]
}
If I request a list of files that have been updated since 11:30pm on the previous day, it does not:
REQUEST (modifiedTime > "2017-06-23T23:30:00.000Z")
GET https://www.googleapis.com/drive/v3/files?corpora=teamDrive&includeTeamDriveItems=true&orderBy=modifiedTime+desc&q=(trashed+!%3D+true)+AND+(NOT+(mimeType+contains+%22.folder%22))+AND+(modifiedTime+%3E+%222017-06-23T23%3A30%3A00.000Z%22)&supportsTeamDrives=true&teamDriveId=0AF36YeSWsu3dUk9PVA&fields=files(name%2Cid%2CfileExtension%2CmimeType%2CcreatedTime%2CmodifiedTime%2Csize%2CimageMediaMetadata(height%2Cwidth)%2Cparents%2CwebContentLink%2CheadRevisionId)&key={YOUR_API_KEY}
RESPONSE
{
"files": [
]
}
Eventually the file will show up in the list, but it does not seem to be a matter of minutes (I stopped clicking refresh after 5 minutes). If I walk away for an hour or two, it shows up in the list. Interestingly enough, the modifiedTime on the file is immediately correct if the file is returned in the response (see the first response above). Is this a bug or should I expect to have to wait a certain period of time (and if so, how long) before the query returns the right results?

The answer I've found is that the time seems to vary. I have switched to using the drive.changes.list method instead of the drive.files.list method with the "q" parameter. Not only do changes appear sooner in the changes list, but you can actually see how long it was between the file "modifiedTime" and the change "time". I have seen it range from seconds up to 10-15 minutes.
The other observation I had was that if I close the file in the browser, the change immediately appears in the changes list. I guess Google auto saves the document at particular times. I can't find a way to force a save while the document is open. An explicit File | Save might be nice to have, but closing the window seems to do the trick.
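To make the comparison between the file "modifiedTime" and the change "time" concrete, here is a minimal sketch that computes the lag for each entry of a changes.list response body. It assumes the request asked for fields=changes(fileId,time,removed,file(name,modifiedTime)); the function name changeLags and the sample values are made up for illustration.

```javascript
// Compute how long each change took to show up in the changes list.
// Input: the parsed JSON body of a drive.changes.list response (v3 shape).
function changeLags(changesResponse) {
  const changes = changesResponse.changes || [];
  return changes
    // Skip removals and entries that lack the two timestamps we compare.
    .filter((c) => !c.removed && c.file && c.file.modifiedTime && c.time)
    .map((c) => ({
      fileId: c.fileId,
      name: c.file.name,
      // Seconds between the file's modifiedTime and the change's time.
      lagSeconds: (Date.parse(c.time) - Date.parse(c.file.modifiedTime)) / 1000,
    }));
}

// Example with made-up values (a change recorded 10 minutes after the edit).
const sampleChanges = {
  changes: [
    {
      fileId: "abc123",
      time: "2017-06-24T00:41:46.251Z",
      removed: false,
      file: { name: "Example sheet", modifiedTime: "2017-06-24T00:31:46.251Z" },
    },
  ],
};
console.log(changeLags(sampleChanges));
```

Logging these lags over a day or so should give a rough idea of how long a poller has to wait in practice.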

Related

Google Drive Rest API - How to check if file has changed

Is there a reliable way, short of comparing full contents, of checking if a file was updated/changed in Drive?
I have been struggling with this for a bit. Here are the two things I have tried:
1. File version number
I upload a plain text file to Google Drive (simple upload, update endpoint), and save the version from the file metadata returned after a successful upload.
Then I poll the Drive API (get endpoint) occasionally to check if the version has changed.
The trouble is that within a second or two of uploading the file, the version gets bumped up again.
There are no changes to the file content. The file has not been opened, viewed, or even downloaded anywhere else. Still, the version number increases from what it was after the upload.
To my code this version number change indicates that the remote file has been changed in Drive, so it downloads the new version. Every time!
2. The Changes endpoints
As an alternative I tried using the Changes api.
After I upload the file, I get a page token using changes.getStartPageToken or changes.list.
Later I use this page token to poll the Changes API for changes, and filter the changes for the fileId of uploaded file. I use these options when polling for changes:
{
"includeRemoved": false,
"restrictToMyDrive": true,
"spaces": "drive"
}
Here again, there is the same problem as with the version number. The page token returned immediately after uploading the file changes again within a second or two. The new page token shows the uploaded file having been changed.
Again, there is no change to the content of the file. It hasn't been opened, updated, downloaded anywhere else. It isn't shared with anyone else.
Yet, a few seconds after uploading, the file reappears in the changes list.
As a result, the local code redownloads the file from Drive, assuming remote changes.
Possible workaround
As a hacky hook, I could wait a few seconds after the file upload before getting the new file-version/changes-page-token. This may take care of the delayed version increment issue.
However, there is no documentation of what is causing this phantom change in version number (or changes.list). So, I have no sure way of knowing:
How long a wait is safe enough to get a 'settled' version number without losing possible changes by other users/apps?
Whether the new (delayed) version number will be stable, or may change again at any time for no reason?
Is there a reliable way, short of comparing full contents, of checking if a file was updated/changed in Drive?
You can try using the md5Checksum property of the File resource object, if your file is not a Google Docs file (i.e. it is a binary file). You should be able to use that to track changes to the contents of your binary files.
You might also be able to use the Revisions API.
The Revisions resource object also has a md5Checksum property.
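As a rough sketch of the md5Checksum idea: keep the checksum you saw after your own upload, then compare it with the checksum in freshly fetched file metadata (for example from files.get with fields=id,md5Checksum). The function name contentChanged is hypothetical, and the null result covers files that carry no md5Checksum, such as Google Docs editor files.

```javascript
// Decide whether a file's content changed by comparing md5 checksums.
// Returns true or false when a comparison is possible, or null when the
// metadata has no md5Checksum and the checksum cannot answer the question.
function contentChanged(knownMd5, fileMetadata) {
  if (!fileMetadata || typeof fileMetadata.md5Checksum !== "string") {
    return null; // fall back to another signal in this case
  }
  return fileMetadata.md5Checksum !== knownMd5;
}

// Example with made-up checksums.
console.log(contentChanged("aaa111", { id: "f1", md5Checksum: "aaa111" })); // false
console.log(contentChanged("aaa111", { id: "f1", md5Checksum: "bbb222" })); // true
```

With this check, a bumped version number alone would not trigger a download; only a different checksum would.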
As a workaround, how about using Drive Activity API? I think that there are several answers for your situation. So please think of this as just one of them.
When Drive Activity API is used, the activity information about the target file can be retrieved. For example, from ActionDetail, you can see whether the target file was edited, renamed, deleted and so on.
The sample endpoint and request body are as follows.
Endpoint:
POST https://driveactivity.googleapis.com/v2/activity:query?fields=activities%2CnextPageToken
Request body:
{"itemName": "items/### fileId of target file ###"}
Response:
Sample response is as follows. You can see the information from this. The file with the fileId and filename was edited at the timestamp.
{
"activities": [
{
"primaryActionDetail": {
"edit": {} <--- If the target file was edited, this property is added.
},
"actors": [
{
"user": {
"knownUser": {
"personName": "people/### userId who edited the target file ###",
"isCurrentUser": true
}
}
}
],
"actions": [
{
"detail": {
"edit": {}
}
}
],
"targets": [
{
"driveItem": {
"name": "items/### fileId of target file ###",
"title": "### filename of target file ###",
"file": {},
"mimeType": "### mimeType of target file ###",
"owner": {
"user": {
"knownUser": {
"personName": "people/### owner's userId ###",
"isCurrentUser": true
}
}
}
}
}
],
"timestamp": "2000-01-01T00:00:00.000Z"
}
],
"nextPageToken": "###"
}
Note:
When you use this API in your environment, please enable Drive Activity API at API console and include https://www.googleapis.com/auth/drive.activity.readonly in the scopes.
When I used this API, the response felt fast. If it is slow when you use it, I apologize.
References:
Google Drive Activity API
ActionDetail
If this is not what you wanted, I apologize.
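For reference, here is a minimal sketch of how the sample response above could be reduced to a list of edit timestamps for one file. It only relies on the fields shown in that sample (primaryActionDetail, targets[].driveItem.name, timestamp); the function name editTimestamps is an assumption, and activities that do not carry a single timestamp are simply skipped.

```javascript
// Collect the timestamps of "edit" activities that target a given file.
// Input: the parsed JSON body of an activity:query response.
function editTimestamps(activityResponse, fileId) {
  const targetName = "items/" + fileId;
  return (activityResponse.activities || [])
    // Keep activities whose primary action is an edit.
    .filter((a) => a.primaryActionDetail && "edit" in a.primaryActionDetail)
    // Keep activities that name the file among their targets.
    .filter((a) =>
      (a.targets || []).some((t) => t.driveItem && t.driveItem.name === targetName)
    )
    // Skip activities that do not carry a single timestamp.
    .filter((a) => typeof a.timestamp === "string")
    .map((a) => a.timestamp);
}
```

Comparing the newest returned timestamp with the time of your own upload is one way to tell a real edit from a metadata-only change, under the assumption that phantom version bumps do not show up as edit activities.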
What you are seeing is the eventual consistency feature of the Google Drive filesystem. If you think about search, it doesn't matter how quickly a search index is updated, only that it is eventually updated and is very efficient for reading. Google Drive works on the same premise.
Drive acknowledges your updates as quickly as possible, long before those updates have propagated to all worldwide copies of your file. Derived data (e.g. timestamps and, if I recall correctly, md5sums) are also calculated after the update has "completed".
The solution largely depends on how problematic the redundant syncs are to your app.
The delay of a few seconds is enough to deal with the vast majority of phantom updates.
You could switch to the v2 API and use etags.
You could implement your own version number using custom properties. So every time you sync up, you increment your own version number. You only sync down if the application version number has changed.
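The custom version number idea could look roughly like this. The sketch assumes the v3 appProperties map (string keys and string values, private to your app) as the place to store the counter; the key name syncVersion and the helper names are hypothetical.

```javascript
// Hypothetical property name used to store the application's own version.
const VERSION_KEY = "syncVersion";

// Read the application version from file metadata (0 when it is absent).
function remoteVersion(fileMetadata) {
  const props = (fileMetadata && fileMetadata.appProperties) || {};
  const n = parseInt(props[VERSION_KEY], 10);
  return Number.isNaN(n) ? 0 : n;
}

// Sync down only when the application version differs from the local one.
// Drive's own "version" field is ignored, so phantom bumps do not matter.
function shouldSyncDown(localVersion, fileMetadata) {
  return remoteVersion(fileMetadata) !== localVersion;
}

// Metadata to send with each sync up so that the counter advances.
function nextVersionPatch(fileMetadata) {
  return {
    appProperties: { [VERSION_KEY]: String(remoteVersion(fileMetadata) + 1) },
  };
}
```

The trade-off is that only changes made by your own app bump the counter, so edits made by other apps or in the Drive UI would need a different signal.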

Forge conversion to obj only returning svf

I'm following the step-by-step instructions in the Extract Geometry tutorial, and everything seems to work fine, except that when I check the manifest after posting the job, it always returns the manifest for the initial conversion to SVF.
The tutorial specifically states that you must convert to SVF first. This takes a few seconds to a few minutes, starting at 0% and going until 100%. I await completion, and when I post the second job with the following payload (verifying that the payload is as requested)
let objPayload = {
    "input": {
        "urn": job.urn // urn retrieved from the file upload / svf conversion
    },
    "output": {
        "formats": [
        {
            "type": "obj",
            "advanced": {
                "modelGuid": metaData[0].guid,
                "objectIds": [-1]
            }
        }]
    }
}
(where metaData[0].guid is the guid returned by Step 1's call to /modelderivative/v2/designdata/${urn}/metadata), the job actually starts at about 99%. It sometimes takes a few moments to complete, but when it does, the call to retrieve the manifest returns the previous manifest, where the output format is marked as "svf".
The POST Job page states that
Derivatives are stored in a manifest that is updated each time this endpoint is used on a source file.
So I would expect the returned manifest to be updated to include the requested 'obj'. But it is not.
What am I missing here?
As Cyrille pointed out, the translate job only works consistently when translating to SVF. If translating to OBJ, you can only do so from specific formats, listed in this table.
At the time of this writing, if you request a job outside that table (eg IFC->OBJ), it will still accept your job, and simply not do it. So if you're following the "Extract Geometry" tutorial, when you request the manifest, it is still pointing to the original SVF translation.
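Because an unsupported job is accepted and then silently ignored, it may help to check the manifest for an OBJ derivative explicitly instead of relying on the overall progress. The sketch below assumes the manifest body has a derivatives array whose entries carry outputType and status fields; the helper names are made up.

```javascript
// Find the derivative entry for a given output type in a manifest body.
function findDerivative(manifest, outputType) {
  const derivatives = (manifest && manifest.derivatives) || [];
  return derivatives.find((d) => d.outputType === outputType) || null;
}

// Report the OBJ translation status, or "missing" when the manifest only
// describes other outputs (for example the original SVF translation).
function objStatus(manifest) {
  const obj = findDerivative(manifest, "obj");
  return obj ? obj.status : "missing";
}

// Example with a made-up manifest that only contains the SVF derivative.
const svfOnlyManifest = {
  status: "success",
  progress: "complete",
  derivatives: [{ outputType: "svf", status: "success" }],
};
console.log(objStatus(svfOnlyManifest)); // "missing"
```

If the status stays "missing" after the job reports completion, the source format is probably outside the supported table.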

Drive API files.list returning nextPageToken with empty item results

In the last week or so we got a report of a user missing files in the file list in our app. We were a bit confused at first because they said they only had a couple of files that matched our query string, but with a bit of work we were able to reproduce their issue by adding a large number of files to our Google Drive. Previously we had been assuming people would have fewer than 100 files and hadn't been doing paging, to avoid multiple files.list requests.
After switching to paging, we noticed that one of our test accounts was sending hundreds and hundreds of files.list requests, and most of the responses did not contain any files but did contain a nextPageToken. I'll update as soon as I can get a screenshot, but the client was sending enough requests to heat the computer up and drain the battery fairly quickly.
We also found that the form of the query, even when it matches the same files, can have a drastic effect on the number of requests needed to retrieve our full file list. For example, switching '=' to 'contains' in the query param significantly reduces the number of requests made, but we don't see any guarantee that this is a reasonable and generalizable solution.
Is this the intended behavior? Is there anything we can do to reduce the number of requests that we are sending?
We're using the following code to retrieve files created by our app that is causing the issue.
runLoad: function(pageToken)
{
    // Request one page of files created by our app.
    gapi.client.drive.files.list(
    {
        'maxResults': 999,
        'pageToken': pageToken,
        'q': "trashed=false and mimeType='" + mime + "'"
    }).execute(function (results)
    {
        this.filePageRequests++;
        // Stop on error, on the last page, or once the request cap is reached.
        if (results.error || !results.nextPageToken || this.filePageRequests >= MAX_FILE_PAGE_REQUESTS)
        {
            this.isLoading(false);
        }
        else
        {
            // Otherwise keep following the next page token.
            this.runLoad(results.nextPageToken);
        }
    }.bind(this));
}
It is, but probably shouldn't be, the correct behaviour.
It generally occurs when using the drive.file scope. What (I think) is happening is that the API layer is fetching all files, and then removing those that are outside of the current scope/query, and returning the remainder to your client app. In theory, a particular page of files could have no files in-scope, and so the returned array is empty.
As you've seen, it's a horribly inefficient way of doing it, but that seems to be the way it is. You simply have to keep following the next page link until it's null.
As to "Is there anything we can do to reduce the number of requests that we are sending?"
You're already setting max results to 999 which is the obvious step. Just be aware that I have seen this value trigger internal errors (timeouts?) which manifest themselves as 500 errors. You might want to sacrifice efficiency for reliability and stick to the default of 100 which seems to be better tested.
I don't know if the code you posted is your actual code, or just a simplified illustration, but you need to make sure you are dealing with 401 errors (auth expiry) and 500 errors (sometimes recoverable with a retry)
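A minimal sketch of "keep following the next page link until it's null", with a bounded retry for 5xx errors. The page fetcher is injected, so fetchPage, the options, and the err.status convention are all assumptions for illustration; auth expiry (401) is not retried here and is left to the caller.

```javascript
// Follow nextPageToken until it is absent, collecting files along the way.
// fetchPage(pageToken) is a hypothetical function supplied by the caller;
// it resolves to one files.list response body, or rejects with an error
// that has a numeric `status` property.
async function listAllFiles(fetchPage, options = {}) {
  const maxRetries = options.maxRetries ?? 3;
  const baseDelayMs = options.baseDelayMs ?? 500;
  const files = [];
  let pageToken;
  let requests = 0;
  do {
    let page;
    for (let attempt = 0; ; attempt++) {
      try {
        page = await fetchPage(pageToken);
        break;
      } catch (err) {
        // Retry only server-side errors, with exponential backoff.
        const retryable = err && err.status >= 500 && err.status < 600;
        if (!retryable || attempt >= maxRetries) throw err;
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
    requests++;
    // v2 responses list files under `items`, v3 responses under `files`.
    // A page may legitimately be empty and still carry a nextPageToken.
    files.push(...(page.items || page.files || []));
    pageToken = page.nextPageToken;
  } while (pageToken);
  return { files, requests };
}
```

In the question's code, fetchPage would wrap the gapi.client.drive.files.list call in a Promise. Note that the cap on page requests in that code would end the loop early on accounts that return many empty pages.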

Can using the Documents List API cause files to appear on the change list?

My application is currently using the Documents List API to track file and metadata changes using the Changelist. When we find a file has changed, we grab the metadata, the acl information, and the actual file. Lately we've found that a percentage of files continually show up in the changelist every time we check.
After a bit of investigating, there is very little metadata that is changing in the file.
Here are examples from two different files that continually show up in the changelists.
Is there any way I can avoid seeing these files over and over again? I have partially optimized to not download the files again, but it is still taking quite a bit of extra overhead to weed out false positives from the changelist. Does anyone know if updating my app to use the Drive API will fix this issue?
Here is an example of what I'm seeing:
File 1 - Through the Documents List API
Initial Info
entry:etag=\""CkcaSU1LASt7ImBk"\"
id:...feeds/id/spreadsheet%3A0AgVqS9FfzZOCdGhZSVZ4UEtyT2tmRnZsR3lGNFBrVWc
published:2010-12-13T01:58:22.467Z
updated:2010-12-13T02:03:22.269Z
...
link:rel=\"thumbnail\" type=\"image/jpeg\" href=...?id=0AgVqS9FfzZOCdGhZSVZ4UEtyT2tmRnZsR3lGNFBrVWc&v=1&s=AMedNnoAAAAAUQHGlnP_b5jppjlFLN9OHRY5VSP2KZNR&sz=s220\"
...
/entry
Next Time I looked at the changelist
entry etag=\""CkUFR0sIQyt7ImBk"\"
id:...feeds/id/spreadsheet%3A0AgVqS9FfzZOCdGhZSVZ4UEtyT2tmRnZsR3lGNFBrVWc
published:2010-12-13T01:58:22.467Z
updated:2010-12-13T02:03:22.269Z
...
link:rel=\"thumbnail\" type=\"image/jpeg\" href=\"...?id=0AgVqS9FfzZOCdGhZSVZ4UEtyT2tmRnZsR3lGNFBrVWc&v=1&s=AMedNnoAAAAAUQMH4STQC7QSN1CJivPIl0U5KvMD8eKe&sz=s220\"
...
/entry
The only differences are the etag, updated time, and thumbnail image. The file itself did not change at all.
File 2 - This info I grabbed using the APIs explorer (using the DriveAPI 2 changes.get)
{
"kind": "drive#change",
"id": "21012",
"fileId": "0AgVqS9FfzZOCdGQyQUNjWkF0alVpNGd0WXNLMnpNU2c",
...
"thumbnailLink": ".../feeds/vt?gd=true&id=0AgVqS9FfzZOCdGQyQUNjWkF0alVpNGd0WXNLMnpNU2c&v=1&s=AMedNnoAAAAAUQlhSo3rF73K5WnN7E0qSR0uMhWEqM-t&sz=s220",
...
}
Ran through grabbing changes from the Documents List API, then checked the changelist again.
{
"kind": "drive#change",
"id": "21013",
"fileId": "0AgVqS9FfzZOCdGQyQUNjWkF0alVpNGd0WXNLMnpNU2c",
...
"thumbnailLink": ".../feeds/vt?gd=true&id=0AgVqS9FfzZOCdGQyQUNjWkF0alVpNGd0WXNLMnpNU2c&v=1&s=AMedNnoAAAAAUQlh69m8ZG_MzNujmmu80HN9XJ2jpG61&sz=s220",
...
}
In this case, the thumbnail link had again changed, and there was no longer a change with id 21012.
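One way to weed out these false positives cheaply is to compare metadata snapshots while ignoring the fields that churn. This is only a sketch: the list of volatile fields (etag and thumbnailLink) is taken from the observations above, and the function names are made up.

```javascript
// Fields observed to change without any real change to the file.
const VOLATILE_FIELDS = new Set(["etag", "thumbnailLink"]);

// Serialize a metadata snapshot without the volatile fields.
// Top-level keys are sorted so that key order does not affect the result;
// nested objects are serialized in the order they arrive.
function stableFingerprint(metadata) {
  const stable = {};
  for (const key of Object.keys(metadata).sort()) {
    if (!VOLATILE_FIELDS.has(key)) stable[key] = metadata[key];
  }
  return JSON.stringify(stable);
}

// True when two snapshots differ only in volatile fields.
function isFalsePositive(previous, current) {
  return stableFingerprint(previous) === stableFingerprint(current);
}
```

Storing the fingerprint from the last sync and skipping entries for which isFalsePositive returns true would avoid refetching the ACL and content for files that only got a fresh thumbnail link.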

Box API 2.0 Uploading files with conflict name returns 200

After uploading a file whose name conflicts with an existing one, the server still responds with HTTP status code 201 Created. I had to parse the response body to know whether the file was really created or not. It seems to me that I should be able to know the result of the operation just from the status code. So I am wondering if this is intended behavior.
The following is the response I get
{
"total_count":1,
"entries":[
{
"type":"error",
"status":409,
"code":"item_name_in_use",
"context_info":{
"conflicts":[
{
"type":"file",
"id":"2990420477",
"sequence_id":"0",
"etag":"1f64ca909178de30bc682a4ca2d14444719cf9a2",
"name":"Extensions.pdf"
}
]
},
"help_url":"http:\/\/developers.box.com\/docs\/#errors",
"message":"Item with the same name already exists",
"request_id":"1389504407503c7c1e8183c"
}
]
}
We are in the process of changing this from a 200 to a 202. Later this week (or possibly tonight) we'll roll out a change to make upload statuses be 202's, to show that the upload request has been accepted. I'll post a bit more on our blog to explain more details.
The basic logic is that uploads can be sent in bulk, and the API call has to return you an array of upload statuses (stati?). If you only upload one file, you'll get an array of 1, and you'll have to dig into the array to see if you were successful or not. If you upload a group of files, then you'll be digging into the array to find out the status of each file.
You might ask: Why not collapse the status when there is only one file? Our thought there is that you'd have to implement 2 different code paths to deal with single vs bulk-upload, and it would be easier to just write the code once to handle uploads either way.
Hope that helps. Let us know if you see unexpected behavior after we flip the error code over from the 200 to the 202.
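Since the status code only says the request was accepted, a single code path that digs into the entries array might look like the sketch below. It relies only on the response shape shown in the question (entries with "type": "error" for failures); the function name splitUploadResults is hypothetical.

```javascript
// Split an upload response body into uploaded items and per-file errors.
// Works the same way for a single upload and for a bulk upload.
function splitUploadResults(responseBody) {
  const entries = (responseBody && responseBody.entries) || [];
  const failed = entries.filter((e) => e.type === "error");
  const uploaded = entries.filter((e) => e.type !== "error");
  return {
    uploaded,
    failed,
    // True only when there is at least one entry and none of them failed.
    allSucceeded: entries.length > 0 && failed.length === 0,
  };
}

// Example using the conflict response from the question (abbreviated).
const conflictResponse = {
  total_count: 1,
  entries: [{ type: "error", status: 409, code: "item_name_in_use" }],
};
const summary = splitUploadResults(conflictResponse);
console.log(summary.allSucceeded, summary.failed.map((e) => e.code));
```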