Can using the Documents List API cause files to appear on the change list? - google-drive-api

My application currently uses the Documents List API to track file and metadata changes through the changelist. When we find that a file has changed, we grab the metadata, the ACL information, and the actual file. Lately we've found that a percentage of files continually show up in the changelist every time we check.
After a bit of investigating, I found that very little metadata is actually changing in these files.
Here are examples from two different files that continually show up in the changelists.
Is there any way I can avoid seeing these files over and over again? I have partially optimized to not download the files again, but it still takes quite a bit of extra overhead to weed out false positives from the changelist. Does anyone know if updating my app to use the Drive API will fix this issue?
Here is an example of what I'm seeing:
File 1 - Through the Documents List API
Initial Info
entry:etag=\""CkcaSU1LASt7ImBk"\"
id:...feeds/id/spreadsheet%3A0AgVqS9FfzZOCdGhZSVZ4UEtyT2tmRnZsR3lGNFBrVWc
published:2010-12-13T01:58:22.467Z
updated:2010-12-13T02:03:22.269Z
...
link:rel=\"thumbnail\" type=\"image/jpeg\" href=...?id=0AgVqS9FfzZOCdGhZSVZ4UEtyT2tmRnZsR3lGNFBrVWc&v=1&s=AMedNnoAAAAAUQHGlnP_b5jppjlFLN9OHRY5VSP2KZNR&sz=s220\"
...
/entry
Next Time I looked at the changelist
entry etag=\""CkUFR0sIQyt7ImBk"\"
id:...feeds/id/spreadsheet%3A0AgVqS9FfzZOCdGhZSVZ4UEtyT2tmRnZsR3lGNFBrVWc
published:2010-12-13T01:58:22.467Z
updated:2010-12-13T02:03:22.269Z
...
link:rel=\"thumbnail\" type=\"image/jpeg\" href=\"...?id=0AgVqS9FfzZOCdGhZSVZ4UEtyT2tmRnZsR3lGNFBrVWc&v=1&s=AMedNnoAAAAAUQMH4STQC7QSN1CJivPIl0U5KvMD8eKe&sz=s220\"
...
/entry
The only differences are the etag, updated time, and thumbnail image. The file itself did not change at all.
File 2 - This info I grabbed using the APIs explorer (using the DriveAPI 2 changes.get)
{
"kind": "drive#change",
"id": "21012",
"fileId": "0AgVqS9FfzZOCdGQyQUNjWkF0alVpNGd0WXNLMnpNU2c",
...
"thumbnailLink": ".../feeds/vt?gd=true&id=0AgVqS9FfzZOCdGQyQUNjWkF0alVpNGd0WXNLMnpNU2c&v=1&s=AMedNnoAAAAAUQlhSo3rF73K5WnN7E0qSR0uMhWEqM-t&sz=s220",
...
}
Ran through grabbing changes from the Documents List API, then checked the changelist again.
{
"kind": "drive#change",
"id": "21013",
"fileId": "0AgVqS9FfzZOCdGQyQUNjWkF0alVpNGd0WXNLMnpNU2c",
...
"thumbnailLink": ".../feeds/vt?gd=true&id=0AgVqS9FfzZOCdGQyQUNjWkF0alVpNGd0WXNLMnpNU2c&v=1&s=AMedNnoAAAAAUQlh69m8ZG_MzNujmmu80HN9XJ2jpG61&sz=s220",
...
}
In this case, the thumbnail link had again changed, and there was no longer a change with id 21012.
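One way to weed out these false positives cheaply is to compare the previously stored metadata against the new entry while ignoring the fields that churn. This is only a minimal sketch: the set of volatile keys and the function names are my own assumptions based on the fields that differ in the examples above, not something either API defines.

```python
# Hypothetical set of fields treated as volatile. It is based only on the
# differences seen in the examples above; adjust it to what you observe.
VOLATILE_KEYS = {"etag", "thumbnailLink"}


def stable_view(metadata: dict) -> dict:
    """Return a copy of the metadata without the volatile fields."""
    return {k: v for k, v in metadata.items() if k not in VOLATILE_KEYS}


def is_real_change(previous: dict, current: dict) -> bool:
    """True only if something other than a volatile field differs."""
    return stable_view(previous) != stable_view(current)
```

With this, an entry whose only differences are the etag and the thumbnail link is skipped before any ACL lookup or download happens.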

Related

Google Drive Rest API - How to check if file has changed

Is there a reliable way, short of comparing full contents, of checking if a file was updated/changed in Drive?
I have been struggling with this for a bit. Here are the two things I have tried:
1. File version number
I upload a plain text file to Google Drive (simple upload, update endpoint), and save the version from the file metadata returned after a successful upload.
Then I poll the Drive API (get endpoint) occasionally to check if the version has changed.
The trouble is that within a second or two of uploading the file, the version gets bumped up again.
There are no changes to the file content. The file has not been opened, viewed, or even downloaded anywhere else. Still, the version number increases from what it was after the upload.
To my code this version number change indicates that the remote file has been changed in Drive, so it downloads the new version. Every time!
2. The Changes endpoints
As an alternative I tried using the Changes api.
After I upload the file, I get a page token using changes.getStartPageToken or changes.list.
Later I use this page token to poll the Changes API for changes, and filter the changes for the fileId of uploaded file. I use these options when polling for changes:
{
"includeRemoved": false,
"restrictToMyDrive": true,
"spaces": "drive"
}
Here again, there is the same problem as with the version number. The page token returned immediately after uploading the file changes again within a second or two. The new page token shows the uploaded file having been changed.
Again, there is no change to the content of the file. It hasn't been opened, updated, downloaded anywhere else. It isn't shared with anyone else.
Yet, a few seconds after uploading, the file reappears in the changes list.
As a result, the local code redownloads the file from Drive, assuming remote changes.
Possible workaround
As a hacky hook, I could wait a few seconds after the file upload before getting the new file-version/changes-page-token. This may take care of the delayed version increment issue.
However, there is no documentation of what is causing this phantom change in version number (or changes.list). So, I have no sure way of knowing:
How long a wait is safe enough to get a 'settled' version number without losing possible changes by other users/apps?
Whether the new (delayed) version number will be stable, or may change again at any time for no reason?
Is there a reliable way, short of comparing full contents, of checking if a file was updated/changed in Drive?
You can try using the md5Checksum property of the File resource object if your file is not a Google Docs file (i.e. it is a binary file). You should be able to use that to track changes to the contents of your binary files.
You might also be able to use the Revisions API.
The Revisions resource object also has an md5Checksum property.
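As a rough sketch of the md5Checksum idea, you could hash the local copy and compare it with the checksum string returned in the file metadata. The function names and the assumption that you already have the remote md5Checksum value in hand are mine; fetching it from Drive is left out.

```python
import hashlib


def local_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def content_changed(path: str, remote_md5: str) -> bool:
    """True if the local content differs from the remote md5Checksum value."""
    return local_md5(path) != remote_md5.lower()
```

A version bump that leaves content_changed() returning False can then be treated as metadata-only, so no download is needed.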
As a workaround, how about using Drive Activity API? I think that there are several answers for your situation. So please think of this as just one of them.
When Drive Activity API is used, the activity information about the target file can be retrieved. For example, from ActionDetail, you can see whether the target file was edited, renamed, deleted and so on.
The sample endpoint and request body are as follows.
Endpoint:
POST https://driveactivity.googleapis.com/v2/activity:query?fields=activities%2CnextPageToken
Request body:
{"itemName": "items/### fileId of target file ###"}
Response:
Sample response is as follows. You can see the information from this. The file with the fileId and filename was edited at the timestamp.
{
"activities": [
{
"primaryActionDetail": {
"edit": {} <--- If the target file was edited, this property is added.
},
"actors": [
{
"user": {
"knownUser": {
"personName": "people/### userId who edited the target file ###",
"isCurrentUser": true
}
}
}
],
"actions": [
{
"detail": {
"edit": {}
}
}
],
"targets": [
{
"driveItem": {
"name": "items/### fileId of target file ###",
"title": "### filename of target file ###",
"file": {},
"mimeType": "### mimeType of target file ###",
"owner": {
"user": {
"knownUser": {
"personName": "people/### owner's userId ###",
"isCurrentUser": true
}
}
}
}
}
],
"timestamp": "2000-01-01T00:00:00.000Z"
}
],
"nextPageToken": "###"
}
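Once the response has been parsed into a dictionary, picking out the edits could look like the sketch below. It relies only on the field names visible in the sample response above; the function name is hypothetical, and paging through nextPageToken is not handled.

```python
def edit_timestamps(response: dict) -> list:
    """Collect timestamps of activities whose primary action is an edit."""
    stamps = []
    for activity in response.get("activities", []):
        detail = activity.get("primaryActionDetail", {})
        if "edit" in detail:
            stamps.append(activity.get("timestamp"))
    return stamps
```

If the list is empty for the period since your last sync, no edit was reported for the target file.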
Note:
When you use this API in your environment, please enable the Drive Activity API in the API console and include https://www.googleapis.com/auth/drive.activity.readonly in the scopes.
When I used this API the response was fast, but I apologize if it turns out to be slow for you.
References:
Google Drive Activity API
ActionDetail
If this was not what you want, I apologize.
What you are seeing is the eventual consistency feature of the Google Drive filesystem. If you think about search, it doesn't matter how quickly a search index is updated, only that it is eventually updated and is very efficient for reading. Google Drive works on the same premise.
Drive acknowledges your updates as quickly as possible, long before those updates have propagated to all worldwide copies of your file. Derived data (e.g. timestamps and, if I recall correctly, md5sums) is also calculated after the update has "completed".
The solution largely depends on how problematic the redundant syncs are to your app.
A delay of a few seconds is enough to deal with the vast majority of phantom updates.
You could switch to the v2 API and use etags.
You could implement your own version number using custom properties. So every time you sync up, you increment your own version number. You only sync down if the application version number has changed.
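The custom version number option might be sketched like this. The property key "appVersion" and both function names are made up for illustration, and the code assumes you have already read the file's custom properties into a dictionary.

```python
# Hypothetical custom property key; any name your app owns would do.
VERSION_KEY = "appVersion"


def remote_version(properties: dict) -> int:
    """Read the app-maintained version from the file's custom properties."""
    return int(properties.get(VERSION_KEY, 0))


def should_sync_down(last_synced_version: int, properties: dict) -> bool:
    """Sync down only when the app-maintained version number has changed."""
    return remote_version(properties) != last_synced_version


def next_version(properties: dict) -> int:
    """Version number to write back on the next sync up."""
    return remote_version(properties) + 1
```

Because only your own application increments this number, the phantom version bumps made by Drive itself do not trigger a download.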

Why does my forge bucket not show any objects?

I have followed this tutorial and have uploaded my file successfully to: https://developer.api.autodesk.com/oss/v2/buckets/timmyisabucket/objects/audobon_arch.rvt
It has uploaded successfully and I can verify this by calling https://developer.api.autodesk.com/modelderivative/v2/designdata/dXJuOmFkc2sub2JqZWN0czpvcy5vYmplY3Q6dGltbXlpc2FidWNrZXQvYXVkb2Jvbl9hcmNoLnJ2dA==/metadata/c63a6682-a73c-a2a8-a08c-dfeee25781f4/properties which successfully returns all the object properties.
However, when I ask the api to list all the objects inside the bucket, it simply returns an empty list!
The endpoint I'm calling: https://developer.api.autodesk.com/oss/v2/buckets/timmyisabucket/objects
The response:
{
"items": []
}
Where am I going wrong?
Thanks
Just to close this off, the helpful comment by Xiaodong Liang led me to the fact that my bucket had been created with the incorrect type of "Transient", meaning that all my stuff gets deleted after 24 hours.
It should have been Temporary or Persistent.
Retention policy
Transient
Think of this type of storage as a cache. Use it for
ephemeral results. For example, you might use this for objects that
are part of producing other persistent artifacts, but otherwise are
not required to be available later.
Objects older than 24 hours are removed automatically. Each upload of
an object is considered unique, so, for example, if the same rendering
is uploaded multiple times, each of them will have its own retention
period of 24 hours.
Temporary
This type of storage is suitable for artifacts produced for
user-uploaded content where after some period of activity, the user
may rarely access the artifacts.
When an object has reached 30 days of age, it is deleted.
Persistent
Persistent storage is intended for user data. When a file
is uploaded, the owner should expect this item to be available for as
long as the owner account is active, or until he or she deletes the
item.
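To make the retention periods quoted above concrete, here is a small sketch that works out when an object would be removed. The lowercase policy names used as keys and the function name are assumptions for illustration; the durations come straight from the quoted text.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Durations taken from the retention policy text quoted above.
# None means the object is kept until the owner deletes it.
RETENTION = {
    "transient": timedelta(hours=24),
    "temporary": timedelta(days=30),
    "persistent": None,
}


def expires_at(policy: str, uploaded_at: datetime) -> Optional[datetime]:
    """Return the time an object is removed, or None if it is kept."""
    period = RETENTION[policy.lower()]
    return None if period is None else uploaded_at + period
```

For a transient bucket, an object uploaded at noon is gone by noon the next day, which matches the empty listing I was seeing.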

Q: Google Drive API - Getting modified files

When requesting a list of files modified since a certain time, does anyone know how long it takes for the response to show files after they have been modified?
I modified a file a little after 6/24 at 12:30am (which was a few minutes ago). If I request a list of files that have been modified since 8:35pm on the previous day, the file shows up:
REQUEST (modifiedTime > "2017-06-23T21:30:00.000Z")
GET https://www.googleapis.com/drive/v3/files?corpora=teamDrive&includeTeamDriveItems=true&orderBy=modifiedTime+desc&q=(trashed+!%3D+true)+AND+(NOT+(mimeType+contains+%22.folder%22))+AND+(modifiedTime+%3E+%222017-06-23T21%3A30%3A00.000Z%22)&supportsTeamDrives=true&teamDriveId=0AF36YeSWsu3dUk9PVA&fields=files(name%2Cid%2CfileExtension%2CmimeType%2CcreatedTime%2CmodifiedTime%2Csize%2CimageMediaMetadata(height%2Cwidth)%2Cparents%2CwebContentLink%2CheadRevisionId)&key={YOUR_API_KEY}
RESPONSE
{
"files": [
{
"id": "1gc9ooedN1YNQkMHqFuI-keekHvuN9h57ssz8Dn8cpU0",
"name": "2017 Men's NCAA Wrap-Up",
"mimeType": "application/vnd.google-apps.spreadsheet",
"parents": [
"0B4jAnSzS-VxlLVpBQ21KMjVMSE0"
],
"createdTime": "2017-06-16T12:38:55.364Z",
"modifiedTime": "2017-06-24T00:31:46.251Z"
}
]
}
If I request a list of files that have been updated since 11:30pm on the previous day, it does not:
REQUEST (modifiedTime > "2017-06-23T23:30:00.000Z")
GET https://www.googleapis.com/drive/v3/files?corpora=teamDrive&includeTeamDriveItems=true&orderBy=modifiedTime+desc&q=(trashed+!%3D+true)+AND+(NOT+(mimeType+contains+%22.folder%22))+AND+(modifiedTime+%3E+%222017-06-23T23%3A30%3A00.000Z%22)&supportsTeamDrives=true&teamDriveId=0AF36YeSWsu3dUk9PVA&fields=files(name%2Cid%2CfileExtension%2CmimeType%2CcreatedTime%2CmodifiedTime%2Csize%2CimageMediaMetadata(height%2Cwidth)%2Cparents%2CwebContentLink%2CheadRevisionId)&key={YOUR_API_KEY}
RESPONSE
{
"files": [
]
}
Eventually the file will show up in the list, but it does not seem to be a matter of minutes (I stopped clicking refresh after 5 minutes). If I walk away for an hour or two, it shows up in the list. Interestingly enough, the modifiedTime on the file is immediately correct if the file is returned in the response (see the first response above). Is this a bug or should I expect to have to wait a certain period of time (and if so, how long) before the query returns the right results?
The answer I've found is that the time seems to vary. I have switched to using the drive.changes.list method instead of the drive.files.list method with the "q" parameter. Not only do changes appear sooner in the changes list, but you can actually see how long it was between the file "modifiedTime" and the change "time". I have seen it range from seconds up to 10-15 minutes.
The other observation I had was that if I close the file in the browser, the change immediately appears in the changes list. I guess Google auto saves the document at particular times. I can't find a way to force a save while the document is open. An explicit File | Save might be nice to have, but closing the window seems to do the trick.
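To measure that lag yourself, you can subtract the file's modifiedTime from the change's time. This is a minimal sketch that assumes each change has been parsed into a dictionary with a "time" field and a nested "file" containing "modifiedTime" (so both must be requested in the fields parameter), and that timestamps carry fractional seconds like the ones in the responses above. The function names are my own.

```python
from datetime import datetime, timedelta

# Matches timestamps such as 2017-06-24T00:31:46.251Z
DRIVE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_drive_time(value: str) -> datetime:
    """Parse a UTC timestamp with fractional seconds and a trailing Z."""
    return datetime.strptime(value, DRIVE_TIME_FORMAT)


def change_lag(change: dict) -> timedelta:
    """Time between the file's modifiedTime and the change being reported."""
    modified = parse_drive_time(change["file"]["modifiedTime"])
    reported = parse_drive_time(change["time"])
    return reported - modified
```

Logging change_lag() for each change gives you a distribution of delays for your own account rather than relying on my numbers.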

Google Drive Objective C SDK error when asking for files list in root

I'm using the Google Drive Objective-C SDK to find (among other things) the list of files in the root directory of a user's GDrive. The vast majority of the time, this works beautifully. However, for some small percentage of users, I seem to hit some mysterious 500 error. I turned on logging and got this:
Response: status 200
Response headers:
<set of regular-looking headers>
Response body: (35 bytes)
{
"id" : "gtl_1",
"error" : {
"code" : 500
}
}
The error I get back in code from the SDK is:
Error Domain=com.google.GTLJSONRPCErrorDomain Code=500 "The operation couldn’t be completed. (com.google.GTLJSONRPCErrorDomain error 500.)" UserInfo=0xd8aa3c0 {GTLStructuredError=GTLErrorObject 0xb7abba0: {code:500}}
This seems to happen more with Google Apps for Education accounts than with regular Gmail accounts. When asking for files in the root directory, I use q='root' in parents, as Claudio suggests to do here. When I remove q='root' in parents from my query, everything seems to work fine (except for the fact that I get back a list of all files, instead of just the files in root).
Is this an issue with Google Apps for Education and root directories? Is there a way I can get around or fix this issue, while still only asking Google Drive only for the list of files that I actually want?
This issue may be related to this?

Box API 2.0 Uploading files with conflict name returns 200

After uploading a file with a name that conflicts with an existing one, the server still responds with HTTP status code 200. I had to parse the response body to know whether the file was really created or not. It seems to me that I should be able to know the result of the operation just from the status code, so I am wondering if this is intended behavior.
The following is the response I get
{
"total_count":1,
"entries":[
{
"type":"error",
"status":409,
"code":"item_name_in_use",
"context_info":{
"conflicts":[
{
"type":"file",
"id":"2990420477",
"sequence_id":"0",
"etag":"1f64ca909178de30bc682a4ca2d14444719cf9a2",
"name":"Extensions.pdf"
}
]
},
"help_url":"http:\/\/developers.box.com\/docs\/#errors",
"message":"Item with the same name already exists",
"request_id":"1389504407503c7c1e8183c"
}
]
}
We are in the process of changing this from a 200 to a 202. Later this week (or possibly tonight) we'll roll out a change to make upload statuses be 202's, to show that the upload request has been accepted. I'll post a bit more on our blog to explain more details.
The basic logic is that uploads can be sent in bulk, and the API call has to return you an array of upload statuses (stati?). If you only upload one file, you'll get an array of 1, and you'll have to dig into the array to see if you were successful or not. If you upload a group of files, then you'll be digging into the array to find out the status of each file.
You might ask: Why not collapse the status when there is only one file? Our thought there is that you'd have to implement 2 different code paths to deal with single vs bulk-upload, and it would be easier to just write the code once to handle uploads either way.
Hope that helps. Let us know if you see unexpected behavior after we flip the error code over from the 200 to the 202.
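For what it's worth, the same code path can handle single and bulk uploads by walking the entries array. The sketch below assumes the response body has already been parsed into a dictionary shaped like the one in the question; the function name is illustrative only.

```python
def split_upload_results(body: dict) -> tuple:
    """Separate upload entries into (succeeded, failed) lists."""
    succeeded, failed = [], []
    for entry in body.get("entries", []):
        if entry.get("type") == "error":
            failed.append(entry)
        else:
            succeeded.append(entry)
    return succeeded, failed
```

Each failed entry carries its own status and code (409 and item_name_in_use in the example above), so the per-file result is still available even though the HTTP status describes the request as a whole.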