Algorithm to process all files and folders across multiple runs - google-apps-script

I note the new DocsList token and get*ForPaging() options available now, but I am still struggling with an algorithm to process "all files and folders" in arbitrarily large file/folder trees.
Assume a Google Drive based file system with n files and folders. Getting through it with a Google Apps Script will take multiple runs against the 6-minute execution limit. Nightly I need to process all files older than 30 days in the tree of subfolders beneath a starting folder. I need to process each file only once (but my functions are idempotent, so I don't mind if I run against some files again).
I have my recursive algorithm working, but what I am missing is a way to keep a placeholder so that I don't have to start at the top of the folder tree each time I invoke the script. In six minutes I get through only a few hundred folders and a few thousand files.
My question is: what index can I store, and how do I start where I left off the next time through?
I have thought about storing tokens or the last completed folder path ("/mytop/sub4/subsub47/"), but how would that help me on another invocation? If I started there, the script would just work down the tree from that point and miss sibling and ancestor folders.
I have thought about the "find" methods and a "before:2012/10..." style search, but there is no way to limit that to files in my tree (it applies to a single folder only).
I am not pasting my code as it's just standard recursive getFolders/getFiles and not actually relevant to the core of the question.

I'd create an array of the folders that still have to be worked on and save it for a future run.
Since you said it's no problem to work on some files/folders repeatedly, you don't even need to add an artificial stop to your function; you can let it time out every time.
Something like this:
var folders = null;

// Call this once to start the process (or set the 'folders' property manually).
function start() {
  folders = ['id-of-the-starting-folder'];
  ScriptProperties.setProperty('folders', folders.join()); // persist the initial queue
  work();
}

// Set this function to run on a time-driven trigger.
function work() {
  if (folders == null)
    folders = ScriptProperties.getProperty('folders').split(',');
  while (folders.length > 0) {
    workOnFolder(folders[0]);
    folders.shift(); // remove the 1st element
    ScriptProperties.setProperty('folders', folders.join()); // save progress for the next run
  }
  // remove the trigger here
}

// Queue a folder to be processed on a later pass.
function doFolderLater(folder) {
  folders.push(folder.getId());
}

function workOnFolder(id) {
  var folder = DocsList.getFolderById(id);
  folder.getFolders().forEach(doFolderLater);
  folder.getFiles().forEach(workOnFile);
}

function workOnFile(file) {
  // do your thing
}
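For completeness, a minimal sketch of one way to wire work() to a time-driven trigger with ScriptApp; the 10-minute interval and the helper names are assumptions, not part of the answer above:

// Hypothetical helpers for managing the time-driven trigger (not from the answer).
function createWorkTrigger() {
  ScriptApp.newTrigger('work')
      .timeBased()
      .everyMinutes(10) // assumed interval; re-enters work() until the queue is drained
      .create();
}

function removeWorkTriggers() {
  // Call this where the answer says "remove the trigger here".
  ScriptApp.getProjectTriggers().forEach(function(trigger) {
    if (trigger.getHandlerFunction() === 'work') {
      ScriptApp.deleteTrigger(trigger);
    }
  });
}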

Related

Server error when DriveApp.getFolderById tries to get a (bigger) folder

I'm automating folder creation for both projects and customers. The customer folder has ~400 sub-folders and throws the same error: "Exception: a server error occurred, wait and try again later" (translated).
function test() {
  DriveApp.getFolderById("28letterFolderIdxxxxxxxxxxxx");
}
The Apps Script is unable to access the main folder; it fails before doing anything fancy. I've tried multiple accounts, including the owner.
I did migrate a few folders (many hours prior) but am now starting to suspect that it's the size (360+ subfolders) that is the problem.
The Drive API works fine (get folder).
The same function (copied/pasted) works in a new script file.
Can script files get corrupted? I'd rather not remake it, as it's a library in use.
Solution: the script had a connected GCP project where the Drive scope hadn't been added. It also needed to be reinstalled.
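As a side note, one way to make the required Drive scope explicit is to declare it in the project manifest (appsscript.json). This is only a hedged sketch of a minimal manifest, not the exact fix described above, and the timeZone value is just a placeholder:

{
  "timeZone": "Etc/UTC",
  "oauthScopes": [
    "https://www.googleapis.com/auth/drive"
  ]
}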

How to log the creation date of a Google Drive folder in a spreadsheet script?

This is my first Google Apps Script. The original plan was to do something way more complicated, but I got stuck at the very beginning.
Now, I just want to log the creation date of my folder, and I can't.
When I press Ctrl+Enter there's nothing to see in the Logs window.
Why is this happening? I am not trying to build a rocket here...
Here's my code:
fetchFiles();
function fetchFiles() {
  var folder = DriveApp.getFoldersByName("Aff Comm");
  Logger.log(folder.next().getDateCreated());
}
Your code, just tested in my script, works. But:
you need to save your script (extremely important): if the editor still shows the unsaved marker (the red asterisk) next to the file name, you have not saved your code
you need to select fetchFiles in the function dropdown menu (if you did not save, the function may be unavailable in the dropdown)
on the first run you must accept the permission request from Google
You may improve your code in this way:
function fetchFiles() {
  var folders = DriveApp.getFoldersByName("Aff Comm");
  while (folders.hasNext()) {
    var folder = folders.next();
    Logger.log(folder.getDateCreated());
  }
}

files.list() returns an incomplete list when searching with q="'FOLDER_ID' in parents"

I am trying to gather all of the files and folders that are descendants of a given folder.
To do this I use files.list() with q="'FOLDER_ID' in parents and trashed=false", where FOLDER_ID is the ID of the folder I am interested in. As I process the results, I keep track of all of the folders returned by the request and then repeat the files.list() call using the new folders in the q parameter. I combine multiple folders in one request by using or, and continue repeating this until no new folders are returned.
Example:
Initial Request: q="('FOLDER_ID' in parents) and trashed=false"
All Subsequent Requests: q="('FOLDER_ID_1' in parents or 'FOLDER_ID_2' in parents or 'FOLDER_ID_3' in parents ...) and trashed=false"
(For more information about how to create queries see Drive REST API - Search for Files)
Sometimes this returns all the folders it should, and other times some are left out. This doesn't happen if I remove the q parameter: every single file and folder is returned, none are missing.
After some testing/trial and error, I discovered that if I am not receiving all the folders I should be, sending a request with no q seems to "fix" the problem. The next time I run my application and it uses q, all the correct folders do get returned.
Other Information:
It is not a permissions issue; I am using drive.readonly.
It is not a pageSize issue as I have tried different values for this and get different results.
It is not a pageToken issue as I make sure to send a request again with the given nextPageToken when it exists.
I am running this on a folder that has a little under 4,000 descendant folders in it and a little under 25,000 descendant files in it.
I feel like this must be a bug related to using multiple folders in the q parameter in a single request, considering that I can perform the exact same process and will get different results seemingly randomly.
I suggest you abandon the approach you've taken. Making so many calls to Drive will take forever and possibly give you quota problems.
It's much simpler to fetch all the folders in a single query, build an in-memory hierarchy of the folder IDs you're interested in, and then run a second set of queries to fetch files with those parents (a sketch of this approach follows below).
Alternatively, if these files are being created by an application, make them all children of a common dummy parent folder that you can query against.
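A hedged sketch of the "fetch all folders once, then build the hierarchy in memory" approach, written as Apps Script against the Drive advanced service (v2 field names assumed; rootId is a placeholder for the top folder):

// Assumes the Drive advanced service (v2) is enabled for the script project.
function listAllFolders() {
  var folders = [];
  var pageToken;
  do {
    var res = Drive.Files.list({
      q: "mimeType = 'application/vnd.google-apps.folder' and trashed = false",
      maxResults: 1000,
      pageToken: pageToken
    });
    folders = folders.concat(res.items || []);
    pageToken = res.nextPageToken;
  } while (pageToken);
  return folders;
}

// Build a parent -> children map in memory, then walk it from rootId.
function descendantFolderIds(rootId) {
  var childrenByParent = {};
  listAllFolders().forEach(function(folder) {
    (folder.parents || []).forEach(function(parent) {
      (childrenByParent[parent.id] = childrenByParent[parent.id] || []).push(folder.id);
    });
  });
  var ids = [rootId];
  var queue = [rootId];
  while (queue.length) {
    var current = queue.shift();
    (childrenByParent[current] || []).forEach(function(childId) {
      ids.push(childId);
      queue.push(childId);
    });
  }
  return ids;
}

The returned IDs can then be combined into a second set of files.list() queries with "'<id>' in parents" clauses, much as the question already does.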
I found a similar issue when looking for all files a given user owns, e.g.:
'example.user@company.com' in owners and trashed=false
I have about 5000 files, and usually I can iterate through all of them via pagination. Some days, however (like today), I only get <100 results with the query above. When I rewrite my code to fetch files for a given parent ID and then recursively iterate through the sub-folders, I get all the files. Afterwards the original query succeeds again as well.
It looks like some kind of caching issue on the Google Drive server to me.
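For illustration, a rough sketch of that per-parent recursive fallback, again assuming the Drive advanced service (v2) in Apps Script:

// Lists every descendant file/folder of folderId, one parent per query.
function listTree(folderId, out) {
  out = out || [];
  var pageToken;
  do {
    var res = Drive.Files.list({
      q: "'" + folderId + "' in parents and trashed = false",
      maxResults: 1000,
      pageToken: pageToken
    });
    (res.items || []).forEach(function(item) {
      out.push(item);
      if (item.mimeType === 'application/vnd.google-apps.folder') {
        listTree(item.id, out); // recurse into each sub-folder
      }
    });
    pageToken = res.nextPageToken;
  } while (pageToken);
  return out;
}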

Using Google Apps Script to share documents

I have so many documents that I need to manage because of the number of people that have access to them, and when someone needs to be added or removed it becomes a real pain and time sink.
I'm curious: is there a script that you can set up across multiple documents to give edit or view access?
I'm aware of addEditor(emailAddress), but from what I have gathered you have to write a script for each document, which defeats the purpose of saving time.
Basically, I need one script that gives access to a particular set of documents; when I need someone removed, I just delete their name, run the script, and it removes edit/view access from those documents.
An example of a script, or rather of what I'm trying to achieve (not actually a script):
//Human Resources//
addEditor(emailAddress) to BBM1 - Membership Tracker, "https://docs.google.com/spreadsheets/d/1gb8T1K74cRR_6qSyByqrtiphujrchceLP_QsMunoras/edit#gid=0"
//Administrator//
addEditor(emailAddress) to BBM1 - Membership Tracker, "https://docs.google.com/spreadsheets/d/1gb8T1K74cRR_6qSyByqrtiphujrchceLP_QsMunoras/edit#gid=0"
//Moderator//
addViewer(emailAddress) to BBM1 - Membership Tracker, "docs.google.com/spreadsheets/d/1gb8T1K74cRR_6qSyByqrtiphujrchceLP_QsMunoras/edit#gid=0"
So basically, I could just add emails to that, run the script, and it gives them edit access. But I just don't know how to create a script that actually works that way, or how to have one script cover multiple documents.
Many Thanks,
Shaun.
Take a look at the Folder API. You can put all those files into a folder, and then every time you run the script you can loop through all the files in the folder and set the correct permissions.
Example (untested):
var folderName = "My auto-managed documents";
var folderIterator = DriveApp.getFoldersByName(folderName);
while (folderIterator.hasNext()) {
  var folder = folderIterator.next();
  var fileIterator = folder.getFiles();
  while (fileIterator.hasNext()) {
    var file = fileIterator.next();
    // Do things with the file:
    // file.addEditor('my-email@example.com');
  }
}
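Building on that loop, a hedged sketch of what the asker describes: keep the role lists in one place and run a single function. The EDITORS/VIEWERS arrays, the folder name, and syncPermissions are hypothetical names, and revoking access still needs an explicit removeEditor/removeViewer call:

// Hypothetical role lists: edit these arrays, then run syncPermissions().
var EDITORS = ['hr-person@example.com', 'admin-person@example.com'];
var VIEWERS = ['moderator@example.com'];
var MANAGED_FOLDER_NAME = 'My auto-managed documents';

function syncPermissions() {
  var folderIterator = DriveApp.getFoldersByName(MANAGED_FOLDER_NAME);
  while (folderIterator.hasNext()) {
    var fileIterator = folderIterator.next().getFiles();
    while (fileIterator.hasNext()) {
      var file = fileIterator.next();
      EDITORS.forEach(function(email) { file.addEditor(email); });
      VIEWERS.forEach(function(email) { file.addViewer(email); });
      // To revoke access, drop the address from the arrays above and call
      // file.removeEditor(email) or file.removeViewer(email) for it here.
    }
  }
}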

How can I get a Windows batch or Perl script to run when a file is added to a directory?

I am trying to write a script that will parse a local file and upload its contents to a MySQL database. Right now, I am thinking that a batch script that runs a Perl script would work, but I am not sure if this is the best method of accomplishing this.
In addition, I would like this script to run immediately when the data file is added to a certain directory. Is this possible in Windows?
Thoughts? Feedback? I'm fairly new to Perl and Windows batch scripts, so any guidance would be appreciated.
You can use Win32::ChangeNotify. Your script will be notified when a file is added to the target directory.
Checking a folder for newly created files can be implemented using the WMI functionality. Namely, you can create a Perl script that subscribes to the __InstanceCreationEvent WMI event that traces the creation of the CIM_DirectoryContainsFile class instances. Once that kind of event is fired, you know a new file has been added to the folder and can process it as you need.
These articles provide more information on the subject and contain VBScript code samples (hope it won't be hard for you to convert them to Perl):
How Can I Automatically Run a Script Any Time a File is Added to a Folder?
WMI and File System Monitoring
The function you want is ReadDirectoryChangesW. A quick search for a Perl wrapper yields the Win32::ReadDirectoryChanges module.
Your script would look something like this:
use Win32::ReadDirectoryChanges;

$rdc = new Win32::ReadDirectoryChanges(path    => $path,
                                       subtree => 1,
                                       filter  => $filter);
while (1) {
    @results = $rdc->read_changes;
    while (scalar @results) {
        my ($action, $filename) = splice(@results, 0, 2);
        ... run script ...
    }
}
You can easily achieve this in Perl using File::ChangeNotify. The module can be found on CPAN: http://search.cpan.org/dist/File-ChangeNotify/lib/File/ChangeNotify.pm
You can run the code as a daemon or as a service, make it watch one or more directories and then automatically execute some code (or start up a script) if some condition matches.
Best of all, it's cross-platform, so should you want to switch to a Linux machine or a Mac, it would still work.
It wouldn't be too hard to put together a small C# application that uses the FileSystemWatcher class to detect files being added to a folder and then spawn the required script. It would certainly use less CPU / system resources / hard disk bandwidth than polling the folder at regular intervals.
You need to consider what is a sufficient heuristic for determining "modified".
In increasing order of cost and accuracy:
file size (file content can still be changed as long as size is maintained)
file timestamp (if you aren't running ntpd, time is not monotonic)
file sha1sum (bulletproof but expensive)
I would run ntpd, loop over the timestamps, and compare the checksum only when a timestamp changes. This covers a lot of ground in little time.
These methods are not appropriate for a computer-security application; they are for file management on a sane system.