I've written a script to iterate through a large number of files in a Google Drive folder. Due to the processing I am doing on those files, it exceeds the maximum execution time. Naturally I wrote the script to use DriveApp.continueFileIterator(continuationToken): the token gets stored in the Project Properties, and when the script runs it checks whether there's a token. If there is, it creates the FileIterator from the token; if not, it starts afresh.
What I have found is that even though the script restarts with the continuation token, it still starts from the beginning of the iteration and tries to process the same files again, which wastes time in subsequent executions. Have I missed something vital, such as a command or method to make it start from where it left off? Am I supposed to update the continuation token at various stages throughout the while(contents.hasNext()) loop?
Here's the sample code slimmed down to give you an idea:
function listFilesInFolder() {
var id= '0fOlDeRiDg';
var scriptProperties = PropertiesService.getScriptProperties();
var continuationToken = scriptProperties.getProperty('IMPORT_ALL_FILES_CONTINUATION_TOKEN');
var lastExecution = scriptProperties.getProperty('LAST_EXECUTION');
if (continuationToken == null) {
// first time execution, get all files from drive folder
var folder = DriveApp.getFolderById(id);
var contents = folder.getFiles();
// get the token and store it in a project property
var continuationToken = contents.getContinuationToken();
scriptProperties.setProperty('IMPORT_ALL_FILES_CONTINUATION_TOKEN', continuationToken);
} else {
// we continue to import from where we left
var contents = DriveApp.continueFileIterator(continuationToken);
}
var file;
var fileID;
var name;
var dateCreated;
while(contents.hasNext()) {
file = contents.next();
fileID = file.getId();
name = file.getName();
dateCreated = file.getDateCreated();
if(dateCreated > lastExecution) {
processFiles(fileID);
}
}
// Finished processing files so delete continuation token
scriptProperties.deleteProperty('IMPORT_ALL_FILES_CONTINUATION_TOKEN');
var currentExecution = Utilities.formatDate(new Date(), "GMT", "yyyy-MM-dd HH:mm:ss");
scriptProperties.setProperty('LAST_EXECUTION',currentExecution);
};
Like Jonathon said, you're comparing dates wrongly. But that's not the main issue with your script, nor what you asked about.
The main thing you're getting wrong is when you save the continuation token: it can't be saved before you run your loop. The token records where the iterator was at the moment you got it. If you keep iterating afterwards, that progress isn't saved, and you will repeat those steps on the next run, which is exactly what you're seeing.
To get the token later, you cannot let your script terminate with an error. You have to measure how many files you can process in under 5 minutes and stop the script yourself before that, so you have a chance to save the token.
Here's the correct way of doing it:
function listFilesInFolder() {
var MAX_FILES = 20; //use a safe value, don't be greedy
var id = 'folder-id';
var scriptProperties = PropertiesService.getScriptProperties();
var lastExecution = scriptProperties.getProperty('LAST_EXECUTION');
if( lastExecution === null )
lastExecution = '';
var continuationToken = scriptProperties.getProperty('IMPORT_ALL_FILES_CONTINUATION_TOKEN');
var iterator = continuationToken == null ?
DriveApp.getFolderById(id).getFiles() : DriveApp.continueFileIterator(continuationToken);
try {
for( var i = 0; i < MAX_FILES && iterator.hasNext(); ++i ) {
var file = iterator.next();
var dateCreated = formatDate(file.getDateCreated());
if(dateCreated > lastExecution)
processFile(file);
}
} catch(err) {
Logger.log(err);
}
if( iterator.hasNext() ) {
scriptProperties.setProperty('IMPORT_ALL_FILES_CONTINUATION_TOKEN', iterator.getContinuationToken());
} else { // Finished processing files so delete continuation token
scriptProperties.deleteProperty('IMPORT_ALL_FILES_CONTINUATION_TOKEN');
scriptProperties.setProperty('LAST_EXECUTION', formatDate(new Date()));
}
}
function formatDate(date) { return Utilities.formatDate(date, "GMT", "yyyy-MM-dd HH:mm:ss"); }
function processFile(file) {
var id = file.getId();
var name = file.getName();
//your processing...
Logger.log(name);
}
Anyway, a file may get created between your runs and not show up in your continued iteration. Then, by saving the execution time after the last run, you may miss it on your next run too. I do not know your use case, or whether it's acceptable to eventually reprocess some files or to miss some. If you can't have either situation at all, then the only solution I see is to save the ids of all files you have already processed. You may need to store those in a Drive file, because PropertiesService may be too small for that many ids.
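A minimal sketch of that last idea, written in plain JavaScript so it can be tried outside Apps Script. The helper names (loadProcessedIds, selectUnprocessed, saveProcessedIds) and the JSON-array storage format are assumptions made for illustration; in a real script the JSON string would be read from and written back to wherever you keep it, for example a Drive file.

```javascript
// Hypothetical helpers: the stored state is a JSON array of file ids.
function loadProcessedIds(json) {
  // A missing or empty store means nothing has been processed yet.
  var seen = {};
  (json ? JSON.parse(json) : []).forEach(function (id) { seen[id] = true; });
  return seen;
}

// Keep only the ids that have not been recorded as processed.
function selectUnprocessed(fileIds, seen) {
  return fileIds.filter(function (id) { return seen[id] !== true; });
}

// Serialize the lookup back to a JSON array for storage.
function saveProcessedIds(seen) {
  return JSON.stringify(Object.keys(seen));
}

// Example run with made-up ids.
var seen = loadProcessedIds('["a1","b2"]');
var todo = selectUnprocessed(['a1', 'b2', 'c3'], seen);
todo.forEach(function (id) { seen[id] = true; }); // mark as done after processing
var stored = saveProcessedIds(seen);
```

With this approach neither the creation date nor the last execution time is needed to decide what to process, at the cost of a stored list that grows with the folder.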
Your date comparison won't work the way you have it.
var currentExecution = Utilities.formatDate(new Date(), "GMT", "yyyy-MM-dd HH:mm:ss");
will store "2014-04-18 08:32:01", whereas file.getDateCreated() will return a Date object. Comparing these using either < or > will always return false.
So I'd suggest that you store the time as a timestamp (because you can't store Date objects) and then compare that to the timestamp of the file's creation date.
// stored time stamp
var lastExecution = scriptProperties.getProperty('LAST_EXECUTION');
…
dateCreated = file.getDateCreated().getTime();
…
var currentExecution = new Date().getTime();
scriptProperties.setProperty('LAST_EXECUTION',currentExecution);
That comparison will work as you expect.
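The difference can be checked with a few lines of plain JavaScript. The dates below are made up for illustration; the only assumption is that the stored property comes back as a string, which is why the sketch converts it with Number() before comparing.

```javascript
// A formatted date string, like the one the original script stored.
var storedString = '2014-04-18 08:32:01';
// A file created one day later (months are 0-based, so 3 is April).
var created = new Date(Date.UTC(2014, 3, 19, 0, 0, 0));

// The relational operators convert the string to a number, which gives NaN,
// and every comparison against NaN is false.
var newerThanString = created > storedString;
var olderThanString = created < storedString;

// Timestamps (milliseconds since the epoch) compare as plain numbers.
// Convert the stored string explicitly so the intent is clear.
var storedTimestamp = String(Date.UTC(2014, 3, 18, 8, 32, 1));
var newerThanTimestamp = created.getTime() > Number(storedTimestamp);
```

Here both newerThanString and olderThanString are false, while newerThanTimestamp is true.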
I was wondering: is it even possible to use Logger.log in Google Apps Script to log different strings to be posted to a spreadsheet?
I have the following code:
var ss = SpreadsheetApp.openByUrl("spreadsheet url");
var sheet = ss.getSheetByName("spreadsheet sheet");
var DocNumber = e.parameter.DocNumber;
var folderId = "Folder ID 1";
var lastFileUrl = getLatestFile(folderId); // just a function that retrieves url of latest file in the folder
Logger.log(lastFileUrl);
var addUrl = sheet.getRange(1,2,sheet.getLastRow(),1);
var fileURL = "https://drive.google.com/uc?export=view&id="+lastFileUrl;
var folderId2 = "Folder ID 2";
var lastFileUrl2 = getLatestFile(folderId2); // same as above
Logger.log(lastFileUrl2);
var addUrl2 = sheet.getRange(1,3,sheet.getLastRow(),1);
var fileURL2 = "https://drive.google.com/uc?export=view&id="+lastFileUrl2;
sheet.appendRow([DocNumber,fileURL,fileURL2]);
}
When this gets posted to the spreadsheet, it only posts the second URL (fileURL2), I assume because that is the last value in the log. But I was hoping to post both URLs to the spreadsheet.
I tried setting it as a var first as well:
var URL2 = Logger.log(lastFileURL2);
but then the posted value will be https://drive.google.com/uc?export=view&id=Logger
I also tried using appendRow before logging the second URL, but it still only takes the second URL and disregards the first.
Therefore, I was curious whether this is even possible at all.
And if not, what's the best way to achieve this without using Logger.log?
Spreadsheet output:
URL1 and URL2 are the URLs from the Google Drive folders.
Also, I forgot to mention that I'm using the script as a Web App, used by an Android app. Posting files into the Drive folder is okay; the only problem is fetching the links of the files in different folders.
This is the code I used to get the latest file URL from my folders:
function getLatestFile(folderId) {
var files = DriveApp.getFolderById("Folder_1_ID").getFiles();
var fileObj = [];
while (files.hasNext()) {
var file = files.next();
fileObj.push({id: file.getId(), date: file.getDateCreated()});
}
fileObj.sort(function(a, b) {return new Date(b.date) - new Date(a.date)});
return fileObj[0].id;
}
function getLatestFile(folderId2) {
var files2 = DriveApp.getFolderById("Folder_2_ID").getFiles();
var fileObj2 = [];
while (files2.hasNext()) {
var file2 = files2.next();
fileObj2.push({id: file2.getId(), date: file2.getDateCreated()});
}
fileObj2.sort(function(a, b) {return new Date(b.date) - new Date(a.date)});
return fileObj2[0].id;
}
Problem
Having two functions declared under the same name
Solution
Step by step:
Remove one of the functions (they are identical in terms of usage)
Make the remaining one use the parameter passed to it:
function getLatestFile(folderId) {
var files = DriveApp.getFolderById(folderId).getFiles();
var fileObj = [];
while (files.hasNext()) {
var file = files.next();
fileObj.push({id: file.getId(), date: file.getDateCreated()});
}
fileObj.sort(function(a, b) {return new Date(b.date) - new Date(a.date)});
return fileObj[0].id;
}
Change Logger to console - as of recently, all logs are sent to the Stackdriver service, so there is no benefit in using Logger (besides, using console makes the script more portable).
Commentary
What happens when you declare two or more functions under same name? Normally, the last one declared gets executed (basically, second declaration overwrites the first):
function clone(original) {
return `I am the clone of ${original}`;
}
function clone(cloned) {
return `I am a clone of ${cloned}'s clone`;
}
const elem = document.querySelector("#cloned");
elem.textContent = clone("Gary");
<h2 id="cloned"></h2>
I receive a new CSV file every hour in my Google Drive.
I need my spreadsheet updated with the data in the latest CSV file after it has been received in the Google Drive folder.
Each new file coming into the folder has a unique name based on the date and time.
For example: FileName_date_24hourtime.csv
FileName_20190524_1800.csv then FileName_20190524_1900.csv etc.
Firstly I'm not sure what the best approach is:
simply with a formula (probably not possible with not knowing the exact filename?) like =IMPORTDATA
a google script to find latest .csv file and automatically import as soon as file was added to Google Drive folder
Any assistance will be great!
The .csv file:
.csv file contains 28 rows and data should be split by ;
.csv file looks like this:
NAME;-63.06;-58.08;50.62;-66.67;-80.00
NAME;-61.82;-56.83;-50.55;-77.78;-70.00
NAME;-57.77;-50.21;52.88;-77.78;-70.00
NAME1;-57.69;-61.48;-55.59;-55.56;-60.00
NAME2;-61.62;-53.79;50.34;-66.67;-70.00
NAME3;-54.62;-54.57;-52.22;55.56;-60.00
... with total of 28 rows
Data should go to "Import_Stats" sheet.
The best approach here would be a script with a trigger that runs a function that performs data import to a spreadsheet.
Create a time-based trigger with 1-hour offset:
function trigger() {
var trg = ScriptApp.newTrigger('checkFiles');
trg.timeBased().everyHours(1).create();
}
Create function that checks files in a folder (e.g. "checkFiles").
function checkFiles(alreadyWaited) {
//get spreadsheet and sheet;
var id = 'yourSpreadsheetId';
var ss = SpreadsheetApp.openById(id);
var sh = ss.getSheetByName('Import_Stats');
var folderId = 'yourIdHere'; //folder by id is the simplest way;
//get folder and files in it;
var folder = DriveApp.getFolderById(folderId);
var files = folder.getFilesByType('text/csv');
var filesImport = folder.getFilesByType('text/csv'); //fetch files again;
//try to fetch number of files;
var scriptProps = PropertiesService.getScriptProperties();
var numFiles = scriptProps.getProperty('numFiles');
//count current number of files;
var numCurr = 0;
while(files.hasNext()) {
var f = files.next();
numCurr++;
}
//if this is the first time, save current number;
if(numFiles===null) {
scriptProps.setProperty('numFiles',numCurr);
}else {
numFiles = parseInt(numFiles);
}
if(numFiles===null||numFiles===(numCurr-1)) {
//get today and reset everything except hours;
var today = new Date();
today.setMinutes(0);
today.setSeconds(0);
today.setMilliseconds(0);
//iterate over files;
//use the second iterator: "files" was already exhausted by the counting loop above;
while(filesImport.hasNext()) {
var file = filesImport.next();
//get file creation date and reset;
var created = file.getDateCreated();
created.setMinutes(0);
created.setSeconds(0);
created.setMilliseconds(0);
//calculate offset, equals 0 for each file created this hour;
var offset = today.valueOf()-created.valueOf();
if(offset===0) {
//perform data import here;
var data = file.getBlob().getDataAsString();
//ignore empty files;
if(data!=='') {
//split data in rows;
var arr = data.split('\r\n');
//resplit array if only one row;
if(arr.length===1) {
arr = data.split('\n');
}
//append rows with data to sheet;
arr.forEach(function(el){
el = el.split(';');
sh.appendRow(el);
});
}
}
}
}else {
//if no wait time was passed in (a trigger passes an event object instead), start at one minute, else add a minute;
if(typeof alreadyWaited!=='number') {
alreadyWaited = 60000;
}else {
alreadyWaited += 60000;
}
//if waited for 10 minutes -> end recursion;
if(alreadyWaited>=600000) {
Logger.log('Waited 10 minutes but received no files!');
return;
}
//wait a minute and recheck;
Utilities.sleep(60000);
return checkFiles(alreadyWaited);
}
}
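The row-splitting step can also be pulled out into a small helper and tried on its own. This is only a sketch in plain JavaScript: the name parseSemicolonCsv is made up for illustration, and the sample text is taken from the rows shown in the question.

```javascript
// Hypothetical helper: turn ';'-separated text into an array of rows.
function parseSemicolonCsv(text) {
  return text
    .split(/\r?\n/)                                  // handles both \r\n and \n line endings
    .filter(function (line) { return line !== ''; }) // skip blank lines, e.g. a trailing newline
    .map(function (line) { return line.split(';'); });
}

var sample = 'NAME;-63.06;-58.08;50.62;-66.67;-80.00\r\n' +
             'NAME1;-57.69;-61.48;-55.59;-55.56;-60.00\n';
var rows = parseSemicolonCsv(sample);
```

In Apps Script the resulting rows could then be written in one call, for example with sh.getRange(sh.getLastRow() + 1, 1, rows.length, rows[0].length).setValues(rows), provided every row has the same number of columns. Apps Script also offers Utilities.parseCsv(csv, delimiter), which may be a better fit if the data can contain quoted fields.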
Because of Drive API quotas, Services quotas and the 6-minute limit on script execution time, it's often critical to split Google Drive file manipulations into chunks.
We can use PropertiesService to store the continuationToken for a FolderIterator or FileIterator.
This way we can stop our script and, on the next run, continue from the place where we stopped.
Working example (linear iterator)
// Logs the name of every file in the User's Drive
// this is useful as the script may take more than 5 minutes (max execution time)
var userProperties = PropertiesService.getUserProperties();
var continuationToken = userProperties.getProperty('CONTINUATION_TOKEN');
var start = new Date();
var end = new Date();
var maxTime = 1000*60*4.5; // Max safe time, 4.5 mins
if (continuationToken == null) {
// first time execution, get all files from Drive
var files = DriveApp.getFiles();
} else {
// not the first time, pick up where we left off
var files = DriveApp.continueFileIterator(continuationToken);
}
while (files.hasNext() && end.getTime() - start.getTime() <= maxTime) {
var file = files.next();
Logger.log(file.getName());
end = new Date();
}
// Save your place by setting the token in your user properties
if(files.hasNext()){
var continuationToken = files.getContinuationToken();
userProperties.setProperty('CONTINUATION_TOKEN', continuationToken);
} else {
// Delete the token
PropertiesService.getUserProperties().deleteProperty('CONTINUATION_TOKEN');
}
Problem (recursive iterator)
To retrieve a tree-like folder structure and get its files, we have to use a recursive function. Something like this:
doFolders(DriveApp.getFolderById('root folder id'));
// recursive iteration
function doFolders(parentFolder) {
var childFolders = parentFolder.getFolders();
while(childFolders.hasNext()) {
var child = childFolders.next();
// do something with folder
// go subfolders
doFolders(child);
}
}
However, in this case I have no idea how to use continuationToken.
Question
How can a ContinuationToken be used with a recursive folder iterator, when we need to go through the whole folder structure?
Assumption
Does it make sense to construct many tokens, each named after the id of its parent folder?
If you're trying to recursively iterate over a folder and want to use continuation tokens (as is probably required for large folders), you'll need a data structure that can store multiple sets of continuation tokens: one for files and one for folders, for each folder in the current hierarchy.
The simplest data structure would be an array of objects.
Here is a solution that gives you the template for creating a function that can recursively process files and store continuation tokens so it can resume if it times out.
Simply modify MAX_RUNNING_TIME_MS to your desired value (now it's set to 1 minute).
You don't want to set it more than ~4.9 minutes as the script could timeout before then and not store its current state.
Update the processFile method to do whatever you want on files.
Finally, call processRootFolder() and pass it a Folder. It'll be smart enough to know how to resume processing the folder.
Sure there is room for improvement (e.g. it simply checks the folder name to see if it's a resume vs. a restart) but this will most likely be sufficient for 95% of people that need to iterate recursively on a folder with continuation tokens.
function processRootFolder(rootFolder) {
var MAX_RUNNING_TIME_MS = 1 * 60 * 1000;
var RECURSIVE_ITERATOR_KEY = "RECURSIVE_ITERATOR_KEY";
var startTime = (new Date()).getTime();
// [{folderName: String, fileIteratorContinuationToken: String?, folderIteratorContinuationToken: String}]
var recursiveIterator = JSON.parse(PropertiesService.getDocumentProperties().getProperty(RECURSIVE_ITERATOR_KEY));
if (recursiveIterator !== null) {
// verify that it's actually for the same folder
if (rootFolder.getName() !== recursiveIterator[0].folderName) {
console.warn("Looks like this is a new folder. Clearing out the old iterator.");
recursiveIterator = null;
} else {
console.info("Resuming session.");
}
}
if (recursiveIterator === null) {
console.info("Starting new session.");
recursiveIterator = [];
recursiveIterator.push(makeIterationFromFolder(rootFolder));
}
while (recursiveIterator.length > 0) {
recursiveIterator = nextIteration(recursiveIterator, startTime);
var currTime = (new Date()).getTime();
var elapsedTimeInMS = currTime - startTime;
var timeLimitExceeded = elapsedTimeInMS >= MAX_RUNNING_TIME_MS;
if (timeLimitExceeded) {
PropertiesService.getDocumentProperties().setProperty(RECURSIVE_ITERATOR_KEY, JSON.stringify(recursiveIterator));
console.info("Stopping loop after '%d' milliseconds. Please continue running.", elapsedTimeInMS);
return;
}
}
console.info("Done running");
PropertiesService.getDocumentProperties().deleteProperty(RECURSIVE_ITERATOR_KEY);
}
// process the next file or folder
function nextIteration(recursiveIterator) {
var currentIteration = recursiveIterator[recursiveIterator.length-1];
if (currentIteration.fileIteratorContinuationToken !== null) {
var fileIterator = DriveApp.continueFileIterator(currentIteration.fileIteratorContinuationToken);
if (fileIterator.hasNext()) {
// process the next file
var path = recursiveIterator.map(function(iteration) { return iteration.folderName; }).join("/");
processFile(fileIterator.next(), path);
currentIteration.fileIteratorContinuationToken = fileIterator.getContinuationToken();
recursiveIterator[recursiveIterator.length-1] = currentIteration;
return recursiveIterator;
} else {
// done processing files
currentIteration.fileIteratorContinuationToken = null;
recursiveIterator[recursiveIterator.length-1] = currentIteration;
return recursiveIterator;
}
}
if (currentIteration.folderIteratorContinuationToken !== null) {
var folderIterator = DriveApp.continueFolderIterator(currentIteration.folderIteratorContinuationToken);
if (folderIterator.hasNext()) {
// process the next folder
var folder = folderIterator.next();
recursiveIterator[recursiveIterator.length-1].folderIteratorContinuationToken = folderIterator.getContinuationToken();
recursiveIterator.push(makeIterationFromFolder(folder));
return recursiveIterator;
} else {
// done processing subfolders
recursiveIterator.pop();
return recursiveIterator;
}
}
throw "should never get here";
}
function makeIterationFromFolder(folder) {
return {
folderName: folder.getName(),
fileIteratorContinuationToken: folder.getFiles().getContinuationToken(),
folderIteratorContinuationToken: folder.getFolders().getContinuationToken()
};
}
function processFile(file, path) {
console.log(path + "/" + file.getName());
}
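The same idea, stripped of Drive calls, can be sketched on a plain in-memory tree so it can be run anywhere. Everything here is an assumption made for illustration: the tree object stands in for Drive folders, the stack stores array indices where the code above stores continuation tokens, and the budget is a step count rather than elapsed time so that the run is deterministic.

```javascript
// Hypothetical in-memory stand-in for a Drive folder tree.
var tree = {
  name: 'root', files: ['a.txt'],
  folders: [
    { name: 'sub1', files: ['b.txt', 'c.txt'], folders: [] },
    { name: 'sub2', files: [], folders: [
      { name: 'deep', files: ['d.txt'], folders: [] }
    ] }
  ]
};

// Resolve the folder for the frame at the given depth from the indices in the stack.
function folderAt(root, stack, depth) {
  var node = root;
  for (var i = 1; i <= depth; i++) node = node.folders[stack[i].indexInParent];
  return node;
}

// Do at most maxSteps units of work; return the state to save, or null when done.
function walk(root, savedState, maxSteps, visit) {
  var stack = savedState ? JSON.parse(savedState)
                         : [{ indexInParent: 0, nextFile: 0, nextFolder: 0 }];
  for (var steps = 0; stack.length > 0; steps++) {
    if (steps >= maxSteps) return JSON.stringify(stack); // pause: budget used up
    var frame = stack[stack.length - 1];
    var folder = folderAt(root, stack, stack.length - 1);
    if (frame.nextFile < folder.files.length) {
      visit(folder.files[frame.nextFile++]);             // files first
    } else if (frame.nextFolder < folder.folders.length) {
      // descend into the next subfolder and remember where to continue afterwards
      stack.push({ indexInParent: frame.nextFolder++, nextFile: 0, nextFolder: 0 });
    } else {
      stack.pop();                                       // this folder is finished
    }
  }
  return null;
}

// Simulate several short runs, carrying only the serialized state between them.
var visited = [];
var state = null;
var runs = 0;
do {
  state = walk(tree, state, 3, function (file) { visited.push(file); });
  runs++;
} while (state !== null);
```

Each call picks up exactly where the previous one stopped, because the whole position (one frame per folder on the current path) survives the JSON round trip. In a real script the string returned by walk would be stored with PropertiesService between runs.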
I am programming a Google Apps script within a spreadsheet. My use case includes iterating over a large set of folders that are children of a given one. The problem is that the processing takes longer than the maximum that Google allows (6 minutes), so I had to program my script to be able to resume later. I am creating a trigger to resume the task, but that is not part of my problem (at least, not the more important one at this moment).
My code looks like this (reduced to the minimum to illustrate my problem):
function launchProcess() {
var scriptProperties = PropertiesService.getScriptProperties();
scriptProperties.setProperty(SOURCE_PARENT_FOLDER_KEY, SOURCE_PARENT_FOLDER_ID);
scriptProperties.deleteProperty(CONTINUATION_TOKEN_KEY);
continueProcess();
}
function continueProcess() {
try {
var startTime = (new Date()).getTime();
var scriptProperties = PropertiesService.getScriptProperties();
var srcParentFolderId = scriptProperties.getProperty(SOURCE_PARENT_FOLDER_KEY);
var continuationToken = scriptProperties.getProperty(CONTINUATION_TOKEN_KEY);
var iterator = continuationToken == null ? DriveApp.getFolderById(srcParentFolderId).getFolders() : DriveApp.continueFolderIterator(continuationToken);
var timeLimitIsNear = false;
var currTime;
while (iterator.hasNext() && !timeLimitIsNear) {
var folder = iterator.next();
processFolder_(folder);
currTime = (new Date()).getTime();
timeLimitIsNear = (currTime - startTime >= MAX_RUNNING_TIME);
}
if (!iterator.hasNext()) {
scriptProperties.deleteProperty(CONTINUATION_TOKEN_KEY);
} else {
var contToken = iterator.getContinuationToken();
scriptProperties.setProperty(CONTINUATION_TOKEN_KEY, contToken);
}
} catch (e) {
//sends a mail with the error
}
}
When launchProcess is invoked, it only prepares the program for the other method, continueProcess, that iterates over the set of folders. The iterator is obtained by using the continuation token, when it is present (it will not be there in the first invocation). When the time limit is near, continueProcess obtains the continuation token, saves it in a property and waits for the next invocation.
The problem I have is that the iterator is always returning the same set of folders although it has been built from different tokens (I have printed them, so I know they are different).
Any idea what I am doing wrong?
Thank you in advance.
It appears that your loop was not built correctly. (Edit: actually, there is probably also another issue with how we break out of the while loop; see my thoughts about that in the comments.)
Note also that there is no special reason to use a try/catch in this context, since I see no reason why the hasNext() method would throw an error (but if you think it might, you can always add it).
Here is an example that works. I added the trigger creation/deletion lines to implement my test.
EDIT: code updated with logs and counter
var SOURCE_PARENT_FOLDER_ID = '0B3qSFd3iikE3MS0yMzU4YjQ4NC04NjQxLTQyYmEtYTExNC1lMWVhNTZiMjlhMmI'
var MAX_RUNNING_TIME = 5*35*6;
function launchProcessFolder() {
var scriptProperties = PropertiesService.getScriptProperties();
scriptProperties.setProperty('SOURCE_PARENT_FOLDER_KEY', SOURCE_PARENT_FOLDER_ID);
scriptProperties.setProperty('counter', 0);
scriptProperties.deleteProperty('CONTINUATION_TOKEN_KEY');
ScriptApp.newTrigger('continueProcessFolder').timeBased().everyMinutes(10).create();
continueProcessFolder();
}
function continueProcessFolder() {
var startTime = (new Date()).getTime();
var scriptProperties = PropertiesService.getScriptProperties();
var srcParentFolderId = scriptProperties.getProperty('SOURCE_PARENT_FOLDER_KEY');
var continuationToken = scriptProperties.getProperty('CONTINUATION_TOKEN_KEY');
var iterator = continuationToken == null ? DriveApp.getFolderById(srcParentFolderId).getFolders() : DriveApp.continueFolderIterator(continuationToken);
var timeLimitIsNear = false;
var currTime;
var counter = Number(scriptProperties.getProperty('counter'));
while (iterator.hasNext() && !timeLimitIsNear) {
var folder = iterator.next();
counter++;
Logger.log(counter+' - '+folder.getName());
currTime = (new Date()).getTime();
timeLimitIsNear = (currTime - startTime >= MAX_RUNNING_TIME);
if (!iterator.hasNext()) {
scriptProperties.deleteProperty('CONTINUATION_TOKEN_KEY');
ScriptApp.deleteTrigger(ScriptApp.getProjectTriggers()[0]);
Logger.log('******************no more folders**************');
break;
}
}
if(timeLimitIsNear){
var contToken = iterator.getContinuationToken();
scriptProperties.setProperty('CONTINUATION_TOKEN_KEY', contToken);
scriptProperties.setProperty('counter', counter);
Logger.log('write to scriptProperties');
}
}
EDIT 2 :
(see also last comment)
Here is a test with the script modified to get files in a folder. From my different tests it appears that the operation is very fast and that I needed to set a quite short timeout limit to make it happen before reaching the end of the list.
I added a couple of Logger.log() and a counter to see exactly what was happening and to know for sure what was interrupting the while loop.
With the current values I can see that it works as expected, the first (and second) break happens with time limitation and the logger confirms that the token is written. On a third run I can see that all files have been dumped.
var SOURCE_PARENT_FOLDER_ID = '0B3qSFd3iikE3MS0yMzU4YjQ4NC04NjQxLTQyYmEtYTExNC1lMWVhNTZiMjlhMmI'
var MAX_RUNNING_TIME = 5*35*6;
function launchProcess() {
var scriptProperties = PropertiesService.getScriptProperties();
scriptProperties.setProperty('SOURCE_PARENT_FOLDER_KEY', SOURCE_PARENT_FOLDER_ID);
scriptProperties.setProperty('counter', 0);
scriptProperties.deleteProperty('CONTINUATION_TOKEN_KEY');
ScriptApp.newTrigger('continueProcess').timeBased().everyMinutes(10).create();
continueProcess();
}
function continueProcess() {
var startTime = (new Date()).getTime();
var scriptProperties = PropertiesService.getScriptProperties();
var srcParentFolderId = scriptProperties.getProperty('SOURCE_PARENT_FOLDER_KEY');
var continuationToken = scriptProperties.getProperty('CONTINUATION_TOKEN_KEY');
var iterator = continuationToken == null ? DriveApp.getFolderById(srcParentFolderId).getFiles() : DriveApp.continueFileIterator(continuationToken);
var timeLimitIsNear = false;
var currTime;
var counter = Number(scriptProperties.getProperty('counter'));
while (iterator.hasNext() && !timeLimitIsNear) {
var file = iterator.next();
counter++;
Logger.log(counter+' - '+file.getName());
currTime = (new Date()).getTime();
timeLimitIsNear = (currTime - startTime >= MAX_RUNNING_TIME);
if (!iterator.hasNext()) {
scriptProperties.deleteProperty('CONTINUATION_TOKEN_KEY');
ScriptApp.deleteTrigger(ScriptApp.getProjectTriggers()[0]);
Logger.log('******************no more files**************');
break;
}
}
if(timeLimitIsNear){
var contToken = iterator.getContinuationToken();
scriptProperties.setProperty('CONTINUATION_TOKEN_KEY', contToken);
scriptProperties.setProperty('counter', counter);
Logger.log('write to scriptProperties');
}
}
As of January 1, 2016 this is still a problem. The bug report lists a solution using the Advanced Drive API, which is documented here, under "Listing folders".
If you don't want to use Advanced services, an alternative solution would be to use the Folder Iterator to make an array of File Ids.
It appears to me that the Folder Iterator misbehaves only when created using DriveApp.continueFolderIterator(). When using this method, only 100 Folders are included in the returned Folder Iterator.
Using DriveApp.getFolders() and only getting Folder Ids, I am able to iterate through 694 folders in 2.734 seconds, according to the Execution transcript.
function allFolderIds() {
var folders = DriveApp.getFolders(),
ids = [];
while (folders.hasNext()) {
var id = folders.next().getId();
ids.push(id);
}
Logger.log('Total folders: %s', ids.length);
return ids;
}
I used the returned array to work my way through all the folders, using a trigger. The Id array is too big to save in the cache, so I created a temp file and used the cache to save the temp file Id.
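Working through such an array across several triggered runs could look roughly like this. It is a sketch in plain JavaScript, not the code I actually used: processBatch and its parameters are hypothetical names, and the offset it returns is what you would store between runs (in the cache or in properties).

```javascript
// Hypothetical helper: handle ids[offset .. offset + batchSize) and return the
// offset to resume from, or -1 once the whole array has been handled.
function processBatch(ids, offset, batchSize, handle) {
  var end = Math.min(offset + batchSize, ids.length);
  for (var i = offset; i < end; i++) handle(ids[i]);
  return end < ids.length ? end : -1;
}

// Example with made-up ids: each loop pass stands for one triggered run.
var folderIds = ['id1', 'id2', 'id3', 'id4', 'id5'];
var handled = [];
var offset = 0;
var passes = 0;
while (offset !== -1) {
  offset = processBatch(folderIds, offset, 2, function (id) { handled.push(id); });
  passes++;
}
```

Because the array of ids is built once up front, resuming needs only a number, and DriveApp.continueFolderIterator() is not involved at all.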
This is caused by a bug in GAS:
https://code.google.com/p/google-apps-script-issues/issues/detail?id=4116
It appears you're only storing a single continuation token. If you want to recursively iterate over a set of folders and allow the script to pause at any point (e.g. to avoid the timeout) and resume later, you'll need to store a bunch more continuation tokens (e.g. in an array of objects).
I've outlined a template that you can use here to get it working properly. This worked with thousands of nested files over the course of 30+ runs perfectly.