There is a job that sends a couple of files to Google Drive. The files are sent by our job successfully, but they arrive in Google Drive late. Once they do arrive, the creation time shown in Drive is the time of our job execution. Strangely, at that actual time the files are not yet visible in Drive, so there is a delay before they appear, and it seems to happen very randomly: sometimes the files show up right away, sometimes after a couple of hours. But the file creation time is always the time of the job execution.
I appreciate any help.
Many thanks!
I have a container-bound Apps Script project bound to a form-response Google Sheet, triggered on form submit. The script runs as me. I'm seeing execution runtimes 6-8x the nominal run time during peak hours of the day, which seems largely correlated with increased form-submission traffic. The program follows this series of steps (sketched in code after the list):
Form response collects a string of information and triggers the Apps Script Project to execute
Project creates a copy of a complex Google Sheet with a few tabs and a lot of complex formulas
Project pastes the string collected from the form response into this Google Sheet copy, flushes the sheet, and then returns a string of results calculated within the sheet
The Google Sheet file is named, and the Project creates a unique Drive folder that the sheet is eventually moved to
Run complete
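A rough sketch of that flow, assuming an installable on-form-submit trigger on the response sheet; the template ID, sheet names, ranges, and folder naming below are placeholders rather than the project's real values:

// Simplified sketch of the on-form-submit flow described above.
// 'TEMPLATE_ID' and the 'Input'/'Output' sheet names and ranges are placeholders.
function onFormSubmit(e) {
  var formString = e.values.join(',');                      // string collected from the form response
  var template = DriveApp.getFileById('TEMPLATE_ID');       // complex multi-tab calculator sheet
  var copyFile = template.makeCopy('Calc ' + new Date().toISOString());
  var copy = SpreadsheetApp.openById(copyFile.getId());

  copy.getSheetByName('Input').getRange('A1').setValue(formString);
  SpreadsheetApp.flush();                                   // force the formulas to recalculate

  var results = copy.getSheetByName('Output').getRange('A1:F1').getValues()[0];

  var folder = DriveApp.createFolder('Submission ' + e.values[0]);
  copyFile.moveTo(folder);                                  // sheet ends up in its own Drive folder
  return results.join(',');
}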
The Project performs a wide variety of getValue() and setValue() calls throughout the run as it updates cell values or reads calculated results. Over the last year, I've improved optimization in many ways (e.g. batching calls with getValues()/setValues(), batching API calls, etc.). Its normal run time is 25-45 seconds, but it increases to 200+ seconds during my company's peak business hours. Using logs, there is no one particular step that gets hung up; rather, the script lags in all aspects (when it creates the file copy, SpreadsheetApp.flush(), appending or deleting rows in other Google Sheets it references by sheet ID, etc.). I'll also say that a common reason for a failed execution is the error message "Service Spreadsheets timed out while accessing document with id...". But most of the executions complete successfully, just after a lengthy run.
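For reference, the kind of batching mentioned above replaces per-cell service calls with a single range read/write; a generic illustration (not the project's actual code):

// Generic illustration of batching: one getValues()/setValues() round trip
// instead of a service call per cell. Sheet and range names are arbitrary.
function batchedUpdate() {
  var sheet = SpreadsheetApp.getActive().getSheetByName('Data');

  // Slow: one service call per cell.
  // for (var r = 1; r <= 100; r++) {
  //   sheet.getRange(r, 1).setValue(sheet.getRange(r, 2).getValue() * 2);
  // }

  // Faster: read and write the whole block in two calls.
  var values = sheet.getRange(1, 2, 100, 1).getValues();
  var doubled = values.map(function(row) { return [row[0] * 2]; });
  sheet.getRange(1, 1, 100, 1).setValues(doubled);
}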
It seems there is a clear correlation between execution time and traffic. So my questions are:
Can anyone confirm that theory?
Is there a way I can increase the service's bandwidth for my project?
Is there a better alternative to having the script run as me? (mind you I'm performing the rest of my job using Chrome throughout the day while this program continues to automatically run in the background)
This project uses the default Apps Script-managed Cloud project; it is not tied to a personal GCP project. Is that a potential factor at play here?
I've been developing an Apps Script project for my company that tracks our time/expenses. I've structured the project like so:
The company has a paid G Suite account that owns all the spreadsheets hosted on the company's Google Drive.
Each employee has their own "user" spreadsheet which is shared from the company Gsuite account with the employee's personal gmail account.
Each of the user spreadsheets has a container-bound script that accesses a central library script.
The library script allows us to update the script centrally and the effects are immediate for each user. It also prevents users from seeing the central script and meddling with it.
Each of the user container-bound scripts has installable triggers that are authorized by the company account, so that the code being run has full authority to do what it needs to do to the spreadsheets.
This setup has been working quite well for us with about 40 users. The drawback is that since all the script activity is run by the company account via the triggers, the activity of all our users is logged under the single company account and is therefore capped by the Apps Script server quotas for a single user. This hasn't been much of an issue for us yet, as long as our script is efficient in how it runs. I have looked into deploying this project as a web app for our company, but there doesn't seem to be a good way to control/limit user access to the central files. In other words, if this project were running as a web app installed by each user, each user would need to have access to all the central spreadsheets that the project uses behind the scenes. And we don't want that.
So with that background, here is my question: how do I efficiently track Apps Script activity to see how close we are to hitting our server quota, and identify which of my functions need to be optimized?
I started doing this by writing an entry into an "activity log" spreadsheet every time the script was called. It tracked which function was called and who the user was, and it had a start-time entry and an end-time entry so I could see how long individual executions took and which ones failed. This was great because I had a live view into the project activity and could graph it using the spreadsheet's charting tools. Where this began to break down was the fact that every execution of the script required two write actions: one for initialization and another for completion. Since the script is executed every time a user makes an edit to their spreadsheet, during times of high traffic the activity log spreadsheet became inaccessible and errors were thrown all over the place.
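For context, the spreadsheet-based log looked roughly like this; the two writes per execution are what eventually overwhelmed the log sheet under load (the spreadsheet ID, sheet name, and column layout here are illustrative, not the real project's):

// Illustrative version of the spreadsheet activity log: one row per execution,
// written in two steps (start and end). Under concurrent executions these writes
// contend for the same sheet, which is exactly where it broke down.
var LOG_SS_ID = 'ACTIVITY_LOG_SPREADSHEET_ID';  // placeholder ID

function logStart(functionName, userEmail) {
  var log = SpreadsheetApp.openById(LOG_SS_ID).getSheetByName('Log');
  log.appendRow([functionName, userEmail, new Date(), '', '']);  // end time and status left blank
  return log.getLastRow();  // row to update on completion
}

function logEnd(row, status) {
  var log = SpreadsheetApp.openById(LOG_SS_ID).getSheetByName('Log');
  log.getRange(row, 4, 1, 2).setValues([[new Date(), status]]);  // fill in end time + status
}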
So I have since transitioned to tracking activity by connecting each script file to a single Google Cloud Platform (GCP) project and writing to Cloud Logging. Writing logs is a lot more efficient than writing an entry to a spreadsheet, so the high-traffic errors are all but gone. The problem now is that the GCP log browser isn't as easy to use as a spreadsheet, and I can't graph the logs or sum up the activity to see where we stand with our server quota.
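The logging call itself can stay lightweight: when a script is attached to a standard GCP project, console.log() output lands in Cloud Logging, and logging a plain object stores it as structured jsonPayload fields that can be filtered on later. The field names below are just an example:

// Write a structured log entry instead of a spreadsheet row.
function logActivity(functionName, userEmail, startMs, status) {
  console.log({
    message: 'execution finished',
    functionName: functionName,
    user: userEmail,
    durationMs: Date.now() - startMs,
    status: status
  });
}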
I've spent some time now trying to figure out how to automatically export the logs from GCP so I can process them in real time. I see how to download the logs as CSV files, which I can then import into a Google spreadsheet to do the calculations and graphing I need, but this is a manual process and doesn't show live data.
I have also figured out how to stream the logs from GCP by setting up a "sink" that transfers the logs to a "bucket" which can theoretically be read by other services. This got me excited to try out Google Data Studio, which I saw is able to use Google Cloud Storage "buckets" as a data source. Unfortunately though, Google Data Studio can only read csv files in cloud storage, and not the json files that my "sink" is generating in my "bucket" for the logs.
So I've hit a wall. Am I missing something here? I'm just trying to get live data showing current activity on our apps script project so I can identify failed executions, see total processing time, and sort the logs by user or function so I can quickly identify where I need to optimize my script.
You've already referenced using the GCP side of your Apps Script project.
Have a look at Metrics Explorer; it lets you see quota usage per resource and auto-generates graphs for you.
But long term I think re-building your solution may be a better idea. At a minimum, switching to submitting data via Google Forms will save you operations.
I have thousands of files on Google Drive which I have deleted (sent to trash). My problem is they are still showing up after emptying the trash.
Emptying the trash simply does not work. The only way I can delete items and reduce my quota usage is by searching for items using the is:trashed search operator and permanently deleting them that way, except this takes forever and I have no idea how to automate it using the Drive API.
I have successfully run this from another question, but it doesn't help my cause.
I am essentially after a script that lists files tagged as trashed and owned by me, and then permanently deletes them.
Any help greatly appreciated
"I have thousands of files on google drive which i have deleted sent to trash. My problem is they are still showing up after emptying the trash."
Just picking up on that second sentence, it's worth bearing in mind that when using the Google API (e.g. with rclone), or even with Google's own web interface, there is always a delay between issuing any "Empty the Bin" instruction and Google Drive actually actioning it. And the more files there are in the Bin, and the smaller they are, the longer it will take for the Bin to be fully cleared. Thousands of very small files can take quite a while. And you have no control over when Google Drive will actually start the deletion or how fast it will work its way through the Bin.
Anyone who's familiar with Garbage Collection will understand precisely what I mean - "Ye know not at what hour the master cometh", as my minister Dad would have [annoyingly] put it :D
You can demonstrate this behavior for yourself by "removing" lots of files to the Bin, having the Bin displayed in a web browser, and then issuing (for example) "rclone cleanup mydrive:", which is the rclone command to empty the Google Drive Bin (where mydrive: has been configured to point to your Google Drive).
Firstly, you can observe that although the instruction has been issued, it can take a while before the Bin's list of files begins changing and at times, it can chug quite slowly through the list.
The other thing is that if you instruct the web interface to empty the bin it will immediately replace the list of files and folders with a graphic saying the Bin is empty. In fact, it's not, as you will see if you click on the "My Drive" folder and then immediately click back on the Bin again - the entire list will be redisplayed as before, minus anything that Google Drive has deleted in the interim. In the background, however, Google Drive will be revving up to delete your files.
The other, other thing is also that any further files and folders that are removed to the Bin after the initial "Empty the Bin" instruction has been issued, will need a subsequent command for them to be deleted permanently - they will not be covered by the first command. Again, you can demonstrate this to yourself by removing items to the Bin while the first "Empty the Bin" is in progress - you will ultimately be left with a Bin containing the second lot of removals.
I just thought it was worth pointing this out (you may disagree :D ), as I too have tried to empty my Bin in the past and thought that Google Drive wasn't doing anything.
Edit: apologies for the edit. I meant to say that once you've issued the "rclone cleanup mydrive:" command, if you then use "rclone about mydrive:", one of the stats it reports is the total space used by the files in the Bin. By periodically issuing "rclone about..." you can see the amount of space used by the Bin decreasing each time. This is often the better way of checking how your "Delete Forever" command is progressing, since Google Drive already considers those files gone, albeit in theory rather than in practice.
Cooper's method failed for me.
files.hasNext()
does not iterate through trash items. Cooper did point me at the Advanced Google Services API:
// This method requires enabling the Drive Advanced Google Service under Resources -> Advanced Google Services,
// and enabling the Drive API in the Google API Console:
// https://console.cloud.google.com/apis/library/drive.googleapis.com/
function DeleteTrashedFiles() {
  Drive.Files.emptyTrash();
}
If you would like to see how much quota you freed up, you can do this:
function DeleteTrashedFilesLogged() {
  // Enable Drive.About.get() fields here:
  // https://developers.google.com/apis-explorer/?hl=en_US#p/drive/v3/
  var lngBytesInTrash = Drive.About.get().quotaBytesUsedInTrash;
  Logger.log('Attempting to clean up ' + lngBytesInTrash + ' bytes');
  Drive.Files.emptyTrash();
  Logger.log('Trash now contains ' + Drive.About.get().quotaBytesUsedInTrash + ' bytes');
}
Try this:
function delTrashedFiles() {
  var files = DriveApp.getFiles();
  while (files.hasNext()) {
    var file = files.next();
    if (file.isTrashed()) {
      Drive.Files.remove(file.getId());
    }
  }
}
You will need to enable the Drive API (Advanced Drive Service) for this to work.
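If DriveApp.getFiles() doesn't surface trashed items for you (as noted above), a variant that queries trashed files directly through the advanced Drive service may be closer to what the question asks for. This is only a sketch, assuming the v2 advanced Drive service is enabled; Drive.Files.remove() deletes permanently with no way back, so test it carefully first:

// Sketch: list files that are trashed and owned by me, then delete them permanently.
// Assumes the Advanced Drive Service (v2); Drive.Files.remove() bypasses the trash.
function purgeMyTrashedFiles() {
  var pageToken;
  do {
    var result = Drive.Files.list({
      q: "trashed = true and 'me' in owners",
      maxResults: 100,
      pageToken: pageToken
    });
    (result.items || []).forEach(function(item) {
      Logger.log('Deleting: ' + item.title);
      Drive.Files.remove(item.id);
    });
    pageToken = result.nextPageToken;
  } while (pageToken);
}

Because items are deleted while the listing is still being paged through, a very large trash may need the function run more than once.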
I have a bunch of workbooks that have dozens... up to 100... sheets in them, and we do an import process, loading one sheet after another from these workbooks, that tends to get throttled when we collate our entire body of data. In other words, we read every sheet in every workbook, and partway through the process we get bogged down by the fact that Google is throttling our calls.
What I'd like to do is look at the last modification time for each sheet before I request it. If it is more recent than the data I've cached for that sheet, I'll download it. Otherwise, I'll skip it and use my cached version.
It's absolutely possible to get the last mod time for the workbook, but if I used that, and only one sheet in the book changed, I'd be stuck downloading all 100 sheets in that book when I may only need one.
I've looked at the idea of having Google notify me any time a cell changes, but there are limits on how many notifications are sent in a day. If I were to activate the notification for all the sheets I'm watching, and turn them off as I mark cached data as being dirty, I would still have the problem that I wouldn't know if I'd suddenly stopped getting notifications because I was being throttled. There are some sneaky workarounds to this, but none of them would notice if I'd been throttled for a while, but wasn't anymore.
So, how do I find the last modification for a sheet - preferably via one of the Python libraries?
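As far as I know, neither the Drive API nor the Sheets API exposes a modification time per individual sheet (tab); modifiedTime lives on the file. One workaround, sketched here in Apps Script rather than Python, is to have an installable onEdit trigger in each workbook stamp a per-sheet timestamp into a small metadata tab, which the Python importer can then read in a single cheap request before deciding which sheets to re-download. The trigger function and the '_sheet_mtimes' tab name below are made up for illustration:

// Sketch (hypothetical names): record a last-modified timestamp per sheet so an
// external importer can skip unchanged tabs. Assumes an installable onEdit trigger
// pointing at recordSheetEdit; edits made by scripts do not fire onEdit, so writing
// to the metadata tab does not re-trigger this function.
function recordSheetEdit(e) {
  var editedSheet = e.range.getSheet();
  var ss = e.source;
  var meta = ss.getSheetByName('_sheet_mtimes') || ss.insertSheet('_sheet_mtimes');
  var names = meta.getRange('A:A').getValues();  // column A holds sheet names
  for (var row = 0; row < names.length; row++) {
    if (names[row][0] === editedSheet.getName()) {
      meta.getRange(row + 1, 2).setValue(new Date());  // column B holds the timestamp
      return;
    }
  }
  meta.appendRow([editedSheet.getName(), new Date()]);  // first edit seen for this sheet
}

The importer then reads only the '_sheet_mtimes' tab for each workbook and compares it against its cache before pulling any full sheet.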
I've deleted spreadsheets with scripts attached to them that run on a timed frequency. I also had notifications set up to inform me when a script had an error.
Since deleting them I've gotten a steady stream of failure notifications that I can't stop. Any ideas?
Did you empty the trash in Google Drive/Docs? That usually does the trick...