Google Colab deletes my unzipped files after restarting session - google-drive-api

I am trying to extract a 2.2 GB dataset (tar.gz) that lives on my Google Drive and that I need to run models on in Colab.
I use the command !tar -xf source.tar.gz -C destination to extract it to my desired directory.
After about 30 minutes the extraction finishes and all the files are in place.
I restart the session after a while and see that more than half of the files are missing. So I extract them again, close my session, come back, and see that almost all of them are missing.
How can I fix this? Also, the Google Drive interface is very laggy and out of sync with the changes happening in Colab.
I really need the GPU on Colab. How do I resolve this issue?
I even tried tf.keras.utils.get_file with the extract option on, but again I lost most of my files after I reopened the notebook.
EDIT: Forgot to mention that the Drive folder is shared with some other people I am on the project with. Is it possible that there is not enough space, so the files are held in memory while the session is running and never fully moved to Drive?

Unfortunately, this is a limitation of Google Colab. From Google's Colaboratory FAQ:
Q: Where is my code executed? What happens to my execution state if I close the browser window?
A: Code is executed in a virtual machine private to your account. Virtual machines are deleted when idle for a while, and have a maximum lifetime enforced by the Colab service.
The virtual machine that runs the code is recycled after a certain amount of inactivity, and there is currently no mechanism to persist data saved on the VM's local disk.
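That said, files written under the mounted Drive folder (as opposed to the VM's local /content disk) do persist across VM recycles, subject to Drive quota and sync lag. A minimal sketch of extracting straight into the mounted Drive, assuming the archive sits in MyDrive (all paths are placeholders):

    from google.colab import drive

    # Mount Google Drive into the Colab VM's filesystem.
    drive.mount('/content/drive')

    # Extract directly into the mounted Drive folder; plain /content
    # paths live on the ephemeral VM disk and vanish when it is recycled.
    !tar -xf /content/drive/MyDrive/source.tar.gz -C /content/drive/MyDrive/dataset

    # Flush pending writes to Drive before the runtime goes away.
    drive.flush_and_unmount()

If the shared Drive is close to its storage quota, writes can fail silently, which would match the symptoms described in the question's edit.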
Possible workaround:
I stumbled upon a video from 1littlecoder's YouTube channel that shows how to prevent the Google Colab session runtime from closing using JavaScript.
The video uses document.querySelector inside a setInterval callback to click a button periodically, so the session never appears idle.
Disclaimer: I am not affiliated with the video nor the YouTube channel.
References:
https://research.google.com/colaboratory/faq.html
https://colab.sandbox.google.com/notebooks/io.ipynb
https://www.youtube.com/watch?v=5VkKlHuE4JQ
https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelector

Related

Google Colaboratory and Google Drive integration fails

I am not able to save a new notebook in my Google Drive environment.
Google Colaboratory works with predefined notebooks such as Hello, Colaboratory, but I am not able to save any into my Drive folder.
I have the Colaboratory app allowed in the Google Drive settings and really don't know how to solve this. Colaboratory communicates with Drive - it even creates the notebook files in the Google Drive folder - but when loading any notebook file it always reports the following notebook loading error:
There was an error loading this notebook. Ensure that the file is accessible and try again.
The details of the error do not help much either:
Failed to fetch TypeError: Failed to fetch
I played with the access rights of both the file and the folder and could not find any solution.
Update: Chrome 64.0.3282.167 (64-bit); Windows 10 1709. I use two user profiles in Chrome. Creating notebooks works normally on other computers with my username.
This is the output from the console:
[screenshot: Chrome console output]
On Google Chrome, I was seeing this issue randomly; it mentioned not being able to load the file /some/google/path/thats/gone/because/this/fixed/it/client.js. I tried clearing my cache and doing a hard reload, and sure enough, Colab started working again.
As with standard cookies, third-party cookies are placed so that a site can remember something about you at a later time. Both are typically used to store browsing and personalization preferences and tracking information.
Google Colaboratory uses third-party cookies, and your browser most likely has them disabled.
Navigate to your browser settings, search for cookies, and enable third-party cookies. This should hopefully fix your problem.
I had the same problem; I just disabled AdBlock on Google Colab and everything works perfectly.
Try closing/pausing AdBlock and reloading the page; it worked for me.

User disconnecting app in Drive causes loss of data under FILE scope

I've run into this issue a few times but could never put my finger on it, attributing it to GDAA's latency, my buggy code, etc. I finally managed to come up with a scenario where I can reliably reproduce it, so I would like to ask people who know whether it is a feature I don't understand or a plain bug. If the latter, please point me to the place where I can nag about it.
I will discuss it in terms of the REST API for simplicity.
1/ Let's have a Drive API-authenticated app that runs under the DRIVE_FILE scope
// A Drive service restricted to the drive.file scope.
com.google.api.services.drive.Drive svc =
    new Drive.Builder(
        AndroidHttp.newCompatibleTransport(),
        new GsonFactory(),
        GoogleAccountCredential
            .usingOAuth2(context, Collections.singletonList(DriveScopes.DRIVE_FILE))
    ).build();
2/ create a file (or folder) in Google Drive using
svc.files().insert([METADATA], [CONTENT]).execute();
3/ search for the objects you've created using
svc.files().list().setQ([QUERY]).setFields([FIELDS]).execute();
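For reference, the same create-then-list flow in Python against the REST API looks roughly like this (a sketch, not the questioner's code: v3 renames files().insert to files().create, and `creds` is assumed to be a credential carrying only the drive.file scope):

    from googleapiclient.discovery import build
    from googleapiclient.http import MediaFileUpload

    # 1/ a Drive service restricted to the drive.file scope via `creds`
    svc = build('drive', 'v3', credentials=creds)

    # 2/ create a file (metadata + content; names are placeholders)
    created = svc.files().create(
        body={'name': 'report.txt'},
        media_body=MediaFileUpload('report.txt'),
        fields='id',
    ).execute()

    # 3/ search for the objects this app created
    found = svc.files().list(
        q="name = 'report.txt' and trashed = false",
        fields='files(id, name)',
    ).execute()

Under drive.file, the list call only ever sees files this app created or was explicitly granted, which is exactly the surface that revocation wipes out.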
When the app is run, the user goes through the usual account-pick / Drive-authorize routine and everything works as expected. Files are created, visible, and can be found ... until the user revokes the authorization by means of
Settings > Manage Apps > Disconnect From Drive
in drive.google.com.
After that, if the Android app is restarted (and re-authorized), none of the objects created prior to the revocation are visible.
It may be by design, I don't know. If so, I can't find a way for the Android app to get to anything it created before. I could certainly create another 'maintenance' app with DRIVE scope to fix this, but...
Now, in the case of GDAA, it gets even worse. Not only does GDAA not have the DRIVE scope to fix it, but if the same sequence of steps is performed and the app creates a file/folder immediately after revocation, GDAA does not complain, yet the file/folder is not created at all. After a while (minutes), the re-authorization pops up, but the files created in the meantime are nowhere to be found, and everything pre-dating the revocation is lost to the (creator) app as well (it certainly is visible in the web app, which obviously has DRIVE-like scope).
Thank you for your patience.
The first issue is:
A user revokes authorization via: Settings > Manage Apps > Disconnect From Drive
Then reauthorizes that App
Files this app was authorized to see under the DRIVE_FILE scope are no longer accessible.
This is the expected behavior of the REST and Android APIs.
We don't think users would intuitively expect all previously authorized files to be re-authorized. The user may not remember the files that were previously authorized, and informing users that these files are going to be authorized again will likely cause confusion.
The second issue is GDAA's behavior for folder creation in this situation. We don't currently support CompletionEvents for folder creations, but this is something we'll look into.

Google Realtime API and Sharing Permissions Timing Issue

In a nutshell, what we are seeing is that if we create a new realtime document and immediately share it with another collaborator, and that collaborator loads the realtime document upon seeing it show up in their "Shared with me" folder, then when that collaborator tries to write data to the file, an error occurs, and sometimes the realtime API fails silently.
We have been able to reproduce this both by adding permissions programmatically and by using Google Drive's sharing dialogue. Here are the steps for reproducing this bug.
Log in to two different Google accounts on separate browsers
Create a new realtime document in one account
Copy the URL pointing to the new document
Share the newly created document with the other account by typing in the email address
As quickly as possible, verify that the new document shows up in the "Shared with me" folder of the other account, and paste the copied URL into the other browser to load the document for the other account (I was able to reproduce the issue consistently when doing this in under 30 seconds on my machine, but everything seems to work OK with a delay of at least 35 seconds)
When the document loads for the shared-with account, try to write data to the document
Sometimes the realtime API crashes silently
If the write to the document uses compound operations, we get the following errors:
Drive Realtime API Error: invalid_compound_operation: Open compound operation at end of synchronous block - did you forget to call endCompoundOperation()?
Uncaught DocumentClosedError: Document is closed.
This issue also occurs when sharing an existing file with a new collaborator. When testing on my machine, it appears to be a timing issue: I can consistently reproduce the error when waiting less than 30 seconds to load the shared document, and I haven't been able to reproduce it when the wait is 35 seconds or more. Another interesting find is that the problem seems to affect only writes. I am always able to read data from the shared document, but if it was loaded in the under-30-seconds scenario, the first time I try to write data the issue occurs. What's even more curious is that if the page is refreshed, everything works properly, even if the refresh happens within 30 seconds of the document being shared.
Thanks.
I'm not sure what your specific issue is; however, it is likely that the Realtime API is catching an error thrown in your JS, making it fail silently and skip calling endCompoundOperation. I would recommend opening Chrome DevTools and enabling 'Pause on Exceptions', as described here (https://developers.google.com/chrome-developer-tools/docs/javascript-debugging?csw=1#pause-on-exceptions), to see what is actually failing.
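Separately, the ~30-second threshold described in the question suggests waiting for the sharing grant to propagate before the collaborator loads the document at all. A hedged sketch of that polling idea, using the Drive v3 Python client rather than the (JavaScript) Realtime API; FILE_ID, EMAIL, and the two credential objects are placeholders:

    import time
    from googleapiclient.discovery import build
    from googleapiclient.errors import HttpError

    owner = build('drive', 'v3', credentials=owner_creds)
    collab = build('drive', 'v3', credentials=collab_creds)

    # Owner shares the document (mirrors the sharing-dialogue step).
    owner.permissions().create(
        fileId=FILE_ID,
        body={'type': 'user', 'role': 'writer', 'emailAddress': EMAIL},
    ).execute()

    # Collaborator polls until the grant has propagated, instead of
    # relying on a fixed ~35-second delay, before loading the document.
    for _ in range(12):
        try:
            collab.files().get(fileId=FILE_ID).execute()
            break  # file is visible to the collaborator; safe to load
        except HttpError:
            time.sleep(5)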

Cloud Storage Download Appears to Be Malicious

I uploaded a utility in the last few days to google cloud storage.
It's a zip file containing two executables and a readme file.
I tested the download and it worked fine. I then looked into how I could see the download stats and yesterday I enabled logging.
I posted the link to a mailing list this afternoon and clicked it to verify that I had the right link, and the download in Chrome reported "xxx.zip appears to be malicious".
This did not happen prior to when I enabled logging, but I don't know for sure that is what caused it.
I am using a CNAME alias for the download, and I am a paying google apps customer.
The executables are not malicious in any way. They are simple utilities for doing replacements in text files. They do not access the network at all.
My question is: why is my zip file being reported as malicious, and is there any way to remedy this situation?
I looked around for a solution to this problem and I found the following advice:
1) Sign your EXEs. As it turns out, this advice is incorrect. While it has worked for some people, there are people who report that even signed executables are reported as malicious downloads.
2) Use SSL. SSL access is not available for Google Cloud Storage unless you use the commondatastorage.googleapis.com or sandbox.google.com URLs. While this might work, it doesn't resolve my problem.
3) Use the commondatastorage.googleapis.com URL. This works. The same file served from the commondatastorage.googleapis.com URL rather than my custom CNAME record is not reported as "appears malicious".
4) Register your site with Google Webmaster Tools. According to the Stack Overflow entry "Getting around Chrome's Malicious File Warning", the solution is to sign up for Google Webmaster Tools and add your site.
I have tried this one, but it has not made a difference just yet. Because this is Google Cloud Storage and not a main site, I added an index.html page and a 404 page, and ran the gsutil commands to enable web configuration within Google Cloud Storage. I added the site to Webmaster Tools and additionally added it to Google Analytics.
I'll give solution 4 a few days to see if it pans out.
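Incidentally, the web configuration from option 4 can also be applied programmatically; a sketch with the google-cloud-storage Python client, equivalent to gsutil web set (the bucket name is a placeholder for the CNAME bucket):

    from google.cloud import storage

    client = storage.Client()
    bucket = client.get_bucket('downloads.example.com')  # placeholder

    # Same effect as:
    #   gsutil web set -m index.html -e 404.html gs://downloads.example.com
    bucket.configure_website(main_page_suffix='index.html',
                             not_found_page='404.html')
    bucket.patch()  # push the website configuration to the bucket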
It seems like this is more of an issue with Google Chrome and not necessarily Google Cloud Storage. Chrome's methods for identifying malicious files are less than desirable right now.

Calling a Google Drive SDK from Google App Script application

I have been going around in circles here and have totally confused myself. I need some help.
I am trying to write an application for a client that is simple in concept. He wants a Google Docs document with a button. The Google Drive account has several folders, each shared with several people. When he drops a new file into one of the folders, he wants to be able to open this document, which is the template for his email. He clicks the button; the system calls the changes service in the Google Drive SDK (https://developers.google.com/drive/manage-changes), gets the list of files that have been added since the last check, pulls the list of people each file has been shared with, and uses the document as a template to send those people an email saying their file is ready.
So, easy enough, right?
I started by looking at the built-in functions in the Google Apps Script API. I found the method https://developers.google.com/apps-script/class_docslist#find in the DocsList class. The problem is that the description of the query parameter simply says "the query string". So at first I tried the Drive SDK query parameters, which are
var files = DocsList.find("modifiedDate > 2012-12-20T12:00:00-08:00.");
It didn't work. That leads me to believe it is a simple full-text search on the content. That's not good enough.
That led me to try calling a Drive SDK method from within an Apps Script application. Great, we need OAuth 2 authentication. Easy enough. I found the objects in the script reference and hit my wall:
Client ID and Client Secret.
You see, when I create what this really is, a service account, the OAuth 2 control in Apps Script doesn't know how to handle the encrypted JSON and pass it back and forth. Then, when I tried to create and use an installed-application key, I got authentication errors because the controls, again, don't know what to do with the workflow. And finally, when I try to create a web-app key, I can't, because I don't have the site host name or redirect URI. And I can't use the API-key-only approach because, since I'm working with files, OAuth 2 is required.
I used anonymous access for a while but hit the limit of anonymous calls per day while trying to figure out the code a bit; that's not going to work because the guy is going to be pushing this button constantly throughout the day.
I have been pounding my head on the desk over this for 5 hours now. I need some help here; can anyone give me a direction to go?
PS: Yes, I know I could use the database controls, load the entire list of files into memory, and compare it to the list of files in the database. The problem is we are talking tens of thousands of files. Bad idea.
I wouldn't use DocsList anymore - DriveApp is supposed to be a more reliable replacement. Some of the commands have changed, so instead of find, use searchFiles. This should work more effectively (they even use a query like yours as an example).
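For comparison, the same "changed since" query can be issued against the Drive REST API directly; a sketch with the Python v3 client, where modifiedDate is renamed modifiedTime and `creds` is an assumed pre-authorized credential:

    from googleapiclient.discovery import build

    # List files modified after a given timestamp -- the query the
    # questioner tried to pass to DocsList.find.
    svc = build('drive', 'v3', credentials=creds)
    result = svc.files().list(
        q="modifiedTime > '2012-12-20T12:00:00'",
        fields='files(id, name, modifiedTime)',
        orderBy='modifiedTime desc',
    ).execute()
    for f in result.get('files', []):
        print(f['name'], f['modifiedTime'])

In Apps Script itself, DriveApp.searchFiles('modifiedDate > "2012-12-20T12:00:00"') accepts the same style of query string.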