How can I speed up unzipping / reading files with a mounted Google Drive in Colab? - google-drive-api

I'm using Colab with a mounted Google Drive to unpack zips and consolidate the CSVs that come out of them. But this, for example:

import os
import zipfile

for z in zip_list:
    with zipfile.ZipFile(z, 'r') as zf:
        zf.extractall()
    os.remove(z)
runs about 60x slower in Colab/Drive compared to when I run it on my local computer. Why is this so much slower and how can I fix it?

A typical strategy is to copy the .zip file from Drive to the local disk first.
Unzipping involves lots of small operations such as file creation, and those are much faster on the local disk than on Drive, which is remote storage accessed over the network.
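As a rough sketch of that strategy (assuming the archives in zip_list live under the standard /content/drive mount; the local working directory /content/local_zips is just an illustrative choice):

import os
import shutil
import zipfile

local_dir = '/content/local_zips'  # fast VM-local disk; name is illustrative
os.makedirs(local_dir, exist_ok=True)

for z in zip_list:  # paths under /content/drive/...
    # One big sequential copy from Drive is cheap compared to many small writes.
    local_zip = shutil.copy(z, local_dir)

    # Extract on the local disk, where the many small file creations are fast.
    with zipfile.ZipFile(local_zip, 'r') as zf:
        zf.extractall(local_dir)

    os.remove(local_zip)

The consolidated CSV can then be written back to Drive as a single large file, which avoids the per-file overhead in the other direction as well.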

Related

Does the Google COLAB folder in Google Drive not support subfolders?

This may be the dumbest question ever, but here goes. I cannot get COLAB to open a folder within the COLAB folder of Google Drive. Is no folder organization allowed in the colab folder? How do I organize colab notebooks so that every notebook is not in the same google drive directory?
Most similar questions are concerned with importing data or other files into a particular notebook. That is not my concern. I want to organize separate colab notebooks. But trying to open anything other than an actual notebook file in Colab throws an error instead of, say, opening a subfolder with notebooks stored in it.
What am I missing?!?

Google drive disconnect from COLAB

I am working on neural networks in Keras and I use Colab to train my network. Unfortunately, any time I stop the training, one of the following problems occurs:
Colab unmounts my gdrive folder. So I must remount it to restart the training.
My gdrive folder on Colab partially empties (I lose my dataset). In this case I also need to restart the session in order to remount gdrive.
Does anyone know the reason?
By stopping the training, do you mean stopping the kernel?
If you stop or restart the kernel, the drive will be unmounted.
If you want your training to continue, save your models into checkpoints.
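As a minimal sketch of that (assuming Drive is mounted at /content/drive; the tiny model, the random data, and the checkpoint path are all placeholders):

import os
import numpy as np
from tensorflow import keras

# Placeholder checkpoint location on the mounted Drive.
checkpoint_path = '/content/drive/MyDrive/checkpoints/model.h5'
os.makedirs(os.path.dirname(checkpoint_path), exist_ok=True)

# Tiny stand-in model and data, just to keep the example self-contained.
model = keras.Sequential([keras.Input(shape=(10,)), keras.layers.Dense(1)])
model.compile(optimizer='adam', loss='mse')
x_train = np.random.rand(100, 10)
y_train = np.random.rand(100, 1)

# Save the model at the end of every epoch; a disconnect then only loses
# the progress made since the last completed epoch.
checkpoint_cb = keras.callbacks.ModelCheckpoint(checkpoint_path)
model.fit(x_train, y_train, epochs=5, callbacks=[checkpoint_cb])

# After remounting Drive in a new session, reload and continue:
# model = keras.models.load_model(checkpoint_path)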

Open folder vs create new project from existing files, located under shared network drive in PhpStorm

It's not clear to me why I should use the option in PhpStorm to create a new project from existing files instead of just opening a folder and declaring the project directory.
I have a web server installed, and I can access its root via a shared network drive. I can simply open that folder in PhpStorm and declare it the project root; PhpStorm will generate a project at the given directory.
But there is also an option to create a new project from existing files (located under the shared network drive). My best guess is that this option is the way to go. Is this true, and if so, why? Or if it doesn't matter, why not?
There will be several people using the same shared drive to work in different projects in the webroot.
You can, of course, create a project on a mounted network drive via File/Open, but note that this is not officially supported. All IDE functionality is based on the index of the project files, which PhpStorm builds when the project is loaded and updates on the fly as you edit your code. To provide efficient coding assistance, PhpStorm needs to re-index code fast, which requires fast access to project files and cache storage. The latter can be ensured only for local files, that is, files that are stored on your hard disk and are accessible through the file system. Sure, mounts are typically on a fast network, but one day some hiccup happens, a user sends a stack trace, and all we see in it is a blocking I/O call.
So the suggested approach is to download the files to your local drive and use a deployment configuration to synchronize the local files with the remote server. See https://confluence.jetbrains.com/display/PhpStorm/Sync+changes+and+automatic+upload+to+a+deployment+server+in+PhpStorm

How do I download files from Google Drive to my server in parallel?

I want to download files from a Google Drive account to a server for backup purposes. The account holds about 40GB of files, which are mostly not owned by the user (so Google Takeout won't work).
I'd like to download the files in parallel to speed up the process.
You can use the Google Drive Linux client, which is conveniently called drive. It's under development, but works pretty well.
It's got some dependencies (seemingly Go 1.2+), which can be hard to satisfy in a server environment. But it's possible to install.
$ drive init
$ drive pull
This will pull your whole Drive account down, but it will be fairly slow.
$ drive list | sed -e 's/^\///' | xargs -P 10 -I{} drive pull -quiet -no-prompt '{}'
This will download your top-level folders in parallel, which may or may not be what you want.
It is possible to download in parallel; however, you will run into quotas designed to prevent abuse of the system. In the Developers Console you can increase the per-user rate limit so that a single user (you) can consume the whole quota, but with too many files downloaded in parallel you will eventually hit a rate limit exception. Basically, Google makes sure you don't exceed a per-second limit, given that it's a free service (or fixed price, like Google Apps).
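As a rough sketch of what a scripted parallel download with backoff looks like (assuming OAuth credentials creds obtained with google-api-python-client for the Drive v3 API; the worker count, retry schedule, and helper names are illustrative):

import io
import time
from concurrent.futures import ThreadPoolExecutor

from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaIoBaseDownload

MAX_WORKERS = 5   # illustrative; tune against your quota
MAX_RETRIES = 5

def download_file(creds, file_id, dest_path):
    # Build a service per call: the underlying HTTP objects are not thread-safe.
    service = build('drive', 'v3', credentials=creds)
    for attempt in range(MAX_RETRIES):
        try:
            request = service.files().get_media(fileId=file_id)
            with io.FileIO(dest_path, 'wb') as fh:
                downloader = MediaIoBaseDownload(fh, request)
                done = False
                while not done:
                    _, done = downloader.next_chunk()
            return
        except HttpError as err:
            # 403/429 are what the rate limiter returns; back off and retry.
            if err.resp.status in (403, 429):
                time.sleep(2 ** attempt)
            else:
                raise

def download_all(creds, files, dest_dir):
    # 'files' is a list of {'id': ..., 'name': ...} dicts, e.g. from files().list().
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        for f in files:
            pool.submit(download_file, creds, f['id'], f"{dest_dir}/{f['name']}")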

Script to perform a local backup of files stored in Google drive

I am about to store all my critical files on Google Drive. Before doing this, I'd like to make sure I have a proper backup in case I accidentally delete files.
There are tools to perform backups / restores of Google Drive (e.g. Backupify). However, I'd like to keep it simple and have a script running on my PC that, let's say once a day, takes a backup of the files stored in Google Drive.
Does anyone have a script to perform this? The script can run on a PC or Mac.
Thanks a lot for your help!
Hugues
The easiest way to keep a local backup of your files would probably be to install the Google Drive application. There are versions for both PC and Mac:
http://support.google.com/drive/bin/answer.py?hl=en&answer=2374989
Jay
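If you do want the simple daily script the question describes, a rough sketch on top of that synced copy (assuming the Drive application syncs to a folder such as ~/Google Drive and that a dated zip under ~/Backups is acceptable; both paths are assumptions):

import shutil
import time
from pathlib import Path

# Assumed locations; adjust to where the Drive app syncs and where backups should go.
drive_folder = Path.home() / 'Google Drive'
backup_root = Path.home() / 'Backups'
backup_root.mkdir(exist_ok=True)

# Archive the synced folder into a dated zip, e.g. Backups/drive-2016-03-14.zip
stamp = time.strftime('%Y-%m-%d')
shutil.make_archive(str(backup_root / f'drive-{stamp}'), 'zip', str(drive_folder))

Scheduled once a day (cron on a Mac, Task Scheduler on a PC), this keeps dated snapshots that are independent of anything that happens in the Drive account itself.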