Too slow to unzip dataset in google colab from google drive - deep-learning

I have a dataset of around 1.2GB that I want to use in Google Colab. I compressed the dataset into a zip file, which shrank it to 479MB. Then I uploaded the zip file to Google Drive and ran the following command in Google Colab:
!unzip Archive.zip
It starts to unzip the file, but it's too slow (I waited three hours and it still didn't finish). I'm using a Google Colab GPU runtime, and a smaller zip file was unzipped correctly. Is there a faster way to get a dataset into Google Colab?
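A commonly suggested workaround, sketched here rather than taken from the post, is to copy the archive from the mounted Drive to the VM's local disk first and extract it there. Extracting into a Drive-backed folder typically means every extracted file is written through the Drive mount, which is slow for datasets with many small files. The helper name `copy_and_extract` and the paths in the usage note are hypothetical.

```python
import shutil
import zipfile
from pathlib import Path
from typing import List


def copy_and_extract(zip_path: str, local_dir: str) -> List[str]:
    """Copy a zip archive to local disk, extract it there, and return member names.

    zip_path:  archive location, e.g. on the Drive mount (hypothetical path)
    local_dir: destination on the VM's local disk, e.g. /content/data
    """
    dest = Path(local_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # One large sequential read from Drive instead of many small writes to it
    local_zip = Path(shutil.copy(zip_path, dest / Path(zip_path).name))
    with zipfile.ZipFile(local_zip) as zf:
        zf.extractall(dest)
        names = zf.namelist()
    # Remove the local copy of the archive to free disk space
    local_zip.unlink()
    return names
```

Usage, assuming Drive is mounted at /content/drive: `copy_and_extract("/content/drive/MyDrive/Archive.zip", "/content/data")`. The shell equivalent is `!cp /content/drive/MyDrive/Archive.zip /content/` followed by `!unzip -q /content/Archive.zip -d /content/data`, where `-q` also suppresses the per-file output that can clutter the notebook.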

Related

Trouble in uploading a large file from google drive url to google colab

I need to upload a large file (Sample File) to Google Colab. This file is located in a Google Drive account.
Consider these situations:
My Google Drive is nearly full, so I cannot upload it to my own drive.
My connection is slow, so downloading this file and uploading it to Google Drive is a great challenge for me.
I also read some Stack Overflow pages, such as Import data into Google Colaboratory, and these ones:
Download File from URL to Google Drive using Google Colab in Python, Get Started: 3 Ways to Load CSV files into Colab, and 7 ways to load external data into Google Colab. But none of them was useful in my case. I also tried the !wget command, but it could not download a Google Drive link.
Assume the file is available through a shared Google Drive link.
Then share it with the Google account you will use in Colab.
Now go to that account's Google Drive and open "Shared with me".
Right-click the file and choose "Add shortcut to Drive". The file is now added to your Google Drive and is accessible in Colab.
Finally, go to colab and run these commands:
from google.colab import drive
drive.mount('/content/drive')
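import os
# Use an absolute path; "Weights" is the Drive folder used in this example
os.chdir("/content/drive/My Drive/Weights/")
print(os.listdir())  # the shared file should be listed here
The file is there!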
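The question also notes that wget could not download a Google Drive link. Tools such as gdown work with the file's ID rather than the sharing URL, so it can help to pull the ID out of the link first. This sketch assumes the common link shapes listed in the docstring; the helper name `drive_file_id` is hypothetical.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse


def drive_file_id(url: str) -> Optional[str]:
    """Return the file ID from a Google Drive sharing link, or None if not found.

    Assumed link shapes (not an exhaustive list):
      https://drive.google.com/file/d/<ID>/view?usp=sharing
      https://drive.google.com/uc?id=<ID>
      https://drive.google.com/open?id=<ID>
    """
    parsed = urlparse(url)
    # Shape 1: the ID is the path segment after /d/
    match = re.search(r"/d/([A-Za-z0-9_-]+)", parsed.path)
    if match:
        return match.group(1)
    # Shapes 2 and 3: the ID is the "id" query parameter
    ids = parse_qs(parsed.query).get("id")
    return ids[0] if ids else None
```

For example, `drive_file_id("https://drive.google.com/file/d/abc123/view?usp=sharing")` gives `"abc123"`, which can then be passed to a downloader such as gdown in the `https://drive.google.com/uc?id=<ID>` form.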

How to load data to google colaboratory for others to use without downloading the data folders?

I use 'mount drive' to access the dataset folder.
However, the Google Drive belongs only to me, so others who want to run my code on Colab have to download the dataset, upload it to their own Google Drive, and then mount their drive to run the code.
Is there any way for others to run my code without downloading the dataset?
I made a library to do just that:
!pip install kora
from kora import drive
# folder_id: ID of the shared Drive folder (the part of its URL after /folders/)
drive.download_folder(folder_id)

Google colab and google drive: Copy file from colab to Google Drive

There seem to be lots of ways to access a file on Google Drive from Colab but no simple way to save a file from Google Colab back to Google Drive.
For example, to access a Google Drive file from Colab, you can mount the Google Drive using
from google.colab import drive
drive.mount('/content/drive')
However, to save an output file you've generated in Colab to Google Drive, the methods seem very complicated, as in:
Upload File From Colab to Google Drive Folder
Once Google Drive is mounted, you can even view the drive files in the Table of Contents from Colab. Is there no simple way to save or copy a file created in Colab and visible in the Colab directory back to Google Drive?
Note: I don't want to save it to a local machine using something like
from google.colab import files
files.download('example.txt')
as the file is very large.
After you have mounted the drive, you can just copy it there.
# mount it
from google.colab import drive
drive.mount('/content/drive')
# copy it there
!cp example.txt /content/drive/MyDrive
Other answers show how to copy a specific file; I would like to mention that you can also copy an entire directory, which is useful when copying callback logs from Colab to Drive:
from google.colab import drive
drive.mount('/content/drive')
In my case, the folder names were:
%cp -av "/content/logs/scalars/20201228-215414" "/content/drive/MyDrive/Colab Notebooks/logs/scalars/manual_add"
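If you prefer to stay in Python, `shutil.copytree` can copy a whole directory such as a TensorBoard log folder. This is a sketch: the `dirs_exist_ok` argument requires Python 3.8 or later, and the helper name `copy_logs` is hypothetical.

```python
import shutil
from pathlib import Path


def copy_logs(src_dir: str, dst_dir: str) -> str:
    """Recursively copy src_dir into dst_dir, merging with any existing content."""
    if not Path(src_dir).is_dir():
        raise FileNotFoundError(f"source directory not found: {src_dir}")
    # dirs_exist_ok=True lets repeated runs add new files into an existing target
    return shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
```

With the folders from the answer above, the call would be `copy_logs("/content/logs/scalars/20201228-215414", "/content/drive/MyDrive/Colab Notebooks/logs/scalars/manual_add")`.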
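You can use shutil to copy or move files between Colab and Google Drive:
import shutil
# Assumes Drive was mounted at /content/gdrive; your files live under its MyDrive folder
shutil.copy("/content/file.doc", "/content/gdrive/MyDrive/file.doc")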
When you are saving files, simply specify the Google Drive path for saving the file.
When using large files, Colab sometimes syncs the VM and Drive asynchronously. To force the sync, simply run:
from google.colab import drive
drive.flush_and_unmount()
In my case, I use the common approach with the !cp command.
Sometimes it doesn't work in Colab because the file path is wrong.
Basic syntax: !cp source_filepath destination_filepath
Example:
!cp /content/myfolder/myitem.txt /content/gdrive/MyDrive/mydrivefolder/
To get the path right, you can copy it from the file browser on the left side: click the three-dot menu next to the file and choose "Copy path".
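Because a wrong path is the usual reason the copy fails, a Python version can check both paths before copying and say which one is wrong. A minimal sketch; the helper name `copy_to_drive` is hypothetical.

```python
import shutil
from pathlib import Path


def copy_to_drive(src: str, dst_dir: str) -> Path:
    """Copy one file into an existing directory, failing with a clear message."""
    src_path, dst_path = Path(src), Path(dst_dir)
    if not src_path.is_file():
        raise FileNotFoundError(f"source file does not exist: {src_path}")
    if not dst_path.is_dir():
        # Common causes on Colab: Drive not mounted, or a misspelled folder name
        raise NotADirectoryError(f"destination folder does not exist: {dst_path}")
    # copy2 also preserves the file's modification time
    return Path(shutil.copy2(src_path, dst_path))
```

For the example above: `copy_to_drive("/content/myfolder/myitem.txt", "/content/gdrive/MyDrive/mydrivefolder")`.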
Once you see the file in the Table of Contents of Colab on the left, simply drag that file into the "/content/drive/My Drive/" directory located on the same panel. Once the file is inside your "My Drive", you will be able to see it inside your Google Drive.
After you mount your drive...
from google.colab import drive
drive.mount('/content/drive')
...just prepend the full path, including the mounted path (/content/drive) to the file you want to write.
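import os
someList = []  # lines to write
# The target folder must exist; create it if needed
os.makedirs('/content/drive/My Drive/data', exist_ok=True)
with open('/content/drive/My Drive/data/file.txt', 'w', encoding='utf8') as output:
    for line in someList:
        output.write(line + '\n')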
In this case we save it in a folder called data located in the root of your Google Drive.
You may often run into quota limits using the gdown library.
Access denied with the following error:
Too many users have viewed or downloaded this file recently. Please
try accessing the file again later. If the file you are trying to
access is particularly large or is shared with many people, it may
take up to 24 hours to be able to view or download the file. If you
still can't access a file after 24 hours, contact your domain
administrator.
You may still be able to access the file from the browser:
https://drive.google.com/uc?id=FILE_ID
gdown is undoubtedly faster, but I copy my files with the command below and avoid the quota limits:
!mkdir -p /content/dataset
!cp /content/drive/MyDrive/Dataset/test1.zip /content/dataset/

how to upload image folder to colab?

I was trying to upload a big image folder to Google Drive and GitHub, but GitHub did not allow it and Google Drive is taking too long. How can I upload the local folder to Colab?
Sorry, I don't think there's a solution to your issue. If your fundamental problem is limited upload capacity from the machine with the images, you'll just need to wait.
A nice property of uploading to Drive is that you can use programs like Backup and Sync to retry the transfer until it's successful. And once the images have been uploaded to Drive, you'll be able to access them quickly in Colab thereafter without uploading again. (See this example notebook showing how to connect your Google Drive files to Colab as a filesystem.)
Convert the folder to a zip file and then upload it to Colab.
You can then unzip it with the following command:
!unzip "your path"
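Zipping the folder means one large transfer instead of thousands of small ones. As a standard-library sketch, `shutil.make_archive` can build the zip on your local machine and `shutil.unpack_archive` can extract it in Colab. The helper names `zip_folder` and `unzip_to` are hypothetical.

```python
import shutil
from pathlib import Path


def zip_folder(folder: str, archive_base: str) -> str:
    """Create <archive_base>.zip containing the contents of folder; return its path."""
    return shutil.make_archive(archive_base, "zip", root_dir=folder)


def unzip_to(archive: str, dest: str) -> None:
    """Extract a .zip (or another supported archive) into dest, creating it if needed."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(archive, dest)
```

Run `zip_folder("my_images", "my_images")` locally, upload `my_images.zip`, then call `unzip_to("/content/my_images.zip", "/content/my_images")` in Colab (folder names are illustrative).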
Note that unzip extracts any file type, not only CSV files.
If you use a Kaggle dataset, use:
import os
os.environ['KAGGLE_USERNAME'] = 'enter_username_here'  # username
os.environ['KAGGLE_KEY'] = 'enter_key_here'  # key
!kaggle datasets download -d dataset_api_command_here
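An alternative to environment variables is the credentials file the Kaggle CLI reads, `~/.kaggle/kaggle.json`, which holds `username` and `key` fields. A minimal sketch, assuming you supply your own values; the helper name is hypothetical. The file is made readable only by its owner, since the CLI warns when the key is readable by other users.

```python
import json
import os
from pathlib import Path


def write_kaggle_credentials(username: str, key: str, config_dir: str = "~/.kaggle") -> Path:
    """Write kaggle.json with owner-only permissions and return its path."""
    directory = Path(config_dir).expanduser()
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "kaggle.json"
    path.write_text(json.dumps({"username": username, "key": key}))
    # Equivalent to `chmod 600 ~/.kaggle/kaggle.json`
    os.chmod(path, 0o600)
    return path
```

After `write_kaggle_credentials("enter_username_here", "enter_key_here")`, the same `!kaggle datasets download` command should work without setting environment variables.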
If the images are in Google Drive, mount it:
from google.colab import drive
drive.mount('/content/drive')

From colab directly manipulate sqlite3 format data in Google drive

From Colaboratory, is it possible to directly manipulate SQLite 3 data stored in Google Drive?
It works if I upload the file, but it would be more convenient to use it directly from Google Drive.
You can load files directly from Drive by mounting your Google Drive as a FUSE filesystem.
Here's an example:
https://colab.research.google.com/notebook#fileId=1srw_HFWQ2SMgmWIawucXfusGzrj1_U0q
There's no official Google Drive FUSE filesystem. But, several open-source FUSE + Drive libraries have been written by third parties. The example notebook above uses google-drive-ocamlfuse. The notebook shows three things:
Installing the Drive FUSE wrapper in the Colab VM.
Authenticating to Drive and mounting your Drive using the FUSE wrapper.
Listing and creating files in the newly mounted Drive-backed filesystem.
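Once Drive is mounted, whether through a FUSE wrapper or the built-in `drive.mount`, it behaves like an ordinary directory, so listing and creating files can be done with `pathlib`. A sketch; the mount path in the usage note and the helper name are assumptions.

```python
from pathlib import Path
from typing import List


def list_and_create(mount_dir: str, new_name: str, text: str) -> List[str]:
    """Create a text file under mount_dir, then return the sorted entry names."""
    root = Path(mount_dir)
    (root / new_name).write_text(text, encoding="utf8")
    return sorted(p.name for p in root.iterdir())
```

For example, `list_and_create("/content/gdrive/My Drive", "hello.txt", "written from Colab")`.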
First and foremost, upload the database.sqlite file and the desired CSV file (Reviews.csv in my case) to Google Drive.
Then, you need to mount your drive in Google Colab using the following command:
from google.colab import drive
drive.mount('/content/gdrive')
This will result in the following output:
Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly
Enter your authorization code:
Click the URL. It opens a new tab asking which Google account should grant Google Drive File Stream access to your Drive, with the prompt "Choose an account to continue to Google Drive File Stream".
You must select the same Google account you used to log in to Google Colab. Then click the 'Allow' button. This leads to another page that shows an alphanumeric code. Copy the code and paste it into the text area (Enter your authorization code:).
Consequently, the message about the drive being mounted will be displayed:
··········
Mounted at /content/gdrive
Now click the folder icon on the left side of the notebook (below the 'show code snippet pane' icon) and you will see the gdrive folder. Your Google Drive's My Drive is located inside gdrive. Open the folder in which you stored the database.sqlite file, right-click the file, and select 'Copy path'.
The path you copied should be pasted in the path link of the following command:
con = sqlite3.connect('paste the path you copied')
For example, if the database.sqlite resides in '/content/gdrive/My Drive/Colab Notebooks/database.sqlite', then the command will be as follows:
import sqlite3
import pandas as pd
con = sqlite3.connect('/content/gdrive/My Drive/Colab Notebooks/database.sqlite')
Now you may run some SQL queries to check whether all is well:
filtered_data = pd.read_sql_query("""
SELECT *
FROM Reviews
WHERE Score != 3
""", con)
print(filtered_data.head())
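The question asks about manipulating the data, not only reading it. With the standard `sqlite3` module, changes reach the file on Drive only after a commit, and `?` placeholders keep values out of the SQL string. A sketch against the Reviews table above: the `Score` column comes from the query in the answer, the function name is hypothetical, and because it deletes rows you may want to run it on a copy of the database.

```python
import sqlite3


def delete_neutral_reviews(db_path: str, neutral_score: int = 3) -> int:
    """Delete rows whose Score equals neutral_score; return the number removed."""
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back if an exception is raised
            cur = con.execute("DELETE FROM Reviews WHERE Score = ?", (neutral_score,))
        return cur.rowcount
    finally:
        con.close()
```

For the database above: `delete_neutral_reviews('/content/gdrive/My Drive/Colab Notebooks/database.sqlite')`.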