Fetch folder from Drive for Google Colab - deep-learning

I'm trying to run a deep learning model in a Jupyter notebook, but it takes forever and the kernel dies during training, so I'm trying to run it on Google Colab instead. I've learned the basics that are available on the internet, but they aren't helping me at all. The model gets its dataset from a module:
https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet/blob/master/ocr/utils/iam_dataset.py contains the module that extracts and preprocesses the dataset for training from the local computer. I've uploaded the dataset to Google Drive, and now I want to change the path so that this module finds that 'dataset' folder. I've been stuck on this for 5 days and I'm clueless.

I would suggest not loading the dataset from Google Drive into Colab directly; it increases the dataset loading time.
Google Colab provides some local storage for your work (around 70 GB), shown in the upper-right corner below the RAM bar. Bring your dataset into that storage. This is how you can do it:
import zipfile
from google.colab import drive

# Mount Google Drive so the zipped dataset is reachable under /content/drive
drive.mount('/content/drive')

# Extract the archive into Colab's local storage
zip_ref = zipfile.ZipFile("/content/drive/My Drive/dataset.zip", 'r')
zip_ref.extractall("/content/")
zip_ref.close()
Please note that your entire dataset should be zipped.
This will be more than 20 times faster than the method you are trying.
The format of the zipfile.ZipFile() call above is:
zip_ref = zipfile.ZipFile("/content/drive/<zip file location in Google Drive>", 'r')
If you click the folder icon on the left side of the Colab interface, you should see your dataset there.
You can then access your dataset using the file path '/content/dataset'.
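Before pointing the data-loading module at the extracted data, a quick sketch like the one below (assuming the archive extracts to a top-level dataset folder, as described above) confirms the files actually landed in local Colab storage:

import os

# '/content/dataset' is the assumed extraction target from the snippet above;
# adjust it if your zip uses a different top-level folder name.
dataset_root = "/content/dataset"

print(os.path.exists(dataset_root))   # True once the extraction has finished
print(os.listdir(dataset_root)[:5])   # peek at the first few files/folders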

Related

How to recursively save GAN generated images to a folder in Kaggle

img_list2 = []
img_list2.append(fake)
for i in range(len(fake)):
    vutils.save_image(img_list2[-1][i],
                      "/content/drive/MyDrive/DAugmentn/genPyImage/gpi%d.jpg" % i,
                      normalize=True)
With this code snippet at the end of my model, I successfully saved my GAN-generated images recursively to the genPyImage subfolder as JPEG files in my Google Drive using Colab. I want to do the same in Kaggle. I have successfully loaded the training data and the model works fine in Kaggle, but I cannot save the generated images at the end of training. I want to take advantage of the Kaggle GPU.
I tried to create my genPyImage subfolder in the Kaggle output working directory, but it is not working.
It is simply a matter of saving into the Kaggle output directory by replacing the output path with "/kaggle/working", then downloading the files to your local machine afterwards.
Change from
vutils.save_image(img_list2[-1][i],
                  "/content/drive/MyDrive/DAugmentn/genPyImage/gpi%d.jpg" % i,
                  normalize=True)
To
vutils.save_image(img_list2[-1][i],"/kaggle/working/gpi%d.jpg" % i, normalize=True)
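If a genPyImage subfolder is still wanted on Kaggle, a minimal sketch, assuming the standard /kaggle/working output directory and the fake / img_list2 variables from the question, is to create the folder first and then save into it:

import os

import torchvision.utils as vutils

# /kaggle/working is Kaggle's writable output directory; genPyImage is just
# the subfolder name reused from the original Colab code.
out_dir = "/kaggle/working/genPyImage"
os.makedirs(out_dir, exist_ok=True)

# 'fake' and 'img_list2' are the variables from the snippet in the question
for i in range(len(fake)):
    vutils.save_image(img_list2[-1][i],
                      os.path.join(out_dir, "gpi%d.jpg" % i),
                      normalize=True)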

How to save the features from a Web Feature Service as a shapefile in Python?

I have a WFS running at the URL mentioned in the code below. I want to get all the features available in the service as a shapefile so I can open and visualize it in QGIS or any other GIS software. The service provides the ability to download the shapefile, as can be seen here:
https://www.geoseaportal.de/wss/service/Site_Development_Plan_2020/guest?SERVICE=WFS&REQUEST=GetCapabilities&VERSION=2.0.0
I have tried the code below to fetch the data and save it as a shapefile, but it just creates an empty point shapefile, which is not correct. How can I download it correctly?
from owslib.wfs import WebFeatureService

eaWFS = WebFeatureService(url='https://www.geoseaportal.de/wss/service/Site_Development_Plan_2020/guest?SERVICE=WFS', version='2.0.0')
floodData = eaWFS.getfeature(typename='Site_Development_Plan_2020:All', bbox=(419000, 419000, 421000, 421000), outputFormat='SHAPE-ZIP')

# Write the returned bytes to disk (raw string avoids backslash escaping issues)
with open(r'C:\Users\Chaudhr1\arcgis\DATA.zip', 'wb') as out:
    out.write(floodData.read())
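As a quick diagnostic, a minimal sketch like the one below (assuming geopandas is installed; the path reuses the one from the question) inspects what actually landed in the downloaded archive before opening it in QGIS:

import zipfile

import geopandas as gpd

zip_path = r'C:\Users\Chaudhr1\arcgis\DATA.zip'

# If the WFS request failed, the "zip" is often an XML error document and this
# will raise; otherwise it lists the shapefile parts inside the archive.
with zipfile.ZipFile(zip_path) as zf:
    print(zf.namelist())

# Recent geopandas/fiona versions can read a zipped shapefile directly;
# older ones need a 'zip://' prefix in front of the path.
gdf = gpd.read_file(zip_path)
print(len(gdf), "features")
print(gdf.head())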

Loading text data (210 MB) from google drive to google colab is excruciatingly slow

I am going through Hugging Face's tutorial on fine-tuning BERT with a custom dataset, but I am unable to follow along, as Google Colab has been executing for over an hour (so far) just trying to load the data.
The tutorial uses the IMDB Review dataset, which I downloaded to my Google Drive account (to mimic how I will be getting data for my actual project). The dataset contains 50,000 movie reviews, each saved as a standalone txt file.
The code that I am trying to execute is the following (taken from the tutorial):
from pathlib import Path

def read_imdb_split(split_dir):
    split_dir = Path(split_dir)
    texts = []
    labels = []
    for label_dir in ["pos", "neg"]:
        for text_file in (split_dir/label_dir).iterdir():
            texts.append(text_file.read_text())
            labels.append(0 if label_dir == "neg" else 1)  # compare strings with ==, not 'is'
    return texts, labels

train_texts, train_labels = read_imdb_split('/content/drive/MyDrive/aclImdb/train')
test_texts, test_labels = read_imdb_split('/content/drive/MyDrive/aclImdb/test')
It is only 210 MB so I do not understand how it could possibly be taking so long. Is it normal for it to be taking this long? What can I do?
I will also mention that I have Colab Pro and am using a GPU with High-RAM.
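Reading 50,000 tiny files over the Drive mount is very likely what makes this slow; the same approach as the zip answer above helps here. A minimal sketch, assuming the dataset sits on Drive as a single archive (aclImdb_v1.tar.gz is only an illustrative name and path), copies it to local Colab storage and extracts it there before calling read_imdb_split:

import shutil
import tarfile

# Illustrative names: the IMDB dataset ships as aclImdb_v1.tar.gz; adjust the
# Drive path to wherever the archive actually lives.
archive_on_drive = "/content/drive/MyDrive/aclImdb_v1.tar.gz"
local_archive = "/content/aclImdb_v1.tar.gz"

# One large sequential copy from Drive is far cheaper than 50,000 small reads
shutil.copy(archive_on_drive, local_archive)

# Extract into Colab's local storage
with tarfile.open(local_archive, "r:gz") as tar:
    tar.extractall("/content/")

# Point the tutorial code (read_imdb_split from the question) at the local copy
train_texts, train_labels = read_imdb_split('/content/aclImdb/train')
test_texts, test_labels = read_imdb_split('/content/aclImdb/test')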

Google Colab / Google Drive and h5py file storage

I'm trying to train a UNet neural network on Google Colab with data stored on my Google Drive.
I created a core library, a dataset, etc., but accessing the data was slow.
To work around this, I've built an ".hdf5" file with the h5py library.
import os

import h5py
import numpy as np
import tqdm
from PIL import Image

XDataPath = "/content/drive/My Drive/Dataset/data/X"
YDataPath = "/content/drive/My Drive/Dataset/data/Y"
h5Path = "/content/drive/My Drive/Dataset/data/dataset.hdf5"

nbX = len(os.listdir(XDataPath))
nbY = len(os.listdir(YDataPath))

# Clean data: remove images without a matching label, and vice versa
dst = [os.path.splitext(f)[0] for f in os.listdir(YDataPath)]
src = [os.path.splitext(f)[0] for f in os.listdir(XDataPath)]
for f in src:
    if f not in dst:
        fpth = os.path.join(XDataPath, f + '.jpg')
        os.remove(fpth)
        print(fpth)
for f in dst:
    if f not in src:
        fpth = os.path.join(YDataPath, f + '.png')
        os.remove(fpth)
        print(fpth)

# Create the datasets once
with h5py.File(h5Path, 'a') as hfile:
    if "X" not in hfile:
        hfile.create_dataset("X", (nbX, 512, 512, 3))
    if "Y" not in hfile:
        hfile.create_dataset("Y", (nbY, 512, 512))

# Fill the X dataset image by image
for i, Path in tqdm.tqdm_notebook(enumerate(os.listdir(XDataPath)), total=nbX):
    ImPath = os.path.join(XDataPath, Path)
    with h5py.File(h5Path, 'a') as hfile:
        with Image.open(ImPath) as f:
            X = np.array(f)
            hfile["X"][i] = X
The file is correctly created.
What is surprising to me is that I don't see this file on my Google Drive (only a 0 KB file with the same name).
Moreover, I don't have enough Drive storage to hold it.
Why is this file not created on the Drive?
Where is it stored?
Another problem is that when I restart the environment, the hdf5 file is 0 KB again, like on my Google Drive, and of course empty!
Thanks,
The file is created and stored on the Colab instance (in Google Cloud). The file is too big, so it cannot be synced back to Google Drive.
So, I suggest you use a GCS bucket to store it instead of Google Drive.
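A minimal sketch of that suggestion, run in a Colab cell, assuming the hdf5 file is first written to local Colab storage and that my-dataset-bucket is only a placeholder bucket name:

from google.colab import auth

# Authenticate this Colab session against your Google account / GCP project
auth.authenticate_user()

# Copy the locally built file to the bucket; 'my-dataset-bucket' is a placeholder
!gsutil cp /content/dataset.hdf5 gs://my-dataset-bucket/dataset.hdf5

# In a later session, pull it back down to local storage:
# !gsutil cp gs://my-dataset-bucket/dataset.hdf5 /content/dataset.hdf5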

Google Colab file upload not uploading the csv file

Ran the following code in Colab:
from google.colab import files

uploaded = files.upload()
Clicked "Choose Files" and selected the csv I want to upload.
But the file is not being uploaded. It worked once before, but not again. In all the videos I have watched about uploading CSVs locally, Colab uploads the file immediately once it is selected.
Colab is not doing that in my case. It just sits stagnant like this:
[screenshot: the Colab upload widget sitting stagnant]
This is a Colab bug --
https://github.com/googlecolab/colabtools/issues/437
The team reports they are working on a service change to correct the problem today. (21 Feb, 2019)
I had the same problem. You can also go for another option:
1. In the left pane of Google Colab there is a tab called Files; click on that.
2. Click on "Upload files".
That's it! Pretty simple.
For more details: https://www.youtube.com/watch?v=0rygVrmHidg
Upload the file from the local machine using the following command:
from google.colab import files
import pandas as pd

uploaded = files.upload()
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))
After that, read the csv/text file using pandas:
dataFrame = pd.read_csv('filename.csv')
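As a side note, a small sketch (assuming the uploaded file is named filename.csv, as above) reads the CSV straight from the bytes that files.upload() returns, without depending on where the file was written:

import io

import pandas as pd
from google.colab import files

uploaded = files.upload()

# files.upload() returns a dict mapping each filename to its raw bytes
dataFrame = pd.read_csv(io.BytesIO(uploaded['filename.csv']))
dataFrame.head()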
I had the same problem, so I use this method instead: upload the file from Google Drive using the following commands:
import pandas as pd
from google.colab import drive

drive.mount('/content/drive')
data = pd.read_csv('/content/drive/Myfiles/datafiles/mydata.csv', delimiter=",")