Google Colab / Google Drive and h5py file storage - google-drive-api
I'm trying to train a UNet neural network on Google Colab, with data stored on my Google Drive.
I created a core library, a dataset, etc., but accessing the data was slow.
To work around this, I built a ".hdf5" file with the h5py library.
import os
import h5py
import numpy as np
import tqdm
from PIL import Image

XDataPath = "/content/drive/My Drive/Dataset/data/X"
YDataPath = "/content/drive/My Drive/Dataset/data/Y"
h5Path = "/content/drive/My Drive/Dataset/data/dataset.hdf5"
nbX = len(os.listdir(XDataPath))
nbY = len(os.listdir(YDataPath))

# Clean data: drop any image that has no matching label, and vice versa
dst = [os.path.splitext(f)[0] for f in os.listdir(YDataPath)]
src = [os.path.splitext(f)[0] for f in os.listdir(XDataPath)]
for f in src:
    if f not in dst:
        fpth = os.path.join(XDataPath, f + '.jpg')
        os.remove(fpth)
        print(fpth)
for f in dst:
    if f not in src:
        fpth = os.path.join(YDataPath, f + '.png')
        os.remove(fpth)
        print(fpth)

# Create the datasets once, then fill them image by image
with h5py.File(h5Path, 'a') as hfile:
    if "X" not in hfile:
        hfile.create_dataset("X", (nbX, 512, 512, 3))
    if "Y" not in hfile:
        hfile.create_dataset("Y", (nbY, 512, 512))
for i, Path in tqdm.tqdm_notebook(enumerate(os.listdir(XDataPath)), total=nbX):
    ImPath = os.path.join(XDataPath, Path)
    with h5py.File(h5Path, 'a') as hfile:
        with Image.open(ImPath) as f:
            X = np.array(f)
            hfile["X"][i] = X
The file is correctly created.
What is surprising to me is that I don't see this file on my Google Drive (only a 0 KB file with the same name).
Moreover, I don't have enough Drive storage to hold it.
Why is this file not created on the Drive?
Where is it stored?
Another problem: when I restart the environment, the hdf5 file is back to 0 KB, just like on my Google Drive, and empty of course!
Thanks,
The file is created and stored on the Colab instance's own disk in Google Cloud. The file is too big, so it cannot be synced back to Google Drive.
So I suggest you use a GCS bucket to store it instead of Google Drive.
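Building the HDF5 file on the instance's local disk (e.g. /content/dataset.hdf5) and then uploading it to the bucket is one way to do that. Here is a minimal sketch, assuming a pre-existing bucket; the project and bucket names below are placeholders:

from google.colab import auth
from google.cloud import storage

# Authenticate the Colab session against your Google account
auth.authenticate_user()

# "my-project" and "my-dataset-bucket" are placeholders for your own project/bucket
client = storage.Client(project="my-project")
bucket = client.bucket("my-dataset-bucket")

# Upload the locally built HDF5 file to the bucket
blob = bucket.blob("dataset.hdf5")
blob.upload_from_filename("/content/dataset.hdf5")

Reading it back on a fresh instance is the symmetric blob.download_to_filename() call.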
Related
How to recursively save GAN generated images to a folder in Kaggle
img_list2 = []
img_list2.append(fake)
for i in range(len(fake)):
    vutils.save_image(img_list2[-1][i], "/content/drive/MyDrive/DAugmentn/genPyImage /gpi%d.jpg" % i, normalize=True)

With this code snippet at the end of my model, I successfully saved my GAN-generated images recursively to the genPyImage subfolder as jpeg files located in my Google Drive using Colab. I want to do the same in Kaggle. I have successfully loaded the training data and the model works fine in Kaggle, but I cannot save the generated images at the end of training. I want to take advantage of the Kaggle GPU. I tried to create my genPyImage subfolder in the Kaggle output working directory, but it is not working.
It is just a simple task of saving into the Kaggle output directory: replace the output directory with "/kaggle/working" and then download the files to your local machine afterwards. Change from

vutils.save_image(img_list2[-1][i], "/content/drive/MyDrive/DAugmentn/genPyImage /gpi%d.jpg" % i, normalize=True)

to

vutils.save_image(img_list2[-1][i], "/kaggle/working/gpi%d.jpg" % i, normalize=True)
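If the run produces many images, it can be convenient to bundle them into a single archive before downloading them from Kaggle's output panel. A minimal sketch, where the genPyImage subfolder under /kaggle/working is an assumed layout, not something Kaggle creates for you:

import os
import shutil

# Create the subfolder before saving into it; exist_ok avoids errors on reruns
os.makedirs("/kaggle/working/genPyImage", exist_ok=True)

# ... save the generated images into /kaggle/working/genPyImage here ...

# Bundle everything into /kaggle/working/genPyImage.zip for a single download
shutil.make_archive("/kaggle/working/genPyImage", "zip", "/kaggle/working/genPyImage")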
Why does Google Apps Script UrlFetchApp change the binary content of a zip file it downloads?
I want to download a zip file into Google Drive via Google Apps Script. I fetch a sample zip file with the code below and save it into a folder in Google Drive.

const exampleUrl = "https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-zip-file.zip";
var response = UrlFetchApp.fetch(exampleUrl);
var parentFolder = DriveApp.getFolderById('1aba-tnQZxZMN7DN52eAywTU-Xs-eqOf4');
parentFolder.createFile('sample_CT.zip', response.getContentText()); // doesn't work
parentFolder.createFile('sample_C.zip', response.getContent()); // doesn't work
parentFolder.createFile('sample_B.zip', response.getBlob()); // doesn't work
parentFolder.createFile('sample.zip', response); // doesn't work

After downloading it to my machine I try to unpack it with the unzip utility, but all of the above versions give me the following:

> unzip sample_CT.zip
Archive: sample_CT.zip
End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of sample_CT.zip or sample_CT.zip.zip, and cannot find sample_CT.zip.ZIP, period.

Here I am comparing the broken zip file (first) and the correct one (second):

broken:
PKu��P sample.txtUT �b�^�b�^�b�^ux��E�1R�0���Q�0�Uz. , ��XK�!��2��V#�6�: ��M� ��#ux�h�ttPkHTѺ�H�b+�:N�>m�����h�`{�c�0�A��(yh���&���{�U~�Y�~�����HA�����k8w�p���6�Ik��k��?k"?OJx��(n벼g�_�tPK[�c�PKu��P[�c� ��sample.txtUT �b�^�b�^�b�^ux��PKX

correct:
PKu“¥P sample.txtUT Çb±^Çb±^Çb±^uxèèE1RÅ0ûœâQÑ0¹Uz. , ÎàXKþ!·ÿ2ð‡V#í®6œ: £èMà ï´#ux­hð®¸ttPkHTѺòH²b+ª:Nª>mô”Éä’h˜`{úcÌ0ÅAõš(yh®©»&ÊôÏ{ýU~°YÊ~“¾ËòöHA„Äü×÷k8wÏpùö¹6ÕIk»ðk¤ü?k"?OJxºØ(në²¼gª_ötPK[°c¶PKu“¥P[°c¶ ´sample.txtUT Çb±^Çb±^Çb±^uxèèPKX

As you can see in the snippets above, some symbols differ. I have no idea why UrlFetch changes certain bytes when it downloads a zip file. Also, the file after UrlFetch takes more space.
It's because the script is converting the file to a string. Folder.createFile() accepts a blob, but the blob must be its only argument. If it is passed as a second argument, other method signatures like Folder.createFile(name: string, content: string) take precedence and the Blob is converted to a String to match the signature.

parentFolder.createFile(response.getBlob().setName('TheMaster.zip'))
Fetch Folder from drive for Google Colab
I'm trying to run a deep learning model in a Jupyter notebook, but it takes forever and the kernel dies during training, so I'm trying to run it on Google Colab. I've learned some basics that are available on the internet, but they are not helping me at all. The model gets its dataset from a module; this link https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet/blob/master/ocr/utils/iam_dataset.py has the module that extracts and preprocesses the dataset for training from the local computer. I've uploaded the dataset to GDrive, and now I want to change the path so that this module finds the 'dataset' folder. I've been stuck on this for 5 days and I'm clueless.
I suggest you do not load the dataset from GDrive into Colab directly, as it increases the dataset loading time. Google Colab provides some local storage for your work (around 70 GB), shown in the upper-right corner below the RAM bar. Bring your dataset into that storage. This is how you can do it:

import zipfile
from google.colab import drive

drive.mount('/content/drive')  # make GDrive visible to the instance first
zip_ref = zipfile.ZipFile("/content/drive/My Drive/dataset.zip", 'r')
zip_ref.extractall("/content/")
zip_ref.close()

Please note that your entire dataset should be zipped. It will be more than 20 times faster than the method you are trying.

Format of the zipfile.ZipFile() call above:

zip_ref = zipfile.ZipFile("/content/drive/Zip file location in GDrive", 'r')

If you click the folder icon on the left side of the Colab interface, you should see your dataset there. You can then access it using filepath = '/content/dataset'.
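As a usage note, the same extraction can be written with a context manager so the archive is always closed even if extraction fails, plus a quick listing to confirm where the files landed; the paths mirror the ones above:

import os
import zipfile

# Extract the zipped dataset from the mounted Drive into fast local storage
with zipfile.ZipFile("/content/drive/My Drive/dataset.zip", "r") as zf:
    zf.extractall("/content/")

# Sanity check: the dataset module's path can now point at this local folder
print(os.listdir("/content/dataset")[:10])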
Autodesk Forge: downloading an object, but cannot tell if it is a Revit model or a zip file
I was downloading Revit models from a BIM 360 team hub via the Forge API using the following URI:

https://developer.api.autodesk.com/oss/v2/buckets/:bucketKey/objects/:objectName

All my objectName values ended with .rvt, so I downloaded and saved them as rvt files. However, I noticed that some of the files cannot be opened by Revit. They are actually not rvt files but zip files, so I have to change the extension to .zip and unzip the file to get the real .rvt files. My problem is that not every file is a zip file, and I cannot tell from the API, because the URI I request always ends with .rvt.
Every Unix OS provides the file command, a standard utility program for recognising the type of data contained in a computer file: https://en.wikipedia.org/wiki/File_(command)

A zip file is directly recognised and reported like this:

$ file test_dataset.zip
test_dataset.zip: Zip archive data, at least v2.0 to extract

A Revit RVT model is a Windows compound document file, so it generates the following output:

$ file little_house_2021.rvt
little_house_2021.rvt: Composite Document File V2 Document, Cannot read section info

Hence you can use the same algorithm as file does to distinguish between RVT and ZIP files. Afaik, file just looks at the first couple of bytes in the given file. The Python programming language offers similar utilities; try an Internet search for "distinguish file type python"; the first hits explain how to check the type of files without extensions in Python and point to the filetype Python project. Other programming languages provide similar functionality.
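A minimal sketch of that first-bytes check in Python; the magic numbers are the standard ones (ZIP archives begin with PK\x03\x04, and OLE2 compound documents, which RVT files are, begin with D0 CF 11 E0 A1 B1 1A E1), while the file name is hypothetical:

# Distinguish a ZIP archive from a Revit (compound document) file by magic bytes
ZIP_MAGIC = b"PK\x03\x04"                        # standard ZIP local-file header
CDF_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 / Composite Document header

def sniff_type(path: str) -> str:
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(CDF_MAGIC):
        return "rvt (compound document)"
    return "unknown"

print(sniff_type("downloaded_object.rvt"))  # hypothetical downloaded file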
Export nested JSON from GCS into a Spreadsheet
I have a nested NDJSON file that I exported from BigQuery into Google Cloud Storage. From there I would like to open it in a spreadsheet again as a nested table. I see a lot of Apps Scripts to import JSON files, but none of them work with files stored in GCS. What would be the best solution to open the data table in a spreadsheet?

This is the NDJSON example:

{"page":"/xxxx","country":"DE","pageviews":"72136","daily_peak_pageviews":"5465","daily_peak_users":"3118","users_unique":"37763","SEO":true,"campaign_info":[{"channel_group":"Referral","users_c":"16","pageviews_c":"17","title":"404"},{"channel_group":"Social","users_c":"2255","pageviews_c":"3839","title":"OK"},{"channel_group":"other","users_c":"33185","pageviews_c":"63320","title":"OK"},{"channel_group":"Referral","users_c":"316","pageviews_c":"556","title":"OK"},{"channel_group":"Paid","users_c":"47","pageviews_c":"49","title":"404"},{"channel_group":"Paid","users_c":"1088","pageviews_c":"1706","title":"OK"},{"channel_group":"other","users_c":"1888","pageviews_c":"2517","title":"404"},{"channel_group":"Social","users_c":"100","pageviews_c":"132","title":"404"}]}
{"page":"/yyy","country":"DE","pageviews":"67576","daily_peak_pageviews":"5390","daily_peak_users":"2843","users_unique":"32772","SEO":true,"campaign_info":[{"channel_group":"other","users_c":"7","pageviews_c":"10","title":"404"},{"channel_group":"other","users_c":"30951","pageviews_c":"64345","title":"OK"},{"channel_group":"Paid","users_c":"782","pageviews_c":"1303","title":"OK"},{"channel_group":"Referral","users_c":"265","pageviews_c":"467","title":"OK"},{"channel_group":"Social","users_c":"889","pageviews_c":"1450","title":"OK"},{"channel_group":"Paid","users_c":"1","pageviews_c":"1","title":"404"}]}
{"page":"/zzz","country":"DE","pageviews":"7558","daily_peak_pageviews":"619","daily_peak_users":"331","users_unique":"4117","SEO":true,"campaign_info":[{"channel_group":"other","users_c":"7","pageviews_c":"14","title":"404"},{"channel_group":"Paid","users_c":"38","pageviews_c":"70","title":"OK"},{"channel_group":"other","users_c":"3987","pageviews_c":"7309","title":"OK"},{"channel_group":"Paid","users_c":"1","pageviews_c":"1","title":"404"},{"channel_group":"Referral","users_c":"18","pageviews_c":"26","title":"OK"},{"channel_group":"Social","users_c":"70","pageviews_c":"138","title":"OK"}]}
{"page":"hdhh","country":"DE","pageviews":"3616","daily_peak_pageviews":"336","daily_peak_users":"206","users_unique":"2131","campaign_info":[{"channel_group":"Social","users_c":"267","pageviews_c":"379","title":"OK"},{"channel_group":"Paid","users_c":"776","pageviews_c":"1394","title":"OK"},{"channel_group":"other","users_c":"1089","pageviews_c":"1814","title":"OK"},{"channel_group":"Referral","users_c":"17","pageviews_c":"24","title":"OK"},{"channel_group":"other","users_c":"2","pageviews_c":"5","title":"404"}]}
{"page":"/ethehh","country":"DE","pageviews":"1394","daily_peak_pageviews":"322","daily_peak_users":"294","users_unique":"1232","campaign_info":[{"channel_group":"Paid","users_c":"61","pageviews_c":"67","title":"OK"},{"channel_group":"Social","users_c":"271","pageviews_c":"301","title":"OK"},{"channel_group":"other","users_c":"3","pageviews_c":"5","title":"404"},{"channel_group":"Referral","users_c":"10","pageviews_c":"10","title":"OK"},{"channel_group":"other","users_c":"888","pageviews_c":"1011","title":"OK"}]}

and this is the CSV example I see when I use the tool suggested by Alex:
page,country,pageviews,daily_peak_pageviews,daily_peak_users,users_unique,SEO,campaign_info/0/channel_group,campaign_info/0/users_c,campaign_info/0/pageviews_c,campaign_info/0/title,campaign_info/1/channel_group,campaign_info/1/users_c,campaign_info/1/pageviews_c,campaign_info/1/title,campaign_info/2/channel_group,campaign_info/2/users_c,campaign_info/2/pageviews_c,campaign_info/2/title,campaign_info/3/channel_group,campaign_info/3/users_c,campaign_info/3/pageviews_c,campaign_info/3/title,campaign_info/4/channel_group,campaign_info/4/users_c,campaign_info/4/pageviews_c,campaign_info/4/title,campaign_info/5/channel_group,campaign_info/5/users_c,campaign_info/5/pageviews_c,campaign_info/5/title,campaign_info/6/channel_group,campaign_info/6/users_c,campaign_info/6/pageviews_c,campaign_info/6/title,campaign_info/7/channel_group,campaign_info/7/users_c,campaign_info/7/pageviews_c,campaign_info/7/title
/xxxx,DE,72136,5465,3118,37763,true,Referral,16,17,404,Social,2255,3839,OK,other,33185,63320,OK,Referral,316,556,OK,Paid,47,49,404,Paid,1088,1706,OK,other,1888,2517,404,Social,100,132,404
/yyy,DE,67576,5390,2843,32772,true,other,7,10,404,other,30951,64345,OK,Paid,782,1303,OK,Referral,265,467,OK,Social,889,1450,OK,Paid,1,1,404,,,,,,,,
/zzz,DE,7558,619,331,4117,true,other,7,14,404,Paid,38,70,OK,other,3987,7309,OK,Paid,1,1,404,Referral,18,26,OK,Social,70,138,OK,,,,,,,,
hdhh,DE,3616,336,206,2131,,Social,267,379,OK,Paid,776,1394,OK,other,1089,1814,OK,Referral,17,24,OK,other,2,5,404,,,,,,,,,,,,
/ethehh,DE,1394,322,294,1232,,Paid,61,67,OK,Social,271,301,OK,other,3,5,404,Referral,10,10,OK,other,888,1011,OK,,,,,,,,,,,,
I found some scripts to load JSON files into a Google Spreadsheet, but all of them need the file to be loaded from a URL, so the steps to get a public link to your JSON file in GCS are:

1. Go to your Google Cloud Storage bucket, and in your JSON file's row click the three dots at the right.
2. Click "Edit permissions".
3. Click "Add item".
4. In "ENTITY" choose "User", in "NAME" type "allUsers", and in "ACCESS" choose "Reader".

Now you have an external link to load your JSON using some scripts, like this one or this other one, but you need to edit the JSON file or the code a bit.

Another solution (and the easiest one) is to convert the JSON file into CSV using this tool and then import the CSV into Google Spreadsheet by clicking "File" -> "Import" -> "Upload" and selecting your CSV file.
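As a local alternative to the online converter, the same flattening can be scripted with pandas; a minimal sketch, assuming the NDJSON shown above has been downloaded from GCS as data.ndjson (a hypothetical name), producing campaign_info/<i>/<field> columns like the CSV above:

import json
import pandas as pd

# Read the NDJSON export: one JSON object per line
with open("data.ndjson") as f:
    rows = [json.loads(line) for line in f]

# Flatten each campaign_info entry into campaign_info/<i>/<field> columns
flat_rows = []
for row in rows:
    flat = {k: v for k, v in row.items() if k != "campaign_info"}
    for i, item in enumerate(row.get("campaign_info", [])):
        for k, v in item.items():
            flat[f"campaign_info/{i}/{k}"] = v
    flat_rows.append(flat)

# Write a CSV that can be imported via "File" -> "Import" -> "Upload"
pd.DataFrame(flat_rows).to_csv("flattened.csv", index=False)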