How to recursively save GAN generated images to a folder in Kaggle - deep-learning

img_list2 = []
img_list2.append(fake)
for i in range(len(fake)):
    vutils.save_image(img_list2[-1][i],
                      "/content/drive/MyDrive/DAugmentn/genPyImage/gpi%d.jpg" % i,
                      normalize=True)
With this code snippet at the end of my model, I successfully saved my GAN-generated images recursively to the genPyImage subfolder in my Google Drive as JPEG files using Colab. I want to do the same in Kaggle. I have successfully loaded the training data and the model runs fine in Kaggle, but I cannot save the generated images at the end of training there. I want to take advantage of the Kaggle GPU.
I tried to create my genPyImage subfolder in the Kaggle output working directory, but it is not working.

It is just a matter of saving the images into the Kaggle output directory by replacing the output path with "/kaggle/working", and then downloading them to your local machine afterwards.
Change from
vutils.save_image(img_list2[-1][i],
                  "/content/drive/MyDrive/DAugmentn/genPyImage/gpi%d.jpg" % i,
                  normalize=True)
To
vutils.save_image(img_list2[-1][i], "/kaggle/working/gpi%d.jpg" % i, normalize=True)
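If you also want to keep the genPyImage subfolder structure under the Kaggle output directory, create it first with os.makedirs and save into it. A minimal sketch, assuming fake is the batch of generated images and torchvision.utils is imported as vutils:
import os
import torchvision.utils as vutils

out_dir = "/kaggle/working/genPyImage"
os.makedirs(out_dir, exist_ok=True)  # create the subfolder if it does not already exist

for i in range(len(fake)):
    vutils.save_image(fake[i], os.path.join(out_dir, "gpi%d.jpg" % i), normalize=True)
Everything written under /kaggle/working appears in the notebook's Output section once the notebook is saved, and can be downloaded from there.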

Related

Azure ADLS Gen2 file created by Azure Databricks doesn't inherit ACL

I have a Databricks notebook that writes a DataFrame to a file in ADLS Gen2 storage.
It creates a temp folder, outputs the file and then copies that file to a permanent folder. For some reason the file doesn't inherit the ACL correctly. The folder it creates has the correct ACL.
The code for the notebook:
#Get data into dataframe
df_export = spark.sql(SQL)
# OUTPUT file to temp directory coalesce(1) creates a single output data file
(df_export.coalesce(1).write.format("parquet")
.mode("overwrite")
.save(TempFolder))
#get the parquet file name. It's always the last in the folder as the other files are created starting with _
file = dbutils.fs.ls(TempFolder)[-1][0]
#create permanent copy
dbutils.fs.cp(file,FullPath)
The temp folder that is created shows the expected ACL entries for the relevant account, whereas the file it contains shows different ones (the ACL screenshots are not reproduced here). There is also a mask; I'm not really familiar with masks, so I'm not sure how this differs, but the mask permission shown on the folder differs from the one shown on the file.
Does anyone have any idea why this wouldn't be inheriting the ACL from the parent folder?
I've had a response from Microsoft support which has resolved this issue for me.
Cause: Databricks-stored files have the service principal as the owner with permission -rw-r--r--, consequently forcing the effective permission of the rest of the batch users in ADLS down from rwx (the directory permission) to r--, which in turn causes jobs to fail.
Resolution: To resolve this, we need to change the default umask (022) to a custom umask (000) on the Databricks end. You can set the following in the Spark configuration settings under your cluster configuration: spark.hadoop.fs.permissions.umask-mode 000
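As a quick reference, the setting looks like this in practice (the verification line is only an assumption about how you might confirm the value from a notebook after restarting the cluster):
# Cluster configuration > Advanced options > Spark config (one entry per line):
#   spark.hadoop.fs.permissions.umask-mode 000

# Optional sanity check from a notebook once the cluster has restarted:
print(spark.conf.get("spark.hadoop.fs.permissions.umask-mode", "not set"))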
Wow, that's great! I was looking for a solution. Passthrough authentication might be a proper solution now.
I had the feeling it was related to this ancient Hadoop bug:
https://issues.apache.org/jira/browse/HDFS-6962 (solved in Hadoop 3, now part of Spark 3+).
Spark tries to set the ACLs after moving the files, but fails. The files are first created somewhere else, in a tmp dir, and the tmp dir's rights are inherited through the default ADLS behaviour.

Import pre-trained Deep Learning Models into Foundry Codeworkbooks

How do you import an h5 model stored locally into a Foundry code workbook?
I want to use the Hugging Face library as shown below; in its documentation, the from_pretrained method expects a path or URL to where the pretrained model lives.
I would ideally like to download the model onto my local machine, upload it onto Foundry, and have Foundry read in said model.
For reference, I'm trying to do this in Code Workbook or Code Authoring. It looks like you can work directly with files from there, but the example given in the documentation was for a CSV file, whereas this model consists of a variety of files in h5, json, and other formats. I'm wondering how I can access these files and have them passed into the from_pretrained method from the transformers package.
Relevant links:
https://huggingface.co/transformers/quicktour.html
Pre-trained Model:
https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english/tree/main
Thank you!
I've gone ahead and added the transformers (Hugging Face) package onto the platform.
As for uploading the model files, you can follow these steps:
1. Use your dataset with the model-related files as an input to your code workbook transform.
2. Use Python's raw file access to read the contents of the dataset: https://docs.python.org/3/library/filesys.html
3. Use Python's built-in tempfile module to create a folder and write the files from step 2 into it: https://docs.python.org/3/library/tempfile.html#tempfile.mkdtemp , https://www.kite.com/python/answers/how-to-write-a-file-to-a-specific-directory-in-python
4. Pass the temp folder (tempfile.mkdtemp() returns its absolute path) to the from_pretrained method.
import os
import tempfile

from transformers import AutoTokenizer, TFDistilBertForSequenceClassification

def sample(dataset_with_model_folder_uploaded):
    # Build a local temp folder and copy the model files into it
    full_folder_path = tempfile.mkdtemp()
    all_file_names = ['config.json', 'tf_model.h5', 'ETC.ot', ...]  # list every file in the uploaded dataset
    for file_name in all_file_names:
        # Read/write in binary mode, since files like tf_model.h5 are not text
        with dataset_with_model_folder_uploaded.filesystem().open(file_name, 'rb') as f:
            path_of_file = os.path.join(full_folder_path, file_name)
            with open(path_of_file, "wb") as new_file:
                new_file.write(f.read())
    # Class names below are the transformers classes matching this DistilBERT SST-2 model
    model = TFDistilBertForSequenceClassification.from_pretrained(full_folder_path)
    tokenizer = AutoTokenizer.from_pretrained(full_folder_path)
    return model, tokenizer
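A quick way to sanity-check the loaded model, placed inside sample() after the model and tokenizer are loaded (the example sentence and call pattern are illustrative, not part of the original answer):
inputs = tokenizer("This movie was great!", return_tensors="tf")
outputs = model(inputs)
print(outputs.logits)  # one score per class for the SST-2 fine-tuned model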
Thanks,

Fetch Folder from drive for Google Colab

I'm trying to run a deep learning model in a Jupyter notebook; it's taking forever and the kernel also dies during training. So I'm trying to run it on Google Colab. I've learned some basics that are available on the internet, but they are not helping me at all. The model gets its dataset from a module:
this link https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet/blob/master/ocr/utils/iam_dataset.py has the module that extracts and preprocesses the dataset for training from the local computer. I've uploaded the dataset to Google Drive, and now I want to change the path so that this module finds that 'dataset' folder. I've been stuck on it for 5 days and now I'm clueless.
I suggest you do not load the dataset from Google Drive into Colab directly; it increases the dataset loading time.
Google Colab provides some local storage for your work (around 70 GB), shown in the upper-right corner below the RAM bar. Bring your dataset into that storage. This is how you can do it:
import zipfile
from google.colab import drive

drive.mount('/content/drive')  # mount Drive so the zip file is accessible

zip_ref = zipfile.ZipFile("/content/drive/My Drive/dataset.zip", 'r')
zip_ref.extractall("/content/")  # extract into Colab's local storage
zip_ref.close()
Please note that your entire dataset should be zipped.
It will be more than 20 times faster than the method you are trying...
Format of the zipfile.ZipFile() call above:
zip_ref = zipfile.ZipFile("/content/drive/Zip file location in GDrive", 'r')
If you click the folder icon on the left side of the Colab interface, you should see your dataset there.
You can then access your dataset using the filepath '/content/dataset'.
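A quick check that the extraction worked (the folder name 'dataset' is an assumption about the top-level folder inside your zip):
import os

dataset_root = '/content/dataset'     # assumed top-level folder from the extracted zip
print(os.listdir(dataset_root)[:10])  # list a few entries to confirm the files are there
You can then point the iam_dataset module at this local folder instead of the Google Drive path.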

Google Colab / Google Drive and h5py file storage

I'm trying to train a UNet neural network in Google Colab with data stored on my Google Drive.
I created a core library, a dataset, etc., but accessing the data was slow.
To avoid that, I've built a ".hdf5" file with the h5py library.
import os

import h5py
import numpy as np
import tqdm
from PIL import Image

XDataPath = "/content/drive/My Drive/Dataset/data/X"
YDataPath = "/content/drive/My Drive/Dataset/data/Y"
h5Path = "/content/drive/My Drive/Dataset/data/dataset.hdf5"
nbX = len(os.listdir(XDataPath))
nbY = len(os.listdir(YDataPath))

# Clean data: remove samples that do not have a matching X/Y pair
dst = [os.path.splitext(f)[0] for f in os.listdir(YDataPath)]
src = [os.path.splitext(f)[0] for f in os.listdir(XDataPath)]
for f in src:
    if f not in dst:
        fpth = os.path.join(XDataPath, f + '.jpg')
        os.remove(fpth)
        print(fpth)
for f in dst:
    if f not in src:
        fpth = os.path.join(YDataPath, f + '.png')
        os.remove(fpth)
        print(fpth)

# Create the HDF5 datasets if they do not exist yet
with h5py.File(h5Path, 'a') as hfile:
    if "X" not in hfile:
        hfile.create_dataset("X", (nbX, 512, 512, 3))
    if "Y" not in hfile:
        hfile.create_dataset("Y", (nbY, 512, 512))

# Fill the X dataset image by image
for i, Path in tqdm.tqdm_notebook(enumerate(os.listdir(XDataPath)), total=nbX):
    ImPath = os.path.join(XDataPath, Path)
    with h5py.File(h5Path, 'a') as hfile:
        with Image.open(ImPath) as f:
            X = np.array(f)
            hfile["X"][i] = X
The file is correctly created.
What is surprising to me is that I don't see this file on my Google Drive (only a 0 KB file with the same name).
What's more, I don't have enough Drive storage to hold it.
Why is this file not created on the Drive?
Where is it stored?
Another problem is that when I restart the environment, the hdf5 file is 0 KB, like on my Google Drive, and of course empty!
Thanks,
The file is created and stored on the Colab instance's own disk (in Google Cloud). The file is too big, so it cannot be synced back to Google Drive.
So, I suggest you store it in a GCS bucket instead of Google Drive.
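A minimal sketch of copying the HDF5 file from the Colab instance into a GCS bucket (the bucket name and local file path are placeholders, and this assumes the bucket already exists and your account can write to it):
from google.colab import auth
auth.authenticate_user()  # authorise this Colab session for your Google Cloud account

# gsutil is preinstalled on Colab; copy the locally built HDF5 file into the bucket
!gsutil cp /content/dataset.hdf5 gs://your-bucket-name/dataset.hdf5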

Forge DWG file translation

When translating and downloading DWG files from a server, the downloaded content contains different folders like
24d925af-2793-8061-0b78-6eba65e7eba8_f2d,
382a3ef0-6066-5db8-8f62-79017ae1e777_f2d,
4215b4a9-11b7-7e99-6d6f-4f124effceea_f2d etc.
each of which contains a primaryGraphics.f2d file.
What is the use of these different folders and the primaryGraphics.f2d files in them?
Each f2d file represents a layout in the original DWG file.
I think you are probably referring to what https://extract.autodesk.io/ provides. The code workflow is to get the urn of the derivative,
https://github.com/cyrillef/extract.autodesk.io/blob/838b63f1f76668081c789d9962b93a0f97d9555c/server/bubble.js#L110
self.extractPathsFromGraphicsUrn (node.urn, item) ;
then extract the ***_f2d guid section from it,
https://github.com/cyrillef/extract.autodesk.io/blob/838b63f1f76668081c789d9962b93a0f97d9555c/server/bubble.js#L391
var basePath =urn.slice (0, urn.lastIndexOf ('/') + 1) ;
and finally this ***_f2d guid is used as the folder name inside the zip.
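For illustration, here is the same slicing logic as a small Python sketch (the urn value is a made-up placeholder built from one of the folder names in the question):
# hypothetical derivative urn ending in <guid>_f2d/primaryGraphics.f2d
urn = ".../output/4215b4a9-11b7-7e99-6d6f-4f124effceea_f2d/primaryGraphics.f2d"

base_path = urn[: urn.rfind('/') + 1]             # same as urn.slice(0, urn.lastIndexOf('/') + 1)
guid_f2d = base_path.rstrip('/').split('/')[-1]   # "4215b4a9-..._f2d", used as the zip folder name
print(guid_f2d)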
The urn of the derivative can be obtained from https://developer.autodesk.com/en/docs/model-derivative/v2/reference/http/urn-manifest-GET/ .
One of my test models, for example, follows exactly this pattern (screenshot not reproduced here).
Hope it helps.