I have uploaded a pickled file to Google Colab using
from google.colab import files
uploaded = files.upload()
Say my pickled file is named train.p. How do I load it with the usual functions? I have tried the code below, but it does not work.
with open(io.StringIO(uploaded['train.p']), 'rb') as file:
    train = pickle.load(file)
Try this:
import io
import pickle
train = pickle.load(io.BytesIO(uploaded['train.p']))
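For context, files.upload() returns a dictionary that maps each uploaded file name to its contents as a bytes object, which is why passing it to open() or io.StringIO fails. Here is a minimal sketch that can be run outside Colab, assuming a hand-built stand-in for the `uploaded` dictionary (the sample payload is made up for illustration):

```python
import io
import pickle

# Stand-in for the dict returned by files.upload(): file name -> raw bytes.
# The payload is hypothetical sample data, not the asker's real file.
uploaded = {"train.p": pickle.dumps({"features": [1, 2, 3], "labels": [0, 1, 0]})}

# Wrap the bytes in a binary file-like object and unpickle from it.
train = pickle.load(io.BytesIO(uploaded["train.p"]))

# pickle.loads takes the bytes directly and yields the same object.
train_again = pickle.loads(uploaded["train.p"])

print(train == train_again)
```

Either call works; pickle.loads simply skips the intermediate file-like wrapper.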
I want to read a single CSV file in a google bucket with pyarrow. How do I do this?
I can create a FileSystem object with gcsfs, but I don't see a way to provide this to pyarrow.csv.read_csv.
Do I need to create some sort of file stream from the file system? What's the best way to do this?
import gcsfs
import pyarrow.csv as csv
fs = gcsfs.GCSFileSystem(project='foo')
csv.read_csv("bucket/foo/bar.csv", filesystem=fs)
TypeError: read_csv() got an unexpected keyword argument 'filesystem'
Using pyarrow version 6.0.1
I'm guessing you are working with this doc. You're correct that the approach listed there does not work with read_csv because there is no filesystem parameter. We can still generally do this but the process is a bit different.
Pyarrow has its own filesystem abstraction. If you have a pyarrow filesystem then you can first open a file and then use that file to read the CSV:
import pyarrow as pa
import pyarrow.csv as csv
import pyarrow.fs as fs
local_fs = fs.LocalFileSystem()
with local_fs.open_input_file('foo/bar.csv') as csv_file:
    csv.read_csv(csv_file)
Unfortunately, a gcsfs.GCSFileSystem is not a "pyarrow filesystem" but you have a few options.
The method gcsfs.GCSFileSystem.open can give you a "python file object" which you can use as input to pyarrow.csv.read_csv.
import gcsfs
import pyarrow.csv as csv
fs = gcsfs.GCSFileSystem(project='foo')
with fs.open("bucket/foo/bar.csv", 'rb') as csv_file:
    csv.read_csv(csv_file)
How do you import an h5 model locally from Foundry into code workbook?
I want to use the Hugging Face library as shown below, and in its documentation the from_pretrained method expects a URL path to where the pretrained model lives.
I would ideally like to download the model onto my local machine, upload it onto Foundry, and have Foundry read in said model.
For reference, I'm trying to do this in code workbook or code authoring. It looks like you can work directly with files from there, but I've read the documentation and the given example was for a CSV file, whereas this model consists of a variety of files in formats such as h5 and json. I'm wondering how I can access these files and have them passed into the from_pretrained method from the transformers package.
Relevant links:
https://huggingface.co/transformers/quicktour.html
Pre-trained Model:
https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english/tree/main
Thank you!
I've gone ahead and added the transformers (Hugging Face) package onto the platform.
As for uploading the package, you can follow these steps:
1. Use your dataset with the model-related files as an input to your code workbook transform.
2. Use Python's raw file access to access the contents of the dataset: https://docs.python.org/3/library/filesys.html
3. Use Python's built-in tempfile module to create a folder and add the files from step 2: https://docs.python.org/3/library/tempfile.html#tempfile.mkdtemp , https://www.kite.com/python/answers/how-to-write-a-file-to-a-specific-directory-in-python
4. Pass the temporary folder (tempfile.mkdtemp() returns its absolute path) to the from_pretrained method.
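import os
import tempfile

from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification

def sample(dataset_with_model_folder_uploaded):
    # Temporary folder that will hold a local copy of the model files.
    full_folder_path = tempfile.mkdtemp()
    # List every file of the uploaded model folder here.
    all_file_names = ['config.json', 'tf_model.h5', 'ETC.ot', ...]
    for file_name in all_file_names:
        # Copy in binary mode: files such as tf_model.h5 are not text.
        with dataset_with_model_folder_uploaded.filesystem().open(file_name, 'rb') as f:
            path_of_file = os.path.join(full_folder_path, file_name)
            with open(path_of_file, 'wb') as new_file:
                new_file.write(f.read())
    # Class names assume the DistilBERT checkpoint linked in the question.
    model = TFDistilBertForSequenceClassification.from_pretrained(full_folder_path)
    tokenizer = DistilBertTokenizer.from_pretrained(full_folder_path)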
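Model weights such as tf_model.h5 are binary, so the copy step should read and write bytes rather than text. Here is a self-contained sketch of that step, with a local folder standing in for the dataset's filesystem (copy_files_to_temp_dir and the sample files are hypothetical and only illustrate the idea):

```python
import os
import tempfile


def copy_files_to_temp_dir(source_dir, file_names):
    """Copy the named files byte-for-byte into a fresh temporary folder."""
    target_dir = tempfile.mkdtemp()
    for file_name in file_names:
        with open(os.path.join(source_dir, file_name), "rb") as src:
            with open(os.path.join(target_dir, file_name), "wb") as dst:
                dst.write(src.read())
    return target_dir


# Hypothetical stand-in for the uploaded dataset: two small files on disk.
source_dir = tempfile.mkdtemp()
payload = bytes(range(256))  # arbitrary non-text bytes, like an h5 file
with open(os.path.join(source_dir, "tf_model.h5"), "wb") as f:
    f.write(payload)
with open(os.path.join(source_dir, "config.json"), "wb") as f:
    f.write(b'{"model_type": "distilbert"}')

model_dir = copy_files_to_temp_dir(source_dir, ["config.json", "tf_model.h5"])
print(sorted(os.listdir(model_dir)))
```

The returned folder path is what would then be passed to from_pretrained.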
Thanks,
I am training a model in Google Colab and storing it in json format. I want to upload this trained model to my drive in the colab itself.
I am currently doing:
model_json = model.to_json()
with open("trainedModel.json", "w") as json_file:
    json_file.write(model_json)
model.save_weights("trainedModel.h5")
print("Saved model to disk")
print("This file ran till end.\nNow uploading to drive:")
uploaded = drive.CreateFile({'parents':[{u'id':'#id_no'}],'title': 'trainedModel.json'})
uploaded.SetContentFile('trainedModel.json')
uploaded.Upload()
uploaded = drive.CreateFile({'parents':[{u'id': '#id_no'}],'title': 'trainedModel.h5'})
uploaded.SetContentFile('trainedModel.h5')
uploaded.Upload()
But this gives me:
FileNotFoundError: [Errno 2] No such file or directory: 'client_secrets.json'
I'd recommend using the file browser or Drive FUSE instead. Both are radically simpler than using the Drive API directly.
File browser upload: open the Files tab in the left pane and use its upload button.
Drive FUSE:
from google.colab import drive
drive.mount('/content/gdrive')
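Once Drive is mounted it behaves like an ordinary folder, so the saved model files can be copied there with standard file operations. Here is a rough sketch, assuming a hypothetical helper copy_to_folder and using a temporary folder in place of the mounted Drive path so it also runs outside Colab:

```python
import os
import shutil
import tempfile


def copy_to_folder(file_paths, destination_dir):
    """Copy each file into destination_dir and return the new paths."""
    os.makedirs(destination_dir, exist_ok=True)
    return [shutil.copy(path, destination_dir) for path in file_paths]


# Dummy stand-ins for the files written by the training code.
work_dir = tempfile.mkdtemp()
sources = []
for name, content in [("trainedModel.json", b"{}"), ("trainedModel.h5", b"\x89HDF")]:
    path = os.path.join(work_dir, name)
    with open(path, "wb") as f:
        f.write(content)
    sources.append(path)

# In Colab the destination would be a folder under the mount point, for
# example "/content/gdrive/MyDrive/models" (an assumed path, adjust as needed).
destination = os.path.join(tempfile.mkdtemp(), "models")
copied = copy_to_folder(sources, destination)
print(sorted(os.path.basename(p) for p in copied))
```

No Drive API client or client_secrets.json is involved in this route.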
This was happening because the authorization code given to the notebook that grants it permission expires after a few minutes/hours.
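This problem was resolved by requesting the authorization code again, that is, by inserting
# These imports are only needed if they are not already at the top of the notebook.
from google.colab import auth
from oauth2client.client import GoogleCredentials
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)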
after saving the model file and before uploading it to the drive.
Ran the following code in Colab:
uploaded = files.upload()
Clicked "Choose Files" and selected the csv I want to upload.
But the file is not being uploaded. It worked once before but not again. In every video I have watched about uploading local CSVs, Colab uploads the file immediately once it is selected.
Colab is not doing that in my case. It's just sitting stagnant like this:
[screenshot: stagnant colab]
This is a Colab bug --
https://github.com/googlecolab/colabtools/issues/437
The team reports they are working on a service change to correct the problem today. (21 Feb, 2019)
I had the same problem.
You can also use another option:
1. In the left pane of Google Colab there is a tab called Files. Click on it.
2. Click on Upload files.
That's it! Pretty simple.
For more details: https://www.youtube.com/watch?v=0rygVrmHidg
Upload the file from the local machine using the following command:
from google.colab import files
import pandas as pd
uploaded = files.upload()
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))
After that, read the csv/text file using pandas:
dataFrame = pd.read_csv('filename.csv')
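The dictionary returned by files.upload() already holds each file's bytes, so the CSV can also be parsed straight from memory. Here is a minimal sketch, assuming a hand-built stand-in for `uploaded` with invented CSV contents so it runs outside Colab:

```python
import io

import pandas as pd

# Stand-in for the dict returned by files.upload(): file name -> raw bytes.
uploaded = {"filename.csv": b"id,score\n1,0.5\n2,0.75\n"}

# Parse the bytes directly instead of going through a file on disk.
data_frame = pd.read_csv(io.BytesIO(uploaded["filename.csv"]))

print(data_frame.shape)
```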
I had the same problem, so I use this method instead. Read the file from Google Drive using the following commands:
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')
data = pd.read_csv('/content/drive/Myfiles/datafiles/mydata.csv', delimiter=",")
I have an IPython file that I want to execute on Colab. When I first ran it, the local files were imported, but now it gives me an error. The error and the code snippet follow.
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-9132fbd19d75> in <module>()
      1 import pickle
----> 2 pickle_in = open(r"C:/Users/manas/PycharmProjects/allProjects/X.pickle","rb")
      3 X = pickle.load(pickle_in)
      4
      5 pickle_in = open(r"C:/Users/manas/PycharmProjects/allProjects/y.pickle","rb")
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/manas/PycharmProjects/allProjects/X.pickle'
import pickle
pickle_in = open(r"C:/Users/manas/PycharmProjects/allProjects/X.pickle","rb")
X = pickle.load(pickle_in)
pickle_in = open(r"C:/Users/manas/PycharmProjects/allProjects/y.pickle","rb")
Y = pickle.load(pickle_in)
You can upload your files to Colab if you're not going to be doing this often. Otherwise it can get quite annoying.
from google.colab import files
files.upload()
Using the above snippet you can upload and use whatever you'd like.
However, if your pickles are large, I'd advise you to just upload them to your Drive. Accessing them from your Drive is far easier and less troublesome. To access files on your Drive, all you have to do is mount it in Colab's file directory.
from google.colab import drive
drive.mount("/content/drive")
This will generate a link. Click on it, sign in using Google OAuth, and paste the key into the Colab cell to connect.
Check the list of available files in the side panel on the left and copy the path of the file you want to access. Read it as you would any other file.
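The original error comes from the C:/ path, which exists only on the local machine and not on the Colab runtime. Here is a small sketch of loading a pickle from a path the runtime can actually see. The helper load_pickle and the sample data are hypothetical, and a temporary file stands in for the copied Drive path so the example runs anywhere:

```python
import os
import pickle
import tempfile


def load_pickle(path):
    """Load a pickle from a path that exists on the machine running the code."""
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"{path} is not visible to this runtime; upload it or mount Drive first"
        )
    with open(path, "rb") as pickle_in:
        return pickle.load(pickle_in)


# In Colab this would be the path copied from the side panel, for example
# "/content/drive/MyDrive/X.pickle" (an assumed location).
path = os.path.join(tempfile.mkdtemp(), "X.pickle")
with open(path, "wb") as pickle_out:
    pickle.dump([[0.1, 0.2], [0.3, 0.4]], pickle_out)

X = load_pickle(path)
print(X)
```

Using a with block also closes the file, which the open/load pattern in the question leaves to the garbage collector.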