Pickle reports FileNotFoundError for a file that definitely exists in Google Colab

I am attempting to pickle.load three files that definitely exist within my Google Drive, yet I am not able to load them because my code cannot find them.
Here is the error message:
FileNotFoundError: [Errno 2] Failed to open local file '/root/.cache/huggingface/datasets/good_reads_practice_dataset/main_domain/1.1.0/5f8cad709a7746be18b722642fc8ade5c1cedfa3b259440a401f7f4701079561/cache-af18ded1f5c5aaac.arrow'. Detail: [errno 2] No such file or directory
This is my code that is not working:
with open(r"/content/drive/MyDrive/Thesis/Datasets/book_preprocessing/PreTokenized/ALBERT_NER_512/train_dataset.pkl", "rb") as input_file:
train_dataset = pickle.load(input_file)
with open(r"/content/drive/MyDrive/Thesis/Datasets/book_preprocessing/PreTokenized/ALBERT_NER_512/val_dataset.pkl", "rb") as input_file:
val_dataset = pickle.load(input_file)
with open(r"/content/drive/MyDrive/Thesis/Datasets/book_preprocessing/PreTokenized/ALBERT_NER_512/test_dataset.pkl", "rb") as input_file:
test_dataset = pickle.load(input_file)
And here is a screenshot of my Google Drive directory tree:
I have checked the names of the files many times, so unless I am going crazy, the files as I have pointed to them definitely exist. Moreover, I am sure my code is correct, as I have loaded pickled objects from all the other folders in the 'PreTokenized' folder several times without issue, which makes this bug even more mysterious. I also have no clue why it is looking for the pickled objects in the cache of the Google Colab environment. If anyone has an idea as to what is happening and how I can solve it, I would greatly appreciate any help.

Still don't know exactly why this is happening, but I did realize that my files were not actually being uploaded, since they were too big. (I did not get an error or warning about this, but the pickled files were only a few KB, so I knew something was up.)
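A quick way to spot this from the notebook, assuming Drive is mounted at the usual /content/drive, is to print the file sizes before unpickling; a minimal sketch using the paths from the question:

import os

base = "/content/drive/MyDrive/Thesis/Datasets/book_preprocessing/PreTokenized/ALBERT_NER_512"
for name in ("train_dataset.pkl", "val_dataset.pkl", "test_dataset.pkl"):
    path = os.path.join(base, name)
    if os.path.exists(path):
        # A pickle of a full dataset should be large; a few KB signals a failed upload.
        print(name, os.path.getsize(path), "bytes")
    else:
        print(name, "is missing")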
What I did to get around this was to split each file in half before uploading and then, after downloading, concatenate the halves back together.
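A minimal sketch of that split-and-rejoin workaround (the two-part split and the .partN naming are my own choices, not anything Colab requires):

def split_file(path, n_parts=2):
    # Split a binary file into n_parts pieces named <path>.part0, <path>.part1, ...
    with open(path, "rb") as f:
        data = f.read()
    chunk = (len(data) + n_parts - 1) // n_parts
    for i in range(n_parts):
        with open(f"{path}.part{i}", "wb") as out:
            out.write(data[i * chunk:(i + 1) * chunk])

def join_files(path, n_parts=2):
    # Concatenate the pieces back into a single file at <path>.
    with open(path, "wb") as out:
        for i in range(n_parts):
            with open(f"{path}.part{i}", "rb") as f:
                out.write(f.read())

Usage would be split_file("train_dataset.pkl") before uploading the parts, then join_files("train_dataset.pkl") after downloading them.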

Related

How should one zip a large folder in Windows 10, upload it to GDrive, then unzip it?

I have a directory consisting of 22 sub-directories. Altogether, the directory is about 750GB in size and I need this data on GDrive so that I can work with it in Google Colab. Obviously uploading this takes an absolute age (particularly with my slow connection) so I would like to zip it, upload it, then unzip it in the cloud.
I am using 7zip and zipping each subdirectory using the zip format and "normal" compression level. (EDIT: Can now confirm that I get the same error for 7z and tar format). Each subdirectory ends up between 14 and 20GB in size. I then upload this and attempt to unzip it in Google Colab using the following code:
from google.colab import drive
drive.mount('/content/gdrive/')
!apt-get install p7zip-full
!7za x "/content/gdrive/My Drive/av_tfrecords/drumming_7zip.zip" -o"/content/gdrive/My Drive/unzipped_av_tfrecords/" -aos
This extracts some portion of the zip file before throwing an error. There are a variety of errors and sometimes the code will not even begin unzipping the file before throwing an error. This is the most common error:
Can not open the file as archive
ERROR: Unknown error -2147024891
Archives with Errors: 1
If I then attempt to rerun the !7za command, it may extract one or two more files from the zip before throwing this error:
terminate called after throwing an instance of 'CInBufferException'
It may also complain about particular files within the zip archive:
ERROR: Headers Error : drumming/yt-g0fi0iLRJCE_23.tfrecords
I have also tried using:
!unzip -n "/content/gdrive/My Drive/av_tfrecords/drumming_7zip.zip" -d "/content/gdrive/My Drive/unzipped_av_tfrecords/"
But that just begins throwing errors:
file #254: bad zipfile offset (lseek): 8137146368
file #255: bad zipfile offset (lseek): 8168710144
file #256: bad zipfile offset (lseek): 8207515648
Although I would prefer a solution in Colab, I have also tried an app available in GDrive named "Zip Extractor", but it too throws an error and has a data quota.
This has now happened across 4 zip files, and each time I try something new, it takes a long time because of the upload speeds. Any explanations for why this is happening and how I can resolve the issue would be greatly appreciated. I also understand there are probably alternatives to what I am trying to do; those would be appreciated as well, even if they do not directly answer the question. Thank you!
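One way to narrow down where the corruption happens, purely as a diagnostic sketch, is to checksum each archive locally before uploading and again inside Colab; mismatching hashes would confirm the upload itself mangled the file (the path below is the one from the question):

import hashlib

def md5sum(path, block_size=2**20):
    # Stream the file through MD5 so a 20GB archive never has to fit in memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()

# Run once locally on the original archive, then again in Colab on the uploaded copy.
print(md5sum("/content/gdrive/My Drive/av_tfrecords/drumming_7zip.zip"))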
I had the same problem and solved it by passing the command as an argument array instead of a single full-line string:
new ProcessBuilder(new String[] {"7z", "x", fPath, "-o" + dir});
Use a command-line array, not the full line as one string!
Good luck!
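For anyone doing this from Python instead of Java, the same advice applies; a minimal sketch with subprocess, assuming 7z is on the PATH and reusing the paths from the question above:

import subprocess

# Each argument is its own list element, so the space in "My Drive" is never re-split.
subprocess.run([
    "7z", "x",
    "/content/gdrive/My Drive/av_tfrecords/drumming_7zip.zip",
    "-o/content/gdrive/My Drive/unzipped_av_tfrecords/",
    "-aos",
], check=True)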
Why does this command behave differently depending on whether it's called from terminal.app or a scala program?

SevenZipArchiveException: Invalid archive. open/read error

I got the following error when I try to extract a zip file:
"SevenZip.SevenZipArchiveException: Invalid archive: open/read error! Is it encrypted and a wrong password was provided?
If your archive is an exotic one, it is possible that SevenZipSharp has no signature for its format and thus decided it is TAR by mistake."
Nothing works with zip files, but everything works fine with 7z files. Is it possible to extract zip files with the SevenZipExtractor?
string sourcePath = @"c:/temp/yyy.zip";
using (var file = new SevenZipExtractor(sourcePath))
{
    file.ExtractArchive(outputPath);
}
When I encountered this error, it turned out to be a problem with the archive itself. For example, if SevenZipCompressor stops halfway through a run, the resulting archive is corrupt, and the error occurs when you later attempt to decompress those files.
The fix for me was to recompress the set of files, making sure the compression ran to completion, and then the error went away, allowing the extraction to work.
So the moral of the issue is to look at the source in a case like this and make sure the files or the archive aren't corrupt.
I've run into the same issue recently with version 18.5.0.
Downgrading the package to 9.38.3 solved the problem for me.
For people still running into this problem: it can also happen when trying to decompress rar5 files that have filename encryption turned on.

read/write an online .wav file in matlab?

I am currently working on a signal processing lab for school that requires me to download and analyze a .wav file. I was wondering if there is a way to wavread() or wavwrite() a URL so I don't have to re-download the audio file every time I move to a new computer or send the code to the members of my group.
All the files can be found here.
And this is the url for one of the .wav files:
http://www.soe.uoguelph.ca/webfiles/sgregori/Audio/speech.wav
I have tried urlread() and urlwrite(), but to be honest I don't quite understand what to do with the HTML content. I have also tried:
[x,fs]=wavread('http://www.soe.uoguelph.ca/webfiles/sgregori/Audio/speech.wav');
but ended up with the error:
Error using wavread (line 67)
Invalid Wave File. Reason: Cannot open file.
I am also using the student version of MATLAB, so maybe that is the issue?
Any help would be greatly appreciated!
Thank you.
This should work:
urlwrite('http://www.soe.uoguelph.ca/webfiles/sgregori/Audio/speech.wav','s1.wav');
This saves the file s1.wav to your current working directory. Then the line
[x,fs]=wavread('s1.wav');
should work fine.

insecure string pickle error when uploading and downloading to MKS Integrity

I am getting the exception "ValueError: insecure string pickle" when attempting to run my program after creating a sandbox from MKS.
Hopefully you are still interested in helping if you are still reading this, so here's the full story.
I created an application in Python that analyzes data. When saving specific data from my program, I pickle the file. I correctly read and write it in binary and everything is working correctly on my computer.
I then used py2exe to wrap everything into an .exe. However, in order for the pickled files to continue to work, I have to physically copy them into the folder that py2exe creates. So my pickle files sit inside the .exe folder, and everything works correctly when I run the .exe.
Next, I upload everything to MKS (an ALM, here is the Wikipedia page http://en.wikipedia.org/wiki/MKS_Integrity).
When I proceed to create a sandbox of my files and run the program, I get the dreaded "insecure string pickle" error. In other words, I am wondering whether MKS mangled something or added end-of-line characters to my pickle files. When I compare the contents of the MKS pickle file and the one I created before uploading the program to MKS, there are no differences.
I hope this is enough detail to describe my problem.
Please help!
Thanks
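One check worth making explicit before blaming MKS: compare the two files byte-for-byte rather than with a text-oriented diff, since a text-mode comparison can normalize the very line endings that break a pickle. A minimal sketch; the file names are placeholders:

import hashlib

def digest(path):
    # "rb" matters: text mode would translate the very line endings we suspect.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Placeholder paths: the pre-upload pickle and the copy checked out of the sandbox.
print(digest("original.pkl") == digest("sandbox_copy.pkl"))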
Have you tried adding your pickled files to your Integrity sandbox as binaries and not text?
When adding the file, on the Create Archive interface, select the options button, and change data type to "Binary" from "Auto". This will maintain any non-text formatting within the file.

get data from .csv file, analyze, produce output - python3

I am trying to complete an assignment in Python 3. It is very similar to the PDF found here.
I have a few questions, both about how to get the information I need and, if possible, about some code that could move me along. I am new to Python. Right now, with the code I have, I keep getting a "directory not found" error after running a function that tries to read the data. I know the .csv file should be in the directory where I saved it in WingIDE, but I can't get it to work correctly.
My first question is: after getting each line of the .csv file from my get_data_list, what is the best way to take each category and feed it into an efficiency equation?
Here is my get_data_list function:
def get_data_list(filename):
    data_file = open(filename, "r")
    data_list = []
    for line_str in data_file:
        data_list.append(line_str.strip().split(','))
    return data_list
When I run get_data_list("player_regular_season.csv"), I get the following error:
builtins.IOError: [Errno 2] No such file or directory:'player_regular_season.csv'
As a first try, you can put the data file in the same directory as the Python program and launch the program from that directory.
Also try a single-purpose script to learn how to work with directories. Learn the functions from the standard docs: 15.1.5. Files and Directories, namely os.getcwd() and os.chdir(path), and then 10.1. os.path — Common pathname manipulations, namely os.path.isfile(path).
But read the docs of the other functions in those documents as well, to learn what is available.
Once you know how to work with filenames and paths, have a look at 13.1. csv — CSV File Reading and Writing. Don't be scared off by all the material; start from the end: 13.1.5. Examples of using the csv module.
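Putting those pieces together, here is a minimal sketch of the kind of single-purpose script those docs describe, using the filename from the question:

import csv
import os

print("Working directory:", os.getcwd())

filename = "player_regular_season.csv"
if os.path.isfile(filename):
    # csv.reader copes with quoted fields containing commas, unlike a bare split(',').
    with open(filename, newline="") as f:
        data_list = list(csv.reader(f))
    print("Read", len(data_list), "rows")
else:
    print(filename, "was not found in the directory above")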