How should one zip a large folder in Windows 10, upload it to GDrive, then unzip it?

I have a directory consisting of 22 sub-directories. Altogether, the directory is about 750GB in size and I need this data on GDrive so that I can work with it in Google Colab. Obviously uploading this takes an absolute age (particularly with my slow connection) so I would like to zip it, upload it, then unzip it in the cloud.
I am using 7-Zip, compressing each subdirectory with the zip format at the "normal" compression level. (EDIT: I can now confirm that I get the same error with the 7z and tar formats.) Each subdirectory ends up between 14 and 20 GB in size. I then upload these and attempt to unzip them in Google Colab using the following code:
from google.colab import drive
drive.mount('/content/gdrive/')
!apt-get install p7zip-full
# -aos skips files that have already been extracted
!7za x "/content/gdrive/My Drive/av_tfrecords/drumming_7zip.zip" -o"/content/gdrive/My Drive/unzipped_av_tfrecords/" -aos
This extracts some portion of the zip file before throwing an error. There are a variety of errors and sometimes the code will not even begin unzipping the file before throwing an error. This is the most common error:
Can not open the file as archive
ERROR: Unknown error -2147024891
Archives with Errors: 1
If I then rerun the !7za command, it may extract one or two more files from the zip file before throwing this error:
terminate called after throwing an instance of 'CInBufferException'
It may also complain about particular files within the zip archive:
ERROR: Headers Error : drumming/yt-g0fi0iLRJCE_23.tfrecords
I have also tried using:
!unzip -n "/content/gdrive/My Drive/av_tfrecords/drumming_7zip.zip" -d "/content/gdrive/My Drive/unzipped_av_tfrecords/"
But that just begins throwing errors:
file #254: bad zipfile offset (lseek): 8137146368
file #255: bad zipfile offset (lseek): 8168710144
file #256: bad zipfile offset (lseek): 8207515648
Although I would prefer a solution in Colab, I have also tried an app available in GDrive named "Zip Extractor". But that too throws an error and has a data quota.
This has now happened across 4 zip files, and each time I try something new it takes a long time to find out whether it worked because of the upload speeds. Any explanations for why this is happening and how I can resolve the issue would be greatly appreciated. I also understand there are probably alternatives to what I am trying to do; those would be appreciated as well, even if they do not directly answer the question. Thank you!
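
Before anything else, it is worth ruling out a corrupted upload: errors like "Can not open the file as archive" usually mean the bytes on Drive no longer match the local archive. As a quick check (a sketch, using the path from the question; certutil ships with Windows, and md5sum and 7za are available in Colab):
On Windows, before uploading:
certutil -hashfile drumming_7zip.zip MD5
In Colab, after uploading:
!md5sum "/content/gdrive/My Drive/av_tfrecords/drumming_7zip.zip"
# 7-Zip's t command tests archive integrity without extracting anything
!7za t "/content/gdrive/My Drive/av_tfrecords/drumming_7zip.zip"
If the hashes differ, or the t (test) command reports errors, the archive was damaged in transit, and no extraction option will recover the missing data; the likely fix is re-uploading, ideally in smaller pieces.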

I got the same problem and solved it by passing the command as an array of arguments:
new ProcessBuilder(new String[] {"7z", "x", fPath, "-o" + dir})
Use a command-line array, not one full command string. Good luck!
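
In the Colab case specifically, one more thing worth trying (a sketch, not something verified in this thread): extract to the Colab VM's local disk first and copy the results to Drive afterwards. Extracting directly into the /content/gdrive FUSE mount pushes every write through Drive, which is slow and a known source of I/O errors with large archives. Here /content/local_unzipped is just an arbitrary scratch directory:
# extract to the VM's local disk rather than the Drive mount
!7za x "/content/gdrive/My Drive/av_tfrecords/drumming_7zip.zip" -o"/content/local_unzipped/" -aos
# then copy the extracted files over to Drive
!cp -r /content/local_unzipped/. "/content/gdrive/My Drive/unzipped_av_tfrecords/"
The VM's local disk is limited (on the order of tens of GB), so with 14-20 GB archives this has to be done one archive at a time, deleting the local copies in between.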

Related

Forge DWG Pack and Go Translation

I'm having some issues when translating an IDW to a DWG. I'm using the configuration file created from Inventor, which works fine when translating one file: it is saved as *.dwg and becomes available for download. The problem occurs when changing the transmittal value to "yes" in the config file to handle multiple sheets and create a zip.
[12/02/2021 17:12:20] Start upload phase.
[12/02/2021 17:12:20] Error: Non-optional output [exportDWG.zip] is missing.
[12/02/2021 17:12:20] Error: An unexpected error happened during phase Publishing of job.
[12/02/2021 17:12:20] Job finished with result FailedMissingOutput
This fails during the upload phase, as per the log above, and returns an error about the missing zip file even after updating the code to reference the zip file. The zip file is created when debugging locally with the AppBundle.
I would like to have the created zip file available for download instead of the single dwg.

Managing a large SPSS (*.sav) file (4.2 GB)

I have received an SPSS file from a survey fielded by another company that allegedly contains only ~1500 respondents, yet the file has somehow ballooned to 4.2 GB. My hunch is that the file comes from a global survey and the 1500 selected records are US-only, so it contains a series of blank variables, plus metadata for those variables, possibly in multiple languages/alphabets.
I only need a subset of this data, and could likely work with it if I removed the metadata, but my issue has been that I can't get the damn thing open to cut down on the number of variables. I have been using the tools at my disposal to try the following workarounds, though I'm sure there are better options:
Opening the file using PSPP (freeware SPSS) - this causes PSPP to stop responding
Using the R command read.spss (from the foreign package) to write a .csv - this claims that the file has a duplicate variable name and won't proceed further
Using the R command spss.system.file to write a .csv - when I tried this, R spent a long time thinking as it attempted to run and had been going for a couple of hours with no apparent success.
Using the PSPP text conversion tool (https://pspp.benpfaff.org/) to create either a dictionary or a .csv file - both of these options crash after the file has completed uploading.
I've gone back to the other company to ask them to work on reducing the file size; however, I wasn't sure if anyone else had ideas for doing either of the following:
Open the file using another program/converter that could turn it into a .csv or other similarly skinny file format
Use another program to at least read only the variable names included in the file so that I can provide the other company with the specific variables I need
The following command from PSPP should do what you need:
$ pspp-convert originalFile.sav output.csv
In case it doesn't, please provide the terminal error message.
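
If you have Python available, another way to get just the variable names (an assumption on my part, not something from this thread) is the pyreadstat package, which can read an SPSS file's metadata without loading any of the 4.2 GB of data:
import pyreadstat

# metadataonly=True reads variable names/labels but skips the data rows;
# "survey.sav" is a placeholder for the actual file path
df, meta = pyreadstat.read_sav("survey.sav", metadataonly=True)

print(meta.number_columns)      # how many variables there are
print(meta.column_names[:20])   # the first 20 variable names
print(meta.column_labels[:20])  # their labels, if any
read_sav also accepts a usecols argument, so once you know which variables you need, you could load only that subset and write it out as a skinny .csv.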

LOAD CSV command keeps using old file: location, ignores command input

I am using Community edition 3.0.5 on Windows 10. I made multiple efforts to execute a LOAD CSV command before being told that such files cannot reside on an external drive. When I moved the file to users/user/ and tried to execute the LOAD CSV command, I got the same message "Couldn't load the external resource at: file:/F:/Neo4j%20DBs/Data.gov%20Consumer%20Complaints/Consumer%20Complaints%20DB/import/Users/CharlieOh/Consumer_Complaints.csv" in spite of the fact that the command I entered was
"LOAD CSV WITH HEADERS FROM
'file:///Users/CharlieOh/Consumer_Complaints.csv' AS line
WITH line
LIMIT 1
RETURN line"
I tried to locate the file neo4j.conf and could only find C:\Program Files (x86)\Neo4j Community 3.2.2\Neo4j Community.install4j\i4jparams.conf. I even deleted the old DB, recreated the small amount of data, and got the same error, which seems to indicate that the LOAD CSV function is totally useless across all my neo4j databases. BTW, the %20 in the file specification was due to suggestions on Stack Overflow, as was using underscores to avoid any blank spaces in the file specification. None of it worked, and now that I believe I may have solved the problem by putting the csv file in the user directory, the LOAD CSV function still won't let me do it. One last thing: I am following the YouTube video https://www.youtube.com/watch?v=Eh_79goBRUk to learn how to load a csv file into neo4j.
The csv file needs to go in the import directory of the specific database. With Neo4j Desktop this is easy to identify by clicking on the Manage button of the database and then the open folder button. It looks like you've found it.
Once the database import directory is located, you reference files relative to it, e.g. LOAD CSV WITH HEADERS FROM 'file:///Consumer_Complaints.csv' AS line, where Consumer_Complaints.csv is your file name, including the .csv extension. You do NOT use the full path; it is resolved relative to the import directory.

SevenZipArchiveException: Invalid archive. open/read error

I got the following error when I try to extract a zip file:
"SevenZip.SevenZipArchiveException: Invalid archive: open/read error! Is it encrypted and a wrong password was provided?
If your archive is an exotic one, it is possible that SevenZipSharp has no signature for its format and thus decided it is TAR by mistake."
Nothing works with zip files, but everything works fine with 7z files. Is it possible to extract zip files with the SevenZipExtractor?
// '@' makes this a verbatim string literal (the original snippet had a stray '#')
string sourcePath = @"c:/temp/yyy.zip";
string outputPath = @"c:/temp/extracted";  // destination folder (example path)

using (var file = new SevenZipExtractor(sourcePath))
{
    file.ExtractArchive(outputPath);
}
When I encountered this error, it turned out to be a problem with the set of files I was attempting to decompress. For example, if SevenZipCompressor stopped halfway through a run, the resulting archive would be corrupt, so the error would occur when you attempted to decompress it.
The fix for me was to recompress the set of files, making sure the compression ran to completion; the error then went away and extraction worked.
So the moral of the issue is to look at the source archive in a case like this and make sure it isn't corrupt.
I've run into the same issue recently with version 18.5.0.
Downgrading the package to 9.38.3 solved the problem for me.
For people still running into this problem: it can also happen when trying to decompress rar5 files that have file name encryption turned on.

Copy/Move files in PDI / Spoon yields 'is not a file' error

I am trying to automate the weekly generation of a database. As a first step in this process, I need to obtain a set of files from a network location M:\. The process is as follows:
Delete any possibly remaining old source files from my local folder (REMOVE_OLD_FILES).
Obtain the names of the required files using regular expressions (GET_FILES).
Copy the files from the network location to my local folder for further processing (COPY/MOVE FILES).
Step 3 is where I run into trouble, I frequently receive the below error:
Error processing files. Exception : org.apache.commons.vfs.FileNotFoundException: Could not read from "file:///M:/FILESOURCE/FILENAME.zip" because it is a not a file.
However, when I manually locate the 'erroneous' file on the network location and try to open or copy it, there are no problems. If I then re-run the Spoon job, no errors occur for this file (although the next file might lead to an error).
So far, I have verified that steps 1 and 2 run correctly: more specifically, there are no errors in the file names returned from step 2.
Obviously, I would prefer not having to manually open all the files first to ensure that Spoon can correctly copy them. Does anyone have an idea what might be causing this behaviour?
For completeness: the parameters selected in the COPY/MOVE FILES step were attached as a screenshot (not reproduced here).
I was facing the same issue with different clients, and it finally got resolved when I tried a basic approach. It might help in your case as well, and other users can follow the same rule.
Try this: create all required folders with the Spoon job entry "Create a Folder", then deactivate/delete those hops from your job or transformation once the folders are created.
This happens because the user you are using to delete the files is not recognized as a Windows user. Once your folders are in place, you can remove the "Create a Folder" entries from your job.
The path to the file is wrong. If you are running Spoon in a Windows environment, you should use the Windows format for file paths. Try changing from
"file:///M:/FILESOURCE/FILENAME.zip"
To
"M:\FILESOURCE\FILENAME.zip"
By the way, it will only work if M: is an actual drive in the machine. If you want to access a file on the network, you should use the network path to the shared folder, this way:
"\\MachineName\M$\FILESOURCE\FILENAME.zip"
or
"\\MachineName\FILESOURCE\FILENAME.zip"
If you try to access the file through a network-mounted drive letter, it won't work.