Why does Google Apps Script's UrlFetchApp change the binary content of a downloaded zip file? - google-apps-script

I want to download a zip file into Google Drive via Google Apps Script.
I download a sample zip file with the code below and save it into a folder in Google Drive:
const exampleUrl = "https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-zip-file.zip";
var response = UrlFetchApp.fetch(exampleUrl);
var parentFolder = DriveApp.getFolderById('1aba-tnQZxZMN7DN52eAywTU-Xs-eqOf4');
parentFolder.createFile('sample_CT.zip', response.getContentText()); // doesn't work
parentFolder.createFile('sample_C.zip', response.getContent()); // doesn't work
parentFolder.createFile('sample_B.zip', response.getBlob()); // doesn't work
parentFolder.createFile('sample.zip', response); // doesn't work
When I download the resulting files to my machine and try to unpack them with the unzip utility, every one of the above versions gives me the following:
> unzip sample_CT.zip
Archive: sample_CT.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of sample_CT.zip or
sample_CT.zip.zip, and cannot find sample_CT.zip.ZIP, period.
Below I compare the broken zip file (first) with the correct one (second), as displayed in my text editor:
broken:
PKuÔøΩÔøΩP
sample.txtUT
ÔøΩbÔøΩ^ÔøΩbÔøΩ^ÔøΩbÔøΩ^uxÔøΩÔøΩEÔøΩ1RÔøΩ0ÔøΩÔøΩÔøΩQÔøΩ0ÔøΩUz. ,
ÔøΩÔøΩXKÔøΩ!ÔøΩÔøΩ2ÔøΩÔøΩV#ÔøΩ6ÔøΩ:
ÔøΩÔøΩMÔøΩ
��#ux�h�ttPkHTѺ�H�b+�:N�>m�����h�`{�c�0�A��(yh���&���{�U~�Y�~�����HA�����k8w�p���6�Ik��k��?k"?OJx��(n벼g�_�tPK[�c�PKu��P[�c�
ÔøΩÔøΩsample.txtUT
ÔøΩbÔøΩ^ÔøΩbÔøΩ^ÔøΩbÔøΩ^uxÔøΩÔøΩPKX
correct:
PKu“¥P
sample.txtUT
Çb±^Çb±^Çb±^uxèèE1RÅ0ûœâQÑ0¹Uz. ,
ÎàXKþ!·ÿ2ð‡V#í®6œ:
£èMà
ï´#ux­hð®¸ttPkHTѺòH²b+ª:Nª>mô”Éä’h˜`{úcÌ0ÅAõš(yh®©»&ÊôÏ{ýU~°YÊ~“¾ËòöHA„Äü×÷k8wÏpùö¹6ÕIk»ðk¤ü?k"?OJxºØ(në²¼gª_ötPK[°c¶PKu“¥P[°c¶
´sample.txtUT
Çb±^Çb±^Çb±^uxèèPKX
As the snippets above show, some symbols differ. I have no idea why UrlFetchApp changes certain bytes when it downloads a zip file.
On top of that, the file saved after UrlFetchApp takes more space.

It's because the script is converting the content to a string. Folder.createFile() accepts a blob, but the blob should be its only argument. If the blob is passed as the second argument, another method signature, Folder.createFile(name: string, content: string), takes precedence, and the Blob is converted to a String to match that signature. Pass the blob alone and set the file name on the blob instead:
parentFolder.createFile(response.getBlob().setName('TheMaster.zip'))
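To illustrate why a string round trip damages the file, here is a minimal sketch in plain JavaScript for Node.js or a browser (not Apps Script, since it relies on TextDecoder and TextEncoder). It assumes the conversion decodes the bytes as UTF-8 and encodes them again; that is a plausible model of what happens inside Apps Script, not documented behavior. Any byte sequence that is not valid UTF-8 becomes the replacement character U+FFFD, which is written back as the three bytes EF BF BD. A Mac Roman text editor displays those three bytes as "ÔøΩ", the sequence that fills the broken dump in the question, and each substitution also makes the file bigger.

```javascript
// Simulate converting binary data to a string and back, assuming UTF-8.
// The first four bytes are the zip local-header signature "PK\x03\x04";
// 0xFF and 0xB1 are arbitrary example bytes that are not valid UTF-8.
const original = new Uint8Array([0x50, 0x4b, 0x03, 0x04, 0xff, 0xb1]);

// Decoding replaces each invalid byte with U+FFFD (the replacement character).
const asText = new TextDecoder("utf-8").decode(original);

// Re-encoding writes every U+FFFD as the three bytes EF BF BD.
const roundTripped = new TextEncoder().encode(asText);

// Format bytes as space-separated hex for display.
const hex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");

console.log(hex(original));     // 50 4b 03 04 ff b1
console.log(hex(roundTripped)); // 50 4b 03 04 ef bf bd ef bf bd
console.log(original.length, "->", roundTripped.length); // 6 -> 10
```

The signature bytes survive because they are plain ASCII; only the bytes that are invalid as UTF-8 get replaced, which is consistent with the partial damage and the size increase described in the question.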

Related

"could not decompress gzip" error with un-gzipping Blob in Apps script

In an Apps Script project, I have to download a gzip-compressed file and extract the files in it to store them on Google Drive.
Following is the part of the code that is relevant to my question:
var resp = UrlFetchApp.fetch(file_download_url);
var blob = resp.getBlob();
blob = blob.setContentType("application/x-gzip");
var blobList = Utilities.ungzip(blob);
console.log("total g-zipped files: " + blobList.length);
Initially, when I ran the code without setContentType(), it failed with an "invalid argument" error. Then I found this useful post, 'Invalid Argument' error with un-gzipping Blob in Apps script,
and used the proper content type for gzip files (which I have not yet found anywhere on Google's reference pages).
But now it fails with the error:
could not decompress gzip.
Please help.
Note: if I simply download the compressed file to my PC and use 7-Zip to extract it as .gz, it runs fine and extracts the files.
Further details, as requested, for a complete understanding:
Following are the first 30 bytes of the file, as shown by the hex editor frhed:
[20,0a,50,4b,03,04,14,00,00,00,08,00,1a,44,e6,52,5b,79,b3,61,8c,4a,11,00,1b,39,12,00,13,00]
[Screenshot: 7-Zip extracts the file fine when it is treated as .gz.]
[Screenshot: 7-Zip fails when the file is treated as .zip; neither the Windows built-in decompressor nor Utilities.unzip can extract it either.]
As I suspected, this is not a gzip file, which is why ungzip cannot decompress it.
It is a zip file.
There are two extraneous bytes at the start (a space and a line feed), but after them, 50 4b 03 04 is the signature of a zip file's local header. From the remaining bytes I can see that it requires PKZip 2.0 or later to decompress, that the first entry uses the deflate compression method, that its modification date and time is 2021 July 6 08:32:52, that the compressed size is 1,133,196 bytes and the uncompressed size is 1,194,267 bytes, and that there are 19 bytes in the file name.
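The header reading above can be reproduced with a short JavaScript sketch that decodes the fixed fields of a zip local file header, using the field offsets from PKWARE's APPNOTE.TXT. The helper names (u16, u32, parseLocalHeader) are illustrative; the code is plain JavaScript, so it runs in Node.js and also in the Apps Script V8 runtime.

```javascript
// Little-endian readers. Masking with 0xff also handles signed bytes
// (-128..127), which is how Apps Script's Blob.getBytes() represents them.
function u16(b, i) {
  return (b[i] & 0xff) | ((b[i + 1] & 0xff) << 8);
}
function u32(b, i) {
  return (u16(b, i) | (u16(b, i + 2) << 16)) >>> 0; // force unsigned
}

// Decode the fixed-size fields of a zip local file header starting at offset.
function parseLocalHeader(bytes, offset) {
  if (u32(bytes, offset) !== 0x04034b50) { // "PK\x03\x04"
    throw new Error("No local file header signature at offset " + offset);
  }
  const time = u16(bytes, offset + 10); // MS-DOS time
  const date = u16(bytes, offset + 12); // MS-DOS date
  return {
    versionNeeded: u16(bytes, offset + 4) / 10, // 20 -> 2.0
    flags: u16(bytes, offset + 6),
    method: u16(bytes, offset + 8), // 8 = deflate
    modified: {
      year: (date >> 9) + 1980,
      month: (date >> 5) & 0x0f,
      day: date & 0x1f,
      hour: time >> 11,
      minute: (time >> 5) & 0x3f,
      second: (time & 0x1f) * 2, // stored in 2-second units
    },
    crc32: u32(bytes, offset + 14),
    compressedSize: u32(bytes, offset + 18),
    uncompressedSize: u32(bytes, offset + 22),
    fileNameLength: u16(bytes, offset + 26),
  };
}

// The 30 bytes from the question; the zip data starts after " \n".
const first30 = [0x20, 0x0a, 0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x00, 0x00,
  0x08, 0x00, 0x1a, 0x44, 0xe6, 0x52, 0x5b, 0x79, 0xb3, 0x61,
  0x8c, 0x4a, 0x11, 0x00, 0x1b, 0x39, 0x12, 0x00, 0x13, 0x00];
console.log(parseLocalHeader(first30, 2));
```

For the question's bytes, starting at offset 2 (after the space and line feed), this reports version 2.0, method 8 (deflate), a modification time of 2021-07-06 08:32:52, 1,133,196 compressed and 1,194,267 uncompressed bytes, and a 19-byte file name. Only 28 bytes remain after the offset, so the extra-field length (bytes 28 and 29 of the header) is not read.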
7-Zip can, of course, decompress and extract zip files. Zip files, unlike gzip files, can have multiple entries.
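A plausible follow-up, though the answer above does not test it, is that the two stray bytes before the signature are also what makes Utilities.unzip and the Windows extractor give up. The Apps Script sketch below cuts everything before the first PK\x03\x04 signature and unzips the rest. findZipStart and unzipDownload are hypothetical helper names; the sketch assumes the junk before the archive never contains that signature itself, and that in the V8 runtime HTTPResponse.getContent() returns an ordinary JavaScript array of (signed) byte values with a slice method.

```javascript
// Return the index of the first zip local-header signature "PK\x03\x04",
// or -1 if there is none. Masking with 0xff handles Apps Script's signed bytes.
function findZipStart(bytes) {
  for (let i = 0; i + 3 < bytes.length; i++) {
    if ((bytes[i] & 0xff) === 0x50 && (bytes[i + 1] & 0xff) === 0x4b &&
        (bytes[i + 2] & 0xff) === 0x03 && (bytes[i + 3] & 0xff) === 0x04) {
      return i;
    }
  }
  return -1;
}

// Apps Script only: download, drop any bytes before the signature, then unzip.
function unzipDownload(url) {
  const bytes = UrlFetchApp.fetch(url).getContent();
  const start = findZipStart(bytes);
  if (start < 0) {
    throw new Error("Response does not contain a zip local header");
  }
  const zipBlob = Utilities.newBlob(bytes.slice(start), "application/zip", "download.zip");
  return Utilities.unzip(zipBlob); // Blob[]: one blob per zip entry
}
```

With the question's code, unzipDownload(file_download_url) would take the place of the setContentType and ungzip calls, and each returned blob can be saved with a Folder's createFile(blob).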

Is it possible to use the Google Drive API to get file from within a shared .zip file

Assume the following .zip file:
unzip -l myarchive.zip
Archive: myarchive.zip
Length Date Time Name
--------- ---------- ----- ----
3663 1980-00-00 00:00 sub_dir1/file1.txt
4573 1980-00-00 00:00 sub_dir1/file2.txt
6021 1980-00-00 00:00 sub_dir2/file1.txt
6627 1980-00-00 00:00 file1.txt
The following command extracts the file sub_dir1/file1.txt from the .zip file when it is in the file system.
unzip -p myarchive.zip sub_dir1/file1.txt > file1.txt
But if the .zip file is in Google Drive with a shared link (e.g. the fileId is: 1234567...v4rzj),
is it possible to make a Google Drive API query to get a specific file (e.g. sub_dir1/file1.txt) from within the .zip file?
I am attempting to do something similar. Take a look at my question here:
How to read file names of items in a Zipped Folder? Google App Script
This portion of the code can unzip a file on Google Drive and place the extracted files in any location you need. However, it will run through the entire zip folder.
/// zfi: a zip file iterator, e.g. a FileIterator (its definition is not shown here) ///
/// temp: the Folder that receives the extracted files ///
function unzipAll(zfi, temp) {
  while (zfi.hasNext()) { // loops through the zip file iterator
    var file = zfi.next(); // every loop sets the active file to the next one
    Logger.log("Zip Folder: %s", file.getName());
    var fileBlob = file.getBlob(); // get the file blob
    fileBlob.setContentType("application/zip");
    var unZippedfile = Utilities.unzip(fileBlob); // array of blobs, one per zip entry
    //// loops over all blob elements ////
    for (var i = 0; i < unZippedfile.length; i++) {
      temp.createFile(unZippedfile[i]); // save each extracted file into temp
    }
  }
}
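The loop above extracts every entry. If only one file is needed, a variation can unzip the archive in memory and keep just the matching blob. Drive still sends the whole .zip to the script; only the extraction is selective. This Apps Script sketch uses hypothetical names (findEntry, readZipEntry, with fileId and entryPath as placeholders for your own values) and assumes the names of the blobs returned by Utilities.unzip include the entry's folder path, e.g. sub_dir1/file1.txt; check that with getName() on your own archive.

```javascript
// Pick the blob whose name equals entryPath from the array Utilities.unzip returns.
// It only needs objects with a getName() method, so it can be tested outside Drive.
function findEntry(blobs, entryPath) {
  for (const blob of blobs) {
    if (blob.getName() === entryPath) {
      return blob;
    }
  }
  return null;
}

// Apps Script only: read one entry of a zip file stored in Drive as text.
function readZipEntry(fileId, entryPath) {
  const zipBlob = DriveApp.getFileById(fileId).getBlob().setContentType("application/zip");
  const entry = findEntry(Utilities.unzip(zipBlob), entryPath);
  if (entry === null) {
    throw new Error("Entry not found: " + entryPath);
  }
  return entry.getDataAsString();
}
```

For the archive in the question, readZipEntry(yourFileId, "sub_dir1/file1.txt") would return the text of that entry, under the naming assumption above.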
Google Drive is simply a file storage system; in and of itself, it does not have the ability to unzip files in this manner or to inspect the contents of a file. The Google Drive API just gives you the ability to create, update, delete, upload, and download files.
Other options:
Your unzip command works on a file stored locally on your machine, so you will need to download the file from Google Drive first and then run unzip on it.
As you have not mentioned which programming language you intend to use, I recommend checking the documentation for examples.
This is an example using Java; you will need the authorization code as well:
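import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

// driveService: an authorized com.google.api.services.drive.Drive client (setup not shown)
String fileId = "0BwwA4oUTeiV1UVNwOHItT0xfa2M";
OutputStream outputStream = new ByteArrayOutputStream();
driveService.files().get(fileId)
    .executeMediaAndDownloadTo(outputStream); // downloads the whole file's bytes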

Published https://docs.google.com/spreadsheets redirects to other URL (CSV data)

We auto-publish a Google Docs spreadsheet (one tab as CSV). Google Docs provides a fixed URL that refers to the CSV. We import this CSV into another tool for product data import.
Suddenly, this URL is being redirected by Google Sheets. If we go to "File/Publish To The Internet" again, we get the same URL for that CSV.
Question: how can we get the URL without the redirection again?
Error: Source file
https://docs.google.com/spreadsheets/d/e/2PACX-1vTQsBEmvOwFwxORMqYg2N6LzzYqdqsdDCjxqsdqsdH72gdMCP4xrs1lsN37RO4h1-rjJsQ/pub?gid=501162839&single=true&output=csv doesn't exist (HTTPS : File not found ! (HTTP/1.0 307 Temporary Redirect)). Please check the source file path.
In short, the collection process needs to follow the redirect given in the Location header of the 307 response. Depending on how you're getting the CSV, this might be simple or a pain. I collect CSVs using curl, so just adding the -L switch is sufficient to make sure the incoming files are the CSV we're looking for instead of the HTML we were getting without -L. Without knowing what utility or process you're using to download the CSV, I can't be more specific, unfortunately.
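The curl -L behavior described above can be sketched in JavaScript for Node.js 18 or later, which has a built-in fetch. This is not tied to the import tool in the question, whose redirect handling is unknown, and resolveLocation and fetchFollowingRedirects are hypothetical names. With redirect: "manual", Node's fetch returns the 3xx response itself, so the loop reads the Location header, resolves it against the current URL, and requests again, up to a hop limit.

```javascript
// Resolve a (possibly relative) Location header against the URL that returned it.
function resolveLocation(currentUrl, location) {
  return new URL(location, currentUrl).toString();
}

// Follow 3xx responses (301/302/303/307/308) by hand, like curl -L.
async function fetchFollowingRedirects(url, maxHops = 5) {
  let current = url;
  for (let hop = 0; hop <= maxHops; hop++) {
    const res = await fetch(current, { redirect: "manual" });
    const location = res.headers.get("location");
    if (res.status >= 300 && res.status < 400 && location) {
      current = resolveLocation(current, location); // next hop
      continue;
    }
    if (!res.ok) {
      throw new Error("HTTP " + res.status + " for " + current);
    }
    return res.text(); // the CSV body
  }
  throw new Error("Too many redirects starting at " + url);
}
```

In practice, plain fetch(url), whose default is redirect: "follow", already does this; the explicit loop only makes visible what curl -L does. Usage: fetchFollowingRedirects(publishedCsvUrl).then((csv) => console.log(csv)), where publishedCsvUrl is the published CSV link.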

Finding a .txt file on Google Drive by name and getting its contents

I want to find text files by their name, then get their contents and store them in a variable.
I have tried to find the file by its name, but it doesn't seem to work. I'm clueless as to how to get the file's contents, though.
My code to find the file:
function testThing() {
var findquestions = DriveApp.getFilesByName('tempquestions.txt')
Logger.log(findquestions)
}
I want it to log what it found, but the only output is "FileIterator". I don't know what that means.
As you can see in the documentation, .getFilesByName() returns a FileIterator. What's a FileIterator? The documentation states:
An iterator that allows scripts to iterate over a potentially large collection of files
There may be a large number of files with the same name. This iterator provides access to all of them.
Which method provides access to a file from a FileIterator? next() does.
How do you get the contents of such a file? Get the blob from the file, then call getDataAsString() on the blob:
Logger.log(DriveApp
.getFilesByName('tempquestions.txt') //fileIterator
.next() //file(first file with that name)
.getBlob() //blob
.getDataAsString() //string
)
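Because next() throws an exception when no file has that name, and returns only the first match when several do, a slightly more defensive Apps Script sketch loops with hasNext() and collects every matching file's text. readFilesByName is an illustrative name; the Drive calls are the same ones used in the answer above.

```javascript
// Return the text of every Drive file called `name` (an empty array if none).
function readFilesByName(name) {
  const contents = [];
  const files = DriveApp.getFilesByName(name); // FileIterator
  while (files.hasNext()) {
    const file = files.next(); // File
    contents.push(file.getBlob().getDataAsString());
  }
  return contents;
}

function testThing() {
  const texts = readFilesByName("tempquestions.txt");
  if (texts.length === 0) {
    Logger.log("No file named tempquestions.txt");
  } else {
    const questions = texts[0]; // contents of the first match, as a variable
    Logger.log(questions);
  }
}
```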

Is there a size limit to a blob for Utilities.unzip(blob) in Google Apps Script?

I have a Google Apps Script that accesses a website, finds a .zip file, unzips it, and extracts the relevant data from the relevant files.
I want to do the same thing, but for a different, larger .zip file on the same site.
I've accessed the .zip (using almost identical code), but it throws an error: "Could not unzip."
My code:
var dir = UrlFetchApp.fetch(url);
var b = dir.getBlob();
var files = Utilities.unzip(b);
The only differences between the two files are these:
File A ends with "Update%201.8.5.zip" and contains 9 files (5.46MB, 5.15MB zipped)
File B ends with "260_185.zip" and contains 407 files (384MB, 280MB zipped)
This makes me think that there is a limit (on size or number of files) for the Utilities.unzip() method. Can anyone confirm this, or is there something about the format of the filenames that is messing things up?
The quotas for Google services include the restriction:
URL Fetch data received: 100MB / day
File B is 280MB zipped, well over that limit, so it seems your URL request could not actually produce a valid zip file.
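To tell a truncated or non-zip response apart from a real unzip limit, one option is to check the bytes before calling Utilities.unzip. A complete zip file ends with an "end of central directory" record, signature PK\x05\x06, which is at least 22 bytes long and must start within the last 22 + 65,535 bytes (65,535 being the maximum comment length). A missing record is exactly what the unzip utility reported for the broken file at the top of this page. This is a minimal Apps Script sketch; hasEndOfCentralDirectory and fetchAndCheckZip are hypothetical names, while muteHttpExceptions, getResponseCode() and getContent() are standard UrlFetchApp features.

```javascript
// True if a zip "end of central directory" record starts where the format
// allows it: within the last 22 + 65535 bytes (22-byte record + max comment).
function hasEndOfCentralDirectory(bytes) {
  const minRecord = 22;
  const stop = Math.max(0, bytes.length - minRecord - 0xffff);
  for (let i = bytes.length - minRecord; i >= stop; i--) {
    if ((bytes[i] & 0xff) === 0x50 && (bytes[i + 1] & 0xff) === 0x4b &&
        (bytes[i + 2] & 0xff) === 0x05 && (bytes[i + 3] & 0xff) === 0x06) {
      return true;
    }
  }
  return false; // also the result for inputs shorter than 22 bytes
}

// Apps Script only: explain why a download cannot be unzipped before trying.
function fetchAndCheckZip(url) {
  const response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  const code = response.getResponseCode();
  const bytes = response.getContent();
  Logger.log("HTTP %s, %s bytes received", code, bytes.length);
  if (code !== 200) {
    throw new Error("Download failed with HTTP " + code);
  }
  if (!hasEndOfCentralDirectory(bytes)) {
    throw new Error("Not a complete zip file (no end-of-central-directory record)");
  }
  return Utilities.unzip(response.getBlob());
}
```

If the log shows a response code other than 200, or a byte count far below the expected 280MB, the problem lies in the download rather than in Utilities.unzip.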
Besides the URL Fetch limits, Utilities.unzip(blob) has its own undisclosed limits.
I just tried to unzip a >200MB file that I got by using DriveApp.getFileById(id), and it returned:
El archivo contratacionesabiertas_bulk.csv.zip supera el tamaño de archivo máximo permitido.
(Translation by Google Translate)
The contratacionesabiertas_bulk.csv.zip file exceeds the maximum file size allowed.