Faster digest (SHA-256) calculation with Google Apps Script

I'm looking for a faster SHA-256 hash calculation.
My current implementation looks like this, where url points to a file uploaded to the Telegram servers:
var response = UrlFetchApp.fetch(url);
var fileBytes = response.getContent(); // getContent() returns the raw bytes (byte[])
var digest = Utilities.computeDigest(Utilities.DigestAlgorithm.SHA_256, fileBytes);
var hexstr = digest.map(byte => ('0' + (byte & 0xFF).toString(16)).slice(-2)).join('');
return hexstr;
which takes 4-5 seconds for a 2 MB file.
As a performance comparison, the Telegram bot #filehashing_bot calculates hashes much faster than my implementation.
How can I improve my implementation? What are the possible approaches?
I was reading about crypto.subtle.digest(), and an implementation I wrote for local files runs much faster:
async function myFunction() {
  const finput = document.getElementById('fileinput');
  const file = finput.files[0];
  const arrayBuffer = await file.arrayBuffer();
  const hashBuffer = await crypto.subtle.digest('SHA-256', arrayBuffer); // hash the file contents
  console.log(hashBuffer);
  const hashArray = Array.from(new Uint8Array(hashBuffer)); // convert buffer to byte array
  const hashHex = hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
  document.getElementById("hash").innerHTML = hashHex;
}
Can it be used in Apps Script?
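Worth noting: crypto.subtle is a browser API, so it is not available in the server-side Apps Script runtime itself; it can only run in client-side HTML served through HtmlService. A rough sketch of that route follows (the dialog wiring and function names are illustrative assumptions, and it assumes a container-bound script):

// Server side: show a dialog whose client-side JS can use crypto.subtle.
function showHashDialog() {
  var html = HtmlService.createHtmlOutput(`
    <input type="file" id="fileinput" onchange="hashFile()">
    <script>
      async function hashFile() {
        const file = document.getElementById('fileinput').files[0];
        const buf = await file.arrayBuffer();
        const digest = await crypto.subtle.digest('SHA-256', buf); // runs in the browser
        const hex = Array.from(new Uint8Array(digest))
          .map(b => b.toString(16).padStart(2, '0')).join('');
        google.script.run.onHashComputed(hex); // hand the result back to the server
      }
    </script>`);
  SpreadsheetApp.getUi().showModalDialog(html, 'Compute SHA-256');
}

// Server side: receives the hex digest computed in the browser.
function onHashComputed(hex) {
  console.log(hex);
}

This only helps when the bytes are available in the browser, though; a file fetched with UrlFetchApp never leaves the server, where Utilities.computeDigest remains the available option.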

Related

Convert a Google Doc into an image

Is there a way to create an image (e.g. a PNG) from a Google Document?
I really mean an image, not just a PDF. getAs() only creates PDFs, and returns an error if contentType is set to image/png or an equivalent format.
My (actually trivial) code is:
function convertFile() {
  var SOURCE_TEMPLATE = "1HvqYidpUpihzo_HDAQ3zE5ScMVsHG9NNlwPkN80GHK0";
  var TARGET_FOLDER = "1Eue-3tJpE8sBML0qo6Z25G0D_uuXZjHZ";
  var source = DriveApp.getFileById(SOURCE_TEMPLATE);
  var targetFolder = DriveApp.getFolderById(TARGET_FOLDER);
  var target = source.makeCopy(source, targetFolder);
  var newFile = DriveApp.createFile(target.getAs('image/png'));
}
When I run this code, I get the following error (my translation):
The conversion from application/vnd.google-apps.document to image/png is not supported.
Thank you.
How about this answer?
Reason for the error:
makeCopy() returns a File object, and getAs() cannot be used for this conversion. This is why the error occurs.
Workaround:
Unfortunately, at the current stage, a Google Document cannot be directly exported as a PNG image, so a workaround is required. A Google Document can be converted to PDF, and this answer uses that fact. As a workaround, I would like to propose using an external API, ConvertAPI; I thought that using an external API keeps the script simple. Its PDF to PNG API converts PDF data to PNG data.
You can test this using the "Free Package": please sign up for the "Free Package" and retrieve your secret key.
Sample script:
Before you run this script, please retrieve your Secret key and set it.
function convertFile() {
  var secretkey = "###"; // Please set your secret key.
  var SOURCE_TEMPLATE = "1HvqYidpUpihzo_HDAQ3zE5ScMVsHG9NNlwPkN80GHK0";
  var TARGET_FOLDER = "1Eue-3tJpE8sBML0qo6Z25G0D_uuXZjHZ";
  var url = "https://v2.convertapi.com/convert/pdf/to/png?Secret=" + secretkey;
  var options = {
    method: "post",
    payload: { File: DriveApp.getFileById(SOURCE_TEMPLATE).getBlob() },
  };
  var res = UrlFetchApp.fetch(url, options);
  res = JSON.parse(res.getContentText());
  res.Files.forEach(function(e) {
    var blob = Utilities.newBlob(Utilities.base64Decode(e.FileData), "image/png", e.FileName);
    DriveApp.getFolderById(TARGET_FOLDER).createFile(blob);
  });
}
References:
makeCopy()
getAs()
ConvertAPI
PDF to PNG API of ConvertAPI
Updated on January 11, 2023:
At the current stage, Google Apps Script can use the V8 runtime. This means some JavaScript libraries can now be used with Google Apps Script (Ref1, Ref2). For this question, at the current stage, all pages in a PDF file can be converted to PNG images with Google Apps Script by using pdf-lib. The sample script is as follows.
Sample script:
This method uses Drive API. Please enable Drive API at Advanced Google services.
Please set SOURCE_TEMPLATE and TARGET_FOLDER, and run myFunction().
/**
 * This is a method for converting all pages in a PDF file to PNG images.
 * PNG images are returned as BlobSource[].
 * IMPORTANT: This method uses Drive API. Please enable Drive API at Advanced Google services.
 *
 * @param {Blob} blob Blob of PDF file.
 * @return {BlobSource[]} PNG blobs.
 */
async function convertPDFToPNG_(blob) {
  // Convert PDF to PNG images.
  const cdnjs = "https://cdn.jsdelivr.net/npm/pdf-lib/dist/pdf-lib.min.js";
  eval(UrlFetchApp.fetch(cdnjs).getContentText()); // Load pdf-lib.
  const setTimeout = function (f, t) { // Overwrite setTimeout for Google Apps Script.
    Utilities.sleep(t);
    return f();
  };
  const data = new Uint8Array(blob.getBytes());
  const pdfData = await PDFLib.PDFDocument.load(data);
  const pageLength = pdfData.getPageCount();
  console.log(`Total pages: ${pageLength}`);
  const obj = { imageBlobs: [], fileIds: [] };
  for (let i = 0; i < pageLength; i++) {
    console.log(`Processing page: ${i + 1}`);
    const pdfDoc = await PDFLib.PDFDocument.create();
    const [page] = await pdfDoc.copyPages(pdfData, [i]);
    pdfDoc.addPage(page);
    const bytes = await pdfDoc.save();
    const pageBlob = Utilities.newBlob([...new Int8Array(bytes)], MimeType.PDF, `sample${i + 1}.pdf`);
    const id = DriveApp.createFile(pageBlob).getId();
    Utilities.sleep(3000); // This wait gives Drive time to prepare the thumbnail of the created file.
    const link = Drive.Files.get(id, { fields: "thumbnailLink" }).thumbnailLink;
    if (!link) {
      throw new Error("In this case, please increase the value of 3000 in Utilities.sleep(3000), and test it again.");
    }
    const imageBlob = UrlFetchApp.fetch(link.replace(/=s\d*/, "=s1000")).getBlob().setName(`page${i + 1}.png`);
    obj.imageBlobs.push(imageBlob);
    obj.fileIds.push(id);
  }
  obj.fileIds.forEach(id => DriveApp.getFileById(id).setTrashed(true));
  return obj.imageBlobs;
}

// Please run this function.
async function myFunction() {
  const SOURCE_TEMPLATE = "1HvqYidpUpihzo_HDAQ3zE5ScMVsHG9NNlwPkN80GHK0";
  const TARGET_FOLDER = "1Eue-3tJpE8sBML0qo6Z25G0D_uuXZjHZ";
  // Use the method above for converting all pages in a PDF file to PNG images.
  const blob = DriveApp.getFileById(SOURCE_TEMPLATE).getBlob();
  const imageBlobs = await convertPDFToPNG_(blob);
  // As a sample, save the PNG images as files.
  const folder = DriveApp.getFolderById(TARGET_FOLDER);
  imageBlobs.forEach(b => folder.createFile(b));
}
When this script is run, all pages of the input PDF file are converted to PNG images, and the images are created in the destination folder.
Note:
I think that the above script works as-is. However, if you directly copy and paste the JavaScript retrieved from https://cdn.jsdelivr.net/npm/pdf-lib/dist/pdf-lib.min.js into your Google Apps Script project, the processing cost of loading it can be reduced.
References:
pdf-lib
copyPages of pdf-lib
addPage of pdf-lib
I know this is an older question, but I thought I'd answer, since I believe I've found a solution that doesn't involve paying for a third-party subscription.
This can be accomplished by accessing the thumbnail of the Doc and creating a new PNG file from that thumbnail. Try this:
function convertFile() {
  var SOURCE_TEMPLATE = "1HvqYidpUpihzo_HDAQ3zE5ScMVsHG9NNlwPkN80GHK0";
  var TARGET_FOLDER = "1Eue-3tJpE8sBML0qo6Z25G0D_uuXZjHZ";
  var source = DriveApp.getFileById(SOURCE_TEMPLATE).getThumbnail().getAs('image/png');
  var targetFolder = DriveApp.getFolderById(TARGET_FOLDER);
  targetFolder.createFile(source);
}
However, I've found that getting the thumbnail of the Doc is not as high quality as getting the thumbnail of a PDF created from the Doc. You can try the code below to compare which version of the new PNG you prefer.
You will also need to enable Advanced Services on your project, specifically the Drive API service. Follow these instructions to add a new service to your Google Apps Script project:
Open the Apps Script project.
At the left, click Editor < >.
At the left, next to Services, click Add a service +.
Select Drive API and click Add.
Once you do that, you'll be able to use the Drive service in your script, which is different from DriveApp. Note also the change to source.makeCopy(), which now takes only the target folder:
function convertFile() {
  var SOURCE_TEMPLATE = "1HvqYidpUpihzo_HDAQ3zE5ScMVsHG9NNlwPkN80GHK0";
  var TARGET_FOLDER = "1Eue-3tJpE8sBML0qo6Z25G0D_uuXZjHZ";
  var source = DriveApp.getFileById(SOURCE_TEMPLATE);
  var targetFolder = DriveApp.getFolderById(TARGET_FOLDER);
  var target = source.makeCopy(targetFolder);
  var pdfBlob = target.getAs(MimeType.PDF);
  var newPDF = targetFolder.createFile(pdfBlob).setName('Some Name.pdf');
  var newId = newPDF.getId();
  Drive.Files.update({
    title: newPDF.getName(), mimeType: MimeType.PDF
  }, newId, pdfBlob);
  var newFile = DriveApp.getFileById(newId).getThumbnail().getAs('image/png');
  targetFolder.createFile(newFile);
  target.setTrashed(true);
  newPDF.setTrashed(true);
}
This code will create a copy of your Google Doc file, convert it to a PDF, then grab the thumbnail of the PDF as a PNG, and then delete the copy of the Doc file and the PDF that were created.
The Drive.Files.update() function is the critical part of this code, as it finalizes the creation of the PDF file in your Drive. Without that portion, the thumbnail request just returns null, since the new PDF hasn't completely finished being created at that point.
Hope this helps!

IMPORTJSON in a Google Sheet sometimes not getting data

I have created a sheet to track my crypto holdings. I use this IMPORTJSON function I found on YouTube (I have changed the help text for myself):
/**
 * Imports JSON data to your spreadsheet. Ex: IMPORTJSON("https://api.coinmarketcap.com/v2/ticker/1/?convert=EUR","data/quotes/EUR/price")
 * @param url URL of your JSON data as string
 * @param xpath simplified xpath as string
 * @customfunction
 */
function IMPORTJSON(url, xpath) {
  try {
    // e.g. xpath: /rates/EUR
    var res = UrlFetchApp.fetch(url);
    var content = res.getContentText();
    var json = JSON.parse(content);
    var patharray = xpath.split("/");
    //Logger.log(patharray);
    for (var i = 0; i < patharray.length; i++) {
      json = json[patharray[i]];
    }
    //Logger.log(typeof(json));
    if (typeof json === "undefined") {
      return "Node Not Available";
    } else if (typeof json === "object") {
      var tempArr = [];
      for (var obj in json) {
        tempArr.push([obj, json[obj]]);
      }
      return tempArr;
    } else {
      return json;
    }
  } catch (err) {
    return "Error getting data";
  }
}
I use this function to read data from an API.
This is a piece of my script:
var btc_eur = IMPORTJSON("https://api.coinmarketcap.com/v2/ticker/1/?convert=EUR","data/quotes/EUR/price");
var btc_btc = IMPORTJSON("https://api.coinmarketcap.com/v2/ticker/1/?convert=BTC","data/quotes/BTC/price");
ss.getRange("B2").setValue([btc_eur]);
ss.getRange("H2").setValue([btc_btc]);
var bhc_eur = IMPORTJSON("https://api.coinmarketcap.com/v2/ticker/1831/?convert=EUR","data/quotes/EUR/price");
var bhc_btc = IMPORTJSON("https://api.coinmarketcap.com/v2/ticker/1831/?convert=BTC","data/quotes/BTC/price");
ss.getRange("B3").setValue([bhc_eur]);
ss.getRange("H3").setValue([bhc_btc]);
For the last few days I have been getting "Error getting data" errors. When I run the script manually, it works.
I then tried this code I found here:
ImportJson
function IMPORTJSON(url, xpath) {
  var res = UrlFetchApp.fetch(url);
  var content = res.getContentText();
  var json = JSON.parse(content);
  var patharray = xpath.split("/");
  var result = [];
  for (var i in json[patharray[0]]) {
    result.push(json[patharray[0]][i][patharray[1]]);
  }
  return result;
}
But this gives an error: TypeError: Cannot read property "quotes" from null. What am I doing wrong?
The big problem is that your script calls the API at least 4 times. When a few users do the same, the Google servers call the API too many times.
The Coinmarketcap API has limited bandwidth. When any client reaches this limit, the API returns HTTP error 429. Google Apps Script runs on shared Google servers, which means many users look like a single client to the Coinmarketcap API.
When the API declines your request, your script fails – the error message matches this assumed cause (the xpath can't find the quotes component in an empty variable).
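A sketch of how that failure could be made visible instead of being swallowed by the catch block (muteHttpExceptions and getResponseCode() are standard UrlFetchApp features):

var res = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
if (res.getResponseCode() === 429) {
  return "Rate limited by the API (HTTP 429)"; // surfaced instead of a generic error
}
var json = JSON.parse(res.getContentText());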
This is ruthless behavior; please don't ruin the API with mass calls.
You can load the data from the API once and re-use it for every lookup in that data.
I have a similar spreadsheet automatically filled from the Coinmarketcap API; you can copy it for your own use:
Coins spreadsheet
Google Script on GitHub.
This script of mine strictly asks the API only once per run and reuses that single response for all queries.
Changes to your script
Also, you can make a few changes to your code to save resources:
Change the IMPORTJSON function from this:
function IMPORTJSON(url, xpath) {
  var res = UrlFetchApp.fetch(url);
  var content = res.getContentText();
  var json = JSON.parse(content);
  ...
to this:
function IMPORTJSON(json, xpath) {
...
and the runtime section of your code can be changed like this:
var res = UrlFetchApp.fetch("https://api.coinmarketcap.com/v2/ticker/1/?convert=EUR");
var content = res.getContentText();
var json = JSON.parse(content);
var btc_eur = IMPORTJSON(json,"data/quotes/EUR/price");
var btc_btc = IMPORTJSON(json,"data/quotes/BTC/price");
ss.getRange("B2").setValue([btc_eur]);
ss.getRange("H2").setValue([btc_btc]);
...
The main benefit: UrlFetchApp.fetch() is called only once.
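For completeness, here is a sketch of what the refactored function could look like (the path walking is carried over from the original function; the object-to-table branch from the first version is omitted for brevity):

// Sketch: IMPORTJSON no longer fetches; it only walks an already-parsed object.
function IMPORTJSON(json, xpath) {
  var patharray = xpath.split("/");
  for (var i = 0; i < patharray.length; i++) {
    json = json[patharray[i]];
  }
  return (typeof json === "undefined") ? "Node Not Available" : json;
}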
Yes, I know this code does not work 1:1 like yours. That's because it receives prices only in EUR and not in BTC. Fetching the comparison between BTC and BTC is unnecessary anyway, because it is always 1, and the other values you can compute mathematically from the EUR response – please don't abuse the API for such queries.
As Jakub said, the main issue is that all requests are counted as coming from the same Google server.
One solution, which I consider easier, is to put a proxy server in the middle. This can be done either by purchasing a server and setting it up (which is quite complex) or by using a service like Proxycrawl, which includes some free requests; beyond those, unless you run thousands of queries per month, it should cost you less than 1 USD per month.
To do that you just need to edit one line of the script:
var res = UrlFetchApp.fetch(url);
This line becomes:
var res = UrlFetchApp.fetch(`https://api.proxycrawl.com/?token=YOUR_TOKEN&url=${encodeURIComponent(url)}`);
Make sure to replace YOUR_TOKEN with your actual service token.
This simple change should keep the requests from failing, as each one will be sent from a different IP instead of all of them coming from Google.

Google Script: Run functions in sequence without exceeding execution time

I have a lot of functions that fetch JSON API data from a website, but if I run them in sequence this way, I get the "Exceeded maximum execution time" error:
function fetchdata() {
  data1();
  data2();
  data3();
  data4();
  ...
}
I can schedule triggers to run them 5 minutes apart (since a single one runs in about 3 minutes), but I would like to know if there is another way around this. Thank you.
EDIT:
Every "data" function is like this one:
function data1() {
  var addresses = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Import");
  var baseUrl = 'https://myapiurl';
  var address = addresses.getRange(2, 1, 500).getValues();
  for (var i = 0; i < address.length; i++) {
    var addrID = address[i][0];
    var url = baseUrl.concat(addrID);
    var responseAPI = UrlFetchApp.fetch(url);
    var json = JSON.parse(responseAPI.getContentText());
    var data = [[json.result]];
    addresses.getRange(i + 2, 2).setValue(data);
  }
}
data2 is for rows 502-1001,
data3 is for rows 1002-1501,
and so on...
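Since the dataN functions differ only in their row window, they could in principle be collapsed into one parameterized helper. A sketch based on the code above, keeping the same sheet layout (the helper name is an illustration, not the asker's code):

// Hypothetical helper: same logic as data1, with the starting row as a parameter.
function fetchRows(startRow) {
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Import");
  var baseUrl = 'https://myapiurl';
  var address = sheet.getRange(startRow, 1, 500).getValues();
  for (var i = 0; i < address.length; i++) {
    var responseAPI = UrlFetchApp.fetch(baseUrl + address[i][0]);
    var json = JSON.parse(responseAPI.getContentText());
    sheet.getRange(startRow + i, 2).setValue(json.result); // write the result next to its address
  }
}

function data1() { fetchRows(2); }
function data2() { fetchRows(502); }
// and so on for the remaining ranges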
I just removed the concat, because it has performance issues according to MDN, but obviously the real problem is the fetch, and there's not much we can do about that unless you can get your external API to return bigger batches.
You could initiate each function from a web app and then, via withSuccessHandler, start the next script in the series, daisy-chaining your way through all of the sub-functions until you're done (see the sketch after the code below). Each sub-function will still take its 3 minutes or so, but this way you only have to keep each individual sub-function under 6 minutes.
function data1() {
  var addresses = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Import");
  var baseUrl = 'https://myapiurl';
  var address = addresses.getRange(2, 1, 500).getValues();
  for (var i = 0; i < address.length; i++) {
    var responseAPI = UrlFetchApp.fetch(baseUrl + address[i][0]);
    var json = JSON.parse(responseAPI.getContentText());
    var data = [[json.result]];
    addresses.getRange(i + 2, 2).setValue(data);
  }
}
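A rough sketch of that daisy chain (the web app wiring below is an assumed illustration, not tested code): the page runs each server function in turn, starting the next one only after the previous success callback fires, so each sub-function gets its own execution-time budget.

// Hypothetical web app entry point for chaining data1..data4.
function doGet() {
  return HtmlService.createHtmlOutput(
    '<div id="status">Starting...</div>' +
    '<script>' +
    'var fns = ["data1", "data2", "data3", "data4"];' +
    'function runNext(i) {' +
    '  if (i >= fns.length) { document.getElementById("status").textContent = "Done"; return; }' +
    '  document.getElementById("status").textContent = "Running " + fns[i];' +
    '  google.script.run.withSuccessHandler(function() { runNext(i + 1); })[fns[i]]();' +
    '}' +
    'runNext(0);' +
    '</script>');
}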

Reading/writing large files with PapaParse/BabyParse

I have a large CSV file (~500 MB) that I want to convert to JSON using BabyParse (the Node version of PapaParse). With smaller files I can read the CSV into a string and then pass the string to parse(). However, a 500 MB file is too big to be read into a string this way.
I have a workaround that reads the CSV file as a stream line by line, but it's horrendously slow (see below). Can someone tell me a faster way to work with large CSV files in Papa/BabyParse?
var Baby = require('babyparse');
var fs = require('fs');
var readline = require('readline');
var stream = require('stream');

var file = '500mbbigtest.csv';
//var content = fs.readFileSync(file, { encoding: 'binary' }); // DOESN'T WORK for a 500 MB file

var instream = fs.createReadStream(file);
var outstream = new stream();
var rl = readline.createInterface(instream, outstream);

rl.on('line', function(line) {
  var parsed = Baby.parse(line, { fastMode: false }); // parse each line individually
  var rows = JSON.stringify(parsed.data);
  fs.appendFileSync("blahblahblah.json", rows); // one synchronous write per line
});
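For reference, a sketch of a streaming approach, assuming PapaParse 5+ (which absorbed BabyParse and can consume a Node read stream directly); this avoids both the per-line parse overhead and the synchronous append per line:

const fs = require('fs');
const Papa = require('papaparse'); // BabyParse was merged back into PapaParse

const out = fs.createWriteStream('output.json');
let first = true;
out.write('[');

Papa.parse(fs.createReadStream('500mbbigtest.csv'), {
  chunk: function (results) {
    // Called once per parsed chunk (many rows at a time), not once per line.
    for (const row of results.data) {
      out.write((first ? '' : ',') + JSON.stringify(row));
      first = false;
    }
  },
  complete: function () {
    out.write(']');
    out.end();
  }
});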

What alternative to ScriptDB could I use to store a big array of arrays? (without using an external DB)

I was a user of the deprecated ScriptDB. The use I made of ScriptDB was fairly simple: to store a certain amount of information contained in a panel's options, this way:
var db = ScriptDb.getMyDb();

function showList(folderID) {
  var folder = DocsList.getFolderById(folderID);
  var files = folder.getFiles();
  var arrayList = [];
  for (var file in files) {
    file = files[file];
    var thesesName = file.getName();
    var thesesId = file.getId();
    var thesesDoc = DocumentApp.openById(thesesId);
    for (var child = 0; child < thesesDoc.getNumChildren(); child++) {
      var thesesFirstParagraph = thesesDoc.getChild(child);
      var thesesType = thesesFirstParagraph.getText();
      if (thesesType != '') {
        var newArray = [thesesName, thesesType, thesesId];
        arrayList.push(newArray);
        break;
      }
    }
  }
  arrayList.sort();
  var result = db.query({arrayName: 'savedArray'});
  if (result.hasNext()) {
    var savedArray = result.next();
    savedArray.arrayValue = arrayList;
    db.save(savedArray);
  } else {
    var record = db.save({arrayName: "savedArray", arrayValue: arrayList});
  }
  var mydoc = SpreadsheetApp.getActiveSpreadsheet();
  var app = UiApp.createApplication().setWidth(550).setHeight(450);
  var panel = app.createVerticalPanel().setId('panel');
  var label = app.createLabel("Choose the options").setStyleAttribute("fontSize", 18);
  app.add(label);
  panel.add(app.createHidden('checkbox_total', arrayList.length));
  for (var i = 0; i < arrayList.length; i++) {
    var checkbox = app.createCheckBox().setName('checkbox_isChecked_' + i).setText(arrayList[i][0]);
    panel.add(checkbox);
  }
  var handler = app.createServerHandler('submit').addCallbackElement(panel);
  panel.add(app.createButton('Submit', handler));
  var scroll = app.createScrollPanel().setPixelSize(500, 400);
  scroll.add(panel);
  app.add(scroll);
  mydoc.show(app);
}

function include(arr, obj) {
  for (var i = 0; i < arr.length; i++) {
    if (arr[i] == obj) return true; // if we find a match, return true
  }
  return false; // if we got here, there was no match
}

function submit(e) {
  var scriptDbObject = db.query({arrayName: "savedArray"});
  var result = scriptDbObject.next();
  var arrayList = result.arrayValue;
  db.remove(result);
  // continues...
}
I thought I could simply replace ScriptDB with userProperties (using JSON to turn the array into a string). However, an error warns me that my piece of information is too large to be stored in userProperties.
I did not want to use an external database (Parse or MongoDB), because I don't think that's necessary for my (simple) purpose.
So, what could I use as a replacement for ScriptDB?
You could store a string using the HtmlOutput Class.
var output = HtmlService.createHtmlOutput('<b>Hello, world!</b>');
output.append('<p>Hello again, world.</p>');
Logger.log(output.getContent());
Google Documentation - HtmlOutput
There are methods to append, clear and get the content out of the HtmlOutput object.
OR
Maybe create a Blob:
Google Documentation - Utilities Class - newBlob Method
Then you can get the data back out of the blob as a string, using getDataAsString().
If you need to, you can then convert the string back to an object, as long as it's in valid JSON format.
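A minimal sketch of that round trip (note that a Blob lives only in memory for the current execution; to persist between runs it would still need to be written somewhere, such as a Drive file):

// Serialize the array into a blob, then read it back as a string.
var blob = Utilities.newBlob(JSON.stringify(arrayList), 'application/json', 'savedArray.json');
var restored = JSON.parse(blob.getDataAsString()); // back to an array of arrays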
Firstly, if you're hitting the limits of the Properties service, I would recommend you look at an alternative external store, as you're manipulating a large amount of data, and any workaround given here will likely be slower and less efficient than simply using a dedicated service.
Alternatively, of course, you could bring your data under the limits of the Properties service by splitting it up across multiple properties, as sketched below.
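A sketch of that splitting idea, using the modern PropertiesService (the key scheme and the 8000-character chunk size are arbitrary choices; each value must stay under the per-property quota, and the store as a whole has its own quota):

// Store a long JSON string across several user properties.
function saveChunked(key, str) {
  var props = PropertiesService.getUserProperties();
  var size = 8000; // keep each chunk under the per-value limit
  var n = Math.ceil(str.length / size);
  props.setProperty(key + '_count', String(n));
  for (var i = 0; i < n; i++) {
    props.setProperty(key + '_' + i, str.substr(i * size, size));
  }
}

// Reassemble the string from its chunks.
function loadChunked(key) {
  var props = PropertiesService.getUserProperties();
  var n = Number(props.getProperty(key + '_count'));
  var parts = [];
  for (var i = 0; i < n; i++) {
    parts.push(props.getProperty(key + '_' + i));
  }
  return parts.join('');
}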
One other alternative would be to use a Google Doc or Sheet to store the string. When you need to pull the data again, you can simply access the sheet and get the string, though this might be slow depending on the size of the string. At a glance it looks like you're just pulling data on the folders in your Drive, so you could consider writing it to a sheet, which would even let you display the information in a user-friendly way. Given that you already use arrays, you can write them to a sheet easily using .setValues() if you convert them to a 2D array, as sketched below.
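A sketch of that approach using the arrayList from the question (the 'storage' sheet name is an arbitrary choice; arrayList is already a 2D array of [name, type, id] rows, so no conversion is needed):

// Write the array of rows to a storage sheet.
function saveToSheet(arrayList) {
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('storage');
  sheet.clearContents();
  if (arrayList.length > 0) {
    sheet.getRange(1, 1, arrayList.length, arrayList[0].length).setValues(arrayList);
  }
}

// Read it back later.
function loadFromSheet() {
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('storage');
  return sheet.getDataRange().getValues();
}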
Bruce McPherson has done a lot of work on abstracting databases. Take a look at his cDbAbstraction library; then you could easily chop and change which DB you use and compare performance. Maybe even create a cDbAbstraction library to use HtmlOutput (I like that idea, Sandy; Bruce does some funky stuff with parallel processes via HtmlService).