Import and parse an HTML file from Drive to Sheets

I have a bunch of HTML files in Google Drive, and I need to extract the tables from them and put them into Google Sheets.
So far I have found the IMPORTHTML function, but it does not work with a Drive link.
How can I import and parse HTML files from my Drive? Thank you

You want to put the values of the tables in the HTML data into a spreadsheet using Google Apps Script and/or the built-in functions of Spreadsheet.
The HTML files are put in your Google Drive.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
Pattern 1:
In this pattern, a Web App returns the tables of the HTML file as XML, and IMPORTXML reads them.
Usage:
1. Copy and paste the following script into the script editor.
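function doGet(e) {
  var fileId = e.parameter.id; // File ID of the HTML file on Google Drive.
  var html = DriveApp.getFileById(fileId).getBlob().getDataAsString();
  // Extract every <table>...</table> block (non-greedy, case-insensitive); use [] when there is none.
  var tables = html.match(/<table[\s\S]+?<\/table>/gi) || [];
  // Wrap the tables in one root element so the output is a single XML document.
  var xml = "<sample>" + tables.join("") + "</sample>";
  return ContentService.createTextOutput(xml).setMimeType(ContentService.MimeType.XML);
}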
2. Deploy the Web App.
In the script editor, open the dialog via "Publish" -> "Deploy as web app".
Select "Me" for "Execute the app as:".
Select "Anyone, even anonymous" for "Who has access to the app:".
Click the "Deploy" button as a new "Project version".
The "Authorization required" dialog opens automatically.
Click "Review Permissions".
Select your own account.
Click "Advanced" at "This app isn't verified".
Click "Go to ### project name ###(unsafe)".
Click the "Allow" button.
Click "OK".
Copy the Web App URL. It looks like https://script.google.com/macros/s/###/exec.
Whenever you modify the Google Apps Script, redeploy it as a new version; otherwise the Web App keeps running the previous code. Please be careful about this.
3. Enter the formula.
Put the following formula in a cell.
=IMPORTXML("https://script.google.com/macros/s/###/exec?id=###fileId###","//tr")
###fileId### is the file ID of the HTML file on Google Drive.
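The extraction step in doGet does not use any Apps Script service, so it can be tried locally (for example in Node.js) before deploying. This is a minimal sketch with a hypothetical helper name, extractTablesAsXml, assuming simple, non-nested tables: a nested table would be cut off at its first closing tag.

```javascript
// Hypothetical helper mirroring the extraction step of doGet above.
// Plain JavaScript only, so it also runs outside Apps Script.
function extractTablesAsXml(html) {
  // Non-greedy match of each <table ...>...</table> block, case-insensitive.
  // Assumption: tables are not nested (a nested table ends the match early).
  const tables = html.match(/<table[\s\S]+?<\/table>/gi) || [];
  // Wrap all tables in one root element so the result is a single XML document.
  return "<sample>" + tables.join("") + "</sample>";
}

const sample = "<html><body><p>intro</p>" +
  "<table><tr><td>a</td><td>1</td></tr></table>" +
  "<TABLE><tr><td>b</td><td>2</td></tr></TABLE></body></html>";
console.log(extractTablesAsXml(sample));
// → <sample><table><tr><td>a</td><td>1</td></tr></table><TABLE><tr><td>b</td><td>2</td></tr></TABLE></sample>
```

The "//tr" XPath in the formula then selects every row of every table under that root.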
Pattern 2:
In this pattern, the tables are extracted from the HTML data and pasted into the Spreadsheet with the pasteData request of the Sheets API.
Usage:
1. Copy and paste the following script into the script editor.
Please set the variables fileId, spreadsheetId and sheetName.
function myFunction() {
  var fileId = "###"; // Please set the file ID of HTML file.
  var spreadsheetId = "###"; // Please set the Spreadsheet ID for putting the values.
  var sheetName = "Sheet1"; // Please set the sheet name for putting the values.
  // Retrieve tables from HTML data.
  var html = DriveApp.getFileById(fileId).getBlob().getDataAsString();
  var values = html.match(/<table[\s\S]+?<\/table>/gi);
  if (!values) return; // No <table> found in the file.
  // Put the HTML tables to the Spreadsheet.
  var ss = SpreadsheetApp.openById(spreadsheetId);
  var sheet = ss.getSheetByName(sheetName);
  var sheetId = sheet.getSheetId();
  var rowIndex = 0; // 0-based row where the next table is pasted.
  values.forEach(function(e) {
    var resource = {requests: [{pasteData: {html: true, data: e, coordinate: {sheetId: sheetId, rowIndex: rowIndex}}}]};
    Sheets.Spreadsheets.batchUpdate(resource, spreadsheetId);
    // getLastRow() is 1-based, so it equals the 0-based index of the next empty row.
    rowIndex = sheet.getLastRow();
  });
}
2. Enable the Sheets API.
Please enable the Sheets API under Advanced Google services.
3. Run the script.
When you run myFunction, the tables are read from the HTML data and pasted into the sheet, one below the other.
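As an alternative to calling batchUpdate once per table and re-reading getLastRow(), all pasteData requests could be built up front and sent in one call. This is a minimal sketch of that variation (not the method above), with a hypothetical helper name, buildPasteRequests, and the assumption that every <tr> becomes exactly one sheet row (no rowspan, no nested tables).

```javascript
// Hypothetical helper: builds all pasteData requests so they can be sent in a
// single Sheets.Spreadsheets.batchUpdate call.
// Assumption: each <tr> becomes exactly one sheet row.
function buildPasteRequests(tables, sheetId, startRowIndex) {
  let rowIndex = startRowIndex; // 0-based row where the next table is pasted
  return tables.map(function (table) {
    const request = {
      pasteData: {
        html: true,
        data: table,
        coordinate: { sheetId: sheetId, rowIndex: rowIndex, columnIndex: 0 },
      },
    };
    // Count "<tr>" / "<tr ...>" tags to know where the next table starts.
    const rowCount = (table.match(/<tr[\s>]/gi) || []).length;
    rowIndex += rowCount;
    return request;
  });
}

const tables = [
  "<table><tr><td>a</td></tr><tr><td>b</td></tr></table>",
  "<table><tr><td>c</td></tr></table>",
];
const resource = { requests: buildPasteRequests(tables, 0, 0) };
console.log(JSON.stringify(resource, null, 2));
// In Apps Script the resource could then be sent as:
// Sheets.Spreadsheets.batchUpdate(resource, spreadsheetId);
```

If the row-count assumption does not hold for your HTML, the per-table getLastRow() approach in the script above is the safer choice.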
Note:
These are simple sample scripts, so please modify them for your actual situation.
References:
Web Apps
Taking advantage of Web Apps with Google Apps Script
Advanced Google services
spreadsheets.batchUpdate
Unfortunately, I could not see your actual HTML data from your question. So if an error occurs or this is not the direction you want, I apologize.

Related

How to generate a Google Doc with an embedded Apps Script, using Apps Script?

I'm interested in generating a Google Document with a particular Apps Script from my library.
I have been creating the documents with what is basically Jeff Everhart's code, as seen here.
Is it possible to change the code so that the new Google Docs have an embedded Apps Script right from the start?
My reduced code:
function createNewGoogleDocs() {
  const googleDocTemplate = DriveApp.getFileById('The template ID goes here');
  const destinationFolder = DriveApp.getFolderById('The destination folder ID goes here');
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Sheet1');
  const rows = sheet.getDataRange().getValues();
  rows.forEach(function(row, index) {
    if (index === 0) return; // Skip the header row.
    if (row[rows[0].length - 1]) return; // Skip rows that already have a URL in the last column.
    const copy = googleDocTemplate.makeCopy(`First Column Name (${row[0]})`, destinationFolder);
    const doc = DocumentApp.openById(copy.getId());
    // Hopefully here we can add the Apps Script to the new Doc
    doc.saveAndClose();
    const url = doc.getUrl();
    sheet.getRange(index + 1, rows[0].length).setValue(url);
  });
}
If your goal is to create an Apps Script project bound to a Doc, you have to create it from that particular document. As the documentation says:
A script is bound to a Google Sheets, Docs, Slides, or Forms file if it was created from that document rather than as a standalone script. The file that a bound script is attached to is called a "container." Bound scripts generally behave like standalone scripts except that they do not appear in Google Drive, they cannot be detached from the file they are bound to, and they gain a few special privileges over the parent file.
If your end goal is to create a Doc with an embedded Apps Script from a different Apps Script project, you would first have to create that Doc and its bound script as a template, and then use makeCopy to copy it; the bound script is copied together with the Doc.
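With that approach, the loop in the question can stay as it is, as long as googleDocTemplate points to a template Doc that already has the script bound to it. The part of the loop that decides which rows get a copy is plain JavaScript, so it can be checked outside Apps Script. This is a minimal sketch with a hypothetical helper name, rowsNeedingCopies, using the same rules as the question's code (row 0 is the header; a non-empty last cell means a URL was already written).

```javascript
// Hypothetical helper mirroring the row checks in createNewGoogleDocs.
function rowsNeedingCopies(rows) {
  const lastCol = rows[0].length - 1;
  const result = [];
  rows.forEach(function (row, index) {
    if (index === 0) return;   // skip the header row
    if (row[lastCol]) return;  // a document URL is already present
    result.push({
      sheetRow: index + 1,     // 1-based row, as used with sheet.getRange()
      copyName: `First Column Name (${row[0]})`,
    });
  });
  return result;
}

const rows = [
  ["Name", "Doc URL"],
  ["Alice", ""],
  ["Bob", "https://docs.google.com/document/d/###/edit"],
  ["Carol", ""],
];
console.log(rowsNeedingCopies(rows));
```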
I am documenting this answer to help other people in this community; please feel free to leave a comment if you need further assistance.

How to access data in a different Google Spreadsheet through Google Apps Script for a user without editor access?

I have 2 spreadsheets; the 1st searches the data in the 2nd spreadsheet using a Google Apps Script function.
I want to keep the 2nd spreadsheet hidden from the user (no editor access), while still letting him/her search it, but only via the script function.
I'm using openByUrl to do this, but it won't let the user run openByUrl unless he/she has editor access to the 2nd spreadsheet.
How should I deal with this?
The function below is in the 1st spreadsheet; openByUrl points to the 2nd spreadsheet:
function onSearch(SN) {
  var ss = SpreadsheetApp.openByUrl('docs.google.com/spreadsheets/...');
  var sheets = ss.getSheets();
  // search for data in ss sheets . . .
  // return array of found data
}
I believe your goal is as follows.
You have 2 Google Spreadsheets "A" and "B".
You want to make the user retrieve values from Spreadsheet "B" with the script of Spreadsheet "A".
You don't want to share Spreadsheet "B" while Spreadsheet "A" is shared with the user.
In this case, as a workaround, how about accessing Spreadsheet "B" through a Web App? This way, the user can retrieve values from Spreadsheet "B" without the Spreadsheet being shared with them. Applied to your script, it becomes as follows.
Flow:
1. For Spreadsheet "B":
Please copy and paste the following script into the script editor of Spreadsheet "B".
function doGet(e) {
  var SN = e.parameter.sn; // You can use the value of SN here.
  var ss = SpreadsheetApp.getActiveSpreadsheet(); // Your 2nd Spreadsheet.
  var sheets = ss.getSheets();
  // search for data in ss sheets . . .
  // return array of found data
  var returnValues = ["sample"]; // Please replace this value with the result of your actual search.
  return ContentService.createTextOutput(JSON.stringify(returnValues));
}
2. Deploy Web Apps.
Please do this in the script editor of Spreadsheet "B". The detailed information can be seen in the official document.
At the top right of the script editor, click "Deploy" -> "New deployment".
Click "Select type" -> "Web App".
Enter the information about the Web App in the fields under "Deployment configuration".
Select "Me" for "Execute as".
This is the key point of this workaround: the script runs with your permissions, so the user does not need access to Spreadsheet "B".
Select "Anyone" for "Who has access".
With this setting, the user does not need an access token, so please use it as a test case.
Of course, you can also access your Web App using an access token. Please check this report.
Click the "Deploy" button.
Copy the URL of the Web App. It looks like https://script.google.com/macros/s/###/exec.
When you modify the Google Apps Script, please modify the deployment as a new version; otherwise the changes are not reflected in the Web App. Please be careful about this.
You can see the details of this in the report "Redeploying Web Apps without Changing URL of Web Apps for new IDE".
3. For Spreadsheet "A":
Please copy and paste the following script into the script editor of Spreadsheet "A", and set your Web App URL. Call onSearch with the search value as SN.
function onSearch(SN) {
  var url = "https://script.google.com/macros/s/###/exec"; // Please set your Web Apps URL here.
  // Encode SN so that characters such as "&", "#" or spaces do not break the query string.
  var res = UrlFetchApp.fetch(`${url}?sn=${encodeURIComponent(SN)}`);
  var ar = JSON.parse(res.getContentText()); // This is the returned value from Spreadsheet "B".
  // do something.
}
4. Testing:
When you run onSearch, the value of SN is sent to the Web App of Spreadsheet "B", the script of Spreadsheet "B" runs with it, and the result values are returned. With this flow, you can retrieve the values from Spreadsheet "B" without sharing Spreadsheet "B" with the user.
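The request URL and the JSON round trip can be tried in plain JavaScript. This is a minimal sketch with a hypothetical helper name, buildSearchUrl; it shows why SN is passed through encodeURIComponent, since a raw "&" or "#" in the value would otherwise cut the query string short.

```javascript
// Hypothetical helper: builds the Web App request URL used by onSearch.
function buildSearchUrl(webAppUrl, sn) {
  return `${webAppUrl}?sn=${encodeURIComponent(sn)}`;
}

const url = buildSearchUrl("https://script.google.com/macros/s/###/exec", "A&B #1");
console.log(url);
// → https://script.google.com/macros/s/###/exec?sn=A%26B%20%231

// doGet in Spreadsheet "B" sends JSON.stringify(returnValues);
// onSearch in Spreadsheet "A" parses it back with JSON.parse.
const body = JSON.stringify(["sample"]);
console.log(JSON.parse(body)[0]);
// → sample
```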
Note:
In this sample script, if you run onSearch directly from the script editor, SN is undefined. So please be careful about this.
My proposed script is a simple sample, so please modify it for your actual situation.
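The search itself is left as a placeholder in doGet. One possible shape for it is a plain function over the values of each sheet (for example, ss.getSheets().map(s => s.getDataRange().getValues())). This is a minimal sketch with a hypothetical helper name, findRows, and an assumed matching rule: a row matches when any of its cells, as text, equals SN exactly. Please adjust the rule to your actual search.

```javascript
// Hypothetical search helper for doGet in Spreadsheet "B".
// valuesPerSheet: one 2D array per sheet, e.g. from getDataRange().getValues().
function findRows(valuesPerSheet, sn) {
  const found = [];
  valuesPerSheet.forEach(function (values, sheetIndex) {
    values.forEach(function (row, rowIndex) {
      // Assumed rule: exact text match on any cell of the row.
      if (row.some(function (cell) { return String(cell) === String(sn); })) {
        found.push({ sheetIndex: sheetIndex, row: rowIndex + 1, values: row });
      }
    });
  });
  return found; // JSON-serializable, so it can be returned via JSON.stringify
}

const data = [
  [["ID", "Name"], [101, "Alice"], [102, "Bob"]],
  [["ID", "City"], [102, "Osaka"]],
];
console.log(JSON.stringify(findRows(data, "102")));
// → [{"sheetIndex":0,"row":3,"values":[102,"Bob"]},{"sheetIndex":1,"row":2,"values":[102,"Osaka"]}]
```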
References:
Web Apps
Taking advantage of Web Apps with Google Apps Script

Using private sheets with tabletop.js

I've used tabletop.js [1] in the past and it is amazing! You can do pretty much anything you want with it.
The only problem I saw is that you need to publish your spreadsheets to the web, which of course is really risky if you are working with sensitive data.
I now need to use it in a project with sensitive data, so I was hoping someone could guide me on how to use it with spreadsheets that are not published to the web.
I've been searching for this for a long time without any success, but it seems that tabletop.js does support private sheets (here's the pull request that added this option [2]).
In fact, the documentation includes it [1]:
authkey
authkey is the authorization key for private sheet support.
Question: How am I supposed to use the authkey? Can someone provide me with an example so I can try it?
Thanks in advance!
[1] https://github.com/jsoma/tabletop
[2] https://github.com/jsoma/tabletop/pull/64
How about this answer?
Issue and workaround:
Judging from the request endpoint of "tabletop.js" (https://spreadsheets.google.com/feeds/list/###/###/private/values?alt=json), it seems that "tabletop.js" uses Sheets API v3. When authkey is used, oauth_token=authkey is added to the query parameters. Unfortunately, it seems that the private Spreadsheet cannot be accessed this way. So I think that, at the current stage, "tabletop.js" might not be able to use a private Spreadsheet, although I'm not sure whether this will be resolved in a future update. A Spreadsheet published to the web can, of course, still be accessed with this library.
So, in this answer, I would like to propose a workaround for retrieving the values from the Spreadsheet as a JSON object.
Pattern 1:
In this pattern, Google Apps Script is used. With Google Apps Script, the private Spreadsheet can be easily accessed.
Sample script:
To use this script, copy and paste it into the script editor and run the function myFunction.
function myFunction() {
  const spreadsheetId = "###"; // Please set the Spreadsheet ID.
  const sheetName = "Sheet1"; // Please set the sheet name.
  const sheet = SpreadsheetApp.openById(spreadsheetId).getSheetByName(sheetName);
  const values = sheet.getDataRange().getValues();
  const header = values.shift(); // The first row is used as the keys.
  const object = values.map(r => r.reduce((o, c, j) => Object.assign(o, {[header[j]]: c}), {}));
  console.log(object); // Here, you can see the JSON object from Spreadsheet.
}
I think that this might be the simplest way.
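The header-to-object conversion in myFunction does not depend on any Apps Script service, so it can be checked in plain JavaScript with a hard-coded array standing in for getDataRange().getValues(). This is a minimal sketch with a hypothetical helper name, rowsToObjects; unlike shift(), the destructuring used here leaves the input array unchanged.

```javascript
// Plain-JavaScript version of the conversion in myFunction: the first row is
// the header, and every other row becomes an object keyed by the header.
function rowsToObjects(values) {
  const [header, ...rows] = values; // does not mutate the input array
  return rows.map(r => r.reduce((o, c, j) => Object.assign(o, { [header[j]]: c }), {}));
}

const values = [
  ["name", "age"],
  ["Alice", 30],
  ["Bob", 25],
];
console.log(JSON.stringify(rowsToObjects(values)));
// → [{"name":"Alice","age":30},{"name":"Bob","age":25}]
```

If two header cells have the same text, the later column overwrites the earlier one in each object, so unique headers are assumed.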
Pattern 2:
In this pattern, a Web App created with Google Apps Script is used. Because the Web App runs Google Apps Script, it can easily access the private Spreadsheet. You can access the Web App from outside by logging in to your Google account, and the JSON object can be retrieved in HTML and JavaScript.
Usage:
Please do the following flow.
1. Create a new Google Apps Script project.
The sample script of the Web App is Google Apps Script, so please create a Google Apps Script project. In this case, the Web App is used as a wrapper around the Spreadsheet service.
If you want to create it directly, please open https://script.new/. If you are not logged in to Google, the login screen opens; after you log in, the Google Apps Script editor opens.
2. Prepare the script.
Please copy and paste the following script (Google Apps Script) into the script editor. This script is for the Web App.
Google Apps Script side: Code.gs
function doGet() {
  return HtmlService.createHtmlOutputFromFile("index");
}
function getObjectFromSpreadsheet(spreadsheetId, sheetName) {
  const sheet = SpreadsheetApp.openById(spreadsheetId).getSheetByName(sheetName);
  const values = sheet.getDataRange().getValues();
  const header = values.shift(); // The first row is used as the keys.
  const object = values.map(r => r.reduce((o, c, j) => Object.assign(o, {[header[j]]: c}), {}));
  return object;
}
HTML & JavaScript side: index.html
<script>
  const spreadsheetId = "###"; // Please set the Spreadsheet ID.
  const sheetName = "Sheet1"; // Please set the sheet name.
  // Call the server-side function; its return value is passed to sample().
  google.script.run.withSuccessHandler(sample).getObjectFromSpreadsheet(spreadsheetId, sheetName);
  function sample(object) {
    console.log(object);
  }
</script>
spreadsheetId and sheetName are passed from the JavaScript side to the Google Apps Script side. So in this case, getObjectFromSpreadsheet can be used instead of "tabletop.js".
3. Deploy Web Apps.
In the script editor, open the dialog via "Publish" -> "Deploy as web app".
Select "Me" for "Execute the app as:".
With this, the script runs as the owner.
Select "Only myself" for "Who has access to the app:".
With this setting, you have to log in to your Google account to access the Web App. I thought this might be useful in your situation.
Click the "Deploy" button as a new "Project version".
The "Authorization required" dialog opens automatically.
Click "Review Permissions".
Select your own account.
Click "Advanced" at "This app isn't verified".
Click "Go to ### project name ###(unsafe)".
Click the "Allow" button.
Click "OK".
Copy the Web App URL. It looks like https://script.google.com/macros/s/###/exec.
Whenever you modify the Google Apps Script, redeploy it as a new version; otherwise the Web App keeps running the previous code. Please be careful about this.
4. Run the function using the Web App.
You can test the above scripts as follows.
Log in to your Google account.
Open the Web App URL, like https://script.google.com/macros/s/###/exec, in your browser.
You can then see the retrieved JSON object in the browser console.
Note:
When you modify the script of the Web App, please redeploy it as a new version so that the latest script is reflected in the Web App. Please be careful about this.
References:
Web Apps
Taking advantage of Web Apps with Google Apps Script

Rename the Apps Script project when duplicating a spreadsheet

I'm duplicating spreadsheets from a template file with an attached Apps Script project. The basic code is below.
This works perfectly for the spreadsheets, but the name of the Apps Script project stays the same as in the template file. This is a problem, as I can't distinguish them anymore, and I will have hundreds of duplicates in the end.
Is there a way to set the Apps Script project name when duplicating?
Thank you in advance!
function copyTemplatev2(filename, sheetID) {
  var ss = SpreadsheetApp.openById(sheetID);
  // Make a copy of the template file
  var copy = DriveApp.getFileById(sheetID).makeCopy();
  var documentId = copy.getId();
  // Set permissions
  copy.setSharing(DriveApp.Access.ANYONE, DriveApp.Permission.EDIT);
  // Rename the copied file
  DriveApp.getFileById(documentId).setName(filename);
}
The "attached appsscript project" in your template file is the container-bound script of the Spreadsheet.
You want to rename the GAS project of the container-bound script in the copied Spreadsheet.
The Spreadsheet is used as the template, and the container-bound script is included in the Spreadsheet.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
Issue and workaround:
The container-bound script of a Google Docs file cannot be retrieved with the Files: list and Files: get methods of the Drive API. This has already been reported to the issue tracker.
However, the metadata of a container-bound script can be updated with the Files: update method of the Drive API.
In your case, the GAS project ID (the script ID) of the template does not change, because the project stays in the template Spreadsheet. I think this can be used to solve your issue.
Based on the above, I would like to propose the following flow.
Flow:
Set the container-bound script ID of the template Spreadsheet and its original project name.
Rename the GAS project of the template Spreadsheet to the new project name.
Copy the template Spreadsheet. The GAS project is copied along with it, under the new project name.
Rename the GAS project of the template Spreadsheet back to the original project name.
With this flow, the container-bound GAS project in the copied Spreadsheet has the new name.
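The ordering of the flow matters: the template's project must get its original name back even if copying fails. This is a minimal sketch of the rename, copy, restore sequence in plain JavaScript, with the Drive calls injected as functions so the order can be checked outside Apps Script. The names copyWithProjectName, renameProject and copyTemplate are hypothetical; in Apps Script, renameProject could wrap Drive.Files.update(...) and copyTemplate could wrap DriveApp.getFileById(sheetID).makeCopy().

```javascript
// Sketch of the rename -> copy -> restore flow with injected operations.
function copyWithProjectName(renameProject, copyTemplate, newName, originalName) {
  renameProject(newName);        // 1. template's bound project gets the new name
  try {
    return copyTemplate();       // 2. the copy inherits the new project name
  } finally {
    renameProject(originalName); // 3. restore the template's name, even on error
  }
}

const calls = [];
const result = copyWithProjectName(
  name => calls.push("rename:" + name),
  () => { calls.push("copy"); return "copyId"; },
  "newName",
  "originalName"
);
console.log(result, JSON.stringify(calls));
// → copyId ["rename:newName","copy","rename:originalName"]
```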
Applied to your script, the workaround becomes as follows.
Modified script:
Before you run the script, please enable the Drive API under Advanced Google services, and set the variables GASProjectId, originalGASProjectName and newGASProjectName.
function copyTemplatev2(filename, sheetID) {
  var GASProjectId = "###"; // Please set the container-bound script ID of the template Spreadsheet.
  var originalGASProjectName = "originalName"; // Please set the original project name of that script.
  var newGASProjectName = "newName"; // Please set the new GAS project name.
  // Rename to the new project name (Drive API v2 uses "title"; Drive API v3 uses "name").
  Drive.Files.update({title: newGASProjectName}, GASProjectId);
  try {
    // Make a copy of the template file; its bound project is copied with the new name.
    var copy = DriveApp.getFileById(sheetID).makeCopy();
    // Set permissions
    copy.setSharing(DriveApp.Access.ANYONE, DriveApp.Permission.EDIT);
    // Rename the copied file
    copy.setName(filename);
  } finally {
    // Rename back to the original project name, even if copying failed.
    Drive.Files.update({title: originalGASProjectName}, GASProjectId);
  }
}
References:
Advanced Google services
Files: update
If I misunderstood your question and this was not the direction you want, I apologize.

How to extract the pubhtml ID in google sheet using Apps Script?

I'm trying to get the web page link of my Google Sheet using Apps Script. When you go to File > Publish to the web > Publish, you will see a web link there. I want to get that link via Apps Script. Please help.
You want to retrieve the URL like https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml using Google Apps Script.
If my understanding is correct, how about this answer?
Issue and workaround:
File > Publish to the web > Publish can be used for Google Docs files. This URL could previously be retrieved with publishedLink of Drive API v2. But unfortunately, at the current stage, it cannot be retrieved with either Drive API v2 or v3. So as a workaround, I would like to propose building the URL from the file ID.
By the way, the 2PACX-### part of https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml is not the file ID, so please be careful about this.
Sample script:
As a test case, this script retrieves the published URL of the active Spreadsheet.
function myFunction() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  // getUrl() ends with "/edit"; replace only that segment, not an "edit" inside the file ID.
  var url = ss.getUrl().replace(/\/edit(?:[?#].*)?$/, "/pubhtml"); // or "/pub"
  Logger.log(url);
}
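The URL rewrite can be checked outside Apps Script. This is a minimal plain-JavaScript sketch with a hypothetical helper name, toPublishedUrl; the file IDs in the example are made up. Matching only a trailing "/edit" (optionally followed by a query string, as in alternateLink values such as ".../edit?usp=drivesdk") avoids altering a file ID that happens to contain the letters "edit", which a plain replace("edit", ...) would change.

```javascript
// Hypothetical helper: turns an editor URL into a "Publish to the web" URL.
function toPublishedUrl(editUrl, suffix) {
  // "/edit" must be the last path segment, optionally followed by ?query or #fragment.
  return editUrl.replace(/\/edit(?:[?#].*)?$/, "/" + suffix);
}

console.log(toPublishedUrl("https://docs.google.com/spreadsheets/d/editABC123/edit", "pubhtml"));
// → https://docs.google.com/spreadsheets/d/editABC123/pubhtml
console.log(toPublishedUrl("https://docs.google.com/document/d/xyz789/edit?usp=drivesdk", "pub"));
// → https://docs.google.com/document/d/xyz789/pub
```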
Note:
With the above script, if the Spreadsheet is not published to the web, the Spreadsheet is not shown when you access the URL. Please be careful about this.
If you want to retrieve the URL of a Google Document, you can use the following script. Unfortunately, for Google Documents, getUrl() returns https://docs.google.com/open?id=###, so the above script cannot be used.
function myFunction() {
  var doc = DocumentApp.getActiveDocument();
  var url = "https://docs.google.com/document/d/" + doc.getId() + "/pub";
  Logger.log(url);
}
References:
V2 to v3 reference
getUrl()
If I misunderstood your question and this was not the direction you want, I apologize.
Added:
If you can enable the Drive API under Advanced Google services, you can also use the following scripts.
Pattern 1:
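If the script is container-bound to a Google Docs file (Document, Spreadsheet or Slides), you can use this.
// alternateLink is a Drive API v2 property; in Drive API v3 the comparable field is webViewLink.
var url = Drive.Files.get((DocumentApp.getActiveDocument() || SpreadsheetApp.getActiveSpreadsheet() || SlidesApp.getActivePresentation()).getId()).alternateLink.replace(/\/edit(?:[?#].*)?$/, "/pub");
Logger.log(url);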
Pattern 2:
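If you have the file ID of a Google Docs file, you can use this.
// alternateLink is a Drive API v2 property; in Drive API v3 the comparable field is webViewLink.
var url = Drive.Files.get(fileId).alternateLink.replace(/\/edit(?:[?#].*)?$/, "/pub");
Logger.log(url);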