Creating a UrlFetchApp script to replace the Google Sheet importHTML function - google-apps-script

I have used the following formula for about a year, and it has suddenly stopped importing the table.
=IMPORTHTML("https://tradingeconomics.com/matrix";"table";1)
It gives me a "Could not fetch url: https://tradingeconomics.com/matrix" error.
I tried various things and one of the interesting findings was that the importHTML works for the cached version, but only in a new sheet under a different Google account. Furthermore, the cached version breaks randomly too.
Thus, it seems I won't get around using a script for this purpose.
Ideally, the script would expose a dedicated function, e.g. importHTMLtable, that takes the URL and the table number. It would then work for the calls I currently use, e.g.
=importHTMLtable("https://tradingeconomics.com/matrix";"table";1)
OR
=importHTMLtable("https://tradingeconomics.com/country-list/business-confidence?continent=world";"table";1)
OR
=importHTMLtable("https://tradingeconomics.com/country-list/ease-of-doing-business";"table";1)
etc...
I'm not sure whether this GitHub code solves the problem. It seems to only parse text?
I assume this is a fairly common problem for Google Sheets users, so there may already be an Apps Script that does exactly this, and it might import faster too.
I can't program, so I tried copying and pasting code to see whether I could get something to work. No luck :(
Can anyone provide code, or point to an existing Apps Script I'm not aware of, that does exactly this?

Try this way
=importTableHTML(A1,1)
with
function importTableHTML(url,n){
  // Fetch the page, then strip line breaks, tabs and runs of spaces
  var html = UrlFetchApp.fetch(url,{followRedirects : true,muteHttpExceptions: true}).getContentText().replace(/(\r\n|\n|\r|\t| {2,})/gm,"");
  const tables = [...html.matchAll(/<table[\s\S\w]+?<\/table>/g)];
  // n is the 1-based index of the table on the page
  var trs = [...tables[n-1][0].matchAll(/<tr[\s\S\w]+?<\/tr>/g)];
  var data = [];
  for (var i=0;i<trs.length;i++){
    var tds = [...trs[i][0].matchAll(/<(td|th)[\s\S\w]+?<\/(td|th)>/g)];
    var prov = [];
    for (var j=0;j<tds.length;j++){
      // Inner HTML of the cell: everything between the first ">" and the last "</"
      var donnee = tds[j][0].match(/(?<=\>).*(?=\<\/)/g)[0];
      prov.push(stripTags(donnee));
    }
    data.push(prov);
  }
  return data;
}
function stripTags(body) {
  var regex = /(<([^>]+)>)/ig;
  // Remove any remaining tags and decode non-breaking spaces
  return body.replace(regex,"").replace(/&nbsp;/g,' ').trim();
}
url-fetch-app#advanced-parameters
matchAll
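The parsing step in the answer above can be separated from the fetch, which makes it possible to try it on a plain HTML string outside of Sheets. The following is only a sketch of that idea: the names parseHtmlTable and cellText are hypothetical, it avoids lookbehind, and it returns a one-cell message instead of throwing when the requested table does not exist. Like the original, it relies on regular expressions and so will not cope with nested tables.

```javascript
/** Remove nested tags and decode non-breaking spaces in one cell. */
function cellText(cellHtml) {
  return cellHtml
    .replace(/<[^>]+>/g, '')  // drop any tags inside the cell
    .replace(/&nbsp;/g, ' ')  // the only entity the answer above handles
    .trim();
}

/**
 * Return table number n (1-based) of an HTML string as a 2D array,
 * or a one-cell message when the page has no such table.
 */
function parseHtmlTable(html, n) {
  const tables = [...html.matchAll(/<table\b[\s\S]*?<\/table>/gi)];
  if (n < 1 || n > tables.length) {
    return [['Table ' + n + ' not found (' + tables.length + ' tables in page)']];
  }
  const rows = [...tables[n - 1][0].matchAll(/<tr\b[\s\S]*?<\/tr>/gi)];
  return rows.map(function (row) {
    // Capture group 2 holds the inner HTML of each <td> or <th>.
    const cells = [...row[0].matchAll(/<(td|th)\b[^>]*>([\s\S]*?)<\/\1>/gi)];
    return cells.map(function (cell) { return cellText(cell[2]); });
  });
}
```

Inside Apps Script, a custom function could then return parseHtmlTable(UrlFetchApp.fetch(url, {muteHttpExceptions: true}).getContentText(), n).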

Related

Format imported table

I've been attempting to import a table of events from the following website:  https://scpgajt.bluegolf.com/bluegolf/scpgajt23/schedule/index.htm?type=2&display=champ (and others similar in structure).
I am attempting to reproduce the example website table on a Google Sheet where I would later add a check-box column and then select the events I need (which would copy the selection to another sheet for personalized planning).
So far, I have been able to use copied/pasted Apps Script coding found on Stack Overflow (see my Example Sheet HERE) and this =ImportTableHTML(A1,1) formula on the sheet to pull the table from the site into the sheet.
This Apps Script method has finally produced a complete list of events; however, the results are badly formatted (see Example Sheet 1 - Scrape Import / Raw). The result I am looking for should be formatted close to the original table's columns and rows, or filter and distribute the pulled data into specified cells (see Example Sheet 2 - Model Result).
This is the farthest I have been able to get, thanks to the scripts found on Stack Overflow, combining scripts posted in "Replacing =ImportHTML with URLFetchApp" and "Creating a UrlFetchApp script to replace the Google Sheet importHTML function".
Unfortunately, I cannot work out which parts of the script control how the results are formatted and distributed into the proper cells.
Is it possible to reproduce the table in my example sheet with proper or modifiable formatting?
The site I am attempting to capture table data from
The resulting import using =ImportTableHTML(A1,1)
The way the imported data should be parsed and distributed
App Script Code I am currently using:
function importTableHTML(url,n){
  var html = '<table' + UrlFetchApp.fetch(url, {muteHttpExceptions: true}).getContentText().replace(/(\r\n|\n|\r|\t| {2,})/gm,"").match(/(?<=\<table).*(?=\<\/table)/g) + '</table>';
  var trs = [...html.matchAll(/<tr[\s\S\w]+?<\/tr>/g)];
  var data = [];
  for (var i=0;i<trs.length;i++){
    var tds = [...trs[i][0].matchAll(/<(td|th)[\s\S\w]+?<\/(td|th)>/g)];
    var prov = [];
    for (var j=0;j<tds.length;j++){
      donnee=tds[j][0].match(/(?<=\>).*(?=\<\/)/g)[0];
      prov.push(stripTags(donnee));
    }
    data.push(prov);
  }
  return(data);
}
function stripTags(body) {
  var regex = /(<([^>]+)>)/ig;
  return body.replace(regex,"");
}

Long processing time when inserting data in google sheets through google scripts

I have just built my first google script for a dashboard I am building in google sheets. Essentially, this dashboard has cells which act as filters for calculations in other cells. I am trying to build two buttons, one which returns the filter-cells values to a default value and one which copies the values for the filter from another sheet in the same google spreadsheet.
Problem
The script takes between 2 and 120s to finish, for an action that takes about 20s when done manually, which essentially makes the button "useless" whenever it takes more than 30s. Sometimes the script is incredibly fast, but I would need it to be that fast consistently for it to be worth having. I have tried optimizing the code to the best of my ability (which is not much) based on other threads I have found, without success, and I was hoping to find a way to improve my code. Any suggestions?
Code
Default button
function Filters() {
  var spreadsheet = SpreadsheetApp.getActiveSheet();
  range_apply_1 = spreadsheet.getRange('C2:C8'),
  range_apply_2 = spreadsheet.getRange('E4:E6');
  var values_1 = SpreadsheetApp.getActiveSheet().getRange('\'Aux AM\'!A11:A17').getValues()
  values_2 = SpreadsheetApp.getActiveSheet().getRange('\'Aux AM\'!A20:A22').getValues();
  range_apply_1.setValues(values_1),
  range_apply_2.setValues(values_2);
};
Copy values button
function test() {
  var spreadsheet = SpreadsheetApp.getActiveSheet();
  range_apply_1 = spreadsheet.getRange('C2:C8'),
  range_apply_2 = spreadsheet.getRange('E4:E6');
  var values_1 = SpreadsheetApp.getActiveSheet().getRange('\'AM Metrics - Orders\'!C2:C8').getValues()
  values_2 = SpreadsheetApp.getActiveSheet().getRange('\'AM Metrics - Orders\'!E4:E6').getValues();
  range_apply_1.setValues(values_1),
  range_apply_2.setValues(values_2);
};
The basic way to reduce execution time in Google Apps Script is to reduce the number of calls to Google Apps Script services. In your specific case, you could reduce the calls by using a single getRangeList instead of multiple getRange calls.
I think that the following modification should work:
function Filters() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var name = sheet.getName();
  var [range_apply_1, range_apply_2, range_1, range_2] = sheet
    .getParent()
    .getRangeList([
      `${name}!C2:C8`,
      `${name}!E4:E6`,
      'Aux AM!A11:A17',
      'Aux AM!A20:A22'
    ]).getRanges();
  var values_1 = range_1.getValues();
  var values_2 = range_2.getValues();
  range_apply_1.setValues(values_1);
  range_apply_2.setValues(values_2);
};
Regarding the variation in execution time, that is normal. It is caused by things within your control and things outside it. One example of something within your control is your PC load, as spreadsheet recalculation can be affected by the resources available on your PC. An example of something outside your control is the responsiveness of Google's servers.
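The modification above builds its range strings from sheet names. In A1 notation a sheet name that contains spaces is normally wrapped in single quotes (as the question's own code does with 'Aux AM'), and an apostrophe inside the name is written twice. A small helper can build such references consistently. This is only a sketch, and the name a1Ref is hypothetical.

```javascript
/**
 * Build a quoted A1-notation reference such as 'Aux AM'!A11:A17.
 * Hypothetical helper, not part of the answer above.
 */
function a1Ref(sheetName, range) {
  // An apostrophe inside a sheet name is escaped by doubling it.
  const escaped = String(sheetName).replace(/'/g, "''");
  return "'" + escaped + "'!" + range;
}
```

The resulting strings could be passed to getRangeList in place of the template literals, for example a1Ref(name, 'C2:C8') and a1Ref('Aux AM', 'A11:A17').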

IMPORTXML- Could not fetch URL

I am trying to scrape data from wine-searcher.com and am having an issue with IMPORTXML in Google Sheets. I keep getting the "could not fetch url" error when trying either of the following:
=IMPORTXML("https://www.wine-searcher.com/find/robert+mondavi+rsrv+cab+sauv+napa+valley+county+north+coast+california+usa","//h1")
=IMPORTXML("https://www.wine-searcher.com/find/robert+mondavi+rsrv+cab+sauv+napa+valley+county+north+coast+california+usa","//*[@id='tab-info']/div/div[1]/div[2]/div/div[1]/span[2]/span[2]") (XPath to scrape the current average price)
I've tried suggestions from other Stack Overflow posts, such as with and without http/https and www, and both the XPath and the full XPath, to no avail. I have also tried other URLs and they work without a problem, so maybe the problem is with the URL length or format? Any help would be appreciated. If it cannot be done with IMPORTXML, are there any free alternatives?
Because the page is built with JavaScript on the client side rather than on the server, you will not be able to retrieve the data with the IMPORTXML / IMPORTHTML functions. However, the page contains JSON that you can retrieve and parse to get the information you need.
function myFunction() {
  var url = 'https://www.wine-searcher.com/find/robert+mondavi+rsrv+cab+sauv+napa+valley+county+north+coast+california+usa';
  // Fetch the raw page source
  var source = UrlFetchApp.fetch(url).getContentText();
  // Take the text between the JSON-LD <script> tag and its closing tag
  var jsonString = source.split('<script type="application/ld+json">')[1].split('</script>')[0];
  var data = JSON.parse(jsonString);
  Logger.log(data);
}
All of the following fields are available, for x = 0 to x = 23 (keys that start with @ need bracket notation):
data.offers[x]['@type']
data.offers[x].priceCurrency
data.offers[x].availability
data.offers[x].priceValidUntil
data.offers[x].url
data.offers[x].name
data.offers[x].seller['@type']
data.offers[x].seller.name
data.offers[x].seller.description
data.offers[x].seller.availableDeliveryMethod
data.offers[x].seller.address['@type']
data.offers[x].seller.address.addressRegion
data.offers[x].seller.address.addressCountry['@type']
data.offers[x].seller.address.addressCountry.name
data.offers[x].priceSpecification['@type']
data.offers[x].priceSpecification.description
data.offers[x].priceSpecification.price
data.offers[x].priceSpecification.priceCurrency
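The extraction above can be written as two small functions that work on a plain string, so that a missing or malformed JSON block returns null instead of throwing. This is a sketch under assumptions: extractJsonLd and offersToRows are hypothetical names, and the three columns are simply a selection from the fields listed above.

```javascript
/**
 * Return the parsed contents of the first JSON-LD script block in a page,
 * or null when there is none or the JSON cannot be parsed.
 */
function extractJsonLd(html) {
  const match = html.match(
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/i
  );
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch (e) {
    return null;
  }
}

/** Flatten the offers into rows of [seller name, price, currency]. */
function offersToRows(data) {
  const offers = data && Array.isArray(data.offers) ? data.offers : [];
  return offers.map(function (offer) {
    const seller = offer.seller || {};
    const spec = offer.priceSpecification || {};
    return [
      seller.name || '',
      spec.price === undefined ? '' : spec.price,
      offer.priceCurrency || ''
    ];
  });
}
```

In Apps Script the rows could be returned from a custom function or written with setValues, after passing UrlFetchApp.fetch(url).getContentText() to extractJsonLd.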
https://docs.google.com/spreadsheets/d/17f6lhaHA_xpSWClzxkYZcNs4FeM4VHA480QrmwyJvT4/copy
As mentioned, both of these basic formulae return nothing:
=IMPORTXML("https://www.wine-searcher.com/find/robert+mondavi+rsrv+cab+sauv+napa+valley+county+north+coast+california+usa"; "//*")
=IMPORTDATA("https://www.wine-searcher.com/find/robert+mondavi+rsrv+cab+sauv+napa+valley+county+north+coast+california+usa")
Please note that importing data into a spreadsheet is URL-specific: something that works well for www.aaa.org most likely won't work for www.bbb.org.

Keep getting 'Null' when pulling data from google sheets

Whenever I try to pull data from 2 of my 3 sheets using a 'google.script.run' call from JavaScript, I get an error saying the array I am returning is null. When I change the exact same function call to work on the other sheet, it returns the data perfectly.
I have tried deleting the sheets and recreating them with the same names, using 'openByUrl' instead of 'getActive' to access the spreadsheet, rewriting the code, running the same code in a different project, and checking the documentation to make sure I am not missing any detail. I have also tried changing the references to the sheets; some work and some don't.
var SS = SpreadsheetApp.getActive();
var DB_BOOKINGS = SS.getSheetByName("BookingDatabase");
var DB_VEHICLES = SS.getSheetByName("VehicleDatabase");
var DB_REQUESTS = SS.getSheetByName("RequestDatabase");
function getRequestData(){
  return DB_REQUESTS.getDataRange().getValues();
}
<script>
  function getRequestData(callingFunction) {
    google.script.run
      .withSuccessHandler(callingFunction)
      .withFailureHandler(CustomAlert)
      .getRequestData();
  }
</script>
I want to retrieve the sheet data but keep getting a null value
Since this is an issue with formatting, as you said, try using getDisplayValues() rather than getValues(). This pulls the data as you see it in the sheet (as strings), rather than the underlying unformatted values.
Reference:
getDisplayValues
I was having a similar problem. This was exactly the solution I needed. For my situation, I was able to use getValues() successfully on the initial page load, but when I tried to run it again as a sort of 'refresh' to update the values without reloading the entire page, it would return null.
My data did indeed contain dates, so after changing it to getDisplayValues(), it worked perfectly.
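A likely reason the switch helps, as far as I understand the HTML Service documentation on client-server communication, is that google.script.run cannot return Date objects, while getValues() returns date cells as Date objects. If the other cells should keep their types (numbers, booleans), an alternative is to convert only the dates before returning. The following is a sketch of that idea; toSerializableRows is a hypothetical name.

```javascript
/**
 * Replace Date cells with ISO 8601 strings and leave every other
 * cell untouched, so the 2D array holds only plain values.
 */
function toSerializableRows(values) {
  return values.map(function (row) {
    return row.map(function (cell) {
      const isDate = Object.prototype.toString.call(cell) === '[object Date]';
      return isDate ? cell.toISOString() : cell;
    });
  });
}
```

On the server side this would be used as return toSerializableRows(DB_REQUESTS.getDataRange().getValues()); the client then receives strings it can parse back into dates if needed.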

.getEditors and Groups

I'm trying to run a script against DocsList that gathers the list of viewers and editors and in turn performs some "work" against them. (Specifically, I'm trying to strip rights from the file in question.)
It works perfectly for users. For groups, though, it returns the group name and not the email.
I can't find any way using Apps Script to retrieve the group email address from that information.
If I run fileObject.removeEditor(Group Name), it tells me it's an invalid email (which is entirely true).
I'm open to suggestions... I'm completely stuck here.
Alternatively, I'm open to a way I haven't thought of to remove all sharing rights to a bunch of files in Google Docs.
function getDocs(){
  var myFolders = DocsList.getAllFolders();
  var myFiles = DocsList.getAllFiles(0,10);
  var mySharing = new Array();
  for(x in myFiles){
    // Each entry is [file id, editors, viewers]
    mySharing[x] = [myFiles[x].getId(), myFiles[x].getEditors(), myFiles[x].getViewers()];
    for(y in mySharing[x][1]){
      if(mySharing[x][1][y].toString() != "doc.owner@deltahotels.com"){
        myFiles[x].removeEditor(mySharing[x][1][y]);
      }
    }
    for(y in mySharing[x][2]){
      if(mySharing[x][2][y].toString() != "doc.owner@deltahotels.com"){
        // Viewers are stored at index 2 and removed with removeViewer
        myFiles[x].removeViewer(mySharing[x][2][y]);
      }
    }
  }
}
Thanks for bringing this up. This is a problem with the way the getEditors/getViewers methods handle groups. I've raised this issue for you in the issue tracker. Please star it to be updated on the progress.