Finding a table using ImportHTML - google-apps-script

I feel like I've tried every solution out there and have yet to accomplish this task.
I'm looking to scrape the SECOND (playoffs) table on this link:
https://www.basketball-reference.com/players/c/curryst01/gamelog/2016
The first table comes in very easily using IMPORTHTML, the second however I haven't been able to locate.
I've tried IMPORTHTML with 100 different table and list indexes. I also looked in the inspector, searched (Ctrl+F) for <table, and can see the data there.
I read that it could be because the table is generated by JavaScript, but when I turned off JavaScript (as someone suggested), I could still see the table, which leads me to believe it can definitely be scraped into a Google Sheet.
I tried ImportXML as well, but I'm not as familiar and wasn't able to find the info with that either.
Are there any suggestions on how I could scrape this? Seems bizarre to me that it is this difficult!

Unfortunately, it seems that IMPORTHTML and IMPORTXML cannot retrieve the table you want. Fortunately, I noticed that when the HTML is fetched with Google Apps Script, it does include the SECOND (playoffs) table. So in this answer, I would like to propose using Google Apps Script.
Sample script:
Please copy and paste the following script into the script editor of the Google Spreadsheet, and enable the Sheets API under Advanced Google services. Then run myFunction from the script editor. The retrieved table is pasted into the sheet.
function myFunction() {
  const url = "https://www.basketball-reference.com/players/c/curryst01/gamelog/2016"; // This URL is from your question.
  const sheetName = "Sheet1"; // Please set the destination sheet name.

  // Fetch the raw HTML and collect every <table>...</table> fragment in it.
  const html = UrlFetchApp.fetch(url).getContentText();
  const tables = [...html.matchAll(/<table[\s\S]+?<\/table>/g)];

  // On this page, the playoffs table is the 9th match (index 8).
  if (tables.length > 8) {
    const ss = SpreadsheetApp.getActiveSpreadsheet();
    // Paste the HTML table into the destination sheet with a PasteDataRequest.
    Sheets.Spreadsheets.batchUpdate({
      requests: [{ pasteData: { html: true, data: tables[8][0], coordinate: { sheetId: ss.getSheetByName(sheetName).getSheetId() } } }]
    }, ss.getId());
    return;
  }
  throw new Error("Expected table cannot be retrieved.");
}
Result:
When this script is run, the playoffs table is pasted into the destination sheet, starting at cell A1.
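A plausible reason the regex approach finds more tables than IMPORTHTML (an assumption about this page, not something confirmed in the answer) is that some tables are shipped inside HTML comments and only un-commented by the page's JavaScript. That would also explain why the table disappears once JavaScript is really turned off. A regex still matches tables inside comments. The standalone sketch below, using a hypothetical helper named listTables, lists every table fragment in an HTML string and flags which ones start inside a comment, which can help when choosing an index such as 8.

```javascript
// Lists every <table>...</table> fragment in an HTML string, in document order,
// and flags whether each one starts inside an HTML comment (<!-- ... -->).
// The non-greedy [\s\S]+? stops at the first </table>, so nested tables are not handled.
function listTables(html) {
  const results = [];
  for (const match of html.matchAll(/<table[\s\S]+?<\/table>/g)) {
    // The fragment is inside a comment if the nearest "<!--" before it
    // has not been closed by a "-->" before the fragment starts.
    const lastOpen = html.lastIndexOf("<!--", match.index);
    const lastClose = html.lastIndexOf("-->", match.index);
    results.push({ index: results.length, commented: lastOpen > lastClose, html: match[0] });
  }
  return results;
}
```

In Apps Script, the input would come from UrlFetchApp.fetch(url).getContentText(), and logging each entry's index and commented flag shows where the playoffs table sits.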
References:
Method: spreadsheets.batchUpdate
PasteDataRequest

I learned I wasn't turning off JavaScript properly... well, now the table is gone. So I'm assuming this means it cannot be scraped into Sheets.
Still curious what solutions are out there. I'm currently working on it using ParseHub, but I'd really love to understand how it could be done in Sheets.

Try this; it will give you the main table:
=importhtml(url,"table",8)
You can also retrieve information from tables #1 to #7.

Related

List of all tabs' "publish to web" links on a large googlesheet document (200 tabs)

Is there a way to get a list of all the hyperlinks created by the "publish to web" function in Google Sheets without selecting each tab individually and copying and pasting them into a spreadsheet/Word document? Ideally, the output would be all my tab names (circa 200 of them) and their links.
Any help or advice would be greatly appreciated.
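If all you need is the tab names, this logs them as a comma-separated list:
function getTabNames() {
  const ss = SpreadsheetApp.getActive();
  // Collect each sheet's name and log them as one comma-separated string.
  Logger.log(ss.getSheets().map(sh => sh.getName()).join(','));
}
You could use SpreadsheetApp.openById() instead of getActive() if you want to target another spreadsheet.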
I believe your goal is as follows.
You want to retrieve the Web Published URL for all sheets in a Google Spreadsheet using Google Apps Script.
You want to put the URLs into the Spreadsheet.
Issue and workaround:
When a Google Spreadsheet is published to the web, a URL like https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml?gid=###&single=true is obtained. Unfortunately, at this time, this URL cannot be retrieved with a script or an API (Ref). Therefore, the URL has to be constructed manually.
In this answer, I would like to propose 2 patterns for achieving your goal.
Pattern 1:
In this pattern, a URL like https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml?gid=###&single=true is used. Note that 2PACX-### is not the Spreadsheet ID.
First, publish your Spreadsheet to the web and copy the URL https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml?gid=###&single=true. This pattern uses only the https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml part of that URL.
Please copy and paste the following script into the script editor of the Google Spreadsheet, and set your https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml URL as baseUrl. Then put the custom function =SAMPLE() into a cell; it returns the URLs.
function SAMPLE() {
  const baseUrl = "https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml"; // Please modify this for your URL.
  // Build one published URL per sheet from each sheet's gid.
  return SpreadsheetApp.getActiveSpreadsheet().getSheets().map(s => `${baseUrl}?single=true&gid=${s.getSheetId()}`);
}
Pattern 2:
This pattern uses a URL like https://docs.google.com/spreadsheets/d/### fileId ###/pubhtml, which is built from the Spreadsheet ID, so you don't need to copy the published URL manually.
Please copy and paste the following script into the script editor of the Google Spreadsheet. Then put the custom function =SAMPLE() into a cell; it returns the URLs.
function SAMPLE() {
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  // The base URL is derived from the Spreadsheet ID instead of the 2PACX-### key.
  const baseUrl = `https://docs.google.com/spreadsheets/d/${ss.getId()}/pubhtml`;
  return ss.getSheets().map(s => `${baseUrl}?single=true&gid=${s.getSheetId()}`);
}
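Since the question asks for the tab names alongside the links, the URL-building step of either pattern can be extended to return [name, url] rows. The sketch below uses a hypothetical helper, buildPublishedRows, that takes the base URL and a plain list of { name, gid } objects, so it does not depend on any Apps Script service and can be tested on its own.

```javascript
// Builds [tab name, published URL] rows from a published base URL and a list of sheets.
// Each sheet is described as { name, gid }; the "Tab"/"URL" header row is optional.
function buildPublishedRows(baseUrl, sheets, includeHeader = true) {
  const rows = sheets.map((s) => [s.name, `${baseUrl}?single=true&gid=${s.gid}`]);
  return includeHeader ? [["Tab", "URL"], ...rows] : rows;
}
```

Inside a custom function, the sheet list could come from SpreadsheetApp.getActiveSpreadsheet().getSheets().map(s => ({ name: s.getName(), gid: s.getSheetId() })); returning the resulting 2D array fills two columns, one row per tab.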
Note:
If a sheet has not been published to the web, its URL cannot be accessed. Please be careful about this.
References:
map()
getSheetId()

Resource URL Not Found with Yahoo Finance

I want to pull historical stock prices from Yahoo Finance into Google Sheets, but I received this error. Please assist. How would this work with IMPORTXML?
https://au.finance.yahoo.com/quote/ASX.AX/history?p=ASX.AX
=IMPORTHTML(D7,"table",1)
I believe your goal is as follows.
You want to retrieve the table from the URL https://au.finance.yahoo.com/quote/ASX.AX/history?p=ASX.AX and put it into the Spreadsheet.
Issue and workaround:
Unfortunately, it seems that the table cannot be retrieved from this URL with IMPORTHTML or IMPORTXML. This has already been mentioned in Jason E.'s answer.
Fortunately, when I tried retrieving the table with UrlFetchApp of Google Apps Script, I confirmed that it can be retrieved. So, as a workaround, I would like to propose achieving your goal with Google Apps Script. The sample script is as follows.
Sample script:
Please copy and paste the following sample script into the script editor of the Spreadsheet. Before using it, enable the Sheets API under Advanced Google services. Then run myFunction and authorize the scopes. The table is retrieved from the URL and pasted into the active sheet.
function myFunction() {
  const url = "https://au.finance.yahoo.com/quote/ASX.AX/history?p=ASX.AX";
  // muteHttpExceptions returns the response instead of throwing on HTTP errors.
  const res = UrlFetchApp.fetch(url, {muteHttpExceptions: true});
  // Collect every <table>...</table> fragment from the HTML.
  const tables = res.getContentText().match(/<table[\s\S]+?<\/table>/g);
  if (!tables || tables.length === 0) throw new Error("No tables. Please confirm URL again.");
  const spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
  const sheet = spreadsheet.getActiveSheet();
  // Paste the first table into the active sheet with a PasteDataRequest.
  const resource = {requests: [{pasteData: {html: true, data: tables[0], coordinate: {sheetId: sheet.getSheetId()}}}]};
  Sheets.Spreadsheets.batchUpdate(resource, spreadsheet.getId());
}
Result:
When the script above is run, the first table on the page is pasted into the active sheet, starting at cell A1.
Note:
This sample script is written for the URL https://au.finance.yahoo.com/quote/ASX.AX/history?p=ASX.AX. If you change the URL, the script might not work, so please be careful about this.
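If enabling the Sheets API is not an option, a different technique (not the one used in this answer) is to convert the table HTML into a 2D array yourself and write it with Range.setValues(). The sketch below uses a hypothetical helper, htmlTableToValues, built on simple regexes, so it assumes a well-formed, non-nested table and decodes only &nbsp; and &amp;; real pages may need more handling.

```javascript
// Converts one HTML <table> fragment into a rectangular 2D array of strings.
// Assumes a simple, non-nested table; only &nbsp; and &amp; are decoded.
function htmlTableToValues(tableHtml) {
  const rows = tableHtml.match(/<tr\b[\s\S]*?<\/tr>/g) || [];
  const values = rows.map((row) => {
    // \b keeps <thead> from being mistaken for a <th> cell.
    const cells = row.match(/<t[dh]\b[^>]*>[\s\S]*?<\/t[dh]>/g) || [];
    return cells.map((cell) =>
      cell
        .replace(/<[^>]+>/g, "") // strip all tags, keeping the cell text
        .replace(/&nbsp;/g, " ")
        .replace(/&amp;/g, "&")
        .trim()
    );
  });
  // setValues() needs every row to have the same length, so pad short rows with "".
  const width = Math.max(0, ...values.map((r) => r.length));
  return values.map((r) => r.concat(Array(width - r.length).fill("")));
}
```

For a non-empty result, the values could then be written with sheet.getRange(1, 1, values.length, values[0].length).setValues(values).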
References:
Class UrlFetchApp
Method: spreadsheets.batchUpdate
Yahoo seems to have made some changes to their website that stop the IMPORT functions of Google Sheets from working. This affects some (not all) of their webpages and tickers. IMPORTXML will give you the same error.
I suggest using the built-in GOOGLEFINANCE() function, or finding another website that can be scraped with the IMPORT functions and provides the same data.

UrlFetchApp.fetch(URL).getContentText() all of a sudden not working despite no changes

I have a very simple Google Script that grabs a CSV file from the internet and puts it into a Google Doc. There is more to it before and after, but here is where the issue arises:
var csvUrl = "https://data.cdc.gov/resource/muzy-jte6.csv?$limit=6000";
var csvContent = UrlFetchApp.fetch(csvUrl).getContentText();
It runs automatically every night, and has run without an issue for the last six months. All of a sudden, it no longer works. The script editor provides no reason. The link is perfectly valid and still works. I tried it with a different CSV link from a different website, and it had the same issue.
When I run the script, all it says is:
Exception: Unexpected error: https://data.cdc.gov/resource/muzy-jte6.csv?$limit=6000 (line 8, file "Code")
As I mentioned, the script has worked flawlessly well over 150 times since the summer, and neither the script, the document, nor the API link has changed. My only guess is that there was some permission change on my Google account, but the app still has all the permissions it needs from my Google account.
Please help me understand why the script no longer works and how I can get it working again.
I had the same situation. At that time, I noticed that when a built-in function of Google Spreadsheet is used with the URL, the values can be retrieved. So, as a workaround for now, I used the following flow.
Put the formula =IMPORTDATA(URL) into a cell.
Retrieve the values from the sheet.
Applied to your URL https://data.cdc.gov/resource/muzy-jte6.csv?$limit=6000, the flow becomes as follows.
Sample script:
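function myFunction() {
  const url = "https://data.cdc.gov/resource/muzy-jte6.csv?$limit=6000"; // This is your URL.
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Sheet1");
  sheet.clear();
  // Let the built-in IMPORTDATA function fetch the CSV into the sheet.
  const range = sheet.getRange("A1");
  range.setFormula(`=IMPORTDATA("${url}")`);
  // Apply pending changes so the formula is evaluated before reading.
  SpreadsheetApp.flush();
  const values = sheet.getDataRange().getValues();
  // Remove the formula (and its output) once the values have been read.
  range.clear();
  console.log(values);
}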
This sample script uses "Sheet1", so please change it to match your actual sheet name.
When the script above is run, the values from the URL are retrieved into values as a 2-dimensional array.
In this answer, I used IMPORTDATA, but depending on the situation, another built-in function (for example, IMPORTHTML or IMPORTXML) might be more suitable.
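The sample script builds the formula with a template literal, which works for this URL. If a URL could ever contain a double quote, that quote would have to be doubled inside a Sheets string literal. The small helper below (a hypothetical name, buildImportDataFormula) sketches that escaping; its result could be passed to range.setFormula() in place of the template literal.

```javascript
// Builds an =IMPORTDATA("...") formula for a URL.
// Inside a Sheets string literal, a double quote is written as two double quotes.
function buildImportDataFormula(url) {
  const escaped = String(url).replace(/"/g, '""');
  return `=IMPORTDATA("${escaped}")`;
}
```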
Note:
This is a workaround for the current situation. Once this issue is resolved, I think you can use your original script again.
References:
IMPORTDATA
setFormula()

Google Apps Script - Creating and Saving Filters for Google Sheets

I feel like a bit of a chump, but I cannot work this out...
I have been given the job of producing a new master analysis sheet each month from a supplied XML file that combines with various columns of our (multiple) sheets. No problems, so far. I have got all of that working the way I want. :-)
My issue is that we also have about 6-8 filters saved with a specific sheet that allow our auditors to focus on specific areas (and as you can understand, our auditors want these to work EXACTLY as they specify).
I have tried using createFilter(), but there doesn't appear to be any way to save multiple filters to that sheet (maybe I am missing something). No joy! :-(
I have tried recording a macro which I could then run to create the filters. No joy here either :-(
Do I have to tell these pesky auditors to create their own filters each month (they do know how, but it's beneath them), or is there a way I can script them up and get them off my back?
Unfortunately (as much as I would like to) I cannot share our sheets or scripts as we have significant IP embedded there.
I would really appreciate some guidance as to how you might approach this (if it is possible).
Kind regards
Ian
If you're indeed talking about 'Create new filter view', I suggest making a template sheet. Instead of creating a new sheet every month, make one template spreadsheet and add all the filter views your auditors want. Then copy that spreadsheet each month and paste the new data into it.
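The correct way to create a filter using Apps Script and createFilter() is this one (the range, column, and criterion below are only examples):
function setFilters() {
  var sheet = SpreadsheetApp.getActiveSheet();
  // Example range; replace it with the range your filter should cover.
  var rangeFilter = sheet.getRange("A1:F100");
  // A sheet can hold at most one basic filter.
  var filter = rangeFilter.createFilter();
  // Example criterion: show rows whose column B contains "Audit". Replace with your own criteria.
  var filterCriteria = SpreadsheetApp.newFilterCriteria().whenTextContains("Audit");
  var columnPosition = 2; // 1-based column position (B)
  filter.setColumnFilterCriteria(columnPosition, filterCriteria.build());
}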
As you can see, you must use build() in order to build the criteria for the filter you have created.
You can also use the Sheets advanced service and create the filters with the Sheets API, something similar to this:
var filterSettings = {
  // YOUR FILTER SETTINGS
};
var request = [{
  "setBasicFilter": {
    "filter": filterSettings
  }
}];
To call the Sheets service and apply the filter above, you can use this:
Sheets.Spreadsheets.batchUpdate({'requests': request}, SPREADSHEET_ID);
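createFilter() and setBasicFilter both manage the sheet's single basic filter, which is likely why several saved filters could not be created that way. Named filter views (what 'Create new filter view' produces) can be added through the same batchUpdate call with addFilterView requests. The sketch below builds such requests; the helper name buildFilterViewRequests and the example titles, columns, and texts are hypothetical, and it only uses a TEXT_CONTAINS condition.

```javascript
// Builds one addFilterView request per view definition.
// Each view is { title, columnIndex, text }; columnIndex is 0-based in the Sheets API.
function buildFilterViewRequests(sheetId, views) {
  return views.map((view) => ({
    addFilterView: {
      filter: {
        title: view.title,
        range: { sheetId: sheetId }, // whole sheet; add row/column indexes to narrow it
        filterSpecs: [
          {
            columnIndex: view.columnIndex,
            filterCriteria: {
              condition: {
                type: "TEXT_CONTAINS",
                values: [{ userEnteredValue: view.text }],
              },
            },
          },
        ],
      },
    },
  }));
}
```

The result could be sent as Sheets.Spreadsheets.batchUpdate({ requests: buildFilterViewRequests(sheet.getSheetId(), views) }, SPREADSHEET_ID), creating all the auditors' views in one call.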
References
Range Class Apps Script: createFilter()
Filter Class Apps Script
Apps Script Google Advanced Services

Google Script to import data from Google trends

Just wondering if there is a script to import data from Google Trends into a Google Sheet.
Basically, I would like to trigger such a script daily to find the rising trends for a given topic, and I am not sure if such a solution is available.
Otherwise, is there any other way to perform this task?
Many thanks!
You may refer to this sample code snippet showing how Google Apps Script can be used to query Google Trends. For example, this function reads the query parameters from named ranges in the spreadsheet, builds the query link with buildQueryString and the CSV download link with generateCsvDownload (helpers from the same sample, not shown here), and finally displays both links in the sheet:
function startQuery() {
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  // Read the query parameters from named ranges, in the order the helpers expect.
  const params = ["q", "cat", "geo", "gprop", "cmpt", "date"].map(name => ss.getRangeByName(name).getValue());
  // buildQueryString and generateCsvDownload come from the original sample (not shown here).
  const result = buildQueryString(...params);
  // Display the resulting link in a cell.
  ss.getRangeByName("Query_Result").setValue(result).setBackground("yellow");
  const csvResult = generateCsvDownload(...params);
  ss.getRangeByName("CSV_Download_Link").setValue(csvResult).setBackground("yellow");
}
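The snippet calls buildQueryString without showing it. Below is a minimal sketch of what such a helper might look like, assuming it builds a trends.google.com/trends/explore link from the same six parameters and skips empty ones. This is a hypothetical reconstruction, not the original helper, and whether Google Trends still honors every parameter (cmpt in particular) is not confirmed here. It uses encodeURIComponent to avoid relying on URLSearchParams, which Apps Script does not provide.

```javascript
// Hypothetical sketch: builds a Google Trends explore URL from the query parameters,
// leaving out any parameter whose value is empty.
function buildQueryString(q, cat, geo, gprop, cmpt, date) {
  const params = { q, cat, geo, gprop, cmpt, date };
  const query = Object.keys(params)
    .filter((key) => params[key] !== "" && params[key] != null)
    .map((key) => `${key}=${encodeURIComponent(params[key])}`)
    .join("&");
  return `https://trends.google.com/trends/explore?${query}`;
}
```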