I want to import historical stock prices from Yahoo Finance into Google Sheets, but I received this error. Please assist. If IMPORTXML should be used instead, what would the formula be?
https://au.finance.yahoo.com/quote/ASX.AX/history?p=ASX.AX
=IMPORTHTML(D7,"table",1)
I believe your goal is as follows.
You want to retrieve the table from the URL https://au.finance.yahoo.com/quote/ASX.AX/history?p=ASX.AX and put it into the Spreadsheet.
Issue and workaround:
Unfortunately, it seems that the table cannot be retrieved from the URL using IMPORTHTML or IMPORTXML. This has already been mentioned in Jason E.'s answer.
Fortunately, when I tested retrieving the table using UrlFetchApp of Google Apps Script, I confirmed that the table can be retrieved. So, in this answer, as a workaround, I would like to propose achieving your goal using Google Apps Script. The sample script is as follows.
Sample script:
Please copy and paste the following sample script into the script editor of the Spreadsheet. Before you use this script, please enable Sheets API at Advanced Google services. Then run myFunction and authorize the scopes. With this flow, the table is retrieved from the URL and put into the active sheet.
function myFunction() {
  const url = "https://au.finance.yahoo.com/quote/ASX.AX/history?p=ASX.AX";
  const res = UrlFetchApp.fetch(url, {muteHttpExceptions: true});
  const tables = res.getContentText().match(/(<table[\w\s\S]+?<\/table>)/g);
  if (!tables || tables.length == 0) throw new Error("No tables. Please confirm URL again.");
  const spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
  const sheet = spreadsheet.getActiveSheet();
  const resource = {requests: [{pasteData: {html: true, data: tables[0], coordinate: {sheetId: sheet.getSheetId()}}}]};
  Sheets.Spreadsheets.batchUpdate(resource, spreadsheet.getId());
}
Result:
When the above script is run, the following result is obtained.
Note:
This sample script is for the URL https://au.finance.yahoo.com/quote/ASX.AX/history?p=ASX.AX. If you change the URL, the script might not work as is, so please be careful about this.
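If you do point the script at a different URL, it may help to test the table-matching step on its own first. The following is only a minimal sketch of that step as plain functions; the names extractTables and pickTable are my own and are not part of UrlFetchApp or Sheets API. It assumes the tables are not nested, because a lazy regular expression stops at the first closing tag.

```javascript
// Returns every <table>...</table> fragment found in an HTML string.
function extractTables(html) {
  // [\s\S] matches any character including newlines; +? keeps each match
  // as short as possible so that adjacent tables are not merged into one.
  const matches = html.match(/<table[\s\S]+?<\/table>/gi);
  return matches || [];
}

// Hypothetical helper: returns the table at the given index,
// or throws an error that reports how many tables were found.
function pickTable(html, index) {
  const tables = extractTables(html);
  if (index < 0 || index >= tables.length) {
    throw new Error(`Table ${index} not found; the page contains ${tables.length} table(s).`);
  }
  return tables[index];
}
```

In Apps Script, the string passed to these functions would be the result of UrlFetchApp.fetch(url).getContentText(), and the returned fragment is what the sample script above passes as the data of pasteData.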
References:
Class UrlFetchApp
Method: spreadsheets.batchUpdate
Yahoo seems to have made changes to their website that prevent the IMPORT functions of Google Sheets from working. This affects some (not all) of their pages and tickers. Using IMPORTXML will still give you the same error.
I suggest using the built-in GOOGLEFINANCE() function, or finding another website that can be scraped by the IMPORT functions and provides the data you want.
Is there a way to get a list of each of the hyperlinks created by the "publish to web" function in Google Sheets without selecting each tab individually and copying and pasting into a spreadsheet or Word document? Ideally, the output would be all my tab names (about 200 of them) and their links.
Any help or advice would be greatly appreciated.
If all you need is the tab names, this logs them as a comma-separated list:
function getTabNames() {
  const ss = SpreadsheetApp.getActive();
  Logger.log(ss.getSheets().map(sh => sh.getName()).join(','));
}
You could use openById() if you wish.
I believe your goal is as follows.
You want to receive the Web Published URL for all sheets in a Google Spreadsheet using Google Apps Script.
You want to put the URLs to the Spreadsheet.
Issue and workaround:
When a Google Spreadsheet is published to the web, a URL like https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml?gid=###&single=true is obtained. Unfortunately, at the current stage, this URL cannot be retrieved using a script or an API, so it has to be created manually.
In this answer, I would like to propose 2 patterns for achieving your goal.
Pattern 1:
In this pattern, a URL like https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml?gid=###&single=true is used. 2PACX-### is not the Spreadsheet ID. Please be careful about this.
First, please publish your Spreadsheet to the web and retrieve the URL https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml?gid=###&single=true. In this pattern, only the https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml part of that URL is used.
Please copy and paste the following script into the script editor of the Google Spreadsheet, and set your https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml as baseUrl. To use this script, put the custom function =SAMPLE() into a cell. The URLs are then returned.
function SAMPLE() {
  const baseUrl = "https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml"; // Please modify this for your URL.
  return SpreadsheetApp.getActiveSpreadsheet().getSheets().map(s => `${baseUrl}?single=true&gid=${s.getSheetId()}`);
}
Pattern 2:
In this pattern, a URL like https://docs.google.com/spreadsheets/d/### fileId ###/pubhtml is used. In this case, the Spreadsheet ID is used, so you are not required to hard-code the URL.
Please copy and paste the following script into the script editor of the Google Spreadsheet. To use this script, put the custom function =SAMPLE() into a cell. The URLs are then returned.
function SAMPLE() {
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  const baseUrl = `https://docs.google.com/spreadsheets/d/${ss.getId()}/pubhtml`;
  return ss.getSheets().map(s => `${baseUrl}?single=true&gid=${s.getSheetId()}`);
}
Note:
In this case, when the sheet is not published, you cannot access the URL. Please be careful about this.
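Since the question asks for the tab names together with the links, the two patterns could also be adapted to return name and URL pairs. This is only a sketch under that assumption; buildPublishedLinks is a name I made up, and it accepts any objects that have getName() and getSheetId(), which the Sheet objects of Apps Script do.

```javascript
// Builds [tabName, publishedUrl] rows for a list of sheets.
// baseUrl is the ".../pubhtml" URL from Pattern 1 or Pattern 2.
function buildPublishedLinks(baseUrl, sheets) {
  return sheets.map(s => [
    s.getName(),
    `${baseUrl}?single=true&gid=${s.getSheetId()}`,
  ]);
}

// Possible use inside a custom function in Apps Script (not run here):
// function SAMPLE_WITH_NAMES() {
//   const ss = SpreadsheetApp.getActiveSpreadsheet();
//   const baseUrl = `https://docs.google.com/spreadsheets/d/${ss.getId()}/pubhtml`;
//   return buildPublishedLinks(baseUrl, ss.getSheets());
// }
```

Because the result is a 2-dimensional array, a custom function returning it fills two columns: the tab name and its URL.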
References:
map()
getSheetId()
I need to use a link to download a Google Sheet in .xlsx format.
This works for me, but only for a single tab. I have to download three or more tabs, and I believe the format of "gid" would be different.
https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=1eVcHMWyH1YIDN_i0iMcv468c4_jnPk9Tw5gea-2FCyk&gid=626501804&exportFormat=xlsx
I believe your goal is as follows.
You want to export a Google Spreadsheet in XLSX format.
You want to include several specific Sheets in the Google Spreadsheet.
In this case, how about the following workaround? In this workaround, a temporary Spreadsheet containing only the sheets you want to export is created, and, as a simple method, Google Apps Script is used for retrieving its export URL.
Sample script:
function sample() {
  const exportSheetNames = ["Sheet1", "Sheet2", "Sheet3"]; // Please set the sheet names you want to export.
  const spreadsheetId = "###"; // Please set your Spreadsheet ID.
  const source = SpreadsheetApp.openById(spreadsheetId);
  const temp = source.copy("temp_" + source.getName());
  temp.getSheets().forEach(s => {
    if (!exportSheetNames.includes(s.getSheetName())) temp.deleteSheet(s);
  });
  const url = `https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=${temp.getId()}&exportFormat=xlsx`;
  console.log(url);
}
When this script is run, the log shows the URL for exporting the Spreadsheet in XLSX format with only the specific sheets you want. From your question, I thought that you might want the URL for exporting.
This is a simple sample script for achieving your goal. For example, if you want to automatically export the XLSX file using a script, you can see the sample script at this thread.
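One thing to watch with the script above: as far as I know, a Spreadsheet must always keep at least one sheet, so a typo in exportSheetNames could make the deletion step fail partway. The following sketch checks the names before anything is deleted. The function names selectSheetsToDelete and buildXlsxExportUrl are hypothetical; only the URL format is taken from the question.

```javascript
// Returns the sheet names that should be deleted from the temporary copy.
// Throws before any deletion if a requested name is missing or nothing is kept.
function selectSheetsToDelete(allNames, keepNames) {
  if (keepNames.length === 0) {
    throw new Error("Please set at least one sheet name to export.");
  }
  const missing = keepNames.filter(name => !allNames.includes(name));
  if (missing.length > 0) {
    throw new Error(`Sheets not found: ${missing.join(", ")}`);
  }
  const keep = new Set(keepNames);
  return allNames.filter(name => !keep.has(name));
}

// Builds the XLSX export URL in the same format as the URL in the question.
function buildXlsxExportUrl(spreadsheetId) {
  const key = encodeURIComponent(spreadsheetId);
  return `https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=${key}&exportFormat=xlsx`;
}
```

In the sample script, allNames would come from temp.getSheets().map(s => s.getSheetName()), and each returned name could then be passed to temp.deleteSheet(temp.getSheetByName(name)).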
I feel like I've tried every solution out here, and have yet to accomplish this task.
I'm looking to scrape the SECOND (playoffs) table on this link:
https://www.basketball-reference.com/players/c/curryst01/gamelog/2016
The first table comes in very easily using IMPORTHTML, the second however I haven't been able to locate.
I've tried using IMPORTHTML with 100 different tables & lists. I also looked in inspector and did a CTRL F on <table and see the info there.
I read that it could be because it's a Javascript object, but when I turned off Javascript (like someone suggested), I still see the table, which leads me to believe it can definitely be scraped into a Google Sheet.
I tried ImportXML as well, but I'm not as familiar and wasn't able to find the info with that either.
Are there any suggestions on how I could scrape this? Seems bizarre to me that it is this difficult!
Unfortunately, it seems that IMPORTHTML and IMPORTXML cannot be used for retrieving the table you expect. Fortunately, I noticed that when the HTML is retrieved by Google Apps Script, the HTML data includes the SECOND (playoffs) table you expect. So, in this answer, I would like to propose using Google Apps Script.
Sample script:
Please copy and paste the following script into the script editor of the Google Spreadsheet, and enable Sheets API at Advanced Google services. Then run myFunction in the script editor. The retrieved table is put into the sheet.
function myFunction() {
  const url = "https://www.basketball-reference.com/players/c/curryst01/gamelog/2016"; // This URL is from your question.
  const sheetName = "Sheet1"; // Please set the destination sheet name.
  const html = UrlFetchApp.fetch(url).getContentText();
  const tables = [...html.matchAll(/<table[\s\S\w]+?<\/table>/g)];
  if (tables.length > 8) {
    const ss = SpreadsheetApp.getActiveSpreadsheet();
    const sheetId = ss.getSheetByName(sheetName).getSheetId();
    const resource = { requests: [{ pasteData: { html: true, data: tables[8][0], coordinate: { sheetId } } }] };
    Sheets.Spreadsheets.batchUpdate(resource, ss.getId());
    return;
  }
  throw new Error("Expected table cannot be retrieved.");
}
Result:
When this script is run, the following result can be obtained.
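The script selects the table by its position (index 8), which would silently pick a different table if the page layout changes. If the table you want has an id attribute, selecting by id may be more stable. The following is only a sketch of that idea; findTableById is a name I made up, and the id you pass is an assumption that you would need to confirm in the page source.

```javascript
// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns the first <table ... id="..."> ... </table> fragment with the given id,
// or null when no such table exists. Nested tables are not handled.
function findTableById(html, id) {
  const pattern = new RegExp(
    `<table[^>]*\\sid=["']${escapeRegExp(id)}["'][^>]*>[\\s\\S]*?<\\/table>`,
    "i"
  );
  const match = html.match(pattern);
  return match ? match[0] : null;
}
```

The returned fragment could be used in place of tables[8][0] in the sample script, with the existing error thrown when the result is null.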
References:
Method: spreadsheets.batchUpdate
PasteDataRequest
I learned I wasn't turning off Javascript properly... well, now the table is gone. So I'm assuming this means it cannot be scraped into Sheets.
Still curious what solutions are out there - I'm currently working on it using ParseHub, but I'd really love to understand how it could be done in Sheets
Try this, it will give you the main table
=importhtml(url,"table",8)
You can also retrieve the data for tables #1 to #7.
I have a list of 200 hyperlinks saved on a spreadsheet. Those links are for files (particularly Google Slides files) all saved in Google Drive. They are scattered in sub folders under the same root folder that has ~1500 files
Link 1
Link 2
Link 3
...
Link 200
I want to make a copy of those 200 files only. There is no common search term or filter to pull them up on Google Drive search. So I need to work off that list
Thoughts on doing this? Thanks in advance!
I believe your current situation and your goal are as follows.
You have the Spreadsheet including 200 hyperlinks of Google Slides like https://docs.google.com/presentation/d/FILE_ID.
You want to copy the Google Slides to the specific folder in your Google Drive.
You want to achieve this using Google Apps Script.
From the number of hyperlinks, I thought that a batch request might be useful for your situation. With a batch request, the process cost becomes low because the requests are processed asynchronously. So, in this answer, I would like to propose copying the files of the 200 hyperlinks using batch requests. The sample script is as follows.
Usage:
1. Install a Google Apps Script library.
In this script, a Google Apps Script library is used to send the batch request. I thought that building a batch request by hand might be a bit complicated, so I created this library for using batch requests with Google Apps Script. The library's project key is as follows.
1HLv6tWz0oXFOJHerBTP8HsNmhpRqssijJatC92bv9Ym6HSN69_UuzcDk
The method for installing the library can be seen in the official document.
2. Sample script.
Please copy and paste the following script into the script editor of your Google Spreadsheet that includes the 200 hyperlinks like https://docs.google.com/presentation/d/FILE_ID. This script uses Drive API, so before you use it, please enable Drive API at Advanced Google services. Then run the function "myFunction".
function myFunction() {
  const sheetName = "Sheet1"; // Please set the sheet name.
  const destinationFolderId = "###"; // Please set the destination folder ID.

  // 1. Retrieve file IDs from Spreadsheet.
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(sheetName);
  const fileIds = sheet.getRange("A1:A" + sheet.getLastRow()).getValues().reduce((ar, [a]) => {
    if ((/^https:\/\/docs\.google\.com\/presentation\/d/).test(a)) ar.push(a.split("/")[5]);
    return ar;
  }, []);
  console.log(fileIds); // You can check the retrieved file IDs.

  // 2. Retrieve the filenames from the file IDs using the "Files: get" method with the batch request.
  const requests1 = fileIds.map(id => ({
    method: "GET",
    endpoint: `https://www.googleapis.com/drive/v3/files/${id}?supportsAllDrives=true`,
  }));
  const res1 = BatchRequest.EDo({batchPath: "batch/drive/v3", requests: requests1});
  console.log(res1); // You can check the retrieved file metadata.

  // 3. Copy the files using the file IDs and filenames using the "Files: copy" method with the batch request.
  const requests2 = res1.map(({id, name}) => ({
    method: "POST",
    endpoint: `https://www.googleapis.com/drive/v3/files/${id}/copy?supportsAllDrives=true`,
    requestBody: {parents: [destinationFolderId || "root"], name},
  }));
  const res2 = BatchRequest.EDo({batchPath: "batch/drive/v3", requests: requests2});
  console.log(res2);
}
Note:
This sample script supposes that the 200 hyperlinks are put in column "A" of "Sheet1". Please modify this for your actual situation.
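If some of the links point to files that cannot be read, I would expect the corresponding items in the first batch response to lack id and name, which would make step 3 build a broken endpoint. The following sketch separates the request building into plain functions and skips such items. The function names are hypothetical, and the shape of a failed item ({error: ...}) is an assumption that you should confirm by checking the logged res1.

```javascript
// Builds the "Files: get" requests for a list of file IDs.
function buildGetRequests(fileIds) {
  return fileIds.map(id => ({
    method: "GET",
    endpoint: `https://www.googleapis.com/drive/v3/files/${id}?supportsAllDrives=true`,
  }));
}

// Builds the "Files: copy" requests from the retrieved metadata.
// Items without both id and name (assumed to be failed lookups) are skipped.
function buildCopyRequests(files, destinationFolderId) {
  return files
    .filter(f => f && f.id && f.name)
    .map(({id, name}) => ({
      method: "POST",
      endpoint: `https://www.googleapis.com/drive/v3/files/${id}/copy?supportsAllDrives=true`,
      requestBody: {parents: [destinationFolderId || "root"], name},
    }));
}
```

In the sample script, buildCopyRequests(res1, destinationFolderId) could replace the res1.map(...) expression that creates requests2.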
References:
Batch request of official document
Files: get
Files: copy
BatchRequest of Google Apps Script library
Assuming the links look something similar to this and they're stored in the A column:
https://docs.google.com/presentation/d/SLIDE_ID/edit
You can easily extract the slideId from the hyperlink which corresponds to the fileId by making use of this formula (by dragging it down the whole A column):
=REGEXEXTRACT(A1,"[-\w]{25,}")
Finally, in order to copy each file, you can make use of Apps Script’s DriveApp, something similar to this:
DriveApp.getFileById("fileId").makeCopy(DriveApp.getFolderById("destinationFolderId"));
However, since the fileId is stored in a cell of the sheet, you can read it directly from the range. So instead of hard-coding "fileId", you could use this:
let sheet = SpreadsheetApp.openById("spreadsheetId").getSheetByName("sheetName");
let fileId = sheet.getRange(1, 2).getValue();
The snippet above is retrieving the sheet where the links are stored and then by making use of the getRange and getValue methods it retrieves the value from the B1 cell (assuming that the ids of the files will be in the B column after REGEXEXTRACT).
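The extraction can also be done in the script instead of with a sheet formula. The following sketch applies the same pattern as the REGEXEXTRACT formula to a column of values; extractFileId and extractFileIds are names I chose for illustration, and the 25-character minimum is simply carried over from the formula.

```javascript
// Returns the first run of 25 or more word characters or hyphens in the value,
// or null when the value contains no such run.
function extractFileId(value) {
  const match = String(value).match(/[-\w]{25,}/);
  return match ? match[0] : null;
}

// Takes rows as returned by getValues() for a single column
// and returns the file IDs that could be extracted.
function extractFileIds(rows) {
  return rows
    .map(([cell]) => extractFileId(cell))
    .filter(id => id !== null);
}

// Possible use in Apps Script (not run here):
// const rows = sheet.getRange("A1:A" + sheet.getLastRow()).getValues();
// const folder = DriveApp.getFolderById("destinationFolderId");
// extractFileIds(rows).forEach(id => DriveApp.getFileById(id).makeCopy(folder));
```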
Note
Please bear in mind that you can also extract the fileId directly in your script, depending on the approach and programming language you use.
Reference
Files Class Apps Script;
Spreadsheet Class Apps Script;
Range Class Apps Script;
REGEXEXTRACT function.
Data Source
https://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/TextView.aspx?data=yieldAll
I am trying to get the following data onto a Google Sheet, but it is looking to be tricky to do so using IMPORTXML. Any idea how to do it?
You want to retrieve a table from the HTML data of the URL.
I understood it this way from "I am trying to get the following data onto a Google Sheet".
If my understanding is correct, how about this answer?
Issue and workaround:
Unfortunately, it seems that the file size of the HTML is large. When =IMPORTXML("https://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/TextView.aspx?data=yieldAll","//title") is used, the error "Resource at url contents exceeded maximum size." occurs. When I retrieved the HTML data from the URL, its size was about 9 MB, and this is considered to be the reason for the error. So, as a workaround, how about using Google Apps Script? In this workaround, the following flow is used.
Retrieve HTML data using UrlFetchApp
Parse the HTML data using Parser which is a GAS library.
Put the parsed data to the active sheet on the Spreadsheet using PasteDataRequest of Sheets API.
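To show what the parse step in this flow does, here is a rough equivalent written with indexOf. This is not how the Parser library is implemented; it is only a sketch of the same "from ... to" idea, and the names sliceBetween and rebuildTable are my own.

```javascript
// Returns the text between the first occurrence of `from` and the next
// occurrence of `to` after it, or null when either marker is missing.
function sliceBetween(text, from, to) {
  const start = text.indexOf(from);
  if (start === -1) return null;
  const contentStart = start + from.length;
  const end = text.indexOf(to, contentStart);
  if (end === -1) return null;
  return text.slice(contentStart, end);
}

// Cuts out the table whose opening tag starts with <table class="t-chart"
// and wraps the extracted part in <table ... </table> again.
function rebuildTable(html) {
  const inner = sliceBetween(html, '<table class="t-chart"', "</table>");
  if (inner === null) throw new Error("The expected table was not found.");
  return "<table" + inner + "</table>";
}
```

The string returned by rebuildTable corresponds to the table variable in the sample script, which is then passed as the data of pasteData.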
Usage:
Preparation:
Please install Parser. You can see how to install a library here.
The project key of the library is M1lugvAXKKtUxn_vdAG9JZleS6DrsjUUV.
Please enable Sheets API at Advanced Google services.
Sample script:
Please copy and paste the following script into the script editor of the container-bound script of the Spreadsheet. After the above settings are done, please run the function myFunction(). When the script is run, the HTML table is put into the active sheet of the Spreadsheet.
function myFunction() {
  // Retrieve HTML data from URL.
  var url = "https://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/TextView.aspx?data=yieldAll";
  var html = UrlFetchApp.fetch(url).getContentText();

  // Parse HTML data.
  var table = "<table" + Parser.data(html).from("<table class=\"t-chart\"").to("</table>").build() + "</table>";

  // Put the values to the Spreadsheet.
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getActiveSheet();
  var resource = {requests: [{pasteData: {html: true, data: table, coordinate: {sheetId: sheet.getSheetId()}}}]};
  Sheets.Spreadsheets.batchUpdate(resource, ss.getId());
}
References:
Parser
PasteDataRequest
Advanced Google services
If I misunderstood your question and this was not the direction you want, I apologize.
Updated on April 23, 2021:
The new IDE for Google Apps Script was finally released on December 7, 2020. With the new IDE, installing a Google Apps Script library requires the script ID of the library's Google Apps Script project.
In this case, when the Parser library is installed, unfortunately, the ID M1lugvAXKKtUxn_vdAG9JZleS6DrsjUUV cannot be used.
So when you use new IDE, please use the following script ID.
1Mc8BthYthXx6CoIz90-JiSzSafVnT6U3t0z_W3hLTAX5ek4w0G_EIrNw
This script ID is the ID of the Google Apps Script project whose project key is M1lugvAXKKtUxn_vdAG9JZleS6DrsjUUV. With it, the Parser library can be installed in the new IDE.
For the method for installing the library, please see the official document.
Reference:
Libraries