How to extract data from website to Google Sheet

I try to print historic adjusted close prices from Yahoo finance to Google Sheets.
=ImportXML("https://sg.finance.yahoo.com/quote/"&B57&"/history?p="&B57, "//tbody/tr[21]/td[6]")
Cell B57 is for example "SPY".
This works fine for historic prices up to 100 days. (it is adjusted here: tr[100])
When I try to get prices from more than 100 days back, it returns "N/A".
These prices are visible on yahoo finance.
Is there a way to adjust the XPath so that it works?
I noticed that in Yahoo's HTML code, the prices beyond about 100 days don't have this "data-reactid=1520" in the tr tag.

At present, the values you want appear to be embedded in the page's HTML as a JSON object for JavaScript. In that case, Google Apps Script can fetch the page, parse that JSON object, and return the values. How about the following sample script?
Sample script:
Please copy and paste the following script into the script editor of your Google Spreadsheet and save it. To use it, put the custom function =SAMPLE("https://sg.finance.yahoo.com/quote/SPY/history?p=SPY") in a cell. This runs the script.
function SAMPLE(url) {
  const html = UrlFetchApp.fetch(url).getContentText().match(/root.App.main = ([\s\S\w]+?);\n/);
  if (!html || html.length == 1) return "No data";
  const tempObj = JSON.parse(html[1].trim());
  const obj = tempObj.context.dispatcher.stores;
  const header = ["date", "amount", "open", "high", "low", "close", "adjclose", "volume"];
  return [header, ...obj.HistoricalPriceStore.prices
    .map(o => header.map(h => {
      if (h == "date") {
        return new Date(o[h] * 1000);
      } else if (h == "amount" && o[h]) {
        return `${o[h]} ${o.type}`;
      }
      return o[h];
    }))];
}
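To see what the regular expression and the mapping do without calling Yahoo, here is a minimal sketch in plain JavaScript. The payload and the HTML string are made up for illustration and only mimic the `root.App.main = {...};` assignment; the real page layout may differ or change. The function name extractPrices is hypothetical.

```javascript
// Hypothetical, heavily shortened stand-in for the JSON embedded in the page.
const payload = {
  context: { dispatcher: { stores: { HistoricalPriceStore: { prices: [
    { date: 1609459200, open: 1, high: 2, low: 0.5, close: 1.5, adjclose: 1.4, volume: 100 },
    { date: 1609545600, amount: 0.25, type: "DIVIDEND" },
  ] } } } },
};
const sampleHtml = `<script>\nroot.App.main = ${JSON.stringify(payload)};\n</script>`;

function extractPrices(html) {
  // Same idea as the pattern in SAMPLE(), with the dots escaped:
  // capture everything between "root.App.main = " and the first ";\n".
  const m = html.match(/root\.App\.main = ([\s\S]+?);\n/);
  if (!m) return "No data";
  const stores = JSON.parse(m[1].trim()).context.dispatcher.stores;
  const header = ["date", "amount", "open", "high", "low", "close", "adjclose", "volume"];
  const rows = stores.HistoricalPriceStore.prices.map(o =>
    header.map(h => {
      if (h === "date") return new Date(o[h] * 1000); // epoch seconds -> Date
      if (h === "amount" && o[h]) return `${o[h]} ${o.type}`; // e.g. a dividend row
      return o[h]; // undefined when the row has no such field
    })
  );
  return [header, ...rows];
}

const table = extractPrices(sampleHtml);
console.log(table.length); // header row plus two data rows
```

Rows that carry an "amount" (dividends, splits) have no open/high/low values, so those cells stay empty in the sheet.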
Testing:
When this script is run with =SAMPLE("https://sg.finance.yahoo.com/quote/SPY/history?p=SPY"), the following result is obtained.
Note:
The above script is for a custom function. If you want to run it from the script editor instead, you can also use the following sample script.
function myFunction() {
  const url = "https://sg.finance.yahoo.com/quote/SPY/history?p=SPY"; // This URL is from your question.
  const html = UrlFetchApp.fetch(url).getContentText().match(/root.App.main = ([\s\S\w]+?);\n/);
  if (!html || html.length == 1) return;
  const tempObj = JSON.parse(html[1].trim());
  const obj = tempObj.context.dispatcher.stores;
  const header = ["date", "amount", "open", "high", "low", "close", "adjclose", "volume"];
  const values = [header, ...obj.HistoricalPriceStore.prices
    .map(o => header.map(h => {
      if (h == "date") {
        return new Date(o[h] * 1000);
      } else if (h == "amount" && o[h]) {
        return `${o[h]} ${o.type}`;
      }
      return o[h];
    }))];
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Sheet1"); // Please set your sheet name.
  sheet.getRange(sheet.getLastRow() + 1, 1, values.length, values[0].length).setValues(values);
}
Note:
If const obj = tempObj.context.dispatcher.stores is the salted base64 data, please check this answer.
References:
Custom Functions in Google Sheets
map()

This is not possible because the Yahoo site uses a JavaScript element, the infinite scroll, which kicks in after the 100th value; that is why you can't get past that point. You can test this by disabling JavaScript for the site: whatever is left can be scraped.

It's possible with a workaround:
Later than 100 days:
Cell with green background: the code to search
Cells with orange background: cells containing formulas
Cells with yellow background: data returned
Formulas used:
=IMPORTXML(A1;"substring-before(substring-after(//script[@id='fc'],'{""prices"":'),',""isPending')")
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A3;"},{";"|");",";";");".";",")
=REGEXREPLACE(A4;"[a-z:{}\[\]""]+";"")
=TRANSPOSE(SPLIT(A5;"|"))
=(((C8/60)/60)/24)+DATE(1970;1;1)
IMPORTXML to import the data.
SUBSTITUTE and REGEXREPLACE to prepare the TRANSPOSE step.
TRANSPOSE to "build" the lines and SPLIT to "build" the columns.
DATE to transform timestamp to date.
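For readers who find the formula chain hard to follow, here is a rough JavaScript sketch of the same text clean-up and of the timestamp conversion. It is only an illustration: the input fragment is invented, the function names are hypothetical, and the locale-specific step that swaps "." for "," (decimal comma) is left out so the values can be parsed as numbers.

```javascript
// Mirrors the SUBSTITUTE / REGEXREPLACE / SPLIT steps on the text between
// '{"prices":' and ',"isPending' (the fragment below is a made-up sample).
function parsePriceFragment(fragment) {
  return fragment
    .split("},{").join("|")          // SUBSTITUTE: "},{" -> "|" (row separator)
    .replace(/,/g, ";")              // SUBSTITUTE: ","   -> ";" (column separator)
    .replace(/[a-z:{}\[\]"]+/g, "")  // REGEXREPLACE: drop keys and punctuation
    .split("|")                      // SPLIT on "|" gives one entry per row
    .map(row => row.split(";").map(Number));
}

// Same arithmetic as =(((C8/60)/60)/24)+DATE(1970;1;1):
// DATE(1970;1;1) is the serial number 25569 in Sheets.
function epochSecondsToSerial(seconds) {
  return seconds / 60 / 60 / 24 + 25569;
}

const fragment =
  '[{"date":1609459200,"open":1.5,"close":2},{"date":1609545600,"open":3,"close":4}]';
const rows = parsePriceFragment(fragment);
console.log(rows);
console.log(epochSecondsToSerial(rows[0][0]));
```

The resulting serial number still has to be formatted as a date in the sheet. Note that the regular expression only removes lowercase letters, so uppercase text in the data would survive the clean-up.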
Sheet

Answer:
IMPORTXML cannot retrieve data that is populated by a script, so this formula cannot retrieve the rows of this table that are loaded dynamically.
More Information:
As the first 100 values are loaded into the page without the use of JavaScript (as you can see by disabling JavaScript for https://sg.finance.yahoo.com/quote/SPY/history?p=SPY and reloading the page), the information can be retrieved by IMPORTXML.
As the data after the first 100 results is generated on-the-fly after scrolling down the page, the newly available data is not retrievable by IMPORTXML. As far as the formula sees, there is no 101st <tr> element, and so it displays #N/A: Imported content is empty.
References:
IMPORTXML - Docs Editors Help
Related Questions:
Google Sheets importXML Returns Empty Value


How to set a named range for a data validation programmatically (in Google apps script) in a Google spreadsheet?

Use Case
Example. I have a named range Apples (address "Sheet10!B2:B"), which is in use for data validation for plenty of sheet cells. The data range for Apples can be changed (in a script), e.g. to "Sheet10!D2:D".
It works from UI
I can manually set a named range as the data source of a data validation.
In this case, the data validation of a cell will always refer to the named range Apples, even after its data range is updated.
How to make it in Google Apps Script?
GAS Limits
The code, for setting data validation, should look like this, if you have a namedRange object:
mySheet.getRange('F5')
  .setDataValidation(
    SpreadsheetApp.newDataValidation()
      .requireValueInRange(
        namedRange.getRange()
      )
      .setAllowInvalid(false)
      .build()
  );
DataValidationBuilder.requireValueInRange() does not work here: it accepts only a Range object (not a NamedRange), so the rule stores the resolved range and keeps no reference to the named range.
Is there a workaround or so?
UPD1 - Spreadsheet.getRangeByName() does not work
Getting range by name does not help, the data validation will get actual range address.
SpreadsheetApp.getActive().getRangeByName("Apples")
UPD2 No way to make it so far in GAS
As @TheMaster posted, it's not possible at this moment.
Please +1 these issues:
https://issuetracker.google.com/issues/143913035
https://issuetracker.google.com/issues/203557342
P.S. It looks like the only solution that will work is the Google Sheets API.
I thought that, in your situation, your goal might be achievable when the Sheets API is used.
Workaround 1:
This workaround uses Sheets API.
Usage:
1. Prepare a Google Spreadsheet.
Please create a new Google Spreadsheet.
Following the example in your question (a named range Apples at "Sheet10!B2:B", used for data validation in many cells, whose range a script can change to, e.g., "Sheet10!D2:D"), please insert a sheet named "Sheet10" and put sample values in the cells "B2:B" and "D2:D".
Please define the range Sheet10!B2:B as a named range called Apple.
2. Sample script.
Please copy and paste the following script to the script editor of Spreadsheet and save the script. And, please enable Sheets API at Advanced Google services.
function myFunction() {
  const namedRangeName = "Apple"; // Please set the name of the named range.
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  const sheet = ss.getSheetByName("Sheet10");
  const requests = [{ updateCells: { range: { sheetId: sheet.getSheetId(), startRowIndex: 0, endRowIndex: 1, startColumnIndex: 0, endColumnIndex: 1 }, rows: [{ values: [{ dataValidation: { condition: { values: [{ userEnteredValue: "=" + namedRangeName }], type: "ONE_OF_RANGE" }, showCustomUi: true } }] }], fields: "dataValidation" } }];
  Sheets.Spreadsheets.batchUpdate({ requests }, ss.getId());
}
In this request, the name of the named range is directly put to userEnteredValue.
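Because the request is written on one long line, its shape is easy to miss. As a minimal sketch, the same request body can be built by a small helper; the function name and its parameters are hypothetical and not part of the Sheets API, and the field names are simply the ones used in the script above.

```javascript
// Builds one updateCells request that puts a "dropdown from range" rule
// on a single cell, pointing at a named range by name instead of by address.
function buildNamedRangeValidationRequest(sheetId, rowIndex, columnIndex, namedRangeName) {
  return {
    updateCells: {
      range: {
        sheetId,
        startRowIndex: rowIndex,        // 0-based, inclusive
        endRowIndex: rowIndex + 1,      // exclusive
        startColumnIndex: columnIndex,
        endColumnIndex: columnIndex + 1,
      },
      rows: [{
        values: [{
          dataValidation: {
            condition: {
              type: "ONE_OF_RANGE",
              // The name is passed as a formula-like string, e.g. "=Apple".
              values: [{ userEnteredValue: "=" + namedRangeName }],
            },
            showCustomUi: true,
          },
        }],
      }],
      fields: "dataValidation",
    },
  };
}

// Cell A1 (row 0, column 0) of the sheet with ID 0, as an example.
const request = buildNamedRangeValidationRequest(0, 0, 0, "Apple");
console.log(JSON.stringify(request, null, 2));
```

In Apps Script, the returned object would go into the requests array passed to Sheets.Spreadsheets.batchUpdate, with sheetId taken from sheet.getSheetId().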
3. Testing.
When this script is run to the above sample Spreadsheet, the following result is obtained.
In this demonstration, the named range "Apple" initially covers the cells "B1:B1000". When the script is run, data validation referring to the named range "Apple" is put in cell "A1", so its allowed values come from "B1:B1000". When the named range "Apple" is then changed from "B1:B1000" to "D1:D1000" and the data validation of "A1" is checked again, the allowed values have changed from "B1:B1000" to "D1:D1000".
Workaround 2:
This workaround uses the Google Spreadsheet service (SpreadsheetApp). At present, it seems that the Spreadsheet service cannot achieve your goal directly. This has already been mentioned in the comments and in TheMaster's answer. As workaround 2, how about using an OnChange trigger to check whether the range of the named range has changed, and rebuilding the data validation when it has?
Usage:
1. Prepare a Google Spreadsheet.
Please create a new Google Spreadsheet.
Following the example in your question (a named range Apples at "Sheet10!B2:B", used for data validation in many cells, whose range a script can change to, e.g., "Sheet10!D2:D"), please insert a sheet named "Sheet10" and put sample values in the cells "B2:B" and "D2:D".
Please define the range Sheet10!B2:B as a named range called Apple.
2. Sample script.
Please copy and paste the following script into the script editor of the Spreadsheet and save it. Then, please install an OnChange trigger for the function onChange.
First, please run createDataValidation. This puts data validation on the cell "A2" of "Sheet10". The range used is the one retrieved from the named range "Apple", so in this case it is Sheet10!B2:B1000.
As the next step, please change the range of the named range from Sheet10!B2:B1000 to Sheet10!D2:D1000. The onChange function is then run automatically by the installed OnChange trigger, the data validation of "A2" is updated, and its allowed values change accordingly.
const namedRangeName = "Apple"; // Please set the name of the named range.
const datavalidationCell = "Sheet10!A2"; // As a sample. data validation is put to this cell.
function onChange(e) {
  if (e.changeType != "OTHER") return;
  const range = e.source.getRangeByName(namedRangeName);
  const a1Notation = `'${range.getSheet().getSheetName()}'!${range.getA1Notation()}`;
  const prop = PropertiesService.getScriptProperties();
  const previousRange = prop.getProperty("previousRange");
  if (previousRange != a1Notation) {
    const rule = SpreadsheetApp.newDataValidation().requireValueInRange(e.source.getRangeByName(namedRangeName)).setAllowInvalid(false).build();
    e.source.getRange(datavalidationCell).setDataValidation(rule);
  }
  prop.setProperty("previousRange", a1Notation);
}
// First, please run this function.
function createDataValidation() {
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  const rule = SpreadsheetApp.newDataValidation().requireValueInRange(ss.getRangeByName(namedRangeName)).setAllowInvalid(false).build();
  ss.getRange(datavalidationCell).setDataValidation(rule);
  const prop = PropertiesService.getScriptProperties();
  const range = ss.getRangeByName(namedRangeName);
  const a1Notation = `'${range.getSheet().getSheetName()}'!${range.getA1Notation()}`;
  prop.setProperty("previousRange", a1Notation);
}
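The core of this workaround is a "compare with the last stored address" check. The following is a small sketch of that logic alone, in plain JavaScript, with a Map standing in for PropertiesService; the helper names are hypothetical.

```javascript
// Same address format the script stores: 'SheetName'!A1Notation
function quotedA1(sheetName, a1Notation) {
  return `'${sheetName}'!${a1Notation}`;
}

// Returns a function that reports whether the named range moved since the
// last call, and remembers the new address (as onChange does with properties).
function makeChangeDetector(store) {
  return function rangeChanged(currentA1) {
    const changed = store.get("previousRange") !== currentA1;
    store.set("previousRange", currentA1);
    return changed;
  };
}

// createDataValidation() seeds the stored value; a Map plays that role here.
const store = new Map([["previousRange", quotedA1("Sheet10", "B2:B1000")]]);
const rangeChanged = makeChangeDetector(store);

console.log(rangeChanged(quotedA1("Sheet10", "B2:B1000"))); // unchanged address
console.log(rangeChanged(quotedA1("Sheet10", "D2:D1000"))); // address moved
```

Only when the address differs does the real script rebuild the validation rule, which keeps the trigger cheap for unrelated changes.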
References:
Method: spreadsheets.batchUpdate
UpdateCellsRequest
DataValidationRule
Currently, this seems to be impossible. It is, however, a known issue. +1 this feature request if you want it implemented.
https://issuetracker.google.com/issues/143913035
Workarounds from the tracker issue creator:
If a validation rule is manually created with a NamedRange via the Sheets GUI, it can then be copied programmatically using Range.getDataValidations(), and subsequently used to programmatically create new DataValidations. DataValidations created this way maintain their connection to the NamedRange, and behave like their manually created counterparts. This demonstrates that the functionality to 'use' NamedRanges for data validation rules is already possible with Apps Scripts, but not the option to 'create' them.
As a half-answer, if you want just validation and can live without the drop-down list of valid values, you can programmatically set a custom formula that references the named range. This reference to the named range will not get expanded in Apps Script, so future changes to the named range's actual range will percolate to the validator. Like so:
mySheet.getRange('F5')
  .setDataValidation(
    SpreadsheetApp.newDataValidation()
      .requireFormulaSatisfied(
        '=EQ(F5, VLOOKUP(F5, ' + namedRange.getName() + ', 1))'
      )
      .setAllowInvalid(false)
      .build()
  );
(The formula just checks that the value in the cell being tested is equal to what VLOOKUP finds for that cell, in the first column -- I'm assuming the named range content is sorted.)
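As a small illustration of what string ends up in the rule, here is a sketch that builds the formula for any cell and named range. The first helper reproduces the formula above; the second is my own assumed alternative using COUNTIF, which should not depend on the named range being sorted. Both function names are hypothetical.

```javascript
// Formula from the answer above: relies on the named range being sorted,
// because VLOOKUP is called without an exact-match argument.
function sortedLookupFormula(cellA1, rangeName) {
  return `=EQ(${cellA1}, VLOOKUP(${cellA1}, ${rangeName}, 1))`;
}

// Assumed alternative (not from the original answer): count how many times
// the cell's value occurs in the named range and require at least one.
function countifFormula(cellA1, rangeName) {
  return `=COUNTIF(${rangeName}, ${cellA1}) > 0`;
}

console.log(sortedLookupFormula("F5", "Apples"));
console.log(countifFormula("F5", "Apples"));
```

Either string would be passed to requireFormulaSatisfied(); since the named range appears by name inside the formula, later changes to its address are picked up by the rule.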
Use getRangeByName()
function lfunko() {
  const ss = SpreadsheetApp.getActive();
  const sh = ss.getSheetByName("Sheet0");
  var cell = sh.getRange(1, 10); // location where data validation is applied
  var rule = SpreadsheetApp.newDataValidation().requireValueInRange(ss.getRangeByName("MyList")).build();
  cell.setDataValidation(rule);
}

How to output a hyperlink from Apps Script function into Google Sheet

My first attempt was raw html, but that clearly didn't work.
I found that I'm supposed to use rich text, so I tried:
function youtubeLink(yt_id, start_stamp, end_stamp) {
  const start_secs = toSecs(start_stamp)
  const end_secs = toSecs(end_stamp)
  const href = `https://www.youtube.com/embed/${yt_id}?start=${start_secs}&end=${end_secs}`
  return (
    SpreadsheetApp.newRichTextValue()
      .setText("Youtube Link")
      .setLinkUrl(href)
      .build()
  )
}
I'm calling with:
=youtubeLink(A1,A2,A3)
But that didn't work at all. The field just stayed blank.
I tried with a range, but got a circular reference. It seems like this should be easy. Not sure what I'm missing.
This works, but it is auto-formatted and the link text is the same as the link:
function youtubeLink(yt_id, start_stamp, end_stamp) {
  const start_secs = toSecs(start_stamp)
  const end_secs = toSecs(end_stamp)
  return (`https://www.youtube.com/embed/${yt_id}?start=${start_secs}&end=${end_secs}`)
}
Unfortunately, a custom function cannot put a RichTextValue or a built-in function into a cell directly; whatever it returns is written as a plain string value. So a workaround is required. In this answer, I would like to propose the following 2 patterns.
Pattern 1:
If you want to use the functions of Spreadsheet, how about the following sample formula?
=HYPERLINK("https://www.youtube.com/embed/"&A1&"?start="&toSecs(B1)&"&end="&toSecs(C1),"Youtube Link")
In this case, the cells "A1", "B1" and "C1" are yt_id, start_stamp, end_stamp, respectively.
The function of toSecs is used from Google Apps Script.
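The question does not show toSecs, so here is one possible sketch of it together with the URL it feeds. This is an assumption, not the asker's code: it treats the stamps as text such as "1:30" or "1:02:03" (if the cells hold real time values, the custom function would receive something else and would need different handling). The helper embedUrl is hypothetical as well.

```javascript
// Hypothetical toSecs: "ss", "mm:ss" or "hh:mm:ss" text -> number of seconds.
function toSecs(stamp) {
  return String(stamp)
    .split(":")
    .reduce((total, part) => total * 60 + Number(part), 0);
}

// Builds the same embed URL as in the question.
function embedUrl(ytId, startStamp, endStamp) {
  return `https://www.youtube.com/embed/${ytId}?start=${toSecs(startStamp)}&end=${toSecs(endStamp)}`;
}

console.log(toSecs("1:30"));    // 90
console.log(toSecs("1:02:03")); // 3723
console.log(embedUrl("abc123", "0:10", "1:00"));
```

With a toSecs like this saved in the bound script, the HYPERLINK formula of Pattern 1 can call it as a custom function for the start and end cells.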
Pattern 2:
If you want to use Google Apps Script, how about the following sample script? In this case, this script supposes that the values of yt_id, start_stamp, end_stamp are put in the cells "A1", "B1", and "C1", respectively. Please be careful about this.
function sample() {
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Sheet1"); // Please set the sheet name.
  const [yt_id, start_stamp, end_stamp] = sheet.getRange("A1:C1").getValues()[0];
  const start_secs = toSecs(start_stamp);
  const end_secs = toSecs(end_stamp);
  const href = `https://www.youtube.com/embed/${yt_id}?start=${start_secs}&end=${end_secs}`
  const richtextValue = SpreadsheetApp.newRichTextValue().setText("Youtube Link").setLinkUrl(href).build();
  sheet.getRange("D1").setRichTextValue(richtextValue);
}
When this script is run, the values of yt_id, start_stamp, end_stamp are retrieved from the cells "A1", "B1" and "C1", and the text with the hyperlink is put to the cell "D1".
Reference:
setRichTextValue(value)

Converting Google Sheets IF formula to Apps script

As a test, I have entered the following formula in cell K2 of my spreadsheet: =IF($M2=today(),"Today"). This achieves the desired effect and I would like it to be applied to all the rows below (with M3 feeding K3, M4 feeding K4, etc.). HOWEVER, this sheet is updated via Google Form, so I cannot leave a formula in these cells as it will be overwritten.
Therefore I need to write and run the formula in Apps Script but, whilst I have enough knowledge of the script language to write basic If functions, this one is beyond my skills.
I have referred to this: How to get range and then set value in Google Apps Script and tried to adapt it to my purposes but to no avail.
Could someone please enlighten me?
Try this script:
function isMToday() {
  const sheet = SpreadsheetApp.getActiveSheet();
  const lastRow = sheet.getLastRow();
  if (lastRow < 2) return; // nothing below the header row
  // get M2:M range
  const mRange = sheet.getRange(2, 13, lastRow - 1, 1);
  // get display values instead to avoid timezone issues
  const mValues = mRange.getDisplayValues();
  const today = new Date();
  // check every mValue, if today, return "Today", else false
  const output = mValues.map(([mValue]) => {
    const mValueDate = new Date(mValue);
    if (mValueDate.getDate() == today.getDate() &&
      mValueDate.getMonth() == today.getMonth() &&
      mValueDate.getFullYear() == today.getFullYear())
      return ["Today"];
    else
      return [false];
  });
  // write output to K2:K
  mRange.offset(0, -2).setValues(output);
}
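The date check at the heart of this script can be tried outside the spreadsheet. Below is a minimal sketch in plain JavaScript; the sample strings are invented and assume display values like "3/5/2024", and the function names are hypothetical.

```javascript
// True when both dates fall on the same calendar day (local time).
function isSameDay(a, b) {
  return a.getFullYear() === b.getFullYear() &&
    a.getMonth() === b.getMonth() &&
    a.getDate() === b.getDate();
}

// Takes a 2D array shaped like getDisplayValues() output (one column) and
// returns a 2D array shaped for setValues(): "Today" or false per row.
function flagToday(displayValues, today) {
  return displayValues.map(([text]) =>
    [isSameDay(new Date(text), today) ? "Today" : false]);
}

// Assumed sample data; blank cells produce an invalid Date and therefore false.
const sample = [["3/5/2024"], ["3/6/2024"], [""]];
console.log(flagToday(sample, new Date(2024, 2, 5, 15, 30)));
```

Comparing year, month and day separately ignores the time of day, which is why a form timestamp with a time part would still be flagged.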
You do not need Apps Script to do this even though it is on a Form Responses tab.
Instead of this formula in cell K2 (as you described):
=IF($M2=today(),"Today")
Use this formula in cell K1, and delete all the other formulas in column K:
=ARRAYFORMULA(IF(ROW(M:M)=1,"Today?",IF(M:M=TODAY(),"Today")))

Google Sheets Print All IDs in one click

I'm working on a student information template and I'm wondering whether it's possible to print the data for every student in one go. I used data validation on my spreadsheet to switch the student ID so that each student's data can be easily viewed and printed. Because I have to print it one by one for each pupil, this is a time-consuming process, so I came up with this kind of flow to save time. Is it possible?
Please see the sample spreadsheet.
Upon printing, there should be a different page per student.
I believe your goal is as follows.
You want to reduce the cost of printing the data from the Spreadsheet.
I thought that printing the Spreadsheet with Google Apps Script could be done through Google Cloud Print, but the settings might be a bit complicated. So I would like to propose a workaround. How about the following flow?
Retrieve the values from the source sheet and create the output values as the array.
Create a new Spreadsheet and put each value on each page.
Print out the Spreadsheet.
By this flow, you can print out all pages by one manual process.
The sample script is as follows.
Sample script:
Please copy and paste the following script to the script editor of Google Spreadsheet. And run myFunction.
function myFunction() {
  const rowHeader = ["STUD ID", "LAST NAME", "FIRST NAME", "MIDDLE NAME", "NAME EXTENSION"];
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("DATA");
  const [header, ...values] = sheet.getDataRange().getValues();
  const ar = values.map(r => header.reduce((o, h, i) => Object.assign(o, { [h.toUpperCase()]: r[i] }), {}));
  const newValues = ar.map(e => [rowHeader, ...[rowHeader.map(f => e[f])]]).map(e => e[0].map((_, c) => e.map(r => r[c])));
  const ss = SpreadsheetApp.create("tempSpreadsheet");
  newValues.forEach((v, i) => (i == 0 ? ss.getSheets()[0] : ss.insertSheet()).getRange(1, 1, v.length, v[0].length).setValues(v));
}
When you run this script, a new Spreadsheet named tempSpreadsheet is created in the root folder. When you open it, you can see the expected values on each worksheet, one student per sheet, and you can print them all out.
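The two dense lines that build ar and newValues turn each data row into a small two-column table (field name, value). Here is a sketch of that reshaping in plain JavaScript, written more verbosely; the sample header and rows are invented, and the function name is hypothetical.

```javascript
// For each data row, produce a 2D array of [field, value] pairs,
// i.e. the content of one printable sheet per student.
function buildStudentPages(header, rows, fields) {
  // Step 1: row -> object keyed by upper-cased header text.
  const records = rows.map(r =>
    Object.fromEntries(header.map((h, i) => [String(h).toUpperCase(), r[i]])));
  // Step 2: object -> vertical table in the order given by `fields`.
  return records.map(rec => fields.map(f => [f, rec[f]]));
}

// Assumed sample data, shaped like getDataRange().getValues().
const header = ["Stud ID", "Last Name", "First Name"];
const rows = [
  [1001, "Cruz", "Ana"],
  [1002, "Reyes", "Ben"],
];
const pages = buildStudentPages(header, rows, ["STUD ID", "LAST NAME", "FIRST NAME"]);
console.log(pages[0]);
```

Each element of the result has the right shape for getRange(1, 1, v.length, v[0].length).setValues(v) on its own sheet.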
Note:
This sample script is prepared from your sample Spreadsheet. So when your sample Spreadsheet is different from your actual situation, this sample script might not be able to be used. Please be careful about this.
References:
reduce()
map()
create(name)

Modifying Google Chart - Vertical Axis 'number format' reverts to "from source data" after I update my chart title by script (shows date) dynamically

I'm trying to update the title of my chart through the code so it is changed dynamically. When this happens, the title changes successfully, but the number format for the vertical axis changes to dates (changes to from source data).
I'm not sure where it's getting the dates either because the source data is set to Number. Changing it to 'format by' anything else using the script has no effect. It just ignores it.
Even a macro doesn't work. When I try to do this with a macro I get the error: "Exception: Unexpected error while getting the method or property setOption on object SpreadsheetApp.EmbeddedChartBuilder."
I have this code that runs when the button is pressed. It changes the title of the chart to the value in K1 but I don't understand why it changes the vertical axis to dates as well. I just want it to remain at "none" or "decimal":
function updateTitle() {
  var sheet = SpreadsheetApp.getActive().getActiveSheet();
  var chart = sheet.getCharts()[0];
  var title = sheet.getRange('K1').getValue();
  chart = chart.modify()
    .setOption('title', title || 'Empty')
    .setOption('vAxis.format', 'none')
    .build();
  sheet.updateChart(chart);
}
I'm thinking the formatting of the code must be incorrect, is that true? I've searched all over the Internet, and I've found tons of different ways to format modifying google charts, and have tried many different variations. I'm not getting it right and am kind of lost about what format is correct and why what I'm doing isn't working.
Here is an example of my chart: https://docs.google.com/spreadsheets/d/16vmDv5sJvle4hLXaz1ZTgkP8V2-5dlWJvdJ9eqPxc-w/edit?usp=sharing
Thank you
Issue and workaround:
I have experienced the same situation as yours. In this case, even when the following simple script is run, the number format is changed.
var sheet = SpreadsheetApp.getActive().getActiveSheet();
var chart = sheet.getCharts()[0];
sheet.updateChart(chart);
Unfortunately, when I looked at that time for a way to keep the number format with updateChart, I couldn't find one. So in this answer, I would like to propose using the Sheets API for your situation. In my experience, when the Sheets API is used, the title can be updated without changing the number format of the vertical axis.
When your script is modified, it becomes as follows.
Sample script:
Before you use this script, please enable Sheets API at Advanced Google services.
function updateTitle() {
  var ss = SpreadsheetApp.getActive();
  var sheet = ss.getActiveSheet();
  var chart = sheet.getCharts()[0];
  var title = sheet.getRange('K1').getValue();
  var chartId = chart.getChartId();
  sheet.getRange("F3:H5").setNumberFormat("0");
  SpreadsheetApp.flush();
  var sheets = Sheets.Spreadsheets.get(ss.getId(), {fields: "sheets(charts)"}).sheets;
  var c = sheets.reduce((ar, s) => {
    // Sheets without charts have no "charts" property, so fall back to an empty array.
    var temp = (s.charts || []).filter(c => c.chartId == chartId);
    if (temp.length == 1) ar = ar.concat(temp);
    return ar;
  }, []);
  if (c.length == 1) {
    var chartObj = c[0];
    delete chartObj.position;
    chartObj.spec.title = title || 'Empty';
    Sheets.Spreadsheets.batchUpdate({requests: [{updateChartSpec: chartObj}]}, ss.getId());
  }
}
Result:
When above script is used for sample chart of your sample Spreadsheet, the following result is obtained.
Note:
In this case, the number format of the cells "F3:H5" is reflected in the number format of the vertical axis. So if you want a format like 1.00, 2.00, and so on, please remove sheet.getRange("F3:H5").setNumberFormat("0") and SpreadsheetApp.flush() from the above script.
In the above script, because fields cannot be used with UpdateChartSpecRequest, it is first necessary to retrieve the whole chart object, modify it, and then put the modified object back to the chart. This has already been reported to the Google issue tracker as a feature request.
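The "retrieve everything, change one field, send everything back" step can be sketched on plain objects, without the Sheets service. The input below only imitates the shape of a spreadsheets.get response restricted to sheets(charts); the sample data and the function name are hypothetical.

```javascript
// Finds the chart with the given ID in a spreadsheets.get-style response and
// returns an updateChartSpec request carrying the full spec with a new title.
function buildTitleUpdateRequest(sheets, chartId, title) {
  // Sheets without charts come back without a "charts" property.
  const matches = sheets.flatMap(s =>
    (s.charts || []).filter(c => c.chartId === chartId));
  if (matches.length !== 1) return null;
  const { spec } = matches[0];
  // The whole spec is sent back; only the title differs.
  return { updateChartSpec: { chartId, spec: { ...spec, title: title || "Empty" } } };
}

// Assumed sample response: one sheet without charts, one with a single chart.
const sampleSheets = [
  {},
  { charts: [{ chartId: 7, position: {}, spec: { title: "Old", basicChart: {} } }] },
];
console.log(JSON.stringify(buildTitleUpdateRequest(sampleSheets, 7, "New title")));
```

Copying the spec with the spread operator leaves the retrieved object untouched, and leaving position out of the request corresponds to the delete chartObj.position line in the script.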
References:
Method: spreadsheets.get
Method: spreadsheets.batchUpdate
UpdateChartSpecRequest
This happens if you have changed the vertical axis format in any way after the chart was first made. To avoid it, you must leave the vertical axis format in its original state and never change it.