I am new to decoding, so I am unable to decode the response below while using UrlFetchApp.
This is the function I am using to fetch the data, but the values come back as Unicode escape sequences that I need to decode. Is there a function I can use to decode this?
function scrape() {
  var url = UrlFetchApp.fetch("https://immi.homeaffairs.gov.au/visas/working-in-australia/skillselect/invitation-rounds");
  var elements = /id="ctl00_PlaceHolderMain_PageSchemaHiddenField_Input" (.*?)\/>/gim;
  var x = url.getContentText().match(elements);
  Logger.log(x);
}
Although I'm not sure whether this is the best way, how about this modification?
Modified script:
function scrape() {
  var url = UrlFetchApp.fetch("https://immi.homeaffairs.gov.au/visas/working-in-australia/skillselect/invitation-rounds");
  var elements = /id="ctl00_PlaceHolderMain_PageSchemaHiddenField_Input" (.*?)\/>/gim;
  var x = url.getContentText().match(elements);
  var res = unescape(x[0].replace(/\\u/g, "%u")); // Added
  Logger.log(res);
}
Result:
When the modified script above is used, as a sample, the values are converted as follows.
From:
\u003cp\u003eThe table below shows the number of invitations issued in the SkillSelect invitation round on 11 September 2018.\u003c/p\u003e\n\n\u003ch3\u003eInvitations issued on 11 September 2018\u003c/h3\u003e\n\n
To:
<p>The table below shows the number of invitations issued in the SkillSelect invitation round on 11 September 2018.</p>\n\n<h3>Invitations issued on 11 September 2018</h3>\n\n
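As a side note, since unescape() is a deprecated function, the same decoding can also be done with String.fromCharCode. This is only a minimal sketch that reuses the matched value x[0] from the script above:
// A sketch: decode each \uXXXX escape by converting its hex code to a character.
var decoded = x[0].replace(/\\u([\dA-Fa-f]{4})/g, function(_, hex) {
  return String.fromCharCode(parseInt(hex, 16));
});
Logger.log(decoded);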
References:
unescape()
replace()
If I misunderstand your question, I'm sorry.
I have built a simple custom function in Apps Script using UrlFetchApp to get the follower count for TikTok accounts.
function tiktok_fans() {
  var raw_data = new RegExp(/("followerCount":)([0-9]+)/g);
  var handle = '#charlidamelio';
  var web_content = UrlFetchApp.fetch('https://www.tiktok.com/' + handle + '?lang=en').getContentText();
  var match_text = raw_data.exec(web_content);
  var result = match_text[2];
  Logger.log(result);
  return result;
}
The Log comes back with the correct number for followers.
However, when I change the code to:
function tiktok_fans(handle) {
  var raw_data = new RegExp(/("followerCount":)([0-9]+)/g);
  //var handle = '#charlidamelio';
  var web_content = UrlFetchApp.fetch('https://www.tiktok.com/' + handle + '?lang=en').getContentText();
  var match_text = raw_data.exec(web_content);
  var result = match_text[2];
  Logger.log(result);
  return result;
}
and use it in a spreadsheet, for example =tiktok_fans(A1) where A1 contains #charlidamelio, I get an #ERROR! response in the cell:
TypeError: Cannot read property '2' of null (line 6).
Why does it work in the logs but not in the spreadsheet?
--additional info--
Still getting the same error after testing @Tanaike's answer below: "TypeError: Cannot read property '2' of null (line 6)."
I have mapped it out manually to see the error; each time the code below runs, a different log returns "null". I believe this is to do with the ContentText size or the cache. I have tried utilising Utilities.sleep() between the fetches with no luck; I still get nulls.
Code:
var raw_data = new RegExp(/("followerCount":)([0-9]+)/g);
//tiktok urls
var qld = UrlFetchApp.fetch('https://www.tiktok.com/#thisisqueensland?lang=en').getContentText();
var nsw = UrlFetchApp.fetch('https://www.tiktok.com/#visitnsw?lang=en').getContentText();
var syd = UrlFetchApp.fetch('https://www.tiktok.com/#sydney?lang=en').getContentText();
var tas = UrlFetchApp.fetch('https://www.tiktok.com/#tasmania?lang=en').getContentText();
var nt = UrlFetchApp.fetch('https://www.tiktok.com/#ntaustralia?lang=en').getContentText();
var nz = UrlFetchApp.fetch('https://www.tiktok.com/#purenz?lang=en').getContentText();
var aus = UrlFetchApp.fetch('https://www.tiktok.com/#australia?lang=en').getContentText();
var vic = UrlFetchApp.fetch('https://www.tiktok.com/#visitmelbourne?lang=en').getContentText();
//find followers with regex
var match_qld = raw_data.exec(qld);
var match_nsw = raw_data.exec(nsw);
var match_syd = raw_data.exec(syd);
var match_tas = raw_data.exec(tas);
var match_nt = raw_data.exec(nt);
var match_nz = raw_data.exec(nz);
var match_aus = raw_data.exec(aus);
var match_vic = raw_data.exec(vic);
Logger.log(match_qld);
Logger.log(match_nsw);
Logger.log(match_syd);
Logger.log(match_tas);
Logger.log(match_nt);
Logger.log(match_nz);
Logger.log(match_aus);
Logger.log(match_vic);
Issue:
From your situation, I remembered that a UrlFetchApp request made from a custom function is different from a UrlFetchApp request made from the script editor. So I thought that the reason for your issue might be related to this thread: https://stackoverflow.com/a/63024816. Your situation seems to be the opposite of that thread, but this issue is considered to be due to the specification of the site.
In order to check this difference, I compared the file size of the retrieved HTML data.
The HTML data retrieved by UrlFetchApp executed from the script editor is about 518 KB.
The HTML data retrieved by UrlFetchApp executed from the custom function is about 9 KB.
It seems that a UrlFetchApp request made from a custom function is the same as one made from Web Apps; the 9 KB of data is what is retrieved in that case.
From the above result, it is clear that the retrieved HTML is different between the script editor and the custom function. Namely, the HTML data retrieved by the custom function doesn't contain any text matching the regex of ("followerCount":)([0-9]+), which is why the error occurs. I thought that this might be the reason for your issue.
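If you would like to confirm this difference on your side, logging the length of the retrieved HTML from both the script editor and a custom function is a quick check. A minimal sketch (the function name checkSize is just an example):
// A sketch: log the size of the HTML retrieved for a handle, so a run from
// the script editor can be compared with a run as a custom function.
function checkSize(handle) {
  var html = UrlFetchApp.fetch('https://www.tiktok.com/' + handle + '?lang=en').getContentText();
  Logger.log(html.length + " characters retrieved for " + handle);
  return html.length;
}
Running it once from the script editor and once as =checkSize(A1) in a cell should show the roughly 518 KB versus 9 KB difference described above.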
Workaround:
When I tested your situation with Web Apps and triggers, the same issue occurred. From this, at the current stage, I think that methods which execute the script automatically cannot be used. So, as a workaround, how about using a button or a custom menu? When the script is run from a button or a custom menu, it works; this method seems to behave the same as running it from the script editor.
The sample script is as follows.
Sample script:
Before you run the script, please set range. Then assign this function to a button on the Spreadsheet; when you click the button, the script is run. In this sample, it is assumed that values like #charlidamelio are put in column "A".
function sample() {
  var range = "A2:A10"; // Please set the range of "handle".
  var raw_data = new RegExp(/("followerCount":)([0-9]+)/g);
  var sheet = SpreadsheetApp.getActiveSheet();
  var r = sheet.getRange(range);
  var values = r.getValues();
  var res = values.map(([handle]) => {
    if (handle != "") {
      raw_data.lastIndex = 0; // Reset, because the regex has the g flag and is reused for every handle.
      var web_content = UrlFetchApp.fetch('https://www.tiktok.com/' + handle + '?lang=en').getContentText();
      var match_text = raw_data.exec(web_content);
      return [match_text[2]];
    }
    return [""];
  });
  r.offset(0, 1).setValues(res);
}
When this script is run, the values are retrieved from the URLs and put in column "B".
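If you would rather use a custom menu than an assigned button, an onOpen simple trigger can add a menu item that runs sample(). A minimal sketch; the menu and item names are just placeholders:
// A sketch: add a custom menu so sample() can be run from the menu bar
// instead of a drawing button.
function onOpen() {
  SpreadsheetApp.getUi()
    .createMenu("TikTok")
    .addItem("Update follower counts", "sample")
    .addToUi();
}
After reopening the spreadsheet, the menu appears, and clicking the item runs the same script as the button would.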
Note:
This is a simple script. So please modify it for your actual situation.
Reference:
Related thread.
UrlFetchApp request fails in Menu Functions but not in Custom Functions (connecting to external REST API)
Added:
About the following additional question,
Whilst this works for one TikTok handle, when trying to run a list of multiple handles it fails each time with the error TypeError: Cannot read property '2' of null. After doing some investigating and manually mapping out 8 handles, I can see that each time it runs, it returns "null" for one or more of the web_content variables. Is there a way to slow the script down or run each UrlFetchApp one at a time to ensure each returns content?
I've tried this and am still getting an error. I have tried up to 10000 ms. I've added some more detail to the original question; hope this makes sense as to the error. It is always a different log that returns nulls, hence why I think it's a timing or cache issue.
In this case, how about the following sample script?
Sample script:
In this sample script, when the value cannot be retrieved from the URL, the fetch is retried. This sample uses 2 retries, so when the value still cannot be retrieved after 2 retries, an empty value is returned.
function sample() {
  var range = "A2:A10"; // Please set the range of "handle".
  var raw_data = new RegExp(/("followerCount":)([0-9]+)/g);
  var sheet = SpreadsheetApp.getActiveSheet();
  var r = sheet.getRange(range);
  var values = r.getValues();
  var res = values.map(([handle]) => {
    if (handle != "") {
      raw_data.lastIndex = 0; // Reset, because the regex has the g flag and is reused for every handle.
      var web_content = UrlFetchApp.fetch('https://www.tiktok.com/' + handle + '?lang=en').getContentText();
      var match_text = raw_data.exec(web_content);
      if (!match_text || match_text.length != 3) {
        var retry = 2; // Number of retries.
        for (var i = 0; i < retry; i++) {
          Utilities.sleep(3000);
          web_content = UrlFetchApp.fetch('https://www.tiktok.com/' + handle + '?lang=en').getContentText();
          raw_data.lastIndex = 0;
          match_text = raw_data.exec(web_content);
          if (match_text && match_text.length == 3) break; // "&&" here: "||" would throw when match_text is null.
        }
      }
      return [match_text && match_text.length == 3 ? match_text[2] : ""];
    }
    return [""];
  });
  r.offset(0, 1).setValues(res);
}
Please adjust the value of retry and Utilities.sleep(3000).
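As another idea, UrlFetchApp.fetchAll can issue all the requests in a single call instead of one fetch per handle. This is only a sketch under the same assumptions as the script above (handles in "A2:A10", results written to column "B"; the function name sampleFetchAll is arbitrary), and I can't guarantee it avoids the intermittent nulls, since those seem to depend on what TikTok returns.
// A sketch: fetch all non-empty handles with one fetchAll call, then apply
// the same followerCount regex to each response. Empty rows stay empty.
function sampleFetchAll() {
  var range = "A2:A10"; // Please set the range of "handle".
  var sheet = SpreadsheetApp.getActiveSheet();
  var r = sheet.getRange(range);
  var values = r.getValues();
  var handles = values.map(([h]) => h).filter(h => h != "");
  var requests = handles.map(h => 'https://www.tiktok.com/' + h + '?lang=en');
  var responses = UrlFetchApp.fetchAll(requests);
  var counts = {};
  handles.forEach((h, i) => {
    var m = /("followerCount":)([0-9]+)/.exec(responses[i].getContentText());
    counts[h] = m ? m[2] : "";
  });
  var res = values.map(([h]) => [h != "" ? counts[h] : ""]);
  r.offset(0, 1).setValues(res);
}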
This works for me as a Custom Function:
function MYFUNK(n = 2) {
  const url = 'my website url';
  const re = new RegExp(`<p id="un${n}.*\/p>`, 'g');
  const r = UrlFetchApp.fetch(url).getContentText();
  const v = r.match(re);
  Logger.log(v);
  return v;
}
I used my own website, which has several paragraphs with ids from un1 to un7, and I'm taking the value of A1 for the only parameter. It returns the correct string each time I change it.
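For example, with the paragraph number in A1, the call in a cell is:
=MYFUNK(A1)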
I'm attempting to scrape options pricing data from Yahoo Finance in Google Sheets. Although I'm able to pull the options chain just fine, i.e.
=IMPORTHTML("https://finance.yahoo.com/quote/TCOM/options?date=1610668800","table",2)
I find that it's returning results that don't completely match what's actually shown on Yahoo Finance. Specifically, the scraped results are incomplete: they're missing some strikes. For example, the first 5 rows of the chart may match, but then it starts returning only every other strike (skipping every other strike).
Why would IMPORTHTML be returning "abbreviated" results, which don't match what's actually shown on the page? And more importantly, is there some way to scrape complete data (i.e. that doesn't skip some portion of the available strikes)?
In Yahoo Finance, all the data is available in a big JSON object called root.App.main. So to get the complete set of data, proceed as follows:
var source = UrlFetchApp.fetch(url).getContentText(); // url: e.g. "https://finance.yahoo.com/quote/TCOM/options?date=1610668800"
var jsonString = source.match(/(?<=root.App.main = ).*(?=}}}})/g) + '}}}}';
var data = JSON.parse(jsonString);
You can then pick out the information you need. Take a copy of this example: https://docs.google.com/spreadsheets/d/1sTA71PhpxI_QdGKXVAtb0Rc3cmvPLgzvXKXXTmiec7k/copy
Edit
If you want to get a full list of the available data, you can retrieve it with this simple script:
// mike.steelson
let result = [];

function getAllDataJSON(url = 'https://finance.yahoo.com/quote/TCOM/options?date=1610668800') {
  var source = UrlFetchApp.fetch(url).getContentText();
  var jsonString = source.match(/(?<=root.App.main = ).*(?=}}}})/g) + '}}}}';
  var data = JSON.parse(jsonString);
  getAllData(eval(data), 'data');
  var sh = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  sh.getRange(1, 1, result.length, result[0].length).setValues(result);
}

function getAllData(obj, id) {
  const regex = new RegExp('[^0-9]+');
  for (let p in obj) {
    var newid = (regex.test(p)) ? id + '["' + p + '"]' : id + '[' + p + ']';
    if (obj[p] != null) {
      if (typeof obj[p] != 'object' && typeof obj[p] != 'function') {
        result.push([newid, obj[p]]);
      }
      if (typeof obj[p] == 'object') {
        getAllData(obj[p], newid);
      }
    }
  }
}
Here's a simpler way to get the last market price of a given option. Add this function to your Google Sheets Script Editor.
function OPTION(ticker) {
  var ticker = ticker + "";
  var URL = "finance.yahoo.com/quote/" + ticker;
  var html = UrlFetchApp.fetch(URL).getContentText();
  var count = (html.match(/regularMarketPrice/g) || []).length;
  var query = "regularMarketPrice";
  var loc = 0;
  // Walk to the (count - 2)th occurrence of "regularMarketPrice".
  var n = parseInt(count) - 2;
  for (var i = 0; i < n; i++) {
    loc = html.indexOf(query, loc + 1);
  }
  // Skip the 9 characters of '":{"raw":' and read the number up to the next comma.
  var value = html.substring(loc + query.length + 9, html.indexOf(",", loc + query.length + 9));
  return value * 100;
}
In your Google Sheets, input the Yahoo Finance option ticker like below:
=OPTION("AAPL210430C00060000")
I believe your goal is as follows.
You want to retrieve the complete table from the URL https://finance.yahoo.com/quote/TCOM/options?date=1610668800 and put it into the Spreadsheet.
Issue and workaround:
I could replicate your issue. When I looked at the HTML data, unfortunately, I couldn't find any difference in the HTML between the rows that are shown and the rows that are not. I could also confirm that the complete table is included in the HTML data. By the way, when I tested it using =IMPORTXML(A1,"//section[2]//tr"), the same result as IMPORTHTML occurred. So I think that in this case, IMPORTHTML and IMPORTXML might not be able to retrieve the complete table.
So, in this answer, as a workaround, I would like to propose putting the complete table into the sheet by parsing it with the Sheets API. In this case, Google Apps Script is used. With this, I could confirm that the complete table can be retrieved by parsing the HTML table with the Sheets API.
Sample script:
Please copy and paste the following script into the Spreadsheet's script editor, and enable the Sheets API under Advanced Google services. Then run the function myFunction from the script editor. The retrieved table is put into the sheet named in sheetName.
function myFunction() {
  // Please set the following variables.
  const url = "https://finance.yahoo.com/quote/TCOM/options?date=1610668800";
  const sheetName = "Sheet1"; // Please set the destination sheet name.
  const sessionNumber = 2; // Please set the number of session. In this case, the table of the 2nd session is retrieved.

  const html = UrlFetchApp.fetch(url).getContentText();
  const section = [...html.matchAll(/<section[\s\S\w]+?<\/section>/g)];
  if (section.length >= sessionNumber) {
    if (section[sessionNumber].length == 1) {
      const table = section[sessionNumber][0].match(/<table[\s\S\w]+?<\/table>/);
      if (table) {
        const ss = SpreadsheetApp.getActiveSpreadsheet();
        const body = {requests: [{pasteData: {html: true, data: table[0], coordinate: {sheetId: ss.getSheetByName(sheetName).getSheetId()}}}]};
        Sheets.Spreadsheets.batchUpdate(body, ss.getId());
      }
    } else {
      throw new Error("No table.");
    }
  } else {
    throw new Error("No table.");
  }
}
const sessionNumber = 2; corresponds to the 2 in =IMPORTHTML("https://finance.yahoo.com/quote/TCOM/options?date=1610668800","table",2).
References:
Method: spreadsheets.batchUpdate
PasteDataRequest
I am trying to develop a program that automatically fills out a Google Form using the data provided in Google Sheets.
This is my code.
function auto_data_entry() {
  var formURL = "(URL of the form would be put here)";
  var workbook = SpreadsheetApp.getActiveSpreadsheet();
  var worksheet = workbook.getSheetByName("Sheet1");
  var full_name = worksheet.getRange("A2").getValue();
  var year = worksheet.getRange("B2").getValue();
  var month = worksheet.getRange("C2").getValue();
  var day = worksheet.getRange("D2").getValue();
  var period = worksheet.getRange("E2").getValue();
  var datamap = {
    "entry.1901360617": full_name,
    "entry.43103907_year": year,
    "entry.43103907_month": month,
    "entry.43103907_day": day,
    "entry.1047848587": period
  };
  var options = {
    "method": "post",
    "payload": datamap
  };
  UrlFetchApp.fetch(formURL, options); //Line 27
}
However, it returns...
Exception: Request failed for https://docs.google.com returned code 401.
Truncated server response: <!DOCTYPE html><html lang="en"><head><meta name="description"
content="Web word processing, presentations and spreadsheets"><meta name="viewport" c...
(use muteHttpExceptions option to examine full response) (line 27, file "Code")
Is the problem that I am using a school-owned Google account, or is there an error in my code?
I am very lost and would appreciate it if someone could help out.
There is no need to use UrlFetchApp because you can use the Class FormResponse and the Class ItemResponse. This code will help you with your issue:
function autoDataEntry() {
  // Get the desired form with its questions and create
  // a response to later be submitted
  var form = FormApp.openById("YOUR-FORM-ID");
  var formResponse = form.createResponse();
  var formQuestions = form.getItems();
  var workbook = SpreadsheetApp.getActiveSpreadsheet();
  var worksheet = workbook.getSheetByName("Sheet1");
  // Get all the needed values in the second row
  var answers = worksheet.getRange("A2:E2").getValues();
  answers[0].forEach((answer, answerNumber) => {
    // Get the question depending on its type
    var question = getQuestion(formQuestions, answerNumber);
    // Create the response to the question with the value obtained from the sheet
    var formAnswer = question.createResponse(answer);
    // Add the answer to the response
    formResponse.withItemResponse(formAnswer);
  });
  // Submit the form response
  formResponse.submit();
}
What I did was get the form where you want to send your response and the sheet where the answers are. Then I iterated through those answers to add each one to its respective question, which is then added to the form response. When that process is finished, you only need to submit the form response.
Edit
I modified my code by adding the following function and calling it inside the forEach in my autoDataEntry function:
// This function will return the question as the required type
function getQuestion(formQuestions, answerNumber) {
  var questionType = formQuestions[answerNumber].getType();
  switch (questionType) {
    case FormApp.ItemType.TEXT:
      return formQuestions[answerNumber].asTextItem();
    case FormApp.ItemType.MULTIPLE_CHOICE:
      return formQuestions[answerNumber].asMultipleChoiceItem();
    case FormApp.ItemType.DATE:
      return formQuestions[answerNumber].asDateItem();
  }
}
In that way, you will get the proper question type as the situation requires, as long as you have set it as a condition in the switch statement. You can see all types in Enum ItemType.
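One caveat, assuming the separate year, month and day columns from the question feed a single DATE item: DateItem.createResponse() expects a Date object, so those three cells would need to be combined before creating the response. A sketch, where year, month and day stand for the values read from columns B, C and D of the question's sheet:
// A sketch: build the Date object a DATE item expects from separate cells
// (the month argument of the Date constructor is zero-based).
var formAnswer = question.createResponse(new Date(year, month - 1, day));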
I am trying to scrape a table of price data from this website using the following code:
function scrapeData() {
  // Retrieve table as a string using Parser.
  var url = "https://stooq.com/q/d/?s=barc.uk&i=d";
  var fromText = '<td align="center" id="t03">';
  var toText = '</td>';
  var content = UrlFetchApp.fetch(url).getContentText();
  var scraped = Parser.data(content).from(fromText).to(toText).build();
  // Parse table using XmlService.
  var root = XmlService.parse(scraped).getRootElement();
}
I have taken this method from an approach I used in a similar question here; however, it's failing on this particular URL and giving me the error:
Error on line 1: Content is not allowed in prolog. (line 12, file "Stooq")
In related questions here and here, they talk about textual content that the parser does not accept; however, I am unable to apply the solutions in those questions to my own problem. Any help would be much appreciated.
How about this modification?
Modification points:
In this case, it is necessary to modify the retrieved HTML before parsing it. For example, when var content = UrlFetchApp.fetch(url).getContentText() is run, the attribute values in the HTML are not enclosed in quotes, so XmlService cannot parse it as-is; these need to be modified. For instance, <td align=center nowrap> becomes <td align="center" > after the replacements in the script below.
There is a merged column in the header.
When the above points are reflected in the script, it becomes as follows.
Modified script:
function scrapeData() {
  // Retrieve table as a string using Parser.
  var url = "https://stooq.com/q/d/?s=barc.uk&i=d";
  var fromText = '#d9d9d9}</style>';
  var toText = '<table';
  var content = UrlFetchApp.fetch(url).getContentText();
  var scraped = Parser.data(content).from(fromText).to(toText).build();

  // Modify values
  scraped = scraped.replace(/=([a-zA-Z0-9\%-:]+)/g, "=\"$1\"").replace(/nowrap/g, "");

  // Parse table using XmlService.
  var root = XmlService.parse(scraped).getRootElement();

  // Retrieve header and modify it.
  var headerTr = root.getChild("thead").getChildren();
  var res = headerTr.map(function(e) {return e.getChildren().map(function(f) {return f.getValue()})});
  res[0].splice(7, 0, "Change");

  // Retrieve values.
  var valuesTr = root.getChild("tbody").getChildren();
  var values = valuesTr.map(function(e) {return e.getChildren().map(function(f) {return f.getValue()})});
  Array.prototype.push.apply(res, values);

  // Put the result to the active spreadsheet.
  var ss = SpreadsheetApp.getActiveSheet();
  ss.getRange(1, 1, res.length, res[0].length).setValues(res);
}
Note:
Before you run this modified script, please install the GAS library of Parser.
This modified script does not correspond to various URLs. It can be used for the URL in your question. If you want to retrieve values from other URLs, please modify the script.
Reference:
Parser
XmlService
If this was not what you want, I'm sorry.
I would like to return the number of tables in an HTML page to a Google Sheet. The code below can get me the number of tables in the Chrome console.
var i = 1;
[].forEach.call(document.getElementsByTagName("table"), function(x) { (i++, x); });
console.log(i);
But I don't know how to get this result (i) in Google Apps Script so I can return it to my sheet. Something along the lines of:
function doGet() {
  var html = UrlFetchApp.fetch('http://allqs.saqa.org.za/showUnitStandard.php?id=7743').getContentText();
  var table = getElementsByClassName(html, 'table')[0];
  var i = 1;
  [].forEach.call(document.getElementsByTagName("table"), function(x) { (i++, x); });
  console.log(i);
}
You want to retrieve the number of <table> tags from the URL http://allqs.saqa.org.za/showUnitStandard.php?id=7743 using GAS. If my understanding is correct, how about this modification?
Modification points:
In native GAS, getElementsByTagName() can't be used.
In this answer, the number of <table> tags is retrieved using a regex.
Modified script:
var url = "http://allqs.saqa.org.za/showUnitStandard.php?id=7743";
var res = UrlFetchApp.fetch(url).getContentText();
var numberOfTables = res.match(/<table/g).length;
Logger.log(numberOfTables) // 96 is retrieved.
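Since the goal is to return the count to a sheet, the same regex can also be wrapped as a custom function and used directly in a cell, for example =COUNTTABLES("http://allqs.saqa.org.za/showUnitStandard.php?id=7743"). This is only a sketch; the name COUNTTABLES is just an example.
// A sketch: count the <table> tags in the fetched HTML and return the count
// so it can be used as a custom function in a cell.
function COUNTTABLES(url) {
  var res = UrlFetchApp.fetch(url).getContentText();
  var matches = res.match(/<table/g);
  return matches ? matches.length : 0;
}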
If I misunderstand your question, I'm sorry.