I have a Google Form whose responses populate a Google Sheet, and that Sheet has a Google Apps Script function that generates a Google Doc from a template.
I am trying to get the address entered in the Form (stored in the response Sheet) to become a hyperlink in the generated Doc.
I have been using body.replaceText() to replace all the fields I need in the Doc:
body.replaceText("{{Date}}", date);
and it's working well, but I would like the address field to become a hyperlink.
I have been trying to do it this way:
body.replaceText("{{Location}}", =HYPERLINK("http://www.google.com/maps/place/'+location+'"));
But that does not become a usable hyperlink, resulting with this in the Doc (please note while it becomes a hyperlink on this page it does not become a hyperlink in Docs):
=HYPERLINK("http://www.google.com/maps/place/myplacenotyours")
I have also tried:
body.replaceText("{{Location}}", location = HYPERLINK("http://www.google.com/maps/place/"+location+));
But this throws up syntax errors.
I have this var:
var location = e.values[2];
So perhaps it is better to use that to create another var as a hyperlink?
I am now trying:
var loclink = 'Hyperlink("http://www.google.com/maps/place/'+location+'","'+location+'")';
but that doesn't do it either... I'm now starting to think that one can't insert a hyperlink using the replace method?
Sorry for the noob question, but I can't figure this out. Can you help me find a solution and put a var into a hypertext link and put that into the Doc as a link?!
Cheers.
Something like this:
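function insertLink() {
  var pattern = '{{Location}}';
  var url = 'https://stackoverflow.com/a/69143679/14265469';
  var text = 'how to paste a link';
  var doc = DocumentApp.getActiveDocument();
  var body = doc.getBody();
  var next = body.findText(pattern);
  if (!next) return; // placeholder not found
  // The offsets are relative to the text element that contains the match,
  // so edit that element rather than the whole body.
  var element = next.getElement().asText();
  var start = next.getStartOffset();
  element.deleteText(start, next.getEndOffsetInclusive());
  element.insertText(start, text);
  element.setLinkUrl(start, start + text.length - 1, url);
}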
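Applied to the question, the link target would be built from the address in the form response. Here is a minimal sketch, assuming the address is in `location` as in the question and using the same Maps path the asker tried (over https). `buildMapsUrl` and `replaceWithLink` are hypothetical helper names, not part of the Document service. Encoding the address keeps spaces and commas from breaking the URL.

```javascript
// Hypothetical helper: build a Google Maps link from a free-text address.
function buildMapsUrl(location) {
  return 'https://www.google.com/maps/place/' + encodeURIComponent(location);
}

// Hypothetical helper: replace the first match of `pattern` in `body`
// with `text` and turn that text into a hyperlink pointing at `url`.
// Returns true if the placeholder was found and replaced.
function replaceWithLink(body, pattern, text, url) {
  var match = body.findText(pattern);
  if (!match || !text) return false; // nothing to replace, or nothing to link
  var element = match.getElement().asText();
  var start = match.getStartOffset();
  element.deleteText(start, match.getEndOffsetInclusive());
  element.insertText(start, text);
  element.setLinkUrl(start, start + text.length - 1, url);
  return true;
}

// Usage inside the form-submit handler (Apps Script only):
// var location = e.values[2];
// replaceWithLink(body, '{{Location}}', location, buildMapsUrl(location));
```

The visible text stays the plain address, while the link target is the encoded Maps URL.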
The following formula does work for some, but not for others:
=IFNA(VALUE(IMPORTXML("https://finance.yahoo.com/quote/C2PU.SI", "//*[@class=""D(ib) Mend(20px)""]/span[1]")))
If used without IFNA, it says 'Resource at url not found'.
Here's the value I'm trying to pull in:
I would appreciate it if you could point me in the right direction.
Thank you!
It does not return any values even for simple importxml.
It seems the site is generated by javascript or protected so it can't be scraped by importxml.
Don't use the "inspect" tool, because it shows the DOM as rendered by the web browser, including changes made to the source by client-side JavaScript. Look at the page source instead.
Resources
How to know if Google Sheets IMPORTDATA, IMPORTFEED, IMPORTHTML or IMPORTXML functions are able to get data from a resource hosted on a website?
The structure of the DOM is generated by JavaScript. Nevertheless, all the information you need is contained in a JSON string assigned to root.App.main. You can get all the data this way:
function extract(url){
var source = UrlFetchApp.fetch(url).getContentText()
return source.match(/(?<=root.App.main = ).*(?=}}}})/g) + '}}}}'
}
and then retrieve the data by conventional JSON parsing. This will give you the value:
function marketPrice() {
  var code = 'C2PU.SI'
  var url='https://finance.yahoo.com/quote/' + code
  var source = UrlFetchApp.fetch(url).getContentText()
  var jsonString = source.match(/(?<=root.App.main = ).*(?=}}}})/g) + '}}}}'
  var data = JSON.parse(jsonString)
  var regularMarketPrice = data.context.dispatcher.stores.StreamDataStore.quoteData.item(code).regularMarketPrice.raw
  Logger.log(regularMarketPrice)
}
Object.prototype.item=function(i){return this[i]};
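The same extraction can be sketched without a lookbehind regex or a fixed "}}}}" suffix. This is a minimal sketch, assuming the page still embeds the data as a JSON object literal after `root.App.main = ` (the page layout may have changed since this was written). `extractJsonAfter` and `readQuote` are hypothetical helper names.

```javascript
// Hypothetical helper: return the JSON object literal that follows `marker`
// in `source`, found by matching braces instead of a fixed suffix.
function extractJsonAfter(source, marker) {
  var markerAt = source.indexOf(marker);
  if (markerAt === -1) return null;
  var start = source.indexOf('{', markerAt + marker.length);
  if (start === -1) return null;
  var depth = 0, inString = false, escaped = false;
  for (var i = start; i < source.length; i++) {
    var ch = source[i];
    if (inString) {
      // Braces inside string values must not affect the depth count.
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '{') {
      depth++;
    } else if (ch === '}') {
      depth--;
      if (depth === 0) return source.substring(start, i + 1);
    }
  }
  return null; // unbalanced braces
}

// Bracket notation reads a key held in a variable, so there is no need
// to extend Object.prototype.
function readQuote(data, code) {
  return data.context.dispatcher.stores.StreamDataStore.quoteData[code];
}

// Usage (Apps Script only):
// var source = UrlFetchApp.fetch(url).getContentText();
// var data = JSON.parse(extractJsonAfter(source, 'root.App.main = '));
// Logger.log(readQuote(data, 'C2PU.SI').regularMarketPrice.raw);
```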
I am working up a solution which returns a link to a prefilled Google Form, to be located on a classic Google Sites List Page. I have all the code in place, everything works, aside from the last vital part: programmatically, via Google Apps Script, adding the Google Form prefilled url as a new listitem on a Google Sites List Page. I am doing the prefilling using variables from a spreadsheet (the Responses from the google form). The url/anchor code looks perfectly formed in the Log. Here is the code:
//fetch all spreadsheet row entries to variables
var keyname = candname;
var keytimestamp = ActiveSheet.getRange("A"+ActiveRow).getValue();
var keytimestamp = Utilities.formatDate(new Date(), "GMT", "yyyy-MM-dd");
var keyinc01 = ActiveSheet.getRange("B"+ActiveRow).getValue();
var keyinc02 = ActiveSheet.getRange("C"+ActiveRow).getValue();
var keyinc03 = ActiveSheet.getRange("D"+ActiveRow).getValue();
var keybrok01 = ActiveSheet.getRange("E"+ActiveRow).getValue();
var keybrok02 = ActiveSheet.getRange("F"+ActiveRow).getValue();
var keybrok03 = ActiveSheet.getRange("G"+ActiveRow).getValue();
var keyacc01 = ActiveSheet.getRange("H"+ActiveRow).getValue();
var keyacc02 = ActiveSheet.getRange("I"+ActiveRow).getValue();
var keyacc03 = ActiveSheet.getRange("J"+ActiveRow).getValue();
//var prefillurl = "https://www.google.co.uk"; //test other url
var prefillurl = 'https://docs.google.com/a/xxxxxxxx.org/forms/d/e/1FAIpQLSfOT0CVpSyGrlZmKKIRVCqbg11rAa9ANyYL8u9QwIWjqWfITg/viewform?entry.1894338678='+keyinc01+'&entry.1429410229='+keyinc02+'&entry.868549131='+keyinc03+'&entry.1083479546='+keybrok01+'&entry.1385475363='+keybrok02+'&entry.137722395='+keybrok03+'&entry.1074722805='+keyacc01+'&entry.1093081320='+keyacc02+'&entry.409101030='+keyacc03;
var listpageurl = "Google Form Link";
Logger.log(listpageurl);
var site = SitesApp.getSiteByUrl("https://sites.google.com/a/xxxxxxxx.org/xxxxxxxx");
var page = site.getChildByName("/home/candidates/"+candwebname);
page.addListItem([ keytimestamp, ssname, listpageurl ]);
As you can see, I have tested the anchor tag syntax with the commented-out "https://www.google.co.uk", which works, and the prefilled url pasted into a browser also works correctly (grabbed from the Log). The list page will also accept the url via manual direct entry. However, I always get the error about the anchor tag not being properly formed when I run the code. I have also tested with and without the https:// in the url, just in case. I have either missed something fundamental, left out some additional quotes or double quotes, or found a bug.
I have found a workaround, which was to enable UrlShortener, but I would prefer to use the prefilled url directly (I will have a lot of these to do!) to speed things up and make the code more transferable.
Can anyone spot the issue here?
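One thing worth ruling out is unencoded characters in the url. Here is a minimal sketch, assuming the spreadsheet values may contain spaces, "&" or other reserved characters, and assuming the list page parses the anchor as markup (in which case a bare "&" between query parameters could be what it rejects). `buildPrefillUrl` and `toAnchor` are hypothetical helpers, not part of SitesApp.

```javascript
// Hypothetical helper: append URL-encoded entry values to a form's
// viewform URL. `entries` is a list of [entryId, value] pairs.
function buildPrefillUrl(baseUrl, entries) {
  var pairs = entries.map(function (entry) {
    return 'entry.' + entry[0] + '=' + encodeURIComponent(entry[1]);
  });
  return baseUrl + '?' + pairs.join('&');
}

// Hypothetical helper: wrap a URL in an anchor tag, escaping characters
// that are special in markup.
function toAnchor(url, label) {
  var escaped = url.replace(/&/g, '&amp;')
                   .replace(/"/g, '&quot;')
                   .replace(/</g, '&lt;')
                   .replace(/>/g, '&gt;');
  return '<a href="' + escaped + '">' + label + '</a>';
}

// Usage with the variables from the question (Apps Script only):
// var prefillurl = buildPrefillUrl(formUrl, [['1894338678', keyinc01], ['1429410229', keyinc02]]);
// page.addListItem([keytimestamp, ssname, toAnchor(prefillurl, 'Google Form Link')]);
```

Here `formUrl` stands for the viewform address from the question without its query string.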
I am using Google Apps Script to create a form for uploading a file. This is my Code.gs file:
var SPREADSHEET_FILE_ID = '1oQn6OLMzys8tVk1FLriOAmpzFJNazLRP-SwM7--eA58';
var folderId = "0B9TN_-yt-h0WZ0dnWndGWkw3UkE";
function doGet() {
var template = HtmlService.createTemplateFromFile('index');
// Build and return HTML in IFRAME sandbox mode.
return template.evaluate()
.setTitle('Web App Window Title')
.setSandboxMode(HtmlService.SandboxMode.IFRAME);
};
The rest of my code is almost identical to the code at https://script.google.com/d/125dG42eB9lM4SPq64p0dpR2CBH4ohfHiqu9TvFNM8s4Ra7pt-7kHXoTM/edit?usp=sharing.
I am getting the following error.
3402363213-mae_html_user_bin_i18n_mae_html_user.js:42 Uncaught ReferenceError: "doc" is not defined.
Can anyone please explain why this error occurs and how to prevent it? The form is not being submitted. It hangs after I click the submit button.
Since the problem started when you added these values, I think you need to recheck them.
var SPREADSHEET_FILE_ID = '1oQn6OLMzys8tVk1FLriOAmpzFJNazLRP-SwM7--eA58';
var folderId = "0B9TN_-yt-h0WZ0dnWndGWkw3UkE";
var SPREADSHEET_FILE_ID expects a spreadsheet ID like "1386834576", whereas you provided "1oQn6OLMzys8tVk1FLriOAmpzFJNazLRP-SwM7--eA58", which is wrong. I think you got that from something like "https://docs.google.com/forms/d/1BqxyEG8RhtlM3MNuSbln6C1L1GLl3axdiSEijcwB5gY/edit", which is why you are getting the "doc" is not defined error.
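To check which ID a given URL actually carries, the segment after /d/ can be pulled out and compared with the constant. This is a minimal sketch, assuming the usual .../d/<id>/... URL shape. `extractFileId` is a hypothetical helper name.

```javascript
// Hypothetical helper: return the ID segment that follows "/d/" in a
// Google Docs, Sheets or Forms URL, or null if there is none.
// Published-form links of the form ".../d/e/<id>/viewform" return null,
// because the segment right after "/d/" is too short to be an ID.
function extractFileId(url) {
  var match = /\/d\/([A-Za-z0-9_-]{10,})/.exec(url);
  return match ? match[1] : null;
}

// Usage (Apps Script only): opening the ID with the matching service
// shows whether it really points at a spreadsheet.
// var id = extractFileId(editUrl);
// var ss = SpreadsheetApp.openById(id); // throws if it cannot be opened as a spreadsheet
```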
var page = UrlFetchApp.fetch(contestURL);
var doc = XmlService.parse(page);
The above code gives a parse error when used. However, if I replace the XmlService class with the deprecated Xml class, with the lenient flag set, it parses the HTML properly.
var page = UrlFetchApp.fetch(contestURL);
var doc = Xml.parse(page, true);
The problem is mostly caused by the lack of CDATA sections in the JavaScript part of the HTML, and the parser complains with the following error:
The entity name must immediately follow the '&' in the entity reference.
Even if I remove all the <script>(.*?)</script> using regex, it still complains because the <br> tags aren't closed.
Is there a clean way of parsing HTML into a DOM tree?
I ran into this exact same problem. I was able to circumvent it by first using the deprecated Xml.parse, since it still works, then selecting the body XmlElement, then passing in its Xml String into the new XmlService.parse method:
var page = UrlFetchApp.fetch(contestURL);
var doc = Xml.parse(page, true);
var bodyHtml = doc.html.body.toXmlString();
doc = XmlService.parse(bodyHtml);
var root = doc.getRootElement();
Note: This solution may not work if the old Xml.parse is completely removed from Google Scripts.
In 2021, the best way to parse HTML on the .gs side that I know of is...
Click + next to Library
Enter 1ReeQ6WO8kKNxoaA_O0XEQ589cIrRvEBA9qcWpNqdOP17i47u6N9M5Xh0
Click "Look up"
Click Add
Sample usage:
const contentText = UrlFetchApp.fetch('https://www.somesite.com/').getContentText();
const $ = Cheerio.load(contentText);
$('.some-class').first().text();
That's it -- this is probably the closest we'll get to doing jQuery-like DOM selection in GAS. The .first() is important or else you may extract more content than you expected (think of it as using querySelector() instead of querySelectorAll()).
Credit where credit is due: https://github.com/tani/cheeriogs
As of May 2020, you can now use the Cheerio library for Google Apps Script to do this.
Returns the content of Wikipedia's Main Page
const content = getContent_('https://en.wikipedia.org');
const $ = Cheerio.load(content);
Logger.log($('#mp-right').text());
Returns the content of the first paragraph <p> of Wikipedia's Main Page
const content = getContent_('https://en.wikipedia.org');
const $ = Cheerio.load(content);
Logger.log($('p').first().text());
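The two snippets above call getContent_, which is not defined in this answer. Here is a minimal sketch of what it might look like, assuming it simply fetches the page and returns the response body as text; the status check is an addition for illustration.

```javascript
// Hypothetical helper assumed by the snippets above: fetch a URL and
// return the response body as text, failing loudly on a non-200 status.
function getContent_(url) {
  var response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  var status = response.getResponseCode();
  if (status !== 200) {
    throw new Error('Request to ' + url + ' failed with status ' + status);
  }
  return response.getContentText();
}
```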
To add to your project:
Select Resources - Libraries... in the Google Apps Script editor. Enter the project key 1ReeQ6WO8kKNxoaA_O0XEQ589cIrRvEBA9qcWpNqdOP17i47u6N9M5Xh0 in the Add a library field, and click "Add". Select the highest version number, and click "Save".
I found that the best way to parse HTML in Google Apps Script is to avoid XmlService.parse and Xml.parse altogether. XmlService.parse doesn't cope well with the malformed HTML that certain websites serve.
Here is a basic example of how you can parse any website easily without XmlService.parse or Xml.parse. In this example, I retrieve a list of presidents from "wikipedia.org/wiki/President_of_the_United_States"
with a regular JavaScript document.getElementsByTagName() call and paste the values into my Google spreadsheet.
1- Create a new Google Sheet;
2- Click the menu Tools > Script editor... to open a new tab with the code editor window and copy the following code into your Code.gs:
// Adds a custom menu when the spreadsheet is opened.
function onOpen() {
  var ui = SpreadsheetApp.getUi();
  ui.createMenu("Parse Menu")
    .addItem("Parse", "parserMenuItem")
    .addToUi();
}
// Opens the sidebar defined in test.html.
function parserMenuItem() {
  var sideBar = HtmlService.createHtmlOutputFromFile("test");
  SpreadsheetApp.getUi().showSidebar(sideBar);
}
// Returns the raw HTML of the page; called from the sidebar.
function getUrlData(url) {
  var doc = UrlFetchApp.fetch(url).getContentText();
  return doc;
}
// Writes each value into column A of the first sheet, one row per value.
function writeToSpreadSheet(data) {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheets()[0];
  var row = 1;
  for (var i = 0; i < data.length; i++) {
    var x = data[i];
    var range = sheet.getRange(row, 1);
    range.setValue(x);
    row = row + 1;
  }
}
3- Add an HTML file to your Apps Script project. Open the Script Editor, choose File > New > Html File, and name it 'test'. Then copy the following code into your test.html:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<input id= "mButon" type="button" value="Click here to get list"
onclick="parse()">
<div hidden id="mOutput"></div>
</body>
<script>
window.onload = onOpen;
function onOpen() {
var url = "https://en.wikipedia.org/wiki/President_of_the_United_States"
google.script.run.withSuccessHandler(writeHtmlOutput).getUrlData(url)
document.getElementById("mButon").style.visibility = "visible";
}
function writeHtmlOutput(x) {
document.getElementById('mOutput').innerHTML = x;
}
function parse() {
var list = document.getElementsByTagName("area");
var data = [];
for (var i = 0; i < list.length; i++) {
var x = list[i];
data.push(x.getAttribute("title"))
}
google.script.run.writeToSpreadSheet(data);
}
</script>
</html>
4- Save your gs and html files and go back to your spreadsheet. Reload the spreadsheet. Click "Parse Menu" - "Parse", then click "Click here to get list" in the sidebar.
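Writing one cell per setValue() call can be slow for long lists. Here is a minimal sketch of a batched variant, assuming the same single-column layout starting at row 1. `toColumn` and `writeColumn` are hypothetical names.

```javascript
// Turn a flat list into the rows-by-columns shape that setValues() expects.
function toColumn(values) {
  return values.map(function (value) { return [value]; });
}

// Write the whole list to column A of the given sheet in a single call.
function writeColumn(sheet, data) {
  if (data.length === 0) return; // a range must have at least one row
  sheet.getRange(1, 1, data.length, 1).setValues(toColumn(data));
}

// Usage (Apps Script only):
// writeColumn(SpreadsheetApp.getActiveSpreadsheet().getSheets()[0], data);
```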
Xml.parse() has an option to turn on lenient parsing, which helps when parsing HTML. Note that the Xml service is deprecated however, and the newer XmlService doesn't have this functionality.
For simple tasks such as grabbing one value from a webpage, you could use a regular expression. Regex is notoriously bad for parsing HTML, as there are all sorts of edge cases that can trip it up, but if you're confident about the HTML you're accessing, this can sometimes be the simplest way.
Here's an example that fetches the contents of the page's <title> tag:
var page = UrlFetchApp.fetch(contestURL);
var regExp = new RegExp("<title>(.*?)</title>", "i");
var result = regExp.exec(page.getContentText());
// [1] is the capture group defined by the parentheses in the pattern
var value = result ? result[1] : 'No title found';
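The same idea can be wrapped in a small function that can be tried on plain strings. This is a minimal sketch, assuming only the first title element is wanted. `extractTitle` is a hypothetical name, and the pattern also tolerates attributes on the tag and a title that spans several lines.

```javascript
// Hypothetical helper: return the trimmed text of the first <title>
// element in an HTML string, or null if there is none.
function extractTitle(html) {
  var match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Usage (Apps Script only):
// var title = extractTitle(UrlFetchApp.fetch(contestURL).getContentText());
```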
I know it is not exactly what OP asked, but I found this question when I was looking for some html parsing options - so it might be useful for others as well.
There is an easy-to-use library for text parsing. It's useful if you want to get only one piece of information from the HTML (or XML) code.
EDIT 2021: The script library id is:
1Mc8BthYthXx6CoIz90-JiSzSafVnT6U3t0z_W3hLTAX5ek4w0G_EIrNw
It works like this:
function getData() {
var url = "https://chrome.google.com/webstore/detail/signaturesatori-central-s/fejomcfhljndadjlojamaklegghjnjfn?hl=en";
var fromText = '<span class="e-f-ih" title="';
var toText = '">';
var content = UrlFetchApp.fetch(url).getContentText();
var scraped = Parser
.data(content)
.from(fromText)
.to(toText)
.build();
Logger.log(scraped);
return scraped;
}
If you are using the Cheerio library for Google Apps Script, you can install it by library ID:
1ReeQ6WO8kKNxoaA_O0XEQ589cIrRvEBA9qcWpNqdOP17i47u6N9M5Xh0
A function to get current emojis from unicode.org:
function getEmojis() {
var t = new Date();
var url = 'https://unicode.org/emoji/charts/full-emoji-list.html';
var fetch = UrlFetchApp.fetch(url);
var contentText = fetch.getContentText();
//console.log(new Date() - t);
// Cheerio
var $ = Cheerio.load(contentText);
var data = [];
$("table > tbody > tr").each((index, element) => {
var row = [];
$(element).find("td").each((index, child) => {
row.push($(child).text());
});
if (row.length > 0) {
data.push(row);
}
});
//console.log(data);
//console.log(new Date() - t);
// Result
return data;
}
↑ The sample code shows how to parse a table and put it into a 2D array.
May be used as a custom function:
Bonus
Parsing the site may be a time-consuming operation, and you may hit the quota limits.
Here's a test file with a full version of the script:
https://docs.google.com/spreadsheets/d/1iO7YjYWyfseQu_YCfRbGDPg7NskOgMu_iO1iGjr7KxY/edit#gid=93365395
↑ It uses CacheService to reduce the number of calls.
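The caching idea can be sketched as follows. This is a minimal sketch, not the code from the linked file: `cachedFetch` is a hypothetical name, and it assumes the URL is short enough to serve as the cache key. Cached values have a size limit, so a large page may simply not be stored and will be fetched again on the next call.

```javascript
// Hypothetical helper: return the page text from the script cache when
// available, otherwise fetch it and try to cache it for `ttlSeconds`.
function cachedFetch(url, ttlSeconds) {
  var cache = CacheService.getScriptCache();
  var cached = cache.get(url); // null when the key is missing or expired
  if (cached !== null) return cached;
  var text = UrlFetchApp.fetch(url).getContentText();
  try {
    cache.put(url, text, ttlSeconds);
  } catch (err) {
    // The value was rejected by the cache (for example, too large);
    // fall back to the uncached result.
  }
  return text;
}

// Usage (Apps Script only): reuse the page for ten minutes.
// var contentText = cachedFetch('https://unicode.org/emoji/charts/full-emoji-list.html', 600);
```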
Natively there's no way, unless you do what you already tried, which won't work if the HTML doesn't conform to the XML format.
There are two options
a) One is to use JavaScript's string functions. First locate your tag using string.indexOf() and then extract the data you want using string.substring().
b) The other option is to make use of the Xml Service.
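Option a) can be sketched as a small helper. This is a minimal sketch, assuming the text before and after the wanted value is known and stable. `between` is a hypothetical name.

```javascript
// Hypothetical helper: return the text between the first occurrence of
// `fromText` and the next occurrence of `toText`, or null if either is missing.
function between(source, fromText, toText) {
  var start = source.indexOf(fromText);
  if (start === -1) return null;
  start += fromText.length;
  var end = source.indexOf(toText, start);
  if (end === -1) return null;
  return source.substring(start, end);
}

// Usage (Apps Script only):
// var html = UrlFetchApp.fetch(contestURL).getContentText();
// var title = between(html, '<title>', '</title>');
```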
It's not possible to create an HTML DOM server-side in Apps Script. Using regular expressions is likely your best option, at least for simple parsing.