Google App script extracting data from website

I am writing a script that looks at the reviews on Google+ pages
and updates a Google spreadsheet.
I have found that the line in the HTML which holds this value is
<span class="A7a">103</span>
I just need to extract that value from the page, knowing only the URL and that HTML snippet.

Use
var response = UrlFetchApp.fetch("http://www.website.com/");
to fetch the HTML of the page. Then use something like
var cut = response.substring(response.indexOf('<span class="A7a">'), response.length);
var value = cut.substring(0, cut.indexOf("</span>"));
https://developers.google.com/apps-script/reference/url-fetch/url-fetch-app#fetch(String)

I agree with the previous answer, but it has some omissions: the response has to be converted to a string with getContentText(), and the opening tag itself has to be skipped.
var response = UrlFetchApp.fetch(url);
var str = response.getContentText();
var balise = '<span class="A7a">'; // the opening tag to look for
var cut = str.substring(str.indexOf(balise), str.length);
var value = cut.substring(balise.length, cut.indexOf("</span>"));
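The snippets above assume the tag is always present; when indexOf returns -1 they silently return the wrong slice. A minimal sketch of the same idea with that case handled follows. The helper name extractBetween and the sample HTML are made up for illustration and are not part of either answer.

```javascript
// Returns the text between the first occurrence of openTag and the next
// closeTag after it, or null when either marker is missing.
function extractBetween(html, openTag, closeTag) {
  var start = html.indexOf(openTag);
  if (start === -1) return null;
  start += openTag.length;
  var end = html.indexOf(closeTag, start);
  if (end === -1) return null;
  return html.substring(start, end);
}

// In Apps Script the first argument would be
// UrlFetchApp.fetch(url).getContentText(); a literal string is used here.
var sample = '<div><span class="A7a">103</span> reviews</div>';
var reviews = extractBetween(sample, '<span class="A7a">', '</span>');
```

Returning null rather than an empty string lets the calling code skip the spreadsheet update when the page layout changes.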

Related

How do I use apps script to programmatically create/submit a google form response on a google form that collects emails? [duplicate]

I have the following issue. I am trying to create a script that autofills a template Google Doc from a Google Form submission. The script works for questions answered with text, but I am struggling to get the data from checkbox (or multiple choice) questions into the document. For example, the variable "offense" comes from a checkbox question with about 30 options, and I would like each option checked on the form to replace text within my Google Doc. Any assistance would be great. Thanks.
function autoFillGoogleDocFromForm(e) {
//e.values is an array of form values
var timestamp = e.values[4];
var studentName = e.values[3];
var oe = e.values[16];
var gradelevel = e.values[14];
var program = e.values[15];
var offense = e.values[6];
var action = e.values[18];
var serve = e.values[31];
var makeUp = e.values[32];
var comments = e.values[29];
//file is the template file, and you get it by ID
var file = DriveApp.getFileById('1nPWC0IKc1zUJXYxbGahJsSW4uNWwhxnLM8shcD8kEE4');
//We can make a copy of the template, name it, and optionally tell it what folder to live in
//file.makeCopy will return a Google Drive file object
var folder = DriveApp.getFolderById('1FlpHRKqYrEHttA-3ozU3oUVJlgiqqa-F')
var copy = file.makeCopy(studentName + ', ' + timestamp, folder);
//Once we've got the new file created, we need to open it as a document by using its ID
var doc = DocumentApp.openById(copy.getId());
//Since everything we need to change is in the body, we need to get that
var body = doc.getBody();
//Then we call all of our replaceText methods
body.replaceText('<<Student Name>>', studentName);
body.replaceText('<<Incident Date>>', timestamp);
body.replaceText('<<Student Grade>>', gradelevel);
body.replaceText('<<Open enrolled?>>', oe);
body.replaceText('<<IEP/504?>>', program);
body.replaceText('<<Reason for Referral (Handbook)>>', offense);
body.replaceText('<<Administrative Action>>', action);
body.replaceText('<<Date(s) to be Served>>', serve);
body.replaceText('<<Make up Date(s)>>', makeUp);
body.replaceText('<<Comments>>', comments);
//Lastly we save and close the document to persist our changes
doc.saveAndClose();
}

You need to use the labels assigned to the checkboxes to determine whether they have been checked. The same goes for multiple choice.
You can't use ListItems because you can't set the glyph to a check box, so I simply insert text with a checkbox character.
I created a form
I then created an onFormSubmit(e) installed trigger in the spreadsheet to get the form response and put it in the Doc. Here I've simply used an active doc to perform my tests. You will need to adjust the script to handle your template doc.
function onFormSubmit() {
  // test data
  let e = {"authMode":"FULL","namedValues":{"Timestamp":["8/16/2022 14:40:26"],"Student Grade":["Junior"],"Reason for Referrel":["Bad grades, Disruptive in class, Other"],"Student Name":["Joe Smith"],"Open Enrollment":["Yes"]},"range":{"columnEnd":5,"columnStart":1,"rowEnd":2,"rowStart":2},"source":{},"triggerUid":"12151926","values":["8/16/2022 14:40:26","Joe Smith","Junior","Yes","Bad grades, Disruptive in class, Other"]};
  try {
    let doc = DocumentApp.getActiveDocument();
    let body = doc.getBody();
    let referrels = ["Bad grades","Unexcused absence","Disruptive in class","Fighting","Other"];
    // namedValues holds an array per question, so take the first entry
    body.replaceText("<<Student Name>>", e.namedValues["Student Name"][0]);
    body.replaceText("<<Student Grade>>", e.namedValues["Student Grade"][0]);
    body.replaceText("<<Open Enrollment>>", e.namedValues["Open Enrollment"][0]);
    // Note the wildcard below: findText treats its argument as a regular
    // expression, so the parentheses in the tag are not matched literally
    let text = body.findText("<<Reason for Referral.*>>");
    body.replaceText("<<Reason for Referral.*>>", "");
    if (text) {
      let index = body.getChildIndex(text.getElement().getParent()) + 1;
      referrels.forEach(item => {
        // the checkbox answer is one string listing every checked label
        let checked = e.namedValues["Reason for Referrel"][0].indexOf(item);
        if (checked >= 0) {
          let listItem = body.insertListItem(index, item);
          index = body.getChildIndex(listItem) + 1;
        }
      });
    }
  } catch (err) {
    Logger.log(err);
  }
}
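The "text with a checkbox character" idea mentioned above can be sketched on its own, separately from the document calls. This assumes the checkbox answer arrives as one string with the checked labels joined by ", ", as in the test data, and that no label itself contains ", ". The helper name renderCheckboxLines is hypothetical.

```javascript
// Hypothetical helper: turns a checkbox answer such as
// "Bad grades, Disruptive in class, Other" into one line per option,
// marking the options that were checked.
function renderCheckboxLines(answer, options) {
  // Assumption: checked labels are joined with ", " and contain no ", " themselves
  var ticked = answer.split(", ");
  return options.map(function (option) {
    var mark = ticked.indexOf(option) >= 0 ? "\u2611" : "\u2610"; // checked or empty box
    return mark + " " + option;
  }).join("\n");
}

var options = ["Bad grades", "Unexcused absence", "Disruptive in class", "Fighting", "Other"];
var lines = renderCheckboxLines("Bad grades, Disruptive in class, Other", options);
```

Splitting the answer and comparing whole labels avoids the substring match that indexOf on the raw string performs, which would also tick "Other" for a label like "Other offense". Each resulting line could then be inserted with body.insertParagraph(index, line).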

Google Apps Script Mail Merge - Grabbing Entire Body

I am officially stuck! Hopefully a fresh set of eyes can help...
I can't figure out how to grab the entire body of my source template and place it in one shot on the target document that receives the data. As you can see from my code below, my workaround (and literally the only thing I stumbled upon that worked) was to grab each line of the template document, and then place each line one-by-one on the target document. However, I don't consider this the appropriate solution for a few reasons: it's not pretty, it's a more resource-expensive run, and it absolutely would not work if I was creating a letter.
Thankfully, since this was envelopes, I got through the job, but I'd like to discover the correct solution before my next mailing. I pored over the documentation, and there were a few functions that were potential candidates (such as 'getBody') but seemed not to be available (I would get 'not a function' errors). So, I'm at a loss.
Another issue with getBody(): it seems to only send plain-text forward. It does not retain any formatting or fonts I arranged in my template.
So my objectives are:
1. Grab the rich-text content of my template document
2. With each loop iteration, apply the content to the next page of target document in one-shot (not line by line).
3. Have this content maintain the formatting (font sizes, fonts, tabbing, spacing, etc.) of my template.
4. Update the dynamic fields with the row of information it's on for that iteration and move on.
I would greatly appreciate any help and/or insight!
Thanks!
function envelopeMailMerge() {
var sourceID = "[id of data sheet]";
var rangeData = 'OnePerFamily!A2:E251';
var values = Sheets.Spreadsheets.Values.get(sourceID,rangeData).values;
var templateID = "[id of template document]";
var targetID = "[id of target document]";
var templateBody = DocumentApp.openById(templateID).getBody();
var targetBody = DocumentApp.openById(targetID).getBody();
//obviously what follows is a ridiculous way to do this, hence my issue
var theContent = templateBody.getChild(0).copy();
var theContent2 = templateBody.getChild(1).copy();
var theContent3 = templateBody.getChild(2).copy();
var theContent4 = templateBody.getChild(3).copy();
var theContent5 = templateBody.getChild(4).copy();
var theContent6 = templateBody.getChild(5).copy();
var theContent7 = templateBody.getChild(6).copy();
var theContent8 = templateBody.getChild(7).copy();
var theContent9 = templateBody.getChild(8).copy();
var theContent10 = templateBody.getChild(9).copy();
var theContent11 = templateBody.getChild(10).copy();
var theContent12 = templateBody.getChild(11).copy();
var theContent13 = templateBody.getChild(12).copy();
var theContent14 = templateBody.getChild(13).copy();
var theContent15 = templateBody.getChild(14).copy();
var theContent16 = templateBody.getChild(15).copy();
var theContent17 = templateBody.getChild(16).copy();
//Clear the target document before creating the new merge
targetBody.clear();
if (!values) {
Logger.log('No data found...');
} else {
for (var row=0; row < values.length; row++) {
var name = values[row][0];
var address = values[row][1];
var city = values[row][2];
var state = values[row][3];
var zip = values[row][4];
//Again, what follows is ridiculous and not an ideal solution
targetBody.appendParagraph(theContent.copy());
targetBody.appendParagraph(theContent2.copy());
targetBody.appendParagraph(theContent3.copy());
targetBody.appendParagraph(theContent4.copy());
targetBody.appendParagraph(theContent5.copy());
targetBody.appendParagraph(theContent6.copy());
targetBody.appendParagraph(theContent7.copy());
targetBody.appendParagraph(theContent8.copy());
targetBody.appendParagraph(theContent9.copy());
targetBody.appendParagraph(theContent10.copy());
targetBody.appendParagraph(theContent11.copy());
targetBody.appendParagraph(theContent12.copy());
targetBody.appendParagraph(theContent13.copy());
targetBody.appendParagraph(theContent14.copy());
targetBody.appendParagraph(theContent15.copy());
targetBody.appendParagraph(theContent16.copy());
targetBody.appendParagraph(theContent17.copy());
//Update the dynamic fields with this row's data
targetBody.replaceText('{{Name}}',name);
targetBody.replaceText('{{Address}}',address);
targetBody.replaceText('{{City}}',city);
targetBody.replaceText('{{ST}}',state);
targetBody.replaceText('{{ZIP}}',zip);
//Insert page break so next iteration begins on new page
targetBody.appendPageBreak();
}
}
}

In the following example I take a more JavaScript-oriented approach, using String.prototype.replace() to replace the text. I assume the following:
You have a template Doc containing placeholder strings such as {{Name}}
You have a spreadsheet where the data to fill into the template lives
You want to create a Google Doc for each row
Under these assumptions, the example takes this approach:
Grab all the text from the template Doc
Replace the placeholders using String.prototype.replace()
Set the text of the new Doc to the result
Code.gs
const templateDocID = "<Template_DOC_ID>"
const dataSsId = "<Data_SS_ID>"
const doC = DocumentApp.openById(templateDocID)
const sS = SpreadsheetApp.openById(dataSsId).getSheets()[0]
function createDocFromTemplate() {
  /* Grab the data from the sheets */
  const dataToReplace = sS.getRange('A2:E').getValues().filter(n => n[0] !== "")
  dataToReplace.forEach((data) => {
    /* getText() returns plain text, so template formatting is not carried over */
    let body = doC.getBody().getText()
    /* Create a new doc for each row */
    const newDocument = DocumentApp.create('New Document')
    /* A quick approach to extract the data */
    const [name, address, city, state, zip] = data
    /* Using string.replace(), which replaces only the first occurrence */
    body = body.replace("{{Name}}", name)
    body = body.replace('{{Address}}', address)
    body = body.replace("{{City}}", city)
    body = body.replace("{{ST}}", state)
    body = body.replace("{{ZIP}}", zip)
    /* Setting the text */
    newDocument.getBody().setText(body)
    /* Or sending it as an email */
    GmailApp.sendEmail('email@gmail.com', 'From Template', body)
    Logger.log(newDocument.getUrl())
  })
}
This is an example that can help you, but you can adapt it to meet your needs.
Documentation
SpreadsheetApp
GmailApp
Optimize the replace function
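One possible way to tighten the replace step is sketched below. String.prototype.replace() with a string pattern only replaces the first occurrence, so a placeholder that appears twice in the template would be filled in once. The helper name fillTemplate and the sample field values are made up for illustration.

```javascript
// Hypothetical helper: replaces every {{Key}} in the template with fields[Key].
// Unknown keys are left untouched so missing data is easy to spot.
function fillTemplate(template, fields) {
  return template.replace(/\{\{(\w+)\}\}/g, function (match, key) {
    return Object.prototype.hasOwnProperty.call(fields, key) ? String(fields[key]) : match;
  });
}

var envelope = fillTemplate("{{Name}}\n{{Address}}\n{{City}}, {{ST}} {{ZIP}}", {
  Name: "Joe Smith",
  Address: "1 Main St",
  City: "Springfield",
  ST: "IL",
  ZIP: 62701
});
```

Using a replacer function also means that characters such as $ in the spreadsheet data are inserted literally instead of being read as replacement patterns.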

Extract multiple links from one Google Sheet cell and then paste them in a Google Doc as hyperlinks using Google Apps Script

I would like to do the following with Google Apps Script:
Search a specific cell in Google Sheets that contains multiple URLs
Split the URLs and get them as separate links, to avoid using the split function in Sheets (see the image)
Attach each URL to a Google Doc by using tags
I've done this before, but I'm only able to obtain the URLs in one array and not separated (see image 1 and image 2)
for (var i=0; i<row[rownumber].length; i++){
if (row[rownumber].includes(","))
img=row[rownumber].split(",");
body.replaceText('{{TagName}}',img);
}
Here you have the example in Google Sheets in order to apply the mentioned steps (link). Any help would be appreciated. Thanks!

You can refer to this sample code:
var doc = DocumentApp.openById('YourDocId');
var body = doc.getBody();
var sheet = SpreadsheetApp.getActiveSheet();
var rowValues = sheet.getRange(1,1,sheet.getLastRow(),1).getValues().flat();
//Combine all row values into a single url array
var urls = [];
rowValues.forEach(row => {
if(row.includes(",")){
var tmp = row.split(",");
urls = urls.concat(tmp);
}
});
Logger.log(urls);
Logger.log(urls.length);
if(urls.length > 0){
var tag = "{{TagName}}";
var newLine = "\n\n";
var element = body.findText(tag);
if(element){ // if found a match
var start = element.getStartOffset();
var text = element.getElement().asText();
//remove tag in the docs
text.deleteText(start,start+tag.length-1);
//Add url
urls.forEach(url => {
url = url.trim(); //remove whitespaces on both ends of the url string
Logger.log("START: "+start);
Logger.log(url);
Logger.log("URL LENGTH: "+url.length);
text.appendText(url).setLinkUrl(start, start+url.length-1, url);
text.appendText(newLine);
start = start + url.length + newLine.length;
Logger.log(text.getText());
Logger.log("*****");
});
doc.saveAndClose();
}
}
Note:
You can remove the logs if you don't need them. I just used them to debug the code.
What does it do?
Get the URL links from column A, starting at row 1 up to the last available row.
Parse each row. Split the URL string into individual URLs, then concatenate them to the urls array.
Find your tag to be replaced in the document's body using Body.findText(searchPattern)
Get the start offset of the matched text using RangeElement.getStartOffset()
Get the element that corresponds to the RangeElement using RangeElement.getElement()
Get the element as text using Element.asText()
Delete the tag string in your document using Text.deleteText(startOffset, endOffsetInclusive)
Loop over each URL in the array. Make sure to remove whitespace from the current URL string. Add the URL text using Text.appendText(text). Once the URL text has been appended, set its link using Text.setLinkUrl(startOffset, endOffsetInclusive, url). Add a new line using Text.appendText(text), then adjust the start offset by the URL length and the new line length. (Repeat until all the URL links have been added to the document.)
(UPDATE:)
If you want to give the hyperlink a different display name, you can replace the appended text and modify the offsets in setLinkUrl()
Sample Code Changes:
//Add url
urls.forEach((url, index) => {
url = url.trim(); //remove whitespaces on both ends of the url string
var name = "Image"+(index+1);
Logger.log("START: "+start);
Logger.log(url);
Logger.log("URL LENGTH: "+url.length);
Logger.log("NAME LENGTH: "+name.length);
text.appendText(name).setLinkUrl(start, start+name.length-1, url);
text.appendText(newLine);
start = start + name.length + newLine.length;
Logger.log(text.getText());
Logger.log("*****");
});
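The offset bookkeeping in this answer can be checked in isolation. The sketch below splits a cell into trimmed URLs and computes the inclusive start and end offsets that Text.setLinkUrl(startOffset, endOffsetInclusive, url) expects, assuming the labels are appended one after another with a fixed separator. The helper names splitUrls and linkRanges and the sample URLs are hypothetical.

```javascript
// Hypothetical helper: splits a cell such as "https://a.com, https://b.com"
// into trimmed URLs, dropping empty entries. A cell without a comma
// yields a one-element array.
function splitUrls(cell) {
  return String(cell).split(",")
    .map(function (u) { return u.trim(); })
    .filter(function (u) { return u.length > 0; });
}

// Computes the inclusive link offsets for labels appended back to back,
// each followed by `separator`, beginning at offset `start`.
function linkRanges(labels, start, separator) {
  var ranges = [];
  labels.forEach(function (label) {
    ranges.push({ label: label, start: start, end: start + label.length - 1 });
    start += label.length + separator.length;
  });
  return ranges;
}

var urls = splitUrls("https://a.com/1.png , https://b.com/2.png");
var ranges = linkRanges(urls, 0, "\n\n");
```

The end offset is start + length - 1 because setLinkUrl takes an inclusive end, which is the same arithmetic the answer applies inside its forEach loop.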

Error on line 1: Content is not allowed in prolog

I am trying to scrape a table of price data from this website using the following code:
function scrapeData() {
// Retrieve table as a string using Parser.
var url = "https://stooq.com/q/d/?s=barc.uk&i=d";
var fromText = '<td align="center" id="t03">';
var toText = '</td>';
var content = UrlFetchApp.fetch(url).getContentText();
var scraped = Parser.data(content).from(fromText).to(toText).build();
//Parse table using XmlService.
var root = XmlService.parse(scraped).getRootElement();
}
I took this method from an approach I used in a similar question; however, it fails on this particular URL and gives me the error:
Error on line 1: Content is not allowed in prolog. (line 12, file "Stooq")
Related questions mention that this happens when textual content that the parser does not accept is submitted to it; however, I am unable to apply the solutions in those questions to my own problem. Any help would be much appreciated.

How about this modification?
Modification points:
In this case, the retrieved HTML has to be modified before it can be parsed. For example, in the value returned by var content = UrlFetchApp.fetch(url).getContentText(), the attribute values are not enclosed in quotes, so quotes have to be added.
There is a merged column in the header.
When the above points are reflected in the script, it becomes as follows.
Modified script:
function scrapeData() {
// Retrieve table as a string using Parser.
var url = "https://stooq.com/q/d/?s=barc.uk&i=d";
var fromText = '#d9d9d9}</style>';
var toText = '<table';
var content = UrlFetchApp.fetch(url).getContentText();
var scraped = Parser.data(content).from(fromText).to(toText).build();
// Modify values
scraped = scraped.replace(/=([a-zA-Z0-9\%-:]+)/g, "=\"$1\"").replace(/nowrap/g, "");
// Parse table using XmlService.
var root = XmlService.parse(scraped).getRootElement();
// Retrieve header and modify it.
var headerTr = root.getChild("thead").getChildren();
var res = headerTr.map(function(e) {return e.getChildren().map(function(f) {return f.getValue()})});
res[0].splice(7, 0, "Change");
// Retrieve values.
var valuesTr = root.getChild("tbody").getChildren();
var values = valuesTr.map(function(e) {return e.getChildren().map(function(f) {return f.getValue()})});
Array.prototype.push.apply(res, values);
// Put the result to the active spreadsheet.
var ss = SpreadsheetApp.getActiveSheet();
ss.getRange(1, 1, res.length, res[0].length).setValues(res);
}
Note:
Before you run this modified script, please install the Parser GAS library.
This modified script is not a general solution for arbitrary URLs; it works for the URL in your question. If you want to retrieve values from other URLs, please modify the script.
Reference:
Parser
XmlService
If this was not what you want, I'm sorry.
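The attribute-quoting step can be illustrated separately from the scraping. XmlService.parse needs well-formed XML, which unquoted attribute values (align=center) and bare attributes (nowrap) are not. The sketch below uses a different, narrower regex than the modified script; the helper name quoteAttributes is hypothetical, and the pattern is an assumption that fits simple table markup rather than arbitrary HTML.

```javascript
// Hypothetical helper: wraps unquoted attribute values in double quotes and
// drops the bare "nowrap" attribute so the fragment can be parsed as XML.
// Assumption: text between tags contains no " name=value" sequences.
function quoteAttributes(html) {
  return html
    .replace(/(\s[\w-]+)=([^\s"'>]+)/g, '$1="$2"') // align=center -> align="center"
    .replace(/\snowrap(?=[\s>])/g, "");             // bare attribute has no XML form
}

var fixed = quoteAttributes('<td align=center id=t03 nowrap>1</td>');
```

Values that are already quoted are left alone, because the pattern requires a character other than a quote directly after the equals sign.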

How to get a page title

I need to get the page name of a Google Sites page with Google Apps Script. There is another post about it, Retrieve page title from URL in Apps Script, but that one starts from a URL, whereas I need to get the page title name.
Thanks.

If you are in a domain, use the following, replacing the strings as appropriate.
var site = SitesApp.getSite("mydomainname.org", "sitename");
var page = site.getChildren()[0];
var title = page.getTitle();
Karl

As per this Stack Overflow answer, the following function works well for getting the page title.
function betweenMarkers(text, begin, end) {
  var firstChar = text.indexOf(begin) + begin.length;
  // Search for the end marker only after the begin marker
  var lastChar = text.indexOf(end, firstChar);
  var newText = text.substring(firstChar, lastChar);
  return newText;
}
// response is the HTTPResponse returned by UrlFetchApp.fetch(url)
var pageTitle = betweenMarkers(response.getContentText(),"<title>","</title>")
console.log(pageTitle)
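Exact marker matching misses pages that write the tag as <TITLE> or give it attributes. A regex-based variant, sketched below under the assumption that the page HTML is already available as a string, tolerates both. The helper name extractTitle is hypothetical, and HTML entities in the title are not decoded.

```javascript
// Hypothetical helper: returns the trimmed contents of the first <title>
// element, or null when the page has none.
function extractTitle(html) {
  var match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

var title = extractTitle('<html><head><TITLE lang="en">\n  My Page \n</TITLE></head></html>');
```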
