Google Apps Script: count occurrences of string in HTML query

I am trying to get the count of occurrences of a string in the text fetched from a website
var html = UrlFetchApp.fetch('https://www.larvalabs.com/cryptopunks/details/0000').getContentText();
var offers = html.match('Offered');
Logger.log(offers);
However I get the following data returned: [Offered]
I tried several methods, but I could not find much documentation on which ones I can use for this seemingly simple task.
I should add that I tried parsing the page with XmlService, but errors in the HTML make it fail.

For example, as one method, how about using matchAll()?
Modified script:
var html = UrlFetchApp.fetch('https://www.larvalabs.com/cryptopunks/details/0000').getContentText();
var offers = [...html.matchAll('Offered')]; // or [...html.matchAll(/Offered/g)]
Logger.log(offers.length);
When I tested the script above, 3 was returned.
Note:
In this case, uppercase and lowercase letters are distinguished, so the match is case-sensitive. Please be careful about this.
Reference:
matchAll()
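If the count should optionally ignore case, a small helper along these lines might do. This is only a sketch: countOccurrences and the sample text are made up for illustration, and the needle is escaped so it is treated as a literal string rather than a pattern.

```javascript
// Hypothetical helper: counts non-overlapping occurrences of a literal string.
function countOccurrences(text, needle, ignoreCase) {
  if (!needle) return 0; // an empty needle would match at every position
  // Escape regex metacharacters so the needle is matched literally.
  var escaped = needle.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  var flags = ignoreCase ? 'gi' : 'g';
  var matches = text.match(new RegExp(escaped, flags));
  return matches ? matches.length : 0;
}

// Made-up sample text, only for illustration.
var sample = 'Offered by A. offered by B. Not OFFERED. Offered again.';
console.log(countOccurrences(sample, 'Offered'));       // case-sensitive count
console.log(countOccurrences(sample, 'Offered', true)); // case-insensitive count
```

In Apps Script, the first argument would be the text returned by getContentText().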

Related

Creating a UrlFetchApp script to replace the Google Sheet importHTML function

I have used the following formula for about a year, and suddenly it stopped importing the table.
=IMPORTHTML("https://tradingeconomics.com/matrix";"table";1)
It gives me a "Could not fetch url: https://tradingeconomics.com/matrix" error.
I tried various things and one of the interesting findings was that the importHTML works for the cached version, but only in a new sheet under a different Google account. Furthermore, the cached version breaks randomly too.
Thus, it seems I won't get around using a script for this purpose.
Ideally, this script would be flexible enough, where it would have a dedicated function e.g. importHTMLtable where the user can input the URL and the table no. and it works. So it would work for the following functions I currently use e.g.
=importHTMLtable("https://tradingeconomics.com/matrix";"table";1)
OR
=importHTMLtable("https://tradingeconomics.com/country-list/business-confidence?continent=world";"table";1)
OR
=importHTMLtable("https://tradingeconomics.com/country-list/ease-of-doing-business";"table";1)
etc...
Not sure if this Github code solves this problem. It seems to only parse text?
I assume this is a fairly common problem for Google Sheets users, so there might already be an Apps Script out there that does exactly this and might also import faster.
I can't program, so I tried copying and pasting code to see if I could get something to work. No luck :(
Can anyone provide code, or point to an existing Apps Script I'm not aware of, that does exactly this?
Try this way
=importTableHTML(A1,1)
with
function importTableHTML(url,n){
var html = UrlFetchApp.fetch(url,{followRedirects : true,muteHttpExceptions: true}).getContentText().replace(/(\r\n|\n|\r|\t|  )/gm,"") // strip line breaks, tabs and double spaces, but keep single spaces inside cell text
const tables = [...html.matchAll(/<table[\s\S\w]+?<\/table>/g)];
var trs = [...tables[n-1][0].matchAll(/<tr[\s\S\w]+?<\/tr>/g)];
var data = [];
for (var i=0;i<trs.length;i++){
console.log(trs[i][0])
var tds = [...trs[i][0].matchAll(/<(td|th)[\s\S\w]+?<\/(td|th)>/g)];
var prov = [];
for (var j=0;j<tds.length;j++){
var donnee = tds[j][0].match(/(?<=\>).*(?=\<\/)/g)[0]; // inner HTML of the cell
prov.push(stripTags(donnee));
}
data.push(prov);
}
return(data)
}
function stripTags(body) {
var regex = /(<([^>]+)>)/ig;
return body.replace(regex,"").replace(/&nbsp;/g,' ').trim();
}
url-fetch-app#advanced-parameters
matchAll

IMPORTXML- Could not fetch URL

I am trying to scrape data from wine-searcher.com and am having an issue with IMPORTXML in google sheets, I keep getting the "could not fetch url" error when trying either of the following:
=IMPORTXML("https://www.wine-searcher.com/find/robert+mondavi+rsrv+cab+sauv+napa+valley+county+north+coast+california+usa","//h1")
=IMPORTXML("https://www.wine-searcher.com/find/robert+mondavi+rsrv+cab+sauv+napa+valley+county+north+coast+california+usa","//*[@id='tab-info']/div/div[1]/div[2]/div/div[1]/span[2]/span[2]") (XPath to scrape the current average price)
I've tried suggestions in other stack posts such as with/out http/s, www, and both XPath and full XPath to no avail. I have also tried with other URLs and they work no problem, maybe the problem is with URL length or format? Any help would be appreciated. If it cannot be done with IMPORT XML, any free alternatives suggested?
Because the page is built with JavaScript on the client side rather than on the server, the IMPORTXML / IMPORTHTML functions cannot retrieve the data. However, the page contains a JSON block that you can retrieve and parse to get the information you need.
function myFunction() {
var url = 'https://www.wine-searcher.com/find/robert+mondavi+rsrv+cab+sauv+napa+valley+county+north+coast+california+usa'
var source = UrlFetchApp.fetch(url).getContentText()
var jsonString = source.split('<script type="application/ld+json">')[1].split('</script>')[0]
var data = JSON.parse(jsonString)
Logger.log(data)
}
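The split-based extraction throws if the page has no such script tag. A slightly more defensive variant could look like the sketch below; extractJsonLd and the sample HTML are hypothetical, and it simply skips blocks that fail to parse.

```javascript
// Hypothetical helper: returns every parseable JSON-LD block found in an HTML string.
function extractJsonLd(html) {
  var re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  var blocks = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    try {
      blocks.push(JSON.parse(m[1]));
    } catch (e) {
      // Skip a block that is not valid JSON instead of aborting.
    }
  }
  return blocks;
}

// Made-up sample page, only for illustration.
var sampleHtml = '<html><head><script type="application/ld+json">' +
  '{"@type":"Product","name":"Sample"}</script></head></html>';
var found = extractJsonLd(sampleHtml);
console.log(found.length, found[0].name);
```

In Apps Script, the argument would be UrlFetchApp.fetch(url).getContentText(), and an empty result means the page did not contain usable JSON-LD.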
All of the following information is available for each offer, from x=0 to x=23. Keys that start with @ need bracket notation in JavaScript:
data.offers[x]['@type']
data.offers[x].priceCurrency
data.offers[x].availability
data.offers[x].priceValidUntil
data.offers[x].url
data.offers[x].name
data.offers[x].seller['@type']
data.offers[x].seller.name
data.offers[x].seller.description
data.offers[x].seller.availableDeliveryMethod
data.offers[x].seller.address['@type']
data.offers[x].seller.address.addressRegion
data.offers[x].seller.address.addressCountry['@type']
data.offers[x].seller.address.addressCountry.name
data.offers[x].priceSpecification['@type']
data.offers[x].priceSpecification.description
data.offers[x].priceSpecification.price
data.offers[x].priceSpecification.priceCurrency
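To get a few of these fields into rows that a sheet can take, something like the following sketch might work. offersToRows, the chosen columns, and the sample data are assumptions made for illustration; the field names are the ones listed above.

```javascript
// Hypothetical helper: flattens the parsed offers into a 2D array (header row + one row per offer).
function offersToRows(data) {
  var offers = (data && data.offers) || [];
  var rows = [['Seller', 'Price', 'Currency', 'URL']];
  offers.forEach(function (offer) {
    var seller = offer.seller || {};
    var spec = offer.priceSpecification || {};
    rows.push([
      seller.name || '',
      spec.price != null ? spec.price : '',
      spec.priceCurrency || offer.priceCurrency || '',
      offer.url || ''
    ]);
  });
  return rows;
}

// Made-up sample data, only for illustration.
var sampleData = {
  offers: [
    { url: 'https://example.com/1', priceCurrency: 'USD',
      seller: { name: 'Shop A' },
      priceSpecification: { price: 120, priceCurrency: 'USD' } },
    { url: 'https://example.com/2', seller: { name: 'Shop B' } }
  ]
};
console.log(JSON.stringify(offersToRows(sampleData)));
```

In Apps Script, the result could then be written with sheet.getRange(1, 1, rows.length, rows[0].length).setValues(rows), where sheet is whichever sheet you want to fill.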
https://docs.google.com/spreadsheets/d/17f6lhaHA_xpSWClzxkYZcNs4FeM4VHA480QrmwyJvT4/copy
As mentioned, both of these basic formulae return nothing:
=IMPORTXML("https://www.wine-searcher.com/find/robert+mondavi+rsrv+cab+sauv+napa+valley+county+north+coast+california+usa"; "//*")
=IMPORTDATA("https://www.wine-searcher.com/find/robert+mondavi+rsrv+cab+sauv+napa+valley+county+north+coast+california+usa")
Please note that importing data into a spreadsheet is URL-specific, so something that works well for www.aaa.org will most likely not work for www.bbb.org.

Extract href attribute from HTML text in Google Sheets

I have about 3000 rows in my Google Spreadsheet, and each row contains data about one article from our website. One column (e.g. A:A) stores formatted text as HTML. I need to extract all URLs inside the href="" attributes from this column and work with them later. (The result could be an array, or a text string separated by commas or spaces, in column B.)
I tried to use the REGEXEXTRACT formula, but it gives me only the first result. Then I tried REGEXREPLACE, but I'm unable to write a proper expression to get only the URLs.
I know that regex is not the proper way to get anything out of HTML. Is there another way to extract these values from HTML text in one cell?
Link to sample data: Google Spreadsheet
Thank you in advance! I'm a real newbie here, and in scripting, parsing etc. too.
How about these samples? I used href=\"(.*?)\" to retrieve the URLs. The sample of regex101.com is here.
1. Using Google Sheets functions:
=TEXTJOIN(CHAR(10),TRUE,ARRAYFORMULA(IFERROR(REGEXEXTRACT(SPLIT(a1,">"),"href="&CHAR(34)&"(.*?)"&CHAR(34)))))
In this case, since REGEXEXTRACT retrieves only the first matched string, after the cell data is separated by SPLIT, the URL is retrieved by REGEXEXTRACT.
2. Using Google Apps Script :
function myFunction(str){
var re = /href=\"(.*?)\"/g;
var result = "";
var res; // declared explicitly so it does not leak as a global
while ((res = re.exec(str)) !== null) {
result += res[1] + "\n";
};
return result.slice(0,-1);
}
This script can be used as a custom function. To use it, put =myFunction(A1) in a cell.
Result: the same as with the method above.
If I misunderstand your question, I'm sorry.
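If an array of URLs is more convenient than a newline-joined string, a variant using matchAll() could look like this sketch. extractHrefs and the sample cell content are hypothetical; the pattern also accepts single-quoted attributes, which the regex above does not.

```javascript
// Hypothetical helper: returns all href values in an HTML string as an array.
function extractHrefs(html) {
  var re = /href\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  return Array.from(String(html).matchAll(re), function (m) {
    // Group 1 holds double-quoted values, group 2 single-quoted ones.
    return m[1] !== undefined ? m[1] : m[2];
  });
}

// Made-up cell content, only for illustration.
var cell = '<p><a href="https://example.com/a">A</a> and ' +
  "<a href='https://example.com/b'>B</a></p>";
console.log(extractHrefs(cell).join('\n'));
```

As a custom function, mapping each URL to its own one-element array (for example extractHrefs(str).map(function (u) { return [u]; })) would return a 2D array with one URL per row.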

Error when trying to add code back into cleared sheet using Set Formula

I am attempting to create a manual archive function in a google sheet (based on form responses). It's a bit of a Frankenstein effort at this point as I've gathered bits and pieces to put it together. I am so close to completion, but I've hit a wall when I try to add formulas back into the sheet after clearing it to the archive.
I'm sure it's no surprise that I am new to this, and I feel like I am missing something simple here. The formulas below are copied directly from the active spreadsheet, where they work fine, but for some reason I can't get the script to parse so that it can put them back after clearing the sheet. I would appreciate any assistance anyone is willing to offer.
I get the error:
Missing ) after argument list.
on the two lines of "cell.setFormula" code that won't "code block" below:
function addFormulas(){
var sheet = SpreadsheetApp.getActiveSpreadsheet();
var sourcesheet = sheet.getSheetByName("Form Totals");
//Add formula back into Column A
var cell = sourcesheet.getRange("A2:A1000");
cell.setFormula("=D2");
//Add formula back into Column U
var cell = sourcesheet.getRange("U2:U1000");
cell.setFormula("=IF(ISNUMBER(FIND("14.29",P2)),14.29,IF(ISNUMBER(FIND("5.24",P2)),5.24,""))");
//Add formula back into Column V
var cell = sourcesheet.getRange("V2:V1000");
cell.setFormula("=IF(ISNUMBER(FIND("14.29",U2)),U2*Q2,IF(ISNUMBER(FIND("5.24",U2)),U2*Q2,""))");
}
What's going on here? Are my formulas "wrong" even though they work in the spreadsheet?
In JavaScript, double quotes are used to denote string type. If you place anything inside " ", it tells the JS code parser to treat anything between these double quotes as a string.
String is also the only valid argument type for setFormula() method of the Range class, so anything that goes inside the brackets should be of this type.
Take a look at this part in your formula
cell.setFormula("=IF(ISNUMBER(FIND("14.29",
The syntax parser recognizes the part between the first pair of double quotes as a string, so the string ends just before 14.29, which is then treated as a number. The parser stops there and throws an error.
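The solution is to delimit the JavaScript string with single quotes, so that the double quotes the formula itself requires can stay unchanged (Sheets formulas expect double quotes around text), e.g.
cell.setFormula('=IF(ISNUMBER(FIND("14.29",P2)),14.29,IF(ISNUMBER(FIND("5.24",P2)),5.24,""))');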
You must be careful with the use of double quotes. As written, the parser reads the statement as
cell.setFormula("=IF(ISNUMBER(FIND("
and does not find the closing ).
You should write it like this:
cell.setFormula("=IF(ISNUMBER(FIND(\"14.29\",U2)),U2*Q2,IF(ISNUMBER(FIND(\"5.24\",U2)),U2*Q2,\"\"))");
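As a quick check that escaping is only a matter of how the JavaScript string is written, the sketch below builds the same formula text with single-quote delimiters and compares it with the backslash-escaped version. buildPriceFormula is a hypothetical helper, not part of the Apps Script API.

```javascript
// Hypothetical helper: builds the column U formula for a given cell reference.
function buildPriceFormula(ref) {
  // Single quotes delimit the JavaScript string, so the formula's own
  // double quotes need no escaping.
  return '=IF(ISNUMBER(FIND("14.29",' + ref + ')),14.29,' +
    'IF(ISNUMBER(FIND("5.24",' + ref + ')),5.24,""))';
}

// The same formula written with backslash-escaped double quotes.
var escaped = "=IF(ISNUMBER(FIND(\"14.29\",P2)),14.29,IF(ISNUMBER(FIND(\"5.24\",P2)),5.24,\"\"))";
console.log(buildPriceFormula('P2') === escaped);
```

Either string can be passed to setFormula(); which style to use is mostly a matter of readability.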

How do I get data from a webpage using Google Apps Script?

I looked at the existing questions (such as this one: Get data from webpage using Google Apps Script and Yahoo Query Language), which are similar to my query, but had no luck.
How do I get data from bseindia.com using UrlFetchApp? Here is the page: http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=532343&expandable=0
Now, for example, how do I get the Revenue of Dec 14 from the above page? In this case, the code should return 2,652.91.
I tried this:
function getit(){
var response = UrlFetchApp.fetch("http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=532343&expandable=0");
var cut = response.substring(str.indexOf("<td class=TTRow_right>"),response.length);
var value = cut.substring(0, cut.indexOf("</td>"));
Logger.log(number);
}
The error I get is:
TypeError: Cannot find function substring in object
What I am doing is definitely not correct anyway, because every revenue number on the page starts with the same "td class=TTRow_right".
The response you are getting is not a string but an HTTPResponse object, so you cannot call substring(..) on it.
Try response.getContentText()
You can find more details here: http-response
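Once the page text is a string, all cells that share the same class can be collected and picked by position. The sketch below assumes the class name contains no regex metacharacters; extractCellsByClass and the sample row are made up for illustration, and the real page layout may differ.

```javascript
// Hypothetical helper: returns the text of every <td> whose class attribute is className.
function extractCellsByClass(html, className) {
  // Accepts class=NAME as well as class="NAME" or class='NAME'.
  var re = new RegExp(
    '<td[^>]*class=["\']?' + className + '["\']?[^>]*>([\\s\\S]*?)</td>', 'gi');
  var values = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    // Drop any nested tags and surrounding whitespace.
    values.push(m[1].replace(/<[^>]+>/g, '').trim());
  }
  return values;
}

// Made-up sample row, only for illustration.
var sampleHtml = '<tr><td class=TTRow_left>Revenue</td>' +
  '<td class=TTRow_right>2,652.91</td><td class=TTRow_right>2,500.00</td></tr>';
console.log(extractCellsByClass(sampleHtml, 'TTRow_right')[0]);
```

In Apps Script, the first argument would be response.getContentText(), and the index to pick depends on where the wanted figure sits among the matching cells.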