IMPORTXML - Could not fetch URL - html

I am trying to scrape data from wine-searcher.com and am having an issue with IMPORTXML in Google Sheets. I keep getting the "Could not fetch URL" error when trying either of the following:
=IMPORTXML("https://www.wine-searcher.com/find/robert+mondavi+rsrv+cab+sauv+napa+valley+county+north+coast+california+usa","//h1")
=IMPORTXML("https://www.wine-searcher.com/find/robert+mondavi+rsrv+cab+sauv+napa+valley+county+north+coast+california+usa","//*[#id='tab-info']/div/div[1]/div[2]/div/div[1]/span[2]/span[2]") ( xpath to scrape current average price)
I've tried suggestions from other Stack Overflow posts, such as with/without http/https and www, and both the short XPath and the full XPath, to no avail. I have also tried other URLs and they work with no problem, so maybe the issue is the URL length or format? Any help would be appreciated. If it cannot be done with IMPORTXML, are there any free alternatives?

As the page is built with JavaScript on the client side rather than on the server side, you will not be able to retrieve the data with the IMPORTXML / IMPORTHTML functions. However, the page contains a JSON-LD block which you can fetch and parse to get the information you need.
function myFunction() {
  var url = 'https://www.wine-searcher.com/find/robert+mondavi+rsrv+cab+sauv+napa+valley+county+north+coast+california+usa';
  // Fetch the raw HTML of the page
  var source = UrlFetchApp.fetch(url).getContentText();
  // Extract the embedded JSON-LD block and parse it
  var jsonString = source.split('<script type="application/ld+json">')[1].split('</script>')[0];
  var data = JSON.parse(jsonString);
  Logger.log(data);
}
All of this information is available, for x = 0 to x = 23 (the @type keys are JSON-LD keys, so they have to be read with bracket notation):
data.offers[x]['@type']
data.offers[x].priceCurrency
data.offers[x].availability
data.offers[x].priceValidUntil
data.offers[x].url
data.offers[x].name
data.offers[x].seller['@type']
data.offers[x].seller.name
data.offers[x].seller.description
data.offers[x].seller.availableDeliveryMethod
data.offers[x].seller.address['@type']
data.offers[x].seller.address.addressRegion
data.offers[x].seller.address.addressCountry['@type']
data.offers[x].seller.address.addressCountry.name
data.offers[x].priceSpecification['@type']
data.offers[x].priceSpecification.description
data.offers[x].priceSpecification.price
data.offers[x].priceSpecification.priceCurrency
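As a minimal sketch (assuming the JSON-LD structure listed above; the function name getOfferPrices and the choice of columns are just illustrative), the offers could be written to the active sheet like this:
function getOfferPrices() {
  // Same extraction as above; assumes the ld+json block is present in the page
  var url = 'https://www.wine-searcher.com/find/robert+mondavi+rsrv+cab+sauv+napa+valley+county+north+coast+california+usa';
  var source = UrlFetchApp.fetch(url).getContentText();
  var jsonString = source.split('<script type="application/ld+json">')[1].split('</script>')[0];
  var data = JSON.parse(jsonString);
  // One row per offer: seller name, price, and currency
  var rows = data.offers.map(function (offer) {
    return [
      offer.seller.name,
      offer.priceSpecification.price,
      offer.priceSpecification.priceCurrency
    ];
  });
  SpreadsheetApp.getActiveSheet().getRange(1, 1, rows.length, 3).setValues(rows);
}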
https://docs.google.com/spreadsheets/d/17f6lhaHA_xpSWClzxkYZcNs4FeM4VHA480QrmwyJvT4/copy

As mentioned, both of these basic formulae return nothing:
=IMPORTXML("https://www.wine-searcher.com/find/robert+mondavi+rsrv+cab+sauv+napa+valley+county+north+coast+california+usa"; "//*")
=IMPORTDATA("https://www.wine-searcher.com/find/robert+mondavi+rsrv+cab+sauv+napa+valley+county+north+coast+california+usa")
Please note that importing data into a spreadsheet is URL-specific, so something that works well for www.aaa.org most likely won't work for www.bbb.org.

Related

Creating a UrlFetchApp script to replace the Google Sheet importHTML function

I have been using the following formula for about a year now, and suddenly it stopped working/importing the table.
=IMPORTHTML("https://tradingeconomics.com/matrix";"table";1)
It gives me a "Could not fetch url: https://tradingeconomics.com/matrix" error.
I tried various things, and one of the interesting findings was that IMPORTHTML works for the cached version, but only in a new sheet under a different Google account. Furthermore, the cached version breaks randomly too.
Thus, it seems I won't get around using a script for this purpose.
Ideally, this script would be flexible enough to have a dedicated function, e.g. importHTMLtable, where the user can input the URL and the table number and it just works. So it would work for the following formulas I currently use, e.g.:
=importHTMLtable("https://tradingeconomics.com/matrix";"table";1)
OR
=importHTMLtable("https://tradingeconomics.com/country-list/business-confidence?continent=world";"table";1)
OR
=importHTMLtable("https://tradingeconomics.com/country-list/ease-of-doing-business";"table";1)
etc...
I'm not sure if this GitHub code solves the problem; it seems to only parse text?
I would assume this is a fairly common problem for Google Sheets users, so there might already be an Apps Script out there that does exactly this, and which might be faster in terms of import speed too.
I can't program, so I tried copying and pasting code to see if I could get something to work. No luck :(
Can anyone provide code, or point me to an existing Apps Script I'm not aware of, that does exactly this?
Try it this way:
=importTableHTML(A1,1)
with
function importTableHTML(url, n) {
  // Fetch the page and strip line breaks, tabs and non-breaking spaces
  var html = UrlFetchApp.fetch(url, {followRedirects: true, muteHttpExceptions: true})
    .getContentText()
    .replace(/(\r\n|\n|\r|\t|&nbsp;)/gm, "");
  // Grab every <table> element, then the rows of the n-th table
  const tables = [...html.matchAll(/<table[\s\S\w]+?<\/table>/g)];
  var trs = [...tables[n - 1][0].matchAll(/<tr[\s\S\w]+?<\/tr>/g)];
  var data = [];
  for (var i = 0; i < trs.length; i++) {
    var tds = [...trs[i][0].matchAll(/<(td|th)[\s\S\w]+?<\/(td|th)>/g)];
    var prov = [];
    for (var j = 0; j < tds.length; j++) {
      // Keep only the inner content of each cell, then strip any remaining tags
      var donnee = tds[j][0].match(/(?<=\>).*(?=\<\/)/g)[0];
      prov.push(stripTags(donnee));
    }
    data.push(prov);
  }
  return data;
}
function stripTags(body) {
  // Remove HTML tags and non-breaking space entities from a cell's content
  var regex = /(<([^>]+)>)/ig;
  return body.replace(regex, "").replace(/&nbsp;/g, ' ').trim();
}
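For instance, with the first URL from the question it could also be called directly from a cell (assuming the function above is saved in the spreadsheet's bound script project):
=importTableHTML("https://tradingeconomics.com/matrix", 1)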
References:
url-fetch-app#advanced-parameters
matchAll

Google Apps Script: count occurrences of a string in an HTML query

I am trying to get the count of occurrences of a string in the text fetched from a website
var html = UrlFetchApp.fetch('https://www.larvalabs.com/cryptopunks/details/0000').getContentText();
var offers = html.match('Offered');
Logger.log(offers);
However, I get the following data returned: [Offered]
I tried several methods, but I cannot find much documentation on which ones I could use for this task, which sounds simple.
I will add that I tried to parse the page with XmlService, but some errors in the HTML code make it fail.
For example, as one method, how about using matchAll()?
Modified script:
var html = UrlFetchApp.fetch('https://www.larvalabs.com/cryptopunks/details/0000').getContentText();
var offers = [...html.matchAll('Offered')]; // or [...html.matchAll(/Offered/g)]
Logger.log(offers.length);
When I tested above, 3 is returned.
Note:
In this case, upper- and lowercase letters are distinguished. Please be careful about this.
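If a case-insensitive count is wanted, the regular-expression form with the i flag could be used instead (a possible variant, not tested against this page):
var offers = [...html.matchAll(/offered/gi)]; // counts "Offered", "offered", "OFFERED", ...
Logger.log(offers.length);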
Reference:
matchAll()

Google forms from scripts, trying to get addCheckboxGridItem option with one response per row

I am trying to build a Google Form from a script using the following code:
var form = FormApp.create("New form");
var formQ1 = form.addCheckboxGridItem();
formQ1.setTitle(Title1);
formQ1.setRows(Rows1);
formQ1.setColumns(Colums1);
However, I would like to have the option to only accept one response per row
If I want one response per column I can have:
var formQ1validation = FormApp.createCheckboxGridValidation().requireLimitOneResponsePerColumn()
.build();
formQ1.setValidation(formQ1validation);
and it works fine, but what I need is one response per row.
Thanks in advance.
There does not seem to be a way to handle the rows how you want in the documentation for CheckboxGridValidationBuilder. A workaround you can try would be to build your grid "sideways", which you can do like this:
var form = FormApp.create("New form");
var formQ1 = form.addCheckboxGridItem();
formQ1.setTitle(Title1);
formQ1.setRows(Colums1);
formQ1.setColumns(Rows1);
This way you can set the rule for one answer per column, and it will behave as intended. You can open up a bug with the Google Issue Tracker here to let them know this functionality is missing.
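A rough end-to-end sketch of this workaround might look as follows (createSidewaysGridForm is just an illustrative name, and the sample values for Title1, Rows1 and Colums1 are only placeholders):
function createSidewaysGridForm() {
  // Placeholder values, only for illustration
  var Title1 = "Sample grid question";
  var Rows1 = ["Row 1", "Row 2", "Row 3"];
  var Colums1 = ["Option A", "Option B"];

  var form = FormApp.create("New form");
  var formQ1 = form.addCheckboxGridItem();
  formQ1.setTitle(Title1);
  // Rows and columns are deliberately swapped ("sideways" grid)
  formQ1.setRows(Colums1);
  formQ1.setColumns(Rows1);

  // The per-column limit now effectively enforces one response per original row
  var formQ1validation = FormApp.createCheckboxGridValidation()
      .requireLimitOneResponsePerColumn()
      .build();
  formQ1.setValidation(formQ1validation);
}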

Parsing JSON in Google Sheets

I'm working with JSON for the first time, so please excuse my lack of knowledge.
I'm trying to use a JSON file to populate data in a Google Sheet. I just don't know the right syntax. How can I format a JSON function to properly access the data and stop returning an error?
I'm trying to pull data from here:
https://eddb.io/archive/v6/bodies_recently.jsonl
into a Google Sheet.
I've got the ImportJSON script loaded and I've tested it with a really small JSON file (http://date.jsontest.com/) and it works as advertised, using this function:
=ImportJSON("http://date.jsontest.com", "/date")
However, when I try to use the same function with the JSON from eddb.io above, I can't get it to work.
What I would like to do is pull the "name" into A1 and then a few of the attributes into columns, like so:
name id type_name rotational_period, etc.
Here's a link to my tests:
https://docs.google.com/spreadsheets/d/1gCKpLcf-ytbPNcuQIIzxp1RMy7N5K8pD02hCLnL27qQ/edit?usp=sharing
How about this workaround?
Reason for the issue:
When I saw the URL https://eddb.io/archive/v6/bodies_recently.jsonl, I noticed that the extension of the file is jsonl. When I checked the values retrieved from that URL, I found that they were JSON Lines, as Dimu Designs has already mentioned in a comment. I could also confirm that the official documentation says bodies_recently.jsonl is line-delimited JSON.
Workaround:
Unfortunately, ImportJSON cannot directly parse JSON Lines values, so the script needs to be modified as a workaround. In your shared spreadsheet, the ImportJSON script is included as a container-bound script. Please modify it as follows.
From:
The following function can be seen at lines 130 - 135 in your script editor.
function ImportJSONAdvanced(url, query, options, includeFunc, transformFunc) {
  var jsondata = UrlFetchApp.fetch(url);
  var object = JSON.parse(jsondata.getContentText());
  return parseJSONObject_(object, query, options, includeFunc, transformFunc);
}
To:
Please replace the above function with the following script and save it. Then put =ImportJSON("https://eddb.io/archive/v6/bodies_recently.jsonl", "/id") into a cell again.
function ImportJSONAdvanced(url, query, options, includeFunc, transformFunc) {
  var jsondata = UrlFetchApp.fetch(url);
  // Modified: parse each JSON Lines record separately and collect them into an array
  var object = jsondata.getContentText().match(/{[\w\s\S].+}/g).map(function(e) {return JSON.parse(e)});
  return parseJSONObject_(object, query, options, includeFunc, transformFunc);
}
Note:
Although this modified script works for the values from https://eddb.io/archive/v6/bodies_recently.jsonl, I'm not sure whether it works for all JSON Lines values. I apologize for this.
References:
eddb.io/api
JSON Lines
If I misunderstood your question and this was not the result you want, I apologize.
I'm not at my laptop, but I see you are getting the error SyntaxError: Expected end of stream at char 2028 (line 132).
I think the data you received from the URL is too long.
You can use =IMPORTDATA(E1) to get the whole chunk into Sheets and then REGEXEXTRACT the parts you need.
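For example, assuming the imported chunk lands in A1 and each record contains a "name" field as described above, something like this could pull out the first name value:
=REGEXEXTRACT(A1, """name"":""([^""]+)""")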

How do I get data from a webpage using Google Apps Script?

I looked at the existing questions (such as this one: Get data from webpage using Google Apps Script and Yahoo Query Language), which are similar to my query, but had no luck.
How do I get data from bseindia.com using UrlFetchApp? Here is the page: http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=532343&expandable=0
Now, for example, how do I get the Revenue of Dec 14 from the above page? In this case, the code should return 2,652.91.
I tried this:
function getit(){
  var response = UrlFetchApp.fetch("http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=532343&expandable=0");
  var cut = response.substring(str.indexOf("<td class=TTRow_right>"),response.length);
  var value = cut.substring(0, cut.indexOf("</td>"));
  Logger.log(number);
}
The error I get is:
TypeError: Cannot find function substring in object
What I am doing is definitely not correct, also because every revenue number in there starts with the same "td class=TTRow_right".
The response you are getting is not a string but of type HTTPResponse, so you cannot use substring(..) on it.
Try response.getContentText() instead.
You can find more details here: http-response
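As a minimal sketch of that fix (getRevenue is just an illustrative name, the extraction mirrors the question's own substring approach, and the page markup may of course change):
function getRevenue() {
  var url = "http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=532343&expandable=0";
  // getContentText() turns the HTTPResponse into a string that indexOf()/substring() can work on
  var html = UrlFetchApp.fetch(url).getContentText();
  // Take the first cell with this class; later occurrences would need an offset or a loop
  var marker = "<td class=TTRow_right>";
  var start = html.indexOf(marker) + marker.length;
  var end = html.indexOf("</td>", start);
  var value = html.substring(start, end);
  Logger.log(value);
  return value;
}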