Another IMPORTXML returning empty content - google-apps-script

When I input
=IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","//h2")
in my google sheet, I get: #N/A Imported content is empty.
However, when I input:
=IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","*")
I get some content, so I can presume that access to the page is not blocked.
And the page contains several h2 tags without any doubt.
So what's the issue?

You want to know the reason for the following behavior:
=IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","//h2") returns #N/A Imported content is empty.
=IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","*") returns the content.
If my understanding is correct, how about this answer?
Issue:
When I checked the HTML of http://www.ilgiornale.it/autore/franco-battaglia.html, I noticed a problem in the following line.
window.jQuery || document.write("<script src='/sites/all/modules/jquery_update/replace/jquery/jquery.min.js'>\x3C/script>")
Here, the closing script tag is written as \x3C/script>. It seems that when IMPORTXML parses this line, it does not recognize \x3C/script> as a closing tag, so the script tag is treated as unclosed. I confirmed that when \x3C is converted to <, =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","//h2") correctly returns the values of the h2 tags.
This appears to be why =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","//h2") returns #N/A Imported content is empty.
As for why =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","*") does return content: with this formula, I could not find the values of the script tag in the result. This made me suspect the script tag, and that is how I found the problem above. I confirmed that when \x3C is converted to <, =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","*") returns the values including those of the script tag.
Workarounds:
To avoid the above issue, \x3C needs to be replaced with <. How about the following workarounds? Both use Google Apps Script. Please think of them as just two of several possible workarounds.
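Because the escape sequence is easy to misread, here is a small sketch in plain JavaScript (runnable in Node.js, independent of Apps Script). It shows that \x3C only means < once JavaScript evaluates it as a string literal, whereas the fetched HTML contains the four literal characters. unescapeScriptClose is a hypothetical helper name that simply wraps the replacement Pattern 1 uses.

```javascript
// In a JavaScript string literal, \x3C is a hex escape for "<", so the inline
// script on the page passes an ordinary "</script>" to document.write.
console.log("\x3C/script>" === "</script>"); // true

// In the raw HTML that IMPORTXML fetches, the same sequence is four literal
// characters (backslash, "x", "3", "C"). This applies the same replacement as
// the Pattern 1 script: every literal \x3C becomes "<".
function unescapeScriptClose(html) {
  return html.replace(/\\x3C/g, "<");
}

// The problematic line as raw text. The backslash is doubled only because this
// sample is itself written as a JavaScript string literal.
const rawLine = "window.jQuery || document.write(\"<script src='/sites/all/modules/jquery_update/replace/jquery/jquery.min.js'>\\x3C/script>\")";
console.log(unescapeScriptClose(rawLine));
// window.jQuery || document.write("<script src='/sites/all/modules/jquery_update/replace/jquery/jquery.min.js'></script>")
```

The regex is case-sensitive, like the one in Pattern 1, so a lowercase \x3c would not be replaced.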
Pattern 1:
In this pattern, the HTML data is first downloaded from the URL and the problem is fixed. The modified HTML is then saved as a file on Google Drive, the file is shared, and its URL is retrieved. IMPORTXML then reads the values from this URL.
Sample script:
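function myFunction() {
  var url = "http://www.ilgiornale.it/autore/franco-battaglia.html";
  // Fetch the page and turn each literal "\x3C" back into "<".
  var data = UrlFetchApp.fetch(url).getContentText().replace(/\\x3C/g, "<");
  // Save the fixed HTML to Drive and make it readable by anyone with the link.
  var file = DriveApp.createFile("htmlData.html", data, MimeType.HTML);
  file.setSharing(DriveApp.Access.ANYONE_WITH_LINK, DriveApp.Permission.VIEW);
  // Direct-download URL of the file, to be used as the URL in IMPORTXML.
  var endpoint = "https://drive.google.com/uc?id=" + file.getId() + "&export=download";
  Logger.log(endpoint);
}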
To use this script, first run myFunction() and copy the logged endpoint. As a test, put the endpoint in cell "A1" and put =IMPORTXML(A1,"//h2") in cell "A2". The values of the h2 tags are then returned.
Pattern 2:
In this pattern, the HTML data is parsed directly in the script, and the values of the h2 tags are written to the active sheet.
Sample script:
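function myFunction() {
  var url = "http://www.ilgiornale.it/autore/franco-battaglia.html";
  // Collect every <h2>...</h2> block from the raw HTML.
  var data = UrlFetchApp.fetch(url).getContentText().match(/<h2[\s\S]+?<\/h2>/g);
  if (!data) throw new Error("No h2 tags were found.");
  // Wrap the blocks in a single root element so that XmlService can parse them.
  var xml = XmlService.parse("<temp>" + data.join("") + "</temp>");
  var h2Values = xml.getRootElement().getChildren("h2").map(function(e) {return [e.getValue()]});
  // Append the values below the last row of the active sheet, one per row.
  var sheet = SpreadsheetApp.getActiveSheet();
  sheet.getRange(sheet.getLastRow() + 1, 1, h2Values.length, 1).setValues(h2Values);
  Logger.log(h2Values);
}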
When you run the script, the values of the h2 tags are written directly to the active sheet.
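XmlService.parse expects well-formed XML, so it may throw if a heading contains something HTML allows but XML does not, such as the entity &nbsp; or an unclosed <br>. In that case the h2 texts can also be pulled out with regular expressions alone. This is a minimal sketch in plain JavaScript; extractH2Texts is a hypothetical name, and it assumes h2 elements are not nested and only decodes &nbsp; and &amp;.

```javascript
// Sketch: return the text of every <h2> element using regular expressions only.
// Nested markup (for example an <a> link) is stripped and whitespace collapsed.
function extractH2Texts(html) {
  const matches = html.match(/<h2\b[^>]*>[\s\S]*?<\/h2>/gi) || [];
  return matches.map(function (h2) {
    return h2
      .replace(/<[^>]+>/g, "")   // remove the h2 tags and any nested tags
      .replace(/&nbsp;/g, " ")   // decode the entities XmlService would reject
      .replace(/&amp;/g, "&")
      .replace(/\s+/g, " ")      // collapse runs of whitespace
      .trim();
  });
}

const sampleHtml = '<div><h2 class="t"><a href="/a">First  title</a></h2><p>x</p><h2>Second &amp; last</h2></div>';
console.log(extractH2Texts(sampleHtml)); // [ 'First title', 'Second & last' ]
```

In Apps Script, extractH2Texts(UrlFetchApp.fetch(url).getContentText()).map(function(t) {return [t]}) would give the same two-dimensional shape that setValues expects. Regular expressions are not a full HTML parser, so unusual markup may still be missed.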
References:
Class UrlFetchApp
Class XmlService
If I misunderstood your question and this was not the direction you want, I apologize.

Related

Format imported table

I've been attempting to import a table of events from the following website:  https://scpgajt.bluegolf.com/bluegolf/scpgajt23/schedule/index.htm?type=2&display=champ (and others similar in structure).
I am attempting to reproduce the example website table on a Google Sheet where I would later add a check-box column and then select the events I need (which would copy the selection to another sheet for personalized planning).
So far, I have been able to use Apps Script code copied from Stack Overflow (see my Example Sheet) together with the formula =ImportTableHTML(A1,1) on the sheet to pull the table from the site into the sheet.
This Apps Script method has finally produced a complete list of events; however, the results are badly formatted (see Example Sheet 1 - Scrape Import / Raw). The result I am looking for should be formatted close to the columns and rows of the original table, or the pulled data should be filtered and distributed into specified cells (see Example Sheet 2 - Model Result).
This is the farthest I have been able to get, thanks to scripts found on Stack Overflow, combining the scripts posted in "Replacing =ImportHTML with URLFetchApp" and "Creating a UrlFetchApp script to replace the Google Sheet importHTML function".
Unfortunately, I cannot figure out which options in the script control how the results are formatted and distributed into the proper cells.
Is it possible to reproduce the table in my example sheet with proper or modifiable formatting?
The site I am attempting to capture table data from
The resulting import using =ImportTableHTML(A1,1)
The way the imported data should be parsed and distributed
Apps Script code I am currently using:
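function importTableHTML(url, n) { // note: n is accepted but never used below
  // The replace() call deletes every line break, tab and space before the <table> is extracted,
  // so words inside a cell are run together in the output.
  var html = '<table' + UrlFetchApp.fetch(url, {muteHttpExceptions: true}).getContentText().replace(/(\r\n|\n|\r|\t| )/gm,"").match(/(?<=\<table).*(?=\<\/table)/g) + '</table>';
  // Every <tr>...</tr> row in the extracted table.
  var trs = [...html.matchAll(/<tr[\s\S\w]+?<\/tr>/g)];
  var data = [];
  for (var i = 0; i < trs.length; i++) {
    // Every <td> or <th> cell in the current row.
    var tds = [...trs[i][0].matchAll(/<(td|th)[\s\S\w]+?<\/(td|th)>/g)];
    var prov = [];
    for (var j = 0; j < tds.length; j++) {
      // Text between the cell's first ">" and its last "</", then stripped of nested tags.
      var donnee = tds[j][0].match(/(?<=\>).*(?=\<\/)/g)[0];
      prov.push(stripTags(donnee));
    }
    data.push(prov);
  }
  return data;
}
function stripTags(body) {
  // Remove anything that looks like an HTML tag.
  var regex = /(<([^>]+)>)/ig;
  return body.replace(regex, "");
}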
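Because that script deletes every space before parsing, multi-word values such as event names lose their spaces. A minimal alternative sketch collapses whitespace instead of deleting it. It is plain JavaScript that should also run in Apps Script's V8 runtime (the script above already relies on matchAll and lookbehind). parseTableRows is a hypothetical helper; it assumes the first <table> on the page is the one wanted, that tables are not nested, and it ignores colspan/rowspan.

```javascript
// Hypothetical helper: parse the first <table> in an HTML string into rows of cell text.
function parseTableRows(html) {
  const table = html.match(/<table[\s\S]*?<\/table>/i);
  if (!table) return [];
  const rows = [];
  for (const tr of table[0].matchAll(/<tr[\s\S]*?<\/tr>/gi)) {
    const cells = [];
    // \b stops "<th" from also matching "<thead>"; \1 pairs <td> with </td>, <th> with </th>.
    for (const cell of tr[0].matchAll(/<(td|th)\b[^>]*>([\s\S]*?)<\/\1>/gi)) {
      const text = cell[2]
        .replace(/<[^>]+>/g, " ")  // nested tags such as <a> or <br> become spaces
        .replace(/&nbsp;/g, " ")
        .replace(/&amp;/g, "&")
        .replace(/\s+/g, " ")      // collapse whitespace runs instead of deleting them
        .trim();
      cells.push(text);
    }
    if (cells.length) rows.push(cells);
  }
  return rows;
}

const sampleTable = '<table><tr><th>Date</th><th>Event</th></tr>' +
  '<tr><td>Jan 5</td><td><a href="#">Winter   Classic</a>\n</td></tr></table>';
console.log(parseTableRows(sampleTable)); // [ [ 'Date', 'Event' ], [ 'Jan 5', 'Winter Classic' ] ]
```

A custom function could then be something like function importTableRows(url) { return parseTableRows(UrlFetchApp.fetch(url).getContentText()); } (also a hypothetical name). Picking specific columns for a model layout would be a further step on the returned rows.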

IMPORTXML Using IFERROR on Google Sheets

I wanted to thank everyone for being so helpful on this site - it means a lot!
I am trying to import the likes/followers from a Spotify playlist to Google Sheets. It seems like various playlists have a different XPath.
I can extract most of the likes/followers using this formula (B24 is the URL):
=INDEX(REGEXEXTRACT(IFERROR(QUERY(ARRAY_CONSTRAIN(IMPORTDATA(B24), 500, 5), "select Col5 where Col4 contains 'followers'", 0), QUERY(ARRAY_CONSTRAIN(IMPORTDATA(B24), 500, 7), "select Col7 where Col6 contains 'followers'", 0)), "\d+")*1)
However, some playlist links come up with an empty output.
Example: https://open.spotify.com/playlist/5aSO2lT7sVPKut6F9L6IAc
Example of a working one: https://open.spotify.com/playlist/7qvQVDnLe4asawpZqYhKMQ
I'm honestly not sure how to add a third argument, and I have been blindly changing the column numbers to see what works, with no luck. Any idea on how to figure out which column numbers to use, or any other guidance, would be extremely helpful.
Thank you!!
Issue and workaround:
When I checked the HTML from both URLs, I found that the value you want can be retrieved from JSON data embedded in the HTML. Unfortunately, this JSON data is large, so when IMPORTXML is used, an error occurs because of the data size. So in this answer, I would like to propose using a custom function created with Google Apps Script.
Sample script:
Please copy and paste the following Google Apps Script into the script editor of the Google Spreadsheet. Then put =SAMPLE("###url###") in a cell. The number of followers is returned.
function SAMPLE(url) {
  const res = UrlFetchApp.fetch(url).getContentText();
  // Decode "&amp;" and capture the JSON assigned to Spotify.Entity in the page.
  const v = res.replace(/&amp;/g, "&").match(/Spotify\.Entity \=([\s\S\w]+?);/);
  return v && v.length == 2 ? JSON.parse(v[1].trim()).followers.total : "Value cannot be retrieved.";
}
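The parsing step of SAMPLE can be tried without network access. This sketch applies the same idea to a hypothetical HTML fragment: decode &amp;, capture the JSON assigned to Spotify.Entity, and read followers.total. extractFollowers and the sample markup are assumptions for illustration only, not Spotify's actual page.

```javascript
// Hypothetical helper mirroring the SAMPLE custom function, minus the fetch.
function extractFollowers(html) {
  // Capture everything between "Spotify.Entity =" and the first ";" after it.
  const m = html.replace(/&amp;/g, "&").match(/Spotify\.Entity\s*=\s*([\s\S]+?);/);
  if (!m) return "Value cannot be retrieved.";
  let entity;
  try {
    entity = JSON.parse(m[1].trim());
  } catch (e) {
    return "Value cannot be retrieved."; // e.g. a ";" inside the JSON cut the capture short
  }
  return entity.followers && typeof entity.followers.total === "number"
    ? entity.followers.total
    : "Value cannot be retrieved.";
}

// Hypothetical page fragment, for illustration only.
const samplePage = '<script>Spotify.Entity = {"name":"Demo","followers":{"href":null,"total":1234}};</script>';
console.log(extractFollowers(samplePage)); // 1234
```

Wrapping JSON.parse in try/catch keeps the cell showing the fallback message instead of an error if the page structure changes.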
Result:
To test the script with your two URLs, put the following custom formulas in cells "A1" and "A2", respectively.
=SAMPLE("https://open.spotify.com/playlist/5aSO2lT7sVPKut6F9L6IAc")
=SAMPLE("https://open.spotify.com/playlist/7qvQVDnLe4asawpZqYhKMQ")
Note:
This sample script is for the URLs in your question, so it might not work for other URLs. It might also stop working if the HTML structure is changed on the server side. Please be careful about this.
References:
Custom Functions in Google Sheets
fetch(url)

Writing in a newly created Google spreadsheet using an external script

I have built a short script that takes inputs from a Google Form and creates a new spreadsheet. But when I try to set values in the sheet, nothing happens. I'm not sure whether this is due to my code or to a lack of authorization, given that this is a newly created file. Here is my code:
var templates = DriveApp.getFilesByName('Template'); // get the files named Template (only one)
var template = templates.next(); // get the first files in the list
var newFile = template.makeCopy(name,newFolder); // Make a copy of the Template file and put it in NewFolder
var ss = SpreadsheetApp.open(newFile);
var sheet = ss.getSheets()[0];
sheet.getActiveRange('B1').setValue(name);
Thanks for your help
I think that in your script, an error occurs at getActiveRange('B1'), because the getActiveRange() method of Class Sheet takes no arguments (Ref). I think this is the reason for your issue. In this case, an error like The parameters (String) don't match the method signature for SpreadsheetApp.Sheet.getActiveRange. occurs. From your question, I suspect the script is run by an on-form-submit trigger, so the error is not displayed when the form is submitted, which is why it looks like nothing happens. In this case, please modify it as follows.
From:
sheet.getActiveRange('B1').setValue(name);
To:
sheet.getRange('B1').setValue(name);
When you modify it like above and run it again, the value of name is put in cell "B1" on the first sheet of the created Spreadsheet.
Note:
If you want to append the value each time the script is run, please replace sheet.getActiveRange('B1').setValue(name); with the following.
sheet.appendRow(["", name]); // The value is appended to column "B".
References:
getActiveRange()
getRange(a1Notation)

Parsing JSON in Google Sheets

I'm working with JSON for the first time, so please excuse my lack of knowledge.
I'm trying to use a JSON file to populate data in a Google Sheet. I just don't know the right syntax. How can I format a JSON function to properly access the data and stop returning an error?
I'm trying to pull data from here:
https://eddb.io/archive/v6/bodies_recently.jsonl
into a Google Sheets.
I've got the ImportJSON script loaded and I've tested it with a really small JSON file (http://date.jsontest.com/) and it works as advertised, using this function:
=ImportJSON("http://date.jsontest.com", "/date")
However, when I try to use the same function with the JSON from eddb.io above, I can't get it to work.
What I would like to do is pull the "name" into A1 and then a few of the attributes into columns, like so:
name id type_name rotational_period, etc.
Here's a link to my tests:
https://docs.google.com/spreadsheets/d/1gCKpLcf-ytbPNcuQIIzxp1RMy7N5K8pD02hCLnL27qQ/edit?usp=sharing
How about this workaround?
Reason for the issue:
When I looked at the URL https://eddb.io/archive/v6/bodies_recently.jsonl, I noticed that the file extension is jsonl. When I checked the values retrieved from this URL, I found that they are JSON Lines, as Dimu Designs already mentioned in a comment. I could also confirm that the official documentation describes bodies_recently.jsonl as line-delimited JSON.
Workaround:
Unfortunately, ImportJSON cannot directly parse JSON Lines, so the script needs to be modified as a workaround. In your shared Spreadsheet, ImportJSON is installed as a container-bound script. Please modify it as follows.
From:
The following function can be found at lines 130-135 in your script editor.
function ImportJSONAdvanced(url, query, options, includeFunc, transformFunc) {
var jsondata = UrlFetchApp.fetch(url);
var object = JSON.parse(jsondata.getContentText());
return parseJSONObject_(object, query, options, includeFunc, transformFunc);
}
To:
Please replace the above function with the following script and save it. Then put =ImportJSON("https://eddb.io/archive/v6/bodies_recently.jsonl", "/id") in a cell again.
function ImportJSONAdvanced(url, query, options, includeFunc, transformFunc) {
var jsondata = UrlFetchApp.fetch(url);
var object = jsondata.getContentText().match(/{[\w\s\S].+}/g).map(function(e) {return JSON.parse(e)}); // Modified: parse each {...} line as a separate JSON object
return parseJSONObject_(object, query, options, includeFunc, transformFunc);
}
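The modified line relies on a regular expression to find where each object starts and ends. Since JSON Lines puts exactly one JSON value on each line, splitting on line breaks is a simpler alternative. This is a sketch in plain JavaScript; parseJsonLines is a hypothetical name and the sample data is made up.

```javascript
// Sketch: parse JSON Lines text (one JSON value per line) into an array.
function parseJsonLines(text) {
  const result = [];
  const lines = text.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].trim() === "") continue; // tolerate blank and trailing lines
    try {
      result.push(JSON.parse(lines[i]));
    } catch (e) {
      // Report the 1-based line number so a bad record is easy to locate.
      throw new Error("Invalid JSON on line " + (i + 1) + ": " + e.message);
    }
  }
  return result;
}

const sample = '{"id":1,"name":"A"}\n{"id":2,"name":"B"}\n';
console.log(parseJsonLines(sample).map(function (body) { return body.name; })); // [ 'A', 'B' ]
```

Inside ImportJSONAdvanced, the modified line could then read var object = parseJsonLines(jsondata.getContentText()); with parseJsonLines added to the same script file.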
Note:
Although this modified script works for the values from https://eddb.io/archive/v6/bodies_recently.jsonl, I'm not sure whether it works for all JSON Lines data. I apologize for this.
References:
eddb.io/api
JSON Lines
If I misunderstood your question and this was not the result you want, I apologize.
I'm not at my laptop, but I see that you are getting the error SyntaxError: Expected end of stream at char 2028 (line 132).
I think the data you received from the URL is too long.
You can use =IMPORTDATA(E1) to get the whole chunk into the sheet and then use REGEXEXTRACT to pull out the parts you need.

How to get the text from a Google Spreadsheet cell?

I'm trying to write a Google Apps script that involves getting the text from a table cell inside a Google Spreadsheet. To illustrate the situation, let's say the word "hello" is in the cell B4, and my function is supposed to get that string from that cell. How would I go about this? I've tried the following but that hasn't worked for me.
function getText() {
var body = DocumentApp.getActiveDocument().getBody();
var text = body.getText();
}
I get this error whenever I try to use the function.
Error: You do not have permission to call getActiveDocument
Even after getting authorization for the script, I still get the same error. Any ideas on how to solve this?
You say you're trying to retrieve Spreadsheet information, but you are using the Document class. They are not interchangeable like that; you'll have to use the Spreadsheet class.
E.g.:
var allVals = SpreadsheetApp.getActiveSheet().getDataRange().getValues();
Logger.log(allVals);
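getDataRange().getValues() returns a two-dimensional array whose first element corresponds to cell A1, so a cell such as B4 is at row index 3, column index 1. The sketch below, in plain JavaScript with a hypothetical helper name a1ToIndices, converts an A1 reference into those indices; the small sample array stands in for allVals.

```javascript
// Hypothetical helper: convert an A1 reference such as "B4" into zero-based
// [rowIndex, columnIndex] for the array returned by getDataRange().getValues().
function a1ToIndices(a1) {
  const m = /^([A-Z]+)(\d+)$/i.exec(a1.trim());
  if (!m) throw new Error("Not an A1 reference: " + a1);
  let col = 0;
  for (const ch of m[1].toUpperCase()) {
    col = col * 26 + (ch.charCodeAt(0) - 64); // "A" -> 1, ..., "Z" -> 26
  }
  return [Number(m[2]) - 1, col - 1];
}

// Stand-in for allVals: rows 1-4 of columns A and B, with "hello" in B4.
const sampleVals = [["", ""], ["", ""], ["", ""], ["", "hello"]];
const [rowIdx, colIdx] = a1ToIndices("B4");
console.log(sampleVals[rowIdx][colIdx]); // hello
```

For a single known cell, SpreadsheetApp.getActiveSheet().getRange("B4").getValue() returns the value directly without reading the whole data range.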