How to parse (this) XML using Google Script


I need to parse this XML with Google Apps Script. jsonformatter.org tells me that the XML is valid.
I want to get the text of ICO, but var ico = root.getChild('Ares_odpovedi').getChild('Odpoved').getChild('VBAS').getChild('ICO').getText(); throws an error.
The full code is:
function getARES() {
  var url = 'https://wwwinfo.mfcr.cz/cgi-bin/ares/darv_bas.cgi?'
    + 'ico=06018025'
    + '&xml=1';
  var response = UrlFetchApp.fetch(url);
  var responseText = response.getContentText(); //.replace(/D:/g,'');
  var document = XmlService.parse(responseText);
  var root = document.getRootElement();
  var ico_tmp0 = root.getName(); // value is "Ares_odpovedi"
  var ico_tmp1 = root.getContentSize(); // value is 3
  var ico_tmp2 = root.getChild('Ares_odpovedi'); // value is null
  var ico_tmp3 = root.getChild('Odpoved'); // value is null
  //var ico = root.getChild('Ares_odpovedi').getChild('Odpoved').getChild('VBAS').getChild('ICO').getText();
  //var ico = root.getChild('Odpoved').getChild('VBAS').getChild('ICO').getText();
  Logger.log(response);
  Logger.log(" ");
  Logger.log(responseText);
}

I believe your goal is as follows.
You want to retrieve the text of ICO using Google Apps Script.
In this case, the namespace must be passed whenever getChild is called, because the elements in this XML belong to namespaces; without it, getChild returns null, as your script shows. With that change, your script becomes as follows.
Modified script:
function getARES() {
  var url = 'https://wwwinfo.mfcr.cz/cgi-bin/ares/darv_bas.cgi?'
    + 'ico=06018025'
    + '&xml=1';
  var response = UrlFetchApp.fetch(url);
  var responseText = response.getContentText(); //.replace(/D:/g,'');
  var document = XmlService.parse(responseText);
  var root = document.getRootElement();
  // I modified the script below.
  var ns1 = XmlService.getNamespace("/ares/xml_doc/schemas/ares/ares_answer_basic/v_1.0.3");
  var ns2 = XmlService.getNamespace("/ares/xml_doc/schemas/ares/ares_datatypes/v_1.0.3");
  var res = root.getChild("Odpoved", ns1).getChild("VBAS", ns2).getChild("ICO", ns2).getText();
  Logger.log(res);
}
When the above script is run, 06018025 is retrieved.
When http://wwwinfo.mfcr.cz/cgi-bin/ares/darv_bas.cgi?ico=27074358&xml=1 is used as the URL for UrlFetchApp.fetch, 27074358 is obtained.
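To find which namespace URIs to pass to XmlService.getNamespace, it can help to list the xmlns declarations in the response text. The following is a minimal sketch and not part of the original answer: listNamespaces is a hypothetical helper, and the sample URIs (urn:example:...) are placeholders, not the real ARES namespaces. It uses a plain regular expression, so it is meant only for inspecting a document, not for parsing it.

```javascript
// Hypothetical helper: collect prefix -> URI pairs from xmlns declarations.
// This is an inspection aid only; a regex is not a full XML parser.
function listNamespaces(xmlText) {
  var re = /\sxmlns(?::([\w.-]+))?\s*=\s*(["'])(.*?)\2/g;
  var namespaces = {};
  var match;
  while ((match = re.exec(xmlText)) !== null) {
    var prefix = match[1] || ''; // '' stands for the default namespace
    namespaces[prefix] = match[3];
  }
  return namespaces;
}

// Placeholder document; the URIs are made up for illustration.
var sample = '<a:root xmlns:a="urn:example:answer" xmlns:d="urn:example:types">' +
  '<a:item><d:ICO>06018025</d:ICO></a:item></a:root>';
var found = listNamespaces(sample);
// found.a is "urn:example:answer" and found.d is "urn:example:types".
```

In Apps Script, each URI found this way would be passed to XmlService.getNamespace(uri), and the resulting namespace given as the second argument of getChild.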
References:
XML Service
Added:
From your reply, "Any idea why var res2 = root.getChild("Odpoved", ns1).getChild("VBAS", ns2).getChild("DIC", ns2).getText(); does not work?", I noticed that your question had changed.
Your question asked for the value of ICO. To retrieve the value of DIC, you need to check the structure of the XML, because the XML returned by var url = 'https://wwwinfo.mfcr.cz/cgi-bin/ares/darv_bas.cgi?' + 'ico=06018025' + '&xml=1'; from your question does not include a DIC element. I think this is the reason for your issue.
To retrieve the value of DIC from http://wwwinfo.mfcr.cz/cgi-bin/ares/darv_bas.cgi?ico=27074358&xml=1, please use the following script.
Modified script:
function getARES() {
  var url = 'http://wwwinfo.mfcr.cz/cgi-bin/ares/darv_bas.cgi?ico=27074358&xml=1'; // <--- Modified
  var response = UrlFetchApp.fetch(url);
  var responseText = response.getContentText(); //.replace(/D:/g,'');
  var document = XmlService.parse(responseText);
  var root = document.getRootElement();
  var ns1 = XmlService.getNamespace("/ares/xml_doc/schemas/ares/ares_answer_basic/v_1.0.3");
  var ns2 = XmlService.getNamespace("/ares/xml_doc/schemas/ares/ares_datatypes/v_1.0.3");
  var res = root.getChild("Odpoved", ns1).getChild("VBAS", ns2).getChild("DIC", ns2).getText(); // <--- Modified
  Logger.log(res); // In this case, CZ27074358 is retrieved.
}
Note:
For more about namespaces, these threads might be useful.
What are XML namespaces for?
How does XPath deal with XML namespaces?
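Because getChild returns null when an element is absent (as with DIC in the first URL), chaining the calls throws as soon as one level is missing. A minimal sketch of a null-safe walk follows. getTextAtPath is a hypothetical helper, not part of XmlService, and it relies only on the getChild(name, namespace) and getText() methods used in the scripts above.

```javascript
// Hypothetical helper: walk down a list of [name, namespace] steps and
// return the text of the last element, or null if any step is missing.
function getTextAtPath(root, steps) {
  var node = root;
  for (var i = 0; i < steps.length; i++) {
    node = node.getChild(steps[i][0], steps[i][1]);
    if (node === null) {
      return null; // element not present in this response
    }
  }
  return node.getText();
}
```

With ns1 and ns2 defined as in the scripts above, getTextAtPath(root, [["Odpoved", ns1], ["VBAS", ns2], ["DIC", ns2]]) would return the DIC text when the element exists and null otherwise.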

Related

Apps Script Error - Cannot find method getRange(number,number,(class),number)

I've written a custom Google Apps Script that pulls some data (2 columns wide, 50-100 rows long, but this varies) from an API, parses the JSON, builds a 2D array from it and then pastes it into a Google Sheet.
I can run the script from the editor and it works OK. But when I try to run it from a custom menu, or when I run the debugger, I get the following error:
'Cannot find method getRange(number,number,(class),number) (line 43)'
Line 43 is the last line of the code:
sheet.getRange(3,1,dataSet.length,2).setValues(rows);
It seems that getRange is unable to use the length of the dataset (the number of rows) as the number of rows in the range where the data is to be pasted.
I cannot work out how to fix this. Can anyone see what I am doing wrong? Thanks for taking a look.
//custom menu
function onOpen() {
  var ui = SpreadsheetApp.getUi();
  ui.createMenu('XXXX Data')
    .addItem('Credit Limits','CREDITLIMITS')
    .addToUi();
}
function CREDITLIMITS() {
  var ss = SpreadsheetApp.getActiveSpreadsheet(); //get active spreadsheet
  var sheet = ss.getActiveSheet();
  // var sheet = ss.getSheetByName('data'); //get sheet by name from active spreadsheet
  // URL and params for the API
  var USERNAME = 'XXXXXXX';
  var PASSWORD = 'XXXXXXXXXXXXX';
  var url = 'https://api.XXXX.com/api/v1/XXX/?where=type=%27XXXXXXX%27'; // var url="http://example.com/feeds?type=json"; // Paste your JSON URL here
  var authHeader = 'Basic ' + Utilities.base64Encode(USERNAME + ':' + PASSWORD);
  var params = {'method': 'GET', 'muteHttpExceptions': true, 'headers': {'Authorization': authHeader} };
  //call the XXXX API
  var response = UrlFetchApp.fetch(url, params); // get api endpoint
  var json = response.getContentText(); // get the response content as text
  var dataAll = JSON.parse(json); //parse text into json
  var dataSet = dataAll;
  //create empty array to hold data points
  var rows = [],
      data;
  //loop over the returned events
  for (var i = 0; i < dataSet.length; i++) {
    data = dataSet[i];
    //push a row of data as 2d array
    rows.push([data.company, data.creditLimit]);
  }
  // clear any previous content
  sheet.getRange(1,1,500,10).clearContent();
  // write data to sheet
  sheet.getRange(3,1,dataSet.length,2).setValues(rows);
}
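The "(class)" in the third position of the error message suggests that dataSet.length was not a number at that point. That would happen if the parsed response were an object (for example an error payload, since muteHttpExceptions is true) rather than an array. This is an assumption about the cause, not something the question confirms. A minimal sketch that validates the parsed data and sizes the range from the rows actually built; toRows is a hypothetical helper and the sample data is made up:

```javascript
// Hypothetical helper: turn the parsed API response into a 2D array,
// failing early with a readable message if it is not an array.
function toRows(dataSet) {
  if (!Array.isArray(dataSet)) {
    throw new Error('Expected an array from the API, got: ' + JSON.stringify(dataSet));
  }
  return dataSet.map(function (item) {
    return [item.company, item.creditLimit];
  });
}

// Example with made-up data in the shape the question describes.
var rows = toRows([
  { company: 'Acme', creditLimit: 1000 },
  { company: 'Globex', creditLimit: 2500 }
]);
// rows.length and rows[0].length are both 2 here and can be passed to getRange.
```

In the script, sheet.getRange(3, 1, rows.length, rows[0].length).setValues(rows) would then always match the data, provided rows is not empty.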

Error on line 1: Content is not allowed in prolog

I am trying to scrape a table of price data from this website using the following code:
function scrapeData() {
  // Retrieve table as a string using Parser.
  var url = "https://stooq.com/q/d/?s=barc.uk&i=d";
  var fromText = '<td align="center" id="t03">';
  var toText = '</td>';
  var content = UrlFetchApp.fetch(url).getContentText();
  var scraped = Parser.data(content).from(fromText).to(toText).build();
  //Parse table using XmlService.
  var root = XmlService.parse(scraped).getRootElement();
}
I took this method from an approach I used in a similar question here; however, it's failing on this particular URL and giving me the error:
Error on line 1: Content is not allowed in prolog. (line 12, file "Stooq")
The related questions here and here discuss textual content that the parser does not accept; however, I am unable to apply the solutions in those questions to my own problem. Any help would be much appreciated.

How about this modification?
Modification points:
In this case, the retrieved HTML must be modified before it can be parsed. For example, in the value returned by var content = UrlFetchApp.fetch(url).getContentText(), the attribute values are not enclosed in quotes. These need to be quoted.
There is a merged column in the header.
When the above points are reflected in the script, it becomes as follows.
Modified script:
function scrapeData() {
  // Retrieve table as a string using Parser.
  var url = "https://stooq.com/q/d/?s=barc.uk&i=d";
  var fromText = '#d9d9d9}</style>';
  var toText = '<table';
  var content = UrlFetchApp.fetch(url).getContentText();
  var scraped = Parser.data(content).from(fromText).to(toText).build();
  // Modify values
  scraped = scraped.replace(/=([a-zA-Z0-9\%-:]+)/g, "=\"$1\"").replace(/nowrap/g, "");
  // Parse table using XmlService.
  var root = XmlService.parse(scraped).getRootElement();
  // Retrieve header and modify it.
  var headerTr = root.getChild("thead").getChildren();
  var res = headerTr.map(function(e) {return e.getChildren().map(function(f) {return f.getValue()})});
  res[0].splice(7, 0, "Change");
  // Retrieve values.
  var valuesTr = root.getChild("tbody").getChildren();
  var values = valuesTr.map(function(e) {return e.getChildren().map(function(f) {return f.getValue()})});
  Array.prototype.push.apply(res, values);
  // Put the result to the active spreadsheet.
  var ss = SpreadsheetApp.getActiveSheet();
  ss.getRange(1, 1, res.length, res[0].length).setValues(res);
}
Note:
Before you run this modified script, please install the Parser library for Google Apps Script.
This modified script does not work for arbitrary URLs; it is written for the URL in your question. If you want to retrieve values from another URL, please modify the script.
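The "Modify values" step can be tried on its own. The following is a minimal sketch that applies the same two replacements as the script above to a made-up HTML fragment (the fragment is an assumption, not the actual stooq.com markup); quoteAttributes is a hypothetical name.

```javascript
// Wrap unquoted attribute values in double quotes and drop the bare
// "nowrap" attribute, which has no value and so is not valid XML.
function quoteAttributes(html) {
  return html
    .replace(/=([a-zA-Z0-9\%-:]+)/g, '="$1"')
    .replace(/nowrap/g, '');
}

var fixed = quoteAttributes('<td align=center id=t03 nowrap>1.5</td>');
// fixed is '<td align="center" id="t03" >1.5</td>'
```

Both patterns act on the whole string, so an = sign or the word nowrap inside cell text would be rewritten too. That is acceptable for a numeric price table but would need care elsewhere.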
Reference:
Parser
XmlService
If this was not what you wanted, I'm sorry.

Downloading a Google Slides presentation as PowerPoint doc using Google Apps Script?

The GUI of Google Slides offers to download a GSlides presentation as a Powerpoint (myFile.pptx). I could not find the equivalent in the Google Apps Script documentation - any pointer?
EDIT
Thanks to comments and answers, I tried this snippet:
function testFileOps() {
  // Converts the file named 'Synthese' (which happens to be a Google Slide doc) into a pptx
  var files = DriveApp.getFilesByName('Synthese');
  var rootFolder = DriveApp.getRootFolder();
  while (files.hasNext()) {
    var file = files.next();
    var blobPptx = file.getBlob().getAs('application/vnd.openxmlformats-officedocument.presentationml.presentation');
    var result = rootFolder.createFile(blobPptx);
  }
}
It returns an error:
Converting from application/vnd.google-apps.presentation to
application/vnd.openxmlformats-officedocument.presentationml.presentation
is not supported. (line 7, file "Code")
SECOND EDIT
As per another suggestion in the comments, I tried to make an HTTP call from Google Apps Script that would directly convert the Google Slides file into pptx, without a size limit. It produces a file on Google Drive, but this file is corrupted / unreadable. The GAS script:
function convertFileToPptx() {
  // Converts a public Google Slide file into a pptx
  var rootFolder = DriveApp.getRootFolder();
  var response = UrlFetchApp.fetch('https://docs.google.com/presentation/d/1Zc4-yFoUYONXSLleV_IaFRlNk6flRKUuAw8M36VZe-4/export/pptx');
  var blobPptx = response.getContent();
  var result = rootFolder.createFile('test2.pptx',blobPptx,MimeType.MICROSOFT_POWERPOINT);
}
Notes:
I got the mime type for pptx here
using the mime type 'pptx' returns the same error message

How about this modification?
Modification point:
response.getContent() returns a byte array, so please use response.getBlob() instead.
Modified script:
function convertFileToPptx() {
  var fileId = "1Zc4-yFoUYONXSLleV_IaFRlNk6flRKUuAw8M36VZe-4";
  var outputFileName = "test2.pptx";
  var url = 'https://docs.google.com/presentation/d/' + fileId + '/export/pptx';
  var rootFolder = DriveApp.getRootFolder();
  var response = UrlFetchApp.fetch(url);
  var blobPptx = response.getBlob();
  var result = rootFolder.createFile(blobPptx.setName(outputFileName));
}
Note:
If you want to convert a Google Slides file in your Google Drive that is not published, please use an access token. In that case, please modify url as follows.
var url = 'https://docs.google.com/presentation/d/' + fileId + '/export/pptx?access_token=' + ScriptApp.getOAuthToken();
DriveApp.createFile() creates a file in the root folder by default.
References:
Class HTTPResponse
getOAuthToken()

As mentioned by tehhowch, you could get the Google Slide file from your Drive and get it as a .pptx. (Not sure of mime type.)
File#getAs:

Here is a version with all the modifications, including the token and a specific target folder:
function convertFileToPptx() {
  var fileId = "Your File ID";
  var outputFileName = "name.pptx";
  var url = 'https://docs.google.com/presentation/d/' + fileId + '/export/pptx';
  //var rootFolder = DriveApp.getRootFolder();
  var rootFolder = DriveApp.getFolderById("Your Folder ID");
  var params = {method:"GET", headers:{"authorization":"Bearer "+ ScriptApp.getOAuthToken()}};
  var response = UrlFetchApp.fetch(url,params);
  var blobPptx = response.getBlob();
  var result = rootFolder.createFile(blobPptx.setName(outputFileName));
}
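The request setup can be separated from the fetch so that it is easy to check on its own. A minimal sketch follows; buildPptxExportRequest is a hypothetical helper, and the token argument is assumed to come from ScriptApp.getOAuthToken() as in the script above.

```javascript
// Hypothetical helper: build the export URL and fetch options for a
// Google Slides file. The token is sent in the Authorization header
// rather than in the query string.
function buildPptxExportRequest(fileId, token) {
  if (!fileId) {
    throw new Error('fileId is required');
  }
  return {
    url: 'https://docs.google.com/presentation/d/' +
      encodeURIComponent(fileId) + '/export/pptx',
    params: {
      method: 'get',
      headers: { Authorization: 'Bearer ' + token }
    }
  };
}

var request = buildPptxExportRequest('FILE_ID', 'TOKEN');
// request.url is 'https://docs.google.com/presentation/d/FILE_ID/export/pptx'
```

In Apps Script this would be used as UrlFetchApp.fetch(request.url, request.params).getBlob().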

To get the byte[] do:
function downloadAsPPTX(){
  var presentation = SlidesApp.getActivePresentation();
  var fileId = presentation.getId();
  var url = 'https://docs.google.com/presentation/d/' + fileId + '/export/pptx';
  var response = UrlFetchApp.fetch(url);
  var blobPptx = response.getBlob();
  Logger.log("size: "+blobPptx.getBytes().length);
}

Google Script - Unzipping a Mimetype application/x-zip

I regularly download a .zip file (not created by me) with the MIME type "application/x-zip", which Utilities.unzip(file) fails to process, with the errors shown below. I am hoping someone can help me parse this file.
Thanks.
var thisFile = DriveApp.getFileById(newestFileId);
//Successfully gets the file as verified by other processes
var thisBlob = thisFile.getBlob();
var createMe = thisFolder.createFile("workingb.zip", thisBlob, "application/zip");
//Creates a 4-byte file with the text "Blob"
var createMe = thisFolder.createFile("workingf.zip", thisFile, "application/zip");
//Creates an 11-byte file with the text the same as the filename.ext
Logger.log(thisFile.getMimeType());
// "application/x-zip"
var test1 = thisFile.getAs("application/zip");
//Exception: Converting from application/x-zip to application/zip is not supported
var thisUnzip = Utilities.unzip(test1);
//Exception: Invalid argument

How about this modification? I think that there are 2 patterns for your situation.
Pattern 1:
If the file name has the .zip extension, how about this modification?
Modified script:
var thisFile = DriveApp.getFileById(newestFileId);
var thisBlob = thisFile.getBlob();
var convertedBlob = thisBlob.setContentTypeFromExtension();
var thisUnzip = Utilities.unzip(convertedBlob);
Pattern 2:
If the file name has no extension, or an extension other than .zip, how about this modification?
Modified script:
var thisFile = DriveApp.getFileById(newestFileId);
var thisBlob = thisFile.getBlob();
var convertedBlob = thisBlob.setContentType("application/zip");
var thisUnzip = Utilities.unzip(convertedBlob);
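The two patterns can be combined by checking the file name first. A minimal sketch follows; asZipBlob is a hypothetical helper. It relies on the Blob methods setContentTypeFromExtension() and setContentType() from the patterns above, plus getName(), which is not used in the original answer.

```javascript
// Hypothetical helper: give a blob the application/zip content type,
// choosing between the two patterns based on the file name.
function asZipBlob(blob) {
  var name = blob.getName() || '';
  if (/\.zip$/i.test(name)) {
    return blob.setContentTypeFromExtension(); // Pattern 1
  }
  return blob.setContentType('application/zip'); // Pattern 2
}
```

In the script this would be used as Utilities.unzip(asZipBlob(thisFile.getBlob())).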
References:
setContentTypeFromExtension()
setContentType(contentType)
If these were not what you wanted, I'm sorry.

google app script Exceeded memory limit

This question may already have been asked, but the existing answers did not solve my problem.
I am trying to save data into a Google Spreadsheet using Google Apps Script, but it shows an "Exceeded memory limit" error.
My code follows:
//new
function getNewTitle() {
  var url = "https://www.reddit.com/r/DigitalMarketing.rss?limit=100&after=0";
  var fromText = '</updated><title>';
  var toText = '</title>';
  var content = UrlFetchApp.fetch(url).getContentText();
  var scraped = Parser.data(content).from(fromText).to(toText).iterate();
  return scraped;
}
function getNewContent() {
  var url = "https://www.reddit.com/r/DigitalMarketing.rss?limit=10&after=0";
  var content = UrlFetchApp.fetch(url).getContentText();
  var document = XmlService.parse(content);
  var root = document.getRootElement();
  var atom = XmlService.getNamespace('http://www.w3.org/2005/Atom');
  Logger.log(atom);
  var fromText = '<content type="html"><!-- SC_OFF --><div class="md"><p>';
  var toText = '</div>';
  var scraped = Parser.data(content).from(fromText).to(toText).iterate();
  return scraped;
}
function getNewLink() {
  var url = "https://www.reddit.com/r/DigitalMarketing.rss?limit=10&after=0";
  var fromText = '<link href="';
  var toText = '" /><updated>';
  var content = UrlFetchApp.fetch(url).getContentText();
  var scraped = Parser.data(content).from(fromText).to(toText).iterate();
  return scraped;
}
function SAVE_DATA() {
  var sheet = SpreadsheetApp.openById('1No3m_FnhyxIaxj2zSlbHrg8HLBJULGQ2bda65hpKlyY').getSheetByName('sample');
  var content = getNewContent();
  var title = getNewTitle();
  var link = getNewLink();
  Logger.log(title[1]);
  for (var i = 0; i < title.length; i++) {
    sheet.appendRow(['Reddit', 'wordpress', title[i], link[i], content[i]]);
  }
}
//new
In the code above, I am trying to save the data from the URL,
but I get the Exceeded memory limit error.
In my log I got this message:
[18-07-21 05:33:29:719 PDT] [Namespace: prefix "" is mapped to URI "http://www.w3.org/2005/Atom"]
Please help me fix this error.
Thanks in advance.

I think the reason for the error is that </div>, the value of var toText = '</div>';, is not included in the content retrieved from https://www.reddit.com/r/DigitalMarketing.rss?limit=10&after=0. So how about this modification?
Modification points :
</div> of var toText = '</div>'; is not included in content, so in this modification I used </content> instead, because you are using '<content type="html"><!-- SC_OFF --><div class="md"><p>' for fromText.
setValues() is used instead of appendRow() for writing the values.
You can see the difference in cost between setValues() and appendRow() here.
Modified script :
1. For getNewContent()
Please modify from
From :
var toText = '</div>';
To :
var toText = '</content>';
2. For SAVE_DATA()
Please modify as follows.
function SAVE_DATA() {
  var sheet = SpreadsheetApp.openById('1No3m_FnhyxIaxj2zSlbHrg8HLBJULGQ2bda65hpKlyY').getSheetByName('sample');
  var content = getNewContent();
  var title = getNewTitle();
  var link = getNewLink();
  var values = title.map(function(e, i){return [e, link[i], content[i]]});
  sheet.getRange(sheet.getLastRow() + 1, 1, values.length, values[0].length).setValues(values);
}
Note :
In this modification, I used var toText = '</content>'; for getNewContent(). If you want to retrieve a different part of the feed, please modify this.
About the URLs: limit=100 is set for the titles, but limit=10 is set for the links and content. So when the values are retrieved and written to the Spreadsheet, link and content become undefined from the 11th row onward.
If you already know this, please ignore it.
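One way to avoid the undefined cells is to build rows only up to the length of the shortest list. A minimal sketch with made-up data follows; zipRows is a hypothetical helper, not part of the original answer.

```javascript
// Hypothetical helper: combine three parallel lists into rows, stopping
// at the shortest list so that no cell is undefined.
function zipRows(title, link, content) {
  var count = Math.min(title.length, link.length, content.length);
  var rows = [];
  for (var i = 0; i < count; i++) {
    rows.push([title[i], link[i], content[i]]);
  }
  return rows;
}

var values = zipRows(['t1', 't2', 't3'], ['l1', 'l2'], ['c1', 'c2']);
// values has 2 rows; the third title is dropped because it has no link or content.
```

Alternatively, using the same limit value in all three URLs keeps the lists the same length to begin with.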
Reference :
Easy data scraping with Google Apps Script in 5 minutes
Parser is a GAS library. You can find it here.
If I have misunderstood your question, I'm sorry.
