Compare HTML responses using Node.js

I have a case where I send a request to a server and record the response. Then I craft a new request, send it to the server, and compare the new response with the recorded one.
I am using Node.js. Is there a good way to compare two HTML responses in Node.js that points directly to the differences between them?

Take a look at jsdiff. It can return the differences between two pieces of text, or HTML in your case, at several levels of granularity (characters, words, lines).
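To make the idea of a line-level diff concrete, here is a minimal, dependency-free sketch. It is not the jsdiff implementation or its API: `diffLines` and the `{ type, line }` result shape are hypothetical names chosen for this illustration, and the algorithm is a plain longest-common-subsequence walk.

```javascript
// Minimal line-level diff built on a longest-common-subsequence (LCS) table.
// Illustration only: this is not how jsdiff is implemented or called.
function diffLines(oldText, newText) {
  const a = oldText.split('\n');
  const b = newText.split('\n');

  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk the table and emit unchanged, removed, and added lines in order.
  const changes = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      changes.push({ type: 'same', line: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      changes.push({ type: 'removed', line: a[i++] });
    } else {
      changes.push({ type: 'added', line: b[j++] });
    }
  }
  while (i < a.length) changes.push({ type: 'removed', line: a[i++] });
  while (j < b.length) changes.push({ type: 'added', line: b[j++] });
  return changes;
}

// Two responses, one element per line, standing in for recorded HTML.
const recorded = '<div>\n<i>m</i>\n<b>q</b>\n</div>';
const replayed = '<div>\n<i>h</i>\n</div>';
const changes = diffLines(recorded, replayed);
for (const c of changes) {
  if (c.type !== 'same') console.log(c.type === 'added' ? '+' : '-', c.line);
}
```

A text diff like this is sensitive to whitespace and attribute order, so normalizing both responses before comparing them is usually worthwhile.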

You can use a combination of jsdom and dom-compare:
var compare = require('dom-compare').compare,
    jsdom = require('jsdom');
// These are the HTML fragments that we want to compare:
var expectedHTML = '<div><i>m</i><b>q</b></div>';
var actualHTML = '<div><i>h</div>';
// jsdom.jsdom() is the API of older jsdom releases; newer releases
// use `new jsdom.JSDOM(html).window.document` instead.
var expectedDOM = jsdom.jsdom(expectedHTML);
var actualDOM = jsdom.jsdom(actualHTML);
var result = compare(expectedDOM, actualDOM);
console.log('diff array:', result.getDifferences());
// We can use a reporter to pretty-print the result:
var reporter = require('dom-compare').GroupingReporter;
console.log(reporter.report(result));

Related

Why is this Importxml formula not working?

The following formula does work for some, but not for others:
=IFNA(VALUE(IMPORTXML("https://finance.yahoo.com/quote/C2PU.SI", "//*[@class=""D(ib) Mend(20px)""]/span[1]")))
If used without IFNA, it says 'Resource at url not found'.
Here's the value I'm trying to pull in:
I would appreciate it if you could point me in the right direction.
Thank you!

It does not return any values even for a simple IMPORTXML call.
It seems the page is generated by JavaScript or is protected, so IMPORTXML cannot scrape it.

Don't use the "inspect" tool: it shows the DOM as rendered by the web browser, including modifications made to the source code by client-side JavaScript. Look at the page source instead.
Resources
How to know if Google Sheets IMPORTDATA, IMPORTFEED, IMPORTHTML or IMPORTXML functions are able to get data from a resource hosted on a website?

The structure of the DOM is generated by JavaScript. Nevertheless, all the information you need is contained in a JSON string assigned to root.App.main. You can get all the data this way:
function extract(url) {
  var source = UrlFetchApp.fetch(url).getContentText();
  // Capture the JSON up to the final run of closing braces, then restore them.
  return source.match(/(?<=root.App.main = ).*(?=}}}})/g) + '}}}}';
}
and then retrieve the data with conventional JSON parsing. This will give you the value:
function marketPrice() {
  var code = 'C2PU.SI';
  var url = 'https://finance.yahoo.com/quote/' + code;
  var source = UrlFetchApp.fetch(url).getContentText();
  // Extract the JSON assigned to root.App.main and restore the closing braces.
  var jsonString = source.match(/(?<=root.App.main = ).*(?=}}}})/g) + '}}}}';
  var data = JSON.parse(jsonString);
  // Bracket notation is needed because the key contains a dot.
  var regularMarketPrice = data.context.dispatcher.stores.StreamDataStore.quoteData[code].regularMarketPrice.raw;
  Logger.log(regularMarketPrice);
}
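The extract-then-parse step can be tried outside Apps Script with a stand-in string. This is a sketch under assumptions: the sample source and the price 4.5 are made up, `extractEmbeddedJson` is a hypothetical helper, and the regular expression assumes the JSON sits on one line and ends with `};`, which may not hold for the live page.

```javascript
// Made-up data standing in for what the page embeds; the price is invented.
const sampleData = {
  context: {
    dispatcher: {
      stores: {
        StreamDataStore: {
          quoteData: {
            'C2PU.SI': { regularMarketPrice: { raw: 4.5 } },
          },
        },
      },
    },
  },
};

// Stand-in for UrlFetchApp.fetch(url).getContentText().
const source =
  '<script>\nroot.App.main = ' + JSON.stringify(sampleData) + ';\n</script>';

// Hypothetical helper: pull out the object literal assigned to root.App.main.
function extractEmbeddedJson(text) {
  const match = text.match(/root\.App\.main = (\{.*\});/);
  if (!match) {
    throw new Error('root.App.main not found; the page layout may have changed');
  }
  return JSON.parse(match[1]);
}

const data = extractEmbeddedJson(source);
// The key contains a dot, so it has to be read with bracket notation.
const quote = data.context.dispatcher.stores.StreamDataStore.quoteData['C2PU.SI'];
console.log(quote.regularMarketPrice.raw);
```

Throwing when the marker is missing makes a layout change show up as a clear error rather than as a JSON parse failure further down.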

Parse JSON data from ASX into Google Sheets for Exchange Traded Products - not companies

I am trying to develop a Google Sheets-based portfolio tracking sheet that is able to retrieve daily prices for the securities in the Australian (ASX) and US markets.
For US market securities the GoogleFinance function works well enough. However for the ASX the ability for GoogleFinance to retrieve information is a bit hit and miss.
Ruben had asked a similar question, to which Ian Finlay provided a solution that works in most instances, i.e. for listed companies, but not for Exchange Traded Products such as PMGOLD.
Ian Finlay's solution using apps script to parse json data was:
function AsxPrice(asx_stock) {
  var url = "https://www.asx.com.au/asx/1/share/" + asx_stock + "/";
  var response = UrlFetchApp.fetch(url);
  var content = response.getContentText();
  Logger.log(content);
  var json = JSON.parse(content);
  var last_price = json["last_price"];
  return last_price;
}
For a 'normal' company such as NAB = asx_stock, the script works well; however, for an exchange traded product such as PMGOLD, it does not.
With some basic searching and experimentation, the reason seems to be that the URL in the script does not point to the required information.
For NAB = asx_stock, the URL response is:
{"code":"NAB","isin_code":"AU000000NAB4","desc_full":"Ordinary Fully Paid","last_price":23.77,"open_price":24.11,"day_high_price":24.21,"day_low_price":23.74,"change_price":-0.15,"change_in_percent":"-0.627%","volume":1469971,"bid_price":23.75,"offer_price":23.77,"previous_close_price":23.92,"previous_day_percentage_change":"-1.239%","year_high_price":27.49,"last_trade_date":"2021-01-29T00:00:00+1100","year_high_date":"2020-02-20T00:00:00+1100","year_low_price":13.195,"year_low_date":"2020-03-23T00:00:00+1100","year_open_price":34.51,"year_open_date":"2014-02-25T11:00:00+1100","year_change_price":-10.74,"year_change_in_percentage":"-31.121%","pe":29.12,"eps":0.8214,"average_daily_volume":6578117,"annual_dividend_yield":2.51,"market_cap":-1,"number_of_shares":3297132657,"deprecated_market_cap":78636614000,"deprecated_number_of_shares":3297132657,"suspended":false}
However, for PMGOLD = asx_stock, the URL response is:
{"code":"PMGOLD","isin_code":"AU000PMGOLD8","desc_full":"Perth Mint Gold","suspended":false}
After some research (I am not a qualified coder), it looks like the actual URL for an Exchange Traded Product should be:
https://www.asx.com.au/asx/1/share/PMGOLD/prices?interval=daily&count=1
The URL response for this is:
{"data":[{"code":"PMGOLD","close_date":"2021-01-28T00:00:00+1100","close_price":24.12,"change_price":0.19,"volume":98132,"day_high_price":24.2,"day_low_price":23.9,"change_in_percent":"0.794%"}]}
When I substitute this URL into Ian Finlay's code and rename the variable to 'close_price' instead of 'last_price', nothing is retrieved. The code used is:
function AskPrice(asx) {
  var url = "https://www.asx.com.au/asx/1/share/" + asx + "/prices?interval=daily&count=1";
  var response = UrlFetchApp.fetch(url);
  var content = response.getContentText();
  Logger.log(content);
  var json = JSON.parse(content);
  var data = json["data"];
  return data;
}
I suspect this is due to the structure of the url response being formatted differently for the two different url types. Maybe nested? - I am not sure.
Can someone please help point out what mistake(s) I am making?
Thank you

Yes, the structure is different. I've done this in Python, so I know exactly what your problem is.
The 1/share API (first example) returns a simple dictionary of name:value pairs, so you can easily reference the value.
The "prices" version gives you a list of daily values under the data element. Even though your example only returns one day, it is still a list with one value. (Notice the [square] brackets around it?)
So you need to go to the "data" element to get the list, then reference the first (and only) item of the list, and then reference close_price.
I don't know this language but it's probably something like:
var data = json["data"][0]["close_price"];
Let me know if this helps.
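The difference between the two shapes can be checked in plain JavaScript without calling the API. In this sketch the two strings are abbreviated from the responses quoted in the question, and `lastPrice` and `latestClosePrice` are hypothetical helper names, not part of any ASX or Apps Script API.

```javascript
// Responses abbreviated from the two examples quoted in the question.
const shareResponse = '{"code":"NAB","last_price":23.77,"suspended":false}';
const pricesResponse =
  '{"data":[{"code":"PMGOLD","close_date":"2021-01-28T00:00:00+1100","close_price":24.12}]}';

// Flat object: the field sits at the top level.
function lastPrice(content) {
  return JSON.parse(content).last_price;
}

// Nested shape: "data" is an array of daily records, so index into it first.
function latestClosePrice(content) {
  const rows = JSON.parse(content).data;
  if (!Array.isArray(rows) || rows.length === 0) {
    return null; // no price rows returned
  }
  return rows[0].close_price;
}

console.log(lastPrice(shareResponse));
console.log(latestClosePrice(pricesResponse));
```

Returning null for an empty list keeps a custom sheet function from failing with an error when a code has no price history.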

Is there a way to modify json received callback?

I'm receiving a callback from a server in my Google app script via doPost.
The problem is that the JSON arrives with the prefix "data=" in front of it, so I'm not able to work with the JSON callback.
The code:
function doPost(e) {
  // The raw request body is exposed as e.postData.contents.
  var r = e.postData.contents;
  Logger.log(r);
}
I'm receiving the format below.
data={""retorno"":{""estoques"":[{""estoque"":{""codigo"":""001a"",""nome"":""M\u00e1scara 100% Algod\u00e3o Lav\u00e1vel Dupla Prote\u00e7\u00e3o - 10 Unidades"",""estoqueAtual"":50,""depositos"":[{""deposito"":{""id"":7939278964,""nome"":""Geral"",""saldo"":""50.0000000000"",""desconsiderar"":""N"",""saldoVirtual"":""50.0000000000""}}]}}]}}
Is there any way to remove this "data="?
Thanks

If you just want to remove the substring data=, the easiest way is to use the slice() method.
Sample:
var r = e.postData.contents;
// "data=" is five characters long, so keep everything from index 5 onward.
var sliced = r.toString().slice(5);
Logger.log(sliced);
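Once the prefix is gone, the remainder can be handed to JSON.parse. A minimal sketch, assuming the body really is the literal text data= followed by valid JSON (the doubled quotes in the logged sample look like a logging artifact, so the stand-in below uses single quotes). `parsePrefixedJson` is a hypothetical helper name.

```javascript
// Hypothetical helper: strips an optional "data=" prefix, then parses the JSON.
function parsePrefixedJson(body, prefix = 'data=') {
  const text = String(body);
  const json = text.startsWith(prefix) ? text.slice(prefix.length) : text;
  return JSON.parse(json);
}

// Shortened stand-in for e.postData.contents.
const contents =
  'data={"retorno":{"estoques":[{"estoque":{"codigo":"001a","estoqueAtual":50}}]}}';
const parsed = parsePrefixedJson(contents);
console.log(parsed.retorno.estoques[0].estoque.estoqueAtual);
```

Checking for the prefix instead of always slicing five characters keeps the handler working if the sender ever posts bare JSON.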

How to get the currency information from this site

I'm trying to bring to my google sheets the currency information from the site:
https://www.bbva.mx/personas/informacion-financiera-al-dia.html
I've tried IMPORTHTML and IMPORTXML, but neither is working for me.
The information I need is this
Any help with this, please?
Maybe using Apps Script?
Edit:
This is the code I'm using:
function fetchData() {
  var url = 'https://www.bbva.mx/personas/informacion-financiera-al-dia.html';
  var dolarTable = UrlFetchApp.fetch(url).getContentText();
  Logger.log(dolarTable)
  var match = dolarTable.match(/Dólar(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(<\/tr>)/);
  var string = match[0].replace(/(\r\n|\n|\r)/gm," ");
  string = string.replace(/\s/g, "");
  var dollar = string.search("\\$");
  var value = string.indexOf("$", dollar + 1);
  var substrings = string.substring(value);
  var almostThere = substrings.substring(0).indexOf("<");
  var number = substrings.substring(0, almostThere);
  return SpreadsheetApp.getActiveSpreadsheet().getSheets[0].getRange('A1').setValue(number);
}
I'm getting this error:
Regular expression operation exceeded execution time limit (line 5, file "Code")

Okay, so the problem you're running into is this: in Sheets, IMPORTHTML and IMPORTXML import data from a table or list within an HTML page, but the webpage you're trying to access uses active server scripts to generate the HTML content.
In Apps Script, there is a built-in UrlFetchApp class which you can use to get HTML data - it has its own limitations, but allows you to get the data from a page into your script for usage.
The page you're trying to get uses a frame that contains an .aspx file, and it's this generated content that has the information you're trying to retrieve. Honestly, this solution is a little ad-hoc as I've used UrlFetchApp.fetch() to get the data, then used regular expressions and built-in JavaScript string functions to get the information out as generically as I can:
function fetchData() {
  var dolarTable = UrlFetchApp.fetch('https://bbv.infosel.com/bancomerindicators/indexv8.aspx').getContentText();
  var match = dolarTable.match(/Dólar(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(<\/tr>)/);
  var string = match[0].replace(/(\r\n|\n|\r)/gm," ");
  string = string.replace(/\s/g, "");
  var dollar = string.search("\\$");
  var value = string.indexOf("$", dollar + 1);
  var substrings = string.substring(value);
  var almostThere = substrings.substring(0).indexOf("<");
  var number = substrings.substring(0, almostThere);
  SpreadsheetApp.getActiveSpreadsheet().getSheets()[0].getRange('A1').setValue(number);
}
This will fetch the HTML data of the page, then reduce what you're looking for by substring filtering. I've kept it generic so as long as the structure of the page doesn't change too much, it should still work even if the value of the amount changes.
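The substring filtering described above can also be written without the long chain of `(.*)\s+` groups, which may backtrack heavily on a large page. The sketch below is an alternative, not the answer's method: the HTML fragment and amounts are made up, the real markup may differ, and `amountInRow` is a hypothetical helper that returns the n-th "$" amount in the row that mentions a label.

```javascript
// Made-up fragment standing in for the fetched page; the real markup may differ.
const html = [
  '<tr>',
  '  <td>Dólar</td>',
  '  <td>$19.10</td>',
  '  <td>$20.25</td>',
  '</tr>',
].join('\n');

// Returns the n-th "$" amount (0-based) found in the row that mentions `label`,
// or null when the label or the amount is missing.
function amountInRow(page, label, n) {
  const start = page.indexOf(label);
  if (start === -1) return null;
  const end = page.indexOf('</tr>', start);
  const row = page.slice(start, end === -1 ? page.length : end);
  const amounts = row.match(/\$\s*[\d.,]+/g) || [];
  return n < amounts.length ? amounts[n].replace(/\s/g, '') : null;
}

// Index 1 mirrors the original code, which keeps the second "$" value.
console.log(amountInRow(html, 'Dólar', 1));
```

Cutting the page down to a single row with indexOf before running any regular expression keeps the pattern small and its input short.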

Not able to scrape data

I am just starting out in Google Apps Script. Since best coding practices recommend using as few sheet formulas as possible, I am trying to do my web scraping with the GAS Parser library and then push the data over to my spreadsheet.
Within my sheet, the formula below returns a table of data, which is exactly what I want to get from GAS.
=IMPORTHTML("https://finance.yahoo.com/quote/BOO.L/history?p=BOO.L", "table", 1)
The two questions here & here are similar, but trying those methods also fails. It almost seems like I am not getting the full page content: when I log the result of the call below with Logger.log(), I get nothing that resembles the page I need.
UrlFetchApp.fetch(url).getContentText();
Since the formula gets the data perfectly, I can only assume the problem is with my own code, but I can't figure out where. Here is the code I have tried so far:
function scrapeData() {
  var url = "https://finance.yahoo.com/quote/BARC.L/history?p=BARC.L";
  var fromText = '<td class="Py(10px) Ta(start) Pend(10px)"><span>';
  var toText = '</span></td>';
  var content = UrlFetchApp.fetch(url).getContentText();
  var scraped = Parser
    .data(content)
    .from(fromText)
    .to(toText)
    .iterate();
  Logger.log(scraped)
}
Any guidance much appreciated.

You want to retrieve and put the values from the URL to Spreadsheet using Google Apps Script.
If my understanding is correct, how about this modification? I think that there are several answers for your situation. So please think of this as one of them.
Modification points:
In order to retrieve the table, I used Parser and XmlService.
Retrieve the table as the string value using Parser.
Parse the table using XmlService. I think that XmlService makes us easily parse the table.
XmlService is a strong XML parsing tool, so when it can be applied to HTML, it makes retrieving values much easier. However, most current HTML cannot be parsed directly by XmlService, so I always use this flow: extract the table with Parser first, then parse it with XmlService.
Modified script:
function scrapeData() {
  // Retrieve table as a string using Parser.
  var url = "https://finance.yahoo.com/quote/BOO.L/history?p=BOO.L";
  // var url = "https://finance.yahoo.com/quote/BARC.L/history?p=BARC.L";
  var fromText = '<div class="Pb(10px) Ovx(a) W(100%)" data-reactid="30">';
  var toText = '<div class="Mstart(30px) Pt(10px)"';
  var content = UrlFetchApp.fetch(url).getContentText();
  var scraped = Parser.data(content).from(fromText).to(toText).build();
  // Parse table using XmlService.
  var root = XmlService.parse(scraped).getRootElement();
  // Retrieve header
  var headerTr = root.getChild("thead").getChildren();
  var res = headerTr.map(function(e) {return e.getChildren().map(function(f) {return f.getValue()})});
  var len = res[0].length;
  // Retrieve values; rows shorter than the header are padded with empty strings.
  var valuesTr = root.getChild("tbody").getChildren();
  var values = valuesTr.map(function(e) {return e.getChildren().map(function(f) {return f.getValue()})})
    .map(function(e) {return e.length == len ? e : e.concat(Array.apply(null, new Array(len - e.length)).map(String.prototype.valueOf,""))});
  Array.prototype.push.apply(res, values);
  // Put the result to the active spreadsheet.
  var ss = SpreadsheetApp.getActiveSheet();
  ss.getRange(1, 1, res.length, res[0].length).setValues(res);
}
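The densest step in the script is the padding of short rows, which is needed because setValues() expects every row to have the same number of columns. Here is a minimal sketch of that step on its own; `padRows` is a hypothetical helper, the sample rows are made up, and `Array.prototype.fill` assumes a modern JavaScript runtime such as the V8 runtime in Apps Script.

```javascript
// Pads every row with empty strings until it has `width` cells.
function padRows(rows, width) {
  return rows.map(function (row) {
    return row.length >= width
      ? row
      : row.concat(new Array(width - row.length).fill(''));
  });
}

// Made-up sample rows; the second one is short, like a dividend notice row.
const header = ['Date', 'Open', 'High', 'Low', 'Close*', 'Adj Close**', 'Volume'];
const rows = [
  ['Jan 29, 2021', '340.00', '345.10', '335.20', '338.00', '338.00', '1,234,567'],
  ['Jan 28, 2021', '0.50 Dividend'],
];
const table = [header].concat(padRows(rows, header.length));
console.log(table.map(function (r) { return r.length; }));
```

Rows that are already as wide as the header, or wider, pass through unchanged.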
Note:
Before you run this modified script, please install the GAS library of Parser.
In my environment, I could confirm that the modified script works for both p=BOO.L and p=BARC.L. I couldn't confirm others, so if an error occurs when you try other symbols, please modify the script.
Reference:
Parser
XmlService
If this was not what you want, I'm sorry.
