Google Apps Script for scraping WSJ and Yahoo Finance - HTML

I am trying to pull data from WSJ and Yahoo Finance using Google Sheets Apps Script.
I was able to pull the current price with the code below, which matches the HTML <span id="quote_val">...</span> using the regex <span id="quote_val">(.+?)<\/span>. Note that this element has an id.
Now I am trying to pull the target price, whose HTML is <span class="data_data"><sup>$</sup>50.80</span>. The same approach does not give me the desired result. Note that this element has a class, not an id.
For example, with the URL https://www.wsj.com/market-data/quotes/PCT/research-ratings:
function SAMPLE(url) {
const html = UrlFetchApp.fetch(url).getContentText();
const res = html.match(/<span id="quote_val">(.+?)<\/span>/);
if (!res) throw new Error("Value cannot be retrieved.")
return isNaN(res[1]) ? res[1] : Number(res[1]);
}
Is there a simple solution?
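For the question as asked, the SAMPLE regex can be adapted to the class-based span. The following is a minimal sketch in plain JavaScript (extractDataValues is a hypothetical name), assuming the markup looks exactly like the snippet above, optionally with whitespace before <sup>. Because the page has several data_data spans (High, Median, Low, Average, Current Price), it returns all of them in page order, and you still have to pick the right index yourself.

```javascript
// Sketch: extract every <span class="data_data"><sup>$</sup>VALUE</span>.
// Hypothetical helper; pass it the page HTML fetched with UrlFetchApp.
function extractDataValues(html) {
  // "\$" is escaped because a bare "$" is an end-of-input anchor in a regex;
  // "\s*" tolerates a line break between the opening <span> and <sup>.
  const re = /<span class="data_data">\s*<sup>\$<\/sup>([^<]+)<\/span>/g;
  const values = [];
  let match;
  while ((match = re.exec(html)) !== null) {
    const text = match[1].trim();
    // Same convention as SAMPLE: return a number when the text is numeric.
    values.push(isNaN(text) ? text : Number(text));
  }
  return values;
}
```

In Apps Script you would call it with UrlFetchApp.fetch(url).getContentText(), as SAMPLE does. Relying on position is fragile, which is why matching on the labels (as in the answer) is more robust.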

I would tackle this in two steps:
1. Use a regular expression to extract the subsection of the web page you are interested in, specifically the "Stock Price Target" table.
2. Parse that <table>...</table> string into an XML document, then iterate over the nodes in that document to extract each relevant item from the table.
function scrapeDemo() {
var url = 'https://www.wsj.com/market-data/quotes/PCT/research-ratings';
var html = UrlFetchApp.fetch(url).getContentText();
// Step 1: use the unique parent <div> to locate the "Stock Price Target" table.
var res = html.match(/<div class="cr_data rr_stockprice module">.+?(<table .+?<\/table>)/);
if (!res) throw new Error("Price target table not found.");
// Step 2: parse the captured <table> string as XML and walk its rows.
var document = XmlService.parse(res[1]);
var root = document.getRootElement(); // the <table> element
var trNodes = root.getChild('tbody').getChildren();
trNodes.forEach((trNode) => {
var tdNodes = trNode.getChildren();
var fieldName;
var fieldValue;
tdNodes.forEach((tdNode, idx) => {
if (idx % 2 === 0) {
// Even cells hold the label, e.g. "High".
fieldName = tdNode.getValue().trim();
} else {
// Odd cells hold the value, e.g. "$48.00".
fieldValue = tdNode.getValue().trim();
console.log( fieldName + " : " + fieldValue );
}
} );
} );
}
The regular expression uses this as its starting point:
<div class="cr_data rr_stockprice module">
This is because we need a reliably unique element which is a parent of the table we want (the table itself does not contain anything which uniquely identifies it).
This gives us the table in the res[1] captured group. Here is that HTML:
<table class="cr_dataTable">
<tbody>
<tr>
<td>
<span class="data_lbl">High</span>
</td>
<td>
<span class="data_data">
<sup>$</sup>48.00</span>
</td>
</tr>
<tr>
<td>
<span class="data_lbl">Median</span>
</td>
<td>
<span class="data_data">
<sup>$</sup>37.50</span>
</td>
</tr>
<tr>
<td>
<span class="data_lbl">Low</span>
</td>
<td>
<span class="data_data">
<sup>$</sup>24.00</span>
</td>
</tr>
<tr class="highlight">
<td>
<span class="data_lbl">Average</span>
</td>
<td>
<span class="data_data">
<sup>$</sup>36.75</span>
</td>
</tr>
<tr>
<td>
<span class="data_lbl">Current Price</span>
</td>
<td>
<span class="data_data">
<sup>$</sup>24.18</span>
</td>
</tr>
</tbody>
</table>
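The table HTML above is shown across several lines. In JavaScript regular expressions "." does not match line breaks, so if the fetched page really contains line breaks inside that section, the .+? in the regex above would not match. A minimal sketch of a line-break-tolerant version, assuming the same unique parent <div> (extractPriceTargetTable is a hypothetical name):

```javascript
// Sketch: return the first <table>...</table> that follows the unique parent
// <div class="cr_data rr_stockprice module">. Unlike ".", "[\s\S]" also
// matches line breaks, so this still works if the HTML spans several lines.
function extractPriceTargetTable(html) {
  const re = /<div class="cr_data rr_stockprice module">[\s\S]*?(<table\b[\s\S]*?<\/table>)/;
  const res = html.match(re);
  if (!res) throw new Error('Price target table not found.');
  return res[1];
}
```

Both quantifiers are lazy, so the capture stops at the first </table> after the parent <div> rather than running to the last table on the page.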
Now we perform step 2 using XmlService.parse() to create a mini-XML document containing our HTML.
Then we iterate over the elements of that document, by drilling down into each level's child nodes.
Each field's label and value are written to the console, so for this table we get:
High : $48.00
Median : $37.50
Low : $24.00
Average : $36.75
Current Price : $24.18
In my experience, doing this type of scraping can be difficult. Any unexpected changes in web page structure, from one page to another, can cause the regular expression to fail, or cause the drill-down into the table to fail. In other words, this type of approach should work, but it may also break unexpectedly.
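XmlService.parse() is strict: it may throw if the captured markup is not well-formed XML (for example, if it contains an HTML-only entity such as &nbsp; or an unclosed tag). A regex-only alternative for step 2, sketched in plain JavaScript with a hypothetical function name, pairs each data_lbl span with the data_data span that follows it. It depends on the exact class names shown above, so it is just as fragile as the original approach.

```javascript
// Sketch: pair each data_lbl span with the next data_data span and strip
// inner tags such as <sup>$</sup>, giving e.g. { High: "$48.00" }.
function parsePriceTargets(tableHtml) {
  const re = /<span class="data_lbl">([\s\S]*?)<\/span>[\s\S]*?<span class="data_data">([\s\S]*?)<\/span>/g;
  // Remove tags, collapse whitespace (including line breaks), then trim.
  const stripTags = (s) => s.replace(/<[^>]*>/g, '').replace(/\s+/g, ' ').trim();
  const result = {};
  let m;
  while ((m = re.exec(tableHtml)) !== null) {
    result[stripTags(m[1])] = stripTags(m[2]);
  }
  return result;
}
```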

Related

Cheerio not finding table content

I need to parse an HTML document containing a table.
<div>
<table id="tableID">
<tr>
<td class="tdClass">
<span id="id1">Some data i need to access</span>
</td>
<td class="tdClass">
<span id="id2">Some data i need to access</span>
</td>
</tr>
</table>
</div>
I'm using Cheerio in an NW.js app. I can't figure out how to access the data; I've tried selecting by the spans' ids, but it doesn't work.
The div is contained in the body of the page.
var $ = cheerio.load(html)
alert($('#id1').html())
I get null when I try to alert the content of the span.
Try this:
$ = cheerio.load(html, {normalizeWhitespace: false, xmlMode: true});

HTML - Change display order of entire tables on a page

I have a list of tables with various elements on a page. I want the display order of the tables to change randomly each time the page is loaded. Any ideas on how to do this? For reference, the code below shows the first two tables. Say I wanted to randomly change their display order: how would I do that?
<table>
<tr>
<td class="lender-logo" width="200" height="168x"><img src="http://www.texaspaceauthority.org/wp-content/uploads/2015/05/CleanFund_LOGO.jpg" alt="Clean Fund LLC" width="200" />
</td>
<td width="15px"></td>
<td width="340px">
<strong>Clean Fund LLC</strong>
<span style="font-size: small;"><strong>Preferred Financing Range:</strong> $500K - $15M
<strong>Types of Projects:</strong> Any
<strong>Contact:</strong> Josh Kagan
www.cleanfund.com
</span>
</td>
</tr>
</table>
<hr />
<table>
<tr>
<td class="lender-logo" width="200" height="168x"><img src="http://www.texaspaceauthority.org/wp-content/uploads/2015/05/Greenworks-Lending-Logo.jpg" alt="Greenworks Lending" width="200" />
</td>
<td width="15px"></td>
<td width="340px">
<strong>Greenworks Lending</strong>
<span style="font-size: small;"><strong>Preferred Financing Range:</strong> $30K - $5M
<strong>Types of Projects:</strong> Any Eligible Technologies and Properties
<strong>Contact:</strong> azech@greenworkslending.com
www.greenworkslending.com
</span>
</td>
</tr>
</table>
Are you able to use JavaScript on that page? My suggestion would be to write a JavaScript function that selects the table elements and then appends them back to their parent element in a random order.
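A minimal plain-JavaScript sketch of that idea, with hypothetical function names. It assumes the tables are not nested. Instead of appending everything to the end of the parent (which would gather all the <hr /> separators in one place), each table is swapped for a placeholder comment and the shuffled tables are put back into those placeholders, so the separators stay where they are.

```javascript
// Fisher-Yates shuffle: returns a new array in random order.
// "random" defaults to Math.random and can be swapped out for testing.
function shuffleArray(items, random = Math.random) {
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Browser only: shuffles the given elements among their current positions.
// Each element is replaced by an empty comment placeholder, then the
// shuffled elements are put back into those placeholders. Nodes between
// them (such as the <hr /> separators) are not moved.
function shuffleInPlace(elements) {
  const placeholders = elements.map((el) => {
    const marker = el.ownerDocument.createComment('');
    el.replaceWith(marker);
    return marker;
  });
  shuffleArray(elements).forEach((el, i) => placeholders[i].replaceWith(el));
}

// Usage (browser): shuffle every table once the page has loaded.
// document.addEventListener('DOMContentLoaded', () => {
//   shuffleInPlace(Array.from(document.querySelectorAll('table')));
// });
```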
The page below shows how to do this with jQuery. You need JavaScript for this kind of dynamic content, and jQuery is one option.
https://css-tricks.com/snippets/jquery/shuffle-dom-elements/
Here is the code from that page, in case it gets deleted:
$.fn.shuffle :
(function($){
$.fn.shuffle = function() {
var allElems = this.get(),
getRandom = function(max) {
return Math.floor(Math.random() * max);
},
shuffled = $.map(allElems, function(){
var random = getRandom(allElems.length),
randEl = $(allElems[random]).clone(true)[0];
allElems.splice(random, 1);
return randEl;
});
this.each(function(i){
$(this).replaceWith($(shuffled[i]));
});
return $(shuffled);
};
})(jQuery);
And the usage is as follows:
// Shuffle all list items within a list:
$('ul#list li').shuffle();
// Shuffle all DIVs within the document:
$('div').shuffle();
// Shuffle all <a>s and <em>s:
$('a,em').shuffle();
In your case, you would:
1. Include the jQuery .js file in your webpage.
2. Save the $.fn.shuffle code in a .js file and include that in your webpage after jQuery.
3. Include the JavaScript that calls the shuffle once the DOM is ready, e.g. $(function () { $('table').shuffle(); });

Cycle through input fields

I have a table that I'm populating with data from a database query.
Table example:
<tr ng-repeat="row in reservasTable">
<td> {{row.L}} </td>
<td> {{row.number}} </td>
<td> {{row.line}} </td>
<td> {{row.cod_art}} </td>
<td> {{row.creation_date}} </td>
<td> {{row.deadline}} </td>
<td> {{row.qtt_ordered}} </td>
<td> {{row.qtt_delivered}} </td>
<td> {{row.ocor}} </td>
<td> <input type="text" id="qttField" ng-model-onblur ng-model="qtt" ui-keypress="{13:'setQtt($event)'}"></td>
</tr>
The only field that doesn't come from the JSON query is the last one: an input field used to enter a quantity.
What I need to do is this: after filling in whichever fields I want (e.g., the 1st, 4th and last), cycle through those fields, check which ones are filled, and get their values.
I can't get the value from the model alone, since the model is the same for every field. That's why I'm currently using the Enter key to update the value and push it into an array:
Simple version:
$scope.arrayQtt = [];
var i = 0;
$scope.setQtt = function(evt){
// evt.target is the standard property; evt.srcElement is a legacy alias
$scope.arrayQtt[i] = evt.target.value;
i++;
};
I would rather check all the input fields with a 'Confirmation' button once they are filled in, since a user can edit a field as many times as they want before clicking 'Confirmation'.
Any help, advice or guidance is appreciated.
Thanks in advance!
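One possible approach, sketched under assumptions: bind each input to its own row with ng-model="row.qtt" (instead of the shared qtt), so the typed quantity is stored on that row's object, then collect the filled rows when the Confirmation button is clicked. The helper name collectFilledQuantities, the confirm() function and the returned fields are hypothetical; the filtering itself is plain JavaScript.

```javascript
// Hypothetical helper: assumes each row's input uses ng-model="row.qtt",
// so the typed quantity is stored on that row object.
// Returns only the rows whose quantity field is non-empty.
function collectFilledQuantities(rows) {
  return rows
    .filter((row) => row.qtt != null && String(row.qtt).trim() !== '')
    .map((row) => ({ number: row.number, line: row.line, qtt: String(row.qtt).trim() }));
}

// Possible wiring in the AngularJS controller (not runnable outside Angular):
// $scope.confirm = function () {
//   $scope.arrayQtt = collectFilledQuantities($scope.reservasTable);
// };
// and in the view: <button ng-click="confirm()">Confirmation</button>
```

Because the values live on the row objects, editing a field several times just overwrites row.qtt, and only the final values are read on confirmation.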

Split a string in an MVC3 Razor view

I would like some guidance on how to manipulate a string retrieved from an MS SQL database. Here is the relevant part of my SettingController.cs for the Index view:
public ActionResult Index()
{
var datacontext = new SGM_SIDDataContext();
var dataToView = from m in datacontext.SGMs
orderby m.Seq
select m;
return View(dataToView.ToList());
}
And this is my Index.cshtml:
@model IEnumerable<MVC_Apps.Models.SGM>
@{
ViewBag.Title = "Index";
Layout = "~/Views/Shared/_Layout.cshtml";
}
<h2>Index</h2>
<p>
@Html.ActionLink("Create New", "Create")
</p>
<table>
<tr>
<th>
SID
</th>
<th>
Abbrev
</th>
<th>
Val
</th>
<th>
Seq
</th>
<th></th>
</tr>
@foreach (var item in Model)
{
<tr>
<td>
@Html.DisplayFor(modelItem => item.SID)
</td>
<td>
@Html.DisplayFor(modelItem => item.Abbrev)
</td>
<td>
@Html.DisplayFor(modelItem => item.Val)
</td>
<td>
@Html.DisplayFor(modelItem => item.Seq)
</td>
<td>
@Html.ActionLink("Edit", "Edit", new { id=item.GID })
</td>
</tr>
}
</table>
So far I have a working view. Here is what I would like to do:
1) The item.Val data in the Index.cshtml view looks like this:
A,L,C,H
or similar. But some rows may contain only:
H, or C, or nothing (a null value).
If a row has more than one flag set, the values are separated by commas, so item.Val looks like this:
A,L
I want to split item.Val and check each value with an if statement: does it contain A, L, C, H, all of them, or none of them? In the table view, if the data contains H there should be a column named Hotel, and if it also contains L there should be another column, Lounge, showing a checked checkbox.
The possible characters in item.Val are:
A = Admin
L = Lounge
H = Hotel
C = Customer
Any help and ideas are much appreciated. Thank you in advance.
Update:
Thanks, I'm starting to get it. What I really want is this: when item.Val contains H (for Hotel), the view should have a table column with the header Hotel, and that record should show a ticked checkbox.
Here is a sample picture of the table view: http://imagebin.org/152941
Whether Hotel, Admin and User are ticked is determined by item.Val.
For example, for Smith, item.Val looks like this: C,
For Anna, item.Val looks like this: H,C,
P.S.: the var conf line was test code. Ignore it; I have already deleted it from my source. :)
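The splitting logic asked about here can be sketched independently of Razor. For consistency with the other examples on this page it is written in JavaScript; the names FLAG_NAMES, parseFlags and flagColumns are hypothetical. The sketch handles the trailing comma in values like "H,C," and a null value, and ignores letters that are not in the list above.

```javascript
// Maps each single-letter flag to a column name (from the list in the question).
const FLAG_NAMES = { A: 'Admin', L: 'Lounge', H: 'Hotel', C: 'Customer' };

// Splits a value such as "H,C," into flag names. Empty entries (from trailing
// commas) and unknown letters are dropped; null/undefined means "no flags".
function parseFlags(val) {
  if (val == null) return [];
  return val
    .split(',')
    .map((s) => s.trim())
    .filter((s) => s !== '')
    .map((s) => FLAG_NAMES[s])
    .filter((name) => name !== undefined);
}

// One boolean per column, e.g. to decide whether to tick each checkbox.
function flagColumns(val) {
  const set = new Set(parseFlags(val));
  const columns = {};
  for (const name of Object.values(FLAG_NAMES)) columns[name] = set.has(name);
  return columns;
}
```

For Smith ("C,") this gives only Customer as true; for Anna ("H,C,") it gives Hotel and Customer.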
Create a view model similar to this:
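using System.Linq; // for Contains() on the string[] returned by Split()
public class MyViewModel
{
// items from your model
public int Id{get;set;}
public String Vals{get;set;} // A,L,C,H
...
...
// (Vals ?? "") treats a null Vals as "no flags set" instead of throwing
public bool IsHotel
{
get
{
return (Vals ?? "").Split(',').Contains("H");
}
}
public bool IsAdmin
{
get
{
return (Vals ?? "").Split(',').Contains("A");
}
}
public bool IsCust
{
get
{
return (Vals ?? "").Split(',').Contains("C");
}
}
}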
In the controller action:
public ActionResult Index()
{
var datacontext = new SGM_SIDDataContext();
var dataToView = from m in datacontext.SGMs
orderby m.Seq
select new MyViewModel()
{
// items from your datacontext
Id= m.SID,
//etc...
};
var conf = from m in datacontext.SGMs // what's the purpose of this query??
select m.Val;
return View(dataToView.ToList());
}
The view will need to be updated to include columns for Hotel, Cust, Admin and any others you need. I suggest keeping it simple and not trying to create the HTML columns on the fly; if you do want that, you will probably need to inspect the Vals of every object in your list first to determine which columns are needed. In the view:
@foreach (var item in Model)
{
<tr>
<td>
@Html.DisplayFor(modelItem => item.SID)
</td>
<td>
@Html.DisplayFor(modelItem => item.Abbrev)
</td>
<td>
@(item.IsHotel ? "Checked" : "")
</td>
<td>
@(item.IsAdmin ? "Checked" : "")
</td>
<td>
@(item.IsCust ? "Checked" : "")
</td>
<td>
@Html.DisplayFor(modelItem => item.Seq)
</td>
<td>
@Html.ActionLink("Edit", "Edit", new { id=item.GID })
</td>
</tr>
}

How do I make LINQ to XML and HTML work together?

I have an HTML table:
<table border="0" width="100%">
<tr class="headerbg">
<th width="5%">
No
</th>
<th width="30%">
Name
</th>
<th width="20%">
Department or Division
</th>
<th width="25%">
Email
</th>
<th width="20%">
Staff/Student
</th>
</tr>
<tr class="bg2">
<td>
1
</td>
<td>
<strong><a class="searchLink2" href="tel_search.php?fore=Dave&sur=Rumber">Dave Rumber</a></strong>
</td>
<td>
Medical School
</td>
<td>
<a class="searchLink2" href="mailto:Dave.Rumber@Home.com">Dave.Rumber@Home.com</a>
</td>
<td>
Student
</td>
</tr>
</table>
Sometimes there will be more than one row of people results.
I would like to go through each row, pluck out the name and email, and do some other processing: put the data in a DataGrid, and possibly into a database.
I guess my question is: how do I do this?
string table = GetContents(buffer);
table = table.Replace("&nbsp;", ""); // &nbsp; is not defined in XML
table = table.Replace("&", "&amp;"); // bare & is invalid in XML
XElement inters = XElement.Parse(table);
I can put it into an XElement but I am not quite sure where to go from here!
Thanks!
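Before the HTML can be passed to XElement.Parse it has to be XML-safe: &nbsp; is not defined in XML, and a bare & (as in the href ...fore=Dave&sur=Rumber) is invalid. Replacing every & with &amp; would also turn an existing entity such as &amp; into &amp;amp;, so this sketch escapes only ampersands that do not already start an entity. It is written in JavaScript like the other examples on this page, and makeXmlSafe is a hypothetical name; the same lookahead pattern should carry over to .NET's Regex.Replace, but treat that as an assumption. Other HTML-only named entities (e.g. &copy;) are left as-is and would still need handling.

```javascript
// Prepares an HTML fragment for a strict XML parser: replaces &nbsp; with a
// plain space and escapes bare "&" characters, while leaving existing
// entities such as &amp;, &#160; or &#xA0; untouched.
function makeXmlSafe(html) {
  return html
    .replace(/&nbsp;/g, ' ')
    .replace(/&(?!#\d+;|#x[0-9a-fA-F]+;|[a-zA-Z][a-zA-Z0-9]*;)/g, '&amp;');
}
```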
Here's some freehand code to get you started. Don't use this in production; it is an educational demonstration only.
// requires: using System.Collections.Generic; using System.Linq; using System.Xml.Linq;
List<XElement> rows = inters
.Descendants("tr")
.Skip(1) // skip the header row
.ToList();
//
// and now to turn rows into people (assumes a Person class with Name and Email)
List<Person> people = rows
// filter to anchors; there should be two per row (name, then email)
.Select(r => r.Descendants("a"))
// project each anchor pair into a Person
.Select(g => new Person()
{
Name = g.First().Value,
Email = g.Skip(1).First().Value
})
.ToList();
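For comparison, the same row-by-row extraction can be done without an XML parser. This is a regex-based sketch in JavaScript, like the other examples on this page; extractPeople is a hypothetical name, and it assumes the layout shown in the question (a header row first, then the name link followed by the email link in each row). Like the LINQ version, it is only a demonstration.

```javascript
// Hypothetical regex-based counterpart of the LINQ query above.
function extractPeople(tableHtml) {
  const rows = tableHtml.match(/<tr\b[\s\S]*?<\/tr>/g) || [];
  return rows
    .slice(1) // skip the header row
    // text of every <a> in the row, in document order
    .map((row) => [...row.matchAll(/<a\b[^>]*>([\s\S]*?)<\/a>/g)].map((m) => m[1].trim()))
    .filter((anchors) => anchors.length >= 2) // expect name link, then email link
    .map((anchors) => ({ name: anchors[0], email: anchors[1] }));
}
```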
You can actually use an HTML table as a data source for OLE DB:
http://connectionstrings.com/html-table
Full disclosure: I haven't actually tried this, but I'm guessing it'll be much easier than trying to parse XML out of HTML.