Unable to select a table within a div using Html Agility Pack

I am trying to scrape data from a table in a web page using HtmlAgilityPack.
Below is the relevant HTML:
<div id="table-matches" style="display: block;"><table class=" table-main"><colgroup><col width="50"><col width="*"><col width="50"><col width="50"><col width="50"><col width="50"><col width="50"></colgroup><tbody><tr class="dark center" xtid="28575"><th class="first2 tl" colspan="3"><a class="bfl" href="/hockey/usa/"><span class="ficon f-200"> </span>USA</a><span class="bflp">»</span>ECHL</th><th>1</th><th>X</th><th>2</th><th xparam="Number of available bookmakers odds~2">B's</th></tr><tr class="odd deactivate" xeid="pn36Jn1f"><td class="table-time datet t1496703900-1-1-0-0 ">04:35</td><td class="name table-participant">South Carolina Stingrays - <span class="bold">Colorado Eagles</span><span class="ico-event-info" onmouseover="toolTip('Colorado Eagles wins series 4-0. 4th leg.', this, event, '4');allowHideTootip(false);delayHideTip(200);return false;" onmouseout="allowHideTootip(true);delayHideTip(200);"> </span></td><td class="center bold table-odds table-score">1:2</td><td class="odds-nowrp" xodd="1.91" xoid="E-2nrdfxv464x0x6av8v">1.91</td><td class="odds-nowrp" xodd="4.74" xoid="E-2nrdfxv498x0x0">4.74</td><td class="odds-nowrp result-ok" xodd="2.79" xoid="E-2nrdfxv464x0x6av90">2.79</td><td class="center info-value">1</td></tr><tr class="dark center" xtid="28308"><th class="first2 tl" colspan="3"><a class="bfl" href="/hockey/usa/"><span class="ficon f-200"> </span>USA</a><span class="bflp">»</span>NHL</th><th>1</th><th>X</th><th>2</th><th xparam="Number of available bookmakers odds~2">B's</th></tr><tr class="odd deactivate" xeid="EyxiHGE4"><td class="table-time datet t1496707200-1-1-0-0 ">05:30</td><td class="name table-participant"><span class="bold">Nashville Predators</span> - Pittsburgh Penguins<span class="ico-event-info" onmouseover="toolTip('Series tied 2-2. 
4th leg.', this, event, '4');allowHideTootip(false);delayHideTip(200);return false;" onmouseout="allowHideTootip(true);delayHideTip(200);"> </span></td><td class="center bold table-odds table-score">4:1</td><td class="odds-nowrp result-ok" xodd="2.15" xoid="E-2ns9hxv464x0x6b2jp">2.15</td><td class="odds-nowrp" xodd="3.86" xoid="E-2ns9hxv498x0x0">3.86</td><td class="odds-nowrp" xodd="2.91" xoid="E-2ns9hxv464x0x6b2jq">2.91</td><td class="center info-value">55</td></tr></tbody></table></div>
I have tried various combinations of selectors to reach the data inside the table, but all I get is the initial node for the containing div.
Here is the code I am using:
var html = @urlOddsportal;
HtmlWeb web = new HtmlWeb();
var htmlDoc = web.Load(html);
//var html = new HtmlAgilityPack.HtmlDocument();
//html.LoadHtml(new WebClient().DownloadString(urlOddsportal)); // load a string
var root = htmlDoc.DocumentNode;
var node = root.SelectSingleNode("//div[@id='table-matches']"); // this returns non-null
// all of the calls below return null
var rows = node.SelectNodes(".//tr[@class='odd deactivate']");
var table = root.SelectSingleNode("//table[@class=' table-main']");
var tablerows = node.SelectNodes(".//table/tbody/tr[1]"); // [@class='odd deactivate']");
var tabletag = htmlDoc.DocumentNode.SelectNodes("//table[@class='table-main']");
Can someone tell me where I am going wrong?
Thanks

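Does this return the table?
// Requires using System.Linq. Pass the class token without the leading space: HasProperty splits the attribute value on spaces, so " table-main" would never match.
var table = document.DocumentNode.Descendants("table").FirstOrDefault(_ => _.HasProperty("class", "table-main"));
HasProperty is an extension method: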
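// Extension methods must be declared in a static class.
public static bool HasProperty(this HtmlNode node, string property, params string[] valueArray)
{
    // Split the attribute value into space-separated tokens
    // and require every requested value to be among them.
    var propertyValue = node.GetAttributeValue(property, "");
    var propertyValues = propertyValue.Split(' ');
    return valueArray.All(c => propertyValues.Contains(c));
}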
If it works, you can try the same approach on the other nodes that return null.
I prefer this method because it is easier to read than XPath expressions.


Get results from text box then continue actionResult

I am trying to build a form on an ASP.NET website. When a user loads the page (called Software in this case), the action checks whether two variables are empty; if so, it returns the Software view, which lets the user fill in values for those variables. The problem is that when the user enters the information, the submit button doesn't actually submit the changes. If it did, the if condition would no longer be true, that branch would be skipped, and the page would redirect to PostSoftware. My assumption is that I might need some JavaScript that runs when the submit button is clicked, but I am unsure.
Software Frontend:
<div class="jumbotron" style="background-color:#D1D3D4;">
<img src="/Images/TECH_BARlogoBLACK.png" style="width:1000px;height:200px;" class="animate__heartBeat">
<p class="lead" span style="background-color: #FFFF00;">Software Request</p>
<br>
<p class="text-left"><b>Enter Computer Name:</b></p>
@Html.TextBoxFor(x => x.computerName, new { id = "computerName", name = "computerName"})
<p style="font-size: 14px; text-align: left;"><b>To obtain your Computer Name: </b>Click Start (bottom left windows icon) and type in System Information, On the 5th line, your Computer Name is listed as the System Name <br></p>
<p class="text-left" style="width: 1000px;"><b>Business Justification:</b> <br> @Html.TextAreaFor(x => x.businessJustification, new { id = "businessJustification", name = "businessJustification", @cols = 40, @rows = 3, @style="width: 1000px;"}) </p>
<label for="Software">Select the Software(s) you need!</label>
<br>
<!-- NOTICE FROM HERE DOWN -->
@Html.DropDownListFor(model => @Model.selectedSoftwareList, new MultiSelectList(Model.softwareList, "Text", @Model.selectedSoftwareList),
new
{
id = "_SelectedSoftwareList",
@class = "form-control multiselect-dropdown",
multiple = "true",
style = "width:200px;height:300px;",
data_placeholder = "Select Software"
})
<br>
<br><button type="submit" class="btn-primary btn-lg" style="font-size: 20px">Submit</button>
</div>
Controller
public ActionResult Software(TicketModel model)
{
ViewBag.Message = "Request software through the Tech Bar.";
var snow = new clsServNowAuth();
var tbl = clsServNowAuth.SoftwareTable();
var tbl2 = clsServNowAuth.SoftwareTableID();
tbl.Wait();
tbl2.Wait();
model.softwareList = tbl.Result;
model.softwareListID = tbl2.Result;
if (String.IsNullOrEmpty(model.computerName) || String.IsNullOrEmpty(model.businessJustification))
{
// TODO
return View(model);
}
return RedirectToAction("PostSoftware", model);
}
[HttpPost]
public ActionResult PostSoftware(TicketModel model)
{
if (String.IsNullOrEmpty(model.computerName) || String.IsNullOrEmpty(model.businessJustification))
{
// TODO
return View(model);
}
// get user info from AD currently ZID and Email
List<string> lstUserInfo = clsADInterface.GetInfo(model);
var test = model.selectedSoftwareList;
// submit software request
clsServNowAuth.SoftwareRequest(lstUserInfo, model.computerName, model.businessJustification, model.softwareList, model.softwareListID);
return View(model);
}
}
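I was missing the form around the submit button. The inputs and the button need to be wrapped in a form so that the values are posted:
@using (Html.BeginForm(FormMethod.Post))
{
    // text boxes, drop-down list and submit button go here
}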

Create mailto hyperlink using AngularJS ng-repeat

I'm currently printing out our user list using ng-repeat.
<div ng-repeat="User in ac.Users | filter:ac.Search | limitTo:ac.Limit"
style="{{ac.Users.indexOf(User)%2 == 0 ? 'background-color:#f2f2f2' : 'background-color:white' }};">
<span style="font-weight:600;">{{User.FullName}}</span>
<span style="font-weight:600;">{{User.EmailAddress}}</span>
</div>
I was wondering whether there is any way to create a single mailto: hyperlink that includes all of the users' email addresses.
Group Mail
In the controller:
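// Apply the same filters as the ng-repeat so the link matches the visible list
var arr = $filter('filter')($scope.ac.Users, $scope.ac.Search);
// limitTo returns a shortened copy; assigning to arr.length would pad a shorter list with empty slots
arr = $filter('limitTo')(arr, $scope.ac.Limit);
// The template binds User.EmailAddress, so read the same property here
var emailArr = arr.map(function (user) { return user.EmailAddress; });
$scope.mailRef = "mailto:" + emailArr.join(",");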
HTML:
<a ng-href="{{mailRef}}">Group Mail</a>

Targeting elements of an a-tag with Nokogiri when classes don't work

I am trying to build a scraper and I would need some help with the following:
I would like to grab a bunch of data from an a-tag and some divs/spans nested in the same div.
My code looks like this:
page = Nokogiri::HTML(open(website))
page.search('.company').each { |e| companies << e.text.strip }
page.search('.jobtitle').each { |e| jobtitles << e.text.strip }
page.search('.location').each { |e| locations << e.text.strip }
page.xpath('//a[@class="turnstileLink"]').map{ |e| links << e['href'] }
For the first three (company, title and location) I get either 16 or 15 results, but for the last search my array only contains 10 elements. Weirdly, they also don't match the first 10 elements of any of the other arrays; instead they start matching somewhere around the 3rd or 4th element of one of the other arrays.
The html of a typical card that I would like to target is here:
<div class="row result clickcard" id="pj_81c3e09223cbc6b3" data-jk="81c3e09223cbc6b3" data-advn="4563763653116462" data-tu="">
<a target="_blank" id="sja1" data-tn-element="jobTitle" class="jobtitle turnstileLink" href="/pagead/clk?mo=r&ad=-6NYlbfkN0DhDTzlYIMy8YIuVE6IrMC_kH05KGZgoAT6LTrcTn8STrwXoiuruouegXiAvJy4qud6xIecRibm3b0Q5eOBkpCiV3R04sAyQbvP7gt6NKZVpCRp32eFzXudmk-TIABX3xEZGo90a47Vz9OofqZaLDh37545RNQ3sFjM6VzWNEWwKf_YoXxeGKcAICj9AADyBuYAY7p9UIUxoox7J5U9gO8Zo2dvRW-i5FJtaUr49Vjsl04W0Jp-CN2azbfp6rrfT6RYFbJ_YAc2iI-L37eeygDtI4KXQwv_elrV8ZLEKo9rkcfEzbE129kX7JKeEq5wJ1dj7GJ4ONH1lIPJQd1gJLoqNYJVQlLTKJiBP72Z0RBmgfZQ-69U8AoEyMT6pytz6iqykLCnO-SxClmvFPJsNV96oBGzpMWtWQeVgGQ49jZfBBRq9Ubw7N73iEjCv6oQ70hcW1P4d8DYK0pCI7vu2KfUh0P9vx8AKC6wY2QoAZeeP4OiBIJ8ikKSIUYJTbe3UwKcLYP7r_3_rx1gY_JO1ReG21ctCxfqGH9DnqTSjz3SYCMZ2ZekooXa&vjs=3&p=1&sk=&fvj=1" title="Private Care Jobs With Elder - Immediate Start - £550 to £750 pw" rel="noopener nofollow" onmousedown="sjomd('sja1'); clk('sja1');" onclick="setRefineByCookie([]); sjoc('sja1',0); convCtr('SJ')">Private Care Jobs With Elder - Immediate Start - £550 to £75...</a>
<br>
<div class="sjcl">
<span class="company">
Elder</span>
<span class="location">London</span>
</div>
<div class="">
<table cellpadding="0" cellspacing="0" border="0"><tbody><tr><td class="snip">
<span class="summary">
Pass a full DBS check or have a valid check already. Access to the internet and a smartphone. At Elder, we’re looking for caring individuals to join our...</span>
</td></tr></tbody></table>
</div>
<div class="sjCapt">
<div class="result-link-bar-container">
<div class="result-link-bar"><span class=" sponsoredGray ">Sponsored</span> - <span id="tt_set_10" class="tt_set"><a id="sj_81c3e09223cbc6b3" href="#" class="sl resultLink save-job-link " onclick="changeJobState('81c3e09223cbc6b3', 'save', 'linkbar', true, ''); return false;" title="Save this job to my.indeed">save job</a></span><div id="editsaved2_81c3e09223cbc6b3" class="edit_note_content" style="display:none;"></div><script>if (!window['sj_result_81c3e09223cbc6b3']) {window['sj_result_81c3e09223cbc6b3'] = {};}window['sj_result_81c3e09223cbc6b3']['showSource'] = false; window['sj_result_81c3e09223cbc6b3']['source'] = "Indeed"; window['sj_result_81c3e09223cbc6b3']['loggedIn'] = false; window['sj_result_81c3e09223cbc6b3']['showMyJobsLinks'] = false;window['sj_result_81c3e09223cbc6b3']['undoAction'] = "unsave";window['sj_result_81c3e09223cbc6b3']['jobKey'] = "81c3e09223cbc6b3"; window['sj_result_81c3e09223cbc6b3']['myIndeedAvailable'] = true; window['sj_result_81c3e09223cbc6b3']['showMoreActionsLink'] = window['sj_result_81c3e09223cbc6b3']['showMoreActionsLink'] || false; window['sj_result_81c3e09223cbc6b3']['resultNumber'] = 10; window['sj_result_81c3e09223cbc6b3']['jobStateChangedToSaved'] = false; window['sj_result_81c3e09223cbc6b3']['searchState'] = "l=London&start=20"; window['sj_result_81c3e09223cbc6b3']['basicPermaLink'] = "https://www.indeed.co.uk"; window['sj_result_81c3e09223cbc6b3']['saveJobFailed'] = false; window['sj_result_81c3e09223cbc6b3']['removeJobFailed'] = false; window['sj_result_81c3e09223cbc6b3']['requestPending'] = false; window['sj_result_81c3e09223cbc6b3']['notesEnabled'] = false; window['sj_result_81c3e09223cbc6b3']['currentPage'] = "serp"; window['sj_result_81c3e09223cbc6b3']['sponsored'] = true;window['sj_result_81c3e09223cbc6b3']['showSponsor'] = true;window['sj_result_81c3e09223cbc6b3']['reportJobButtonEnabled'] = false; window['sj_result_81c3e09223cbc6b3']['showMyJobsHired'] = false; 
window['sj_result_81c3e09223cbc6b3']['showSaveForSponsored'] = true; window['sj_result_81c3e09223cbc6b3']['showJobAge'] = true;</script></div></div>
<div class="tab-container">
<div class="sign-in-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
</div>
</div>
</div>
All cards have the same class ".clickcard" and all the relevant links have the class ".turnstileLink", but I can't seem to get consistent results with page.search or page.xpath: the arrays come back with different numbers of elements, and I can't match up the data across them correctly.
So my question is: If I want to scrape the company name, location, job title, the url to that page and possibly another value, how would I best go about this?
I would appreciate any feedback!
Edit:
The contains() expression needs to be more complex:
contains(
concat(' ',normalize-space(@class),' '),
' turnstileLink '
)
to prevent classes like turnstileLinkerCar from matching. It's such a hassle that I would use doc.css() with a css selector like a.turnstileLink, which takes care of matching exactly the specified class name in a string that may have multiple class names.
Try:
doc.xpath('//a[contains(@class, "turnstileLink")]').each{ |e| links << e['href'] }
Or:
doc.css('a.turnstileLink').each{ |e| links << e['href'] }
Here's the problem:
require 'nokogiri'
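# Sample markup; the class values on the ccc and ddd links are illustrative.
my_html = %q{
<html>
<body>
<a class="c1" href="aaa">A link</a>
<a class="c1 c2" href="bbb">B link</a>
<a class="c2 c1" href="ccc">C link</a>
<a class="c1 c3" href="ddd">D link</a>
</body>
</html>
}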
doc = Nokogiri::HTML(my_html)
links = doc.xpath('//a[@class="c1"]').map{ |e| e["href"] }
p links
--output:--
["aaa"]
The class of the bbb link is "c1 c2" which is not equal to "c1".
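The three matching strategies discussed here (exact attribute equality, plain substring, and the padded whole-token test) can be compared in isolation. What follows is a minimal sketch in Python rather than Ruby, with the standard library's xml.etree.ElementTree standing in for Nokogiri; ElementTree implements only a small XPath subset without contains(), so the token test is written as a hypothetical helper, has_class. The markup and href values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page; not the real site markup.
SNIPPET = """
<body>
  <a class="turnstileLink" href="aaa">A link</a>
  <a class="jobtitle turnstileLink" href="bbb">B link</a>
  <a class="turnstileLinkerCar" href="ccc">C link</a>
</body>
"""


def has_class(element, name):
    """Whole-token class test, mirroring
    contains(concat(' ', normalize-space(@class), ' '), ' name ')."""
    normalized = " ".join(element.get("class", "").split())
    return f" {name} " in f" {normalized} "


root = ET.fromstring(SNIPPET)

# Exact equality: only the element whose class attribute is exactly "turnstileLink".
exact = [a.get("href") for a in root.findall(".//a[@class='turnstileLink']")]

# Plain substring: also picks up the unrelated class "turnstileLinkerCar".
substring = [a.get("href") for a in root.iter("a")
             if "turnstileLink" in a.get("class", "")]

# Whole-token test: matches multi-valued classes but not look-alike names.
whole_token = [a.get("href") for a in root.iter("a")
               if has_class(a, "turnstileLink")]

print(exact)
print(substring)
print(whole_token)
```

The whole-token result is what a CSS selector such as a.turnstileLink is meant to give, which is why the CSS form is usually the more convenient choice.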
Response to comment:
require 'nokogiri'
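# Sample markup; the class values on the ccc and ddd links are illustrative.
my_html = %q{
<html>
<body>
<div class="x">
<a class="c1" href="aaa">A link</a>
<a class="c1 c2" href="bbb">B link</a>
<a class="c2 c1" href="ccc">C link</a>
<div>
<a class="c1 c3" href="ddd">D link</a>
</div>
</div>
<div class="y">
<a class="c1" href="yyy">Y link</a>
</div>
</body>
</html>
}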
doc = Nokogiri::HTML(my_html)
links = doc.css('a.c1').map{ |e| e["href"] }
p links
--output:--
["aaa", "bbb", "ccc", "ddd", "yyy"]
But:
links = doc.css('div.x a.c1').map{ |e| e["href"] }
p links
--output:--
["aaa", "bbb", "ccc", "ddd"]
The same thing with xpaths:
links = doc.xpath('//div[contains(@class, "x")]//a[contains(@class, "c1")]').map{ |e| e["href"] }
p links
--output:--
["aaa", "bbb", "ccc", "ddd"]

Getting attribute of element in XPath

I want to learn web-scraping. Therefore, I started practicing. I am trying to get data-ad-id from HTML using XPath.
The HTML structure looks like this:
<body id="z1234">
<div class="viewport">
<div class="g-row">
<div class="g-col-9">
<div class="cBox cBox--content cBox--resultList">
<div class="cBox-body cBox-body--resultitem dealerAd rbt-reg rbt-no-top"><a class="link--muted no--text--decoration result-item" href="url" data-ad-id="248059713"></a>
</div>
</div>
</div>
</div>
</body>
The XPath for <a class="link--muted no--text--decoration result-item"> is //*[@id="z1234"]/div[3]/div[4]/div[2]/div[1]/div[11]/a. If I choose a different car, only the index of the last div changes.
Based on this, I wrote the following C# code:
var url = "https://suchen.mobile.de/fahrzeuge/search.html?damageUnrepaired=NO_DAMAGE_UNREPAIRED&isSearchRequest=true&maxPowerAsArray=KW&maxPrice=10000&minPowerAsArray=KW&minPrice=10000&scopeId=C";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader sr = new StreamReader(response.GetResponseStream());
string sourceCode = sr.ReadToEnd();
HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(sourceCode);
var rows = document.DocumentNode.SelectNodes("//*[@id='z1234']/div[3]/div[4]/div[2]/div[1]/div[11]");
foreach (var row in rows)
{
var id = row.SelectSingleNode("a[@data-ad-id]").InnerText;
Console.WriteLine("id:" + id);
}
}
I cannot get anything from this Node. It is null. How can I get data-ad-id?
EDIT
I changed my C# code:
var rows = document.DocumentNode.SelectNodes("//a[@data-ad-id]")[0];
var id = rows.Attributes["data-ad-id"].Value;
Now I can get data-ad-id.
Looking at the site's markup, that <a> tag has no inner text; it only contains div and img tags.
You need to fetch the data-ad-id attribute itself, using
//a[@data-ad-id]/@data-ad-id
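To make the distinction between an element's inner text and its attribute value concrete, here is a minimal sketch in Python (not the asker's C#), with the standard library's xml.etree.ElementTree standing in for HtmlAgilityPack. The snippet is a shortened, well-formed version of the markup in the question. ElementTree's limited XPath cannot select the attribute node directly with /@data-ad-id, so the sketch matches the element and then reads the attribute from it.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed version of the markup from the question.
SNIPPET = """
<body id="z1234">
  <div class="cBox-body">
    <a class="result-item" href="url" data-ad-id="248059713"></a>
  </div>
</body>
"""

root = ET.fromstring(SNIPPET)

# Select the first <a> that carries a data-ad-id attribute, wherever it sits,
# instead of relying on a brittle positional path such as div[3]/div[4]/...
link = root.find(".//a[@data-ad-id]")

inner_text = link.text            # the element is empty, so there is no text
ad_id = link.get("data-ad-id")    # the value lives in the attribute

print(repr(inner_text))
print(ad_id)
```

Matching on the attribute rather than on element positions also keeps the selector working when the surrounding div layout changes.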

Select a single item in a parsed HTML list with Html Agility Pack

I would like to be able to select each item individually in a list. Look at this HTML:
<div class="services">
<a class="service selected" onclick="serviceNameClick('');" href="#">all</a>
<a class="service" onclick="serviceNameClick('12');" href="#">12</a>
<a class="service" onclick="serviceNameClick('14');" href="#">14</a>
<a class="service" onclick="serviceNameClick('14C');" href="#">14C</a>
<a class="service" onclick="serviceNameClick('N14');" href="#">N14</a>
<a class="service" onclick="serviceNameClick('14B');" href="#">14B</a>
<a class="service" onclick="serviceNameClick('27');" href="#">27</a>
<a class="service" onclick="serviceNameClick('12A');" href="#">12A</a>
<a class="service" onclick="serviceNameClick('27C');" href="#">27C</a>
<a class="service" onclick="serviceNameClick('N12');" href="#">N12</a>
<a class="service" onclick="serviceNameClick('14A');" href="#">14A</a>
</div>
I want to display this as a list like:
all 12 14 N14 14B 27 12A 27 N12 14A
I am able to get there using the code below:
string htmlPage = "";
using (var client = new HttpClient())
{
htmlPage = await client.GetStringAsync("http://m.buses.co.uk/stop.aspx?stopid=" + stopIdVariable);
}
HtmlDocument htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(htmlPage);
List<Movie> movies = new List<Movie>();
foreach (var a in htmlDocument.DocumentNode.SelectNodes("//div[starts-with(@class, 'content')]"))
{
Movie newMovie = new Movie();
//newMovie.Cover = div.SelectSingleNode(".//div[@class='image']//img").Attributes["src"].Value;
// newMovie.Title = div.SelectSingleNode(".//h4[@itemprop='name']").InnerText.Trim();
// newMovie.Summary = div.SelectSingleNode(".//div[@class='outline']").InnerText.Trim();
newMovie.Summary = a.SelectSingleNode("div[starts-with(@class, 'service')]").InnerText.Trim();
movies.Add(newMovie);
}
lstMovies.ItemsSource = movies;
This displays the result in a list, but I am unable to select individual items in it:
the whole result is selected as a single entry rather than one entry per service.
The aim is to be able to select an item and then use its value as text. So the user clicks on 12 and then I use that 12 within the app.
What needs to change? Thanks
You can try it this way:
var links = htmlDocument.DocumentNode
.SelectNodes("//div[starts-with(@class, 'content')]//a[@class='service']");
foreach (var a in links)
{
Movie newMovie = new Movie();
newMovie.Summary = a.InnerText.Trim();
movies.Add(newMovie);
}
lstMovies.ItemsSource = movies;
Basically, the XPath passed to SelectNodes() above selects the individual <a> nodes whose class equals "service", so each link becomes its own list item (you can change the class check to use starts-with() or contains() if necessary).
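One consequence of the exact class comparison is worth checking: the first link in the question has class "service selected", so an equality test skips it. The following is a minimal sketch in Python rather than C#, using the standard library's xml.etree.ElementTree in place of HtmlAgilityPack and a shortened copy of the markup from the question. The prefix check is written in plain Python because ElementTree's XPath subset has no starts-with().

```python
import xml.etree.ElementTree as ET

# Shortened copy of the services markup from the question.
SNIPPET = """
<div class="services">
  <a class="service selected" href="#">all</a>
  <a class="service" href="#">12</a>
  <a class="service" href="#">14C</a>
</div>
"""

root = ET.fromstring(SNIPPET)

# Exact class match: one entry per link, but "service selected" is skipped.
exact = [a.text.strip() for a in root.findall("./a[@class='service']")]

# Prefix match, comparable to starts-with(@class, 'service'): keeps "all" too.
prefixed = [a.text.strip() for a in root.findall("./a")
            if a.get("class", "").startswith("service")]

print(exact)
print(prefixed)
```

Either way, each link produces its own string, which is what lets a list control treat every service as a separately selectable item.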