HtmlAgilityPack: reading the table after a specified table

I have a structure similar to this:
<table class="superclass">
<tr>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
</td>
<td>
</td>
</tr>
</table>
<table cellspacing="0">
<tr>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
</td>
<td>
</td>
</tr>
</table>
This is how I get the first table with class:
HtmlNode firstTable = document.DocumentNode.SelectSingleNode("//table[@class=\"superclass\"]");
Then I read the data. However, I don't know how to get straight to the next table and read its data too. Any ideas?
I'd rather avoid counting which table it is and then accessing it by index.

XPath has a following-sibling axis, which selects elements that follow the current context element at the same level:
HtmlNode firstTable = document.DocumentNode.SelectSingleNode("//table[@class=\"superclass\"]");
HtmlNode nextTable = firstTable.SelectSingleNode("following-sibling::table");
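For readers outside .NET, the same "next sibling table" lookup can be sketched with Python's standard library. This is only an illustration, not HtmlAgilityPack code: xml.etree.ElementTree needs well-formed markup and its XPath subset has no following-sibling axis, so the hypothetical helper next_sibling_table walks the parent's children by hand. The wrapping body element, the extra paragraph, and the cell text are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed sample markup, wrapped in <body> so the two tables share a parent.
HTML = """
<body>
  <table class="superclass"><tr><td>a</td><td>b</td></tr></table>
  <p>unrelated</p>
  <table cellspacing="0"><tr><td>c</td><td>d</td></tr></table>
</body>
"""


def next_sibling_table(root, class_name):
    """Return the first <table> that follows the table with the given class
    under the same parent, or None (hypothetical helper)."""
    for parent in root.iter():
        children = list(parent)
        for index, child in enumerate(children):
            if child.tag == "table" and child.get("class") == class_name:
                # Scan only the siblings that come after the matched table.
                for sibling in children[index + 1:]:
                    if sibling.tag == "table":
                        return sibling
                return None
    return None


root = ET.fromstring(HTML)
second = next_sibling_table(root, "superclass")
cells = [td.text for td in second.iter("td")]
print(cells)
```

As with the XPath version, no table index is needed; the lookup is anchored on the table that carries the class.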

If you want to access multiple nodes, consider the SelectNodes(xpath) method instead of SelectSingleNode(xpath).
Here is some sample code for reference; it may need adapting to your case.
var tables = htmlDocument.DocumentNode.SelectNodes("//table");
foreach (HtmlNode table in tables)
{
if (table.GetAttributeValue("class", "").Contains("superclass"))
{
//this is the table of class="superclass"
}
else
{
//this is the other table.
}
}

Related

Angularjs, two data in a single row in table

I am having trouble implementing a narrow table.
myData = { name:"Foo", age:11, sex:"M", weight:77, height:77, hobby:'gaming'}
I want a table like the one below.
<table>
<tr>
<td>name</td><td>Foo</td><td>age</td><td>11</td>
</tr>
<tr>
<td>sex</td><td>M</td><td>weight</td><td>77</td>
</tr>
<tr>
<td>height</td><td>77</td><td>hobby</td><td>gaming</td>
</tr>
</table>
Is it possible to show data like this using ngRepeat and its built-in variable?
The question John posted would solve your problem, but I think it would be less of a hack to use ng-repeat-start and ng-repeat-end (this assumes myData is an array of such objects), e.g.:
<table>
<tr ng-repeat-start="item in myData">
<td>name</td><td>{{item.name}}</td><td>age</td><td>{{item.age}}</td>
</tr>
<tr>
<td>sex</td><td>{{item.sex}}</td><td>weight</td><td>{{item.weight}}</td>
</tr>
<tr ng-repeat-end>
<td>height</td><td>{{item.height}}</td><td>hobby</td><td>{{item.hobby}}</td>
</tr>
</table>
If your myData looks like this:
myData = [{ name:"Foo", age:11, sex:"M", weight:77, height:77, hobby:'gaming'},{ name:"Foo", age:11, sex:"M", weight:77, height:77, hobby:'gaming'},{ name:"Foo", age:11, sex:"M", weight:77, height:77, hobby:'gaming'}]
then your table will look like this:
<table>
<tr ng-repeat="row in myData">
<td>{{row.name}}</td>
<td>{{row.age}}</td>
<td>{{row.sex}}</td>
<td>{{row.weight}}</td>
<td>{{row.height}}</td>
<td>{{row.hobby}}</td>
</tr>
</table>
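Neither template above produces the exact layout asked for, with two label/value pairs per row taken from a single object. One way to get there is to reshape the data before rendering. The chunking step is sketched here in Python purely to show the logic; pair_rows is a hypothetical helper name, and in an Angular controller the same grouping could be written in JavaScript and rendered with a nested ng-repeat.

```python
def pair_rows(record, pairs_per_row=2):
    """Split a dict into rows holding `pairs_per_row` (key, value) pairs each."""
    items = list(record.items())
    return [items[i:i + pairs_per_row] for i in range(0, len(items), pairs_per_row)]


my_data = {"name": "Foo", "age": 11, "sex": "M",
           "weight": 77, "height": 77, "hobby": "gaming"}

rows = pair_rows(my_data)
for row in rows:
    # Flatten each row into label, value, label, value cells.
    print([cell for pair in row for cell in pair])
```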

HTML parsing with XPath: flattened hierarchical data

My target HTML is a flattened table of elements with 2 levels of data defined by class attribute:
<tr>
<td class="type">Type 1</td>
</tr>
<tr>
<td class="name">name1</td>
<td class="year">1970</td>
<td class="rank">1</td>
</tr>
<tr>
<td class="name">name2</td>
<td class="year">1982</td>
<td class="rank">3</td>
</tr>
The goal is to parse out a list of name, year, and rank elements, which I accomplish with these XPath expressions:
//td[@class = 'name']/text()
//td[@class = 'year']/text()
//td[@class = 'rank']/text()
Each element belongs to the immediately preceding type row:
<tr>
<td class="type">Type 1</td>
</tr>
I would like to have "Type 1" assigned to each element parsed above. It could be a separate list of the same length. Of course, my target HTML contains many such elements within the same two-level hierarchy: type, then element (name, year, rank).
The following rather clumsy XPath concatenates the closest preceding type td with the name td matched above.
concat(//td[@class = 'name']/preceding::td[@class='type'][1]/text(), '-',
//td[@class = 'name']/text())
This probably makes more sense when shown in the following XSL:
<xsl:for-each select="//td[@class='name']">
<Name>
<xsl:value-of select="concat(preceding::td[@class='type'][1]/text(),
'-', ./text())" />
</Name>
</xsl:for-each>
Applied to the following xml
<xml>
<tr>
<td class="type">Type 1</td>
</tr>
<tr>
<td class="name">name1</td>
<td class="year">1970</td>
<td class="rank">1</td>
</tr>
<tr>
<td class="name">name2</td>
<td class="year">1982</td>
<td class="rank">3</td>
</tr>
<tr>
<td class="type">Type 2</td>
</tr>
<tr>
<td class="name">name3</td>
<td class="year">1971</td>
<td class="rank">2</td>
</tr>
<tr>
<td class="name">name4</td>
<td class="year">1983</td>
<td class="rank">4</td>
</tr>
</xml>
With the result
<Name>Type 1-name1</Name>
<Name>Type 1-name2</Name>
<Name>Type 2-name3</Name>
<Name>Type 2-name4</Name>
Solution 1
First, find the td elements of interest. For example, the name tds, with the following pseudo-code:
name_tds = doc.evalXPath("//td[@class = 'name']")
Then you can find the corresponding type td using a name td as the context node, like this:
type_td = name_td.evalXPath("../preceding-sibling::tr[td[@class = 'type']][1]/td")
Solution 2
Simply iterate over all the tds and remember the last type you found. Pseudo-code:
foreach (td in doc.evalXPath("//td")) {
class = td.getAttribute("class");
if (class == "type") {
type = td.textContent();
}
else if (class == "name") {
name = td.textContent();
println("type: " + type + ", name: " + name);
}
// Same for year and rank.
}
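Solution 2 can be made concrete with a short sketch. The version below uses Python's xml.etree.ElementTree on a trimmed copy of the sample XML from above; the variable names are illustrative, and real-world HTML that is not well-formed would need an HTML parser instead.

```python
import xml.etree.ElementTree as ET

XML = """
<xml>
  <tr><td class="type">Type 1</td></tr>
  <tr><td class="name">name1</td><td class="year">1970</td><td class="rank">1</td></tr>
  <tr><td class="name">name2</td><td class="year">1982</td><td class="rank">3</td></tr>
  <tr><td class="type">Type 2</td></tr>
  <tr><td class="name">name3</td><td class="year">1971</td><td class="rank">2</td></tr>
</xml>
"""

root = ET.fromstring(XML)
records = []
current_type = None

# iter() walks in document order, so a "type" cell is always seen
# before the name/year/rank cells that belong to it.
for td in root.iter("td"):
    css_class = td.get("class")
    if css_class == "type":
        current_type = td.text
    elif css_class == "name":
        # A name cell starts a new record tagged with the last type seen.
        records.append({"type": current_type, "name": td.text})
    elif css_class in ("year", "rank") and records:
        records[-1][css_class] = td.text

for record in records:
    print(record)
```

Keeping the type inside each record avoids maintaining a separate list of the same length, though such a list is easy to derive from records if it is needed.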

To show arraylist data in specific format

I am getting a request attribute on a JSP page like this: ArrayList arr = [a,b,c[e,f,g[j,k,l]]]. The list can be long. How should I display it so that a,b,c is the parent of e,f,g, which in turn is the parent of j,k,l?
I want something like this, or better:
<tr onclick="showchild()">
<td> a </td> <td> b </td> <td> c </td>
</tr>
When I click on the tr above, its child, i.e. the tr below, should be shown.
<tr onclick="showchild()">
<td> e </td> <td> f </td> <td> g </td>
</tr>

Finding an XPATH expression

For the following html:
<tr>
<td class="first">AUD</td>
<td> 0.00 </td>
<td> 1,305.01 </td>
<td> 1,305.01 </td>
<td> -65.20 </td>
<td> 0.00 </td>
<td> 0.00 </td>
<td> 1,239.81 </td>
<td class="fx-rate"> 0.98542 </td>
</tr>
I am trying to grab the value of the fx-rate, given the currency. For example, the function would be something like get_fx_rate(currency). This is the XPath expression I have so far, but it results in an empty list, []. What am I doing wrong here and what would be the correct expression?
"//td[@class='first']/text()[normalize-space()='AUD']/parent::td[@class='fx-rate']"
Your expression steps from the text node back up to its parent td, which is the td with class 'first', not the one with class 'fx-rate', so nothing matches. Use this:
//td[@class = 'first' and normalize-space() = 'AUD']/parent::tr/td[@class = 'fx-rate']
or clearer:
//tr[td[@class="first" and normalize-space()="AUD"]]/td[@class="fx-rate"]
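The question mentions wanting a function along the lines of get_fx_rate(currency). A minimal sketch of that lookup follows, assuming Python's standard xml.etree.ElementTree rather than Selenium or lxml, a trimmed copy of the AUD row, and a USD row invented purely for contrast. ElementTree's XPath subset has no normalize-space(), so the whitespace is stripped in Python instead.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; the USD row is invented purely for contrast.
HTML = """
<table>
  <tr><td class="first">AUD</td><td> 1,239.81 </td><td class="fx-rate"> 0.98542 </td></tr>
  <tr><td class="first">USD</td><td> 10.00 </td><td class="fx-rate"> 1.00000 </td></tr>
</table>
"""


def get_fx_rate(root, currency):
    """Return the fx-rate of the row whose first cell names `currency`, else None."""
    for row in root.iter("tr"):
        first = row.find("td[@class='first']")
        if first is not None and (first.text or "").strip() == currency:
            rate = row.find("td[@class='fx-rate']")
            if rate is not None and rate.text:
                return float(rate.text.strip())
    return None


root = ET.fromstring(HTML)
print(get_fx_rate(root, "AUD"))
```

The structure mirrors the second XPath above: select the row by its currency cell first, then read the fx-rate cell from that same row.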
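This is how I managed to solve it, using partial XPaths:
from selenium.webdriver.common.by import By
# get all the elements via XPath (driver is an existing WebDriver instance)
# Selenium 4 style; older releases used driver.find_elements_by_xpath(...)
currencies = driver.find_elements(By.XPATH, "//td[@class='first']")
fx_rates = driver.find_elements(By.XPATH, "//td[@class='fx-rate']")
# build the lists and zip them to get the (currency, rate) pairs
fx_values = [fx.text for fx in fx_rates if fx.text]
currency_text = [currency.text for currency in currencies if currency.text]
# zip() is lazy in Python 3, so materialize it before slicing off the first pair
pairs = list(zip(currency_text, fx_values))[1:]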

A table row was 2 columns wide and exceeded the column count established by the first row (1)

I want to validate my page, but the W3C validator keeps giving me this warning. I want to get rid of it, but I can't seem to find the cause.
It gives me this error:
A table row was 2 columns wide and exceeded the column count established by the first row (1).
Table and CSS code:
<table>
<tr>
<td>Contact informatie</td>
<tr>
<td>Adres:</td>
<td>Jan van der Heydenstraat 61</td>
<tr>
<td>Postcode:</td>
<td>1223 BG</td>
<tr>
<td>Plaats:</td>
<td>Hilversum</td>
<tr>
<td>Email:</td>
<td>info@blabla.nl</td>
<tr>
<td>Telefoon:</td>
<td>06-31903706</td>
</tr>
</table>
table {
border:none;
padding-left:75px;}
td:first-child {
width:135px;
border:none;
text-align:left;}
td+td {
border:none;
text-align: left;}
Anyone any suggestions?
It means exactly what it says: one of the rows in your table has too many columns. Specifically, the first row has fewer columns than a subsequent row. But we can't do much unless you post some code.
Edit
The markup for the table is incorrect.
You only have one cell in the first row (or do what PeeHaa suggested)
You need to close off each row with </tr>
Just change this:
<tr>
<td>Contact informatie</td>
</tr>
To this:
<tr>
<td colspan="2">Contact informatie</td>
</tr>
You should always close your table rows (tr) with </tr>.
Final version:
<table>
<tr>
<td colspan="2">Contact informatie</td>
</tr>
<tr>
<td>Adres:</td>
<td>Jan van der Heydenstraat 61</td>
</tr>
<tr>
<td>Postcode:</td>
<td>1223 BG</td>
</tr>
<tr>
<td>Plaats:</td>
<td>Hilversum</td>
</tr>
<tr>
<td>Email:</td>
<td>info@vazcreations.nl</td>
</tr>
<tr>
<td>Telefoon:</td>
<td>06-31903706</td>
</tr>
</table>
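To see where the validator's numbers come from, a rough re-creation of the column count is sketched below in Python. It is an approximation under stated assumptions, not the W3C validator's actual algorithm: it ignores rowspan and nested tables, and the names ColumnCounter and row_widths are made up for the example. html.parser is used because it tolerates the unclosed tr tags from the question.

```python
from html.parser import HTMLParser


class ColumnCounter(HTMLParser):
    """Record the effective column count (sum of colspans) of every row."""

    def __init__(self):
        super().__init__()
        self.widths = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new <tr> starts a new row even if the previous one was never closed.
            self.widths.append(0)
        elif tag in ("td", "th") and self.widths:
            colspan = dict(attrs).get("colspan") or "1"
            self.widths[-1] += int(colspan) if colspan.isdigit() else 1


def row_widths(markup):
    counter = ColumnCounter()
    counter.feed(markup)
    return counter.widths


# Shortened version of the table from the question (first two rows only).
broken = ("<table><tr><td>Contact informatie</td>"
          "<tr><td>Adres:</td><td>Jan van der Heydenstraat 61</td></tr></table>")
fixed = broken.replace("<td>Contact", '<td colspan="2">Contact')

print(row_widths(broken))  # first row narrower than the second
print(row_widths(fixed))   # both rows the same width
```

With the colspan added, every row reports the same width, which is the condition the warning is checking for.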
To add to what SimpleCoder said: if the first row of a table has only one column, the validator expects the further rows to have no more than one column. If you want to get around this, you can put a table inside the cell, i.e.
<td>
<table>
<tr>
<td><!-- Content here --></td>
</tr>
</table>
</td>