How to select table entry via XPath - html

I have the following table, and I wanted an expression to get the Percentage of the Category "OC". Is it possible to extract via XPath?
<tbody>
<tr>
<th class="textL">Category</th>
<th class="textR">No. of Items</th>
<th class="textR">Percentage</th>
</tr>
<tr class="data_row">
<td>OC</td>
<td class="textR">100</td>
<td class="textR">4.70</td>
</tr>
<tr class="data_row">
<td>FP</td>
<td class="textR">200</td>
<td class="textR">38.82</td>
</tr>
<tr class="data_row">
<td>FI</td>
<td class="textR">300</td>
<td class="textR">20.39</td>
</tr>
</tbody>

Selecting table entry based on value of another entry
To select the Percentage for the given "OC" Category:
//td[.='OC']/following-sibling::td[count(../..//th[.='Percentage']/preceding-sibling::th)]/text()
The above XPath will return
"4.70"
as requested.
Note that it will continue to work in the face of many changes, including row and column rearrangements, as long as the targeted column continues to be named "Percentage", comes after the Category column, and Category remains the first column. One could generalize the expression further by taking the difference of the positions of the two columns rather than assuming that Category is the first column.
Explanation: Count the th headers that precede "Percentage" (two here). Then, from the td that contains "OC", step that many following-sibling td elements to the right and select the text of the td you land on.
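For readers who do the lookup in code rather than in a single expression, here is a minimal Python sketch of the generalized idea (using the positions of both columns instead of assuming Category comes first). It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath (no following-sibling axis, no count()), so the index arithmetic is done in Python. The function name lookup and the shortened sample table are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the table from the question (illustrative data).
TABLE = """
<tbody>
  <tr>
    <th class="textL">Category</th>
    <th class="textR">No. of Items</th>
    <th class="textR">Percentage</th>
  </tr>
  <tr class="data_row"><td>OC</td><td class="textR">100</td><td class="textR">4.70</td></tr>
  <tr class="data_row"><td>FP</td><td class="textR">200</td><td class="textR">38.82</td></tr>
</tbody>
"""


def lookup(tbody, key_header, key_value, target_header):
    """Return the target_header cell of the row whose key_header cell equals key_value."""
    # Column positions come from the header row, not from hard-coded indexes.
    headers = [th.text for th in tbody.findall("tr/th")]
    key_idx = headers.index(key_header)
    target_idx = headers.index(target_header)
    for tr in tbody.findall("tr"):
        cells = tr.findall("td")  # the header row has no td, so it is skipped
        if len(cells) > max(key_idx, target_idx) and cells[key_idx].text == key_value:
            return cells[target_idx].text
    return None


tbody = ET.fromstring(TABLE)
print(lookup(tbody, "Category", "OC", "Percentage"))
```

Because both column indexes are read from the header row, this variant does not care whether Category is the first column or whether Percentage comes before or after it.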

Another XPath, also dependent on the order of the table's columns
//td[text()='OC']/following-sibling::td[2]
(explanation: take the second td sibling among the siblings of a td that contains text 'OC')

There are multiple XPaths for that. This one will work:
/tbody[1]/tr[2]/td[3]/text()
But it depends on the current layout of the XML: it breaks as soon as rows or columns are reordered.
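To make the fragility concrete, here is a small Python sketch using the standard library's xml.etree.ElementTree, which understands simple positional paths like tr[2]/td[3]. The condensed sample markup and the variable names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Condensed version of the table, with tbody as the document root.
snippet = (
    "<tbody>"
    "<tr><th>Category</th><th>No. of Items</th><th>Percentage</th></tr>"
    "<tr><td>OC</td><td>100</td><td>4.70</td></tr>"
    "<tr><td>FP</td><td>200</td><td>38.82</td></tr>"
    "</tbody>"
)
root = ET.fromstring(snippet)

# Second row, third cell: the OC percentage in the current layout.
before = root.find("tr[2]/td[3]").text

# Move the FP row above the OC row; the same path now points at FP.
fp_row = root.findall("tr")[2]
root.remove(fp_row)
root.insert(1, fp_row)
after = root.find("tr[2]/td[3]").text

print(before, after)
```

The path itself never changed, only the row order did, which is why a value-based predicate such as td[.='OC'] is usually the safer choice.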

This works:
tbody/tr/td[. eq "OC"]/../td[@class eq "textR"][2]/text()
It assumes that the OC td element will be there, and that the value you want is in the 2nd td whose class attribute is "textR". The eq operator requires XPath 2.0; in XPath 1.0, use = instead.

Related

add a tag to many td elements

I have a table with hundreds of rows.
The table is done after converting a csv file to html table using https://www.convertcsv.com/csv-to-html.htm
I want a specific column of the table to contain a link, but I don't know how to add an <a> tag to hundreds of <td> elements at a time.
<tr>
<td>title 1</td>
<td align="right">5.18</td>
<td align="right">17.27</td>
<td align="right">70</td>
<td>www.google.com/</td>
<td align="right">32958865536</td>
</tr>
The 5th td is always a link, but the td doesn't contain an <a> tag, so I need a way to add the <a> tag to every 5th td of the table.
I use vscode
This is the selector you need:
http://api.jquery.com/nth-child-selector/
You can select the 5th td of each table row with the following script, assuming tablecontent is the id of the table:
$('#tablecontent td:nth-child(5)').addClass('someClass');
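If the goal is to rewrite the HTML file once rather than patch it in the browser, the same "5th cell of every row" idea can be applied offline. The following is only a sketch under assumptions: it uses Python's standard xml.etree.ElementTree, so it needs well-formed markup; the function name linkify_column and the https:// prefix are hypothetical choices, since the cells in the question carry no scheme.

```python
import xml.etree.ElementTree as ET

# One row from the question, wrapped in a table so there is a single root.
ROWS = """
<table>
  <tr>
    <td>title 1</td>
    <td align="right">5.18</td>
    <td align="right">17.27</td>
    <td align="right">70</td>
    <td>www.google.com/</td>
    <td align="right">32958865536</td>
  </tr>
</table>
"""


def linkify_column(table, column=5, scheme="https://"):
    """Wrap the text of the given (1-based) column in an <a> element, row by row."""
    for tr in table.iter("tr"):
        cells = tr.findall("td")
        if len(cells) < column:
            continue  # row is too short to have that column
        td = cells[column - 1]
        url = (td.text or "").strip()
        if not url or len(td):
            continue  # skip empty cells and cells that already have child elements
        td.text = None
        link = ET.SubElement(td, "a", {"href": scheme + url})
        link.text = url


table = ET.fromstring(ROWS)
linkify_column(table)
print(ET.tostring(table, encoding="unicode"))
```

Running the function twice leaves the markup unchanged the second time, because a cell that already contains an element is skipped.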

Incorporating a table inside a table in HTML

I am trying to create an HTML table where there are four columns and any number of rows. Inside this table, the first two columns are just normal cells. The latter two columns can have multiple rows WITHIN a row in the top-level table. My issue is how I can properly align the column separators, even if the length of the content in each cell is variable.
My attempt tries to make use of:
<td colspan=2>
Example of what I am trying to do: https://jsfiddle.net/hurnzhmq/
The things I am missing in the JSFiddle are:
There is no divider between the two rows separating Content3A/Content4A from Content3B/Content4B - I tried using the "bottom-border:none" for the last child, but that did not seem to work.
The column separators between Content3A/Content3B and Content4A/Content4B are not lined up with the header's column separator, and do not touch the ends of the table (there are gaps).
Any advice on how I might go about fixing this would be greatly appreciated!
I think you should use rowspan instead of colspan.
You can use the code below:
<html>
<table border=1 >
<tr>
<td>Header1</td>
<td>Header2</td>
<td>Header3</td>
<td>Header4</td>
</tr>
<!-- Content -->
<tr>
<td rowspan="2">Content1</td>
<td rowspan="2" >Content2</td>
<td>Content3A</td>
<td>Content4A</td>
</tr>
<tr>
<td>Content3B</td>
<td>Content4B</td>
</tr>
</table>
</html>

How to match nearest tag backward with XPath

I have HTML like this:
html =<<EOS
<table><!-- outer table -->
<tr><td>
<table><!-- inner table 1 -->
<tr><td>Foo</td></tr>
</table>
<table><!-- inner table 2 -->
<tr><td>Bar</td></tr>
</table>
</td></tr>
</table>
EOS
I want to get a changing value Bar from a static value Foo.
With this code I can get the value.
doc = Nokogiri::HTML(html)
doc.xpath("//table[tr/td[text()='Foo']]/following-sibling::table//td").text
And I wanted to rewrite like this:
doc.xpath("//table[//td[text()='Foo']]/following-sibling::table//td").text
But this code doesn't work because //table[//td[text()='Foo']] matches outer table not the inner table.
Is there an expression for a nearest backward match in XPath, like this?
//table[(nearest match expression)td[text()='Foo']]
Yes, //table[//td[text()='Foo']] gives the outer table as the first result (not the only result) , but //table[//td[text()='Foo']]/following-sibling::table//td still retrieves <td>Bar</td>.
The problematic part of //table[//td[text()='Foo']] is the // in front of td, because it selects all descendant td elements:
<table>
<tr>
<td>This is selected</td>
<td>
<table>
<tr>
<td>This is also selected</td>
</tr>
</table>
</td>
</tr>
</table>
You should use // only sparingly. I would use the expression
//table[tr/td = 'Foo']/following-sibling::table[1]/tr/td
EDIT: As suggested by Phrogz, in Nokogiri, instead of [1] in the expression above, you can use at_xpath as in
doc.at_xpath("//table[tr/td = 'Foo']/following-sibling::table/tr/td").text
to only get the first result node that was found. That is, if you actually intend to only find one node and if the wanted node is the first one in document order.
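The same "innermost table that directly holds Foo, then its next sibling table" logic can be sketched outside XPath as well. The version below is an illustration in Python with the standard xml.etree.ElementTree, which has no following-sibling axis, so siblings are walked by hand. The function name value_after is an assumption, and the markup is the question's snippet without the comments.

```python
import xml.etree.ElementTree as ET

HTML = """
<table>
  <tr><td>
    <table>
      <tr><td>Foo</td></tr>
    </table>
    <table>
      <tr><td>Bar</td></tr>
    </table>
  </td></tr>
</table>
"""


def value_after(root, label):
    """Text of the first tr/td in the table that follows the table whose own tr/td is label."""
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag != "table":
                continue
            # Only direct tr/td cells count, mirroring table[tr/td = 'Foo'];
            # cells of nested tables are deliberately not inspected.
            if not any(td.text == label for td in child.findall("tr/td")):
                continue
            for sibling in children[i + 1:]:
                if sibling.tag == "table":
                    cell = sibling.find("tr/td")
                    return cell.text if cell is not None else None
    return None


print(value_after(ET.fromstring(HTML), "Foo"))
```

Restricting the check to direct tr/td children is what keeps the outer table from matching, which is the same reason the answer prefers tr/td over //td inside the predicate.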

Identifying nodes on position relative to grandparent

I have the following HTML table:
tab2 <- '<table>
<thead>
<tr>
<th rowspan="2">a</th>
<th>b</th>
<th colspan="2" rowspan="2">c</th>
</tr>
<tr>
<td></td>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
<td></td>
</tr>
</tbody>
</table>'
It has three rows, the first two are header information, the last one is the body. The goal is to extract the header information using only the header row position relative to the table node (1+2), i.e. without having to pay attention to whether the header nodes have a thead parent or not.
I tried
//tr[position() < 3]
but it doesn't work because position() is evaluated relative to each tr's parent node (thead or tbody).
I am using R with the XML package (which uses XPath 1.0). This is what I get when I use above XPath
xpathSApply(htmlParse(tab2, asText = TRUE), "//tr[position() < 3]")
[[1]]
<tr><th rowspan="2">a</th>
<th>b</th>
<th colspan="2" rowspan="2">c</th>
</tr>
[[2]]
<tr><td/>
</tr>
[[3]]
<tr><td>1</td>
<td>2</td>
<td>3</td>
<td/>
</tr>
I get all three rows. Which makes sense according to how I understand position(). It works relative to its parent.
Context
I am writing a function that allows users to parse HTML tables with the R programming language and assemble an R data structure from them. The function allows users to pass numeric values saying which rows provide header information and which provide body information. For the above table, users should be able to say that row 1 and row 2 (in the entire table) provide header information. I need to process this input so it works on HTML tables regardless of whether the table uses thead and tbody elements. The problem with
//tr[position() < 3]
is that it also returns the body row (third row). Hope this makes it clear(er).
Use the following XPath expression:
/table//tr[count(preceding::tr) < 2]
It does not care whether a certain tr is inside thead or not. It just considers tr elements that are preceded by zero or one other tr element. The result is the following:
<tr>
<th rowspan="2">a</th>
<th>b</th>
<th colspan="2" rowspan="2">c</th>
</tr>
-----------------------
<tr>
<td/>
</tr>
Caveat: This simple approach only works if there is only one table in the HTML document. But as long as you are working with exactly this HTML snippet, it suffices.
This expression will work for a document with any number of tables.
//table/descendant::tr[position() < 3]
By using the descendant forward axis, the [position() < 3] subscript will select the first and second node in the set of tr descendants of the table (rather than finding their position relative to their parent node, as with //tr in your question).
http://jsfiddle.net/uutavwvk/1/
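The descendant-axis idea, "number the tr elements in document order within the table, ignoring their parents", can also be sketched without XPath. This is a Python illustration using the standard xml.etree.ElementTree, whose iter() method walks descendants in document order. The function name header_rows and the compacted copy of the table are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Compacted copy of the table from the question.
TAB2 = """<table>
<thead>
<tr><th rowspan="2">a</th><th>b</th><th colspan="2" rowspan="2">c</th></tr>
<tr><td></td></tr>
</thead>
<tbody>
<tr><td>1</td><td>2</td><td>3</td><td></td></tr>
</tbody>
</table>"""


def header_rows(table, n):
    """Return the first n tr elements of the table, counted across thead/tbody."""
    rows = list(table.iter("tr"))  # all tr descendants, in document order
    return rows[:n]


table = ET.fromstring(TAB2)
for row in header_rows(table, 2):
    print(ET.tostring(row, encoding="unicode").strip())
```

As with descendant::tr, rows of any nested table would be counted too, so a table that contains other tables would need extra filtering.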

HTML table with different count of td

I have a table with two rows for the logo and the menu, plus a picture of a person: the upper part of the body is in the first row and the lower part is in the second row. The first row should have only one td, which holds the site logo, and the second row should have multiple td elements. Can someone help me with that? I have to use tables because my client wants it.
You can use the colspan attribute on a <td> element to make it span more than one column; just set the number to however many columns you want it to span:
<table>
<tr>
<td colspan="2">Long column</td>
</tr>
<tr>
<td>Small</td>
<td>Column</td>
</tr>
</table>
You can use colspan (like in cell[2,0]). Further info: http://reference.sitepoint.com/html/td/colspan