Extract a td value using XPath

I need to extract values from an HTML table cell using XPath, and I'd prefer not to use a purely "positional" XPath expression.
My markup looks something like this:
<body>
<table>
<tr>...</tr>
<tr>...</tr>
<tr>...</tr>
<tr>
<td>
<table id="GridView5">
<tr>...</tr>
<tr>...</tr>
<tr>
<td>First Title</td>
<td>1</td>
<td>2</td>
<td>3</td>
</tr>
</table>
</td>
</tr>
</table>
</body>
I'm trying to use something like this XPath expression:
//*[@id="GridView5"]/*[td="First Title"]/td[3]
to extract the value "2" from the above code.
Suggestions? Examples?

This XPath,
//table[@id="GridView5"]/tr[td="First Title"]/td[3]
will select the third td child of the tr that has a td child with a string value of First Title within a table with an id attribute value of GridView5.
It uses a single positional selector for the requested td element because your markup affords no other way to differentiate the td that contains 2, unless you allow an assumption that 2 comes after 1 or before 3 (treating either as a label). If the preceding or following td elements can serve as labels, then you could use the preceding-sibling or following-sibling axis instead of the [3]:
//table[@id="GridView5"]/tr[td="First Title"]/td[.='1']/following-sibling::td[1]
but even here you'll need [1] to select only the immediately following sibling.
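To try the first expression outside a browser, here is a minimal Python sketch. It assumes a cleaned-up, well-formed stand-in for the question's markup (the extra row and its values are made up), and it uses the standard library's xml.etree.ElementTree, which implements only a subset of XPath: paths must start with a dot, and axes such as following-sibling are not available, so only the td[3] form is shown.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the question's markup. Assumption: a real page
# would have more rows and would need an HTML parser, not an XML one.
MARKUP = """
<body>
  <table>
    <tr>
      <td>
        <table id="GridView5">
          <tr><td>Other Title</td><td>7</td><td>8</td><td>9</td></tr>
          <tr><td>First Title</td><td>1</td><td>2</td><td>3</td></tr>
        </table>
      </td>
    </tr>
  </table>
</body>
"""

root = ET.fromstring(MARKUP)

# Same structure as the answer's XPath, with the leading "." that
# ElementTree requires for a search relative to the root element.
cell = root.find(".//table[@id='GridView5']/tr[td='First Title']/td[3]")
value = cell.text
print(value)
```

The row is picked by its label and the table by its id, so only the final [3] depends on column order.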

Related

XPath using siblings or following in two different cells

Put bluntly, I want to locate TestCoupon10% inside a td, then move to a sibling td and locate //a[contains(@id,"cmdOpen")] there. I tried sibling and following, but I probably didn't do it right, because
//span[./text()="TestCoupon10%"]/following-sibling:a[contains(@id,"cmdOpen")]
results in an invalid XPath. The HTML structure looks as follows:
<tr>
<td>
<span id="oCouponGrid_ctl03_lblCode">TestCoupon10%</span>
</td>
<td>...</td>
<td>...</td>
<td valign="middle" align="right">
<a id="oCouponGrid_ctl03_cmdOpen"></a>
</td>
</tr>
I need to find the cmdOpen link that belongs to the test coupon. Does anyone have an idea how?

Axes are delimited with double colons, not single ones (those are used for namespace prefixes). You wanted to say this:
//span[./text()="TestCoupon10%"]/following-sibling::a[contains(@id,"cmdOpen")]
But the <a> is not a following sibling of the <span> in question. You need to do some navigating:
//span[./text()="TestCoupon10%"]/parent::td/following-sibling::td/a[contains(@id,"cmdOpen")]
Or simply avoid descending into the tree, so that you don't have to "climb up" again in the first place:
//td[span = "TestCoupon10%"]/following-sibling::td/a[contains(@id,"cmdOpen")]
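For readers who want to see what that last expression does step by step, here is a rough Python sketch. Python's standard xml.etree.ElementTree supports neither the following-sibling axis nor contains(), so the sketch mimics the same navigation by hand. The function name find_open_link is made up for illustration, and the row is a well-formed version of the question's markup.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the row from the question.
ROW = """
<tr>
  <td><span id="oCouponGrid_ctl03_lblCode">TestCoupon10%</span></td>
  <td>...</td>
  <td>...</td>
  <td valign="middle" align="right"><a id="oCouponGrid_ctl03_cmdOpen"></a></td>
</tr>
"""

def find_open_link(row, coupon):
    """Hypothetical helper: first <a> whose id contains 'cmdOpen' in a td
    that comes after the td holding the coupon's span."""
    cells = row.findall("td")
    for index, cell in enumerate(cells):
        # td[span = coupon]: some span child whose string value is the coupon
        has_coupon = any(
            "".join(span.itertext()) == coupon for span in cell.findall("span")
        )
        if not has_coupon:
            continue
        # following-sibling::td/a[contains(@id, "cmdOpen")]
        for later_cell in cells[index + 1:]:
            for link in later_cell.findall("a"):
                if "cmdOpen" in link.get("id", ""):
                    return link
    return None

row = ET.fromstring(ROW)
link = find_open_link(row, "TestCoupon10%")
link_id = link.get("id") if link is not None else None
print(link_id)
```

With a full XPath 1.0 engine, the single expression above replaces the whole function.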

How to select table entry via XPath

I have the following table, and I wanted an expression to get the Percentage of the Category "OC". Is it possible to extract via XPath?
<tbody>
<tr>
<th class="textL">Category</th>
<th class="textR">No. of Items</th>
<th class="textR">Percentage</th>
</tr>
<tr class="data_row">
<td>OC</td>
<td class="textR">100</td>
<td class="textR">4.70</td>
</tr>
<tr class="data_row">
<td>FP</td>
<td class="textR">200</td>
<td class="textR">38.82</td>
</tr>
<tr class="data_row">
<td>FI</td>
<td class="textR">300</td>
<td class="textR">20.39</td>
</tr>
</tbody>

Selecting table entry based on value of another entry
To select the Percentage for the given "OC" Category:
//td[.='OC']/following-sibling::td[count(../..//th[.='Percentage']/preceding-sibling::th)]/text()
The above XPath will return
"4.70"
as requested.
Note that it will continue to work in the face of many changes, including row and column rearrangements, as long as the targeted column continues to be named "Percentage" and Category remains the first column. One could generalize the expression even further by taking the difference of the positions of the two columns rather than assuming that Category is the first column.
Explanation: Count the th elements that precede the "Percentage" header (here, two). From the td that contains "OC", step over that many following siblings and select the text of the td reached.
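The same header-driven lookup can be sketched in plain Python. This is only an illustration of the idea, not a replacement for the XPath. It uses the standard library's xml.etree.ElementTree, the function name lookup is made up, and, like the expression above, it assumes that the row label sits in the first column.

```python
import xml.etree.ElementTree as ET

TABLE = """
<tbody>
  <tr><th class="textL">Category</th><th class="textR">No. of Items</th><th class="textR">Percentage</th></tr>
  <tr class="data_row"><td>OC</td><td class="textR">100</td><td class="textR">4.70</td></tr>
  <tr class="data_row"><td>FP</td><td class="textR">200</td><td class="textR">38.82</td></tr>
  <tr class="data_row"><td>FI</td><td class="textR">300</td><td class="textR">20.39</td></tr>
</tbody>
"""

def lookup(tbody, row_label, column_name):
    """Hypothetical helper: value in the named column of the row whose
    first cell equals row_label, or None if no such row exists."""
    headers = [th.text for th in tbody.findall("tr/th")]
    # 0-based index of the header, i.e. count(preceding-sibling::th)
    column = headers.index(column_name)
    for row in tbody.findall("tr"):
        cells = row.findall("td")
        if cells and cells[0].text == row_label:
            return cells[column].text
    return None

tbody = ET.fromstring(TABLE)
percentage = lookup(tbody, "OC", "Percentage")
print(percentage)
```

Because the column index comes from the header row, moving the Percentage column changes the index but not the result.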

Another XPath, also dependent on the order of the table's columns
//td[text()='OC']/following-sibling::td[2]
(explanation: take the second td sibling among the siblings of a td that contains text 'OC')

There are multiple XPaths for that. This one will work:
/tbody[1]/tr[2]/td[3]/text()
But it depends on the current layout of the XML.

This works:
tbody/tr/td[. eq "OC"]/../td[@class eq "textR"][2]/text()
It assumes that the OC td element will be there, and that the value you want is in the 2nd td whose class attribute is "textR". Note that the eq operator requires XPath 2.0 or later.

How to match nearest tag backward with XPath

I have a HTML like this:
html =<<EOS
<table><!-- outer table -->
<tr><td>
<table><!-- inner table 1 -->
<tr><td>Foo</td></tr>
</table>
<table><!-- inner table 2 -->
<tr><td>Bar</td></tr>
</table>
</td></tr>
</table>
EOS
I want to get a changing value Bar from a static value Foo.
With this code I can get the value.
require 'nokogiri'
doc = Nokogiri::HTML(html)
doc.xpath("//table[tr/td[text()='Foo']]/following-sibling::table//td").text
And I wanted to rewrite like this:
doc.xpath("//table[//td[text()='Foo']]/following-sibling::table//td").text
But this code doesn't work because //table[//td[text()='Foo']] matches outer table not the inner table.
Is there an expression for a nearest backward match in XPath, like this?
//table[(nearest match expression)td[text()='Foo']]

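Yes, //table[//td[text()='Foo']] gives the outer table as the first result (not the only result), but //table[//td[text()='Foo']]/following-sibling::table//td still retrieves <td>Bar</td>.
The problematic part of //table[//td[text()='Foo']] is the // in front of td. Inside the predicate it starts from the document root, so it selects every td in the document and any table matches as soon as one td anywhere contains Foo. Even the relative form .//td would still match the outer table, because it selects descendant td elements at any depth: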
<table>
<tr>
<td>This is selected</td>
<td>
<table>
<tr>
<td>This is also selected</td>
</tr>
</table>
</td>
</tr>
</table>
You should use // only sparingly. I would use the expression
//table[tr/td = 'Foo']/following-sibling::table[1]/tr/td
EDIT: As suggested by Phrogz, in Nokogiri, instead of [1] in the expression above, you can use at_xpath as in
doc.at_xpath("//table[tr/td = 'Foo']/following-sibling::table/tr/td").text
to only get the first result node that was found. That is, if you actually intend to only find one node and if the wanted node is the first one in document order.

Identifying nodes on position relative to grandparent

I have the following HTML table:
tab2 <- '<table>
<thead>
<tr>
<th rowspan="2">a</th>
<th>b</th>
<th colspan="2" rowspan="2">c</th>
</tr>
<tr>
<td></td>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
<td></td>
</tr>
</tbody>
</table>'
It has three rows, the first two are header information, the last one is the body. The goal is to extract the header information using only the header row position relative to the table node (1+2), i.e. without having to pay attention to whether the header nodes have a thead parent or not.
I tried
//tr[position() < 3]
but it doesn't work because position() is evaluated relative to each tr's parent node (thead or tbody).
I am using R with the XML package (which uses XPath 1.0). This is what I get when I use above XPath
xpathSApply(htmlParse(tab2, asText = TRUE), "//tr[position() < 3]")
[[1]]
<tr><th rowspan="2">a</th>
<th>b</th>
<th colspan="2" rowspan="2">c</th>
</tr>
[[2]]
<tr><td/>
</tr>
[[3]]
<tr><td>1</td>
<td>2</td>
<td>3</td>
<td/>
</tr>
I get all three rows. Which makes sense according to how I understand position(). It works relative to its parent.
Context
I am writing a function that allows users to parse HTML tables with the R programming language and assemble an R data structure from them. The function lets users pass numeric values stating which rows provide header information and which provide body information. For the above table, users should be able to say that row 1 and row 2 (of the entire table) provide header information. I need to process this input so that it works on HTML tables regardless of whether the table uses thead and tbody elements. The problem with
//tr[position() < 3]
is that it also returns the body row (third row). Hope this makes it clear(er).

Use the following XPath expression:
/table//tr[count(preceding::tr) < 2]
It does not care whether a certain tr is inside thead or not. It just considers tr elements that are preceded by zero or one other tr element. The result is the following:
<tr>
<th rowspan="2">a</th>
<th>b</th>
<th colspan="2" rowspan="2">c</th>
</tr>
-----------------------
<tr>
<td/>
</tr>
Caveat: This simple approach only works if there is only one table in the HTML document. But as long as you are working with exactly this HTML snippet, it suffices.

This expression will work for a document with any number of tables.
//table/descendant::tr[position() < 3]
By using the descendant forward axis, the [position() < 3] subscript will select the first and second node in the set of tr descendants of the table (rather than finding their position relative to their parent node, as with //tr in your question).
http://jsfiddle.net/uutavwvk/1/
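The difference between per-parent and table-wide positions can also be seen in a small Python sketch. Python is used here purely for illustration (the question itself uses R), with the standard library's xml.etree.ElementTree, which supports simple index predicates such as tr[1] but not position() comparisons. The markup is the table from the question.

```python
import xml.etree.ElementTree as ET

TABLE = """
<table>
  <thead>
    <tr><th rowspan="2">a</th><th>b</th><th colspan="2" rowspan="2">c</th></tr>
    <tr><td></td></tr>
  </thead>
  <tbody>
    <tr><td>1</td><td>2</td><td>3</td><td></td></tr>
  </tbody>
</table>
"""

table = ET.fromstring(TABLE)

# Per-parent position: tr[1] means "first tr child of its own parent",
# so one row from thead and one row from tbody come back.
first_per_parent = table.findall(".//tr[1]")

# Table-wide position: iter() walks the descendants in document order,
# which corresponds to descendant::tr[position() < 3].
header_rows = list(table.iter("tr"))[:2]

print(len(first_per_parent), len(header_rows))
```

Note that iter() would also walk into nested tables, so a table that contains other tables would need extra filtering.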

Hierarchical html table, putting last td on next line

I'm creating a simple hierarchical table with html and CSS and I'm getting into trouble with formatting the last td element with class .child to be on next line.
I want to have the nested table inside table > tr > td.child because each table can be sorted, and JavaScript sorters don't implement any grouping of rows. (My problem with the nested table could easily be solved by moving the .child > table element into the next table > tr, but this would break the nice nesting structure.)
Is there a way to put td.child on next row with css?
html sample:
<table>
<tr>
<td>I have</td>
<td>1</td>
<td>pie</td>
<td class="child">
<table>
<tr>
<td>I have</td>
<td>1</td>
<td>pie</td>
</tr>
</table>
</td>
</tr>
</table>

You could do something like this. You'd need to be careful across browsers, though (only checked in Chrome).
