How to get a HTML element by text using XPath?

I've encountered a problem: I can't get an HTML element by the element's text. My HTML looks like:
...
<table>
...
<tr>
...
<td class="oMain">test value</td>
...
</tr>
...
</table>
...
For specific reasons, I have to get the <td class="oMain"> element using its text, 'test value'. I tried //tr[td='test value']/td but got no result. How can I write the XPath expression?
Any help is welcome. Thanks!

Your expression
//tr[td='test value']/td
places the predicate on the parent node "tr", so it selects every td child of any row that contains a td with that text, not just the matching cell. Maybe that's what's causing the problem.
What you probably want is this:
//td[@class = "oMain" and child::text() = 'test value']
Here's a link to the W3C specification of the XPath language for further reading: http://www.w3.org/TR/xpath/
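A quick way to see what each expression selects is to run close equivalents in Python. This is a minimal sketch using the standard-library xml.etree.ElementTree, which is an assumption since the question doesn't name a parser. ElementTree implements only a small XPath subset (no "and", no text()), so the conditions are chained predicates and [.='...'] compares the element's full text; the sample table is hypothetical. On well-formed input, the original expression does return the matching row's cells (all of them), while filtering on the td returns just the target cell.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed version of the question's table (hypothetical sample data).
html = """
<table>
  <tr>
    <td class="other">first cell</td>
    <td class="oMain">test value</td>
  </tr>
  <tr>
    <td class="oMain">another value</td>
  </tr>
</table>
"""
table = ET.fromstring(html.strip())

# //tr[td='test value']/td : the predicate filters the <tr>,
# and /td then returns EVERY cell of each matching row.
row_cells = table.findall(".//tr[td='test value']/td")

# //td[@class='oMain' and text()='test value'] : the predicates filter the <td> itself.
# ElementTree has no "and", so the two conditions are chained predicates,
# and [.='...'] compares the element's full text content.
target = table.findall(".//td[@class='oMain'][.='test value']")

print([c.text for c in row_cells])  # ['first cell', 'test value']
print([c.text for c in target])     # ['test value']
```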

Your XPath expression seems to be correct. Do you have a default namespace (e.g. XHTML) in your HTML? If so, you can modify your XPath like this:
//*[local-name()='td' and text()='test value']
If you can figure out how to use namespaces, you could also do
//xhtml:tr[xhtml:td='test value']/xhtml:td
Does that help?
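To illustrate the namespace problem, here is a minimal sketch with Python's standard-library ElementTree (a swapped-in parser, not necessarily the one the asker uses) on a hypothetical XHTML document. An unprefixed name matches only elements in no namespace, so it finds nothing; ElementTree's {*} wildcard (Python 3.8+) plays the role of local-name(), and a prefix map plays the role of registering the xhtml prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML sample: the default namespace puts every element
# into the http://www.w3.org/1999/xhtml namespace.
xhtml = """<html xmlns="http://www.w3.org/1999/xhtml"><body><table>
<tr><td class="oMain">test value</td></tr>
</table></body></html>"""
root = ET.fromstring(xhtml)

# Unprefixed "td" means "td in no namespace", so this finds nothing.
plain = root.findall(".//td[.='test value']")

# Roughly //*[local-name()='td' and ...]: {*} matches any namespace.
any_ns = root.findall(".//{*}td[.='test value']")

# Roughly //xhtml:tr[xhtml:td='test value']/xhtml:td with a prefix mapping.
ns = {"xhtml": "http://www.w3.org/1999/xhtml"}
prefixed = root.findall(".//xhtml:tr[xhtml:td='test value']/xhtml:td", ns)

print(len(plain), len(any_ns), len(prefixed))  # 0 1 1
```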

In the XPath expression, first put the element node, which in your case is td, and then apply the filter text()='...' with the text you're looking for:
//td[text()='test value']
Hope this helps.

What are you using to do the parsing? In Ruby + Hpricot, you can do:
doc.search("//td.oMain").each do |cell|
  if cell.inner_html == "test value"
    return cell
  end
end
In this case, cell would be:
<td class="oMain">test value</td>
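The same "narrow by class, then compare the text in code" idea can be sketched in Python with the standard-library ElementTree; that parser and the helper name find_cell are assumptions for illustration, not part of the answer above. The XPath only selects td elements by class, and the text comparison happens in the loop.

```python
import xml.etree.ElementTree as ET

def find_cell(root, css_class, text):
    """Return the first <td> with the given class whose full text equals `text`, or None.

    Hypothetical helper: the path narrows by class only; the text check is
    done in Python, like the Ruby/Hpricot loop.
    """
    for cell in root.iterfind(f".//td[@class='{css_class}']"):
        if "".join(cell.itertext()) == text:
            return cell
    return None

table = ET.fromstring(
    '<table><tr><td class="oMain">other</td>'
    '<td class="oMain">test value</td></tr></table>'
)
cell = find_cell(table, "oMain", "test value")
print(ET.tostring(cell, encoding="unicode"))  # <td class="oMain">test value</td>
```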

Related

How to write a common XPath for the same text displayed in different HTML tags?

I want to write a common XPath for the result displayed for my searched text, 'Automation Server'.
The same text is displayed in td tags as well as in div tags, as shown below. I wrote the XPath below based on my understanding after going through different articles:
displayed_text = //td[contains(text(),'Automation Server') or div[contains(text(),' Automation Server ')]
<td role="cell" mat-cell="" class="mat-cell cdk-cell cdk-column-siteName mat-column-siteName ng-star-inserted">Automation Server</td>
<div class="change-list-value ng-star-inserted"> Automation Server </div>

The operator you are looking for in XPath is |. It is a union operator and will return both sets of elements.
The XPath you are looking for is
//td[contains(text(),'Automation Server')] | //div[contains(text(),'Automation Server')]
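In XPath 1.0, A | B evaluates both paths and returns a single node-set: every node from either side, once each, in document order. ElementTree can't evaluate | itself, so this Python sketch emulates the union by hand on a hypothetical, simplified version of the question's markup. The matches helper is a rough stand-in for contains(text(), ...) that only looks at elem.text.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical version of the question's markup.
doc = ET.fromstring(
    "<root>"
    "<div> Automation Server </div>"
    "<table><tr><td>Automation Server</td><td>Other</td></tr></table>"
    "<div>Something else</div>"
    "</root>"
)

def matches(elem, needle="Automation Server"):
    # Rough stand-in for contains(text(), needle) when the first text node is elem.text.
    return needle in (elem.text or "")

left = [e for e in doc.iter("td") if matches(e)]    # //td[contains(text(),'Automation Server')]
right = [e for e in doc.iter("div") if matches(e)]  # //div[contains(text(),'Automation Server')]

# The union: nodes from both sides, without duplicates, in document order.
position = {e: i for i, e in enumerate(doc.iter())}
union = sorted(set(left) | set(right), key=position.__getitem__)

print([e.tag for e in union])  # ['div', 'td']
```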

This XPath,
//*[self::td or self::div][text()[normalize-space()='Automation Server']]
will select all td or div elements with an immediate text node whose normalized string value equals 'Automation Server'.
Cautions regarding other answers here:
| is not logical-OR or "OR-like".
It is a union operator over node sets (XPath 1.0) or sequences (XPath 2.0+), not boolean values.
See: Logical OR in XPath? Why isn't | working?
contains(text(), "string") only tests the first text node child.
See: Why is contains(text(), "string" ) not working in XPath?
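The second caution can be made concrete. In XPath 1.0, a string function given a node-set such as text() uses only the first node in document order, whereas a predicate on the text nodes themselves, like text()[normalize-space()='Automation Server'], is true if any direct text node matches. This Python sketch emulates both on a hypothetical cell; direct_text_nodes and normalize_space are hypothetical helpers, and normalize_space treats only space, tab, CR and LF as whitespace, as XPath 1.0 does.

```python
import re
import xml.etree.ElementTree as ET

def normalize_space(s):
    # XPath 1.0 normalize-space(): strip and collapse space, tab, CR and LF only.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

def direct_text_nodes(elem):
    # The element's own text-node children in document order (what text() selects).
    nodes = [elem.text] + [child.tail for child in elem]
    return [t for t in nodes if t]

cell = ET.fromstring("<td>Site: <br/> Automation Server </td>")
target = "Automation Server"
texts = direct_text_nodes(cell)

# contains(text(), 'Automation Server'): only the FIRST text node is tested.
first_only = target in texts[0] if texts else False

# text()[normalize-space()='Automation Server']: true if ANY text node matches.
any_node = any(normalize_space(t) == target for t in texts)

print(texts)       # ['Site: ', ' Automation Server ']
print(first_only)  # False
print(any_node)    # True
```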

A few alternatives to JeffC's answer, using properties common to both elements:
1. Use * as a wildcard for any element:
//*[contains(@class,'ng-star-inserted') and normalize-space(text())='Automation Server']
2. In addition, use the local-name() function to narrow down the element names:
//*[local-name()[.='td' or .='div']][contains(@class,'ng-star-inserted') and normalize-space(text())='Automation Server']
The normalize-space() function cleans up the optional surrounding white space, so the = operator can be used.

You could use the following XPath to test, in a predicate, the local-name() of the element and whether its text() contains the phrase:
//*[(local-name() = "td" or local-name() = "div") and contains(text(), "Automation Server")]

What is the role of parentheses in XPath 1.0?

In Chrome DevTools > Elements, when I search for //tr/td/span I find an element (because such an element exists on my page).
When I search for (//tr)/td/span or (//tr/td)/span I also find this element.
But neither //tr(/td)/span nor //tr/(td)/span nor //tr/(td/)span find anything.
What is the meaning of these parentheses in XPath?

Parentheses in XPath are used as they are in other programming languages:
Function argument grouping: e.g: //tr/td[contains(.,"e")]
Evaluation precedence indication: e.g: normal arithmetic expression grouping as well as leading path grouping (trace LocationPath through to PrimaryExpr in the XPath grammar) as in (//td)[1] to find the first td in the document as opposed to //td[1] which finds the td elements that are the first child of their respective parent elements.
They're also used in
node tests: e.g: node(), element(), ...
processing instructions: e.g: PageBreak().
Your examples that do not find anything (e.g. //tr(/td)/span, //tr/(td)/span [see note 1], etc.) have parentheses embedded within the path that do not fall into any of the above categories. Such uses of parentheses are syntactically invalid and should have been reported as errors rather than silently failing.
Note 1: this expression would actually be syntactically valid under XPath 2.0/3.0. Thanks, @Andersson, for noticing.

I don't think the parentheses mean anything in your case, but they can be used to return the required node or node-set depending on the index you pass.
For instance, given HTML like this:
<table>
<tr>
<td>
<span>first</span>
</td>
<td>
<span>second</span>
</td>
</tr>
<tr>
<td>
<span>third</span>
</td>
<td>
<span>fourth</span>
</td>
</tr>
</table>
(//tr)[1]/td returns the cells of the first row only (first, second)
(//tr)[2]/td returns the cells of the second row (third, fourth)
(//tr/td)[1] returns the first cell of the first row (first). Note that //tr/td[1] returns the first cell of each row (first, third)
...
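The difference between //tr/td[1] and (//tr/td)[1] can be reproduced in Python with the standard-library ElementTree (an assumed, swapped-in parser) on the table above. ElementTree's [1] predicate is evaluated per parent like XPath's, while the parenthesized form corresponds to indexing the whole result list; note Python lists are 0-based where XPath positions are 1-based.

```python
import xml.etree.ElementTree as ET

# The table from the answer above.
table = ET.fromstring(
    "<table>"
    "<tr><td><span>first</span></td><td><span>second</span></td></tr>"
    "<tr><td><span>third</span></td><td><span>fourth</span></td></tr>"
    "</table>"
)

def texts(cells):
    return ["".join(c.itertext()) for c in cells]

# //tr/td[1]: the predicate applies per parent, so every row's first cell.
per_row = table.findall(".//tr/td[1]")

# (//tr/td)[1]: the parentheses make [1] apply to the whole result (index 0 in Python).
first_overall = table.findall(".//tr/td")[:1]

# (//tr)[2]/td: pick the second row first, then take its cells.
second_row = table.findall(".//tr")[1].findall("td")

print(texts(per_row))        # ['first', 'third']
print(texts(first_overall))  # ['first']
print(texts(second_row))     # ['third', 'fourth']
```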

One XPath expression doesn't work in Selenium, but works in Firefox

I have a question about XPath.
There is a td like this in Chrome:
<td class="dataCol col02">
"Hello world&nbsp;"
[View Hierarchy]
</td>
but when I inspect the same element in Firefox, it doesn't have the &nbsp; or the double quotes:
<td class="dataCol col02">
Hello world
[View Hierarchy]
</td>
I used FireFinder with the XPath
//td[text()='Hello world']
and it can locate that element.
But when I use the Selenium API 2.24, it can't find that element:
By.xpath("//td[text()='Hello world']")
Do you have any idea why?
Thanks!

Try normalize-space(), which trims leading and trailing whitespace and collapses internal runs of whitespace into a single space:
//td[normalize-space(text())='Hello world']
Edit, following the comments:
Here's an XPath expression that's probably better suited to the general case:
//td[starts-with(normalize-space(.), 'Hello world')]
It matches <td> nodes whose concatenated string content (the whole <td>), minus leading and trailing whitespace, starts with "Hello world".
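Here is a sketch of why the exact comparison fails while the starts-with() version matches, emulating XPath 1.0 string semantics in plain Python. The markup is an assumption (padding plus a trailing &nbsp;, i.e. U+00A0, before a hypothetical <a> link). One detail worth knowing: XPath 1.0's normalize-space() only strips space, tab, CR and LF, so a trailing &nbsp; survives it and still breaks an = comparison, whereas starts-with() tolerates it. In Selenium's Java API the expression would be passed as By.xpath("//td[starts-with(normalize-space(.), 'Hello world')]").

```python
import re
import xml.etree.ElementTree as ET

def normalize_space(s):
    # XPath 1.0 normalize-space(): strips/collapses only space, tab, CR and LF,
    # so a non-breaking space (U+00A0, &nbsp;) is kept.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

def check(td):
    # Direct text-node children of the <td>, as text() would select them.
    nodes = [t for t in [td.text] + [child.tail for child in td] if t]
    string_value = "".join(td.itertext())  # what "." converts to in the predicate
    return (
        any(t == "Hello world" for t in nodes),                      # text()='Hello world'
        bool(nodes) and normalize_space(nodes[0]) == "Hello world",  # normalize-space(text())='Hello world'
        normalize_space(string_value).startswith("Hello world"),     # starts-with(normalize-space(.), 'Hello world')
    )

# Assumed markup: the <a> element is hypothetical; the padding mirrors the question.
with_nbsp = ET.fromstring("<td>\n  Hello world\u00a0\n  <a>[View Hierarchy]</a>\n</td>")
without_nbsp = ET.fromstring("<td>\n  Hello world\n  <a>[View Hierarchy]</a>\n</td>")

print(check(with_nbsp))     # (False, False, True)
print(check(without_nbsp))  # (False, True, True)
```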

I would try the contains() function.
Your XPath would look like: //td[contains(text(),'Hello world')]

Using XPath to select table that includes specific class

I have an HTML table that I need to select using XPath. The table may or may not contain multiple classes, but I only want tables that include a specific class.
Here is a sample HTML snippet:
<html>
<body>
<table class="no-border">
<tr>
<th colspan="2">Blah Blah Blah</th>
</tr>
<tr>
<td>Content</td>
<td>
<table class="info no-border">
<tr>
<!-- Inner table content -->
</tr>
</table>
</td>
</tr>
</table>
</body>
</html>
I need to use XPath to retrieve ONLY the table that includes the class info. I've tried using /html/body/table/tr/td/table[@class='info*'], but that doesn't work. The table I'm trying to retrieve may exist ANYWHERE in the HTML document (technically, not ANYWHERE, but there may be varying levels of hierarchy between the outer and inner table).
If anyone can point me in the right direction, I'd be grateful.

The closest you can do is with the contains function:
//table[contains(@class,'info')]
But please be aware that this would capture a table with the class information, or anything else that has the info substring. As far as I know XPath can't distinguish whole-word matches. So you'd have to filter results to check for this possible condition.

What you'd ideally need is a CSS selector like table.info. Some XPath engines and toolkits for XML/HTML parsing do support these selectors, translating them to XPath expressions internally, e.g. cssselect if you use Python (lxml uses it for lxml.cssselect), or Nokogiri for Ruby.
In the general case, to emulate a CSS selector like table.info with XPath, a common trick or pattern is to use contains() combined with concat() and space characters. In your case, it looks like this:
.//table[contains(concat(' ', normalize-space(@class), ' '), ' info ')]
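To see why the padding trick behaves like a whole-word class match, here is a plain-Python emulation of the string tests (not an XPath engine) over a few hypothetical class attribute values. Plain contains() also hits "information"; a pattern with only a leading space, ' info', still matches "information"; padding both the normalized attribute and the token with spaces, ' info ', matches only the real class token.

```python
import re

def normalize_space(s):
    # XPath 1.0 normalize-space(): strip and collapse space, tab, CR and LF.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

# Hypothetical class attribute values.
classes = ["info no-border", "information", "no-border", "  no-border\tinfo "]

# contains(@class, 'info')
substring = [("info" in c) for c in classes]

# contains(concat(' ', normalize-space(@class), ' '), ' info')  (no trailing space)
leading_only = [(" info" in " " + normalize_space(c) + " ") for c in classes]

# contains(concat(' ', normalize-space(@class), ' '), ' info ')
whole_token = [(" info " in " " + normalize_space(c) + " ") for c in classes]

print(substring)     # [True, True, False, True]
print(leading_only)  # [True, True, False, True]
print(whole_token)   # [True, False, False, True]
```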

I know that you didn't ask for this, but I think it will help you make your queries more precise.
//table[ (contains(@class,"result-cont") or contains(@class,"resultCont")) and not(contains(@class,"hide")) ]
This gets tables whose class attribute contains 'result-cont' or 'resultCont' and does not contain 'hide'.

XPath 1.0 is, indeed, fairly limited in its string processing. You can do modest amounts of processing with starts-with(), substring() and similar functions. See this answer for creating something similar to a regex.
XSLT 2.0 (which not all browsers and software support) has regex support through XPath 2.0 functions such as matches().

Regex in html code

<tbody id="clavier:infractionList2:tb">
<tr class="rich-table-row rich-table-firstrow ">
..............
..............
............
</tr>
</tbody>
I'm looking for a regex to extract this block from a large text.
I tried this one but without result:
#<tbody id=\"clavier:infractionList2:tb\">(.*)</tbody>#

Using regex on HTML is often a bad idea because tags can be nested (recursive). Have you tried using an XML/HTML parser? For example, XmlDocument, XmlElement and XmlAttribute.
EDIT: The problems with regex and HTML in your example:
It cannot keep count of nested tbody tags.
Will the tbody tag look like <tbody>...</tbody> or <tbody .../>?
Even if you know there will be one start and end tag, how do you know there won't be any plain text containing "tbody" somewhere inside the table, thus breaking the regex?
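The answer suggests .NET's XmlDocument; as a language-neutral illustration, here is the same idea with Python's standard-library ElementTree instead (a swapped-in parser). It assumes the page is well-formed XML, which real-world HTML often is not; in that case an HTML-aware parser would be needed. The page fragment is hypothetical. The parser already knows where this tbody ends, so nesting and stray text don't matter.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment with two tbody elements.
page = """<table>
<tbody id="other"><tr><td>skip me</td></tr></tbody>
<tbody id="clavier:infractionList2:tb">
<tr class="rich-table-row rich-table-firstrow "><td>row 1</td></tr>
<tr class="rich-table-row"><td>row 2</td></tr>
</tbody>
</table>"""

root = ET.fromstring(page)
tbody = root.find(".//tbody[@id='clavier:infractionList2:tb']")

# Collect the text of each row inside that tbody only.
rows = ["".join(tr.itertext()) for tr in tbody.findall("tr")]
print(rows)  # ['row 1', 'row 2']
```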

You may want to tell your regex engine that it should match newlines with the . as well.
In PHP, that would make the regex:
#<tbody id=\"clavier:infractionList2:tb\">(.*)</tbody>#s
Note the trailing s.
Warning: if there are two tbody elements, this regex will match everything from the first tbody (with this ID) to the last </tbody> (whatever its ID).
Example:
<tbody id="clavier:infractionList2:tb">Some data</tbody>
<tbody id="tbody2"></tbody>
will also be matched.
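The effect of the s modifier and of greedy versus lazy matching can be checked quickly in Python, where re.DOTALL plays the role of PHP's s modifier and re.IGNORECASE the role of i. This is a sketch on a hypothetical two-tbody input, not the asker's real page: without DOTALL the pattern can't cross the newline and finds nothing, greedy (.*) runs to the last </tbody>, and lazy (.*?) stops at the first.

```python
import re

# Hypothetical input: the wanted tbody spans several lines and is followed by another tbody.
html = (
    '<tbody id="clavier:infractionList2:tb">\n'
    '<tr><td>data</td></tr>\n'
    '</tbody>\n'
    '<tbody id="tbody2"></tbody>'
)
start = r'<tbody id="clavier:infractionList2:tb">'

# No DOTALL (PHP: no /s): "." cannot match the newline right after the start tag.
no_dotall = re.findall(start + r"(.*)</tbody>", html)

# Greedy with DOTALL: runs to the LAST </tbody>, swallowing the second tbody's start tag.
greedy = re.findall(start + r"(.*)</tbody>", html, re.DOTALL)

# Lazy with DOTALL and IGNORECASE (like the /is pattern): stops at the first </tbody>.
lazy = re.findall(start + r"(.*?)</tbody>", html, re.DOTALL | re.IGNORECASE)

print(no_dotall)  # []
print(lazy)       # ['\n<tr><td>data</td></tr>\n']
```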

This works:
/<tbody id="clavier:infractionList2:tb">(.*?)<\/tbody>/is
Or full PHP:
<?php
$html = '<tbody id="clavier:infractionList2:tb">
<tr class="rich-table-row rich-table-firstrow ">
..............
..............
............
</tr>
</tbody> ';
preg_match_all('/<tbody id="clavier:infractionList2:tb">(.*?)<\/tbody>/is', $html, $matches);
var_dump($matches[1]);
That gives you the <tr...>....</tr> as a result. If you only want the dots you'll need to use something like:
/<tbody id="clavier:infractionList2:tb">.*?<tr.*?>(.*?)<\/tr>.*?<\/tbody>/is
