XPath for string contained in one XML element or another?


I need an XPath that can find either an <a> tag, or an <option> tag, each one containing "something".
So the XPath would be able to match either
<a attributes='value'>something</a>
or
<option attributes="value">something</option>
I tried this:
$x("//*[local-name()='a' contains(.,'something') or local-name()='option' contains(.,'something')]")
I also tried this:
$x("//*[local-name(contains(.,'something'))='a' or local-name(contains(.,'something'))='option']")
But neither of them works. In the first one, I can leave out the contains() and it finds the tags, but I need to match those tags only when they contain the specified "something" text.

You really should post your input XML.
Let's say it's this:
<r>
<a>xxx something</a>
<a>yyy nothing</a>
<option>something xxx</option>
<option>nothing xxx</option>
</r>
(1) Then (if you're trying to ignore namespaces):
//*[(local-name() = 'a' or local-name() = 'option')][contains(., 'something')]
(2) or (if there are no namespaces) [credit: earlier @alecxe post]:
//*[self::option or self::a][contains(., "something")]
(3) or (if using XPath 2.0, again without namespaces):
//(a|option)[contains(., 'something')]
will select
<a>xxx something</a>
<option>something xxx</option>
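
As a rough cross-check of the expected result, the same predicate can be reproduced in plain Python. This is only a sketch: the standard-library ElementTree module supports a limited XPath subset without contains() or local-name(), so the filter is written by hand, and the helper name string_value is made up for illustration. It assumes the sample document above and no namespaces.

```python
import xml.etree.ElementTree as ET

doc = """<r>
<a>xxx something</a>
<a>yyy nothing</a>
<option>something xxx</option>
<option>nothing xxx</option>
</r>"""

root = ET.fromstring(doc)

def string_value(el):
    # Hypothetical helper: concatenates all descendant text, which is what
    # "." evaluates to inside contains(., 'something').
    return "".join(el.itertext())

# Equivalent in spirit to //*[self::option or self::a][contains(., 'something')]
matches = [
    el for el in root.iter()
    if el.tag in ("a", "option") and "something" in string_value(el)
]
result = [(el.tag, el.text) for el in matches]
print(result)
```

With namespaced input, el.tag would carry a {uri} prefix, so the tag comparison would need adjusting, much as option (1) switches to local-name().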

Related

Why can't I search for <tag class="x"> in the Dev Tools in Chrome? [duplicate]

I perform a simple search in devtool, but it drops drastically without a reason:
What's more, if I view the source and do the same search, the number of results of <link rel is just 58, not 184. Do you know why?
Here is the page if you need to examine.

For these "complex" queries you'll have to use XPath selectors:
//link[@rel]
//link[contains(@rel,'style')]
or CSS selectors:
link[rel]
link[rel*="style"]
For a trivial CSS selector like a, use html a instead to ensure it isn't matched as literal text.
List of supported queries
Devtools uses CDP command DOM.performSearch and judging by the implementation it tries to match these types of queries:
text - inside #text nodes (like textContent in js)
text - inside tag names
text - inside attribute names
text - inside attribute values
<tag - matching at the start of a tag name
</tag - matching a closing tag
tag> - matching at the end of a tag name
<tag> - matching an entire tag name
"text - matching at the start of an attribute value
text" - matching at the end of an attribute value
"text" - matching an entire attribute value
//a[contains(., 'foo')] - XPath selector
a#foo.class[attr] - CSS selector
As you can see the literal text matching is limited to the first four types, and it won't find things that span more than one type like attr="value" that spans two types.

Why won't my XPath select link/button based on its label text?

<a href="javascript:void(0)" title="home">
<span class="menu_icon">Maybe more text here</span>
Home
</a>
So for above code when I write //a as XPath, it gets highlighted, but when I write //a[contains(text(), 'Home')], it is not getting highlighted. I think this is simple and should have worked.
Where's my mistake?

Other answers have missed the actual problem here:
Yes, you could match on @title instead, but that's not why OP's XPath is failing where it may have worked previously.
Yes, XML and XPath are case sensitive, so Home is not the same as home, but there is a Home text node as a child of a, so OP is right to use Home if he doesn't trust @title to be present.
Real Problem
OP's XPath,
//a[contains(text(), 'Home')]
says to select all a elements whose first text node contains the substring Home. Yet, the first text node contains nothing but whitespace.
Explanation: text() selects all child text nodes of the context node, a. When contains() is given multiple nodes as its first argument, it takes the string value of the first node, but Home appears in the second text node, not the first.
Instead, OP should use this XPath,
//a[text()[contains(., 'Home')]]
which says to select all a elements with any text child whose string value contains the substring Home.
If there weren't surrounding whitespace, this XPath could be used to test for equality rather than substring containment:
//a[text()[.='Home']]
Or, with surrounding whitespace, this XPath could be used to trim it away:
//a[text()[normalize-space()= 'Home']]
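
To make the two text nodes visible, here is a small sketch using Python's standard-library ElementTree on the markup from the question. ElementTree does not evaluate these XPath expressions itself; it exposes the text before the first child as .text and the text after each child as that child's .tail, which is enough to inspect the nodes that text() would select. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

snippet = """<a href="javascript:void(0)" title="home">
<span class="menu_icon">Maybe more text here</span>
Home
</a>"""

a = ET.fromstring(snippet)

# The child text nodes of <a>, in document order: the text before <span>,
# then the text that follows each child element.
text_nodes = [a.text] + [child.tail for child in a]
text_nodes = [t for t in text_nodes if t is not None]

# contains(text(), 'Home') in XPath 1.0 only looks at the first text node.
first_contains = "Home" in text_nodes[0]

# text()[contains(., 'Home')] tests every text node.
any_contains = any("Home" in t for t in text_nodes)

# text()[normalize-space() = 'Home'] trims and collapses whitespace first.
normalized_equal = any(" ".join(t.split()) == "Home" for t in text_nodes)

print(repr(text_nodes[0]), first_contains, any_contains, normalized_equal)
```

The first text node is whitespace only, which is why the original predicate fails while the per-node test succeeds.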
See also:
Testing text() nodes vs string values in XPath
Why is XPath unclean constructed? Why is text() not needed in predicate?
XPath: difference between dot and text()

Yes, you are making two mistakes. You're writing Home with an uppercase H when you want to match home with a lowercase h. You're also trying to check the text content when you want to check the "title" attribute. Correct those two, and you get:
//a[contains(@title, 'home')]
However, if you want to match the exact string home, instead of any a that has home anywhere in the title attribute, use @zsbappa's code.

You can try this XPath. It just selects the element by its attribute:
//a[@title='home']

Extracting content of HTML tag with specific attribute

Using regular expressions, I need to extract a multiline content of a tag, which has specific id value. How can I do this?
This is what I currently have:
<div(.|\n)*?id="${value}"(.|\n)*?>(.|\n)*?<\/div>
The problem with this is this sample:
<div id="1">test</div><div id="2">test</div>
If I want to replace id="2" using this regexp (with ${value} = 2), the whole string would get matched. This is because from the tag opening to closing I match everything until id is found, which is wrong.
How can I do this?

A fairly simple way is to use
Raw: <div(?=\s)[^>]*?\sid="2"[^>]*?>([\S\s]*?)</div>
Delimited: /<div(?=\s)[^>]*?\sid="2"[^>]*?>([\S\s]*?)<\/div>/
Use the variable in place of 2.
The content will be in group 1.

Change (.|\n) to [^>] so it won't match the > that ends the tag. Then it can't match across different divs.
<div\b[^>]*\bid="${value}"[^>]*>.*?<\/div>
Also, instead of using (.|\n)* to match across multiple lines, use the s modifier to the regexp. This makes . match any character, including newlines.
However, using regular expressions to parse HTML is not very robust. You should use a DOM parser.
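
For illustration, the first answer's pattern can be wrapped in a small Python function. This is a sketch under assumptions: the function name extract_div_content and the sample string are made up here, the id value is passed through re.escape before being spliced into the pattern, and nested div elements are not handled because the lazy match stops at the first closing tag.

```python
import re

def extract_div_content(html, value):
    """Return the content of the first <div> whose id equals value, or None."""
    pattern = re.compile(
        # [^>] keeps the attribute scan inside one opening tag;
        # [\S\s]*? lets the content span multiple lines.
        r'<div(?=\s)[^>]*?\sid="' + re.escape(value) + r'"[^>]*?>([\S\s]*?)</div>'
    )
    match = pattern.search(html)
    return match.group(1) if match else None

sample = '<div id="1">test</div><div id="2">multi\nline</div>'
second = extract_div_content(sample, "2")
missing = extract_div_content(sample, "3")
print(repr(second), repr(missing))
```

Because the opening-tag part cannot cross a >, the match for id="2" starts at the second div rather than swallowing the first one.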

Is there any way to get element attribute names by Xpath?

I can get attribute value of element by Xpath, but how to get all the attribute names?
example:
# there is an element
'<img src="http://fakesrc" alt="pic name"></img>'
i = <Element img at 0x102622cb0>
In [10]: i.xpath("//img/@src")
Out[10]: ['http://fakesrc']
In [11]: i.xpath("//img/@*")
Out[11]: ['http://fakesrc', 'pic name']
How can I get the names src/alt of i?

Depending on whether you want to include namespace prefixes or not, you can choose between the following two options in XPath 2.0:
//@*/local-name()
//@*/name()
Choose a different initial context node that fits your needs and see the specifications for more info.
With XPath 1.0, the above is not possible. The following does work, but will only show the attribute name of one attribute, even if there are multiple ones.
local-name(//@*)
name(//@*)
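
If the names are only needed on the Python side, one workaround is to skip XPath for this step and read the element's attribute mapping directly. The sketch below uses the standard-library ElementTree rather than the library from the question, and assumes the img element from the example.

```python
import xml.etree.ElementTree as ET

img = ET.fromstring('<img src="http://fakesrc" alt="pic name"></img>')

# .attrib is a dict-like mapping of attribute names to values,
# so its keys are the attribute names.
names = list(img.attrib)
pairs = dict(img.attrib)
print(names, pairs)
```

This lists every attribute name, which the XPath 1.0 name() call above cannot do in a single expression.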

is it possible to read the text of a li using Xpath with different attributes?

I am aware that I can directly use:
driver.FindElement(By.XPath("//ul[3]/li/ul/li[7]")).Text
to get the text .. but I am trying get the text by using Xpath and combination of different attributes like text(), contains() etc.
//ul[3]/li/ul/li//[text()='My Data']
Please suggest me different ways that I can handle this ... except the one I mentioned.
<li class="ng-binding ng-scope selectedTreeElement" ng-click="orgSelCtrl.selectUserSessionOrg(child);" ng-class="{selectedTreeElement: child.organizationId == orgSelCtrl.SelectedOrg.organizationId}" ng-repeat="child in node.childOrgs" style="background-color: transparent;"> My Data </li>

It looks like you have an extra "/" in your XPath and you are missing the leading dot:
//ul[3]/li/ul/li//[text()='My Data']
try this:
.//ul[3]/li/ul/li[text()='My Data']
But XPath here is used only to find elements, not to read their attributes. If you need to read an attribute or the text inside an element, use Selenium on the element after the search.

.Text of a WebElement would just return you the text of an element.
If you want to make expectations about the text, check the text() inside the XPath expression, e.g.:
//ul[3]/li/ul/li[text()='My Data']
or, using contains():
//ul[3]/li/ul/li[contains(text(), 'My Data')]
There are other functions you can make use of, see Functions - XPath.
You can also combine it with other conditions. For instance:
//ul[3]/li/ul/li[contains(@class, 'selectedTreeElement') and contains(text(), 'My Data')]
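
One thing worth checking is whitespace: the li in the question is shown with spaces around My Data. If those spaces are really present in the page, an exact comparison would not match while contains() or normalize-space() would. The sketch below mimics the three predicates in plain Python on a simplified copy of the element (most attributes dropped for brevity); it does not run XPath or Selenium.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the <li> from the question, keeping only the class attribute.
li = ET.fromstring('<li class="ng-binding ng-scope selectedTreeElement"> My Data </li>')
text = li.text or ""

exact = text == "My Data"                          # like [text()='My Data']
contains = "My Data" in text                       # like [contains(text(), 'My Data')]
normalized = " ".join(text.split()) == "My Data"   # like [normalize-space()='My Data']
has_class = "selectedTreeElement" in li.get("class", "")  # like contains(@class, ...)

print(exact, contains, normalized, has_class)
```

If the exact form fails in practice, //ul[3]/li/ul/li[normalize-space()='My Data'] is a whitespace-tolerant alternative.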
