Select text without sibling markup

I want to select with XPath only the value from span class="value" without the currency sign character.
<span class="infoValue">
<span class="value">
<span class="currencyLeft">$</span>
1000
</span>
</span>
//span[@class='infoValue']//span[@class='value']
With this XPath I can select "$1000". How can I get only "1000", without the "$" sign, using XPath?
When I try this:
//span[@class='infoValue']//span[@class='value']/span[not(currencyLeft)]
I again get only the currency sign "$".

normalize-space(//span[@class="value"]/text()[normalize-space()])

This XPath,
normalize-space(//span[@class='currencyLeft']/following-sibling::text())
will select
1000
as requested.
You can, of course, specify the ancestors of span[@class='currencyLeft'] more precisely.

There are <span> nodes and text nodes. If you select a span node, you always get the $, because every span node you select here contains it and XPath only selects complete nodes with all their descendants. But $ and 1000 are separate text nodes, so you can select only the text nodes that are direct children of the value span, which leaves out the $ inside the nested span:
//span[@class='infoValue']/span[@class='value']/text()
Alternatively, you can treat the span as a string and strip the "$" character from it. (Here "$" is just a character in a string, not the $ text node in the XML file, which is tied to a particular position and parent.) This only works for a single value, though:
normalize-space(translate(//span[@class='infoValue']/span[@class='value'], "$", ""))
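To make the difference between a span's string value and its direct text children concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not an XPath engine: xml.etree.ElementTree supports only a small XPath subset, so text() is emulated by a hypothetical helper, child_text_nodes, that reads ElementTree's text and tail fields.

```python
import xml.etree.ElementTree as ET

MARKUP = """<span class="infoValue">
<span class="value">
<span class="currencyLeft">$</span>
1000
</span>
</span>"""

def child_text_nodes(element):
    """Emulate XPath text(): the element's direct text children, in document order."""
    nodes = [element.text] + [child.tail for child in element]
    return [t for t in nodes if t]  # ElementTree stores None where there is no text

root = ET.fromstring(MARKUP)                 # the span with class "infoValue"
value = root.find("span[@class='value']")    # the span with class "value"

# String value of the span: every descendant text node, so the "$" is included.
string_value = "".join(value.itertext())

# Direct text children only: the "$" belongs to the nested span, so it is absent.
direct_text = child_text_nodes(value)

print(repr(string_value))
print(direct_text)
```

Note that the direct text children still include a whitespace-only node, which is why the XPath answers wrap the result in normalize-space().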

substring-after(//span[@class="value"], "$")
Use the substring-after function:
Returns the substring of the first argument string that follows the
first occurrence of the second argument string in the first argument
string, or the empty string if the first argument string does not
contain the second argument string.
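As a rough illustration of that definition, the following Python sketch mirrors substring-after with plain string operations. The function name substring_after is hypothetical, and the sample string is an assumption: it is the string value the question's value span would have, line breaks included.

```python
def substring_after(text, marker):
    """Mimic XPath substring-after(): the text following the first occurrence of marker."""
    if marker == "":
        return text  # XPath returns the whole string for an empty marker
    _, found, after = text.partition(marker)
    return after if found else ""  # empty string when the marker does not occur

# Assumed string value of the span with class "value" from the question.
span_value = "\n$\n1000\n"

result = substring_after(span_value, "$")
print(repr(result))
print(repr(substring_after("1000", "$")))
```

The surrounding line breaks survive, so in XPath you would likely still want to wrap the call in normalize-space().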

Related

Why is XPath contains(text(),'substring') not working as expected?

Let's say I have a piece of HTML like this:
<a>Ask Question<other/>more text</a>
I can match this piece of XPath:
//a[text() = 'Ask Question']
Or...
//a[text() = 'more text']
Or I can use dot to match the whole thing:
//a[. = 'Ask Questionmore text']
This post describes the difference between . (dot) and text(), but in short the former returns a single element, whereas the latter returns a list of elements. This is where it gets a bit weird to me: while text() can be used to match either of the elements in the list, this is not the case when it comes to the XPath function contains(). If I do this:
//a[contains(text(), 'Ask Question')]
...I get the following error:
Error: Required cardinality of first argument of contains() is one or zero
How can it be that text() works when using a full match (equals), but doesn't work on partial matches (contains)?

For this markup,
<a>Ask Question<other/>more text</a>
notice that the a element has a text node child ("Ask Question"), an empty element child (other), and a second text node child ("more text").
Here's how to reason through what's happening when evaluating //a[contains(text(),'Ask Question')] against that markup:
1. contains(x,y) expects x to be a string, but text() matches two text nodes.
2. In XPath 1.0, the rule for converting multiple nodes to a string is this:
A node-set is converted to a string by returning the string-value of
the node in the node-set that is first in document order. If the
node-set is empty, an empty string is returned. [Emphasis added]
3. In XPath 2.0+, it is an error to provide a sequence of text nodes to a function expecting a string, so contains(text(),'substr') will cause an error for more than one matching text node.
In your case...
XPath 1.0 would treat contains(text(),'Ask Question') as
contains('Ask Question','Ask Question')
which is true. On the other hand, be sure to notice that contains(text(),'more text') will evaluate to false in XPath 1.0. Without knowing (1)-(3) above, this can be counter-intuitive.
XPath 2.0 would treat it as an error.
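The XPath 1.0 behaviour can be sketched in Python with the standard library alone. This is only an emulation of the conversion rule quoted above, not a real XPath engine; the helpers child_text_nodes and contains_text_xpath10 are hypothetical names introduced for the illustration.

```python
import xml.etree.ElementTree as ET

def child_text_nodes(element):
    """Emulate XPath text(): the element's direct text children, in document order."""
    nodes = [element.text] + [child.tail for child in element]
    return [t for t in nodes if t]  # ElementTree stores None where there is no text

def contains_text_xpath10(element, needle):
    """Emulate XPath 1.0 contains(text(), needle): only the first text node is tested."""
    nodes = child_text_nodes(element)
    first = nodes[0] if nodes else ""  # an empty node-set converts to ""
    return needle in first

a = ET.fromstring("<a>Ask Question<other/>more text</a>")

print(contains_text_xpath10(a, "Ask Question"))  # True: it is in the first text node
print(contains_text_xpath10(a, "more text"))     # False: it is in the second text node
```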
Better alternatives
If the goal is to find all a elements whose string value contains the substring, "Ask Question":
//a[contains(.,'Ask Question')]
This is the most common requirement.
If the goal is to find all a elements with an immediate text node child equal to "Ask Question":
//a[text()='Ask Question']
This can be useful when you want to exclude strings from descendant elements of a, for example if you want this a,
<a>Ask Question<other/>more text</a>
but not this a:
<a>more text before <not>Ask Question</not> more text after</a>
See also
How contains() handles a nodeset first arg
How to use XPath contains() for specific text?
Testing text() nodes vs string values in XPath

The reason for this is that the contains function doesn't accept a nodeset as input - it only accepts a string. (Well, it may be engine dependent, because it works for Python's lxml module. According to the specification, it should convert the value of the first node in the set to a string and act on that. See also XPath contains(text(),'some string') doesn't work when used with node with more than one Text subnode)
//a[text() = 'Ask Question'] is matching any a elements which contain a text node which equals Ask Question.
//a[text() = 'more text'] is matching any a elements which contain a text node which equals more text.
So both of these expressions match the same a element.
You can rework your query to //a[text()[contains(., 'Ask Question')]] so that the contains function only acts on a single text node at a time.

What is the Correct XPath to Identify Element with Text Occurring Minimum Number of Times?

I'm trying to identify an element that has certain text but I only want to identify the element if the desired text occurs a specific number of times.
For example, imagine we have the following two HTML snippets on the same page:
Snippet 1:
<span id="price">
$36.46
<span>
($0.38 / Count)
</span>
</span>
Snippet 2:
<span id="price">$38.38</span>
I could identify both elements using the XPath .//span[contains(text(),'$')]. However, I only want to identify the element if it (or any descendant of the span element) contains at least two instances of the character $.
In the above example, it would only identify the first snippet, because the second snippet contains only one instance of $, not two.
What is the correct XPath syntax to use?

You can use the XPath //span[count(.//text()[contains(., "$")]) >= 2]
This is a moderately complicated XPath, so here is an explanation that builds it up from the inside out:
.//text()[contains(., "$")]
Select all text nodes descending from the current node that contain "$".
count(.//text()[contains(., "$")])
Count the text nodes descending from the current node that contain "$".
//span[count(.//text()[contains(., "$")]) >= 2]
Select all span elements with two or more descendant text nodes that contain "$".
As a caveat, this only works if the dollar signs are in two different text nodes. If you want to include the span in this example:
<span>
$$
<span>
foo
</span>
</span>
...then you'll need a different approach:
//span[string-length(.) - string-length(translate(., "$", "")) >= 2]
This predicate compares the string length of the span to the string length of the same span with all "$" characters removed.
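The difference between the two predicates can be checked with a small Python sketch that emulates them using only the standard library. The function names are hypothetical, and the snippets are the ones from the question and the caveat above.

```python
import xml.etree.ElementTree as ET

def dollar_text_nodes(element):
    """Emulate count(.//text()[contains(., "$")]): descendant text nodes containing "$"."""
    return sum(1 for text in element.itertext() if "$" in text)

def dollar_chars(element):
    """Emulate string-length(.) - string-length(translate(., "$", ""))."""
    value = "".join(element.itertext())  # the element's string value
    return len(value) - len(value.replace("$", ""))

snippet_1 = ET.fromstring("""<span id="price">
$36.46
<span>
($0.38 / Count)
</span>
</span>""")
snippet_2 = ET.fromstring('<span id="price">$38.38</span>')
snippet_3 = ET.fromstring("""<span>
$$
<span>
foo
</span>
</span>""")

for name, span in [("snippet 1", snippet_1), ("snippet 2", snippet_2), ("caveat", snippet_3)]:
    print(name, dollar_text_nodes(span), dollar_chars(span))
```

Only the character-counting version reaches 2 for the caveat span, because both dollar signs sit in the same text node.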

One usable XPath-1.0 expression is
string-length(/span[@id='price'])-string-length(translate(/span[@id='price'],'$',''))
In a predicate this could look like
//span[string-length(.)-string-length(translate(.,'$',''))>=2]
This expression selects only the elements with a count of $ >= 2

Why won't my XPath select link/button based on its label text?

<a href="javascript:void(0)" title="home">
<span class="menu_icon">Maybe more text here</span>
Home
</a>
So for above code when I write //a as XPath, it gets highlighted, but when I write //a[contains(text(), 'Home')], it is not getting highlighted. I think this is simple and should have worked.
Where's my mistake?

Other answers have missed the actual problem here:
Yes, you could match on @title instead, but that's not why OP's XPath is failing where it may have worked previously.
Yes, XML and XPath are case sensitive, so Home is not the same as home, but there is a Home text node as a child of a, so OP is right to use Home if he doesn't trust @title to be present.
Real Problem
OP's XPath,
//a[contains(text(), 'Home')]
says to select all a elements whose first text node contains the substring Home. Yet, the first text node contains nothing but whitespace.
Explanation: text() selects all child text nodes of the context node, a. When contains() is given multiple nodes as its first argument, it takes the string value of the first node, but Home appears in the second text node, not the first.
Instead, OP should use this XPath,
//a[text()[contains(., 'Home')]]
which says to select all a elements with any text child whose string value contains the substring Home.
If there weren't surrounding whitespace, this XPath could be used to test for equality rather than substring containment:
//a[text()[.='Home']]
Or, with surrounding whitespace, this XPath could be used to trim it away:
//a[text()[normalize-space()= 'Home']]
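For the markup in the question, the effect of testing each text node separately can be sketched in Python with the standard library. This is an emulation under stated assumptions rather than a real XPath evaluation: child_text_nodes and normalize_space are hypothetical helpers standing in for text() and normalize-space().

```python
import re
import xml.etree.ElementTree as ET

MARKUP = """<a href="javascript:void(0)" title="home">
<span class="menu_icon">Maybe more text here</span>
Home
</a>"""

def child_text_nodes(element):
    """Emulate XPath text(): the element's direct text children, in document order."""
    nodes = [element.text] + [child.tail for child in element]
    return [t for t in nodes if t]  # ElementTree stores None where there is no text

def normalize_space(text):
    """Emulate normalize-space(): collapse XML whitespace runs and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")

a = ET.fromstring(MARKUP)
nodes = child_text_nodes(a)

# contains(text(), 'Home') in XPath 1.0: only the first (whitespace-only) node is tested.
first_only = "Home" in nodes[0]
# text()[contains(., 'Home')]: every text child is tested on its own.
any_node = any("Home" in t for t in nodes)
# text()[normalize-space() = 'Home']: equality after trimming the whitespace.
normalized_match = any(normalize_space(t) == "Home" for t in nodes)

print(first_only, any_node, normalized_match)
```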
See also:
Testing text() nodes vs string values in XPath
Why is XPath unclean constructed? Why is text() not needed in predicate?
XPath: difference between dot and text()

Yes, you are making two mistakes: you're writing Home with an uppercase H when you want to match home with a lowercase h, and you're trying to check the text content when you want to check the "title" attribute. Correct those two, and you get:
//a[contains(@title, 'home')]
However, if you want to match the exact string home, instead of any a that has home anywhere in the title attribute, use @zsbappa's code.

You can try this XPath. It just selects the element by its attribute:
//a[@title='home']

Get number via XPath?

There's a part of the page:
<p>
349
<span>
$
</span>
</p>
How to get "349"?

There are many XPaths that will select "349" from that XML. Here are a few:
Select the space-normalized text node children of p:
normalize-space(/p/text())
Select the space-normalized substring before the $ in the string value of p:
normalize-space(substring-before(/p, '$'))
Select space-normalized, numeric text nodes anywhere in the document:
normalize-space(//text()[number(.) = .])
All of these XPaths will select "349" as a string as requested. You could also wrap any of the above expressions with a number() function call if you actually want 349 as a number rather than as a string.

Since you don't show the complete page, we don't have a complete path, but try this:
/p/text()

How can you view the output of XPath functions like normalize-space()?

Say I have the following HTML:
<div class="instruction" id="scan-prompt">
<span class="long instruction">Scan </span>
<span id="slot-to-scan">A-2</span>
<span class="long instruction"> to prep</span>
</div>
And I'm trying to write an XPATH selector like this
//div[@id='scan-prompt' and normalize-space()='Scan A-2 to prep']
Is there a way to see what the normalize-space output actually is?
I know you can do $x("//div[@id='scan-prompt']") in the Chrome debugger, but I don't know how to go from that to seeing the output of normalize-space.

Why can you not simply use the path expression
normalize-space(//div[@id='scan-prompt'])
to see what the normalized string value would look like? Other than that, what normalize-space() does exactly is:
Removing any leading or trailing whitespaces from the string argument
Collapsing any sequence of whitespace characters to just one whitespace character
If handed an element node as an argument (as is the case with your original expression), the function evaluates the string value of that element node. The string value of an element node is the concatenation of all its descendant text nodes.
The result of normalize-space(//div[@id='scan-prompt']) is, given the input you show (whitespace marked with "+"):
Scan+A-2+to+prep
Without invoking normalize-space(), for example string(//div[@id='scan-prompt']):
+
Scan+
A-2+
to+prep+
+
So, simply use path expressions that do nothing but return a string value or a normalized string value. In Google Chrome, you can evaluate such an XPath expression inside $x().
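If you would rather inspect the two values outside the browser, here is a minimal Python sketch that reproduces them with the standard library. It is an emulation, not a real XPath evaluation: the helper normalize_space is a hypothetical stand-in for the XPath function, and the string value is built by concatenating all descendant text as described above.

```python
import re
import xml.etree.ElementTree as ET

MARKUP = """<div class="instruction" id="scan-prompt">
<span class="long instruction">Scan </span>
<span id="slot-to-scan">A-2</span>
<span class="long instruction"> to prep</span>
</div>"""

def normalize_space(text):
    """Emulate normalize-space(): collapse XML whitespace runs and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")

div = ET.fromstring(MARKUP)

# String value of the div: the concatenation of all its descendant text nodes.
raw = "".join(div.itertext())
normalized = normalize_space(raw)

print(repr(raw))         # line breaks and spaces are still visible here
print(repr(normalized))  # the value the predicate compares against
```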
