Xpath selecting nodes with text - html

I have two requests.
First, I need an XPath expression that selects all <tr> elements with any kind of text nested in them.
I tried:
//tr[contains(., 'PRVI ODJELJAK')]/following-sibling::tr[text() != '']
but it doesn't work; it still selects siblings without any text as well.
Second, is there a way to select all siblings of an element until you hit a sibling whose inner text matches some text?
Thanks in advance!

"First i need xpath expression that selects all elements with any kind of text nested in that element"
To keep only the elements that contain some non-whitespace text, you can use normalize-space():
//tr[contains(., 'PRVI ODJELJAK')]/following-sibling::tr[normalize-space()]
"secondly is there a way to lets say select all siblings of an element until you hit a sibling with inner text matching some text"
You can probably emulate that by selecting the siblings that themselves have a following sibling whose inner text matches the text:
following-sibling::tr[following-sibling::tr[contains(.,'some text')]]
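A minimal sketch of both expressions, assuming Python with the third-party lxml library installed. The table rows and the stop text 'DRUGI ODJELJAK' are made up for illustration and do not come from the asker's page:

```python
from lxml import etree

# Hypothetical table: a start row, some data rows (one blank), a stop row.
doc = etree.fromstring(
    "<table>"
    "<tr><td>PRVI ODJELJAK</td></tr>"
    "<tr><td>row A</td></tr>"
    "<tr><td> </td></tr>"
    "<tr><td>row B</td></tr>"
    "<tr><td>DRUGI ODJELJAK</td></tr>"
    "<tr><td>row C</td></tr>"
    "</table>"
)

start = "//tr[contains(., 'PRVI ODJELJAK')]"

# Following siblings that contain some non-whitespace text.
non_empty = doc.xpath(start + "/following-sibling::tr[normalize-space()]")
non_empty_texts = [tr.xpath("normalize-space()") for tr in non_empty]

# Following siblings that still have a matching "stop" row after them.
until_stop = doc.xpath(
    start
    + "/following-sibling::tr"
    + "[following-sibling::tr[contains(., 'DRUGI ODJELJAK')]]"
)
until_texts = [tr.xpath("normalize-space()") for tr in until_stop]

print(non_empty_texts)
print(until_texts)
```

Note that the second expression keeps every sibling that precedes any matching row, so if the stop text occurs more than once, the selection runs up to the last occurrence rather than the first.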

Related

XPath for element with child X with no leading text, and no child Y

With this:
<p>Some variable length text <abbr>abc</abbr> and more text.</p>
<p><abbr>abc</abbr>Some variable length text.</p>
<p><abbr>abc</abbr>Some variable length text. <a>blah</a></p>
I'm trying to build an Xpath that will find p elements that:
1. Have a child <abbr> with a value of "abc", with no intervening text between the start of the parent and that child.
2. Do not have a child <a>.
#1 means the first line should not match, because it has text between the parent <p> and the child <abbr>.
#2 means the third line should not match, because although there is a child <abbr> and no leading text, it also has a child <a>.
Thus, only the second line,
<p><abbr>abc</abbr>Some variable length text.</p>
should match: no leading text, and no child <a>.
I've played with an XPath testbed for an hour, and done quite a bit of searching, but haven't figured out how to handle both of those requirements, but especially #1.
This XPath,
//p[node()[1][self::abbr]='abc'][not(a)]
selects all p elements
whose first child node is an abbr element with a string value of "abc" [1], and
without an a child,
as requested.
[1] Note that requirement #1 implies that abbr must be the first child node, which prevents any node (text, another element, etc.) from being a preceding sibling of abbr.
Try this XPath 1.0 expression:
//p[node()[1]!=text()[1] and abbr='abc' and not(a)]
It checks that the first child node is not a text() node, by comparing its string value with that of the first text() child. The further checks are that the p has a child abbr with the content 'abc' and no children named a.
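Both answers can be tried against the three sample paragraphs. This is only a sketch, assuming Python with the third-party lxml library; the wrapping div and the bare <a> element in the third paragraph are assumptions added so that the sample is well-formed and has the child <a> the question describes:

```python
from lxml import etree

# The three sample paragraphs; the third is assumed to contain an <a> child.
doc = etree.fromstring(
    "<div>"
    "<p>Some variable length text <abbr>abc</abbr> and more text.</p>"
    "<p><abbr>abc</abbr>Some variable length text.</p>"
    "<p><abbr>abc</abbr>Some variable length text. <a>blah</a></p>"
    "</div>"
)

def texts(expr):
    """Return the full text of every p element selected by expr."""
    return ["".join(p.itertext()) for p in doc.xpath(expr)]

# First answer: the first child node must itself be an abbr equal to 'abc'.
first_answer = texts("//p[node()[1][self::abbr]='abc'][not(a)]")

# Second answer: the first child node must differ from the first text node.
second_answer = texts("//p[node()[1]!=text()[1] and abbr='abc' and not(a)]")

print(first_answer)
print(second_answer)
```

On this sample both expressions select only the second paragraph. They can differ on other input: a p with no text node at all never satisfies the != comparison in the second expression, because a comparison against an empty node-set is false.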

XPath - get text from whole document except text from specified elements

I'm trying to figure out how to get text using XPath and exclude some tags.
Let's say (for illustration) I want to get all text from this page's body tag (so all visible text), but I don't want my text to contain text from tags with class="comment-copy" i.e. I don't want text to include comments.
I tried this but it doesn't work. It returns text including comments.
//body//text()[not(*[contains(@class,"comment-copy")])]
Do you have any idea?
EDIT:
Probably figured it out but maybe there are better or faster approaches so I won't delete the question.
//body//text()[not(ancestor-or-self::*[contains(@class,"comment-copy")])]
You were very close.
Just change
//body//text()[not(*[contains(@class,"comment-copy")])]
to
//body//text()[not(contains(../@class,"comment-copy"))]
Note that this will only exclude text() nodes that are immediate children of comment-copy marked elements. Your follow-up XPath will exclude all descendant text() nodes beneath comment-copy marked elements.
Note: You might want to beef up the robustness of the @class test; see Xpath: Find element with class that contains spaces.
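The difference between the parent-only test and the ancestor test shows up as soon as a comment contains nested markup. A small sketch, assuming Python with the third-party lxml library and a made-up body (the p, span, and b elements are illustrative only):

```python
from lxml import etree

# Hypothetical page body: one ordinary paragraph, one comment with nested markup.
doc = etree.fromstring(
    "<body>"
    "<p>Question text.</p>"
    "<span class='comment-copy'>A comment with <b>bold</b> words.</span>"
    "</body>"
)

# Excludes only text nodes whose parent carries the class.
parent_only = [
    str(t)
    for t in doc.xpath('//body//text()[not(contains(../@class, "comment-copy"))]')
]

# Excludes text nodes anywhere beneath an element carrying the class.
all_levels = [
    str(t)
    for t in doc.xpath(
        '//body//text()[not(ancestor-or-self::*[contains(@class, "comment-copy")])]'
    )
]

print(parent_only)
print(all_levels)
```

Here the parent-only version still returns "bold", because the parent of that text node is the b element, which has no class attribute.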

Finding xpath of element

In the following snippet
I want to get the XPath of the element containing the text 'This is what I should get'. I use the XPath expression html/body/div[5]/div[3]/div/div/div/div[2]/div/table/tbody/tr[2]/td/span, but I am getting the element with the text 'This is what I am getting'. Please help me modify the element locator to get the desired text.
There must be a better XPath expression than that verbose one, but without more information I can only make suggestions based on the existing XPath. The desired text node can be identified either as the text node that follows the previously selected span element:
..../table/tbody/tr[2]/td/span/following-sibling::text()[1]
or as a direct child text node of the parent td element:
..../table/tbody/tr[2]/td/text()[normalize-space()]
If you want to get the text node, the XPath would be:
html/body/div[5]/div[3]/div/div/div/div[2]/div/table/tbody/tr[2]/td/text()[2]
although the XPath expression should probably be less verbose.
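The question's snippet is not reproduced here, so the following sketch runs the three suggestions against a guessed cell layout: a td holding a span followed by loose text, with whitespace before the span. It assumes Python with the third-party lxml library, and uses shortened paths starting at //td instead of the full html/body/... path:

```python
from lxml import etree

# Guessed markup: whitespace, a span, then the wanted text directly in the td.
doc = etree.fromstring(
    "<table><tbody><tr><td>\n"
    "  <span>This is what I am getting</span>\n"
    "  This is what I should get\n"
    "</td></tr></tbody></table>"
)

# Text node immediately following the span.
after_span = doc.xpath("//td/span/following-sibling::text()[1]")

# Direct child text nodes of the td that are not whitespace-only.
non_blank = doc.xpath("//td/text()[normalize-space()]")

# Second direct child text node of the td (the first is the leading whitespace).
second_text = doc.xpath("//td/text()[2]")

results = [nodes[0].strip() for nodes in (after_span, non_blank, second_text)]
print(results)
```

The positional text()[2] form depends on there being exactly one text node before the wanted one, so the normalize-space() filter is likely the more robust of the three if the whitespace in the real page differs.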

XPath - selecting descendants - difference between using // and */

Can someone help me understand the difference between the two following XPath queries:
A: //table[@id="xyz"]//tr[//a[contains(text(), "Alice")]]
B: //table[@id="xyz"]//tr[*/a[contains(text(), "Alice")]]
(A) appears to select all the tr elements in the table, regardless of whether each one has an a descendant with the text "Alice".
(B), meanwhile, does what I expect and selects only the tr elements with a descendants containing the text "Alice".
As an aside question, is there a more elegant way of writing the above?
You would need to use //table[@id="xyz"]//tr[.//a[contains(text(), "Alice")]] or //table[@id="xyz"]//tr[descendant::a[contains(text(), "Alice")]] to make sure that, in the first expression, the path in square brackets is relative to the tr. With your current //tr[//a] the path //a inside the predicate starts from the document node (the root node) and is not relative to the tr, so the predicate is true for every tr as long as any matching a exists anywhere in the document.
//a selects all a elements anywhere in the document.
*/a selects all a elements that are grandchildren of the context node.
.//a selects all a elements that are descendants of the context node at any depth.
./a (or just a) selects all a elements that are children of the context node.
It's vital to understand the notion of context. In a predicate such as tr[XXXX] the context node for evaluating XXXX is the tr element that you are testing.
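The effect of the context node can be seen by counting matches for each form of the predicate. A minimal sketch, assuming Python with the third-party lxml library; the table contents are invented for illustration:

```python
from lxml import etree

# Hypothetical table: only the first row links to "Alice".
doc = etree.fromstring(
    "<table id='xyz'>"
    "<tr><td><a>Alice</a></td></tr>"
    "<tr><td><a>Bob</a></td></tr>"
    "<tr><td>no link</td></tr>"
    "</table>"
)

prefix = '//table[@id="xyz"]//tr'

# //a inside the predicate is evaluated from the document root, not from the tr.
absolute = doc.xpath(prefix + '[//a[contains(text(), "Alice")]]')

# .//a is evaluated relative to each tr being tested.
relative = doc.xpath(prefix + '[.//a[contains(text(), "Alice")]]')

# */a only looks at grandchildren of each tr.
grandchild = doc.xpath(prefix + '[*/a[contains(text(), "Alice")]]')

print(len(absolute), len(relative), len(grandchild))
```

With this input the absolute form returns all three rows, while the relative and grandchild forms return only the first. The grandchild form would miss a link nested more deeply than td/a, which is why .//a is usually the safer choice.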

xpath help can't get the text?

I am unable to get the text from this website: http://mp3bear.com... so now I just want to get the title of the song that is displayed on it. Here is the code I wrote:
//table/tr[2]/td[2]
I want to get the second row of the second column, but it doesn't display anything... is there anything special when
I can't find any table element on this site; the tables are constructed with divs.
Therefore the expression for the second row of the second column of the table is:
//div[@id='listwrap']/div[3]/div[2]
Some XPath implementations don't allow indexing of child elements in this manner. In that case you could use:
//div[@id='listwrap']/div[position()=3]/div[position()=2]
Edit:
In that case you need this expression:
//div[@id='listwrap']/div[3]/div[2]/a/text()
since the title is contained in an a element, and the text() node test selects the text value of that a element.
Tested in FirePath.
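As a rough check of these expressions, here is a sketch assuming Python with the third-party lxml library. The markup is a made-up stand-in for a div-based table (a header row plus two song rows, with the title inside an a element); the real page's structure is not shown in the answer and may differ:

```python
from lxml import etree

# Hypothetical div-based "table": rows are child divs, cells are grandchild divs.
doc = etree.fromstring(
    "<div id='listwrap'>"
    "<div><div>No.</div><div>Title</div></div>"
    "<div><div>1</div><div><a href='#one'>First song</a></div></div>"
    "<div><div>2</div><div><a href='#two'>Second song</a></div></div>"
    "</div>"
)

# The original attempt finds nothing because there is no table element.
table_attempt = doc.xpath("//table/tr[2]/td[2]")

# Numeric predicate shorthand.
by_index = [
    str(t) for t in doc.xpath("//div[@id='listwrap']/div[3]/div[2]/a/text()")
]

# Equivalent explicit position() form.
by_position = [
    str(t)
    for t in doc.xpath(
        "//div[@id='listwrap']/div[position()=3]/div[position()=2]/a/text()"
    )
]

print(table_attempt, by_index, by_position)
```

In this stand-in markup div[3] is the second song row only because the header row counts as div[1]; which index is right on the real page depends on its actual layout.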