Select optional nodes with XPath

I have an HTML fragment:
<td>
<span class="x-cell">something</span>
<span class="y-cell">something</span>
<span class="z-cell">something</span>
A text
<span class="foo"/>
Another text
<span class="bar"/>
Also text
</td>
I want to select all nodes following the <span class="z-cell"/> so that I can move them into another node. However, every node within the td is optional: there can be zero to three <span class="*-cell"/> elements, the text is optional, and further <span> nodes may or may not appear at the beginning, middle, or end of the text.
In short, I have to move all nodes except the <span class="*-cell"/> elements into another node. I tried this XPath to select the nodes:
td/span[contains(@class,"-cell")][last()]/following-sibling::*
but it doesn't work if there aren't any <span class="*-cell"/> nodes. How can I solve that?

Have your XPath expression exclude all elements you do not want:
td/(*[not(contains(@class,"-cell"))]|text())
Note that the parenthesized step in this expression requires XPath 2.0 or later. If you only want to copy elements without the intervening text, this simplifies to
td/*[not(contains(@class,"-cell"))]
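As a rough illustration of the element-only filter, here is a minimal Python sketch. It does not evaluate the XPath itself: the standard library's xml.etree.ElementTree supports only a small XPath subset without contains() or not(), so the same test is written as a plain list comprehension. The function name non_cell_elements is made up for this example.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, unchanged.
FRAGMENT = """<td>
<span class="x-cell">something</span>
<span class="y-cell">something</span>
<span class="z-cell">something</span>
A text
<span class="foo"/>
Another text
<span class="bar"/>
Also text
</td>"""


def non_cell_elements(td):
    """Child elements of td whose class attribute does not contain '-cell'.

    Mirrors td/*[not(contains(@class, "-cell"))]: an element without a
    class attribute is kept, just as in the XPath.
    """
    return [child for child in td if "-cell" not in child.get("class", "")]


td = ET.fromstring(FRAGMENT)
print([el.get("class") for el in non_cell_elements(td)])  # ['foo', 'bar']

# With no *-cell spans at all, the filter simply returns whatever is there.
print(non_cell_elements(ET.fromstring("<td>Only text</td>")))  # []
```

Because the filter excludes by class instead of navigating from the last *-cell span, it keeps working when no such span exists, which is the case that broke the original expression.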

Related

How do you use xpath to find an element with two specific descendants?

I have an unordered list of list items containing elements for labels and values that are dynamically generated. I am trying to validate that the list contains a specific label with a specific value.
I am attempting to write an XPath that will allow me to find the parent element that contains the defined label and value with Protractor's element(by.xpath). Given a list, I need to be able to find any single li by the combination of two descendants with specific attributes. For example, a li element that contains any descendant with class=label and text=Color AND any descendant with text=Blue.
<ul>
<li>
<span class='label'> Car </span>
<p> Ford </p>
</li>
<li>
<span class='label'> Color </span>
<p> <span>My favorite color is</span> : <webl>Blue</webl></p>
</li>
<li>
<span class='label'> Name </span>
<p> Meri </p>
</li>
<li>
<span class='label'> Pet </span>
<p> Cats <span>make the best pets</span> </p>
</li>
</ul>
I have tried several variations on the following pattern:
//li[.//*[@class="label" | contains(text(), 'Color')] | .//*[contains(text(), 'Blue')]
This is the closest I think I have come, and it comes back as not a valid XPath. I've been looking at references, cheat sheets, and SO questions for several hours now and I am no closer to understanding what I am doing wrong. Eventually I will need to replace the text with variables, but right now I just need to get my head around this. What I am after is:
a list item that contains, at any depth,
any tag with a class of 'label' and text of x
AND
any tag with text y
Can anyone tell me what I am doing wrong? Am I just making it too complex?

You are getting an invalid XPath because:
The |, or union, operator returns the union of its two operands,
which must be node-sets.
You have used it inside a predicate to combine boolean conditions, which it cannot do; use and for that. The following XPath meets your requirement:
//*[@class="label" and contains(text(),'Color')]//ancestor::li//*[contains(text(), 'Blue')]
Note that this selects the element containing 'Blue' inside the matching li rather than the li itself.

Based on the HTML you have shared, to locate the <li> element that contains a descendant with class='label' and text Color AND any descendant with text Blue, you can use the following XPath-based locator strategy:
//li[./span[@class='label' and contains(., 'Color')]][.//webl[contains(., 'Blue')]]
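The "label AND value" condition can also be sketched in Python, which may help when the two texts become variables. This is only an illustration using the standard library's xml.etree.ElementTree, not Protractor; ElementTree cannot evaluate contains(), so both predicates are written out by hand. The helper names string_value and find_items are invented for this example.

```python
import xml.etree.ElementTree as ET

# The list from the question, with the closing </ul> added.
HTML = """<ul>
<li><span class='label'> Car </span><p> Ford </p></li>
<li><span class='label'> Color </span>
<p> <span>My favorite color is</span> : <webl>Blue</webl></p></li>
<li><span class='label'> Name </span><p> Meri </p></li>
<li><span class='label'> Pet </span><p> Cats <span>make the best pets</span> </p></li>
</ul>"""


def string_value(el):
    """Concatenated text of el and all its descendants (XPath's '.')."""
    return "".join(el.itertext())


def find_items(root, label, value):
    """li elements with a class='label' descendant containing `label`
    AND some descendant containing `value`.

    Roughly //li[.//*[@class='label' and contains(., label)]]
                [.//*[contains(., value)]]
    """
    matches = []
    for li in root.iter("li"):
        descendants = [el for el in li.iter() if el is not li]
        has_label = any(
            el.get("class") == "label" and label in string_value(el)
            for el in descendants
        )
        has_value = any(value in string_value(el) for el in descendants)
        if has_label and has_value:
            matches.append(li)
    return matches


root = ET.fromstring(HTML)
found = find_items(root, "Color", "Blue")
print(len(found))  # 1
print(string_value(found[0].find("span")).strip())  # Color
```

The two conditions are independent predicates on the same li, which is why the XPath version chains two bracketed predicates (or joins them with and) instead of using the union operator.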

html and combining span ID's into one span ID

I'm working on an eBook which requires me to create an overlay. All is working fine, except that in some cases I have a drop cap combined with the rest of the word, and both need to be highlighted at the same time.
The code below is my current problem. I need to have the two span IDs combined into one without destroying the HTML.
Any ideas?
<p class="ParaOverride-1"><span id="_idTextSpan017" class="DropCap-color CharOverride-6" style="position:absolute;top:-109.78px;left:26.39px;">W</span><span id="_idTextSpan018" class="PageText-v1 CharOverride-7" style="position:absolute;top:0px;left:1626.19px;letter-spacing:-2.6px;">hat </span>

You need a nested <span>:
<span id="myID">
<span id="x">
</span>
<span id="y">
</span>
</span>

xpath select parent elem w/ blank text() after excluding certain children

I am trying to select all div.to_get whose children have no text content, excluding certain elements
html:
<body>
<div class="to_get">
<span> </span>
<span class="exclude"> text is ignored </span>
<span> </span>
</div>
<div class="to_get">
<span> there is text here, so don't select the parent div </span>
<span class="exclude"> text is ignored </span>
<span> </span>
</div>
<div class="to_get">
<span> </span>
<span class="exclude"> text is ignored </span>
<span> there is text here, so don't select the parent div </span>
</div>
</body>
xpath attempt:
//*/body/div[@class='to_get']/descendant::text()[not(ancestor::span/@class='exclude')][normalize-space(.)='']/ancestor::div[@class='to_get']
The problem is that this still returns the 2nd (and 3rd) div.to_get because of its 3rd (and 1st) span child. But those divs should be excluded due to its 1st (and 3rd) span child.
The xpath should only select the 1st div.to_get.

The following XPath
//div[@class='to_get' and normalize-space(span[not(@class='exclude')]/text())='']
selects all div elements with the class to_get that contain only empty span elements, ignoring span elements with the class exclude. For the input HTML, this returns only the first div.
Update: As noted in a comment, the XPath above only checks the first span. The following XPath
//div[@class='to_get'][not(span[not(@class='exclude') and not(normalize-space(text())='')])]
selects all div elements with the class to_get that contain only empty span elements, ignoring the ones with the class exclude. For the updated input HTML, only the first div is returned.

You can try it this way (formatted for readability):
//div[
  @class='to_get'
  and
  not(
    span[not(@class='exclude') and normalize-space()]
  )
]
Compared with the other answer, not(normalize-space(text())='') only tests whether the first text node in the <span> is empty, while normalize-space() tests whether all text nodes in the <span> are empty. Consider the following example, which the former expression wrongly selects but the latter correctly rejects:
<div class="to_get">
<span> </span>
<span class="exclude"> text is ignored </span>
<span> <br/> there is text here, so don't select the parent div </span>
</div>
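To make the difference concrete, the following Python sketch applies the "all text in the span" rule to the three divs from the question plus the <br/> case. It is an approximation written with the standard library's xml.etree.ElementTree, which has no normalize-space(), so the check is done by stripping the four XPath whitespace characters. The name has_only_blank_spans is made up for this example.

```python
import xml.etree.ElementTree as ET

BODY = """<body>
<div class="to_get"><span> </span><span class="exclude"> text is ignored </span><span> </span></div>
<div class="to_get"><span> there is text here </span><span class="exclude"> text is ignored </span><span> </span></div>
<div class="to_get"><span> </span><span class="exclude"> text is ignored </span><span> there is text here </span></div>
<div class="to_get"><span> </span><span class="exclude"> text is ignored </span><span> <br/> text after a child element </span></div>
</body>"""

# normalize-space() treats space, tab, CR and LF as whitespace.
XPATH_WHITESPACE = " \t\r\n"


def has_only_blank_spans(div):
    """True if every non-excluded span child has no visible text at all.

    Mirrors not(span[not(@class='exclude') and normalize-space()]).
    """
    for span in div.findall("span"):
        if span.get("class") == "exclude":
            continue
        # itertext() covers every text node, including text after <br/>.
        if "".join(span.itertext()).strip(XPATH_WHITESPACE):
            return False
    return True


root = ET.fromstring(BODY)
divs = root.findall("div[@class='to_get']")
selected = [i for i, div in enumerate(divs, start=1) if has_only_blank_spans(div)]
print(selected)  # [1]
```

The fourth div is the interesting one: its last span starts with a blank text node, so a check on the first text node alone would let it through.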

xPath: How to get 'title' text from table?

I am using xPath to try to get the title text from the following section of a table:
<td class="title" title="if you were in a job and then one day, the work..." data-id="3198695">
<span id="thread_3198695" class="titleline threadbit">
<span class="prefix">
</span>
<a id="thread_title_3198695" href="showthread.php?t=3198695">would this creep you out?</a>
<span class="thread-pagenav">(Pgs:
<span>1</span> <span>2</span> <span>3</span> <span>4</span>)</span>
</span>
<span class="byline">
by
<a href="member.php?u=1687137" data-id="3198695" class="username">
damoni
</a>
</span>
</td>
The output I want is: "if you were in a job and then one day, the work..."
I have been trying various expressions in Scrapy (Python) to get the title. It outputs odd text such as: '\n\n \r \r \n \n\n\r'
response.xpath("//tr[3]/td[@class='title']/text()")
I know that the following part is correct, at least (I verified that it locates the correct table element using Chrome's developer tools):
//tr[3]/td
# (This is the above snippet)
Any idea as to how I can extract the title?

You want:
response.xpath("//tr[3]/td[@class='title']/@title")
Note that text() selects the text node children of an element, which for this td are only the whitespace between its child elements, whereas @attribute selects the value of an attribute. Since the desired text is stored in the title attribute, you need to use @title.
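The distinction between an element's own text and its attributes can be seen in a small Python sketch. This uses the standard library's xml.etree.ElementTree on a trimmed copy of the table cell instead of Scrapy, so it only illustrates the idea and says nothing about Scrapy's exact output.

```python
import xml.etree.ElementTree as ET

# A trimmed, well-formed version of the cell from the question.
ROW = """<tr>
<td class="title" title="if you were in a job and then one day, the work..." data-id="3198695">
<span class="titleline threadbit">
<a href="showthread.php?t=3198695">would this creep you out?</a>
</span>
</td>
</tr>"""

td = ET.fromstring(ROW).find("td[@class='title']")

# The td's own text (before its first child element) is only whitespace,
# which is the kind of content td/text() returns.
print(repr(td.text))

# The wanted string lives in the title attribute, like td/@title.
print(td.get("title"))
```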

get sibling element text only when its parallel element meets a condition with xpath1.0

The goal is to get the code of the user named Nick whose title is Mr, using XPath 1.0.
<span class="user">
<span class="master">
<span class="user-title" title="Mr">
<span class="name">Nick</span>
</span>
<span class="user-info">
<span class="code">A</span>
</span>
</span>
</span>
<span class="user">
<span class="master">
<span class="user-title" title="Mr">
<span class="name">Bob</span>
</span>
<span class="user-info">
<span class="code">B</span>
</span>
</span>
</span>
I would like to divide it into several steps to understand how XPath works in this case.
//span[contains(., 'Nick')] can get that node, but how do I get the person's code, which is in the next node?

You could do something like:
//span
  [@class='user']
  [.//span[@class='name']='Nick']
  //span[@class='code']
  /text()
Basically this says:
Find the user span that contains the name span with text Nick
Within that user span, find the code span
For the code span, return the text
Alternatively, you could directly navigate to the sibling element. However, it is not as readable:
//span[.='Nick']/../following-sibling::*[1]/span/text()
This says to find the span with text Nick. From there, go to the parent (the user-title span). Then go to the next sibling (the user-info span). Then get the span in there, which is the code span.
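The first approach (find the user span, then look inside it) can be sketched in Python as follows. This is an illustration with the standard library's xml.etree.ElementTree, whose XPath subset does not allow a nested path inside a predicate, so the "contains a name span with text Nick" test is done in a loop. The wrapping <div> and the function name codes_for are assumptions added for this example.

```python
import xml.etree.ElementTree as ET

# The two user spans from the question, wrapped in a <div> so the
# fragment has a single root element.
USERS = """<div>
<span class="user"><span class="master">
<span class="user-title" title="Mr"><span class="name">Nick</span></span>
<span class="user-info"><span class="code">A</span></span>
</span></span>
<span class="user"><span class="master">
<span class="user-title" title="Mr"><span class="name">Bob</span></span>
<span class="user-info"><span class="code">B</span></span>
</span></span>
</div>"""


def codes_for(root, name):
    """Text of every code span inside a user span whose name span equals name."""
    codes = []
    for user in root.findall(".//span[@class='user']"):
        names = [el.text for el in user.findall(".//span[@class='name']")]
        if name in names:
            codes.extend(el.text for el in user.findall(".//span[@class='code']"))
    return codes


root = ET.fromstring(USERS)
print(codes_for(root, "Nick"))  # ['A']
print(codes_for(root, "Bob"))   # ['B']
```

Scoping the search to the enclosing user span is what ties the name to the right code, without relying on the exact sibling order.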

We Keep Coding
