XPath for text based on content in preceding element?

<div class="profile-row">
<div class="profile-cell">
<h4 class="">Telephone</h4>
</div>
<div class="profile-cell">
<p class="">0207 289 2981</p>
</div>
</div>
I am trying to grab the phone number: 0207 289 2981
Using variations of:
//h4[starts-with(., 'Telephone')]/following-sibling::div[@class='profile-cell']
and:
//h4[starts-with(., 'Telephone')]/following-sibling::div/p
Can't seem to grab this.

Siblings have a common parent; h4 and p do not.
Use following:: instead.
This XPath,
//h4[.='Telephone']/following::p[1]/text()
will select the text of the first p element that follows the targeted h4 in document order.
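To check the following:: answer outside a browser, here is a minimal sketch using the JDK's built-in XPath 1.0 engine (javax.xml.xpath). It assumes the snippet is well-formed XML, which this one is; real-world HTML usually needs an HTML parser first. The class and helper names (FollowingAxisDemo, parse, eval, phone) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: evaluates the following:: answer against the question's markup.
public class FollowingAxisDemo {
    // The question's snippet; it is well-formed XML, so the JDK's XML parser accepts it.
    static final String MARKUP =
            "<div class='profile-row'>"
            + "<div class='profile-cell'><h4 class=''>Telephone</h4></div>"
            + "<div class='profile-cell'><p class=''>0207 289 2981</p></div>"
            + "</div>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // String value of the first selected node, or "" when nothing matches.
    static String eval(String expr, Document doc) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    static String phone() {
        // following:: walks forward in document order and is not limited to
        // siblings, so it reaches the p inside the next profile-cell div.
        return eval("//h4[.='Telephone']/following::p[1]/text()", parse(MARKUP));
    }

    public static void main(String[] args) {
        System.out.println(phone()); // prints 0207 289 2981
    }
}
```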

Here is the XPath:
(//div[@class='profile-cell']//p)[last()]
The problem with your XPath is that the h4 has no sibling div, so you have to step up to the parent div and then select its sibling div, as shown below:
//h4[starts-with(., 'Telephone')]/parent::div/following-sibling::div[@class='profile-cell']/p
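A sketch along the same lines (JDK XPath 1.0, markup treated as XML, hypothetical class and method names) contrasts the question's sibling-based attempt with the parent::div fix and the [last()] variant. The attempt selects nothing because the h4 is the only child of its div.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class comparing the failing and working expressions.
public class ParentStepDemo {
    static final String MARKUP =
            "<div class='profile-row'>"
            + "<div class='profile-cell'><h4 class=''>Telephone</h4></div>"
            + "<div class='profile-cell'><p class=''>0207 289 2981</p></div>"
            + "</div>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // String value of the first selected node, or "" when nothing matches.
    static String eval(String expr) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, parse(MARKUP));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    // The question's attempt: the h4 has no following siblings, so this is empty.
    static String siblingAttempt() {
        return eval("//h4[starts-with(., 'Telephone')]/following-sibling::div/p");
    }

    // Step up to the h4's parent div first; that div does have a sibling div.
    static String viaParent() {
        return eval("//h4[starts-with(., 'Telephone')]/parent::div"
                + "/following-sibling::div[@class='profile-cell']/p");
    }

    // Last p inside any profile-cell div, in document order.
    static String lastP() {
        return eval("(//div[@class='profile-cell']//p)[last()]");
    }

    public static void main(String[] args) {
        System.out.println("[" + siblingAttempt() + "]"); // prints []
        System.out.println(viaParent());                   // prints 0207 289 2981
        System.out.println(lastP());                       // prints 0207 289 2981
    }
}
```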

This selects the actual number:
//p[@class='']/text()

Related

XPath get element with text in child element

I would like to match a <button> element with a certain text, which is sometimes enclosed in another element within the button, e.g.:
<div class="buttonset">
<button>Close</button>
</div>
<div class="buttonset">
<button>
<span>Close</span>
</button>
</div>
The XPath query //div[@class='modal-buttonset']/button[text()='Cancel'] only gives me the result from the top level, where the text is a direct child of the button.
How can I match the text at any level?
Try the following:
//div/button[descendant::text()="Close"]
This XPath,
//button[normalize-space() = 'Close']
will select all button elements whose space-normalized string value is 'Close', regardless of any additional wrapper elements, as requested.
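The difference between text(), descendant::text() and normalize-space() shows up when counting matches against the two button variants. This sketch uses the JDK's javax.xml.xpath, wraps the sample in a hypothetical <root> element so it parses as XML, and adds indentation whitespace inside the second button to mimic typical HTML; the class and constant names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class counting which buttons each expression matches.
public class ButtonTextDemo {
    // Both button variants from the question inside a hypothetical <root>;
    // the second button keeps whitespace around its <span>.
    static final String MARKUP =
            "<root>"
            + "<div class='buttonset'><button>Close</button></div>"
            + "<div class='buttonset'><button>\n  <span>Close</span>\n</button></div>"
            + "</root>";

    // Matches only when "Close" is a text node directly inside the button.
    static final String DIRECT_TEXT = "//button[text()='Close']";
    // Matches when any descendant text node is exactly "Close".
    static final String DESCENDANT_TEXT = "//div/button[descendant::text()='Close']";
    // Compares the button's whole string value after trimming and collapsing whitespace.
    static final String NORMALIZED = "//button[normalize-space()='Close']";

    static int matches(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(MARKUP)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches(DIRECT_TEXT));     // prints 1
        System.out.println(matches(DESCENDANT_TEXT)); // prints 2
        System.out.println(matches(NORMALIZED));      // prints 2
    }
}
```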

Selector for locating the second element not within the same parent

Given the following HTML:
<div>
<span class="my-class">One</span>
</div>
<div>
<span class="my-class">Two</span>
</div>
Can I use a CSS selector using the class attribute my-class to locate only the second span (<span class="my-class">Two</span>)? Note that the <span>s are in different <div>s.
I've tried .my-class:nth-child(2) and .my-class:nth-of-type(2), which do not work. Also, if possible, I would like to avoid using XPath.
You need to use :nth-child to target the second div and then find the class you want inside it:
div:nth-child(2) .my-class{
background: red;
}
<div>
<span class="my-class">One</span>
</div>
<div>
<span class="my-class">Two</span>
</div>
With the :nth-child and :nth-of-type pseudo-class selectors you are targeting siblings. In other words, the elements must have the same parent.
.my-class:nth-child(2) and .my-class:nth-of-type(2) will not work as expected because you are targeting the second element inside the div container, which doesn't exist.
Since both divs are siblings (children of body), consider targeting the second div:
div:nth-child(2) > .my-class
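To make the sibling rule concrete, the sketch below computes each element's :nth-child position by hand (1-based, counting only element siblings) over a DOM built with the JDK's XML parser. It is a simplified simulation of the two selectors, not a real CSS engine, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical simulation of ".my-class:nth-child(n)" vs "div:nth-child(n) > .my-class".
public class NthChildDemo {
    static final String MARKUP =
            "<body>"
            + "<div><span class='my-class'>One</span></div>"
            + "<div><span class='my-class'>Two</span></div>"
            + "</body>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // 1-based position among element siblings (text nodes are skipped), as :nth-child counts.
    static int nthChildIndex(Node el) {
        int index = 1;
        for (Node n = el.getPreviousSibling(); n != null; n = n.getPreviousSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                index++;
            }
        }
        return index;
    }

    // All elements whose class list contains "my-class", in document order.
    static List<Element> myClassElements(Document doc) {
        List<Element> result = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            if (Arrays.asList(el.getAttribute("class").split("\\s+")).contains("my-class")) {
                result.add(el);
            }
        }
        return result;
    }

    // ".my-class:nth-child(n)": the .my-class element itself must be child n of its parent.
    static List<String> selfIsNthChild(int n) {
        List<String> texts = new ArrayList<>();
        for (Element el : myClassElements(parse(MARKUP))) {
            if (nthChildIndex(el) == n) {
                texts.add(el.getTextContent());
            }
        }
        return texts;
    }

    // "div:nth-child(n) > .my-class": the parent must be a div that is child n.
    static List<String> parentDivIsNthChild(int n) {
        List<String> texts = new ArrayList<>();
        for (Element el : myClassElements(parse(MARKUP))) {
            Node parent = el.getParentNode();
            if ("div".equals(parent.getNodeName()) && nthChildIndex(parent) == n) {
                texts.add(el.getTextContent());
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(selfIsNthChild(2));      // prints []
        System.out.println(parentDivIsNthChild(2)); // prints [Two]
    }
}
```

Each span is the first (and only) child of its own div, which is why the selectors from the question find nothing; the divs, being siblings under the same parent, are children 1 and 2.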
In addition to @koala_dev's answer: if you want to avoid using XPath, you can give each span an id attribute and select it by that. If you generate the page with a scripting language, you can output the <span>s in a loop and set each id from the loop variable. For example:
#span-2{
background:red;
}
<div>
<span class="my-class" id="span-1">One</span>
</div>
<div>
<span class="my-class" id="span-2">Two</span>
</div>

Finding div containing text

I have the following HTML and a working XPath:
<div class="panel panel-default">
<div class="panel-heading"><h1>Text to find</h1></div>
<div class="panel-body">
<div>
...
</div>
</div>
</div>
XPath:
.//div[div[@class[contains(.,'panel-heading')]][.//*[text()='Text to find']]]
The XPath expression will select the outer <div>.
Now if I remove the <h1> tag, the XPath expression no longer finds the outer div. Can anyone explain why, and what to do instead if I want the same result in both cases?
That's because the .//* part returns descendant elements of the <div class="panel-heading">. When you remove the h1 tag, the text node 'Text to find' is no longer contained in any descendant element (it is now a direct child of the context element itself), so it can't be found using the expression .//*[text()='Text to find'].
To make it work with and without h1 element, you can alter the predicate expression mentioned above to .//text()[.='Text to find'] :
.//div[div[@class[contains(.,'panel-heading')]][.//text()[.='Text to find']]]
.//text() simply returns descendant text nodes from current context element.
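The following sketch runs both expressions against the markup with and without the <h1>, using the JDK's XPath 1.0 engine. The inner ... placeholder is kept as literal text so the markup parses as XML, and the class and constant names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: counts outer divs found by each expression.
public class PanelHeadingDemo {
    static final String WITH_H1 =
            "<div class='panel panel-default'>"
            + "<div class='panel-heading'><h1>Text to find</h1></div>"
            + "<div class='panel-body'><div>...</div></div>"
            + "</div>";
    static final String WITHOUT_H1 =
            "<div class='panel panel-default'>"
            + "<div class='panel-heading'>Text to find</div>"
            + "<div class='panel-body'><div>...</div></div>"
            + "</div>";

    // Original expression: needs an element below panel-heading whose text child matches.
    static final String ELEMENT_TEXT =
            ".//div[div[@class[contains(.,'panel-heading')]][.//*[text()='Text to find']]]";
    // Revised expression: any descendant text node with that value is enough.
    static final String TEXT_NODE =
            ".//div[div[@class[contains(.,'panel-heading')]][.//text()[.='Text to find']]]";

    static int count(String expr, String markup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(ELEMENT_TEXT, WITH_H1));    // prints 1
        System.out.println(count(ELEMENT_TEXT, WITHOUT_H1)); // prints 0
        System.out.println(count(TEXT_NODE, WITH_H1));       // prints 1
        System.out.println(count(TEXT_NODE, WITHOUT_H1));    // prints 1
    }
}
```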

XPath for getting nodes from HTML fragment based on element text content

I need an XPath expression for the following HTML fragment (DOM structure):
<div class="content">
<div class="product-compare-row">
<div class="spec-title half-size">Model</div>
<div class="spec-values half-size">
<span class="spec-value">kast</span>
</div>
</div>
So I need the kast value if the spec-title div contains Model.
I've tried //div[preceding-sibling::div[contains(.,"Model)")]] but that doesn't work.
The XPath you are looking for is:
//div[contains(@class, "spec-title") and contains(text(), "Model")]/following-sibling::div/span/text()
It is a little bit tricky to follow, but in plain English:
Select all div elements whose class contains spec-title and whose text contains 'Model'.
Select each such div's following sibling div elements.
Step down to their span children and return their text.
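A quick check with the JDK's javax.xml.xpath, assuming the fragment is closed with its missing </div> so it parses as XML (class and method names are hypothetical). It also evaluates the question's own preceding-sibling attempt with the stray ) removed from the "Model)" string literal; on this markup that version returns the same value, and it is wrapped in normalize-space() because real HTML usually puts whitespace around kast.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class for the spec-title / spec-value lookup.
public class SpecValueDemo {
    // The question's fragment with the missing closing </div> added.
    static final String MARKUP =
            "<div class='content'>"
            + "<div class='product-compare-row'>"
            + "<div class='spec-title half-size'>Model</div>"
            + "<div class='spec-values half-size'><span class='spec-value'>kast</span></div>"
            + "</div>"
            + "</div>";

    // String result of the expression ("" when a node-set is empty).
    static String eval(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(MARKUP)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // The answer's expression.
    static String specValue() {
        return eval("//div[contains(@class, 'spec-title') and contains(text(), 'Model')]"
                + "/following-sibling::div/span/text()");
    }

    // The question's attempt with the literal corrected from "Model)" to "Model".
    static String attemptWithoutTypo() {
        return eval("normalize-space(//div[preceding-sibling::div[contains(., 'Model')]])");
    }

    public static void main(String[] args) {
        System.out.println(specValue());          // prints kast
        System.out.println(attemptWithoutTypo()); // prints kast
    }
}
```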

How to differentiate two HTML elements with same tagname and same text

I would like to know whether there is any method in Jsoup that tells me that an element with the same tag name, text and class (if any) as another element is nevertheless a different element. For clarification, consider the following HTML snippet:
<html>
<body>
<div>Here I Am</div><div>First Time</div>
<div>Here I Am</div><div>Again</div>
</body>
</html>
In the code above, how can I distinguish the two div elements with the text Here I Am? Note that neither element has an id.
The example above is very simple, but the actual scenario may be more complex, so I would be grateful for a general answer. Thank you.
Give each element an id to make it unique within the document. This can be done as follows:
<div id="first">Here I Am</div>
<div id="second">Here I Am</div>
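In terms of the DOM and XPath, these nodes are identical except for their position in document order. If you want to access the n-th node matching this pattern, use
(//div[text()='Here I Am'])[n]
where n is 1-based: XPath positions start at 1, so [1] is the first match and [0] matches nothing. The parentheses make n count across the whole document; without them, //div[text()='Here I Am'][n] counts within each parent separately, which gives the same result here only because both divs share the same parent.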
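Positional predicates in XPath start at 1, and applying the index to a parenthesized expression counts matches in document order across the whole document. Below is a sketch with the JDK's XPath 1.0 engine (javax.xml.xpath, not Jsoup; class and method names are hypothetical) that tells the two identical divs apart by index and reads the div that follows each one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: picks the n-th "Here I Am" div by position.
public class NthMatchDemo {
    static final String MARKUP =
            "<html><body>"
            + "<div>Here I Am</div><div>First Time</div>"
            + "<div>Here I Am</div><div>Again</div>"
            + "</body></html>";

    static Object evaluate(String expr, javax.xml.namespace.QName type) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(MARKUP)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc, type);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Number of nodes selected by the n-th match (1 if it exists, 0 otherwise).
    static int countAt(int n) {
        NodeList nodes = (NodeList) evaluate(
                "(//div[text()='Here I Am'])[" + n + "]", XPathConstants.NODESET);
        return nodes.getLength();
    }

    // Text of the div right after the n-th (1-based) "Here I Am" div.
    static String textAfterMatch(int n) {
        return (String) evaluate(
                "(//div[text()='Here I Am'])[" + n + "]/following-sibling::div[1]",
                XPathConstants.STRING);
    }

    public static void main(String[] args) {
        System.out.println(countAt(0));        // prints 0 (positions are not 0-based)
        System.out.println(textAfterMatch(1)); // prints First Time
        System.out.println(textAfterMatch(2)); // prints Again
    }
}
```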
Get all div elements and select what you want.
Example
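import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
Document doc = Jsoup.parse("<html>\n" +
"<body>\n" +
"<div>Here I Am</div><div>First Time</div>\n" +
"<div>Here I Am</div><div>Again</div>\n" +
"</body>\n" +
"</html>");
// select("div") returns every div in document order; first() is the one at index 0
Element div = doc.select("div").first();
System.out.println(div.html());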
Output
Here I Am
Other elements are accessible by their 0-based index.
Example
Element again = doc.select("div").get(3); // index 3 is the fourth div
System.out.println(again.html());
Output:
Again
You could identify the divs. For example, you could do this:
<div class="div1"><h2>Here I Am</h2></div>
<div class="div2"><h2>First Time</h2></div>
<div class="div3"><h2>Here I Am</h2></div>
<div class="div4"><h2>Again</h2></div>