XPath selecting explicit element comparing on value - html

This is what I have tried so far:
//div[@id='information']//div[div=='Site']
//div[text()='Site']//span//a[@href]
I am fiddling with an XPath expression but it's not working out. I want to select the anchor's href attribute. That's no problem, but the anchor must come after a div with class h3 AND the value "Site".
<div id="information">
<div class="h3">Location</div>
<div class="h3">Site</div>
<span>
//Here is sometimes a <br/>
<a href='http://www.test.at'>Klick</a>
</span>
<div class="h3">Referenz</div>
<span>12345</span>
</div>
There can be arbitrarily many div elements inside the div with id="information" so selecting on index is not possible.

Something like this should work:
//div[@class = 'h3'][. = 'Site']/following-sibling::*/descendant-or-self::a/@href
This will extract the href attributes of all a tags that are after the "Site" div in document order but still contained within the same parent element (the "information" div in your example). If you're not bothered about that last bit, i.e. you want to include a tags that occur after the "information" div as well as inside it, then you can use the simpler
//div[@class = 'h3'][. = 'Site']/following::a/@href
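To check the first expression against the sample markup, here is a minimal sketch that uses only the JDK's built-in XPath 1.0 engine. The class name `SiteLinkDemo` and the helper `select` are made up for illustration, and the markup is assumed to be well-formed XML (real-world HTML would first need a tolerant parser).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SiteLinkDemo {
    // Well-formed copy of the markup from the question, with the optional <br/> present.
    static final String HTML =
        "<div id='information'>"
      + "<div class='h3'>Location</div>"
      + "<div class='h3'>Site</div>"
      + "<span><br/><a href='http://www.test.at'>Klick</a></span>"
      + "<div class='h3'>Referenz</div>"
      + "<span>12345</span>"
      + "</div>";

    // Hypothetical helper: evaluates the expression and returns the value of each selected node.
    static List<String> select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(HTML)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Selects the href of every <a> inside a later sibling of the "Site" div.
        System.out.println(select(
            "//div[@class = 'h3'][. = 'Site']/following-sibling::*/descendant-or-self::a/@href"));
    }
}
```

Because the expression matches on the div's class and text rather than on its position, it keeps working however many div elements are added to the "information" container.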

Related

XPath get element with text in child element

I would like to match a <button> element with a certain text, which is sometimes closed in another element within the button, eg.:
<div class="buttonset">
<button>Close</button>
</div>
<div class="buttonset">
<button>
<span>Close</span>
</button>
</div>
The XPath query //div[@class='modal-buttonset']/button[text()='Cancel'] only gives me the result from the top level.
How can I match the text on all levels?
Try the following:
//div/button[descendant::text()="Close"]
This XPath,
//button[normalize-space() = 'Close']
will select all button elements whose space-normalized string value is 'Close', regardless of any additional wrapper elements, as requested.
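The difference between the three predicates can be seen by counting matches, as in the sketch below. It relies only on the JDK's XPath engine; the class name `ButtonTextDemo`, the helper `count`, the `<root>` wrapper and the third button (with padded text) are assumptions added for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class ButtonTextDemo {
    // The two buttons from the question plus a hypothetical third one with padded text.
    static final String HTML =
        "<root>"
      + "<div class='buttonset'><button>Close</button></div>"
      + "<div class='buttonset'><button>\n  <span>Close</span>\n</button></div>"
      + "<div class='buttonset'><button><span> Close </span></button></div>"
      + "</root>";

    // Hypothetical helper: number of nodes the expression selects.
    static int count(String expression) {
        try {
            Double n = (Double) XPathFactory.newInstance().newXPath().evaluate(
                "count(" + expression + ")",
                new InputSource(new StringReader(HTML)),
                XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only direct text children are compared.
        System.out.println(count("//button[text() = 'Close']"));
        // Any descendant text node, but it must equal 'Close' exactly.
        System.out.println(count("//button[descendant::text() = 'Close']"));
        // Whole string value with surrounding whitespace collapsed.
        System.out.println(count("//button[normalize-space() = 'Close']"));
    }
}
```

On this sample the three expressions match one, two and three buttons respectively, which is why normalize-space() is the most forgiving choice when the markup around the text varies.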

Finding div containing text

I have the following HTML and a working XPath expression:
<div class="panel panel-default">
<div class="panel-heading"><h1>Text to find</h1></div>
<div class="panel-body">
<div>
...
</div>
</div>
</div>
XPath:
.//div[div[@class[contains(.,'panel-heading')]][.//*[text()='Text to find']]]
The XPath expression will select the outer <div>.
Now if I remove the <h1> tag, the XPath expression no longer finds the outer div. Can anyone explain why, and what I should do instead to get the same result in both cases?
That's because the .//* part returns descendant elements of the <div class="panel-heading">. When you remove the h1 tag, the text node 'Text to find' is no longer contained in any descendant element (it is now a direct child of the context element), hence it can't be found using the expression .//*[text()='Text to find'].
To make it work with and without h1 element, you can alter the predicate expression mentioned above to .//text()[.='Text to find'] :
.//div[div[@class[contains(.,'panel-heading')]][.//text()[.='Text to find']]]
.//text() simply returns descendant text nodes from current context element.
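The effect of the two predicates can be compared on both variants of the markup with a short sketch using the JDK's XPath engine. The class name `PanelTextDemo`, its constants and the helper `count` are illustrative assumptions, not part of the original question.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class PanelTextDemo {
    // Markup from the question, with and without the <h1> wrapper.
    static final String WITH_H1 =
        "<div class='panel panel-default'>"
      + "<div class='panel-heading'><h1>Text to find</h1></div>"
      + "<div class='panel-body'><div>...</div></div>"
      + "</div>";
    static final String WITHOUT_H1 =
        "<div class='panel panel-default'>"
      + "<div class='panel-heading'>Text to find</div>"
      + "<div class='panel-body'><div>...</div></div>"
      + "</div>";

    // Original predicate: looks for the text only inside descendant elements.
    static final String ELEMENT_TEXT =
        ".//div[div[@class[contains(.,'panel-heading')]][.//*[text()='Text to find']]]";
    // Revised predicate: looks at every descendant text node.
    static final String ANY_TEXT =
        ".//div[div[@class[contains(.,'panel-heading')]][.//text()[.='Text to find']]]";

    // Hypothetical helper: number of div elements the expression selects in the given markup.
    static int count(String xml, String expression) {
        try {
            Double n = (Double) XPathFactory.newInstance().newXPath().evaluate(
                "count(" + expression + ")",
                new InputSource(new StringReader(xml)),
                XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(WITH_H1, ELEMENT_TEXT));
        System.out.println(count(WITHOUT_H1, ELEMENT_TEXT));
        System.out.println(count(WITH_H1, ANY_TEXT));
        System.out.println(count(WITHOUT_H1, ANY_TEXT));
    }
}
```

Only the second call finds nothing: once the h1 is gone, the text is a direct child of the panel-heading div, so no descendant element carries it.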

XPath for getting nodes from HTML fragment based on element text content

I need an XPath expression for the following HTML fragment (DOM structure):
<div class="content">
<div class="product-compare-row">
<div class="spec-title half-size">Model</div>
<div class="spec-values half-size">
<span class="spec-value">kast</span>
</div>
</div>
So I need the kast value if the spec-title div contains Model.
I've tried //div[preceding-sibling::div[contains(.,"Model)")]] but that doesn't work.
The XPath you are looking for is:
//div[contains(@class, "spec-title") and contains(text(), "Model")]/following-sibling::div/span/text()
It is a little bit tricky to follow, but in plain English:
Select all div elements that have the class spec-title and whose text contains 'Model'.
Find each such div's following siblings that are div elements.
Traverse to any of their children that are span elements and return their text.
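The three steps above can be tried out with the JDK's XPath engine, as sketched below. The class name `SpecValueDemo`, the helper `select` and the second "Color" row are assumptions added to show that only the "Model" row is picked up; the fragment is also closed so that it parses as XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SpecValueDemo {
    // Fragment from the question plus a hypothetical second row for contrast.
    static final String HTML =
        "<div class='content'>"
      + "<div class='product-compare-row'>"
      + "<div class='spec-title half-size'>Model</div>"
      + "<div class='spec-values half-size'><span class='spec-value'>kast</span></div>"
      + "</div>"
      + "<div class='product-compare-row'>"
      + "<div class='spec-title half-size'>Color</div>"
      + "<div class='spec-values half-size'><span class='spec-value'>white</span></div>"
      + "</div>"
      + "</div>";

    // Hypothetical helper: returns the value of every text node the expression selects.
    static List<String> select(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(HTML)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(
            "//div[contains(@class, 'spec-title') and contains(text(), 'Model')]"
          + "/following-sibling::div/span/text()"));
    }
}
```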

Can I 'style' and use a label as a 'normal' element

I have discovered a way to use input and label elements as an accordion viewer.
To center my elements vertically, I use the label as if it were a div: I give it display:table and put a div inside it.
So I have :
<div>
<input id='myid'>
<label for='myid' style='display:table'>
<div style='display:table-cell'>
<img ....... >
thetextforthelabel
</div>
</label>
</div>
Ok, this works fine.
My question is: am I doing something 'forbidden'?
Can I use the label tag as a container?
I know that it may be unorthodox, but it works for me.
Your code is invalid.
The problem is that div elements can only be used "where flow content is expected".
However, the content model of label elements is "phrasing content, but with no descendant labelable elements unless it is the element's labeled control, and no descendant label elements".
That said, it will probably work, because (unlike e.g. p elements) the end tag of label elements can't be omitted: "Neither tag is omissible."
However, I'm not sure of the advantage of having a table element with a single cell. Consider using the following instead:
<div>
<input id='myid'>
<label for='myid' style='display:block'>
<img ....... >
thetextforthelabel
</label>
</div>
Yes, it is forbidden by the formal rules of HTML. And yes, it works, and the parsing rules of HTML mean that it must work. So this is different from, say, the rule that says that a p element must not contain a div element; that rule is enforced by the parsing process (the p element is implicitly closed when <div> is encountered).
On the other hand, if the content is just an image and text, you don’t need a div element but can use span. In rendering, it does not matter (with the usual CSS caveats) which one you select, since their only difference in rendering is with the default display value, and you are assigning a display value anyway.
<div>
<input id='myid'>
<label for='myid' style='display:table'>
<span style='display:table-cell'>
<img src="http://lorempixel.com/100/50" alt="(an image)">
thetextforthelabel
</span>
</label>
</div>

How to differentiate two HTML elements with same tagname and same text

I would like to know whether Jsoup has any method that tells me that an element is different from another element with the same tag name, text and class (if any). For clarification, consider the following HTML snippet:
<html>
<body>
<div>Here I Am</div><div>First Time</div>
<div>Here I Am</div><div>Again</div>
</body>
</html>
In the code above, how can I tell apart the two div elements with the text Here I Am? Note that the two elements have no id.
The example above is very simple, but the actual scenario may be more complex, so I would be really grateful for a generalized answer. Thank you.
Give each element an id, which must be unique within the document. This can be done as follows:
<div id="first">Here I Am</div>
<div id="second">Here I Am</div>
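In terms of the DOM and XPath, these nodes are identical except for their position in document order. To access the n-th node matching this pattern, use
(//div[text()='Here I Am'])[n]
where n is 1-based. The parentheses make the index apply to the whole result set rather than separately within each parent element.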
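As a quick check of positional selection, the sketch below uses the JDK's XPath 1.0 engine, in which positions count from 1, to fetch the div that follows each "Here I Am" div. The class name `NthDivDemo` and the helper `textAfter` are hypothetical; with Jsoup itself the index-based access on the selected elements serves the same purpose.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class NthDivDemo {
    // Snippet from the question, written on one line.
    static final String HTML =
        "<html><body>"
      + "<div>Here I Am</div><div>First Time</div>"
      + "<div>Here I Am</div><div>Again</div>"
      + "</body></html>";

    // Hypothetical helper: text of the div right after the n-th "Here I Am" div.
    // Returns an empty string when no such div exists.
    static String textAfter(int n) {
        try {
            String expression =
                "(//div[text()='Here I Am'])[" + n + "]/following-sibling::div[1]";
            return XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(HTML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textAfter(1)); // neighbour of the first match
        System.out.println(textAfter(2)); // neighbour of the second match
        System.out.println(textAfter(0).isEmpty()); // position 0 never matches
    }
}
```

The two otherwise identical divs are told apart purely by their position in document order, which is the only property that distinguishes them here.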
Get all div elements and select what you want.
Example
Document doc = Jsoup.parse("<html>\n" +
"<body>\n" +
"<div>Here I Am</div><div>First Time</div>\n" +
"<div>Here I Am</div><div>Again</div>\n" +
"</body>\n" +
"</html>");
Element div = doc.select("div").first();
System.out.println(div.html());
Output
Here I Am
Other elements are accessible by index.
Example
Element lastDiv = doc.select("div").get(3);
System.out.println(lastDiv.html());
Output:
Again
You could give the divs distinguishing classes. For example, you could do this:
<div class="div1"><h2>Here I Am</h2></div>
<div class="div2"><h2>First Time</h2></div>
<div class="div3"><h2>Here I Am</h2></div>
<div class="div4"><h2>Again</h2></div>