I am using Selenium WebDriver and I have a question about XPath.
Suppose I have the following HTML:
<div>
<div>
<div>
<a>
<div>
</div>
</a>
</div>
</div>
</div>
I want to locate the last (innermost) <div>. I think I have two options with XPath.
First option is with single slash:
driver.findElement(By.xpath("/div/div/div/a/div")).click();
Second option is using a double slash (this is the part I am unsure about):
driver.findElement(By.xpath("//a/div")).click();
Is it going to search in the <a> directly? What happens if the HTML example was just part of a bigger document, and this bigger document contains more <a> elements? Where would this method look exactly?
What happens for example if I do it like this:
driver.findElement(By.xpath("//div")).click();
Would it look at every <div> found in the HTML code?
First of all, avoiding // is usually the right thing to do - so, the first expression you show is perfect.
Would it look at every <div> found in the HTML code?
Yes, exactly. An XPath expression like
//div
will select all div elements in the document, regardless of where they are.
What happens if the HTML example was just part of a bigger document, and this bigger document contains more <a> elements? Where would this method look exactly?
Then, let us make the HTML "bigger":
<div>
<a>
<p>C</p>
</a>
<div>
<div>
<a>
<div>A</div>
</a>
</div>
<a>
<div>B</div>
</a>
</div>
</div>
As you can see, I have added two more a elements - only one of them contains a div element. Assuming this new document as the input, there will now be a difference between
/div/div/div/a/div
which will select only <div>A</div> as the result, and
//a/div
which will select both <div>A</div> and <div>B</div>, because the exact position of a in the tree is now irrelevant. Neither expression selects anything from the first a element, which contains only a p.
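To check these counts outside a browser, here is a minimal sketch that uses the JDK's built-in javax.xml.xpath in place of Selenium. It assumes the "bigger" HTML above can be parsed as well-formed XML; the class name PathCounts and the helper count are made up for illustration and are not part of any library.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not part of Selenium.
class PathCounts {
    // The "bigger" document from above, written as one well-formed XML string.
    static final String DOC =
        "<div><a><p>C</p></a><div><div><a><div>A</div></a></div>"
        + "<a><div>B</div></a></div></div>";

    // Returns how many nodes the XPath expression selects in DOC.
    static int count(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(DOC)),
                          XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/div/div/div/a/div")); // 1: only <div>A</div>
        System.out.println(count("//a/div"));            // 2: <div>A</div> and <div>B</div>
        System.out.println(count("//div"));              // 5: every div in the document
    }
}
```

Two caveats when carrying this over to Selenium: findElement returns only the first match in document order (here <div>A</div> for //a/div), while findElements returns all of them; and in a real page the root element is html, so an absolute path has to start from /html/body rather than /div.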
Straight to the point: can you make something like this in pure CSS?
This is a table with four columns, where:
1. certain rows can be collapsed (~ <details>)
2. the first column only is indented
3. some columns have a dynamic width ( ~ flex-grow , grid: 1fr )
4. some control over the dynamic width
The underlying HTML structure would look something like
<entry>
<A> example </A>
<B> example </B>
<C> example </C>
<D> example </D>
<sub-entries>
...
</sub-entries>
</entry>
Using just CSS grids and <details> I can get requirements 1, 3 and 4, but not 2.
As a side note, for my specific case the HTML structure can be generated from code.
EDIT
Example of requirements 1,3,4: https://codepen.io/samcoutteauhybrid/pen/XWYpEZe
The point isn't to make this example work, but rather to find a way to display some data in a table (not necessarily <table>). So the starting point would be some JSON-esque object which needs to be converted to HTML.
Using a custom CSS property for depth was the right approach. Here's how you could implement it.
In your pen, make this change to your CSS:
.sub-entries {
padding-left: calc(1em * var(--depth,1));
}
Here var() resolves to 1 if --depth is not defined. You may want to use 0 as the fallback instead, so that there is no default indentation for sub-entries.
To set the indent, the markup has to be updated to
<div class="sub-entries" style="--depth: 1">
<div class="entry">
<div class="name">indent me</div>
<div class="type">don't indent me</div>
<div class="sub-entries" style="--depth: 2">
<div class="entry">
<div class="name">indent me further</div>
</div>
</div>
</div>
</div>
PS: it turns out this cannot be done with <table>, because only <tr>, <th> and <td> are valid children and the parser discards everything else.
So I was searching for ways to match and replace any html tag with a specific class name, but it seemed that, as I feared, it's not possible with RegEx (at least not perfectly).
However, I decided to code a solution that would fit my needs, and I figured out a way to actually match tags of the same level with this pattern:
/^(\t*)<([^>]{1,10}) [^>]*class="(?:[^"]+\s|)myclass.*".*>(.*)^\1<\/\2>/gUsm
(depending on your language, you may use \ or $ to identify a backreference)
I tested it on regex101.com (with global, Ungreedy, single line and multi line options enabled) and used this html code :
<p>Lorem ipsum</p>
<div id="foo" class="chosemyclass" data="stuff" >
<div>Hello World!</div>
<span></span>
</div>
<div></div>
<div id="nombinouche" class="chose myclass" data="stuff" >
<div>Hello World!</div>
<span></span>
</div>
<div></div>
<div id="bleh" class="myclass bob" data="moar">
<div>Hello World!</div>
<span></span>
</div>
<div>
</div>
<a class="chose myclass" href="https://www.youtube.com/watch?v=j8PxqgliIno" >
<a>Hello World!</a>
<span>of course it's a rickroll</span>
</a>
<a>
</a>
<div><h1 class="myclass">This content can't be captured because the tag doesn't start after the indentation
</h1></div>
<h1 class="myclass">This content can't be captured because the tag ends on the same line</h1>
<h1 class="myclass">However it captures it if the next line does
</h1>
I use the 2nd capture group to return the tag and the 3rd to return its content; the 1st capture group is used to match the same indentation preceding the opening tag.
What each part does:
^(\t*) captures the indentation, which makes the difference between the tag's own closing tag and its children's closing tags (only when starting a line)
<([^>]{1,10}) identifies the html tag (longest being 10 characters) which doesn't close immediately
[^>]*class=" verifies if it has a class before the tag closes
(?:[^"]+\s|) verifies if it has another class before the desired class and if not, makes sure there's a whitespace before the desired class (so you don't end up selecting otherstuff when searching stuff)
myclass.*".*> searches for your class name and confirms the end of the opening tag
(.*) returns the tag's content
^\1<\/\2> verifies if the indentation at the start of a line and the closing tag matches the opening
but...
While this pattern fits my needs, it isn't perfect:
It is absolutely not working with minified code
It might not work or recognize the correct tag if the code is not well indented (check out DirtyMarkup if it helps)
Tags that end on the same line are not recognized (though I'm sure this can be remedied in RegEx)
Unrecognized tags' content and closing tag such as above will be wrongly returned as content if a later tag of the same indentation is correctly recognized.
Can you improve on my solution?
I need an XPath expression for the following HTML fragment (DOM structure)
<div class="content">
<div class="product-compare-row">
<div class="spec-title half-size">Model</div>
<div class="spec-values half-size">
<span class="spec-value">kast</span>
</div>
</div>
So I need the kast value if the spec-title div contains Model.
I've tried //div[preceding-sibling::div[contains(.,"Model)")]] but that doesn't work.
The XPath you are looking for is:
//div[contains(@class, "spec-title") and contains(text(), "Model")]/following-sibling::div/span/text()
It is a little bit tricky to follow, but in plain English:
Select all div elements whose class attribute contains spec-title and whose text contains 'Model'.
From each of those, find the following siblings that are div elements.
Step into any span children of those siblings and return their text.
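As a quick check of the expression, here is a minimal sketch using the JDK's javax.xml.xpath. It assumes the fragment from the question is completed with the missing closing </div> so that it parses as XML; SpecValueLookup, MODEL_VALUE and firstText are illustrative names only.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration.
class SpecValueLookup {
    // The fragment from the question, with the missing closing </div> added.
    static final String DOC =
        "<div class=\"content\">"
        + "<div class=\"product-compare-row\">"
        + "<div class=\"spec-title half-size\">Model</div>"
        + "<div class=\"spec-values half-size\">"
        + "<span class=\"spec-value\">kast</span>"
        + "</div></div></div>";

    // The expression from the answer, with single quotes inside the Java string.
    static final String MODEL_VALUE =
        "//div[contains(@class, 'spec-title') and contains(text(), 'Model')]"
        + "/following-sibling::div/span/text()";

    // Evaluates the expression against DOC and returns the string value of
    // the first selected node, or an empty string when nothing is selected.
    static String firstText(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(DOC)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstText(MODEL_VALUE)); // kast
    }
}
```

If no spec-title div contains the requested text, the string result is simply empty, so a caller can treat "" as "not found".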
I have the following HTML snippet:
<div id="result-1">
<div class="page">
<div class="collapsingblock">
<h4>Click Me</h4>
</div>
<div class="collapsingblock collapsed">
<h4>No, Click Me</h4>
</div>
</div>
</div>
What I'm trying to do is find the second collapsingblock and its h4.
I have the following:
(//div[@id="result-1"]/div[@class="page"]/div[@class="collapsingblock"])[2]/h4
My XPath doesn't return the element. If I replace [2] with [1], it finds the first instance of collapsingblock, though.
Any ideas?
Thanks
UPDATE:
I have just noticed that the page uses JavaScript to add/remove an additional class, collapsed, on the second collapsingblock.
The problem is that the value of the class attribute of the second inner div element is not equal to "collapsingblock", as you can see:
<div class="collapsingblock collapsed">
<h4>No, Click Me</h4>
</div>
Even though class has very clear-cut semantics in HTML, it does not mean anything special to XPath; it is an attribute like any other, and @class="collapsingblock" compares the whole attribute value.
Use contains() to avoid this problem:
(//div[@id="result-1"]/div[@class="page"]/div[contains(@class,"collapsingblock")])[2]/h4
Then, the only result of the expression above is
<h4>No, Click Me</h4>
By the way, parentheses around the left-hand part of the expression are not necessary in this case:
//div[@id="result-1"]/div[@class="page"]/div[contains(@class,"collapsingblock")][2]/h4
will do exactly the same, given this particular input document.
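The difference between the exact match and contains() can be reproduced with a minimal sketch built on the JDK's javax.xml.xpath. The class name ClassPredicates and its constants are made up for illustration. The last expression, TOKEN, is not from the answer: it is a common XPath 1.0 idiom that matches a whole class token, which may be preferable because contains(@class, ...) would also match a longer name that merely contains the text (for example a hypothetical class like collapsingblock-header).

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration.
class ClassPredicates {
    // The snippet from the question as one well-formed XML string.
    static final String DOC =
        "<div id=\"result-1\"><div class=\"page\">"
        + "<div class=\"collapsingblock\"><h4>Click Me</h4></div>"
        + "<div class=\"collapsingblock collapsed\"><h4>No, Click Me</h4></div>"
        + "</div></div>";

    static final String PREFIX = "//div[@id='result-1']/div[@class='page']/div";

    // Exact comparison: fails for the second block, whose class has two tokens.
    static final String EXACT = "(" + PREFIX + "[@class='collapsingblock'])[2]/h4";
    // Substring comparison, with and without the outer parentheses.
    static final String CONTAINS =
        "(" + PREFIX + "[contains(@class,'collapsingblock')])[2]/h4";
    static final String NO_PARENS =
        PREFIX + "[contains(@class,'collapsingblock')][2]/h4";
    // Whole-token comparison (an addition, not part of the original answer).
    static final String TOKEN = PREFIX
        + "[contains(concat(' ', normalize-space(@class), ' '), ' collapsingblock ')][2]/h4";

    // Returns the string value of the first selected node, or "" if none.
    static String firstText(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(DOC)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + firstText(EXACT) + "]");     // [] : nothing selected
        System.out.println("[" + firstText(CONTAINS) + "]");  // [No, Click Me]
        System.out.println("[" + firstText(NO_PARENS) + "]"); // [No, Click Me]
        System.out.println("[" + firstText(TOKEN) + "]");     // [No, Click Me]
    }
}
```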
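The parentheses are necessary in the general case because of precedence: a positional predicate such as [2] applies to the last step, so without parentheses it counts matches per parent element rather than across the whole result set:
(//div[@id="result-1"]/div[@class="page"]/div[@class="collapsingblock"])[2]/h4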
I have a block of HTML and need to get data out of it. I have tried different XPath expressions, but with no success.
<div>
<div class="some_class">
<a title="id" href="some_href">
<nobr>1<br>
</a>
</div>
<div class="some_other_class">
<a title="name" href="some_href">
<nobr>John<br>
</a>
</div>
</div>
<div>
<div class="some_class">
<a title="id" href="some_href">
<nobr>2<br>
</a>
</div>
<div class="some_other_class">
<a title="name" href="some_href">
<nobr>John<br>
</a>
</div>
</div>
// and many blocks like this
So, these div blocks are identical in structure and differ only in the content of their sub-elements. I need an XPath query that gets John's href from the block whose <a title="id"> is equal to 1.
I've tried something like this:
//div[./div/nobr='1' AND ./div/nobr='John']
to get only the div that contains the data I need; from there it wouldn't be hard to get John's href.
Also, I've managed to get John's href with:
//a[./nobr='John'][@title='name']/@href
but that way the result doesn't depend on the value of the <a title="id"...> element, and it has to depend on it.
Any suggestions?
I think what you want is
//div/div[a/@title='id']/following-sibling::div[1]/a/@href
which, given a well-formed input document, will return (individual results separated by --------):
href="some_href"
-----------------------
href="some_href"
You did not explain it very clearly though, as kjhughes has noted, and perhaps your sample HTML is not ideal.
Regarding your attempted path expressions, as the input is HTML, it is hard to know whether
<nobr>John<br>
means that "John" is inside the nobr element or not.
Thanks Mathias, your example was helpful, but as there are many elements with @title='id', it isn't a reliable solution that will always catch the right elements.
I've managed to build a workaround: first select the whole div, and then extract the href I need from it.
//div[./div/a[@title='name']/nobr='John' and ./div/a[@title='id']/nobr='1']
//a[./nobr='John'][@title='name']/@href
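The two steps of the workaround can also be chained into a single path. The sketch below is only an illustration under several assumptions: the input is rewritten as well-formed XML (each nobr is closed around its text, and a body wrapper is added as a single root), the href values are invented so the two blocks can be told apart, and HrefForId and its members are hypothetical names. It uses the JDK's javax.xml.xpath.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration.
class HrefForId {
    // Assumed well-formed version of the sample; href values are invented.
    static final String DOC =
        "<body>"
        + "<div><div class=\"some_class\"><a title=\"id\" href=\"id_1_href\">"
        + "<nobr>1</nobr></a></div>"
        + "<div class=\"some_other_class\"><a title=\"name\" href=\"john_1_href\">"
        + "<nobr>John</nobr></a></div></div>"
        + "<div><div class=\"some_class\"><a title=\"id\" href=\"id_2_href\">"
        + "<nobr>2</nobr></a></div>"
        + "<div class=\"some_other_class\"><a title=\"name\" href=\"john_2_href\">"
        + "<nobr>John</nobr></a></div></div>"
        + "</body>";

    // Workaround from above, chained: pick the block, then the name link's href.
    static final String CHAINED =
        "//div[div/a[@title='name']/nobr='John' and div/a[@title='id']/nobr='1']"
        + "/div/a[@title='name']/@href";

    // Variant of the earlier answer with the id value added to the predicate.
    static final String SIBLING =
        "//div/div[a[@title='id']/nobr='1']/following-sibling::div[1]/a/@href";

    // Returns the string value of the first selected node, or "" if none.
    static String firstText(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(DOC)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstText(CHAINED)); // john_1_href
        System.out.println(firstText(SIBLING)); // john_1_href
    }
}
```

With real-world HTML the nobr and br tags may not be nested this way after parsing, so the nobr='John' comparison is worth verifying against the DOM the parser actually produces.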