I have the following HTML snippet:
<div id="result-1">
<div class="page">
<div class="collapsingblock">
<h4>Click Me</h4>
</div>
<div class="collapsingblock collapsed">
<h4>No, Click Me</h4>
</div>
</div>
</div>
What I'm trying to do is find the second collapsingblock and its h4.
I have the following:
(//div[@id="result-1"]/div[@class="page"]/div[@class="collapsingblock"])[2]/h4
My XPath doesn't return the element. If I replace [2] with [1], it finds the first instance of collapsingblock, though.
Any ideas?
Thanks
UPDATE:
I have just noticed that the HTML uses JavaScript to add/remove an additional class, collapsed, on the second collapsingblock.
The problem is that the value of the class attribute of the second inner div element is not equal to "collapsingblock", as you can see:
<div class="collapsingblock collapsed">
<h4>No, Click Me</h4>
</div>
Even though class has very clear-cut semantics in HTML, it does not mean anything special to XPath: it is an attribute like any other, so @class="collapsingblock" compares against the whole attribute value.
Use contains() to avoid this problem:
(//div[@id="result-1"]/div[@class="page"]/div[contains(@class,"collapsingblock")])[2]/h4
Then, the only result of the expression above is
<h4>No, Click Me</h4>
By the way, parentheses around the lefthand part of the expression are not necessary in this case:
//div[@id="result-1"]/div[@class="page"]/div[contains(@class,"collapsingblock")][2]/h4
will do exactly the same, given this particular input document.
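The difference between comparing the whole class attribute and looking for one class inside it can be reproduced in a short Python sketch. It is only an illustration: it uses the standard-library xml.etree.ElementTree module, which supports a limited XPath subset without contains(), so the class check is done in Python instead. The variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, parsed as XML.
DOC = """
<div id="result-1">
  <div class="page">
    <div class="collapsingblock"><h4>Click Me</h4></div>
    <div class="collapsingblock collapsed"><h4>No, Click Me</h4></div>
  </div>
</div>
"""

root = ET.fromstring(DOC)
page = root.find("./div[@class='page']")

# Exact comparison, like @class="collapsingblock": the second div is missed
# because its attribute value is "collapsingblock collapsed".
exact = page.findall("./div[@class='collapsingblock']")

# Token comparison: split the attribute on whitespace and look for the class.
by_token = [d for d in page.findall("./div")
            if "collapsingblock" in d.get("class", "").split()]

second_heading = by_token[1].find("h4").text
print(len(exact), len(by_token), second_heading)
```

Note that contains(@class,"collapsingblock") is a substring test, so it would also accept a class such as "notcollapsingblock"; the whitespace split used here is slightly stricter.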
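The parentheses are necessary because of precedence: a predicate binds more tightly than the path steps, so without the parentheses [2] is evaluated per parent rather than against the whole result:
(//div[@id="result-1"]/div[@class="page"]/div[@class="collapsingblock"])[2]/h4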
So I was searching for ways to match and replace any html tag with a specific class name, but it seemed that, as I feared, it's not possible with RegEx (at least not perfectly).
However, I decided to code a solution that would fit my needs, and I figured out a way to match tags at the same indentation level with this pattern:
/^(\t*)<([^>]{1,10}) [^>]*class="(?:[^"]+\s|)myclass.*".*>(.*)^\1<\/\2>/gUsm
(depending on your language, you may use \ or $ to identify a backreference)
I tested it on regex101.com (with global, Ungreedy, single line and multi line options enabled) and used this html code :
<p>Lorem ipsum</p>
<div id="foo" class="chosemyclass" data="stuff" >
<div>Hello World!</div>
<span></span>
</div>
<div></div>
<div id="nombinouche" class="chose myclass" data="stuff" >
<div>Hello World!</div>
<span></span>
</div>
<div></div>
<div id="bleh" class="myclass bob" data="moar">
<div>Hello World!</div>
<span></span>
</div>
<div>
</div>
<a class="chose myclass" href="https://www.youtube.com/watch?v=j8PxqgliIno" >
<a>Hello World!</a>
<span>of course it's a rickroll</span>
</a>
<a>
</a>
<div><h1 class="myclass">This content can't be captured because the tag doesn't start after the indentation
</h1></div>
<h1 class="myclass">This content can't be captured because the tag ends on the same line</h1>
<h1 class="myclass">However it captures it if the next line does
</h1>
I use the 2nd capture group to return the tag and the 3rd to return its content; the 1st capture group is used to match the same indentation preceding the opening tag.
What each part does:
^(\t*) captures the leading indentation, which is what tells the tag's own closing tag apart from its children's closing tags (only when the tag starts a line)
<([^>]{1,10}) identifies the html tag (longest being 10 characters) which doesn't close immediately
[^>]*class=" verifies if it has a class before the tag closes
(?:[^"]+\s|) verifies if it has another class before the desired class and if not, makes sure there's a whitespace before the desired class (so you don't end up selecting otherstuff when searching stuff)
myclass.*".*> searches for your class name and confirms the end of the opening tag
(.*) returns the tag's content
^\1<\/\2> verifies if the indentation at the start of a line and the closing tag matches the opening
but...
While this pattern fits my needs, it isn't perfect:
It is absolutely not working with minified code
It might not work or recognize the correct tag if the code is not well indented (check out DirtyMarkup if it helps)
Tags that end on the same line are not recognized (though I'm sure this can be remedied in RegEx)
Unrecognized tags' content and closing tag such as above will be wrongly returned as content if a later tag of the same indentation is correctly recognized.
Can you extrapolate on my solution?
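One way to sidestep the indentation and same-line limitations listed above is to drop the regex and let an HTML parser track the nesting. The sketch below is a different technique from the pattern in the post, not an extension of it: it uses Python's standard-library html.parser, and the class name ClassCollector, the VOID set and the sample string are all made up for illustration. It collects only the text content of matching elements, not their inner markup.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class ClassCollector(HTMLParser):
    """Collect (tag, text) for every element that carries a given class token."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.stack = []    # names of the elements that are currently open
        self.active = []   # (depth, tag, text parts) for matched, still-open elements
        self.results = []  # (tag, text) per matched element, in closing order

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        self.stack.append(tag)
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted in classes:  # whole-token match: "chosemyclass" does not count
            self.active.append((len(self.stack), tag, []))

    def handle_data(self, data):
        for _, _, parts in self.active:
            parts.append(data)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag, ignore it
        while self.stack:
            depth = len(self.stack)
            closed = self.stack.pop()
            if self.active and self.active[-1][0] == depth:
                _, matched_tag, parts = self.active.pop()
                self.results.append((matched_tag, "".join(parts).strip()))
            if closed == tag:
                break


SAMPLE = ('<div class="chosemyclass">no</div>'
          '<div class="chose myclass"><div>Hello World!</div><span></span></div>'
          '<h1 class="myclass">ends on the same line</h1>')

collector = ClassCollector("myclass")
collector.feed(SAMPLE)
collector.close()
print(collector.results)
```

Because the parser works on tags rather than lines, the sample is deliberately minified and still yields the two elements whose class list contains myclass as a whole word.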
I have the following HTML and XPath working:
<div class="panel panel-default">
<div class="panel-heading"><h1>Text to find</h1></div>
<div class="panel-body">
<div>
...
</div>
</div>
</div>
XPath:
.//div[div[@class[contains(.,'panel-heading')]][.//*[text()='Text to find']]]
The XPath expression will select the outer <div>.
Now if I remove the <h1> tag, the XPath expression no longer finds the outer div. Can anyone explain why, and what to do instead if I want to get the same result in both cases?
That's because the .//* part returns descendant elements of the <div class="panel-heading">. When you remove the h1 tag, the text node 'Text to find' is no longer contained in any descendant element (it is now a direct child of the context element), hence it can't be found using the expression .//*[text()='Text to find'].
To make it work with and without h1 element, you can alter the predicate expression mentioned above to .//text()[.='Text to find'] :
.//div[div[@class[contains(.,'panel-heading')]][.//text()[.='Text to find']]]
.//text() simply returns descendant text nodes from current context element.
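The same distinction can be sketched in Python. This is only an approximation of the two XPath predicates: the standard-library xml.etree.ElementTree module has no text() node test, so the helper functions below (their names are made up for this example) imitate "some descendant element has this text" and "some descendant text node equals this text".

```python
import xml.etree.ElementTree as ET

WITH_H1 = ('<div class="panel panel-default">'
           '<div class="panel-heading"><h1>Text to find</h1></div>'
           '<div class="panel-body"><div>...</div></div></div>')
WITHOUT_H1 = ('<div class="panel panel-default">'
              '<div class="panel-heading">Text to find</div>'
              '<div class="panel-body"><div>...</div></div></div>')


def descendant_element_has_text(heading, wanted):
    # Rough analogue of .//*[text()='...']: only elements below the heading
    # are inspected. (el.text is just the leading text of each element.)
    return any(el.text == wanted for el in heading.iter() if el is not heading)


def descendant_text_node_matches(heading, wanted):
    # Rough analogue of .//text()[.='...']: every text node inside the heading,
    # including text that is a direct child of the heading itself.
    return any(text == wanted for text in heading.itertext())


results = {}
for label, doc in (("with h1", WITH_H1), ("without h1", WITHOUT_H1)):
    heading = ET.fromstring(doc).find("./div[@class='panel-heading']")
    results[label] = (descendant_element_has_text(heading, "Text to find"),
                      descendant_text_node_matches(heading, "Text to find"))
print(results)
```

Only the text-node version reports a match in both documents, which mirrors why the .//text() form of the expression keeps working after the h1 is removed.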
I am using Selenium WebDriver and I have a question about XPath.
If I have the following code example:
<div>
<div>
<div>
<a>
<div>
</div>
</a>
</div>
</div>
</div>
And I want to locate the element which is in the last <div>. I think I have 2 options with the xpath.
First option is with single slash:
driver.findElement(By.xpath("/div/div/div/a/div")).click();
Second option is using double slash (and here is where I have the doubt).
driver.findElement(By.xpath("//a/div")).click();
Is it going to search in the <a> directly? What happens if the HTML example was just part of a bigger document that contains more <a> elements? Where exactly would this expression look?
What happens for example if I do it like this:
driver.findElement(By.xpath("//div")).click();
Would it look at every <div> found in the HTML code?
First of all, avoiding // is usually the right thing to do - so, the first expression you show is perfect.
Would it look at every <div> found in the HTML code?
Yes, exactly. An XPath expression like
//div
will select all div elements in the document, regardless of where they are.
What happens if the HTML example was just part of a bigger document that contains more <a> elements? Where exactly would this expression look?
Then, let us make the HTML "bigger":
<div>
<a>
<p>C</p>
</a>
<div>
<div>
<a>
<div>A</div>
</a>
</div>
<a>
<div>B</div>
</a>
</div>
</div>
As you can see, I have added two more a elements - only one of them contains a div element. Assuming this new document as the input, there will now be a difference between
/div/div/div/a/div
which will select only <div>A</div> as the result, and
//a/div
which will select both <div>A</div> and <div>B</div>, because the exact position of a in the tree is now irrelevant. Neither expression selects anything from the first a element, which contains only a p.
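To check this outside the browser, here is a small Python sketch using the standard-library xml.etree.ElementTree module. One assumption to keep in mind: ElementTree paths are evaluated relative to the root element, so /div/div/div/a/div is written as ./div/div/a/div and //a/div as .//a/div.

```python
import xml.etree.ElementTree as ET

# The "bigger" document from the answer above.
DOC = """<div>
  <a><p>C</p></a>
  <div>
    <div>
      <a><div>A</div></a>
    </div>
    <a><div>B</div></a>
  </div>
</div>"""

root = ET.fromstring(DOC)

# Fixed path: root div, then div, div, a, div.
fixed_path = [d.text for d in root.findall("./div/div/a/div")]

# Any a element anywhere below the root, then its div children.
anywhere = [d.text for d in root.findall(".//a/div")]

print(fixed_path, anywhere)
```

The fixed path yields only A, while the descendant search yields both A and B.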
I have a block of code and need to get data out of it. I have tried different versions of XPath expressions, but with no success.
<div>
<div class="some_class">
<a title="id" href="some_href">
<nobr>1<br>
</a>
</div>
<div class="some_other_class">
<a title="name" href="some_href">
<nobr>John<br>
</a>
</div>
</div>
<div>
<div class="some_class">
<a title="id" href="some_href">
<nobr>2<br>
</a>
</div>
<div class="some_other_class">
<a title="name" href="some_href">
<nobr>John<br>
</a>
</div>
</div>
// and many blocks like this
So, these div blocks are the same except for the content of their sub-elements. I need an XPath query that gets John's href from the block whose <a title="id"> is equal to 1.
I've tried something like this:
//div[./div/nobr='1' AND ./div/nobr='John']
to get only the div that contains the data I need; from there it wouldn't be hard to get John's href.
Also, I've managed to get John's href with:
//a[./nobr='John'][@title='name']/@href
but that way the result doesn't depend on the value of the <a title="id"...> element, and it has to.
Any suggestions?
I think what you want is
//div/div[a/@title='id']/following-sibling::div[1]/a/@href
which, given a well-formed input document, will return (individual results separated by --------):
href="some_href"
-----------------------
href="some_href"
You did not explain it very clearly though, as kjhughes has noted, and perhaps your sample HTML is not ideal.
Regarding your attempted path expressions, as the input is HTML, it is hard to know whether
<nobr>John<br>
means that "John" is inside the nobr element or not.
Thanks Mathias, your example was helpful, but as there are many elements with @title='id', it isn't a reliable solution that will always catch the right elements.
I've managed to make a workaround: first I catch the whole div, and then I extract the href I need.
//div[./div/a[@title='name']/nobr='John' and ./div/a[@title='id']/nobr='1']
//a[./nobr='John'][@title='name']/@href
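The two-step workaround (pick the block by its id and name, then read the href) can be sketched in Python with the standard-library xml.etree.ElementTree module. Several things here are assumptions made for the example: the markup is a cleaned-up, well-formed version of the sample with nobr properly closed, the wrapping root element and the distinct href values are invented so the result is visible, and name_href is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Well-formed variant of the sample; href values are made up for illustration.
DOC = """<root>
  <div>
    <div class="some_class"><a title="id" href="/ids/1"><nobr>1</nobr></a></div>
    <div class="some_other_class"><a title="name" href="/users/john-1"><nobr>John</nobr></a></div>
  </div>
  <div>
    <div class="some_class"><a title="id" href="/ids/2"><nobr>2</nobr></a></div>
    <div class="some_other_class"><a title="name" href="/users/john-2"><nobr>John</nobr></a></div>
  </div>
</root>"""


def name_href(root, wanted_id, wanted_name):
    """Return the href of the name link in the block whose id link matches."""
    for block in root.findall("./div"):
        # Step 1: the block must contain both the wanted id and the wanted name.
        id_link = block.find(f"./div/a[@title='id'][nobr='{wanted_id}']")
        name_link = block.find(f"./div/a[@title='name'][nobr='{wanted_name}']")
        if id_link is not None and name_link is not None:
            # Step 2: read the href from the name link of that block only.
            return name_link.get("href")
    return None


root = ET.fromstring(DOC)
print(name_href(root, "1", "John"))
```

Searching for the name link inside the already selected block is what ties the href to the id, instead of matching every John in the document.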
I am trying to create an anchor tag, but it's not working in any of the browsers.
I am going from one page to another:
<p>
View All Code Related Issues
</p>
and it's going to this page, which has 10-12 anchor tags:
<div class="grouping">
<h4 id="Code2011">
<a>Code 2011</a>
</h4>
</div>
I tried these too:
<div class="grouping">
<h4 id="Code2011">
<a id="Code2011">Code 2011</a>
</h4>
</div>
and
<div class="grouping">
<h4>
<a name="Code2011">Code 2011</a>
</h4>
</div>
but none of them work. When I go to that page and press Enter in the URL bar, it then works, so my URL seems to be fine. Any ideas?
I found that this works better. Don't know why.
<div class="grouping">
<h4>
<a name="Code2011"></a>
Code 2011
</h4>
</div>
I have found that sometimes you can mistakenly have another element with the same ID. In my case, it was an option tag, which cannot be moved into view. As such, I would recommend you try $('#yourid') to see if there are any tags unexpectedly with the same ID.
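If jQuery is not at hand, the same duplicate check can be run on the page source. The sketch below is one possible way to do it with Python's standard-library html.parser; IdCounter and duplicate_ids are names invented for this example, and the sample markup is the second attempt from the question.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCounter(HTMLParser):
    """Count how often each id attribute value occurs in a document."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1


def duplicate_ids(html):
    """Return the sorted list of id values that appear more than once."""
    parser = IdCounter()
    parser.feed(html)
    parser.close()
    return sorted(i for i, count in parser.ids.items() if count > 1)


SAMPLE = """<div class="grouping">
<h4 id="Code2011">
<a id="Code2011">Code 2011</a>
</h4>
</div>"""

print(duplicate_ids(SAMPLE))
```

Here the h4 and the a share the same id, so Code2011 is reported as a duplicate.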
In general:
'name' is deprecated, so don't use it.
All ids must be unique, no exceptions. You cannot have duplicated ids.
Anchor ids need to occur in anchor tags. So something like <h4 id="myanchor"> wouldn't work as an anchor.
Your second example would work for you if you removed (or rename) the id from the H4 tag.
For others' future reference, I've noticed anchors not working well within some divs. They seem to work better when placed next to a recognizable page element like an image or a table row, something on the page that isn't buried within a div. I think what may happen is that, with floated elements and relative positioning, the page can't find the precise spot of your anchor, so you get nothing.
Try:
Code 2011