Selenium automation - finding the best XPath - HTML

I am looking to avoid using XPaths that rely on positional indexes ('xpath position'). The reason is that such an XPath can change and fail an automation test when a new element appears on the page and shifts the expected position.
But on some web pages, this is the only XPath I can find. For example, I am looking to click a tab called 'FooBar'.
If I use the Selenium IDE Firefox plugin, I get:
//td[12]/a/font
If I use the FirePath Firefox plugin, I get:
html/body/form/table[2]/tbody/tr/td[12]/font
If a new tab called "Hello, World" is added to the web page (before the FooBar tab), then the FooBar tab will shift and its XPath position becomes
//td[13]/a/font
What would you suggest I do?
TY!

Instead of using an absolute XPath you could use a relative XPath, which is shorter and more reliable.
Say
<td id="FooBar" name="FooBar">FooBar</td>
By.id("FooBar");
By.name("FooBar");
By.xpath("//td[text()='FooBar']") //exact text match
By.xpath("//td[@id='FooBar']") //match on an attribute value
By.xpath("//td[contains(text(),'oBar')]") //partial match with the contains() function
By.xpath("//td[starts-with(text(),'FooB')]") //partial match with the starts-with() function
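For example, plugged into a Selenium test in Java (a minimal sketch: the WebDriver setup and the placeholder URL are assumptions, and it assumes the FooBar tab is the <td id="FooBar"> element shown above):

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class FooBarTabTest {
    public static void main(String[] args) {
        WebDriver driver = new FirefoxDriver();
        driver.get("http://example.com/page-under-test"); // placeholder URL
        // prefer the id-based locator; the text-based XPath is the fallback when no id exists
        driver.findElement(By.id("FooBar")).click();
        // driver.findElement(By.xpath("//td[text()='FooBar']/a")).click(); // if the clickable link sits inside the td
        driver.quit();
    }
}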
This blog post may be useful for you.

A relative XPath is a good idea; a relative CSS selector is even better (faster).
If possible, suggest/request an id for the element.
Also check Chrome -> Inspect element -> Copy CSS/XPath.
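To illustrate the CSS suggestion (a sketch reusing the <td id="FooBar" name="FooBar"> markup and the driver setup assumed in the earlier example):

// id-based CSS selector: shortest and usually fastest
driver.findElement(By.cssSelector("#FooBar")).click();
// attribute-based CSS alternative if only the name attribute is present
driver.findElement(By.cssSelector("td[name='FooBar']")).click();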

Using //td is not a good idea because it will return all of your td nodes. A positional predicate such as //td[25] makes for a very fragile selection, because any td added to any previous table will change its result. Using plugins to generate XPath is great for quickly finding what you want, but it's always best to use their output only as a starting point and then analyze the structure of the file to write a locator that is harder to break when changes occur.
The best locators are anchored to invariant values or attributes. Plugins usually won't suggest id or attribute anchors; they usually produce absolute positional expressions. If you can rewrite your locator path in terms of invariant structures in the file, you can then select the elements or text that you want relative to it.
For example, suppose you have
<body> ...
... lots of code....
<h1>header that has a special word</h1>
... other tags and text but not `h1` ...
<table id="some-id">
...
<td>some-invariant-text</td>
<td>other text</td>
<td>the field that you want</td>
...
The table has an ID. That's the best anchor. Now you can select the table as
//table[@id='some-id']
But many times you don't have the id, or even some other invariant attribute. You can still try to discover a pattern. For example, suppose the last <h1> before the table you want contains a word you can match; you could then find the table using:
//table[preceding::h1[1][contains(.,'word')]]
Once you have the table, you can use relative axes to find the other nodes. Let's assume you want a td but there are no attributes on any tbody, tr, etc. You can still look for some invariant text. Tables usually have headers, or some fixed text, which you can match. In the example above, if you find a td that is two cells before the one that you want, you could use:
//table[preceding::h1[1][contains(.,'word')]]/td[preceding-sibling::td[2][.='some-invariant-text']]
This is a simple example. If you apply some of these suggestions to the file you are working on, you can improve your XPath expression and make your selection code more robust.
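In Selenium's Java API, those anchored expressions might be used as in the sketch below. The table id, the heading word, and the invariant text come from the example above; the extra tr step is an assumption, since real HTML nests td inside tr even though the simplified example omits it:

// anchor on the invariant id
WebElement table = driver.findElement(By.xpath("//table[@id='some-id']"));
// fall back to the invariant heading when no id exists
WebElement tableByHeading = driver.findElement(
        By.xpath("//table[preceding::h1[1][contains(.,'word')]]"));
// locate the target cell relative to the invariant text, two cells earlier
WebElement field = table.findElement(
        By.xpath(".//tr/td[preceding-sibling::td[2][.='some-invariant-text']]"));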

Related

How to select all elements with a specific name under every li node with the same structure?

I have a bunch of XPath locators that hold the elements I want to extract, and they have a similar structure:
/div/ul/li[1]/div/div[2]/a
/div/ul/li[2]/div/div[2]/a
/div/ul/li[3]/div/div[2]/a
...
They are actually simplified from a Pixiv user page. Each /div/div[2]/a element has a title string, so they are artwork titles.
I want to use a single expression to fetch all of the above a elements in a WebExtension called PageProbe. Although I've tried a bunch of methods, I just can't get the wanted result.
However, the following expression does return all the a elements, including the ones I don't need.
/div/
The following expression returns the a element under only the first li item.
/div/ul/li/div/div[2]/a
Sorry for not providing enough info earlier. Hope someone can help me out. Thanks.
According to the information you gave here, you can simply use this XPath:
/div/ul/li/div/div[2]/a
However, I'm quite sure there should be a better locator based on other attributes such as class names.
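The question is about a WebExtension rather than Selenium, but for comparison, in Selenium's Java API the same expression returns every matching a element, not only the one under the first li. A sketch, assuming a configured driver; the locator is copied verbatim from the answer and would likely need a relative form (e.g. //div/ul/li/div/div[2]/a) in a full HTML page:

List<WebElement> titles = driver.findElements(By.xpath("/div/ul/li/div/div[2]/a"));
for (WebElement title : titles) {
    System.out.println(title.getText()); // each artwork title
}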

How to use by.css with XPath, and how to differentiate two elements with identical paths/classes

I have these two XPaths:
/html/body/ion-app/ng-component/ion-nav/page-settings/ion-content/div[2]/ion-grid/ion-row[2]/ion-col[1]/ion-input
/html/body/ion-app/ng-component/ion-nav/page-settings/ion-content/div[2]/ion-grid/ion-row[2]/ion-col[2]/ion-input
How do I use them with element(by.css(...)) to test with Protractor?
And how do I differentiate between them when both paths are "equal", like the ones presented above and in the picture?
You cannot use an XPath with by.css, only with by.xpath.
If you want the CSS equivalent of a given XPath, it's very basic knowledge; look it up. Here is a very good source: https://devhints.io/xpath
Even though the second question should have been a separate post, I'll answer that also. In your particular example, the inputs are not identical.
Xpath:
//*[@ng-reflect-model="GG"]/input
//*[@ng-reflect-model="Test"]/input
But in a hypothetical scenario where you have two elements with the same attributes, you can specify which occurrence to use - (//xpath)[1] selects the first match - or say which child it is - //xpath/to/firstChild[1].
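A hedged sketch of both approaches in Selenium's Java API (Protractor's element(by.xpath(...)) accepts the same expressions); the attribute values are taken from the answer above, and the occurrence index is only an illustration:

// anchor on the distinguishing attribute value
WebElement first = driver.findElement(By.xpath("//*[@ng-reflect-model='GG']/input"));
// or, when two elements really are identical, pick an occurrence by index
WebElement second = driver.findElement(By.xpath("(//ion-input)[2]"));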

Not able to select text using Selenium

I have ready-made code and I'm trying to write tests for it using Selenium. This is how my code looks in the Elements tab of Chrome:
<table id="xyz">
<tbody>
<tr>...</tr>
"
I am not able to retrieve this text.
"
</tbody>
</table>
Running $x("//*[contains(text(),'I am not able to retrieve this text')]"); in the Console tab of Chrome shows no results. I'm able to get text with this command if the text is defined in a div, span, etc. (Also, case sensitivity is not the problem.)
In the code, that text is appended to tbody using jQuery('tbody').append( abc() );, and the abc() function returns this text via pqr.html();
Now my question is: what XPath expression should I write to grab this text? I am looking for a pure XPath expression, i.e. no Java functions etc.
contains() expects a singular value as its first parameter. An HTML element may have more than one text node child, in which case your attempted XPath evaluates only the first text node. According to the sample HTML posted, the first text node of <tbody>, which is the one evaluated, consists of a newline and some spaces, hence your XPath didn't consider <tbody> a match and returned no result.
To avoid the problem explained above, use contains() in a predicate on the individual text nodes, like the following:
//*[text()[contains(.,'I am not able to retrieve this text')]]
or this way if you want to return the text node itself instead of the parent element :
//*/text()[contains(.,'I am not able to retrieve this text')]
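The difference can be checked outside the browser as well. A small, self-contained Java sketch (the markup is a trimmed-down, invented version of the question's table) shows that the contains(text(),...) form finds nothing while the text() predicate form finds the tbody:

import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class TextNodeDemo {
    public static void main(String[] args) throws Exception {
        String xml = "<table id=\"xyz\"><tbody>\n  <tr><td>row</td></tr>\n"
                   + "  I am not able to retrieve this text.\n</tbody></table>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList miss = (NodeList) xp.evaluate(
                "//*[contains(text(),'I am not able')]", doc, XPathConstants.NODESET);
        NodeList hit = (NodeList) xp.evaluate(
                "//*[text()[contains(.,'I am not able')]]", doc, XPathConstants.NODESET);
        System.out.println(miss.getLength()); // 0 -> the first text node of tbody is only whitespace
        System.out.println(hit.getLength());  // 1 -> the tbody element matches
    }
}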
That table element is probably within a frame. To access content within a frame you need to switch to it first. You can use Selenium's switchTo() method. Refer to this answer and this one.
For the same reason it is not working in the Chrome DevTools console. In the Console tab, there is a dropdown containing a list of frames. Make sure you select the frame in which the specific element exists and then execute your XPath.
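In Selenium's Java API, the frame switch plus the corrected XPath might look like this sketch (the frame locator "contentFrame" is a made-up placeholder; use the real frame's id, name, index, or WebElement):

driver.switchTo().frame("contentFrame"); // placeholder frame locator
WebElement cell = driver.findElement(
        By.xpath("//*[text()[contains(.,'I am not able to retrieve this text')]]"));
System.out.println(cell.getText());
driver.switchTo().defaultContent(); // switch back to the top-level document when done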

How to parse HTML/XML tags according to NOT conditions in [r]

Dearest StackOverflow homies,
I'm playing with HTML that was output by EverNote and need to parse the following:
Note Title
Note anchor (hyperlink identities of the notes themselves)
Note Creation Date
Note Content, and
Intra-notebook hyperlinks (the links within the content of a note to another note's anchor)
According to examples by Duncan Temple Lang, author of the [r] XML package, and an SO answer by @jdharrison, I have been able to parse the Note Title, Note anchor, and Note Creation Dates with relative ease. For those who may be interested, the commands to do so are:
require("XML")
rawHTML <- paste(readLines("EverNotebook.html"), collapse="\n") #Yes... this is noob code
doc = htmlTreeParse(rawHTML,useInternalNodes=T)
#Get Note Titles
html.titles<-xpathApply(doc, "//h1", xmlValue)
#Get Note Title Anchors
html.tAnchors<-xpathApply(doc, "//a[@name]", xmlGetAttr, "name")
#Get Note Creation Date
html.Dates<-xpathApply(doc, "//table[@bgcolor]/tr/td/i", xmlValue)
Here's a fiddle of an example HTML EverNote export.
I'm stuck on parsing 1. Note Contents and 2. Intra-notebook hyperlinks.
Taking a closer look at the code, it is apparent that the solution for the first part is to return every upper-most* div that does NOT include a table with the attribute bgcolor="#D4DDE5". How is this accomplished?
Duncan says that it is possible to use XPath to parse XML according to NOT conditions:
"It allows us to express things such as "find me all nodes named a" or "find me all nodes named a that have no attribute named b" or "nodes a that have an attribute b equal to 'bob'" or "find me all nodes a which have c as an ancestor node"
However, he does not go on to describe how the XML package can parse exclusions... so I'm stuck there.
Addressing the second part, consider the format of anchors to other notes in the same notebook:
<a href="#13178">
The goal with these is to procure their number and yet this is difficult because they are solely distinguished from www links by the # prefix. Information on how to parse for these particular anchors via partial matching of their value (in this case #) is sparse - maybe even requiring grep(). How can one use the XML package to parse for these special hrefs? I describe both problems here since it's possible a solution to the first part may aid the second... but perhaps I'm wrong. Any advice?
UPDATE 1
By upper-most div I mean outer-most div. The contents of every note in an EverNote HTML export are within the DOM's outer-most divs. Thus the interest is to return every outer-most div that does NOT include a table with the attribute bgcolor="#D4DDE5".
"....to return every upper-most div that does NOT include a table with attribute bgcolor="#D4DDE5." How is this accomplished?"
One possible way, ignoring 'upper-most' since I don't know exactly how you would define it:
//div[not(table[@bgcolor='#D4DDE5'])]
The above XPath reads: select all <div> elements not having a child element <table> whose bgcolor attribute equals #D4DDE5.
I'm not sure what you mean by "parse" in the 2nd part of the question. If you simply want to get all of the links having that special href, you can partially match the href attribute using starts-with() or contains():
//a[starts-with(#href, '#')]
//a[contains(#href, '#')]
UPDATE :
Taking the "outer-most" div into consideration:
//div[not(table[@bgcolor='#D4DDE5']) and not(ancestor::div)]
Side note: I don't know exactly how XPath not() is defined, but if it works like negation in general (this worked, as confirmed by the OP in the comment below), you can apply one of De Morgan's laws:
"not (A or B)" is the same as "(not A) and (not B)".
so that the updated XPath can be slightly simplified to :
//div[not(table[@bgcolor='#D4DDE5'] or ancestor::div)]
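The same not() expression can be sanity-checked outside R. A small, self-contained Java sketch (the two-div document is invented purely for the check) confirms that only the outer-most div without the bgcolor table is returned:

import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class NotConditionDemo {
    public static void main(String[] args) throws Exception {
        String xml = "<body>"
                   + "<div><table bgcolor=\"#D4DDE5\"><tr><td>note date</td></tr></table></div>"
                   + "<div><div>note content</div></div>"
                   + "</body>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "//div[not(table[@bgcolor='#D4DDE5'] or ancestor::div)]",
                doc, XPathConstants.NODESET);
        System.out.println(hits.getLength()); // 1 -> only the second outer-most div matches
    }
}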

Excel VBA: get content from online HTML table

Can anybody please show me the part of VBA code which will get the text "hello" from this example online HTML table? The first node will be found by its ID (id="something").
...
<table id="something">
<tr>
<td><TABLE><TR><TD></TD></TR><TR><TD></TD></TR></TABLE></td><td></td>
</tr>
<tr>
<td></td><td></td><td>hello</td>
</tr>
...
I think it will be something like child->sibling->child->sibling->sibling->child, but I don't know the exact way.
EDIT
Updated: code tags are CAPITALS. So if I use getElementById("something").getElementsByTagName('tr'), will it get only two tr tags in the collection, or four (including the tags which are deeper children)?
If you did search for an answer, you might want to broaden your scope next time. There are plenty of questions and answers that deal with DOM stuff and VBA.
Use getElementById on HTMLElement instead of HTMLDocument
While that question (and its answers) aren't exactly what you want, they will show you how to create something you can work with.
You'll need to use a mixture of getElementById() and getElementsByTagName() to retrieve your desired "hello".
e.g.: Document.getElementById("something").getElementsByTagName("tr")(1).getElementsByTagName("td")(2).innerText
Get the element "something"
Inside "something" get all "tr" tags (specifically the one at index 1)
Inside the returned tr tag get all "td" tags (specifically the one at index 2)
Get the innerText of the previous result
These objects use a 0-based array, so the first item is item(0).
Update
document.getElementById() will return a (singular) IHTMLElement (which will include all of its children), or nothing/null if it does not exist.
document.getElementsByTagName() will return a collection of IHTMLElements (again, each element will include all of its children), or an empty collection if none exist.
document.getElementsByTagName("tr") will return all tr elements inside the "document" element.
document.getElementsByTagName("tr")(0) will return the first (singular) IHTMLElement from the collection (note the index at the end).
There is no "sibling" feature (that I could find) of the InternetExplorer object in VBA, so you'd have to do it manually using the child index.
Using the DOM functions is the clean way to do it. It's much clearer than just looking at a chain like Element.Children(0).Children(1).Children(2), as you've no idea what the index means without manually looking it up.
I looked all over for the answer to this question, too. I finally found the solution by talking to a coworker; it was actually through recording a macro.
I know, you all think you are above this, but it is actually the best way. See the full post here: http://automatic-office.com/?p=344
In short, you want to record a macro, go to Data --> From Web, navigate to your website, and select the table you want.
I have used the above "get element by id" type solutions in the past, and they are great for a few elements, but if you want a whole table and you aren't super experienced, just record a macro.
Don't tell your friends, and then reformat it to look like your own work so no one knows you used the macro tool ;)