Does addition or removal of html style impact existing XPath? - html

This was the HTML earlier for the label 'Home Page', which had a style attribute:
<label style="background: rgb(204, 136, 136); border: 2px solid red;">
<i class="fa fa-info-circle info"></i> Home Page</label>
I had written this XPath earlier which worked well
//*[contains(text(),'Home Page')]
Now I noticed the HTML for the Home Page label has changed: the style attribute was removed from the label, as shown below
<label>
<i class="fa fa-info-circle info"></i> Home Page</label>
Because of this change my existing XPath no longer works, but when I change the XPath as shown below, it works:
//label[contains(.,'Home Page')]
(I replaced * with label, and text() with a dot (.).)
Also, when I use the previous XPath with * in an XPath checker, no element is selected now, but the second XPath with label selects the Home Page label I want.
I think both XPaths should work; adding or removing a style attribute on the label should not have any impact. Can anyone explain why this is happening? Why does my first XPath no longer work, and does adding or removing a style attribute in the HTML affect an existing XPath?
Please check the attached screenshot to view the html structure

It's impossible to give a definite answer without seeing the whole HTML document, but you probably have the following problem:
Your initial XPath expression was:
//*[contains(text(),'Home Page')]
Which, in plain English, means:
Select element nodes with any name, if they have at least one text node as a child, and if the first text node in them contains the string "Home Page".
I am emphasizing "first" because it is not obvious to many that, in XPath 1.0, a function like contains() uses only the first node of the node-set passed as its argument and silently ignores the rest. (In XPath 2.0 and later, passing more than one node to contains() is a type error instead.)
The expression text() does not return a single node, it returns a sequence of nodes if an element has more than one child text node. This happens if there are interfering child element nodes, for example.
There are several ways to confirm this yourself. On the one hand, you can modify the expression to
//*[contains(text()[2], "Home Page")]
which explicitly selects the second text node as the argument for contains() and you will find this label element as a result.
Or, evaluating an expression on only the HTML snippet you show,
/label/text()
will return (individual results separated by ---):
[result that only has whitespace]
-----------------------
Home Page
which indicates that the i element as a child of label leads to an additional text node in front of i that only has whitespace in it.
A good solution to your problem with the correct semantics is
//*[text()[contains(.,'Home Page')]]
It means:
Select element nodes with any name, if they have at least one text node as a child, and if any text node in them contains the string "Home Page".
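The difference between the two predicates can be illustrated without an XPath engine. The following is only a sketch using Python's standard library: it parses the label from the question, lists its child text nodes, and emulates "first text node only" versus "any text node" by hand. The names SNIPPET, child_text_nodes, first_node_matches and any_node_matches are made up for this illustration.

```python
from xml.dom import minidom

# The label from the question, after the style attribute was removed.
SNIPPET = '<label>\n<i class="fa fa-info-circle info"></i> Home Page</label>'


def child_text_nodes(element):
    """Return the data of the element's direct child text nodes, in document order."""
    return [node.data for node in element.childNodes
            if node.nodeType == node.TEXT_NODE]


label = minidom.parseString(SNIPPET).documentElement
texts = child_text_nodes(label)

# Emulates XPath 1.0 contains(text(), ...): only the first text node is used.
first_node_matches = bool(texts) and "Home Page" in texts[0]

# Emulates text()[contains(., ...)]: every child text node is tested on its own.
any_node_matches = any("Home Page" in text for text in texts)

print(texts)               # the whitespace-only node comes first
print(first_node_matches)  # False
print(any_node_matches)    # True
```

The whitespace between the label start tag and the i element is what produces the extra leading text node.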

Related

How to fetch the immediate tag for an element using contains text in Selenium Java

I have to fetch the tag under which the text is visible. But if we call getTagName on all elements containing the text, the parent elements also show the same text.
<ul> <li> The Text </li> </ul>
In the above case, if we use contains text to fetch the tag, both ul and li are selected. If I don't want to mention li explicitly in the query, is there a generic way to get the immediate tag (li) for The Text?
Thanks in advance.
In your case you need to find an element having no child elements and containing that text.
The XPath locator for this will be:
//*[not(child::*) and text()='your text']
So the command will be
String tag = driver.findElement(By.xpath("//*[not(child::*) and text()='your text']")).getTagName();
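As a rough illustration of the "no child elements" idea outside Selenium, here is a sketch with Python's standard xml.etree.ElementTree on the snippet from the question. Note that the sample text is padded with spaces, so an exact comparison such as text()='The Text' would not match it as written; the sketch trims whitespace first, which in XPath would correspond to something like normalize-space(text())='The Text'. The helper name innermost_tags is made up for this example.

```python
import xml.etree.ElementTree as ET

# The markup from the question.
SNIPPET = "<ul> <li> The Text </li> </ul>"


def innermost_tags(root, wanted):
    """Tags of elements that have no child elements and whose trimmed text equals `wanted`."""
    return [element.tag for element in root.iter()
            if len(element) == 0 and (element.text or "").strip() == wanted]


root = ET.fromstring(SNIPPET)
tags = innermost_tags(root, "The Text")
print(tags)  # ['li']
```

The ul element is skipped because it has a child element, so only the innermost li is reported.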
Okay, I think that if you can locate an element uniquely in the DOM, you can fetch its tag name as well.
The locator has to be unique, i.e. it should match exactly one node (1/1).
I would suggest an XPath like the following:
//ul//li[contains(text(), 'The Text')]
or
//li[contains(text(), 'The Text')]
If the element is present multiple times, apply XPath indexing:
(//ul//li[contains(text(), 'The Text')])[1]
or
(//li[contains(text(), 'The Text')])[1]
In code you can use it like this:
print(driver.find_element_by_xpath("(//ul//li[contains(text(), 'The Text')])[1]").tag_name)
Finding a node that matches 1/1 in the HTML takes some practice; if you share more of the HTML, I can look for a more robust XPath.

Terminology - The types of elements in HTML

A while ago there was a term that I remembered that described two categories of elements. I forgot the term and I want to know what that term was. The information I can remember is that the first category of elements get their values from within HTML like <p> or <a> or <ul> but there is another category of elements which get their values from "outside" of HTML like <img> or <input type="textbox">. I want to know the terminology for these types.
Edit - I've gone through Zomry's, Difster's and BoltClock's answers and didn't find what I was looking for. So I remembered an extra piece of information and decided to add it. The two categories are lazy opposites of each other. For example, if one is called xyz, then the other is called non-xyz.
Probably you mean replaced elements (and non-replaced, respectively)?
However, the distinction between them is not so unambiguous. For example, form controls were traditionally considered replaced elements, but the HTML spec currently explicitly lists them as non-replaced (introducing the "widget" term instead).
The HTML specification mentions for tags like <img> and <input> the following: Tag omission in text/html: No end tag.
Tags with an end tag are defined as: Tag omission in text/html: Neither tag is omissible.
So as far as I can find, the HTML spec does not define a technical name for this, apart from void versus normal elements, so what Watilin pointed out in the comments should be fine: standalone vs containers.
As an added side note: HTML has many more content categories. You can find a complete overview in the HTML spec here: https://html.spec.whatwg.org/multipage/indices.html#element-content-categories
Also interesting to read to visualize that a bit better: https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Content_categories
Elements whose contents are defined by text and/or other elements between their start and end tags don't have a special category. Even the HTML spec just calls them normal elements for the most part in section 8.1.2.
Elements whose primary values are defined by attributes and that cannot have content between their tags are called void elements. img and input are indeed two examples of void elements. Note that void elements are not to be confused with empty elements; see the following questions for more details on that:
Are void elements and empty elements the same?
HTML "void elements" so called because without content?
<input type="text" id="someField" name="someField">
With an input selector, you can get a value from it like so (with jQuery):
$("#someField").val();
Whereas with a paragraph or a div, you don't get a value; you get the text or HTML.
<div id="someDiv">Blah, blah, blah</div>
You can get that with jQuery as follows:
$("#someDiv").html();
Do you see the difference?

How do I find a reliable XPath for this html element (type is text, class is known, no id present)?

The element is similar to:
<input type="text" class="information">
There is no id for the element.
There is only one text type element inside the information class. I want to be able to enter text into this html element by using casperjs which works on top of phantomjs.
The XPath obtained from chrome developer tools is similar to:
//*[@id="abcid"]/div/div[1]/input
abcid is the id of the div element that contains the text box and a few other elements. But I need a more reliable XPath. I'm not very experienced with finding XPaths, so forgive me if the answer is too obvious.
If you want to use XPath selectors for nearly all CasperJS functions, you need to provide it as an object. If the selector is provided as a string it will be automatically assumed that it is a CSS selector.
You can build the XPath selector object yourself:
{
type: 'xpath',
path: '//input[@class="information"]'
}
or just use the XPath utility by requiring it at the beginning of your script and then using it:
var x = require('casper').selectXPath;
// later ...
var text = casper.fetchText(x('//input[@class="information"]'));
Regarding your selector:
If there is only one input with the information class then you can use the XPath
//input[@class="information"]
or the CSS selector
input.information[type='text']
If the input has other classes too, the CSS selector will work as is, but the XPath selector must be changed to
//input[contains(@class,"information")]
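One caveat with a substring test on the class attribute is that it also matches longer class names such as information-box. The sketch below, written with Python's standard library on made-up sample markup, contrasts a substring test with a whole-token test. In XPath 1.0 the token test is commonly written as contains(concat(' ', normalize-space(@class), ' '), ' information '). The names SNIPPET and has_class, and the extra inputs, are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: only the first input comes from the question.
SNIPPET = """
<form>
  <input type="text" class="information"/>
  <input type="text" class="wide information"/>
  <input type="text" class="information-box"/>
</form>
"""


def has_class(element, name):
    """True if `name` is one of the whitespace-separated tokens in the class attribute."""
    return name in element.get("class", "").split()


root = ET.fromstring(SNIPPET)
inputs = list(root.iter("input"))

# Substring test, like contains(@class, "information").
substring_hits = [e.get("class") for e in inputs if "information" in e.get("class", "")]

# Token test, like the CSS selector input.information.
token_hits = [e.get("class") for e in inputs if has_class(e, "information")]

print(substring_hits)
print(token_hits)
```

The token test is closer to what the CSS selector input.information does, since CSS class selectors match whole class names.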

Finding XPath for text in div following input

I have an issue reading an XPath and need some help/advice from experts.
Part of my HTML is below:
<div class = "input required_field">
<div class="rounded_corner_error">
<input id="FnameInput" class="ideField" type="text" value="" name="first_name">
<div class ="help-tooltip">LOGIN BACK TO MAIN</div>
<div class="error-tooltip">
I need to find the XPath of the text message (LOGIN BACK TO MAIN)
Using Firebug I find the XPath
("//html/body/div/div[5]/div/div/form/fieldset/div/div[2]/div[2]/div/div");
But using above XPath I can read only class = help-tooltip but I need to read LOGIN BACK TO MAIN.
Try adding /text() on the end of the xpath you have.
It does not really look like your XPath matches your XHTML element.
You should try something simpler and more generic, such as:
//div[@class="help-tooltip"]/text()
See Selecting a css class with xpath.
I would use:
# Selecting the div element
//input[@id="FnameInput"]/following-sibling::div[@class="help-tooltip"]
# Selecting the text content of the div
//input[@id="FnameInput"]/following-sibling::div[@class="help-tooltip"]/text()
…since in a valid HTML document each id value is unique, which makes it a pretty strong anchor point.
Note that the latter expression will select the text node, not the text string content of that node; you need to extract the value of the text node if you want the string. How you do that depends on what tools you are using:
In JavaScript/DOM that would be the .nodeValue property of the text node.
For Nokogiri that would be the .content method.
…but I have no idea what technology you are using your XPath with.
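For illustration only, here is how the same lookup might be sketched with Python's standard xml.etree.ElementTree. ElementTree supports only a subset of XPath and has no following-sibling axis, so the sketch finds the parent of the input and then walks its later children by hand. The markup is a well-formed stand-in for the fragment in the question (the closing tags and the body wrapper are assumed), and following_sibling_text is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the question's markup; closing tags are assumed.
SNIPPET = """
<body>
  <div class="rounded_corner_error">
    <input id="FnameInput" class="ideField" type="text" value="" name="first_name"/>
    <div class="help-tooltip">LOGIN BACK TO MAIN</div>
    <div class="error-tooltip"/>
  </div>
</body>
"""


def following_sibling_text(root, anchor_id, sibling_class):
    """Text of the first div with `sibling_class` that follows the input with `anchor_id`."""
    parent = root.find(f".//input[@id='{anchor_id}']/..")
    if parent is None:
        return None
    seen_anchor = False
    for child in parent:
        if child.get("id") == anchor_id:
            seen_anchor = True
        elif seen_anchor and child.tag == "div" and child.get("class") == sibling_class:
            # ElementTree exposes the string directly, not a text node object.
            return child.text
    return None


root = ET.fromstring(SNIPPET)
message = following_sibling_text(root, "FnameInput", "help-tooltip")
print(message)  # LOGIN BACK TO MAIN
```

With a full XPath engine the single expression ending in /text() would replace the manual loop.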

Simple Xpath puzzle

I'm trying to automate the Google Translate web interface with Selenium (but it's not necessary to understand Selenium to understand this question, just know that it finds elements and clicks them). I'm stuck on selecting the language to translate from.
I can get to the point where the drop-down menu opens, as seen in the screenshot below.
Now, I want to select 'Japanese'.
This xpath expression works: $b.find_element(:xpath,"//*[@id=':13']/div").click But I would rather have one where I can just input the name of the language.
This xpath expression also works: $b.find_element(:xpath,"//*[contains(text(),'Japanese')]").click But only as long as there is no other 'Japanese' text on the page.
So I'm trying to narrow down the scope of my xpath, but when I try to specify the path to take to find the 'Japanese' text, the expression no longer works, I can't find the element: $b.find_element(:xpath,"//*div[@id='gt-sl-gms']/*[contains(text(),'Japanese')]").click
It also no longer works for the original xpath either: $b.find_element(:xpath,"//*div[@id='gt-sl-gms']/*[@id=':13']/div").click
Which is weird, because to bring down the drop-down menu, I use this xpath $b.find_element(:xpath,"//*[@id='gt-sl-gms']/*[contains(text(),'From:')]").click.
So it's not that I have two wildcards in my expression and it's not that my expression is too specific. There's something else that I'm missing and I'm sure it's really simple.
Any suggestions are appreciated.
Edit Other things I have tried unsuccessfully:
$b.find_element(:xpath,"//*/div[@id='gt-sl-gms']/*[@id=':13']/div").click
$b.find_element(:xpath,"//*[@id='gt-sl-gms']/*[@id=':13']/div").click
$b.find_element(:xpath,"//*[@id='gt-sl-gms']//*[@id=':13']/div").click
If the div with @id=':13' is a descendant of the div with @id='gt-sl-gms', your XPath "//*[@id='gt-sl-gms']//*[@id=':13']/div" would work.
The above XPath expects the HTML to look something like this:
<div id="gt-sl-gms">
<div>
<div id=":13">
<div></div>
</div>
</div>
</div>
If <div id="gt-sl-gms"> is not an ancestor (as I suspect), you have to look for a "real" ancestor, or you may use the following axis (for nodes later in the document) or following-sibling (for nodes later in the document at the same level).
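The difference between a child step (/) and a descendant step (//) on the structure above can be checked with a small sketch. It uses Python's standard xml.etree.ElementTree, which supports only a subset of XPath and needs paths relative to the starting element (hence the leading dot). The body wrapper and the 'Japanese' text inside the innermost div are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# The structure from the answer, wrapped in a body element; the text is assumed.
SNIPPET = """
<body>
  <div id="gt-sl-gms">
    <div>
      <div id=":13">
        <div>Japanese</div>
      </div>
    </div>
  </div>
</body>
"""

root = ET.fromstring(SNIPPET)

# Child step: the element with id ':13' must be a direct child of gt-sl-gms.
child_step = root.findall(".//*[@id='gt-sl-gms']/*[@id=':13']/div")

# Descendant step: the element with id ':13' may be nested at any depth.
descendant_step = root.findall(".//*[@id='gt-sl-gms']//*[@id=':13']/div")

print(len(child_step))                    # 0
print([d.text for d in descendant_step])  # ['Japanese']
```

Because the element with id ':13' is a grandchild here, only the descendant form finds it.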
*div is incorrect; it should be just div. Also, depending on the structure of the HTML, you may need // instead of /.
Try selecting descendants (//) instead of children (/*); the element you want is really a grandchild or deeper.