XPath //* vs //element vs //

I am confused about writing XPath expressions: when do I need to put //* at the start, and when will just // work?
For example, I was trying to figure this out on https://www.myntra.com/. There is a search box on the website's home page with this HTML code:
<input placeholder="Search" class="desktop-searchBar" value="" data-reactid="529">
The XPath below works for this code:
//*[@class='desktop-searchBar']
I am still confused about why I need a * after the double slash (//).

//*[@class='desktop-searchBar']
says to select all elements, regardless of name, with a class attribute value of desktop-searchBar.
//input[@class='desktop-searchBar']
says the same as the first expression, except that it constrains the element to be named input.
//[@class='desktop-searchBar']
is syntactically invalid in XPath because it is missing a required node test such as input (an element named input) or * (any element).
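Python's standard-library xml.etree.ElementTree understands only a small subset of XPath, but it is enough to see the three cases side by side. This is a sketch on made-up markup: the wrapping <div> and the extra <span> are invented for illustration, and ElementTree paths start with "." because they are evaluated relative to an element:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the search box from the question plus an invented <span>
# with the same class, so the difference between * and input is visible.
DOC = (
    "<div>"
    '<input placeholder="Search" class="desktop-searchBar" value=""/>'
    '<span class="desktop-searchBar">not an input</span>'
    "</div>"
)
root = ET.fromstring(DOC)

# ".//*[...]": any element, whatever its name, whose class matches.
any_element = root.findall(".//*[@class='desktop-searchBar']")

# ".//input[...]": same predicate, but the node test requires the name input.
inputs_only = root.findall(".//input[@class='desktop-searchBar']")

# ".//[...]": nothing between // and the predicate, so the path is rejected.
try:
    root.findall(".//[@class='desktop-searchBar']")
    missing_node_test_ok = True
except SyntaxError:
    missing_node_test_ok = False

print([e.tag for e in any_element])  # ['input', 'span']
print([e.tag for e in inputs_only])  # ['input']
print(missing_node_test_ok)          # False
```

A full XPath 1.0 engine (in a browser, or a library such as lxml) behaves the same way for these three expressions, minus the leading ".".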

Related

Web form fill using VBA & Selenium

I have a problem filling in a form. I have already tried the following methods:
driver.FindElementByXPath("//div[@id='s_dane_dokumentu-section?grid-1-grid?c_nr_lrn_komunikatu-control?xforms-input-1']").sendkeys "21123456"
and
driver.FindElementByXPath("//div[@id='s_dane_dokumentu-section?grid-1-grid?c_nr_lrn_komunikatu-control?xforms-input-1']").value = "21123456"
I noticed that the ☰ character was displayed in VBA as "?". Using the full XPath also gives an error. The form has a lot of fields, and I'm stuck on the first one...
HTML:
<input id="s_dane_dokumentu-section☰grid-1-grid☰c_nr_lrn_komunikatu-control☰xforms-input-1" type="text" name="s_dane_dokumentu-section☰grid-1-grid☰c_nr_lrn_komunikatu-control☰xforms-input-1" value="" class="xforms-input-input" aria-required="true" aria-invalid="true">
I might be inclined to use a substring match on the id via a CSS attribute selector with the $= (ends with) operator. In addition, add the class and the type selector for the shown input element for extra specificity.
driver.FindElementByCss("input.xforms-input-input[id$=xforms-input-1]").sendkeys "21123456"
You could also just use an attribute = value selector for the id, with the value enclosed in single quotes. This prevents the WebDriver from being thrown off by the special characters.
driver.FindElementByCss("[id='s_dane_dokumentu-section☰grid-1-grid☰c_nr_lrn_komunikatu-control☰xforms-input-1']").sendkeys "21123456"

Why does my XPath not select based on text()?

I have a page in Firefox (no frame) which contains the following part of HTML code:
...
<div class="col-sm-6 align-right">
<a href="/efelg/download_zip" class="alert-link">
Download all results in .zip format
</a>
</div>
...
which I want to select with a Selenium XPath expression. To test my XPath expression, I installed a Firefox add-on called 'TryXpath'. However, the expression seems to be incorrect, as no element is selected. Here is the expression:
//a[text()= "Download all results in .zip format"]
What is wrong with that expression? I found it in different SO answers, but it does not seem to work for me. Why do I get 0 hits? Why does the expression fail to find the HTML element I posted above (no frame, and the element is visible and clickable)?
You can try this:
//a[contains(text(),'Download all results in .zip format')]
It is working on my side. Please try it and let me know.
The reason your XPath isn't selecting the shown a element is the leading and trailing whitespace surrounding your targeted text. While you could use contains() as the currently upvoted and selected answer does, be aware that it would also match when the targeted string is only a substring of the text found in an a element; this may or may not be desirable.
Consider instead using normalize-space() and testing via equality:
//a[normalize-space()='Download all results in .zip format']
This will check that the (space-normalized) string value of a equals the given text.
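The standard library has no full XPath 1.0 engine, so the sketch below mirrors the three comparisons in plain Python with ElementTree rather than evaluating the XPath expressions themselves (a library such as lxml would run them directly). The helper names string_value and normalize_space are illustrative, and normalize_space only approximates normalize-space():

```python
import xml.etree.ElementTree as ET

# The anchor from the question, with the newlines that pad its text.
A = ET.fromstring(
    '<a href="/efelg/download_zip" class="alert-link">\n'
    "Download all results in .zip format\n"
    "</a>"
)
TARGET = "Download all results in .zip format"

def string_value(elem):
    """Like XPath 1.0 string(elem): all descendant text concatenated."""
    return "".join(elem.itertext())

def normalize_space(s):
    """Approximates normalize-space(): trim, then collapse whitespace runs.
    Note: str.split() treats a few more characters as whitespace than XPath."""
    return " ".join(s.split())

exact_text = A.text == TARGET                            # like text()='...'
substring = TARGET in A.text                             # like contains(text(),'...')
normalized = normalize_space(string_value(A)) == TARGET  # like normalize-space()='...'

print(exact_text, substring, normalized)  # False True True
```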
See also
Testing text() nodes vs string values in XPath

Why won't my XPath select link/button based on its label text?

<a href="javascript:void(0)" title="home">
<span class="menu_icon">Maybe more text here</span>
Home
</a>
For the above code, when I write //a as the XPath, the link gets highlighted, but when I write //a[contains(text(), 'Home')], it does not. I think this is simple and should have worked.
Where's my mistake?
Other answers have missed the actual problem here:
Yes, you could match on @title instead, but that's not why OP's XPath is failing where it may have worked previously.
Yes, XML and XPath are case sensitive, so Home is not the same as home, but there is a Home text node as a child of a, so OP is right to use Home if he doesn't trust @title to be present.
Real Problem
OP's XPath,
//a[contains(text(), 'Home')]
says to select all a elements whose first text node contains the substring Home. Yet, the first text node contains nothing but whitespace.
Explanation: text() selects all child text nodes of the context node, a. When contains() is given multiple nodes as its first argument, it takes the string value of the first node, but Home appears in the second text node, not the first.
Instead, OP should use this XPath,
//a[text()[contains(., 'Home')]]
which says to select all a elements with any text child whose string value contains the substring Home.
If there weren't surrounding whitespace, this XPath could be used to test for equality rather than substring containment:
//a[text()[.='Home']]
Or, with surrounding whitespace, this XPath could be used to trim it away:
//a[text()[normalize-space()= 'Home']]
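To see why the first text node matters, here is a rough Python model of the text() node-set for this a element, using ElementTree rather than a real XPath engine. The helper child_text_nodes is hypothetical; it relies on ElementTree storing the text before the first child in .text and the text after each child in that child's .tail:

```python
import xml.etree.ElementTree as ET

# The link from the question.
A = ET.fromstring(
    '<a href="javascript:void(0)" title="home">\n'
    '<span class="menu_icon">Maybe more text here</span>\n'
    "Home\n"
    "</a>"
)

def child_text_nodes(elem):
    """Hypothetical helper mimicking elem/text(): child text nodes in document order."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [t for t in nodes if t]  # ElementTree uses None where there is no text

texts = child_text_nodes(A)
print(texts)  # ['\n', '\nHome\n']

# contains(text(), 'Home'): XPath 1.0 uses only the FIRST node of the node-set.
first = texts[0] if texts else ""
print("Home" in first)  # False

# text()[contains(., 'Home')]: true if ANY child text node contains it.
print(any("Home" in t for t in texts))  # True

# text()[normalize-space()='Home']: trim surrounding whitespace, then compare.
print(any(" ".join(t.split()) == "Home" for t in texts))  # True
```

Note that "Maybe more text here" never appears in texts: it belongs to the span, not to a text child of a.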
See also:
Testing text() nodes vs string values in XPath
Why is XPath unclean constructed? Why is text() not needed in predicate?
XPath: difference between dot and text()
Yes, you are making two mistakes: you're writing Home with an uppercase H when you want to match home with a lowercase h, and you're trying to check the text content when you want to check the "title" attribute. Correct those two, and you get:
//a[contains(@title, 'home')]
However, if you want to match the exact string home, instead of any a that has home anywhere in its title attribute, use @zsbappa's code.
You can try this XPath. It just selects the element by attribute:
//a[@title='home']

Finding XPath expression of a link using link text

In Selenium WebDriver, how can I use an XPath expression for the HTML below to find the link by its link text ("Add New Button")?
<a href="SOME URL">
<span >
<i class=""/>
</span>Add New Button
</a>
I tried the following expressions, but none of them worked:
//a[text()='Add New Button']
//a[contains(text(),'Add New Button']
//a/span/i[text()='Add New Button']
//a/span/i[contains(text(),'Add New Button']
I know that the third and fourth ones won't work, but I tried them anyway.
So for such an HTML DOM, how can I find the link using the link text using XPath?
Some of the answers that were already given work, but others don't. And I think the OP would benefit from more explanations.
Your original expression:
//a[text()='Add New Button']
Does not work because the text node that contains "Add New Button" also has a newline character at the end.
The next one:
//a[contains(text(),'Add New Button']
Does not work (leaving aside the missing parenthesis), because text() returns a sequence of nodes and a function like contains() will only evaluate the first node in the sequence. In this case, the first text node inside a only contains whitespace and is not the one that contains "Add New Button".
You can validate this claim with:
//a[contains(text()[2],'Add New Button')]
which will test whether the second text node of a contains "Add New Button"—and this expression will return the a element. But the best solution in this case is:
//a[contains(.,'Add New Button')]
. evaluates to the so-called "string value" of an element, the concatenation of all its descendant text nodes, which will include "Add New Button".
A solution with normalize-space() is also possible, but it has nested predicates:
//a[text()[normalize-space(.) = "Add New Button"]]
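Both claims can be checked outside the browser by rebuilding the two views of a in Python with ElementTree: the list of child text nodes (what a/text() sees) and the concatenated descendant text (the string value that . yields). This is a model of the XPath semantics, not an XPath evaluation; the markup is the question's, made well-formed, with the placeholder href kept:

```python
import xml.etree.ElementTree as ET

# Question markup made well-formed; "SOME URL" is the question's placeholder.
A = ET.fromstring(
    '<a href="SOME URL">\n'
    "<span>\n"
    '<i class=""/>\n'
    "</span>Add New Button\n"
    "</a>"
)

# a/text(): the text directly inside a, i.e. before the span and after it.
text_nodes = [t for t in [A.text] + [c.tail for c in A] if t]
# string(a), which is what "." gives: every descendant text node joined.
string_value = "".join(A.itertext())

print(repr(text_nodes[0]))                # '\n'  (first text node is whitespace)
print("Add New Button" in text_nodes[1])  # True  (like text()[2])
print("Add New Button" in string_value)   # True  (like contains(., ...))
```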
With an XPath expression, you can check if the element contains a certain text with the below statement:
//a[contains(., 'Button')]
The link text has unnecessary whitespace after the main text, so you need the following to get rid of it:
'//a[normalize-space(.)="Add New Button"]'
Use the following XPath expression:
//*[contains(text(),'Add New Button')]
or
//a/i[contains(text(),'Add New Button')]
or
//a[@href='SOME URL']/i
or using cssSelector -
a[href='SOME URL']>i

HTML input - name vs. id

When using the HTML <input> tag, what is the difference between the name and id attributes, especially since I have found that they sometimes have the same value?
In HTML4.01:
Name Attribute
Valid only on <a>, <form>, <iframe>, <img>, <map>, <input>, <select>, <textarea>
Name does not have to be unique, and can be used to group elements together such as radio buttons & checkboxes
Cannot be referenced in a URL, although since JavaScript and PHP can see the URL, there are workarounds
Is referenced in JavaScript with getElementsByName()
Shares the same namespace as the id attribute
Must begin with a letter
According to the specifications it is case sensitive, but most modern browsers don't seem to follow this
Used on form elements to submit information. Only input tags with a name attribute are submitted to the server
Id Attribute
Valid on any element except <base>, <html>, <head>, <meta>, <param>, <script>, <style>, <title>
Each Id should be unique in the page as rendered in the browser, which may or may not be all in the same file
Can be used as anchor reference in URL
Is referenced in CSS or URL with # sign
Is referenced in JavaScript with getElementById(), and in jQuery with $("#<id>")
Shares the same namespace as the name attribute
Must contain at least one character
Must begin with a letter
Must not contain anything other than letters, numbers, underscores (_), dashes (-), colons (:), or periods (.)
Is case sensitive
In (X)HTML5, everything is the same, except:
Name Attribute
Not valid on <form> any more
XHTML says it must be all lowercase, but most browsers don't follow that
Id Attribute
Valid on any element
XHTML says it must be all lowercase, but most browsers don't follow that
This question was written when HTML4.01 was the norm, and many browsers and features were different from today.
The name attribute is used for posting to e.g. a web server. The id is primarily used for CSS (and JavaScript). Suppose you have this setup:
<input id="message_id" name="message_name" type="text" />
To get the value with PHP when your form is posted, you use the name attribute, like this:
$_POST["message_name"];
As said before, the id is used for styling, for when you want to apply specific CSS rules:
#message_id
{
background-color: #cccccc;
}
Of course, you can use the same value for your id and name attributes. The two will not interfere with each other.
Also, a name can be shared by several items, for example radio buttons. The name is then used to group your radio buttons, so that only one of the options can be selected.
<input id="button_1" type="radio" name="option" />
<input id="button_2" type="radio" name="option" />
And in this very specific case, I can further say how id is used, because you will probably want a label with your radio button. A label has a for attribute, which takes the id of your input to link the label to that input (when you click the label, the button is checked). For example:
<input id="button_1" type="radio" name="option" /><label for="button_1">Text for button 1</label>
<input id="button_2" type="radio" name="option" /><label for="button_2">Text for button 2</label>
IDs must be unique
...within page DOM element tree so each control is individually accessible by its id on the client side (within browser page) by
JavaScript scripts loaded in the page
CSS styles defined on the page
Having non-unique IDs on your page will still render your page, but the markup certainly won't be valid. Browsers are quite forgiving when parsing invalid HTML, but don't rely on that just because it seems to work.
Names are quite often unique but can be shared
...within page DOM between several controls of the same type (think of radio buttons), so that when data gets POSTed to the server only a particular value gets sent. So when you have several radio buttons on your page, only the selected one's value gets posted back to the server, even though there are several related radio button controls with the same name.
Addendum to sending data to the server: When data gets sent to the server (usually by means of an HTTP POST request), all data gets sent as name-value pairs, where name is the name of the input HTML control and value is its value as entered or selected by the user. This is always true for non-Ajax requests. In Ajax requests, name-value pairs can be independent of the HTML input controls on the page, because developers can send whatever they want to the server. Quite often values are also read from input controls, but I'm just trying to say that this is not necessarily the case.
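The name-value rule can be sketched without a browser. The controls below are hypothetical dictionaries standing in for DOM elements, and form_data_set is a simplified stand-in for the browser's algorithm (it ignores disabled controls, file inputs, submit buttons and so on), but it shows that the id plays no part and that only the checked radio button in a group is sent:

```python
from urllib.parse import urlencode

# Hypothetical controls written as dicts instead of real DOM elements.
controls = [
    {"id": "message_id", "name": "message_name", "type": "text", "value": "hi there"},
    {"id": "button_1", "name": "option", "type": "radio", "value": "a", "checked": False},
    {"id": "button_2", "name": "option", "type": "radio", "value": "b", "checked": True},
    {"id": "no_name", "type": "text", "value": "never sent"},  # no name attribute
]

def form_data_set(controls):
    """Simplified sketch: which controls become name-value pairs on submit."""
    pairs = []
    for c in controls:
        if not c.get("name"):
            continue  # no name: skipped, whatever its id is
        if c["type"] in ("radio", "checkbox") and not c.get("checked"):
            continue  # unchecked radios/checkboxes contribute nothing
        pairs.append((c["name"], c["value"]))
    return pairs

body = urlencode(form_data_set(controls))
print(body)  # message_name=hi+there&option=b
```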
When names can be duplicated
It may sometimes be beneficial that names are shared between controls of any form input type. But when? You didn't state what your server platform may be, but if you used something like ASP.NET MVC you get the benefit of automatic data validation (client and server) and also binding sent data to strong types. That means that those names have to match type property names.
Now suppose you have this scenario:
you have a view with a list of items of the same type
the user usually works with one item at a time, so they will only enter data for a single item and send it to the server
So your view's model (since it displays a list) is of type IEnumerable<SomeType>, but your server side only accepts one single item of type SomeType.
How about name sharing then?
Each item is wrapped within its own FORM element, and the input elements within it have the same names, so when data gets to the server (from any of the forms) it gets correctly bound to the strong type expected by the controller action.
This particular scenario can be seen on my Creative stories mini-site. You won't understand the language, but you can check out those multiple forms and shared names. Never mind that IDs are also duplicated (which is a rule violation) but that could be solved. It just doesn't matter in this case.
name identifies form fields*, so it can be shared by controls that represent multiple possible values for such a field (radio buttons, checkboxes). Names are submitted as the keys for the form values.
id identifies DOM elements, so they can be targeted by CSS or JavaScript.
* names are also used to identify local anchors, but this is deprecated and id is the preferred way to do so nowadays.
name is the name that is used when the value is passed (in the URL or in the posted data). id is used to uniquely identify the element for CSS styling and JavaScript.
The id can be used as an anchor too. In the old days, <a name="..."> was used for that, but you should use the id for anchors too. name is only for posting form data.
name is used for form submission in the DOM (Document Object Model).
ID is used for a unique name of HTML controls in the DOM, especially for JavaScript and CSS.
The name defines the key under which the field's value will be sent when the form is submitted. So if you want to read this value later, you will find it under that name in the POST or GET request.
Whereas the id is used to address a field or element in JavaScript or CSS.
The id is used to uniquely identify an element in JavaScript or CSS.
The name is used in form submission. When you submit a form only the fields with a name will be submitted.
The name attribute on an input is used by its parent HTML <form> to include that element as a member of the HTTP form data in a POST request or of the query string in a GET request.
The id should be unique as it should be used by JavaScript to select the element in the DOM for manipulation and used in CSS selectors.
I hope you can find the following brief example helpful:
<!DOCTYPE html>
<html>
<head>
<script>
function checkGender(){
if(document.getElementById('male').checked) {
alert("Selected gender: "+document.getElementById('male').value)
}else if(document.getElementById('female').checked) {
alert("Selected gender: "+document.getElementById('female').value)
}
else{
alert("Please choose your gender")
}
}
</script>
</head>
<body>
<h1>Select your gender:</h1>
<form>
<input type="radio" id="male" name="gender" value="male">Male<br>
<input type="radio" id="female" name="gender" value="female">Female<br>
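<!-- type="button": a plain <button> inside a form defaults to type="submit", which would also submit the form and reload the page -->
<button type="button" onclick="checkGender()">Check gender</button>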
</form>
</body>
</html>
In the code, note that both 'name' attributes are the same, which makes 'male' and 'female' mutually exclusive options, but the 'id's differ so that the two inputs can be told apart.
Adding some actual references to W3C documentation that authoritatively explain the role of the 'name' attribute on form elements. (For what it's worth, I arrived here while exploring exactly how Stripe.js works to implement safe interaction with the payment gateway Stripe. In particular, what causes a form input element to get submitted back to the server, or prevents it from being submitted?)
The following W3C documentation is relevant:
HTML 4: https://www.w3.org/TR/html401/interact/forms.html#control-name Section 17.2 Controls
HTML 5: https://www.w3.org/TR/html5/forms.html#form-submission-0 and
https://www.w3.org/TR/html5/forms.html#constructing-the-form-data-set Section 4.10.22.4 Constructing the form data set.
As explained therein, an input element will be submitted by the browser if and only if it has a valid 'name' attribute.
As others have noted, the 'id' attribute uniquely identifies DOM elements, but is not involved in normal form submission. (Though 'id' or other attributes can of course be used by JavaScript to obtain form values, which JavaScript could then use for Ajax submissions and so on.)
One oddity concerns the worry in previous answers and comments that id values and name values share the same namespace. As far as I can tell from the specifications, this applied to some deprecated uses of the name attribute (not on form elements). For example, https://www.w3.org/TR/html5/obsolete.html says:
"Authors should not specify the name attribute on a elements. If the attribute is present, its value must not be the empty string and must neither be equal to the value of any of the IDs in the element's home subtree other than the element's own ID, if any, nor be equal to the value of any of the other name attributes on a elements in the element's home subtree. If this attribute is present and the element has an ID, then the attribute's value must be equal to the element's ID. In earlier versions of the language, this attribute was intended as a way to specify possible targets for fragment identifiers in URLs. The id attribute should be used instead."
Clearly, in this special case, there's some overlap between id and name values for 'a' tags. But this seems to be a peculiarity of processing for fragment ids, not due to general sharing of namespace of ids and names.
An interesting case of using the same name: input elements of type checkbox like this:
<input id="fruit-1" type="checkbox" value="apple" name="myfruit[]">
<input id="fruit-2" type="checkbox" value="orange" name="myfruit[]">
At least if the submission is processed by PHP, if you check both boxes, your POST data will show:
$_POST['myfruit'][0] == 'apple' && $_POST['myfruit'][1] == 'orange'
I don't know if that sort of array construction would happen with other server-side languages, or if the value of the name attribute is only treated as a string of characters and it's a quirk of PHP that a 0-based array gets built based on the order of the POST data, which is just:
myfruit[] apple
myfruit[] orange
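For comparison, here is how Python's urllib.parse reads the same body, assuming the browser percent-encodes the brackets as %5B%5D (as application/x-www-form-urlencoded serialization does). Repeated keys are kept in order, but, unlike PHP, Python treats [] as ordinary characters in the key:

```python
from urllib.parse import parse_qs, parse_qsl

# URL-encoded body for the two checked "myfruit[]" boxes; the browser
# percent-encodes [ and ] as %5B and %5D.
body = "myfruit%5B%5D=apple&myfruit%5B%5D=orange"

pairs = parse_qsl(body)
grouped = parse_qs(body)

print(pairs)    # [('myfruit[]', 'apple'), ('myfruit[]', 'orange')]
print(grouped)  # {'myfruit[]': ['apple', 'orange']}
```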
You can't do that kind of trick with ids. A couple of answers in What are valid values for the id attribute in HTML? appear to quote the spec for HTML 4 (though they don't give a citation):
ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").
So the characters [ and ] are not valid in either ids or names in HTML4 (they would be okay in HTML5). But as with so many things html, just because it's not valid doesn't mean it won't work or isn't extremely useful.
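The quoted HTML 4 token rule translates directly into a regular expression. The sketch below only illustrates that grammar (the pattern and function names are made up); HTML5 relaxed these rules considerably:

```python
import re

# HTML 4 ID/NAME token grammar as quoted above: a letter first, then any
# number of letters, digits, hyphens, underscores, colons or periods.
HTML4_NAME_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")

def is_html4_name_token(value):
    return HTML4_NAME_TOKEN.fullmatch(value) is not None

for candidate in ["fruit-1", "myfruit[]", "1fruit", "a:b.c_d"]:
    print(candidate, is_html4_name_token(candidate))
# fruit-1 True
# myfruit[] False
# 1fruit False
# a:b.c_d True
```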
If you are using JavaScript or CSS, you will usually use the 'id' of a control to apply CSS or JavaScript to it.
An ID selector (#...) won't match a control by its name, although an attribute selector such as input[name="..."] will. As an example, if you attach a JavaScript calendar to a textbox, you typically need the id of the text control to assign the calendar to it.