Not able to select text using Selenium - HTML

I have some ready-made code and I'm trying to write tests for it using Selenium. This is what the code looks like in the Elements tab of Chrome:
<table id="xyz">
<tbody>
<tr>...</tr>
"
I am not able to retrieve this text.
"
</tbody>
</table>
Running $x("//*[contains(text(),'I am not able to retrieve this text')]"); in the Console tab of Chrome shows no results. I'm able to get text with this command if the text is defined in a div, span, etc. (Case sensitivity is not the problem.)
In the code, that text is appended to the tbody using jQuery('tbody').append( abc() );, and the abc() function returns the text via pqr.html();.
Now my question is: what XPath expression should I write to grab this text? I am looking for a pure XPath expression, i.e., no Java functions etc.

contains() expects a single string as its first argument; when given a node-set such as text(), only the first node is evaluated. An HTML element may have more than one text node child, and according to the sample HTML posted, the first text node of <tbody> consists of just a newline and some spaces, so your XPath didn't consider <tbody> a match and returned no result.
To avoid the problem explained above, apply contains() in a predicate on the individual text nodes, like the following:
//*[text()[contains(.,'I am not able to retrieve this text')]]
or like this if you want to return the text node itself instead of the parent element:
//*/text()[contains(.,'I am not able to retrieve this text')]
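From Selenium's Java bindings, the element-returning form can be passed straight to By.xpath. A minimal sketch, assuming a WebDriver instance named driver is already on the page:
// Find the element that owns the matching text node (the tbody here)
WebElement owner = driver.findElement(
    By.xpath("//*[text()[contains(.,'I am not able to retrieve this text')]]"));
System.out.println(owner.getText());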

That table element is probably within a frame. To access content within a frame you first need to switch to it, using Selenium's switchTo() method. Refer to this answer and this one.
For the same reason it is not working in the Chrome DevTools console. In the Console tab there is a dropdown containing a list of frames. Make sure you select the frame in which the element exists and then execute your XPath.
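A minimal sketch of the switch in Java (the iframe locator here is an assumption; use whatever identifies the frame on your page):
// Enter the frame, locate the element, then return to the top-level document
driver.switchTo().frame(driver.findElement(By.tagName("iframe")));
WebElement owner = driver.findElement(
    By.xpath("//*[text()[contains(.,'I am not able to retrieve this text')]]"));
System.out.println(owner.getText());
driver.switchTo().defaultContent();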

Related

Why does my XPath not select based on text()?

I have a page in Firefox (no frame) which contains the following part of HTML code:
...
<div class="col-sm-6 align-right">
<a href="/efelg/download_zip" class="alert-link">
Download all results in .zip format
</a>
</div>
...
which I want to select with a Selenium XPath expression. To check my XPath expression, I installed a Firefox add-on called 'TryXpath'. However, the expression seems to be incorrect, as no element is selected. Here is the expression:
//a[text()= "Download all results in .zip format"]
But what is wrong with that expression? I found it in different SO answers, but for me it does not seem to work. Why do I get 0 hits? Why does the expression fail to find the HTML element I posted above (no frame, element is visible and clickable...)?
You can try this:
//a[contains(text(),'Download all results in .zip format')]
It is working on my side; please try it and let me know.
The reason your XPath isn't selecting the shown a element is the leading and trailing whitespace surrounding your targeted text. While you could use contains() as the currently upvoted and selected answer does, be aware that it could also match when the targeted string is a substring of what's found in an a element in the HTML -- this may or may not be desirable.
Consider instead using normalize-space() and testing via equality:
//a[normalize-space()='Download all results in .zip format']
This will check that the (space-normalized) string value of a equals the given text.
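In Selenium's Java bindings, for instance, that locator can be used as-is (a sketch, assuming a WebDriver instance named driver is already on the page):
// Exact match against the space-normalized string value of the <a>
WebElement link = driver.findElement(
    By.xpath("//a[normalize-space()='Download all results in .zip format']"));
link.click();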
See also
Testing text() nodes vs string values in XPath

XPath that finds a specific text and only this specific text

I'm using an XPath to locate an element that contains a certain text. My problem is that it also locates another element that has the same text I'm looking for plus some other text. Here is the XPath I'm using:
//a[contains(text(), 'Workflow')]
I want to locate a link that contains the text Workflow and Workflow only, but the XPath also locates a link with Workflow.MAINMENU, which I don't want. Is this possible with XPath?
Yes, this is possible. Don't use the contains() function; instead, compare the text directly:
//a[text() = 'Workflow']
If there is whitespace surrounding the text, you could use:
//a[normalize-space(text()) = 'Workflow']
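A quick way to see the difference from Selenium's Java bindings (a sketch, assuming a WebDriver instance named driver is on a page containing both links):
// contains() matches both the 'Workflow' and the 'Workflow.MAINMENU' links
int loose = driver.findElements(By.xpath("//a[contains(text(), 'Workflow')]")).size();
// the direct comparison matches only the link whose text is exactly 'Workflow'
int exact = driver.findElements(By.xpath("//a[normalize-space(text()) = 'Workflow']")).size();
System.out.println(loose + " match(es) vs " + exact + " match(es)");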

Web scraping without id VBA

I'm trying to scrape a web page. Some elements were easy to get, but I have a problem with those that have no id, like this:
<TABLE class=DisplayMain1 cellSpacing=1 cellPadding=0><TBODY>
<TR class=TitleLabelBig1>
<TD class=Title1 colSpan=100><SPAN style="FONT-FAMILY: arial narrow; FONT-WEIGHT: normal">Tool & </SPAN><BR>PE311934-1-1 </TD></TR></TBODY></TABLE>
I want this ---► PE311934-1-1
I tried "document.getElementsByClassName" but VBA gave me an error.
Any tips?
Use Regular Expressions and the XMLHttpRequest object in VBA
I made an AddIn some time ago that does just that:
http://www.analystcave.com/excel-tools/excel-scrape-html-add/
If you just want the source code then here (GetElementByRegex function):
http://www.analystcave.com/excel-scrape-html-element-id/
Now the actual regex will be quite simple:
</SPAN><BR>(.*?)</TD></TR></TBODY></TABLE>
If it captures too many items, simply expand the regex.
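For illustration, here is that capture group exercised in Java (a sketch against a trimmed-down HTML string; VBA's RegExp object accepts the same pattern):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexScrape {
    public static void main(String[] args) {
        // Trimmed-down stand-in for the page source fetched via XMLHttpRequest
        String html = "<SPAN>Tool & </SPAN><BR>PE311934-1-1 </TD></TR></TBODY></TABLE>";
        // Lazily capture everything between the <BR> and the closing tag run
        Pattern p = Pattern.compile("</SPAN><BR>(.*?)</TD></TR></TBODY></TABLE>");
        Matcher m = p.matcher(html);
        if (m.find()) {
            System.out.println(m.group(1)); // prints: PE311934-1-1 (with a trailing space)
        }
    }
}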
You don't specify the error and there is not enough HTML to know how many elements there are on the page.
You may have forgotten to use an index with document.getElementsByClassName("Title1"), as it returns a collection.
For example, the first item would be: document.getElementsByClassName("Title1")(0)
In the same way, you could use a CSS selector such as .Title1, which says the same thing, i.e., select the elements with the class name "Title1".
For the first instance simply use:
document.querySelector(".Title1")
For a NodeList of all matches, use:
document.querySelectorAll(".Title1")
and then iterate over its length.
You would access the .innerText property of the element, generally, to retrieve the required string.
For the snippet shown, assuming the item is the first .Title1 on the page, the CSS selector retrieves the td element containing "Tool & " and "PE311934-1-1 " from your HTML.
The resultant string can then be processed for what you want. This method, and regex, are fragile at best considering how easily an updated source page can break these methods.
In your above example, you can use the class name, .Title1, and then use Replace() to remove the Tool & .

Selenium automation - finding the best XPath

I am looking to avoid using XPaths that rely on 'XPath position'. The reason being, the XPath can change and fail an automation test if a new object on the page shifts the expected XPath position.
But on some web pages, this is the only xpath I can find. For example, I am looking to click a tab called 'FooBar'.
If I use the Selenium IDE Firefox plugin, I get:
//td[12]/a/font
If I use the FirePath Firefox plugin, I get:
html/body/form/table[2]/tbody/tr/td[12]/font
If a new tab called "Hello, World" is added to the web page (before the FooBar tab), then the FooBar tab's XPath position will change to
//td[13]/a/font
What would you suggest to do?
TY!
Instead of using an absolute XPath you could use a relative XPath, which is shorter and more reliable.
Say
<td id="FooBar" name="FooBar">FooBar</td>
By.id("FooBar");
By.name("FooBar");
By.xpath("//td[text()='FooBar']") //exact match
By.xpath("//td[#id='FooBar']") //with any attribute value
By.xpath("//td[contains(text(),'oBar')]") //partial match with contains function
By.xpath("//td[starts-with(text(),'FooB')]") //partial match with startswith function
This blog post may be useful for you.
A relative XPath is a good idea; a relative CSS selector is even better (faster).
If possible, suggest/request an id for the element.
Also check Chrome -> Inspect element -> Copy CSS/XPath.
Using //td is not a good idea because it will return all your td nodes. Any predicate such as //td[25] will be a very fragile selection, because any td added to any previous table will change its result. Using plugins to generate XPath is great for quickly finding what you want, but it's always best to use it just as a starting point, and then analyze the structure of the file to write a locator that will be harder to break when changes occur.
The best locators are anchored to invariant values or attributes. Plugins usually won't suggest id or attribute anchors; they usually use absolute positional expressions. If you can rewrite your locator path in terms of invariant structures in the file, you can then select the elements or text that you want relative to them.
For example, suppose you have
<body> ...
... lots of code....
<h1>header that has a special word</h1>
... other tags and text but not `h1` ...
<table id="some-id">
...
<td>some-invariant-text</td>
<td>other text</td>
<td>the field that you want</td>
...
The table has an ID. That's the best anchor. Now you can select the table as
//table[@id='some-id']
But many times you don't have the id, or even some other invariant attribute. You can still try to discover a pattern. For example, suppose the last <h1> before the table you want contains a word you can match; you could then find the table using:
//table[preceding::h1[1][contains(.,'word')]]
Once you have the table, you can use relative axes to find the other nodes. Let's assume you want a td but there are no attributes on any tbody, tr, etc. You can still look for some invariant text: tables usually have headers, or some fixed text which you can match. In the example above, if you find a td that is 2 fields before the one that you want, you could use:
//table[preceding::h1[1][contains(.,'word')]]/td[preceding-sibling::td[2][.='some-invariant-text']]
This is a simple example. If you apply some of these suggestions to the file you are working on, you can improve your XPath expression and make your selection code more robust.

Excel VBA: get content from online HTML table

Can anybody please show me the part of VBA code which will get the text "hello" from this example online HTML table? The first node will be found by its ID (id="something").
...
<table id="something">
<tr>
<td><TABLE><TR><TD></TD></TR><TR><TD></TD></TR></TABLE></td><td></td>
</tr>
<tr>
<td></td><td></td><td>hello</td>
</tr>
...
I think it will be something like child->sibling->child->sibling->sibling->child, but I don't know the exact way.
EDIT
The updated code's tags are CAPITALS. So if I use getElementById("something").getElementsByTagName('tr'), does it get only two tr tags into the collection, or four (including tags which are deeper children)?
If you did search for an answer, you might want to broaden your scope next time. There are plenty of questions and answers that deal with DOM stuff and VBA.
Use getElementById on HTMLElement instead of HTMLDocument
While the question (and answers) aren't exactly what you want, it will show you how to create something you can work with.
You'll need to use a mixture of getElementById() and getElementsByTagName() to retrieve your desired "hello".
eg: Document.getElementById("something").getElementsByTagName("tr")(3).getElementsByTagName("td")(2).innerText
Get the element "something"
Inside "something" get all "tr" tags (specifically the one at index 3: the collection also includes the nested table's two TR tags at indices 1 and 2, so the second outer row lands at index 3)
Inside the returned tr tag get all "td" tags (specifically the one at index 2)
Get the innerText of the previous result
These collections use a 0-based index, so the first item is item(0).
Update
document.getElementById() will return a (singular) IHTMLElement (which will include all of its children), or nothing/null if it does not exist.
document.getElementsByTagName() will return a collection of IHTMLElement (again, each element will include all of its children), or an empty collection if none exist.
document.getElementsByTagName("tr") will return all tr elements inside the "document" element, including nested ones - so in your sample, all four TR tags.
document.getElementsByTagName("tr")(0) will return the first (singular) IHTMLElement from the collection. (note the index at the end?)
There is no "sibling" feature (that I could find) of the InternetExplorer object in VBA, so you'd have to do it manually using the child index.
Using the DOM functions is the clean way to do it. It's much clearer than just looking at a chain like "Element.Children(0).Children(1).Children(2)", as you've no idea what each index means without manually looking it up.
I looked all over for the answer to this question, too. I finally found the solution by talking to a coworker, and it was actually through recording a macro.
I know, you all think you are above this, but it is actually the best way. See the full post here: http://automatic-office.com/?p=344
In short, record the macro, go to Data --> From Web, navigate to your website, and select the table you want.
I have used the above solutions, the "get element by id" type stuff, in the past, and they are great for a few elements; but if you want a whole table and you aren't super experienced, just record a macro.
Don't tell your friends, and then reformat it to look like your own work so no one knows you used the macro tool ;)