Get only the inner text from a WebElement - html

I want to get only the inner text of a WebElement: just "Name" from the anchor tag below. I have access to the WebDriver element for that tag (anchorElement).
I tried anchorElement.getText() and anchorElement.getAttribute("innerText"). Both return "Name, sort Z to A". What should I do here?
<a id="am-accessible-userName" href="javascript:void(0);" class="selected">
Name
<span class="util accessible-text">, sort Z to A</span>
<span class="jpui iconwrap sortIcon" id="undefined" tabindex="-1">
<span class="util accessible-text" id="accessible-"></span>
<i class="jpui angleup util print-hide icon" id="icon-undefined" aria-hidden="true"></i>
</span>
</a>

A bit of JavaScript can pick out just the child text node:
RemoteWebDriver driver = ...
WebElement anchorElement = driver.findElement(By.id("am-accessible-userName"));
String rawText = (String) driver.executeScript(
"return arguments[0].childNodes[0].nodeValue;",
anchorElement);
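Here anchorElement is passed into the JavaScript as arguments[0].
childNodes[0] assumes the text node is the first child. If that's not safe, you could iterate over childNodes instead, checking for childNode.nodeName === "#text". Also note that nodeValue keeps the surrounding whitespace (here the line breaks around "Name"), so you will probably want to trim() the result.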
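To see what "iterate the childNodes and keep only #text nodes" means without a browser, here is a minimal sketch that uses the JDK's built-in DOM parser instead of Selenium. It assumes the markup is well-formed XML (real-world HTML often is not) and that the anchor's own text is its first non-blank direct text child. `OwnText`, `firstOwnText` and `firstOwnTextOf` are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OwnText {

    // Returns the first direct child text node of `element` that is not
    // whitespace-only, trimmed, or null if there is none. Text inside nested
    // elements (such as the <span>) is a grandchild and is therefore skipped.
    static String firstOwnText(Node element) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                String value = child.getNodeValue().trim();
                if (!value.isEmpty()) {
                    return value;
                }
            }
        }
        return null;
    }

    // Parses a well-formed XML/XHTML fragment and applies firstOwnText to its root element.
    static String firstOwnTextOf(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return firstOwnText(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String anchor = "<a id=\"am-accessible-userName\" class=\"selected\">\n"
                + "Name\n"
                + "<span class=\"util accessible-text\">, sort Z to A</span>\n"
                + "</a>";
        System.out.println(firstOwnTextOf(anchor)); // prints Name
    }
}
```

The same loop-and-filter logic is what the JavaScript versions in this thread run inside the browser; the Java version only makes it easy to try on a fixed snippet.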

As per the HTML, the desired text is a text node and also the first child node of the <a> tag. So to extract the text Name you can use the following code block:
Java binding:
WebElement myElement = driver.findElement(By.xpath("//a[@class='selected' and @id='am-accessible-userName']"));
String myText = ((String) ((JavascriptExecutor) driver).executeScript("return arguments[0].firstChild.textContent;", myElement)).trim();
System.out.println(myText);

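The text alone can be obtained with JavaScript that iterates through the child nodes of the given WebElement and returns the value of the first one that is a text node.
Note: whitespace-only text nodes are skipped, and a trimmed value is returned (or null if the element has no text of its own).
public String getInnerText(WebDriver driver, String xpathStr) {
    WebElement ele = driver.findElement(By.xpath(xpathStr));
    // Index-based loop: for...in over a NodeList would also visit properties such as "length"
    String text = (String) ((JavascriptExecutor) driver).executeScript(
        "var children = arguments[0].childNodes;\n" +
        "for (var i = 0; i < children.length; i++) {\n" +
        "  if (children[i].nodeType === Node.TEXT_NODE && children[i].nodeValue.trim() !== '') {\n" +
        "    return children[i].nodeValue;\n" +
        "  }\n" +
        "}\n" +
        "return null;", ele);
    return text == null ? null : text.trim();
}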

Related

Libgdx: How to show HTML text in a label?

I have a string like this:
"noun<br> an expression of greeting <br>- every morning they exchanged polite hellos<br> <font color=dodgerblue> ••</font> Syn: hullo, hi, howdy, how-do-you-do<be>"
I want to show it in a label as rich text. For example, instead of <br> tags, the text must go to the next line.
In Android we can do that with:
Html.fromHtml(myHtmlString)
but I don't know how to do it in libgdx.
I tried Jsoup, but it removes all the tags and, for example, does not go to the next line for a <br> tag:
Jsoup.parse(myHtmlString).text()
Jsoup.parse returns a Document containing many elements, not a single string, so you are only seeing the first part. You can assemble the complete string yourself by walking through the elements, or try:
Document doc = Jsoup.parse(yourHtmlInput);
String htmlString = doc.toString();
String htmlText = "<p>This is an <strong>Example</strong></p>";
//this will convert your HTML text into normal text
String normalText = Jsoup.parse(htmlText).text();
In Kotlin I use this code:
var definition = "my html string"
definition = definition.replace("<br>", "\n")
definition = definition.replace("<[^>]*>".toRegex(), "")
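For a Java project, a rough equivalent of the Kotlin approach above is sketched below. It is a regex-based sketch for simple, trusted strings like the one in the question, not a real HTML parser: it also accepts the <br/> and <br /> spellings, and it leaves entities such as &amp; undecoded. `HtmlToPlainText` and `toPlainText` are hypothetical names.

```java
import java.util.regex.Pattern;

public class HtmlToPlainText {

    // <br>, <br/> and <br /> in any letter case.
    private static final Pattern BR = Pattern.compile("(?i)<br\\s*/?>");
    // Any other tag, e.g. <font color=dodgerblue> or </font>.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Turns line-break tags into '\n', then removes every remaining tag.
    // Entities such as &amp; are NOT decoded.
    static String toPlainText(String html) {
        String withNewlines = BR.matcher(html).replaceAll("\n");
        return TAG.matcher(withNewlines).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "noun<br> an expression of greeting <br>- every morning they "
                + "exchanged polite hellos<br> <font color=dodgerblue> Syn:</font> hullo, hi";
        System.out.println(toPlainText(html));
    }
}
```

The order matters: the <br> tags are converted first, because the second pattern would otherwise delete them along with every other tag.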

How to use the text between HTML tags to access an element - Selenium WebDriver

I have the following HTML code:
<span class="ng-binding" ng-bind="::result.display">All Sector ETFs</span>
<span class="ng-binding" ng-bind="::result.display">China Macro Assets</span>
<span class="ng-binding" ng-bind="::result.display">Consumer Discretionary (XLY)</span>
<span class="ng-binding" ng-bind="::result.display">Consumer Staples (XLP)</span>
As can be seen, the tags are the same for every line except for the text between them.
How can I access each of the above lines separately, based on the text between the tags?
Use the following as the XPath:
//span[text()='All Sector ETFs']
You can use the XPath function text() for that.
For example,
//span[text()="All Sector ETFs"]
finds the first span.
You can use the following XPath to find the desired element based on its text:
String text = "Your text";
// text may be: All Sector ETFs, China Macro Assets, Consumer Discretionary (XLY), Consumer Staples (XLP)
String xPath = "//*[contains(text(),'" + text + "')]";
This way you can find each of the elements.
Hope it helps. :)
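The text() and contains() predicates above can be checked outside Selenium with the JDK's javax.xml.xpath API; the same expression strings could then be passed to By.xpath(...). This sketch assumes the spans are wrapped in a root element so they parse as XML, and `SpanByText`/`count` are hypothetical names. Two caveats it illustrates: contains() matches substrings, and building an XPath by string concatenation breaks if the text itself contains an apostrophe.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SpanByText {

    // The spans from the question, wrapped in a root element so they parse as XML.
    static final String HTML = "<div>"
            + "<span class=\"ng-binding\">All Sector ETFs</span>"
            + "<span class=\"ng-binding\">China Macro Assets</span>"
            + "<span class=\"ng-binding\">Consumer Discretionary (XLY)</span>"
            + "<span class=\"ng-binding\">Consumer Staples (XLP)</span>"
            + "</div>";

    // Returns how many nodes `expression` selects in HTML.
    static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(HTML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Exact match on the span's own text: selects one span.
        System.out.println(count("//span[text()='All Sector ETFs']")); // 1
        // contains() matches substrings, so both "Consumer ..." spans are selected.
        System.out.println(count("//span[contains(text(),'Consumer')]")); // 2
    }
}
```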
Hi, please do it like below:
Way One
import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class SpanExample {
    public static void main(String[] args) {
        WebDriver driver = new FirefoxDriver();
        // open the page first with driver.get(...)
        // this XPath matches every span from the question
        List<WebElement> mySpanTags = driver.findElements(By.xpath("//span[@ng-bind='::result.display']"));
        System.out.println("Count the number of total tags : " + mySpanTags.size());
        // print the value of the tags one by one
        // or do whatever you want to do with a specific tag
        for (int i = 0; i < mySpanTags.size(); i++) {
            System.out.println("Value in the tag is : " + mySpanTags.get(i).getText());
            // either perform the next operation inside this for loop
            if (mySpanTags.get(i).getText().equals("Consumer Staples (XLP)")) {
                // perform your operation here
                mySpanTags.get(i).click(); // clicks on the span tag
            }
        }
        // or perform the next operations on a span tag here, outside the for loop;
        // in this case use the index of a specific tag (e.g. below)
        mySpanTags.get(3).click(); // clicks on the 4th span tag
    }
}
Way Two
Find the tag directly with the XPath //span[text()='Consumer Staples (XLP)'].

Parsing string retrieved with Jsoup in Android

I am writing an Android app that reads some info from a website and displays it on the app's screen. I am using the Jsoup library to get the info in the form of a string. First, here's what the website HTML looks like:
<strong>
Now is the time<br />
For all good men<br />
To come to the aid<br />
Of their country<br />
</strong>
Here's how I'm retrieving and trying to parse the text:
Document document = Jsoup.connect(WEBSITE_URL).get();
resultAggregator = "";
Elements nodePhysDon = document.select("strong");
//check results
if (nodePhysDon.size()> 0) {
//get value
donateResult = nodePhysDon.get(0).text();
resultAggregator = donateResult;
}
if (!resultAggregator.isEmpty()) {
// split resultAggregator into an array breaking up with br /
String donateItems[] = resultAggregator.split("<br />");
}
But then donateItems[0] is not just "Now is the time"; it's all four strings put together. I have also tried without the space between "br" and "/", and get the same result. If I do resultAggregator.split("br"); then donateItems[0] is just the first word: "Now".
I suspect the problem is that Jsoup's select method is stripping the tags out?
Any suggestions? I can't change the website's html. I have to work with it as is.
Try this:
//check results
if (nodePhysDon.size()> 0) {
//use toString() to get the selected block with tags included
donateResult = nodePhysDon.get(0).toString();
resultAggregator = donateResult;
}
if (!resultAggregator.isEmpty()) { // isEmpty() checks the contents; != "" compares references
// remove <strong> and </strong> tags
resultAggregator = resultAggregator.replace("<strong>", "");
resultAggregator = resultAggregator.replace("</strong>", "");
//then split with <br>
String donateItems[] = resultAggregator.split("<br>");
}
Make sure to split with <br> and not <br />: Jsoup's toString() writes the tag out as <br>, whatever spelling the original page used.
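Instead of relying on one exact <br> spelling, a more tolerant variant splits on a regular expression that accepts <br>, <br/> and <br />. This is a minimal plain-Java sketch, assuming the input is the HTML string of the <strong> block (for example the toString() result used in the answer above); `SplitOnBr` and `lines` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitOnBr {

    // Removes the <strong> wrapper, splits on <br>, <br/> or <br /> (any case),
    // trims every piece and drops the empty ones.
    static List<String> lines(String html) {
        String body = html.replaceAll("(?i)</?strong>", "");
        List<String> result = new ArrayList<>();
        for (String part : body.split("(?i)<br\\s*/?>")) {
            String line = part.trim();
            if (!line.isEmpty()) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<strong>\nNow is the time<br />\nFor all good men<br>\n"
                + "To come to the aid<br>\nOf their country<br>\n</strong>";
        // prints [Now is the time, For all good men, To come to the aid, Of their country]
        System.out.println(lines(html));
    }
}
```

Trimming and skipping empty pieces takes care of the newlines around each <br> and the trailing fragment after the last one.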

How to exclude a nested SPAN class from XPath

From the following HTML code, I want to select only the text of the first span:
<span class="item_amount order_minibasket_amount order_full_minibasket">10
<span class="article">Article
<i class="icon"> >
</i>
</span>
</span>
This is my current XPath:
//span[contains(@class,'order_minibasket_amount')]
When I use this in my Selenium test, I get the whole span text, like:
10 Article >
I just want to get the article amount, "10".
AMOUNT(new PageElement(By.xpath("//span[contains(@class,'order_minibasket_amount')]/text()[1]"), "not such Element...."))
public String getAmount() {
return amount = PageObjectUtil.findAndInitElementInside(webElement, PageElements.AMOUNT.pe, amount, String.class);
}
Many thanks in advance,
Cheers,
koko
What you want cannot be done directly; you will have to resort to string manipulation. Something like:
String completeString = driver.findElement(By.className("item_amount")).getText();
String endString = driver.findElement(By.className("article")).getText();
String beginString = completeString.replace(endString, "").trim();
You could add /text() to the end:
//span[contains(@class,'order_minibasket_amount')]/text()
Instead of selecting the span element node, this XPath selects the set of text nodes that are direct children of the span element. This should be a set of two nodes: one containing "10", a newline and four spaces (the text between the opening tag of the target span and the opening tag of the nested span), and the other containing just a newline (between the closing tags of the two spans). If you only want the first text node child (10, newline, spaces), then use:
//span[contains(@class,'order_minibasket_amount')]/text()[1]
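The difference between selecting the element and selecting its first text child can be seen with the JDK's XPath engine, using the snippet from the question. One caveat, to the best of my knowledge: Selenium's findElement only accepts XPath expressions that select elements, so an expression ending in /text()[1] is typically rejected there as an invalid selector, which is why the string-manipulation answer falls back to getText(). This is a sketch outside Selenium; `FirstTextChild` and `stringValue` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FirstTextChild {

    // The markup from the question (the ">" inside <i> is escaped as &gt;).
    static final String HTML =
            "<span class=\"item_amount order_minibasket_amount order_full_minibasket\">10\n"
            + "    <span class=\"article\">Article\n"
            + "        <i class=\"icon\"> &gt;\n"
            + "        </i>\n"
            + "    </span>\n"
            + "</span>";

    // Evaluates `expression` as a string (XPath 1.0: the string value of the
    // first selected node) and trims the surrounding whitespace.
    static String stringValue(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(HTML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc).trim();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the first direct text child of the outer span.
        System.out.println(stringValue(
                "//span[contains(@class,'order_minibasket_amount')]/text()[1]")); // 10
        // The element itself: its string value includes all the nested text too.
        System.out.println(stringValue(
                "//span[contains(@class,'order_minibasket_amount')]"));
    }
}
```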
For now I am using a workaround, but I am not happy with it. :-(
public String getAmount() {
String tempAmount = PageObjectUtil.waitFindAndInitElement(PageElements.AMOUNT.pe).getText();
String output = tempAmount.replaceAll("[a-zA-Z->]", "");
return amount = output.trim();
}
cheers,
KoKo

Retrieve attributes and span using HTMLAgilityPack library

In this piece of HTML code:
<div class="item">
<div class="thumb">
<a href="http://www.mp3crank.com/wolf-eyes/lower-demos-121866" rel="bookmark" lang="en" title="Wolf Eyes - Lower Demos album downloads">
<img width="100" height="100" alt="Mp3 downloads Wolf Eyes - Lower Demos" title="Free mp3 downloads Wolf Eyes - Lower Demos" src="http://www.mp3crank.com/cover-album/Wolf-Eyes-–-Lower-Demos.jpg" /></a>
</div>
<div class="release">
<h3>Wolf Eyes</h3>
<h4>
Lower Demos
</h4>
<script src="/ads/button.js"></script>
</div>
<div class="release-year">
<p>Year</p>
<span>2013</span>
</div>
<div class="genre">
<p>Genre</p>
Rock
Pop
</div>
</div>
I know how to parse it in other ways, but I would like to retrieve this info using the HtmlAgilityPack library:
Title : Wolf Eyes - Lower Demos
Cover : http://www.mp3crank.com/cover-album/Wolf-Eyes-–-Lower-Demos.jpg
Year : 2013
Genres: Rock, Pop
URL : http://www.mp3crank.com/wolf-eyes/lower-demos-121866
These values come from the following HTML lines:
Title : title="Wolf Eyes - Lower Demos"
Cover : src="http://www.mp3crank.com/cover-album/Wolf-Eyes-–-Lower-Demos.jpg"
Year : <span>2013</span>
Genre1: Rock
Genre2: Pop
URL : href="http://www.mp3crank.com/wolf-eyes/lower-demos-121866"
This is what I'm trying, but I always get an "Object reference not set to an instance of an object" exception when trying to select a single node.
Sorry, I'm very new to HTML; I've tried to follow the steps of this question: HtmlAgilityPack basic how to get title and link?
Public Class Form1
Private htmldoc As HtmlAgilityPack.HtmlDocument = New HtmlAgilityPack.HtmlDocument
Private htmlnodes As HtmlAgilityPack.HtmlNodeCollection = Nothing
Private Title As String = String.Empty
Private Cover As String = String.Empty
Private Genres As String() = {String.Empty}
Private Year As Integer = 0
Private URL as String = String.Empty
Private Sub Test() Handles MyBase.Shown
' Load the html document.
htmldoc.LoadHtml(IO.File.ReadAllText("C:\source.html"))
' Select the (10 items) nodes.
htmlnodes = htmldoc.DocumentNode.SelectNodes("//div[@class='item']")
' Loop through the nodes.
For Each node As HtmlAgilityPack.HtmlNode In htmlnodes
Title = node.SelectSingleNode("//div[@class='release']").Attributes("title").Value
Cover = node.SelectSingleNode("//div[@class='thumb']").Attributes("src").Value
Year = CInt(node.SelectSingleNode("//div[@class='release-year']").Attributes("span").Value)
Genres = ¿select multiple nodes?
URL = node.SelectSingleNode("//div[@class='release']").Attributes("href").Value
Next
End Sub
End Class
Your mistake here is trying to access an attribute of a child node from the node you've found.
When you call node.SelectSingleNode("//div[@class='release']") you get the correct div returned, but calling .Attributes returns only the attributes of the div tag itself, not those of any of the inner HTML elements.
It's possible to write XPath queries that select the sub-node, e.g. //div[@class='release']/a; see http://www.w3schools.com/xpath/xpath_syntax.asp for more information on XPath. Although the examples are for XML, most of the principles should apply to an HTML document.
Another approach is to use further XPATH calls on the node you've found. I've amended your code to make it work using this approach:
' Load the html document.
htmldoc.LoadHtml(IO.File.ReadAllText("C:\source.html"))
' Select the (10 items) nodes.
htmlnodes = htmldoc.DocumentNode.SelectNodes("//div[@class='item']")
' Loop through the nodes.
For Each node As HtmlAgilityPack.HtmlNode In htmlnodes
Dim releaseNode = node.SelectSingleNode(".//div[@class='release']")
'Assumes we find the node and it has an a-tag
Title = releaseNode.SelectSingleNode(".//a").Attributes("title").Value
URL = releaseNode.SelectSingleNode(".//a").Attributes("href").Value
Dim thumbNode = node.SelectSingleNode(".//div[@class='thumb']")
Cover = thumbNode.SelectSingleNode(".//img").Attributes("src").Value
Dim releaseYearNode = node.SelectSingleNode(".//div[@class='release-year']")
Year = CInt(releaseYearNode.SelectSingleNode(".//span").InnerText)
Dim genreNode = node.SelectSingleNode(".//div[@class='genre']")
Dim genreLinks = genreNode.SelectNodes(".//a")
Genres = (From n In genreLinks Select n.InnerText).ToArray()
Console.WriteLine("Title : {0}", Title)
Console.WriteLine("Cover : {0}", Cover)
Console.WriteLine("Year : {0}", Year)
Console.WriteLine("Genres: {0}", String.Join(",", Genres))
Console.WriteLine("URL : {0}", URL)
Next
Note that in this code we're assuming the document is correctly formed and that each node/element/attribute exists and is correct. You might want to add a lot of error checking to this, e.g. If someNode Is Nothing Then ....
Edit: I've amended the code above slightly, to ensure each .SelectSingleNode uses the ".//" prefix - this ensures it works if there are several "item" nodes, otherwise it selects the first match from the document not the current node.
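The effect of the ".//" prefix described in the edit can be reproduced with the JDK's XPath engine on two simplified item blocks (an assumption: the real page has much more markup). With the item div as the context node, an expression starting with // still searches from the document root, while .// searches only below the current item. `RelativeXPath` and `perItem` are hypothetical names, and Java is used here only to illustrate the XPath semantics, not HtmlAgilityPack itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelativeXPath {

    // Two simplified "item" blocks with different years.
    static final String XML = "<root>"
            + "<div class=\"item\"><div class=\"release-year\"><span>2013</span></div></div>"
            + "<div class=\"item\"><div class=\"release-year\"><span>2014</span></div></div>"
            + "</root>";

    // Evaluates `expression` once per item div, with that div as the context node.
    static List<String> perItem(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList items = (NodeList) xpath.evaluate("//div[@class='item']", doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                values.add(xpath.evaluate(expression, items.item(i)));
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "//" starts at the document root, so every item sees the first span.
        System.out.println(perItem("//div[@class='release-year']/span"));   // [2013, 2013]
        // ".//" starts at the current item.
        System.out.println(perItem(".//div[@class='release-year']/span"));  // [2013, 2014]
    }
}
```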
If you want a shorter XPATH solution, here is the same code using that approach:
' Load the html document.
htmldoc.LoadHtml(IO.File.ReadAllText("C:\source.html"))
' Select the (10 items) nodes.
htmlnodes = htmldoc.DocumentNode.SelectNodes("//div[@class='item']")
' Loop through the nodes.
For Each node As HtmlAgilityPack.HtmlNode In htmlnodes
Title = node.SelectSingleNode(".//div[@class='release']/h4/a[@title]").Attributes("title").Value
URL = node.SelectSingleNode(".//div[@class='release']/h4/a[@href]").Attributes("href").Value
Cover = node.SelectSingleNode(".//div[@class='thumb']/a/img[@src]").Attributes("src").Value
Year = CInt(node.SelectSingleNode(".//div[@class='release-year']/span").InnerText)
Dim genreLinks = node.SelectNodes(".//div[@class='genre']/a")
Genres = (From n In genreLinks Select n.InnerText).ToArray()
Console.WriteLine("Title : {0}", Title)
Console.WriteLine("Cover : {0}", Cover)
Console.WriteLine("Year : {0}", Year)
Console.WriteLine("Genres: {0}", String.Join(",", Genres))
Console.WriteLine("URL : {0}", URL)
Console.WriteLine()
Next
You were not that far from the solution. Two important notes:
// selects descendants at any depth (the descendant-or-self axis), not just direct children. It can have a noticeable performance impact and may also select nodes you don't want, so I suggest using it only when the hierarchy is deep, complex or variable and you don't want to spell out the whole path.
There is a useful helper method on HtmlNode named GetAttributeValue that gets you an attribute value even if the attribute does not exist (you specify the default value to return).
Here is a sample that seems to work:
' select the base/parent DIV (here we use a discriminant CLASS attribute)
' all select calls below will use this DIV element as a starting point
Dim node As HtmlNode = htmldoc.DocumentNode.SelectSingleNode("//div[@class='item']")
' get to the A tag which is a child or grand child (//) of a 'release' DIV
Console.WriteLine(("Title :" & node.SelectSingleNode("div[@class='release']//a").GetAttributeValue("title", CStr(Nothing))))
' get to the IMG tag which is a child or grand child (//) of a 'thumb' DIV
Console.WriteLine(("Cover :" & node.SelectSingleNode("div[@class='thumb']//img").GetAttributeValue("src", CStr(Nothing))))
' get to the SPAN tag which is a child or grand child (//) of a 'release-year' DIV
Console.WriteLine(("Year :" & node.SelectSingleNode("div[@class='release-year']//span").InnerText))
' get all A elements which are child or grand child(//) of a 'genre' DIV
Dim nodes As HtmlNodeCollection = node.SelectNodes("div[@class='genre']//a")
Dim i As Integer
For i = 0 To nodes.Count - 1
Console.WriteLine(String.Concat(New Object() { "Genre", (i + 1), ":", nodes.Item(i).InnerText }))
Next i
' get to the A tag which is a child or grand child (//) of a 'release' DIV
Console.WriteLine(("Url :" & node.SelectSingleNode("div[@class='release']//a").GetAttributeValue("href", CStr(Nothing))))
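GetAttributeValue is an HtmlAgilityPack (.NET) method. For comparison only, a Java DOM analogue is sketched below: org.w3c.dom's Element.getAttribute returns an empty string when the attribute is missing, so the hypothetical attributeOrDefault helper checks hasAttribute first and falls back to a caller-supplied default. `AttributeOrDefault`, `attributeOrDefault` and `parseRoot` are illustrative names, not part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AttributeOrDefault {

    // Hypothetical helper in the spirit of GetAttributeValue: returns the
    // attribute's value, or `fallback` when the attribute is absent.
    // (Element.getAttribute alone would return "" for a missing attribute.)
    static String attributeOrDefault(Element element, String name, String fallback) {
        return element.hasAttribute(name) ? element.getAttribute(name) : fallback;
    }

    // Parses a well-formed fragment and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Element img = parseRoot("<img width=\"100\" height=\"100\" src=\"cover.jpg\"/>");
        System.out.println(attributeOrDefault(img, "src", null));           // cover.jpg
        System.out.println(attributeOrDefault(img, "title", "(no title)")); // (no title)
    }
}
```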