How to read multiple subtitles using XPath? - html

I want to read the top stories links from the CNN news site using XPath in Selenium. My XPath is shown below.
text = ieDriver.findElement(By.xpath("//*[@id='intl_homepage1-zone-1']/div[2]/div/div[2]/ul/article[1]/div/div[2]/h3/a/span[1]/strong")).getText();
It reads only one subheading, but I want to read all the top stories headings. How can I do that?
I know that if I change the index to article[2], article[3] ... article[i] it will read the others. Is there any way to read them all using a single XPath?

A single XPath that fetches the top stories links is:
//div[contains(@data-analytics,'Top stories_list-hierarchical')]//h3[@class='cd__headline']/a/span[1]
First store the matching elements in a list. For that, use findElements rather than findElement, because findElement returns only the first matching element.
Then iterate over the list and print the text of each WebElement.
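The difference between an indexed step and an unindexed one can be sketched without a browser. This is a minimal illustration, assuming a small hypothetical XML fragment in place of the CNN page and using the JDK's javax.xml.xpath instead of Selenium; the class name HeadlineXPathDemo, the PAGE sample, and the headlines helper are made up for this example. In Selenium the unindexed expression would be passed to findElements in the same way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class HeadlineXPathDemo {
    // Hypothetical stand-in for the page: three articles, one headline link each.
    static final String PAGE =
        "<ul>"
        + "<article><h3><a>First story</a></h3></article>"
        + "<article><h3><a>Second story</a></h3></article>"
        + "<article><h3><a>Third story</a></h3></article>"
        + "</ul>";

    // Returns the text of every node the expression matches, in document order.
    static List<String> headlines(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(PAGE)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(headlines("/ul/article[1]/h3/a")); // indexed step: one headline
        System.out.println(headlines("/ul/article/h3/a"));    // no index: every headline
    }
}
```

Dropping the [1] predicate is what turns a single match into a list; the rest of the path stays the same.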

First fetch all the WebElements into a list using findElements; then iterate over it and read each link.
List<WebElement> articleList = ieDriver.findElements(By.xpath("//*[@id='intl_homepage1-zone-1']/div[2]/div/div[2]/ul/article"));
for (WebElement article : articleList)
{
    // The leading "./" makes the lookup relative to the current article element.
    String text = article.findElement(By.xpath("./div/div[2]/h3/a/span[1]/strong")).getText();
}
In this way you can iterate over the whole list of headings.
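Why the inner expression starts with "./" can be shown with a small sketch. It assumes a hypothetical three-article XML fragment and uses the JDK's javax.xml.xpath rather than Selenium; RelativeXPathDemo, PAGE, and headlinePerArticle are names invented for this illustration. An expression that starts with "//" is resolved from the document root even when it is evaluated against a single article, so it keeps returning the first headline.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RelativeXPathDemo {
    // Hypothetical stand-in for the page: three articles, one headline link each.
    static final String PAGE =
        "<ul>"
        + "<article><h3><a>First story</a></h3></article>"
        + "<article><h3><a>Second story</a></h3></article>"
        + "<article><h3><a>Third story</a></h3></article>"
        + "</ul>";

    // Evaluates innerExpression once per <article>, with that article as the context node.
    static List<String> headlinePerArticle(String innerExpression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList articles = (NodeList) xpath.evaluate(
                "/ul/article", new InputSource(new StringReader(PAGE)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < articles.getLength(); i++) {
                // The string result is the text of the first node the expression selects.
                texts.add(xpath.evaluate(innerExpression, articles.item(i)));
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + innerExpression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(headlinePerArticle("./h3/a")); // relative: one headline per article
        System.out.println(headlinePerArticle("//h3/a")); // absolute: first headline every time
    }
}
```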
P.S. Prefer relative XPaths over absolute XPaths.
Hope this answer was helpful for you

Related

How to select all elements with a specific name under every li node with the same structure?

I have a certain bunch of XPath locators that hold the elements I want to extract, and they have a similar structure:
/div/ul/li[1]/div/div[2]/a
/div/ul/li[2]/div/div[2]/a
/div/ul/li[3]/div/div[2]/a
...
They are simplified from a Pixiv user page. Each /div/div[2]/a element has a title string, so they are artwork titles.
I want to use a single expression to fetch all of the above a elements in a WebExtension called PageProbe. I've tried a number of approaches, but none returns the wanted result.
However, the following expression does return all the a elements, including the ones I don't need.
/div/
The following expression returns the a element under only the first li item.
/div/ul/li/div/div[2]/a
Sorry for not providing enough info earlier. Hope someone can help me out. Thanks.
According to the information you gave here, you can simply use this XPath:
/div/ul/li/div/div[2]/a
However, I'm quite sure there should be a better locator based on other attributes such as class names.

Xpath that find a specific text and only this specific text

I'm using an XPath to locate an element that contains a certain text. My problem is that it also locates another element that contains the text I'm looking for plus some other text. The XPath I'm using is:
//a[contains(text(), 'Workflow')]
I want to locate a link that contains the text Workflow and Workflow only,
but the XPath also locates a link with Workflow.MAINMENU, which I don't want.
Is this possible with XPath?
Yes, this is possible. You need to not use the contains function, but to instead compare the text directly:
//a[text() = 'Workflow']
If there is whitespace surrounding the text, you could use:
//a[normalize-space(text()) = 'Workflow']
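The three predicates can be compared side by side in a short sketch. It assumes a hypothetical fragment with three links (an exact match, a longer text, and a match padded with whitespace) and uses the JDK's javax.xml.xpath; ExactTextXPathDemo, PAGE, and count are names invented for this example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ExactTextXPathDemo {
    // Hypothetical menu: exact text, longer text, and exact text padded with spaces.
    static final String PAGE =
        "<nav>"
        + "<a>Workflow</a>"
        + "<a>Workflow.MAINMENU</a>"
        + "<a>  Workflow  </a>"
        + "</nav>";

    // Number of nodes in PAGE that the expression selects.
    static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(PAGE)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//a[contains(text(), 'Workflow')]"));         // 3: substring match
        System.out.println(count("//a[text() = 'Workflow']"));                  // 1: exact match only
        System.out.println(count("//a[normalize-space(text()) = 'Workflow']")); // 2: padding ignored
    }
}
```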

Parsing inconsistent HTML with XPath

I'm trying to gather information from a web page that has inconsistent HTML, for example:
<ul><li>Item #1</li></ul><ul><li>sub Item #1</li></ul>
and that's alright; I use the XPath expression
//div[@id="content"]/ul/li/text()
and it does the job (except that it doesn't gather the information from sub Item #1).
The HTML also varies; this is another form:
<dl><dd><ul><li>Item #1</li></ul></dd></dl><dl><dd><ul><li>sub Item #1</li></ul></dd></dl>
I'm trying to gather Item #1 and sub Item #1, but with this inconsistent HTML I can't find an XPath expression that gathers the information in every case. Could you help me with this?
There will always be a list; Item #1 and sub Item #1 will always be inside a <ul><li>.
You could try using the descendant-or-self abbreviation (//) to select ul/li/text() no matter how deeply it is nested within a consistent ancestor. For example, assuming that the ancestor of ul/li is always a div whose id equals "content":
//div[@id="content"]//ul/li/text()
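A minimal sketch of the effect, assuming both HTML variants from the question are wrapped in a div with id "content" (the wrapper is inferred from the question's XPath, not shown in it). It uses the JDK's javax.xml.xpath; NestedListXPathDemo and its members are names invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NestedListXPathDemo {
    // The two variants from the question, inside an assumed <div id="content">.
    static final String FLAT =
        "<div id=\"content\"><ul><li>Item #1</li></ul><ul><li>sub Item #1</li></ul></div>";
    static final String NESTED =
        "<div id=\"content\"><dl><dd><ul><li>Item #1</li></ul></dd></dl>"
        + "<dl><dd><ul><li>sub Item #1</li></ul></dd></dl></div>";

    static final String DIRECT = "//div[@id='content']/ul/li/text()";   // ul must be a child
    static final String ANYWHERE = "//div[@id='content']//ul/li/text()"; // ul at any depth

    // Returns the text nodes the expression selects from the given markup.
    static List<String> items(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getNodeValue());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(items(NESTED, DIRECT));   // empty: ul is not a direct child here
        System.out.println(items(FLAT, ANYWHERE));   // both items
        System.out.println(items(NESTED, ANYWHERE)); // both items
    }
}
```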

Excel VBA: get content from online HTML table

Can anybody please show me a piece of VBA code that gets the text "hello" from this example online HTML table? The first node will be found by its ID (id="something").
...
<table id="something">
<tr>
<td><TABLE><TR><TD></TD></TR><TR><TD></TD></TR></TABLE></td><td></td>
</tr>
<tr>
<td></td><td></td><td>hello</td>
</tr>
...
I think it will be something like child->sibling->child->sibling->sibling->child, but I don't know the exact way.
EDIT
In the updated code, the added tags are in CAPITALS. So if I use getElementById("something").getElementsByTagName("tr"), does the collection contain only two tr tags, or four (including the tags that are deeper children)?
If you did search for an answer, you might want to broaden your scope next time. There are plenty of questions and answers that deal with DOM stuff and VBA.
Use getElementById on HTMLElement instead of HTMLDocument
While the question (and answers) aren't exactly what you want, it will show you how to create something you can work with.
You'll need to use a mixture of getElementById() and getElementsByTagName() to retrieve your desired "hello"
e.g.: Document.getElementById("something").getElementsByTagName("tr")(1).getElementsByTagName("td")(2).innerText
Get the element "something"
Inside "something" get all "tr" tags (specifically the one at index 1)
Inside the returned tr tag get all "td" tags (specifically the one at index 2)
Get the innerText of the previous result
These objects use a 0 based array so the first item is item(0).
Update
document.getElementById() will return a single IHTMLElement (which will include all of its children), or nothing/null if it does not exist.
document.getElementsByTagName() will return a collection of IHTMLElement (again, each element will include all of its children), or an empty collection if none exist.
document.getElementsByTagName("tr") this will return all tr elements inside the "document" element.
document.getElementsByTagName("tr")(0) will return the first (singular) IHTMLElement from the collection. (note the index at the end?)
There is no "sibling" feature (that I could find) of the InternetExplorer object in VBA, so you'd have to do it manually using the child index.
Using the DOM functions is the clean way to do it. It's much clearer than looking at a chain like "Element.Children(0).children(1).children(2)", where you have no idea what each index means without manually looking it up.
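The question in the EDIT (two rows or four?) can be checked against the standard DOM behaviour, where getElementsByTagName collects every matching descendant, not only direct children. The sketch below is an illustration in Java with the JDK's org.w3c.dom, not VBA; it assumes the sample table is written as well-formed XML with all tag names in lowercase (an XML parser is case-sensitive, unlike HTML) and that no tbody elements are inserted, as a browser might do. TableDomDemo and its members are names invented for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class TableDomDemo {
    // The question's table, lowercased and closed so that it parses as XML.
    static final String PAGE =
        "<table id=\"something\">"
        + "<tr><td><table><tr><td></td></tr><tr><td></td></tr></table></td><td></td></tr>"
        + "<tr><td></td><td></td><td>hello</td></tr>"
        + "</table>";

    // Parses PAGE and returns the outer table (the first <table> in document order).
    static Element outerTable() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(PAGE)));
            return (Element) doc.getElementsByTagName("table").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample table", e);
        }
    }

    // Number of <tr> elements anywhere below the outer table, nested ones included.
    static int rowCount() {
        return outerTable().getElementsByTagName("tr").getLength();
    }

    // Text of the cell at (rowIndex, cellIndex), both counted over all descendants.
    static String cellText(int rowIndex, int cellIndex) {
        Element row = (Element) outerTable().getElementsByTagName("tr").item(rowIndex);
        return row.getElementsByTagName("td").item(cellIndex).getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(rowCount());     // 4: two outer rows plus two nested rows
        System.out.println(cellText(3, 2)); // hello
    }
}
```

In this sample the nested rows sit between the two outer rows in document order, so the row holding "hello" is at index 3 of the collection, and index 1 is the first nested row. The index to use therefore depends on whether nested tables are present.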
I looked all over for the answer to this question, too. I finally found the solution by talking to a coworker which was actually through recording a macro.
I know, you all think you are above this, but it is actually the best way. See the full post here: http://automatic-office.com/?p=344
In short, you want to record the macro and go to data --> from web and navigate to your website and select the table you want.
I have used the above solutions "get element by id" type stuff in the past, and it is great for a few elements, but if you want a whole table, and you aren't super experienced, just record a macro.
Don't tell your friends, and then reformat it to look like your own work so no one knows you used the macro tool ;)

Get Image with Xpath using class of Div

How do I write the xpath to get the main news image in this article?
The below one failed for me.
//div[contains(@class,'sectionColumns')]//div[contains(@class,'column2']//*img"]
I want it to return all images in the case of a slideshow. I want it to be flexible, as some classes change when the news changes.
Without looking at "this article", there is an obvious syntax error in your XPath expression:
//div[contains(@class,'sectionColumns')]//div[contains(@class,'column2']//*img"]
The substring of the above, *img", contains two errors -- * followed by a name, and an unbalanced quote. The second contains() call is also missing its closing parenthesis.
Probably you want:
//div[contains(@class,'sectionColumns')]//div[contains(@class,'column2')]//img
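A quick way to catch this kind of syntax error before running it against a page is to compile the expression first. The sketch below assumes a hypothetical article layout (the real page is not shown in the question) and uses the JDK's javax.xml.xpath; ClassContainsXPathDemo, PAGE, isValid, and imageSources are names invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ClassContainsXPathDemo {
    // Hypothetical layout: a side column with one image, a main column with two.
    static final String PAGE =
        "<div class=\"sectionColumns wide\">"
        + "<div class=\"column1\"><img src=\"side.png\"/></div>"
        + "<div class=\"column2 main\"><p><img src=\"lead.png\"/></p><img src=\"slide2.png\"/></div>"
        + "</div>";

    static final String FIXED =
        "//div[contains(@class,'sectionColumns')]//div[contains(@class,'column2')]//img";

    // True if the expression compiles, false if the XPath parser rejects it.
    static boolean isValid(String expression) {
        try {
            XPathFactory.newInstance().newXPath().compile(expression);
            return true;
        } catch (XPathExpressionException e) {
            return false;
        }
    }

    // Returns the src attribute of every element the expression selects.
    static List<String> imageSources(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(PAGE)), XPathConstants.NODESET);
            List<String> sources = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                sources.add(((Element) nodes.item(i)).getAttribute("src"));
            }
            return sources;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("//div[contains(@class,'column2']//*img\"]")); // false
        System.out.println(isValid(FIXED));                                       // true
        System.out.println(imageSources(FIXED)); // images inside the column2 div only
    }
}
```

Because contains(@class, ...) is a substring test, it keeps matching when extra class names are added, which fits the requirement that some classes change between articles. It would also match a class such as column20, so a more specific test may be needed on a real page.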