How to select all elements with a specific name under every li node with the same structure? - html

I have a certain bunch of XPath locators that hold the elements I want to extract, and they have a similar structure:
/div/ul/li[1]/div/div[2]/a
/div/ul/li[2]/div/div[2]/a
/div/ul/li[3]/div/div[2]/a
...
They are actually simplified from a Pixiv user page. Each /div/div[2]/a element has a title string, so these are the artwork titles.
I want to use a single expression to fetch all of the above a elements in a WebExtension called PageProbe. I've tried a number of approaches, but none of them returns the result I want.
However, the following expression does return all the a elements, including the ones I don't need.
/div/
The following expression returns the a element under only the first li item.
/div/ul/li/div/div[2]/a
Sorry for not providing enough info earlier. Hope someone can help me out. Thanks.

According to the information you gave here, you can simply use this XPath:
/div/ul/li/div/div[2]/a
However, I'm quite sure there is a better locator based on other attributes such as class names.
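
To check that an li step without an index matches every li, here is a minimal sketch using Python's standard-library ElementTree, which supports a subset of XPath. The markup is a hypothetical stand-in for the simplified structure in the question (the "thumb" placeholders and link texts are invented), and it says nothing about how PageProbe itself evaluates the expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the simplified structure in the question.
page = ET.fromstring("""
<div>
  <ul>
    <li><div><div>thumb</div><div><a>Title one</a></div></div></li>
    <li><div><div>thumb</div><div><a>Title two</a></div></div></li>
    <li><div><div>thumb</div><div><a>Title three</a></div></div></li>
  </ul>
</div>
""")

# The parsed root is the outer <div>, so the relative path starts at <ul>.
# "li" without an index matches every <li>, not only the first one.
links = page.findall("./ul/li/div/div[2]/a")
titles = [a.text for a in links]
print(titles)
```

If a tool shows only one result for this expression, it may be returning just the first node of the matched set rather than the whole set.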

Related

XPath concat two children

I know this is a commonly asked question (I found "Concatenate multiple node values in xpath", "XPath joining multiple elements", and a few others), but for the life of me I can't figure it out. I've got the following HTML:
<div class="product-card__price__new"><span class="product-card__price__euros">0.</span> <span class="product-card__price__cents">69</span></div>
From which I need to extract the 0.69. I tried the following XPATH:
'.//*[@class="product-card__price__new"]/concat(/span/text(), following-sibling::span[1]/text)'
'.//*[@class="product-card__price__new"]/span/text()/concat(., following-sibling::span[1]/text)'
'.//*[@class="product-card__price__new"]/text()'
But I keep getting nothing. What would be the correct expression to extract it?
It's much simpler than this. The string value of an element is the concatenation of all its descendant text nodes. So it's just
string(.//*[@class="product-card__price__new"])
and in many contexts you don't need the call on string() because it's implicit.
Most of the time when people use text() they are (a) over-complicating things and/or (b) getting it wrong.
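
As a rough illustration of what "string value" means, the sketch below rebuilds it in Python with the standard-library ElementTree by joining all descendant text nodes. The body wrapper is an assumption added so the snippet parses on its own. Note that the space between the two spans is a text node too, so the raw string value is "0. 69"; in XPath 1.0 something like translate(string(...), ' ', '') would strip it.

```python
import xml.etree.ElementTree as ET

# The <body> wrapper is an assumption; the inner markup is from the question.
page = ET.fromstring(
    '<body><div class="product-card__price__new">'
    '<span class="product-card__price__euros">0.</span> '
    '<span class="product-card__price__cents">69</span>'
    '</div></body>'
)

price_div = page.find(".//*[@class='product-card__price__new']")

# Equivalent of XPath string(): join every descendant text node in order.
string_value = "".join(price_div.itertext())

# The space between the two spans is part of the string value; drop it.
price = "".join(string_value.split())
print(repr(string_value), price)
```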

Xpath with wildcards between id and divs?

I am trying to enter data in a table using Robot Framework. The table has an ID, but it changes every time I load the page (it is some kind of UUID), so I can't use it as an "anchor" for my XPath. However, there is a heading for this table with a fixed ID that seems a reasonable place to start. In between the heading and the table there are a couple of divs. So I have something like this (a mix of pseudocode and what I get when I copy the selector and XPath in Chrome) to get to the first cell in the first line of the table:
//*[@id="heading"] (a bunch of divs) /*[@id="random string of letters"]/div[3]/div/div/div[2]
I would like to write an xpath that looked something like this
//*[@id="heading"] [wildcard for the random ID and divs] /div[3]/div/div/div[2]
How do I write this?
Thank you.
If only one element inside the "heading" has an id attribute, you could use
//*[@id="heading"]//*[@id]/div[3]/div/div/div[2]
If there is more than one element with an id attribute, you need something more, e.g. if the id contains a certain tag
//*[@id="heading"]//*[contains(@id, "tag")]/div[3]/div/div/div[2]
or (if using XPath 2.0), if only this @id within the heading contains a UUID
//*[@id="heading"]//*[matches(@id,"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")]/div[3]/div/div/div[2]
Otherwise you will have to find something unique (within the context of "heading") from which to start the div[3]/div/div/div[2] search (if you are lucky, div[3]/div/div/div[2] is unique enough).
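
matches() needs XPath 2.0, which many engines (browsers, for example) do not provide. One possible workaround is to select every descendant that has an id and filter by the UUID pattern in the host language. The sketch below does that in Python with the standard library; the markup, the "toolbar" id, the UUID value and the shortened tail path div[3]/div are all made up for illustration, and it assumes the table sits inside the heading element.

```python
import re
import xml.etree.ElementTree as ET

# Same UUID pattern as in the matches() expression above.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)

# Hypothetical page: a fixed heading id, an unrelated id, and a random UUID id.
page = ET.fromstring("""
<body>
  <div id="heading">
    <div id="toolbar"></div>
    <div>
      <div id="3f2b8c1e-9a4d-4c6f-8b1a-0123456789ab">
        <div>one</div>
        <div>two</div>
        <div><div>first cell</div></div>
      </div>
    </div>
  </div>
</body>
""")

heading = page.find(".//*[@id='heading']")

# Every descendant of the heading that has an id, kept only if it is a UUID.
candidates = [
    el for el in heading.iterfind(".//*[@id]")
    if UUID_RE.fullmatch(el.get("id"))
]

# Continue with the fixed tail of the path, relative to the matched element.
cell = candidates[0].find("div[3]/div")
print(cell.text)
```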

Selenium automation- finding best xpath

I am looking to avoid position-based XPaths. The reason is that such an XPath can change and fail an automation test if a new object is added to the page and shifts the expected position.
But on some web pages, this is the only xpath I can find. For example, I am looking to click a tab called 'FooBar'.
If I use the Selenium IDE FireFox plugin, I get:
//td[12]/a/font
If I use the FirePath Firefox plugin, I get:
html/body/form/table[2]/tbody/tr/td[12]/font
If a new tab called "Hello, World" is added to the web page (before FooBar tab) then FooBar tab will change and have an xpath position of
//td[13]/a/font
What would you suggest to do?
TY!
Instead of using an absolute XPath you could use a relative XPath, which is shorter and more reliable.
Say
<td id="FooBar" name="FooBar">FooBar</td>
By.id("FooBar");
By.name("FooBar");
By.xpath("//td[text()='FooBar']") //exact match
By.xpath("//td[@id='FooBar']") //with any attribute value
By.xpath("//td[contains(text(),'oBar')]") //partial match with contains function
By.xpath("//td[starts-with(text(),'FooB')]") //partial match with startswith function
This blog post may be useful for you.
A relative XPath is a good idea. A relative CSS selector is even better (faster).
If possible, suggest or request an id for the element.
Also check Chrome: inspect the element, then copy its CSS selector or XPath.
Using //td is not a good idea because it will return all your td nodes. Any predicate such as //td[25] will be a very fragile selection because any td added to any previous table will change its result. Using plugins to generate XPath is great for quickly finding what you want, but it's always best to use the result just as a starting point, and then analyze the structure of the file to write a locator that will be harder to break when changes occur.
The best locators are anchored to invariant values or attributes. Plugins usually won't suggest id or attribute anchors. They usually use absolute positional expressions. If you can rewrite your locator path in terms of invariant structures in the file, you can then select the elements or text that you want relative to them.
For example, suppose you have
<body> ...
... lots of code....
<h1>header that has a special word</h1>
... other tags and text but not `h1` ...
<table id="some-id">
...
<td>some-invariant-text</td>
<td>other text</td>
<td>the field that you want</td>
...
The table has an ID. That's the best anchor. Now you can select the table as
//table[@id='some-id']
But many times you don't have the id, or even some other invariant attribute. You can still try to discover a pattern. For example: suppose that the last <h1> before the table you want contains a word you can match, you could still find the table using:
//table[preceding::h1[1][contains(.,'word')]]
Once you have the table, you can use relative axes to find the other nodes. Let's assume you want a td but there are no attributes on any tbody, tr, etc. You can still look for some invariant text. Tables usually have headers, or some fixed text which you can match. In the example above, if you find a td that is 2 fields before the one that you want, you could use:
//table[preceding::h1[1][contains(.,'word')]]//td[preceding-sibling::td[2][.='some-invariant-text']]
This is a simple example. If you apply some of these suggestions to the file you are working on, you can improve your XPath expression and make your selection code more robust.
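
The fragility of positional predicates described in this thread can be reproduced in a few lines. This is only a sketch in Python's standard-library ElementTree (not Selenium), with a made-up two-tab row; it compares a positional locator with a text-anchored one before and after a new tab is inserted. The [.='text'] predicate needs Python 3.7 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical tab row; the real page structure is not shown in the question.
row = ET.fromstring("<tr><td>Home</td><td>FooBar</td></tr>")

def texts(path):
    """Return the text of every td matched by the given path."""
    return [td.text for td in row.findall(path)]

before_positional = texts("./td[2]")
before_by_text = texts("./td[.='FooBar']")

# A new tab is added before FooBar, shifting its position from 2 to 3.
new_tab = ET.Element("td")
new_tab.text = "Hello, World"
row.insert(1, new_tab)

after_positional = texts("./td[2]")
after_by_text = texts("./td[.='FooBar']")

print(before_positional, after_positional)  # the positional match changes
print(before_by_text, after_by_text)        # the text-anchored match does not
```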

Excel VBA: get content from online HTML table

Can anybody please show me a piece of VBA code that gets the text "hello" from this example online HTML table? The first node will be found by its ID (id="something").
...
<table id="something">
<tr>
<td><TABLE><TR><TD></TD></TR><TR><TD></TD></TR></TABLE></td><td></td>
</tr>
<tr>
<td></td><td></td><td>hello</td>
</tr>
...
I think it will be something like child->sibling->child->sibling->sibling->child, but I don't know the exact way.
EDIT
In the updated code, the nested table's tags are in CAPITALS. So if I use getElementById("something").getElementsByTagName("tr"), does the collection contain only two tr tags, or four (including the tags that are deeper children)?
If you did search for an answer, you might want to broaden your scope next time. There are plenty of questions and answers that deal with DOM stuff and VBA.
Use getElementById on HTMLElement instead of HTMLDocument
While the question (and answers) aren't exactly what you want, it will show you how to create something you can work with.
You'll need to use a mixture of getElementById() and getElementsByTagName() to retrieve your desired "hello"
eg: Document.getElementById("something").getElementsByTagName("tr")(1).getElementsByTagName("td")(2).innerText
Get the element "something"
Inside "something" get all "tr" tags (specifically the one at index 1)
Inside the returned tr tag get all "td" tags (specifically the one at index 2)
Get the innerText of the previous result
These objects use a 0 based array so the first item is item(0).
Update
document.getElementById() will return a single IHTMLElement (which will include all of its children), or nothing/null if it does not exist.
document.getElementsByTagName() will return a collection of IHTMLElement (again, each element will include all of its children), or an empty collection if none exist.
document.getElementsByTagName("tr") this will return all tr elements inside the "document" element.
document.getElementsByTagName("tr")(0) will return the first (singular) IHTMLElement from the collection. (note the index at the end?)
There is no "sibling" feature (that I could find) on the InternetExplorer object in VBA, so you'd have to do it manually using the child index.
Using the DOM functions is the clean way to do it. It's much clearer than looking at a chain like "Element.Children(0).children(1).children(2)", where you have no idea what each index means without looking it up manually.
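
On the "two or four" question from the edit: getElementsByTagName collects matching descendants at every depth in document order, so the rows of the nested table are counted too. The sketch below is a Python analogue using the standard-library ElementTree, not VBA. The table is closed and lowercased here so that it parses as XML, which is an assumption on top of the fragment in the question.

```python
import xml.etree.ElementTree as ET

# The question's table, closed and lowercased so it parses as XML.
table = ET.fromstring("""
<table id="something">
  <tr>
    <td><table><tr><td></td></tr><tr><td></td></tr></table></td><td></td>
  </tr>
  <tr>
    <td></td><td></td><td>hello</td>
  </tr>
</table>
""")

# iter() walks all descendants in document order, like getElementsByTagName.
rows = list(table.iter("tr"))
print(len(rows))

# rows[1] is the first row of the NESTED table and has a single cell.
nested_cells = list(rows[1].iter("td"))

# The outer table's second row comes last, at index 3.
cells = list(rows[3].iter("td"))
print(cells[2].text)
```

With the nested table present, the row index in the one-liner above would therefore need to be 3 rather than 1.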
I looked all over for the answer to this question, too. I finally found the solution by talking to a coworker which was actually through recording a macro.
I know, you all think you are above this, but it is actually the best way. See the full post here: http://automatic-office.com/?p=344
In short, you want to record the macro and go to data --> from web and navigate to your website and select the table you want.
I have used the above solutions "get element by id" type stuff in the past, and it is great for a few elements, but if you want a whole table, and you aren't super experienced, just record a macro.
Don't tell your friends, and then reformat it to look like your own work so no one knows you used the macro tool ;)

Confounded by XPath

When it comes to indexing in XPath, I feel like I'm missing something here.
If I have two table tags in an HTML document, and within the Chrome console I type $x("//table[1]");, I expect to get the first table tag on the page.
Instead, I get a list containing both table tags. I suspected it might have something to do with using // but using an absolute XPath expression yielded the same results.
I think this is a pretty simple misunderstanding, but I'm not seeing it when reading the docs.
//table[1] returns all tables that are the first table child of their respective parents.
To get the first table use /descendant::table[1] or (//table)[1]. The parenthesized form is also valid in XPath 1.0, so it works in the Chrome console as well.
Here it is in the standard:
The path expression //para[1] does not mean the same as the path expression /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their respective parents.
Use
(//table)[1]
i.e. the first of all the tables.
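
The difference is easy to see on a tiny document. A minimal sketch with Python's standard-library ElementTree, whose positional predicate is also evaluated per parent; the two-table markup and its ids are made up, and indexing the result list plays the role of (//table)[1].

```python
import xml.etree.ElementTree as ET

# Two tables, each the first (and only) table child of its own parent.
page = ET.fromstring("""
<body>
  <div><table id="a"/></div>
  <div><table id="b"/></div>
</body>
""")

# Like //table[1]: every table that is the first table child of its parent.
first_in_each_parent = [t.get("id") for t in page.findall(".//table[1]")]

# Like (//table)[1]: collect all tables first, then take the first of them.
first_in_document = page.findall(".//table")[0].get("id")

print(first_in_each_parent, first_in_document)
```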