failed to use xpath - html

To crawl "https://www.reddit.com/r/CryptoCurrency/",
I used XPath to find the li elements that have the class "first":
response.xpath('//li[@class="first"]')
but it returns None.
I can't understand what's wrong.
As far as I know, '//' finds all matching elements in the document, so I thought it would find all 'li' elements,
and the predicate [@class="top-matter"] means that an element whose class attribute is "top-matter" should be selected.
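A minimal sketch of the attribute test, using Python's standard-library xml.etree.ElementTree rather than Scrapy. The markup below is made up for illustration and is not the real Reddit page. It shows that attribute names in a predicate are written with @, and that = compares against the whole attribute value:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration only.
SAMPLE = """
<ul>
  <li class="first">hot</li>
  <li class="second">new</li>
  <li class="first selected">rising</li>
</ul>
"""

root = ET.fromstring(SAMPLE)

# @class refers to the class attribute. The comparison is an exact string
# match, so class="first selected" is not selected.
matches = root.findall(".//li[@class='first']")
texts = [li.text for li in matches]
print(texts)
```

With Scrapy, the same expression would be passed as response.xpath('//li[@class="first"]').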

Related

Robot framework, how to check class

Is there a keyword in Robot Framework to ensure element has a certain class? Something like
Element should have class element className
Alternatively, I could check if element has a certain attribute with certain value. Former would be more suitable though, as element may contain multiple classes.
You could create a new keyword via XPath selectors:
Element should have class
    [Arguments]    ${element}    ${className}
    Wait until page contains element    ${element}[contains(@class, '${className}')]
Or via CSS selectors:
Element should have class
    [Arguments]    ${element}    ${className}
    Wait until page contains element    ${element}.${className}
Wait until page contains element could be replaced by any keyword of your liking to check if the element exists and is visible, such as Element should be visible.
Here's an alternative solution (though the accepted answer's CSS one is quite good), working for any kind of selector strategy:
Element should have class
    [Arguments]    ${locator}    ${target value}
    ${class}=    Get Element Attribute    ${locator}@class
    Should Contain    ${class}    ${target value}
It can be modified for any other attribute: just substitute the @class in Get Element Attribute with it (or even make it an optional argument).
Some of the solutions on this page may suffer from sub-string matches. Checking that the class attribute (e.g. test-run) contains a class (e.g. test) may pass even though it should fail.
There are a few ways to deal with this, but in the end, I did the following:
Element Should Have Class
    [Arguments]    ${locator}    ${class}
    ${escaped}=    Regexp Escape    ${class}
    ${classes}=    Get Element Attribute    ${locator}    class
    Should Match Regexp    ${classes}    \\b${escaped}\\b
Element Should Not Have Class
    [Arguments]    ${locator}    ${class}
    ${escaped}=    Regexp Escape    ${class}
    ${classes}=    Get Element Attribute    ${locator}    class
    Should Not Match Regexp    ${classes}    \\b${escaped}\\b
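A hedged sketch in Python of the substring problem; the helper names are hypothetical. One caveat worth checking in your own setup: a regex word boundary (\b) also falls between a letter and a hyphen, so a \b-based check can still find test inside test-run. Splitting the attribute on whitespace compares whole class tokens instead:

```python
import re


def has_class(class_attr: str, name: str) -> bool:
    """Return True if name is one of the whitespace-separated class tokens."""
    return name in class_attr.split()


def has_class_word_boundary(class_attr: str, name: str) -> bool:
    """Regex check using word boundaries around the escaped class name."""
    pattern = r"\b" + re.escape(name) + r"\b"
    return re.search(pattern, class_attr) is not None


attr = "test-run active"
token_result = has_class(attr, "test")
boundary_result = has_class_word_boundary(attr, "test")
print(token_result, boundary_result)
```

In XPath, a comparable token-based test is commonly written as contains(concat(' ', normalize-space(@class), ' '), ' test ').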
Here below an example of both ways:
${temp}=    get element attribute    xpath=/elementpath    class
should contain    ${temp}    ${ClassName}
OR
Wait until page contains element    xpath=/elementpath[contains(@class, '${ClassName}')]

Using XPath to get the first child for every child of a node

I'm trying to parse some HTML with the following structure, how can I extract the first <a> element of every <li> element using xpath?
<ul>
<li>
<a>
<span>
<a>
</li>
<li>
<a>
<span>
<a>
</li>
...
</ul>
@Mathias: You are correct, I apologize. //li/a[1] did not work because it wasn't a direct child (there is an article tag in between, which I omitted for simplicity).
Then let me post this as a solution with some more explanation.
If, as you have described, //li/a[1] does not return anything while (//li//a)[1] does, then the HTML sample you show is not representative for your actual document. Then, a would be a descendant of li, but not a direct child of it.
A correct XPath expression in this case is
//li//a[1]
but only use it if the level of nesting varies, i.e. if there could be other elements nested between li and a:
<li>
<article>
<other>
<a/>
If the nesting is consistent, but it is not always the article element which is in between li and a then use
//li/*/a[1]
Which avoids the // axis that is computationally more expensive than /.
Finally, if you know that the a elements you are interested in are always grandchildren of li elements and if it is always the article element in between them, use
//li/article/a[1]
When I correct the expression to be //li/article/a[1], I get the first a for the first li.
//li/article/a[1] returns several results if there are several a elements that are children of article and grandchildren of li. If this only returns a single result either
you invoke this XPath expression in a context where only a single result is expected, e.g. if you use an XPath library in a programming language or
the structure of your input document is even more intricate
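The expressions discussed above can be compared with a small sketch using Python's standard-library xml.etree.ElementTree, which supports a subset of XPath (paths are written relative to the root with a leading dot). The sample markup is an assumption modelled on the structure described, with an article element between li and a:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: each li wraps its links in an article element.
SAMPLE = """
<ul>
  <li><article><a>one</a><span/><a>two</a></article></li>
  <li><article><a>three</a><span/><a>four</a></article></li>
</ul>
"""

root = ET.fromstring(SAMPLE)


def texts(path):
    """Return the text of every element matched by path."""
    return [a.text for a in root.findall(path)]


direct = texts(".//li/a[1]")               # a must be a direct child of li
any_middle = texts(".//li/*/a[1]")         # exactly one element in between
via_article = texts(".//li/article/a[1]")  # the element in between is article
nested = texts(".//li//a[1]")              # any depth below li

print(direct, any_middle, via_article, nested)
```

Here only the first expression comes back empty, because no a is a direct child of li; the other three each select the first a of every li.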
I think that the XPath to accomplish that would be .//ul/li/a[position()=1] .
Explanation:
The reason I spell it all out as .//ul/li/a is that, if there is an error when you use the XPath, your stack trace will reveal exactly what the locator pointed at, which is less vague. But you can obviously shorten it if you don't care: .//a .
Using the position clause, you can do =1 or >1 , or whatever. I would choose [position()=1] over [1] because XPath doesn't use 0-based arrays, which might confuse others looking at your locator. I mean position=0, by logic, means null, right?
I start my locator with a . because I sometimes like to chain my locators together. You don't really need to start with the dot, but since I use the // wildcard in this case, it's effectively the same as starting without a dot, with the additional ability to be chained.
Answer tested on http://the-internet.herokuapp.com/

How do I use the and operator in XPath?

I am trying to select all text within ul tags and p tags on a web page. I'm finding it hard to select both at the same time. I can select each separately no problem. Here's what I've tried so far:
('//p and ul');
('//p::ul');
I'm also trying to select a specific list. There are multiple lists on the web page but I am only interested in one. How would I go about selecting all list on a page that are within a tag with a certain id, e.g.
<div id="thisistheid"
<ul....
Thanks for any help.
As mentioned in my comment, I think you need to use //p and //ul, but I'm not certain.
I can certainly answer your second question though: //div[@id='thisistheid']//ul will select uls only if they are a descendant of #thisistheid. You can also use /ul in place of //ul to only allow one level of depth (useful if you have nested lists).
And and Or are boolean expressions:
An and expression is evaluated by evaluating each operand and converting its value to a boolean as if by a call to the boolean function. The result is true if both values are true and false otherwise. The right operand is not evaluated if the left operand evaluates to false.
Since a node cannot be both P and UL at the same time, your test will never return true.
Try the Union operator (a PathExpression) instead:
//p | //ul
This will give you all the p and ul elements in the document
To get a node with a specific id you can use the id() function
id('thisistheid')
If this doesn't work for whatever reason, you can still use an attribute test, e.g.
//div[@id='thisistheid']
Or - if you are using the DOM API, you can use getElementById().
You can find an easy to follow XPath tutorial at http://schlitt.info/opensource/blog/0704_xpath.html
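A rough sketch of both parts of the question using Python's standard-library xml.etree.ElementTree. The sample markup is invented for illustration. ElementTree's XPath subset has no | operator, so the union //p | //ul is approximated here by walking the tree in document order and keeping p and ul elements; a full XPath engine would accept the union expression directly:

```python
import xml.etree.ElementTree as ET

# Hypothetical page with one list outside the div and nested lists inside it.
SAMPLE = """
<body>
  <p>intro</p>
  <ul><li>outside</li></ul>
  <div id="thisistheid">
    <ul><li>top<ul><li>nested</li></ul></li></ul>
  </div>
</body>
"""

root = ET.fromstring(SAMPLE)

# //ul after the predicate: every ul at any depth below the div.
all_lists = root.findall(".//div[@id='thisistheid']//ul")

# /ul after the predicate: only ul elements that are direct children of the div.
direct_lists = root.findall(".//div[@id='thisistheid']/ul")

# Stand-in for //p | //ul: all p and ul elements in document order.
p_and_ul = [el.tag for el in root.iter() if el.tag in ("p", "ul")]

print(len(all_lists), len(direct_lists), p_and_ul)
```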

Confounded by XPath

When it comes to indexing in XPath, I feel like I'm missing something here.
If I have two table tags in an HTML document, and within the Chrome console I type $x("//table[1]");, I expect to get the first table tag on the page.
Instead, I get a list containing both table tags. I suspected it might have something to do with using // but using an absolute XPath expression yielded the same results.
I think this is a pretty simple misunderstanding, but I'm not seeing it when reading the docs.
//table[1] returns all tables that are the first table child of their respective parents.
To get the first table use /descendant::table[1] or (//table)[1].
Here it is in the standard:
The path expression //para[1] does not mean the same as the path expression /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their respective parents.
Use
(//table)[1]
i.e. the first of all the tables.
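The difference can be sketched with Python's standard-library xml.etree.ElementTree on a made-up document. ElementTree does not support the parenthesised form (//table)[1], so find(), which returns the first match in document order, stands in for it here:

```python
import xml.etree.ElementTree as ET

# Hypothetical page: three tables spread over two parent elements.
SAMPLE = """
<body>
  <div><table id="t1"/></div>
  <div><table id="t2"/><table id="t3"/></div>
</body>
"""

root = ET.fromstring(SAMPLE)

# table[1] is evaluated per parent: every table that is the first table
# child of its own parent is selected.
first_per_parent = [t.get("id") for t in root.findall(".//table[1]")]

# The first table in the whole document.
first_overall = root.find(".//table").get("id")

print(first_per_parent, first_overall)
```

Both div elements contribute their first table to the per-parent result, which is why //table[1] can return more than one node.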

XPath Expression Problem

I have the following HTML snippet, http://paste.enzotools.org/show/1209/ , and I want to extract the tag that has a text() descendant with the value of "172.80" (it's the fourth node from that snippet). My attempts so far have been:
'descendant::td[@class="roomPrice figure" and contains(descendant::text(), "172.80")]'
'descendant::td[@class="roomPrice figure" and contains(div/text(), "172.80")]'
'descendant::td[@class="roomPrice figure" and div[contains(text(), "172.80")]]'
but none of them selects anything.
Does anyone have any suggestions?
When passing a node set to a function call, note that if the function signature doesn't declare a node-set argument, the function will use only the first node from that node set.
So, I think you need this XPath expression:
descendant::td[@class="roomPrice figure"][div[text()[contains(.,'172.80')]]]
Test for a text node child of div
or
descendant::td[@class="roomPrice figure"]
[div[descendant::text()[contains(.,'172.80')]]]
Test for a text node descendant of div
or
descendant::td[@class="roomPrice figure"]
[descendant::text()[contains(.,'172.80')]]
Test for a text node descendant of td
I believe you want something like this:
<xsl:for-each select="//td[contains(string(.), '172.80')]">
The string() function will give you all the text in the current and descendant nodes, whereas text() just gives you the text in the current (context) node.
Of course, you can extend the XPath selector to filter on the class names too...
<xsl:for-each select="//td[contains(string(.), '172.80')][@class='roomPrice figure']">
And as stated in the comments above, your posted XML/HTML is invalid as it stands.
My understanding is that you want to select the td element in specified class, that has a descendant text node containing the value "172.80".
I'm assuming the context node is the <tr> (or some ancestor of it).
The attempts you listed all suffer from the problem that contains() converts its first argument to a single string, using only the first node of the nodeset. So if the td or div has a descendant or child text node before the one that contains "172.80", the one containing "172.80" will not be noticed.
Try this:
'descendant::td[@class="roomPrice figure" and
descendant::text()[contains(., "172.80")]]'
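The first-node behaviour can be illustrated with a sketch in Python's standard-library xml.etree.ElementTree. ElementTree has no contains() function, so the text tests are done in Python; the markup and the helper name are assumptions, since the original snippet is not reproduced here:

```python
import xml.etree.ElementTree as ET

# Hypothetical row: in the first cell another text node precedes the price.
SAMPLE = """
<tr>
  <td class="roomPrice figure"><div><span>GBP</span> 172.80</div></td>
  <td class="roomPrice figure"><div>180.00</div></td>
</tr>
"""

root = ET.fromstring(SAMPLE)
cells = root.findall(".//td[@class='roomPrice figure']")


def first_text(el):
    """Return the first descendant text node, or '' if there is none."""
    for text in el.itertext():
        return text
    return ""


# Mirrors contains(descendant::text(), "172.80"): only the first text node
# of each cell is examined, so the price behind "GBP" is missed.
first_only = [c for c in cells if "172.80" in first_text(c)]

# Mirrors descendant::text()[contains(., "172.80")]: every text node is tested.
any_text = [c for c in cells if any("172.80" in t for t in c.itertext())]

print(len(first_only), len(any_text))
```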