This is the html code I got:
<tr class="">
<td>
<a>Go to Google</a>
</td>
<td>02/10/20, 09:24 AM</td>
<td><button type="button" class="delete"></button></td>
</tr>
As you can see, the <tr> has three <td>s. Since this is a table, we can assume I have 100 of these <tr>s, with different values but the same structure. I would like to get the XPath of the third td (the button) using the value of the <a> ('Go to Google'). How can I do that? I know I need to do something like:
(//td//parent::a)
but evidently I am not smart enough for that.
So you want to find the td element that has a child a with the text “Go to Google”, and then from this td find the next td that has a button element child.
In XPath this could look like this:
//td[a[text()="Go to Google"]]/following-sibling::td[button]
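To check the expression without a browser, here is a small sketch using Java's built-in XPath 1.0 support (javax.xml.xpath). It assumes the row markup is well-formed XML (so the button is self-closed), and the second row ("Go to Bing") is made up purely to show that only the matching row is picked. The class and helper names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ButtonCellByLinkText {
    // Two rows with the question's structure; the second row's values are made up.
    static final String TABLE = "<table>"
        + "<tr class=''><td><a>Go to Google</a></td><td>02/10/20, 09:24 AM</td>"
        + "<td><button type='button' class='delete'/></td></tr>"
        + "<tr class=''><td><a>Go to Bing</a></td><td>03/10/20, 11:00 AM</td>"
        + "<td><button type='button' class='delete'/></td></tr>"
        + "</table>";

    // Parses well-formed XML and evaluates an XPath 1.0 expression to a node list.
    static NodeList select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expr, e);
        }
    }

    // The answer's expression with the link text filled in (no quote escaping).
    static String buttonCellXPath(String linkText) {
        return "//td[a[text()=\"" + linkText + "\"]]/following-sibling::td[button]";
    }

    // Reads the date cell (second td) from the row that contains the given cell.
    static String dateInSameRow(Node cell) {
        Element row = (Element) cell.getParentNode();
        return row.getElementsByTagName("td").item(1).getTextContent();
    }

    public static void main(String[] args) {
        NodeList cells = select(TABLE, buttonCellXPath("Go to Google"));
        System.out.println(cells.getLength());            // 1
        System.out.println(dateInSameRow(cells.item(0))); // 02/10/20, 09:24 AM
    }
}
```

Reading the date from the same row is only there to confirm that the button cell came from the "Go to Google" row and not from another one.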
I'm scraping these two sites:
https://www.library.uq.edu.au/uqlsm/availablepcsembed.php?branch=Law
https://www.library.uq.edu.au/uqlsm/availablepcsembed.php?branch=BSL.
Unfortunately, they have variations: one has the level name (e.g. Level 2) inside an <a href> link, while the other has it as plain text. How can I select one or the other, depending on which one is there?
I tried this to no avail:
level.css(/"a[href]"|".left"/).text
Here are shortened versions of the 2 HTML sections:
<table class="chart">
<tr valign="middle">
<td class="left">Level 2</td> <!-- the problem -->
<td class="middle"><div style="width:86%;"><strong>86%</strong></div></td>
</tr>
</table>
<table class="chart">
<tr valign="middle">
<td class="left">Level 1</td>
<td class="middle"><div style="width:32%;"><strong>32%</strong></div></td>
</tr>
</table>
My code (the whole method, not just a section of it):
require "nokogiri"
require "open-uri"
def self.scrape_details_page(library_url)
  # URI.open comes from open-uri; plain Kernel#open no longer fetches URLs in Ruby 3.0+
  details_page = Nokogiri::HTML(URI.open(library_url))
  details_page.css("table.chart tr").collect do |level|
    right = level.css(".right").text.split
    {level: level.css("a[href]").text, available: right[0], out_of_available: right[3]}
  end
end
If what you want is the text inside a cell (even the text inside the innermost div), calling #text on the parsed element returns it, because #text includes the text of everything nested inside that element. There is no need to account for and walk extra tags that might be present inside, such as the link tag. Given your code as written:
levels = details_page.css("table.chart tr").collect do |level|
  level.text
end
For each row, that pulls the row's inner text (the level label and the percentage value together) as a string, and collect gathers the results into the levels array.
Edit: also, if all you care about is getting the level label, you can just filter the elements by class up front:
levels = details_page.css("table.chart tr td.left").collect do |level|
  level.text
end
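The reason #text works here is that an element's text includes the text of everything nested inside it. As a rough cross-check outside Ruby, the same holds for XPath's string value. This Java sketch uses javax.xml.xpath on simplified, well-formed rows (the href value '#' is a placeholder, not the real URL) and gets the same label back whether or not the <a> wrapper is present; the names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LevelLabel {
    // Simplified rows: one label is plain text, the other is wrapped in a link.
    // The href value '#' is a placeholder.
    static final String PLAIN = "<table class='chart'><tr valign='middle'>"
        + "<td class='left'>Level 2</td>"
        + "<td class='middle'><div><strong>86%</strong></div></td></tr></table>";
    static final String LINKED = "<table class='chart'><tr valign='middle'>"
        + "<td class='left'><a href='#'>Level 1</a></td>"
        + "<td class='middle'><div><strong>32%</strong></div></td></tr></table>";

    // string(...) concatenates all descendant text of the first matching cell,
    // so the label comes back the same whether or not an <a> wraps it.
    static String levelLabel(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath()
                .evaluate("string(//table[@class='chart']//td[@class='left'])", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(levelLabel(PLAIN));  // Level 2
        System.out.println(levelLabel(LINKED)); // Level 1
    }
}
```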
The answer by jk_ should work in this particular case.
In the more general case, if you're going to use a CSS selector, you need to use CSS syntax for "or" (a comma). So if you were going to use the selectors you originally asked about, it'd be
level.css('a[href], .left').text
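In XPath, the union operator | plays the role of the CSS comma. The Java sketch below (javax.xml.xpath, simplified rows, hypothetical names) evaluates an XPath counterpart of 'a[href], .left' and concatenates the text of every match, similar to calling .text on the whole match set. One thing it shows: when the link sits inside the .left cell, both the cell and the link match, so the label comes out twice. That is worth checking before relying on the comma form for pages where the link is nested in the cell.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UnionSelector {
    static final String PLAIN = "<tr><td class='left'>Level 2</td><td class='middle'>86%</td></tr>";
    static final String LINKED = "<tr><td class='left'><a href='#'>Level 1</a></td><td class='middle'>32%</td></tr>";

    // XPath counterpart of "a[href], .left"; @class='left' is an exact match,
    // unlike CSS .left, which matches one token of a class list.
    static final String UNION = "//a[@href] | //*[@class='left']";

    static NodeList select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expr, e);
        }
    }

    // Concatenates the text of every matched node, in the order returned.
    static String joinedText(NodeList nodes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            sb.append(nodes.item(i).getTextContent());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        NodeList plain = select(PLAIN, UNION);
        NodeList linked = select(LINKED, UNION);
        System.out.println(plain.getLength() + " " + joinedText(plain));   // 1 Level 2
        System.out.println(linked.getLength() + " " + joinedText(linked)); // 2 Level 1Level 1
    }
}
```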
Thanks to inspiration from @jk_, I fixed it using .css(".left").text, which selects all the text in the left td of the row.
The working code:
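require "nokogiri"
require "open-uri"
def self.scrape_details_page(library_url)
  # URI.open comes from open-uri; plain Kernel#open no longer fetches URLs in Ruby 3.0+
  details_page = Nokogiri::HTML(URI.open(library_url))
  details_page.css("table.chart tr").collect do |level|
    right = level.css(".right").text.split
    # .left matches the label cell whether or not its text is wrapped in a link
    {level: level.css(".left").text, available: right[0], out_of_available: right[3]}
  end
end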
I want to find the first tr with the text PONumber:, but I am not able to. I can find it using //table/tbody/tr/td[contains(text(),'PONumber')], but that gives 2 objects, and I want only the first one. Any help?
<tr>
<td class="clsLabel" align="right"> PONumber: </td>
<td class="clsInput"> PN659 </td>
</tr>
<tr>
<td class="clsLabel" align="right"> PreviousPONumber: </td>
<td class="clsInput"/>
</tr>
You can use the following XPath to find exactly the element you want:
//tr/td[normalize-space(.)='PONumber:']
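A quick way to see why this works is to compare it with the contains() version from the question. The sketch below runs both through Java's built-in XPath 1.0 engine (javax.xml.xpath) on a well-formed copy of the two rows; the class and method names are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PoNumberExact {
    static final String ROWS = "<table><tbody>"
        + "<tr><td class='clsLabel' align='right'> PONumber: </td><td class='clsInput'> PN659 </td></tr>"
        + "<tr><td class='clsLabel' align='right'> PreviousPONumber: </td><td class='clsInput'/></tr>"
        + "</tbody></table>";

    static NodeList select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        // contains() also matches " PreviousPONumber: ", so two cells come back.
        System.out.println(select(ROWS, "//table/tbody/tr/td[contains(text(),'PONumber')]").getLength()); // 2
        // normalize-space(.) strips the surrounding spaces; = then requires an exact match.
        NodeList exact = select(ROWS, "//tr/td[normalize-space(.)='PONumber:']");
        System.out.println(exact.getLength());                      // 1
        System.out.println(exact.item(0).getTextContent().trim()); // PONumber:
    }
}
```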
You can use something like
(//tr/td[contains(text(),'PONumber')])[1]
So put the XPath in parentheses, and with [1] you specify that only the first match is returned. Otherwise, you could also use something like:
//tr/td[contains(text(),'PONumber') and not(contains(text(),'Previous'))]
so that cells containing "Previous" are excluded from the results.
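The parentheses matter because, without them, [1] is applied within each tr rather than to the whole result. This Java sketch (javax.xml.xpath, the same two rows, hypothetical names) compares the unparenthesised form with the two expressions above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FirstMatch {
    static final String ROWS = "<table><tbody>"
        + "<tr><td class='clsLabel' align='right'> PONumber: </td><td class='clsInput'> PN659 </td></tr>"
        + "<tr><td class='clsLabel' align='right'> PreviousPONumber: </td><td class='clsInput'/></tr>"
        + "</tbody></table>";

    // Without parentheses, [1] picks the first matching td of EACH tr.
    static final String PER_ROW = "//tr/td[contains(text(),'PONumber')][1]";
    // With parentheses, [1] picks the first match of the whole result set.
    static final String FIRST_OVERALL = "(//tr/td[contains(text(),'PONumber')])[1]";
    // Alternative: keep contains() but rule out the "Previous" label.
    static final String EXCLUDING = "//tr/td[contains(text(),'PONumber') and not(contains(text(),'Previous'))]";

    static NodeList select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(ROWS, PER_ROW).getLength());       // 2
        System.out.println(select(ROWS, FIRST_OVERALL).getLength()); // 1
        System.out.println(select(ROWS, EXCLUDING).getLength());     // 1
    }
}
```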
You can limit the XPath result to only the first match by using [1]:
(//table/tbody/tr/td[contains(.,'PONumber')])[1]
I have the following HTML structure. I am trying to build a robust method to extract the second Color Digest element, since there will be many of these tags within the DOM.
<table>
<tbody>
<tr bgcolor="#AAAAAA">
<tr>
<tr>
<tr>
<tr>
<td>Color Digest </td>
<td>AgArAQICGQMVBBwTIRQHIwg0GUMURAZTBWQJcwV0AoEDAQ </td>
</tr>
<tr>
<td>Color Digest </td>
<td>2,43,2,25,21,28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,33,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,25,0,0,0,0,0,0,0,0,0,0,0,0,0,0,20,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, </td>
</tr>
</tbody>
</table>
I am trying to extract the td that follows the second "Color Digest" cell, the one that has the decoded value.
I wrote the following XPath, but I am not getting the second td element:
//td[text() = ' Color Digest ']/following-sibling::td[2]
And when I change td[2] to td[1], I get both elements.
You should look for the second tr that contains a td equal to ' Color Digest ', and then take either the following sibling of the first td in that tr, or simply its second td.
Try the following:
//tr[td='Color Digest'][2]/td/following-sibling::td[1]
or
//tr[td='Color Digest'][2]/td[2]
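Below is a Java sketch (javax.xml.xpath) on a shortened, well-formed copy of the two rows; the second digest is truncated and the names are hypothetical. It uses normalize-space() so that the trailing space in the cells shown in the question does not affect the comparison (a plain = compares strings exactly), and it wraps the row selection in parentheses so that [2] counts matches across the whole document rather than among the children of one parent, which may help if the rows end up in different tables. It also shows why following-sibling::td[2] returns nothing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SecondColorDigest {
    // Shortened rows; the second digest value is truncated.
    static final String TABLE = "<table><tbody>"
        + "<tr><td>Color Digest </td><td>AgArAQICGQMVBBwTIRQHIwg0GUMURAZTBWQJcwV0AoEDAQ </td></tr>"
        + "<tr><td>Color Digest </td><td>2,43,2,25,21,28,0,0 </td></tr>"
        + "</tbody></table>";

    // td[2] here means the SECOND sibling after each label cell; every row has only one.
    static final String SECOND_SIBLING = "//td[normalize-space()='Color Digest']/following-sibling::td[2]";
    // Second matching row in the whole document, then that row's value cell.
    static final String SECOND_ROW_VALUE = "(//tr[normalize-space(td[1])='Color Digest'])[2]/td[2]";

    static NodeList select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(TABLE, SECOND_SIBLING).getLength());   // 0
        NodeList value = select(TABLE, SECOND_ROW_VALUE);
        System.out.println(value.item(0).getTextContent().trim());       // 2,43,2,25,21,28,0,0
    }
}
```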
http://www.xpathtester.com/saved/76bb0bca-1896-43b7-8312-54f924a98a89
You can also select a list of elements with XPath:
//td[text() = ' Color Digest ']/following-sibling::td[1]
This will give you a list of two elements; then you can use the second one as your intended element. For example:
List<WebElement> elements = driver.findElements(By.xpath("//td[text() = ' Color Digest ']/following-sibling::td[1]"));
Now you can use the second element as your intended one, which is elements.get(1) (the list is zero-based).
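Outside Selenium, the same select-then-index idea can be sketched with Java's built-in XPath API, where NodeList.item(1) plays the role of elements.get(1). The markup is a shortened, well-formed copy of the two rows, normalize-space() stands in for the exact ' Color Digest ' comparison so surrounding spaces don't matter, and the names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DigestByIndex {
    static final String TABLE = "<table><tbody>"
        + "<tr><td>Color Digest </td><td>AgArAQICGQMVBBwTIRQHIwg0GUMURAZTBWQJcwV0AoEDAQ </td></tr>"
        + "<tr><td>Color Digest </td><td>2,43,2,25,21,28,0,0 </td></tr>"
        + "</tbody></table>";

    // Every value cell that directly follows a "Color Digest" label.
    static final String VALUE_CELLS = "//td[normalize-space()='Color Digest']/following-sibling::td[1]";

    static NodeList select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expr, e);
        }
    }

    // Selects all value cells, then picks the second one in Java code.
    static String secondValue(String xml) {
        NodeList cells = select(xml, VALUE_CELLS);
        if (cells.getLength() < 2) {
            throw new IllegalStateException("expected at least two Color Digest rows");
        }
        // item() is zero-based, like elements.get(1) in the Selenium version.
        return cells.item(1).getTextContent().trim();
    }

    public static void main(String[] args) {
        System.out.println(select(TABLE, VALUE_CELLS).getLength()); // 2
        System.out.println(secondValue(TABLE));                     // 2,43,2,25,21,28,0,0
    }
}
```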
/html/body/table/tbody/tr[9]/td[1]
In Chrome (possibly Safari too) you can inspect an element, right-click the tag you want the XPath for in the Elements panel, and choose Copy > Copy XPath to get an expression that selects that element.
I would like to be able to place an empty tag anywhere in my document as a marker that can be addressed by jQuery. However, it is important that the XHTML still validates.
To give you a bit of background as to what I'm doing: I've compared the current and previous versions of a particular document and I'm placing markers in the html where the differences are. I'm then intending to use jQuery to highlight the parent block-level elements when highlightchanges=true is in the URL's query string.
At the moment I'm using <span> tags, but it occurred to me that this sort of thing wouldn't validate:
<table>
<tr>
<td>Old row</td>
</tr>
<span class="diff"></span><tr>
<td>Just added</td>
</tr>
</table>
So is there a tag I can use anywhere? Meta tag maybe?
Thanks for your help!
Iain
Edit: On the advice of codeka, I may look for a better difference engine and I may have found one that is attuned to finding differences in XHTML: http://www.rohland.co.za/index.php/2009/10/31/csharp-html-diff-algorithm/
You can use HTML comments and this plugin (or this one).
Can you not just modify the class of elements that have changed?
<p class="diff other-class">Something changed</p>
<table>
<tr>
<td>Old row</td>
</tr>
<tr class="diff">
<td>Just added</td>
</tr>
</table>