XPath: Get following sibling
I have the following HTML structure. I am trying to build a robust way to extract the second Color Digest element, since there will be many of these tags within the DOM.
<table>
<tbody>
<tr bgcolor="#AAAAAA">
<tr>
<tr>
<tr>
<tr>
<td>Color Digest </td>
<td>AgArAQICGQMVBBwTIRQHIwg0GUMURAZTBWQJcwV0AoEDAQ </td>
</tr>
<tr>
<td>Color Digest </td>
<td>2,43,2,25,21,28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,33,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,25,0,0,0,0,0,0,0,0,0,0,0,0,0,0,20,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, </td>
</tr>
</tbody>
</table>
I am trying to extract the second "Color Digest" td element, the one that holds the decoded value.
I wrote the following XPath, but it does not return the second td element.
//td[text() = ' Color Digest ']/following-sibling::td[2]
When I change td[2] to td[1], I get both elements.
You should be looking for the second tr that has the td that equals ' Color Digest ', then you need to look at either the following sibling of the first td in the tr, or the second td.
Try the following:
//tr[td='Color Digest'][2]/td/following-sibling::td[1]
or
//tr[td='Color Digest'][2]/td[2]
http://www.xpathtester.com/saved/76bb0bca-1896-43b7-8312-54f924a98a89
You can locate a list of elements with XPath:
//td[text() = ' Color Digest ']/following-sibling::td[1]
This will give you a list of two elements; then you can use the 2nd element as your intended one. For example:
List<WebElement> elements = driver.findElements(By.xpath("//td[text() = ' Color Digest ']/following-sibling::td[1]"));
Now, you can use the 2nd element as your intended element, which is elements.get(1)
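The same collect-then-index idea can be sketched in Python using only the standard library. This is an illustration, not the answerer's code: the sample markup is a shortened, well-formed stand-in for the table in the question, and second_digest is a hypothetical helper name. xml.etree.ElementTree supports only a subset of XPath (it has no following-sibling axis), so the sketch gathers the matching rows and does the indexing in Python.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for the table in the question (assumed data).
SAMPLE = """
<table>
  <tbody>
    <tr><td>Color Digest </td><td>AgArAQICGQMVBBwT </td></tr>
    <tr><td>Color Digest </td><td>2,43,2,25,21,28,0,0, </td></tr>
  </tbody>
</table>
"""

def second_digest(markup):
    """Return the value cell of the second 'Color Digest' row (hypothetical helper)."""
    root = ET.fromstring(markup)
    rows = []
    for tr in root.iter("tr"):
        first = tr.find("td")  # first <td> child of the row, if any
        # Compare after strip() so whitespace around the label does not matter.
        if first is not None and (first.text or "").strip() == "Color Digest":
            rows.append(tr)
    # Index in Python, like elements.get(1) in the Java snippet.
    cells = rows[1].findall("td")
    return cells[1].text.strip()

print(second_digest(SAMPLE))
```

Stripping the label before comparing sidesteps the exact-match problem in the question, where the cell text carries extra whitespace.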
/html/body/table/tbody/tr[9]/td[1]
In Chrome (possibly Safari too) you can inspect an element, then right-click the tag you want the XPath for and copy the XPath that selects that element.
Related
Xpath of a table cell based on the other cell in the same row
This is the HTML code I got:
<tr class="">
<td> Go to Google </td>
<td>02/10/20, 09:24 AM</td>
<td><button type="button" class="delete"></td>
</tr>
As you can see, the <tr> has three <td>s. Now, we are talking about a table, so we can assume that I have 100 of these <tr>s, with different values but the same structure.
I would like to get the XPath of the third td (the button), using the value of the <a> ('Go to Google'). How can I do that?
I know I need to do something like (//td//parent::a), but evidently I am not smart enough for that.
So you want to find the td element that has a child a with the text "Go to Google", and then from this td find the next td that has a button element child. In XPath this could look like this:
//td[a[text()="Go to Google"]]/following-sibling::td[button]
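For readers who want to try the row lookup without a browser, here is a minimal Python sketch using the standard library. The sample rows, the href values and the button ids are assumptions added for illustration, and button_cell_for is a hypothetical helper. Because xml.etree.ElementTree has no following-sibling axis, the sketch steps up to the row with ".." and then picks the cell that contains a button. That matches any such cell in the row, not only later ones, which is equivalent here because each row has a single button cell.

```python
import xml.etree.ElementTree as ET

# Assumed sample rows; the href values and button ids are illustrative only.
SAMPLE = """
<table>
  <tr>
    <td><a href="#">Go to Google</a></td>
    <td>02/10/20, 09:24 AM</td>
    <td><button type="button" class="delete" id="first"></button></td>
  </tr>
  <tr>
    <td><a href="#">Go to Bing</a></td>
    <td>02/11/20, 10:00 AM</td>
    <td><button type="button" class="delete" id="second"></button></td>
  </tr>
</table>
"""

def button_cell_for(markup, link_text):
    """Return the <td> holding the button in the row whose link has link_text."""
    root = ET.fromstring(markup)
    # td[a='...']  : a cell whose <a> child has exactly this text
    # /..          : step up to the enclosing row
    # /td[button]  : a cell in that row with a <button> child
    # Note: link_text must not contain a single quote in this simple sketch.
    return root.find(f".//td[a='{link_text}']/../td[button]")

cell = button_cell_for(SAMPLE, "Go to Google")
print(cell.find("button").get("id"))
```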
Selecting variations in Nokogiri
I'm scraping these two sites:
https://www.library.uq.edu.au/uqlsm/availablepcsembed.php?branch=Law
https://www.library.uq.edu.au/uqlsm/availablepcsembed.php?branch=BSL
Unfortunately, they have variations. One has the level name (e.g. Level 2) inside a tag with an href, while the other one is just plain text. How can I select one or the other, depending on which one is there?
I tried this to no avail:
level.css(/"a[href]"|".left"/).text
Here are shortened versions of the two HTML sections:
<table class="chart">
<tr valign="middle">
<td class="left">Level 2</td> <!-- the problem -->
<td class="middle"><div style="width:86%;"><strong>86%</strong></div></td>
</tr>
</table>
<table class="chart">
<tr valign="middle">
<td class="left">Level 1</td>
<td class="middle"><div style="width:32%;"><strong>32%</strong></div></td>
</tr>
</table>
My code (edited from a section of code to the whole method):
def self.scrape_details_page(library_url)
  details_page = Nokogiri::HTML(open(library_url))
  details_page.css("table.chart tr").collect do |level|
    right = level.css(".right").text.split
    {level: level.css("a[href]").text, available: right[0], out_of_available: right[3]}
  end
end
If what you want to do is grab the text that is within the innermost div, you should be able to dive all the way down just by calling #text on the parsed td element. There is no need to account for and walk extra tags that might be present inside, e.g. the link tag.
Given your code as written:
details_page.css("table.chart tr").collect do |level|
  level = level.text
end
For each element, that would pull the level label or percentage value (inner text) as a string and assign the value to the level variable.
Edit: also, if all you care about is getting the level label, you can just filter the elements by class up front:
details_page.css("table.chart tr td.left").collect do |level|
  level = level.text
end
The answer by jk_ should work in this particular case. In the more general case, if you're going to use a CSS selector, you need to use CSS syntax for "or" (a comma). So if you were going to use the selectors you originally asked about, it'd be level.css('a[href], .left').text
Thanks to inspiration from @jk_, I fixed it using .css(".left").text. That just selects all the text in the left td inside the tr. The working code:
def self.scrape_details_page(library_url)
  details_page = Nokogiri::HTML(open(library_url))
  details_page.css("table.chart tr").collect do |level|
    right = level.css(".right").text.split
    {level: level.css(".left").text, available: right[0], out_of_available: right[3]}
  end
end
XPath using siblings or following in two different cells
Put bluntly, I want to locate TestCoupon10% inside a td, then move to a sibling td and locate //a[contains(@id,"cmdOpen")].
I did try sibling and following, but likely I didn't do it right, because
//span[./text()="TestCoupon10%"]/following-sibling:a[contains(@id,"cmdOpen")]
results in an invalid XPath. The HTML structure looks as follows:
<tr>
<td> <span id="oCouponGrid_ctl03_lblCode">TestCoupon10%</span> </td>
<td>...</td>
<td>...</td>
<td valign="middle" align=""right"> <a id="oCouponGrid_ctl03_cmdOpen"> </td>
</tr>
I need to find cmdOpen and the test coupon. Does anyone have an idea how to do this?
Axes are delimited with double colons, not single ones (those are used for namespace prefixes). You wanted to say this:
//span[./text()="TestCoupon10%"]/following-sibling::a[contains(@id,"cmdOpen")]
But the <a> is not a following sibling of the <span> in question. You need to do some navigating:
//span[./text()="TestCoupon10%"]/parent::td/following-sibling::td/a[contains(@id,"cmdOpen")]
Or simply avoid descending into the tree, so you don't have to "climb up" again in the first place:
//td[span = "TestCoupon10%"]/following-sibling::td/a[contains(@id,"cmdOpen")]
Excel VBA web scraping from table
I am trying to extract some info from the table below into Excel using VBA, without any success. The values I need do not seem to have any element ID, tag name or class name assigned to them. I'm after the Fuel Usage value (89218) and the time value in the same row (01:15). Can anyone point me in the right direction on how to scrape values from a table, or how to extract data from a specific TR or TD?
HTML source of the table:
<h3>Airbus A300-600-PW4158 Fuel Planner</h3>
<p>London to Chicago EGKK-KORD (3441 NM)<br /></p>
<h2>Total Fuel: 101901 POUNDS</h2>
<table width="100%" border=1>
<tr>
<th style="text-align:left;"> </th>
<th style="text-align:left;">Fuel</td>
<th style="text-align:left;">Time</th>
</tr>
<tr>
<td>Fuel Usage</td>
<td>89218</td>
<td>08:47</td>
</tr>
<tr>
<td>Reserve Fuel</td>
<td>12682</td>
<td>01:15</td>
</tr>
<tr>
<td>Fuel on Board</td>
<td>101901</td>
<td>10:02</td>
</tr>
</table>
Much appreciated.
CSS selectors:
Without seeing more of the HTML, you can use the following CSS selectors for the snippet shown:
tr td:nth-child(2)
tr td:nth-child(3)
These bring back nodeLists of all the tds that are the second or third child within a tr. You can access individual items from a nodeList by index.
VBA:
The overall syntax in VBA will be something like:
.document.querySelectorAll("tr td:nth-child(2)")(0).innerText
or possibly
.document.querySelectorAll("tr td:nth-child(2)").Item(0).innerText
The 0 is hypothetical. You would need to inspect your full HTML to ascertain the correct index to use.
The .document innerHTML can be populated from the .responseText using IE, for example, to navigate to the page.
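To make the nth-child idea concrete outside VBA, here is a small Python sketch using only the standard library. It is an illustration under assumptions: the sample is a well-formed copy of the table rows from the question, and column is a hypothetical helper that mimics what tr td:nth-child(n) selects (a td that is the n-th child of its row), returning a list you then index.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the table rows from the question (header close tag fixed).
SAMPLE = """
<table>
  <tr><th></th><th>Fuel</th><th>Time</th></tr>
  <tr><td>Fuel Usage</td><td>89218</td><td>08:47</td></tr>
  <tr><td>Reserve Fuel</td><td>12682</td><td>01:15</td></tr>
  <tr><td>Fuel on Board</td><td>101901</td><td>10:02</td></tr>
</table>
"""

def column(markup, n):
    """Text of every td that is the n-th (1-based) child of its row."""
    root = ET.fromstring(markup)
    values = []
    for tr in root.iter("tr"):
        cells = list(tr)  # all child cells, th or td, in document order
        # The header row is skipped because its n-th child is a th, not a td.
        if len(cells) >= n and cells[n - 1].tag == "td":
            values.append(cells[n - 1].text)
    return values

fuel = column(SAMPLE, 2)
times = column(SAMPLE, 3)
print(fuel[0], times[0])
```

As in the VBA lines, index 0 happens to be the Fuel Usage row in this sample; on the full page the right index depends on what other tables and rows are present.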
Scraping td elements in a table in HTML
I have to get the text from the td elements of a table in HTML which looks like this:
<table id="gvrslt" >
<tbody><tr style="font-size:10pt;">
<th scope="col">Sem</th><th scope="col" style="font-size:X-Small;">Total Obtained Marks</th><th scope="col" style="font-size:X-Small;">Max Total Marks</th><th scope="col">Result</th>
</tr>
<tr>
<td align="center">VI</td>
<td align="center">458</td>
<td align="center">550</td>
<td align="center">PASSED</td>
</tr>
</tbody></table>
I want to grab the 458 from the table, which has more such td elements. The problem is that before getting to the Results page and getting the above HTML, I have to enter some credentials, and then a Result page is shown with right-click disabled. Now I can get the source of the Results page via driver.page_source, but when I try to find the table elements via webdriver, it searches the page where I entered the credentials and not the actual Results page. Is there a way to search driver.page_source for table and td elements?
Here is my code:
html = driver.page_source
soup = BeautifulSoup(html)
table = soup.find_all('table', id='gvrslt')
print(table)
If you want to get the text directly, you can use a CSS locator to get to the 2nd td directly instead of going through the table:
table[id='gvrslt'] td:nth-of-type(2)
nth-of-type gets you the 2nd td element.
Try using XPath in this case:
//table[@id='gvrslt']//td[index]
with index replaced by the index of your td.
I'm not familiar with Selenium in Python. What you are trying to do is find the value using XPath. Below is the code in C#; see if it can help you in any way.
IWebElement tdCell = driver.FindElement(By.XPath("//table[@id='']/tbody/tr[2]/td[2]"));
string valueOfTd = tdCell.Text;
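Since the question already has the page HTML as a string, the cells can also be pulled from that string without asking the driver to locate anything. Below is a minimal Python sketch using only the standard library's html.parser. It is one possible approach, not the method from any answer above: TableCells and td_texts are hypothetical names, the page_source value is a trimmed stand-in for driver.page_source, and nested tables are not handled.

```python
from html.parser import HTMLParser

class TableCells(HTMLParser):
    """Collect the text of every <td> inside the table with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif tag == "td" and self.in_table:
            self.in_td = True
            self.cells.append("")  # start a new cell

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False  # simple sketch: assumes no nested tables
        elif tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            self.cells[-1] += data

def td_texts(html, table_id):
    """Return the stripped text of each td in the table (hypothetical helper)."""
    parser = TableCells(table_id)
    parser.feed(html)
    return [cell.strip() for cell in parser.cells]

# Stand-in for driver.page_source (assumed; trimmed from the question's HTML).
page_source = """
<table id="gvrslt"><tbody>
<tr><th scope="col">Sem</th><th scope="col">Total Obtained Marks</th></tr>
<tr><td align="center">VI</td><td align="center">458</td>
<td align="center">550</td><td align="center">PASSED</td></tr>
</tbody></table>
"""

print(td_texts(page_source, "gvrslt")[1])
```

Index 1 is the second td in document order, which corresponds to the 458 cell in this sample; with more rows on the real page the index would need adjusting.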