I have two occurrences of the same td in two different tables.
I am able to get the value 'Yes' for the first one using this:
//h:td[1][*[contains(.,'Loudspeaker')]]/../h:td[last()]/text()
but I cannot get the value 'Voice 75dB / Noise 66dB / Ring 75dB' for the second one.
I tried:
//h:td[2][*[contains(.,'Loudspeaker')]]/../h:td[last()]/text()
I am very new to HTML and XPath, so please bear with me.
A portion of my HTML:
</table><table cellspacing="0">
<tr>
<th rowspan="3" scope="row">Sound</th>
<td class="ttl">Alert types</td>
<td class="nfo">Vibration; MP3, WAV ringtones</td>
</tr>
<tr>
<td class="ttl">Loudspeaker </td>
<td class="nfo">Yes</td>
</tr>
.
.
<table cellspacing="0">
<tr>
<th rowspan="5" scope="row">Tests</th>
<td class="ttl">Display</td>
<td class="nfo">
<a class="noUnd" href="http://xyz.php">Contrast ratio: Infinite (nominal) / 3.419:1 (sunlight)</a></td>
</tr><tr>
<td class="ttl">Loudspeaker</td>
<td class="nfo">
<a class="noUnd" href="http://xyz.php">Voice 75dB / Noise 66dB / Ring 75dB</a></td>
</tr><tr>
..
Thanks in Advance.
The only difference between these two snippets is that in the second one your text is nested within an a element. So it has to be
//h:td[2][*[contains(.,'Loudspeaker')]]/../h:td[last()]/h:a/text()
(I guess you have a namespace definition for h, as you use it in your XPath.)
What you are doing is:
//h:td[2] finds every td that is the second td of its parent, anywhere in the document (the main issue here, because no second td contains the text "Loudspeaker").
[*[contains(.,'Loudspeaker')]] checks whether this (second) td has a child element containing the text "Loudspeaker".
/../h:td[last()]/text() gets the text of the last td of the parent.
But what you seem to want is something like:
(//h:tr[h:td/*[contains(.,'Loudspeaker')]]) finds all tr elements that contain the text "Loudspeaker".
[2] selects the second of these tr elements.
/h:td[last()]/. selects the last td of that second tr; its string value includes the text of any children.
Therefore try (not tested!):
(//h:tr[h:td/*[contains(.,'Loudspeaker')]])[2]/h:td[last()]/.
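The same row-first logic (pick the rows by their label cell, then read everything under the last cell) can be sketched in Python with the standard library. This is only an illustration under assumptions: the markup is a trimmed, well-formed, namespace-free copy of the snippet from the question, and the helper name label_values is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the two tables from the question.
# Assumption: no XHTML namespace, so no h: prefix is needed here.
HTML = """
<div>
  <table cellspacing="0">
    <tr>
      <th rowspan="3" scope="row">Sound</th>
      <td class="ttl">Alert types</td>
      <td class="nfo">Vibration; MP3, WAV ringtones</td>
    </tr>
    <tr>
      <td class="ttl">Loudspeaker </td>
      <td class="nfo">Yes</td>
    </tr>
  </table>
  <table cellspacing="0">
    <tr>
      <th rowspan="5" scope="row">Tests</th>
      <td class="ttl">Display</td>
      <td class="nfo"><a class="noUnd" href="http://xyz.php">Contrast ratio</a></td>
    </tr>
    <tr>
      <td class="ttl">Loudspeaker</td>
      <td class="nfo">
        <a class="noUnd" href="http://xyz.php">Voice 75dB / Noise 66dB / Ring 75dB</a></td>
    </tr>
  </table>
</div>
"""


def label_values(root, label):
    """Return the text of the last td of every row whose first td equals label."""
    values = []
    for row in root.iter("tr"):
        cells = row.findall("td")
        if len(cells) >= 2 and "".join(cells[0].itertext()).strip() == label:
            # itertext() also collects text nested inside <a>, which is why
            # the second table needs no special handling.
            values.append("".join(cells[-1].itertext()).strip())
    return values


root = ET.fromstring(HTML)
values = label_values(root, "Loudspeaker")
print(values)
```

Indexing the returned list (values[1]) plays the role of the [2] applied to the parenthesised XPath above.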
public string FindElementUsingOneTrTwoTd(string tblName, string className, string searchString)
{
return "//*[@id=\"" + tblName + "\"]/tbody/tr/td[contains(normalize-space(@class), \"" + className + "\") and contains(string(),\"" + searchString + "\")]/../td[2]";
}
I have the following code to select a certain cell in a table element:
tag = soup.find_all('td', attrs={'class': 'I'})
As shown in the attached image 1, I would like to find its first sibling within the same row of class "even_row". Ideally, the selection would output only the contents of data-seconds, in this case "58". Not every "even_row" row has an element with class "I", and some have more than one, so I need the data-seconds value only for the "even_row" rows that contain an element with class "I".
Any help would be appreciated as I've been banging my head on the wall looking through documentation to no avail.
The HTML looks like:
<tr class='even_row'>
<td class='row_labels' data-seconds="58">
<div class='celldiv slots1'></div>
</td>
<td class='new'>...</td>
<td class='I'>...</td>
<td class='new'>...</td>
<td class='new'>...</td>
One way to get around that issue is to pass True as the attribute value, which matches any td that has a data-seconds attribute:
from bs4 import BeautifulSoup
html = """
<tr class='even_row'>
<td class='row_labels' data-seconds="58">
<div class='celldiv slots1'></div>
</td>
<td class='new'>...</td>
<td class='I'>...</td>
<td class='new'>...</td>
<td class='new'>...</td>
</tr>
<tr class='even_row'>
<td class='row_labels' >
<div class='celldiv slots1'></div>
</td>
<td class='new'>...</td>
<td class='I'>...</td>
<td class='new'>...</td>
<td class='new'>...</td>
</tr>
"""
soup = BeautifulSoup(html,'html.parser')
even_rows = soup.find_all('tr', attrs={'class': 'even_row'})
for row in even_rows:
    tag = row.find("td", {"data-seconds": True})
    if tag is not None:
        print(tag.get('data-seconds'))
Output :
58
Another way to do it is with regular expressions:
import re
tds = [tag.get('data-seconds') for tag in soup.findAll("td", {"data-seconds" : re.compile(r".*")})]
print(tds)
Output :
['58']
I cannot test properly without the full HTML, but with bs4 4.7.1+ you can use :has to express the requirement: .even_row:has(.I) matches a parent with class even_row that has a descendant with class I. Adding [data-seconds] then selects every descendant of those rows that has a data-seconds attribute:
print([i['data-seconds'] for i in soup.select('.even_row:has(.I) [data-seconds]')])
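To make the filter explicit, here is a minimal standard-library sketch of the same logic (keep only rows with class even_row that contain a td with class I, then read data-seconds). It assumes well-formed markup, and the second row with data-seconds="120" is hypothetical, added only to show that rows without an I cell are skipped.

```python
import xml.etree.ElementTree as ET

# Well-formed sample; the second row is an assumption for illustration.
HTML = """
<table>
  <tr class="even_row">
    <td class="row_labels" data-seconds="58"><div class="celldiv slots1"></div></td>
    <td class="new">...</td>
    <td class="I">...</td>
    <td class="new">...</td>
  </tr>
  <tr class="even_row">
    <td class="row_labels" data-seconds="120"><div class="celldiv slots1"></div></td>
    <td class="new">...</td>
  </tr>
</table>
"""

root = ET.fromstring(HTML)
seconds = []
for row in root.findall(".//tr[@class='even_row']"):
    # Skip rows that have no cell with class "I".
    if row.find("td[@class='I']") is None:
        continue
    # Take the first cell in the row that carries a data-seconds attribute.
    label = row.find("td[@data-seconds]")
    if label is not None:
        seconds.append(label.get("data-seconds"))

print(seconds)
```

Note that ElementTree compares the class attribute as an exact string, so this sketch would miss rows with several class names; the CSS selector does not have that limitation.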
I have been trying to find a solution to my problem for a few days already, but I just can't get anything to work.
Unfortunately I cannot give the URL of the webpage, as it requires a login and password which I cannot share.
My VBA code already does everything else: it logs into the webpage, provides the proper information inside the page and clicks the validate button. The problem is that I then need to check whether the text below appears:
ENQUADRAMENTO EM VIGOR - if it does, the process continues one way, and if not, another way.
Below is the code from the webpage:
<tr>
<td>
<table cellpadding="4" border="0" width="100%">
<tbody><tr>
<td class="fieldTitleBold" style="width=30%">Enquadramento em IVA</td>
<td class="fieldValue" colspan="3">NORMAL TRIMESTRAL</td>
</tr>
<tr>
<td style="width=10%" class="fieldTitleBold">Situação</td>
<td class="fieldValue" colspan="3">ENQUADRAMENTO EM VIGOR</td>
</tr>
</tbody></table>
</td>
</tr>
I have tried many different ways. The latest attempt uses getElementsByClassName (this worked for me on a different website for a similar purpose), but it doesn't work here for some reason:
Set doc = ie.document
Set htmTable = doc.getElementsByClassName("ENQUADRAMENTO EM VIGOR")(0)
If Not htmTable Is Nothing Then
'continue depending if the text was found or not in different ways
ENQUADRAMENTO EM VIGOR is the .innerText value not the class name. The class value is fieldValue and is associated with a td (table cell) element.
This is pretty easy if it only occurs once. Use InStr to see whether it is present in the page HTML:
If Instr(ie.document.body.innerHTML,"ENQUADRAMENTO EM VIGOR") > 0 Then
Otherwise, you can gather a nodeList of td elements with that class name and loop testing the .innerText
Dim classes As Object, i As Long
Set classes = ie.document.querySelectorAll("td.fieldValue")
For i = 0 To classes.Length - 1
    If classes.item(i).innerText = "ENQUADRAMENTO EM VIGOR" Then
        'do something
        'Exit For ....
    End If
Next i
$(document).ready(function() {
var lenfV = document.querySelectorAll(".fieldValue");
for(let i=0;i<lenfV.length;i++) {
if(lenfV[i].innerHTML == "ENQUADRAMENTO EM VIGOR") {
console.log("is there");
}
//else {console.log(213423);}
}
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<p> I think, The below option will help you</p>
<table>
<tr>
<td>
<table cellpadding="4" border="0" width="100%">
<tbody><tr>
<td class="fieldTitleBold" style="width=30%">Enquadramento em IVA</td>
<td class="fieldValue" colspan="3">NORMAL TRIMESTRAL</td>
</tr>
<tr>
<td style="width=10%" class="fieldTitleBold">Situação</td>
<td class="fieldValue" colspan="3">ENQUADRAMENTO EM VIGOR</td>
</tr>
</table>
</td>
</tr>
</table>
I need an XPath expression that counts all the <tr> rows whose class attribute starts with the string room_loop_counter, grouped by the class name itself.
I have the following sample HTML code to extract data from:
<tbody id="container" >
<tr class="room_loop_counter1 maintr">
<td class="legibility " rowspan="6"></td>
<td colspan="4" style="padding:0;"></td>
</tr>
<tr class="room_loop_counter1">
<td ></td>
<td class=""></td>
</tr>
<tr class="room_loop_counter1"></tr>
<tr class="room_loop_counter2 maintr divider"></tr>
<tr class="room_loop_counter2"></tr>
<tr class="room_loop_counter3 maintr divider"></tr>
<tr class="room_loop_counter3"></tr>
<tr class="room_loop_counter3"></tr>
<tr class="room_loop_counter3"></tr>
<tr class="room_loop_counter3"></tr>
</tbody>
Given the above HTML, I would want to get the result 2,1,4. The count is the number of elements minus one, since I want to exclude the first <tr> (the one with maintr), which is the header.
Between these <tr> elements there could be other <tr> elements, so they are not strictly one after the other and we can't rely on following- or preceding-sibling logic.
I've tried the following XPath expression:
count(//table[@id="maxotel_rooms"]/tbody/tr[@class=distinct-values(//table[@id="maxotel_rooms"]/tbody/tr[starts-with(@class, "room_loop_counter") and not(contains(@class, "maintr"))]/@class)]/@class])
but it doesn't work in Chrome (evaluating it with $x('') in the console) since it doesn't recognize the distinct-values function.
Could you suggest a possible solution? What is the best approach?
Check this XPath, which selects one tr per distinct class where the class starts with the given prefix and does not contain the other class name:
//tbody/tr[starts-with(@class, "room_loop_counter") and not(contains(@class, "maintr"))]/following::tr[not(./@class=following::tr/@class) and not(contains(@class, "maintr"))]
Javascript:
var path = "//body/div";
var uniquePathCount = window.document.evaluate('count(' + path + ')', window.document, null, 0, null);
console.log( uniquePathCount );
console.log( uniquePathCount.numberValue );
Output:
<tr class="room_loop_counter1"/>
<tr class="room_loop_counter2"/>
<tr class="room_loop_counter3"/>
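Chrome's $x evaluates XPath 1.0, which has no distinct-values() and no grouping, so one option is to select the rows and do the grouping in the host language. The following is a minimal sketch of that idea in Python, assuming a well-formed, trimmed copy of the sample markup; it is not the method of the answer above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed copy of the sample rows; the cells are omitted for brevity.
HTML = """
<tbody id="container">
  <tr class="room_loop_counter1 maintr"></tr>
  <tr class="room_loop_counter1"></tr>
  <tr class="room_loop_counter1"></tr>
  <tr class="room_loop_counter2 maintr divider"></tr>
  <tr class="room_loop_counter2"></tr>
  <tr class="room_loop_counter3 maintr divider"></tr>
  <tr class="room_loop_counter3"></tr>
  <tr class="room_loop_counter3"></tr>
  <tr class="room_loop_counter3"></tr>
  <tr class="room_loop_counter3"></tr>
</tbody>
"""

counts = Counter()
for row in ET.fromstring(HTML).iter("tr"):
    classes = row.get("class", "").split()
    # Header rows carry "maintr" and are excluded from the count.
    if "maintr" in classes:
        continue
    for name in classes:
        if name.startswith("room_loop_counter"):
            counts[name] += 1

# Counts in order of first appearance of each class name.
result = list(counts.values())
print(result)
```

Because the rows are grouped by class name rather than by position, other <tr> elements in between do not affect the counts.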
I want to get text that is outside a tag. Here is the HTML:
<table border="0" cellpadding="0" cellspacing="0" width="100%"
class="viewingsCommentsTbl">
<tbody>
<tr>
<td>
<b style="border: 2px solid red;
background: rgb(204, 136, 136);">Viewing Conducted: </b>
18-May-2016
</td>
</tr>
<tr>
<td style=""><b style="">Duration: </b> 1 hr</td>
</tr>
<tr>
<td style=""><b style="">Comments: </b>66yy</td>
</tr>
</tbody>
</table>
I want to get the date, i.e. "18-May-2016".
I tried the following XPath, but it does not work:
//*[@class="viewingsCommentsTbl"]/tbody/tr[1]/td/b
The text is in the <td> tag, not the <b>. Try
//*[@class="viewingsCommentsTbl"]/tbody/tr[1]/td
Please try it like this:
WebElement dateis = driver.findElement(By.xpath("//*[@class='viewingsCommentsTbl']/tbody/tr/td"));
System.out.println("Date is : " + dateis.getText());
The output is: Date is : Viewing Conducted: 18-May-2016
// if you want to extract only the date:
String[] extractdate = dateis.getText().split(" ");
System.out.println("Extracted date is : " + extractdate[2]);
The output is: Extracted date is : 18-May-2016
Here is a more robust way to select "18-May-2016" based on its preceding Viewing Conducted: label within a td in the viewingsCommentsTbl table, independent of table layout:
normalize-space(
substring-after(//table[@class='viewingsCommentsTbl']
//td[starts-with(.,'Viewing Conducted:')],'Viewing Conducted:'))
This gets the text outside of a tag (per your request) by selecting the string value of the element's parent and then using substring-after() to get just the text that follows the label.
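The label-based approach can be sketched in Python with the standard library as well. This is an illustration under assumptions: the markup is a trimmed, well-formed copy of the table from the question, and whitespace is normalized before the prefix check, which is slightly more lenient than the XPath above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the table from the question (inline styles removed).
HTML = """
<table border="0" cellpadding="0" cellspacing="0" width="100%"
       class="viewingsCommentsTbl">
  <tbody>
    <tr>
      <td>
        <b>Viewing Conducted: </b>
        18-May-2016
      </td>
    </tr>
    <tr><td><b>Duration: </b> 1 hr</td></tr>
    <tr><td><b>Comments: </b>66yy</td></tr>
  </tbody>
</table>
"""

LABEL = "Viewing Conducted:"

root = ET.fromstring(HTML)
date = None
for td in root.iter("td"):
    # String value of the td with whitespace collapsed, like normalize-space().
    text = " ".join("".join(td.itertext()).split())
    if text.startswith(LABEL):
        # Everything after the label, like substring-after().
        date = text[len(LABEL):].strip()
        break

print(date)
```

Changing LABEL to "Duration:" or "Comments:" picks up the other values in the same way.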
I have table without any class or id (there are more tables on the page) with this structure:
<table cellpadding="2" cellspacing="2" width="100%">
...
<tr>
<td class="cell_c">...</td>
<td class="cell_c">...</td>
<td class="cell_c">...</td>
<td class="cell">SOME_ID</td>
<td class="cell_c">...</td>
</tr>
...
</table>
I want to get only the one row that contains <td class="cell">SOME_ID</td>, where SOME_ID is an argument.
Update:
Currently I am doing it this way:
doc = Jsoup.connect("http://www.bank.gov.ua/control/uk/curmetal/detail/currency?period=daily").get();
Elements rows = doc.select("table tr");
Pattern p = Pattern.compile("^.*(USD|EUR|RUB).*$");
for (Element trow : rows) {
Matcher m = p.matcher(trow.text());
if(m.find()){
System.out.println(m.group());
}
}
But why do I need Jsoup if most of the work is done by the regexp? Just to download the HTML?
If you have a generic HTML structure that always is the same, and you want a specific element which has no unique ID or identifier attribute that you can use, you can use the css selector syntax in Jsoup to specify where in the DOM-tree the element you are after is located.
Consider this HTML source:
<html>
<head></head>
<body>
<table cellpadding="2" cellspacing="2" width="100%">
<tbody>
<tr>
<td class="cell">I don't want this one...</td>
<td class="cell">Neither do I want this one...</td>
<td class="cell">Still not the right one..</td>
<td class="cell">BINGO!</td>
<td class="cell">Nothing further...</td>
</tr> ...
</tbody>
</table>
</body>
</html>
We want to select and parse the text from the fourth <td> element.
We select the <td> element that has index 3 among its siblings in the DOM tree by using td:eq(3) (indexes start at 0). In the same way, we can select all <td> elements before index 3 by using td:lt(3). As you've probably figured out, eq means "equal" and lt means "less than".
Without using first() you will get an Elements object, but we only want the first one so we specify that. We could use get(0) instead too.
So, the following code
Element e = doc.select("td:eq(3)").first();
System.out.println("Did I find it? " + e.text());
will output
Did I find it? BINGO!
Some good reading in the Jsoup cookbook!
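For the original requirement (return the one row whose td with class cell holds SOME_ID), the filter can be written without a regexp over the whole row text. Below is a minimal Python sketch of that logic using only the standard library; the sample cell values and the helper name find_row are hypothetical, and the markup is assumed to be well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data in the structure described in the question.
HTML = """
<table cellpadding="2" cellspacing="2" width="100%">
  <tr>
    <td class="cell_c">840</td>
    <td class="cell">USD</td>
    <td class="cell_c">100</td>
  </tr>
  <tr>
    <td class="cell_c">978</td>
    <td class="cell">EUR</td>
    <td class="cell_c">100</td>
  </tr>
</table>
"""


def find_row(root, some_id):
    """Return the first tr that has a td class="cell" whose text equals some_id."""
    for row in root.iter("tr"):
        for cell in row.findall("td[@class='cell']"):
            if (cell.text or "").strip() == some_id:
                return row
    return None


root = ET.fromstring(HTML)
row = find_row(root, "EUR")
cells = [(td.text or "").strip() for td in row.findall("td")]
print(cells)
```

In Jsoup itself, a selector along the lines of tr:has(td.cell:containsOwn(SOME_ID)) should express a similar filter, with the caveat that it matches substrings case-insensitively; that variant is untested here.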