I want to get text that sits outside a tag. Here is the HTML:
<table border="0" cellpadding="0" cellspacing="0" width="100%"
class="viewingsCommentsTbl">
<tbody>
<tr>
<td>
<b style="border: 2px solid red;
background: rgb(204, 136, 136);">Viewing Conducted: </b>
18-May-2016
</td>
</tr>
<tr>
<td style=""><b style="">Duration: </b> 1 hr</td>
</tr>
<tr>
<td style=""><b style="">Comments: </b>66yy</td>
</tr>
</tbody>
</table>
I want to get the date, i.e. "18-May-2016".
I tried the following XPath, but it does not work:
//*[@class="viewingsCommentsTbl"]/tbody/tr[1]/td/b
The text is in the <td> tag, not the <b>. Try
//*[@class="viewingsCommentsTbl"]/tbody/tr[1]/td
Please try it like this:
WebElement dateis = driver.findElement(By.xpath("//*[@class='viewingsCommentsTbl']/tbody/tr/td"));
System.out.println("Date is : " + dateis.getText());
The output is: Date is : Viewing Conducted: 18-May-2016
If you want to extract only the date, split the text on spaces and take the third token:
String[] extractdate = dateis.getText().split(" ");
System.out.println("Extracted date is : " + extractdate[2]);
The output is: Extracted date is : 18-May-2016
Here is a more robust way to select "18-May-2016" based on its preceding Viewing Conducted: label within a td in the viewingsCommentsTbl table, independent of the table layout:
normalize-space(
substring-after(//table[@class='viewingsCommentsTbl']
//td[starts-with(normalize-space(.),'Viewing Conducted:')],'Viewing Conducted:'))
This gets the text outside of a tag (per your request) by taking the string value of the enclosing td, the parent of the b element, and then using substring-after() to keep just the text that follows the label. The normalize-space() call inside starts-with() tolerates whitespace before the label, such as the line break after <td> in your HTML.
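If you want to check that expression outside the browser, here is a minimal sketch using the JDK's built-in javax.xml.xpath. It assumes a trimmed, well-formed copy of the table from the question; the class name ViewingDateXPath and the helper methods are made up for illustration. It wraps the td in normalize-space() inside starts-with(), so whitespace before the label does not break the match.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class ViewingDateXPath {
    // Trimmed copy of the question's table (assumption: well-formed XML).
    static final String HTML =
        "<table class=\"viewingsCommentsTbl\"><tbody>"
      + "<tr><td>\n<b>Viewing Conducted: </b>\n18-May-2016\n</td></tr>"
      + "<tr><td><b>Duration: </b> 1 hr</td></tr>"
      + "</tbody></table>";

    // Evaluates an XPath 1.0 expression against HTML and returns its string result.
    static String eval(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(HTML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the text that follows the "Viewing Conducted:" label in its td.
    static String viewingDate() {
        return eval("normalize-space(substring-after("
            + "//table[@class='viewingsCommentsTbl']"
            + "//td[starts-with(normalize-space(.),'Viewing Conducted:')],"
            + "'Viewing Conducted:'))");
    }

    public static void main(String[] args) {
        System.out.println(viewingDate());
    }
}
```

With Selenium you cannot pass a string-valued XPath like this to findElement, which expects an element, so there you would still select the td and trim the label off in Java.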
I have been trying to solve this for a few days already, and somehow I can't find a working solution.
Unfortunately I cannot give the URL of the webpage, as it requires a login and password that I cannot share.
My VBA code already does everything else: it logs in to the webpage, provides the proper information inside the page and clicks the validate button. The problem is that I then need to check whether the text below appears:
ENQUADRAMENTO EM VIGOR - if it does, the process continues one way, and if not, another way.
Below is the HTML from the webpage:
<tr>
<td>
<table cellpadding="4" border="0" width="100%">
<tbody><tr>
<td class="fieldTitleBold" style="width=30%">Enquadramento em IVA</td>
<td class="fieldValue" colspan="3">NORMAL TRIMESTRAL</td>
</tr>
<tr>
<td style="width=10%" class="fieldTitleBold">Situação</td>
<td class="fieldValue" colspan="3">ENQUADRAMENTO EM VIGOR</td>
</tr>
</tbody></table>
</td>
</tr>
I have tried many different ways. The latest attempt uses getElementsByClassName (this worked for me on a different website for a similar purpose), but it doesn't work here for some reason:
Set doc = ie.document
Set htmTable = doc.getElementsByClassName("ENQUADRAMENTO EM VIGOR")(0)
If Not htmTable Is Nothing Then
'continue depending if the text was found or not in different ways
ENQUADRAMENTO EM VIGOR is the .innerText value, not the class name. The class value is fieldValue, and it belongs to a td (table cell) element.
This is pretty easy if the text occurs only once. Use InStr to see whether it is present in the page HTML:
If InStr(ie.document.body.innerHTML, "ENQUADRAMENTO EM VIGOR") > 0 Then
Otherwise, you can gather a nodeList of td elements with that class name and loop over it, testing the .innerText of each:
Dim classes As Object, i As Long
Set classes = ie.document.querySelectorAll("td.fieldValue")
For i = 0 To classes.Length - 1
    If classes.item(i).innerText = "ENQUADRAMENTO EM VIGOR" Then
        'do something
        'Exit For ....
    End If
Next i
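The same "class is fieldValue, text is ENQUADRAMENTO EM VIGOR" test can also be written as a single XPath predicate. This is only a sketch in Java with the JDK's javax.xml.xpath, not something you can drop into the VBA above; the class name FieldValueCheck and the SAMPLE string are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class FieldValueCheck {
    // Well-formed copy of the inner table from the question.
    static final String SAMPLE = "<table><tbody>"
      + "<tr><td class=\"fieldTitleBold\">Enquadramento em IVA</td>"
      + "<td class=\"fieldValue\" colspan=\"3\">NORMAL TRIMESTRAL</td></tr>"
      + "<tr><td class=\"fieldTitleBold\">Situa\u00e7\u00e3o</td>"
      + "<td class=\"fieldValue\" colspan=\"3\">ENQUADRAMENTO EM VIGOR</td></tr>"
      + "</tbody></table>";

    // True when some td with class fieldValue has exactly the wanted text.
    static boolean isInForce(String html) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Boolean) xpath.evaluate(
                "boolean(//td[@class='fieldValue']"
              + "[normalize-space(.)='ENQUADRAMENTO EM VIGOR'])",
                new InputSource(new StringReader(html)),
                XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isInForce(SAMPLE));
    }
}
```

Unlike the InStr check on the whole page, this only matches when the text is the full content of a fieldValue cell.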
$(document).ready(function() {
var lenfV = document.querySelectorAll(".fieldValue");
for(let i=0;i<lenfV.length;i++) {
if(lenfV[i].innerHTML == "ENQUADRAMENTO EM VIGOR") {
console.log("is there");
}
//else {console.log(213423);}
}
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<p> I think, The below option will help you</p>
<table>
<tr>
<td>
<table cellpadding="4" border="0" width="100%">
<tbody><tr>
<td class="fieldTitleBold" style="width=30%">Enquadramento em IVA</td>
<td class="fieldValue" colspan="3">NORMAL TRIMESTRAL</td>
</tr>
<tr>
<td style="width=10%" class="fieldTitleBold">Situação</td>
<td class="fieldValue" colspan="3">ENQUADRAMENTO EM VIGOR</td>
</tr>
</tbody></table>
</td>
</tr>
</table>
I need an XPath expression that counts all the <tr> rows whose class attribute starts with the string room_loop_counter, grouped by that class name.
I have the following sample HTML to extract data from:
<tbody id="container" >
<tr class="room_loop_counter1 maintr">
<td class="legibility " rowspan="6"></td>
<td colspan="4" style="padding:0;"></td>
</tr>
<tr class="room_loop_counter1">
<td ></td>
<td class=""></td>
</tr>
<tr class="room_loop_counter1"></tr>
<tr class="room_loop_counter2 maintr divider"></tr>
<tr class="room_loop_counter2"></tr>
<tr class="room_loop_counter3 maintr divider"></tr>
<tr class="room_loop_counter3"></tr>
<tr class="room_loop_counter3"></tr>
<tr class="room_loop_counter3"></tr>
<tr class="room_loop_counter3"></tr>
</tbody>
Given the above HTML, I want to get 2,1,4 as the result. Each count is the number of rows in the group minus one, since I want to exclude the first <tr> of each group (the one with maintr), which is the header.
There could be other <tr> elements between these rows, so they are not strictly adjacent and we can't rely on following-sibling or preceding-sibling logic.
I've tried the following XPath expression:
count(//table[@id="maxotel_rooms"]/tbody/tr[@class=distinct-values(//table[@id="maxotel_rooms"]/tbody/tr[starts-with(@class, "room_loop_counter") and not(contains(@class, "maintr"))]/@class)]/@class)
but it doesn't work in Chrome (evaluating it with $x('') in the console), since Chrome doesn't recognize the distinct-values function.
Could you suggest a possible solution? What is the best approach?
Try this XPath. For the sample HTML it returns one tr per distinct class value (the last row of each group) and skips the rows whose class contains maintr:
//tbody/tr[starts-with(@class, "room_loop_counter") and not(contains(@class, "maintr"))]/following::tr[not(./@class=following::tr/@class) and not(contains(@class, "maintr"))]
JavaScript:
var path = "//body/div"; // placeholder: put the XPath you want to count here
var uniquePathCount = window.document.evaluate('count(' + path + ')', window.document, null, 0, null);
console.log( uniquePathCount );
console.log( uniquePathCount.numberValue );
Output:
<tr class="room_loop_counter1"/>
<tr class="room_loop_counter2"/>
<tr class="room_loop_counter3"/>
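Since distinct-values() is an XPath 2.0 function and browsers only implement XPath 1.0, one way to get the per-group counts (2,1,4) is to select the non-header rows with plain XPath 1.0 and do the grouping in the host language. Below is a minimal sketch of that idea in Java with the JDK's javax.xml.xpath; the class name RoomRowCounter is made up, and the HTML is a reduced, well-formed copy of the sample.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RoomRowCounter {
    // Reduced copy of the sample tbody (cells omitted).
    static final String HTML = "<table id=\"maxotel_rooms\"><tbody id=\"container\">"
      + "<tr class=\"room_loop_counter1 maintr\"></tr>"
      + "<tr class=\"room_loop_counter1\"></tr>"
      + "<tr class=\"room_loop_counter1\"></tr>"
      + "<tr class=\"room_loop_counter2 maintr divider\"></tr>"
      + "<tr class=\"room_loop_counter2\"></tr>"
      + "<tr class=\"room_loop_counter3 maintr divider\"></tr>"
      + "<tr class=\"room_loop_counter3\"></tr>"
      + "<tr class=\"room_loop_counter3\"></tr>"
      + "<tr class=\"room_loop_counter3\"></tr>"
      + "<tr class=\"room_loop_counter3\"></tr>"
      + "</tbody></table>";

    // Counts non-header rows per room_loop_counterN class, in document order.
    static Map<String, Integer> countRows() {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // XPath 1.0 only: rows of a group that do not carry the maintr class.
            NodeList rows = (NodeList) xpath.evaluate(
                "//tbody/tr[starts-with(@class,'room_loop_counter')"
              + " and not(contains(concat(' ',@class,' '),' maintr '))]",
                new InputSource(new StringReader(HTML)),
                XPathConstants.NODESET);
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (int i = 0; i < rows.getLength(); i++) {
                String cls = ((Element) rows.item(i)).getAttribute("class").trim();
                // The first class token is the group name, e.g. room_loop_counter1.
                counts.merge(cls.split("\\s+")[0], 1, Integer::sum);
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countRows());
    }
}
```

A group that consists of a maintr row only would not show up in the map at all, so handle that case separately if you need a 0 for it. The same select-then-group approach should carry over to JavaScript with document.evaluate and a plain object as the counter.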
I am sending an HTML message through send_email functionality from a stored procedure.
The HTML content is stored as a template in a table, and the dynamic values are substituted into this template to render the message.
All of this is done from the stored procedure, which opens the SMTP connection.
This is how the body is built inside the stored procedure:
SELECT REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(body, '<DUE_DAYS> business days',v_past_due)
, '<COMPANY_ADDRESS>', p_email(indx).address||'<BR/>'
||p_email(indx).city||', '
||p_email(indx).state||' '
||p_email(indx).zip)
, '<TAX_PIN>', TO_CHAR(p_email(indx).tax_pin,'$9G999G999G990D00'))
, '<TAX_AMT>', TO_CHAR(p_email(indx).tax_amt,'$9G999G999G990D00'))
, '<TAX_AMT_PEN>', TO_CHAR(p_email(indx).tax_amt_pen,'$9G999G999G990D00'))
, '<DUE_DATE>' , due_date)
INTO v_body
FROM pmail_txt
WHERE status = 'DUE_SEND';
The HTML content in pmail_txt for status = 'DUE_SEND' looks like this:
<TABLE BORDER="0" WIDTH="75%;"><TBODY>
<TR><TD WIDTH="40%">Overall Due Days:</TD><TD WIDTH="60%"><v_past_due></TD></TR>
<TR><TD WIDTH="40%">Company Address:</TD><TD WIDTH="60%"><COMPANY_ADDRESS></TD></TR>
<TR><TD WIDTH="40%">Pin Number:</TD><TD WIDTH="60%"><TAX_PIN></TD></TR>
<TR><TD WIDTH="40%">Amount Due:</TD><TD WIDTH="60%"><TAX_AMT></TD></TR><TR>
<TD WIDTH="40%">Ex Amount Due:</TD><TD WIDTH="60%"><EIND_AMT></TD></TR>
<TR><TD WIDTH="40%"><STRONG>Pendign Amount Due</STRONG>:</TD><TD WIDTH="60%"><STRONG><TAX_AMT_PEN></STRONG></TD></TR>
<TR><TD WIDTH="40%"><STRONG>Due DATE</STRONG>:</TD><TD WIDTH ="60%"><STRONG><DUE_DATE></STRONG></TD></TR>
</TBODY>
</TABLE>
I want to add a condition to the HTML so that when a value is null, the field is not shown in the email body.
I have changed the code as per the comment below. It works for non-null fields; however, for empty/null fields the body of the message doesn't appear at all. Can someone tell me where I went wrong?
<style>
.hidden_v_past_due {
display:none;
}
</style>
<TABLE BORDER="0" WIDTH="75%;"><TBODY>
<TR class="hidden_v_past_due<v_past_due>"><TD WIDTH="40%">Overall Due Days:</TD><TD WIDTH="60%"><v_past_due></TD></TR>
<TR><TD WIDTH="40%">Company Address:</TD><TD WIDTH="60%"><COMPANY_ADDRESS></TD></TR>
<TR><TD WIDTH="40%">Pin Number:</TD><TD WIDTH="60%"><TAX_PIN></TD></TR>
<TR><TD WIDTH="40%">Amount Due:</TD><TD WIDTH="60%"><TAX_AMT></TD></TR><TR>
<TR><TD WIDTH="40%">Ex Amount Due:</TD><TD WIDTH="60%"><EIND_AMT></TD></TR>
<TR><TD WIDTH="40%"><STRONG>Pendign Amount Due</STRONG>:</TD><TD WIDTH="60%"><STRONG><TAX_AMT_PEN></STRONG></TD></TR>
<TR><TD WIDTH="40%"><STRONG>Due DATE</STRONG>:</TD><TD WIDTH ="60%"><STRONG><DUE_DATE></STRONG></TD></TR>
</TBODY>
</TABLE>
I also tried a scriptlet, like below:
<TABLE BORDER="0" WIDTH="75%;"><TBODY>
<%if(v_past_due != null){%>
<TR><TD WIDTH="40%">Overall Due Days:</TD><TD WIDTH="60%"><v_past_due></TD></TR><%}%>
<TR><TD WIDTH="40%">Company Address:</TD><TD WIDTH="60%"><COMPANY_ADDRESS></TD></TR>
<TR><TD WIDTH="40%">Pin Number:</TD><TD WIDTH="60%"><TAX_PIN></TD></TR>
<TR><TD WIDTH="40%">Amount Due:</TD><TD WIDTH="60%"><TAX_AMT></TD></TR><TR>
<TR><TD WIDTH="40%">Ex Amount Due:</TD><TD WIDTH="60%"><EIND_AMT></TD></TR>
<TR><TD WIDTH="40%"><STRONG>Pendign Amount Due</STRONG>:</TD><TD WIDTH="60%"><STRONG><TAX_AMT_PEN></STRONG></TD></TR>
<TR><TD WIDTH="40%"><STRONG>Due DATE</STRONG>:</TD><TD WIDTH ="60%"><STRONG><DUE_DATE></STRONG></TD></TR>
</TBODY>
</TABLE>
I tried the options below too:
<%if(<v_past_due>){%>....///......<%}%>
<%if(v_past_due != null){%>....///......<%}%>
All of these work when the value is not null, and everything fails when it is null.
You could try adding a class attribute to the element so that, when the corresponding field value is null, the resulting class name hides the element. E.g.
<style>
.hidde_v_past_due {
display:none;
}
</style>
...
<TR class="hidde_v_past_due<v_past_due>"><TD WIDTH="40%">Overall Due Days:</TD><TD WIDTH="60%"><v_past_due></TD></TR>
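This way, when v_past_due is empty, the class name is exactly hidde_v_past_due, which matches the rule in the style block, and your TR will not display. When v_past_due has a value, the value is appended to the class name, so the rule no longer matches and the row is shown.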
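To see what the class attribute ends up as in both cases, here is a small sketch of the substitution in Java. It only imitates the placeholder replacement with String.replace; it is not the stored procedure's REGEXP_REPLACE call, and the class name HiddenRowTemplate and the method fill are made up for the example.

```java
final class HiddenRowTemplate {
    // One template row using the class-suffix trick from the answer.
    static final String ROW =
        "<TR class=\"hidde_v_past_due<v_past_due>\">"
      + "<TD WIDTH=\"40%\">Overall Due Days:</TD>"
      + "<TD WIDTH=\"60%\"><v_past_due></TD></TR>";

    // Fills every <v_past_due> placeholder; a null value is treated as empty.
    static String fill(String pastDue) {
        return ROW.replace("<v_past_due>", pastDue == null ? "" : pastDue);
    }

    public static void main(String[] args) {
        // Null value: class stays exactly hidde_v_past_due, so display:none applies.
        System.out.println(fill(null));
        // Value 5: class becomes hidde_v_past_due5, which no CSS rule matches.
        System.out.println(fill("5"));
    }
}
```

One thing to watch: if the substituted value starts with a space, the class attribute becomes "hidde_v_past_due 5", which still contains the hidde_v_past_due token, so the row would be hidden even though it has a value.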
I have a table without any class or id (there are more tables on the page) with this structure:
<table cellpadding="2" cellspacing="2" width="100%">
...
<tr>
<td class="cell_c">...</td>
<td class="cell_c">...</td>
<td class="cell_c">...</td>
<td class="cell">SOME_ID</td>
<td class="cell_c">...</td>
</tr>
...
</table>
I want to get only the one row that contains <td class="cell">SOME_ID</td>, where SOME_ID is an argument.
UPD.
Currently I am doing it this way:
doc = Jsoup.connect("http://www.bank.gov.ua/control/uk/curmetal/detail/currency?period=daily").get();
Elements rows = doc.select("table tr");
Pattern p = Pattern.compile("^.*(USD|EUR|RUB).*$");
for (Element trow : rows) {
Matcher m = p.matcher(trow.text());
if(m.find()){
System.out.println(m.group());
}
}
But why do I need Jsoup if most of the work is done by the regexp? Just to download the HTML?
If the HTML structure is always the same, and the element you want has no unique id or other identifying attribute, you can use the CSS selector syntax in Jsoup to specify where in the DOM tree the element is located.
Consider this HTML source:
<html>
<head></head>
<body>
<table cellpadding="2" cellspacing="2" width="100%">
<tbody>
<tr>
<td class="cell">I don't want this one...</td>
<td class="cell">Neither do I want this one...</td>
<td class="cell">Still not the right one..</td>
<td class="cell">BINGO!</td>
<td class="cell">Nothing further...</td>
</tr> ...
</tbody>
</table>
</body>
</html>
We want to select and parse the text from the fourth <td> element.
We specify that we want the <td> element whose sibling index is 3 (indexes are zero-based) by using td:eq(3). In the same way, we can select all <td> elements before index 3 by using td:lt(3). As you've probably figured out, eq stands for "equal" and lt for "less than".
Without first() you get an Elements object, but we only want the first match, so we ask for that. We could use get(0) instead.
So, the following code
Element e = doc.select("td:eq(3)").first();
System.out.println("Did I find it? " + e.text());
will output
Did I find it? BINGO!
Some good reading in the Jsoup cookbook!
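For picking the row by the content of its class="cell" td rather than by position, here is a minimal sketch. Note that it uses the JDK's built-in javax.xml.xpath instead of Jsoup, so it assumes the markup is well-formed XML; the class name RowByCellId, the method findRow and the sample values in the table are all made up for illustration. The id is passed as an XPath variable so it needs no quoting.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class RowByCellId {
    // Made-up sample rows following the structure from the question.
    static final String HTML =
        "<table cellpadding=\"2\" cellspacing=\"2\" width=\"100%\">"
      + "<tr><td class=\"cell_c\">1</td><td class=\"cell\">USD</td>"
      + "<td class=\"cell_c\">25.1</td></tr>"
      + "<tr><td class=\"cell_c\">2</td><td class=\"cell\">EUR</td>"
      + "<td class=\"cell_c\">27.4</td></tr>"
      + "</table>";

    // Returns the text of the first row whose td.cell equals id, or null if none.
    static String findRow(String id) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind $id to the argument instead of concatenating it into the XPath.
            xpath.setXPathVariableResolver(
                name -> "id".equals(name.getLocalPart()) ? id : null);
            Node row = (Node) xpath.evaluate(
                "//tr[td[@class='cell'][normalize-space(.)=$id]]",
                new InputSource(new StringReader(HTML)),
                XPathConstants.NODE);
            return row == null ? null : row.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findRow("EUR"));
    }
}
```

For real-world HTML that is not well-formed you would keep Jsoup as the parser and express the same "row that has a matching cell" condition in its selector syntax.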
I have 2 occurrences of the same td in 2 different tables.
I am able to get the value 'Yes' for the 1st one using this:
//h:td[1][*[contains(.,'Loudspeaker')]]/../h:td[last()]/text()
but I am not getting the value 'Voice 75dB / Noise 66dB / Ring 75dB' for the 2nd one.
I tried:
//h:td[2][*[contains(.,'Loudspeaker')]]/../h:td[last()]/text()
I am very new to HTML and XPath, so please bear with me.
A portion of my HTML:
</table><table cellspacing="0">
<tr>
<th rowspan="3" scope="row">Sound</th>
<td class="ttl">Alert types</td>
<td class="nfo">Vibration; MP3, WAV ringtones</td>
</tr>
<tr>
<td class="ttl">Loudspeaker </td>
<td class="nfo">Yes</td>
</tr>
.
.
<table cellspacing="0">
<tr>
<th rowspan="5" scope="row">Tests</th>
<td class="ttl">Display</td>
<td class="nfo">
<a class="noUnd" href="http://xyz.php">Contrast ratio: Infinite (nominal) / 3.419:1 (sunlight)</a></td>
</tr><tr>
<td class="ttl">Loudspeaker</td>
<td class="nfo">
<a class="noUnd" href="http://xyz.php">Voice 75dB / Noise 66dB / Ring 75dB</a></td>
</tr><tr>
..
Thanks in Advance.
The only difference between these two snippets is that in the second one your text is nested within an a element. So it has to be
//h:td[2][*[contains(.,'Loudspeaker')]]/../h:td[last()]/h:a/text()
(I guess you have a namespace definition for h, as you use it in your XPath.)
What you are doing is:
//h:td[2] finds every td that is the second td of its parent, anywhere in the document (this is the main issue, because there is no second td with the text "Loudspeaker").
[*[contains(.,'Loudspeaker')]] checks whether that td has a child element containing the text Loudspeaker.
/../h:td[last()]/text() gets the text of the last td of the parent.
But what you seem to want is something like:
(//h:tr[h:td/*[contains(.,'Loudspeaker')]]) finds all tr elements that contain the text "Loudspeaker".
[2] selects the second of these tr elements.
/h:td[last()]/. selects the last td of that second tr, including the text of any of its children.
Therefore try (not tested!):
(//h:tr[h:td/*[contains(.,'Loudspeaker')]])[2]/h:td[last()]/.
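The "collect the matching rows first, then index" idea can be tried out with the JDK's javax.xml.xpath. This is a sketch under a few assumptions: the HTML is a shortened, well-formed copy of the snippet, the h: namespace prefix is dropped because the sample has no namespace, and the label is matched on the first td directly (in the snippet as posted, the Loudspeaker td has no child element). The class name SecondLoudspeaker is made up. It reads the string value of the last td, which works whether the text sits directly in the td or inside an a element.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class SecondLoudspeaker {
    // Shortened copy of the two tables, wrapped in a div to get a single root.
    static final String HTML = "<div>"
      + "<table cellspacing=\"0\">"
      + "<tr><th>Sound</th><td class=\"ttl\">Alert types</td>"
      + "<td class=\"nfo\">Vibration; MP3, WAV ringtones</td></tr>"
      + "<tr><td class=\"ttl\">Loudspeaker </td><td class=\"nfo\">Yes</td></tr>"
      + "</table>"
      + "<table cellspacing=\"0\">"
      + "<tr><th>Tests</th><td class=\"ttl\">Display</td>"
      + "<td class=\"nfo\"><a class=\"noUnd\" href=\"http://xyz.php\">"
      + "Contrast ratio: Infinite (nominal) / 3.419:1 (sunlight)</a></td></tr>"
      + "<tr><td class=\"ttl\">Loudspeaker</td><td class=\"nfo\">\n"
      + "<a class=\"noUnd\" href=\"http://xyz.php\">"
      + "Voice 75dB / Noise 66dB / Ring 75dB</a></td></tr>"
      + "</table></div>";

    // Value of the n-th (1-based) row in the document whose first td mentions Loudspeaker.
    static String loudspeaker(int n) {
        // The parentheses make [n] index the whole result list, not each parent's children.
        String expr = "normalize-space("
            + "(//tr[td[1][contains(.,'Loudspeaker')]])[" + n + "]/td[last()])";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, new InputSource(new StringReader(HTML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loudspeaker(1));
        System.out.println(loudspeaker(2));
    }
}
```

With a namespaced XHTML document you would additionally register a NamespaceContext for the h prefix and put h: back in front of tr and td.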
public string FindElementUsingOneTrTwoTd(string tblName, string className, string searchString)
{
    return "//*[@id=\"" + tblName + "\"]/tbody/tr/td[contains(normalize-space(@class), \"" + className + "\") and contains(string(),\"" + searchString + "\")]/../td[2]";
}