How to get line from table with Jsoup - html

I have a table without any class or id (there are more tables on the page) with this structure:
<table cellpadding="2" cellspacing="2" width="100%">
...
<tr>
<td class="cell_c">...</td>
<td class="cell_c">...</td>
<td class="cell_c">...</td>
<td class="cell">SOME_ID</td>
<td class="cell_c">...</td>
</tr>
...
</table>
I want to get only the row that contains <td class="cell">SOME_ID</td>, where SOME_ID is passed in as an argument.
UPD.
Currently I am doing it this way:
doc = Jsoup.connect("http://www.bank.gov.ua/control/uk/curmetal/detail/currency?period=daily").get();
Elements rows = doc.select("table tr");
Pattern p = Pattern.compile("^.*(USD|EUR|RUB).*$");
for (Element trow : rows) {
Matcher m = p.matcher(trow.text());
if(m.find()){
System.out.println(m.group());
}
}
But why do I need Jsoup if most of the work is done by the regexp? Just to download the HTML?

If you have a generic HTML structure that always is the same, and you want a specific element which has no unique ID or identifier attribute that you can use, you can use the css selector syntax in Jsoup to specify where in the DOM-tree the element you are after is located.
Consider this HTML source:
<html>
<head></head>
<body>
<table cellpadding="2" cellspacing="2" width="100%">
<tbody>
<tr>
<td class="cell">I don't want this one...</td>
<td class="cell">Neither do I want this one...</td>
<td class="cell">Still not the right one..</td>
<td class="cell">BINGO!</td>
<td class="cell">Nothing further...</td>
</tr> ...
</tbody>
</table>
</body>
</html>
We want to select and parse the text from the fourth <td> element.
We select the <td> element whose index among its siblings is 3 (indexes are zero-based) by using td:eq(3). In the same way, we can select all <td> elements with a sibling index below 3 by using td:lt(3). As you've probably figured out, eq and lt stand for "equal" and "less than".
Without using first() you will get an Elements object, but we only want the first one so we specify that. We could use get(0) instead too.
So, the following code
Element e = doc.select("td:eq(3)").first();
System.out.println("Did I find it? " + e.text());
will output
Did I find it? BINGO!
Some good reading in the Jsoup cookbook!
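If the position of the cell is not enough and the row has to be picked by the value in its td.cell (the SOME_ID argument from the question), the filtering can be done on the parsed tree instead of with a regexp over the row text. Here is a minimal sketch of that idea in Python using only the standard library, not Jsoup. The sample table, its values and the find_row helper are made up for illustration, and the markup is assumed to be well-formed enough for an XML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the table in the question.
HTML = """
<table cellpadding="2" cellspacing="2" width="100%">
  <tr>
    <td class="cell_c">a1</td>
    <td class="cell_c">a2</td>
    <td class="cell_c">a3</td>
    <td class="cell">USD</td>
    <td class="cell_c">a5</td>
  </tr>
  <tr>
    <td class="cell_c">b1</td>
    <td class="cell_c">b2</td>
    <td class="cell_c">b3</td>
    <td class="cell">EUR</td>
    <td class="cell_c">b5</td>
  </tr>
</table>
"""


def find_row(table, some_id):
    """Return the cell texts of the first row whose td.cell equals some_id, else None."""
    for row in table.iter("tr"):
        # Only direct <td class="cell"> children of this row are compared.
        for cell in row.findall("td[@class='cell']"):
            if (cell.text or "").strip() == some_id:
                return [(td.text or "").strip() for td in row.findall("td")]
    return None


table = ET.fromstring(HTML)
print(find_row(table, "EUR"))
```

The same loop should carry over to Jsoup by iterating over doc.select("table tr") and comparing the text of each row's td.cell with the argument.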

Related

Beautiful soup finding the first sibling of a known object with a known attribute

I have the following code to select a certain cell in a table element:
tag = soup.find_all('td', attrs={'class': 'I'})
As shown in the attached image 1, I would like to find its first sibling within the same row of class "even_row". Ideally, the selection would output only the contents of data-seconds, in this case "58". Not every "even_row" row has an element with class I, and some have more than one, so I need the data-seconds value only for the "even_row" rows that contain an element with class "I".
Any help would be appreciated as I've been banging my head on the wall looking through documentation to no avail.
The HTML looks like this:
<tr class='even_row'>
<td class='row_labels' data-seconds="58">
<div class='celldiv slots1'></div>
</td>
<td class='new'>...</td>
<td class='I'>...</td>
<td class='new'>...</td>
<td class='new'>...</td>
One way to get around that issue is to pass True as the attribute value, which matches any tag that has the attribute, whatever its value:
from bs4 import BeautifulSoup
html = """
<tr class='even_row'>
<td class='row_labels' data-seconds="58">
<div class='celldiv slots1'></div>
</td>
<td class='new'>...</td>
<td class='I'>...</td>
<td class='new'>...</td>
<td class='new'>...</td>
</tr>
<tr class='even_row'>
<td class='row_labels' >
<div class='celldiv slots1'></div>
</td>
<td class='new'>...</td>
<td class='I'>...</td>
<td class='new'>...</td>
<td class='new'>...</td>
</tr>
"""
soup = BeautifulSoup(html,'html.parser')
even_rows = soup.find_all('tr', attrs={'class': 'even_row'})
for row in even_rows:
    tag = row.find("td", {"data-seconds" : True})
    if tag is not None:
        print(tag.get('data-seconds'))
Output :
58
Another way to do it is to use a regular expression:
import re
tds = [tag.get('data-seconds') for tag in soup.findAll("td", {"data-seconds" : re.compile(r".*")})]
print(tds)
Output :
['58']
I cannot test this properly without the full HTML, but with bs4 4.7.1+ it sounds like you can use :has to meet your requirements: .even_row:has(.I) matches an element with class even_row that contains an element with class I, and adding [data-seconds] then picks up the data-seconds attribute values inside those rows:
print([i['data-seconds'] for i in soup.select('.even_row:has(.I) [data-seconds]')])
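For comparison, the "row must contain a class I cell" condition can also be spelled out as an explicit loop. This is only a sketch using Python's standard library rather than bs4; the second row and its data-seconds value of 59 are invented to show a row being skipped, the wrapping table element is added so the fragment parses as XML, and has_class and seconds_for_rows_with are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# First row mirrors the question; the second row (no class "I" cell) is made up.
HTML = """
<table>
  <tr class="even_row">
    <td class="row_labels" data-seconds="58"><div class="celldiv slots1"></div></td>
    <td class="new">...</td>
    <td class="I">...</td>
    <td class="new">...</td>
  </tr>
  <tr class="even_row">
    <td class="row_labels" data-seconds="59"><div class="celldiv slots1"></div></td>
    <td class="new">...</td>
    <td class="new">...</td>
  </tr>
</table>
"""


def has_class(elem, name):
    """True if name is one of the space-separated classes on elem."""
    return name in (elem.get("class") or "").split()


def seconds_for_rows_with(root, wanted_class):
    """Collect data-seconds from even_row rows that contain a td with wanted_class."""
    values = []
    for row in root.iter("tr"):
        cells = row.findall("td")
        if not has_class(row, "even_row"):
            continue
        if not any(has_class(td, wanted_class) for td in cells):
            continue  # this row has no matching cell, so its label is ignored
        for td in cells:
            if td.get("data-seconds") is not None:
                values.append(td.get("data-seconds"))
                break  # only the first labelled cell of the row is wanted
    return values


root = ET.fromstring(HTML)
print(seconds_for_rows_with(root, "I"))
```

A row with several class I cells still contributes its data-seconds value only once, which matches the requirement stated in the question.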

How to check if a text exist in IE webpage under div class/TR/TD Class?

I have been trying to find a solution to my problem for a few days already, but somehow I cannot find one that works.
Unfortunately I cannot give the URL of the webpage, as it requires a login and password, which I cannot share.
My VBA code already does everything else: it logs into the webpage, provides the proper information inside the page and clicks the validate button. The problem is that I then need to check whether the text below appears:
ENQUADRAMENTO EM VIGOR - if yes, the process continues one way, and if not, another.
Now below is the code from the webpage:
<tr>
<td>
<table cellpadding="4" border="0" width="100%">
<tbody><tr>
<td class="fieldTitleBold" style="width=30%">Enquadramento em IVA</td>
<td class="fieldValue" colspan="3">NORMAL TRIMESTRAL</td>
</tr>
<tr>
<td style="width=10%" class="fieldTitleBold">Situação</td>
<td class="fieldValue" colspan="3">ENQUADRAMENTO EM VIGOR</td>
</tr>
</tbody></table>
</td>
</tr>
I have tried many different ways and the latest I tried is with byclassname (this worked for me in a different website for similar purpose) but doesn't work here for some reason:
Set doc = ie.document
Set htmTable = doc.getElementsByClassName("ENQUADRAMENTO EM VIGOR")(0)
If Not htmTable Is Nothing Then
'continue depending if the text was found or not in different ways
ENQUADRAMENTO EM VIGOR is the .innerText value not the class name. The class value is fieldValue and is associated with a td (table cell) element.
This is pretty easy if it only occurs once. Use Instr to see if present in page html
If Instr(ie.document.body.innerHTML,"ENQUADRAMENTO EM VIGOR") > 0 Then
Otherwise, you can gather a nodeList of td elements with that class name and loop testing the .innerText
Dim classes As Object, i As Long
Set classes = ie.document.querySelectorAll("td.fieldValue")
For i = 0 To classes.Length - 1
    If classes.item(i).innerText = "ENQUADRAMENTO EM VIGOR" Then
        'do something
        'Exit For ....
    End If
Next i
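The exact-match check on td.fieldValue cells can also be tried outside the browser, for instance against a saved copy of the page. The following is only a sketch in Python with the standard library, not VBA; the FieldValueCollector class and the has_field_value helper are hypothetical names, and the markup is the fragment from the question.

```python
from html.parser import HTMLParser

# Fragment copied from the question.
PAGE = """
<table cellpadding="4" border="0" width="100%">
<tbody><tr>
<td class="fieldTitleBold" style="width=30%">Enquadramento em IVA</td>
<td class="fieldValue" colspan="3">NORMAL TRIMESTRAL</td>
</tr>
<tr>
<td style="width=10%" class="fieldTitleBold">Situação</td>
<td class="fieldValue" colspan="3">ENQUADRAMENTO EM VIGOR</td>
</tr>
</tbody></table>
"""


class FieldValueCollector(HTMLParser):
    """Collect the stripped text of every <td class="fieldValue"> cell."""

    def __init__(self):
        super().__init__()
        self.values = []
        self._chunks = None  # list of text pieces while inside a target cell

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and "fieldValue" in classes:
            self._chunks = []

    def handle_data(self, data):
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._chunks is not None:
            self.values.append("".join(self._chunks).strip())
            self._chunks = None


def has_field_value(html, wanted):
    """True if some td.fieldValue cell contains exactly the wanted text."""
    collector = FieldValueCollector()
    collector.feed(html)
    collector.close()
    return wanted in collector.values


print(has_field_value(PAGE, "ENQUADRAMENTO EM VIGOR"))
```

Stripping the cell text before comparing avoids false negatives caused by stray whitespace around the value, which a plain equality test on the raw text would not tolerate.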
$(document).ready(function() {
var lenfV = document.querySelectorAll(".fieldValue");
for(let i=0;i<lenfV.length;i++) {
if(lenfV[i].innerHTML == "ENQUADRAMENTO EM VIGOR") {
console.log("is there");
}
//else {console.log(213423);}
}
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<p> I think, The below option will help you</p>
<table>
<tr>
<td>
<table cellpadding="4" border="0" width="100%">
<tbody><tr>
<td class="fieldTitleBold" style="width=30%">Enquadramento em IVA</td>
<td class="fieldValue" colspan="3">NORMAL TRIMESTRAL</td>
</tr>
<tr>
<td style="width=10%" class="fieldTitleBold">Situação</td>
<td class="fieldValue" colspan="3">ENQUADRAMENTO EM VIGOR</td>
</tr>
</table>
</td>
</tr>
</table>

Unable to select multiple values using xpath

Here is my HTML code:
<table id="laptop_detail" class="table">
<tr>
<td style="padding-left:18px" class="ha">Touchscreen</td>
<td class="val"><span class="no_icon">No</span></td>
</tr>
<tr>
<td style="padding-left:18px" class="ha">Water Dispenser</td>
<td class="val"><span class="no_icon">No</span></td>
</tr>
<tr>
<td style="padding-left:18px" class="ha">Colour / Material</td>
<td class="val">Grey</td>
</tr>
</table>
Here is my xpath:
$x('//*[@id="laptop_detail"]//tr/td[contains(. ,"Touchscreen")]/following-sibling::td[1]/span/text() and //*[@id="laptop_detail"]//tr/td[contains(. ,"Water Dispenser")]/following-sibling::td[1]/span/text() and //*[@id="laptop_detail"]//tr/td[contains(. ,"Colour")]/following-sibling::td[1]/text()')
But my xpath returns "true" instead of the "No, No, Grey" that I need. I know there is something wrong with my xpath but I am unable to understand what.
EDIT: Okay, I had a little success. I was able to get "No, No" using this xpath:
$x('//*[@id="laptop_detail"]//tr/td[contains(. ,"Touchscreen") or contains(. ,"Water")]/following-sibling::td[1]/span/text()')
but I am unable to get "Grey", as that value is not inside a span tag.
In XPath, and is a boolean operator, which is why your expression evaluates to true instead of returning nodes. Here is a fix for your expression (I've used the | union operator instead):
//*[@id="laptop_detail"]/tr/td[contains(. ,"Touchscreen") or contains(. ,"Water")]/following-sibling::td[1]/span/text() | //*[@id="laptop_detail"]/tr/td[contains(. ,"Colour / Material")]/following-sibling::td[1]/text()
You can also use a somewhat simpler expression (which should run faster) if it fits your logic:
//table[@id="laptop_detail"]/tr/td[@class='val']/span/text() | //table[@id="laptop_detail"]/tr/td[@class='val']/text()
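The reason "Grey" needs its own branch in the union is that it sits directly in the td, while the other values are wrapped in a span. Taking the full string value of the value cell sidesteps that difference. A minimal sketch of this idea in Python with the standard library follows; spec_values is a hypothetical helper, and the table is the one from the question.

```python
import xml.etree.ElementTree as ET

# Table copied from the question.
HTML = """
<table id="laptop_detail" class="table">
<tr>
<td style="padding-left:18px" class="ha">Touchscreen</td>
<td class="val"><span class="no_icon">No</span></td>
</tr>
<tr>
<td style="padding-left:18px" class="ha">Water Dispenser</td>
<td class="val"><span class="no_icon">No</span></td>
</tr>
<tr>
<td style="padding-left:18px" class="ha">Colour / Material</td>
<td class="val">Grey</td>
</tr>
</table>
"""


def spec_values(table, labels):
    """Return the value-cell text for each requested label, in the order requested."""
    specs = {}
    for row in table.iter("tr"):
        cells = row.findall("td")
        if len(cells) < 2:
            continue
        # itertext() joins the cell's own text with that of nested elements,
        # so "<td><span>No</span></td>" and "<td>Grey</td>" are read the same way.
        label = "".join(cells[0].itertext()).strip()
        specs[label] = "".join(cells[1].itertext()).strip()
    return [specs[label] for label in labels if label in specs]


table = ET.fromstring(HTML)
print(", ".join(spec_values(table, ["Touchscreen", "Water Dispenser", "Colour / Material"])))
```

Keying the values by label also keeps the result in the order the labels were requested, regardless of where the rows appear in the table.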

How to find 2nd td in html using xpath

I have two occurrences of the same td in two different tables.
I am able to get the value 'Yes' for the 1st one using this:
//h:td[1][*[contains(.,'Loudspeaker')]]/../h:td[last()]/text()
but not getting the value 'Voice 75dB / Noise 66dB / Ring 75dB' for the 2nd one.
I tried:
//h:td[2][*[contains(.,'Loudspeaker')]]/../h:td[last()]/text()
I am very new to html and xpath so please bear with me.
portion of my html:
</table><table cellspacing="0">
<tr>
<th rowspan="3" scope="row">Sound</th>
<td class="ttl">Alert types</td>
<td class="nfo">Vibration; MP3, WAV ringtones</td>
</tr>
<tr>
<td class="ttl">Loudspeaker </td>
<td class="nfo">Yes</td>
</tr>
.
.
<table cellspacing="0">
<tr>
<th rowspan="5" scope="row">Tests</th>
<td class="ttl">Display</td>
<td class="nfo">
<a class="noUnd" href="http://xyz.php">Contrast ratio: Infinite (nominal) / 3.419:1 (sunlight)</a></td>
</tr><tr>
<td class="ttl">Loudspeaker</td>
<td class="nfo">
<a class="noUnd" href="http://xyz.php">Voice 75dB / Noise 66dB / Ring 75dB</a></td>
</tr><tr>
..
Thanks in Advance.
The only difference between these two snippets is that in the second one your text is nested within an a element. So it has to be
//h:td[2][*[contains(.,'Loudspeaker')]]/../h:td[last()]/h:a/text()
(I guess you have a namespace definition for h, as you use it in your XPath.)
What you are doing is:
//h:td[2] finds every td that is the second td child of its parent, anywhere in the document (this is the main issue, because no second td contains the text "Loudspeaker").
[*[contains(.,'Loudspeaker')]] checks whether that second td has a child element containing the text "Loudspeaker".
/../h:td[last()]/text() gets the text of the last td of the parent.
But what you seem to want is something like:
(//h:tr[h:td/*[contains(.,'Loudspeaker')]]) finds all tr elements that contain the text "Loudspeaker".
[2] selects the second of these tr elements.
/h:td[last()]/. selects the last td of that second tr, including the text of any of its children.
Therefore try (not tested!):
(//h:tr[h:td/*[contains(.,'Loudspeaker')]])[2]/h:td[last()]/.
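The "collect all matching rows first, then take the second one" logic can be checked with a short script. This is a rough sketch in Python with the standard library, not an XPath engine: the namespace prefix is left out, the two tables from the question are trimmed and wrapped in a body element so the fragment parses as XML, and nth_value_for_label is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the two tables from the question, wrapped in one root element.
HTML = """
<body>
<table cellspacing="0">
<tr>
<th rowspan="3" scope="row">Sound</th>
<td class="ttl">Alert types</td>
<td class="nfo">Vibration; MP3, WAV ringtones</td>
</tr>
<tr>
<td class="ttl">Loudspeaker </td>
<td class="nfo">Yes</td>
</tr>
</table>
<table cellspacing="0">
<tr>
<th rowspan="5" scope="row">Tests</th>
<td class="ttl">Display</td>
<td class="nfo">
<a class="noUnd" href="http://xyz.php">Contrast ratio: Infinite (nominal) / 3.419:1 (sunlight)</a></td>
</tr><tr>
<td class="ttl">Loudspeaker</td>
<td class="nfo">
<a class="noUnd" href="http://xyz.php">Voice 75dB / Noise 66dB / Ring 75dB</a></td>
</tr>
</table>
</body>
"""


def nth_value_for_label(root, label, n):
    """Text of the last td in the n-th row (1-based, document order) mentioning label."""
    matches = [
        row for row in root.iter("tr")
        if any(label in "".join(td.itertext()) for td in row.findall("td"))
    ]
    if n < 1 or n > len(matches):
        return None
    last_cell = matches[n - 1].findall("td")[-1]
    # itertext() also picks up text nested in the <a> element.
    return "".join(last_cell.itertext()).strip()


root = ET.fromstring(HTML)
print(nth_value_for_label(root, "Loudspeaker", 1))
print(nth_value_for_label(root, "Loudspeaker", 2))
```

Because the index is applied to the list of matching rows and not to a td position inside each row, the second table's row is reached even though "Loudspeaker" is the first td in both rows.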
public string FindElementUsingOneTrTwoTd(string tblName, string className, string searchString)
{
    return "//*[@id=\"" + tblName + "\"]/tbody/tr/td[contains(normalize-space(@class), \"" + className + "\") and contains(string(),\"" + searchString + "\")]/../td[2]";
}

How to embed links (anchor tag) into HTML context from UIBINDER in gwt

I have a HTML widget in my ui.xml which I am using in Uibinder to populate data as given below:
ui.xml ->
<g:HTML ui:field="operationsDetailTableTemplate" visible="false">
<table class="{style.LAYOUT_STYLE}" width="100%" border="1">
<tr>
<td><img src="images/indent-blue.gif"/></td>
<td>
<table class="{style.DEFAULT_STYLE}">
<thead>
<tr>
<th>OperationUuid</th>
....
</tr>
</thead>
<tbody>
<tr>
<td>%s</td>
...
</tr>
</tbody>
</table>
</td>
</tr>
....
</g:HTML>
Uibinder.java--->
String htmlText = operationsDetailTableTemplate.getHTML()
.replaceFirst("%s", toSafeString(operation.getOperationUuid()))
....
HTML html = new HTML(htmlText);
operationsDetail.add(html);
The above is done in a for loop for each operation retrieved from the database.
My question is how I can embed a hyperlink or an anchor tag in one of the cells (e.g. the operation id) for each operation retrieved. I also wish to have a listener attached to it.
P.S. - It does not allow me to have an anchor tag in the HTML in ui.xml.
You'd better use the tools in the way they've been designed to be used: use ui:field="foo" on the <td> and @UiField Element foo + foo.setInnerHTML(toSafeString(...)) instead of extracting the HTML, modifying it and reinjecting it elsewhere. You could also use a <g:Anchor> and attach a @UiHandler to handle ClickEvents.
Your way of using UiBinder makes me think of SafeHtmlTemplates, or the new UiRenderer aka UiBinder for Cells: https://developers.google.com/web-toolkit/doc/latest/DevGuideUiBinder#Rendering_HTML_for_Cells