Finding an XPATH expression - html

For the following html:
<tr>
<td class="first">AUD</td>
<td> 0.00 </td>
<td> 1,305.01 </td>
<td> 1,305.01 </td>
<td> -65.20 </td>
<td> 0.00 </td>
<td> 0.00 </td>
<td> 1,239.81 </td>
<td class="fx-rate"> 0.98542 </td>
</tr>
I am trying to grab the value for the fx-rate, given the type of currency. For example, the function would be something like get_fx_rate(currency). This is the XPath expression I have so far, but it returns an empty list, []. What am I doing wrong here, and what would be the correct expression?
"//td[@class='first']/text()[normalize-space()='AUD']/parent::td[@class='fx-rate']"

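The parent of that text node is the td with class 'first', so parent::td[@class='fx-rate'] can never match. Go up to the row and back down to the fx-rate cell instead:
//td[@class = 'first' and normalize-space() = 'AUD']/parent::tr/td[@class = 'fx-rate']
or clearer:
//tr[td[@class="first" and normalize-space()="AUD"]]/td[@class="fx-rate"]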
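For reference, here is a minimal Python sketch of the same row-anchored lookup using only the standard library. It assumes the table is well-formed enough to parse as XML. Since xml.etree.ElementTree supports only a subset of XPath (no normalize-space() and no "and"), the whitespace trimming is done in Python rather than in the expression. The EUR row and its rate are made-up sample data, and the root parameter of get_fx_rate is an addition for this sketch.

```python
import xml.etree.ElementTree as ET

# Sample markup: the AUD row is abridged from the question,
# the EUR row is hypothetical data added for illustration.
TABLE_HTML = """
<table>
<tr>
<td class="first">AUD</td>
<td> 0.00 </td>
<td> 1,239.81 </td>
<td class="fx-rate"> 0.98542 </td>
</tr>
<tr>
<td class="first">EUR</td>
<td> 0.00 </td>
<td> 10.00 </td>
<td class="fx-rate"> 1.31000 </td>
</tr>
</table>
"""


def get_fx_rate(root, currency):
    """Return the trimmed fx-rate text for the row whose first cell is `currency`."""
    for row in root.iter("tr"):
        # Anchor on the row, then look at its own cells only.
        first = row.find("td[@class='first']")
        if first is None or (first.text or "").strip() != currency:
            continue
        rate = row.find("td[@class='fx-rate']")
        if rate is not None:
            return (rate.text or "").strip()
    return None  # no row for that currency


root = ET.fromstring(TABLE_HTML)
print(get_fx_rate(root, "AUD"))
```

With Selenium or lxml, which implement full XPath 1.0, the second expression above can be used directly with the currency substituted in.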

This is how I managed to solve it, using partial XPaths:
# get all the elements via xpath
currencies = driver.find_elements_by_xpath("//td[@class='first']")
fx_rates = driver.find_elements_by_xpath("//td[@class='fx-rate']")
# build the lists and zip them to get the (currency, rate) pairs
fx_values = [fx.text for fx in fx_rates if fx.text]
currency_text = [currency.text for currency in currencies if currency.text]
# zip() returns an iterator on Python 3, so materialize it before slicing
list(zip(currency_text, fx_values))[1:]

Related

How to read an HTML table and account for line breaks within cells

I have an HTML table output from a program that separates values within a cell with <br>. I've tried using XML::readHTMLTable and htmltab, but they glom the values together without any separators. I need them to be comma-separated, but I don't see any arguments to those functions to account for this. I've posted a pseudo example of the file below. Currently it reads into two vectors, c("ABC","DEF","GHI") and c("JKLMNO","PQR","STU"), but I need the "JKLMNO" element to be "JKL,MNO" instead.
<table>
<tr>
<td>
ABC<br/>
</td>
<td>
DEF<br/>
</td>
<td>
GHI<br/>
</td>
</tr>
<tr>
<td>
JKL<br/>
MNO<br/>
</td>
<td>
PQR<br/>
</td>
<td>
STU<br/>
</td>
</tr>
</table>
I had this problem with <br/> in X being deleted by:
xTabs <- XML::readHTMLTable(X)
I fixed the problem as follows:
X1 <- gsub('<br/>', '\n', X)
xTabs <- XML::readHTMLTable(X1)
If I wanted ',', I could then do a find and replace in xTabs. However, I'm happier with '\n'.
library(rvest)
library(dplyr)
doc <- read_html("<table>
<tr>
<td>
ABC<br/>
</td>
<td>
DEF<br/>
</td>
<td>
GHI<br/>
</td>
</tr>
<tr>
<td>
JKL<br/>
MNO<br/>
</td>
<td>
PQR<br/>
</td>
<td>
STU<br/>
</td>
</tr>
</table>")
tab <- html_table(doc)[[1]]
mutate(tab, X1=gsub("[\r\n][[:space:]]+", ",", X1))
## X1 X2 X3
## 1 ABC DEF GHI
## 2 JKL,MNO PQR STU
UPDATE
For folks who have HTML in a different format and may not feel up to the strain of posting, if you had, say:
doc <- read_html("<table>
<tr>
<td>ABC<br/></td>
<td>DEF<br/></td>
<td>GHI<br/></td>
</tr>
<tr>
<td>JKL<br/>MNO<br/></td>
<td>PQR<br/></td>
<td>STU<br/></td>
</tr>
</table>")
the aforementioned solution won't work because it's not the same data the OP had. I know…it's shocking.
If that is the case, copying and pasting a solution is definitely easier than typing a new question and you can use the following:
library(rvest)
library(dplyr)
library(purrr)
map(1:3, function(col) {
html_nodes(doc, xpath=sprintf(".//tr/td[%d]", col)) %>%
map_chr(~paste0(html_nodes(., xpath=".//text()"), collapse=","))
}) %>%
set_names(sprintf("X%d", 1:3)) %>%
as_data_frame()
But, amazingly enough, if you had different tags and data in the TD tags or had to work with a more complex table structure, this solution would likely require adaptation as well. The mind boggles.
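The underlying idea in these answers (treat each <br/> as a separator between text fragments of a cell) is not specific to R. Below is a rough Python sketch of it using the standard library's html.parser. The class name BrTableParser and the sample markup are assumptions made for illustration, and it only handles plain tr/td tables like the ones shown here.

```python
from html.parser import HTMLParser


class BrTableParser(HTMLParser):
    """Collect table cells, joining the text fragments of a cell with commas."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._cell = []
        # <br/> needs no handling: it simply splits the text into fragments

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append(",".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        text = data.strip()
        if self._cell is not None and text:
            self._cell.append(text)


HTML = """<table>
<tr><td>
ABC<br/>
</td><td>DEF<br/></td><td>GHI<br/></td></tr>
<tr><td>
JKL<br/>
MNO<br/>
</td><td>PQR<br/></td><td>STU<br/></td></tr>
</table>"""

parser = BrTableParser()
parser.feed(HTML)
parser.close()
rows = parser.rows
print(rows)
```

Because the fragments are trimmed before joining, the sketch gives the same result whether or not the cells contain line breaks around the <br/> tags.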

HtmlAgilityPack reading table after specified table

I have a structure similar to this:
<table class="superclass">
<tr>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
</td>
<td>
</td>
</tr>
</table>
<table cellspacing="0">
<tr>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
</td>
<td>
</td>
</tr>
</table>
This is how I get the first table by its class:
HtmlNode firstTable = document.DocumentNode.SelectSingleNode("//table[@class=\"superclass\"]");
Then I read the data. However, I don't know how to get straight to the other table and read its data too. Any ideas?
I'd rather avoid counting which table it is and then using an index to reach it.
The XPath following-sibling axis lets you select elements that follow the current context element at the same level:
HtmlNode firstTable = document.DocumentNode.SelectSingleNode("//table[@class=\"superclass\"]");
HtmlNode nextTable = firstTable.SelectSingleNode("following-sibling::table");
If you want to access multiple nodes, consider the SelectNodes(xpath) method instead of SelectSingleNode(xpath).
Here is some sample code for reference; it may not fit your needs exactly.
var tables = htmlDocument.DocumentNode.SelectNodes("//table");
foreach (HtmlNode table in tables)
{
if (table.GetAttributeValue("class", "").Contains("superclass"))
{
//this is the table of class="superclass"
}
else
{
//this is the other table.
}
}
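To make it concrete what following-sibling::table selects, here is a small Python sketch that does the same walk by hand. It is not HtmlAgilityPack code: it uses xml.etree.ElementTree, which has no following-sibling axis, so the siblings are scanned explicitly. The helper name following_sibling_table, the body wrapper and the cell contents are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Two sibling tables as in the question; cell contents are made up.
DOC = """<body>
<table class="superclass"><tr><td>a</td><td>b</td></tr></table>
<table cellspacing="0"><tr><td>c</td><td>d</td></tr></table>
</body>"""


def following_sibling_table(root, class_name):
    """Return the first <table> that follows the table with the given class
    under the same parent, or None if there is no such table."""
    for parent in root.iter():
        children = list(parent)
        for index, child in enumerate(children):
            if child.tag == "table" and child.get("class") == class_name:
                # Only later children of the same parent count as
                # "following siblings".
                for sibling in children[index + 1:]:
                    if sibling.tag == "table":
                        return sibling
                return None
    return None


root = ET.fromstring(DOC)
next_table = following_sibling_table(root, "superclass")
print(next_table.get("cellspacing"))
```

No table index is needed: the second table is found relative to the first one, which is the point of the following-sibling approach.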

Cycle through input fields

I have a table which I'm populating with some data I get from a database query
Table example
<tr ng-repeat="row in reservasTable">
<td> {{row.L}} </td>
<td> {{row.number}} </td>
<td> {{row.line}} </td>
<td> {{row.cod_art}} </td>
<td> {{row.creation_date}} </td>
<td> {{row.deadline}} </td>
<td> {{row.qtt_ordered}} </td>
<td> {{row.qtt_delivered}} </td>
<td> {{row.ocor}} </td>
<td> <input type="text" id="qttField" ng-model-onblur ng-model="qtt" ui-keypress="{13:'setQtt($event)'}"> </td>
</tr>
The only field I don't get from the JSON query is the last one, an input field which is used to enter a certain amount (Quantity).
What I need to do is this: after I fill in whichever fields I want (e.g., the 1st, 4th and last), I need to cycle through those fields, check which ones are filled, and get their values.
I can't seem to get the value just from the model, since the model is the same for every field. That's why I'm currently using the Enter key to update the value and send it to an array:
Simple Version:
$scope.arrayQtt = [];
var i = 0;
$scope.setQtt = function(evt){
$scope.arrayQtt[i] = evt.srcElement.value;
i++;
};
It would be preferable to check all input fields with a 'Confirmation' button after all fields are filled, since a user can edit a field as many times as they want before clicking the 'Confirmation' button.
Any help, advice or guidance is appreciated.
Thanks in advance!

JSOUP select all text following a closing tag until a specified tag

I have this html amongst a lot of table rows in a table:
.........
<tr class="greycellodd" align="right">
<td align="left">
<input type="checkbox" name="cashInvestment" value="100468057"/>
</td>
<td align="left">Cardcash
</td>
<td class="nobr">26 Aug 10</td>
<td class="nobr"> 1.00
</td>
<td class="nobr"> 1.00
</td>
<td align="right">£</td>
<td class="nobr">0.00 </td>
<td class="nobr">0.00 </td>
<td class="nobr">
<span class="changeupsmall">1.00 </span>
</td>
</tr>
<tr class="greycellodd">
<td align="right"/>
<td class="nobr" colspan="8">VISA</td>
</tr>
<tr class="greycelleven" align="right">
<td align="left">
<input type="checkbox" name="cashInvestment" value="100480214"/>
</td>
<td align="left">Santander
</td>
<td class="nobr">24 Sep 11</td>
<td class="nobr"> 1.00
.......
I need to extract everything between each checkbox tag
<input type="checkbox" name="cashInvestment" ../>
Example
Element 1:
Cardcash
26 Aug 10
1.00
1.00
£
0.00
0.00
1.00
VISA
Element 2:
Santander
24 Sep 11
1.00
.......
I have tried:
Elements Inve = mainFirst.select("input ~ *" );
and
Elements Inve = doc.select("input"); // gives me nothing as there is no text to the input tag (it has no child).
I also need to get the value of the checkbox, which I know how to do, but would be nice to do at the same time if possible:
Elements mainTables = doc.select("table.maintable");
for (Element subTable : mainTables) {
    Elements borrowInve = subTable.select("input[type=checkbox][name=cashInvestment]");
    for (Element checkbox : borrowInve) {
        String attr = checkbox.attr("value"); // value attribute of this checkbox
    }
}
Thanks
Edit: resolved by checking the size:
Elements td = tableRows.get(i).select("td");
Elements cash = tableRows.get(i).getElementsByAttributeValue("name", attrValue); // check if checkbox is present
int theSize = cash.size();
if (theSize == 1) { // this row is not a comment
    String checkbox = "";
    Element cbox = td.select("input[type=checkbox]").first();
    checkbox = cbox.attr("value");
} else if (theSize == 0) { // this row contains a comment
.............
I've never done anything in JSOUP, but having a quick look at the docs, maybe something along the lines of:
Elements Inve = doc.select(".maintable tr td:not(:has(input))");
Although it'd probably be easier if you could add a class to the elements you want to target.
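The row-by-row idea from the edit above (a row with a checkbox starts a new element, a row without one is a comment that belongs to the previous element) can be sketched in Python as follows. This is an illustration only, not jsoup code: it uses xml.etree.ElementTree, assumes the table parses as XML, and the function name group_by_checkbox and the abridged sample rows are assumptions.

```python
import xml.etree.ElementTree as ET

# Abridged from the rows in the question.
TABLE = """<table class="maintable">
<tr class="greycellodd" align="right">
<td align="left"><input type="checkbox" name="cashInvestment" value="100468057"/></td>
<td align="left">Cardcash
</td>
<td class="nobr">26 Aug 10</td>
<td class="nobr"> 1.00
</td>
<td class="nobr"><span class="changeupsmall">1.00 </span></td>
</tr>
<tr class="greycellodd">
<td align="right"/>
<td class="nobr" colspan="8">VISA</td>
</tr>
<tr class="greycelleven" align="right">
<td align="left"><input type="checkbox" name="cashInvestment" value="100480214"/></td>
<td align="left">Santander
</td>
<td class="nobr">24 Sep 11</td>
</tr>
</table>"""


def group_by_checkbox(table):
    """Group cell texts under the checkbox row they follow."""
    groups = []
    for row in table.iter("tr"):
        box = row.find(".//input[@name='cashInvestment']")
        # All non-empty text fragments in this row, trimmed.
        texts = [t.strip() for t in row.itertext() if t.strip()]
        if box is not None:
            # A checkbox row starts a new element and carries its value.
            groups.append({"value": box.get("value"), "texts": texts})
        elif groups:
            # A comment row belongs to the most recent checkbox row.
            groups[-1]["texts"].extend(texts)
    return groups


groups = group_by_checkbox(ET.fromstring(TABLE))
for group in groups:
    print(group["value"], group["texts"])
```

This collects the checkbox value and the following text in one pass, which is what the question asks for.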

To show arraylist data in specific format

I am getting a request attribute on a JSP page like ArrayList arr = [a,b,c[e,f,g[j,k,l]]]. The list can be long. How should I display it so that a,b,c is the parent of e,f,g, which in turn is the parent of j,k,l?
I want something like this, or better:
<tr onclick=showchild()>
<td> a </td> <td> b </td> <td> c </td>
</tr>
When I click on the tr above, its child, i.e. the tr below, should be shown.
<tr onclick=showchild()>
<td> e </td> <td> f </td> <td> g </td>
</tr>