Is it possible to apply a class attribute to individual table cells using knitr? I have successfully applied a class attribute to the section heading that contains a knitr::kable generated table and used that to format the entire table. However, I would like to be able to conditionally format individual cells which would require being able to apply a class to specific <td> elements.
My current workaround is to programmatically wrap the cell contents in a pair of <span> tags and pass that on to knitr::kable. This approach only allows me to format the text inside the cell versus the entire cell (e.g. setting the cell background color). Here's an example of what I'm currently using:
## Read in the report, process the data, send to kable
library(dplyr)  # provides mutate() and %>%

rpt <- generate.report()
mutate(rpt, Col2 = ifelse(abs(Col2) > Threshold,
                          paste('<span class="warning">',
                                sprintf("%.2f", Col2), '</span>'),
                          sprintf("%.2f", Col2))) %>%
  knitr::kable(format = "markdown", align = c("l", rep("r", 4)),
               col.names = gsub("\\.", "<br>", colnames(.)))
Which results in the following example HTML output:
<td align="right"><span class="warning"> -1.74 </span></td>
I would like to be able to have knitr::kable generate something like this:
<td align="right" class="warning"> -1.74 </td>
That way I could apply CSS styles to the <td> tag instead of the <span> tag.
The ReporteRs package may help; have a look at its FlexTable objects.
You can then get the corresponding HTML code with the function as.html and reuse it within your knitr code.
OK, this may not be the answer, but it may point you in the right direction. I had a similar problem formatting individual cells in knitr when preparing a PDF. In the end, I used xtable and wrote a function that relies on a logical matrix to decide whether or not each cell in the output table is formatted.
I couldn't quite get it to work smoothly by myself so I had to post it on here and with the help of ivyleavedtoadflax I was able to develop a reasonably easy to use function to apply formatting to certain cells in an xtable in knitr.
Here's the link to my post
As I say, it's not the exact solution to your problem but it may point you in the right direction.
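The logical-matrix idea is not tied to R. Here is a minimal Python sketch of it, not knitr or xtable code: the function name render_rows, the "warning" class and the sample values are all assumptions made up for illustration. It writes a class attribute onto a <td> only where the matching mask entry is true.

```python
from html import escape


def render_rows(values, mask, css_class="warning"):
    """Return HTML <tr> rows; cells whose mask entry is True get css_class."""
    rows = []
    for value_row, mask_row in zip(values, mask):
        cells = []
        for value, flagged in zip(value_row, mask_row):
            attr = f' class="{escape(css_class)}"' if flagged else ""
            cells.append(f'<td align="right"{attr}>{value:.2f}</td>')
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return rows


# Hypothetical data: flag every value whose magnitude exceeds a threshold.
threshold = 1.5
values = [[0.25, -1.74], [1.10, 2.00]]
mask = [[abs(v) > threshold for v in row] for row in values]

for row in render_rows(values, mask):
    print(row)
```

Keeping the mask separate from the values means the formatting rule can change without touching the rendering code.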
I'm trying to scrape a web page. Some elements were easy to get, but I have a problem with those that have no id, like this one:
<TABLE class=DisplayMain1 cellSpacing=1 cellPadding=0><TBODY>
<TR class=TitleLabelBig1>
<TD class=Title1 colSpan=100><SPAN style="FONT-FAMILY: arial narrow; FONT-WEIGHT: normal">Tool & </SPAN><BR>PE311934-1-1 </TD></TR></TBODY></TABLE>
I want this: PE311934-1-1
I tried "document.getElementsByClassName", but VBA gave me an error.
Any tips?
Use Regular Expressions and the XMLHttpRequest object in VBA
I made an add-in some time ago that does just that:
http://www.analystcave.com/excel-tools/excel-scrape-html-add/
If you just want the source code then here (GetElementByRegex function):
http://www.analystcave.com/excel-scrape-html-element-id/
Now the actual regex will be quite simple:
</SPAN><BR>(.*?)</TD></TR></TBODY></TABLE>
If it captures too many items, simply expand the regex.
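To see what that pattern captures, here is a small Python sketch that applies the same regex to the snippet from the question. It is not the add-in's VBA code; the variable names are mine, and the IGNORECASE and DOTALL flags are my own assumption to tolerate differences in tag case and line breaks.

```python
import re

# The HTML snippet from the question, kept as a single string.
html = (
    '<TABLE class=DisplayMain1 cellSpacing=1 cellPadding=0><TBODY>\n'
    '<TR class=TitleLabelBig1>\n'
    '<TD class=Title1 colSpan=100><SPAN style="FONT-FAMILY: arial narrow; '
    'FONT-WEIGHT: normal">Tool & </SPAN><BR>PE311934-1-1 </TD></TR></TBODY></TABLE>'
)

# Same pattern as above: lazily capture everything between <BR> and </TD>.
pattern = re.compile(
    r"</SPAN><BR>(.*?)</TD></TR></TBODY></TABLE>",
    re.IGNORECASE | re.DOTALL,
)

match = pattern.search(html)
code = match.group(1).strip() if match else None
print(code)
```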
You don't specify the error and there is not enough HTML to know how many elements there are on the page.
You may have forgotten to use an index with document.getElementsByClassName("Title1"), as it returns a collection.
For example, the first item would be: document.getElementsByClassName("Title1")(0)
In the same way, you could use a CSS querySelector such as .Title1
Which says the same thing i.e. select the elements with ClassName "Title1".
For the first instance simply use:
document.querySelector(".Title1")
For a NodeList of all matching elements, use:
document.querySelectorAll(".Title1")
and then iterate over its length.
You would access the .innerText property of the element, generally, to retrieve the required string.
For the snippet shown, assuming the item is the first .Title1 on the page, the CSS selector retrieves the whole text of that cell from your HTML, that is, the span text "Tool & " followed by PE311934-1-1.
The resultant string can then be processed for what you want. This method, and regex, are fragile at best considering how easily an updated source page can break these methods.
In your above example, you can use the class name, .Title1, and then use Replace() to remove the Tool & .
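The same two steps (take the text of the first element with class Title1, then strip the unwanted prefix) can be sketched in Python with only the standard library. This is an illustration, not VBA: the class FirstByClass is a hypothetical helper, and it assumes the matched element does not contain another element with the same tag name.

```python
from html.parser import HTMLParser


class FirstByClass(HTMLParser):
    """Collect the text inside the first element carrying a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.target_tag = None  # tag name of the matched element while inside it
        self.done = False       # True once the first match has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if not self.done and self.target_tag is None and self.class_name in classes:
            self.target_tag = tag

    def handle_endtag(self, tag):
        if self.target_tag is not None and tag == self.target_tag:
            self.target_tag = None
            self.done = True

    def handle_data(self, data):
        if self.target_tag is not None:
            self.parts.append(data)


html = (
    '<TABLE class=DisplayMain1 cellSpacing=1 cellPadding=0><TBODY>'
    '<TR class=TitleLabelBig1>'
    '<TD class=Title1 colSpan=100><SPAN style="FONT-WEIGHT: normal">Tool & </SPAN>'
    '<BR>PE311934-1-1 </TD></TR></TBODY></TABLE>'
)

parser = FirstByClass("Title1")
parser.feed(html)
parser.close()

raw_text = "".join(parser.parts)                 # text of the whole cell
code = raw_text.replace("Tool & ", "").strip()   # drop the span text
print(code)
```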
I am looking to avoid position-based XPaths. The reason is that such an XPath can change and fail an automation test if a new object is added to the page and shifts the expected position.
But on some web pages, this is the only XPath I can find. For example, I am looking to click a tab called 'FooBar'.
If I use the Selenium IDE FireFox plugin, I get:
//td[12]/a/font
If I use the FirePath Firefox plugin, I get:
html/body/form/table[2]/tbody/tr/td[12]/font
If a new tab called "Hello, World" is added to the web page (before FooBar tab) then FooBar tab will change and have an xpath position of
//td[13]/a/font
What would you suggest to do?
TY!
Instead of using an absolute XPath, you could use a relative XPath, which is shorter and more reliable.
Say you have:
<td id="FooBar" name="FooBar">FooBar</td>
By.id("FooBar");
By.name("FooBar");
By.xpath("//td[text()='FooBar']") //exact match
By.xpath("//td[@id='FooBar']") //with any attribute value
By.xpath("//td[contains(text(),'oBar')]") //partial match with contains function
By.xpath("//td[starts-with(text(),'FooB')]") //partial match with startswith function
This blog post may be useful for you.
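To show why the attribute and text locators survive a new tab while the positional one does not, here is a minimal Python sketch. It uses the standard library's xml.etree.ElementTree rather than Selenium, and the page markup is invented for the example. ElementTree only implements a subset of XPath, so contains() and starts-with() are not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: one row of tabs, FooBar is the second cell.
page = ET.fromstring(
    "<html><body><table><tr>"
    "<td id='Other'>Other</td>"
    "<td id='FooBar' name='FooBar'>FooBar</td>"
    "</tr></table></body></html>"
)

by_id = page.find(".//td[@id='FooBar']")    # anchored to an attribute
by_text = page.find(".//td[.='FooBar']")    # anchored to the exact text
by_position = page.find(".//td[2]")         # anchored to a position
same_before = by_position is by_id          # all three agree for now

# A new tab is added in front of the existing ones.
new_tab = ET.Element("td")
new_tab.text = "Hello, World"
page.find(".//tr").insert(0, new_tab)

still_by_id = page.find(".//td[@id='FooBar']") is by_id     # unaffected
still_by_position = page.find(".//td[2]") is by_id          # now a different cell
print(same_before, still_by_id, still_by_position)
```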
A relative XPath is a good idea. A relative CSS selector is even better (faster).
If possible, ask for an id to be added to the element.
Also check Chrome -> inspect element -> copy CSS/XPath.
Using //td is not a good idea because it will return all your td nodes. Any predicate such as //td[25] will be a very fragile selection because any td added to any previous table will change its result. Using plugins to generate XPath is a great way to quickly find what you want, but it's always best to use the result just as a starting point, and then analyze the structure of the file to write a locator that will be harder to break when changes occur.
The best locators are anchored to invariant values or attributes. Plugins usually won't suggest id or attribute anchors. They usually use absolute positional expressions. If you can rewrite your locator path in terms of invariant structures in the file, you can then select the elements or text that you want relative to them.
For example, suppose you have
<body> ...
... lots of code....
<h1>header that has a special word</h1>
... other tags and text but not `h1` ...
<table id="some-id">
...
<td>some-invariant-text</td>
<td>other text</td>
<td>the field that you want</td>
...
The table has an ID. That's the best anchor. Now you can select the table as
//table[@id='some-id']
But many times you don't have the id, or even some other invariant attribute. You can still try to discover a pattern. For example: suppose that the last <h1> before the table you want contains a word you can match, you could still find the table using:
//table[preceding::h1[1][contains(.,'word')]]
Once you have the table, you can use relative axes to find the other nodes. Let's assume you want a td but there are no attributes on any tbody, tr, etc. You can still look for some invariant text. Tables usually have headers, or some fixed text which you can match. In the example above, if you find a td that is 2 fields before the one that you want, you could use:
//table[preceding::h1[1][contains(.,'word')]]//td[preceding-sibling::td[2][.='some-invariant-text']]
This is a simple example. If you apply some of these suggestions to the file you are working on, you can improve your XPath expression and make your selection code more robust.
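The anchor-then-offset idea can also be sketched in plain Python. This uses xml.etree.ElementTree, which has no preceding:: or preceding-sibling:: axes, so the sibling step is done with list indexing instead. The helper cell_after and the wrapping <tr> are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<body>"
    "<h1>header that has a special word</h1>"
    "<table id='some-id'><tr>"
    "<td>some-invariant-text</td>"
    "<td>other text</td>"
    "<td>the field that you want</td>"
    "</tr></table>"
    "</body>"
)


def cell_after(root, table_id, anchor_text, offset):
    """Return the text of the cell `offset` places after the anchor cell."""
    table = root.find(f".//table[@id='{table_id}']")   # invariant attribute
    if table is None:
        return None
    for row in table.iter("tr"):
        cells = row.findall("td")
        for i, cell in enumerate(cells):
            # Match on invariant text, then step relative to that cell.
            if (cell.text or "").strip() == anchor_text and i + offset < len(cells):
                return cells[i + offset].text
    return None


print(cell_after(doc, "some-id", "some-invariant-text", 2))
```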
I've searched but cannot find an answer.
How do I print an HTML table with page breaks based on the value of a particular cell?
Basically I want to print a list of addresses and have a new page when the road name changes.
You can’t do this in HTML or in CSS alone. You need to mark the page breaks when generating the table or with client-side JavaScript. In either case, you just need to store the road name (which you need to get from somewhere according to the structure of the data). When processing a new row, you then just check the road name in its data against the stored value, and if they differ, emit
<tr style="page-break-before: always">
instead of a simple <tr> or, when doing this client-side, modify the style property of the tr element node accordingly.
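The server-side variant could look roughly like this in Python. It is a minimal sketch: the function name, the (house, road) tuple layout and the sample addresses are all made up, and it assumes you do not want a break before the very first row.

```python
from html import escape


def rows_with_page_breaks(addresses):
    """Yield <tr> markup, forcing a page break whenever the road changes."""
    previous_road = None
    for house, road in addresses:
        # Compare against the stored road name; skip the break on the first row.
        new_page = previous_road is not None and road != previous_road
        style = ' style="page-break-before: always"' if new_page else ""
        yield f"<tr{style}><td>{escape(house)}</td><td>{escape(road)}</td></tr>"
        previous_road = road


# Hypothetical data, already sorted by road name.
addresses = [("1", "Acacia Avenue"), ("3", "Acacia Avenue"), ("2", "Birch Road")]
table_rows = list(rows_with_page_breaks(addresses))
print("<table>\n" + "\n".join(table_rows) + "\n</table>")
```

The list has to be sorted (or at least grouped) by road first, otherwise a road that reappears later will trigger another break.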
Suppose I have an HTML table with multiple <tbody>, which we know is perfectly legal HTML, and attempt to read it with readHTMLTable as follows:
library(XML)
table.text <- '<table>
<thead>
<tr><th>Col1</th><th>Col2</th>
</thead>
<tbody>
<tr><td>1a</td><td>2a</td></tr>
</tbody>
<tbody>
<tr><td>1b</td><td>2b</td></tr>
</tbody>
</table>'
readHTMLTable(table.text)
The output I get only takes the first <tbody> element:
$`NULL`
Col1 Col2
1 1a 2a
and ignores the rest. Is this expected behavior? (I can't find any mention in the documentation.) And what are the most flexible and robust ways to access the entire table?
I'm currently using
table.text <- gsub('</tbody>[[:space:]]*<tbody>', '', table.text)
readHTMLTable(table.text)
which prevents me from using readHTMLTable directly on a URL to get a table like this, and also doesn't feel very robust.
If you look at the source for readHTMLTable with getMethod(readHTMLTable, "XMLInternalElementNode"), it contains the lines
if (length(tbody))
node = tbody[[1]]
so it is purposefully designed to select only the content of the first tbody. Also ?readHTMLTable describes the function as providing
somewhat robust methods for extracting data from HTML tables in an HTML document
It is designed to be a utility function. It's great when it works, but you may need to hack around it.
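One way to hack around it is to collect every <tr> yourself, regardless of which <tbody> it sits in. Here is a rough sketch of that idea in Python with the standard library's html.parser. It is not R and not part of the XML package; the class TableRows is a hypothetical helper, and it assumes a single, non-nested table.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every row, whatever <tbody> it belongs to."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")      # start a new cell in the current row
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data


table_text = """<table>
<thead>
<tr><th>Col1</th><th>Col2</th>
</thead>
<tbody>
<tr><td>1a</td><td>2a</td></tr>
</tbody>
<tbody>
<tr><td>1b</td><td>2b</td></tr>
</tbody>
</table>"""

parser = TableRows()
parser.feed(table_text)
parser.close()

header, *body = parser.rows
print(header)
print(body)
```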
I am using Watir for automated testing, and I need to copy the value of a rate into a variable. In the example below (from the web page's source code), I need the variable myrate to have the value 2.595. I know how to retrieve a value from an <input> or a <span> (see below), but not directly from a <td>. Any help? Thanks
<TABLE>
<TR>
<TD></TD>
<TD>Rate</TD>
<TD>2.595</TD>
</TR>
</TABLE>
For a <span> I use this code:
raRetrieved = browser.span(:name => 'myForm.raNumber').text
Try this: find the row you want using a regular expression to match a row that contains the word 'Rate', then get the text of the third cell in the row.
myrate = browser.tr(:text, /Rate/).td(:index => 2).text
#or you can use the more user-friendly aliases for those tags
myrate = browser.row(:text, /Rate/).cell(:index => 2).text
If the word 'Rate' might appear elsewhere in other text in that table, but is always the only entry in the second cell of the row you want, then find the cell with that exact text, use the parent method to locate the row that holds that cell, and then get the text from the third cell.
myrate = browser.cell(:text, 'Rate').parent.cell(:index => 2).text
Use of .cell and .row versus .td and .tr is up to you; some people prefer the tags, others like the more descriptive names. Use whatever you feel makes the code the most readable for you or others who will work with it.
Note: the code above presumes use of Watir-Webdriver or Watir 2.x, which both use zero-based indexing. For older versions of Watir, change the index values to 3.
And for the record, I totally agree with the comments of others about the lack of testability of the code sample you posted. It's horrid. Asking for something to locate the proper elements, such as ID values or names, is not out of line in terms of making the page easier to test.
Try this:
browser.td(how, what).text
The problem here is that table, tr and td tags do not have any attributes. You can try something like this (not tested):
browser.table[0][2].text
In case this helps anyone who is having the same issue, it is working for me like this:
browser.td(:text => "Rate").parent.cell(:index, 2).text
Thank you all