dfSummary - formatting factor levels in HTML output in R Markdown?

I'm trying to put together an HTML Rmd report and am having an issue with how summarytools::dfSummary displays factor levels. When I print to the browser or the RStudio viewer, the levels are stacked vertically:
When I knit to HTML in R Markdown, they are formatted inline, which is annoying and less readable:
Any ideas on how to deal with this? I've looked around and haven't seen any way to handle this in summarytools' print() method; maybe there's a way for me to reformat this in the Rmd?
Thanks.

I spent a day looking for a solution. Apparently summarytools' CSS has to be included in the following manner, with the chunk options echo=FALSE and results='asis':
```{r echo=FALSE, results='asis'}
st_css()
```

Related

XPath 1.0 expression to select an Element with text and child nodes

I've been having serious issues detecting elements in a particular section of a document. The issue concerns a large menu presented as a sequence of buttons that each contain an image and text, in that order, similar to this one:
<button type="button" id="ext-gen375" class=" x-btn-text">
<img style="height:13px" src="inc/FAST/images/icons/Transaction.png">
Transactions
</button>
I want to select the button by its text content. The issue is that other buttons have similar names, e.g. "Policy" and "Policy Address". The ideal solution would match the text without contains or other substring functions, but I've been struggling to do so. I have tried several expressions that seem fine on http://xpather.com/ but do not work in Firefox or Chrome at all.
//button[text()[normalize-space()="Transactions"]]
//button[normalize-space(text())="Transactions"]
//button[normalize-space(.)="Transactions"]
//button[text()[translate(normalize-space(), "
","")="Transactions"]]
Thanks in advance guys.
Edit1:
Prophet had an excellent suggestion to use the img tag in the search. Unfortunately, similar buttons share the same icon.
Edit2
Based on Siebe's answer I was able to look deeper into the situation. My goal was to have a working XPath 1.0 expression for automation in Selenium, but I was using Chrome and Firefox to test the expressions. For some reason, in those browsers a non-breaking space in a text node will not match the common whitespace character or any of the characters below:
' ', ' ', ' ', '\u00a0'
After many hours trying to find something, I opted to start a Selenium WebDriver session and perform the tests there. To my surprise, the XPath below worked using the Chrome driver:
//button[text()[translate(., "\u00a0", "")="Transactions"]]
While that expression works for Selenium automation projects, it really reinforces the impact that different implementations of the same tool can have on your project. Thanks again to everyone that replied.
Why not locate the button based on its child img node's src attribute value?
This should work:
//button[./img[contains(@src,'Transaction.png')]]
The problem in that example is that text() returns Transactions preceded by a non-breaking space (plus a couple of newlines), not plain Transactions
xmllint --html --xpath '//button/text()' test.html
Transactions
Adding single quotes to denote the real text:
echo "'$(xmllint --html --xpath '//button/text()[2]' test.html )'"
'
Transactions'
This comes a little closer
echo "'$(xmllint --html --xpath 'translate(normalize-space(//button/text()[2]), " ","")' test.html )'"
'   Transactions'
You could try this:
//button[normalize-space(translate(., ' ', ''))='Transactions']
Why?
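Well, normalize-space() does not get rid of the non-breaking space (U+00A0), since it only handles space, tab, carriage return and line feed. In XPath the non-breaking spaces can be removed with translate(., ' ', ''), where the quoted character must be the non-breaking space itself rather than a regular space.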
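To see why the plain normalize-space() comparisons fail, here is a minimal Python sketch that emulates the two XPath 1.0 functions on a string. It assumes the button's text node is a newline, a non-breaking space (U+00A0) and then the label, as the xmllint output above suggests; the helper names normalize_space and translate_remove are hypothetical stand-ins, not part of any XPath library.

```python
import re

# XPath 1.0 whitespace is only space, tab, carriage return and line feed.
XML_WS = " \t\r\n"


def normalize_space(s):
    """Emulate XPath normalize-space(): collapse runs of XML whitespace, then trim."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(XML_WS)


def translate_remove(s, chars):
    """Emulate XPath translate(s, chars, ''): delete every character listed in chars."""
    return s.translate({ord(c): None for c in chars})


# Assumed text node of the button: newline, non-breaking space, label, newline.
button_text = "\n\u00a0Transactions\n"

# normalize-space() alone leaves the non-breaking space in place.
plain = normalize_space(button_text)

# Removing U+00A0 first, then normalizing, yields the bare label.
fixed = normalize_space(translate_remove(button_text, "\u00a0"))

print(repr(plain))  # '\xa0Transactions'
print(repr(fixed))  # 'Transactions'
```

Note that strip() is called with an explicit character set on purpose: Python's default strip() treats U+00A0 as whitespace, which would hide the very difference the sketch is meant to show.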

R Stargazer table ASCII text output formatting (line break, alignment & reference group)

For better or worse, I don't use LaTeX (yet). I like producing stargazer formatted tables on the fly for class examples in both HTML and in the console. However, I'm having trouble with 3 formatting elements; so far I've found solutions for LaTeX and some in HTML, but the ASCII console text eludes me.
The 3 challenges are:
1. Breaking a line so that a variable name can wrap instead of increasing the table width.
2. Aligning coefficients & std. errors at the decimal, even when there are p-value stars.
3. Making space in the covariate labels & coefficients to allow for a reference group.
Let's start with some reproducible data & outputs to reference.
library(stargazer)

set.seed(3); x1 <- factor(sample(letters[1:4], 1000, replace=TRUE))
set.seed(4); x2 <- runif(1000, -10, 10)
set.seed(5); x3 <- rbinom(1000, size = 1, prob = 0.13)
set.seed(6); y <- runif(1000, -10, 10)
model <- lm(y ~ x1 + x2 + x3)
stargazer(model, align=TRUE,
          #type="html", out="SO_stargazer.html",
          type="text", out="SO_stargazer.txt",
          title="Example Title Goes Here",
          dep.var.caption="",
          dep.var.labels="This is my long title for the Dependent Variable Y",
          covariate.labels=c("X1 Group B",
                             "X1 Group C",
                             "X1 Group D",
                             "X2 with a super ridiculous and annoyingly long name",
                             "X3"))
Line break
My default approach is to use \n in the character string. For example, I might try to break the DV caption:
dep.var.labels="This is my long title for \n the Dependent Variable Y",
But that generates the following error message:
Error in if (nchar(text.matrix[r, c]) > max.length[real.c]) { : missing value where TRUE/FALSE needed
Found a couple posts about this issue (here which reference here), but the poster on the first did not provide much of an example to follow and the second pertained to an underscore that I don't have or gave LaTeX solutions. The only difference that broke what already worked was the addition of the \n. I did try using the tex \\ escape, but that didn't do anything useful for text output.
I am able to get line breaks using <br> in the string for the html output file version.
This post also mentions the tex and html solutions, but not text.
Alignment on the decimal
When there are no statistical significance stars on coefficients, both the coefficients and std. errors align nicely, centered on the decimal point. However, once the stars appear, they 'push' the coefficient to the left. This happens in both the text and html output. It is not so bad with 1 star, but 3 stars can make quite a difference. How can I coerce it back to align on the decimal point for both formats? This issue persists even if I use the single.row=TRUE option. This post answer by @Marco Doe has a great visual of what I'm talking about, but notes that the centering is for tex. I found a LaTeX solution, but that post makes no mention of the other formats. I've tinkered with the align and float options to no avail (inspired by these quasi-related tex solution posts here and here). The latter post hinted at using xtable or post-process edits, but that was more than 5 years ago, so I'm hoping for an updated, viable solution.
This image is from Marco Doe's solution and shows the LaTeX output, but it does a good job of showing an example of the output format I get (left) and what I would like to have (right).
Reference categories
I found a LaTeX solution that 'pushes' the covariates & coefficient data down a row, making room for a reference group to be printed in the covariate column; however, the solution is in tex. How can I replicate this for the text output? Can I replicate it for the HTML version as part of the R code without having to get surgical with the HTML output code?
@Giac posted the images (linked above) to illustrate the have (left) and want (right). Although these images are tex, how could I get the right image output in text and html?

Get code output nicer in R Markdown Knit? Breaking into two parts

I am knitting R Markdown (Rmd) to HTML, but some code outputs are not pretty, e.g., like the attached picture of the console output.
How can I get the output in one part, if possible? Thanks.

rvest won't pull text after "<" even though it is part of the string

Website I'm trying to pull from: http://goodcompanies.com/company/31-bits/
Value I'm trying to scrape "Most <$100":
<div class="company-info-section no-flex-grow no-flex-shink">
<h4 class="all-caps title-line-right no-margin company-section-title">
<span>Price Range</span>
</h4>
<b>Most <$100</b>
</div>
Code I'm using:
library(rvest)

html <- read_html("http://goodcompanies.com/company/31-bits/")
info <- html %>%
  html_nodes('.company-info-section') %>%
  html_text() %>%
  .[1]
I get: "\n\t\t\n\t\t\tPrice Range\n\t\t\n\t\tMost \n\t"
But what I want and should get is: "\n\t\t\n\t\t\tPrice Range\n\t\t\n\t\tMost < $100\n\t"
It seems like the fact that in the actual HTML there isn't a space between the < and the $ is causing the issue. How can I get around this?
The space isn’t the issue; the actual issue is that the website is simply using invalid HTML: < in HTML code must be escaped (e.g. as &lt;), and the website isn’t doing that.
Unfortunately rvest appears to cope badly with invalid HTML. The best solution would be to find an HTML parser that can deal with messy/invalid HTML. Unfortunately I don’t know any for R.
A hacky solution would be to download the page into a character string, fix the problem (i.e. perform a gsub('<$', '&lt;$', page_text, fixed = TRUE) or similar), and then pass it to rvest.
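The same pre-processing idea can be sketched outside R. The Python snippet below is only an illustration under stated assumptions: it works on a hard-coded fragment instead of the downloaded page, it assumes every "<$" in the page is a literal less-than sign, and escape_bare_lt and TextCollector are hypothetical names introduced here. It uses the standard-library html.parser only to confirm that the repaired markup yields the full text.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect all text content seen while parsing."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def escape_bare_lt(page_text):
    """Escape a literal '<' that is directly followed by '$' (assumed never to start a tag)."""
    return page_text.replace("<$", "&lt;$")


# Hard-coded stand-in for the relevant part of the downloaded page.
snippet = "<b>Most <$100</b>"

# Repair the markup before handing it to a parser.
fixed = escape_bare_lt(snippet)

collector = TextCollector()
collector.feed(fixed)
collector.close()
extracted = "".join(collector.chunks)

print(fixed)      # <b>Most &lt;$100</b>
print(extracted)  # Most <$100
```

A plain string replacement is used rather than a regular expression because "$" is a regex metacharacter; that is also why the R call needs fixed = TRUE or an escaped pattern.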

Perl formatting (i.e. sprintf) not retained in HTML display

I have run into a bit of a problem. Originally, I have input in the following format:
12345 apple
12 orange
I saved the first column as $num and the second column as $fruit. I want the output to look like this (see below), aligned as if every $num had the same length. In reality, $num will consist of variable-length numbers.
12345      apple
12         orange
As suggested, I use the following code:
$line = sprintf "%-10s %-20s", $num, $fruit;
This solution works great in command-line display, but this formatting is not retained when I try to display this via HTML. For example..
print "<html><head></head><body>
$line
</body></html>";
This produces the same output as the original, unformatted input. Do you have a suggestion as to how I can retain the sprintf formatting in an HTML web-based display? I tried to pad $num with whitespace, but the following code doesn't seem to work for me either.
$num .= (" " x (10 - length($num)));
Anyways, I would appreciate any suggestions. Thanks!
HTML ignores extra whitespace. And the fact that it's probably displaying with a proportional font means it wouldn't line up even if the extra spaces were there.
The easy option is to just surround the text with <pre> tags, which will display by default with a monospace font and whitespace preserved. Alternatively, you can have your code generate an HTML table.
HTML compresses all consecutive spaces down to one space. If you want your output to be lined up like a table, you have to actually put the values in an HTML table.
The 'pre' in <pre> means preformatted, which exactly describes the output of a sprintf() statement. Hence the suggestion from friedo and, I suspect, others.
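As a rough illustration of the <pre> approach (sketched in Python rather than Perl, with the two sample rows from the question hard-coded as an assumption), the padded lines are built first, HTML-escaped, and then wrapped in <pre> so the browser keeps the spaces and uses a monospace font:

```python
import html

# Sample rows from the question: (num, fruit).
rows = [("12345", "apple"), ("12", "orange")]

# Left-justify num in 10 columns and fruit in 20, like sprintf "%-10s %-20s";
# trailing padding is dropped because it is invisible anyway.
lines = [f"{num:<10} {fruit:<20}".rstrip() for num, fruit in rows]

# Escape the text, then wrap it in <pre> so whitespace is preserved.
body = html.escape("\n".join(lines))
page = f"<html><head></head><body>\n<pre>\n{body}\n</pre>\n</body></html>"

print(page)
```

Escaping before wrapping matters once the data can contain characters such as < or &; for tabular data with many columns, generating an HTML table, as the other answers suggest, is the more flexible option.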