Side-by-Side gt tables **WITH** footnotes - html

I am trying to create side-by-side gt tables, as the title suggests. I started with the very helpful answer found here: Arrange gt tables side by side or in a grid or table of tables. The key was to output the left and right tables as raw HTML (as_raw_html), then combine them in a data frame, then send that back into gt and reformat it as markdown (fmt_markdown).
However, I ran into a problem that I couldn't solve. The fmt_markdown command skips the footnote, so the resulting table has the raw html as a footnote.
I checked the documentation for gt, and the fmt_markdown command takes columns and rows as input - but, apparently, the footnote area is considered neither a column nor a row.
So the crux is that I can't find any way to target the footnote area for reformatting as markdown.
Below is a reproducible example.
library(tidyverse)
library(gt)
# Make a table with a footnote
tL <- exibble %>%
  select(c(num, char, group)) %>%
  gt() %>%
  tab_footnote(
    footnote = html("**I'm an apricot**"),
    locations = cells_body(columns = char,
                           rows = char == "apricot")
  ) %>%
  tab_style(style = cell_text(color = "blue"),
            locations = cells_footnotes()) %>%
  as_raw_html()
# Make a copy
tR <- tL
# Side-By-Side
SideBySide <- data.frame(Ltable = tL, Rtable = tR) %>%
  gt() %>%
  fmt_markdown(columns = everything())
And the result looks like this:
Created on 2022-02-20 by the reprex package (v2.0.1)
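For reference, gt has two text helpers: html() marks a string as HTML to be passed through as written, while md() asks gt to interpret the string as Markdown. The sketch below assumes the goal is simply a bold footnote in each inner table, and swaps html() for md() before the table is converted with as_raw_html(). It does not by itself settle how the outer fmt_markdown() call treats the embedded footnote HTML.

```r
library(gt)

# Inner table whose footnote text is interpreted as Markdown via md()
tbl_html <- exibble[, c("num", "char", "group")] |>
  gt() |>
  tab_footnote(
    footnote = md("**I'm an apricot**"),
    locations = cells_body(columns = char, rows = char == "apricot")
  ) |>
  as_raw_html()

# The bold markup is already resolved to HTML inside the raw string
grepl("<strong", tbl_html, fixed = TRUE)
```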

Related

Saving the Text from a News Article in R?

I found this post, which shows how to save the text from a website: Is there a simple way in R to extract only the text elements of an HTML page?
I tried one of the answers provided here and it seems to be working quite well! For example:
library(htm2txt)
url_1 <- 'https://en.wikipedia.org/wiki/Alan_Turing'
text_1 <- gettxt(url_1)
url_2 <- 'https://www.bbc.com/future/article/20220823-how-auckland-worlds-most-spongy-city-tackles-floods'
text_2 <- gettxt(url_2)
All the text from the article appears, but so does a lot of "extra text" which does not have any meaning. For example:
p. 40/03B\n• ^ a or identifiers\n• Articles with GND identifiers\n• Articles with ICCU identifiers\n•
Is there some standard way to only keep the actual text from these articles? Or does this depend too much on the individual structure of the website and no "one size fits all" solution exists for such a problem?
Perhaps there might be some method of doing this in R that only recognizes the "actual text"?
Thank you!
You can cross-reference the words from the HTML page with a dictionary from qdapDictionaries, so only real English words are kept, but this method does keep words that aren't exclusively from the article (e.g., the word "jump" from "Jump to navigation").
library(tidyverse)
library(htm2txt)
library(quanteda)
library(qdapDictionaries)
data(DICTIONARY)
text <- 'https://en.wikipedia.org/wiki/Alan_Turing' %>% gettxt() %>% corpus()
text <- tokens(text, remove_punct = TRUE, remove_numbers = TRUE)
text <- tokens_select(text, DICTIONARY$word)
text <- data.frame(text = sapply(text, as.character), stringsAsFactors = FALSE) %>%
  group_by(text1 = tolower(text)) %>%
  table() %>% as.data.frame() %>%
  rename(word = text1) %>%
  rename(frequency = Freq)
head(text)
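Another option, sketched here with rvest rather than htm2txt, is to parse the page and keep only the elements that hold the article body. The inline page and the "content" id below are made-up stand-ins; on a real site the selector has to be found by inspecting the page, so this does depend on the structure of the individual website.

```r
library(rvest)

# Small inline page standing in for a real article (hypothetical markup)
page <- read_html('
  <html><body>
    <div id="menu"><a href="#">Jump to navigation</a></div>
    <div id="content">
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </div>
    <div id="footer"><ul><li>Articles with GND identifiers</li></ul></div>
  </body></html>')

# Keep only paragraph text inside the content container
paragraphs <- page %>%
  html_elements("#content p") %>%
  html_text2()

paragraphs
```

Navigation links and footer lists are dropped because they sit outside the selected container.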

How to correctly read html node & content

I received some R code from a colleague who no longer works with me. The code is meant to scrape prices for multiple products from an online dealer.
Although the code itself takes the links to the products from an internal Excel list, it looks roughly like this:
input_galaxus2 <- paste0('https://www.galaxus.ch/', input_galaxus$`Galaxus Artikel`)
sess <- session(input_galaxus2[1]) # to start the session
for (j in input_galaxus2) {
  sess <- sess %>% session_jump_to(j) # jump to URL
  i = i + 1
  try(vec_galaxus[i] <- read_html(sess) %>% # can read directly from sess
        html_nodes('div strong') %>%
        html_text() %>%
        nth(5))
  Sys.sleep(runif(1, min = 1, max = 2))
}
One of the articles marked as j in the code is, for example, 14513929.
But when I run the code, I don't get the prices; I get "Service" or "Standorte" instead.
I guess it's because the nodes passed to html_text() are selected wrongly, but I can't really say how to select the right ones.
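To illustrate why picking the fifth match of 'div strong' is fragile, here is a minimal sketch on an inline page. The markup and the class name "product-price" are invented for the example and are not taken from galaxus.ch; the real selector would have to be read off the live page with the browser's inspector.

```r
library(rvest)

# Hypothetical page: navigation entries and the price are all <strong> inside <div>
page <- read_html('
  <div><strong>Service</strong></div>
  <div><strong>Standorte</strong></div>
  <div class="product-price"><strong>49.90</strong></div>')

# Position-based selection returns whatever happens to sit at that index
all_strong <- page %>% html_elements("div strong") %>% html_text2()

# Selecting by a class attached to the price container is less layout-dependent
price <- page %>%
  html_element("div.product-price strong") %>%
  html_text2()
```

If the site adds or removes a navigation entry, the index into all_strong shifts, while the class-based selector keeps pointing at the price.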

Deleting commas in R Markdown html output

I am using R Markdown to create an html file for regression results tables, which are produced by stargazer and lfe in a code chunk.
library(lfe); library(stargazer)
data <- data.frame(x = 1:10, y = rnorm(10), z = rnorm(10))
result <- stargazer(felm(y ~ x + z, data = data), type = 'html')
I create an HTML file with the inline code `r result` after the chunk above. However, a bunch of commas appear at the top of the table.
When I check the html code, I see almost every </tr> is followed by a comma.
How can I delete these commas?
Maybe not what you are looking for exactly but I am a huge fan of modelsummary. I knit to HTML to see how it looks and then usually knit to pdf. The modelsummary equivalent would look something like this
library(lfe)
library(modelsummary)
data = data.frame(x = 1:10, y = rnorm(10), z = rnorm(10))
results = felm(y ~ x + z, data = data)
modelsummary(results)
There are a lot of ways to customize it through kableExtra and other packages. The documentation is really good. Here is kind of a silly example
library(kableExtra)
modelsummary(results,
coef_map = c("x" = "Cool Treatment",
"z" = "Confounder",
"(Intercept)" = "(Intercept)")) %>%
row_spec(1, background = "#F5ABEA")
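On the commas themselves: stargazer returns a character vector with one element per line of HTML, and inline R code in R Markdown joins a multi-element vector with ", ". A minimal sketch of the effect and one way around it, using a short stand-in vector instead of real stargazer output:

```r
# Stand-in for the character vector returned by stargazer(..., type = "html"):
# one element per line of HTML
result <- c("<table>", "<tr><td>x</td><td>0.12</td></tr>", "</table>")

# Collapse to a single string so inline code has nothing to join with commas
result_html <- paste(result, collapse = "\n")
```

The inline code would then be `r result_html`. Alternatively, setting the chunk option results='asis' and letting stargazer print inside the chunk avoids the inline code altogether.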

How to get descriptive table for both continuous and categorical variables?

I want to get a descriptive table in HTML format for all variables in a data frame. For continuous variables I need the mean and standard deviation. For categorical variables I need the frequency (absolute count) and percentage of each category. I also need the count of missing values to be included.
Let's use this data:
data("ToothGrowth")
df<-ToothGrowth
df$len[2]<-NA
df$supp[5]<-NA
I want to get a table in HTML format that looks like this:
----------------------------------------------------------------------
Variables        N (missing)        Mean (SD) / %
----------------------------------------------------------------------
len              59 (1)             18.9 (7.65)
supp
  OJ             30                 50%
  VC             29                 48.33%
  NA             1                  1.67%
dose             60                 1.17 (0.629)
I also need to be able to set the number of digits shown after the decimal point.
If you know a better way to display this information in HTML, please provide your solution.
Here's a programmatic way to create separate summary tables for the numeric and factor columns. Note that this doesn't report the NAs in the table as you requested, but it does ignore NAs when calculating the summary stats, as you did. It's a starting point, anyway. From here you could combine the tables and format the headers however you want.
If you knit this code within an R Markdown document with HTML output, kable will automatically generate the HTML table and the default CSS will format it nicely with horizontal rules. Note that there's also a booktabs option to kable that makes prettier tables, like the LaTeX booktabs package. Otherwise, see the documentation for knitr::kable for options.
library(dplyr)
library(tidyr)
library(knitr)
data("ToothGrowth")
df<-ToothGrowth
df$len[2]<-NA
df$supp[5]<-NA
numeric_cols <- dplyr::select_if(df, is.numeric) %>%
  gather(key = "variable", value = "value") %>%
  group_by(variable) %>%
  summarize(count = n(),
            mean = mean(value, na.rm = TRUE),
            sd = sd(value, na.rm = TRUE))
factor_cols <- dplyr::select_if(df, is.factor) %>%
  gather(key = "variable", value = "value") %>%
  group_by(variable, value) %>%
  summarize(count = n()) %>%
  mutate(p = count / sum(count, na.rm = TRUE))
knitr::kable(numeric_cols)
knitr::kable(factor_cols)
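The missing-value counts that the summary above leaves out can be added with a small base-R helper. This is only a sketch: describe_column and its digits argument are names made up for this example, not part of any package.

```r
data("ToothGrowth")
df <- ToothGrowth
df$len[2] <- NA
df$supp[5] <- NA

# Hypothetical helper: one row per numeric variable, one row per factor level
describe_column <- function(x, name, digits = 2) {
  n_missing <- sum(is.na(x))
  if (is.numeric(x)) {
    stat <- paste0(formatC(mean(x, na.rm = TRUE), format = "f", digits = digits),
                   " (", formatC(sd(x, na.rm = TRUE), format = "f", digits = digits), ")")
    data.frame(variable = name, level = "", n = sum(!is.na(x)),
               missing = n_missing, summary = stat)
  } else {
    counts <- table(x, useNA = "ifany")
    lv <- names(counts)
    lv[is.na(lv)] <- "NA"
    pct <- paste0(formatC(100 * as.numeric(counts) / length(x),
                          format = "f", digits = digits), "%")
    data.frame(variable = name, level = lv, n = as.integer(counts),
               missing = n_missing, summary = pct)
  }
}

# Apply the helper to every column and stack the results
summary_table <- do.call(rbind, Map(describe_column, df, names(df)))
rownames(summary_table) <- NULL
summary_table
```

Passing summary_table to knitr::kable() in an HTML document renders it as an HTML table, and the digits argument controls the number of decimals shown.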
I found the R package table1, which does what I want. Here is the code:
library(table1)
data("ToothGrowth")
df<-ToothGrowth
df$len[2]<-NA
df$supp[5]<-NA
table1(reformulate(colnames(df)), data=df)

kableExtra column_spec width not working

I am creating tables that will be rendered using Rmarkdown to HTML. I am using kable and have been experimenting with kableExtra to add features to my tables. I am not able to get the width option in column_spec to work when applying it to all columns in a table:
data.frame(RRmin=1, RRmax=10) %>%
  dplyr::rename(`Reportable Range Min` = RRmin, `Reportable Range Max` = RRmax) %>%
  kable() %>%
  column_spec(1:2, width = "0.5in") %>%
  kable_styling(c("bordered", "condensed"), full_width = F)
This gives a table that looks like this.
I can make the width longer and both columns change, but when it goes smaller it does not seem to work. I can make one column smaller but not the other:
data.frame(RRmin=1, RRmax=10) %>%
  dplyr::rename(`Reportable Range Min` = RRmin, `Reportable Range Max` = RRmax) %>%
  kable() %>%
  column_spec(1, width = "0.5in") %>%
  kable_styling(c("bordered", "condensed"), full_width = F)
This gives a table that looks like this. The first column was appropriately changed but I cannot get this effect when I'm trying to change the size of both columns. I have tried doing separate column_spec lines for each column, using escape=F and am not sure what to try next.
I have had similar problems with column_spec not working. I was able to find a fix that worked for my purposes by playing with the width_min option. Maybe that will help.
My issue was that none of the column widths seemed to be adjusted by column_spec, even when I tried all of the options you mention above. The result was that some columns were way too thin. I set width_min="3in" and that fixed it. This was not a perfect fix because now I'm left with other columns that are too wide, but it at least made my table a little more readable.
This may be a little late, but I've just been working with the kableExtra package, and it seems that your code is now working pretty much as is.
At first I thought it might have something to do with the ordering of the kable_styling component, but it seems not to matter which order it is in. Perhaps it was a bug in the package that has since been fixed. It is also immaterial whether you use column_spec(column = 1:2, width = "2in") or column_spec(1:2, width = "2in"). Both seem to work well, as do modifications to the column sizes. See below:
---
output: pdf_document
---
```{r global_options, include=FALSE}
# Just some setup:
sapply(c("knitr", "tidyverse", "kableExtra"), require, character.only = TRUE)
options(knitr.kable.NA = '', knitr.table.format = "latex")
knitr::opts_chunk$set(fig.path = 'figures/',
                      echo = FALSE, warning = FALSE, message = FALSE)
opts_chunk$set(echo = FALSE,
               message = FALSE,
               warning = FALSE,
               fig.align = "center",
               fig.width = 5,
               fig.pos = 'H',
               as.is = TRUE)
```
```{r variable-names-table, as.is=TRUE}
# Size example 1; 1.5 inch columns
data.frame(RRmin=1, RRmax=10) %>%
  dplyr::rename(`Reportable Range Min` = RRmin, `Reportable Range Max` = RRmax) %>%
  kable() %>%
  kable_styling(c("bordered", "condensed"), full_width = F) %>%
  column_spec(column = 1:2, width = "1.5in")
# Size example 2; 3 inches
data.frame(RRmin=1, RRmax=10) %>%
  dplyr::rename(`Reportable Range Min` = RRmin, `Reportable Range Max` = RRmax) %>%
  kable() %>%
  column_spec(column = 1:2, width = "3in") %>%
  kable_styling(c("bordered", "condensed"), full_width = F)
# To set columns 1 and 2 to different sizes
data.frame(RRmin=1, RRmax=10) %>%
  dplyr::rename(`Reportable Range Min` = RRmin, `Reportable Range Max` = RRmax) %>%
  kable() %>%
  column_spec(column = 1, width = "3in") %>%
  column_spec(column = 2, width = "2in") %>%
  kable_styling(c("bordered", "condensed"), full_width = F)
```
Just a note for anyone else dealing with the issue. The above will run as an RMD
R version 3.6.1, on mac
RStudio 1.2.1335
kableExtra 1.1.0
knitr 1.25
tidyverse 1.2.1
Simply replace width by width_min!
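Applied to the table from the question, that replacement might look like the sketch below. It assumes HTML output, which is where width_min applies as far as I know, and it passes the column names through kable's col.names argument instead of dplyr::rename to keep the example short.

```r
library(knitr)
library(kableExtra)

# Same table as in the question, with width_min in place of width
tbl <- data.frame(RRmin = 1, RRmax = 10) |>
  kable(format = "html",
        col.names = c("Reportable Range Min", "Reportable Range Max")) |>
  column_spec(1:2, width_min = "3in") |>
  kable_styling(c("bordered", "condensed"), full_width = FALSE)

tbl
```

Note that width_min sets a lower bound, so it helps with columns that render too narrow; it will not shrink a column below the width its content needs.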
I'm using TeX Live 2020, and this problem still exists; it appears that column_spec has a bug. All these examples run without a problem if I remove the column_spec commands. As soon as I include the column_spec commands, I get a cryptic error which says 'Undefined control sequence', 'LaTeX Error: Illegal character in array arg.' The description is also cryptic: ...n}|>{\raggedleft\arraybackslash}p{1.5in}}
The control sequence at the end of the top line
of your error message was never \def'ed. If you have
misspelled it (e.g., \hobx), type I and the correct spelling (e.g., I\hbox). Otherwise just continue, and I'll forget about whatever was undefined. Removing the column_spec command fixes the problem.
The fix is to include the array package in the LaTeX preamble. One way of doing this is to add the following lines to your R Markdown header:
header-includes:
- \usepackage{array}