R Markdown HTML Number Figures - html

Does anyone know how to number the figures in the captions of an R Markdown document rendered to HTML?
For PDF documents, the caption will say something like:
Figure X: Some Caption Text
However, the equivalent caption for the HTML version will simply say:
Some Caption Text
This makes cross-referencing figures by number completely useless.
Here is a minimal example:
---
title: "My Title"
author: "Me"
output:
  pdf_document: default
  html_document: default
---
```{r cars, fig.cap = "An amazing plot"}
plot(cars)
```
```{r cars2, fig.cap = "Another amazing plot"}
plot(cars)
```
I have tried setting toc, fig_caption and number_sections within each of the output formats, but this does not seem to change the result.

The other answers provided are relatively out of date, and this has since been made very easy by the bookdown package. This package provides a number of improvements, including built-in numbering of figures across Word, HTML and PDF.
To use bookdown, first install the package with install.packages("bookdown") and then use one of its output formats. For HTML, this is html_document2. Taking your example:
---
title: "My Title"
author: "Me"
date: "1/1/2016"
output: bookdown::html_document2
---
```{r cars, fig.cap = "An amazing plot"}
plot(cars)
```
```{r cars2, fig.cap = "Another amazing plot"}
plot(cars)
```
These figures will be numbered Figure 1 and Figure 2. Provided the code chunk is named and has a caption, we can cross-reference the output using the syntax \@ref(fig:foo), where foo is the name of the chunk, i.e. \@ref(fig:cars) for the first chunk above. You can learn more about this behaviour in the bookdown documentation.
Further Reading
R Markdown: The Definitive Guide: Chapter 11 provides a great overview of bookdown.
Authoring Books with bookdown provides a comprehensive guide to bookdown and is recommended for more advanced details.

Unless someone has a better solution, this is what I came up with. There are some flaws with this approach (for example, if the figure/table number depends on the section number), but for a basic HTML document it works.
Somewhere at the top of your document, run this:
```{r echo=FALSE}
# Determine the output format of the document (NULL when not knitting)
outputFormat <- knitr::opts_knit$get("rmarkdown.pandoc.to")
# Figure and table caption numbering; for HTML do it manually
capTabNo <- 1; capFigNo <- 1
# Function to add the table number
capTab <- function(x) {
  # identical() also copes with outputFormat being NULL
  if (identical(outputFormat, "html")) {
    x <- paste0("Table ", capTabNo, ". ", x)
    capTabNo <<- capTabNo + 1
  }
  x
}
# Function to add the figure number
capFig <- function(x) {
  if (identical(outputFormat, "html")) {
    x <- paste0("Figure ", capFigNo, ". ", x)
    capFigNo <<- capFigNo + 1
  }
  x
}
```
Then during the course of your document, if say you want to plot a figure:
```{r figA, fig.cap=capFig("My Figure Caption")}
library(ggplot2)
base <- ggplot(data = data.frame(x = 0, y = 0), aes(x, y)) + geom_point()
base
```
Replace capFig with capTab in the above if you want a table caption.
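The two near-identical functions above can also be generated from one helper. The following is only a sketch: make_captioner() is a hypothetical name, not part of knitr or rmarkdown, and unlike the code above it numbers captions for every output format rather than for HTML only.

```r
# Hypothetical helper (not a knitr/rmarkdown function): returns a function
# that prefixes captions with a running number kept in a private counter.
make_captioner <- function(prefix) {
  n <- 0  # one private counter per captioner
  function(text) {
    n <<- n + 1
    paste0(prefix, " ", n, ". ", text)
  }
}

capFig <- make_captioner("Figure")
capTab <- make_captioner("Table")

capFig("My Figure Caption")  # "Figure 1. My Figure Caption"
capFig("Another caption")    # "Figure 2. Another caption"
capTab("My Table Caption")   # "Table 1. My Table Caption"
```

Keeping the counter inside the closure avoids the global capFigNo and capTabNo variables, at the cost of not being able to reset or inspect the counters from outside.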

We can make use of pandoc-crossref, a filter that allows cross-referencing of figures, tables, sections, and equations and works for all output formats. The easiest way is to cat the figure label (in the form {#fig:figure_label}) after each plot, although this requires echo=FALSE and results='asis'. Then we can reference a figure as we would a citation: [@fig:figure_label] produces fig. figure_number by default.
Here is an MWE:
---
output:
  html_document:
    toc: true
    number_sections: true
    fig_caption: true
    pandoc_args: ["-F","pandoc-crossref"]
---
```{r}
knitr::opts_chunk$set(echo=FALSE,results='asis')
```
```{r plot1,fig.cap="This is plot one"}
x <- 1:10
y <- rnorm(10)
plot(x,y)
cat("{#fig:plot1}")
```
As we can see in [@fig:plot1]... whereas [@fig:plot2] shows...
```{r plot2, fig.cap="This is plot two"}
plot(y,x)
cat("{#fig:plot2}")
```
which produces (with the graphics removed):
PLOT1
Figure 1: This is plot one
As we can see in fig. 1… whereas fig. 2 shows…
PLOT2
Figure 2: This is plot two
See the pandoc-crossref readme for more options and customizations.
To install pandoc-crossref, assuming you have a Haskell installation:
cabal update
cabal install pandoc-crossref

I solve cross-referencing using a solution similar to that posted by Nicholas above. I use bookdown for some projects but I find that awkward to use for other projects where I just want simple cross-referencing.
I use the following when I am writing a paper with rmarkdown and I want it in the standard format for submission to a journal: a list of figure legends at the end, followed by the tables and then the figures. As I am writing, I only have a rough idea of the order in which the figures will be referenced in the text. I just want to reference them with a text code like fig:foobar and have the number assigned based on the order of appearance in the text. When I look at the figure legend list, I'll see what order to put the legends in and will move them around as needed.
Here's my structure.
I have an R package where I keep things I need for papers, like various bibliographies and helper R functions. In that package, I have the following function, which uses two variables defined in the main Rmd environment: .rmdenvir and .refctr.
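ref <- function(useName) {
  require(stringr)
  # .refctr is a named vector of assigned numbers, one entry per label
  if (!exists(".refctr")) .refctr <- c(`_` = 0)
  # Label already seen: return the number assigned earlier
  if (any(names(.refctr) == useName)) return(.refctr[useName])
  # The counter name is the part before the colon, e.g. "fig"
  type <- str_split(useName, ":")[[1]][1]
  # Count labels of exactly this counter, so "tab" does not match "tabappA"
  nObj <- sum(str_detect(names(.refctr), paste0("^", type, ":")))
  useNum <- nObj + 1
  newrefctr <- c(.refctr, useNum)
  names(newrefctr)[length(.refctr) + 1] <- useName
  # Store the updated counters back in the main Rmd environment
  assign(".refctr", newrefctr, envir = .rmdenvir)
  return(useNum)
}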
It assumes that I name things I want referenced with something like cntname:foo, for example fig:foo. It makes a new counter for each one and I can make up new counters on the fly (while writing) if needed.
In my main Rmd file, I have some set-up lines:
```{r setup_main}
require(myPackageforPapers)
# here is where the variables needed by ref() are defined.
.rmdenvir = environment()
.refctr <- c(`_` = 0)
```
In the text I use the following
You can see what I am trying to show in Figure `r ref("fig:foo")`
and you can see it also in Tables `r ref("tab:foo")`
and A`r ref("tabappA:foobig")`.
to get "You can see what I am trying to show in Figure 1 and you can see it also in Tables 1 and A1." Although the numbers might not be 1; the number to use will be dynamically determined. I don't have to use a special function for the first time I reference a figure, table or whatever I am counting. ref() figures that out by looking to see if the label exists already. If not it assigns the next number, and returns it. So you don't have to use "label" in one place and "ref" in another.
In the course of writing, I might decide that appendix A is getting too big, and that I will split off some of the tables into an appendix B. All I need to do is change the above to
You can see what I am trying to show in Figure `r ref("fig:foo")`
and you can see it also in Tables `r ref("tab:foo")`
and B`r ref("tabappB:foobig")`.
I just specify a new counter name 'tabappB' and the numbers for that are dynamically determined.
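To try the label-to-number logic in a console without the package, here is a minimal base R sketch. make_ref() and ref_demo() are hypothetical names; the counters live in a closure instead of .refctr, and a label is assumed to belong to a counter only when it starts with the counter name followed by a colon.

```r
# Hypothetical stand-in for ref(): state is kept in a closure, not in .refctr
make_ref <- function() {
  labels <- character(0)  # labels seen so far, in order of first use
  nums <- integer(0)      # number assigned to each label
  function(label) {
    hit <- match(label, labels)
    if (!is.na(hit)) return(nums[hit])  # known label: reuse its number
    # Counter name is the text before the first colon, e.g. "fig"
    type <- strsplit(label, ":", fixed = TRUE)[[1]][1]
    # Next number = labels already assigned to this exact counter, plus one
    num <- sum(startsWith(labels, paste0(type, ":"))) + 1L
    labels <<- c(labels, label)
    nums <<- c(nums, num)
    num
  }
}

ref_demo <- make_ref()
ref_demo("fig:foo")         # 1
ref_demo("tabappA:foobig")  # 1
ref_demo("tab:foo")         # 1
ref_demo("fig:foo2")        # 2
ref_demo("fig:foo")         # 1 (already assigned)
```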
At the end of my Rmd file, I have a figure list that will look like
# Figure Legends
Figure `r ref("fig:foo")`. This is the legend for this figure.
Figure `r ref("fig:foo2")`. This is the legend for another figure.
Then my tables appear like so
```{r print-tablefoo, echo=FALSE}
tablefoo=mtcars
thecap = "Tables appear with a legend while figures do not."
fullcap = paste("Table ", ref("tab:foo"), ". ", thecap, sep="")
knitr::kable(tablefoo, caption=fullcap)
```
and then the figures like so:
```{r fig-foo, echo=FALSE, fig.cap=paste("Figure",ref("fig:foo"))}
plot(1,1)
```
Appendix A is an Rmd file that is included as a child. It will have tables like
```{r print-tableAfoo, echo=FALSE}
tablefoo=mtcars
thecap = "This is a legend."
fullcap = paste("Table A", ref("tabappA:foobig"), ". ", thecap, sep="")
knitr::kable(tablefoo, caption=fullcap)
```
I do have to add the "A" to get Table A1, but I find it easier if R doesn't think too much for me in terms of labelling my counters. I just want it to return the right number.
The cross-referencing works for html, pdf/latex or word. I'd happily stick with latex solutions, but my co-authors use word so I need a solution that works with pandoc and word. Also sometimes I want html or some other output and I need a solution that works for any output that works with rmarkdown.

Related

HTML output takes too long to load and to show up the scroll bar

I have this R Markdown file, but since it is pretty heavy (it is an online guide), the scroll bar (and the whole file except the first page) takes too long to show up when opening the HTML output. I tried to divide the Rmd file into separate Rmd sub-files as shown below, but I still can't get the result I want. Thank you
---
title: "my_file"
author: "me"
date: "26/02/2020"
output:
  html_document:
    toc: yes
    toc_depth: 3
    toc_float:
      collapsed: yes
      smooth_scroll: yes
  word_document: default
---
```{r child = 'child0.Rmd'}
```
```{r child = 'child1.Rmd'}
```
```{r child = 'child2.Rmd'}
```
```{r child = 'child3.Rmd'}
```
```{r child = 'child4.Rmd'}
```
Investigate and try to reduce the size of your pictures and graphics: in parallel with, or as an alternative to, splitting your text into several HTML pages, the idea is to find a compromise between loading time and the quality of your graphics (and imported pictures).
So, try:
to reduce the size of the graphics computed by code chunks.
to reduce the size of your imported pictures, if they are huge, by resizing them.
to take advantage of the fact that HTML can render svg files: try encoding the graphical representations of your data as svg. Not your external images, only the graphics produced by your computations (text + area + color: some graphics are 'lighter' in svg than in jpg or tif).

Flextable : using superscript in the dataframe

This question has been asked a few times but, surprisingly, no answer was given.
I want some numbers in my dataframe to appear in superscript.
The functions compose and display are not suitable here since I don't know yet which values in my dataframe will appear in superscript (my tables are generated automatically).
I tried to use ^8^ as for kable, $$10^-3$$, paste(expression(10^2)), "H\\textsubscript{123}", etc.
Nothing works! Help! I am pulling my hair out...
library(flextable)
bab = data.frame(c( "10\\textsubscript{-3}",
paste(as.expression(10^-3)), '10%-3%', '10^-2^' ))
flextable(bab)
I am knitting from R to HTML.
In HTML, you do superscripts using things like <sup>-3</sup>, and subscripts using <sub>-3</sub>. However, if you put these into a cell in your table, you'll see the full text displayed, it won't be interpreted as HTML, because flextable escapes the angle brackets.
The kable() function has an argument escape = FALSE that can turn this off, but flextable doesn't: see https://github.com/davidgohel/flextable/issues/156. However, there's a hackish way to get around this limitation: replace the htmlEscape() function with a function that does nothing.
For example,
```{r}
library(flextable)
env <- parent.env(loadNamespace("flextable")) # The imports
unlockBinding("htmlEscape", env)
assign("htmlEscape", function(text, attribute = FALSE) text, envir=env)
lockBinding("htmlEscape", env)
bab = data.frame(x = "10<sup>-3</sup>")
flextable(bab)
```
This will display the table cell as 10 with -3 rendered as a superscript.
Be careful if you do this: there may be cases in your real tables where you really do want HTML escapes, and this code will disable that for the rest of the document. If you execute this code in an R session, it will disable escaping for the rest of the session.
And if you were thinking of using a document like this in a package you submit to CRAN, forget it. You shouldn't be messing with bindings like this in code that you expect other people to use.
Edited to add:
In fact, there's a way to do this without the hack given above. It's described in this article: https://davidgohel.github.io/flextable/articles/display.html#sugar-functions-for-complex-formatting. The idea is to replace the entries that need superscripts or subscripts with calls to as_paragraph, as_sup, as_sub, etc.:
```{r}
library(flextable)
bab <- data.frame(x = "dummy")
bab <- flextable(bab)
bab <- compose(bab, part = "body", i = 1, j = 1,
value = as_paragraph("10",
as_sup("-3")))
bab
```
This is definitely safer than the method I gave.
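Since the tables in the question are generated automatically, the same compose() call can be driven by the data. The sketch below assumes, purely for illustration, that cells needing a superscript are written as base^exponent in a column named x; that convention, the column name and the regular expression are not from the original post.

```r
library(flextable)

# Assumed convention: "base^exponent" marks a value that needs a superscript
df <- data.frame(x = c("10^-3", "42", "10^2"), stringsAsFactors = FALSE)
ft <- flextable(df)

# Rows whose value contains exactly one caret separating base and exponent
marked <- grep("^[^^]+\\^[^^]+$", df$x)

for (i in marked) {
  parts <- strsplit(df$x[i], "^", fixed = TRUE)[[1]]
  # Replace the cell content with the base followed by a superscript chunk
  ft <- compose(ft, i = i, j = "x", part = "body",
                value = as_paragraph(parts[1], as_sup(parts[2])))
}
ft
```

Cells that do not match the pattern, such as "42" here, are left untouched, so the loop can be applied to any generated table that follows the same marking convention.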

When knitting RMarkdown to HTML with RStudio, is it possible to view directly in browser, instead of previewing in a window?

I often work with RMarkdown documents which are heavy on math, such as for example:
---
title: "Just a test"
author: "Yours Truly"
date: '`r Sys.Date()`'
output:
  html_document:
    fig_caption: yes
---
```{r setup, include=FALSE}
library(knitr)
opts_chunk$set(echo = FALSE,
cache = TRUE,
out.width = "75%",
fig.align = "center")
```
## Classical multiple linear regression
A common question in Data Science/Statistics is: how does a certain quantity $y$ depend on other quantities $x_1,\dots,x_p$? Generally, we are interested in $p(y|\mathbf{x})$, the conditional distribution of $y$ given $\mathbf{x}=(x_1,\dots,x_p)$. The simplest and perhaps most widely used model for $p(y|\mathbf{x})$ assumes that, given $\mathbf{x}$, $y$ is normally distributed, with a constant variance $\sigma^2$ and a mean which is a linear function of a parameter vector $\boldsymbol{\beta}=(\beta_0,\beta_1,\dots,\beta_p)$
$$\mathbb{E}[y|\mathbf{x}]=\boldsymbol{\beta}^T\cdot(1,\mathbf{x})=\beta_0+\sum_{j=1}^p\beta_j x_j$$
When I knit to HTML, RStudio will preview this to a window. To see the HTML in a browser I click on "View in browser":
Isn't there a way to directly view the HTML in a browser after knitting?

How to put entire datatable onto html report? (or at least left align)

I have several wide tables that should fit onto an html report, but I don't know how to do it.
Consider the following example. It's rather silly I know, because I could chop the digits off, but many of my tables have string columns that are about this long and cannot be chopped:
---
title: "DT Fitting"
output: html_document
---
```{r testTable, fig.align = 'left', fig.width = 6}
DT::datatable(datasets::euro.cross)
```
It renders an html report that looks like this:
Notice that I've tried using fig.align and fig.width to align or shrink the table, but they don't seem to work. Does anyone know how to put this single table onto the page so that it is completely visible?
It looks like a previous SO post captures this.
This allows you to set the width via options(width = some number). This doesn't seem to be the ideal solution if you have multiple wide tables.
Another option to consider is fixing columns when setting up the datatables and enabling scrolling. Check out Section 5. It lets you fix the first and last columns and scroll the columns in between.
Based on the Datatables link:
```{r}
DT::datatable(datasets::euro.cross,
extensions = 'FixedColumns',
options = list(
dom = 't',
scrollX = TRUE,
fixedColumns = list(leftColumns = 2, rightColumns = 1))
)
```

knitr/rmarkdown - reducing html file size

I want to produce an html document using knitr/rmarkdown. Currently, the file is over 20MB and I'm trying to find a way to reduce it. The large file size is probably due to my plots which have a lot of points in them.
If I change my output type to pdf, I can get it down to 1.7MB. I'm wondering if there is a way to reduce my file while keeping it as a html.
EDIT: Here's a minimal working example which I did in RStudio.
---
title: "Untitled"
author: "My Name"
date: "September 7, 2015"
output: html_document
---
```{r}
library(ggplot2)
knitr::opts_chunk$set(dev='svg')
```
```{r}
set.seed(1)
mydf <- data.frame(x=rnorm(2e4),y=rnorm(2e4))
ggplot(mydf, aes(x,y)) + geom_point(alpha=0.6)
```
I also noticed that if I have too many observations, the plot doesn't get generated at all. I just get an empty box with a question mark in the output.
```{r}
set.seed(2)
mydf <- data.frame(x=rnorm(5e4),y=rnorm(5e4))
ggplot(mydf, aes(x,y)) + geom_point(alpha=0.6)
# ...plot doesn't appear in output
```
Following the suggestion of @daroczig to use the "dpi" knitr chunk option, I modified your code as follows (see below).
You had set the dev chunk option equal to "svg", which produces very large vector graphics files, especially for images made up of many elements (points, lines, etc.)
I set the dev chunk option back equal to "png", which is the default raster graphics format for HTML output. So you don't need to touch it at all. Keeping the dev chunk option equal to "png" dramatically reduces the HTML output file size.
I set the dpi chunk option equal to 36 (72 is the default), to lower the image resolution, and decrease the HTML output file size further.
I set the out.width and out.height chunk options equal to "600px", to increase the image dimensions.
You can change the dpi, out.width, and out.height options, until you get the HTML output file size and the image dimension to what you want. There's a trade-off between output file size and image resolution.
After knitting the code, I got an HTML output file size equal to 653kB, even when plotting 5e4 data points.
---
title: "Change size of output HTML file by reducing resolution of plot image"
author: "My Name"
date: "September 7, 2015"
output: html_document
---
```{r}
# load ggplot2 silently
suppressWarnings(library(ggplot2))
# chunk option dev="svg" produces very large vector graphics files
knitr::opts_chunk$set(dev="svg")
# chunk option dev="png" is the default raster graphics format for HTML output
knitr::opts_chunk$set(dev="png")
```
```{r, dpi=36, out.width="600px", out.height="600px"}
# chunk option dpi=72 is the default resolution
set.seed(1)
mydf <- data.frame(x=rnorm(5e4),y=rnorm(5e4))
ggplot(mydf, aes(x,y)) + geom_point(alpha=0.6)
```
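The reason vector output gets so large for dense scatterplots can be checked outside knitr. This sketch uses only base R; plot_size() is a hypothetical helper, and the pdf device stands in for svg only because it is available in every R build. Each point is stored as a separate drawing instruction, so the file grows with the number of points.

```r
# Hypothetical helper: draw n random points to a temporary file with the
# given graphics device and return the file size in bytes.
plot_size <- function(n, device, ext) {
  path <- tempfile(fileext = ext)
  device(path)
  on.exit(unlink(path), add = TRUE)  # clean up the temporary file
  set.seed(1)
  plot(rnorm(n), rnorm(n))
  dev.off()
  file.size(path)
}

vector_small <- plot_size(1e2, pdf, ".pdf")
vector_large <- plot_size(2e4, pdf, ".pdf")

# The vector file grows with the number of points drawn
c(small = vector_small, large = vector_large)
```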
To prevent scatterplots with many points from blowing up the size of your vector graphics (and accordingly the HTML output), you can use geom_point_rast() from the ggrastr package. Eat the cake and have it too!
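A minimal sketch of that approach, assuming both ggplot2 and ggrastr are installed; the raster.dpi value of 72 is an arbitrary choice for illustration, not a recommendation from the original answer.

```r
library(ggplot2)
library(ggrastr)  # assumed to be installed, e.g. from CRAN

set.seed(1)
mydf <- data.frame(x = rnorm(5e4), y = rnorm(5e4))

# Only the point layer is rasterized; axes, labels and text stay vector
p <- ggplot(mydf, aes(x, y)) +
  geom_point_rast(alpha = 0.6, raster.dpi = 72)
p
```

This matters mainly when the chunk uses a vector device such as dev='svg'; with the default png device the whole figure is already a raster image.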