How to reduce image size in Sweave?

\documentclass[a4paper]{article}
\title{Sweave Example 1}
\author{Friedrich Leisch}
\begin{document}
\maketitle
In this example we embed parts of the examples from the
\texttt{kruskal.test} help page into a \LaTeX{} document:
<<>>=
data(airquality)
# kruskal.test() is in the stats package; the old ctest package was merged into it
kruskal.test(Ozone ~ Month, data = airquality)
@
which shows that the location parameter of the Ozone
distribution varies significantly from month to month. Finally we
include a boxplot of the data:
\begin{center}
<<fig=TRUE,echo=FALSE>>=
boxplot(Ozone ~ Month, data = airquality) ## how do I reduce the image size so that it fits in the window?
@
\end{center}
\end{document}

Try knitr instead of Sweave. See the out.width and out.height chunk options.
Your example would look like this:
\documentclass[a4paper]{article}
\title{Sweave Example 1}
\author{Friedrich Leisch}
\begin{document}
\maketitle
In this example we embed parts of the examples from the
\texttt{kruskal.test} help page into a \LaTeX{} document:
<<>>=
data(airquality)
# kruskal.test() is in the stats package; the old ctest package was merged into it
kruskal.test(Ozone ~ Month, data = airquality)
@
which shows that the location parameter of the Ozone
distribution varies significantly from month to month. Finally we
include a boxplot of the data:
<<plot,fig.align="center",out.width="0.8\\linewidth",echo=FALSE>>=
boxplot(Ozone ~ Month, data = airquality) ## out.width above scales the figure to 80% of the line width
@
\end{document}

Related

knitr/rmarkdown - reducing html file size

I want to produce an html document using knitr/rmarkdown. Currently, the file is over 20MB and I'm trying to find a way to reduce it. The large file size is probably due to my plots which have a lot of points in them.
If I change my output type to PDF, I can get it down to 1.7MB. I'm wondering if there is a way to reduce the file size while keeping it as HTML.
EDIT: Here's a minimal working example, which I made in RStudio.
---
title: "Untitled"
author: "My Name"
date: "September 7, 2015"
output: html_document
---
```{r}
library(ggplot2)
knitr::opts_chunk$set(dev='svg')
```
```{r}
set.seed(1)
mydf <- data.frame(x=rnorm(2e4),y=rnorm(2e4))
ggplot(mydf, aes(x,y)) + geom_point(alpha=0.6)
```
I also noticed that if I have too many observations, the plot doesn't get generated at all. I just get an empty box with a question mark in the output.
```{r}
set.seed(2)
mydf <- data.frame(x=rnorm(5e4),y=rnorm(5e4))
ggplot(mydf, aes(x,y)) + geom_point(alpha=0.6)
# ...plot doesn't appear in output
```
Following the suggestion of @daroczig to use the dpi knitr chunk option, I modified your code as follows (see below).
You had set the dev chunk option to "svg", which produces very large vector graphics files, especially for images made up of many elements (points, lines, etc.).
I set the dev chunk option back to "png", which is the default raster graphics format for HTML output, so you don't need to set it at all. Keeping dev = "png" dramatically reduces the HTML output file size.
I set the dpi chunk option to 36 (the default is 72) to lower the image resolution and decrease the HTML output file size further.
I set the out.width and out.height chunk options to "600px" to enlarge the displayed image. These options only scale the image in the page; they do not add pixels.
You can change the dpi, out.width, and out.height options, until you get the HTML output file size and the image dimension to what you want. There's a trade-off between output file size and image resolution.
After knitting the code, I got an HTML output file size equal to 653kB, even when plotting 5e4 data points.
---
title: "Change size of output HTML file by reducing resolution of plot image"
author: "My Name"
date: "September 7, 2015"
output: html_document
---
```{r}
# load ggplot2 silently
suppressWarnings(library(ggplot2))
# chunk option dev="svg" produces very large vector graphics files
knitr::opts_chunk$set(dev="svg")
# chunk option dev="png" is the default raster graphics format for HTML output
knitr::opts_chunk$set(dev="png")
```
```{r, dpi=36, out.width="600px", out.height="600px"}
# chunk option dpi=72 is the default resolution
set.seed(1)
mydf <- data.frame(x=rnorm(5e4),y=rnorm(5e4))
ggplot(mydf, aes(x,y)) + geom_point(alpha=0.6)
```
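The trade-off above can be made concrete with a little arithmetic: a PNG's pixel dimensions are its size in inches times dpi, so halving dpi quarters the pixel count, while out.width/out.height only stretch the result in the browser. The sketch below only does that arithmetic (it does not call knitr), and the 7 x 5 inch figure size is an assumption; substitute your own fig.width and fig.height. The helper names are hypothetical.

```python
def png_pixel_size(fig_width_in, fig_height_in, dpi):
    """Pixel dimensions of a raster (PNG) figure: inches times dots per inch."""
    return round(fig_width_in * dpi), round(fig_height_in * dpi)


def display_scale(pixels, out_px):
    """Factor by which the browser stretches (>1) or shrinks (<1) the image."""
    return out_px / pixels


# Assumed figure size in inches; replace with your chunk's fig.width / fig.height.
FIG_W, FIG_H = 7, 5

for dpi in (72, 36):
    w, h = png_pixel_size(FIG_W, FIG_H, dpi)
    print(f"dpi={dpi}: {w}x{h} px ({w * h} pixels), "
          f"shown at 600x600 px: x{display_scale(w, 600):.2f} wide, "
          f"x{display_scale(h, 600):.2f} tall")
```

For the assumed size this gives 504x360 px at dpi=72 and 252x180 px at dpi=36, a quarter of the pixels. The actual PNG byte size will not shrink by exactly four, because compression depends on the image content. The larger stretch factor at dpi=36 is the resolution cost. Also, if the figure is not square, fixing out.width and out.height to the same value distorts its aspect ratio; setting only out.width lets the browser keep the proportions.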
To prevent scatterplots with many points from blowing up the size of your vector graphics (and therefore the HTML output), you can use geom_point_rast() from the ggrastr package, which rasterizes only the points layer while axes and text stay vector graphics. You can have your cake and eat it too!

HTML knit - RMarkdown including block of white space

I have been working on journaling the visualization of some spatial data using raster and RMarkdown, but I am having a problem with a large block of white space above each figure. Here is the RMarkdown code (somewhat simplified):
```{r global_options, include=FALSE}
knitr::opts_chunk$set(fig.width=12, fig.height=8, echo=FALSE,
warning=FALSE, message=FALSE)
```
```{r r-packages}
library(maptools)
library(raster)
library(rgdal)
```
### Description of data
Data are taken from the National Land Cover Database - 2011 and represent land cover at a 30 m x 30 m resolution.
Location of data: [National Land Cover Database - 2011](http://gisdata.usgs.gov/TDDS/DownloadFile.php?TYPE=nlcd2006&FNAME=nlcd_2006_landcover_2011_edition_2014_10_10.zip)
### Import raster file for US landcover and shapefile for state borders and counties
```{r Import raster file for us landcover}
rfile <- '~/Documents/Data/nlcd_2006_landcover_2011_edition_2014_10_10/nlcd_2006_landcover_2011_edition_2014_10_10.img' #location of raster data
r1 <- raster(rfile)
##Import shapefile for state borders
statepath <- '~/Documents/Data/'
setwd(statepath)
shp1 <- readOGR(".", "states")
##Transform shapefile to fit raster projection
shp1 <- spTransform(shp1, r1@crs)
##Remove Hawaii and Alaska, which are not in the raster image
shp1.sub <- c("Hawaii","Alaska")
states.sub <- shp1[!as.character(shp1$STATE_NAME) %in% shp1.sub, ]
##Import county data
#data source: ftp://ftp2.census.gov/geo/tiger/TIGER2011/COUNTY/tl_2011_us_county.zip
countypath <- '~/Documents/Data/tl_2011_us_county'
setwd(countypath)
shp2 <- readOGR(".", "tl_2011_us_county")
##Transform shapefile to fit raster projection
counties <- spTransform(shp2, r1@crs)
counties.sub <- counties[as.character(counties$STATEFP) %in% states.sub$STATE_FIPS, ]
```
Raster plot of US with state and county border overlays
```{r plot landcover with state borders}
#Plot state borders over raster
plot(r1)
plot(counties.sub, border = "darkgrey",lwd=.65,add=T)
plot(states.sub,border = "darkblue",add=T)
```
Raster cropped and masked to extent of California
```{r crop raster to a single state (California)}
shp.sub <- c("California")
shp.ca <- states.sub[as.character(states.sub$STATE_NAME) %in% shp.sub, ]
r1.crop <- crop(r1, extent(shp.ca))
plot(r1.crop)
```
Everything runs fine, but when the markdown is output to HTML, a large block of white space is included as well ([here's the published RPub](http://rpubs.com/pbwilliams/80167); now solved). I think this is a raster problem, as I haven't had this issue with figures made with, for example, ggplot.
I have been able to temporarily fix this by shrinking the image down, but anytime I enlarge the picture to anything reasonable, the extra space is added. If anyone knows how to fix this, it would be greatly appreciated.
As suggested in the comments, setting the chunk option fig.keep = 'last' should fix this particular problem: each code chunk seems to produce two plots, the first of which is blank, and you only want to keep the last one.

Plotting a histogram using a csv file

I have a csv file with the following format.
Label 1, 20
Label 2, 10
Label 3, 30
.
.
.
LabelN, 5
How do I plot the second column using the labels given in the csv file as labels on the x-axis?
(Something like this, where 1891-1900 is a label)
EDIT:
I found these questions, which are quite similar to mine:
Plotting word frequency histogram using gnuplot
Gnuplot xticlabels with several lines
After trying the commands given in answer 1:
set xtics border in scale 1,0.5 nomirror rotate by -90 offset character 0, 0, 0
plot "data.txt" using 2:xticlabels(1) with histogram
I'm getting a not so clean histogram because the number of labels is quite large. I've tried the formatting given in answer 2. Can anyone suggest a way to get a cleaner histogram?
You have several options:
1) Plot only the important labels (extremes, mean, etc.).
2) Show only every 5th label or so, if the labels form a series.
3) Split your graph, if you must plot every single label.
Case 2) seems to apply here, so dropping some of the labels before plotting will make the plot look cleaner.
You can pre-process the file to keep only every 5th label (blanking the others) using something like the following script:
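# Keep every 5th label and blank out the rest (Python 3).
# Reads d1.txt and writes d2.txt, the file plotted below.
with open("d1.txt") as src, open("d2.txt", "w") as dst:
    for line_number, line in enumerate(src):
        if not line.strip():
            continue  # skip blank lines
        label, value = line.rstrip("\n").split(",", 1)
        if line_number % 5 == 0:
            dst.write(f"{label},{value}\n")
        else:
            dst.write(f",{value}\n")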
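You can now plot the thinned file with an appropriate font size (the data are comma-separated, so tell gnuplot that first):
set datafile separator ","
set xtics border in scale 1,0.5 nomirror rotate by -90 offset character 0, 0, 0
set xtics font ",9"
plot "d2.txt" using 2:xticlabels(1) with histogram title "legend_here"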
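The first option (labelling only the important bars) can be handled with the same kind of preprocessing. A minimal sketch, assuming the rows have already been parsed into (label, value) pairs: keep_extreme_labels is a hypothetical helper that blanks every label except those of the smallest and largest values. It does not handle the mean or other statistics.

```python
def keep_extreme_labels(rows):
    """Blank every label except those of the smallest and largest values.

    rows: list of (label, value) string pairs, e.g. parsed from "Label 1, 20".
    Returns a new list in which only the min and max rows keep their label,
    so gnuplot's xticlabels() draws text only for those two bars.
    """
    values = [float(value) for _, value in rows]
    keep = {values.index(min(values)), values.index(max(values))}
    return [(label if i in keep else "", value)
            for i, (label, value) in enumerate(rows)]


# Sample rows from the question (values already stripped of spaces).
sample = [("Label 1", "20"), ("Label 2", "10"), ("Label 3", "30"), ("LabelN", "5")]
for label, value in keep_extreme_labels(sample):
    print(f"{label},{value}")
```

Writing these lines to a file instead of printing them gives input that the gnuplot commands above can plot unchanged.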

Multiple figures with rhtml and knitr

I have an Rhtml file from which I source an R file.
In this R file I am doing some plots.
p=ggplot(data)
p+geom_line()
Now I can produce one plot after the other, and when running
knit(".Rhtml") I get one figure after the other.
But I would like to have the figures side by side.
(Number of figures varies from report to report).
Is there a way to set an option in the Rhtml file, so that
the figures are arranged side by side (e.g. two or three or four columns).
So, actually it would be something like a par(mfrow).
Use out.width to put figures side by side. Here is a reproducible example:
## Figures side by side
```{r out.width = '50%', echo = FALSE, message = FALSE}
library(ggplot2)
p0 = ggplot(mtcars, aes(wt, mpg)) + geom_point()
p1 = p0 + geom_smooth()
p0
p1
```
EDIT: If you also want your code or messages to show up, add fig.show = "hold" to your chunk options. This makes knitr print the figures after the rest of the chunk's output, so they still appear side by side because out.width = "50%".
See this news from knitr to note when the change was introduced.
Plots can also be combined with the gridExtra package. If you have, e.g., three plots (p1, p2, and p3), this arranges them in a single row:
library(gridExtra)
newPlot <- grid.arrange(p1, p2, p3, ncol = 3)
Have a look at the gridExtra package for more details.

R Markdown HTML Number Figures

Does anyone know how to number the figures in the captions, for HTML format R Markdown script?
For PDF documents, the caption will say something like:
Figure X: Some Caption Text
However, the equivalent caption for the HTML version will simply say:
Some Caption Text
This makes cross-referencing figures by number completely useless.
Here is a minimal example:
---
title: "My Title"
author: "Me"
output:
  pdf_document: default
  html_document: default
---
```{r cars, fig.cap = "An amazing plot"}
plot(cars)
```
```{r cars2, fig.cap = "Another amazing plot"}
plot(cars)
```
I have tried setting toc, fig_caption and number_sections within each of the output formats, but this does not seem to change the result.
The other answers provided are relatively out of date, and this has since been made very easy using the bookdown package. This package provides a number of improvements which includes the built-in numbering of figures across Word, HTML and PDF.
To use bookdown, you first need to install the package with install.packages("bookdown") and then use one of its output formats. For HTML, this is html_document2. Taking your example:
---
title: "My Title"
author: "Me"
date: "1/1/2016"
output: bookdown::html_document2
---
```{r cars, fig.cap = "An amazing plot"}
plot(cars)
```
```{r cars2, fig.cap = "Another amazing plot"}
plot(cars)
```
These figures will be numbered Figure 1 and Figure 2. Provided the code chunk is named and has a caption, we can cross-reference the output using the syntax \@ref(fig:foo), where foo is the name of the chunk, i.e. \@ref(fig:cars). You can learn more about this behaviour here
Further Reading
R Markdown: The Definitive Guide: Chapter 11 provides a great overview of bookdown.
Authoring books with bookdown provides a comprehensive guide to bookdown and is recommended for more advanced details.
So unless someone has a better solution, this is the solution I came up with. There are some flaws with this approach (for example, if the figure/table number depends on the section number, etc.), but for a basic HTML document it works.
Somewhere at the top of your document, run this:
```{r echo=FALSE}
# Determine whether the document is being rendered to HTML
# (knitr::is_html_output() also recognises pandoc targets such as "html4"/"html5")
isHTML = knitr::is_html_output()
# Figure and table caption numbering; for HTML, do it manually
capTabNo = 1; capFigNo = 1;
# Function to add the table number
capTab = function(x){
  if(isHTML){
    x = paste0("Table ",capTabNo,". ",x)
    capTabNo <<- capTabNo + 1
  }; x
}
# Function to add the figure number
capFig = function(x){
  if(isHTML){
    x = paste0("Figure ",capFigNo,". ",x)
    capFigNo <<- capFigNo + 1
  }; x
}
```
Then, later in your document, when you want to plot a figure:
```{r figA, fig.cap=capFig("My Figure Caption")}
library(ggplot2)
base = ggplot(data=data.frame(x=0,y=0),aes(x,y)) + geom_point()
base
```
Replace capFig with capTab in the above if you want a table caption.
We can make use of pandoc-crossref, a filter that allows cross-referencing of figures, tables, sections, and equations and works for all output formats. The easiest way is to cat the figure label (in the form {#fig:figure_label}) after each plot, although this requires echo=FALSE and results='asis'. We can then reference a figure as we would a citation: [@fig:figure_label] produces fig. figure_number by default.
Here is a MWE:
---
output:
  html_document:
    toc: true
    number_sections: true
    fig_caption: true
    pandoc_args: ["-F","pandoc-crossref"]
---
```{r}
knitr::opts_chunk$set(echo=FALSE,results='asis')
```
```{r plot1,fig.cap="This is plot one"}
x <- 1:10
y <- rnorm(10)
plot(x,y)
cat("{#fig:plot1}")
```
As we can see in [@fig:plot1]... whereas [@fig:plot2] shows...
```{r plot2, fig.cap="This is plot two"}
plot(y,x)
cat("{#fig:plot2}")
```
which produces (graphics omitted):
PLOT1
Figure 1: This is plot one
As we can see in fig. 1… whereas fig. 2 shows…
PLOT2
Figure 2: This is plot two
See the pandoc-crossref readme for more options and customizations.
To install pandoc-crossref, assuming you have a Haskell installation:
cabal update
cabal install pandoc-crossref
I solve cross-referencing using a solution similar to that posted by Nicholas above. I use bookdown for some projects but I find that awkward to use for other projects where I just want simple cross-referencing.
I use the following when I am writing a paper with rmarkdown and want it in a standard format for submission to a journal: the figure legends at the end, followed by the tables and then the figures. As I write, I only have a rough idea of the order in which the figures will be referenced in the text. I just want to reference them with a text code like fig:foobar and have the numbers assigned based on order of appearance in the text. When I look at the figure legend list, I'll see what order to put the legends in and will move them around as needed.
Here's my structure.
I have an R package with things I need for papers, like various bibliographies and helper R functions. In that package, I have the following function, which uses some variables defined in the main Rmd environment: .rmdenvir and .refctr.
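ref <- function(useName) {
  require(stringr)
  if(!exists(".refctr")) .refctr <- c(`_` = 0)
  # Already referenced: reuse the number assigned the first time
  if(any(names(.refctr)==useName)) return(.refctr[useName])
  type=str_split(useName,":")[[1]][1]
  # Count existing labels with exactly this prefix; a plain str_detect(names, type)
  # would also count "tabappA:..." labels when type is "tab"
  nObj <- sum(str_detect(names(.refctr), paste0("^", type, ":")))
  useNum <- nObj + 1
  newrefctr <- c(.refctr, useNum)
  names(newrefctr)[length(.refctr) + 1] <- useName
  assign(".refctr", newrefctr, envir=.rmdenvir)
  return(useNum)
}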
It assumes that I name things I want referenced with something like cntname:foo, for example fig:foo. It makes a new counter for each one and I can make up new counters on the fly (while writing) if needed.
In my main Rmd file, I have some set-up lines:
```{r setup_main}
require(myPackageforPapers)
# here is where the variables needed by ref() are defined.
.rmdenvir = environment()
.refctr <- c(`_` = 0)
```
In the text I use the following
You can see what I am trying to show in Figure `r ref("fig:foo")`
and you can see it also in Tables `r ref("tab:foo")`
and A`r ref("tabappA:foobig")`.
to get "You can see what I am trying to show in Figure 1 and you can see it also in Tables 1 and A1." The numbers might not be 1, though; they are determined dynamically. I don't have to use a special function the first time I reference a figure, table or whatever I am counting. ref() figures that out by checking whether the label already exists. If not, it assigns the next number and returns it. So you don't have to use "label" in one place and "ref" in another.
In the course of writing, I might decide that appendix A is getting too big, and that I will split off some of the tables into an appendix B. All I need to do is change the above to
You can see what I am trying to show in Figure `r ref("fig:foo")`
and you can see it also in Tables `r ref("tab:foo")`
and B`r ref("tabappB:foobig")`.
I just specify a new counter name 'tabappB' and the numbers for that are dynamically determined.
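The lazy, per-prefix numbering that ref() performs can be sketched outside R too. This Python version only illustrates the logic (RefCounter is a hypothetical name, not part of any package): the first reference to a label takes the next number for its prefix, and later references reuse it. Prefixes are matched exactly, so tab and tabappB keep separate counters.

```python
class RefCounter:
    """Lazily number labels such as "fig:foo", one independent counter per prefix."""

    def __init__(self):
        self._numbers = {}  # label -> number assigned on first use

    def ref(self, label):
        # A label that was already referenced keeps its number.
        if label in self._numbers:
            return self._numbers[label]
        prefix = label.split(":", 1)[0]
        # Count only labels with exactly this prefix, so "tab" does not
        # also count "tabappB:..." labels.
        used = sum(1 for name in self._numbers if name.split(":", 1)[0] == prefix)
        self._numbers[label] = used + 1
        return used + 1


refs = RefCounter()
print(f"Figure {refs.ref('fig:foo')}, Tables {refs.ref('tab:foo')} "
      f"and B{refs.ref('tabappB:foobig')}")  # prints: Figure 1, Tables 1 and B1
```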
At the end of my Rmd file, I have a figure list that will look like
# Figure Legends
Figure `r ref("fig:foo")`. This is the legend for this figure.
Figure `r ref("fig:foo2")`. This is the legend for another figure.
Then my tables appear like so
```{r print-tablefoo, echo=FALSE}
tablefoo=mtcars
thecap = "Tables appear with a legend while figures do not."
fullcap = paste("Table ", ref("tab:foo"), ". ", thecap, sep="")
knitr::kable(tablefoo, caption=fullcap)
```
and then the figures like so:
```{r fig-foo, echo=FALSE, fig.cap=paste("Figure",ref("fig:foo"))}
plot(1,1)
```
Appendix A is an Rmd file that is included as a child document. It will have tables like
```{r print-tableAfoo, echo=FALSE}
tablefoo=mtcars
thecap = "This is a legend."
fullcap = paste("Table A", ref("tabappA:foobig"), ". ", thecap, sep="")
knitr::kable(tablefoo, caption=fullcap)
```
I do have to add the "A" to get Table A1, but I find it easier if R doesn't think too much for me in terms of labelling my counters. I just want it to return the right number.
The cross-referencing works for HTML, PDF/LaTeX and Word. I'd happily stick with LaTeX solutions, but my co-authors use Word, so I need a solution that works with pandoc and Word. Also, sometimes I want HTML or some other output, and I need a solution that works for any output format rmarkdown supports.