trelliscopejs R package: only one trelliscope figure visible in HTML file

I am having a problem rendering more than one trelliscopejs display in an HTML file created with R Markdown. I'm using self_contained = TRUE in order to render the displays in HTML. The problem is that only the first display is rendered correctly; the rest are rendered as blank spaces. I'm using some examples from the official tutorial. The whole .Rmd file is posted below:
---
title: "Testy trelliscopejs"
author: "balkon16"
date: "22 lipca 2018"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(message = FALSE, warning = FALSE) #suppress ggplot2 warnings
```
```{r}
library(trelliscopejs)
library(ggplot2)
library(dplyr)
```
```{r}
library(tidyr)
library(rbokeh)
d <- mpg %>%
  group_by(manufacturer, class) %>%
  nest() %>%
  mutate(panel = map_plot(data,
    ~ figure(xlab = "City mpg", ylab = "Highway mpg") %>%
      ly_points(cty, hwy, data = .x)))
d
```
```{r}
d %>%
  trelliscope(name = "city_vs_highway_mpg", self_contained = TRUE)
```
```{r}
mpg %>%
  group_by(manufacturer, class) %>%
  summarise(
    mean_city_mpg = cog(mean(cty), desc = "Mean city mpg"),
    mean_hwy_mpg = cog(mean(hwy), desc = "Mean highway mpg"),
    panel = panel(
      figure(xlab = "City mpg", ylab = "Highway mpg",
             xlim = c(7, 37), ylim = c(9, 47)) %>%
        ly_points(cty, hwy,
                  hover = data_frame(model = paste(year, model),
                                     cty = cty, hwy = hwy)))) %>%
  trelliscope(name = "city_vs_highway_mpg", nrow = 1, ncol = 2,
              self_contained = TRUE)
```
```{r}
qplot(x = 0, y = cty, data = mpg, geom = c("boxplot", "jitter")) +
  facet_trelliscope(~ class, ncol = 7, height = 800, width = 200,
                    state = list(sort = list(sort_spec("cty_mean"))),
                    self_contained = TRUE) +
  ylim(7, 37) + theme_bw()
```
It seems that I'm using the newest version of trelliscopejs, as devtools::install_github("hafen/trelliscopejs") returns:
Skipping install of 'trelliscopejs' from a github remote, the SHA1 (4be901e4) has not changed since last install.
Use `force = TRUE` to force installation
Here's my session info:
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250
[3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C
[5] LC_TIME=Polish_Poland.1250
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Cairo_1.5-9 bindrcpp_0.2.2 dplyr_0.7.6 ggplot2_3.0.0.9000
[5] trelliscopejs_0.1.13 rbokeh_0.5.0 tidyr_0.8.1
loaded via a namespace (and not attached):
[1] progress_1.2.0 gistr_0.4.2 tidyselect_0.2.4
[4] purrr_0.2.5 lattice_0.20-35 colorspace_1.3-2
[7] htmltools_0.3.6 yaml_2.1.19 base64enc_0.1-3
[10] utf8_1.1.3 rlang_0.2.1 hexbin_1.27.2
[13] pillar_1.2.2 glue_1.3.0 withr_2.1.2
[16] pryr_0.1.4 bindr_0.1.1 plyr_1.8.4
[19] stringr_1.3.1 munsell_0.4.3 gtable_0.2.0
[22] devtools_1.13.5 htmlwidgets_1.2 memoise_1.1.0
[25] codetools_0.2-15 evaluate_0.10.1 labeling_0.3
[28] knitr_1.20 curl_3.2 Rcpp_0.12.17
[31] scales_0.5.0 backports_1.1.2 checkmate_1.8.5
[34] DistributionUtils_0.5-1 webshot_0.5.0 jsonlite_1.5
[37] hms_0.4.2 digest_0.6.15 stringi_1.1.7
[40] grid_3.5.1 rprojroot_1.3-2 cli_1.0.0
[43] tools_3.5.1 magrittr_1.5 maps_3.3.0
[46] lazyeval_0.2.1.9000 autocogs_0.0.1 tibble_1.4.2
[49] crayon_1.3.4 pkgconfig_2.0.1 rsconnect_0.8.8
[52] prettyunits_1.0.2 assertthat_0.2.0 rmarkdown_1.10
[55] httr_1.3.1 rstudioapi_0.7 R6_2.2.2
[58] mclust_5.4.1 git2r_0.21.0 compiler_3.5.1

I found that the following works. You need to do two things in the facet_trelliscope function:

1. Specify the path argument, giving each facet_trelliscope display its own unique folder.
2. Set self_contained = TRUE.
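Applied to the question's last chunk, that would look something like the following sketch (the folder name "appfiles_class" is an arbitrary choice; the point is that each display in the document gets a different path):

```r
library(ggplot2)
library(trelliscopejs)

qplot(x = 0, y = cty, data = mpg, geom = c("boxplot", "jitter")) +
  facet_trelliscope(~ class, ncol = 7, height = 800, width = 200,
                    state = list(sort = list(sort_spec("cty_mean"))),
                    path = "appfiles_class",  # unique folder for this display
                    self_contained = TRUE) +
  ylim(7, 37) + theme_bw()
```

Repeat with a different path value for each of the other trelliscope chunks.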

R Shiny app loads, but radio buttons do not select values properly

This is my first time using stack overflow so apologies if I do this wrong.
I'm fairly new to coding in R and I'm trying to make a simple Shiny app using a TidyTuesday dataset. I wanted to make a map with points showing the different types of water systems ("water_tech") and radio buttons to choose which type of water system is plotted on the map. I got the app to load without an error message; however, no matter which button is selected, all of the different types of water systems are plotted on the map, not just the one I selected (essentially, the buttons don't work). If anyone has any ideas about what could be causing this, I would greatly appreciate it!
Reproducible code:
### Load Libraries
library(shiny)
#> Warning: package 'shiny' was built under R version 4.0.4
library(shinythemes)
#> Warning: package 'shinythemes' was built under R version 4.0.4
library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.0.5
#> Warning: package 'tibble' was built under R version 4.0.5
#> Warning: package 'tidyr' was built under R version 4.0.5
#> Warning: package 'dplyr' was built under R version 4.0.5
library(here)
#> here() starts at C:/Users/eruks/AppData/Local/Temp/Rtmp2jxqLH/reprex-2a306cec2120-white-boto
library(rnaturalearth)
#> Warning: package 'rnaturalearth' was built under R version 4.0.5
library(rnaturalearthdata)
#> Warning: package 'rnaturalearthdata' was built under R version 4.0.5
library(sf)
#> Warning: package 'sf' was built under R version 4.0.5
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
### Load Data
water <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-05-04/water.csv')
#>
#> -- Column specification --------------------------------------------------------
#> cols(
#> row_id = col_double(),
#> lat_deg = col_double(),
#> lon_deg = col_double(),
#> report_date = col_character(),
#> status_id = col_character(),
#> water_source = col_character(),
#> water_tech = col_character(),
#> facility_type = col_character(),
#> country_name = col_character(),
#> install_year = col_double(),
#> installer = col_character(),
#> pay = col_character(),
#> status = col_character()
#> )
### User Interface
ui <- fluidPage(theme = shinytheme("spacelab"),
  # Application title
  titlePanel("Water Access Points in Africa"),
  # Sidebar with radio buttons for choosing which type of water system
  sidebarLayout(
    sidebarPanel(
      radioButtons(inputId = "water_tech",
                   label = "Water system:",
                   choices = c("Hand Pump", "Hydram", "Kiosk", "Mechanized Pump", "Rope and Bucket", "Tapstand"),
                   selected = "Hand Pump")
    ),
    mainPanel(
      plotOutput("water_plot")
    )
  )
)
server <- function(input, output) {
  water_clean <- water %>%
    drop_na(water_tech) %>%
    mutate(water_tech = ifelse(str_detect(water_tech, "Hand Pump"), "Hand Pump", water_tech),
           water_tech = ifelse(str_detect(water_tech, "Mechanized Pump"), "Mechanized Pump", water_tech),
           water_tech = as.factor(water_tech)) %>%
    select(2, 3, 7, 9) %>%
    filter(lon_deg > -25 & lon_deg < 52 & lat_deg > -40 & lat_deg < 35)
  africa <- ne_countries(scale = "medium", returnclass = "sf", continent = "Africa")
  rwater <- reactive({
    water_clean %>%
      filter(water_tech == input$water_tech)
  })
  output$water_plot <- renderPlot({
    rwater() %>%
      ggplot() +
      geom_sf(data = africa,
              fill = "#ffffff") +
      geom_point(data = water_clean,
                 aes(x = lon_deg,
                     y = lat_deg,
                     color = water_tech)) +
      theme_bw() +
      theme(panel.grid = element_blank(),
            axis.text = element_blank(),
            axis.title = element_blank(),
            axis.ticks = element_blank(),
            panel.border = element_blank()) +
      labs(x = "",
           y = "")
  })
}
# Run the application
shinyApp(ui = ui, server = server)
#> PhantomJS not found. You can install it with webshot::install_phantomjs(). If it is installed, please make sure the phantomjs executable can be found via the PATH variable.
Created on 2021-05-05 by the [reprex package](https://reprex.tidyverse.org) (v2.0.0)
Thank you :)
rwater() has no effect in this code:
rwater() %>%
  ggplot() +
  geom_sf(data = africa,
          fill = "#ffffff") +
  geom_point(data = water_clean,
             aes(x = lon_deg,
                 y = lat_deg,
                 color = water_tech))
because you pass water_clean directly to geom_point(), which overrides the filtered data piped in from rwater().
I think you want:
ggplot() +
  geom_sf(data = africa,
          fill = "#ffffff") +
  geom_point(data = rwater(),
             aes(x = lon_deg,
                 y = lat_deg,
                 color = water_tech))

Extracting html text using R - can't access some nodes

I have a large number of water take permits that are available online and I want to extract some data from them. For example:
url <- "https://www.ecan.govt.nz/data/consent-search/consentdetails/CRC000002.1"
I don't know HTML at all, but I have been plugging away with help from Google and a friend. I can get to some of the nodes without any issues using the XPath or CSS selector; for instance, to get to the title:
library(rvest)
url %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="main"]/div/h1') %>%
  html_text()
[1] "Details for CRC000002.1"
Or using the css selectors:
url %>%
  read_html() %>%
  html_nodes(css = "#main") %>%
  html_nodes(css = "div") %>%
  html_nodes(css = "h1") %>%
  html_text()
[1] "Details for CRC000002.1"
So far, so good, but the information I actually want is buried a bit deeper and I can't seem to get to it. For instance, the client name field ("Killermont Station Limited", in this case) has this xpath:
clientxpath <- '//*[@id="main"]/div/div[1]/div/table/tbody/tr[1]/td[2]'
url %>%
  read_html() %>%
  html_nodes(xpath = clientxpath) %>%
  html_text()
character(0)
The CSS selectors get quite convoluted, but I get the same result. The help file for html_nodes() says:
# XPath selectors ---------------------------------------------
# chaining with XPath is a little trickier - you may need to vary
# the prefix you're using - // always selects from the root noot
# regardless of where you currently are in the doc
But it doesn't give me any clues on what to use as an alternative prefix in the XPath (it might be obvious if I knew HTML).
My friend pointed out that some of the document is in javascript (ajax), which may be part of the problem too. That said, the bit I'm trying to get to above shows up in the html, but it is within a node called 'div.ajax-block'.
css selectors: #main > div > div.ajax-block > div > table > tbody > tr:nth-child(1) > td:nth-child(4)
Can anyone help? Thanks!
It's super disconcerting that most, if not all, SO R contributors default to "use a heavyweight third-party dependency" in curt "answers" when it comes to scraping. 99% of the time you don't need Selenium. You just need to exercise the little gray cells.
First, a big clue that the page loads content asynchronously is the wait-spinner that appears. The second one is in your snippet, where the div actually has part of a selector name with ajax in it. Tell-tale signs that XHR requests are in play.
If you open Developer Tools in your browser, reload the page, then go to Network and the XHR tab, you'll see that most of the "real" data on the page is loaded dynamically. We can write httr calls that mimic the browser calls.
However…
We first need to make one GET call to the main page to prime some cookies, which will be carried over for us, and then find a pre-generated session token that's used to prevent abuse of the site. It's defined using JavaScript, so we'll use the V8 package to evaluate it. We could have just used regular expressions to find the string. Do whatever you like.
library(httr)
library(rvest)
library(dplyr)
library(V8)
ctx <- v8() # we need this to eval some javascript
# Prime Cookies -----------------------------------------------------------
res <- httr::GET("https://www.ecan.govt.nz/data/consent-search/consentdetails/CRC000002.1")
httr::cookies(res)
## domain flag path secure expiration name
## 1 .ecan.govt.nz TRUE / FALSE 2019-11-24 11:46:13 visid_incap_927063
## 2 .ecan.govt.nz TRUE / FALSE <NA> incap_ses_148_927063
## value
## 1 +p8XAM6uReGmEnVIdnaxoxWL+VsAAAAAQUIPAAAAAABjdOjQDbXt7PG3tpBpELha
## 2 nXJSYz8zbCRj8tGhzNANAhaL+VsAAAAA7JyOH7Gu4qeIb6KKk/iSYQ==
pg <- httr::content(res)
html_node(pg, xpath = ".//script[contains(., '_monsido')]") %>%
  html_text() %>%
  ctx$eval()
## [1] "2"
monsido_token <- ctx$get("_monsido")[1,2]
Here's the searchlist (which is, indeed, empty):
httr::VERB(
  verb = "POST", url = "https://www.ecan.govt.nz/data/document-library/searchlist",
  httr::add_headers(
    Referer = "https://www.ecan.govt.nz/data/consent-search/consentdetails/CRC000002.1",
    `X-Requested-With` = "XMLHttpRequest",
    TE = "Trailers"
  ), httr::set_cookies(
    monsido = monsido_token
  ),
  body = list(
    name = "CRC000002.1",
    pageSize = "999999"
  ),
  encode = "form"
) -> res
httr::content(res)
## NULL ## <<=== this is OK as there is no response
Here's the "Consent Overview" section:
httr::GET(
  url = "https://www.ecan.govt.nz/data/consent-search/consentoverview/CRC000002.1",
  httr::add_headers(
    Referer = "https://www.ecan.govt.nz/data/consent-search/consentdetails/CRC000002.1",
    Authority = "www.ecan.govt.nz",
    `X-Requested-With` = "XMLHttpRequest"
  ),
  httr::set_cookies(
    monsido = monsido_token
  )
) -> res
httr::content(res) %>%
  html_table() %>%
  glimpse()
## List of 1
## $ :'data.frame': 5 obs. of 4 variables:
## ..$ X1: chr [1:5] "RMA Authorisation Number" "Consent Location" "To" "Commencement Date" ...
## ..$ X2: chr [1:5] "CRC000002.1" "Manuka Creek, KILLERMONT STATION" "To take water from Manuka Creek at or about map reference NZMS 260 H39:5588-2366 for irrigation of up to 40.8 hectares." "29 Apr 2010" ...
## ..$ X3: chr [1:5] "Client Name" "State" "To take water from Manuka Creek at or about map reference NZMS 260 H39:5588-2366 for irrigation of up to 40.8 hectares." "29 Apr 2010" ...
## ..$ X4: chr [1:5] "Killermont Station Limited" "Issued - Active" "To take water from Manuka Creek at or about map reference NZMS 260 H39:5588-2366 for irrigation of up to 40.8 hectares." "29 Apr 2010" ...
Here are the "Consent Conditions":
httr::GET(
  url = "https://www.ecan.govt.nz/data/consent-search/consentconditions/CRC000002.1",
  httr::add_headers(
    Referer = "https://www.ecan.govt.nz/data/consent-search/consentdetails/CRC000002.1",
    Authority = "www.ecan.govt.nz",
    `X-Requested-With` = "XMLHttpRequest"
  ),
  httr::set_cookies(
    monsido = monsido_token
  )
) -> res
httr::content(res) %>%
  as.character() %>%
  substring(1, 300) %>%
  cat()
## <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
## <html><body><div class="consentDetails">
## <ul class="unstyled-list">
## <li>
##
##
## <strong class="pull-left">1</strong> <div class="pad-left1">The rate at which wa
Here's the "Consent Related":
httr::GET(
  url = "https://www.ecan.govt.nz/data/consent-search/consentrelated/CRC000002.1",
  httr::add_headers(
    Referer = "https://www.ecan.govt.nz/data/consent-search/consentdetails/CRC000002.1",
    Authority = "www.ecan.govt.nz",
    `X-Requested-With` = "XMLHttpRequest"
  ),
  httr::set_cookies(
    monsido = monsido_token
  )
) -> res
httr::content(res) %>%
  as.character() %>%
  substring(1, 300) %>%
  cat()
## <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
## <html><body>
## <p>There are no related documents.</p>
##
##
##
##
##
## <div class="summary-table-wrapper">
## <table class="summary-table left">
## <thead><tr>
## <th>Relationship</th>
## <th>Recor
Here's the "Workflow":
httr::GET(
  url = "https://www.ecan.govt.nz/data/consent-search/consentworkflow/CRC000002.1",
  httr::add_headers(
    Referer = "https://www.ecan.govt.nz/data/consent-search/consentdetails/CRC000002.1",
    Authority = "www.ecan.govt.nz",
    `X-Requested-With` = "XMLHttpRequest"
  ),
  httr::set_cookies(
    monsido = monsido_token
  )
) -> res
httr::content(res)
## {xml_document}
## <html>
## [1] <body><p>No workflow</p></body>
Here are the "Consent Flow Restrictions":
httr::GET(
  url = "https://www.ecan.govt.nz/data/consent-search/consentflowrestrictions/CRC000002.1",
  httr::add_headers(
    Referer = "https://www.ecan.govt.nz/data/consent-search/consentdetails/CRC000002.1",
    Authority = "www.ecan.govt.nz",
    `X-Requested-With` = "XMLHttpRequest"
  ),
  httr::set_cookies(
    monsido = monsido_token
  )
) -> res
httr::content(res) %>%
  as.character() %>%
  substring(1, 300) %>%
  cat()
## <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
## <html><body><div class="summary-table-wrapper">
## <table class="summary-table left">
## <thead>
## <th colspan="2">Low Flow Site</th>
## <th>Todays Flow <span class="lower">(m3/s)</span>
## </th>
You still need to parse HTML but now you can do it all with just plain R packages.
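For example, the "Consent Overview" response parses into a label/value table (labels in columns X1 and X3, values in X2 and X4, per the glimpse() output above). One way to tidy it, assuming res still holds the consentoverview response, is a sketch like this:

```r
library(httr)
library(rvest)
library(dplyr)

# The overview page is a single HTML table of label/value pairs
overview <- httr::content(res) %>%
  html_table() %>%
  .[[1]]

# Stack the two label/value column pairs into one lookup table
lookup <- bind_rows(
  overview %>% select(field = X1, value = X2),
  overview %>% select(field = X3, value = X4)
)

lookup$value[lookup$field == "Client Name"]
## [1] "Killermont Station Limited"
```

The same pattern should work for the other table-shaped endpoints; the list-shaped ones (like "Consent Conditions") need html_nodes() on the li elements instead.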

R Webscraping RCurl and httr Content

I'm learning a bit about web scraping and I have a question about two packages (httr and RCurl). I'm trying to get a journal's code (ISSN) from the ResearchGate website and I ran into a situation: when extracting the content from the site with both packages, I get the ISSN with RCurl, but with httr my function returns NULL. Could anyone tell me why? I expected both to work. The code is below.
library(rvest)
library(httr)
library(RCurl)
library(stringr) # needed for str_to_title() and str_split()
url <- "https://www.researchgate.net/journal/0730-0301_Acm_Transactions_On_Graphics"
########
# httr #
########
conexao <- GET(url)
conexao_status <- http_status(conexao)
conexao_status
content(conexao, as = "text", encoding = "utf-8") %>% read_html() -> webpage1
ISSN <- webpage1 %>%
  html_nodes(xpath = '//*/div/div[2]/div[1]/div[1]/table[2]/tbody/tr[7]/td') %>%
  html_text %>%
  str_to_title() %>%
  str_split(" ") %>%
  unlist
ISSN
########
# RCurl #
########
options(RCurlOptions = list(verbose = FALSE,
                            capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"),
                            ssl.verifypeer = FALSE))
webpage <- getURLContent(url) %>% read_html()
ISSN <- webpage %>%
  html_nodes(xpath = '//*/div/div[2]/div[1]/div[1]/table[2]/tbody/tr[7]/td') %>%
  html_text %>%
  str_to_title() %>%
  str_split(" ") %>%
  unlist
ISSN
sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Portuguese_Brazil.1252  LC_CTYPE=Portuguese_Brazil.1252
[3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Brazil.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] testit_0.7      dplyr_0.7.4     progress_1.1.2  readxl_1.1.0
[5] stringr_1.3.0   RCurl_1.95-4.10 bitops_1.0-6    httr_1.3.1
[9] rvest_0.3.2     xml2_1.2.0      jsonlite_1.5

loaded via a namespace (and not attached):
[1] Rcpp_0.12.16      bindr_0.1.1       magrittr_1.5      R6_2.2.2
[5] rlang_0.2.0       tools_3.5.0       yaml_2.1.19       assertthat_0.2.0
[9] tibble_1.4.2      bindrcpp_0.2.2    curl_3.2          glue_1.2.0
[13] stringi_1.1.7     pillar_1.2.2      compiler_3.5.0    cellranger_1.1.0
[17] prettyunits_1.0.2 pkgconfig_2.0.1
Because the content type is JSON and not HTML, you can't use read_html() on it:
> conexao
Response [https://www.researchgate.net/journal/0730-0301_Acm_Transactions_On_Graphics]
Date: 2018-06-02 03:15
Status: 200
Content-Type: application/json; charset=utf-8
Size: 328 kB
Use fromJSON() instead to extract the ISSN:
library(jsonlite)
result <- fromJSON(content(conexao, as = "text", encoding = "utf-8") )
result$result$data$journalFullInfo$data$issn
result:
> result$result$data$journalFullInfo$data$issn
[1] "0730-0301"

Failing to merge html files in R

So I have two HTML pages, html_1.html and html_2.html, and I would like to stack them one on top of the other in R. How do I do that?
example:
library(dygraphs)
m1 = dygraph(nhtemp, main = "New Haven Temperatures") %>%
  dyRangeSelector(dateWindow = c("1920-01-01", "1960-01-01"))
m2 = dygraph(nhtemp, main = "New Haven Temperatures") %>%
  dyRangeSelector(dateWindow = c("1920-01-01", "1960-01-01"))
library(htmltools)
save_html(m1, file = 'm1.html')
save_html(m2, file = 'm2.html')
##Now load and merge m1.html and m2.html
The simplest way is to use an R markdown document:
---
title: ""
output: html_document
---
```{r echo=FALSE, message=FALSE, warning=FALSE}
library(dygraphs)
dygraph(nhtemp, main = "New Haven Temperatures", elementId = "a") %>%
  dyRangeSelector(dateWindow = c("1920-01-01", "1960-01-01"))
dygraph(nhtemp, main = "New Haven Temperatures", elementId = "b") %>%
  dyRangeSelector(dateWindow = c("1920-01-01", "1960-01-01"))
```
That takes care of many complex things for you.
The heavyweight way is to build the page on your own without getting into the gnarly details of widget javascript dependencies:
library(dygraphs)
library(htmlwidgets)
library(htmltools)
w1 <- dygraph(nhtemp, main = "New Haven Temperatures", elementId = "a") %>%
  dyRangeSelector(dateWindow = c("1920-01-01", "1960-01-01"))
w2 <- dygraph(nhtemp, main = "New Haven Temperatures", elementId = "b") %>%
  dyRangeSelector(dateWindow = c("1920-01-01", "1960-01-01"))
saveWidget(w1, "w1.html")
saveWidget(w2, "w2.html")
w1_src <- sprintf("data:text/html;base64,%s", openssl::base64_encode(rawToChar(readBin("w1.html", "raw", file.size("w1.html")))))
w2_src <- sprintf("data:text/html;base64,%s", openssl::base64_encode(rawToChar(readBin("w2.html", "raw", file.size("w2.html")))))
tags$html(
  tags$body(
    tags$iframe(src = w1_src, seamless = "", frameborder = "0", allowtransparency = "true", scrolling = "no", style = "width:100%;height:400px"),
    tags$iframe(src = w2_src, seamless = "", frameborder = "0", allowtransparency = "true", scrolling = "no", style = "width:100%;height:400px")
  )
) %>%
  save_html("bothwidgets.html")
You can't just save_html() a widget, since widgets depend on components that get automagically incorporated for you. You need to use iframes in the second approach unless you want to deal with widget dependency de-duplication and proper component inclusion on your own, which would give a lighter-weight document.
In this case, the difference isn't too bad, but it is still substantial: the first output file is 1.3 MB, the second is 2 MB.
Note that you'll likely need to size the iframes better than I did in a production environment.

R - MLR - Classifier Calibration - Benchmark Results

I've run a benchmark experiment with nested cross validation (tuning + performance measurement) for a classification problem and would like to create calibration charts.
If I pass a benchmark result object to generateCalibrationData, what does plotCalibration do? Is it averaging? If so how?
Does it make sense to have an aggregate = FALSE option to understand variability across folds as per generateThreshVsPerfData for ROC curves?
In response to @Zach's request for a reproducible example, I (the OP) edited my original post as follows:
Edit: Reproducible Example
# Practice Data
library("mlr")
library("ROCR")
library(mlbench)
data(BreastCancer)
dim(BreastCancer)
levels(BreastCancer$Class)
head(BreastCancer)
BreastCancer <- BreastCancer[, -c(1, 6, 7)]
BreastCancer$Cl.thickness <- as.factor(unclass(BreastCancer$Cl.thickness))
BreastCancer$Cell.size <- as.factor(unclass(BreastCancer$Cell.size))
BreastCancer$Cell.shape <- as.factor(unclass(BreastCancer$Cell.shape))
BreastCancer$Marg.adhesion <- as.factor(unclass(BreastCancer$Marg.adhesion))
head(BreastCancer)
# Define Nested Cross-Validation Strategy
cv.inner <- makeResampleDesc("CV", iters = 2, stratify = TRUE)
cv.outer <- makeResampleDesc("CV", iters = 6, stratify = TRUE)
# Define Performance Measures
perf.measures <- list(auc, mmce)
# Create Task
bc.task <- makeClassifTask(id = "bc",
                           data = BreastCancer,
                           target = "Class",
                           positive = "malignant")
# Create Tuned KSVM Learner
ksvm <- makeLearner("classif.ksvm",
                    predict.type = "prob")
ksvm.ps <- makeParamSet(makeDiscreteParam("C", values = 2^(-2:2)),
                        makeDiscreteParam("sigma", values = 2^(-2:2)))
ksvm.ctrl <- makeTuneControlGrid()
ksvm.lrn = makeTuneWrapper(ksvm,
                           resampling = cv.inner,
                           measures = perf.measures,
                           par.set = ksvm.ps,
                           control = ksvm.ctrl,
                           show.info = FALSE)
# Create Tuned Random Forest Learner
rf <- makeLearner("classif.randomForest",
                  predict.type = "prob",
                  fix.factors.prediction = TRUE)
rf.ps <- makeParamSet(makeDiscreteParam("mtry", values = c(2, 3, 5)))
rf.ctrl <- makeTuneControlGrid()
rf.lrn = makeTuneWrapper(rf,
                         resampling = cv.inner,
                         measures = perf.measures,
                         par.set = rf.ps,
                         control = rf.ctrl,
                         show.info = FALSE)
# Run Cross-Validation Experiments
bc.lrns = list(ksvm.lrn, rf.lrn)
bc.bmr <- benchmark(learners = bc.lrns,
                    tasks = bc.task,
                    resampling = cv.outer,
                    measures = perf.measures,
                    show.info = FALSE)
# Calibration Charts
bc.cal <- generateCalibrationData(bc.bmr)
plotCalibration(bc.cal)
Produces the following:
[Aggregated calibration plot]
Attempting to un-aggregate leads to:
> bc.cal <- generateCalibrationData(bc.bmr, aggregate = FALSE)
Error in generateCalibrationData(bc.bmr, aggregate = FALSE) :
unused argument (aggregate = FALSE)
>
> sessionInfo()
R version 3.2.3 (2015-12-10)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mlbench_2.1-1 ROCR_1.0-7 gplots_3.0.1 mlr_2.9
[5] stringi_1.1.1 ParamHelpers_1.10 ggplot2_2.1.0 BBmisc_1.10
loaded via a namespace (and not attached):
[1] digest_0.6.9 htmltools_0.3.5 R6_2.2.0 splines_3.2.3
[5] scales_0.4.0 assertthat_0.1 grid_3.2.3 stringr_1.0.0
[9] bitops_1.0-6 checkmate_1.8.2 gdata_2.17.0 survival_2.38-3
[13] munsell_0.4.3 tibble_1.2 randomForest_4.6-12 httpuv_1.3.3
[17] parallelMap_1.3 mime_0.5 DBI_0.5-1 labeling_0.3
[21] chron_2.3-47 shiny_1.0.0 KernSmooth_2.23-15 plyr_1.8.4
[25] data.table_1.9.6 magrittr_1.5 reshape2_1.4.1 kernlab_0.9-25
[29] ggvis_0.4.3 caTools_1.17.1 gtable_0.2.0 colorspace_1.2-6
[33] tools_3.2.3 parallel_3.2.3 dplyr_0.5.0 xtable_1.8-2
[37] gtools_3.5.0 backports_1.0.4 Rcpp_0.12.4
No, plotCalibration() doesn't do any averaging, though it can plot a smooth.
If you call generateCalibrationData() on a benchmark result object, it will treat each iteration of your resampled predictions as exchangeable and compute the calibration across all resampled predictions for that bin.
Yes, it probably would make sense to have an option to generate an unaggregated calibration data object and be able to plot it. You are welcome to open an issue on GitHub to that effect, but this is going to be low on my priority list, TBH.
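In the meantime, one rough workaround (not mlr's own code path) is to bin the raw resampled predictions per outer fold yourself to see the fold-to-fold variability. This sketch assumes the bc.bmr object from the question; column names such as prob.malignant and iter are what getBMRPredictions() typically returns for this task, but check names(preds) on your own object:

```r
library(mlr)
library(dplyr)

# Pull all resampled predictions from the benchmark result as one data frame
preds <- getBMRPredictions(bc.bmr, as.df = TRUE)

# Bin predicted probabilities and compute the observed event rate
# per learner and per outer CV fold (iter)
cal.by.fold <- preds %>%
  mutate(bin = cut(prob.malignant, breaks = seq(0, 1, by = 0.1),
                   include.lowest = TRUE)) %>%
  group_by(learner.id, iter, bin) %>%
  summarise(obs.rate = mean(truth == "malignant"), n = n())

cal.by.fold # one calibration curve per learner and outer fold
```

Plotting obs.rate against the bin midpoints, grouped by iter, gives an unaggregated view comparable to what aggregate = FALSE does for generateThreshVsPerfData().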