tbl_uvregression for lme4 objects - lme4

I am trying to get a univariate regression table using tbl_uvregression from gtsummary. I am running these regression models with lme4 and I am not sure where and how to specify the random effect. Here's an example using the trial data from the survival package.
library(lme4)
#> Loading required package: Matrix
library(gtsummary)
library(survival)
data(trial)
trial %>%
tbl_uvregression(
method = glmer,
y = response,
method.args = list(family = binomial),
exponentiate = TRUE,
pvalue_fun = function(x) style_pvalue(x, digits = 2),
formula = "{y} ~ {x}+ {1|grade}"
)
#> Error: Problem with `mutate()` input `formula_chr`.
#> x object 'grade' not found
#> i Input `formula_chr` is `glue(formula)`.
Created on 2020-09-28 by the reprex package (v0.3.0)
Please help

For the RE in the model do not specify with the {} instead use ().
library(lme4)
#> Loading required package: Matrix
library(gtsummary)
library(survival)
data(trial)
trial %>%
tbl_uvregression(
method = glmer,
y = response,
method.args = list(family = binomial),
exponentiate = TRUE,
pvalue_fun = function(x) style_pvalue(x, digits = 2),
formula = "{y} ~ {x}+ (1|grade)"
)

Related

How can I adjust my pcor model for confounders and do it for many models at one time?

I have a dataset with many columns. First column is the outcome (Test)(Dependent variable, y). Columns 2-32 are confounders. Finally, columns 33-54 are miRNAs (expression)(Independent variable, x).
I want to do a partial correlation (to obtain p-value and estimate) between each one of the independent variables with the dependent variable, adjusting by confounders. Since my variables don't follow a normal distribution, I want to use Spearman method.
I don't want to put all of them in the same model, I want different models, one by one. That is:
Model 1: Test vs miRNA1 by confounders
Model 2: Test vs miRNA2 by confounders
[...]
Model 21: Test vs miRNA21 by confounders
I tried with an auxiliary function. But it doesn't work. Any help? Thanks :)
The script is here:
#data
n <- 10000
nc <- 30
nm <- 20
y <- rnorm(n = n)
X <- matrix(rnorm(n = n*(nc+nm)), ncol = nc + nm)
df <- data.frame(y = y, X)
#variable names
confounders <- colnames(df)[2:31]
mirnas <- colnames(df)[32:51]
#auxiliar regression function
pcor_fun <- function(data, y_col, X_cols) {
formula <- as.formula(paste(y_col, X_cols))
pcor <- pcor.test(formula = formula, data = data, method = "spearman")
pcor_summary <- summary(pcor)$coef
return(pcor_summary)
}
#simple linear regressions
lm_list1 <- lapply(X = mirnas, FUN = pcor_fun, data = df, y_col = "y")
lm_list1[[1]]
#adjusting by confounders
lm_list2 <- lapply(X = mirnas, FUN = function(x) pcor_fun(data = df, y_col = "y", X_cols = c(confounders, x)))
lm_list2[[1]]

R Shiny app loads, but radio buttons do not select values properly

This is my first time using stack overflow so apologies if I do this wrong.
I'm fairly new to coding in R and I'm trying to make a simple Shiny app using a TidyTuesday dataset. I wanted to make a map with points showing the different types of water systems ("water_tech") and radio buttons to choose which type of water system is plotted on the map. I got the app to load without an error message, however no matter which button is selected, all of the different types of water systems are plotted on the map, not just the one I selected (essentially, the buttons don't work). If anyone has any ideas about what could be causing this to happen I would greatly appreciate it!
Reproducible code:
### Load Libraries
library(shiny)
#> Warning: package 'shiny' was built under R version 4.0.4
library(shinythemes)
#> Warning: package 'shinythemes' was built under R version 4.0.4
library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.0.5
#> Warning: package 'tibble' was built under R version 4.0.5
#> Warning: package 'tidyr' was built under R version 4.0.5
#> Warning: package 'dplyr' was built under R version 4.0.5
library(here)
#> here() starts at C:/Users/eruks/AppData/Local/Temp/Rtmp2jxqLH/reprex-2a306cec2120-white-boto
library(rnaturalearth)
#> Warning: package 'rnaturalearth' was built under R version 4.0.5
library(rnaturalearthdata)
#> Warning: package 'rnaturalearthdata' was built under R version 4.0.5
library(sf)
#> Warning: package 'sf' was built under R version 4.0.5
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
### Load Data
water <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-05-04/water.csv')
#>
#> -- Column specification --------------------------------------------------------
#> cols(
#> row_id = col_double(),
#> lat_deg = col_double(),
#> lon_deg = col_double(),
#> report_date = col_character(),
#> status_id = col_character(),
#> water_source = col_character(),
#> water_tech = col_character(),
#> facility_type = col_character(),
#> country_name = col_character(),
#> install_year = col_double(),
#> installer = col_character(),
#> pay = col_character(),
#> status = col_character()
#> )
### User Interface
ui <- fluidPage(theme = shinytheme("spacelab"),
# Application title
titlePanel("Water Access Points in Africa"),
# Sidebar with radio buttons for choosing which type of water system
sidebarLayout(
sidebarPanel(
radioButtons(inputId = "water_tech",
label = "Water system:",
choices = c("Hand Pump", "Hydram", "Kiosk", "Mechanized Pump", "Rope and Bucket", "Tapstand"),
selected = "Hand Pump")
),
mainPanel(
plotOutput("water_plot")
)
)
)
server <- function(input, output) {
water_clean <- water %>%
drop_na(water_tech) %>%
mutate(water_tech = ifelse(str_detect(water_tech, "Hand Pump"), "Hand Pump", water_tech),
water_tech = ifelse(str_detect(water_tech, "Mechanized Pump"), "Mechanized Pump", water_tech),
water_tech = as.factor(water_tech)) %>%
select(2, 3, 7, 9) %>%
filter(lon_deg > -25 & lon_deg < 52 & lat_deg > -40 & lat_deg < 35)
africa <- ne_countries(scale = "medium", returnclass = "sf", continent = "Africa")
rwater <- reactive({
water_clean %>%
filter(water_tech == input$water_tech)
})
output$water_plot <- renderPlot({
rwater() %>%
ggplot() +
geom_sf(data = africa,
fill = "#ffffff") +
geom_point(data = water_clean,
aes(x = lon_deg,
y = lat_deg,
color = water_tech)) +
theme_bw() +
theme(panel.grid = element_blank(),
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank(),
panel.border = element_blank()) +
labs(x = "",
y = "")
})
}
# Run the application
shinyApp(ui = ui, server = server)
#> PhantomJS not found. You can install it with webshot::install_phantomjs(). If it is installed, please make sure the phantomjs executable can be found via the PATH variable.
```
<div style="width: 100% ; height: 400px ; text-align: center; box-sizing: border-box; -moz-box-sizing: border-box; -webkit-box-sizing: border-box;" class="muted well">Shiny applications not supported in static R Markdown documents</div>
<sup>Created on 2021-05-05 by the [reprex package](https://reprex.tidyverse.org) (v2.0.0)</sup>```
Thank you :)
rwater() has no effect in this code:
rwater() %>%
ggplot() +
geom_sf(data = africa,
fill = "#ffffff") +
geom_point(data = water_clean,
aes(x = lon_deg,
y = lat_deg,
color = water_tech))
because you enter the water_clean data in geom_point.
I think you want:
ggplot() +
geom_sf(data = africa,
fill = "#ffffff") +
geom_point(data = rwater(),
aes(x = lon_deg,
y = lat_deg,
color = water_tech))

R - MLR - Classifier Calibration - Benchmark Results

I've run a benchmark experiment with nested cross validation (tuning + performance measurement) for a classification problem and would like to create calibration charts.
If I pass a benchmark result object to generateCalibrationData, what does plotCalibration do? Is it averaging? If so how?
Does it make sense to have an aggregate = FALSE option to understand variability across folds as per generateThreshVsPerfData for ROC curves?
In response to #Zach's request for a reproducible example, I (the OP) edit my original post as follows:
Edit: Reproducible Example
# Practice Data
library("mlr")
library("ROCR")
library(mlbench)
data(BreastCancer)
dim(BreastCancer)
levels(BreastCancer$Class)
head(BreastCancer)
BreastCancer <- BreastCancer[, -c(1, 6, 7)]
BreastCancer$Cl.thickness <- as.factor(unclass(BreastCancer$Cl.thickness))
BreastCancer$Cell.size <- as.factor(unclass(BreastCancer$Cell.size))
BreastCancer$Cell.shape <- as.factor(unclass(BreastCancer$Cell.shape))
BreastCancer$Marg.adhesion <- as.factor(unclass(BreastCancer$Marg.adhesion))
head(BreastCancer)
# Define Nested Cross-Validation Strategy
cv.inner <- makeResampleDesc("CV", iters = 2, stratify = TRUE)
cv.outer <- makeResampleDesc("CV", iters = 6, stratify = TRUE)
# Define Performance Measures
perf.measures <- list(auc, mmce)
# Create Task
bc.task <- makeClassifTask(id = "bc",
data = BreastCancer,
target = "Class",
positive = "malignant")
# Create Tuned KSVM Learner
ksvm <- makeLearner("classif.ksvm",
predict.type = "prob")
ksvm.ps <- makeParamSet(makeDiscreteParam("C", values = 2^(-2:2)),
makeDiscreteParam("sigma", values = 2^(-2:2)))
ksvm.ctrl <- makeTuneControlGrid()
ksvm.lrn = makeTuneWrapper(ksvm,
resampling = cv.inner,
measures = perf.measures,
par.set = ksvm.ps,
control = ksvm.ctrl,
show.info = FALSE)
# Create Tuned Random Forest Learner
rf <- makeLearner("classif.randomForest",
predict.type = "prob",
fix.factors.prediction = TRUE)
rf.ps <- makeParamSet(makeDiscreteParam("mtry", values = c(2, 3, 5)))
rf.ctrl <- makeTuneControlGrid()
rf.lrn = makeTuneWrapper(rf,
resampling = cv.inner,
measures = perf.measures,
par.set = rf.ps,
control = rf.ctrl,
show.info = FALSE)
# Run Cross-Validation Experiments
bc.lrns = list(ksvm.lrn, rf.lrn)
bc.bmr <- benchmark(learners = bc.lrns,
tasks = bc.task,
resampling = cv.outer,
measures = perf.measures,
show.info = FALSE)
# Calibration Charts
bc.cal <- generateCalibrationData(bc.bmr)
plotCalibration(bc.cal)
Produces the following:
Aggregared Calibration Plot
Attempting to un-aggregate leads to:
> bc.cal <- generateCalibrationData(bc.bmr, aggregate = FALSE)
Error in generateCalibrationData(bc.bmr, aggregate = FALSE) :
unused argument (aggregate = FALSE)
>
> sessionInfo()
R version 3.2.3 (2015-12-10)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mlbench_2.1-1 ROCR_1.0-7 gplots_3.0.1 mlr_2.9
[5] stringi_1.1.1 ParamHelpers_1.10 ggplot2_2.1.0 BBmisc_1.10
loaded via a namespace (and not attached):
[1] digest_0.6.9 htmltools_0.3.5 R6_2.2.0 splines_3.2.3
[5] scales_0.4.0 assertthat_0.1 grid_3.2.3 stringr_1.0.0
[9] bitops_1.0-6 checkmate_1.8.2 gdata_2.17.0 survival_2.38-3
[13] munsell_0.4.3 tibble_1.2 randomForest_4.6-12 httpuv_1.3.3
[17] parallelMap_1.3 mime_0.5 DBI_0.5-1 labeling_0.3
[21] chron_2.3-47 shiny_1.0.0 KernSmooth_2.23-15 plyr_1.8.4
[25] data.table_1.9.6 magrittr_1.5 reshape2_1.4.1 kernlab_0.9-25
[29] ggvis_0.4.3 caTools_1.17.1 gtable_0.2.0 colorspace_1.2-6
[33] tools_3.2.3 parallel_3.2.3 dplyr_0.5.0 xtable_1.8-2
[37] gtools_3.5.0 backports_1.0.4 Rcpp_0.12.4
no plotCalibration doesn't do any averaging, though it can plot a smooth.
if you call generateCalibrationData on a benchmark result object it will treat each iteration of your resampled predictions as exchangeable and compute the calibration across all resampled predictions for that bin.
yes it probably would make sense to have an option to generate an unaggregated calibration data object and be able to plot it. you are welcome to open an issue on GitHub to that effect, but this is going to be low on my priority list TBH.

Unable to convert JSON to dataframe

I want to convert a json-file into a dataframe in R. With the following code:
link <- 'https://www.dropbox.com/s/ckfn1fpkcix1ccu/bevingenbag.json'
document <- fromJSON(file = link, method = 'C')
bev <- do.call("cbind", document)
i'm getting this:
type features
1 FeatureCollection list(type = "Feature", geometry = list(type = "Point", coordinates = c(6.54800000288927, 52.9920000044505)), properties = list(gid = "1496600", yymmdd = "19861226", lat = "52.992", lon = "6.548", mag = "2.8", depth = "1.0", knmilocatie = "Assen", baglocatie = "Assen", tijd = "74751"))
which is the first row of a matrix. All the other rows have the same structure. I'm interested in the properties = list(gid = "1496600", yymmdd = "19861226", lat = "52.992", lon = "6.548", mag = "2.8", depth = "1.0", knmilocatie = "Assen", baglocatie = "Assen", tijd = "74751") part, which should be converted into a dataframe with the columns gid, yymmdd, lat, lon, mag, depth, knmilocatie, baglocatie, tijd.
I searched for and tryed several solutions but none of them worked. I used the rjson package for this. I also tryed the RJSONIO & jsonlite package, but was unable to extract the desired information.
Anyone an idea how to solve this problem?
Here's a way to obtain the data frame:
library(rjson)
document <- fromJSON(file = "bevingenbag.json", method = 'C')
dat <- do.call(rbind, lapply(document$features,
function(x) data.frame(x$properties)))
Edit: How to replace empty values with NA:
dat$baglocatie[dat$baglocatie == ""] <- NA
The result:
head(dat)
gid yymmdd lat lon mag depth knmilocatie baglocatie tijd
1 1496600 19861226 52.992 6.548 2.8 1.0 Assen Assen 74751
2 1496601 19871214 52.928 6.552 2.5 1.5 Hooghalen Hooghalen 204951
3 1496602 19891201 52.529 4.971 2.7 1.2 Purmerend Kwadijk 200914
4 1496603 19910215 52.771 6.914 2.2 3.0 Emmen Emmen 21116
5 1496604 19910425 52.952 6.575 2.6 3.0 Geelbroek Ekehaar 102631
6 1496605 19910808 52.965 6.573 2.7 3.0 Eleveld Assen 40114
This is just another, quite similar, approach.
#SvenHohenstein's approach creates a dataframe at each step, an expensive process. It's much faster to create vectors and re-type the whole result at the end. Also, Sven's approach makes each column a factor, which might or might not be what you want. The approach below runs about 200 times faster. This can be important if you intend to do this repeatedly. Finally, you will need to convert columns lon, lat, mag, and depth to numeric.
library(microbenchmark)
library(rjson)
document <- fromJSON(file = "bevingenbag.json", method = 'C')
json2df.1 <- function(json){ # #SvenHohenstein approach
df <- do.call(rbind, lapply(json$features,
function(x) data.frame(x$properties, stringsAsFactors=F)))
return(df)
}
json2df.2 <- function(json){
df <- do.call(rbind,lapply(json[["features"]],function(x){c(x$properties)}))
df <- data.frame(apply(result,2,as.character), stringsAsFactors=F)
return(df)
}
microbenchmark(x<-json2df.1(document), y<-json2df.2(document), times=10)
# Unit: milliseconds
# expr min lq median uq max neval
# x <- json2df.1(document) 2304.34378 2654.95927 2822.73224 2977.75666 3227.30996 10
# y <- json2df.2(document) 13.44385 15.27091 16.78201 18.53474 19.70797 10
identical(x,y)
# [1] TRUE

R: calling rq() within a function and defining the linear predictor

I am trying to call rq() of the package quantreg within a function. Herebelow is a simplified explanation of my problem.
If I follow the recommendations found at
http://developer.r-project.org/model-fitting-functions.txt, I have a design matrix after the line
x <- model.matrix(mt, mf, contrasts)
with the first column full of 1's to create an intercept.
Now, when I call rq(), I am obliged to use something like
fit <- rq (y ~ x [,2], tau = 0.5, ...)
My problem happens if there is more than 1 explanatory variable. I don't know how to find an automatic way to write:
x [,2] + x [,3] + x [,4] + ...
Here is the complete simplified code:
ao_qr <- function (formula, data, method = "br",...) {
cl <- match.call ()
## keep only the arguments which should go into the model
## frame
mf <- match.call (expand.dots = FALSE)
m <- match (c ("formula", "data"), names (mf), 0)
mf <- mf[c (1, m)]
mf$drop.unused.levels <- TRUE
mf[[1]] <- as.name ("model.frame")
mf <- eval.parent (mf)
if (method == "model.frame") return (mf)
## allow model.frame to update the terms object before
## saving it
mt <- attr (mf, "terms")
y <- model.response (mf, "numeric")
x <- model.matrix (mt, mf, contrasts)
## proceed with the quantile regression
fit <- rq (y ~ x[,2], tau = 0.5, ...)
print (summary (fit, se = "boot", R = 100))
}
I call the function with:
ao_qr(pain ~ treatment + extra, data = data.subset)
And here is how to get the data:
require (lqmm)
data(labor)
data <- labor
data.subset <- subset (data, time == 90)
data.subset$extra <- rnorm (65)
In this case, with this code, my linear predictor only includes "treatment". If I want "extra", I have to manually add x[,3] in the linear predictor of rq() in the code. This is not automatic and will not work on other datasets with unknown number of variables.
Does anyone know how to tackle this ?
Any help would be greatly appreciated !!!
I found a simple solution:
x[,2:ncol(x)]