R: calling rq() within a function and defining the linear predictor

I am trying to call rq() from the quantreg package within a function. Below is a simplified explanation of my problem.
If I follow the recommendations found at
http://developer.r-project.org/model-fitting-functions.txt, I have a design matrix after the line
x <- model.matrix(mt, mf, contrasts)
with the first column full of 1's to create an intercept.
Now, when I call rq(), I am obliged to use something like
fit <- rq (y ~ x [,2], tau = 0.5, ...)
My problem happens if there is more than 1 explanatory variable. I don't know how to find an automatic way to write:
x [,2] + x [,3] + x [,4] + ...
Here is the complete simplified code:
ao_qr <- function (formula, data, method = "br",...) {
  cl <- match.call ()
  ## keep only the arguments which should go into the model
  ## frame
  mf <- match.call (expand.dots = FALSE)
  m <- match (c ("formula", "data"), names (mf), 0)
  mf <- mf[c (1, m)]
  mf$drop.unused.levels <- TRUE
  mf[[1]] <- as.name ("model.frame")
  mf <- eval.parent (mf)
  if (method == "model.frame") return (mf)
  ## allow model.frame to update the terms object before
  ## saving it
  mt <- attr (mf, "terms")
  y <- model.response (mf, "numeric")
  x <- model.matrix (mt, mf, contrasts)
  ## proceed with the quantile regression
  fit <- rq (y ~ x[,2], tau = 0.5, ...)
  print (summary (fit, se = "boot", R = 100))
}
I call the function with:
ao_qr(pain ~ treatment + extra, data = data.subset)
And here is how to get the data:
require (lqmm)
data(labor)
data <- labor
data.subset <- subset (data, time == 90)
data.subset$extra <- rnorm (65)
In this case, with this code, my linear predictor only includes "treatment". If I want "extra", I have to manually add x[,3] to the linear predictor of rq() in the code. This is not automatic and will not work on other datasets with an unknown number of variables.
Does anyone know how to tackle this?
Any help would be greatly appreciated!

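I found a simple solution: pass every column of the design matrix except the intercept column to rq():
fit <- rq (y ~ x[, 2:ncol(x)], tau = 0.5, ...)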
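To illustrate why this works, here is a minimal sketch of the same pattern. It is not the original ao_qr(): the helper name fit_all_columns is made up for this example, lm() stands in for rq() so that the sketch runs without quantreg, and mtcars replaces the labor data. A matrix on the right-hand side of a formula contributes one coefficient per column, so the number of predictors never has to be written out by hand.

```r
# Hypothetical helper; lm() is a stand-in for rq() so no extra package is needed.
fit_all_columns <- function(formula, data) {
  mf <- model.frame(formula, data, drop.unused.levels = TRUE)
  mt <- attr(mf, "terms")
  y <- model.response(mf, "numeric")
  x <- model.matrix(mt, mf)
  # x[, -1] drops the intercept column; drop = FALSE keeps a matrix
  # even when there is only one explanatory variable.
  lm(y ~ x[, -1, drop = FALSE])
}

fit <- fit_all_columns(mpg ~ wt + hp, data = mtcars)
coef(fit)  # intercept plus one coefficient per column of x[, -1]
```

x[, -1] is equivalent to x[, 2:ncol(x)] whenever there is at least one predictor. As far as I know, quantreg also provides rq.fit(x, y, tau = ...), which takes the design matrix directly, but check its documentation before relying on that.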

How do I integrate pmap with a ggplot function?

I'm trying to write a ggplot function that I can use in a pipe with pmap, feeding it a tibble of variables. These variables include the data frame, filtering options and plotting variables.
The function works on its own, but when it is called through pmap it fails with this error:
Error in UseMethod("filter") :
no applicable method for 'filter' applied to an object of class "character"
library(tidyverse)
library(palmerpenguins)
library(glue) # glue() is not attached by library(tidyverse)
make_plot <- function(dat, species) {
  dat %>%
    filter(.data$species == .env$species) %>%
    ggplot() +
    aes(bill_length_mm, body_mass_g, color=sex) +
    geom_point() +
    ggtitle(glue("Species: {species}")) +
    xlab("bill length (mm)") +
    ylab("body mass (g)") +
    theme(plot.title.position = "plot")
}
species <- c("Adelie", "Chinstrap", "Gentoo")
penguins_vars <- tibble(dat = rep("penguins", 3), species = species)
plots <- pmap(penguins_vars, make_plot)
# the function still works when called directly:
make_plot(penguins, "Adelie")
pmap essentially iterates through the rows of the tibble (penguins_vars), so on each call make_plot is passed the string "penguins" rather than the data set it receives in the working call. I think you want something like this:
plots <- map(species, make_plot, dat = penguins)
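If you would rather keep the pmap() call, one option is to store the data frame itself in a list-column instead of its name. The following is a sketch under that assumption, with a trimmed-down make_plot (paste() instead of glue(), fewer layers) so that it stays short:

```r
library(dplyr)
library(purrr)
library(tibble)
library(ggplot2)
library(palmerpenguins)

# Simplified version of the plotting function from the question.
make_plot <- function(dat, species) {
  dat %>%
    filter(.data$species == .env$species) %>%
    ggplot(aes(bill_length_mm, body_mass_g, color = sex)) +
    geom_point() +
    ggtitle(paste("Species:", species))
}

species <- c("Adelie", "Chinstrap", "Gentoo")

# list(penguins) has length 1, so tibble() recycles it to one copy per row.
# Each row then carries the actual data frame, not the string "penguins".
penguins_vars <- tibble(dat = list(penguins), species = species)

plots <- pmap(penguins_vars, make_plot)
```

pmap() matches the tibble's column names (dat, species) to the function's argument names, so the column names have to agree with the arguments of make_plot.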

How can I adjust my pcor model for confounders and do it for many models at one time?

I have a dataset with many columns. The first column is the outcome (Test), i.e. the dependent variable y. Columns 2-32 are confounders, and columns 33-54 are miRNA expression values (the independent variables, x).
I want to compute a partial correlation (estimate and p-value) between each independent variable and the dependent variable, adjusting for the confounders. Since my variables are not normally distributed, I want to use the Spearman method.
I don't want to put all of them in the same model; I want separate models, one per miRNA. That is:
Model 1: Test vs miRNA1 by confounders
Model 2: Test vs miRNA2 by confounders
[...]
Model 21: Test vs miRNA21 by confounders
I tried an auxiliary function, but it doesn't work. Any help? Thanks :)
The script is here:
#data
n <- 10000
nc <- 30
nm <- 20
y <- rnorm(n = n)
X <- matrix(rnorm(n = n*(nc+nm)), ncol = nc + nm)
df <- data.frame(y = y, X)
#variable names
confounders <- colnames(df)[2:31]
mirnas <- colnames(df)[32:51]
#auxiliar regression function
pcor_fun <- function(data, y_col, X_cols) {
formula <- as.formula(paste(y_col, X_cols))
pcor <- pcor.test(formula = formula, data = data, method = "spearman")
pcor_summary <- summary(pcor)$coef
return(pcor_summary)
}
#simple linear regressions
lm_list1 <- lapply(X = mirnas, FUN = pcor_fun, data = df, y_col = "y")
lm_list1[[1]]
#adjusting by confounders
lm_list2 <- lapply(X = mirnas, FUN = function(x) pcor_fun(data = df, y_col = "y", X_cols = c(confounders, x)))
lm_list2[[1]]
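One possible way to get one adjusted model per miRNA is sketched below in base R only. The helper name partial_spearman and the smaller simulated data are assumptions made for this example. The estimate is the Pearson correlation of rank-transformed residuals, which is the usual definition of a partial Spearman correlation; the p-value uses a t approximation with n - 2 - k degrees of freedom (k = number of confounders), which is an assumption about how you want to test it. As far as I can tell, ppcor::pcor.test() expects vectors x, y and a matrix z rather than a formula and data, and returns a data frame, so summary(...)$coef does not apply to it.

```r
set.seed(1)
n <- 500; nc <- 5; nm <- 3               # smaller than the question's data
y <- rnorm(n)
X <- matrix(rnorm(n * (nc + nm)), ncol = nc + nm)
df <- data.frame(y = y, X)
confounders <- colnames(df)[2:(nc + 1)]
mirnas <- colnames(df)[(nc + 2):(nc + nm + 1)]

# Hypothetical helper: partial Spearman correlation of y_col and x_col given z_cols.
partial_spearman <- function(data, y_col, x_col, z_cols) {
  # Rank-transform every variable involved (no missing-value handling here).
  r <- as.data.frame(lapply(data[c(y_col, x_col, z_cols)], rank))
  # Remove the part of each variable explained by the confounders.
  res_y <- resid(lm(reformulate(z_cols, response = y_col), data = r))
  res_x <- resid(lm(reformulate(z_cols, response = x_col), data = r))
  est <- cor(res_y, res_x)
  dof <- nrow(r) - 2 - length(z_cols)
  stat <- est * sqrt(dof / (1 - est^2))
  data.frame(mirna = x_col, estimate = est, statistic = stat,
             p.value = 2 * pt(-abs(stat), df = dof))
}

# One model per miRNA, each adjusted for all confounders.
results <- do.call(rbind, lapply(mirnas, function(m) {
  partial_spearman(df, y_col = "y", x_col = m, z_cols = confounders)
}))
results
```

Because each miRNA is tested separately, you may want to correct the p-values for multiple testing, for example with p.adjust(results$p.value, method = "BH").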

Rmarkdown - code runs fine as a script, but gives an error when knitting

I have an Rmarkdown document that was given to me and that worked fine for the person who gave it to me.
When I use it with my own data, I don't know what is going on: the exact same code works when I run it as a script, but when I try to knit to HTML it gives me an error.
bhv_df <- plyr::ddply(bhv_df, ~segmentid, function(d){
d= bhv_df[bhv_df$segmentid == bhv_df$segmentid[1],
# predictions are made based on the mid time between start and end of the message
predObj <- crawl::crwPredict(object.crwFit = crawl_models_list[[d$segmentid[1]]], predTime = d$MidTime, speedEst=TRUE, flat=TRUE)
predObj_dives <- predObj[predObj$locType == "p",]
# reproject into lat/long, because the crawl models have been built in a mercator pacific centered CRS
coord_points <- predObj_dives
coordinates(coord_points) =~ mu.x + mu.y
proj4string(coord_points) <- CRS("+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")
predObj_dives[c("lon", "lat")] <- coordinates(spTransform(coord_points, CRS("+proj=longlat +datum=WGS84")))
# calculate the lon360
return(cbind(d, predObj_dives[c("mu.x", "mu.y", "lon", "lat")]))
})
#load("./Outputs/crawl_argos_df.RData")
######### MERGE DIVE DATASET with CRAWL-derived MOUVEMENT DATA
bhv_df <- plyr::ddply(bhv_df, ~segmentid, function(d){ # for each tag dive data
cr <- crawl_argos_df[crawl_argos_df$segmentid == d$segmentid[1], ] # select the corresponding crawl track
fl <- filtered_argos_df_crawl[filtered_argos_df_crawl$segmentid == d$segmentid[1], ] # select the corresponding filtered track
d[c("lq_closest_filt","dt_closest_filt", "dist_closest_filt", "rel.angle","speed")] <- NA # add empty columns
for (i in 1:nrow(d)){ # for each dive...
fl$dist_diff <- as.vector(spDists(x = as.matrix(d[i,c("lon", "lat")]), y = as.matrix(fl[c("lon", "lat")]), longlat = T)) # distance calculated in km
fl$dt_diff <- as.numeric(difftime(d$MidTime[i], fl$time, units = "hours"))
d[i, "dt_closest_filt"] <- min(abs(fl$dt_diff)) # select the filtered position closest in time to the dive, time in hours
d[i, "dist_closest_filt"] <- fl[which(abs(fl$dt_diff) == min(abs(fl$dt_diff))), "dist_diff"][1] # retrieve the distance from this filtered position to the dive location (predicted by crawl)
d[i, "lq_closest_filt"] <- fl[which(abs(fl$dt_diff) == min(abs(fl$dt_diff))), "lq"][1] # argos quality of closest filtered argos position
cr$diff <- as.numeric(difftime(d$MidTime[i], cr$time, units = "hours")) # look at the time diff between this dive and all positions recorded in crawl for that same tag
d[i, c("rel.angle","speed")] <- cr[abs(cr$diff) == min(abs(cr$diff)), c("rel.angle", "speed")] # select closest position recorded when dive occurred
}
return(d)
})
bhv_df$depth_bin <- cut(bhv_df$DepthMean, seq(0, 700, 50))
ggplot(bhv_df[bhv_df$What == "Dive" & bhv_df$depth_range == "deep" & bhv_df$DepthMean < 1000,], aes(x = lon, y = lat)) +
stat_contour(data = bathyNOAA_df, aes(x, y, z=z), binwidth = 500, color = "grey60", size = 0.2) +
geom_tile(data = bathyNOAA_df_shallow[bathyNOAA_df_shallow$z >= 0, ], aes(x, y), fill = "grey10") +
geom_jitter(aes(fill = -DepthMean, size = DepthMean), col="black", alpha=0.8, pch = 21, width=0.1) +
scale_fill_viridis(option = "magma", name = "Dive Depth (m)", direction = 1, begin = 0.2) +
xlab("Longitude") +
ylab("Latitude") +
coord_fixed(xlim = c(-50, -26), ylim = c(-55, -15), expand = F)
The error (it occurs right at the beginning of the code, at line 4):
Error: unexpected symbol in:
" predObj <- crawl::crwPredict(object.crwFit = crawl_models_list[[d$segmentid[1]]], predTime = d$MidTime, speedEst=TRUE, flat=TRUE)
predObj_dives"
I tried to check whether there are conflicts, but apparently there are none:
> conflicts()
[1] "lines" "cividis" "inferno" "magma" "plasma" "viridis" "viridis.map"
[8] "summary" "days" "hours" "minutes" "origin" "seconds" "show"
[15] "years" "hour" "isoweek" "mday" "minute" "month" "quarter"
[22] "second" "wday" "week" "yday" "year" "coerce" "coerce"
[29] "plot" "show" "summary" "%>%" "%>%" "%>%" "between"
[36] "count" "first" "intersect" "last" "setdiff" "union" "%>%"
[43] "flatten" "map" "transpose" "%>%" "add_row" "as_data_frame" "as_tibble"
[50] "data_frame" "data_frame_" "frame_data" "glimpse" "lst" "lst_" "tbl_sum"
[57] "tibble" "tribble" "trunc_mat" "type_sum" "enexpr" "enexprs" "enquo"
[64] "enquos" "ensym" "ensyms" "expr" "quo" "quo_name" "quos"
[71] "sym" "syms" "vars" "filter" "lag" "lines" "plot"
[78] "as.raster" "data" "Arith" "coerce" "Compare" "initialize" "show"
[85] "as.difftime" "body<-" "date" "intersect" "kronecker" "merge" "Position"
[92] "setdiff" "setequal" "split" "subset" "summary" "union"
>
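Does anyone know what's going on?
Thanks!
After many tries, I removed one line of the code and everything worked. The removed line opens a "[" that is never closed, so the parser keeps reading into the following lines and eventually reports "unexpected symbol".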
# I removed: d= bhv_df[bhv_df$segmentid == bhv_df$segmentid[1],
bhv_df <- plyr::ddply(bhv_df, ~segmentid, function(d){
# predictions are made based on the mid time between start and end of the message
predObj <- crawl::crwPredict(object.crwFit = crawl_models_list[[d$segmentid[1]]], predTime = d$MidTime, speedEst=TRUE, flat=TRUE)
predObj_dives <- predObj[predObj$locType == "p",]
# reproject into lat/long, because the crawl models have been built in a mercator pacific centered CRS
coord_points <- predObj_dives
coordinates(coord_points) =~ mu.x + mu.y
proj4string(coord_points) <- CRS("+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")
predObj_dives[c("lon", "lat")] <- coordinates(spTransform(coord_points, CRS("+proj=longlat +datum=WGS84")))
# calculate the lon360
return(cbind(d, predObj_dives[c("mu.x", "mu.y", "lon", "lat")]))
})
######### MERGE DIVE DATASET with CRAWL-derived MOUVEMENT DATA
bhv_df <- plyr::ddply(bhv_df, ~segmentid, function(d){ # for each tag dive data
cr <- crawl_argos_df[crawl_argos_df$segmentid == d$segmentid[1], ] # select the corresponding crawl track
fl <- filtered_argos_df_crawl[filtered_argos_df_crawl$segmentid == d$segmentid[1], ] # select the corresponding filtered track
d[c("lq_closest_filt","dt_closest_filt", "dist_closest_filt", "rel.angle","speed")] <- NA # add empty columns
for (i in 1:nrow(d)){ # for each dive...
fl$dist_diff <- as.vector(spDists(x = as.matrix(d[i,c("lon", "lat")]), y = as.matrix(fl[c("lon", "lat")]), longlat = T)) # distance calculated in km
fl$dt_diff <- as.numeric(difftime(d$MidTime[i], fl$time, units = "hours"))
d[i, "dt_closest_filt"] <- min(abs(fl$dt_diff)) # select the filtered position closest in time to the dive, time in hours
d[i, "dist_closest_filt"] <- fl[which(abs(fl$dt_diff) == min(abs(fl$dt_diff))), "dist_diff"][1] # retrieve the distance from this filtered position to the dive location (predicted by crawl)
d[i, "lq_closest_filt"] <- fl[which(abs(fl$dt_diff) == min(abs(fl$dt_diff))), "lq"][1] # argos quality of closest filtered argos position
cr$diff <- as.numeric(difftime(d$MidTime[i], cr$time, units = "hours")) # look at the time diff between this dive and all positions recorded in crawl for that same tag
d[i, c("rel.angle","speed")] <- cr[abs(cr$diff) == min(abs(cr$diff)), c("rel.angle", "speed")] # select closest position recorded when dive occurred
}
return(d)
}) #warnings probably due to RT being equal to NA at beginning and end of the track
Thank you
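To show how an unclosed bracket produces this kind of message, here is a small sketch that reproduces it with parse(). The chunk text and the variable names in it (df, id, x, y) are made up for the example; only the structure mirrors the removed line.

```r
# The first line opens "[" and never closes it, like the removed line above.
bad_chunk <- "d = df[df$id == df$id[1],
x <- nrow(d)
y <- x + 1"

# Same code with the bracket closed.
good_chunk <- "d = df[df$id == df$id[1], ]
x <- nrow(d)
y <- x + 1"

# parse() only checks syntax; nothing is evaluated, so df need not exist.
check_syntax <- function(code) {
  tryCatch({
    parse(text = code)
    "parsed OK"
  }, error = function(e) conditionMessage(e))
}

check_syntax(bad_chunk)   # message mentions "unexpected symbol"
check_syntax(good_chunk)
```

Inside an open bracket R treats line breaks as whitespace, so the error is reported further down than the line that actually causes it. Running parse() on the chunk is a quick way to check the syntax before knitting.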

How to make the roots calculated within the function available to be used outside the function

I am trying to execute the following code, where the main function (LV) includes another function (fun_sp) for finding the root at each time point. The root is called p. As I understand it, p depends on the variable C, which changes at each time point, so p should also be a vector that changes at each time point. But when I print p, I get a single value only. Am I misunderstanding something?
Any input would be helpful.
library(deSolve)
library(rootSolve)
ka = 0.1; CL = 0.2; Ke = 0.3; R = 10; KD = 0.1
LV <- function(time,state, params)
{
C <- state[1]
P <- state[2]
fun_sp <- function(p){p + ((C/R)*p/(p+(KD/R))) -1}
p <<- uniroot.all(fun_sp, c(0,1))
fb <- p/(p+(KD/R))
dC <- fb*ka*C - CL*C + P*CL - Ke*C
dP <- CL*C - P*CL
list(c(dC, dP))
}
state_ini = c(C=100,P=0)
time = c(seq(1, 24 , 1))
fv <- ode(state_ini, time, LV, parms, method = "lsoda", rtol=1e-6, atol=1e-6, verbose=FALSE)
p
fv = as.data.frame(fv)
str(fv)
Your variable "p" is overwritten on every call of the LV function, so only the last value survives. The good news is that internal results can be stored in the output matrix by adding them to the return value (which is a list) as additional elements (e.g. root=p) after the vector of derivatives (in your case c(dC, dP)), as follows:
list(c(dC, dP), root=p)
Your example may then read as follows; the global assignment operator <<- is no longer needed:
library(deSolve)
library(rootSolve)
ka <- 0.1; CL <- 0.2; Ke <- 0.3; R <- 10; KD <- 0.1
LV <- function(time, state, params) {
  C <- state[1]
  P <- state[2]
  # root of fun_sp for the current value of C
  fun_sp <- function(p){p + ((C/R)*p/(p+(KD/R))) -1}
  p <- uniroot.all(fun_sp, c(0,1))
  fb <- p/(p+(KD/R))
  dC <- fb*ka*C - CL*C + P*CL - Ke*C
  dP <- CL*C - P*CL
  # derivatives first, then extra outputs that ode() appends as columns
  list(c(dC, dP), root=p)
}
state_ini <- c(C=100,P=0)
time <- seq(1, 24, 1)
# LV takes its constants from the global environment, so parms can be NULL
fv <- ode(state_ini, time, LV, parms = NULL, method = "lsoda", rtol=1e-6, atol=1e-6)
fv
Note, however, that uniroot.all may return more than one value, so you may want to use the 'basic' uniroot function (without .all) from the stats package instead.
Hope it helps, Thomas
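As a rough illustration of the uniroot suggestion, the sketch below solves the same equation for a few fixed values of C outside the ODE. The helper name free_fraction and the chosen C values are made up for the example; R and KD are taken from the question. fun_sp is negative at p = 0 and positive at p = 1 for any C >= 0, so uniroot() has a sign change to work with and returns exactly one root per call.

```r
R <- 10; KD <- 0.1

# Hypothetical helper: the single root of fun_sp in [0, 1] for a given C.
free_fraction <- function(C) {
  fun_sp <- function(p) p + ((C / R) * p / (p + (KD / R))) - 1
  uniroot(fun_sp, lower = 0, upper = 1, tol = 1e-10)$root
}

# One root per value of C, returned as a numeric vector.
C_values <- c(100, 50, 10, 1)
p_values <- vapply(C_values, free_fraction, numeric(1))
p_values
```

Inside LV, the same call would give a p of length one, which can then be returned as root = p.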

Is there a way to get a vector with the name of all functions that one could use in R?

I would like to have a call that returns a vector with the names of all functions that I could call in the current R session. Does anybody know how to achieve this?
(I would like to check user-entered variable names against this vector. We had some unforeseen problems with users entering, e.g., c as a variable name.)
UPDATE: I would like to get the function names from all packages currently loaded.
SOLUTION (halfway): Based on Joris Meys' tip to use lsf.str(), I came up with the following function, which returns a sorted vector with all currently available function names:
getFunctionNames <- function() {
loaded <- (.packages())
loaded <- paste("package:", loaded, sep ="")
return(sort(unlist(lapply(loaded, lsf.str))))
}
But see also the comments on Joris Meys' post for even better answers.
I'd use lsf.str() as a start.
E.g., x <- as.character(lsf.str("package:base")) gives you a vector of all functions in the base package. You could add all the packages you want to check against; stats and utils come to mind first.
EDIT: Regarding your question about currently loaded packages:
x <- unlist(sapply(search()[-1],function(x)as.character(lsf.str(x)))) # see comments
pkgs <- search()
pkgs <- pkgs[grep("package:",pkgs)]
y <- unlist(sapply(pkgs,lsf.str))
does the trick.
I asked a similar Q on R-Help many moons ago (2007) and Prof. Brian Ripley provided this as a solution:
findfuns <- function(x) {
if(require(x, character.only=TRUE)) {
env <- paste("package", x, sep=":")
nm <- ls(env, all=TRUE)
nm[unlist(lapply(nm, function(n) exists(n, where=env,
mode="function",
inherits=FALSE)))]
} else character(0)
}
pkgs <- dir(.Library)
z <- lapply(pkgs, findfuns)
names(z) <- pkgs
Z <- sort(unique(unlist(z)))
Which gives output like:
> head(Z)
[1] "^" "-" "-.Date" "-.POSIXt" ":" "::"
This was for finding all the functions in packages specified by object pkgs so you can control which packages are loaded/checked against.
A modified version that works on the currently loaded set of packages would be:
findfuns2 <- function(pkgs) {
  nm <- ls(pkgs, all = TRUE)
  ## keep only the names that refer to functions
  nm <- nm[unlist(lapply(nm, function(n) exists(n, where = pkgs,
                                                mode = "function",
                                                inherits = FALSE)))]
  if(isTRUE(all.equal(length(nm), 0)))
    character(0)
  else
    nm
}
pkgs <- search()
pkgs <- pkgs[grep("package:", pkgs)]
z <- lapply(pkgs, findfuns2)
z <- sort(unique(unlist(z)))
head(z)
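For the original use case (checking user-entered names against the function names), a minimal sketch could look like this. The helper names loaded_function_names and clashes, and the example input, are made up for the illustration; the lookup itself uses the lsf.str() approach from above.

```r
# Names of all functions in the currently attached packages.
loaded_function_names <- function() {
  pkgs <- grep("^package:", search(), value = TRUE)
  sort(unique(unlist(lapply(pkgs, function(p) as.character(lsf.str(p))))))
}

# Hypothetical helper: which user-supplied names collide with a function?
clashes <- function(user_names) {
  intersect(user_names, loaded_function_names())
}

clashes(c("c", "my_var_123", "mean"))
```

If you only need a yes/no answer for a single name, exists("c", mode = "function") performs a similar check without building the full vector, although it also finds functions defined in the global environment.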