How can I adjust my pcor model for confounders and fit many models at once?

I have a dataset with many columns. The first column is the outcome (Test; the dependent variable, y). Columns 2-32 are confounders, and columns 33-54 are miRNA expression values (the independent variables, x).
I want to compute a partial correlation (estimate and p-value) between each independent variable and the dependent variable, adjusting for the confounders. Since my variables are not normally distributed, I want to use the Spearman method.
I don't want to put all of them in the same model; I want a separate model for each miRNA. That is:
Model 1: Test vs miRNA1 by confounders
Model 2: Test vs miRNA2 by confounders
[...]
Model 21: Test vs miRNA21 by confounders
I tried with an auxiliary function. But it doesn't work. Any help? Thanks :)
The script is here:
#data
n <- 10000
nc <- 30
nm <- 20
y <- rnorm(n = n)
X <- matrix(rnorm(n = n*(nc+nm)), ncol = nc + nm)
df <- data.frame(y = y, X)
#variable names
confounders <- colnames(df)[2:31]
mirnas <- colnames(df)[32:51]
#auxiliar regression function
pcor_fun <- function(data, y_col, X_cols) {
formula <- as.formula(paste(y_col, X_cols))
pcor <- pcor.test(formula = formula, data = data, method = "spearman")
pcor_summary <- summary(pcor)$coef
return(pcor_summary)
}
#simple linear regressions
lm_list1 <- lapply(X = mirnas, FUN = pcor_fun, data = df, y_col = "y")
lm_list1[[1]]
#adjusting by confounders
lm_list2 <- lapply(X = mirnas, FUN = function(x) pcor_fun(data = df, y_col = "y", X_cols = c(confounders, x)))
lm_list2[[1]]
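One way the per-miRNA loop could look, as a minimal sketch. Note that ppcor's pcor.test() takes vectors (x, y, z, method) rather than a formula and data, and it returns the estimate and p-value directly, so summary()$coef does not apply. To keep the sketch runnable with base R only, the hypothetical helper partial_spearman() below computes the same quantity by rank-transforming every variable and correlating the residuals; the smaller data dimensions are also just for illustration.

```r
# Partial Spearman correlation of x and y given confounders z (base R only).
# Hypothetical helper; with ppcor installed, the equivalent call would be
# ppcor::pcor.test(x, y, z, method = "spearman").
partial_spearman <- function(x, y, z) {
  rx <- rank(x)
  ry <- rank(y)
  rz <- apply(as.matrix(z), 2, rank)      # rank each confounder column
  ex <- resid(lm(rx ~ rz))                # part of x not explained by confounders
  ey <- resid(lm(ry ~ rz))                # part of y not explained by confounders
  est <- cor(ex, ey)
  n <- length(x)
  gp <- ncol(rz)                          # number of confounders
  stat <- est * sqrt((n - 2 - gp) / (1 - est^2))
  p <- 2 * pt(-abs(stat), df = n - 2 - gp)
  c(estimate = est, p.value = p)
}

# Simulated data in the same layout as the question (smaller for speed)
set.seed(1)
n <- 500; nc <- 5; nm <- 4
df <- data.frame(y = rnorm(n), matrix(rnorm(n * (nc + nm)), ncol = nc + nm))
confounders <- colnames(df)[2:(1 + nc)]
mirnas <- colnames(df)[(2 + nc):(1 + nc + nm)]

# One model per miRNA, each adjusted for all confounders
res <- t(sapply(mirnas, function(m) {
  partial_spearman(df[[m]], df$y, df[confounders])
}))
res
```

Each row of res holds the estimate and p-value for one miRNA; with many miRNAs, a multiple-testing correction such as p.adjust() on the p.value column is probably worth considering.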

tbl_uvregression for lme4 objects

I am trying to build a univariate regression table using tbl_uvregression from gtsummary. I am fitting the regression models with lme4, and I am not sure where or how to specify the random effect. Here's an example using the trial data set that ships with gtsummary.
library(lme4)
#> Loading required package: Matrix
library(gtsummary)
library(survival)
data(trial)
trial %>%
tbl_uvregression(
method = glmer,
y = response,
method.args = list(family = binomial),
exponentiate = TRUE,
pvalue_fun = function(x) style_pvalue(x, digits = 2),
formula = "{y} ~ {x}+ {1|grade}"
)
#> Error: Problem with `mutate()` input `formula_chr`.
#> x object 'grade' not found
#> i Input `formula_chr` is `glue(formula)`.
Created on 2020-09-28 by the reprex package (v0.3.0)
Please help
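For the random effect, do not wrap the term in {}: the formula string is passed through glue(), which treats everything inside braces as an expression to evaluate (hence "object 'grade' not found"). Use parentheses instead, i.e. (1|grade).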
library(lme4)
#> Loading required package: Matrix
library(gtsummary)
library(survival)
data(trial)
trial %>%
tbl_uvregression(
method = glmer,
y = response,
method.args = list(family = binomial),
exponentiate = TRUE,
pvalue_fun = function(x) style_pvalue(x, digits = 2),
formula = "{y} ~ {x}+ (1|grade)"
)

mlr - should I use CV when training the final RF model?

I have a question about the mlr package.
After tuning random forest hyperparameters with cross-validation,
the model returned by getLearnerModel(rforest) is not fit with CV but on the entire data set as a whole. Is that correct?
#traintask
trainTask <- makeClassifTask(data = trainsample,target = "DIED30", positive="1")
#random forest tuning
rf <- makeLearner("classif.randomForest", predict.type = "prob", par.vals = list(ntree = 1000, mtry = 3))
rf$par.vals <- list( importance = TRUE)
rf_param <- makeParamSet(
makeDiscreteParam("ntree",values= c(500,750, 1000,2000)),
makeIntegerParam("mtry", lower = 1, upper = 15),
makeDiscreteParam("nodesize", values =c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20))
)
rancontrol <- makeTuneControlGrid()
set_cv <- makeResampleDesc("CV",iters = 10L)
rf_tune <- tuneParams(learner = rf, resampling = set_cv, task = trainTask, par.set = rf_param, control = rancontrol, measures = auc)
rf_tune$x
rf.tree <- setHyperPars(rf, par.vals = rf_tune$x)
#train best model
rforest <- train(rf.tree, trainTask)
getLearnerModel(rforest)
#predict
pforest<- predict(rforest,trainTask)
In the end, rforest is trained on the entire data set rather than with cross-validation.
Is there any way to perform the final training with CV as well in mlr?
I'm planning to validate the result on an external data set. Should I train the model with 10-fold CV before applying it to the external data set (I don't know how), or just use the hyperparameters found by the 10-fold CV search?
Thanks in advance for your time.

How to make the roots calculated within the function available to be used outside the function

I am trying to run the following code, where the main function (LV) contains another function (fun_sp) that finds a root at each time point. The root is called p. As I understand it, p depends on the variable C, which changes at each time point, so p should also take a different value at each time point. But when I print p, I get only a single value. Am I misunderstanding something?
Any input would be helpful.
library(deSolve)
library(rootSolve)
ka = 0.1; CL = 0.2; Ke = 0.3; R = 10; KD = 0.1
LV <- function(time,state, params)
{
C <- state[1]
P <- state[2]
fun_sp <- function(p){p + ((C/R)*p/(p+(KD/R))) -1}
p <<- uniroot.all(fun_sp, c(0,1))
fb <- p/(p+(KD/R))
dC <- fb*ka*C - CL*C + P*CL - Ke*C
dP <- CL*C - P*CL
list(c(dC, dP))
}
state_ini = c(C=100,P=0)
time = c(seq(1, 24 , 1))
fv <- ode(state_ini, time, LV, parms, method = "lsoda", rtol=1e-6, atol=1e-6, verbose=FALSE)
p
fv = as.data.frame(fv)
str(fv)
Your variable p is overwritten on every call of the LV function, so only the value from the last call survives. The good news is that intermediate results can be stored in the output matrix by adding them to the return value (a list) as additional named elements (e.g. root=p) after the vector of derivatives (in your case c(dC, dP)), as follows:
list(c(dC, dP), root=p)
Your example may then read as follows; the global assignment operator <<- is no longer needed:
library(deSolve)
library(rootSolve)
ka <- 0.1; CL <- 0.2; Ke <- 0.3; R <- 10; KD <- 0.1
LV <- function(time,state, params) {
C <- state[1]
P <- state[2]
fun_sp <- function(p){p + ((C/R)*p/(p+(KD/R))) -1}
p <- uniroot.all(fun_sp, c(0,1))
fb <- p/(p+(KD/R))
dC <- fb*ka*C - CL*C + P*CL - Ke*C
dP <- CL*C - P*CL
list(c(dC, dP), root=p)
}
state_ini = c(C=100,P=0)
time = c(seq(1, 24 , 1))
# LV does not use its params argument, so pass NULL instead of an undefined object
fv <- ode(state_ini, time, LV, parms = NULL, method = "lsoda", rtol=1e-6, atol=1e-6)
fv
Note, however, that uniroot.all may return more than one value, so you may want to use the 'basic' uniroot function (without .all) from the stats package instead.
Hope it helps, Thomas
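As a rough sketch of the uniroot suggestion, assuming the parameter values from the question: the hypothetical helper solve_p() below wraps the root search for a single concentration C and returns exactly one root from the interval [0, 1]. It is shown outside of ode() only to make the one-root-per-C behaviour visible.

```r
# Parameters copied from the question
R <- 10; KD <- 0.1

# Root of fun_sp for one concentration C, using stats::uniroot.
# uniroot returns a single root inside the bracketing interval,
# unlike uniroot.all, which may return several (or none).
solve_p <- function(C) {
  fun_sp <- function(p) p + ((C / R) * p / (p + (KD / R))) - 1
  uniroot(fun_sp, interval = c(0, 1), tol = 1e-10)$root
}

# One root per concentration value (illustrative values, not model output)
C_values <- c(100, 50, 10, 1)
p_values <- sapply(C_values, solve_p)
p_values
```

Inside LV, the same call pattern would replace uniroot.all(fun_sp, c(0,1)) with uniroot(fun_sp, c(0,1))$root, so that root=p is always a single number per time point.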

How to extract an adjacency matrix of a giant component of a graph using R?

I would like to extract an adjacency matrix of a giant component of a graph using R.
For example, I can create an Erdős-Rényi G(n, p) graph:
n = 100
p = 1.5/n
g = erdos.renyi.game(n, p)
coords = layout.fruchterman.reingold(g)
plot(g, layout=coords, vertex.size = 3, vertex.label=NA)
# Get the components of an undirected graph
cl = clusters(g)
# How many components?
cl$no
# How big are these (the first row is size, the second is the number of components of that size)?
table(cl$csize)
cl$membership
# Get the giant component
nodes = which(cl$membership == which.max(cl$csize))
# Color in red the nodes in the giant component and in sky blue the rest
V(g)$color = "SkyBlue2"
V(g)[nodes]$color = "red"
plot(g, layout=coords, vertex.size = 3, vertex.label=NA)
Here, I only want to extract the adjacency matrix of the red nodes.
It's easy to extract the giant component as a new graph, as shown below, and then get its adjacency matrix.
g <- erdos.renyi.game(100, .015, directed = TRUE)
# if you have directed graph, decide if you want
# strongly or weakly connected components
co <- components(g, mode = 'STRONG')
gi <- induced.subgraph(g, which(co$membership == which.max(co$csize)))
# if you want here you can decide if you want values only
# in the upper or lower triangle or both
ad <- get.adjacency(gi)
But you might want to keep the vertex IDs of the original graph. In that case, just subset the adjacency matrix of the full graph:
g <- erdos.renyi.game(100, .015)
co <- components(g)
gi_vids <- which(co$membership == which.max(co$csize))
gi_ad <- get.adjacency(g)[gi_vids, gi_vids]
# you can even add the names of the nodes
# as row and column names.
# generating dummy node names:
V(g)$name <- sapply(
  seq(vcount(g)),
  function(i) {
    paste(letters[ceiling(runif(5) * 26)], collapse = '')
  }
)
rownames(gi_ad) <- V(g)$name[gi_vids]
colnames(gi_ad) <- V(g)$name[gi_vids]

R: calling rq() within a function and defining the linear predictor

I am trying to call rq() from the quantreg package within a function. Below is a simplified explanation of my problem.
If I follow the recommendations found at
http://developer.r-project.org/model-fitting-functions.txt, I have a design matrix after the line
x <- model.matrix(mt, mf, contrasts)
with the first column full of 1's to create an intercept.
Now, when I call rq(), I have to use something like
fit <- rq (y ~ x [,2], tau = 0.5, ...)
My problem arises when there is more than one explanatory variable. I don't know how to generate this automatically:
x [,2] + x [,3] + x [,4] + ...
Here is the complete simplified code:
ao_qr <- function (formula, data, method = "br",...) {
cl <- match.call ()
## keep only the arguments which should go into the model
## frame
mf <- match.call (expand.dots = FALSE)
m <- match (c ("formula", "data"), names (mf), 0)
mf <- mf[c (1, m)]
mf$drop.unused.levels <- TRUE
mf[[1]] <- as.name ("model.frame")
mf <- eval.parent (mf)
if (method == "model.frame") return (mf)
## allow model.frame to update the terms object before
## saving it
mt <- attr (mf, "terms")
y <- model.response (mf, "numeric")
x <- model.matrix (mt, mf, contrasts)
## proceed with the quantile regression
fit <- rq (y ~ x[,2], tau = 0.5, ...)
print (summary (fit, se = "boot", R = 100))
}
I call the function with:
ao_qr(pain ~ treatment + extra, data = data.subset)
And here is how to get the data:
require (lqmm)
data(labor)
data <- labor
data.subset <- subset (data, time == 90)
data.subset$extra <- rnorm (65)
With this code, my linear predictor only includes "treatment". If I want "extra" as well, I have to manually add x[,3] to the linear predictor of rq() in the code. This is not automatic and will not work on other data sets with an unknown number of variables.
Does anyone know how to tackle this?
Any help would be greatly appreciated!
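I found a simple solution: pass all non-intercept columns of the design matrix at once, i.e. replace x[,2] in the rq() call with
x[,2:ncol(x)]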
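A small sketch of the same idea, under stated assumptions: lm() is used here purely as a stand-in for rq() so the snippet runs with base R, and the data frame d is simulated for illustration (it is not the labor data). Writing the subset as x[, -1, drop = FALSE] is a variant of x[,2:ncol(x)] that also keeps a matrix when only one predictor remains.

```r
set.seed(42)

# Simulated stand-in data (hypothetical; not the labor data from lqmm)
d <- data.frame(
  pain = rnorm(50),
  treatment = factor(rep(c("a", "b"), 25)),
  extra = rnorm(50)
)

# Same steps as inside ao_qr(): model frame, response, design matrix
mf <- model.frame(pain ~ treatment + extra, data = d)
y <- model.response(mf, "numeric")
x <- model.matrix(attr(mf, "terms"), mf)

# Drop the intercept column; the formula adds its own intercept
x_no_int <- x[, -1, drop = FALSE]

# lm() as a stand-in; in ao_qr() this line would be
# fit <- rq(y ~ x_no_int, tau = 0.5, ...)
fit <- lm(y ~ x_no_int)
coef(fit)
```

Because the matrix enters the formula as a single term, every column is used regardless of how many explanatory variables the user's formula contains.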