include random slope in binomial mixed model - lme4

I am using a binomial GLMM to examine how the presence of individuals (number of hours per day) at a site changes over time. Since presence is measured daily for several individuals, I've included a random intercept for individual ID.
e.g.,
presence <- cbind(hours, 24-hours)
glmer(presence ~ time + (1 | ID), family = binomial)
I'd also like to look at using ID as a random slope, but I don't know how to add this to my model. I've tried the two approaches below, but I'm not sure which is correct.
glmer(presence ~ time + (1 + ID), family = binomial)
Error: No random effects terms specified in formula
glmer(presence ~ time + (1 + ID | ID), family = binomial)
Error: number of observations (=1639) < number of random effects (=5476) for term (1 + ID | ID); the random-effects parameters are probably unidentifiable

You cannot have a random slope for ID and have ID as a (level-two) grouping variable (see this documentation for more detail: https://cran.r-project.org/web/packages/lme4/lme4.pdf).
The grouping variable, which is ID in the models below, is the variable across whose levels the random effects vary. model_1 gives random intercepts for the ID variable. model_2 gives both random intercepts and random slopes for the time variable. In other words, model_1 allows the intercept of the relationship between presence and time to vary with ID (the slope remains the same), whereas model_2 allows both the intercept and the slope to vary with ID, so that the relationship between presence and time (i.e., the slope) can be different for each individual (ID).
model_1 = glmer(presence ~ time + (1 | ID), family = binomial)
model_2 = glmer(presence ~ time + (1 + time | ID), family = binomial)
I would also recommend:
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: an introduction to basic and advanced multilevel modeling (2nd ed.): Sage.

Related

Testing Data Consistency and its effect on Multilevel Modeling Multivariate Inference

I have a MLM model looking at the effect of demographics of a few cities on a region wide outcome variable as follows:
RegionalProgress = β0j + β1j * Demographics + u0j + e0ij
The data used in this analysis consist of 7 different cities with different data sources. The point I am trying to make is that these 7 data sets (which I have combined together) have inconsistent structure and substance, and that these differences do (or do not) alter, or at least complicate, multivariate relationships. A tip I got was to use β1j and its variation across cities. I'm having trouble understanding how this would relate to demonstrating inconsistencies in the data sets. I'm doing all of this in R, and my model looks like this in case that's helpful:
model11 <- lmerTest::lmer(RegionalProgress ~ 1 + (1|CITY) + PopDen + HomeOwn + Income + Black + Asian + Hispanic + Age, data = data, REML = FALSE)
Can anyone help me understand the tip, or give me other tips how to find evidence of:
there are meaningful differences (or not) between the data sets across cities,
how these differences do affect multivariate relationships?
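The tip about β1j amounts to letting the coefficient of a demographic predictor vary by city, i.e., a random slope for that predictor with CITY as the grouping variable. A minimal sketch of what that formula could look like, using variable names from the post (`PopDen` is just an arbitrary example predictor; any of the demographics could get a random slope the same way):

```r
# Random slope for PopDen across cities: each city gets its own
# intercept and its own PopDen coefficient (the beta_1j in the question).
f_slope <- RegionalProgress ~ PopDen + HomeOwn + Income + (1 + PopDen | CITY)

# Fit with lmerTest as in the original model (not run here):
# model12 <- lmerTest::lmer(f_slope, data = data, REML = FALSE)
# Comparing model11 (random intercepts only) to model12 with anova()
# then tests whether the PopDen slope meaningfully varies across cities.
```

The estimated variance of the random slope (and the city-level slope estimates from ranef()) is what would show whether the multivariate relationship differs between the 7 data sets.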

Created factors with EFA, tried regressing (lm) with control variables - Error message "variable lengths differ"

EFA first-timer here!
I ran an Exploratory Factor Analysis (EFA) on a data set ("df1" = 1320 observations) with 50 variables, by creating a subset containing only the relevant variables with no missing values ("df2" = 301 observations).
I was able to extract 4 factors (19 variables in total).
Now I would like to take those 4 factors and regress them with control variables.
For instance: Factor 1 (df2$fa1) describes job satisfaction.
I would like to control for age and marital status.
Fa1Regression <- lm(df2$fa1 ~ df1$age + df1$marital)
However I receive the error message:
Error in model.frame.default(formula = df2$fa1 ~ df1$age + :
variable lengths differ (found for 'df1$age')
What can I do to run the regression correctly? Can I delete observations from df1 that are nonexistent in df2 so that the variable lengths are the same?
It's having a problem using lm to regress a latent factor on predictors coming from a different data frame (hence the differing variable lengths). Alternatively, you could use the lavaan package, where your model statement would be myModel <- 'fa1 ~ x1 + x2 + x3' — note that all variables in a lavaan model string must be columns of the single data set you pass to the fitting function, so drop the df2$/df1$ prefixes.
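To answer the question directly: yes, aligning the two data sets so the variable lengths match is the simplest fix for plain lm. A minimal sketch with made-up data (the column names id, fa1, age, and marital are hypothetical stand-ins for the real ones; it assumes df2 kept an ID linking its rows back to df1):

```r
# Made-up illustration: df1 is the full data set, df2 the complete-case
# subset carrying the factor scores from the EFA.
df1 <- data.frame(id = 1:10,
                  age = c(25, 31, 40, 22, 35, 28, 50, 44, 33, 29),
                  marital = factor(c("s","m","m","s","m","s","m","m","s","s")))
df2 <- data.frame(id  = c(2, 4, 5, 7, 9),               # subset of df1's rows
                  fa1 = c(0.4, -1.2, 0.3, 0.8, -0.5))   # factor scores

# Merge so every variable lives in ONE data frame of equal length,
# then regress using the data argument instead of df$ references:
dat <- merge(df2, df1, by = "id")
Fa1Regression <- lm(fa1 ~ age + marital, data = dat)
```

With everything in one data frame, lm sees vectors of identical length and the "variable lengths differ" error disappears.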

p values for random effects in lmer

I am working on a mixed model using the lmer function. I want to obtain p-values for all the fixed and random effects. I am able to obtain p-values for the fixed effects using different methods, but I haven't found anything for the random effects. Every method I have found on the internet involves fitting a null model and then getting the p-value by comparison. Is there a method that doesn't require fitting another model?
My model looks like:
mod1 = lmer(Out ~ Var1 + (1 + Var2 | Var3), data = dataset)
You must do this through model comparison, as far as I know. The lmerTest package has a function called step, which will reduce your model to just the significant parameters (fixed and random) based on a number of different tests. The documentation isn't entirely clear on how everything is done, so I much prefer to use model comparison to get at specific tests.
For your model, you could test the random slope by specifying:
mod0 <- lmer(Out ~ Var1 + (1 + Var2 | Var3), data = dataset, REML=TRUE)
mod1 <- lmer(Out ~ Var1 + (1 | Var3), data = dataset, REML=TRUE)
anova(mod0, mod1, refit=FALSE)
This will show you the likelihood-ratio test and its test statistic (chi-square distributed). But you are testing two parameters here: the random slope of Var2 and the covariance between the random slope and the random intercept. So you need a p-value adjustment:
chi <- anova(mod0, mod1, refit = FALSE)$Chisq[[2]]
1 - (0.5 * pchisq(chi, df = 2) + 0.5 * pchisq(chi, df = 1))
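That adjustment is a 50:50 mixture of chi-square distributions with df = 1 and df = 2, which accounts for the variance parameter sitting on the boundary of its parameter space under the null. A small base-R helper, with a made-up test statistic for illustration:

```r
# 50:50 chi-square mixture reference distribution, used when removing
# one random slope plus its covariance with the random intercept.
pval_mixture <- function(chisq_stat) {
  1 - (0.5 * pchisq(chisq_stat, df = 2) +
       0.5 * pchisq(chisq_stat, df = 1))
}

# Hypothetical likelihood-ratio statistic, for illustration only:
x <- 3.84
p_mix <- pval_mixture(x)
p_df1 <- 1 - pchisq(x, df = 1)   # naive p-value with df = 1
p_df2 <- 1 - pchisq(x, df = 2)   # naive p-value with df = 2
```

The mixture p-value always lands between the plain df = 1 and df = 2 p-values, so the naive df = 2 test is conservative for this comparison.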

Organizing ranef.mer in ascending or descending order

I'm trying to figure out how to organize the ranef.mer list of random effects from a simple lmer model with only random intercepts and one variable (sex).
fit.b <- lmer(Math ~ 1 + Sex + (1+Sex|SchoolID), data=pisa_com, REML=FALSE)
I've plotted the random effects using qqmath, but I either need to be able to label each of the random effects by their cluster number (in this case, schools), or organize the ranef.mer output.
Solved this last night. The ranef.mer can be coerced into a dataframe.
I fit the model:
fit.b <- lmer(Math ~ 1 + Sex + (1+Sex|SchoolID), data=pisa_com, REML=FALSE)
Then coerced it into a dataframe by including the identifying variable
random.effects <- as.data.frame(ranef(fit.b)$SchoolID)
Then wrote it to a .csv for sorting in Excel:
write.csv(random.effects, file="~/folder/file.name.csv")
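The Excel round-trip isn't necessary: the coerced data frame can be sorted directly in R with order(). A sketch using a made-up data frame shaped like the as.data.frame(ranef(fit.b)$SchoolID) output (school labels and values are invented):

```r
# Made-up stand-in for as.data.frame(ranef(fit.b)$SchoolID):
# one row per school, columns for the random intercept and Sex slope.
random.effects <- data.frame(
  `(Intercept)` = c(0.52, -1.10, 0.03, 0.88, -0.41),
  Sex           = c(-0.20, 0.35, 0.10, -0.05, 0.22),
  row.names     = c("school1", "school2", "school3", "school4", "school5"),
  check.names   = FALSE
)

# Sort schools by their random intercept, descending; the row names
# (the SchoolID clusters) travel with the rows, so the ordering keeps
# each effect labelled by its school.
sorted <- random.effects[order(random.effects$`(Intercept)`,
                               decreasing = TRUE), ]
```

The same row names can also be used to label points in the qqmath plot.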

Compare large sets of weighted tag clouds?

I have thousands of large sets of tag cloud data; I can retrieve a weighted tag cloud for each set with a simple select/group statement, for example:
SELECT tag, COUNT( * ) AS weight
FROM tags
WHERE set_id = $set_id
GROUP BY tag
ORDER BY COUNT( * ) DESC
What I'd like to know is this: what is the best way to compare weighted tag clouds and find the other sets that are most similar, taking the weight (the number of occurrences within the set) into account, and possibly even computing a comparison score, all in one somewhat efficient statement?
I found the web to be lacking quality literature on the topic, thought it somewhat broadly relevant, and tried to abstract my example to keep it generally applicable.
First you need to normalize every tag cloud as you would a vector, treating a tag cloud as an n-dimensional vector in which every dimension represents a word and its value represents the weight of that word.
You can do this by calculating the norm (or magnitude) of every cloud, which is the square root of the sum of the squared weights:
m = sqrt( w1*w1 + w2*w2 + ... + wn*wn)
then you generate your normalized tag cloud by dividing each weight by the norm of the cloud.
After this you can easily calculate similarity using a scalar (dot) product between the clouds: multiply the corresponding components of each pair and add them all together. E.g.:
v1 = { a: 0.12, b: 0.31; c: 0.17; e: 0.11 }
v2 = { a: 0.21, b: 0.11; d: 0.08; e: 0.28 }
similarity = v1.a*v2.a + v1.b*v2.b + 0 + 0 + v1.e*v2.e
If one vector has a tag that the other one doesn't, then that specific product is simply 0.
This similarity lies within the range [0, 1]: 0 means the clouds share no tags, while 1 means the clouds are identical.
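The normalize-then-dot-product recipe above is cosine similarity, and it can be computed outside the database once the (tag, weight) pairs are fetched. A minimal base-R sketch, with two small made-up clouds stored as named vectors:

```r
# Cosine similarity between two weighted tag clouds, given as named
# vectors of tag weights. Tags missing from one cloud contribute 0.
cloud_similarity <- function(v1, v2) {
  tags <- union(names(v1), names(v2))
  a <- setNames(rep(0, length(tags)), tags); a[names(v1)] <- v1
  b <- setNames(rep(0, length(tags)), tags); b[names(v2)] <- v2
  a <- a / sqrt(sum(a^2))   # normalize each cloud to unit length
  b <- b / sqrt(sum(b^2))
  sum(a * b)                # scalar (dot) product of the aligned vectors
}

# The two example clouds from the answer:
v1 <- c(a = 0.12, b = 0.31, c = 0.17, e = 0.11)
v2 <- c(a = 0.21, b = 0.11, d = 0.08, e = 0.28)
sim <- cloud_similarity(v1, v2)
```

Scoring one set against the thousands of others is then a loop (or matrix multiplication) over precomputed normalized vectors, which is usually far cheaper than trying to express the comparison in a single SQL statement.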