p values for random effects in lmer - lme4

I am working on a mixed model using the lmer function. I want to obtain p-values for all the fixed and random effects. I am able to obtain p-values for the fixed effects using different methods, but I haven't found anything for the random effects. Every method I have found on the internet involves fitting a null model and then getting the p-values by comparison. Is there a method that doesn't require fitting another model?
My model looks like:
mod1 = lmer(Out ~ Var1 + (1 + Var2 | Var3), data = dataset)

You must do this through model comparison, as far as I know. The lmerTest package has a function called step, which will reduce your model to just the significant parameters (fixed and random) based on a number of different tests. The documentation isn't entirely clear on how everything is done, so I much prefer to use model comparison to get at specific tests.
For your model, you could test the random slope by specifying:
mod0 <- lmer(Out ~ Var1 + (1 + Var2 | Var3), data = dataset, REML=TRUE)
mod1 <- lmer(Out ~ Var1 + (1 | Var3), data = dataset, REML=TRUE)
anova(mod0, mod1, refit=FALSE)
This will show you the likelihood-ratio test and its test statistic (chi-square distributed). But you are testing two parameters here: the variance of the random slope of Var2 and the covariance between the random slopes and random intercepts. So you need a p-value adjustment:
1-(.5*pchisq(anova(mod0,mod1, refit=FALSE)$Chisq[[2]],df=2)+
.5*pchisq(anova(mod0,mod1, refit=FALSE)$Chisq[[2]],df=1))
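If you would rather not write out the reduced models yourself, the lmerTest package also provides ranova(), which performs these likelihood-ratio tests of single random-effect term deletions for you. A minimal sketch (assuming lmerTest is installed; as far as I know its p-values use the plain chi-square reference rather than the mixture above, so they can be conservative):
library(lmerTest)
# lmerTest's lmer() wraps lme4::lmer(); ranova() then drops each random-effect
# term in turn and reports a likelihood-ratio test for it
mod <- lmer(Out ~ Var1 + (1 + Var2 | Var3), data = dataset)
ranova(mod)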
More on those tests here or here.

Related

Pytorch : different behaviours in GAN training with different, but conceptually equivalent, code

I'm trying to implement a simple GAN in Pytorch. The following training code works:
for epoch in range(max_epochs):  # loop over the dataset multiple times
    print(f'epoch: {epoch}')
    running_loss = 0.0
    for batch_idx, (data, _) in enumerate(data_gen_fn):
        # data preparation
        real_data = data
        input_shape = real_data.shape
        inputs_generator = torch.randn(*input_shape).detach()

        # generator forward
        fake_data = generator(inputs_generator).detach()

        # discriminator forward
        optimizer_generator.zero_grad()
        optimizer_discriminator.zero_grad()

        #################### ALERT CODE #######################
        predictions_on_real = discriminator(real_data)
        predictions_on_fake = discriminator(fake_data)
        predictions = torch.cat((predictions_on_real,
                                 predictions_on_fake), dim=0)
        #########################################################

        # loss discriminator
        labels_real_fake = torch.tensor([1]*batch_size + [0]*batch_size)
        loss_discriminator_batch = criterion_discriminator(predictions,
                                                           labels_real_fake)
        # update discriminator
        loss_discriminator_batch.backward()
        optimizer_discriminator.step()

        # generator
        # zero the parameter gradients
        optimizer_discriminator.zero_grad()
        optimizer_generator.zero_grad()

        fake_data = generator(inputs_generator)  # make the fake data again, but without detaching
        predictions_on_fake = discriminator(fake_data)  # D(G(encoding))

        # loss generator
        labels_fake = torch.tensor([1]*batch_size)
        loss_generator_batch = criterion_generator(predictions_on_fake,
                                                   labels_fake)
        loss_generator_batch.backward()  # dL(D(G(encoding)))/dW_{G,D}
        optimizer_generator.step()
If I plot the generated images for each iteration, I see that the generated images look like the real ones, so the training procedure seems to work well.
However, if I try to change the code in the ALERT CODE part, i.e., instead of:
#################### ALERT CODE #######################
predictions_on_real = discriminator(real_data)
predictions_on_fake = discriminator(fake_data)
predictions = torch.cat((predictions_on_real,
predictions_on_fake), dim=0)
#########################################################
I use the following:
#################### ALERT CODE #######################
predictions = discriminator(torch.cat( (real_data, fake_data), dim=0))
#######################################################
This is conceptually the same: in a nutshell, instead of doing two separate forward passes on the discriminator, one on the real data and one on the fake data, and then concatenating the results, the new code first concatenates the real and fake data and then makes a single forward pass on the concatenated batch.
However, this version of the code does not work: the generated images always look like random noise.
Is there any explanation for this behavior?
Why do we get different results?
Supplying inputs in the same batch or in separate batches can make a difference if the model includes dependencies between different elements of the batch. By far the most common source of such dependencies in current deep learning models is batch normalization. As you mentioned, the discriminator does include batchnorm, so this is likely the reason for the different behaviors. Here is an example, using single numbers and a batch size of 4:
import numpy as np

features = [1., 2., 5., 6.]
print("mean {}, std {}".format(np.mean(features), np.std(features)))
print("normalized features", (features - np.mean(features)) / np.std(features))
>>>mean 3.5, std 2.0615528128088303
>>>normalized features [-1.21267813 -0.72760688 0.72760688 1.21267813]
Now we split the batch into two parts. First part:
features = [1., 2.]
print("mean {}, std {}".format(np.mean(features), np.std(features)))
print("normalized features", (features - np.mean(features)) / np.std(features))
>>>mean 1.5, std 0.5
>>>normalized features [-1. 1.]
Second part:
features = [5., 6.]
print("mean {}, std {}".format(np.mean(features), np.std(features)))
print("normalized features", (features - np.mean(features)) / np.std(features))
>>>mean 5.5, std 0.5
>>>normalized features [-1. 1.]
As we can see, in the split-batch version, the two batches are normalized to the exact same numbers, even though the inputs are very different. In the joint-batch version, on the other hand, the larger numbers are still larger than the smaller ones as they are normalized using the same statistics.
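You can see the same effect directly with a batchnorm layer (a sketch, not taken from the original code, using torch.nn.BatchNorm1d in training mode on the numbers above):
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(num_features=1)
bn.train()  # training mode: normalize with the statistics of the current batch

real = torch.tensor([[1.], [2.]])
fake = torch.tensor([[5.], [6.]])

joint = bn(torch.cat((real, fake), dim=0))         # one forward pass on the joint batch
separate = torch.cat((bn(real), bn(fake)), dim=0)  # two forward passes on separate batches

print(joint.flatten())     # roughly [-1.21, -0.73, 0.73, 1.21]
print(separate.flatten())  # roughly [-1.00, 1.00, -1.00, 1.00]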
Why does this matter?
With deep learning, it's always hard to say, and especially with GANs and their complex training dynamics. A possible explanation is that, as we can see in the example above, the separate batches result in more similar features after normalization even if the original inputs are quite different. This may help early in training, as the generator tends to output "garbage" which has very different statistics from real data.
With a joint batch, these differing statistics make it easy for the discriminator to tell the real and generated data apart, and we end up in a situation where the discriminator "overpowers" the generator.
By using separate batches, however, the different normalizations make the generated and real data look more similar, which makes the task less trivial for the discriminator and allows the generator to learn.

Created factors with EFA, tried regressing (lm) with control variables - Error message "variable lengths differ"

EFA first-timer here!
I ran an Exploratory Factor Analysis (EFA) on a data set ("df1", 1320 observations) with 50 variables, by creating a subset containing only the relevant variables with no missing values ("df2", 301 observations).
I was able to extract 4 factors (19 variables in total).
Now I would like to take those 4 factors and regress them on control variables.
For instance: Factor 1 (df2$fa1) describes job satisfaction.
I would like to control for age and marital status.
Fa1Regression <- lm(df2$fa1 ~ df1$age + df1$marital)
However I receive the error message:
Error in model.frame.default(formula = df2$fa1 ~ df1$age + :
variable lengths differ (found for 'df1$age')
What can I do to run the regression correctly? Can I delete the observations from df1 that do not appear in df2 so that the variable lengths are the same?
It's having a problem using lm to regress a latent factor on other covariates. Instead, use the lavaan package, where your model statement would look like myModel <- 'fa1 ~ x1 + x2 + x3'.
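A minimal sketch of that lavaan route (assuming the factor scores and the controls have first been merged into a single data frame, here hypothetically called dat, with columns fa1, age and marital):
library(lavaan)
# regress the factor-score variable on the controls; lavaan model syntax uses
# bare column names, and the data frame is passed via the data= argument
myModel <- 'fa1 ~ age + marital'
fit <- sem(myModel, data = dat)
# note: a categorical control such as marital may need to be dummy-coded first
summary(fit)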

Backpropagation on Two Layered Networks

I have been following Stanford's CS231n lectures and trying to complete the assignments on my own, sharing my solutions both on GitHub and on my blog. But I'm having a hard time understanding how to model backpropagation. I can code modular forward and backward passes, but what bothers me is the model below: a two-layer neural network.
Let's assume that our loss function here is a softmax loss. In my modular softmax_loss() function I am calculating the loss and the gradient with respect to the scores (dSoft = dL/dY). After that, when following the graph backwards, say for b2, db2 would be equal to dSoft*1, and dW2 would be equal to dSoft*dX2 (the outputs of the ReLU gate). What's the chain rule here? Why isn't dSoft equal to 1? Because dL/dL would be 1?
The softmax function outputs a number given an input x.
What dSoft means is that you're computing the derivative of the function softmax(x) with respect to its input x. To calculate the derivative with respect to the weights W of the last layer you use the chain rule, i.e. dL/dW = dsoftmax/dx * dx/dW. Note that x = W*x_prev + b, where x_prev is the input to the last node. Therefore dx/dW is just x_prev and dx/db is just 1, which means that dL/dW (or simply dW) is dsoftmax/dx * x_prev and dL/db (or simply db) is dsoftmax/dx * 1. Note that dsoftmax/dx here is the dSoft we defined earlier.
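To make that concrete, here is a small sketch (using the usual CS231n array shapes, not your actual assignment code) of the backward pass through the last affine layer, where scores = h.dot(W2) + b2 and h is the ReLU output:
import numpy as np

N, H, C = 4, 5, 3                      # batch size, hidden units, classes
h = np.random.randn(N, H)              # ReLU outputs feeding the last layer (X2)
W2 = np.random.randn(H, C)
b2 = np.zeros(C)
y = np.array([0, 2, 1, 0])             # ground-truth class labels

scores = h.dot(W2) + b2                # forward pass of the last affine layer

# softmax loss and its gradient w.r.t. the scores (this is "dSoft" = dL/dY)
shifted = scores - scores.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(N), y]).mean()
dscores = probs.copy()
dscores[np.arange(N), y] -= 1
dscores /= N

# chain rule through scores = h*W2 + b2
dW2 = h.T.dot(dscores)                 # dL/dW2 = h^T * dSoft  (the x_prev term)
db2 = dscores.sum(axis=0)              # dL/db2 = dSoft * 1, summed over the batch
dh = dscores.dot(W2.T)                 # gradient flowing back into the ReLU gate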

Simulation from a nested glmer model

I'm having a problem generating simulations from a 3 level glmer model when conditioning on the random effects (I'm actually using predict via bootMer but the problem is the same).
This works:
library(lme4)
fit1 = glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial)
simulate(fit1, re.form=NULL)
This fails:
cbpp$bigherd = rep(1:7, 8)
fit2 = glmer(cbind(incidence, size - incidence) ~ period + (1 | bigherd / herd),
data = cbpp, family = binomial)
simulate(fit2, re.form=NULL)
Error: No random effects terms specified in formula
Many thanks for any ideas.
Update
Ben, many thanks for your help below, really appreciate it. I wonder if I can impose on you again.
What I want to do is simulate predictions on the response scale, and I'm not sure if I can use your workaround, or if there is an alternative to what I'm doing. Thank you!
This works as expected, but is not conditional on random effects:
FUN = function(.){
predict(., type="response")
}
bootMer(fit2, FUN, nsim=3)$t
This doesn't work, as would be expected given above problem:
bootMer(fit2, FUN, nsim=3, use.u=TRUE)$t
As far as I can see, I can't pass re.form to bootMer.
Does the alternative below result in simulated predictions conditional on random effects without passing use.u to bootMer?
FUN = function(.){
predict(., type="response", re.form=~(1|herd:bigherd) + (1|bigherd))
}
bootMer(fit2, FUN, nsim=10)$t
I'm not sure what's going on yet, but here are two workarounds that do work:
simulate(fit2, re.form=lme4:::reOnly(formula(fit2)))
simulate(fit2, re.form=~(1|herd:bigherd) + (1|bigherd))
There must be something going wrong with the expansion of the "slash" term, because this doesn't work:
simulate(fit2, re.form=~(1|bigherd/herd))
I've posted this as an lme4 issue
These workarounds don't work for bootMer (which only takes the use.u argument, not re.form) in the current CRAN release (1.1-9).
It is fixed in the development version on Github (1.1-10): devtools::install_github("lme4/lme4") will install it, if you have compilation tools installed.
In the meantime you could just go ahead and implement your own parametric bootstrap (for parametric bootstrapping, bootMer is actually a very thin wrapper around simulate() / [refit() or update()] / FUN). Much of the complication has to do with parallel computation (you'd have to add some of it back in if you want parallel computation in your own PB implementation).
This is the outline of a hand-rolled parametric bootstrap:
nboot <- 10
nresp <- length(FUN(orig_fit))
res <- matrix(NA, nboot, nresp)
for (i in 1:nboot) {
    res[i, ] <- FUN(update(orig_fit, data = simulate(orig_fit, ...)))
    ## or use refit() for LMMs
    ## ... are options to simulate()
}
t(apply(res, 2, quantile, c(0.025, 0.975)))

Dynamic Topic model output - Blei format

I am working with the Dynamic Topic Models package that was developed by Blei. I am new to LDA, but I understand it.
I would like to know what the output file
lda-seq/topic-000-var-obs.dat stores.
I know that lda-seq/topic-001-var-e-log-prob.dat stores the log of the variational posterior, and by applying the exponential to it I get the probability of each word within topic 001.
Thanks
topic-000-var-e-log-prob.dat stores the log of the variational posterior for topic 1.
topic-001-var-e-log-prob.dat stores the log of the variational posterior for topic 2.
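For example, to turn one of those e-log-prob files back into word probabilities (a sketch, assuming the example corpus with 10 time slices and the same words-by-time layout as the var-obs code further down):
temp = scan("dtm/example/model_run/lda-seq/topic-000-var-e-log-prob.dat")
# rows = words, columns = time slices; exponentiate to recover the probabilities
probs = exp(matrix(temp, ncol = 10, byrow = TRUE))
colSums(probs)  # each time slice should sum to roughly 1 over the vocabulary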
I have failed to find a concrete answer anywhere. However, the documentation's sample.sh states:
The code creates at least the following files:
- topic-???-var-e-log-prob.dat: the e-betas (word distributions) for topic ??? for all times.
...
- gam.dat
The fact that it does not mention the topic-000-var-obs.dat file suggests that it is not essential for most analyses.
Speculation
obs suggests observations. After a little digging around in the example/model_run results, I plotted the sum across epochs for each word/token using:
temp = scan("dtm/example/model_run/lda-seq/topic-000-var-obs.dat")
temp.matrix = matrix(temp, ncol = 10, byrow = TRUE)
plot(rowSums(temp.matrix))
The result (plot not shown) is that the general trend of the non-negative values is decreasing, and many values are floored (in this case to -11.00972 = log(1.67e-05)). This suggests that these values are weightings or some other measure of influence on the model: the model removes some tokens, and the influence/importance of the others tapers off over the index. The latter trend may be caused by preprocessing, such as sorting tokens by tf-idf when creating the dictionary.
Interestingly, the row-sum values vary for both the floored tokens and the set with more positive values:
temp = scan("~/Documents/Python/inference/project/dtm/example/model_run/lda-seq/topic-009-var-obs.dat")
temp.matrix = matrix(temp, ncol = 10, byrow = TRUE)
plot(rowSums(temp.matrix))