How do I specify a random slope for a specific contrast in lme4? - regression

With the following dataset...
Subj <- rep(1:10, each = 10)
Item <- rep(1:10, times = 10)
IV1 <- rep(1:5, times = 20)
DV <- rnorm(100)
library(data.table)
data <- as.data.table(cbind(Subj, Item, IV1, DV))
data$Subj <- as.factor(data$Subj)
data$Item <- as.factor(data$Item)
data$IV1 <- as.factor(data$IV1)
library(MASS)
contrasts(data$IV1) <- contr.sdif(5)
library(lme4)
m1 <- lmer(DV ~ IV1 + (1 + IV1|Subj) + (1|Item), data = data)
Now suppose it turned out that the only random subject slope with non-zero variance was the one for the contrast of IV1 level 2 vs. IV1 level 1. Is it possible to fit a random slope for this contrast only?
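One approach that may work, sketched under assumptions: take the numeric column that codes the level 2 vs. level 1 contrast from the fixed-effects model matrix and use it as the only by-subject random slope. The names `dat`, `c21` and `m2` are hypothetical, and the data are rebuilt as a plain data.frame so that the block runs on its own.

```r
library(MASS)   # contr.sdif()
library(lme4)   # lmer()

set.seed(1)
dat <- data.frame(
  Subj = factor(rep(1:10, each = 10)),
  Item = factor(rep(1:10, times = 10)),
  IV1  = factor(rep(1:5, times = 20)),
  DV   = rnorm(100)
)
contrasts(dat$IV1) <- contr.sdif(5)

# Column 1 of the model matrix is the intercept; column 2 codes the
# first successive-difference contrast (level 2 vs. level 1).
mm <- model.matrix(~ IV1, data = dat)
dat$c21 <- mm[, 2]

# Full fixed effect of IV1, but a by-subject random slope for c21 only
# (c21 and m2 are illustrative names, not part of the original question).
m2 <- lmer(DV ~ IV1 + (1 + c21 | Subj) + (1 | Item), data = dat)
VarCorr(m2)
```

Because the other three contrast columns are left out of the random part, their slope variances are implicitly fixed at zero. With a pure-noise DV like this one, lmer may report a singular fit, which is expected here.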

Related

Combining Multiple Plots in R Together

Using the "plotly" library in R - I generated some random data and made some interactive data visualizations:
library(plotly)
library(ggplot2)
library(dplyr)
library(hrbrthemes)
#subplot 1
data1 <- data.frame(
day = as.Date("2017-06-14") - 0:364,
value = runif(365) - seq(-140, 224)^2 / 10000
)
p1 <- ggplot(data1, aes(x=day, y=value)) +
geom_line( color="#69b3a2") +
xlab("") +
theme_ipsum() +
theme(axis.text.x=element_text(angle=60, hjust=1))
fig1 <- ggplotly(p1)
scatter_1 = data.frame(x = rnorm(100,100,100), y = rnorm(100,100,100))
fig2 <- plot_ly(data = scatter_1, x = ~x, y = ~y)
#subplot 2
data2 <- data.frame(
day = as.Date("2017-06-14") - 0:364,
value = runif(365) - seq(-140, 224)^2 / 10000
)
p2 <- ggplot(data2, aes(x=day, y=value)) +
geom_line( color="#69b3a2") +
xlab("") +
theme_ipsum() +
theme(axis.text.x=element_text(angle=60, hjust=1))
fig3 <- ggplotly(p2)
scatter_2 = data.frame(x = rnorm(100,100,100), y = rnorm(100,100,100))
fig4 <- plot_ly(data = scatter_2, x = ~x, y = ~y)
#subplot 3
data3 <- data.frame(
day = as.Date("2017-06-14") - 0:364,
value = runif(365) - seq(-140, 224)^2 / 10000
)
p3 <- ggplot(data3, aes(x=day, y=value)) +
geom_line( color="#69b3a2") +
xlab("") +
theme_ipsum() +
theme(axis.text.x=element_text(angle=60, hjust=1))
fig5 <- ggplotly(p3)
scatter_3 = data.frame(x = rnorm(100,100,100), y = rnorm(100,100,100))
fig6 <- plot_ly(data = scatter_3, x = ~x, y = ~y)
After this, I used the "subplot" function in the "plotly" library to make the three following subplots:
subplot1 <- subplot(fig1, fig2, nrows = 1, margin = 0.05)
subplot2 <- subplot(fig3, fig4, nrows = 2, margin = 0.05)
subplot3 <- subplot(fig5, fig6, nrows = 2, margin = 0.05)
I was wondering if it's possible to combine these three subplots into a single object (one that can later be saved as an HTML file, e.g. using htmlwidgets) that would look something like this:
#pseudocode (e.g. imagine some "wrap" function)
results = wrap(subplot1, subplot2, subplot3)
saveWidget( results, "results.html")
That is, combine the plots in such a way that the user can navigate between these 3 subplots. Is this possible?
There are many ways to combine these. The easiest way is probably using subplot again.
subplot(subplot1, subplot2, subplot3, nrows = 3, margin = .05,
heights = c(.2, .4, .4)) # proportion of subplot height
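Since the question also asks about saving the result, here is a minimal sketch of the nested-subplot approach followed by htmlwidgets::saveWidget(). The helper `make_fig` and the small random figures are stand-ins for fig1 to fig6, and the output path in a temporary directory is an assumption made so the example is self-contained.

```r
library(plotly)
library(htmlwidgets)

set.seed(1)
# Hypothetical helper: a small scatter plot standing in for fig1..fig6.
make_fig <- function() {
  d <- data.frame(x = rnorm(50), y = rnorm(50))
  plot_ly(d, x = ~x, y = ~y, type = "scatter", mode = "markers")
}

sub_a <- subplot(make_fig(), make_fig(), nrows = 1, margin = 0.05)
sub_b <- subplot(make_fig(), make_fig(), nrows = 2, margin = 0.05)
sub_c <- subplot(make_fig(), make_fig(), nrows = 2, margin = 0.05)

# Stack the three subplots vertically; heights are proportions of the total.
combined <- subplot(sub_a, sub_b, sub_c, nrows = 3, margin = 0.05,
                    heights = c(0.2, 0.4, 0.4))

# Write the widget to disk; selfcontained = FALSE avoids needing pandoc
# and places the JavaScript dependencies in a folder next to the file.
out_file <- file.path(tempdir(), "results.html")
saveWidget(combined, out_file, selfcontained = FALSE)
```

This gives one scrollable page rather than a way to switch between the three subplots.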
Another option is an R Markdown HTML document that shows each subplot under a dropdown menu, using {.tabset .tabset-dropdown}. Here is a reproducible example:
---
title: "Combining Multiple Plots in R Together"
date: "2022-08-28"
output: html_document
---
# Subplots {.tabset .tabset-dropdown}
```{r, warning=FALSE, echo=FALSE, message=FALSE}
library(plotly)
library(ggplot2)
library(dplyr)
library(hrbrthemes)
#subplot 1
data1 <- data.frame(
day = as.Date("2017-06-14") - 0:364,
value = runif(365) - seq(-140, 224)^2 / 10000
)
p1 <- ggplot(data1, aes(x=day, y=value)) +
geom_line( color="#69b3a2") +
xlab("") +
theme_ipsum() +
theme(axis.text.x=element_text(angle=60, hjust=1))
fig1 <- ggplotly(p1)
scatter_1 = data.frame(x = rnorm(100,100,100), y = rnorm(100,100,100))
fig2 <- plot_ly(data = scatter_1, x = ~x, y = ~y)
#subplot 2
data2 <- data.frame(
day = as.Date("2017-06-14") - 0:364,
value = runif(365) - seq(-140, 224)^2 / 10000
)
p2 <- ggplot(data2, aes(x=day, y=value)) +
geom_line( color="#69b3a2") +
xlab("") +
theme_ipsum() +
theme(axis.text.x=element_text(angle=60, hjust=1))
fig3 <- ggplotly(p2)
scatter_2 = data.frame(x = rnorm(100,100,100), y = rnorm(100,100,100))
fig4 <- plot_ly(data = scatter_2, x = ~x, y = ~y)
#subplot 3
data3 <- data.frame(
day = as.Date("2017-06-14") - 0:364,
value = runif(365) - seq(-140, 224)^2 / 10000
)
p3 <- ggplot(data3, aes(x=day, y=value)) +
geom_line( color="#69b3a2") +
xlab("") +
theme_ipsum() +
theme(axis.text.x=element_text(angle=60, hjust=1))
fig5 <- ggplotly(p3)
scatter_3 = data.frame(x = rnorm(100,100,100), y = rnorm(100,100,100))
fig6 <- plot_ly(data = scatter_3, x = ~x, y = ~y)
subplot1 <- subplot(fig1, fig2, nrows = 1, margin = 0.05)
subplot2 <- subplot(fig3, fig4, nrows = 2, margin = 0.05)
subplot3 <- subplot(fig5, fig6, nrows = 2, margin = 0.05)
```
## subplot 1
```{r, echo=FALSE}
subplot1
```
## subplot 2
```{r, echo=FALSE}
subplot2
```
## subplot 3
```{r, echo=FALSE}
subplot3
```
It sounds like you want to export reactive elements, which I doubt is possible. I know shiny can be modularized to be reusable, but I don't think that's exportable to an htmlwidget. It may be possible if plots 1, 3, 5 and plots 2, 4, 6 are each of the same type and you build a custom slider, but that sounds like a pain to me.

delete_part deletes the top border when outputting pdf

I am using the following rmarkdown code, using xelatex engine:
```{r}
library(flextable)
access <- function(x, ...) {
x <- delete_part(x)
x <- colformat_double(x, big.mark = "'", decimal.mark = ",")
x <- set_table_properties(x, layout = "autofit")
x <- border_remove(x)
std_border <- fp_border_default(width = 1, color = "black")
x <- border_outer(x, part="all", border = std_border )
x <- border_inner_h(x, border = std_border, part="all")
x <- border_inner_v(x, border = std_border, part="all")
autofit(x)
}
firstc <- c("Field:","Table:","Sort:","Show:","Criteria:","Or:")
secondc <- c("Field:","Table:","Sort:","Show:","Criteria:","Or:")
```
```{r echo=FALSE}
tabela <- data.frame(firstc,secondc)
ft <- flextable(tabela)
ft <- access(ft)
ft <- hline_top(ft)
ft <- fit_to_width(ft, max_width = 4)
ft <- set_table_properties(ft, layout = "autofit", width = 1)
ft
```
However, the top hline does not show up in the PDF output.
Any ideas?

How do I add significance asterisks next to my values in a correlation matrix heat map?

I found this code online at http://www.sthda.com/english/wiki/ggplot2-quick-correlation-matrix-heatmap-r-software-and-data-visualization
It provides instructions for creating a correlation matrix heat map, and it works well. However, I was wondering how to get little stars (*) next to the values in the matrix that are significant. How would I go about doing that? Any help is greatly appreciated!
mydata <- mtcars[, c(1,3,4,5,6,7)]
head(mydata)
cormat <- round(cor(mydata),2)
head(cormat)
library(reshape2)
melted_cormat <- melt(cormat)
head(melted_cormat)
library(ggplot2)
ggplot(data = melted_cormat, aes(x=Var1, y=Var2, fill=value)) +
geom_tile()
# Get lower triangle of the correlation matrix
get_lower_tri<-function(cormat){
cormat[upper.tri(cormat)] <- NA
return(cormat)
}
# Get upper triangle of the correlation matrix
get_upper_tri <- function(cormat){
cormat[lower.tri(cormat)]<- NA
return(cormat)
}
upper_tri <- get_upper_tri(cormat)
# Melt the correlation matrix
library(reshape2)
melted_cormat <- melt(upper_tri, na.rm = TRUE)
# Heatmap
library(ggplot2)
ggplot(data = melted_cormat, aes(Var2, Var1, fill = value))+
geom_tile(color = "white")+
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1,1), space = "Lab",
name="Pearson\nCorrelation") +
theme_minimal()+
theme(axis.text.x = element_text(angle = 45, vjust = 1,
size = 12, hjust = 1))+
coord_fixed()
reorder_cormat <- function(cormat){
# Use correlation between variables as distance
dd <- as.dist((1-cormat)/2)
hc <- hclust(dd)
cormat <-cormat[hc$order, hc$order]
}
# Reorder the correlation matrix
cormat <- reorder_cormat(cormat)
upper_tri <- get_upper_tri(cormat)
# Melt the correlation matrix
melted_cormat <- melt(upper_tri, na.rm = TRUE)
# Create a ggheatmap
ggheatmap <- ggplot(melted_cormat, aes(Var2, Var1, fill = value))+
geom_tile(color = "white")+
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1,1), space = "Lab",
name="Pearson\nCorrelation") +
theme_minimal()+ # minimal theme
theme(axis.text.x = element_text(angle = 45, vjust = 1,
size = 12, hjust = 1))+
coord_fixed()
# Print the heatmap
print(ggheatmap)
ggheatmap +
geom_text(aes(Var2, Var1, label = value), color = "black", size = 4) +
theme(
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid.major = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.ticks = element_blank(),
legend.justification = c(1, 0),
legend.position = c(0.6, 0.7),
legend.direction = "horizontal")+
guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
title.position = "top", title.hjust = 0.5))
cor() doesn't report significance levels, so you may have to use rcorr() from the Hmisc package.
The following is quite similar to what you want (although the graphical output is not as nice):
library(ggplot2)
library(reshape2)
library(Hmisc)
library(stats)
abbreviateSTR <- function(value, prefix){ # format string more concisely
lst = c()
for (item in value) {
if (is.nan(item) || is.na(item)) { # if item is NaN return empty string
lst <- c(lst, '')
next
}
item <- round(item, 2) # round to two digits
if (item == 0) { # if rounding results in 0 clarify
item = '<.01'
}
item <- as.character(item)
item <- sub("(^[0])+", "", item) # remove leading 0: 0.05 -> .05
item <- sub("(^-[0])+", "-", item) # remove leading -0: -0.05 -> -.05
lst <- c(lst, paste(prefix, item, sep = ""))
}
return(lst)
}
d <- mtcars
cormatrix = rcorr(as.matrix(d), type='spearman')
cordata = melt(cormatrix$r)
cordata$labelr = abbreviateSTR(melt(cormatrix$r)$value, 'r')
cordata$labelP = abbreviateSTR(melt(cormatrix$P)$value, 'P')
cordata$label = paste(cordata$labelr, "\n",
cordata$labelP, sep = "")
cordata$strike = ""
cordata$strike[cormatrix$P > 0.05] = "X"
txtsize <- par('din')[2] / 2
ggplot(cordata, aes(x=Var1, y=Var2, fill=value)) + geom_tile() +
theme(axis.text.x = element_text(angle=90, hjust=TRUE)) +
xlab("") + ylab("") +
geom_text(label=cordata$label, size=txtsize) +
geom_text(label=cordata$strike, size=txtsize * 4, color="red", alpha=0.4)
In Python, the same idea looks like this. Here correlation_p holds the p-values of the correlation matrix, and ax4 is the axes object returned by sns.heatmap:
data = correlation_p
for y in range(data.shape[0]):
    for x in range(data.shape[1]):
        if data[y, x] < 0.1:
            # mark the cell at its centre
            ax4.text(x + 0.5, y + 0.5, '-', size=48,
                     horizontalalignment='center',
                     verticalalignment='center')
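Returning to the R heatmap from the question, a minimal sketch of the asterisk idea could look like the following. It assumes the usual cut-offs of 0.05, 0.01 and 0.001, computes p-values with stats::cor.test() instead of Hmisc, and uses hypothetical names (`p_to_stars`, `tiles`, `heat`) that do not appear in the original code.

```r
library(ggplot2)

mydata <- mtcars[, c(1, 3, 4, 5, 6, 7)]
vars <- colnames(mydata)

# Map p-values to conventional star labels (NA gives no stars).
p_to_stars <- function(p) {
  ifelse(is.na(p), "",
    ifelse(p < 0.001, "***",
      ifelse(p < 0.01, "**",
        ifelse(p < 0.05, "*", ""))))
}

# One row per tile of the upper triangle, diagonal included.
tiles <- expand.grid(Var1 = vars, Var2 = vars, stringsAsFactors = FALSE)
tiles <- tiles[match(tiles$Var1, vars) <= match(tiles$Var2, vars), ]

# Pearson r for every pair; p-value from cor.test(), skipped on the diagonal.
tiles$r <- mapply(function(a, b) cor(mydata[[a]], mydata[[b]]),
                  tiles$Var1, tiles$Var2, USE.NAMES = FALSE)
tiles$p <- mapply(function(a, b) {
  if (a == b) NA_real_ else cor.test(mydata[[a]], mydata[[b]])$p.value
}, tiles$Var1, tiles$Var2, USE.NAMES = FALSE)

# Label such as "-0.87***": rounded r followed by its stars.
tiles$label <- paste0(sprintf("%.2f", tiles$r), p_to_stars(tiles$p))

# Keep the axis order of the data columns.
tiles$Var1 <- factor(tiles$Var1, levels = vars)
tiles$Var2 <- factor(tiles$Var2, levels = vars)

heat <- ggplot(tiles, aes(Var2, Var1, fill = r)) +
  geom_tile(color = "white") +
  geom_text(aes(label = label), color = "black", size = 4) +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limits = c(-1, 1),
                       name = "Pearson\nCorrelation") +
  theme_minimal() +
  coord_fixed()
print(heat)
```

The p-values here are not adjusted for multiple comparisons; whether that matters depends on how the stars will be interpreted.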

Loop for regression over multiple factors

I am struggling to get a loop to run several regressions and store the coefficients and intercepts. I have data similar to this:
data <- data.frame(y = rnorm(10), x1 = rnorm(10)*2, ID = c(rep(1,10), rep(2,10)), group = c(rep(3,5), rep(4,5)))
Where ID and group are factors, therefore:
data$ID <- as.factor(data$ID)
data$group <- as.factor(data$group)
So far I tried 2 approaches.
First I did the following:
for (i in unique(data$ID)){
for (j in unique(data$group)){
fit <- glm(y ~ x1, data=data[data$ID == i & data$group == j, ])
}
}
Afterwards I did the following:
myfun <- function(data) {
step(glm(y ~ x1, data = data), trace=0)
}
fcomb <- unique(data[,c("ID","group")])
mod <- list()
for(i in 1:nrow(fcomb)) {
mod <- c(mod,list(myfun(subset(data,ID==fcomb$ID[i] & group==fcomb$group[i]))))
}
In the end I would like to have a dataset that contains, for each combination of ID and group, the intercept and the beta for the effect of x1 on y.
With the second strategy I got something, but the betas and the intercepts are identical across models (which should be impossible), and I still don't know how to store the values.
set.seed(1839)
data <- data.frame(
y = rnorm(10),
x1 = rnorm(10) * 2,
ID = c(rep(1, 10), rep(2, 10)),
group = c(rep(3, 5), rep(4, 5))
)
grid <- expand.grid(ID = unique(data$ID), group = unique(data$group))
results <- lapply(1:nrow(grid), function(x) {
lm(y ~ x1, data[data$ID == grid[x, 1] & data$group == grid[x, 2], ])$coef
})
results <- t(do.call(cbind, results))
results <- cbind(grid, results)
results
Returns:
  ID group  (Intercept)         x1
1  1     3 -0.454072247  1.0295731
2  2     3 -0.454072247  1.0295731
3  1     4  0.007800405 -0.1832663
4  2     4  0.007800405 -0.1832663
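The identical coefficients for ID 1 and ID 2 come from the example data rather than from the loop: y and x1 have length 10 while ID has length 20, so data.frame() recycles the shorter vectors and rows 11 to 20 repeat the y and x1 values of rows 1 to 10. A minimal sketch with 20 distinct observations, using split() as one possible way to collect the coefficients (the names `pieces`, `coefs` and `keys` are illustrative):

```r
set.seed(1839)
data <- data.frame(
  y     = rnorm(20),                                 # 20 distinct values, no recycling
  x1    = rnorm(20) * 2,
  ID    = factor(rep(1:2, each = 10)),
  group = factor(rep(rep(3:4, each = 5), times = 2))
)

# One data frame per ID/group combination that actually occurs.
pieces <- split(data, list(data$ID, data$group), drop = TRUE)

# Intercept and slope of y ~ x1 within each piece, one row per piece.
coefs <- t(sapply(pieces, function(d) coef(lm(y ~ x1, data = d))))

# ID and group of each piece, taken from its first row.
keys <- do.call(rbind, lapply(pieces, function(d) d[1, c("ID", "group")]))

results <- cbind(keys, coefs)
results
```

Each row of `results` now holds the ID, the group, the intercept and the x1 slope for one subset, and the four fits differ because the subsets no longer share data.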

Predict sentiment score using multiclass logistic regression with R

I am trying to create a sentiment analysis classifier using logistic regression in R (glmnet). Here is the R code:
library(tidyverse)
library(text2vec)
library(caret)
library(glmnet)
library(ggrepel)
Train_classifier <- read.csv('IRC.csv',header=T, sep=";")
Test_classifier <- read.csv('IRC2.csv',header=T, sep=";")
# select only 4 column of the dataframe
Train <- Train_classifier[, c("Note.Reco", "Raison.Reco", "DATE_SAISIE", "idpart")]
Test <- Test_classifier[, c("Note.Reco", "Raison.Reco", "DATE_SAISIE", "idpart")]
# delete rows whose Raison.Reco is empty or whitespace only
subTrain <- filter(Train, trimws(Raison.Reco) != "")
subTrain$ID <- seq.int(nrow(subTrain))
# # replacing class values
subTrain$Note.Reco = ifelse(subTrain$Note.Reco >= 0 & subTrain$Note.Reco <= 4, 0, ifelse(subTrain$Note.Reco >= 5 &
subTrain$Note.Reco <= 6, 1, ifelse(subTrain$Note.Reco >= 7 & subTrain$Note.Reco <= 8, 2, 3)))
subTest <- filter(Test, trimws(Raison.Reco) != "")
subTest$ID <- seq.int(nrow(subTest))
#Data pre processing
#Doc2Vec
prep_fun <- tolower
tok_fun <- word_tokenizer
subTrain[] <- lapply(subTrain, as.character)
it_train <- itoken(subTrain$Raison.Reco,
preprocessor = prep_fun,
tokenizer = tok_fun,
ids = subTrain$ID,
progressbar = TRUE)
subTest[] <- lapply(subTest, as.character)
it_test <- itoken(subTest$Raison.Reco,
preprocessor = prep_fun,
tokenizer = tok_fun,
ids = subTest$ID,
progressbar = TRUE)
# creation of vocabulary and document-term matrix
### training data
vocab_train <- create_vocabulary(it_train)
vectorizer_train <- vocab_vectorizer(vocab_train)
dtm_train <- create_dtm(it_train, vectorizer_train)
### test data
vocab_test <- create_vocabulary(it_test)
vectorizer_test <- vocab_vectorizer(vocab_test)
dtm_test <- create_dtm(it_test, vectorizer_test)
##Define tf-idf model
tfidf <- TfIdf$new()
# fit the model to the train data and transform it with the fitted model
dtm_train_tfidf <- fit_transform(dtm_train, tfidf)
dtm_test_tfidf <- fit_transform(dtm_test, tfidf)
glmnet_classifier <- cv.glmnet(x = dtm_train_tfidf,
y = subTrain[['Note.Reco']],
family = 'multinomial',
# L1 penalty
alpha = 1,
# interested in the area under ROC curve
type.measure = "auc",
# 5-fold cross-validation
nfolds = 5,
# high value is less accurate, but has faster training
thresh = 1e-3,
# again lower number of iterations for faster training
maxit = 1e3)
plot(glmnet_classifier)
Here is the structure of the data in subTrain:
Note.Reco  Raison.Reco   DATE_SAISIE  idpart  ID
3          Good service  19/03/2014   56992
2          good stuff    19/03/2014   53645
8          very nice     20/02/2016   261392
...
I get this plot (attached file). Can you explain whether it is correct? Thank you.
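One thing that may be worth checking before interpreting the plot: in the code above, the test matrix is built from its own vocabulary and tf-idf is refitted on it, so its columns need not match those the classifier was trained on. A minimal sketch of the usual text2vec pattern, in which the test data reuse the training vectorizer and the fitted tf-idf model (the toy sentences are hypothetical stand-ins for Raison.Reco):

```r
library(text2vec)

# Hypothetical toy comments standing in for the Raison.Reco column.
train_txt <- c("good service", "very nice staff", "bad service", "slow and bad")
test_txt  <- c("nice service", "bad staff and unknown words")

it_train <- itoken(train_txt, preprocessor = tolower,
                   tokenizer = word_tokenizer, progressbar = FALSE)
it_test  <- itoken(test_txt, preprocessor = tolower,
                   tokenizer = word_tokenizer, progressbar = FALSE)

# Vocabulary and vectorizer come from the training data only.
vocab      <- create_vocabulary(it_train)
vectorizer <- vocab_vectorizer(vocab)

dtm_train <- create_dtm(it_train, vectorizer)
dtm_test  <- create_dtm(it_test, vectorizer)   # same columns as dtm_train

# Fit tf-idf on the training matrix, then apply it unchanged to the test matrix.
tfidf <- TfIdf$new()
dtm_train_tfidf <- fit_transform(dtm_train, tfidf)
dtm_test_tfidf  <- transform(dtm_test, tfidf)
```

With matching columns, the test matrix can be passed to predict() on the fitted cv.glmnet object. Note also that the glmnet documentation describes type.measure = "auc" as being for two-class logistic regression only, so for family = 'multinomial' a measure such as "class" or "deviance" is probably what the plot shows or should show.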