How to create a one-mode network (adjacency matrix) based on matches from two-mode network - igraph

I'm working with survey data (rows = respondents; columns = opinions) and trying to create a one-mode adjacency matrix among respondents that counts, for each dyad, the number of times the two respondents gave the same answer. Specifically,
affiliation_matrix <- matrix(c(
0,1,0,
1,0,0,
0,1,1
)
,nrow=3
,ncol=3,
byrow=TRUE)
dimnames(affiliation_matrix) <- list(
c("Alyssa", "Brad", "Carla"),
c("opinion1", "opinion2", "opinion3")
)
affiliation_matrix
In the above example of 3x3 matrix, I'd like to create a matrix that looks like
ideal_matrix <- matrix(c(
1,1,2,
1,1,0,
2,0,2
)
,nrow=3
,ncol=3,
byrow=TRUE)
dimnames(ideal_matrix) <- list(
c("Alyssa", "Brad", "Carla"),
c("Alyssa", "Brad", "Carla")
)
So between Alyssa and Carla, their connection gets a 2 because they both answered the same for opinion1 and opinion2. Similarly, the connection between Alyssa and Brad gets a 1 because they both answered 0 in opinion3.
I looked at get.adjacency() and bipartite.projection, but these seem to deal only with person-event networks, where only a value of 1 is treated as a match. Is there an R package that lets me do this, or do I need to write my own loop (and if so, how)?
Thanks!!

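There might be an off-the-shelf solution I am not aware of. However, in this particular (binary) case one could recode the matrix so that 0s become 1s and vice versa, multiply each version by its own transpose, and finally add the two products together. The first product counts shared 1s and the second counts shared 0s: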
affiliation_matrix_recoded <- affiliation_matrix
affiliation_matrix_recoded[] <- ifelse(affiliation_matrix_recoded > 0, 0, 1)
a <- tcrossprod(affiliation_matrix)
b <- tcrossprod(affiliation_matrix_recoded )
r <- a + b
diag(r) <- 0
r
Resulting in this:
       Alyssa Brad Carla
Alyssa      0    1     2
Brad        1    0     0
Carla       2    0     0
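If the answers are not binary (say, a 1-5 scale), the recoding trick no longer applies directly. A minimal sketch of a direct pairwise comparison is shown here; `count_matches` is a hypothetical helper name, and it assumes a matrix with no missing values:

```r
# Hypothetical helper: count, for every pair of rows, the columns
# in which both rows hold the same value. Assumes no NA entries.
count_matches <- function(m) {
  n <- nrow(m)
  out <- matrix(0L, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      out[i, j] <- sum(m[i, ] == m[j, ])
    }
  }
  diag(out) <- 0L  # ignore self-matches, as in the answer above
  out
}

affiliation_matrix <- matrix(c(0, 1, 0,
                               1, 0, 0,
                               0, 1, 1),
                             nrow = 3, byrow = TRUE,
                             dimnames = list(c("Alyssa", "Brad", "Carla"),
                                             c("opinion1", "opinion2", "opinion3")))
matches <- count_matches(affiliation_matrix)
matches
```

On the binary example this should agree with the tcrossprod() result; the double loop is quadratic in the number of respondents, so the matrix-product approach is likely preferable for large binary data.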

Related

problem with bootMer CI: upper and lower limits are identical

I'm having the hardest time generating confidence intervals for my glmer poisson model. After following several very helpful tutorials (such as https://drewtyre.rbind.io/classes/nres803/week_12/lab_12/) as well as stackoverflow posts, I keep getting very strange results, i.e. the upper and lower limits of the CI are identical.
Here is a reproducible example containing a response variable called "production," a fixed effect called "Treatment_Num" and a random effect called "Genotype":
df1 <- data.frame(production=c(15,12,10,9,6,8,9,5,3,3,2,1,0,0,0,0), Treatment_Num=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4), Genotype=c(1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2))
#load lme4, which provides glmer() and bootMer()
library(lme4)
#run the glmer model
df1_glmer <- glmer(production ~ Treatment_Num +(1|Genotype),
data = df1, family = poisson(link = "log"))
#make an empty data set to predict from, that contains the explanatory variables but no response
require(magrittr)
df_empty <- df1 %>%
tidyr::expand(Treatment_Num, Genotype)
#create new column containing predictions
df_empty$PopPred <- predict(df1_glmer, newdata = df_empty, type="response",re.form = ~0)
#function for bootMer
myFunc_df1_glmer <- function(mm) {
predict(df1_glmer, newdata = df_empty, type="response",re.form=~0)
}
#run bootMer
require(lme4)
merBoot_df1_glmer <- bootMer(df1_glmer, myFunc_df1_glmer, nsim = 10)
#get confidence intervals out of it
predCL <- t(apply(merBoot_df1_glmer$t, MARGIN = 2, FUN = quantile, probs = c(0.025, 0.975)))
#enter lower and upper limits of confidence interval into df_empty
df_empty$lci <- predCL[, 1]
df_empty$uci <- predCL[, 2]
#when viewing df_empty the problem becomes clear: the lci and uci are identical!
df_empty
Any insights you can give me will be much appreciated!
The issue is with the function you created to pass to bootMer(). You wrote:
myFunc_df1_glmer <- function(mm) {
predict(df1_glmer, newdata = df_empty, type="response",re.form=~0)
}
The argument mm should be a fitted model object derived from the bootstrapped data.
However, you don't pass this object to predict(), but rather the original model
object. If you change the function to:
myFunc_df1_glmer <- function(mm) {
predict(mm, newdata = df_empty, type="response",re.form=~0)
#^^ pass in the object created by bootMer
}
then it works:
> df_empty
# A tibble: 8 x 5
  Treatment_Num Genotype PopPred   lci   uci
          <dbl>    <dbl>   <dbl> <dbl> <dbl>
1             1        1  12.9   9.63  15.7
2             1        2  12.9   9.63  15.7
3             2        1   5.09  3.87   5.89
4             2        2   5.09  3.87   5.89
5             3        1   2.01  1.20   2.46
6             3        2   2.01  1.20   2.46
7             4        1   0.796 0.361  1.14
8             4        2   0.796 0.361  1.14
As an aside -- how many genotypes are in your actual data? If fewer than 5-7, you might
do better using a straight-up glm() with genotype as a factor using sum-to-zero
contrasts.
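A minimal sketch of that fixed-effects alternative, using the example data from the question. The object name `fit_glm` is made up for illustration, and whether this model suits the real data is an assumption:

```r
df1 <- data.frame(
  production    = c(15, 12, 10, 9, 6, 8, 9, 5, 3, 3, 2, 1, 0, 0, 0, 0),
  Treatment_Num = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4),
  Genotype      = c(1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2)
)

# Treat genotype as a fixed factor instead of a random effect
df1$Genotype <- factor(df1$Genotype)

# contr.sum gives sum-to-zero contrasts, so the intercept refers to the
# average over genotypes rather than to a reference genotype
fit_glm <- glm(production ~ Treatment_Num + Genotype,
               data = df1,
               family = poisson(link = "log"),
               contrasts = list(Genotype = "contr.sum"))
coef(fit_glm)
```

With only two genotypes there is a single contrast coefficient (named `Genotype1`), and the two genotype effects are plus and minus that value on the log scale.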

Alternative to extract function when working with raster objects

I wonder how to sum the pixel values of a raster (val_r) for each category of another raster (cat_r). In other words, does an alternative to the function "extract" exist when working with raster objects? Thank you very much!
# sample raster with categories
cat_r<-raster(ncol=3,nrow=3, xmn=-10, xmx=10, ymn=-10, ymx=10)
cat_r[]<-c(1,2,1,3,4,3,4,4,4 ) #4 categories: 1, 2, 3 and 4
#sample raster with pixel values
val_r <-raster(ncol=3,nrow=3, xmn=-10, xmx=10, ymn=-10, ymx=10)
val_r[]<-c(1,0,1,5,2,5,2,2,2)
#extract function doesn't work for
extract(val_r, cat_r, fun=sum)
#I should find the following values: category 1: 2, cat 2: 0, cat 3: 10, cat 4: 8
You can use the zonal method:
library(raster)
cat_r <- raster(ncol=3,nrow=3, xmn=-10, xmx=10, ymn=-10, ymx=10, vals=c(1,2,1,3,4,3,4,4,4 ))
val_r <- setValues(cat_r, c(1,0,1,5,2,5,2,2,2))
zonal(val_r, cat_r, "sum")
# zone sum
#[1,] 1 2
#[2,] 2 0
#[3,] 3 10
#[4,] 4 8
This is equivalent to
s <- stack(cat_r, val_r)
v <- values(s)
tapply(v[,2], v[,1], sum)
# 1 2 3 4
# 2 0 10 8

How to write a JSON object from R dataframe with grouping

In general I feel there is a need to make JSON objects by folding multiple columns. There is no direct way to do this afaik. Please point it out if there is ..
I have data of this form
A B C
1 a x
1 a y
1 c z
2 d p
2 f q
2 f r
How do I write a json which looks like
{'query':'1', 'type':[{'name':'a', 'values':[{'value':'x'}, {'value':'y'}]}, {'name':'c', 'values':[{'value':'z'}]}]}
and similarly for 'query':'2'
I am looking to write them out in the mongo import/export format, with one JSON document per line.
Any pointers are also appreciated..
You've got a little "non-standard" thing going with two keys of "value" (I don't know if this is legal json), as you can see here:
(js <- jsonlite::fromJSON('{"query":"1", "type":[{"name":"a", "values":[{"value":"x"}, {"value":"y"}]}, {"name":"c", "values":[{"value":"z"}]}]}'))
## $query
## [1] "1"
##
## $type
## name values
## 1 a x, y
## 2 c z
... with a data.frame cell containing a list of data.frames:
js$type$values[[1]]
## value
## 1 x
## 2 y
class(js$type$values[[1]])
## [1] "data.frame"
If you can accept your "type" variable containing a vector instead of a named-list, then perhaps the following code will suffice:
jsonlite::toJSON(lapply(unique(dat[, 'A']), function(a1) {
  list(query = a1,
       type = lapply(unique(dat[dat$A == a1, 'B']), function(b2) {
         list(name = b2,
              values = dat[(dat$A == a1) & (dat$B == b2), 'C'])
       }))
}))
## [{"query":[1],"type":[{"name":["a"],"values":["x","y"]},{"name":["c"],"values":["z"]}]},{"query":[2],"type":[{"name":["d"],"values":["p"]},{"name":["f"],"values":["q","r"]}]}]
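If the nested {"value": ...} objects and one document per line are required, a sketch along the following lines may get closer to the requested shape. It assumes `dat` is built as shown here (the question gives only the printed table) and uses `auto_unbox = TRUE` so that length-one vectors are written as scalars; `json_lines` is an illustrative name:

```r
library(jsonlite)

# Assumed reconstruction of the example data from the question
dat <- data.frame(A = c(1, 1, 1, 2, 2, 2),
                  B = c("a", "a", "c", "d", "f", "f"),
                  C = c("x", "y", "z", "p", "q", "r"),
                  stringsAsFactors = FALSE)

# Build one JSON document (as a single string) per value of A
json_lines <- vapply(unique(dat$A), function(a1) {
  rows_a <- dat[dat$A == a1, ]
  doc <- list(
    query = as.character(a1),
    type = lapply(unique(rows_a$B), function(b1) {
      list(name = b1,
           # wrap each C value in its own {"value": ...} object
           values = lapply(rows_a$C[rows_a$B == b1],
                           function(v) list(value = v)))
    })
  )
  as.character(toJSON(doc, auto_unbox = TRUE))
}, character(1))

# One document per line; pass a file path as the second argument
# to writeLines() to save the lines instead of printing them
writeLines(json_lines)
```

Unnamed lists become JSON arrays and named lists become JSON objects, which is why each level is built with lapply() and list().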

Grouping multiple rows in R

I've generated a heatmap in R for microbiome data, using the following link
My data as far as rows is concerned looks like this:
781
782
783
547
519
575
044
045
049
If I want to group 781-783, 547-575 and 044-049 as individual groups and give them separate colours using the below idea:
Assigning animals to different groups (2 random groups in this case)
var1 <- round(runif(n=12, min=1, max=2))
var1 <- replace (var1, which(var1 == 1), "deepskyblue")
var1 <- replace (var1, which(var1 == 2), "magenta")
cbind(row.names(data.prop), var1)
How do I go about it? I understand that the above code, randomly generates 2 groups, but how can I specify which rows go into which group?
Thank you,
Susheel
Because row names are necessarily character, and the only good range operator in R is ":" for numeric values, you need to coerce the ranges to the desired "0nn" format. This is untested in the absence of a proper test case (which questioners are asked to provide):
#look at...
sprintf("%03i", c(781:783, 547:575, 044:049))
# then....
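data.prop[sprintf("%03i", c(781:783, 547:575, 44:49)), 'var1'] <-
  # one colour per row label; unlist() flattens the per-group vectors
  unlist(mapply(function(clr, rng) rep(clr, length(rng)),
                c("deepskyblue", "magenta", "green"),
                list(781:783, 547:575, 44:49),
                SIMPLIFY = FALSE), use.names = FALSE)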
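When the groups are not contiguous numeric ranges (the question lists 547, 519 and 575 together), an explicit lookup table may be simpler. This is a sketch only: `ids` stands in for row.names(data.prop), and the group membership is taken from the row labels shown in the question:

```r
# Stand-in for row.names(data.prop); kept as character so "044" keeps its zero
ids <- c("781", "782", "783", "547", "519", "575", "044", "045", "049")

# Each colour names the row labels that belong to its group
groups <- list(
  deepskyblue = c("781", "782", "783"),
  magenta     = c("547", "519", "575"),
  green       = c("044", "045", "049")
)

# Named vector mapping row label -> colour
lookup <- setNames(rep(names(groups), lengths(groups)),
                   unlist(groups, use.names = FALSE))

# Colour for every row, in the order the rows appear
var1 <- unname(lookup[ids])
cbind(ids, var1)
```

Any row label missing from `groups` would get NA here, which makes unassigned rows easy to spot.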

Subsetting a data frame in a function using another data frame as parameter

I would like to submit a data frame to a function and use it to subset another data frame.
This is the basic data frame:
foo <- data.frame(var1= c(1, 1, 1, 2, 2, 3), var2=c('A', 'A', 'B', 'B', 'C', 'C'))
I use the following function to find out the frequencies of var2 for specified values of var1.
foobar <- function(x, y, z){
a <- subset(x, (x$var1 == y))
b <- subset(a, (a$var2 == z))
n=nrow(b)
return(n)
}
Examples:
foobar(foo, 1, "A") # returns 2
foobar(foo, 1, "B") # returns 1
foobar(foo, 3, "C") # returns 1
This works. But now I want to submit a data frame of values to foobar. Instead of the above examples, I would like to submit df to foobar and get the same results as above (2, 1, 1)
df <- data.frame(var1=c(1, 1, 3), var2=c("A", "B", "C"))
When I change foobar to accept two arguments, like foobar(foo, df), and use y[, c(var1)] and y[, c(var2)] instead of the two parameters y and z, it still doesn't work. How can I do this?
edit1: last paragraph clarified
edit2: var1 type corrected
Try this:
match_df <- function(x, match) {
  vars <- names(match)
  # Build a comparable key for each row from the shared columns
  x_id <- do.call(paste, c(x[vars], sep = "\r"))
  match_id <- do.call(paste, c(match[vars], sep = "\r"))
  # Keep the rows of x whose key also occurs in match
  x[x_id %in% match_id, ]
}
match_df(foo, df)
#   var1 var2
# 1    1    A
# 2    1    A
# 3    1    B
# 6    3    C
Your function foobar is expecting three arguments, and you only supplied two arguments to it with foobar(foo, df). You can use apply to get what you want:
apply(df, 1, function(x) foobar(foo, x[1], x[2]))
And in use:
> apply(df, 1, function(x) foobar(foo, x[1], x[2]))
[1] 2 1 1
To respond to your edit:
I'm not entirely sure what y[, c(var1)] means, but here's an attempt at trying to figure out what you are trying to do.
What I think you were trying to do was: foobar(foo, y = df[, "var1"], z = df[, "var2"]).
First, note that the use of c() is not needed here: you can reference the columns you want by placing the name of the column in quotes OR reference the column by number (as I did above). Secondly, df[, "var1"] returns all of the rows for the column named var1, which has a length of three:
> length(df[, "var1"])
[1] 3
The function you defined is not set up to deal with vectors of length greater than 1. That is why we need to iterate through each row of your data frame to grab a single value, process it, and then go to the next row. That is what the apply function does. It is equivalent to writing a loop along the lines of for (i in seq_len(nrow(df))), but is a more idiomatic way of handling such issues.
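For comparison, the explicit loop that apply() replaces might look like the sketch below. It reuses foo, df and foobar from the question; `counts` is an illustrative name:

```r
foo <- data.frame(var1 = c(1, 1, 1, 2, 2, 3),
                  var2 = c("A", "A", "B", "B", "C", "C"))
df <- data.frame(var1 = c(1, 1, 3), var2 = c("A", "B", "C"))

foobar <- function(x, y, z) {
  a <- subset(x, x$var1 == y)
  b <- subset(a, a$var2 == z)
  nrow(b)
}

# Pre-allocate the result, then call foobar once per row of df
counts <- integer(nrow(df))
for (i in seq_len(nrow(df))) {
  counts[i] <- foobar(foo, df$var1[i], as.character(df$var2[i]))
}
counts
# [1] 2 1 1
```

Unlike apply(), the loop passes each column in its own type, so var1 is not converted to character along the way.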
Finally, is there a reason you generated var1 as a factor? It probably makes more sense to treat these as numeric, in my opinion. Compare:
> str(df)
'data.frame': 3 obs. of 2 variables:
$ var1: Factor w/ 2 levels "1","3": 1 1 2
$ var2: Factor w/ 3 levels "A","B","C": 1 2 3
Versus
> df2 <- data.frame(var1=c(1,1,3), var2=c("A", "B", "C"))
> str(df2)
'data.frame': 3 obs. of 2 variables:
$ var1: num 1 1 3
$ var2: Factor w/ 3 levels "A","B","C": 1 2 3
In summary - apply is the function you are after here. You may want to spend some time thinking about whether your data should be numeric or a factor, but apply is still what you want.
foobar2 <- function(x, df) {
.dofun <- function(y, z){
a <- subset(x, x$var1==y)
b <- subset(a, a$var2==z)
n <- nrow(b)
return (n)
}
ans <- mapply(.dofun, as.character(df$var1), as.character(df$var2))
names(ans) <- NULL
return(ans)
}
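A possible alternative that avoids calling subset() once per row is to compare pasted keys. This is a sketch under the assumption that pasting var1 and var2 with a space gives an unambiguous key, which holds for the example data; `key_foo`, `key_df` and `counts` are illustrative names:

```r
foo <- data.frame(var1 = c(1, 1, 1, 2, 2, 3),
                  var2 = c("A", "A", "B", "B", "C", "C"))
df <- data.frame(var1 = c(1, 1, 3), var2 = c("A", "B", "C"))

# One key per row, e.g. "1 A"
key_foo <- paste(foo$var1, foo$var2)
key_df  <- paste(df$var1, df$var2)

# For each requested combination, count the matching rows in foo
counts <- vapply(key_df, function(k) sum(key_foo == k),
                 integer(1), USE.NAMES = FALSE)
counts
# [1] 2 1 1
```

The result has one count per row of df, in the same order, which is the same shape that foobar2(foo, df) is meant to return.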