Why won't VarCorr display variance for lmerModLmerTest or glmerMod objects? - lme4

I'm trying to extract this section of the model summary from a set of GLMMs. I want both the variance and the standard deviation.
Random effects:
Groups Name Variance Std.Dev.
herd (Intercept) 0.4123 0.6421
Number of obs: 56, groups: herd, 15
I tried following this answer: Extract random effect variances from lme4 mer model object.
But I can't seem to get the variance, only the standard deviation. I thought perhaps this was because I was using glmer instead of lmer, but I get the same results either way.
gm1 <- lmer(size ~ period + (1 | herd), data = cbpp)
summary(gm1)
Random effects:
Groups Name Variance Std.Dev.
herd (Intercept) 44.40 6.664
Residual 14.51 3.810
Number of obs: 56, groups: herd, 15
> VarCorr(gm1, comp="Variance")
Groups Name Std.Dev.
herd (Intercept) 6.6636
Residual 3.8096
> VarCorr(gm1, comp="Std.Dev.")
Groups Name Std.Dev.
herd (Intercept) 6.6636
Residual 3.8096
> VarCorr(gm1, comp=c("Variance","Std.Dev."))
Groups Name Std.Dev.
herd (Intercept) 6.6636
Residual 3.8096
gm2 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial)
summary(gm2)
Random effects:
Groups Name Variance Std.Dev.
herd (Intercept) 0.4123 0.6421
Number of obs: 56, groups: herd, 15
> VarCorr(gm2, comp="Variance")
Groups Name Std.Dev.
herd (Intercept) 0.64207
> VarCorr(gm2, comp="Std.Dev.")
Groups Name Std.Dev.
herd (Intercept) 0.64207
> VarCorr(gm2, comp=c("Variance","Std.Dev."))
Groups Name Std.Dev.
herd (Intercept) 0.64207
Any ideas what might be going on here?

comp is an argument to the print() method for VarCorr objects, not to VarCorr() itself:
print(VarCorr(gm1), comp=c("Variance", "Std.Dev."))
Groups Name Variance Std.Dev.
herd (Intercept) 44.404 6.6636
Residual 14.513 3.8096
You might also be interested in
as.data.frame(VarCorr(gm1))[,c("vcov", "sdcor")]
vcov sdcor
1 44.40371 6.663611
2 14.51309 3.809605
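If you need the numbers programmatically rather than just printed, you can also pull them straight out of the VarCorr object; a minimal sketch with the model above (the "stddev" attribute and sigma() are part of lme4's VarCorr/merMod interface):
vc <- VarCorr(gm1)
vc$herd[1, 1]            # random-intercept variance for herd
attr(vc$herd, "stddev")  # the corresponding standard deviation
sigma(gm1)               # residual standard deviation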

Related

Cox regression in a subset of rows for many different columns?

I have a large dataset with many columns. Columns 56 to 77 are miRNA expression divided into tertiles (but written only as 1, 2 or 3). Columns 33 to 54 are miRNA expression values (1.453, 3.245, etc.). I want to run a Cox regression using tertile 1 and tertile 3, ignoring tertile 2, for each miRNA.
Example:
Model 1: I want a Cox regression with miRNA 1 (column 33), time and event.
Model 2: Cox regression with miRNA 2 (column 34), time and event.
Model 3: Cox regression with miRNA 3 (column 35), time and event.
Etc.
The data in each miRNA column are the three tertiles, and I just want to use tertiles 1 and 3.
I tried with lapply: first using subset to obtain tertiles 1 and 3, then the Cox regression. But I get an error. Does anyone know how to do this?
Thanks! :)
The code:
miRNA_tertiles <- DB[56:77]
cox_tert = lapply(miRNA_tertiles, function(x){
  new.data = subset(DB, x != 2)
  formula = as.formula(paste('Surv(years, AD)~', x))
  cox_fit_tert = coxph(formula, data = new.data)
  summary(cox_fit_tert)$coefficients[,c(2,3,5)] %>% round(3)
})
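The likely culprit is that lapply(miRNA_tertiles, ...) passes each column's values into paste(), so the formula is built from the data rather than from the column name. A minimal sketch of one fix, iterating over column names instead (column positions and the Surv(years, AD) terms are taken from the question; untested against the real data):
library(survival)

tert_cols <- names(DB)[56:77]
cox_tert <- lapply(tert_cols, function(col) {
  new.data <- DB[DB[[col]] != 2, ]                   # keep only tertiles 1 and 3
  f <- as.formula(paste("Surv(years, AD) ~", col))   # formula from the column name
  fit <- coxph(f, data = new.data)
  round(summary(fit)$coefficients[, c(2, 3, 5)], 3)  # exp(coef), se, p-value
})
names(cox_tert) <- tert_cols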

occurrence to score

I have a frequency count of words, and I would like to convert number_of_occurrence to a score between 0 and 10.
word    number_of_occurrence    score
and     200                     10
png     2                       1
where   50                      6
news    120                     7
If you want to rate term frequencies in a corpus, I suggest you read this Wikipedia article: Term frequency–inverse document frequency.
There are many ways to count the term frequency.
I understand you want to rate it between 0 and 10.
I didn't see how you calculated your example score values.
Anyway, I suggest a common method: the log function.
import math

# count the occurrences of your terms
# (tokenize() and stopWords are assumed to come from your NLP setup, e.g. nltk)
freq_table = {}
words = tokenize(sentence)
for word in words:
    word = word.lower()
    # stem the word if you can, using nltk
    if word in stopWords:  # do you really want to count the occurrences of 'and'?
        continue
    if word in freq_table:
        freq_table[word] += 1
    else:
        freq_table[word] = 1

# log-normalize the occurrences; assign back into the dict,
# since rebinding the loop variable alone would not change it
for word, count in freq_table.items():
    freq_table[word] = 10 * math.log(1 + count)
Of course, instead of log normalization you can normalize by the maximum.
# ratio-max normalize the occurrences
# (renamed from 'max' to avoid shadowing the built-in max())
max_count = max(freq_table.values())
for word, count in freq_table.items():
    freq_table[word] = 10 * count / max_count
Or, if you need a threshold effect, you can use a sigmoid function that you can customize:
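For illustration, a minimal sketch of such a sigmoid (the midpoint and steepness knobs are made-up parameters, not values from the original post):
import math

def sigmoid_score(count, midpoint=25, steepness=0.2):
    # maps counts smoothly onto 0..10, with 'midpoint' scoring 5
    return 10 / (1 + math.exp(-steepness * (count - midpoint)))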
For more word processing, check the Natural Language Toolkit. For a good term-frequency count, stemming is a good choice (stopword removal is also useful)!
The score is between 0 and 10. The maximum score is 10 at 50 occurrences, so anything higher than that should also have score 10. On the other hand, the minimum score is 0, and the score is 1 at 5 occurrences, so assume anything lower than that has score 0.
Interpolation is based on your given condition only:
If a word appears 50 times it should be closer to 10, and if a word appears 5 times it should be closer to 1.
df['score'] = df['number_of_occurrence'].apply(lambda x: x / 5 if 5 <= x <= 50 else (0 if x < 5 else 10))
Output:
    word  number_of_occurrence  score
0    and                   200   10.0
1    png                     2    0.0
2  where                    50   10.0
3   news                   120   10.0

MySQL: Sort By Verse Number

I'm trying to sort by data in Ascending order. Here's how my results are being displayed now:
1:1
1:10
1:2
1:3
1:4
1:5
1:6
1:7
1:8
1:9
2:1
Instead, I want them like this: 1:1, 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, 1:10, 2:1.
Currently, my approach is to replace the : with a . so that, by adding the +0 in my query below, the values would be treated as numbers with decimal places. Any feedback on what I'm missing here?
"SELECT myverses.*
FROM myverses
INNER JOIN biblebooks ON myverses.book = biblebooks.name
ORDER BY biblebooks.id ASC, REPLACE(myverses.reference, ':', '.')+0 ASC;";
As a decimal, 1.2 is the same as 1.20, which is why it sorts higher than 1.10.
You can use:
ORDER BY biblebooks.id ASC,
SUBSTRING_INDEX(myverses.reference, ':', 1)*1000 + SUBSTRING_INDEX(myverses.reference, ':', -1) ASC
This will convert 1:1 to 1001, 1:2 to 1002, and 1:10 to 1010, so they'll sort correctly.
Just make the multiplier larger than the maximum number of verses in a chapter.
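If you'd rather not depend on a multiplier at all, a sketch of an equivalent ordering (same table and column names as above) is to sort on the two parts as separate keys:
ORDER BY biblebooks.id ASC,
         CAST(SUBSTRING_INDEX(myverses.reference, ':', 1) AS UNSIGNED) ASC,
         CAST(SUBSTRING_INDEX(myverses.reference, ':', -1) AS UNSIGNED) ASC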

R data.frame to SQL - preserving ordered factors

I am just starting to use MySQL to handle data that is currently in R dataframe objects. I was hoping for a simple round-trip to and from SQL that would recreate an R dataframe exactly:
library("compare",pos=2)
library("RMySQL",pos=2)
conR <- dbConnect(MySQL(),
user = '...',
password = '...',
host = '...',
dbname='r2014')
a3 <- data.frame(x = 5:1,
                 y = letters[1:5],
                 z = ordered(c("NEVER", "ALWAYS", "NEVER", "SOMETIMES", "NEVER"),
                             levels = c("NEVER", "SOMETIMES", "ALWAYS")))
a3
dbWriteTable(conn = conR, name = 'a3', value = a3)
a4 <- dbReadTable(conn = conR, name = 'a3')
compare(a3,a4)$detailedResult
a3$z
a4$z
The result shows that factors come back as strings (columns y and z) and that the ordering information for the ordered factor is lost (column z):
> a3
x y z
1 5 a NEVER
2 4 b ALWAYS
3 3 c NEVER
4 2 d SOMETIMES
5 1 e NEVER
> compare(a3,a4)$detailedResult
x y z
TRUE FALSE FALSE
> a3$z
[1] NEVER ALWAYS NEVER SOMETIMES NEVER
Levels: NEVER < SOMETIMES < ALWAYS
> a4$z
[1] "NEVER" "ALWAYS" "NEVER" "SOMETIMES" "NEVER"
> a3$y
[1] a b c d e
Levels: a b c d e
> a4$y
[1] "a" "b" "c" "d" "e"
Is there some way to specify the information in the ordered factors in the creation of the table a3 in the database?
I would change the code to:
dbWriteTable(conn = conR, name = 'a3', value = a3, row.names=TRUE)
a4 <- dbReadTable(conn = conR, name = 'a3', row.names=TRUE)
The row.names of a data.frame are ordered by default. When they are stored in an SQL column, that order is preserved, and a SELECT query can use ORDER BY row_names to fetch the rows in their original order.[1]
The value of the row.names argument in dbReadTable() can be changed to NA in case the SQL table does not contain a row_names column.[2]
[1] REF: DBI::dbWriteTable
The interpretation of rownames depends on the ‘row.names’
argument, see ‘sqlRownamesToColumn()’ for details:
• If ‘FALSE’ or ‘NULL’, row names are ignored.
• If ‘TRUE’, row names are converted to a column named
"row_names", even if the input data frame only has natural
row names from 1 to ‘nrow(...)’.
• If ‘NA’, a column named "row_names" is created if the data
has custom row names, no extra column is created in the case
of natural row names.
• If a string, this specifies the name of the column in the
remote table that contains the row names, even if the input
data frame only has natural row names.
[2] REF: DBI::dbReadTable
The presence of rownames depends on the ‘row.names’ argument, see
‘sqlColumnToRownames()’ for details:
• If ‘FALSE’ or ‘NULL’, the returned data frame doesn't have
row names.
• If ‘TRUE’, a column named "row_names" is converted to row
names.
• If ‘NA’, a column named "row_names" is converted to row names
if it exists, otherwise no translation occurs.
• If a string, this specifies the name of the column in the
remote table that contains the row names.
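As for the ordered factor itself: MySQL stores y and z as plain text, so the factor metadata has to be reapplied on the R side after reading. A minimal sketch using the levels from the example above:
a4 <- dbReadTable(conn = conR, name = 'a3', row.names = TRUE)
a4$y <- factor(a4$y)  # plain factor; default alphabetical levels match here
a4$z <- ordered(a4$z, levels = c("NEVER", "SOMETIMES", "ALWAYS"))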

MySQL extract average data from multiple group criteria

Apologies for the wall of text, the example in the end explains my question. Any help is appreciated, thank you!
I have a table which contains several columns of data from among other values voltages and currents.
These instances are logged every second when there is current flowing. I want to calculate an approximated kJ and kW from these values.
Basically I have one table, instances, that contains:
instanceID,
location,
current,
voltage,
time.
And another one, sets, that contains:
instanceID,
setID.
The instanceID is the same in both tables; the instanceID in instances is a FK pointing to sets. For every location in instances there are approximately 23 rows (it varies), and there are 30 locations. So I have 23 rows where the instance has location 1, another 23 for the same instance at location 2, and so on. Time is the logged time when the measured data was taken (so if the difference between all 23 instances is one second, the difference between the first and the last time is 23 seconds).
I need to calculate the average kW and the total kJ (approximated).
What I've done is the following:
SELECT instances.instanceID, location, current,
voltage, current * voltage AS kW,
COUNT(IF(current > 0 AND voltage > 0,
instances.instanceID,
0)) AS InstancedTime
FROM instances
INNER JOIN sets ON instances.instanceID = sets.instanceID
WHERE sets.setID = arbitrary_number;
The problem arises that I get the following table:
instanceID, location, current, voltage, kW, InstancedTime
The kW is a random number from one of the 23 sets, which is fine since it's an approximation anyway, but the COUNT(IF()) is counting ALL the instances in the instances table, when I only want the query to count the instances for every location.
I tried the MAX(CAST(time AS SIGNED)) - MIN(CAST(time AS SIGNED)), but that takes the max time from the last location minus the min time of the first location, I want to isolate it to one location at a time.
What I want to do is get the total amount of kJ, which would be the time it had power multiplied by the kW over that time. Since I know the time between instances is always 1 second, it should be enough to count the number of instances for each individual location and multiply that by the kW; however, I want to do that for all the instances within one set. It would be possible to instead run a single query for each individual instance, but that would take eons.
I'm trying to take a table that looks like
instanceID, location, voltage, current, kW, InstancedTime
1, 1, 500V, 2A, 1kW, 1s
1, 1, 500V, 2A, 1kW, 1s
1, 2, 400V, 3A, 1.2kW, 1s
1, 2, 400V, 3A, 1.2kW, 1s
2, 1, 700V, 2A, 1.4kW, 1s
2, 1, 700V, 2A, 1.4kW, 1s
2, 2, 300V, 3A, 0.9kW, 1s
2, 2, 300V, 3A, 0.9kW, 1s
And add the kJ, which means summing the seconds that ID 1 has spent in location 1 and in location 2, doing the same for ID 2, and presenting it all in one table that would look like:
instanceID, location, voltage, current, kW, SumInstancedTime, kJ
1, 1, 500V, 2A, 1kW, 2s, 2kJ
1, 2, 400V, 3A, 1.2kW, 2s, 2.4kJ
2, 1, 700V, 2A, 1.4kW, 2s, 2.8kJ
2, 2, 300V, 3A, 0.9kW, 2s, 1.8kJ
Thank you for your time, any provided help is appreciated!
I cannot test my answer right now, but it sounds like what you need is a GROUP BY.
The following query is an example that averages your current and voltage for every instance/location pair and then calculates the values:
SELECT instances.instanceID, location, AVG(current) AS avg_current,
       AVG(voltage) AS avg_voltage, AVG(current) * AVG(voltage) AS kW,
       COUNT(IF(current > 0 AND voltage > 0,
                instances.instanceID,
                NULL)) AS InstancedTime -- NULL, not 0: COUNT() skips only NULLs
FROM instances
INNER JOIN sets ON instances.instanceID = sets.instanceID
WHERE sets.setID = arbitrary_number
GROUP BY instances.instanceID, location;
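Since each logged row represents one second, the kJ the question asks for can be folded into the same grouped query. A sketch under that one-second assumption (untested; divide by 1000 if current * voltage gives watts rather than kW):
SELECT instances.instanceID, location,
       AVG(current) * AVG(voltage) AS kW,
       SUM(current > 0 AND voltage > 0) AS SumInstancedTime,
       AVG(current) * AVG(voltage)
         * SUM(current > 0 AND voltage > 0) AS kJ
FROM instances
INNER JOIN sets ON instances.instanceID = sets.instanceID
WHERE sets.setID = arbitrary_number
GROUP BY instances.instanceID, location;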
This is an example where you are trying to group consecutive rows in a table. In your example, they are not interleaved, but I'm assuming they could be. You need to assign everything with the same instanceID and location to the same group.
My approach is to find the next higher instance/location pair, and to assign that as a group identifier. I do this using a subquery. Once I have this identifier, I just summarize each group:
select i.instanceId, i.location, i.current, i.voltage,
       i.current * i.voltage as kW,   -- instances has no kw column; compute it
       COUNT(*) as SumTime,
       SUM(i.current * i.voltage) as kJ
from (select i.*,
             (select concat(i2.instanceId, ',', i2.location)
              from instances i2
              where i2.instanceId > i.instanceId
                 or (i2.instanceId = i.instanceId and i2.location > i.location)
              order by i2.instanceId asc, i2.location asc  -- ascending, so LIMIT 1 picks the *next* higher pair
              limit 1
             ) as grouping
      from instances i
     ) i
group by grouping