[Image: data structure]
I want to do an ANOVA - I am interested in differences in relative_genus_intensity between Material groups for each genus_ID individually.
So I could run an ANOVA and a post-hoc test for the first genus:
aov1 <- genus_lib_unique %>% filter(genus_ID == "Haemophilus") %>% aov(relative_genus_intensity ~ Material, data = .)
TukeyHSD(aov1)
How can I do this for all (unique) rows?
How do I correct this for multiple testing?
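One way to sketch the per-genus loop with a joint multiple-testing correction (a Python sketch rather than the question's R/dplyr setup; the column names and toy data are assumptions based on the question):

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway
from statsmodels.stats.multitest import multipletests

# Toy data shaped like genus_lib_unique (column names taken from the question).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "genus_ID": np.repeat(["Haemophilus", "Prevotella", "Rothia"], 30),
    "Material": np.tile(np.repeat(["A", "B", "C"], 10), 3),
    "relative_genus_intensity": rng.normal(size=90),
})

# One one-way ANOVA per genus, then adjust all p-values together.
pvals = {}
for genus, sub in df.groupby("genus_ID"):
    groups = [g["relative_genus_intensity"].to_numpy()
              for _, g in sub.groupby("Material")]
    pvals[genus] = f_oneway(*groups).pvalue

# Benjamini-Hochberg FDR across the per-genus tests.
adjusted = multipletests(list(pvals.values()), method="fdr_bh")[1]
for genus, p_adj in zip(pvals, adjusted):
    print(genus, round(p_adj, 3))
```

The key point is that all per-genus p-values go into one adjustment call, rather than being corrected one genus at a time.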
Would a two-way ANOVA be an alternative? The problem I have with that is that (mostly) nonsensical comparisons are made:
Results of the two-way ANOVA: the first three rows are the comparisons I am interested in, but rows 4 onward are not. Because so many different groups are compared, this becomes a problem for the multiple-comparison p-value adjustment.
Thank you for your help
For the lasso (linear regression with L1 regularization) with a fixed value of λ, it is necessary to use cross-validation to select the best optimization algorithm.
I know for a fact that we can use cross-validation to find the optimal value of λ, but is it necessary to use cross-validation when λ is fixed?
Any thoughts please?
Cross-validation isn't about whether your regularization parameter is fixed or not; it's more related to estimating a performance metric such as R².
Let's say you have 100 records and divide your data into 5 sub-datasets, so each sub-dataset contains 20 records.
Out of these 5 sub-datasets, there are 5 different ways to hold one out as the cross-validation (CV) data.
For each of these 5 scenarios you can compute R², and then take the average R².
This way, you can compare your R² score against the average R² score.
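The 100-records/5-folds setup described above can be sketched with scikit-learn (an illustration, not the asker's actual model; the data and the fixed penalty value are made up):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

# Synthetic data: 100 records, 5 predictors, a few true nonzero coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -2.0]) + rng.normal(scale=0.1, size=100)

# Lasso with a *fixed* lambda (called alpha in scikit-learn).  5-fold CV here
# only estimates its out-of-sample R^2; it is not choosing lambda or an
# optimization algorithm.
scores = cross_val_score(Lasso(alpha=0.1), X, y, cv=5, scoring="r2")
print(scores.mean())
```

With λ fixed, CV's role reduces to estimating how well that particular fit generalizes.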
I have three different regression models with age and training slope:
X ~ Y + age + training.slope, X ~ Z + age + training.slope, X ~ V + age + training.slope.
I have separated these variables into different models for good reason (e.g. avoiding regression to the mean). Further, I perform these analyses separately for two groups and then compare their coefficients. Could anyone suggest an appropriate way to FDR-correct them? Should I combine the p-values of Y, Z, and V and apply an FDR correction? And given that this is run for two groups, would you combine the p-values of the three variables for both groups and FDR-correct them all together?
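Treating all six tests (three predictors times two groups) as one family, the Benjamini-Hochberg correction can be sketched as follows (the p-values below are hypothetical placeholders, not the asker's results):

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values: predictors Y, Z, V in group 1, then Y, Z, V in group 2.
pvals = [0.01, 0.04, 0.30, 0.02, 0.20, 0.50]

# Benjamini-Hochberg FDR over the whole family of six tests.
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(p_adj)
```

Whether the family should be all six tests or three per group is the substantive question being asked; mechanically, whichever set is chosen goes into one `multipletests` call.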
Cheers!
I have data from 6 groups with sample sizes of n = 2, 10, 2, 9, 3, 1, and I want to perform a permutational multivariate analysis of variance (PERMANOVA) on these data.
My question is: is it valid to run PERMANOVA with such small sample sizes? The results look strange to me, because the group with n = 1 showed no significant difference from the other groups, although the graphical representation of the groups clearly shows a difference.
Thank you
I would not trust any result involving the group with n = 1, because a single observation provides no within-group variation against which to judge differences among groups.
I have also received some answers on other platforms; I post them here for reference:
The sample size is simply too small to yield a stable solution via MANOVA. Note that the n = 1 cell contributes a constant value for that cell's mean, no matter what you do by way of permutations.
Finally, note that with unequal cell n in one-way designs, the effective per-cell sample size tracks closely with the harmonic mean of the n values. For your data set as it stands, that means an "effective" per-cell n of about 2.4. Unless differences on the DV set are gigantic, no procedure (parametric or exact/permutation) will have the statistical power to detect differences at that sample size.
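The "effective" per-cell n quoted above is easy to check: it is the harmonic mean of the six group sizes.

```python
from statistics import harmonic_mean

# Group sizes from the question.
n = [2, 10, 2, 9, 3, 1]
print(round(harmonic_mean(n), 1))  # -> 2.4
```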
MANOVA is based on the dispersion (scatter) of scores within the study groups. It is not recommended to run parametric tests such as MANOVA on small groups (fewer than about 20 people); in my opinion, use non-parametric tests to examine small groups.
I would like to understand how the output of extract in rstan orders the posterior samples. I understand that I can view the posterior samples from each chain by using as.array,
stanfit <- sampling(
model,
data = stan.data)
fitarray <- as.array(stanfit)
For example, fitarray[, 2, 1] will give me the samples for the second chain of the first parameter. One way to store the posterior samples in the output of extract would be just to concatenate them. When I do,
fit <- extract(stanfit)
mean(fitarray[,2,1]) == mean(fit$ss[1001:2000])
for several chains and parameters I always get TRUE (ss is the first parameter). This makes it seem like the posterior samples are being concatenated in fit. However, when I do,
fitarray[,2,1] == fit$ss[1001:2000]
I get FALSE (I confirmed it's not just a precision difference). It appears that fitarray and fit store the iterations differently. How do I view the iterations of each chain, in order, separately?
As can be seen from rstan:::as.array.stanfit, the as.array method is essentially defined as
extract(x, permuted = FALSE, inc_warmup = FALSE)
By default, extract drops the warmup and randomly permutes the post-warmup draws (apparently within each chain before the chains are concatenated, which is consistent with your mean comparison), so the per-chain means still match but the individual draws do not line up with the as.array output. Passing permuted = FALSE, as above, preserves the iteration order of each chain.
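The observed behaviour (equal means, unequal draws) can be mimicked in plain NumPy, a toy illustration of within-chain permutation, not rstan itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "posterior": 1000 draws x 2 chains for one parameter.
draws = rng.normal(size=(1000, 2))

# Mimic a permuted extract: shuffle each chain independently,
# then concatenate the chains.
permuted = np.concatenate(
    [rng.permutation(draws[:, c]) for c in range(draws.shape[1])]
)

chain2_ordered = draws[:, 1]           # like fitarray[, 2, 1]
chain2_permuted = permuted[1000:2000]  # like fit$ss[1001:2000]

print(np.isclose(chain2_ordered.mean(), chain2_permuted.mean()))  # True
print(np.array_equal(chain2_ordered, chain2_permuted))            # False
```

A permutation preserves summary statistics of the chain but destroys the draw-by-draw correspondence, which is exactly the TRUE/FALSE pattern in the question.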
Write a Scheme predicate function that tests for the structural equality of two given lists. Two lists are structurally equal if they have the same list structure, although their atoms may be different.
(1 2 3) (4 5 6) is ok
(1 (2 3)) ((1 2) 3) is not ok
I have no idea how to do this. Any help would be appreciated.
Here are some hints. This one is a bit repetitive to write; because the question looks like homework, I'll let you fill in the details:
(define (structurally-equal l1 l2)
  (cond (?  ; if both lists are null
         #t)
        (?  ; if one of the lists is null but the other is not
         #f)
        (?  ; if the `car` part of both lists is an atom
         (structurally-equal (cdr l1) (cdr l2)))
        (?  ; if the `car` part of one of the lists is an atom but the other is not
         #f)
        (else
         (and (structurally-equal ? ?)      ; recur over the `car` of each list
              (structurally-equal ? ?)))))  ; recur over the `cdr` of each list
There are two ways you could approach this. The first one uses a function to generate an output that represents the list structure.
Think of a way that you could represent the structure of any list as a unique string or number, such that any lists with identical structure would have the same representation and no other list would generate the same output.
Write a function that analyses any list's structure and generates that output.
Run both lists through the function and compare the output. If the same, they have the same structure.
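The first approach can be sketched in Python (deliberately not Scheme, so it doesn't hand in the homework verbatim): encode each list's structure as a string in which every atom collapses to the same token.

```python
def structure(lst):
    """Return a string encoding only the nesting structure of a list."""
    if isinstance(lst, list):
        return "(" + "".join(structure(x) for x in lst) + ")"
    return "*"  # every atom maps to the same token

def structurally_equal(a, b):
    # Identical structure strings imply identical list structure.
    return structure(a) == structure(b)

print(structurally_equal([1, 2, 3], [4, 5, 6]))      # True
print(structurally_equal([1, [2, 3]], [[1, 2], 3]))  # False
```

The structure string can be cached per list, which is what makes this approach attractive for repeated comparisons.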
The second one, which is the approach Oscar has taken, is to recur through both lists at the same time. Here, you pass both lists to one function, which does this:
Is the first element of the first list identical (structurally) to the first element of the second? If not, return false.
Are these first elements lists? If so, return the result of (and (recur on the first element of both lists) (recur on the rest of both lists))
If not, return the result of (recur on the rest of both lists).
The second approach is more efficient in the simple circumstance where you want to compare two lists. It returns as soon as a difference is found, only having to process both lists in their entirety where both lists are, indeed, structurally identical.
If you had a large collection of lists and might want to compare any two at any time, the first approach can be more efficient as you can store the result and thus any list need only be processed once. It also allows you to
Organise your collection of lists by, for example, creating a hash map that groups together all lists with the same structure.
Compare lists for similarity of structure (e.g. do these lists start and/or end with the same structure, even if they differ in the middle?)
I suspect, though, that your homework is best served by the second approach.
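For comparison, the second approach (simultaneous recursion over both lists, returning as soon as a difference is found) looks like this in Python, again kept out of Scheme so the homework is left to the reader:

```python
def structurally_equal(a, b):
    # Both atoms: structures match regardless of value.
    if not isinstance(a, list) and not isinstance(b, list):
        return True
    # One is a list and the other is not: structures differ.
    if not isinstance(a, list) or not isinstance(b, list):
        return False
    # Both lists: if either is empty, both must be empty.
    if not a or not b:
        return a == b
    # Compare heads, then recur on the tails.
    return (structurally_equal(a[0], b[0])
            and structurally_equal(a[1:], b[1:]))

print(structurally_equal([1, 2, 3], [4, 5, 6]))      # True
print(structurally_equal([1, [2, 3]], [[1, 2], 3]))  # False
```

The `and` short-circuits, so the traversal stops at the first structural mismatch, matching the efficiency argument above.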