Calypsos Stats Details

Latest revision as of 04:51, 11 April 2016
StatsPage
Available tests: parametric tests (select Anova, "Nested Anova" or anova+) and non-parametric tests (select RankTest).
P-values are adjusted for multiple testing by the Bonferroni correction (P.adj) and by the false discovery rate (FDR q-value, Benjamini-Hochberg procedure). Click on a column header to sort the data table.
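The two adjustments can be reproduced in base R with p.adjust. The snippet below is a minimal sketch, not the code Calypso runs: the p-values and taxon names are made up for illustration, and the column names simply mirror the labels used in the table.

```r
# Hypothetical raw p-values, one per feature (illustrative values only).
pvalues <- c(taxonA = 0.001, taxonB = 0.012, taxonC = 0.030,
             taxonD = 0.200, taxonE = 0.650)

# Bonferroni: each p-value is multiplied by the number of tests, capped at 1.
p_adj <- p.adjust(pvalues, method = "bonferroni")

# Benjamini-Hochberg: adjusted values that control the false discovery rate.
fdr <- p.adjust(pvalues, method = "BH")

# Combine into one table, ordered by raw p-value.
result <- data.frame(pvalue = pvalues, P.adj = p_adj, FDR = fdr)
result <- result[order(result$pvalue), ]
print(result)
```

The Bonferroni values are never smaller than the FDR values, which is why the FDR column usually retains more features at a given threshold.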
The p-value distribution is also plotted as a histogram and compared to the expected distribution in a QQ plot. A uniform p-value distribution (red "Expected" line) indicates that the low p-values observed are consistent with chance alone. A distribution with more low p-values than expected by chance indicates that at least some of the observed differences are real rather than chance findings (as in the figure shown below).
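A comparison of observed and expected p-values can be sketched in base R as follows. The p-values are simulated, and the bin width, plot labels and axis scale are assumptions made for illustration; they are not taken from Calypso.

```r
set.seed(1)

# Simulated p-values: 900 null features (uniform) plus 100 features
# that tend to have small p-values.
pvalues <- c(runif(900), rbeta(100, 0.5, 10))

# Under the null hypothesis, sorted p-values follow the uniform quantiles.
observed <- sort(pvalues)
expected <- ppoints(length(pvalues))

# Histogram with 20 equal-width bins; the red line marks the count
# expected per bin if all p-values were uniformly distributed.
hist(pvalues, breaks = seq(0, 1, by = 0.05),
     main = "P-value distribution", xlab = "p-value")
abline(h = length(pvalues) / 20, col = "red")

# QQ plot: points below the red diagonal at the low end indicate
# more small p-values than expected by chance.
plot(expected, observed, xlab = "Expected p-value", ylab = "Observed p-value")
abline(0, 1, col = "red")
```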
RankTest
If RankTest is selected, a Wilcoxon test is performed when two groups are compared; otherwise a Kruskal-Wallis test is performed.
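The choice between the two tests can be sketched in base R as below. The helper rank_test is hypothetical (it is not a Calypso function), and the abundance values are invented for illustration.

```r
# Hypothetical helper: pick the rank test based on the number of groups.
rank_test <- function(values, groups) {
  groups <- droplevels(factor(groups))
  if (nlevels(groups) == 2) {
    # Two groups: Wilcoxon test.
    wilcox.test(values ~ groups)$p.value
  } else {
    # More than two groups: Kruskal-Wallis test.
    kruskal.test(values ~ groups)$p.value
  }
}

# Illustrative abundances of one taxon.
two_groups   <- rep(c("A", "B"), each = 4)
values_two   <- c(1.2, 2.3, 1.8, 2.9, 5.1, 6.4, 5.8, 7.2)
three_groups <- rep(c("A", "B", "C"), each = 4)
values_three <- c(values_two, 9.5, 10.1, 11.3, 9.9)

p_two   <- rank_test(values_two, two_groups)
p_three <- rank_test(values_three, three_groups)
print(c(wilcoxon = p_two, kruskal_wallis = p_three))
```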
Anova
Data values across sample groups are compared by analysis of variance (ANOVA), which tests for differences among group means.
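A one-way ANOVA for a single feature could look like this in base R. This is a sketch with made-up abundances and group labels; how Calypso extracts the p-value internally is not stated here, so the extraction step is an assumption.

```r
# Illustrative abundances of one taxon in three sample groups.
abundance <- c(10.1, 11.4, 9.8, 10.7,
               14.9, 15.6, 16.2, 15.1,
               10.3, 9.9, 11.0, 10.5)
group <- factor(rep(c("A", "B", "C"), each = 4))

# Fit the one-way ANOVA model.
fit <- aov(abundance ~ group)

# The first row of the ANOVA table holds the test for the group effect.
anova_table <- summary(fit)[[1]]
pvalue <- anova_table[["Pr(>F)"]][1]
print(anova_table)
```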
NestedAnova
Select "Nested Anova" for nested designs (see the "Handbook of Biological Statistics"). In this case, the primary group (3rd column of the meta data file) is used as the group (e.g. treatment), and the secondary group (5th column of the meta data file) as the subgroup (nested variable, e.g. animal cage). The following test is run in R: aov(abundance(taxa) ~ primaryGroup/secondaryGroup).
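The nested formula can be tried on a small example in base R. The data below are invented: treatment stands in for the primary group and cage for the secondary group, with two cages per treatment and three animals per cage.

```r
# Illustrative nested design: cages c1 and c2 belong to "control",
# cages c3 and c4 belong to "treated".
treatment <- factor(rep(c("control", "treated"), each = 6))
cage      <- factor(rep(c("c1", "c2", "c3", "c4"), each = 3))
abundance <- c(10, 12, 11, 14, 13, 15,
               20, 22, 21, 25, 24, 26)

# primaryGroup/secondaryGroup expands to
# primaryGroup + primaryGroup:secondaryGroup.
fit <- aov(abundance ~ treatment/cage)

nested_table <- summary(fit)[[1]]
print(nested_table)
```

The table contains one row for the treatment, one for cages within treatments, and one for the residuals.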
anova+
Multivariate ANOVA to identify associations between features and multiple factors. Numeric factors are categorized automatically (e.g. a variable in the range 0 to 200 is divided into the categories 0-50, 50-100, 100-150 and 150-200). An ANOVA is run separately for each feature, with the feature as the dependent variable and each factor as an explanatory variable. The following test is run in R: aov(feature ~ factor1 + factor2 + ...). The ANOVA is run without interaction terms.
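The per-feature procedure can be sketched in base R as follows. Everything here is illustrative: the factor names (diet, weight), the taxa, and the use of cut with fixed breaks are assumptions, and the exact binning rule Calypso applies is not specified above.

```r
set.seed(42)

# Illustrative meta data: one categorical and one numeric factor.
meta <- data.frame(
  diet   = factor(rep(c("chow", "highfat"), times = 8)),
  weight = c(10, 35, 60, 80, 110, 130, 160, 190,
             20, 45, 70, 95, 120, 145, 170, 199)
)

# Categorize the numeric factor into four equal-width bins (assumed rule).
meta$weight_cat <- cut(meta$weight, breaks = c(0, 50, 100, 150, 200),
                       include.lowest = TRUE)

# Illustrative features: taxonA depends on diet, taxonB does not.
features <- data.frame(
  taxonA = rnorm(16, mean = 10 + 5 * (meta$diet == "highfat")),
  taxonB = rnorm(16, mean = 10)
)

# One ANOVA per feature, main effects only (no interaction terms).
pvals <- sapply(features, function(feature) {
  fit <- aov(feature ~ diet + weight_cat, data = meta)
  summary(fit)[[1]][["Pr(>F)"]][1:2]
})
rownames(pvals) <- c("diet", "weight_cat")
print(pvals)
```

Each column of pvals holds the p-values of one feature, with one row per factor.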
Negative binomial distribution (DESeq2)
Statistical testing based on the negative binomial distribution is implemented using the DESeq2 software. DESeq2 was developed for RNA-seq data, but can also be used for community composition data. To use DESeq2, raw taxonomic counts have to be uploaded.
The method is implemented in R, using the DESeq2 Bioconductor package:
des <- DESeq(dds, test="Wald", fitType="parametric")
p <- results(des, cooksCutoff=FALSE)$pvalue
where dds is a DESeqDataSet object created from the count data via the DESeqDataSetFromMatrix function.
Multivariate Random Forest
Complex associations between multiple factors and measurements can be identified by the multivariate Random Forest. Scores indicate the strength of the associations. The score is the mean decrease in classification accuracy when the values of the respective feature are randomly permuted. The following function is called in R: randomForest(group ~ factor1 + factor2 + factor3 + ..., importance=T, proximity=F, ntree=10000, mtry=20)