Multivariate here

Unsupervised clustering is achieved using principal component analysis (PCA), principal coordinates analysis (PCoA), non-metric multidimensional scaling (NMDS) [1][2], networks, and hierarchical clustering. It allows identification of sample clusters and hidden data structures. It also allows identification of outliers, which may be problematic and may need to be removed from the analysis.

Supervised multivariate methods allow identification of complex microbiota-environment interactions. P-values indicate significant associations, i.e. whether factors explain a significant amount of variation in community composition. The following supervised methods are provided:

All multivariate methods provided by Calypso are related.

CCA and CCA+

Canonical correspondence analysis (CCA) [3] uses dissimilarity matrices to test whether sample groups are significantly different, i.e. whether they have different community profiles as measured by the selected distance metric [4]. CCA provides a p-value for each explanatory variable, indicating whether that variable significantly explains variation in community profiles. CCA includes primary group, secondary group and pair to explain variation in community distances. CCA+ includes all environmental variables that the user specifies in the uploaded meta annotation file.

RDA and RDA+

Redundancy analysis (RDA) extracts the variation in the microbial community profiles that can be explained by a set of explanatory variables: sample groups for RDA, environmental variables for RDA+ [5]. P-values indicate whether explanatory variables are significantly associated with variation in the data matrix (abundance of individual taxa). A p-value <0.05 indicates that a variable is significantly associated with variation in the abundance of specific taxa.

Age, gender and location are all significantly associated with variation in the microbial communities of the example data set.

RDA is implemented using the rda() function from the vegan R package. rda() is run with parameters scale=T and na.action="na.omit". All other parameters are set to their default values.

Anosim

Anosim also uses dissimilarity matrices to test whether sample groups are significantly different, i.e. whether they have different community profiles as measured by the selected distance metric [6]. Anosim provides a single p-value indicating whether community profiles differ significantly between sample groups. The p-value is calculated by comparing within-group distances with between-group distances.
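The comparison of within-group and between-group distances can be illustrated with a small sketch. This is not Calypso's implementation (Calypso relies on R), only a minimal Python illustration of the standard Anosim R statistic with a permutation p-value. It assumes Bray-Curtis as the distance metric; the function names (bray_curtis, rank_with_ties, anosim_r, anosim) and the parameters n_perm and seed are hypothetical and chosen for this example.

```python
import random
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def anosim_r(pair_ranks, pairs, groups):
    """R = (mean between-group rank - mean within-group rank) / (n(n-1)/4)."""
    within = [r for r, (i, j) in zip(pair_ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(pair_ranks, pairs) if groups[i] != groups[j]]
    if not within or not between:
        raise ValueError("need both within-group and between-group pairs")
    n = len(groups)
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (n * (n - 1) / 4)


def anosim(samples, groups, n_perm=999, seed=0):
    """Return (R, p) where p comes from permuting the group labels."""
    pairs = list(combinations(range(len(samples)), 2))
    dists = [bray_curtis(samples[i], samples[j]) for i, j in pairs]
    ranks = rank_with_ties(dists)
    observed = anosim_r(ranks, pairs, groups)

    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes stay the same, labels move
        if anosim_r(ranks, pairs, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the observed labelling counts.
    return observed, (hits + 1) / (n_perm + 1)
```

Called as anosim(samples, groups), with samples given as a list of abundance vectors and groups as a list of labels of the same length, the sketch returns R and the p-value. R is close to 1 when every between-group distance exceeds every within-group distance, and close to 0 when grouping explains nothing.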

DCA

Detrended correspondence analysis (DCA) is a method widely used in ecology to find the main factors or gradients in large, species-rich but usually sparse data matrices [7][8].