Metadata V3

Revision as of 07:32, 2 December 2016 by Calypso

The meta data file must have at least the following 6 columns in exactly this order:

Format of the meta data file

Column | Name | Description
1 | Sample id | The identifier of each sample. Sample ids must match the sample ids of the uploaded data matrix.
2 | Label | A unique sample label. These labels are shown in generated figures instead of the sample id.
3 | Pair | Individual or animal. This information is used for paired analysis if several samples were taken from the same individual, e.g. at different time points during a longitudinal study or at different locations from the same individual. Set pair to different ids (e.g. P1, P2, P3, ...) if each sample was taken from a different individual.
4 | Primary group | The primary group is the main sample group, e.g. case/control or treated/untreated. It is used for most univariate analyses and to colour figures.
5 | Secondary group | A secondary sample group, e.g. case/control, geography or sampling time point. In some analyses the secondary group has a defined meaning. For example, the secondary group is used as the nested variable when comparing sample groups by nested ANOVA. For paired analysis of longitudinal data, the secondary group is interpreted as the sampling time point. If your dataset doesn't have a secondary group, simply set this column to an arbitrary value, e.g. "none".
6 | Include flag | This column takes the values 0 and 1 and indicates whether a sample should be included in (1) or excluded from (0) the analysis. It can be used to exclude samples, for example problematic ones, from subsequent analysis without modifying the data matrix.
7, 8, 9, ... | Optional explanatory variables | These variables are used in multivariate analysis, e.g. redundancy analysis, canonical correspondence analysis or multiple regression.
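The rules for the six required columns can be checked before uploading. The following is a minimal sketch, not part of the tool itself: the function name `check_required_columns` is hypothetical, and the example assumes a comma-delimited file without a header row, neither of which is specified on this page.

```python
import csv
import io


def check_required_columns(rows, matrix_sample_ids):
    """Return a list of problems found in the first six columns of each row.

    rows: parsed meta data rows (lists of strings), assumed to have no header.
    matrix_sample_ids: set of sample ids present in the uploaded data matrix.
    """
    problems = []
    seen_labels = set()
    for n, row in enumerate(rows, start=1):
        if len(row) < 6:
            problems.append(f"row {n}: expected at least 6 columns, found {len(row)}")
            continue
        # Column order is fixed: sample id, label, pair, primary, secondary, include.
        sample_id, label, _pair, _primary, _secondary, include = (
            value.strip() for value in row[:6]
        )
        if sample_id not in matrix_sample_ids:
            problems.append(f"row {n}: sample id {sample_id!r} is not in the data matrix")
        if label in seen_labels:
            problems.append(f"row {n}: label {label!r} is not unique")
        seen_labels.add(label)
        if include not in ("0", "1"):
            problems.append(f"row {n}: include flag must be 0 or 1, found {include!r}")
    return problems


# Hypothetical example data; the comma delimiter is an assumption.
EXAMPLE = """\
S1,Case 1,P1,case,none,1
S2,Control 1,P2,control,none,1
S3,Control 1,P3,control,none,2
"""

rows = list(csv.reader(io.StringIO(EXAMPLE)))
problems = check_required_columns(rows, {"S1", "S2"})
for problem in problems:
    print(problem)
```

In this example the third row is reported three times: its sample id is missing from the data matrix, its label repeats an earlier one, and its include flag is neither 0 nor 1.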

Optional explanatory variables

Explanatory variables are used in multivariate analysis to examine complex environment-microbiome (or host-microbiome) associations. If the primary and/or secondary group should be included in the multivariate analysis, these fields must also be defined as explanatory variables.

Explanatory variables can be numeric and/or categorical. Example variables are BMI, gender, age, tissue, batch, blood sugar level, temperature, time of sampling, sample location, or pH. Categorical variables must contain non-numeric characters (e.g. T1, T2, ...) and must not be encoded numerically (e.g. 1, 2, 3, ...). Numeric variables must not contain non-numeric values (e.g. "unknown"). Missing values must be given as NA.
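The encoding rules above can be sketched as a small check on each explanatory column. This is an illustration only: the helper names are hypothetical, and using Python's `float()` to decide what counts as a number is an assumption, since this page does not say how values are parsed.

```python
def is_number(value):
    """Return True if the string parses as a number (assumption: float() rules)."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def classify_variable(values):
    """Classify one explanatory column, ignoring missing values given as NA.

    Returns "numeric", "categorical", "mixed" (numbers and text in the same
    column, which the rules above forbid) or "empty" (only NA values).
    """
    present = [v.strip() for v in values if v.strip() != "NA"]
    if not present:
        return "empty"
    numeric_flags = [is_number(v) for v in present]
    if all(numeric_flags):
        return "numeric"
    if not any(numeric_flags):
        return "categorical"
    return "mixed"


print(classify_variable(["23.5", "NA", "31"]))     # numeric, NA is skipped
print(classify_variable(["T1", "T2", "NA"]))       # categorical
print(classify_variable(["1", "2", "unknown"]))    # mixed, should be fixed
print(classify_variable(["1", "2", "3"]))          # numeric
```

The last call shows why categories should not be coded as 1, 2, 3: such a column cannot be told apart from a numeric variable.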

Example: Assume a case/control study in which gender and age are potential confounding factors. To explore whether case/control status, controlled for gender and age, explains variation in microbial community composition, define case/control status, gender and age as explanatory variables in the meta data file. Case/control status can additionally be defined as primary group; this will not affect any multivariate analysis. Then run a multivariate analysis, such as RDA++, CCA++ or multivariate regression. To identify correlations between taxa and case/control status, gender and age, apply Network+ to the same data files.
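As a rough sketch of the case/control example, the rows below repeat case/control status in column 4 (primary group) and again in column 7 (explanatory variable), followed by gender and age. All sample values are invented for illustration, and the comma delimiter and the absence of a header row are assumptions.

```python
import csv
import io

# Hypothetical samples: id, label, pair, case/control status, gender, age.
samples = [
    ("S1", "Case 1", "P1", "case", "female", "34"),
    ("S2", "Case 2", "P2", "case", "male", "NA"),  # missing age given as NA
    ("S3", "Control 1", "P3", "control", "female", "41"),
    ("S4", "Control 2", "P4", "control", "male", "29"),
]


def build_rows(samples):
    """Lay out each sample in the fixed column order of the meta data file."""
    rows = []
    for sample_id, label, pair, status, gender, age in samples:
        rows.append([
            sample_id, label, pair,
            status,   # column 4: primary group
            "none",   # column 5: no secondary group in this study
            "1",      # column 6: include the sample
            status,   # column 7: status repeated as explanatory variable
            gender,   # column 8: categorical explanatory variable
            age,      # column 9: numeric explanatory variable
        ])
    return rows


def to_text(rows, delimiter=","):
    """Serialize rows as delimited text (the delimiter is an assumption)."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


metadata_rows = build_rows(samples)
metadata_text = to_text(metadata_rows)
print(metadata_text)
```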

Example meta data file

[Image: Calypso meta data file.PNG]