GMine Regression Details

Multivariable linear regression

Multivariable linear regression allows identification of complex associations between measurements and multiple explanatory variables or confounding factors.

Example factors are BMI, age, blood sugar, pH, temperature, or iron. In DataSmart, regression models generally have the form:

dp = factor1 + factor2 + factor3 + ...

where dp is a dependent variable (a feature defined in the data file, e.g. measured expression values of a gene) and factor1,…,factorN are all factors defined in the meta information file. In the regression analysis, the dependent variable is explained or "modeled" by the values of the included factors.
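As an illustration of what fitting such a model means, here is a minimal sketch of an ordinary least-squares fit in plain Python. It assumes numeric factors and solves the normal equations directly; the function names (`solve`, `fit_ols`) and the data are hypothetical and are not taken from the tool, which may fit its models differently.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(y, factors):
    """Least-squares fit of y = b0 + b1*factor1 + b2*factor2 + ...; returns [b0, b1, ...]."""
    rows = [[1.0] + list(vals) for vals in zip(*factors)]  # design matrix with intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical, noise-free data: six samples, two factors.
age = [30.0, 40.0, 50.0, 60.0, 35.0, 55.0]
bmi = [22.0, 27.0, 24.0, 30.0, 31.0, 21.0]
dp = [1.0 + 0.5 * a - 0.2 * b for a, b in zip(age, bmi)]

coef = fit_ols(dp, [age, bmi])  # intercept, age coefficient, bmi coefficient
print([round(c, 6) for c in coef])
```

Because the example data are generated without noise, the fitted coefficients recover the values used to build `dp`, up to floating-point error.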

The following modes are available:

Type                 Description
Features vs Factors  Identify associations between individual features and multiple factors.
Time Series          Analysis of longitudinal data.

Details

Features vs Factors

Identify associations between individual features and all factors (as defined in the meta information file) in a multivariable analysis. For each feature, a regression model of the following form is fit:

feature = factor1 + factor2 + factor3 ...

All factors provided in the meta information file are included. The results of the analysis are shown as a table. For each feature-factor pair, the model p-value (".p") and coefficient (".c") are shown; the p-value indicates whether the feature-factor association is significant.

Use the drop-down menu to select a specific feature. For each factor, two scatterplots are displayed. In the first plot, the values of the selected feature are plotted versus the values of the factor. The Pearson correlation is given as R; the p-value indicates the significance of the correlation. In the second plot, the selected feature is controlled for the remaining factors: a multiple linear regression model with all remaining factors as explanatory variables is fit to the feature, and the residuals of this model are plotted versus the factor.
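The controlling step can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the tool's implementation: it regresses the feature on the remaining factors by ordinary least squares, takes the residuals, and computes their Pearson correlation with the factor of interest. All function names and data values are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(y, factors):
    """Least-squares coefficients [b0, b1, ...] for y = b0 + b1*factor1 + ..."""
    rows = [[1.0] + list(vals) for vals in zip(*factors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def residuals(y, factors):
    """Part of y that is not explained by a linear model on the given factors."""
    coef = fit_ols(y, factors)
    fitted = [coef[0] + sum(c * v for c, v in zip(coef[1:], vals))
              for vals in zip(*factors)]
    return [yi - fi for yi, fi in zip(y, fitted)]


def pearson(x, y):
    """Pearson correlation coefficient R of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: the factor of interest is temperature; age and BMI remain.
age = [30.0, 40.0, 50.0, 60.0, 35.0, 55.0]
bmi = [22.0, 27.0, 24.0, 30.0, 31.0, 21.0]
temp = [36.5, 37.0, 36.8, 37.4, 36.6, 37.1]
feature = [2.0 + 0.5 * a - 0.2 * b + 0.3 * t for a, b, t in zip(age, bmi, temp)]

res = residuals(feature, [age, bmi])   # feature controlled for age and BMI
r_raw = pearson(feature, temp)         # R for the first scatterplot
r_controlled = pearson(res, temp)      # R for the second scatterplot
print(r_raw, r_controlled)
```

The residuals sum to zero and are uncorrelated with the remaining factors by construction, so any correlation left in the second plot cannot be attributed to a linear effect of those factors.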

Multivariate paired data can be analyzed by mixed-effects regression, which incorporates the pairing variable (e.g. subject, animal or cage) as a random effect and the other factors (e.g. cases/controls or treatment) as fixed effects. These models can distinguish between group-specific effects (e.g. the average in cases and in controls) and subject- or cage-specific effects. For paired data, select the "Paired" checkbox. A linear mixed-effects model is fit of the form:

feature = factor1 + factor2 + factor3 ... + pair1 + pair2 + ...,

where the factors are included as fixed effects and the pairs as a random effect. The model is fit in R using the command: lmer(feature ~ factor1 + factor2 + factor3 + ... + (1|pair))

All factors defined in the meta information file are included. The pair information is taken from the third column of the meta information file.

Use the drop-down menu to select a specific feature. For each factor f, two scatterplots are displayed. In the first plot, the values of the selected feature are plotted versus factor f. In the second plot, the partial correlation is shown: the values of the selected feature are controlled for all remaining factors (by fitting a regression model with the remaining factors and plotting the residuals versus f). The Pearson correlation is given as R; the p-value indicates the significance of the correlation.

For paired data, select the "Paired" checkbox and press "Run Analysis". A linear mixed-effects model is then fit of the form: feature = factor1 + factor2 + factor3 ... + pair1 + pair2 + ..., where the factors are included as fixed effects and the pairs as a random effect.


Time Series

Analysis of longitudinal data (time series). Use the Time variable drop-down menu to choose the variable defining the time points. For each feature, a mixed-effects regression model is fit of the form:

feature = time point 1 + time point 2 + ... + pair1 + pair2 + ...,

where the time point is included as a fixed effect and the pair as a random effect. The model is fit in R using the lmer() function with the formula:

feature ~ as.factor(time variable) + (1|pair).
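Wrapping the time variable in as.factor() makes R treat it as categorical, so each time point gets its own effect instead of a single linear slope. With R's default treatment contrasts, this corresponds to one 0/1 indicator column per time point other than the first (reference) level. The following Python sketch illustrates that coding only; it is not the mixed model itself, and the function name and data are hypothetical.

```python
def treatment_code(time_points):
    """Return (reference_level, indicator_columns) for a categorical time variable.

    Levels are sorted and the first level serves as the reference, which is
    assumed here to mirror R's default handling of as.factor() in a formula.
    """
    levels = sorted(set(time_points))
    reference, others = levels[0], levels[1:]
    columns = {
        level: [1 if t == level else 0 for t in time_points]
        for level in others
    }
    return reference, columns


# Hypothetical sampling days for two individuals measured three times each.
time = [0, 7, 14, 0, 7, 14]
reference, columns = treatment_code(time)
print(reference)
print(columns)
```

Each indicator column then receives its own fixed-effect coefficient, interpreted as the difference from the reference time point.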

Use the Feature drop-down menu to select a single feature. In the resulting plot, measurements obtained for the same individual are connected by lines.
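To connect measurements of the same individual, the data first have to be grouped by the pairing variable and ordered by time. A minimal sketch of that grouping step, assuming one record per measurement; the function name, identifiers and values are hypothetical:

```python
from collections import defaultdict


def trajectories(pairs, times, values):
    """Group (time, value) points by individual and sort each group by time."""
    by_individual = defaultdict(list)
    for pair, t, v in zip(pairs, times, values):
        by_individual[pair].append((t, v))
    return {pair: sorted(points) for pair, points in by_individual.items()}


# Hypothetical measurements of one feature for two individuals.
pairs = ["s1", "s2", "s1", "s2", "s1", "s2"]
times = [14, 0, 0, 7, 7, 14]
values = [3.1, 1.9, 2.0, 2.4, 2.6, 2.2]

lines = trajectories(pairs, times, values)
for pair, points in lines.items():
    print(pair, points)
```

Each entry of `lines` holds the points of one connecting line in time order.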