Detrending Data

It is common practice to use linear models to estimate the effects of independent variables on dependent measures. With several independent variables, a multiple linear regression can be applied. In a multiple regression the regressors are, in effect, orthogonalized with respect to one another, so each beta estimate reflects only the variance that is uniquely explained by that specific regressor. Fitting the model yields the beta estimates (the slopes of the regression) together with the explained and residual sums of squares, which can then be used for statistical testing.
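As a minimal sketch of these quantities, the R snippet below fits a multiple regression to simulated data. It measures the variance uniquely explained by x2 as the drop in residual sum of squares when x2 is added to a model that already contains x1, and `anova()` on the two nested models turns that drop into an F-test. The seed, the sample size of 50 and the object names (`fit_full`, `fit_x1`, `ss_unique_x2`) are illustrative assumptions.

```r
# Simulated example data (seed and values are illustrative only)
set.seed(42)
n  <- 50
x1 <- rnorm(n, mean = 10)
x2 <- rnorm(n, mean = 20)
y  <- 12.5 + x1 + x2 + rnorm(n)

# Full multiple regression and a reduced model without x2
fit_full <- lm(y ~ x1 + x2)
fit_x1   <- lm(y ~ x1)

# Beta estimates of the full model (intercept and one slope per regressor)
print(coef(fit_full))

# Residual sums of squares of both models
rss_full <- sum(residuals(fit_full)^2)
rss_x1   <- sum(residuals(fit_x1)^2)

# Variance uniquely explained by x2: the drop in RSS when x2 is added to x1
ss_unique_x2 <- rss_x1 - rss_full

# F-test comparing the nested models; "Sum of Sq" in row 2 is that same drop
cmp <- anova(fit_x1, fit_full)
print(cmp)
```

The F value in the second row is the unique sum of squares of x2 (one degree of freedom) divided by the residual mean square of the full model.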

What, however, if we are interested not in the observed values but in values corrected for one or another of the regressors? One approach is to run a statistical model containing only the independent variable(s) we want to correct the data for. This gives us parameter estimates and also the residuals of the model. The residuals are the differences between the observed values and the values fitted by the model, so they retain all the variance the model did not explain. If the model explains little or no variance, the residuals will be large, and a second linear model can still be fitted to them.
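A short sketch, with simulated data (the seed and values are illustrative assumptions), that checks this description of the residuals: they equal the observed values minus the fitted values, they average zero when the model includes an intercept, and their variance is at most that of the observed values because they keep only what x1 did not explain.

```r
# Simulated data (seed and values are illustrative only)
set.seed(1)
x1 <- rnorm(20, mean = 10)
x2 <- rnorm(20, mean = 20)
y  <- 12.5 + x1 + x2 + rnorm(20)

# Model containing only the variable we want to correct for
fit_x1 <- lm(y ~ x1)

# Residuals are the observed values minus the values fitted by the model
res    <- residuals(fit_x1)
manual <- y - fitted(fit_x1)

# With an intercept in the model, the residuals average (numerically) zero
print(mean(res))

# The residuals still carry the variance that x1 did not explain
print(c(var_y = var(y), var_res = var(res)))
```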

So, if we run a second analysis on the residuals from the first analysis, after adding back the mean of the original observed values, the beta estimate of the new regressor is close to the beta estimate of the same regressor in the original analysis that used all regressors at the same time. The two estimates are exactly the same only when the regressors are uncorrelated; to the extent that they are correlated, the second-step estimate is shrunk toward zero.
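The closeness of the two estimates can be checked directly. The sketch below (simulated data; the seed and object names are illustrative assumptions) compares the full-model slope of x2 with the two-step slope described above. It also adds a third variant that is not part of the original procedure: replacing x2 by its own residuals on x1 as well, as in the Frisch-Waugh-Lovell theorem. That variant reproduces the full-model slope exactly.

```r
# Simulated data (seed and values are illustrative only)
set.seed(2)
x1 <- rnorm(20, mean = 10)
x2 <- rnorm(20, mean = 20)
y  <- 12.5 + x1 + x2 + rnorm(20)

# Slope of x2 in the full model with both regressors
b_full <- unname(coef(lm(y ~ x1 + x2))["x2"])

# Two-step approach from the text: residualize y on x1, add back mean(y),
# then regress the adjusted values on the raw x2
yadj <- residuals(lm(y ~ x1)) + mean(y)
b_two_step <- unname(coef(lm(yadj ~ x2))["x2"])

# Frisch-Waugh-Lovell variant: also residualize x2 on x1
x2adj <- residuals(lm(x2 ~ x1))
b_fwl <- unname(coef(lm(yadj ~ x2adj))["x2adj"])

print(c(full = b_full, two_step = b_two_step, fwl = b_fwl))
```

The two-step slope is usually close to the full-model slope, while the Frisch-Waugh-Lovell slope matches it up to floating-point error.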

# Generate 2 random variables with 20 observations each (seed set for reproducibility)
set.seed(1)
x1 = rnorm(20, mean = 10)
x2 = rnorm(20, mean = 20)

# Generate the dependent variable as a function of those 2 variables, plus an intercept and noise
y = 12.5 + x1 + x2 + rnorm(20)

# Run the full linear model with both regressors
summary(lm(y~x1 + x2))

# Run a reduced linear model with x1 as the only independent variable
lm_x1 = lm(y ~ x1)

# Calculate the adjusted dependent measure by adding the residuals of the reduced model to the mean of the observed dependent measure
yadj = residuals(lm_x1) + mean(y)

# Run a second linear model with the adjusted dependent measure and x2 as the only independent variable. The beta for x2 is close to the one from the full model; small differences arise to the extent that x1 and x2 are correlated in the sample
summary(lm(yadj ~ x2))
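To see how the dependence between x1 and x2 drives those small differences, the sketch below first builds a deliberately correlated x2 and then an x2 that is exactly uncorrelated with x1 in the sample (both are hypothetical setups for illustration). Algebraically, regressing the adjusted values on the raw x2 gives the full-model slope multiplied by 1 - r^2, where r is the sample correlation between x1 and x2, so the two-step slope matches the full model only when r = 0. `two_step_vs_full()` is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: full-model slope of x2 versus the two-step slope
two_step_vs_full <- function(x1, x2, y) {
  b_full <- unname(coef(lm(y ~ x1 + x2))["x2"])
  yadj   <- residuals(lm(y ~ x1)) + mean(y)
  b_two  <- unname(coef(lm(yadj ~ x2))["x2"])
  c(full = b_full, two_step = b_two)
}

set.seed(3)
n  <- 100
x1 <- rnorm(n, mean = 10)

# Correlated case: x2 shares variance with x1 (population correlation 0.8)
x2 <- 20 + 0.8 * (x1 - 10) + rnorm(n, sd = 0.6)
y  <- 12.5 + x1 + x2 + rnorm(n)
b_cor <- two_step_vs_full(x1, x2, y)
r <- cor(x1, x2)
print(c(b_cor, r = r))

# Uncorrelated case: remove from x2 everything linearly related to x1
x2_orth <- residuals(lm(x2 ~ x1)) + 20
y_orth  <- 12.5 + x1 + x2_orth + rnorm(n)
b_orth  <- two_step_vs_full(x1, x2_orth, y_orth)
print(b_orth)
```

With strongly correlated regressors the two-step slope is visibly attenuated, while in the uncorrelated case the two slopes coincide up to floating-point error.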

Why is it important to reintroduce the mean?
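One way to approach this question is to compare a second-step model fitted to the raw residuals with one fitted to the residuals plus mean(y). In the sketch below (simulated data; the seed and names are illustrative assumptions), the slope for x2 is identical in both cases and only the intercept changes, by exactly mean(y). The adjusted values also have the same mean as the observed y.

```r
# Simulated data (seed and values are illustrative only)
set.seed(4)
x1 <- rnorm(20, mean = 10)
x2 <- rnorm(20, mean = 20)
y  <- 12.5 + x1 + x2 + rnorm(20)

res  <- residuals(lm(y ~ x1))  # centred around zero
yadj <- res + mean(y)          # same values, shifted back to the scale of y

fit_res  <- lm(res ~ x2)
fit_yadj <- lm(yadj ~ x2)

# Slopes are identical; only the intercept differs, by exactly mean(y)
print(rbind(residuals_only = coef(fit_res), mean_added = coef(fit_yadj)))
```

So adding the mean does not change the slope estimate; it keeps the corrected values and the intercept on the original scale of the dependent measure, so they read as corrected observations rather than deviations around zero.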