Linear Regression

As someone who was trained in nonlinear dynamics, I never gave much thought to linear regression.  After all, what could be more boring than fitting data with a straight line?  Now I use it all the time and find it rather beautiful.  I’ll start with the simplest example and show how it generalizes easily.  Consider a list of N ordered pairs (x_i,y_i) through which you want to fit a straight line. You want to find parameters such that

y_i = b_0 + b_1 x_i +\epsilon_i   (1)

has the smallest errors \epsilon_i, where “smallest” usually, although not always, means in the least squares sense.

Hence, you want to minimize

\chi^2=\sum_i (y_i-b_0-b_1 x_i)^2.    (2)

Setting the derivatives of \chi^2 with respect to b_0 and b_1 to zero gives

b_1= \sum_i (y_i -\bar{y})(x_i-\bar{x})/\sum_i(x_i-\bar{x})^2,

where \bar{y}=(1/N)\sum_i y_i and \bar{x}=(1/N)\sum_i x_i, and b_0= \bar{y} - b_1 \bar{x}.  The easy way to remember this is that the slope b_1 is equal to the covariance of x and y divided by the variance of x, i.e. b_1={\rm cov}(x,y)/{\rm var}(x), and you get b_0 by taking the average of equation (1) and solving for b_0.
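Here is a minimal numerical sketch of these formulas using numpy; the simulated data and variable names are mine, not from the post, and are just for illustration.

```python
import numpy as np

# Hypothetical data: y = 2 + 0.5 x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=100)

# Slope is cov(x, y) / var(x); intercept comes from averaging equation (1).
b1 = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)  # should be close to 2.0 and 0.5
```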

Recall that the Pearson correlation coefficient is given by r = {\rm cov}(x,y)/\sqrt{{\rm var}(x){\rm var}(y)}, so that r =b_1 \sqrt{{\rm var}(x)/{\rm var}(y)}.  Thus the correlation coefficient between x and y is given by the slope of the linear regression multiplied by the ratio of the standard deviation of x to that of y.  (The square of r is the fraction of the variance explained by the regression.)  Now we can see where the term regression comes from.  If we “standardize” equation (1) (i.e. subtract the mean of y and divide by the standard deviation) and use the fact that \bar{y}= b_0 + b_1\bar{x}, we get

\delta y_i = r \delta x_i,  (3)

where \delta y_i = (y_i-\bar{y})/\sqrt{{\rm var}(y)} and \delta x_i = (x_i-\bar{x})/\sqrt{{\rm var}(x)}.  What (3) implies is that the deviation from the mean of y_i, in units of its standard deviation, is always less than or equal in magnitude to the standardized deviation from the mean of x_i, since -1\le r \le 1.
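A quick sketch of this fact, again on made-up data: if you standardize both variables and regress one on the other, the slope you get is exactly the Pearson correlation r.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=100)

# Standardize: subtract the mean, divide by the standard deviation.
dx = (x - x.mean()) / x.std()
dy = (y - y.mean()) / y.std()

slope_std = np.cov(dx, dy, ddof=0)[0, 1] / np.var(dx)  # slope on standardized data
r = np.corrcoef(x, y)[0, 1]                            # Pearson correlation
print(slope_std, r)  # the two agree, i.e. delta_y = r * delta_x
```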

Thus y has “regressed to the mean”.  For example, suppose y represents daughters and x represents mothers.  Then what this says is that the regression predicts the daughter to be closer to the mean than the mother, in terms of the respective standard deviations of daughters and mothers.  However, even though families tend to regress to the mean, this does not imply that the variance of the distribution has to shrink in each generation.  The variance and mean of each generation could decrease, stay the same, or increase.  It’s just where you are with respect to the population that changes.

There is also a linear algebra way of looking at linear regression, which is useful if you want to generalize to higher dimensions.  We can always generalize equation (1) to

Y = XB

where Y is an N\times M dimensional matrix of data points, X is an N\times P dimensional matrix of independent variables (i.e. regressors) and B is a P\times M dimensional matrix of parameters.  For equation (1), Y would be the vector with elements y_i, X would be the matrix with all ones in the first column and the elements x_i in the second column, and B would be the vector [b_0, b_1]^T.  In this form we see that we want to “invert this equation” to obtain the parameters B.  However, generally X is not an invertible square matrix, so to solve this problem we multiply both sides by the transpose of X and then take the inverse, giving

B=(X^TX)^{-1}X^T Y.

This then is the only formula you have to remember.  In fact this is just the generalized version of the one-dimensional formula, where X^T Y plays the role of the covariance between X and Y and X^T X plays the role of the covariance/variance matrix of X (once the means have been subtracted out).
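A minimal sketch of the matrix formula, assuming the same made-up data as above and the two-column design matrix described for equation (1):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=100)

# Design matrix: a column of ones (intercept) and a column of x values.
X = np.column_stack([np.ones_like(x), x])

# B = (X^T X)^{-1} X^T Y, computed by solving the normal equations
# rather than forming the inverse explicitly.
B = np.linalg.solve(X.T @ X, X.T @ y)
# equivalently: B, *_ = np.linalg.lstsq(X, y, rcond=None)
print(B)  # approximately [b_0, b_1]
```

In practice one solves the normal equations (or uses a least-squares routine) instead of inverting X^T X directly, which is numerically more stable but gives the same B.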

 

Typo corrected April 1, 2011

3 thoughts on “Linear Regression”

  1. I love this – it’s the most concise course on multiple regression anybody will ever pen. One tiny glitch, though, ought to be fixed:

    “and you get b_0 by taking the average of equation (2) and solving for b_0”

    should instead refer to equation (1).

