# Blog Archives

## Derivation: Ordinary Least Squares Solution and Normal Equations

In a linear regression framework, we assume some output variable $y$ is a linear combination of some independent input variables $X$ plus some independent noise $\epsilon$. The way the independent variables are combined is defined by a parameter vector $\beta$: $\Large{\begin{array}{rcl} y &=& X \beta + \epsilon \end{array}}$

We also assume that the noise term $\epsilon$ is drawn from a standard Normal distribution: $\Large{ \begin{array}{rcl}\epsilon &\sim& N(0,I)\end{array}}$

For some estimate of the model parameters $\hat \beta$, the model’s prediction errors/residuals $e$ are the difference between the model prediction and the observed ouput values $\Large{\begin{array}{rcl} e = y - X\hat \beta \end{array}}$

The Ordinary Least Squares (OLS) solution to the problem (i.e. determining an optimal solution for $\hat \beta$) involves minimizing the sum of the squared errors with respect to the model parameters, $\hat \beta$. The sum of squared errors is equal to the inner product of the residuals vector with itself $\sum e_i^2 = e^Te$ : $\Large{\begin{array}{rcl} e^T e &=& (y - X \hat \beta)^T (y - X \hat \beta) \\ &=& y^Ty - y^T (X \hat \beta) - (X \hat \beta)^T y + (X \hat \beta)^T (X \hat \beta) \\ &=& y^Ty - (X \hat \beta)^T y - (X \hat \beta)^T y + (X \hat \beta)^T (X \hat \beta) \\ &=& y^Ty - 2(X \hat \beta)^T y + (X \hat \beta)^T (X \hat \beta) \\ &=& y^Ty - 2\hat \beta^T X^T y + \hat \beta^T X^T X \hat \beta \\ \end{array}}$

To determine the parameters, $\hat \beta$, we minimize the sum of squared residuals with respect to the parameters. $\Large{\begin{array}{rcl} \frac{\partial}{\partial \beta} \left[ e^T e \right] &=& 0 \\ &=& -2X^Ty + 2X^TX \hat \beta \text{, and thus} \\ X^Ty &=& X^TX \hat \beta \end{array}}$

due to the identity $\frac{\partial \mathbf{a}^T \mathbf{b}}{\partial \mathbf{a}} = \mathbf{b}$, for vectors $\mathbf{a}$ and $\mathbf{b}$. This relationship is matrix form of the Normal Equations. Solving for $\hat \beta$ gives  the analytical solution to the Ordinary Least Squares problem. $\Large{\begin{array}{rcl} \hat \beta &=& (X^TX)^{-1}X^Ty \end{array}}$

Boom.