In a linear regression framework, we assume some output variable is a linear combination of some independent input variables plus some independent noise . The way the independent variables are combined is defined by a parameter vector :
We also assume that the noise term is drawn from a standard Normal distribution:
For some estimate of the model parameters , the model’s prediction errors/residuals are the difference between the model prediction and the observed ouput values
The Ordinary Least Squares (OLS) solution to the problem (i.e. determining an optimal solution for ) involves minimizing the sum of the squared errors with respect to the model parameters, . The sum of squared errors is equal to the inner product of the residuals vector with itself :
To determine the parameters, , we minimize the sum of squared residuals with respect to the parameters.
due to the identity , for vectors and . This relationship is matrix form of the Normal Equations. Solving for gives the analytical solution to the Ordinary Least Squares problem.