# Blog Archives

## Derivation: Error Backpropagation & Gradient Descent for Neural Networks

The material in this post has been migrated, with Python implementations, to my GitHub Pages website.

## Derivation: The Covariance Matrix of an OLS Estimator (and applications to GLS)

We showed in an earlier post that for the linear regression model

$$y = X\beta + \epsilon,$$

the optimal Ordinary Least Squares (OLS) estimator for model parameters is

$$\hat{\beta} = (X^T X)^{-1} X^T y.$$
However, because the independent variables $X$ and responses $y$ can take on any value, they are both random variables. And, because $\hat{\beta}$ is a linear combination of $X$ and $y$, it is also a random variable, and therefore has a covariance. The covariance matrix $C$ for the OLS estimator is defined as:

$$C = E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)^T],$$

where $E[\cdot]$ denotes the expected value operator. In order to find an expression for $C$, we first need an expression for $\hat{\beta} - \beta$. The following derives this expression:

$$\hat{\beta} = (X^T X)^{-1} X^T y = (X^T X)^{-1} X^T (X\beta + \epsilon) = (X^T X)^{-1} X^T X \beta + (X^T X)^{-1} X^T \epsilon,$$

where we use the fact that

$$(X^T X)^{-1} X^T X = I.$$

It follows that

$$\hat{\beta} - \beta = (X^T X)^{-1} X^T \epsilon,$$

and therefore

$$C = E[((X^T X)^{-1} X^T \epsilon)((X^T X)^{-1} X^T \epsilon)^T].$$

Now following the original definition for $C$:

$$C = E[(X^T X)^{-1} X^T \epsilon \, \epsilon^T X (X^T X)^{-1}],$$

where we take advantage of the fact that $(AB)^T = B^T A^T$ in order to rewrite the second term in the product inside the expectation. If we take $X$ to be fixed for a given estimator of $\beta$ (in other words, we don't randomly resample the independent variables), then the expectation only depends on the remaining stochastic/random variable, namely $\epsilon$. Therefore the above expression can be written as

$$C = (X^T X)^{-1} X^T E[\epsilon \epsilon^T] X (X^T X)^{-1} = (X^T X)^{-1} X^T \Sigma X (X^T X)^{-1},$$

where $\Sigma = E[\epsilon \epsilon^T]$ is the covariance of the noise term $\epsilon$ in the model. Because OLS assumes uncorrelated noise, the noise covariance is equal to $\sigma^2 I$, where $\sigma^2$ is the variance along each dimension, and $I$ is an identity matrix of size equal to the number of dimensions. The expression for the estimator covariance is now:

$$C = (X^T X)^{-1} X^T (\sigma^2 I) X (X^T X)^{-1},$$

which simplifies to

$$C = \sigma^2 (X^T X)^{-1}.$$

A further simplifying assumption that is often made is that $\epsilon$ is drawn from a zero-mean multivariate Gaussian distribution of unit variances (i.e. $\epsilon \sim N(0, I)$), resulting in a noise covariance equal to the identity. Thus

$$C = (X^T X)^{-1}.$$
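The result $C = \sigma^2 (X^T X)^{-1}$ can be verified empirically with a short Monte Carlo sketch (an illustrative NumPy setup of my own): hold $X$ fixed, resample the noise many times, and compare the sample covariance of the resulting estimates against the closed form:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed design matrix, as assumed in the derivation (X is not resampled)
n, p = 50, 2
X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.0])
sigma = 0.5

# Closed-form estimator covariance: sigma^2 (X^T X)^{-1}
C_theory = sigma**2 * np.linalg.inv(X.T @ X)

# Resample the noise term many times and recompute the OLS estimate
A = np.linalg.inv(X.T @ X) @ X.T          # maps y to beta_hat
trials = 20000
noise = rng.normal(scale=sigma, size=(trials, n))
estimates = (X @ beta + noise) @ A.T      # one beta_hat per row

# Sample covariance of the estimates should match the closed form
C_empirical = np.cov(estimates, rowvar=False)
print(np.allclose(C_empirical, C_theory, atol=1e-3))  # → True
```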

## Applying the derivation results to Generalized Least Squares

Notice that the expression for the OLS estimator covariance is equal to the first inverse term in the expression for the OLS estimator itself. Identifying the covariance for the OLS estimator in this way gives a helpful heuristic for easily identifying the covariance of related estimators that do not make the simplifying assumptions about the noise covariance that are made in OLS. For instance, in Generalized Least Squares (GLS), it is possible for the noise terms to co-vary. This covariance is represented by a noise covariance matrix $\Sigma$. This gives the model form

$$y = X\beta + \epsilon,$$

where $\epsilon \sim N(0, \Sigma)$.

In other words, under GLS, the noise terms have zero mean and covariance $\Sigma$. It turns out that the estimator for the GLS model parameters is

$$\hat{\beta}_{GLS} = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} y.$$

Notice the similarity between the GLS and OLS estimators. The only difference is that in GLS, the solution for the parameters is scaled by the inverse of the noise covariance. And, in a similar fashion to the OLS estimator, the covariance for the GLS estimator is the first inverse term in the product that defines the GLS estimator:

$$C_{GLS} = (X^T \Sigma^{-1} X)^{-1}.$$
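A minimal NumPy sketch of the GLS estimator and its covariance (the AR(1)-style noise covariance here is an illustrative choice of my own, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(2)

n, p = 100, 2
X = rng.normal(size=(n, p))
beta = np.array([1.0, 3.0])

# Correlated noise covariance (illustrative AR(1)-style structure)
rho = 0.6
Sigma = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# Draw noise with covariance Sigma via its Cholesky factor
L = np.linalg.cholesky(Sigma)
y = X @ beta + L @ rng.normal(size=n)

# GLS estimator: (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} y
Sigma_inv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)

# Its covariance is the first inverse term in the estimator
C_gls = np.linalg.inv(X.T @ Sigma_inv @ X)
print(beta_gls.shape, C_gls.shape)  # → (2,) (2, 2)
```

Setting `Sigma` to `sigma**2 * np.eye(n)` recovers the OLS estimator and its covariance as a special case.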