# Blog Archives

## Covariance Matrices and Data Distributions

Correlations between variables in a multidimensional dataset are often summarized by a covariance matrix. To get a better understanding of how covariance matrices characterize correlations between data points, we plot data points drawn from 3 different 2-dimensional Gaussian distributions, each of which is defined by a different covariance matrix.

The left plots below display the covariance matrix for each Gaussian distribution. The values along the diagonal represent the variance of the data along each dimension, and the off-diagonal values represent the covariances between the dimensions. Thus the (i, j)-th entry of each matrix represents the covariance between the i-th and j-th dimensions. The right plots show data drawn from the corresponding 2D Gaussian.

The top row displays a covariance matrix equal to the identity matrix, and the points drawn from the corresponding Gaussian distribution. The diagonal values are 1, indicating the data have a variance of 1 along both dimensions. Additionally, the off-diagonal elements are zero, meaning that the two dimensions are uncorrelated. We can see this in the data drawn from the distribution as well: the data are distributed spherically about the origin. For such a distribution of points, it is difficult (in fact, impossible) to draw any single regression line that can predict the second dimension from the first, and vice versa. Thus an identity covariance matrix is equivalent to having independent dimensions, each of which has unit (i.e. 1) variance. Such a dataset is often called “white” (this naming convention comes from the notion that white noise signals, which can be sampled from independent Gaussian distributions, have equal power at all frequencies in the Fourier domain).

The middle row plots the points that result from a diagonal, but non-identity, covariance matrix. The off-diagonal elements are still zero, indicating that the dimensions are uncorrelated. However, the variances along the two dimensions are not equal to one, nor to each other. This is demonstrated by the elongated distribution in red. The elongation is along the second dimension, as indicated by the larger value in the bottom-right (the (2, 2) entry) of the covariance matrix.

The bottom row plots points that result from a non-diagonal covariance matrix. Here the off-diagonal elements of the covariance matrix have non-zero values, indicating a correlation between the dimensions. This correlation is reflected in the distribution of drawn datapoints (in blue). We can see that the primary axis along which the points are distributed is not along either of the dimensions, but along a linear combination of the dimensions.
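As a quick numerical sanity check, the relationship between a covariance matrix and the data it generates can be sketched in Python with NumPy (an illustrative aside, not part of the original MATLAB code): draw many samples from each of the three Gaussians above and confirm that the empirical covariance of the samples recovers the matrix that generated them.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.zeros(2)

# The three covariance matrices from the plots above
covs = {
    "identity (white)": np.eye(2),
    "diagonal (uncorrelated)": np.array([[1.0, 0.0], [0.0, 3.0]]),
    "non-diagonal (correlated)": np.array([[1.0, 0.9], [0.9, 3.0]]),
}

errors = {}
for name, S in covs.items():
    samples = rng.multivariate_normal(mu, S, size=100_000)
    S_hat = np.cov(samples, rowvar=False)  # empirical covariance of the draws
    errors[name] = np.abs(S_hat - S).max()
    print(f"{name}: max |S_hat - S| = {errors[name]:.4f}")
```

With 100,000 samples the empirical covariance matches each generating matrix to within a few hundredths, which is the sense in which the scatter plots above "are" their covariance matrices.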

The MATLAB code to create the above plots is below:

```matlab
% INITIALIZE SOME CONSTANTS
mu = [0 0];         % ZERO MEAN
S = [1 .9; .9 3];   % NON-DIAGONAL COV.
SDiag = [1 0; 0 3]; % DIAGONAL COV.
SId = eye(2);       % IDENTITY COV.

% SAMPLE SOME DATAPOINTS
nSamples = 1000;
samples = mvnrnd(mu,S,nSamples)';
samplesId = mvnrnd(mu,SId,nSamples)';
samplesDiag = mvnrnd(mu,SDiag,nSamples)';

% DISPLAY
subplot(321);
imagesc(SId); axis image, caxis([0 1]), colormap hot, colorbar
title('Identity Covariance')

subplot(322)
plot(samplesId(1,:),samplesId(2,:),'ko'); axis square
xlim([-5 5]), ylim([-5 5])
grid
title('White Data')

subplot(323);
imagesc(SDiag); axis image, caxis([0 3]), colormap hot, colorbar
title('Diagonal Covariance')

subplot(324)
plot(samplesDiag(1,:),samplesDiag(2,:),'r.'); axis square
xlim([-5 5]), ylim([-5 5])
grid
title('Uncorrelated Data')

subplot(325);
imagesc(S); axis image, caxis([0 3]), colormap hot, colorbar
title('Non-diagonal Covariance')

subplot(326)
plot(samples(1,:),samples(2,:),'b.'); axis square
xlim([-5 5]), ylim([-5 5])
grid
title('Correlated Data')
```

## MCMC: Multivariate Distributions, Block-wise, & Component-wise Updates

In the previous posts on MCMC methods, we focused on how to sample from univariate target distributions. This was done mainly to give the reader some intuition about MCMC implementations with fairly tangible examples that can be visualized. However, MCMC can easily be extended to sample multivariate distributions.

In this post we will discuss two flavors of MCMC update procedures for sampling distributions in multiple dimensions: block-wise and component-wise updates. We will show how these two procedures can give rise to different implementations of the Metropolis-Hastings sampler to solve the same problem.

## Block-wise Sampling

The first approach for performing multidimensional sampling is to use **block-wise updates**. In this approach the proposal distribution q(x) has the same dimensionality as the target distribution p(x). Specifically, if p(x) is a distribution over D variables, i.e. x = (x_1, x_2, …, x_D), then we must design a proposal distribution that is also a distribution over D variables. We then accept or reject a proposed state x* sampled from the proposal distribution in exactly the same way as for the univariate Metropolis-Hastings algorithm. To generate T multivariate samples we perform the following block-wise sampling procedure:

- set t = 0
- generate an initial state x^(0) ~ π^(0)
- repeat until t = T:
  - set t = t + 1
  - generate a proposal state x* from q(x | x^(t-1))
  - calculate the proposal correction factor c = q(x^(t-1) | x*) / q(x* | x^(t-1))
  - calculate the acceptance probability α = min(1, [p(x*) / p(x^(t-1))] × c)
  - draw a random number u from Unif(0, 1)
  - if u ≤ α, accept the proposal state and set x^(t) = x*
  - else set x^(t) = x^(t-1)
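To make the steps above concrete, here is a minimal sketch of the block-wise sampler in Python with NumPy (an illustrative translation, not the MATLAB implementation given later); the bivariate Normal target and the symmetric Gaussian proposal are assumptions chosen to mirror the examples in this post.

```python
import numpy as np

rng = np.random.default_rng(42)

# Target: zero-mean bivariate Normal with strong covariance (unnormalized log-density)
S_inv = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))

def log_p(x):
    return -0.5 * x @ S_inv @ x

T, D = 5000, 2
x = np.zeros((T, D))
x[0] = rng.standard_normal(D)            # initial state

for t in range(1, T):
    x_star = rng.normal(x[t - 1], 1.0)   # proposal q(x | x^(t-1)) = N(x^(t-1), I)
    c = 1.0                              # symmetric proposal => correction factor is 1
    alpha = min(1.0, np.exp(log_p(x_star) - log_p(x[t - 1])) * c)
    if rng.random() < alpha:             # accept the proposal...
        x[t] = x_star
    else:                                # ...or keep the previous state
        x[t] = x[t - 1]

print("sample means:", x.mean(axis=0))
print("sample correlation:", np.corrcoef(x.T)[0, 1])
```

Working with log-densities, as above, avoids numerical underflow that can occur when evaluating p(x) directly in higher dimensions; the acceptance ratio is otherwise identical to the procedure listed above.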

Let’s take a look at the block-wise sampling routine in action.

### Example 1: Block-wise Metropolis-Hastings for sampling a bivariate Normal distribution

In this example we use the block-wise Metropolis-Hastings algorithm to sample from a bivariate (i.e. D = 2) Normal distribution:

p(x) = N(μ, Σ)

with mean

μ = [0, 0]

and covariance

Σ = [1, 0.8; 0.8, 1]

Usually the target distribution p(x) will have a complex mathematical form, but for this example we’ll circumvent that by using MATLAB’s built-in function mvnpdf to evaluate p(x). For our proposal distribution q(x | x^(t-1)), let’s use a circular Normal centered at the previous state/sample of the Markov chain/sampler, i.e.:

q(x | x^(t-1)) = N(x^(t-1), I),

where I is the 2-D identity matrix, giving the proposal distribution unit variance along both dimensions x_1 and x_2, and zero covariance. You can find a MATLAB implementation of the block-wise sampler at the end of the section. The display of the samples and the target distribution output by the sampler implementation are shown below:

We can see from the output that the block-wise sampler does a good job of drawing samples from the target distribution.

Note that our proposal distribution in this example is symmetric, so the correction factor c is always equal to one and did not, strictly speaking, need to be calculated. This means that this Metropolis-Hastings implementation is identical to the simpler Metropolis sampler.
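This symmetry is easy to check numerically. The snippet below (Python, an illustrative aside) evaluates a fixed-covariance Gaussian proposal density in both directions and shows that the two values agree, so the correction factor c = q(x^(t-1) | x*) / q(x* | x^(t-1)) is exactly 1:

```python
import math

def q(x, mu):
    """Density of a 2-D Normal with mean mu and identity covariance."""
    d2 = (x[0] - mu[0]) ** 2 + (x[1] - mu[1]) ** 2
    return math.exp(-0.5 * d2) / (2.0 * math.pi)

a, b = (0.3, -1.2), (2.0, 0.7)
c = q(a, b) / q(b, a)   # correction factor for a symmetric proposal
print(c)                # -> 1.0
```

Because the density depends only on the squared distance between its two arguments, swapping them changes nothing, and the correction factor drops out of the acceptance probability.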

```matlab
%------------------------------------------------------
% EXAMPLE 1: METROPOLIS-HASTINGS
% BLOCK-WISE SAMPLER (BIVARIATE NORMAL)
rand('seed',12345);

D = 2; % # VARIABLES
nBurnIn = 100;

% TARGET DISTRIBUTION IS A 2D NORMAL WITH STRONG COVARIANCE
p = inline('mvnpdf(x,[0 0],[1 0.8;0.8 1])','x');

% PROPOSAL DISTRIBUTION STANDARD 2D GAUSSIAN
q = inline('mvnpdf(x,mu)','x','mu');

nSamples = 5000;
minn = [-3 -3];
maxx = [3 3];

% INITIALIZE BLOCK-WISE SAMPLER
t = 1;
x = zeros(nSamples,2);
x(1,:) = randn(1,D);

% RUN SAMPLER
while t < nSamples
    t = t + 1;

    % SAMPLE FROM PROPOSAL
    xStar = mvnrnd(x(t-1,:),eye(D));

    % CORRECTION FACTOR (EQUALS 1 FOR THIS SYMMETRIC PROPOSAL)
    c = q(x(t-1,:),xStar)/q(xStar,x(t-1,:));

    % CALCULATE THE M-H ACCEPTANCE PROBABILITY
    alpha = min([1, c*p(xStar)/p(x(t-1,:))]);

    % ACCEPT OR REJECT?
    u = rand;
    if u < alpha
        x(t,:) = xStar;
    else
        x(t,:) = x(t-1,:);
    end
end

% DISPLAY
nBins = 20;
bins1 = linspace(minn(1), maxx(1), nBins);
bins2 = linspace(minn(2), maxx(2), nBins);

% DISPLAY SAMPLED DISTRIBUTION
ax = subplot(121);
hist3(x, 'Edges', {bins1, bins2});
view(-15,40)

% COLOR HISTOGRAM BARS ACCORDING TO HEIGHT
colormap hot
set(gcf,'renderer','opengl');
set(get(gca,'child'),'FaceColor','interp','CDataMode','auto');
xlabel('x_1'); ylabel('x_2'); zlabel('Frequency');
axis square
set(ax,'xTick',[minn(1),0,maxx(1)]);
set(ax,'yTick',[minn(2),0,maxx(2)]);
title('Sampled Distribution');

% DISPLAY ANALYTIC DENSITY
ax = subplot(122);
[x1,x2] = meshgrid(bins1,bins2);
probX = p([x1(:), x2(:)]);
probX = reshape(probX,nBins,nBins);
surf(probX); axis xy
view(-15,40)
xlabel('x_1'); ylabel('x_2'); zlabel('p({\bfx})');
colormap hot
axis square
set(ax,'xTick',[1,round(nBins/2),nBins]);
set(ax,'xTickLabel',[minn(1),0,maxx(1)]);
set(ax,'yTick',[1,round(nBins/2),nBins]);
set(ax,'yTickLabel',[minn(2),0,maxx(2)]);
title('Analytic Distribution')
```

## Component-wise Sampling

A problem with block-wise updates, particularly when the number of dimensions becomes large, is that finding a suitable proposal distribution is difficult. This leads to a large proportion of the samples being rejected. One way to remedy this is to simply loop over the D dimensions of x in sequence, sampling each dimension independently from the others. This is what is known as using *component-wise updates*. Note that the proposal distribution is now univariate, working only in one dimension, namely the current dimension that we are trying to sample. The component-wise Metropolis-Hastings algorithm is outlined below.

- set t = 0
- generate an initial state x^(0) ~ π^(0)
- repeat until t = T:
  - set t = t + 1
  - for each dimension i = 1, …, D:
    - generate a proposal state x_i* from q(x_i | x_i^(t-1))
    - calculate the proposal correction factor c = q(x_i^(t-1) | x_i*) / q(x_i* | x_i^(t-1))
    - calculate the acceptance probability α = min(1, [p(x_i*, x_(j≠i)^(t-1)) / p(x^(t-1))] × c)
    - draw a random number u from Unif(0, 1)
    - if u ≤ α, accept the proposal state and set x_i^(t) = x_i*
    - else set x_i^(t) = x_i^(t-1)

Note that in the component-wise implementation a sample for the i-th dimension is proposed, then accepted or rejected while all other dimensions (j ≠ i) are held fixed. We then move on to the next dimension and repeat the process while holding all the other variables fixed. In each successive step we use any updated values for the dimensions that have already been sampled within the current iteration t.
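As with the block-wise case, a minimal component-wise sketch in Python/NumPy may help fix the idea (again an illustrative translation of the procedure, with the same bivariate Normal target assumed): each dimension is proposed and accepted or rejected in turn, while the other dimension is held fixed.

```python
import numpy as np

rng = np.random.default_rng(12345)

# Target: zero-mean bivariate Normal with correlation 0.8 (unnormalized log-density)
S_inv = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))

def log_p(x):
    return -0.5 * x @ S_inv @ x

T, D = 5000, 2
x = np.zeros((T, D))
current = rng.standard_normal(D)                   # initial state

for t in range(1, T):
    for i in range(D):                             # loop over dimensions
        proposal = current.copy()
        proposal[i] = rng.normal(current[i], 1.0)  # univariate symmetric proposal, c = 1
        alpha = min(1.0, np.exp(log_p(proposal) - log_p(current)))
        if rng.random() < alpha:
            current = proposal                     # accept the i-th component
    x[t] = current

print("estimated correlation:", np.corrcoef(x.T)[0, 1])
```

Note that `proposal` differs from `current` only in dimension i, so the acceptance ratio compares the target density with and without the single proposed component, exactly as in the outline above.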

### Example 2: Component-wise Metropolis-Hastings for sampling a bivariate Normal distribution

In this example we draw samples from the same bivariate Normal target distribution described in Example 1, but using component-wise updates. Therefore p(x) is the same; however, the proposal distribution q(x_i) is now a univariate Normal distribution with unit variance in the direction of the i-th dimension to be sampled. The MATLAB implementation of the component-wise sampler is at the end of the section. The samples and comparison to the analytic target distribution are shown below.

Again, we see that we get a good characterization of the bivariate target distribution.

```matlab
%--------------------------------------------------
% EXAMPLE 2: METROPOLIS-HASTINGS
% COMPONENT-WISE SAMPLING OF BIVARIATE NORMAL
rand('seed',12345);

% TARGET DISTRIBUTION
p = inline('mvnpdf(x,[0 0],[1 0.8;0.8 1])','x');

nSamples = 5000;
propSigma = 1;  % PROPOSAL VARIANCE
minn = [-3 -3];
maxx = [3 3];

% INITIALIZE COMPONENT-WISE SAMPLER
x = zeros(nSamples,2);
xCurrent(1) = randn;
xCurrent(2) = randn;
t = 1;
x(t,1) = xCurrent(1);
x(t,2) = xCurrent(2);

% RUN SAMPLER
while t < nSamples
    t = t + 1;
    for iD = 1:2 % LOOP OVER DIMENSIONS
        % SAMPLE PROPOSAL
        xStar = normrnd(xCurrent(iD), propSigma);

        % NOTE: CORRECTION FACTOR c=1 BECAUSE
        % N(mu,1) IS SYMMETRIC, NO NEED TO CALCULATE

        % CONSTRUCT PROPOSED STATE (ONLY DIMENSION iD CHANGES)
        xProp = xCurrent;
        xProp(iD) = xStar;

        % CALCULATE THE ACCEPTANCE PROBABILITY
        alpha = min([1, p(xProp)/p(xCurrent)]);

        % ACCEPT OR REJECT?
        u = rand;
        if u < alpha
            xCurrent(iD) = xStar;
        end
    end

    % UPDATE SAMPLES
    x(t,:) = xCurrent;
end

% DISPLAY
nBins = 20;
bins1 = linspace(minn(1), maxx(1), nBins);
bins2 = linspace(minn(2), maxx(2), nBins);

% DISPLAY SAMPLED DISTRIBUTION
figure;
ax = subplot(121);
hist3(x, 'Edges', {bins1, bins2});
view(-15,40)

% COLOR HISTOGRAM BARS ACCORDING TO HEIGHT
colormap hot
set(gcf,'renderer','opengl');
set(get(gca,'child'),'FaceColor','interp','CDataMode','auto');
xlabel('x_1'); ylabel('x_2'); zlabel('Frequency');
axis square
set(ax,'xTick',[minn(1),0,maxx(1)]);
set(ax,'yTick',[minn(2),0,maxx(2)]);
title('Sampled Distribution');

% DISPLAY ANALYTIC DENSITY
ax = subplot(122);
[x1,x2] = meshgrid(bins1,bins2);
probX = p([x1(:), x2(:)]);
probX = reshape(probX,nBins,nBins);
surf(probX); axis xy
view(-15,40)
xlabel('x_1'); ylabel('x_2'); zlabel('p({\bfx})');
colormap hot
axis square
set(ax,'xTick',[1,round(nBins/2),nBins]);
set(ax,'xTickLabel',[minn(1),0,maxx(1)]);
set(ax,'yTick',[1,round(nBins/2),nBins]);
set(ax,'yTickLabel',[minn(2),0,maxx(2)]);
title('Analytic Distribution')
```

## Wrapping Up

Here we saw how we can use block-wise and component-wise updates to derive two different implementations of the Metropolis-Hastings algorithm. In the next post we will use the component-wise updates introduced above to motivate the Gibbs sampler, which is often used to increase the efficiency of sampling from multivariate probability distributions.