MCMC: Multivariate Distributions, Block-wise, & Component-wise Updates

In the previous posts on MCMC methods, we focused on sampling from univariate target distributions. This was done mainly to build intuition about MCMC implementations with tangible examples that are easy to visualize. However, MCMC readily extends to sampling from multivariate distributions.

In this post we will discuss two flavors of MCMC update procedure for sampling distributions in multiple dimensions: block-wise and component-wise updates. We will show how these two procedures give rise to different implementations of the Metropolis-Hastings sampler for solving the same problem.

Block-wise Sampling

The first approach for performing multidimensional sampling is to use block-wise updates. In this approach the proposal distribution q(\bold x) has the same dimensionality as the target distribution p(\bold x). Specifically, if p(\bold x) is a distribution over D variables, i.e. \bold x = (x_1, x_2, \dots, x_D), then we must design a proposal distribution that is also over D variables. We then accept or reject a proposed state \bold x^* sampled from the proposal distribution q(\bold x) in exactly the same way as in the univariate Metropolis-Hastings algorithm. To generate M multivariate samples we perform the following block-wise sampling procedure:

  1. set t = 0
  2. generate an initial state \bold x^{(0)} \sim \pi^{(0)}
  3. repeat until t = M

     set t = t + 1

     generate a proposal state \bold x^* from q(\bold x | \bold x^{(t-1)})

     calculate the proposal correction factor c = \frac{q(\bold x^{(t-1)} | \bold x^*)}{q(\bold x^* | \bold x^{(t-1)})}

     calculate the acceptance probability \alpha = \text{min}\left(1, \frac{p(\bold x^*)}{p(\bold x^{(t-1)})} \times c \right)

     draw a random number u from \text{Unif}(0,1)

     if u \leq \alpha accept the proposal state \bold x^* and set \bold x^{(t)} = \bold x^*

     else set \bold x^{(t)} = \bold x^{(t-1)}

Let’s take a look at the block-wise sampling routine in action.

Example 1: Block-wise Metropolis-Hastings for sampling of bivariate Normal distribution

In this example we use the block-wise Metropolis-Hastings algorithm to sample from a bivariate (i.e. D = 2) Normal distribution:

p(\bold x) = \mathcal N (\bold{\mu}, \bold \Sigma)

with mean

\bold \mu = [0, 0]

and covariance

\bold \Sigma = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1\end{bmatrix}

Usually the target distribution p(\bold x) will have a complex mathematical form, but for this example we’ll circumvent that by using MATLAB’s built-in function \text{mvnpdf} to evaluate p(\bold x). For our proposal distribution q(\bold x), let’s use a circular Normal centered at the previous state of the Markov chain, i.e.:

q(\bold x | \bold x^{(t-1)}) = \mathcal N (\bold x^{(t-1)}, \bold I),

where \bold I is the 2 \times 2 identity matrix, giving the proposal distribution unit variance along both dimensions x_1 and x_2, and zero covariance. You can find a MATLAB implementation of the block-wise sampler at the end of the section. The samples and the target distribution output by the sampler are shown below:

Samples drawn from block-wise Metropolis-Hastings sampler

We can see from the output that the block-wise sampler does a good job of drawing samples from the target distribution.

Note that our proposal distribution in this example is symmetric, so the correction factor c is always equal to one and did not need to be calculated. This makes this Metropolis-Hastings implementation identical to the simpler Metropolis sampler.
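
Concretely, for the circular Normal proposal used here the two densities in the correction factor are mirror images of one another. Writing \mathcal N(\bold a; \bold b, \bold I) for the Normal density evaluated at \bold a with mean \bold b, we have

c = \frac{q(\bold x^{(t-1)} | \bold x^*)}{q(\bold x^* | \bold x^{(t-1)})} = \frac{\mathcal N(\bold x^{(t-1)}; \bold x^*, \bold I)}{\mathcal N(\bold x^*; \bold x^{(t-1)}, \bold I)} = 1,

since each density depends only on the distance \|\bold x^* - \bold x^{(t-1)}\|. The acceptance probability therefore reduces to \alpha = \text{min}\left(1, \frac{p(\bold x^*)}{p(\bold x^{(t-1)})}\right), which is exactly the Metropolis acceptance rule.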

%------------------------------------------------------
% EXAMPLE 1: METROPOLIS-HASTINGS
% BLOCK-WISE SAMPLER (BIVARIATE NORMAL)
rng(12345); % SET THE RANDOM SEED FOR REPRODUCIBILITY

D = 2; % NUMBER OF VARIABLES

% TARGET DISTRIBUTION IS A 2D NORMAL WITH STRONG COVARIANCE
p = @(x) mvnpdf(x, [0 0], [1 0.8; 0.8 1]);

% PROPOSAL DISTRIBUTION IS A STANDARD 2D GAUSSIAN CENTERED AT mu
q = @(x, mu) mvnpdf(x, mu);

nSamples = 5000;
minn = [-3 -3];
maxx = [3 3];

% INITIALIZE BLOCK-WISE SAMPLER
t = 1;
x = zeros(nSamples,2);
x(1,:) = randn(1,D);

% RUN SAMPLER
while t < nSamples
    t = t + 1;

    % SAMPLE FROM PROPOSAL
    xStar = mvnrnd(x(t-1,:),eye(D));

    % CORRECTION FACTOR (EQUALS 1 FOR THIS SYMMETRIC PROPOSAL)
    c = q(x(t-1,:), xStar)/q(xStar, x(t-1,:));

    % CALCULATE THE M-H ACCEPTANCE PROBABILITY
    alpha = min([1, p(xStar)/p(x(t-1,:)) * c]);

    % ACCEPT OR REJECT?
    u = rand;
    if u < alpha
        x(t,:) = xStar;
    else
        x(t,:) = x(t-1,:);
    end
end

% DISPLAY
nBins = 20;
bins1 = linspace(minn(1), maxx(1), nBins);
bins2 = linspace(minn(2), maxx(2), nBins);

% DISPLAY SAMPLED DISTRIBUTION
ax = subplot(121);
hist3(x, 'Edges', {bins1, bins2});
view(-15,40)

% COLOR HISTOGRAM BARS ACCORDING TO HEIGHT
colormap hot
set(gcf,'renderer','opengl');
set(get(gca,'child'),'FaceColor','interp','CDataMode','auto');
xlabel('x_1'); ylabel('x_2'); zlabel('Frequency');
axis square
set(ax,'xTick',[minn(1),0,maxx(1)]);
set(ax,'yTick',[minn(2),0,maxx(2)]);
title('Sampled Distribution');

% DISPLAY ANALYTIC DENSITY
ax = subplot(122);
[x1, x2] = meshgrid(bins1, bins2);
probX = p([x1(:), x2(:)]);
probX = reshape(probX, nBins, nBins);
surf(probX); axis xy
view(-15,40)
xlabel('x_1'); ylabel('x_2'); zlabel('p({\bfx})');
colormap hot
axis square
set(ax,'xTick',[1,round(nBins/2),nBins]);
set(ax,'xTickLabel',[minn(1),0,maxx(1)]);
set(ax,'yTick',[1,round(nBins/2),nBins]);
set(ax,'yTickLabel',[minn(2),0,maxx(2)]);
title('Analytic Distribution')
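
As a quick numerical check (not part of the original listing), we can compare the sample moments of the chain to the known target moments and to direct draws from the same distribution. This sketch assumes the variables x and nSamples from the listing above and the Statistics Toolbox function \text{mvnrnd}:

% SANITY CHECK (A SKETCH, NOT IN THE ORIGINAL LISTING): COMPARE THE
% CHAIN'S SAMPLE MOMENTS TO THE TARGET MOMENTS AND TO DIRECT DRAWS
xDirect = mvnrnd([0 0], [1 0.8; 0.8 1], nSamples); % DIRECT REFERENCE SAMPLES
disp(mean(x));      % SHOULD BE CLOSE TO [0 0]
disp(cov(x));       % SHOULD BE CLOSE TO [1 0.8; 0.8 1]
disp(cov(xDirect)); % MOMENTS FROM DIRECT SAMPLING, FOR COMPARISON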

Component-wise Sampling

A problem with block-wise updates, particularly when the number of dimensions D becomes large, is that finding a suitable proposal distribution is difficult, which leads to a large proportion of the proposals being rejected. One way to remedy this is to loop over the D dimensions of \bold x in sequence, updating one dimension at a time while holding the others fixed. This is what is known as using component-wise updates. Note that the proposal distribution q(x_i) is now univariate, working only along the current dimension being sampled. The component-wise Metropolis-Hastings algorithm is outlined below.

  1. set t = 0
  2. generate an initial state \bold x^{(0)} \sim \pi^{(0)}
  3. repeat until t = M

     set t = t + 1

     for each dimension i = 1..D

        generate a proposal state x_i^* from q(x_i | x_i^{(t-1)})

        calculate the proposal correction factor c = \frac{q(x_i^{(t-1)} | x_i^*)}{q(x_i^* | x_i^{(t-1)})}

        calculate the acceptance probability \alpha = \text{min}\left(1, \frac{p(x_i^*, \bold x_{j \neq i})}{p(x_i^{(t-1)}, \bold x_{j \neq i})} \times c \right)

        draw a random number u from \text{Unif}(0,1)

        if u \leq \alpha accept the proposal state x_i^* and set x_i^{(t)} = x_i^*

        else set x_i^{(t)} = x_i^{(t-1)}

Note that in the component-wise implementation a sample for the i-th dimension is proposed, then accepted or rejected, while all other dimensions (j \neq i) are held fixed. We then move on to the next, (i+1)-th, dimension and repeat the process while holding the remaining variables fixed. Because the dimensions are updated in sequence, each successive update within a sweep uses the most recent values of the dimensions already updated on the way from step (t-1) to step t. One convenient way to keep this bookkeeping straight is sketched below.
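
The pattern below is a minimal sketch: the variables xCurrent, iD, propSigma, and p mirror the listing in Example 2, while xProp is a name introduced here for illustration. Copying the current state and overwriting only the i-th entry guarantees that all other dimensions are held fixed, and that the proposed value lands in the correct position regardless of which dimension is being updated:

% SKETCH: BUILD THE FULL PROPOSAL STATE FOR DIMENSION iD BY REPLACING
% ONLY THAT COMPONENT; ALL OTHER DIMENSIONS KEEP THEIR CURRENT VALUES
xProp = xCurrent;                             % COPY THE CURRENT STATE
xProp(iD) = normrnd(xCurrent(iD), propSigma); % PROPOSE ONLY COMPONENT iD
alpha = min([1, p(xProp)/p(xCurrent)]);       % M-H ACCEPTANCE PROBABILITY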

Example 2: Component-wise Metropolis-Hastings for sampling of bivariate Normal distribution

In this example we draw samples from the same bivariate Normal target distribution described in Example 1, but using component-wise updates. The target p(\bold x) is therefore the same; however, the proposal distribution q(x_i) is now a univariate Normal distribution with unit variance along the i-th dimension being sampled. The MATLAB implementation of the component-wise sampler is at the end of the section. The samples and a comparison to the analytic target distribution are shown below.

Samples drawn from component-wise Metropolis-Hastings algorithm compared to target distribution

Again, we see that we get a good characterization of the bivariate target distribution.

%--------------------------------------------------
% EXAMPLE 2: METROPOLIS-HASTINGS
% COMPONENT-WISE SAMPLING OF BIVARIATE NORMAL
rng(12345); % SET THE RANDOM SEED FOR REPRODUCIBILITY

% TARGET DISTRIBUTION IS THE SAME BIVARIATE NORMAL AS IN EXAMPLE 1
p = @(x) mvnpdf(x, [0 0], [1 0.8; 0.8 1]);

nSamples = 5000;
propSigma = 1;		% PROPOSAL STANDARD DEVIATION
minn = [-3 -3];
maxx = [3 3];

% INITIALIZE COMPONENT-WISE SAMPLER
x = zeros(nSamples,2);
xCurrent = randn(1,2); % RANDOM STARTING STATE
t = 1;
x(t,:) = xCurrent;

% RUN SAMPLER
while t < nSamples
	t = t + 1;
	for iD = 1:2 % LOOP OVER DIMENSIONS

		% SAMPLE PROPOSAL
		xStar = normrnd(xCurrent(iD), propSigma);

		% NOTE: CORRECTION FACTOR c=1 BECAUSE
		% N(mu,1) IS SYMMETRIC, NO NEED TO CALCULATE

		% BUILD THE PROPOSAL STATE: REPLACE ONLY THE iD-TH COMPONENT,
		% KEEPING ALL OTHER DIMENSIONS AT THEIR CURRENT VALUES
		% (THIS ALSO KEEPS xStar IN THE CORRECT POSITION WHEN iD = 2)
		xProp = xCurrent;
		xProp(iD) = xStar;

		% CALCULATE THE ACCEPTANCE PROBABILITY
		alpha = min([1, p(xProp)/p(xCurrent)]);

		% ACCEPT OR REJECT?
		u = rand;
		if u < alpha
			xCurrent(iD) = xStar;
		end
	end

	% UPDATE SAMPLES
	x(t,:) = xCurrent;
end

% DISPLAY
nBins = 20;
bins1 = linspace(minn(1), maxx(1), nBins);
bins2 = linspace(minn(2), maxx(2), nBins);

% DISPLAY SAMPLED DISTRIBUTION
figure;
ax = subplot(121);
hist3(x, 'Edges', {bins1, bins2});
view(-15,40)

% COLOR HISTOGRAM BARS ACCORDING TO HEIGHT
colormap hot
set(gcf,'renderer','opengl');
set(get(gca,'child'),'FaceColor','interp','CDataMode','auto');
xlabel('x_1'); ylabel('x_2'); zlabel('Frequency');
axis square
set(ax,'xTick',[minn(1),0,maxx(1)]);
set(ax,'yTick',[minn(2),0,maxx(2)]);
title('Sampled Distribution');

% DISPLAY ANALYTIC DENSITY
ax = subplot(122);
[x1, x2] = meshgrid(bins1, bins2);
probX = p([x1(:), x2(:)]);
probX = reshape(probX, nBins, nBins);
surf(probX); axis xy
view(-15,40)
xlabel('x_1'); ylabel('x_2'); zlabel('p({\bfx})');
colormap hot
axis square
set(ax,'xTick',[1,round(nBins/2),nBins]);
set(ax,'xTickLabel',[minn(1),0,maxx(1)]);
set(ax,'yTick',[1,round(nBins/2),nBins]);
set(ax,'yTickLabel',[minn(2),0,maxx(2)]);
title('Analytic Distribution')
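
Nothing in the component-wise sweep is specific to two dimensions. The sketch below is a hypothetical generalization (not from the original post) showing one full sweep over a D-dimensional Normal target; Sigma here is just an example covariance matrix, and any symmetric positive-definite D-by-D matrix would do:

% SKETCH: ONE COMPONENT-WISE M-H SWEEP FOR A D-DIMENSIONAL TARGET
D = 4;                                 % NUMBER OF DIMENSIONS
Sigma = 0.5*eye(D) + 0.5*ones(D);      % EXAMPLE COVARIANCE (POSITIVE DEFINITE)
p = @(x) mvnpdf(x, zeros(1,D), Sigma); % D-DIMENSIONAL TARGET DENSITY
xCurrent = randn(1,D);                 % ARBITRARY STARTING STATE
propSigma = 1;                         % PROPOSAL STANDARD DEVIATION
for iD = 1:D
    xProp = xCurrent;                             % COPY CURRENT STATE
    xProp(iD) = normrnd(xCurrent(iD), propSigma); % PROPOSE COMPONENT iD
    if rand < min(1, p(xProp)/p(xCurrent))        % ACCEPT/REJECT THIS COMPONENT
        xCurrent(iD) = xProp(iD);
    end
end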

Wrapping Up

Here we saw how block-wise and component-wise updates give rise to two different implementations of the Metropolis-Hastings algorithm. In the next post we will use the component-wise updates introduced above to motivate the Gibbs sampler, which is often used to improve the efficiency of sampling from multivariate probability distributions.


