MCMC: Multivariate Distributions, Block-wise, & Component-wise Updates

In the previous posts on MCMC methods, we focused on how to sample from univariate target distributions. This was done mainly to give the reader some intuition about MCMC implementations with fairly tangible examples that can be visualized. However, MCMC methods extend naturally to sampling from multivariate distributions.

In this post we will discuss two flavors of MCMC update procedure for sampling distributions in multiple dimensions: block-wise and component-wise updates. We will show how these two procedures give rise to different implementations of the Metropolis-Hastings sampler for the same problem.

Block-wise Sampling

The first approach for performing multidimensional sampling is to use block-wise updates. In this approach the proposal distribution $q(\bold{x})$ has the same dimensionality as the target distribution $p(\bold x)$. Specifically, if $p(\bold x)$ is a distribution over $D$ variables, i.e. $\bold{x} = (x_1, x_2, \dots, x_D)$, then we must design a proposal distribution that is also a distribution over $D$ variables. We then accept or reject a proposed state $\bold x^*$ sampled from the proposal distribution $q(\bold x)$ in exactly the same way as for the univariate Metropolis-Hastings algorithm. To generate $M$ multivariate samples we perform the following block-wise sampling procedure:

1. set $t = 0$
2. generate an initial state $\bold x^{(0)} \sim \pi^{(0)}$
3. repeat until $t = M$:
   - set $t = t+1$
   - generate a proposal state $\bold x^*$ from $q(\bold x | \bold x^{(t-1)})$
   - calculate the proposal correction factor $c = \frac{q(\bold x^{(t-1)} | \bold x^*)}{q(\bold x^* | \bold x^{(t-1)})}$
   - calculate the acceptance probability $\alpha = \min \left(1, \frac{p(\bold x^*)}{p(\bold x^{(t-1)})} \times c \right)$
   - draw a random number $u$ from $\text{Unif}(0,1)$
   - if $u \leq \alpha$ accept the proposal state $\bold x^*$ and set $\bold x^{(t)} = \bold x^*$
   - else set $\bold x^{(t)} = \bold x^{(t-1)}$
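The procedure above can be sketched compactly in Python (the implementations in this post use MATLAB; the `numpy`/`scipy` translation below, including the function name `blockwise_mh`, is this sketch's own):

```python
import numpy as np
from scipy.stats import multivariate_normal

def blockwise_mh(p, n_samples, x0, prop_cov):
    """Block-wise Metropolis sampler with a symmetric Gaussian proposal."""
    D = len(x0)
    x = np.zeros((n_samples, D))
    x[0] = x0
    for t in range(1, n_samples):
        # propose a full D-dimensional state from q(x | x^(t-1))
        x_star = np.random.multivariate_normal(x[t - 1], prop_cov)
        # the Gaussian proposal is symmetric, so the correction factor c = 1
        alpha = min(1.0, p(x_star) / p(x[t - 1]))
        if np.random.rand() <= alpha:
            x[t] = x_star        # accept the proposal
        else:
            x[t] = x[t - 1]      # reject: repeat the previous state
    return x

# target: the bivariate Normal used in Example 1 below
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
p = multivariate_normal(mean=[0.0, 0.0], cov=Sigma).pdf
samples = blockwise_mh(p, 5000, np.zeros(2), np.eye(2))
```

Because the proposal is symmetric, this sketch is a Metropolis rather than a full Metropolis-Hastings sampler; an asymmetric proposal would require multiplying the ratio by $c$ before taking the minimum.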

Let’s take a look at the block-wise sampling routine in action.

Example 1: Block-wise Metropolis-Hastings for sampling of bivariate Normal distribution

In this example we use the block-wise Metropolis-Hastings algorithm to sample from a bivariate (i.e. $D = 2$) Normal distribution:

$p(\bold x) = \mathcal N (\bold{\mu}, \bold \Sigma)$

with mean

$\bold \mu = [0, 0]$

and covariance

$\bold \Sigma = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1\end{bmatrix}$

Usually the target distribution $p(\bold x)$ will have a complex mathematical form, but for this example we'll circumvent that by using MATLAB's built-in function $\text{mvnpdf}$ to evaluate $p(\bold x)$. For our proposal distribution, $q(\bold x)$, let's use a circular Normal centered at the previous state of the Markov chain, i.e.:

$q(\bold x | \bold x^{(t-1)}) \sim \mathcal N (\bold x^{(t-1)}, \bold I)$,

where $\bold I$ is a 2-D identity matrix, giving the proposal distribution unit variance along both dimensions $x_1$ and $x_2$, and zero covariance. You can find a MATLAB implementation of the block-wise sampler at the end of the section. The samples and the target distribution output by the sampler are shown below:

Samples drawn from block-wise Metropolis-Hastings sampler

We can see from the output that the block-wise sampler does a good job of drawing samples from the target distribution.

Note that our proposal distribution in this example is symmetric, so the correction factor $c$ is always equal to one and need not be calculated. This means that this Metropolis-Hastings implementation is identical to the simpler Metropolis sampler.
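We can check this symmetry numerically. The snippet below (a Python sketch with arbitrary example states, not part of the post's MATLAB code) evaluates the correction factor for a unit-variance Gaussian proposal in both directions:

```python
import numpy as np
from scipy.stats import multivariate_normal

x_prev = np.array([0.5, -1.2])   # arbitrary current state
x_star = np.array([1.1, 0.3])    # arbitrary proposed state

# q(x | mu) is a circular Normal centered at mu with identity covariance
q = lambda x, mu: multivariate_normal(mean=mu, cov=np.eye(2)).pdf(x)

# symmetric proposal: q(x^(t-1) | x*) == q(x* | x^(t-1)), so c = 1
c = q(x_prev, x_star) / q(x_star, x_prev)
```

The density depends only on the distance between the state and the proposal mean, so swapping the two arguments leaves it unchanged and $c = 1$ for any pair of states.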

%------------------------------------------------------
% EXAMPLE 1: METROPOLIS-HASTINGS
% BLOCK-WISE SAMPLER (BIVARIATE NORMAL)
rand('seed',12345); randn('seed',12345);

D = 2; % # VARIABLES

% TARGET DISTRIBUTION IS A 2D NORMAL WITH STRONG COVARIANCE
p = inline('mvnpdf(x,[0 0],[1 0.8;0.8 1])','x');

% PROPOSAL DISTRIBUTION IS A STANDARD 2D GAUSSIAN
q = inline('mvnpdf(x,mu)','x','mu');

nSamples = 5000;
minn = [-3 -3];
maxx = [3 3];

% INITIALIZE BLOCK-WISE SAMPLER
t = 1;
x = zeros(nSamples,D);
x(1,:) = randn(1,D);

% RUN SAMPLER
while t < nSamples
    t = t + 1;

    % SAMPLE FROM PROPOSAL
    xStar = mvnrnd(x(t-1,:),eye(D));

    % CORRECTION FACTOR (EQUALS 1 FOR THIS SYMMETRIC PROPOSAL)
    c = q(x(t-1,:),xStar)/q(xStar,x(t-1,:));

    % CALCULATE THE M-H ACCEPTANCE PROBABILITY
    alpha = min([1, c*p(xStar)/p(x(t-1,:))]);

    % ACCEPT OR REJECT?
    u = rand;
    if u <= alpha
        x(t,:) = xStar;
    else
        x(t,:) = x(t-1,:);
    end
end

% DISPLAY
nBins = 20;
bins1 = linspace(minn(1), maxx(1), nBins);
bins2 = linspace(minn(2), maxx(2), nBins);

% DISPLAY SAMPLED DISTRIBUTION
ax = subplot(121);
hist3(x, 'Edges', {bins1, bins2});
view(-15,40)

% COLOR HISTOGRAM BARS ACCORDING TO HEIGHT
colormap hot
set(gcf,'renderer','opengl');
set(get(gca,'child'),'FaceColor','interp','CDataMode','auto');
xlabel('x_1'); ylabel('x_2'); zlabel('Frequency');
axis square
set(ax,'xTick',[minn(1),0,maxx(1)]);
set(ax,'yTick',[minn(2),0,maxx(2)]);
title('Sampled Distribution');

% DISPLAY ANALYTIC DENSITY
ax = subplot(122);
[x1,x2] = meshgrid(bins1,bins2);
probX = p([x1(:), x2(:)]);
probX = reshape(probX, nBins, nBins);
surf(probX); axis xy
view(-15,40)
xlabel('x_1'); ylabel('x_2'); zlabel('p({\bfx})');
colormap hot
axis square
set(ax,'xTick',[1,round(nBins/2),nBins]);
set(ax,'xTickLabel',[minn(1),0,maxx(1)]);
set(ax,'yTick',[1,round(nBins/2),nBins]);
set(ax,'yTickLabel',[minn(2),0,maxx(2)]);
title('Analytic Distribution')


Component-wise Sampling

A problem with block-wise updates, particularly when the number of dimensions $D$ becomes large, is that finding a suitable proposal distribution is difficult. This leads to a large proportion of the samples being rejected. One way to remedy this is to simply loop over the $D$ dimensions of $\bold x$ in sequence, sampling each dimension independently of the others. This is what is known as using component-wise updates. Note that now the proposal distribution $q(x_i)$ is univariate, working only in one dimension, namely the current dimension that we are trying to sample. The component-wise Metropolis-Hastings algorithm is outlined below.

1. set $t = 0$
2. generate an initial state $\bold x^{(0)} \sim \pi^{(0)}$
3. repeat until $t = M$:
   - set $t = t+1$
   - for each dimension $i = 1..D$:
     - generate a proposal state $x_i^*$ from $q(x_i | x_i^{(t-1)})$
     - calculate the proposal correction factor $c = \frac{q(x_i^{(t-1)} | x_i^*)}{q(x_i^* | x_i^{(t-1)})}$
     - calculate the acceptance probability $\alpha = \min \left(1, \frac{p(x_i^*, \bold x_{j \neq i}^{(t-1)})}{p(x_i^{(t-1)}, \bold x_{j \neq i}^{(t-1)})} \times c \right)$
     - draw a random number $u$ from $\text{Unif}(0,1)$
     - if $u \leq \alpha$ accept the proposal state $x_i^*$ and set $x_i^{(t)} = x_i^*$
     - else set $x_i^{(t)} = x_i^{(t-1)}$

Note that in the component-wise implementation a sample for the $i$-th dimension is proposed, then accepted or rejected while all other dimensions ($j \neq i$) are held fixed. We then move on to the next ($(i+1)$-th) dimension and repeat the process while holding all other variables ($j \neq i+1$) fixed. In each successive step within a sweep we use the updated values for any dimensions that have already been sampled during the transition from $t-1$ to $t$.
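The inner loop can be sketched in Python (again a translation of the post's MATLAB approach; the function name `componentwise_mh` is this sketch's own). Note how the proposed value is placed at position $i$ of the state vector, with the other components held fixed:

```python
import numpy as np
from scipy.stats import multivariate_normal

def componentwise_mh(p, n_samples, x0, prop_sigma=1.0):
    """Component-wise Metropolis sampler with univariate Gaussian proposals."""
    D = len(x0)
    x = np.zeros((n_samples, D))
    x[0] = x0
    x_cur = np.array(x0, dtype=float)
    for t in range(1, n_samples):
        for i in range(D):  # one univariate update per dimension
            # the proposal q is univariate, centered at the current x_i
            x_prop = x_cur.copy()
            x_prop[i] = np.random.normal(x_cur[i], prop_sigma)
            # all other dimensions j != i are held fixed; the proposed value
            # sits in position i of the state vector, not always first
            alpha = min(1.0, p(x_prop) / p(x_cur))
            if np.random.rand() <= alpha:
                x_cur = x_prop  # accept the update for dimension i
        x[t] = x_cur
    return x

# target: the same bivariate Normal as in Example 1
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
p = multivariate_normal(mean=[0.0, 0.0], cov=Sigma).pdf
samples = componentwise_mh(p, 5000, np.zeros(2))
```

Copying the current state and overwriting only component $i$ avoids the pitfall of always placing the proposal in the first position, which would evaluate the target density at the wrong point for every dimension but the first.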

Example 2: Component-wise Metropolis-Hastings for sampling of bivariate Normal distribution

In this example we draw samples from the same bivariate Normal target distribution described in Example 1, but using component-wise updates. Therefore $p(\bold x)$ is the same; however, the proposal distribution $q(x_i)$ is now a univariate Normal distribution with unit variance in the direction of the $i$-th dimension to be sampled. The MATLAB implementation of the component-wise sampler is at the end of the section. The samples and comparison to the analytic target distribution are shown below.

Samples drawn from component-wise Metropolis-Hastings algorithm compared to target distribution

Again, we see that we get a good characterization of the bivariate target distribution.

%--------------------------------------------------
% EXAMPLE 2: METROPOLIS-HASTINGS
% COMPONENT-WISE SAMPLING OF BIVARIATE NORMAL
rand('seed',12345); randn('seed',12345);

% TARGET DISTRIBUTION
p = inline('mvnpdf(x,[0 0],[1 0.8;0.8 1])','x');

nSamples = 5000;
propSigma = 1;      % PROPOSAL STANDARD DEVIATION
minn = [-3 -3];
maxx = [3 3];

% INITIALIZE COMPONENT-WISE SAMPLER
x = zeros(nSamples,2);
xCurrent = randn(1,2);
t = 1;
x(t,:) = xCurrent;

% RUN SAMPLER
while t < nSamples
    t = t + 1;
    for iD = 1:2 % LOOP OVER DIMENSIONS

        % SAMPLE PROPOSAL FOR DIMENSION iD
        xStar = normrnd(xCurrent(iD), propSigma);

        % NOTE: CORRECTION FACTOR c=1 BECAUSE
        % N(mu,1) IS SYMMETRIC, NO NEED TO CALCULATE

        % PLACE THE PROPOSAL IN POSITION iD,
        % HOLDING ALL OTHER DIMENSIONS FIXED
        xProposal = xCurrent;
        xProposal(iD) = xStar;

        % CALCULATE THE ACCEPTANCE PROBABILITY
        alpha = min([1, p(xProposal)/p(xCurrent)]);

        % ACCEPT OR REJECT?
        u = rand;
        if u <= alpha
            xCurrent(iD) = xStar;
        end
    end

    % UPDATE SAMPLES
    x(t,:) = xCurrent;
end

% DISPLAY
nBins = 20;
bins1 = linspace(minn(1), maxx(1), nBins);
bins2 = linspace(minn(2), maxx(2), nBins);

% DISPLAY SAMPLED DISTRIBUTION
figure;
ax = subplot(121);
hist3(x, 'Edges', {bins1, bins2});
view(-15,40)

% COLOR HISTOGRAM BARS ACCORDING TO HEIGHT
colormap hot
set(gcf,'renderer','opengl');
set(get(gca,'child'),'FaceColor','interp','CDataMode','auto');
xlabel('x_1'); ylabel('x_2'); zlabel('Frequency');
axis square
set(ax,'xTick',[minn(1),0,maxx(1)]);
set(ax,'yTick',[minn(2),0,maxx(2)]);
title('Sampled Distribution');

% DISPLAY ANALYTIC DENSITY
ax = subplot(122);
[x1,x2] = meshgrid(bins1,bins2);
probX = p([x1(:), x2(:)]);
probX = reshape(probX, nBins, nBins);
surf(probX); axis xy
view(-15,40)
xlabel('x_1'); ylabel('x_2'); zlabel('p({\bfx})');
colormap hot
axis square
set(ax,'xTick',[1,round(nBins/2),nBins]);
set(ax,'xTickLabel',[minn(1),0,maxx(1)]);
set(ax,'yTick',[1,round(nBins/2),nBins]);
set(ax,'yTickLabel',[minn(2),0,maxx(2)]);
title('Analytic Distribution')



Wrapping Up

Here we saw how we can use block-wise and component-wise updates to derive two different implementations of the Metropolis-Hastings algorithm. In the next post we will use the component-wise updates introduced above to motivate the Gibbs sampler, which is often used to increase the efficiency of sampling from well-defined multivariate probability distributions.

I recently received my PhD from UC Berkeley where I studied computational neuroscience and machine learning.

Posted on November 4, 2012, in Algorithms, Sampling, Sampling Methods, Simulations, Statistics.

1. A

I guess the location of the xStart in the p([xStar xCurrent(dims~=iD)]) array, should vary with the dimension that you are updating.

• Thanks for catching the typo A. You’re correct the proposal should vary with each dimension. The code has been updated.

2. Anon

pratio = p([xStar xCurrent(dims~=iD)])/ …

In your component-wise MH implementation, it seems that xStar is always in the first position even if you are dealing with the 2nd dimension. xStar should probably shift when other dimensions are considered. Please clarify. Very nice tutorial by the way…

• You’re correct Anon, the proposal should consider each dimension. I’ve fixed the typo/bug.

3. rblilja

Thank you for a fantastic site. Incredible work. May I ask how the acceptance probability shall be calculated if I have more than two dimensions? For an instance, if I have four states, is the following reasoning correct for e.g. i = 1, p(xi,xj) = p(x1,x0)*p(x1,x2)*p(x1,x3)? And by taking the log-probability I can sum instead of multiplying? Thank you!

4. Thank you very much for such nice and well explained tutorial.