# Category Archives: Sampling

## MCMC: Hamiltonian Monte Carlo (a.k.a. Hybrid Monte Carlo)

The random-walk behavior of many Markov Chain Monte Carlo (MCMC) algorithms makes Markov chain convergence to a target stationary distribution inefficient, resulting in slow mixing. Hamiltonian/Hybrid Monte Carlo (HMC), is a MCMC method that adopts physical system dynamics rather than a probability distribution to propose future states in the Markov chain. This allows the Markov chain to explore the target distribution much more efficiently, resulting in faster convergence. Here we introduce basic analytic and numerical concepts for simulation of Hamiltonian dynamics. We then show how Hamiltonian dynamics can be used as the Markov chain proposal function for an MCMC sampling algorithm (HMC).

## First off, a brief physics lesson in Hamiltonian dynamics

Before we can develop Hamiltonian Monte Carlo, we need to become familiar with the concept of Hamiltonian dynamics. Hamiltonian dynamics is one way that physicists describe how objects move throughout a system. Hamiltonian dynamics describe an object’s motion in terms of its location and momentum (equivalent to the object’s mass times its velocity) at some time . For each location the object takes, there is an associated potential energy , and for each momentum there is an associated kinetic energy . The total energy of the system is constant and known as the Hamiltonian , defined simply as the sum of the potential and kinetic energies:

Hamiltonian dynamics describe how kinetic energy is converted to potential energy (and vice versa) as an object moves throughout a system in time. This description is implemented quantitatively via a set of differential equations known as the Hamiltonian equations:

Therefore, if we have expressions for and and a set of initial conditions (i.e. an initial position and initial momentum at time ), it is possible to predict the location and momentum of an object at any point in time by simulating these dynamics for a duration

## Simulating Hamiltonian dynamics — the Leap Frog Method

The Hamiltonian equations describe an object’s motion in time, which is a continuous variable. In order to simulate Hamiltonian dynamics numerically on a computer, it is necessary to approximate the Hamiltonian equations by discretizing time. This is done by splitting the interval up into a series of smaller intervals of length . The smaller the value of the closer the approximation is to the dynamics in continuous time. There are a number of procedures that have been developed for discretizing time including Euler’s method and the Leap Frog Method, which I will introduce briefly in the context of Hamiltonian dynamics. The Leap Frog method updates the momentum and position variables sequentially, starting by simulating the momentum dynamics over a small interval of time , then simulating the position dynamics over a slightly longer interval in time , then completing the momentum simulation over another small interval of time so that and now exist at the same point in time. Specifically, the Leap Frog method is as follows: 1. Take a half step in time to update the momentum variable:

2. Take a full step in time to update the position variable

3. Take the remaining half step in time to finish updating the momentum variable

The Leap Fog method can be run for steps to simulate dynamics over units of time. This particular discretization method has a number of properties that make it preferable to other approximation methods like Euler’s method, particularly for use in MCMC, but discussion of these properties are beyond the scope of this post. Let’s see how we can use the Leap Frog method to simulate Hamiltonian dynamics in a simple 1D example.

### Example 1: Simulating Hamiltonian dynamics of an harmonic oscillator

Imagine a ball with mass equal to one is attached to a horizontally-oriented spring. The spring exerts a force on the ball equal to

which works to restore the ball’s position to the equilibrium position of the spring at . Let’s assume that the spring constant , which defines the strength of the restoring force is also equal to one. If the ball is displaced by some distance from equilibrium, then the potential energy is

In addition, the kinetic energy an object with mass moving with velocity within a linear system is known to be

,

if the object’s mass is equal to one, like the ball this example. Notice that we now have in hand the expressions for both and . In order to simulate the Hamiltonian dynamics of the system using the Leap Frog method, we also need expressions for the partial derivatives of each variable (in this 1D example there are only one for each variable):

Therefore one iteration the Leap Frog algorithm for simulating Hamiltonian dynamics in this system is:

1.

2. 3.

We simulate the dynamics of the spring-mass system described using the Leap Frog method in Matlab below (if the graph is not animated, try clicking on it to open up the linked .gif). The left bar in the bottom left subpanel of the simulation output demonstrates the trade-off between potential and kinetic energy described by Hamiltonian dynamics. The cyan portion of the bar is the proportion of the Hamiltonian contributed by the potential energy , and the yellow portion represents is the contribution of the kinetic energy . The right bar (in all yellow), is the total value of the Hamiltonian . Here we see that the ball oscillates about the equilibrium position of the spring with a constant period/frequency. As the ball passes the equilibrium position , it has a minimum potential energy and maximum kinetic energy. At the extremes of the ball’s trajectory, the potential energy is at a maximum, while the kinetic energy is minimized. The procession of momentum and position map out positions in what is referred to as ** phase space**, which is displayed in the bottom right subpanel of the output. The harmonic oscillator maps out an ellipse in phase space. The size of the ellipse depends on the energy of the system defined by initial conditions.

You may also notice that the value of the Hamiltonian is not a exactly constant in the simulation, but oscillates slightly. This is an artifact known as ** energy drift** due to approximations used to the discretize time.

% EXAMPLE 1: SIMULATING HAMILTONIAN DYNAMICS % OF HARMONIC OSCILLATOR % STEP SIZE delta = 0.1; % # LEAP FROG L = 70; % DEFINE KINETIC ENERGY FUNCTION K = inline('p^2/2','p'); % DEFINE POTENTIAL ENERGY FUNCTION FOR SPRING (K =1) U = inline('1/2*x^2','x'); % DEFINE GRADIENT OF POTENTIAL ENERGY dU = inline('x','x'); % INITIAL CONDITIONS x0 = -4; % POSTIION p0 = 1; % MOMENTUM figure %% SIMULATE HAMILTONIAN DYNAMICS WITH LEAPFROG METHOD % FIRST HALF STEP FOR MOMENTUM pStep = p0 - delta/2*dU(x0)'; % FIRST FULL STEP FOR POSITION/SAMPLE xStep = x0 + delta*pStep; % FULL STEPS for jL = 1:L-1 % UPDATE MOMENTUM pStep = pStep - delta*dU(xStep); % UPDATE POSITION xStep = xStep + delta*pStep; % UPDATE DISPLAYS subplot(211), cla hold on; xx = linspace(-6,xStep,1000); plot(xx,sin(6*linspace(0,2*pi,1000)),'k-'); plot(xStep+.5,0,'bo','Linewidth',20) xlim([-6 6]);ylim([-1 1]) hold off; title('Harmonic Oscillator') subplot(223), cla b = bar([U(xStep),K(pStep);0,U(xStep)+K(pStep)],'stacked'); set(gca,'xTickLabel',{'U+K','H'}) ylim([0 10]); title('Energy') subplot(224); plot(xStep,pStep,'ko','Linewidth',20); xlim([-6 6]); ylim([-6 6]); axis square xlabel('x'); ylabel('p'); title('Phase Space') pause(.1) end % (LAST HALF STEP FOR MOMENTUM) pStep = pStep - delta/2*dU(xStep);

## Hamiltonian dynamics and the target distribution

Now that we have a better understanding of what Hamiltonian dynamics are and how they can be simulated, let’s now discuss how we can use Hamiltonian dynamics for MCMC. The main idea behind Hamiltonian/Hibrid Monte Carlo is to develop a Hamiltonian function such that the resulting Hamiltonian dynamics allow us to efficiently explore some target distribution . How can we choose such a Hamiltonian function? It turns out it is pretty simple to relate a to using a basic concept adopted from statistical mechanics known as the ** canonical distribution**. For any energy function over a set of variables , we can define the corresponding canonical distribution as: where we simply take the exponential of the negative of the energy function. The variable is a normalizing constant called the

**that scales the canonical distribution such that is sums to one, creating a valid probability distribution. Don’t worry about , it isn’t really important because, as you may recall from an earlier post, MCMC methods can sample from unscaled probability distributions. Now, as we saw above, the energy function for Hamiltonian dynamics is a combination of potential and kinetic energies:**

*partition function*Therefore the canoncial distribution for the Hamiltonian dynamics energy function is

Here we see that joint (canonical) distribution for and factorizes. This means that the two variables are independent, and the canoncial distribution is independent of the analogous distribution for the momentum. Therefore, as we’ll see shortly, we can use Hamiltonian dynamics to sample from the joint canonical distribution over and and simply ignore the momentum contributions. Note that this is an example of introducing ** auxiliary variables **to facilitate the Markov chain path. Introducing the auxiliary variable allows us to use Hamiltonian dynamics, which are unavailable without them. Because the canonical distribution for is independent of the canonical distribution for , we can choose any distribution from which to sample the momentum variables. A common choice is to use a zero-mean Normal distribution with unit variance:

Note that this is equivalent to having a quadratic potential energy term in the Hamiltonian:

Recall that this is is the exact quadratic kinetic energy function (albeit in 1D) used in the harmonic oscillator example above. This is a convenient choice for the kinetic energy function as all partial derivatives are easy to compute. Now that we have defined a kinetic energy function, all we have to do is find a potential energy function that when negated and run through the exponential function, gives the target distribution (or an unscaled version of it). Another way of thinking of it is that we can define the potential energy function as

.

If we can calculate , then we’re in business and we can simulate Hamiltonian dynamics that can be used in an MCMC technique.

## Hamiltonian Monte Carlo

In HMC we use Hamiltonian dynamics as a proposal function for a Markov Chain in order to explore the target (canonical) density defined by more efficiently than using a proposal probability distribution. Starting at an initial state , we simulate Hamiltonian dynamics for a short time using the Leap Frog method. We then use the state of the position and momentum variables at the end of the simulation as our proposed states variables and . The proposed state is accepted using an update rule analogous to the Metropolis acceptance criterion. Specifically if the probability of the proposed state after Hamiltonian dynamics

is greater than probability of the state prior to the Hamiltonian dynamics

then the proposed state is accepted, otherwise, the proposed state is accepted randomly. If the state is rejected, the next state of the Markov chain is set as the state at . For a given set of initial conditions, Hamiltonian dynamics will follow contours of constant energy in phase space (analogous to the circle traced out in phase space in the example above). Therefore we must randomly perturb the dynamics so as to explore all of . This is done by simply drawing a random momentum from the corresponding canonical distribution before running the dynamics prior to each sampling iteration . Combining these steps, sampling random momentum, followed by Hamiltonian dynamics and Metropolis acceptance criterion defines the HMC algorithm for drawing samples from a target distribution:

- set
- generate an initial position state
- repeat until

set

– sample a new initial momentum variable from the momentum canonical distribution

– set

– run Leap Frog algorithm starting at for steps and stepsize to obtain proposed states and

– calculate the Metropolis acceptance probability:

– draw a random number from

if accept the proposed state position and set the next state in the Markov chain

else set

In the next example we implement HMC to sample from a multivariate target distribution that we have sampled from previously using multi-variate Metropolis-Hastings, the bivariate Normal. We also qualitatively compare the sampling dynamics of HMC to multivariate Metropolis-Hastings for the sampling the same distribution.

### Example 2: Hamiltonian Monte for sampling a Bivariate Normal distribution

As a reminder, the target distribution for this exampleis a Normal form with following parameterization:

with mean

and covariance

In order to sample from (assuming that we are using a quadratic energy function), we need to determine the expressions for and

.

Recall that the target potential energy function can be defined from the canonical form as

If we take the negative log of the Normal distribution outline above, this defines the following potential energy function:

Where is the normalizing constant for a Normal distribution (and can be ignored because it will eventually cancel). The potential energy function is then simply:

with partial derivatives

Using these expressions for the potential energy and its partial derivatives, we implement HMC for sampling from the bivariate Normal in Matlab:

In the graph above we display HMC samples of the target distribution, starting from an initial position very far from the mean of the target. We can see that HMC rapidly approaches areas of high density under the target distribution. We compare these samples with samples drawn using the Metropolis-Hastings (MH) algorithm below. The MH algorithm converges much slower than HMC, and consecutive samples have much higher autocorrelation than samples drawn using HMC.

The Matlab code for the HMC sampler:

% EXAMPLE 2: HYBRID MONTE CARLO SAMPLING -- BIVARIATE NORMAL rand('seed',12345); randn('seed',12345); % STEP SIZE delta = 0.3; nSamples = 1000; L = 20; % DEFINE POTENTIAL ENERGY FUNCTION U = inline('transp(x)*inv([1,.8;.8,1])*x','x'); % DEFINE GRADIENT OF POTENTIAL ENERGY dU = inline('transp(x)*inv([1,.8;.8,1])','x'); % DEFINE KINETIC ENERGY FUNCTION K = inline('sum((transp(p)*p))/2','p'); % INITIAL STATE x = zeros(2,nSamples); x0 = [0;6]; x(:,1) = x0; t = 1; while t < nSamples t = t + 1; % SAMPLE RANDOM MOMENTUM p0 = randn(2,1); %% SIMULATE HAMILTONIAN DYNAMICS % FIRST 1/2 STEP OF MOMENTUM pStar = p0 - delta/2*dU(x(:,t-1))'; % FIRST FULL STEP FOR POSITION/SAMPLE xStar = x(:,t-1) + delta*pStar; % FULL STEPS for jL = 1:L-1 % MOMENTUM pStar = pStar - delta*dU(xStar)'; % POSITION/SAMPLE xStar = xStar + delta*pStar; end % LAST HALP STEP pStar = pStar - delta/2*dU(xStar)'; % COULD NEGATE MOMENTUM HERE TO LEAVE % THE PROPOSAL DISTRIBUTION SYMMETRIC. % HOWEVER WE THROW THIS AWAY FOR NEXT % SAMPLE, SO IT DOESN'T MATTER % EVALUATE ENERGIES AT % START AND END OF TRAJECTORY U0 = U(x(:,t-1)); UStar = U(xStar); K0 = K(p0); KStar = K(pStar); % ACCEPTANCE/REJECTION CRITERION alpha = min(1,exp((U0 + K0) - (UStar + KStar))); u = rand; if u < alpha x(:,t) = xStar; else x(:,t) = x(:,t-1); end end % DISPLAY figure scatter(x(1,:),x(2,:),'k.'); hold on; plot(x(1,1:50),x(2,1:50),'ro-','Linewidth',2); xlim([-6 6]); ylim([-6 6]); legend({'Samples','1st 50 States'},'Location','Northwest') title('Hamiltonian Monte Carlo')

## Wrapping up

In this post we introduced the Hamiltonian/Hybrid Monte Carlo algorithm for more efficient MCMC sampling. The HMC algorithm is extremely powerful for sampling distributions that can be represented terms of a potential energy function and its partial derivatives. Despite the efficiency and elegance of HMC, it is an underrepresented sampling routine in the literature. This may be due to the popularity of simpler algorithms such as Gibbs sampling or Metropolis-Hastings, or perhaps due to the fact that one must select hyperparameters such as the number of Leap Frog steps and Leap Frog step size when using HMC. However, recent research has provided effective heuristics such as adapting the Leap Frog step size in order to maintain a constant Metropolis rejection rate, which facilitate the use of HMC for general applications.

## MCMC: The Gibbs Sampler

In the previous post, we compared using block-wise and component-wise implementations of the Metropolis-Hastings algorithm for sampling from a multivariate probability distribution. Component-wise updates for MCMC algorithms are generally more efficient for multivariate problems than blockwise updates in that we are more likely to accept a proposed sample by drawing each component/dimension independently of the others. However, samples may still be rejected, leading to excess computation that is never used. The Gibbs sampler, another popular MCMC sampling technique, provides a means of avoiding such wasted computation. Like the component-wise implementation of the Metropolis-Hastings algorithm, the Gibbs sampler also uses component-wise updates. However, unlike in the Metropolis-Hastings algorithm, all proposed samples are accepted, so there is no wasted computation.

The Gibbs sampler is applicable for certain classes of problems, based on two main criterion. Given a target distribution , where , ), The first criterion is 1) that it is necessary that we have an analytic (mathematical) expression for the conditional distribution of each variable in the joint distribution given all other variables in the joint. Formally, if the target distribution is -dimensional, we must have individual expressions for

.

Each of these expressions defines the probability of the -th dimension given that we have values for all other () dimensions. Having the conditional distribution for each variable means that we don’t need a proposal distribution or an accept/reject criterion, like in the Metropolis-Hastings algorithm. Therefore, we can simply sample from each conditional while keeping all other variables held fixed. This leads to the second criterion 2) that we must be able to sample from each conditional distribution. This caveat is obvious if we want an implementable algorithm.

The Gibbs sampler works in much the same way as the component-wise Metropolis-Hastings algorithms except that instead drawing from a proposal distribution for each dimension, then accepting or rejecting the proposed sample, we simply draw a value for that dimension according to the variable’s corresponding conditional distribution. We also accept all values that are drawn. Similar to the component-wise Metropolis-Hastings algorithm, we step through each variable sequentially, sampling it while keeping all other variables fixed. The Gibbs sampling procedure is outlined below

- set
- generate an initial state
- repeat until

set

for each dimension

draw from

To get a better understanding of the Gibbs sampler at work, let’s implement the Gibbs sampler to solve the same multivariate sampling problem addressed in the previous post.

### Example: Sampling from a bivariate a Normal distribution

This example parallels the examples in the previous post where we sampled from a 2-D Normal distribution using block-wise and component-wise Metropolis-Hastings algorithms. Here, we show how to implement a Gibbs sampler to draw samples from the same target distribution. As a reminder, the target distribution is a Normal form with following parameterization:

with mean

and covariance

In order to sample from this distribution using a Gibbs sampler, we need to have in hand the conditional distributions for variables/dimensions and :

(i.e. the conditional for the first dimension, )

and

(the conditional for the second dimension, )

Where is the previous state of the second dimension, and is the state of the first dimension after drawing from . The reason for the discrepancy between updating and using states and , can be is seen in step 3 of the algorithm outlined in the previous section. At iteration we first sample a new state for variable conditioned on the most recent state of variable , which is from iteration . We then sample a new state for the variable conditioned on the most recent state of variable , which is now from the current iteration, .

After some math (which which I will skip for some brevity, but see the following for some details), we find that the two conditional distributions for the target Normal distribution are:

and

,

which are both univariate Normal distributions, each with a mean that is dependent on the value of the most recent state of the conditioning variable, and a variance that is dependent on the target covariances between the two variables.

Using the above expressions for the conditional probabilities of variables and , we implement the Gibbs sampler using MATLAB below. The output of the sampler is shown here:

Inspecting the figure above, note how at each iteration the Markov chain for the Gibbs sampler first takes a step only along the direction, then only along the direction. This shows how the Gibbs sampler sequentially samples the value of each variable separately, in a component-wise fashion.

% EXAMPLE: GIBBS SAMPLER FOR BIVARIATE NORMAL rand('seed' ,12345); nSamples = 5000; mu = [0 0]; % TARGET MEAN rho(1) = 0.8; % rho_21 rho(2) = 0.8; % rho_12 % INITIALIZE THE GIBBS SAMPLER propSigma = 1; % PROPOSAL VARIANCE minn = [-3 -3]; maxx = [3 3]; % INITIALIZE SAMPLES x = zeros(nSamples,2); x(1,1) = unifrnd(minn(1), maxx(1)); x(1,2) = unifrnd(minn(2), maxx(2)); dims = 1:2; % INDEX INTO EACH DIMENSION % RUN GIBBS SAMPLER t = 1; while t < nSamples t = t + 1; T = [t-1,t]; for iD = 1:2 % LOOP OVER DIMENSIONS % UPDATE SAMPLES nIx = dims~=iD; % *NOT* THE CURRENT DIMENSION % CONDITIONAL MEAN muCond = mu(iD) + rho(iD)*(x(T(iD),nIx)-mu(nIx)); % CONDITIONAL VARIANCE varCond = sqrt(1-rho(iD)^2); % DRAW FROM CONDITIONAL x(t,iD) = normrnd(muCond,varCond); end end % DISPLAY SAMPLING DYNAMICS figure; h1 = scatter(x(:,1),x(:,2),'r.'); % CONDITIONAL STEPS/SAMPLES hold on; for t = 1:50 plot([x(t,1),x(t+1,1)],[x(t,2),x(t,2)],'k-'); plot([x(t+1,1),x(t+1,1)],[x(t,2),x(t+1,2)],'k-'); h2 = plot(x(t+1,1),x(t+1,2),'ko'); end h3 = scatter(x(1,1),x(1,2),'go','Linewidth',3); legend([h1,h2,h3],{'Samples','1st 50 Samples','x(t=0)'},'Location','Northwest') hold off; xlabel('x_1'); ylabel('x_2'); axis square

## Wrapping Up

The Gibbs sampler is a popular MCMC method for sampling from complex, multivariate probability distributions. However, the Gibbs sampler cannot be used for general sampling problems. For many target distributions, it may difficult or impossible to obtain a closed-form expression for all the needed conditional distributions. In other scenarios, analytic expressions may exist for all conditionals but it may be difficult to sample from any or all of the conditional distributions (in these scenarios it is common to use univariate sampling methods such as rejection sampling and (surprise!) Metropolis-type MCMC techniques to approximate samples from each conditional). Gibbs samplers are very popular for Bayesian methods where models are often devised in such a way that conditional expressions for all model variables are easily obtained and take well-known forms that can be sampled from efficiently.

Gibbs sampling, like many MCMC techniques suffer from what is often called “slow mixing.” Slow mixing occurs when the underlying Markov chain takes a long time to sufficiently explore the values of in order to give a good characterization of . Slow mixing is due to a number of factors including the “random walk” nature of the Markov chain, as well as the tendency of the Markov chain to get “stuck,” only sampling a single region of having high-probability under . Such behaviors are bad for sampling distributions with multiple modes or heavy tails. More advanced techniques, such as Hybrid Monte Carlo have been developed to incorporate additional dynamics that increase the efficiency of the Markov chain path. We will discuss Hybrid Monte Carlo in a future post.

## MCMC: Multivariate Distributions, Block-wise, & Component-wise Updates

In the previous posts on MCMC methods, we focused on how to sample from univariate target distributions. This was done mainly to give the reader some intuition about MCMC implementations with fairly tangible examples that can be visualized. However, MCMC can easily be extended to sample multivariate distributions.

In this post we will discuss two flavors of MCMC update procedure for sampling distributions in multiple dimensions: block-wise, and component-wise update procedures. We will show how these two different procedures can give rise to different implementations of the Metropolis-Hastings sampler to solve the same problem.

## Block-wise Sampling

The first approach for performing multidimensional sampling is to use ** block-wise updates**. In this approach the proposal distribution has the same dimensionality as the target distribution . Specifically, if is a distribution over variables, ie. , then we must design a proposal distribution that is also a distribution involving variables. We then accept or reject a proposed state sampled from the proposal distribution in exactly the same way as for the univariate Metropolis-Hastings algorithm. To generate multivariate samples we perform the following block-wise sampling procedure:

- set
- generate an initial state
- repeat until

set

generate a proposal state from

calculate the proposal correction factor

calculate the acceptance probability

draw a random number from

if accept the proposal state and set

else set

Let’s take a look at the block-wise sampling routine in action.

### Example 1: Block-wise Metropolis-Hastings for sampling of bivariate Normal distribution

In this example we use block-wise Metropolis-Hastings algorithm to sample from a bivariate (i.e. ) Normal distribution:

with mean

and covariance

Usually the target distribution will have a complex mathematical form, but for this example we’ll circumvent that by using MATLAB’s built-in function to evaluate . For our proposal distribution, , let’s use a circular Normal centered at the the previous state/sample of the Markov chain/sampler, i.e:

,

where is a 2-D identity matrix, giving the proposal distribution unit variance along both dimensions and , and zero covariance. You can find an MATLAB implementation of the block-wise sampler at the end of the section. The display of the samples and the target distribution output by the sampler implementation are shown below:

We can see from the output that the block-wise sampler does a good job of drawing samples from the target distribution.

Note that our proposal distribution in this example is symmetric, therefore it was not necessary to calculate the correction factor per se. This means that this Metropolis-Hastings implementation is identical to the simpler Metropolis sampler.

%------------------------------------------------------ % EXAMPLE 1: METROPOLIS-HASTINGS % BLOCK-WISE SAMPLER (BIVARIATE NORMAL) rand('seed' ,12345); D = 2; % # VARIABLES nBurnIn = 100; % TARGET DISTRIBUTION IS A 2D NORMAL WITH STRONG COVARIANCE p = inline('mvnpdf(x,[0 0],[1 0.8;0.8 1])','x'); % PROPOSAL DISTRIBUTION STANDARD 2D GUASSIAN q = inline('mvnpdf(x,mu)','x','mu') nSamples = 5000; minn = [-3 -3]; maxx = [3 3]; % INITIALIZE BLOCK-WISE SAMPLER t = 1; x = zeros(nSamples,2); x(1,:) = randn(1,D); % RUN SAMPLER while t < nSamples t = t + 1; % SAMPLE FROM PROPOSAL xStar = mvnrnd(x(t-1,:),eye(D)); % CORRECTION FACTOR (SHOULD EQUAL 1) c = q(x(t-1,:),xStar)/q(xStar,x(t-1,:)); % CALCULATE THE M-H ACCEPTANCE PROBABILITY alpha = min([1, p(xStar)/p(x(t-1,:))]); % ACCEPT OR REJECT? u = rand; if u < alpha x(t,:) = xStar; else x(t,:) = x(t-1,:); end end % DISPLAY nBins = 20; bins1 = linspace(minn(1), maxx(1), nBins); bins2 = linspace(minn(2), maxx(2), nBins); % DISPLAY SAMPLED DISTRIBUTION ax = subplot(121); bins1 = linspace(minn(1), maxx(1), nBins); bins2 = linspace(minn(2), maxx(2), nBins); sampX = hist3(x, 'Edges', {bins1, bins2}); hist3(x, 'Edges', {bins1, bins2}); view(-15,40) % COLOR HISTOGRAM BARS ACCORDING TO HEIGHT colormap hot set(gcf,'renderer','opengl'); set(get(gca,'child'),'FaceColor','interp','CDataMode','auto'); xlabel('x_1'); ylabel('x_2'); zlabel('Frequency'); axis square set(ax,'xTick',[minn(1),0,maxx(1)]); set(ax,'yTick',[minn(2),0,maxx(2)]); title('Sampled Distribution'); % DISPLAY ANALYTIC DENSITY ax = subplot(122); [x1 ,x2] = meshgrid(bins1,bins2); probX = p([x1(:), x2(:)]); probX = reshape(probX ,nBins, nBins); surf(probX); axis xy view(-15,40) xlabel('x_1'); ylabel('x_2'); zlabel('p({\bfx})'); colormap hot axis square set(ax,'xTick',[1,round(nBins/2),nBins]); set(ax,'xTickLabel',[minn(1),0,maxx(1)]); set(ax,'yTick',[1,round(nBins/2),nBins]); set(ax,'yTickLabel',[minn(2),0,maxx(2)]); title('Analytic Distribution')

## Component-wise Sampling

A problem with block-wise updates, particularly when the number of dimensions becomes large, is that finding a suitable proposal distribution is difficult. This leads to a large proportion of the samples being rejected. One way to remedy this is to simply loop over the the dimensions of in sequence, sampling each dimension independently from the others. This is what is known as using * component-wise updates*. Note that now the proposal distribution is univariate, working only in one dimension, namely the current dimension that we are trying to sample. The component-wise Metropolis-Hastings algorithm is outlined below.

- set
- generate an initial state
- repeat until

set

for each dimension

generate a proposal state from

calculate the proposal correction factor

calculate the acceptance probability

draw a random number from

if accept the proposal state and set

else set

Note that in the component-wise implementation a sample for the -th dimension is proposed, then accepted or rejected while all other dimensions () are held fixed. We then move on to the next (-th) dimension and repeat the process while holding all other variables () fixed. In each successive step we are using updated values for the dimensions that have occurred since increasing .

### Example 2: Component-wise Metropolis-Hastings for sampling of bivariate Normal distribution

In this example we draw samples from the same bivariate Normal target distribution described in Example 1, but using component-wise updates. Therefore is the same, however, the proposal distribution is now a univariate Normal distribution with unit unit variance in the direction of the -th dimension to be sampled. The MATLAB implementation of the component-wise sampler is at the end of the section. The samples and comparison to the analytic target distribution are shown below.

Again, we see that we get a good characterization of the bivariate target distribution.

%-------------------------------------------------- % EXAMPLE 2: METROPOLIS-HASTINGS % COMPONENT-WISE SAMPLING OF BIVARIATE NORMAL rand('seed' ,12345); % TARGET DISTRIBUTION p = inline('mvnpdf(x,[0 0],[1 0.8;0.8 1])','x'); nSamples = 5000; propSigma = 1; % PROPOSAL VARIANCE minn = [-3 -3]; maxx = [3 3]; % INITIALIZE COMPONENT-WISE SAMPLER x = zeros(nSamples,2); xCurrent(1) = randn; xCurrent(2) = randn; dims = 1:2; % INDICES INTO EACH DIMENSION t = 1; x(t,1) = xCurrent(1); x(t,2) = xCurrent(2); % RUN SAMPLER while t < nSamples t = t + 1; for iD = 1:2 % LOOP OVER DIMENSIONS % SAMPLE PROPOSAL xStar = normrnd(xCurrent(:,iD), propSigma); % NOTE: CORRECTION FACTOR c=1 BECAUSE % N(mu,1) IS SYMMETRIC, NO NEED TO CALCULATE % CALCULATE THE ACCEPTANCE PROBABILITY pratio = p([xStar xCurrent(dims~=iD)])/ ... p([xCurrent(1) xCurrent(2)]); alpha = min([1, pratio]); % ACCEPT OR REJECT? u = rand; if u < alpha xCurrent(iD) = xStar; end end % UPDATE SAMPLES x(t,:) = xCurrent; end % DISPLAY nBins = 20; bins1 = linspace(minn(1), maxx(1), nBins); bins2 = linspace(minn(2), maxx(2), nBins); % DISPLAY SAMPLED DISTRIBUTION figure; ax = subplot(121); bins1 = linspace(minn(1), maxx(1), nBins); bins2 = linspace(minn(2), maxx(2), nBins); sampX = hist3(x, 'Edges', {bins1, bins2}); hist3(x, 'Edges', {bins1, bins2}); view(-15,40) % COLOR HISTOGRAM BARS ACCORDING TO HEIGHT colormap hot set(gcf,'renderer','opengl'); set(get(gca,'child'),'FaceColor','interp','CDataMode','auto'); xlabel('x_1'); ylabel('x_2'); zlabel('Frequency'); axis square set(ax,'xTick',[minn(1),0,maxx(1)]); set(ax,'yTick',[minn(2),0,maxx(2)]); title('Sampled Distribution'); % DISPLAY ANALYTIC DENSITY ax = subplot(122); [x1 ,x2] = meshgrid(bins1,bins2); probX = p([x1(:), x2(:)]); probX = reshape(probX ,nBins, nBins); surf(probX); axis xy view(-15,40) xlabel('x_1'); ylabel('x_2'); zlabel('p({\bfx})'); colormap hot axis square set(ax,'xTick',[1,round(nBins/2),nBins]); set(ax,'xTickLabel',[minn(1),0,maxx(1)]); set(ax,'yTick',[1,round(nBins/2),nBins]); set(ax,'yTickLabel',[minn(2),0,maxx(2)]); title('Analytic Distribution')

## Wrapping Up

Here we saw how we can use block- and component-wise updates to derive two different implementations of the Metropolis-Hastings algorithm. In the next post we will use component-wise updates introduced above to motivate the Gibbs sampler, which is often used to increase the efficiency of sampling well-defined probability multivariate distributions.