# Blog Archives

## A Gentle Introduction to Markov Chain Monte Carlo (MCMC)

Applying probabilistic models to data usually involves integrating a complex, multi-dimensional probability distribution. For example, calculating the expectation/mean of a model distribution involves such an integration. Many (most) times, these integrals are not calculable due to the high dimensionality of the distribution or because there is no closed-form expression for the integral available using calculus. Markov Chain Monte Carlo (MCMC) is a method that allows one to approximate complex integrals using stochastic sampling routines. As MCMC’s name indicates, the method is composed of two components, the ** Markov chain** and

**.**

*Monte Carlo integration*Monte Carlo integration* *is a powerful technique that exploits stochastic sampling of the distribution in question in order to approximate the difficult integration. However, in order to use* *Monte Carlo integration it is necessary to be able to sample from the probability distribution in question, which may be difficult or impossible to do directly. This is where the second component of MCMC, the Markov chain,** **comes in. A Markov chain is a sequential model that transitions from one state to another in a probabilistic fashion, where the next state that the chain takes is conditioned on the previous state. Markov chains are useful in that if they are constructed properly, and allowed to run for a long time, the states that a chain will take also sample from a target probability distribution. Therefore we can construct Markov chains to sample from the distribution whose integral we would like to approximate, then use Monte Carlo integration to perform the approximation.

Here I introduce a series of posts where I describe the basic concepts underlying MCMC, starting off by describing Monte Carlo Integration, then giving a brief introduction of Markov chains and how they can be constructed to sample from a target probability distribution. Given these foundation principles, we can then discuss MCMC techniques such as the Metropolis and Metropolis-Hastings algorithms, the Gibbs sampler, and the Hybrid Monte Carlo algorithm.

As always, each post has a somewhat formal/mathematical introduction, along with an example and simple Matlab implementations of the associated algorithms.

## Monte Carlo Approximations

# Monte Carlo Approximation for Integration

Using statistical methods we often run into integrals that take the form:

For instance, the expected value of a some function of a random variable

and many quantities essential for Bayesian methods such as the marginal likelihood a.k.a “model evidence”

involve integrals of this form. Sometimes (not often) such an integral can be evaluated analytically. When a closed form solution does not exist, numeric integration methods can be applied. However numerical methods quickly become intractable for any practical application that requires more than a small number of dimensions. This is where Monte Carlo approximation comes in. Monte Carlo approximation allows us to calculate an estimate for the value of by transforming the integration problem into a procedure of sampling values from a tractable probability distribution and calculating the average of those samples. Here’s what I mean:

If the function fullfills two simple criteria, namely that the function is always positive on the interval

and that the integral of the function is finite

then we can define a corresponding probability distribution on the interval :

Another way to think of it is that is a probability distribution scaled by a constant .

Using this link between probability distributions and , we can restate the original integration as

This statement says that if we can sample values of using , then the value of the original integral is simply a scaled version of the expected value of the integrand function calculated using those samples. Turns out that the expected value can be easily approximated by the sample mean:

where samples are drawn independently from . This leads to a simple 4-Step Procedure for performing Monte Carlo approximation to the integral :

- Identify
- Identify and from it determine and
- Draw independent samples from
- Evaluate

The larger the number of samples we draw, the better our approximation to the actual value of . This 4-step procedure is demonstrated in a some toy examples below:

### Example 1: Approximating the integral

Say we want to calculate the integral:

We can calculate the closed form solution of this integral using integration by parts:

and

Orr…we could calculate the Monte Carlo approximation of this integral.

**Step 1** we identify

**Step 2** we identify

and from this can also determine the probability distribution function . According to the definition expression for given above we detemine to be:

.

**Step 3:** The expression on the right is the definition for the uniform distribution , which is easy to sample from using the MATLAB (Notice too that the constant ).

**Step 4:** we calculate the Monte Carlo approximation as

,

where each is sampled from the standard uniform distribution. Below is some MATLAB code running the Monte Carlo Approximation for two different values of

% MONTE CARLO APPROXIMATION OF INT(xexp(x))dx % FOR TWO DIFFERENT SAMPLE SIZES rand('seed',12345); % THE FIRST APPROXIMATION USING N1 = 100 SAMPLES N1 = 100; x = rand(N1,1); Ihat1 = sum(x.*exp(x))/N1 % A SECOND APPROXIMATION USING N2 = 5000 SAMPLES N2 = 5000; x = rand(N2,1); Ihat2 = sum(x.*exp(x))/N2

Comparing the values of the variables and we see that the Monte Carlo approximation is better for a larger number of samples.

### Example 2: Approximating the expected value of the Beta distribution

Lets look at how the 4-step Monte Carlo approximation procedure can be used to calculate expectations. In this example we will calculate

,

where .

**Step 1**: we identify .

**Step 2:** the function is simply the probability density function due the expression for above:

.

**Step 3** we can use MATLAB to easily draw independent samples using the function . And finally,

**Step 4** we approximate the expectation with the expression

Below is some MATLAB code that performs this approximation of the expected value.

rand('seed',12345); alpha1 = 2; alpha2 = 10; N = 10000; x = betarnd(alpha1,alpha2,1,N); % MONTE CARLO EXPECTATION expectMC = mean(x); % ANALYTIC EXPRESSION FOR BETA MEAN expectAnalytic = alpha1/(alpha1 + alpha2); % DISPLAY figure; bins = linspace(0,1,50); counts = histc(x,bins); probSampled = counts/sum(counts); probTheory = betapdf(bins,alpha1,alpha2); b = bar(bins,probSampled); colormap hot; hold on; t = plot(bins,probTheory/sum(probTheory),'r','Linewidth',2) m = plot([expectMC,expectMC],[0 .1],'g') e = plot([expectAnalytic,expectAnalytic],[0 .1],'b') xlim([expectAnalytic - .05,expectAnalytic + 0.05]) legend([b,t,m,e],{'Samples','Theory','$\hat{E}$','$E_{Theory}$'},'interpreter','latex'); title(['E_{MC} = ',num2str(expectMC),'; E_{Theory} = ',num2str(expectAnalytic)]) hold off

And the output of the code:

The analytical solution for the expected value of this Beta distribution:

is quite close to our approximation (also indicated by the small distance between the blue and green lines on the plot above).

## Monte Carlo Approximation for Optimization

Monte Carlo Approximation can also be used to solve optimization problems of the form:

If fulfills the same criteria described above (namely that it is a scaled version of a probability distribution), then (as above) we can define the probability function

This allows us to instead solve the problem

If we can sample from , the solution is easily found by drawing samples from and determining the location of the samples that has the highest density (Note that the solution is not dependent of the value of ). The following example demonstrates Monte Carlo optimization:

### Example: Monte Carlo Optimization of

Say we would like to find the value of which optimizes the function . In other words we want to solve the problem

We could solve for using standard calculus methods, but a clever trick is to use Monte Carlo approximation to solve the problem. First, notice that is a scaled version of a Normal distribution with mean equal to 4 and unit variance:

where . This means we can solve for by drawing samples from the normal distribution and determining where those samples have the highest density. The following chunk of matlab code solves the optimization problem in this way.

% MONTE CARLO OPTIMIZATION OF exp(x-4)^2 randn('seed',12345) % INITIALZIE N = 100000; x = 0:.1:6; C = sqrt(2*pi); g = inline('exp(-.5*(x-4).^2)','x'); ph = plot(x,g(x)/C,'r','Linewidth',3); hold on gh = plot(x,g(x),'b','Linewidth',2); hold on; y = normpdf(x,4,1); % CALCULATE MONTE CARLO APPROXIMATION x = normrnd(4,1,1,N); bins = linspace(min(x),max(x),100); counts = histc(x,bins); [~,optIdx] = max(counts); xHat = bins(optIdx); % OPTIMA AND ESTIMATED OPTIMA oh = plot([4 4],[0,1],'k'); hh = plot([xHat,xHat],[0,1],'g'); set(gca,'fontsize',16) legend([gh,ph,oh,hh],{'g(x)','$p(x)=\frac{g(x)}{C}$','$x_{opt}$','$\hat{x}$'},'interpreter','latex','Location','Northwest');

In the code output above we see the function we want to optimize in blue and the Normal distribution from which we draw samples in red. The Monte Carlo method provides a good approximation (green) to the real solution (black).

## Wrapping Up

In the toy examples above it was easy to sample from . However, for practical problems the distributions we want to sample from are often complex and operate in many dimensions. For these problems more clever sampling methods have to be used. Such sampling methods include Inverse Transform Sampling, Rejection Sampling, Importance Sampling, and Markov Chain Monte Carlo methods such as the Metropolis Hasting algorithm and the Gibbs sampler, each of which I plan to cover in separate posts.