Blog Archives

Sampling From the Normal Distribution Using the Box-Muller Transform

The Normal Distribution is the workhorse of many common statistical analyses and being able to draw samples from this distribution lies at the heart of many statistical/machine learning algorithms. There have been a number of methods developed to sample from the Normal distribution including Inverse Transform Sampling, the Ziggurat Algorithm, and the Ratio Method (a rejection sampling technique). In this post we will focus on an elegant method called the Box-Muller transform.

A quick review of Cartesian and polar coordinates.

Before we can talk about using the Box-Muller transform, let’s refresh our understanding of the relationship between Cartesian and polar coordinates. You may remember from geometry that if x and y are two points in the Cartesian plane they can be represented in polar coordinates with a radius r and an angle \theta using the following relationships:

r^2 = x^2 + y^2

\tan(\theta) = \frac{y}{x}, and therefore

x = r\cos(\theta)

y = r\sin(\theta)

Notice that if r \leq 1 and \theta \in [0 , 2\pi], then we map out values contained in the unit circle, as shown in the figure below. Also note that random variables in such a circle can be generated by transforming values sampled from the uniform distribution. Specifically, radii can be sampled from r \sim Unif(0,1) and angle can be sampled from \theta \sim 2\pi \times Unif(0,1). A similar mechanism (i.e. drawing points in a circle using uniform variables) is at the heart of the Box-Muller transform for sampling Normal random variables.

Example of the relationship between Cartesian and polar coordinates

Drawing Normally-distributed samples with the Box-Muller transform

Ok, now that we’ve discussed how Cartesian coordinates are represented in polar coordinates, let’s move on to how we can use this relationship to generate random variables. Box-Muller sampling is based on representing the joint distribution of two independent standard Normal random Cartesian variables X and Y

X \sim N(0,1)

Y \sim N(0,1)

in polar coordinates. The joint distribution p(x,y) (which is circular-symmetric) is:

p(x,y) = p(x)p(y) = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\frac{1}{\sqrt{2\pi}}e^{-\frac{y^2}{2}}

= \frac{1}{2\pi}e^{-\frac{x^2 + y^2}{2}}

If we notice that the x^2 + y^2 term in the numerator of the exponent is equal to r^2 (as above) we can make the connection between the the Cartesian representation of the joint Normal distribution and its polar representation:

p(x,y) = \left ( \frac{1}{2\pi} \right ) \left ( e^{\frac{-r^2}{2}} \right )

which is the product of two density functions, an exponential distribution over squared radii:

r^2 \sim Exp(\frac{1}{2})

and a uniform distribution over angles:

\theta \sim Unif(0,2\pi)

just like those mentioned above when generating points on the unit circle. Now, if we make another connection between the exponential distribution and the uniform distribution, namely that:

Exp(\lambda) = \frac{-\log (Unif(0,1))}{\lambda}

then r \sim \sqrt{-2\log (Unif(0,1))}

This gives us a way to generate points from the joint Gaussian distribution by sampling from two independent uniform distributions, one for r and another for \theta, and transforming them into Cartesian coordinates via the relationships above. In detail, the procedure goes as follows:

  1. Draw, u_1,u_2 \sim Unif(0,1)
  2. Transform the variables into radius and angle representation r = \sqrt{-2\log(u_1)} , and \theta = 2\pi u_2
  3. Transform radius and angle into Cartesian coordinates: x = r \cos(\theta), y = r \sin(\theta)

What results are two independent Normal random variables, X and Y. A MATLAB implementation of the Box-Muller algorithm is shown below:

u = rand(2,100000);
r = sqrt(-2*log(u(1,:)));
theta = 2*pi*u(2,:);
x = r.*cos(theta);
y = r.*sin(theta);

colormap hot;axis square
title(sprintf('Box-Muller Samples Y\n Mean = %1.2f\n Variance = %1.2f\n Kurtosis = %1.2f',mean(x),var(x),3-kurtosis(x)))
xlim([-6 6])

colormap hot;axis square
title(sprintf('Box-Muller Samples X\n Mean = %1.2f\n Variance = %1.2f\n Kurtosis = %1.2f',mean(y),var(y),3-kurtosis(y)))
xlim([-6 6])

Box-Muller Samples for Normal Distribution

Wrapping Up

The output of the MATLAB code is shown above. Notice the first, second, and fourth central moments (mean, variance, and kurtosis)  of the generated samples are consistent with the standard normal. The Box-Muller transform is another example of of how uniform variables on the interval (0,1) and can be  transformed in order to sample from a more complicated distribution.


Emergence of Various Probability Distributions: Some Incite via Simulations

There a a large number of probability distributions that arise in statistical modeling and inference procedures. These distributions can take many complex analytic forms, and can have, what appear to be, arbitrary parameterizations. Though the definitions of many of these distributions are mathematically esoteric, it is often very helpful to see how they can emerge from simple simulation procedures. In this post we perform a number of simple manipulations to standard Normal/Gaussian-distributed variables that give rise to some probability distributions that predominate statistics:the  \chi^2 , the F, and the Student’s t-distributions

The Standard Normal Distribution

Before we begin the simulations, we need to first describe an important building block, the standard Normal distribution. The Normal distribution is a smooth continuous function that is symmetric about an axis and is often described as having a “bell” shape. The distribution lies at the heart of describing a vast number of physical processes. Also, many statistical quantities and distributions emerge from  simply transforming values drawn from the standard Normal distribution (I’ll describe what I mean by “standard” shortly).

Any Normal distribution is characterized by two parameters: a mean parameter \mu, which characterizes the location of the distribution center,  and a variance parameter \sigma^2, which characterizes the width of the distribution. Formally, the Normal distribution defines the probability p(x) of some value x occurring as:

p(x) = \frac{1}{2 \pi \sigma^2}e^{-\frac{1}{2 \sigma^2}(x - \mu)^2}

The standard Normal distribution has zero mean and and unit  variance (i.e. equal to 1), and therefore takes the simple form:

p(x) = \frac{1}{2 \pi}e^{-\frac{1}{2}x^2}

As I will show, a number of common probability distributions emerge from performing simple manipulations of values that follow the standard Normal distribution. In order to simulate these distributions we need to be able to draw values that follow the standard Normal distribution. Turns out that we can generate pseudo-random samples from a standard Normal distribution using a clever concept known as the Box-Muller transformation. Though I won’t go into the details, the Box-Muller transformation relates measurements on the unit circle (i.e. in polar coordinates) to norms of random vectors in Cartesian space (i.e. x-y coordinates). If none of that makes any sense, don’t worry it’s not important. What is important is that if we know how to independently draw two random numbers R_1 and R_2 that take on values 0 to 1, then it is possible to draw a sample N from the standard Normal via the following relationship

N = \sqrt{-2 \log R_1} \cos(2 \pi R_2)

or also

N = \sqrt{-2 \log R_1} \sin(2 \pi R_2)

since sin(x) and cos(x) are orthogonal to one another. Figure 1 compares the empirical probability distribution of values drawn from a standard Normal distribution using the MATLAB function \text{randn} to the distribution of values drawn from the uniform distribution using the function \text{rand} and transforming those values using the Box-Muller transformation.

Figure 1: Theoretical and simulated random Normal variables. Simulated variables are calculated using the Box-Muller transformation of uniform variables on the interval (0 1).

Note that the Box-Muller transformation is not the only (or even the best/most efficient) method for generating random variables from the standard Normal distribution (for instance, see the Polar Method). The reason for demonstrating the Box-Muller method is its simplicity. For a MATLAB example of drawing samples using the Box-Muller method, see lines (16-18) in the source code below.

Ok, so now that we have a way generating samples from the standard Normal distribution, we can now begin transforming these samples in order to develop other distributions.

Simulating the \chi^2 Distribution

The \chi^2 distribution is a common probability distribution used in statistical inference (i.e. the \chi^2 test) and assessing the goodness of fit of linear models to data. Specifically, a random variable Z drawn from the \chi^2 with \nu degrees of freedom is obtained by drawing \nu independent variables N from the standard normal distribution, squaring each value drawn and taking the sum of those squared values. Formally, this process is described as:

Z = \sum_{i=1}^\nu N_i^2

 For a MATLAB example of simulating \chi^2-distributed variables, see lines (37-54) in the source code below. The results of this simulation are shown in right of Figure 2, and are compared to the theoretical distribution, which takes the mathematical form:

p(x) = \frac{x^{(\nu/2) -1} e^{-(x/2)}}{2^\nu \Gamma(\frac{\nu}{2})}

For values of x \geq 0 The \Gamma is what is known as the Gamma function, and can be thought of as an extension to the factorial (i.e. !) operator to continuous values (i.e. ! only works on integers).

Figure 2: Theoretical (left) and simulated (right) Chi-Squared distributions having 1-10 degrees of freedom (dF)

Perhaps it is just me, but I feel that it is far more natural to interpret the \chi^2 distribution as a sum of squares of standard Normal variables, than this fairly complicated expression. The interpretation also lends itself well to testing the goodness of fit of a linear model to data using the sum of squares loss function. This brings us to a second probability distribution, which is often used in the analysis of variance of linear model fits.

Simulating the F-Distribution

In order to test the goodness of fit of a linear model to some data, it is often helpful to look at the ratio of  the mean squared error of the model prediction to the mean squared deviation of the data from its mean (i.e. the variance of the data). This ratio can be thought of as the proportion of the variance in the data that is explained by the model. Because this ratio is composed of two values that result from squaring and summing deviations or errors, we are equivalently taking the ratio of values that follow two \chi^2 distributions. If the numerator and denominator of this ratio are scaled by their respective degrees of freedom (i.e. the number of model parameters minus one for the numerator value and the number of datapoints minus one for the denominator), then the value that results is what is known as an F-statistic.  These F-statistics can be used to quantify how well (i.e. the statistical significance) of the model fit by comparing it to the distribution of F-statistics values that follow the form:

F(x,\nu_1,\nu_2) = \frac{\sqrt{\frac{\nu_1 x^{\nu_1}\nu_2^{\nu_2}}{(\nu_1 x + \nu_2)^{\nu_1+\nu_2}}}}{xB(\nu_1/2,\nu_2/2)}

for variables having \nu_1 and \nu_2 degrees of freedom. The funcion B is known as the Beta function or integral.

We can also simulate distributions of F-statistics by generating two \chi^2 variables with respective degrees of freedom \nu_1 and \nu_2 (just as we did above), scaling each variable by it’s relative degrees of freedom and taking their ratio. Formally this is described by:

F(\nu_1,\nu_2) = \frac{\chi_{\nu_1}^2/\nu_1}{\chi_{\nu_2}^2/\nu_2}

A MATLAB example of this simulation is shown in lines (77-104) in the code below. The results of the simulation are displayed in the bottom row of Figure 3.

Figure 3: Theoretical (top row) and simulated (bottom row) F-distributions with 2,10,50, & l00 degrees of freedom (dF). Columns represent theoretical and simulated distributions resulting from having a numerator Chi-Squared distribution with each dF. Each function corresponds to the distribution resulting form a denominator Chi-Squared distribution with a particular dF.

Again, it insightful to interpret the F-distribution as simply the ratio of scaled  \chi^2 distributions rather than the complicated mathematical expression for the F-distribution shown above.

Simulating the Student’s t-distribution

During statistics analyses, it is often helpful to determine if the mean \bar x of a dataset of size N  is different than some specified value \mu_0. In order to test the hypothesis is customary to calculate what is called a test statistic:

t = \frac{\bar x - \mu_0}{\sqrt{s^2/n}}

Here the sample mean is

\bar x = \frac{1}{n}\sum_{i=1}^n x_n

and the sample variance is

s^2 = \frac{1}{n-1}\sum_{i=1}^n(x - \bar x)^2

The distribution of these t-values is known as the t-distribution.  Like the Normal distribution, the t-distribution is symmetric and “bell” shaped, but has heavier tails than a Normal distribution and is parameterized by a single parameter \nu  that corresponds to the degrees of freedom in the sample (i.e. sample size). Formally, the t-distribution has the following form

p(t) = \frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\nu \pi}\Gamma(\nu/2)} \left( 1 + \frac{t^2}{\nu}\right)^{\frac{\nu+1}{2}}

The t-distribution is used  in a number of other statistical tests as well, particularly when an available sample size is small. Also, because it was developed in order to make Guiness the fine beer that it is today, the Student’s t-distribution is likely even more important than the Normal distribution!

The t-distribution with degrees for freedom \nu can be simulated by repeatedly sampling \nu independent datapoints from the standard normal distributions and calculating t-values on these samples. A MATLAB simulation of various t-distributions is shown in Figure 4, along with comparisons to the theoretical function shown above (see lines (130-148) of the code below).

Figure 4: Theoretical (left) and simulated (right) Student’s-t distribution with 2,10,50,& 100 degrees of freedom (dF).

Also, another way to simulate a t-distribution (not shown here) is to sample values that are the ratio of a standard normal variable N to the square root of a \chi^2 -distributed variable scaled by its degrees of freedom:

T(\nu) = \frac{N}{\sqrt{Z_{\nu}/\nu}}

Wrapping up

So, hopefully these simulations give you some incite on how some standard probability distributions can come about. Just another example of how “complex” ideas  can be derived from simple manipulations of a general mechanism, but are often hidden in the mathematical detail and formalisms.


(Use the toolbar in the upper right corner of the source in order to copy to your clipboard)

clear all; close all

nNu = 10;
nSimu = 1e6;
dx = 0.1;
x = -6:dx:6;


fprintf('\nSimulating Normal Distribution From Uniform Random Variables.');
R = rand(nSimu,2);
NSimu = sqrt(-2*log(R(:,1))).*cos(2*pi*R(:,2));
NTheory = randn(1,nSimu);
% OR...
% N = sqrt(-2*log(R(:,1))).*sin(2*pi*R(:,2));

figure('Name','Simulated Normal Distribution');
[nHistTheory,nBin] = hist(NTheory,x);
nHistTheory = nHistTheory/(sum(nHistTheory)*dx);
hold on;
[nHist,nBin] = hist(NSimu,x);
nHist = nHist/(dx*sum(nHist));
title(sprintf('Simulated Mean = %1.4f\n Simulated Variance = %1.4f',mean(NSimu),var(NSimu)))
xlim([-6 6]);
fprintf('\nPress any key to continue.\n')

fprintf('\nSimulating Chi-Squared Distribution\n');
zFun0 = 'randn(1,nSimu).^2'; zFun = zFun0;
dx = 0.01;
x = 0:dx:15;
legStr = {};
for nu = 1:nNu
	fprintf('\r dF = %d',nu);
	zTmp = eval(zFun);
	[zTmp,zBin] = hist(zTmp,x);
	zSimu(:,nu) = zTmp/(dx*sum(zTmp));	% SIMULATED
	zTheory(:,nu) = chi2pdf(x,nu);		% THEORETICAL
	zFun = [zFun,'+',zFun0];
	legStr{end+1} = sprintf('dF = %d',nu);

figure('Name','Simulated Chi-Squared Distribution')
xlim([0 15]); ylim([0,.4]);
xlabel('x'); ylabel('p(x)')

xlim([0 15]); ylim([0,.4]);
xlabel('x'); ylabel('p(x)')
fprintf('\nPress any key to continue.\n')

fprintf('\nSimulating F-Distribution\n');
nSimu = 1e5;
cnt = 1;
nu = [2,10,50,100];
x = 0:dx:5;
zFun0 = 'randn(1,nSimu).^2'; zFun = zFun0;
legStr = {};

for nuTop = nu
	zFunTop = zFun0;
	for iF = 1:nuTop-1
		zFunTop = [zFunTop,'+',zFun0];
	for nuBottom = nu
		fprintf('\r dF1 = %d, dF2 = %d',nuTop,nuBottom);
		zFunBottom = zFun0;
		for iF = 1:nuBottom-1
			zFunBottom = [zFunBottom,'+',zFun0];
		zTop = eval(zFunTop)/nuTop;	% CHI-2 DISTRIBUTIONS SCALED BY dF
		zBottom = eval(zFunBottom)/nuBottom;
		Ftmp = zTop./zBottom;   	% F IS THE RATIO OF SCALED CHI-2s
		[Ftmp,zBin] = hist(Ftmp,x);
		FSimu(:,cnt) = Ftmp/(dx*sum(Ftmp));
		FTheory(:,cnt) = fpdf(x,nuTop,nuBottom);
		cnt = cnt+1;
	legStr{end+1} = sprintf('dF = %d',nuTop);

figure('Name','Simulated F Distribution')
axCnt = cnt;
for iP = 1:numel(nu);
	xlim([0 5]); ylim([0 2]);
	title(sprintf('dF = %d',nu(iP)))
	xlabel('x'); ylabel('p(x) Theoretical');

	xlim([0 5]); ylim([0 2]);
	title(sprintf('dF = %d',nu(iP)));
	xlabel('x'); ylabel('p(x) Simulated');
fprintf('\nPress any key to continue.\n')

fprintf('\nSimulating Student''s t Distribution\n');
nSimu = 1e6;
dx = 0.01;
x = -6:dx:6;
nu = [2 10 50 100];
legStr = {};
cnt = 1;
for dF = nu
	fprintf('\r dF = %d',dF);
	x0 = randn(dF,nSimu);
	xBar = mean(x0);
	s = var(x0);
	mu = 0;                             % randn IS ZERO MEAN
	tTmp = (xBar - mu)./sqrt(s/dF);
	[tTmp,tBin] = hist(tTmp,x);
	tSimu(:,cnt) = tTmp/(dx*sum(tTmp));	% SIMULATED
	tTheory(:,cnt) = tpdf(x,dF);		% THEORETICAL
	legStr{end+1} = sprintf('dF = %d',dF);
	cnt = cnt+1;

figure('Name','Simulated Student''s t Distribution')
xlim([-6 6]); ylim([0 .4]);
xlabel('x'); ylabel('p(x)')

xlim([-6 6]); ylim([0 .4]);
xlabel('x'); ylabel('p(x)')
fprintf('\nPress any key to finish demo.\n')

clear all; close all