Blog Archives
Rejection Sampling
Suppose that we want to sample from a distribution that is difficult or impossible to sample from directly, but instead have a simpler distribution
from which sampling is easy. The idea behind Rejection sampling (aka Acceptance-rejection sampling) is to sample from
and apply some rejection/acceptance criterion such that the samples that are accepted are distributed according to
.
Envelope distribution and rejection criterion
In order to be able to reject samples from such that they are sampled from
,
must “cover” or envelop the distribution
. This is generally done by choosing a constant
such that
for all
. For this reason
is often called the envelope distribution. A common criterion for accepting samples from
is based on the ratio of the target distribution to that of the envelope distribution. The samples are accepted if
where , and rejected otherwise. If the ratio is close to one, then
must have a large amount of probability mass around
and that sample should be more likely accepted. If the ratio is small, then it means that
has low probability mass around
and we should be less likely to accept the sample. This criterion is demonstrated in the chunk of MATLAB code and the resulting figure below:
rand('seed',12345); x = -10:.1:10; % CREATE A "COMPLEX DISTRIBUTION" f(x) AS A MIXTURE OF TWO NORMAL % DISTRIBUTIONS f = inline('normpdf(x,3,2) + normpdf(x,-5,1)','x'); t = plot(x,f(x),'b','linewidth',2); hold on; % PROPOSAL IS A CENTERED NORMAL DISTRIBUTION q = inline('normpdf(x,0,4)','x'); % DETERMINE SCALING CONSTANT c = max(f(x)./q(x)) %PLOT SCALED PROPOSAL/ENVELOP DISTRIBUTION p = plot(x,c*q(x),'k--'); % DRAW A SAMPLE FROM q(x); qx = normrnd(0,4); fx = f(qx); % PLOT THE RATIO OF f(q(x)) to cq(x) a = plot([qx,qx],[0 fx],'g','Linewidth',2); r = plot([qx,qx],[fx,c*q(qx)],'r','Linewidth',2); legend([t,p,a,r],{'Target','Proposal','Accept','Reject'}); xlabel('x');
Here a zero-mean Normal distribution is used as the proposal distribution. This distribution is scaled by a factor , determined from
and
to ensure that the proposal distribution covers
. We then sample from
, and compare the proportion of
occupied by
. If we compare this proportion to a random number sampled from
(i.e. the criterion outlined above), then we would accept this sample with probability proportional to the length of the green line segment and reject the sample with probability proportional to the length of the red line segment.
Rejection sampling of a random discrete distribution
This next example shows how rejection sampling can be used to sample from any arbitrary distribution, continuous or not, and with or without an analytic probability density function.
The figure above shows a random discrete probability density function generated on the interval (0,15). We will use rejection sampling as described above to sample from
. Our proposal/envelope distribution is the uniform discrete distribution on the same interval (i.e. any of the integers from 1-15 are equally probable) multiplied by a constant
that is determined such that the maximum value of
lies under (or equal to)
.
rand('seed',12345) randn('seed',12345) fLength = 15; % CREATE A RANDOM DISTRIBUTION ON THE INTERVAL [1 fLength] f = rand(1,fLength); f = f/sum(f); figure; h = plot(f,'r','Linewidth',2); hold on; l = plot([1 fLength],[max(f) max(f)],'k','Linewidth',2); legend([h,l],{'f(x)','q(x)'},'Location','Southwest'); xlim([0 fLength + 1]) xlabel('x'); ylabel('p(x)'); title('Target (f(x)) and Proposal (q(x)) Distributions'); % OUR PROPOSAL IS THE DISCRETE UNIFORM ON THE INTERVAL [1 fLength] % SO OUR CONSTANT IS c = max(f/(1/fLength)); nSamples = 10000; i = 1; while i < nSamples proposal = unidrnd(fLength); q = c*1/fLength; % ENVELOPE DISTRIBUTION if rand < f(proposal)/q samps(i) = proposal; i = i + 1; end end % DISPLAY THE SAMPLES AND COMPARE TO THE TARGET DISTRIBUTION bins = 1:fLength; counts = histc(samps,bins); figure b = bar(1:fLength,counts/sum(counts),'FaceColor',[.8 .8 .8]) hold on; h = plot(f,'r','Linewidth',2) legend([h,b],{'f(x)','samples'}); xlabel('x'); ylabel('p(x)'); xlim([0 fLength + 1]);
Rejection sampling from the unit circle to estimate 
Though the ratio-based acceptance-rejection criterion introduced above is a common choice for drawing samples from complex distributions, it is not the only criterion we could use. For instance we could use a different set of criteria to generate some geometrically-bounded distribution. If we wanted to generate points uniformly within the unit circle (i.e. a circle centered at and with radius
), we could do so by sampling Cartesian spatial coordinates
and
uniformly from the interval (-1,1)–which samples form a square centered at (0,0)–and reject those points that lie outside of the radius
and the unit circle has the area:
We can use the ratio of their areas to approximate :
The figure below shows the rejection sampling process and the resulting estimate of from the samples. One-hundred thousand 2D points are sampled uniformly from the interval (-1,1). Those points that lie within the unit circle are plotted as blue dots. Those points that lie outside of the unit circle are plotted as red x’s. If we take four times the ratio of the area in blue to the entire area, we get a very close approximation to 3.14 for
.
The MATLAB code used to generate the example figures is below:
% DISPLAY A CIRCLE INSCRIBED IN A SQUARE figure; a = 0:.01:2*pi; x = cos(a); y = sin(a); hold on plot(x,y,'k','Linewidth',2) t = text(0.5, 0.05,'r'); l = line([0 1],[0 0],'Linewidth',2); axis equal box on xlim([-1 1]) ylim([-1 1]) title('Unit Circle Inscribed in a Square') pause; rand('seed',12345) randn('seed',12345) delete(l); delete(t); % DRAW SAMPLES FROM PROPOSAL DISTRIBUTION samples = 2*rand(2,100000) - 1; % REJECTION reject = sum(samples.^2) > 1; % DISPLAY REJECTION CRITERION scatter(samples(1,~reject),samples(2,~reject),'b.') scatter(samples(1,reject),samples(2,reject),'rx') hold off xlim([-1 1]) ylim([-1 1]) piHat = mean(sum(samples.*samples)<1)*4; title(['Estimate of \pi = ',num2str(piHat)]);
Wrapping Up
Rejection sampling is a simple way to generate samples from complex distributions. However, Rejection sampling also has a number of weaknesses:
- Finding a proposal distribution that can cover the support of the target distribution is a non-trivial task.
- Additionally, as the dimensionality of the target distribution increases, the proportion of points that are rejected also increases. This curse of dimensionality makes rejection sampling an inefficient technique for sampling multi-dimensional distributions, as the majority of the points proposed are not accepted as valid samples.
- Some of these problems are solved by changing the form of the proposal distribution to “hug” the target distribution as we gain knowledge of the target from observing accepted samples. Such a process is called Adaptive Rejection Sampling, which will be covered in another post.