Blog Archives

Derivation: Error Backpropagation & Gradient Descent for Neural Networks

The material in this post has been migraged with python implementations to my github pages website.

Inverse Transform Sampling

There are a number of sampling methods used in machine learning, each of which has various strengths and/or weaknesses depending on the nature of the sampling task at hand. One simple method for generating samples from distributions with closed-form descriptions is Inverse Transform (IT) Sampling.

The idea behind IT Sampling is that the probability mass for a random variable X distributed according to the probability density function f(x) integrates to one and therefore the cumulative distribution function C(x) can be used to map from values in the interval (0,1) (i.e. probabilities) to the domain of f(x). Because it is easy to sample values z uniformly from the interval (0,1), we can use the inverse of the CDF C(x)^{-1} to transform these sampled probabilities into samples x. The code below demonstrates this process at work in order to sample from a student’s t distribution with 10 degrees of freedom.

rand('seed',12345)

% DEGREES OF FREEDOM
dF = 10;
x = -3:.1:3;
Cx = cdf('t',x,dF)
z = rand;

% COMPARE VALUES OF
zIdx = min(find(Cx>z));

% DRAW SAMPLE
sample = x(zIdx);

% DISPLAY
figure; hold on
plot(x,Cx,'k','Linewidth',2);
plot([x(1),x(zIdx)],[Cx(zIdx),Cx(zIdx)],'r','LineWidth',2);
plot([x(zIdx),x(zIdx)],[Cx(zIdx),0],'b','LineWidth',2);
plot(x(zIdx),z,'ko','LineWidth',2);
text(x(1)+.1,z + .05,'z','Color','r')
text(x(zIdx)+.05,.05,'x_{sampled}','Color','b')
ylabel('C(x)')
xlabel('x')
hold off

IT Sampling from student’s-t(10)

However, the scheme used to create to plot above is inefficient in that one must compare current values of z with the C(x) for all values of x. A much more efficient method is to evaluate C^{-1} directly:

  1. Derive C^{-1}(x) (or a good approximation) from f(x)
  2. for i = 1:n
  • – draw z_i from Unif(0,1)
  • x_i = CDF^{-1}(z_i)
  • – end for

The IT sampling process is demonstrated in the next chunk of code to sample from the Beta distribution, a distribution for which C^{-1}  is easy to approximate using Netwon’s method (which we let MATLAB do for us within the function icdf.m)

rand('seed',12345)
nSamples = 1000;

% BETA PARAMETERS
alpha = 2; beta = 10;

% DRAW PROPOSAL SAMPLES
z = rand(1,nSamples);

% EVALUATE PROPOSAL SAMPLES AT INVERSE CDF
samples = icdf('beta',z,alpha,beta);
bins = linspace(0,1,50);
counts = histc(samples,bins);
probSampled = counts/sum(counts)
probTheory = betapdf(bins,alpha,beta);

% DISPLAY
b = bar(bins,probSampled,'FaceColor',[.9 .9 .9]);
hold on;
t = plot(bins,probTheory/sum(probTheory),'r','LineWidth',2);
xlim([0 1])
xlabel('x')
ylabel('p(x)')
legend([t,b],{'Theory','IT Samples'})
hold off

Inverse Transform Sampling of Beta(2,10)

Wrapping Up

The IT sampling method is generally only used for univariate distributions where C^{-1} can be computed in closed form, or approximated. However, it is a nice example of how uniform random variables can be used to sample from much more complicated distributions.