# Blog Archives

## Derivation: The Covariance Matrix of an OLS Estimator (and applications to GLS)

We showed in an earlier post that for the linear regression model

$$y = X\beta + \epsilon,$$

the optimal Ordinary Least Squares (OLS) estimator for the model parameters $\beta$ is

$$\hat{\beta} = (X^TX)^{-1}X^Ty.$$

However, because the independent variables $X$ and responses $y$ can take on any value, they are both random variables. And, because $\hat{\beta}$ is a linear combination of $X$ and $y$, it is also a random variable, and therefore has a covariance. The covariance matrix for the OLS estimator is defined as:

$$C_{\hat{\beta}} = E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)^T],$$

where $E[\cdot]$ denotes the expected value operator. In order to find an expression for $C_{\hat{\beta}}$, we first need an expression for $(\hat{\beta} - \beta)$. The following derives this expression:

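$$\hat{\beta} = (X^TX)^{-1}X^T(X\beta + \epsilon),$$

where we use the fact that

$$y = X\beta + \epsilon.$$

It follows that

$$\hat{\beta} = (X^TX)^{-1}X^TX\beta + (X^TX)^{-1}X^T\epsilon = \beta + (X^TX)^{-1}X^T\epsilon$$

and therefore

$$(\hat{\beta} - \beta) = (X^TX)^{-1}X^T\epsilon.$$

Now, following the original definition for $C_{\hat{\beta}}$:

$$C_{\hat{\beta}} = E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)^T] = E[(X^TX)^{-1}X^T\epsilon\,((X^TX)^{-1}X^T\epsilon)^T] = E[(X^TX)^{-1}X^T\epsilon\epsilon^TX(X^TX)^{-1}],$$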

where we take advantage of $(AB)^T = B^TA^T$ (together with the symmetry of $(X^TX)^{-1}$) in order to rewrite the second term in the product inside the expectation. If we take $X$ to be fixed for a given estimator $\hat{\beta}$ (in other words, we don’t randomly resample the independent variables), then the expectation depends only on the remaining random variable, namely $\epsilon$. Therefore the above expression can be written as

$$C_{\hat{\beta}} = (X^TX)^{-1}X^TE[\epsilon\epsilon^T]X(X^TX)^{-1},$$

where $E[\epsilon\epsilon^T]$ is the covariance of the noise term in the model. Because OLS assumes uncorrelated noise of equal variance, the noise covariance is equal to $\sigma^2I$, where $\sigma^2$ is the variance along each dimension, and $I$ is an identity matrix of size equal to the number of dimensions. The expression for the estimator covariance is now:

$$C_{\hat{\beta}} = (X^TX)^{-1}X^T(\sigma^2I)X(X^TX)^{-1} = \sigma^2(X^TX)^{-1}X^TX(X^TX)^{-1},$$

which simplifies to

$$C_{\hat{\beta}} = \sigma^2(X^TX)^{-1}.$$

A further simplifying assumption that is often made in OLS is that $\epsilon$ is drawn from a zero-mean multivariate Gaussian distribution with unit variances (i.e. $\sigma^2 = 1$), resulting in a noise covariance equal to the identity. Thus

$$C_{\hat{\beta}} = (X^TX)^{-1}.$$
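The result of the derivation can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original post: it holds a design matrix fixed, redraws the noise many times, and compares the empirical covariance of the OLS estimates with $\sigma^2(X^TX)^{-1}$. The sizes, the noise level `sigma`, and the parameter values are arbitrary assumptions chosen for illustration.

```matlab
% MONTE CARLO CHECK OF THE OLS ESTIMATOR COVARIANCE (ILLUSTRATIVE SKETCH)
% ALL SIZES AND VALUES BELOW ARE ARBITRARY ASSUMPTIONS
nObs    = 50;                                   % # OF OBSERVATIONS
nTrials = 20000;                                % # OF SIMULATED NOISE DRAWS
sigma   = 0.5;                                  % NOISE STANDARD DEVIATION
X       = [ones(nObs,1), linspace(-1,1,nObs)']; % FIXED DESIGN MATRIX
beta    = [1; 2];                               % "TRUE" PARAMETERS

% EACH COLUMN OF Y IS ONE SIMULATED DATASET WITH FRESH NOISE
E = sigma*randn(nObs,nTrials);
Y = bsxfun(@plus,X*beta,E);

% OLS ESTIMATE FOR EVERY DATASET (ONE COLUMN PER DATASET)
betaHat = (X'*X)\(X'*Y);

% EMPIRICAL COVARIANCE OF THE ESTIMATES VS. ANALYTICAL EXPRESSION
covEmpirical = cov(betaHat');     % cov TREATS ROWS AS OBSERVATIONS
covAnalytic  = sigma^2*inv(X'*X);

disp(covEmpirical)
disp(covAnalytic)
```

With this many draws the two matrices should agree to within sampling error. Setting `sigma = 1` corresponds to the unit-variance case at the end of the derivation.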

## Applying the derivation results to Generalized Least Squares

Notice that the expression for the OLS estimator covariance is equal to the first inverse term in the expression for the OLS estimator. Identifying the covariance for the OLS estimator in this way gives a helpful heuristic for identifying the covariance of related estimators that do not make the simplifying assumptions about the noise covariance that are made in OLS. For instance, in Generalized Least Squares (GLS), it is possible for the noise terms to co-vary. This is represented by a noise covariance matrix $\Sigma$. This gives the model form

$$y = X\beta + \epsilon,$$

where $E[\epsilon \mid X] = 0$ and $\mathrm{Cov}[\epsilon \mid X] = \Sigma$.

In other words, under GLS, the noise terms have zero mean and covariance $\Sigma$. It turns out that the estimator for the GLS model parameters is

$$\hat{\beta}_{GLS} = (X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}y.$$

Notice the similarity between the GLS and OLS estimators. The only difference is that in GLS, the solution for the parameters is scaled by the inverse of the noise covariance. And, in a similar fashion to the OLS estimator, the covariance for the GLS estimator is the first term in the product that defines the GLS estimator:

$$C_{\hat{\beta},GLS} = (X^T\Sigma^{-1}X)^{-1}.$$
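As a hedged illustration of the GLS expressions (a sketch under assumed values, not code from the original post), the snippet below simulates noise with a known covariance, computes the GLS estimate and its covariance directly from the formulas, and confirms that the same estimate is obtained by "whitening" the data and running OLS. The exponentially decaying covariance and the names `Sigma`, `rho`, and `L` are assumptions made for this example.

```matlab
% GLS ESTIMATOR AND ITS COVARIANCE (ILLUSTRATIVE SKETCH)
% THE NOISE COVARIANCE BELOW IS AN ASSUMED EXAMPLE
nObs  = 40;
X     = [ones(nObs,1), (1:nObs)'/nObs];   % FIXED DESIGN MATRIX
beta  = [1; -2];                          % "TRUE" PARAMETERS
rho   = 0.6;                              % CORRELATION BETWEEN NEIGHBORS
Sigma = rho.^abs(bsxfun(@minus,(1:nObs)',1:nObs)); % Sigma(i,j) = rho^|i-j|

% DRAW CORRELATED NOISE: IF Sigma = L*L', THEN L*randn HAS COVARIANCE Sigma
L = chol(Sigma,'lower');
y = X*beta + L*randn(nObs,1);

% GLS ESTIMATOR COVARIANCE AND ESTIMATE, DIRECTLY FROM THE FORMULAS
SigInvX    = Sigma\X;               % inv(Sigma)*X WITHOUT FORMING inv(Sigma)
covGLS     = inv(X'*SigInvX);       % (X' inv(Sigma) X)^-1
betaHatGLS = covGLS*(SigInvX'*y);   % covGLS * X' inv(Sigma) y

% SAME ESTIMATE VIA WHITENING FOLLOWED BY ORDINARY LEAST SQUARES
Xw = L\X;
yw = L\y;
betaHatWhite = (Xw'*Xw)\(Xw'*yw);

disp([betaHatGLS, betaHatWhite])
```

The whitening route makes the connection to OLS explicit: after multiplying through by the inverse of `L`, the noise has identity covariance, so the OLS covariance result applies to the transformed problem.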

## fMRI in Neuroscience: Modeling the HRF With FIR Basis Functions

In the previous post on fMRI methods, we discussed how to model the selectivity of a voxel using the General Linear Model (GLM). One of the basic assumptions that we must make in order to use the GLM is that we also have an accurate model of the **Hemodynamic Response Function (HRF)** for the voxel. A common practice is to use a canonical HRF model established from previous empirical studies of fMRI timeseries. However, voxels throughout the brain and across subjects exhibit a variety of shapes, so the canonical model is often incorrect. Therefore it becomes necessary to estimate the shape of the HRF for each voxel.

A number of methods have been developed for estimating HRFs, most of which are based on temporal basis function models. (For details on basis function models, see this previous post.) There are a number of basis function sets available, but in this post we’ll discuss modeling the HRF using a flexible basis set composed of a set of delayed impulses called the **Finite Impulse Response (FIR)** basis.

## Modeling HRFs With a Set of Time-delayed Impulses

Let’s say that we have an HRF with the following shape.

We would like to be able to model the HRF as a weighted combination of simple basis functions. The simplest set of basis functions is the FIR basis, which is a series of distinct unit-magnitude (i.e. equal to one) impulses, each of which is delayed in time by $t$ TRs. An example of modeling the HRF above using FIR basis functions is below:

```matlab
%% REPRESENTING AN HRF WITH FIR BASIS FUNCTIONS
% CREATE ACTUAL HRF (AS MEASURED BY MRI SCANNER)
rand('seed',12345)
TR = 1;                             % REPETITION TIME
t = 1:TR:20;                        % MEASUREMENTS
h = gampdf(t,6) + -.5*gampdf(t,10); % ACTUAL HRF
h = h/max(h);

% DISPLAY THE HRF
figure;
stem(t,h,'k','Linewidth',2)
axis square
xlabel(sprintf('Basis Function Contribution\nTo HRF'))
title(sprintf('HRF as a Series of \nWeighted FIR Basis Functions'))

% CREATE/DISPLAY FIR REPRESENTATION
figure; hold on
cnt = 1;

% COLORS BASIS FUNCTIONS ACCORDING TO HRF WEIGHT
map = jet(64);
cRange = linspace(min(h),max(h),64);

for iT = numel(h):-1:1
    firSignal = ones(size(h));
    firSignal(cnt) = 2;
    [~,cIdx] = min(abs(cRange-h(cnt)));
    color = map(cIdx,:);
    plot(1:numel(h),firSignal + 2*(iT-1),'Color',color,'Linewidth',2)
    cnt = cnt+1;
end
colormap(map); colorbar; caxis([min(h) max(h)]);

% DISPLAY
axis square;
ylabel('Basis Function')
xlabel('Time (TR)')
set(gca,'YTick',0:2:39,'YTickLabel',20:-1:1)
title(sprintf('Weighted FIR Basis\n Set (20 Functions)'));
```

Each of the basis functions has a unit impulse that occurs at time $t$; otherwise it is equal to zero. Weighting each basis function with the corresponding value of the HRF at time point $t$, followed by a sum across all the functions, gives the target HRF in the first plot above. The FIR basis model makes no assumptions about the shape of the HRF (the weight applied to each basis function can take any value), which allows the model to capture a wide range of HRF profiles.
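The weighted sum described above can be written as a single matrix product. The following is a minimal sketch, assuming the FIR basis functions are stored as the columns of an identity matrix. The helper `gammaPdf` is a hand-written stand-in for `gampdf` (unit scale) so that the snippet runs without any toolbox; it is an assumption of this sketch rather than part of the original code.

```matlab
% FIR BASIS AS THE COLUMNS OF AN IDENTITY MATRIX (ILLUSTRATIVE SKETCH)
t = 1:20;

% HAND-WRITTEN GAMMA PDF WITH UNIT SCALE (STAND-IN FOR gampdf(t,a))
gammaPdf = @(t,a) t.^(a-1).*exp(-t)./gamma(a);

% SAME HRF SHAPE AS IN THE EXAMPLE ABOVE
h = gammaPdf(t,6) - .5*gammaPdf(t,10);
h = h/max(h);

% COLUMN k OF B IS A UNIT IMPULSE AT TIME POINT k, ZERO ELSEWHERE
B = eye(numel(h));

% WEIGHT EACH BASIS FUNCTION BY THE HRF VALUE AT ITS TIME POINT AND SUM
hRecon = B*h';

% THE WEIGHTED SUM REPRODUCES THE HRF
maxError = max(abs(hRecon' - h))
```

Because each basis function is nonzero at exactly one time point, the weights are simply the HRF values themselves, which is why the FIR basis can represent any HRF shape of this length.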

Given an experiment where various stimuli are presented to a subject and BOLD responses are evoked within the subject’s brain, the goal is to determine the HRF to each of the stimuli within each voxel. Let’s take a look at a concrete example of how we can use the FIR basis to simultaneously estimate HRFs to many stimuli for multiple voxels with distinct tuning properties.

## Estimating the HRF of Simulated Voxels Using the FIR Basis

For this example we revisit a simulation of voxels with 4 different types of tuning (for details, see the previous post on fMRI in Neuroscience). One voxel is strongly tuned for visual stimuli (such as a light), the second voxel is weakly tuned for auditory stimuli (such as a tone), the third is moderately tuned for somatosensory stimuli (such as warmth applied to the palm), and the final voxel is unselective (i.e. weakly and equally selective for all three types of stimuli). We simulate an experiment in which the blood-oxygen-level dependent (BOLD) signals evoked in each voxel by a series of stimuli, consisting of nonoverlapping lights, tones, and applications of warmth to the palm, are measured over 320 fMRI measurements (TRs). Below is the simulation of the experiment and the resulting simulated BOLD signals:

```matlab
%% SIMULATE AN EXPERIMENT
% SOME CONSTANTS
trPerStim = 30;
nRepeat = 10;
nTRs = trPerStim*nRepeat + length(h);
nCond = 3;
nVox = 4;
impulseTrain0 = zeros(1,nTRs);

% RANDOM ONSET TIMES (TRs)
onsetIdx = randperm(nTRs-length(h));

% VISUAL STIMULUS
impulseTrainLight = impulseTrain0;
impulseTrainLight(onsetIdx(1:nRepeat)) = 1;
onsetIdx(1:nRepeat) = [];

% AUDITORY STIMULUS
impulseTrainTone = impulseTrain0;
impulseTrainTone(onsetIdx(1:nRepeat)) = 1;
onsetIdx(1:nRepeat) = [];

% SOMATOSENSORY STIMULUS
impulseTrainHeat = impulseTrain0;
impulseTrainHeat(onsetIdx(1:nRepeat)) = 1;

% EXPERIMENT DESIGN / STIMULUS SEQUENCE
D = [impulseTrainLight',impulseTrainTone',impulseTrainHeat'];
X = conv2(D,h');
X = X(1:nTRs,:);

%% SIMULATE RESPONSES OF VOXELS WITH VARIOUS SELECTIVITIES
visualTuning = [4 0 0];   % VISUAL VOXEL TUNING
auditoryTuning = [0 2 0]; % AUDITORY VOXEL TUNING
somatoTuning = [0 0 3];   % SOMATOSENSORY VOXEL TUNING
noTuning = [1 1 1];       % NON-SELECTIVE

beta = [visualTuning', ...
        auditoryTuning', ...
        somatoTuning', ...
        noTuning'];

y0 = X*beta;
SNR = 5;
noiseSTD = max(y0)/SNR;
noise = bsxfun(@times,randn(size(y0)),noiseSTD);
y = y0 + noise; % VOXEL RESPONSES

% DISPLAY VOXEL TIMECOURSES
voxNames = {'Visual','Auditory','Somat.','Unselective'};
cols = lines(4);
figure;
for iV = 1:4
    subplot(4,1,iV)
    plot(y(:,iV),'Color',cols(iV,:),'Linewidth',2);
    xlim([0,nTRs]);
    ylabel('BOLD Signal')
    legend(sprintf('%s Voxel',voxNames{iV}))
end
xlabel('Time (TR)')
set(gcf,'Position',[100,100,880,500])
```

Now let’s estimate the HRF of each voxel to each of the stimulus conditions using an FIR basis function model. To do so, we create a design matrix composed of successive sets of delayed impulses, where each set of impulses begins at the onset of each stimulus condition. For the $T \times C$ stimulus onset matrix $D$ (with $T$ TRs and $C$ stimulus conditions), we calculate a $T \times HC$ FIR design matrix $X_{FIR}$, where $H$ is the assumed length of the HRF we are trying to estimate. The code for creating and displaying the design matrix for an assumed HRF length of $H = 16$ is below:

```matlab
%% ESTIMATE HRF USING FIR BASIS SET
% CREATE FIR DESIGN MATRIX
hrfLen = 16; % WE ASSUME HRF IS 16 TRS LONG

% BASIS SET FOR EACH CONDITION IS A TRAIN OF IMPULSES
X_FIR = zeros(nTRs,hrfLen*nCond);

for iC = 1:nCond
    onsets = find(D(:,iC));
    idxCols = (iC-1)*hrfLen+1:iC*hrfLen;
    for jO = 1:numel(onsets)
        idxRows = onsets(jO):onsets(jO)+hrfLen-1;
        for kR = 1:numel(idxRows)
            X_FIR(idxRows(kR),idxCols(kR)) = 1;
        end
    end
end

% DISPLAY
figure;
subplot(121);
imagesc(D);
colormap gray;
set(gca,'XTickLabel',{'Light','Tone','Som.'})
title('Stimulus Train');

subplot(122);
imagesc(X_FIR);
colormap gray;
title('FIR Design Matrix');
set(gca,'XTick',[8,24,40])
set(gca,'XTickLabel',{'Light','Tone','Som.'})
set(gcf,'Position',[100,100,550,400])
```

In the right panel of the plot above, we see the form of the FIR design matrix $X_{FIR}$ for the stimulus onset matrix on the left. For each voxel, we want to determine the weight on each column of $X_{FIR}$ that will best explain the BOLD signals $y$ measured from that voxel. We can form this problem in terms of a General Linear Model:

$$y = X_{FIR}\beta_{FIR}$$

where $\beta_{FIR}$ are the weights on each column of the FIR design matrix. If we set the values of $\beta_{FIR}$ so as to minimize the sum of the squared errors (SSE) between the model above and the measured responses,

$$SSE = \lVert y - X_{FIR}\beta_{FIR} \rVert^2,$$

then we can use the Ordinary Least Squares (OLS) solution discussed earlier to solve for $\beta_{FIR}$. Specifically, we solve for the weights as:

$$\hat{\beta}_{FIR} = (X_{FIR}^TX_{FIR})^{-1}X_{FIR}^Ty$$

Once determined, the resulting matrix of weights $\hat{\beta}_{FIR}$ has the HRF of each of the 4 different voxels to each stimulus condition along its columns. The first 16 weights (1-16) along a column define the HRF to the first stimulus (the light). The next 16 weights (17-32) along a column determine the HRF to the second stimulus (the tone), and so on. Below we parse out these weights and display the resulting HRFs for each voxel:

```matlab
% ESTIMATE HRF FOR EACH CONDITION AND VOXEL
betaHatFIR = pinv(X_FIR'*X_FIR)*X_FIR'*y;

% RESHAPE HRFS
hHatFIR = reshape(betaHatFIR,hrfLen,nCond,nVox);

% DISPLAY
figure
cols = lines(4);
names = {'Visual','Auditory','Somat.','Unselective'};
for iV = 1:nVox
    subplot(2,2,iV)
    hold on;
    for jC = 1:nCond
        hl = plot(1:hrfLen,hHatFIR(:,jC,iV),'Linewidth',2);
        set(hl,'Color',cols(jC,:))
    end
    hl = plot(1:numel(h),h,'Linewidth',2);
    xlabel('TR')
    legend({'Light','Tone','Heat','True HRF'})
    set(hl,'Color','k')
    xlim([0 hrfLen])
    grid on
    axis tight
    title(sprintf('%s Voxel',names{iV}));
end
set(gcf,'Position',[100,100,880,500])
```

Here we see that the estimated HRFs accurately capture both the shape of the HRF and the selectivity of each of the voxels. For instance, the HRFs estimated from the responses of the first voxel indicate strong tuning for the light stimulus. The HRF estimated for the light stimulus has an amplitude that is approximately 4 times that of the true HRF. This corresponds with the actual tuning of the voxel (compare this to the value of `beta(1,1)`). Additionally, the time delay until the maximum value (time-to-peak) of the HRF to the light is the same as that of the true HRF. The first voxel’s HRFs estimated for the other stimuli are essentially noise around baseline. This (correctly) indicates that the first voxel has no selectivity for those stimuli. Further inspection of the remaining estimated HRFs indicates that accurate tuning and HRF shape are recovered for the other three voxels as well.

## Wrapping Up

In this post we discussed how to apply a simple basis function model (the FIR basis) to estimate the HRF profile and get an idea of the tuning of individual voxels. Though the FIR basis model can accurately model any HRF shape, it is often times too flexible. In scenarios where voxel signals are very noisy, the FIR basis model will tend to model the noise.

Additionally, the FIR basis set needs to incorporate a basis function for each time measurement. For the example above, we assumed the HRF had a length of 16 TRs. The FIR basis therefore had 16 tunable weights for each condition. This leads to a model with 48 ($16 \times 3$) tunable parameters for the GLM. For experiments with many different stimulus conditions, the number of parameters can grow quickly (as the HRF length times the number of conditions). If the number of parameters is comparable to (or greater than) the number of BOLD signal measurements, it will be difficult to accurately estimate the weights. As we’ll see in later posts, we can often improve upon the FIR basis set by using more clever basis functions.
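To make the parameter-count arithmetic concrete, here is a small sketch that tabulates the number of FIR weights against the number of measurements. The HRF length (16) and number of TRs (320) match the simulation above; the condition counts other than 3 are hypothetical values chosen only to show the trend.

```matlab
% FIR PARAMETER COUNT VS. NUMBER OF MEASUREMENTS (ILLUSTRATIVE SKETCH)
hrfLen    = 16;            % ASSUMED HRF LENGTH (TRs), AS IN THE EXAMPLE
nTRs      = 320;           % # OF MEASUREMENTS IN THE SIMULATION ABOVE
nCondList = [3 6 12 24];   % 3 IS FROM THE EXAMPLE; THE REST ARE HYPOTHETICAL

% ONE WEIGHT PER TIME POINT PER CONDITION
nParams = hrfLen*nCondList;

for k = 1:numel(nCondList)
    fprintf('%2d conditions -> %3d parameters (%.2f measurements per parameter)\n', ...
            nCondList(k), nParams(k), nTRs/nParams(k));
end
```

With 24 conditions the model would have 384 weights but only 320 measurements per voxel, so the least-squares problem no longer has a unique solution.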

Another important but indirect issue that affects estimating the HRF is the experimental design, or rather the schedule used to present the stimuli. In the example above, the stimuli were presented in random, non-overlapping order. What if the stimuli were presented in the same order every time, with some set frequency? We’ll discuss in a later post the concept of design efficiency and how it affects our ability to characterize the shape of the HRF and, consequently, voxel selectivity.

## Basis Function Models

Often times we want to model data $y$ that emerges from some underlying function $f(x)$ of independent variables $x$ such that for some future input we’ll be able to accurately predict the future output values. There are various methods for devising such a model, all of which make particular assumptions about the types of functions the model can emulate. In this post we’ll focus on one set of methods called *Basis Function Models* (BFMs).

## Basis Sets and Linear Independence

The idea behind BFMs is to model the complex target function $f(x)$ as a linear combination of a set of simpler functions, for which we have closed form expressions. This set of simpler functions is called a *basis set*, and works in a similar manner to the bases that compose vector spaces in linear algebra. For instance, any vector in the 2D spatial coordinate system (which is a vector space in $\mathbb{R}^2$) can be composed of linear combinations of the $x$ and $y$ directions. This is demonstrated in the figures below:

Above we see a target vector in black pointing from the origin (at xy coordinates (0,0)) to the xy coordinates (2,3), and the coordinate basis vectors $b^{(x)}$ and $b^{(y)}$, each of which points one unit along the x- (in blue) and y- (in red) directions.

We can compose the target vector $v$ as a linear combination of the x- and y- basis vectors. Namely, the target vector can be composed by adding (in the vector sense) 2 times the basis $b^{(x)}$ to 3 times the basis $b^{(y)}$:

$$v = 2b^{(x)} + 3b^{(y)}$$

One thing that is important to note about the bases $b^{(x)}$ and $b^{(y)}$ is that they are **linearly independent**. This means that no matter how hard you try, you can’t compose the basis vector $b^{(x)}$ as a linear combination of the other basis vector $b^{(y)}$, and vice versa. In the 2D vector space, we can easily see this because the red and blue lines are perpendicular to one another (a condition called *orthogonality*). But we can formally determine if two (column) vectors are independent by calculating the (column) rank of a matrix $A$ that is composed by concatenating the two vectors:

$$A = [b^{(x)}, b^{(y)}] = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

The rank of a matrix is the number of linearly independent columns in the matrix. If the rank of $A$ has the same value as the number of columns in the matrix, then the columns of $A$ form a linearly independent set of vectors. The rank of $A$ above is 2. So is the number of columns. Therefore the basis vectors $b^{(x)}$ and $b^{(y)}$ are indeed linearly independent. We can use this same matrix rank-based test to verify if vectors of much higher dimension than two are independent. Linear independence of the basis set is important if we want to be able to define a unique model.

```matlab
%% EXAMPLE OF COMPOSING A VECTOR OF BASIS VECTORS
figure;
targetVector = [0 0; 2 3];
basisX = [0 0; 1 0];
basisY = [0 0; 0 1];
hv = plot(targetVector(:,1),targetVector(:,2),'k','Linewidth',2);
hold on;
hx = plot(basisX(:,1),basisX(:,2),'b','Linewidth',2);
hy = plot(basisY(:,1),basisY(:,2),'r','Linewidth',2);
xlim([-4 4]); ylim([-4 4]);
xlabel('x-direction'), ylabel('y-direction')
axis square
grid
legend([hv,hx,hy],{'Target','b^{(x)}','b^{(y)}'},'Location','bestoutside');

figure
hv = plot(targetVector(:,1),targetVector(:,2),'k','Linewidth',2);
hold on;
hx = plot(2*basisX(:,1),2*basisX(:,2),'b','Linewidth',2);
hy = plot(3*basisY(:,1),3*basisY(:,2),'r','Linewidth',2);
xlim([-4 4]); ylim([-4 4]);
xlabel('x-direction'), ylabel('y-direction');
axis square
grid
legend([hv,hx,hy],{'Target','2b^{(x)}','3b^{(y)}'},'Location','bestoutside')

A = [1 0; 0 1];

% TEST TO SEE IF basisX AND basisY ARE
% LINEARLY INDEPENDENT
isIndependent = rank(A) == size(A,2)
```
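It can help to see the rank test fail as well as pass. The sketch below applies the same test to two sets of vectors that are not independent. The helper name `hasIndependentCols` and the example matrices are assumptions introduced for illustration.

```matlab
% RANK TEST ON INDEPENDENT AND DEPENDENT SETS (ILLUSTRATIVE SKETCH)
% hasIndependentCols IS A HYPOTHETICAL HELPER, NOT A BUILT-IN FUNCTION
hasIndependentCols = @(M) rank(M) == size(M,2);

A_basis     = [1 0; 0 1];     % THE x- AND y- BASIS VECTORS
A_dependent = [1 2; 2 4];     % SECOND COLUMN IS 2 TIMES THE FIRST
A_three     = [1 0 2; 0 1 3]; % THIRD COLUMN IS 2*b^(x) + 3*b^(y)

r1 = hasIndependentCols(A_basis)     % true
r2 = hasIndependentCols(A_dependent) % false
r3 = hasIndependentCols(A_three)     % false
```

The last case is the target vector from the figure appended to the two basis vectors: because it can be composed from them, it adds no new independent direction, and a model that used all three as a basis would not have unique weights.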

## Modeling Functions with Linear Basis Sets

In a similar fashion to creating arbitrary vectors with vector bases, we can compose arbitrary functions in “function space” as a linear combination of simpler basis functions (note that basis functions are also sometimes called *kernels*). One such set of basis functions is the set of polynomials:

$$b^{(i)}(x) = x^i$$

Here each basis function $b^{(i)}$ is a polynomial of order $i$. We can then compose a basis set of $K$ functions, where the $i$-th function is $b^{(i)}$, then model the function $f(x)$ as a linear combination of these $K$ polynomial bases:

$$f(x) \approx \beta_0 b^{(0)}(x) + \beta_1 b^{(1)}(x) + \dots + \beta_{K-1} b^{(K-1)}(x),$$

where $\beta_i$ is the weight on the $i$-th basis function. In matrix format this model takes the form

$$f(x) \approx A\beta$$

Here, again, the matrix $A$ is the concatenation of each of the polynomial bases into its columns. What we then want to do is determine all the weights $\beta$ such that $A\beta$ is as close to $f(x)$ as possible. We can do this by using Ordinary Least Squares (OLS) regression, which was discussed in earlier posts. The optimal solution for the weights under OLS, given observed values $y$ of the function, is:

$$\hat{\beta} = (A^TA)^{-1}A^Ty$$

Let’s take a look at a concrete example, where we use a set of polynomial basis functions to model a complex data trend.

## Example: Modeling with Polynomial Basis Functions

In this example we model a set of data $y$ whose underlying function $f(x)$ is:

$$f(x) = \cos(0.5x) + \sin(x)$$

In particular we’ll create a polynomial basis set of degree 10 and fit the weights using OLS. The Matlab code for this example, and the resulting graphical output are below:

```matlab
%% EXAMPLE: MODELING A TARGET FUNCTION
x = [0:.1:20]';
f = @(x) cos(.5*x) + sin(x); % ANONYMOUS FUNCTION (REPLACES DEPRECATED inline)

% CREATE A POLYNOMIAL BASIS SET
polyBasis = [];
nPoly = 10;
px = linspace(-10,10,numel(x))';
for iP = 1:nPoly
    polyParams = zeros(1,nPoly);
    polyParams(iP) = 1;
    polyBasis = [polyBasis,polyval(polyParams,px)];
end

% SCALE THE BASIS SET TO HAVE MAX AMPLITUDE OF 1
polyBasis = fliplr(bsxfun(@rdivide,polyBasis,max(polyBasis)));

% CHECK LINEAR INDEPENDENCE
isIndependent = rank(polyBasis) == size(polyBasis,2)

% SAMPLE SOME DATA FROM THE TARGET FUNCTION
randIdx = randperm(numel(x));
xx = x(randIdx(1:30));
y = f(xx) + randn(size(xx))*.2;

% FIT THE POLYNOMIAL BASIS MODEL TO THE DATA (USING polyfit.m)
% NOTE: polyfit(xx,y,nPoly) RETURNS nPoly+1 WEIGHTS (ORDERS nPoly DOWN TO 0)
basisWeights = polyfit(xx,y,nPoly);

% MODEL OF TARGET FUNCTION
yHat = polyval(basisWeights,x);

% DISPLAY BASIS SET AND MODEL
subplot(131)
plot(polyBasis,'Linewidth',2)
axis square
xlim([0,numel(px)])
ylim([-1.2 1.2])
title(sprintf('Polynomial Basis Set\n(%d Functions)',nPoly))

subplot(132)
bar(fliplr(basisWeights));
axis square
xlim([0 nPoly + 1]); colormap hot
xlabel('Basis Function')
ylabel('Estimated Weight')
title('Model Weights on Basis Functions')

subplot(133);
hy = plot(x,f(x),'b','Linewidth',2); hold on
hd = scatter(xx,y,'ko');
hh = plot(x,yHat,'r','Linewidth',2);
xlim([0,max(x)])
axis square
legend([hy,hd,hh],{'f(x)','y','Model'},'Location','Best')
title('Model Fit')
hold off;
```

First off, let’s make sure that the polynomial basis is indeed linearly independent. As above, we’ll compute the rank of the matrix composed of the basis functions along its columns. The rank of the basis matrix has a value of 10, which is also the number of columns of the matrix (see the `isIndependent` check in the code above). This shows that the basis functions are linearly independent.

We fit the model using Matlab’s internal function `polyfit`, which performs OLS on the basis set matrix. We see that the basis set of 10 polynomial functions (including the zeroth-bias term) does a pretty good job of modeling a very complex function $f(x)$. We essentially get to model a highly nonlinear function using simple linear regression (i.e. OLS).
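To illustrate the link between `polyfit` and OLS on a polynomial basis matrix, here is a minimal sketch on a deliberately small, noise-free problem. The degree, the sample grid, and the variable names (`A`, `wBackslash`, `wNormal`, `wPolyfit`) are assumptions chosen to keep the example well conditioned; it is not the fit used in the post.

```matlab
% polyfit VS. EXPLICIT OLS ON A POLYNOMIAL BASIS MATRIX (ILLUSTRATIVE SKETCH)
x   = (0:.25:5)';           % SAMPLE LOCATIONS (ASSUMED FOR THIS SKETCH)
y   = cos(.5*x) + sin(x);   % NOISE-FREE SAMPLES OF THE TARGET FUNCTION
deg = 3;                    % LOW DEGREE KEEPS THE PROBLEM WELL CONDITIONED

% BASIS MATRIX: ONE COLUMN PER POLYNOMIAL ORDER, HIGHEST ORDER FIRST
% (THE SAME COEFFICIENT ORDERING THAT polyfit RETURNS)
A = bsxfun(@power,x,deg:-1:0);

% LEAST-SQUARES WEIGHTS VIA BACKSLASH AND VIA THE NORMAL EQUATIONS
wBackslash = A\y;
wNormal    = (A'*A)\(A'*y);

% WEIGHTS FROM polyfit (RETURNED AS A ROW VECTOR)
wPolyfit = polyfit(x,y,deg)';

maxDiffBackslash = max(abs(wBackslash - wPolyfit))
maxDiffNormal    = max(abs(wNormal - wPolyfit))
```

All three routes solve the same least-squares problem, so the differences should be at the level of floating-point round-off. For higher degrees the normal equations become poorly conditioned, which is one reason to prefer `A\y` or `polyfit` in practice.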

## Wrapping up

Though the polynomial basis set works well in many modeling problems, it may be a poor fit for some applications. Luckily we aren’t limited to using only polynomial basis functions. Other basis sets include Gaussian basis functions, Sigmoid basis functions, and finite impulse response (FIR) basis functions, just to name a few (in a future post, we’ll demonstrate how the FIR basis set can be used to model the hemodynamic response function (HRF) of an fMRI voxel measured from the brain).

## fMRI in Neuroscience: Estimating Voxel Selectivity & the General Linear Model (GLM)

In a typical fMRI experiment a series of stimuli are presented to an observer and evoked brain activity, in the form of blood-oxygen-level-dependent (BOLD) signals, is measured from tiny chunks of the brain called voxels. The task of the researcher is then to infer the tuning of the voxels to features in the presented stimuli based on the evoked BOLD signals. In order to make this inference quantitatively, it is necessary to have a model of how BOLD signals are evoked in the presence of stimuli. In this post we’ll develop a model of evoked BOLD signals, and from this model recover the tuning of individual voxels measured during an fMRI experiment.

## Modeling the Evoked BOLD Signals — The Stimulus and Design Matrices

Suppose we are running an event-related fMRI experiment where we present $C$ different stimulus conditions to an observer while recording the BOLD signals evoked in their brain over a series of $T$ consecutive fMRI measurements (TRs). We can represent the stimulus presentation quantitatively with a $T \times C$ binary *Stimulus Matrix*, $D$, whose entries indicate the onset of each stimulus condition (columns) at each point in time (rows). Now let’s assume that we have an accurate model of how a voxel is activated by a single, very short stimulus. This activation model is called the hemodynamic response function (HRF), $h$, for the voxel, and, as we’ll discuss in a later post, can be estimated from the measured BOLD signals. Let’s assume for now that the voxel is also activated to an equal degree by all stimuli. In this scenario we can represent the BOLD signal evoked over the entire experiment with another matrix called the **Design Matrix**, $X$, that is the convolution of the stimulus matrix $D$ with the voxel’s HRF $h$:

$$X = D * h$$

Note that this model of the BOLD signal is an example of the Finite Impulse Response (FIR) model that was introduced in the previous post on fMRI Basics.

To make the concepts of $D$ and $X$ more concrete, let’s say our experiment consists of $C = 3$ different stimulus conditions: a light, a tone, and heat applied to the palm. Each stimulus condition is presented twice in a staggered manner during 80 TRs of fMRI measurements. The stimulus matrix and the design matrix are simulated here in Matlab:

```matlab
TR = 1;                             % REPETITION TIME
t = 1:TR:20;                        % MEASUREMENTS
h = gampdf(t,6) + -.5*gampdf(t,10); % HRF MODEL
h = h/max(h);                       % SCALE HRF TO HAVE MAX AMPLITUDE OF 1

trPerStim = 30;                     % # TR PER STIMULUS
nRepeat = 2;                        % # OF STIMULUS REPEATS
nTRs = trPerStim*nRepeat + length(h);
impulseTrain0 = zeros(1,nTRs);

% VISUAL STIMULUS
impulseTrainLight = impulseTrain0;
impulseTrainLight(1:trPerStim:trPerStim*nRepeat) = 1;

% AUDITORY STIMULUS
impulseTrainTone = impulseTrain0;
impulseTrainTone(5:trPerStim:trPerStim*nRepeat) = 1;

% SOMATOSENSORY STIMULUS
impulseTrainHeat = impulseTrain0;
impulseTrainHeat(9:trPerStim:trPerStim*nRepeat) = 1;

% COMBINATION OF ALL STIMULI
impulseTrainAll = impulseTrainLight + impulseTrainTone + impulseTrainHeat;

% SIMULATE VOXELS WITH VARIOUS SELECTIVITIES
visualTuning = [4 0 0];   % VISUAL VOXEL TUNING
auditoryTuning = [0 2 0]; % AUDITORY VOXEL TUNING
somatoTuning = [0 0 3];   % SOMATOSENSORY VOXEL TUNING
noTuning = [1 1 1];       % NON-SELECTIVE

beta = [visualTuning', ...
        auditoryTuning', ...
        somatoTuning', ...
        noTuning'];

% EXPERIMENT DESIGN / STIMULUS SEQUENCE
D = [impulseTrainLight',impulseTrainTone',impulseTrainHeat'];

% CREATE DESIGN MATRIX FOR THE THREE STIMULI
X = conv2(D,h');        % X = D * h
X(nTRs+1:end,:) = [];   % REMOVE EXCESS FROM CONVOLUTION

% DISPLAY STIMULUS AND DESIGN MATRICES
subplot(121);
imagesc(D);
colormap gray;
xlabel('Stimulus Condition')
ylabel('Time (TRs)');
title('Stimulus Train, D');
set(gca,'XTick',1:3);
set(gca,'XTickLabel',{'Light','Tone','Heat'});

subplot(122);
imagesc(X);
xlabel('Stimulus Condition')
ylabel('Time (TRs)');
title('Design Matrix, X = D * h')
set(gca,'XTick',1:3);
set(gca,'XTickLabel',{'Light','Tone','Heat'});
```

Each column of the design matrix above (the right subpanel in the above figure) is essentially a model of the BOLD signal evoked independently by each stimulus condition, and the total signal is simply a sum of these independent signals.

## Modeling Voxel Tuning — The Selectivity Matrix

In order to develop the concept of the design matrix we assumed that our theoretical voxel is equally tuned to all stimuli. However, few voxels in the brain exhibit such non-selective tuning. For instance, a voxel located in visual cortex will be more selective for the light than for the tone or the heat stimulus. A voxel in auditory cortex will be more selective for the tone than for the other two stimuli. A voxel in the somatosensory cortex will likely be more selective for the heat than the visual or auditory stimuli. How can we represent the tuning of these different voxels?

A simple way to model tuning to the stimulus conditions in an experiment is to multiply each column of the design matrix by a weight that modulates the BOLD signal according to the presence of the corresponding stimulus condition. For example, we could model a visual cortex voxel by weighting the first column of $X$ with a positive value, and the remaining two columns with much smaller values (or even negative values to model suppression). It turns out that we can model the selectivity of individual voxels simultaneously through a *Selectivity Matrix*, $\beta$. Each entry $\beta_{ij}$ is the amount that the $j$-th voxel (columns) is tuned to the $i$-th stimulus condition (rows). Given the design matrix and the selectivity matrix, we can then predict the BOLD signals $y$ of selectively-tuned voxels with a simple matrix multiplication:

$$y = X\beta$$

Keeping with our example experiment, let’s assume that we are modeling the selectivity of four different voxels: a strongly-tuned visual voxel, a moderately-tuned somatosensory voxel, a weakly tuned auditory voxel, and an unselective voxel that is very weakly tuned to all three stimulus conditions. We can represent the tuning of these four voxels with a selectivity matrix. Below we define a selectivity matrix that represents the tuning of these 4 theoretical voxels and simulate the evoked BOLD signals to our 3-stimulus experiment.

```matlab
% SIMULATE NOISELESS VOXELS' BOLD SIGNAL
% (ASSUMING VARIABLES FROM ABOVE STILL IN WORKSPACE)
y0 = X*beta;

figure;
subplot(211);
imagesc(beta); colormap hot;
axis tight
ylabel('Condition')
set(gca,'YTickLabel',{'Visual','Auditory','Somato.'})
xlabel('Voxel');
set(gca,'XTick',1:4)
title('Voxel Selectivity, \beta')

subplot(212);
plot(y0,'Linewidth',2);
legend({'Visual Voxel','Auditory Voxel','Somato. Voxel','Unselective'});
xlabel('Time (TRs)'); ylabel('BOLD Signal');
title('Activity for Voxels with Different Stimulus Tuning')
set(gcf,'Position',[100 100 750 540])
subplot(211); colorbar
```

The top subpanel in the simulation output visualizes the selectivity matrix defined for the four theoretical voxels. The bottom subpanel plots the columns of the matrix of voxel responses $y$. We see that the maximum response of the strongly-tuned visual voxel (plotted in blue) is larger than that of the other voxels, corresponding to the larger weight in the upper left of the selectivity matrix. Also note that the response for the unselective voxel (plotted in cyan) demonstrates the linearity property of the FIR model. The attenuated but complex BOLD signal from the unselective voxel results from the sum of small independent signals evoked by each stimulus.
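The linearity property can be verified directly. The sketch below is a toy example, not the simulation from the post: it uses a short made-up HRF and three hand-placed onsets, and checks that the response of an unselective voxel (equal weights on all columns of the design matrix) matches the response obtained by convolving the combined stimulus train with the HRF.

```matlab
% LINEARITY OF THE CONVOLUTION MODEL (ILLUSTRATIVE SKETCH)
% THE HRF VALUES AND ONSET TIMES BELOW ARE HYPOTHETICAL
h    = [0 .5 1 .5 0 -.2];   % SHORT MADE-UP HRF
nTRs = 20;

% STIMULUS MATRIX WITH ONE ONSET PER CONDITION
D = zeros(nTRs,3);
D(2,1)  = 1;                % CONDITION 1 ONSET
D(6,2)  = 1;                % CONDITION 2 ONSET
D(10,3) = 1;                % CONDITION 3 ONSET

% DESIGN MATRIX: EACH COLUMN OF D CONVOLVED WITH THE HRF
X = conv2(D,h');
X = X(1:nTRs,:);

% UNSELECTIVE VOXEL: EQUAL WEIGHT ON EVERY CONDITION
yUnselective = X*[1; 1; 1];

% SAME SIGNAL FROM CONVOLVING THE SUMMED STIMULUS TRAIN
yCombined = conv(sum(D,2),h');
yCombined = yCombined(1:nTRs);

maxDiff = max(abs(yUnselective - yCombined))
```

Because convolution is linear, summing the per-condition responses and convolving the summed stimulus train give the same signal, up to round-off.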

## Modeling Voxel Noise

The example above demonstrates how we can model BOLD signals evoked in noiseless theoretical voxels. Though this noiseless scenario is helpful for developing a modeling framework, real-world voxels exhibit variable amounts of *noise* (noise is any signal that cannot be accounted for by the FIR model). Therefore we need to incorporate a noise term into our BOLD signal model.

The noise in a voxel is often modeled as a random variable $\epsilon$. A common choice for the noise model is a zero-mean Normal/Gaussian distribution with some variance $\sigma^2$:

$$\epsilon \sim \mathcal{N}(0, \sigma^2)$$

Though the variance of the noise model may not be known a priori, there are methods for estimating it from data. We’ll get to estimating noise variance in a later post when we discuss various sources of noise and how to account for them using more advanced techniques. For simplicity, let’s just assume that the noise variance is 1 as we proceed.
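As a small sketch of this noise model (with sizes assumed to match the toy experiment: 80 TRs and 4 voxels), zero-mean Gaussian noise with standard deviation `sigma` can be drawn by scaling the output of `randn`:

```matlab
% DRAW ZERO-MEAN GAUSSIAN NOISE FOR EACH VOXEL (ILLUSTRATIVE SKETCH)
nTRs  = 80;   % # OF MEASUREMENTS (ASSUMED, AS IN THE TOY EXPERIMENT)
nVox  = 4;    % # OF VOXELS (ASSUMED, AS IN THE TOY EXPERIMENT)
sigma = 1;    % NOISE STANDARD DEVIATION (VARIANCE = sigma^2 = 1)

% ONE INDEPENDENT NOISE SAMPLE PER TR AND VOXEL
noise = sigma*randn(nTRs,nVox);

% SAMPLE STATISTICS SHOULD BE CLOSE TO 0 AND sigma^2
sampleMean = mean(noise(:))
sampleVar  = var(noise(:))
```

With only 320 samples the sample mean and variance will fluctuate around 0 and 1 from run to run.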

## Putting It All Together — The General Linear Model (GLM)

So far we have introduced the concepts of the stimulus matrix, the HRF, the design matrix, the selectivity matrix, and the noise model. We can combine all of these to compose a comprehensive quantitative model of BOLD signals measured from a set of voxels during an experiment:

$$y = X\beta + \epsilon$$

This is referred to as the **General Linear Model (GLM)**.

In a typical fMRI experiment the researcher controls the stimulus presentation $D$, and measures the evoked BOLD responses $y$ from a set of voxels. The problem then is to estimate the selectivities of the voxels based on these measurements. Specifically, we want to determine the parameters $\hat{\beta}$ that best explain the measured BOLD signals during our experiment. The most common way to do this is a method known as *Ordinary Least Squares (OLS) Regression*. Using OLS the idea is to adjust the values of $\hat{\beta}$ such that the predicted model BOLD signals $\hat{y} = X\hat{\beta}$ are as similar to the measured signals $y$ as possible. In other words, the goal is to infer the selectivity each voxel would have to exhibit in order to produce the measured BOLD signals. I showed in an earlier post that the optimal OLS solution for the selectivities is given by:

$$\hat{\beta} = (X^TX)^{-1}X^Ty$$

Therefore, given a design matrix and a set of voxel responses associated with the design matrix, we can calculate the selectivities of voxels to the stimulus conditions represented by the columns of the design matrix. This works even when the BOLD signals are noisy. To get a better idea of this process at work let’s look at a quick example based on our toy fMRI experiment.

## Example: Recovering Voxel Selectivity Using OLS

Here the goal is to recover the selectivities of the four voxels in our toy experiment after they have been corrupted with noise. First, we add noise to the voxel responses. In this example the amount of added noise is based on a concept known as *signal-to-noise ratio*, or **SNR**. As the name suggests, SNR is the ratio of the underlying signal to the noise “on top of” the signal. SNR is a very important concept when interpreting fMRI analyses. If a voxel exhibits a low SNR, it will be far more difficult to estimate its tuning. Though there are many ways to define SNR, in this example it is defined as the ratio of the maximum signal amplitude to the standard deviation of the noise model. The noise standard deviation is set to one-fifth of the maximum amplitude of the BOLD signal, i.e. an SNR of 5. Feel free to try different values of SNR by changing the value of the variable `SNR` in the Matlab simulation. Noisy versions of the 4 model BOLD signals are plotted in the top subpanel of the figure below. We see that the noisy signals are very different from the actual underlying BOLD signals.

Here we estimate the selectivities $\hat{\beta}$ from the GLM using OLS, and then predict the BOLD signals in our experiment with this estimate. We see in the bottom subpanel of the above figure that the resulting GLM predictions $\hat{y}$ are quite accurate. We also compare the estimated selectivity matrix $\hat{\beta}$ to the actual selectivity matrix $\beta$ below. We see that OLS is able to recover the selectivity of all the voxels.

    % SIMULATE NOISY VOXELS & ESTIMATE TUNING
    % (ASSUMING VARIABLES FROM ABOVE STILL IN WORKSPACE)

    SNR = 5; % (APPROX.) SIGNAL-TO-NOISE RATIO
    noiseSTD = max(y0(:))./SNR; % NOISE LEVEL FOR EACH VOXEL
    noise = bsxfun(@times,randn(size(y0)),noiseSTD);
    y = y0 + noise;

    betaHat = inv(X'*X)*X'*y % OLS
    yHat = X*betaHat; % GLM PREDICTION

    figure
    subplot(211);
    plot(y,'Linewidth',3);
    xlabel('Time (s)'); ylabel('BOLD Signal');
    legend({'Visual Voxel','Auditory Voxel','Somato. Voxel','Unselective'});
    title('Noisy Voxel Responses');

    subplot(212)
    h1 = plot(y0,'Linewidth',3); hold on
    h2 = plot(yHat,'-o');
    legend([h1(end),h2(end)],{'Actual Responses','Predicted Responses'})
    xlabel('Time (s)'); ylabel('BOLD Signal');
    title('Model Predictions')
    set(gcf,'Position',[100 100 750 540])

    figure
    subplot(211);
    imagesc(beta); colormap hot(5);
    axis tight
    ylabel('Condition')
    set(gca,'YTickLabel',{'Visual','Auditory','Somato.'})
    xlabel('Voxel');
    set(gca,'XTick',1:4)
    title('Actual Selectivity, \beta')

    subplot(212)
    imagesc(betaHat); colormap hot(5);
    axis tight
    ylabel('Condition')
    set(gca,'YTickLabel',{'Visual','Auditory','Somato.'})
    xlabel('Voxel');
    set(gca,'XTick',1:4)
    title('Noisy Estimated Selectivity')
    drawnow
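To get a feel for how much the SNR matters, here is a minimal, self-contained sketch that repeats the estimate at several noise levels and reports the error in the recovered selectivities. It does not use the workspace variables from the experiment above: the design matrix `X`, the selectivity matrix `beta`, and the list `SNRs` are hypothetical toy values chosen only for illustration. The solve uses Matlab's backslash operator, which returns the same least-squares solution as the explicit formula for a full-rank design matrix.

```matlab
% SKETCH: HOW ESTIMATION ERROR DEPENDS ON SNR
% (SELF-CONTAINED; X AND beta ARE HYPOTHETICAL TOY VALUES,
%  NOT THE DESIGN MATRIX FROM THE EXPERIMENT ABOVE)
rng(0);                                  % REPRODUCIBLE NOISE
nTime = 200; nCond = 3;
X     = rand(nTime,nCond);               % TOY DESIGN MATRIX
beta  = [eye(nCond), ones(nCond,1)/3];   % 3 SELECTIVE VOXELS + 1 UNSELECTIVE
y0    = X*beta;                          % NOISELESS RESPONSES

SNRs = [0.5 1 5 20];                     % SNR VALUES TO TRY
err  = zeros(size(SNRs));
for iS = 1:numel(SNRs)
    noiseSTD = max(y0(:))/SNRs(iS);      % SAME NOISE SCALING AS ABOVE
    y        = y0 + noiseSTD*randn(size(y0));
    betaHat  = X\y;                      % LEAST-SQUARES SOLVE
    err(iS)  = sqrt(mean((betaHat(:)-beta(:)).^2));   % RMS ERROR
    fprintf('SNR = %4.1f   RMS error = %.3f\n',SNRs(iS),err(iS));
end
```

Because the noise standard deviation scales as one over the SNR, the error in the estimated selectivities should shrink roughly in proportion as the SNR grows.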

## Wrapping Up

Here we introduced the GLM commonly used for fMRI data analyses and used the GLM framework to recover the selectivities of simulated voxels. We saw that the GLM is quite capable of recovering the selectivity in the presence of noise. However, there are a few details left out of the story.

First, we assumed that we had an accurate (indeed, exact) model for each voxel’s HRF. This is generally not the case. In real-world scenarios the HRF is either assumed to have some canonical shape, or the shape of the HRF is estimated from the experimental data. Though assuming a canonical HRF shape has been validated for block design studies of peripheral sensory areas, this assumption becomes dangerous when using event-related designs, or when studying other areas of the brain.

Additionally, we did not include any physiological noise signals in our theoretical voxels. In real voxels, the BOLD signal changes due to physiological processes such as breathing and heartbeat can be far larger than the signal change due to underlying neural activation. It then becomes necessary to either account for the nuisance signals in the GLM framework, or remove them before using the model described above. In two upcoming posts we’ll discuss these two issues: estimating the HRF shape from data, and dealing with nuisance signals.
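As a rough preview of the first option (accounting for nuisance signals inside the GLM), the sketch below appends one extra column to a toy design matrix. Everything in it is an assumption made for illustration: the block design, the single-voxel selectivity `beta`, and especially the nuisance signal, which is a simple linear drift standing in for a real physiological time course.

```matlab
% SKETCH: ABSORBING A NUISANCE SIGNAL WITH AN EXTRA REGRESSOR
% (ALL VALUES HYPOTHETICAL; A LINEAR DRIFT STANDS IN FOR A NUISANCE SIGNAL)
rng(1);
blockLen  = 10; nBlocks = 12;
nTime     = blockLen*nBlocks;
condOrder = repmat(1:3,1,nBlocks/3);            % CONDITIONS CYCLE 1,2,3,...
labels    = kron(condOrder,ones(1,blockLen))';  % CONDITION LABEL PER TIME POINT
X         = double(bsxfun(@eq,labels,1:3));     % TOY BLOCK DESIGN MATRIX
beta      = [1; 0; 0.5];                        % TRUE SELECTIVITY (ONE VOXEL)
drift     = linspace(-1,1,nTime)';              % NUISANCE TIME COURSE
y         = X*beta + 3*drift + 0.1*randn(nTime,1);

betaTask  = X\y;                                % TASK REGRESSORS ONLY
Xfull     = [X, drift];                         % TASK + NUISANCE REGRESSORS
betaFull  = Xfull\y;                            % LAST ENTRY IS THE DRIFT WEIGHT

errTask = norm(betaTask - beta);
errFull = norm(betaFull(1:3) - beta);
fprintf('Error without nuisance regressor: %.3f\n',errTask);
fprintf('Error with nuisance regressor:    %.3f\n',errFull);
```

In this toy setup the drift is correlated with the condition order, so leaving it out biases the selectivity estimates, while including it lets OLS assign the drift its own weight.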

## Derivation: Ordinary Least Squares Solution and Normal Equations

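In a linear regression framework, we assume some output variable $y$ is a linear combination of some independent input variables $X$ plus some independent noise $\epsilon$. The way the independent variables are combined is defined by a parameter vector $\beta$:

$$y = X\beta + \epsilon$$

We also assume that the noise term $\epsilon$ is drawn from a standard Normal distribution:

$$\epsilon \sim \mathcal{N}(0, I)$$

For some estimate of the model parameters $\hat{\beta}$, the model’s prediction errors/residuals $e$ are the difference between the observed output values and the model prediction:

$$e = y - X\hat{\beta}$$

The Ordinary Least Squares (OLS) solution to the problem (i.e. determining an optimal solution for $\hat{\beta}$) involves minimizing the sum of the squared errors with respect to the model parameters, $\hat{\beta}$. The sum of squared errors is equal to the inner product of the residuals vector with itself, $\sum_i e_i^2 = e^T e$:

$$e^T e = (y - X\hat{\beta})^T (y - X\hat{\beta}) = y^T y - 2\hat{\beta}^T X^T y + \hat{\beta}^T X^T X \hat{\beta}$$

To determine the parameters, $\hat{\beta}$, we minimize the sum of squared residuals with respect to the parameters by setting its derivative to zero:

$$\frac{\partial}{\partial \hat{\beta}}\left[e^T e\right] = -2X^T y + 2X^T X\hat{\beta} = 0 \quad\Longrightarrow\quad X^T X\hat{\beta} = X^T y$$

due to the identity $\frac{\partial\, a^T b}{\partial a} = b$, for vectors $a$ and $b$ (applied to each occurrence of $\hat{\beta}$ in the quadratic term, which gives $2X^T X\hat{\beta}$ because $X^T X$ is symmetric). This relationship is the matrix form of the Normal Equations. Solving for $\hat{\beta}$ gives the analytical solution to the Ordinary Least Squares problem:

$$\hat{\beta} = (X^T X)^{-1} X^T y$$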

Boom.
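If you would like to convince yourself numerically, here is a small sketch that solves the Normal Equations on made-up data and checks the two properties used in the derivation: the gradient of the sum of squared errors vanishes at the solution, and the residuals are orthogonal to the columns of the design matrix. The data, the sizes `n` and `p`, and the values in `betaTrue` are arbitrary assumptions for illustration only.

```matlab
% SKETCH: NUMERICAL CHECK OF THE NORMAL EQUATIONS (HYPOTHETICAL TOY DATA)
rng(2);
n = 50; p = 3;
X        = randn(n,p);                  % RANDOM FULL-RANK DESIGN MATRIX
betaTrue = [2; -1; 0.5];                % ARBITRARY "TRUE" PARAMETERS
y        = X*betaTrue + randn(n,1);     % STANDARD NORMAL NOISE

betaHat  = (X'*X)\(X'*y);               % SOLVE X'X*betaHat = X'y
e        = y - X*betaHat;               % RESIDUALS

gradSSE  = -2*X'*y + 2*(X'*X)*betaHat;  % GRADIENT OF e'e AT betaHat
fprintf('Norm of gradient: %.2e\n',norm(gradSSE));   % NUMERICALLY ZERO
fprintf('Norm of X''*e:     %.2e\n',norm(X'*e));      % RESIDUALS ORTHOGONAL TO X

sse     = @(b) sum((y - X*b).^2);       % SUM OF SQUARED ERRORS
bNudged = betaHat + 0.01*randn(p,1);    % SMALL RANDOM PERTURBATION
fprintf('SSE at betaHat: %.4f, after nudging: %.4f\n',sse(betaHat),sse(bNudged));
```

Both norms should come out at the level of floating-point round-off, and nudging the solution in any direction can only increase the sum of squared errors.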