The general linear model is a univariate statistical approach that relates a set of predictor covariates (a model) to a data vector using least-squares estimation. The parameters and errors that result are the basis of such common statistics as the t-test, Analysis of Variance (ANOVA), Analysis of Covariance (ANCOVA), regression analysis, and many multivariate tests. The GLM may be expressed as:
Y = Gβ + ε
Where Y is a vector of n data points, G is an n x m design matrix, β is a vector of m parameters, and ε is a vector of n errors. The creation of a GLM analysis of fMRI data in VoxBo centers upon the development of the design matrix, G.
There are several excellent texts that introduce the concepts of linear analysis in general, and several journal articles that consider the application of these methods to functional imaging data (Friston et al., 1995).
Within IDL, this expression solves for β given Y and G:
Betas = (invert(transpose(G)##G)##transpose(G))##Y Residuals = Y-G##Betas ErrorSquared = total(transpose(Residuals)##Residuals)
The corresponding MATLAB code is:
function [Betas,ErrorSq,Residuals] = getBetas(G,Vec) % [Betas,ErrorSq,Residuals] = getBetas(G,Vec) % % given a G matrix and a vector, gives the betas % fitting the G matrix to that vector, and the errorsq % % 2007 ddrucker@psych.upenn.edu adapted from gka OrderG=length(Vec); Fac1 = (inv(G'*G)*G'); Betas = Fac1*Vec; R=(-1).*(G*Fac1); R((([1:OrderG]-1)*(OrderG+1))+1)=1+R((([1:OrderG]-1)*(OrderG+1))+1); %matlab starts indexing at 1 Residuals=R*Vec; ErrorSq=(Residuals'*Residuals);
One of the assumptions of the GLM is that the data are independent (i.e., uncorrelated) under the null-hypothesis. There is good evidence that fMRI data do not satisfy this assumption (i.e., they are “colored”) (Zarahn et al., 1997a). As a consequence, parametric (Aguirre et al., 1997; Zarahn et al., 1997a) and non-parametric (Aguirre et al., 1998a) analysis methods that assume independence do not provide adequate control of the false-positive rate. In 1995, Worsley and Friston advocated the use of a modified GLM that can accommodate serial correlation in the error terms (Worsley and Friston, Analysis of fMRI Time-Series Revisited - Again, 1995). This is the statistical approach implemented in VoxBo.
The first step is to prepare the dependent data for analysis by pre-processing. In most cases, this involves converting the raw scanner files into TES files and performing a series of processing steps (e.g., distortion correction, realignment). If data are to be pooled across subjects using a fixed-effects analysis, conversion of the TES files to a standard spatial frame is necessary. Also, it may be desirable to smooth the dependent data in space prior to analysis.
The second step (usually) is to create condition functions. These text files contain a list of values that code the state of the experiment at each data point (image). As is discussed below, these files may be created for each TES file or a composite condition function may be made.
The third step is to start IDL and use the GLM software to design a modified GLM. There are six primary steps here, including the selection of a location for the analysis, identification of the dependent data, design of the G matrix, defining endogenous filtering and defining an intrinsic noise model. These steps, in particular the design of the G matrix, are described in some detail below.
The fourth step begins when the analysis job is submitted to the VoxBo job scheduler. The analysis will proceed automatically from here and you will receive e-mail when it is complete. You may monitor the status of your analysis job using the VoxQ job tool.
In general, TES files contain the four-dimensional (three in space, one in time) data that are acquired from the scanner. However, the GLM only ever considers a single-time series at a time (i.e., it is a univariate statistic). Generally, an analysis of a series of TES files involves obtaining the parameter estimates and residual error for the time-series from each point (voxel) in the brain. Notably, a given model can be applied to any vector of data of the appropriate length. Thus, one might also average together the time-series data over several voxels (perhaps within an a priori region of interest) and use this average fMRI signal as the dependent data.
In the process of conducting a GLM analysis, VoxBo places a number of files within the directory named by the user. Here is a thumb-nail sketch of the contents of a GLM directory, listed roughly in the order that they are created:
File | Description |
---|---|
.sub | A text file that lists the full path(s) to the TES file(s) that contain the dependent data for the analysis. Update this file if you move your data to another location. |
.ref | The concatenated condition function. This text file contains as many values as there are observations in the analysis (i.e., order G). Each value codes for the experimental state during that observation (e.g., flash of light, ITI, button press, etc.). This file is used, for example, to determine the behavior of trial-averaging. |
.G | The design matrix. This is a two-dimensional array with the number of columns equal to the number of independent variables in the design, and the number of rows equal to the number of observations in the dependent data. The text header of this file contains a list of the parameters. |
.preG | A high-temporal resolution version of the “G” design matrix. This is the internal representation used by the “Design G Workshop” software, with a temporal resolution of (usually) 100msecs. This representation is down-sampled to produce the G matrix prior to its use in analysis. |
.ExoFilt | Time-domain representation of the exogenous filtering to be performed upon data. This may include high and low pass notch filters, and/or a representation of the IRF. Only the magnitude component is used (i.e., this kernel should contain no phase information). |
.IntrinCor | Time-domain representation of intrinsic noise under the null hypothesis (e.g. fft1 of the square root of the 1/f power spectrum). The mean of the signal must be zero. All phase information in the signal will be discarded. |
.K | The identity matrix, convolved with a model of 1) intrinsic temporal autocorrelation under the null-hypothesis and 2) any exogenous smoothing to be applied to the data. |
.KG | The G matrix times the K matrix. It contains covariates that reflect both endogenous and exogenous smoothing. Note that the G matrix is de-convolved by the representation of the intrinsic noise model prior to the creation of KG. KG=K•G |
.V | The variance-covariance structure of the errors. V=K•KT |
.F1, .F3 | Factors of the covariance matrix of the parameter estimates. F1 is the pseudo-inverse* of KG. F1=(KGT•KG)-1•KGT. F3=V•KG•(KGT•KG)-1 |
.R | The residual forming matrix. R=I-(KG•F1) (where I is the identity matrix) |
.RV, .RVRV | Matrices related to the variance-covariance structure of the residuals. These are temporary files that are automatically deleted when the analysis is complete. RV=R•V. RVRV=RV•RV |
.traces | Scalars that contribute to the calculation of the effective degrees of freedom of the analysis. They are used to render the sum-of-squares of the residuals an unbiased estimate of the true error. TraceRV=trace(RV). TraceRVRV=trace(RVRV) where trace(•) indicates the sum of the diagonal elements |
.prm | The parameter estimates (and error term of the model) for the parameters of interest for each voxel in the data set. This file, which is four-dimensional, is effectively a TES file. |
* Note that Matlab has a pseudo-inverse function, pinv, which calculates the Moore-Penrose pseudo-inverse of a matrix. This is not the same pseudo-inverse.
Within the modified GLM, a t-statistic is calculated as follows (in IDL):
Betas = (invert(transpose(G)##G)##transpose(G))##Y Residuals = Y-G##Betas ErrorSquared = total(transpose(Residuals)##Residuals) Fact = float(ContrastArray##F1##F3##transpose(ContrastArray)) Fact = Fact(0) ScaledError = sqrt(ErrorSquared*Fact) tval = (ContrastArray##transpose(Betas)) / ScaledError
where ContrastArray
is a vector describing how the beta values are to be combined for the test of interest.