
Procedures for Running a Support Vector Machine on fMRI Data

Support Vector Machines Theory

Outside sources

Summary

Black Box View

The purpose of a Support Vector Machine (SVM) is to predict which class a test set of data belongs to, based on the characteristics of the training data it has already seen.

This process is shown in the figure below.

From the training data, the SVM creates a high-dimensional space by treating each voxel as a dimension. The SVM then calculates a hyperplane that separates the data into distinct classes. When it is given the test data, it determines which side of the hyperplane each test vector lies on.
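The decision step described above can be sketched in a few lines. This is only an illustration of the decision rule, not of SVM training; `w` and `b` are assumed to come from an already-trained model:

```python
def svm_predict(w, b, x):
    """Report which side of the hyperplane w.x + b = 0 the vector x
    lies on: +1 for one class, -1 for the other. In the fMRI case,
    each component of x is one voxel's value."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Toy 2-D example: the hyperplane is the vertical axis (w = [1, 0], b = 0).
print(svm_predict([1.0, 0.0], 0.0, [2.0, 5.0]))   # 1
print(svm_predict([1.0, 0.0], 0.0, [-2.0, 5.0]))  # -1
```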

In this case, we use the SVM to make binary decisions between pairs of classes. Accuracy was quite unfavorable when we attempted to use various multi-class SVM software packages, many of which simply reduce the multi-class problem to a one-vs-rest decision for each class. Instead, we choose to be more specific and carry out our analysis through pairwise decisions between each pair of classes. This method requires a large amount of processing power and hard drive space while the analysis is running; after the results are obtained, temporary files are deleted to conserve space.
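The pairwise (one-vs-one) scheme described above can be sketched as follows. The `decide` callback stands in for a trained binary SVM; the voting rule shown here (most pairwise wins) is one common way to combine the pairwise decisions, not necessarily the exact aggregation our scripts use:

```python
from itertools import combinations

def pairwise_vote(classes, decide):
    """Run a binary decision for every pair of classes and return
    the class that wins the most pairwise comparisons.
    decide(a, b) must return the winner of the a-vs-b decision."""
    wins = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        wins[decide(a, b)] += 1
    return max(wins, key=wins.get)

# Toy decider: the lexicographically later class always "wins".
print(pairwise_vote(["faces", "houses", "tools"], max))  # tools
```

With k classes this scheme trains and evaluates k*(k-1)/2 binary classifiers, which is where the processing and disk-space cost mentioned above comes from.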

In the ideal case, where all the data lies neatly on one side or the other of a hyperplane, the following Lagrangian is used to compute the equation of the hyperplane. This formula applies only to the linear-hyperplane case, but it can be expanded to cover higher-degree decision surfaces.

The following figures are from (3). This yields the following picture.

To find the hyperplane, one optimizes the Lagrangian equation with respect to w, subject to the Karush-Kuhn-Tucker (KKT) conditions listed below. For a higher-order decision surface, these equations are expanded using similar partial derivatives.

Not all data fits neatly into the high-dimensional space, however. As a result, error terms must be incorporated into the Lagrangian, which then takes the form shown below; these variables are defined in the more realistic figure. Because the Lagrangian has changed, the KKT conditions must also be modified accordingly.

It is unrealistic for a human researcher to compute all of these partial derivatives for all of the training and test data. That is why software packages have already been created to do the necessary computations.
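The equations referenced above appeared as figures on the original page and are not reproduced here. In standard notation (a reconstruction of the textbook SVM formulation, not necessarily the exact figures), they are:

```latex
% Separable (hard-margin) case: primal Lagrangian
L_P = \tfrac{1}{2}\lVert w \rVert^2
      - \sum_i \alpha_i \bigl[ y_i (w \cdot x_i + b) - 1 \bigr]

% KKT conditions (hard margin)
\frac{\partial L_P}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i,
\qquad
\frac{\partial L_P}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0,
\qquad
\alpha_i \ge 0,
\qquad
\alpha_i \bigl[ y_i (w \cdot x_i + b) - 1 \bigr] = 0

% Non-separable (soft-margin) case: slack variables \xi_i and cost C
L_P = \tfrac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i
      - \sum_i \alpha_i \bigl[ y_i (w \cdot x_i + b) - 1 + \xi_i \bigr]
      - \sum_i \mu_i \xi_i

% The KKT conditions change accordingly; in particular
0 \le \alpha_i \le C, \qquad \mu_i \xi_i = 0
```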

How to Run the MATLAB Scripts and What You Need to Use Them

There is now a completely automated MATLAB script for running SVM with Voxbo. All variables are defined in a prep file and can be changed easily for any purpose.

  • svmprep
  • svmrun

Our SVM script relies heavily on our various utility functions, which in turn require SPM and SVMlight.

We are using a modified version of SVMlight which does a balanced “hold two out” (h2o) accuracy computation instead of the default “hold one out”. To build it, you will need this svml code.
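The balanced "hold two out" scheme means each validation fold holds out one exemplar from each of the two classes, trains on the rest, and tests on the held-out pair. svm_learn_h2o does this internally; the sketch below only illustrates the resampling scheme, and the function name is ours, not part of SVMlight:

```python
from itertools import product

def hold_two_out_folds(class_a, class_b):
    """Enumerate balanced hold-two-out folds: each fold holds out
    exactly one exemplar from each class, so every test pair is
    balanced. With m and n exemplars this yields m * n folds."""
    for a, b in product(range(len(class_a)), range(len(class_b))):
        train = ([x for i, x in enumerate(class_a) if i != a],
                 [x for j, x in enumerate(class_b) if j != b])
        test = (class_a[a], class_b[b])
        yield train, test

# 3 exemplars of class A, 2 of class B -> 3 * 2 = 6 folds.
folds = list(hold_two_out_folds([1, 2, 3], [4, 5]))
print(len(folds))  # 6
```

The accuracy reported is then the fraction of folds in which both held-out exemplars are classified correctly, which avoids the class-imbalance bias that a plain "hold one out" can introduce.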

Data Prep

You will need a CUB for each stimulus-exemplar. For example, if you have 5 subjects, 16 stimuli, and 5 exemplars per stimulus, you will have a total of 400 CUBs; read creating_direct_effect_and_training_cubs for details.

The svmprep script defines the subjects, ROIs, and pathnames for your experiment, and calls svmrun. You should not need to modify the svmrun m-file.

SVMRUN

What does SVMRUN do?

  1. load all the training cubes and write a training file for each pair
  2. submit all the training jobs to Voxbo
    • vbbatch is used to distribute svm_learn_h2o processes across the cluster
  3. wait for Voxbo to finish
  4. create per-subject and average w-maps
    • These are maps of the brain with the regions of high discrimination highlighted. The voxels displayed are those with the largest w values, and therefore the ones that differ most between the two classes being discriminated.
  5. parse the output from svm_learn_h2o to obtain “hold two out” accuracies
  6. clean up temporary files
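Step 4 above (the average w-map) amounts to a voxelwise mean across subjects. A minimal sketch, with illustrative names rather than the actual svmrun variables, and plain lists standing in for CUB volumes:

```python
def average_w_map(subject_maps):
    """Voxelwise mean of per-subject w-maps. Each map is a flat
    list of per-voxel w values; all maps must be in the same
    (normalized) space so that voxels correspond across subjects."""
    n = len(subject_maps)
    return [sum(vals) / n for vals in zip(*subject_maps)]

# Two subjects, three voxels each:
avg = average_w_map([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
print(avg)  # [2.0, 3.0, 4.0]
```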

References and Acknowledgments

  1. Joachims, T., Learning to Classify Text Using Support Vector Machines. Dissertation, Kluwer, 2002.
  2. SVMlight is a software package created by Thorsten Joachims.
public/support_vector_machines.txt · Last modified: 2016/07/06 18:20 by malhotra