[pymvpa] Class weighting in unbalanced EEG data

Wed Apr 27 20:07:03 UTC 2016

Hello,

I am running an EEG experiment in which one of two images is presented on the screen while the participant's neural activity is recorded. I am now writing a script that takes these recordings and tries to decode, from the EEG data, which image was displayed. I have had success with similar scripts using PyMVPA's PLR, but in those cases the classes were balanced, whereas in the current experiment the class ratio is closer to 1:4. My initial approach was to use asymmetric error weights, penalizing misclassification of the less numerous class more heavily than misclassification of the dominant class. I understand there are other ways to handle unbalanced classes, such as over- or undersampling, or using a classifier that is more robust to imbalance, such as random forests, but weighting seemed the cleanest place to start.
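The two simplest remedies mentioned above can be sketched in a few lines of NumPy. The weight heuristic, n_samples / (n_classes * count), is the one scikit-learn documents for class_weight='balanced'; assuming a 1:4 split of 200 trials (40 vs. 160), it gives weights of 2.5 and 0.625, a 4:1 ratio that mirrors the imbalance. The undersampling helper draws, without replacement, as many trials from each class as the smallest class has. Both function names are illustrative and are not part of PyMVPA or scikit-learn.

```python
import numpy as np


def inverse_frequency_weights(y):
    """Map each class label to n_samples / (n_classes * class_count)."""
    labels, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(labels) * counts)
    return dict(zip(labels.tolist(), weights.tolist()))


def undersample_indices(y, seed=0):
    """Sorted indices of a balanced subset: every class cut to the smallest class size."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    labels, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    # Draw n_min trials per class without replacement.
    keep = [rng.choice(np.flatnonzero(y == lab), size=n_min, replace=False)
            for lab in labels]
    return np.sort(np.concatenate(keep))


# Hypothetical labels: 40 minority (1) and 160 majority (0) trials, a 1:4 split.
y = np.array([1] * 40 + [0] * 160)
print(inverse_frequency_weights(y))   # {0: 0.625, 1: 2.5}
print(len(undersample_indices(y)))    # 80
```

Undersampling discards 120 of the 200 trials here, which is why weighting looks attractive when trials are expensive to collect.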

I tried the class_weight parameter in scikit-learn, following this tutorial: http://scikit-learn.org/stable/auto_examples/svm/plot_separating_hyperplane_unbalanced.html. It seemed to work well on the tutorial's data, but made no difference on my EEG data. When I increased the number of dimensions of the tutorial's sample data (changing the 2 in n_samples_1, 2), the difference between the predictions of the weighted and the unweighted classifier diminished; by the time I reached 7 features per training sample, the predictions were identical. Does this parameter simply not work well in many dimensions? My EEG data has around 200 trials (samples), and each sample has 180 data points in each of 20 electrode channels, so my training data has shape 200 × 3600.
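One hypothesis worth checking (an assumption, not a diagnosis): with 3600 features and only about 200 trials, a linear classifier can typically fit every training trial correctly, even when the labels carry no information at all. In that regime a soft-margin SVM may incur few training errors for the class weights to penalize. The NumPy sketch below tests this on random data with the same shape as the EEG matrix and a roughly 1:4 label split. A minimum-norm least-squares fit stands in for the SVM as a simple way to find a separating hyperplane; the function name and arguments are illustrative.

```python
import numpy as np


def training_set_separable(n_samples=200, n_features=3600, minority_frac=0.2, seed=0):
    """Return True if a linear fit reproduces every training label.

    Features and labels are random (no real signal); only the *shape*
    of the EEG data is mimicked, not its content.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    # +1 for the minority class, -1 for the majority class (about 1:4).
    y = -np.ones(n_samples)
    y[: int(round(minority_frac * n_samples))] = 1.0
    rng.shuffle(y)
    # Append a bias column, then solve X w = y in the least-squares sense;
    # with more columns than rows, lstsq returns the minimum-norm exact fit.
    Xb = np.hstack([X, np.ones((n_samples, 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return bool(np.all(np.sign(Xb @ w) == y))


print(training_set_separable())               # True: 3600 features, 200 trials
print(training_set_separable(n_features=5))   # False: too few features to fit noise
```

If the real EEG training folds are separable in the same way, that would be consistent with the weights having little effect, though it does not by itself prove that this is the cause.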

I thought the weight parameter of LinearCSVMC was exactly what I was looking for (http://www.pymvpa.org/generated/mvpa2.clfs.svm.LinearCSVMC.html#mvpa2.clfs.svm.LinearCSVMC); however, when I tried it, even on toy data, I saw no difference, similar to the issue in https://www.mail-archive.com/pkg-exppsy-pymvpa@lists.alioth.debian.org/msg02918.html. Some excerpts from my script follow. Am I just overlooking something?
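For reference, libsvm documents its per-class weight option as setting the C of class i to weight_i * C. Assuming LinearCSVMC forwards weight/weight_label to that option (with the libsvm backend), the sketch below writes out the resulting primal objective for a linear soft-margin SVM. The weights only multiply hinge-loss terms, so a training point lying outside the margin contributes nothing, whatever its class weight. The function is an illustration of the objective, not PyMVPA or libsvm code.

```python
import numpy as np


def weighted_svm_objective(w, b, X, y, C, class_weight):
    """Primal soft-margin linear SVM objective with per-class penalties.

    y must contain +1/-1 labels; class_weight maps a label to its multiplier
    on C (class i uses class_weight[i] * C; missing labels default to 1.0).
    """
    w = np.asarray(w, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)  # zero for points outside the margin
    per_sample_c = C * np.array([class_weight.get(int(label), 1.0) for label in y])
    return 0.5 * float(w @ w) + float(np.sum(per_sample_c * hinge))


# One +1 point inside the margin (margin 0.5), one -1 point well outside it.
X = [[0.5], [-2.0]]
y = [1, -1]
print(weighted_svm_objective([1.0], 0.0, X, y, C=1.0, class_weight={}))        # 1.0
print(weighted_svm_objective([1.0], 0.0, X, y, C=1.0, class_weight={1: 4.0}))  # 2.5
```

Moving the +1 point to 2.0 puts both points outside the margin, and the objective is 0.5 with or without the weight: the weighting has nothing to act on.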

Thanks,
Ben

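import numpy as np
import mvpa2.suite as M

# X: samples x features array (200 x 3600); y: one target label per sample;
# numFeatures and numFolds are set elsewhere in the script.

# weight[i] applies to the class named in weight_label[i].
wclf = M.LinearCSVMC(weight=[2, 8], weight_label=[True, False])
# Keep the numFeatures features with the highest one-way ANOVA F-scores
# (computed on the training part of each fold), then train the weighted SVM.
wanovaSelectedClassifier = M.FeatureSelectionClassifier(
            wclf,
            M.SensitivityBasedFeatureSelection(
                        M.OneWayAnova(),
                        M.FixedNElementTailSelector(numFeatures, mode='select', tail='upper')
            )
)
# Cross-validate, leaving one chunk out per fold.
wfoldwiseCvedAnovaSelectedSVM = M.CrossValidation(
            wanovaSelectedClassifier,
            M.NFoldPartitioner(),
            enable_ca=['samples_error','stats', 'calling_time','confusion']
)
# Assign samples to numFolds chunks in round-robin order.
dataset = M.dataset_wizard(samples=X, targets=y, chunks=np.mod(np.arange(0,len(X)),numFolds))
wresults = wfoldwiseCvedAnovaSelectedSVM(dataset)
print(wfoldwiseCvedAnovaSelectedSVM.ca.stats.as_string())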
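With a 1:4 ratio, a classifier that always predicts the majority class already reaches 80% overall accuracy, so comparing weighted and unweighted runs on overall accuracy alone can hide whether the minority class changed at all. A quick independent check, assuming the true and predicted labels can be collected from each fold, is to compare per-class recall and its mean (balanced accuracy). The helpers below are illustrative NumPy code, not PyMVPA functions.

```python
import numpy as np


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return {lab: float(np.mean(y_pred[y_true == lab] == lab))
            for lab in np.unique(y_true).tolist()}


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls; 0.5 is chance level for two classes."""
    return float(np.mean(list(per_class_recall(y_true, y_pred).values())))


y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]   # always predicts the majority class
print(per_class_recall(y_true, y_pred))    # {0: 1.0, 1: 0.0}
print(balanced_accuracy(y_true, y_pred))   # 0.5
```

If the weights are having an effect, minority-class recall should rise (usually at some cost to majority-class recall) relative to the unweighted run.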
