[pymvpa] Class weighting in unbalanced EEG data

Thu Apr 28 18:39:43 UTC 2016

Thank you for the quick reply, Yaroslav. I substituted in GNB at your suggestion, but it didn't perform quite as well as the LinearCSVMC classifier I was using without class weights. What kind of data have you been using it for? Perhaps it isn't so well suited to EEG data.

________________________________
From: Yaroslav Halchenko <yoh at onerussian.com>
Sent: 27 April 2016 21:41:52
To: Development and support of PyMVPA; Ben McCartney
Subject: Re: [pymvpa] Class weighting in unbalanced EEG data

Quick one for now
Did you try GNB? It has a few weighting schemes, and the Laplacian prior, IIRC, worked quite well for me on a dataset with a sample ratio of 100 to 1.
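
For readers unfamiliar with the weighting schemes mentioned here, the following is a rough sketch of how class priors could be derived from sample counts. It assumes three schemes named "uniform", "ratio" and "laplacian_smoothing", which is how I recall the `prior` argument of PyMVPA's GNB being documented; the formulas and the helper name `class_priors` are illustrative assumptions, so check the GNB documentation before relying on them.

```python
from collections import Counter


def class_priors(targets, scheme="laplacian_smoothing"):
    """Return {label: prior} under one of three assumed weighting schemes."""
    counts = Counter(targets)
    n_classes = len(counts)
    n_total = len(targets)
    if scheme == "uniform":
        # Every class gets the same prior, regardless of its sample count.
        return {c: 1.0 / n_classes for c in counts}
    if scheme == "ratio":
        # Prior equals the observed class frequency.
        return {c: n / float(n_total) for c, n in counts.items()}
    if scheme == "laplacian_smoothing":
        # Add-one smoothing: pulls the priors slightly toward uniform.
        return {c: (n + 1.0) / (n_total + n_classes) for c, n in counts.items()}
    raise ValueError("unknown scheme: %r" % scheme)


# Hypothetical 100-to-1 dataset, matching the ratio quoted above.
targets = ["rare"] * 1 + ["common"] * 100
for scheme in ("uniform", "ratio", "laplacian_smoothing"):
    print(scheme, class_priors(targets, scheme))
```

With a 100-to-1 split, the smoothed prior for the rare class is roughly twice its raw frequency, so the effect of smoothing is largest exactly when one class has very few samples.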

On April 27, 2016 4:07:03 PM EDT, Ben McCartney <bmccartney06 at qub.ac.uk> wrote:
Hello,

I am running an EEG experiment which involves presenting one of two images on the screen and recording the participant's neural activation. Right now I'm writing a script to take these recordings and attempt to decode from the EEG data which image was displayed. I have had success with similar scripts using PyMVPA's PLR, but in those cases the classes were balanced, whereas in the current experiment the ratio is closer to 1:4. The initial approach I had in mind was to use asymmetric error weights, penalizing misclassification of the less numerous class more heavily than misclassification of the dominant class. I understand that there are various other approaches to handling unbalanced classes, such as over/undersampling or using a classifier that is more robust in this regard, such as random forests, but this seemed to me the cleanest way to start.
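
As a concrete illustration of asymmetric error weights, here is a minimal sketch that derives per-class weights inversely proportional to class frequency. The formula n_samples / (n_classes * class_count) is the heuristic scikit-learn uses for `class_weight='balanced'`; the helper name `balanced_class_weights`, the labels, and the 40/160 split (200 trials at 1:4) are assumptions made for illustration only.

```python
from collections import Counter


def balanced_class_weights(targets):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(targets)
    n_samples = len(targets)
    n_classes = len(counts)
    return {label: n_samples / float(n_classes * count)
            for label, count in counts.items()}


# Hypothetical split: 40 trials of the rarer image, 160 of the common one.
targets = ["rare"] * 40 + ["common"] * 160
weights = balanced_class_weights(targets)
print(weights)
```

The resulting weights differ by a factor of four, mirroring the class ratio, so an error on a rare-class trial costs as much as four errors on common-class trials.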

I tried the class weight parameter in SKLearn as per this tutorial: http://scikit-learn.org/stable/auto_examples/svm/plot_separating_hyperplane_unbalanced.html. This seemed to work well for the tutorial's data, but made no difference when I used my EEG data. On increasing the number of dimensions in the sample data (changing the 2 in n_samples_1, 2), I noticed the difference between the predictions of the classifier with weights and the classifier without weights diminish. By the time I had reached 7 features per training sample, the predictions were the same. Does this parameter just not work well with many dimensions? My EEG data has around 200 trials (samples), and each sample has 180 datapoints in each of 20 electrode channels, so my training data is shaped 200 x 3600.
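
One possible explanation, offered as a sketch rather than a diagnosis: in a C-SVM, class weights rescale the penalty on margin violations for each class, and leave the regularization term alone. With far more features than samples, the training set is very likely linearly separable, and if the fitted solution has no margin violations, the weights multiply zeros and cannot change anything. The toy code below assumes labels coded as +1/-1 and made-up decision values; `weighted_hinge_penalty` is a hypothetical helper, not a library function.

```python
def weighted_hinge_penalty(decision_values, labels, class_weight, C=1.0):
    """Sum of C * w[label] * max(0, 1 - label * f(x)) over all samples."""
    total = 0.0
    for f, y in zip(decision_values, labels):
        total += C * class_weight[y] * max(0.0, 1.0 - y * f)
    return total


labels = [+1, -1, -1, -1, -1]                # one rare (+1) and four common (-1) samples
overlapping = [0.3, -1.5, 0.2, -2.0, -1.1]   # two samples violate the margin
separable = [1.2, -1.5, -1.0, -2.0, -1.1]    # every sample is at or beyond the margin

unweighted = {+1: 1.0, -1: 1.0}
upweighted = {+1: 4.0, -1: 1.0}              # penalize errors on the rare class 4x

for name, f in (("overlapping", overlapping), ("separable", separable)):
    plain = weighted_hinge_penalty(f, labels, unweighted)
    weighted = weighted_hinge_penalty(f, labels, upweighted)
    print(name, plain, weighted)
```

In the overlapping case the two penalties differ, so the optimizer would be pushed toward a different hyperplane. In the separable case both penalties are zero, which would be consistent with the weighted and unweighted classifiers agreeing once the number of features grows.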

I thought the weight parameter described at http://www.pymvpa.org/generated/mvpa2.clfs.svm.LinearCSVMC.html#mvpa2.clfs.svm.LinearCSVMC was exactly what I was looking for. However, when I tried to use it, even with toy data, I saw no difference, similar to the issue in https://www.mail-archive.com/pkg-exppsy-pymvpa@lists.alioth.debian.org/msg02918.html. Some excerpts from my script follow. Am I just overlooking something?

Thanks,
Ben

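# Imports assumed from the names used in this excerpt
import numpy as np
import mvpa2.suite as M
from mvpa2.clfs.svm import LinearCSVMC

# X (trials x features), y (boolean targets), numFeatures and numFolds
# are defined elsewhere in the script.

# Linear SVM with per-class weights: 2 for label True, 8 for label False
wclf = LinearCSVMC(weight=[2, 8], weight_label=[True, False])
# Keep the numFeatures highest-scoring features (one-way ANOVA) before training
wanovaSelectedClassifier = M.FeatureSelectionClassifier(
            wclf,
            M.SensitivityBasedFeatureSelection(
                        M.OneWayAnova(),
                        M.FixedNElementTailSelector(numFeatures, mode='select', tail='upper')
            )
)
# Cross-validate, leaving out one chunk per fold
wfoldwiseCvedAnovaSelectedSVM = M.CrossValidation(
            wanovaSelectedClassifier,
            M.NFoldPartitioner(),
            enable_ca=['samples_error','stats', 'calling_time','confusion']
)
# Assign trials to numFolds chunks in round-robin order
dataset = M.dataset_wizard(samples=X, targets=y, chunks=np.mod(np.arange(0,len(X)),numFolds))
wresults = wfoldwiseCvedAnovaSelectedSVM(dataset)
print(wfoldwiseCvedAnovaSelectedSVM.ca.stats.as_string())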

