[pymvpa] Unbalanced Datasets
Bill Broderick
billbrod at gmail.com
Fri Apr 17 15:37:49 UTC 2015
Hi all,
I'm trying to run MVPA on a classification problem with unbalanced labels,
and I'm not sure of the best way to proceed. Originally, we were classifying
on a subject-by-subject basis with leave-one-run-out cross-validation,
spherical searchlights, and a linear SVM (PyMVPA's LinearCSVMC
<http://www.pymvpa.org/generated/mvpa2.clfs.svm.LinearCSVMC.html>).
However, some runs have no trials in the rare category, and the classes are
very unbalanced overall. We decided to switch to a leave-one-subject-out
cross-validation scheme to account for this, as every subject has at least
some trials in the rarer category. The categories are still unbalanced, so
we want to account for this in some way, to prevent the classifier from
defaulting to the common category.
Our first idea was to simply weight the error signal used by the linear SVM
in a proportional manner, such that, if there are 3 times as many trials in
the common category, the error signal in the rare category would be three
times stronger. However, it's unclear whether this is possible within PyMVPA.
The documentation page for LinearCSVMC lists the weight and weight_label
parameters, which look like they may be what I'm looking for, but I
cannot find any examples of how to use them.
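To make the proportional weighting concrete, here is a minimal sketch in
plain Python. The helper name class_weights is made up for illustration and
is not part of PyMVPA, and passing its output as the weight and weight_label
parameters is an untested assumption.

```python
from collections import Counter


def class_weights(targets):
    """Weight each class by (size of largest class) / (size of this class).

    Hypothetical helper, not part of PyMVPA.
    """
    counts = Counter(targets)
    largest = max(counts.values())
    return {label: largest / float(n) for label, n in counts.items()}


# Toy labels: three times as many 'common' trials as 'rare' ones.
targets = ['common'] * 30 + ['rare'] * 10
weights = class_weights(targets)
# weights['common'] is 1.0 and weights['rare'] is 3.0

# Assumption: labels and weights would be handed over as parallel lists
# (weight_label=labels, weight=values). This is untested against PyMVPA,
# and the underlying SVM library may require numeric labels.
labels = sorted(weights)
values = [weights[label] for label in labels]
```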
Additionally, I came across this
<http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2638552/> PyMVPA paper which,
under the MEG section, near footnotes 23 and 24, discusses changing the
SVM's C parameter for the different classes, scaling it with respect to the
number of samples in each class as a way to deal with the imbalance. This
sounds similar, but I am not sure how to implement this either, nor am I
sure what the differences between the two methods would be.
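As a sketch of the per-class C idea, assume the scaling is inversely
proportional to class size, so that each class k gets its own cost
C_k = C * n_max / n_k. This rule is an assumption for illustration and is
not taken from the paper; the function name and the base value of C are
hypothetical.

```python
def per_class_c(base_c, class_counts):
    """Scale a base C per class, inversely to that class's sample count.

    Hypothetical helper; the scaling rule is an assumption, not taken
    from the paper or from PyMVPA.
    """
    largest = max(class_counts.values())
    return {label: base_c * largest / float(n)
            for label, n in class_counts.items()}


# Toy counts: 30 common trials and 10 rare trials, with base C = 1.0.
costs = per_class_c(1.0, {'common': 30, 'rare': 10})
# costs['common'] is 1.0 and costs['rare'] is 3.0
```

Under this assumed rule, a misclassified rare trial is penalized three times
as heavily as a common one, which is the same effect as the proportional
error weighting described earlier.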
This message thread
<http://lists.alioth.debian.org/pipermail/pkg-exppsy-pymvpa/2012q3/002196.html>
from this mailing list in 2012 discusses a similar issue, but I would
prefer to start with weighting the different classes instead of changing
the sampling of trials.
So my main question is, what's the best way to implement this kind of
correction for unbalanced classes?
Thanks,
William