[pymvpa] searchlight for data with different runs with different masks

Richard Dinga dinga92 at gmail.com
Thu Jan 21 12:44:41 UTC 2016


> On Sat, Jan 16, 2016 at 5:40 AM, Kaustubh Patil <kaustubh.patil at gmail.com>
wrote:
> BTW another way to handle imbalanced data (and perhaps easier to
implement and test) could be to assign weights in libsvm. This has to be
done for each partition separately; any ideas on how this can be done?
> Thanks

Just to note that this functionality is currently broken in PyMVPA:
https://github.com/PyMVPA/PyMVPA/issues/40
You might be able to plug in an SVM from scikit-learn if it works there.
However, based on my poor understanding of math, statistics and machine
learning, class weighting should converge to the same result as
up-sampling. So maybe up-sample?
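As a minimal sketch (assuming scikit-learn is available and a binary problem), `class_weight='balanced'` reweights each class's penalty inversely to its frequency, which should behave much like up-sampling the minority class:

```python
import numpy as np
from sklearn.svm import SVC

# toy imbalanced problem: 20 majority samples (class 0) vs 4 minority (class 1)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(2, 1, (4, 2))])
y = np.array([0] * 20 + [1] * 4)

# 'balanced' scales C for each class inversely to its frequency,
# so errors on the minority class cost more
clf = SVC(kernel='linear', class_weight='balanced').fit(X, y)
print(clf.predict(X))
```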
However (point 2), both approaches are suboptimal: you are indeed moving
the decision boundary toward the minority class, but the angle of the
boundary will be wrong. Therefore less naive resampling, such as SMOTE or
ROSE, in combination with moving the cutoff value, is advised.
Unfortunately I cannot find the paper where this was written.
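The core of SMOTE is just interpolation between minority samples. A minimal NumPy sketch (the function name and `k` parameter are my own, not from any library):

```python
import numpy as np

def smote_sample(X_min, n_new, k=3, rng=None):
    # minimal SMOTE sketch: pick a minority point, pick one of its
    # k nearest minority neighbours, and emit a random point on the
    # segment between them
    rng = np.random.default_rng(rng)
    n = len(X_min)
    new = []
    for _ in range(n_new):
        i = rng.integers(n)
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]          # skip the point itself
        j = rng.choice(nbrs)
        lam = rng.random()                     # interpolation weight in [0, 1)
        new.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(new)
```

In practice the imbalanced-learn package provides a tested SMOTE implementation; this is only to show the idea.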
However (point 3), everything breaks in high-dimensional data, and just
moving the cutoff (threshold) value of an SVM or similar classifier might
be enough:
http://www.ncbi.nlm.nih.gov/pubmed/22408190
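Moving the cutoff is trivial once you have decision scores. A sketch (the helper name is mine): an SVM's default rule is the sign of the decision value, and lowering the cutoff below 0 labels more samples as the positive (minority) class:

```python
import numpy as np

def predict_with_cutoff(scores, cutoff=0.0):
    # scores: signed distances from the decision boundary
    # (e.g. SVM decision_function output); default rule is score > 0
    return (np.asarray(scores) > cutoff).astype(int)

scores = np.array([-1.2, -0.4, -0.1, 0.3])
print(predict_with_cutoff(scores, 0.0))    # default cutoff: [0 0 0 1]
print(predict_with_cutoff(scores, -0.5))   # shifted cutoff: [0 1 1 1]
```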

> On Fri, Jan 15, 2016 at 11:28 PM, Kaustubh Patil <kaustubh.patil at gmail.com
> wrote:
> Thanks again Yaroslav.
> I agree that the classifier might end up giving 0 or very small balanced
accuracy (or micro accuracy) values but I think that's still a better
measure than using overall accuracy (or macro accuracy).

What do you mean by balanced accuracy? What I know by that name is the
mean of sensitivity and specificity, i.e. (sensitivity + specificity) / 2.
You are not supposed to get 0 with that.
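To make the point concrete, here is balanced accuracy in plain Python (function name is mine): a classifier that always predicts the majority class gets chance level 0.5, not 0.

```python
def balanced_accuracy(y_true, y_pred):
    # mean of sensitivity (recall on class 1) and specificity
    # (recall on class 0); chance level is 0.5
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    sens = tp / sum(t == 1 for t in y_true)
    spec = tn / sum(t == 0 for t in y_true)
    return (sens + spec) / 2

# always predicting the majority class: sensitivity 0, specificity 1
print(balanced_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))  # → 0.5
```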

>There are couple of other measures that can be useful for imbalanced
datasets:
> 1. A-mean: arithmetic mean, same as average class-wise accuracy or
micro-accuracy
> 2. G-mean: geometric mean instead of arithmetic mean above
> 3. F-measure
> 4. Area under the ROC curve

You can also use Cohen's kappa and the area under the precision-recall
curve.
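Cohen's kappa (the chance-corrected agreement presumably meant here) is easy to compute by hand, which makes its behaviour on imbalanced data transparent:

```python
def cohens_kappa(y_true, y_pred):
    # agreement corrected for chance: (p_observed - p_chance) / (1 - p_chance)
    n = len(y_true)
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n
    labels = set(y_true) | set(y_pred)
    # chance agreement: product of marginal frequencies, summed over labels
    pe = sum((sum(t == c for t in y_true) / n) *
             (sum(p == c for p in y_pred) / n) for c in labels)
    return (po - pe) / (1 - pe)

print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # → 0.5
```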

> Of course a better solution would be to use a classifier that can handle
imbalanced datasets, as you suggested. I have previously used SVMperf,
which can optimize AU-ROC:
https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html
> On Fri, Jan 15, 2016 at 11:08 PM, Yaroslav Halchenko <
debian at onerussian.com> wrote:
> another solution is to try a classifier which provides weighting
> to the classes, e.g. as GNB with default prior setting does.

You don't have to weight classes differently; you can just move the
cutoff (threshold) value toward the minority class. Or just look at AUC ...
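AUC itself has a simple interpretation worth keeping in mind: it is the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. A minimal sketch (function name is mine):

```python
def auc(scores_pos, scores_neg):
    # AUC = P(random positive scores higher than random negative),
    # counting ties as half a win
    pairs = [(p, n) for p in scores_pos for n in scores_neg]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)

# 3 of 4 positive/negative pairs are ranked correctly
print(auc([0.8, 0.6], [0.3, 0.7]))  # → 0.75
```

Because it only depends on the ranking of scores, AUC is unaffected by where the cutoff is placed, which is exactly why it is handy for imbalanced problems.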

Choosing a classifier is a different problem from choosing a performance
measure. Even if you have a classifier that deals with imbalance nicely,
you still need a performance measure that makes sense. So not accuracy.


More information about the Pkg-ExpPsy-PyMVPA mailing list