[pymvpa] Binary classification

Tue Mar 7 22:34:25 UTC 2017

> Consider a class-counting classifier that ignores all neural data; it
just counts how often each target occurs in the training set and uses that
as the prediction for all samples in the test set. In this case, all
elements in the test set would be marked as "other". If 80% of the samples in
the test set are "other", then accuracy would be 80% - for a classifier
that does not even consider the neural data.
> To avoid this problem you may have to balance the number of targets for
each class. However, with only 4 chunks this does not give much power to
detect any effects, even if there is a difference between the first and the
other targets. Maybe you could use a partitioning scheme where, for every
fold, you select a random subset of the other targets, with the number
equal to the number of samples corresponding to the first target.
Alternatively you could consider some split-half correlation approach.

The only problem here is the use of accuracy as the measure of model
performance. You can use balanced accuracy (the mean of sensitivity and
specificity) or, better, a measure that does not depend on an arbitrary
threshold, e.g. the area under the ROC curve.
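
To make the difference concrete, here is a minimal sketch in plain Python (no PyMVPA calls) of the class-counting classifier from the quoted text, scored with both plain and balanced accuracy. The label names and the 3-training-chunks / 1-test-chunk layout are assumptions chosen to match "5 targets, 4 chunks"; the helper functions are hypothetical, not part of any library.

```python
from collections import Counter


def majority_class_predictions(train_targets, n_test):
    """Ignore the data: predict the most frequent training label everywhere."""
    most_common_label, _ = Counter(train_targets).most_common(1)[0]
    return [most_common_label] * n_test


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred, positive):
    """Mean of sensitivity and specificity for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    n_pos = sum(t == positive for t, _ in pairs)
    n_neg = len(pairs) - n_pos
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    true_neg = sum(t != positive and p != positive for t, p in pairs)
    return 0.5 * (true_pos / n_pos + true_neg / n_neg)


# Assumed layout: one sample per original target per chunk,
# three chunks for training and one chunk for testing.
train_targets = ["first"] * 3 + ["other"] * 12
test_targets = ["first"] * 1 + ["other"] * 4

predictions = majority_class_predictions(train_targets, len(test_targets))
acc = accuracy(test_targets, predictions)
bal_acc = balanced_accuracy(test_targets, predictions, positive="first")

print("accuracy:", acc)
print("balanced accuracy:", bal_acc)
```

With this layout the data-blind classifier reaches an accuracy of 0.8 but a balanced accuracy of only 0.5, which is why the latter is the safer number to compare against chance.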

On Tue, Mar 7, 2017 at 2:37 PM, Nick Oosterhof <n.n.oosterhof at googlemail.com> wrote:

>
> > On 7 Mar 2017, at 14:23, jasi3k <jasi3k at gmail.com> wrote:
> >
> > Dear Users,
> >
> > I am running a searchlight analysis with NFoldPartitioner on my fMRI beta
> > data (5 different targets, 4 chunks). I would like to perform a binary
> > classification between my first target and the rest of my data. Would it
> > make sense to rename the 4 remaining targets (e.g. to "other") in
> > the attributes file and run the searchlight analysis?
>
> That should work, though it's probably easier to first load your dataset
> and set the sample attributes. For this particular analysis you could then
> change the sample attributes to a common label. If you were then to change
> the analysis you would only have to change the last part.
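
The relabelling step described above can be sketched in plain Python as follows. The target names are made up for illustration and `relabel_one_vs_rest` is a hypothetical helper, not a PyMVPA function; with a loaded PyMVPA dataset, the resulting labels would be written back to the dataset's `targets` sample attribute.

```python
def relabel_one_vs_rest(targets, keep, rest_label="other"):
    """Keep one label as-is and collapse every other label into rest_label."""
    return [t if t == keep else rest_label for t in targets]


# Hypothetical labels: 5 targets repeated over 4 chunks.
original_targets = ["face", "house", "cat", "shoe", "chair"] * 4

binary_targets = relabel_one_vs_rest(original_targets, keep="face")
print(binary_targets[:5])
```

Because the original labels stay in the attributes file, switching to a different one-vs-rest contrast only means changing the `keep` argument. One caveat when doing this on a NumPy string array: assigning a longer label into a fixed-width string array silently truncates it, so building a new array is safer than assigning in place.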
>
> >
> > I would expect that an accuracy higher than 0.5 would mean that voxels
> > contribute to the classification task even though targets marked as "other"
> > are different.
> >
> > Would this be correct?
>
> Not necessarily, as a classifier could use the frequency of the class to
> make a prediction. Consider a class-counting classifier that ignores all
> neural data; it just counts how often each target occurs in the training
> set and uses that as the prediction for all samples in the test set. In this
> case, all elements in the test set would be marked as "other". If 80% of the
> samples in the test set are "other", then accuracy would be 80% - for a
> classifier that does not even consider the neural data.
>
> To avoid this problem you may have to balance the number of targets for
> each class. However, with only 4 chunks this does not give much power to
> detect any effects, even if there is a difference between the first and the
> other targets. Maybe you could use a partitioning scheme where, for every
> fold, you select a random subset of the other targets, with the number
> equal to the number of samples corresponding to the first target.
> Alternatively you could consider some split-half correlation approach.
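
The random-subset partitioning idea above can be sketched in plain Python as below. This is an illustration under assumptions, not PyMVPA's own machinery: `balanced_folds` and `balance` are hypothetical helpers, the labels are invented, and balancing the test set as well as the training set is a design choice made here so that chance accuracy is 0.5. PyMVPA also provides a `Balancer` generator that may cover this use case; it is not used in this sketch.

```python
import random


def balance(indices, targets, positive, rng):
    """Keep all positive samples plus an equally sized random subset of the rest.

    Assumes the positive class is the minority among the given indices.
    """
    pos = [i for i in indices if targets[i] == positive]
    neg = [i for i in indices if targets[i] != positive]
    return sorted(pos + rng.sample(neg, len(pos)))


def balanced_folds(targets, chunks, positive, seed=0):
    """Leave-one-chunk-out folds with both train and test sets class-balanced."""
    rng = random.Random(seed)
    for test_chunk in sorted(set(chunks)):
        train_idx = [i for i, c in enumerate(chunks) if c != test_chunk]
        test_idx = [i for i, c in enumerate(chunks) if c == test_chunk]
        yield (balance(train_idx, targets, positive, rng),
               balance(test_idx, targets, positive, rng))


# Hypothetical data: 5 targets in each of 4 chunks, already relabelled.
targets = ["first", "other", "other", "other", "other"] * 4
chunks = [chunk for chunk in range(4) for _ in range(5)]

folds = list(balanced_folds(targets, chunks, positive="first"))
for train_idx, test_idx in folds:
    print(len(train_idx), "training samples,", len(test_idx), "test samples")
```

Each fold here ends up with 6 training samples and 2 test samples, which illustrates the power concern raised above. Repeating the procedure with several seeds and averaging the results would reduce the dependence on any single random subset.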