[pymvpa] biased accuracy with nperlabel='equal'?

J.A. Etzel jetzel at artsci.wustl.edu
Fri Oct 28 20:00:53 UTC 2011


Perhaps constructing test datasets could help narrow down the problem 
(and 55.7% accuracy on random data is a problem). For example, make a 
dataset with the same number of images in each class (to take the 
balancing part out of the picture), and make another with the same 
class counts as your real data but with every image identical (so 
classification is impossible), etc.
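The two diagnostic datasets suggested above could be sketched roughly like this, using plain NumPy arrays rather than PyMVPA's own Dataset class (an assumption for illustration; the feature count is hypothetical, and you would adapt this to your actual loading code):

```python
import numpy as np

rng = np.random.RandomState(0)
n_features = 1000  # hypothetical voxel count, just for illustration

# 1) Balanced random data: 70 images per class, so nperlabel='equal'
#    has nothing to rebalance and any remaining bias must come from
#    elsewhere in the pipeline.
balanced_data = rng.randn(140, n_features)
balanced_labels = np.repeat(['A', 'B'], 70)

# 2) Same 78/62 class counts as the real data, but every image is an
#    exact copy of one random image, so above-chance classification
#    should be impossible.
identical_data = np.tile(rng.randn(1, n_features), (140, 1))
unbalanced_labels = np.array(['A'] * 78 + ['B'] * 62)
```

Running the full analysis on each of these should tell you whether the bias comes from the class imbalance, the data itself, or the cross-validation scheme.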

Jo


On 10/28/2011 1:57 PM, Yaroslav Halchenko wrote:
>
> sorry about the delay. nothing strikes me as an obvious reason...
> we also don't know anything about the data's preprocessing or nature
> (are there inherent groups, etc.)
>
> a blind guess: it could also be related to leave-one-out... what if
> you group the samples (randomly, if you like) into 4 groups (chunks)
> and then do cross-validation -- does the bias persist?
>
>>> I have 140 structural images: 78 are in class A and 62 are in
>>> class B. To ensure that the training algorithm (LinearNuSVMC)
>>> doesn't build a biased model, I am using the nperlabel='equal'
>>> option in my splitter. I know this part of my code is working
>>> (see below), so I'm confused why my CVs (leave-one-scan-out) are
>>> biased with random data (e.g., 55.71%). Can someone please
>>> clarify why I'm not getting 50% with random data? I suspect I'm
>>> just not understanding something simple...
>
>>> Thanks! David
>
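Yaroslav's chunking suggestion above could be sketched roughly as follows. This is a plain NumPy stand-in, not PyMVPA's own splitter machinery, and it uses a trivial nearest-mean classifier instead of LinearNuSVMC (both assumptions for illustration); note it also omits the nperlabel='equal' balancing step for brevity:

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples = 140

# Randomly assign the 140 samples to 4 equally sized chunks.
chunks = rng.permutation(np.repeat(np.arange(4), n_samples // 4))

data = rng.randn(n_samples, 50)            # random data, 50 features
labels = np.array([0] * 78 + [1] * 62)     # same 78/62 split as the real data

# Leave-one-chunk-out cross-validation with a nearest-mean classifier.
accuracies = []
for test_chunk in range(4):
    train = chunks != test_chunk
    test = chunks == test_chunk
    # class means computed on the training portion only
    means = np.array([data[train & (labels == c)].mean(axis=0)
                      for c in (0, 1)])
    # predict whichever class mean is nearer in Euclidean distance
    dists = np.array([((data[test] - m) ** 2).sum(axis=1) for m in means])
    pred = dists.argmin(axis=0)
    accuracies.append((pred == labels[test]).mean())
```

If the mean accuracy over the 4 folds still sits well above 50% on random data, the bias is not specific to leave-one-out.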



More information about the Pkg-ExpPsy-PyMVPA mailing list