[pymvpa] biased accuracy with nperlabel='equal'?

David V. Smith david.v.smith at duke.edu
Fri Oct 28 23:50:45 UTC 2011


Hi -- Thanks for the reply.

There are inherent groups (e.g., group A and group B), but that's what we're trying to classify -- so maybe I don't understand your point here... There is a "confound" that is also somewhat predictive of group membership, and indeed, when I include that as a feature, the permuted labels do yield cross-validation accuracies that average to 50% instead of *always* being 55.71%. Maybe that was the issue?

I was wondering if it might be related to the leave-1-out; however, the classifier isn't just flipping a coin on each fold of the validation. In these cases, it's guessing that everything is one class (i.e., sensitivity = 100%; specificity = 0%). Then again, maybe it's related to something I'm doing, because I mainly see it happen when I give the classifier a minimal number of features (e.g., one voxel).
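
For what it's worth, 78/140 works out to 55.71%, which is exactly what an "always guess class A" classifier would score across the full, unbalanced set of subjects. A toy illustration of that degenerate behavior (the label values and array names are made up for the example, not taken from my code):

    import numpy as np

    # Toy numbers matching the dataset: 78 class-A and 62 class-B subjects.
    targets = np.array(['A'] * 78 + ['B'] * 62)
    predictions = np.array(['A'] * 140)   # degenerate "always guess A" classifier

    accuracy = np.mean(predictions == targets)                  # ~0.5571
    sensitivity = np.mean(predictions[targets == 'A'] == 'A')   # 1.0
    specificity = np.mean(predictions[targets == 'B'] == 'B')   # 0.0
    print(accuracy, sensitivity, specificity)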

What would be the easiest way to run some tests where the dataset gets randomly broken up into 4 equal chunks for each test? Right now, each subject is a chunk because I thought leave-1-subject-out would be optimal here (e.g., more data for training). I'm just reading in a text file with my labels in one column and the chunks in the other column: attr = SampleAttributes(attr_file). Should I just make a new attributes file and have the second column randomly vary from 1 to 4 (with 35 observations in each chunk, since I have 140 subjects), something like the sketch below? Generally speaking, would that be a preferred way of doing this anyway?
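
Here is a minimal sketch of what I have in mind (file names are placeholders, and I'm assuming the attributes file is just two whitespace-separated numeric columns, label then chunk, one row per subject):

    import numpy as np

    # Placeholder file names; assumed layout: column 1 = label, column 2 = chunk.
    labels = np.loadtxt('attrs_orig.txt', usecols=[0])

    # Balanced random assignment: 140 subjects -> 4 chunks of 35.
    nchunks = 4
    chunks = np.repeat(np.arange(nchunks), len(labels) // nchunks)
    np.random.seed(1)                    # fixed seed so the split is reproducible
    np.random.shuffle(chunks)

    # Write a new attributes file that SampleAttributes() can read.
    np.savetxt('attrs_4chunks.txt', np.column_stack((labels, chunks)), fmt='%d')

Shuffling within each class instead would keep the 78/62 ratio roughly constant across the 4 chunks, which might matter once the folds are that coarse.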

Regarding Jo's point: I should clarify that the accuracy is 50% when the labels are split 50/50. I thought that was the whole point of nperlabel='equal' -- but I guess that's only part of the issue... For reference, my setup looks roughly like the sketch below.
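
Rough sketch of the 0.4-series setup I mean (dataset and attr_file are placeholders for however the data actually get loaded, and the exact keyword names should be double-checked against the docs):

    from mvpa.suite import *                 # PyMVPA 0.4-series imports

    attr = SampleAttributes(attr_file)       # labels + chunks, as above

    clf = LinearNuSVMC()
    terr = TransferError(clf)

    # nperlabel='equal' is meant to equalize the number of samples per label
    # in the splits, so training isn't dominated by the larger class; with
    # each subject as its own chunk, the pooled test predictions across
    # folds still cover 78 class-A vs. 62 class-B subjects.
    cvterr = CrossValidatedTransferError(
        terr,
        NFoldSplitter(nperlabel='equal'),
        enable_states=['confusion'])

    error = cvterr(dataset)                  # mean error across folds
    print(cvterr.confusion)                  # per-class hits/misses, to spot "everything is A"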

Sorry for the newbie questions/problems...

Thanks for the help!
David




On Oct 28, 2011, at 2:57 PM, Yaroslav Halchenko wrote:

> 
> sorry about the delay. 
> nothing strikes me as an obvious reason... also, we don't know anything
> about the data preprocessing/nature (are there inherent groups, etc.)
> 
> a blind guess: it could also be related to leave-1-out... what if you
> group samples (randomly, if you like) into 4 groups (chunks) and then do
> cross-validation -- does the bias persist?
> 
>>> I have 140 structural images: 78 are in class A and 62 are in class B. To ensure that the training algorithm (LinearNuSVMC) doesn't build a biased model, I am using the nperlabel='equal' option in my splitter. I know this part of my code is working (see below), so I'm confused why my CVs (leave-one-scan-out) are biased with random data (e.g., 55.71%). Can someone please clarify why I'm not getting 50% with random data? I suspect I'm just not understanding something simple...
> 
>>> Thanks!
>>> David
> 
> -- 
> =------------------------------------------------------------------=
> Keep in touch                                     www.onerussian.com
> Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic
> 



