[pymvpa] Splitter and nperlabel='equal'

Fri Mar 26 18:19:14 UTC 2010

Hi,

I have a couple of questions about the nperlabel parameter of the Splitter
class (NFoldSplitter, actually). I have unequal numbers of examples of each
class within each scan, and also across scans, so I have been manually
balancing the number of exemplars used from each class in each chunk by
throwing out random trials from the over-represented class before
classification. I'd like to let the nperlabel='equal' option on my splitter
do this for me, but I could not figure out from the documentation how it
affects the error rate (sorry if I missed something obvious):

- Suppose I am using NFoldSplitter to leave one chunk out, and I have 11
examples of C1 and 13 examples of C2 in chunk 1, but only 8 C1 and 10 C2 for
chunk 2. Will the NFoldSplitter with nperlabel='equal' force the number of
examples of each category from each chunk down to 8? Or will it use 11 of
each class for chunk 1, and 8 of each class for chunk 2?

- If it is the latter (balanced separately within chunks), how is the error
rate determined with the CrossValidatedTransferError class? Is the reported
error the simple average of the per-fold errors, i.e.
(error run 1 + error run 2) / 2, or is the average weighted by the number of
exemplars in each fold (equivalent to total errors / total number of
tests)? If it averages fold performance, is there a way to force it to
report the overall test-case performance instead? The simple average over
folds would seem to be skewed by better or worse performance on chunk 2 in
the example above, since that chunk has fewer test cases.
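
To make both questions concrete, the arithmetic can be sketched in plain
Python. This does not call PyMVPA and says nothing about what NFoldSplitter
or CrossValidatedTransferError actually do; it only spells out the two
balancing readings and the two ways of averaging. The class counts come from
the example above. The per-fold error counts (n_errors) are made up for
illustration, and the sketch assumes that balancing also applies to the
left-out test chunk.

```python
# Plain-Python illustration only; nothing here uses PyMVPA.

# Per-chunk class counts from the example in this message.
counts = {1: {"C1": 11, "C2": 13}, 2: {"C1": 8, "C2": 10}}


def balance_within_chunks(counts):
    """Reading A: each chunk keeps its own minimum class count per class."""
    return {chunk: min(per_class.values())
            for chunk, per_class in counts.items()}


def balance_across_chunks(counts):
    """Reading B: every chunk is cut down to the global minimum per class."""
    global_min = min(min(per_class.values()) for per_class in counts.values())
    return {chunk: global_min for chunk in counts}


within = balance_within_chunks(counts)   # samples per class, per chunk
across = balance_across_chunks(counts)

# Under reading A the two test folds differ in size.
n_classes = 2
n_tests = {chunk: n_classes * n for chunk, n in within.items()}

# Hypothetical misclassification counts per left-out chunk (invented).
n_errors = {1: 2, 2: 4}

# Simple average of per-fold error rates.
fold_errors = {chunk: n_errors[chunk] / n_tests[chunk] for chunk in n_tests}
mean_of_folds = sum(fold_errors.values()) / len(fold_errors)

# Pooled error: total errors over total test cases.
pooled = sum(n_errors.values()) / sum(n_tests.values())

print("within-chunk balancing:", within)
print("across-chunk balancing:", across)
print("mean of fold errors: %.4f" % mean_of_folds)
print("pooled error:        %.4f" % pooled)
```

With these invented error counts the smaller fold (chunk 2) has the higher
error rate, so the simple mean over folds comes out above the pooled error,
which is the skew described in the second question.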

Thanks for your help!
-Tim