[pymvpa] significance
Jonas Kaplan
jtkaplan at usc.edu
Thu May 7 21:57:17 UTC 2009
Hello,
I wonder if anyone could help me think through the issue of testing classifier results for significance, and how it relates to cross-validation.
We are running a design with 8 chunks, 27 trials in each chunk divided into 3 classes. Let's say we do an eight-way (leave-one-chunk-out) cross-validation. This results in an accuracy value for each set of 27 tests: 8 x 27 for a total of 216 trials that were predicted correctly or incorrectly.
Is it wrong to use a binomial test for significance on the total number of correct predictions out of 216? Or would that be inappropriate, given that the 8 cross-validation folds are not really independent of each other, so that we must test each fold separately as a binomial with n=27? This latter option raises the issue of how to combine across the 8 tests.
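For concreteness, the pooled version of the test I have in mind would look something like this (a sketch using scipy.stats; the count of 90 correct is hypothetical, and chance is taken as 1/3 for the 3-class problem):

```python
from scipy.stats import binomtest

n_trials = 216      # 8 chunks x 27 trials per chunk
n_correct = 90      # hypothetical number of correct predictions
chance = 1.0 / 3.0  # 3 equiprobable classes

# One-sided test: is accuracy above chance, assuming the 216
# predictions were independent Bernoulli trials?
result = binomtest(n_correct, n_trials, chance, alternative='greater')
print(result.pvalue)
```

The worry, of course, is exactly whether the independence assumption behind this pooled binomial holds across folds.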
Alternatively, if we use a Monte Carlo simulation to produce a null distribution, we have the same issue: we are generating this null distribution for each cross-validation fold, and therefore not taking into account the overall success of the cross-validation procedure across all 216 trials. Would it make sense instead to generate a null distribution by scrambling the regressors and running the entire cross-validation procedure on the scrambled regressors? If so, does pymvpa have a routine for doing this?
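The scheme I have in mind could be sketched generically like this (toy random data and a stand-in nearest-class-mean classifier rather than our actual one; all names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for our design: 8 chunks x 27 trials, 3 classes, 10 features
n_chunks, n_per_chunk, n_classes, n_feat = 8, 27, 3, 10
chunks = np.repeat(np.arange(n_chunks), n_per_chunk)
labels = np.tile(np.repeat(np.arange(n_classes), n_per_chunk // n_classes),
                 n_chunks)
data = rng.normal(size=(n_chunks * n_per_chunk, n_feat))

def cv_accuracy(data, labels, chunks):
    """Leave-one-chunk-out CV with a nearest-class-mean classifier."""
    correct = 0
    for test_chunk in np.unique(chunks):
        train, test = chunks != test_chunk, chunks == test_chunk
        means = np.stack([data[train & (labels == c)].mean(axis=0)
                          for c in np.unique(labels)])
        # Predict the class whose training mean is closest to each test sample
        dists = np.linalg.norm(data[test][:, None, :] - means[None], axis=2)
        correct += (dists.argmin(axis=1) == labels[test]).sum()
    return correct / len(labels)

observed = cv_accuracy(data, labels, chunks)

# Null distribution: scramble labels within each chunk, then rerun the
# ENTIRE cross-validation, so each null sample reflects all 216 trials
null = []
for _ in range(200):
    perm = labels.copy()
    for ch in np.unique(chunks):
        idx = np.flatnonzero(chunks == ch)
        perm[idx] = rng.permutation(perm[idx])
    null.append(cv_accuracy(data, perm, chunks))

p_value = (np.sum(np.array(null) >= observed) + 1) / (len(null) + 1)
```

Permuting within chunks (rather than across the whole dataset) is meant to preserve the chunk structure under the null; I'd be glad to hear whether that is the right choice here.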
Thanks for any input or corrections of my thinking,
Jonas
P.S. we are using pymvpa for several active projects with much
pleasure and will happily send you posters/papers when our work is
more complete.
----
Jonas Kaplan, Ph.D.
Research Assistant Professor
Brain & Creativity Institute
University of Southern California