[pymvpa] significance
Jonas Kaplan
jtkaplan at usc.edu
Thu May 7 21:57:17 UTC 2009
Hello,
I wonder if anyone could help me think through the issue of testing classifier results for significance, and how it relates to cross-validation.
We are running a design with 8 chunks, 27 trials in each chunk divided into 3 classes. Let's say we do an eight-way (leave-one-chunk-out) cross-validation. This results in an accuracy value for each set of 27 tests: 8 x 27 for a total of 216 trials that were predicted correctly or incorrectly.
Is it wrong to use a binomial test for significance on the total number of correct predictions out of 216? Or would that be inappropriate, given that the 8 cross-validation folds are not really independent of each other, so that we must test each fold separately as a binomial with n=27? This latter option raises the issue of how to combine across the 8 tests.
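For concreteness, the pooled version of the test I have in mind would look something like this (a sketch using scipy.stats; the count of 90 correct is hypothetical, and chance is taken as 1/3 for the 3-class problem):

```python
from scipy.stats import binomtest

n_trials = 216      # 8 chunks x 27 trials per chunk
n_correct = 90      # hypothetical number of correct predictions
chance = 1.0 / 3.0  # 3 equiprobable classes

# One-sided test: is accuracy above chance, assuming the 216
# predictions were independent Bernoulli trials?
result = binomtest(n_correct, n_trials, chance, alternative='greater')
print(result.pvalue)
```

The worry, of course, is exactly whether the independence assumption behind this pooled binomial holds across folds.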
Alternatively, if we use a Monte Carlo simulation to produce a null distribution, we have the same issue: we are generating this null distribution for each cross-validation fold, and therefore not taking into account the overall success of the cross-validation procedure across all 216 trials. Would it make sense instead to generate a null distribution by scrambling the regressors and running the entire cross-validation procedure on the scrambled regressors? If so, does pymvpa have a routine for doing this?
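The scheme I have in mind could be sketched generically like this (toy random data and a stand-in nearest-class-mean classifier rather than our actual one; all names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for our design: 8 chunks x 27 trials, 3 classes, 10 features
n_chunks, n_per_chunk, n_classes, n_feat = 8, 27, 3, 10
chunks = np.repeat(np.arange(n_chunks), n_per_chunk)
labels = np.tile(np.repeat(np.arange(n_classes), n_per_chunk // n_classes),
                 n_chunks)
data = rng.normal(size=(n_chunks * n_per_chunk, n_feat))

def cv_accuracy(data, labels, chunks):
    """Leave-one-chunk-out CV with a nearest-class-mean classifier."""
    correct = 0
    for test_chunk in np.unique(chunks):
        train, test = chunks != test_chunk, chunks == test_chunk
        means = np.stack([data[train & (labels == c)].mean(axis=0)
                          for c in np.unique(labels)])
        # Predict the class whose training mean is closest to each test sample
        dists = np.linalg.norm(data[test][:, None, :] - means[None], axis=2)
        correct += (dists.argmin(axis=1) == labels[test]).sum()
    return correct / len(labels)

observed = cv_accuracy(data, labels, chunks)

# Null distribution: scramble labels within each chunk, then rerun the
# ENTIRE cross-validation, so each null sample reflects all 216 trials
null = []
for _ in range(200):
    perm = labels.copy()
    for ch in np.unique(chunks):
        idx = np.flatnonzero(chunks == ch)
        perm[idx] = rng.permutation(perm[idx])
    null.append(cv_accuracy(data, perm, chunks))

p_value = (np.sum(np.array(null) >= observed) + 1) / (len(null) + 1)
```

Permuting within chunks (rather than across the whole dataset) is meant to preserve the chunk structure under the null; I'd be glad to hear whether that is the right choice here.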
Thanks for any input or corrections of my thinking,
Jonas
P.S. we are using pymvpa for several active projects with much
pleasure and will happily send you posters/papers when our work is
more complete.
----
Jonas Kaplan, Ph.D.
Research Assistant Professor
Brain & Creativity Institute
University of Southern California