[pymvpa] significance

Scott Gorlin gorlins at MIT.EDU
Thu May 7 22:17:07 UTC 2009


Are your 8 chunks just separate scans, or were the conditions 
different?  If they're just repetitions of the same thing (especially 
if chance performance is equal), then there should be no problem with 
summing the confusion matrices across chunks.  In theory the results 
are independent, provided that for each chunk you train the classifier 
on the remaining chunks.  That said, if you can pair up trials across 
your classes, I prefer the McNemar test to the binomial.  It also lets 
you handle cases where the baseline chance level is expected to differ 
across chunks, e.g. if you use different stimuli.
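
To make that concrete, here's a minimal sketch of both tests using 
scipy.  The pooled counts, the paired outcome vectors, and the 1/3 
chance level are all made-up illustrations, not numbers from your data:

    import numpy as np
    from scipy import stats

    # Pooled binomial test: total correct predictions across all folds,
    # against chance = 1/3 for a balanced 3-class problem.
    n_trials = 216     # 8 chunks x 27 trials
    n_correct = 95     # hypothetical pooled count of correct predictions
    p_binom = stats.binomtest(n_correct, n_trials, p=1/3,
                              alternative='greater').pvalue

    # Exact McNemar test on paired binary outcomes, e.g. the same
    # trials scored correct/incorrect under two paired conditions.
    correct_a = np.array([1, 1, 0, 1, 0, 1, 1, 0])  # hypothetical
    correct_b = np.array([1, 0, 0, 1, 1, 0, 1, 0])  # hypothetical
    b = int(np.sum((correct_a == 1) & (correct_b == 0)))  # A right, B wrong
    c = int(np.sum((correct_a == 0) & (correct_b == 1)))  # A wrong, B right
    # Under H0 the discordant pairs split 50/50, so this reduces to a
    # sign test on b vs. c.
    p_mcnemar = stats.binomtest(min(b, c), b + c, p=0.5,
                                alternative='two-sided').pvalue

    print(p_binom, p_mcnemar)

Note that McNemar only uses the discordant pairs, so whatever the two 
members of a pair share (e.g. per-chunk difficulty) drops out.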

What do you mean by scrambling the regressors?  The null-distribution 
test only requires that you scramble the labels (i.e. see how separable 
the data are in directions orthogonal to the effect of interest); if 
you mean the GLM regressors, you should probably leave those alone.  It 
might be interesting to build a null distribution by placing the 
stimulus regressors randomly throughout the scan, but assuming that 
your data are balanced and there's no bias in the GLM estimation, I 
don't see any benefit to this over simply scrambling the labels.  It 
would also be much slower if you need to go back to SPM/FSL.
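
In case it helps, scrambling the labels is just a permutation test: 
rerun the entire cross-validation with shuffled labels to build the 
null distribution, then see where the true accuracy falls.  A 
bare-bones numpy sketch, where run_crossvalidation stands in for 
whatever pipeline returns your pooled accuracy (a hypothetical helper, 
not a PyMVPA call):

    import numpy as np

    def permutation_null(data, labels, chunks, run_crossvalidation,
                         n_perm=1000, seed=0):
        """Null distribution from re-running the whole cross-validation
        with labels shuffled within each chunk."""
        rng = np.random.RandomState(seed)
        null = np.empty(n_perm)
        for i in range(n_perm):
            shuffled = labels.copy()
            for ch in np.unique(chunks):          # permute within chunks
                idx = np.where(chunks == ch)[0]   # to keep the design
                shuffled[idx] = rng.permutation(shuffled[idx])  # balanced
            null[i] = run_crossvalidation(data, shuffled, chunks)
        return null

    # p-value: fraction of permutations at least as accurate as the
    # observed cross-validated accuracy.
    # observed = run_crossvalidation(data, labels, chunks)
    # null = permutation_null(data, labels, chunks, run_crossvalidation)
    # p = (np.sum(null >= observed) + 1) / (len(null) + 1)

The +1 terms just keep a finite number of permutations from ever 
reporting p = 0.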

-Scott

Jonas Kaplan wrote:
> Hello,
>
> I wonder if anyone could help me think through the issue of testing 
> classifier results for significance and how it relates to 
> cross-validation.
>
> We are running a design with 8 chunks, 27 trials in each chunk divided 
> into 3 classes.  Let's say we do an eight-way (leave-one-out) 
> cross-validation.  This results in an accuracy value for each set of 
> 27 tests... 8 x 27 for a total of 216 trials that were predicted 
> correctly or incorrectly.
>
> Is it wrong to use a binomial test for significance on the total 
> number of correct predictions out of 216?  Or would that be 
> inappropriate given that the 8 cross-validation steps are not really 
> independent of each other, so that we must test each cross-validation 
> step separately as a binomial with n=27?  This latter option raises 
> the issue of how to combine across the 8 tests.
>
> Alternatively, if we use a Monte Carlo simulation to produce a null 
> distribution, we have the same issue -- we are generating this null 
> distribution for each cross-validation step -- and therefore not 
> taking into account the overall success of the cross-validation 
> routine across all 216 trials.  Would it make sense to generate a 
> null distribution by scrambling the regressors and running the entire 
> cross-validation procedure on the scrambled regressors?  If so, does 
> pymvpa have a routine for doing this?
>
> Thanks for any input or corrections of my thought,
>
>
> Jonas
>
>
>
> P.S. We are using pymvpa for several active projects with much 
> pleasure and will happily send you posters/papers when our work is 
> more complete.
>
>
> ----
> Jonas Kaplan, Ph.D.
> Research Assistant Professor
> Brain & Creativity Institute
> University of Southern California
>


