[pymvpa] significance
Scott Gorlin
gorlins at MIT.EDU
Thu May 7 22:17:07 UTC 2009
Are your 8 chunks just separate scans, or were the conditions
different? If they're just repetitions of the same thing (especially
if chance performance is equal), there should be no problem with
summing the confusion matrices across chunks. In theory the results
are independent, provided that for each chunk you train your
classifier on the remaining ones. That said, if you can pair up trials
across your classes, I prefer the McNemar test to the binomial. It
also lets you handle cases where baseline chance is expected to differ
between chunks, e.g. if you use different stimuli.
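For concreteness, a minimal sketch of both tests in Python (scipy
assumed; the counts and the paired outcome vectors correct_a/correct_b
are made-up placeholders, not real data):

import numpy as np
from scipy import stats

# Pooled binomial test: 216 trials, chance = 1/3 for three classes.
# (stats.binomtest in newer scipy versions.)
n_correct, n_total, chance = 90, 216, 1.0 / 3
p_pooled = stats.binom_test(n_correct, n_total, chance)

# Exact McNemar test on paired trials: correct_a/correct_b are boolean
# per-trial outcomes for the two members of each pair (random
# placeholders here).
rng = np.random.RandomState(0)
correct_a = rng.rand(108) < 0.55
correct_b = rng.rand(108) < 0.40
b = np.sum(correct_a & ~correct_b)  # discordant: a right, b wrong
c = np.sum(~correct_a & correct_b)  # discordant: a wrong, b right
# The exact McNemar test is a two-sided binomial on the discordant
# pairs alone.
p_mcnemar = stats.binom_test(min(b, c), b + c, 0.5)

print(p_pooled, p_mcnemar)

Because McNemar conditions on the discordant pairs only, anything
shared within a pair (per-chunk difficulty, a shifted baseline)
cancels out.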
What do you mean by scrambling the regressors? The null-distribution
test only requires that you scramble the labels (i.e., see how
separable the data are in directions orthogonal to the effect of
interest); if you mean GLM regressors, you should probably leave those
alone. It might be interesting to build a null distribution by placing
the stimulus regressors randomly throughout the scan, but assuming
your data are balanced and there is no bias in the GLM estimation, I
don't see any benefit to this over simply scrambling the labels. It
would also be much slower if you need to go back to SPM/FSL each time.
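To make the label-scrambling recipe concrete: permute the labels
within each chunk (so the null respects the same chunk structure as
the real analysis) and rerun the *entire* cross-validation once per
permutation; the observed accuracy is then compared to that null
distribution. If I recall correctly, PyMVPA's MCNullDist is built
around exactly this idea. A bare-bones numpy sketch, where the
nearest-class-mean classifier is only a stand-in for the real
pipeline and data/labels/chunks are assumed arrays:

import numpy as np

def cv_accuracy(data, labels, chunks):
    # Leave-one-chunk-out CV with a nearest-class-mean classifier,
    # standing in for the real classifier/pipeline.
    hits, total = 0, 0
    for ch in np.unique(chunks):
        train = chunks != ch
        means = dict((l, data[train & (labels == l)].mean(axis=0))
                     for l in np.unique(labels))
        for x, y in zip(data[chunks == ch], labels[chunks == ch]):
            pred = min(means, key=lambda l: np.linalg.norm(x - means[l]))
            hits += (pred == y)
            total += 1
    return hits / float(total)

def null_distribution(data, labels, chunks, n_perm=1000, seed=0):
    # Shuffle labels *within* each chunk, then rerun the whole
    # cross-validation for every permutation.
    rng = np.random.RandomState(seed)
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = labels.copy()
        for ch in np.unique(chunks):
            idx = np.where(chunks == ch)[0]
            shuffled[idx] = shuffled[rng.permutation(idx)]
        null[i] = cv_accuracy(data, shuffled, chunks)
    return null

# p-value with the standard +1 correction:
# acc = cv_accuracy(data, labels, chunks)
# null = null_distribution(data, labels, chunks)
# p = (np.sum(null >= acc) + 1.0) / (len(null) + 1.0)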
-Scott
Jonas Kaplan wrote:
> Hello,
>
> I wonder if anyone could help me think through the issue of testing
> classifier results for significance and how it relates to
> cross-validation.
>
> We are running a design with 8 chunks, 27 trials in each chunk
> divided into 3 classes. Let's say we do an eight-way (leave-one-out)
> cross-validation. This results in an accuracy value for each set of
> 27 tests... 8 x 27 for a total of 216 trials that were predicted
> correctly or incorrectly.
>
> Is it wrong to use a binomial test for significance on the total
> number of correct predictions out of 216? Or would that be
> inappropriate given that the 8 cross-validation steps are not really
> independent of each other, meaning we must test each cross-validation
> step separately as a binomial with n=27? The latter option raises the
> issue of how to combine across the 8 tests.
>
> Alternatively, if we use a Monte Carlo simulation to produce a null
> distribution, we have the same issue -- we are generating this null
> distribution for each cross-validation step -- and therefore not
> taking into account the overall success of the cross-validation
> routine across all 216 trials. Would it make sense to generate a
> null distribution by scrambling the regressors and running the
> entire cross-validation procedure on the scrambled regressors? If
> so, does pymvpa have a routine for doing this?
>
> Thanks for any input or corrections of my thought,
>
>
> Jonas
>
>
>
> P.S. we are using pymvpa for several active projects with much
> pleasure and will happily send you posters/papers when our work is
> more complete.
>
>
> ----
> Jonas Kaplan, Ph.D.
> Research Assistant Professor
> Brain & Creativity Institute
> University of Southern California