[pymvpa] high prediction rate in a permutation test

J.A. Etzel jetzel at artsci.wustl.edu
Thu May 19 20:31:54 UTC 2011

On 5/19/2011 12:36 PM, Jonas Kaplan wrote:
> This is an issue I have been thinking about quite a bit recently, as we
> have used t-tests across subjects in the past (after checking for
> violations of normality and also performing an arcsine transformation),
> however I'm no longer convinced it's a great idea, and the main reason
> for me comes down to interpretation. The interesting hypothetical case
> to my mind is where a t-test is significant across subjects, but no
> single subject has significant performance according to a within-subject
> permutation test. How would we interpret such a result?
> A related issue is, what does it mean to have prediction performance
> that is consistently above chance in all subjects, but so small that
> prediction is still practically speaking pretty bad? What conclusions
> does that case allow us to draw about the underlying neural
> representations? Yes, they contain more information about the stimuli
> than pure noise would... but is that meaningful? The problem is I'm not
> sure what an alternative criterion would be. The example quoted above
> appeals to some sense of this... clearly we want the performance numbers
> to be higher, but what objective standard do we have other than
> statistical significance?
> Just a bit of rambling...
> -Jonas

Another version of this dilemma is: Should you consider two results 
equally important/"good" if they have the same (properly calculated) 
p-value but one accuracy is 0.8 and the other is 0.56?
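To make that concrete (made-up numbers, assuming a binary classifier 
scored against chance 0.5 with a plain exact binomial tail -- not the 
only reasonable test, just the easiest to compute): 0.8 accuracy on 25 
test trials and 0.56 accuracy on 500 test trials land at p-values of 
the same order of magnitude.

```python
from math import comb

def binom_tail(k, n):
    """Exact one-sided P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# 0.80 accuracy on 25 test trials vs 0.56 accuracy on 500 trials:
p_high_acc = binom_tail(20, 25)    # roughly 0.002
p_low_acc = binom_tail(280, 500)   # roughly 0.004
```

So "significant vs. chance" on its own really doesn't distinguish a 
strong effect from a weak one measured with lots of trials.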

I haven't heard a fully convincing answer. It seems clear that higher 
accuracies are "better" than lower, but what should the thresholds be 
when we're dealing with something as noisy as fMRI data? Very small 
differences are considered important in mass-univariate analyses; does 
that apply to MVPA as well?

Sometimes people (e.g. Rajeev Raizada) have been able to tie 
classification accuracy to behavior, but that's not possible with many 
experimental designs.
So this is a bit more rambling! :)  My general strategy right now is to 
lean heavily on permutation test results, combined with looking at the 
variability and setting up control tests whenever possible (e.g. a 
classification or region that should definitely work or definitely not 
work). I tend to put more weight on results that are consistent across 
cross-validation folds and replications (e.g. if I drop trials, does the 
accuracy vary a lot?), as well as across subjects (e.g. is one 
really-high-accuracy subject pulling up the average?). This is all 
rather subjective, of course.
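For what it's worth, by a permutation test I just mean: shuffle the 
condition labels over the whole dataset, redo the train/test split and 
classification, and see where the true-label accuracy falls in that 
null distribution. A minimal self-contained sketch, with toy 1-D data 
and a nearest-centroid classifier standing in for a real pipeline (all 
names and numbers made up):

```python
import random

def nearest_centroid_acc(train, test):
    """Fit a 1-D nearest-centroid classifier; return test accuracy."""
    n0 = sum(1 for _, y in train if y == 0)
    if n0 == 0 or n0 == len(train):
        return 0.5  # degenerate relabeling (one class only): score at chance
    c0 = sum(x for x, y in train if y == 0) / n0
    c1 = sum(x for x, y in train if y == 1) / (len(train) - n0)
    hits = sum((abs(x - c1) < abs(x - c0)) == (y == 1) for x, y in test)
    return hits / len(test)

def permutation_test(data, n_perm=1000, seed=0):
    """Permute labels across the whole dataset, refit, and recompute
    accuracy to build a null; p = (1 + #{null >= observed}) / (1 + n_perm)."""
    rng = random.Random(seed)
    observed = nearest_centroid_acc(data[::2], data[1::2])
    labels = [y for _, y in data]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # preserves the class balance
        perm = [(x, y) for (x, _), y in zip(data, labels)]
        if nearest_centroid_acc(perm[::2], perm[1::2]) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)

# well-separated toy data: negatives are class 0, positives are class 1
data = [(x / 4, 0) for x in range(-8, 0)] + [(x / 4, 1) for x in range(1, 9)]
obs, p = permutation_test(data)
```

With real fMRI data you would of course want to permute in a way that 
respects the run/chunk structure rather than shuffling every trial 
freely.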

Relatedly, I'd probably not believe a significant t-test if the 
single-subject permutation tests were all non-significant. This suggests 
to me that the variance is so high within each person that the results 
shouldn't be trusted, even though the means come out a bit above chance. 
So much of the variance structure is lost in a t-test ...
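To see numerically how that situation can arise (made-up accuracies, 
assuming 100 test trials per subject and an exact binomial test within 
subject): every subject individually is far from significant, yet the 
group t-test against chance is huge because the between-subject spread 
is tiny.

```python
from math import comb, sqrt

def binom_tail(k, n):
    """Exact one-sided P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# hypothetical per-subject accuracies, each from 100 test trials
accs = [0.52, 0.55, 0.53, 0.56, 0.54, 0.52, 0.55, 0.53]
subject_ps = [binom_tail(round(a * 100), 100) for a in accs]

# one-sample t statistic against chance (0.5) across subjects
n = len(accs)
mean = sum(accs) / n
var = sum((a - mean) ** 2 for a in accs) / (n - 1)
t = (mean - 0.5) / sqrt(var / n)  # well above ~2.36, the .05 cutoff for df=7
```

Every within-subject p here is above 0.05, but the group t is around 7 
-- which is exactly the pattern I'd hesitate to trust.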


More information about the Pkg-ExpPsy-PyMVPA mailing list