[pymvpa] high prediction rate in a permutation test

J.A. Etzel jetzel at artsci.wustl.edu
Thu May 19 20:31:54 UTC 2011

On 5/19/2011 12:36 PM, Jonas Kaplan wrote:
> This is an issue I have been thinking about quite a bit recently, as we
> have used t-tests across subjects in the past (after checking for
> violations of normality and also performing an arcsine transformation),
> however I'm no longer convinced it's a great idea, and the main reason
> for me comes down to interpretation. The interesting hypothetical case
> to my mind is where a t-test is significant across subjects, but no
> single subject has significant performance according to a within-subject
> permutation test. How would we interpret such a result?
> A related issue is, what does it mean to have prediction performance
> that is consistently above chance in all subjects, but so small that
> prediction is still practically speaking pretty bad? What conclusions
> does that case allow us to draw about the underlying neural
> representations? Yes, they contain more information about the stimuli
> than pure noise would... but is that meaningful? The problem is I'm not
> sure what an alternative criterion would be. The example quoted above
> appeals to some sense of this... clearly we want the performance numbers
> to be higher, but what objective standard do we have other than
> statistical significance?
> Just a bit of rambling...
> -Jonas

Another version of this dilemma is: Should you consider two results 
equally important/"good" if they have the same (properly calculated) 
p-value but one accuracy is 0.8 and the other is 0.56?
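To make that concrete (made-up numbers, assuming a binary classifier 
scored against chance 0.5 with a plain exact binomial tail -- not the 
only reasonable test, just the easiest to compute): 0.8 accuracy on 25 
test trials and 0.56 accuracy on 500 test trials land at p-values of 
the same order of magnitude.

```python
from math import comb

def binom_tail(k, n):
    """Exact one-sided P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# 0.80 accuracy on 25 test trials vs 0.56 accuracy on 500 trials:
p_high_acc = binom_tail(20, 25)    # roughly 0.002
p_low_acc = binom_tail(280, 500)   # roughly 0.004
```

So "significant vs. chance" on its own really doesn't distinguish a 
strong effect from a weak one measured with lots of trials.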

I haven't heard a fully convincing answer. It seems clear that higher 
accuracies are "better" than lower, but what should the thresholds be 
when we're dealing with something as noisy as fMRI data? Very small 
differences are considered important in mass-univariate analyses; does 
that apply to MVPA as well?

Sometimes people (e.g. Rajeev Raizada) have been able to tie 
classification accuracy to behavior, but that's not possible with many 
experimental designs.
So this is a bit more rambling! :)  My general strategy right now is to 
lean heavily on permutation test results, combined with looking at the 
variability and setting up control tests whenever possible (e.g. a 
classification or region that should definitely work or definitely not 
work). I tend to put more weight on results that are consistent across 
cross-validation folds and replications (e.g. if I drop trials, does the 
accuracy vary a lot?), as well as across subjects (e.g. is one 
really-high-accuracy subject pulling up the average?). This is all 
rather subjective, of course.
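For what it's worth, by a permutation test I just mean: shuffle the 
condition labels over the whole dataset, redo the train/test split and 
classification, and see where the true-label accuracy falls in that 
null distribution. A minimal self-contained sketch, with toy 1-D data 
and a nearest-centroid classifier standing in for a real pipeline (all 
names and numbers made up):

```python
import random

def nearest_centroid_acc(train, test):
    """Fit a 1-D nearest-centroid classifier; return test accuracy."""
    n0 = sum(1 for _, y in train if y == 0)
    if n0 == 0 or n0 == len(train):
        return 0.5  # degenerate relabeling (one class only): score at chance
    c0 = sum(x for x, y in train if y == 0) / n0
    c1 = sum(x for x, y in train if y == 1) / (len(train) - n0)
    hits = sum((abs(x - c1) < abs(x - c0)) == (y == 1) for x, y in test)
    return hits / len(test)

def permutation_test(data, n_perm=1000, seed=0):
    """Permute labels across the whole dataset, refit, and recompute
    accuracy to build a null; p = (1 + #{null >= observed}) / (1 + n_perm)."""
    rng = random.Random(seed)
    observed = nearest_centroid_acc(data[::2], data[1::2])
    labels = [y for _, y in data]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # preserves the class balance
        perm = [(x, y) for (x, _), y in zip(data, labels)]
        if nearest_centroid_acc(perm[::2], perm[1::2]) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)

# well-separated toy data: negatives are class 0, positives are class 1
data = [(x / 4, 0) for x in range(-8, 0)] + [(x / 4, 1) for x in range(1, 9)]
obs, p = permutation_test(data)
```

With real fMRI data you would of course want to permute in a way that 
respects the run/chunk structure rather than shuffling every trial 
freely.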

Relatedly, I'd probably not believe a significant t-test if the 
single-subject permutation tests were all non-significant. This suggests 
to me that the variance is so high within each person that the results 
shouldn't be trusted, even though the means come out a bit above chance. 
So much of the variance structure is lost in a t-test ...
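To see numerically how that situation can arise (made-up accuracies, 
assuming 100 test trials per subject and an exact binomial test within 
subject): every subject individually is far from significant, yet the 
group t-test against chance is huge because the between-subject spread 
is tiny.

```python
from math import comb, sqrt

def binom_tail(k, n):
    """Exact one-sided P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# hypothetical per-subject accuracies, each from 100 test trials
accs = [0.52, 0.55, 0.53, 0.56, 0.54, 0.52, 0.55, 0.53]
subject_ps = [binom_tail(round(a * 100), 100) for a in accs]

# one-sample t statistic against chance (0.5) across subjects
n = len(accs)
mean = sum(accs) / n
var = sum((a - mean) ** 2 for a in accs) / (n - 1)
t = (mean - 0.5) / sqrt(var / n)  # well above ~2.36, the .05 cutoff for df=7
```

Every within-subject p here is above 0.05, but the group t is around 7 
-- which is exactly the pattern I'd hesitate to trust.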


More information about the Pkg-ExpPsy-PyMVPA mailing list