[pymvpa] high prediction rate in a permutation test

J.A. Etzel jetzel at artsci.wustl.edu
Thu May 19 18:51:05 UTC 2011

On 5/19/2011 1:35 AM, Vadim Axel wrote:
> Yes, I agree with you. However, I somehow feel that reporting
> significance based on permutation values is more cumbersome than
> t-tests. Consider the case that, out of 10 subjects, 8 have a significant
> result (based on permutation) and the remaining two do not. What should I
> say in my results? Does the ROI discriminate between the two classes? When I
> use a group t-test everything is simple - the result is true or false for
> the whole group. Now suppose that I have more than one ROI and I want
> to compare their results. Though I can show the average prediction rate
> across subjects, I am afraid that when I start to report, for each ROI,
> for how many subjects it was significant and for how many it was not,
> everybody (including myself) would be confused....
Yes, more detail is required when reporting a permutation test; I like 
to see a description of the label permutation scheme and number of 
permutations, at minimum.

For describing a within-subjects analysis (accuracy calculated for each 
subject separately, but you want to draw group-level conclusions - not 
just describe each person separately) my usual strategy is to calculate 
the p-value for the across-subjects mean accuracy, using the permutations 
calculated for each person separately. You can then report a single 
p-value for the across-subjects mean, plus the individual subjects' 
p-values as well if you want.

Specifically, I pre-calculate my label permutations, and use the same 
permutations for every subject (as far as possible, allowing for missing 
data). This gives (say) 1000 accuracies for each person: the accuracy for 
that subject under label rearrangement 1, rearrangement 2, etc. I use 
those 1000 accuracies to get the p-value for each person's accuracy. But 
you can also use them to build a group null distribution by averaging the 
accuracies across subjects within each permutation (mean of subject 1 
rearrangement 1, subject 2 rearrangement 1, etc.), then comparing the 
real average accuracy against that distribution.
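A minimal numpy sketch of that scheme (the array names and the simulated 
accuracies are hypothetical stand-ins; real values would come from your 
classifier run on true and permuted labels):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: rows = subjects, columns = label rearrangements,
# using the SAME rearrangements for every subject.
n_subjects, n_perms = 10, 1000
perm_accs = rng.normal(0.5, 0.05, size=(n_subjects, n_perms))  # null accuracies
real_accs = rng.normal(0.62, 0.03, size=n_subjects)            # true-label accuracies

# Per-subject p-value: proportion of permutation accuracies at least as
# large as the real one (+1 terms so p can never be exactly zero).
subj_p = (np.sum(perm_accs >= real_accs[:, None], axis=1) + 1) / (n_perms + 1)

# Group-level null: average across subjects within each rearrangement,
# then compare the real across-subjects mean against those 1000 means.
group_null = perm_accs.mean(axis=0)
group_p = (np.sum(group_null >= real_accs.mean()) + 1) / (n_perms + 1)
```

Note that with 1000 permutations the smallest attainable p-value is 
1/1001, which matters when you later apply a multiple-comparisons 
threshold.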

Comparing the results from multiple ROIs is tricky; I don't know that 
I've seen a really satisfactory general answer. Building up a test for 
each particular analysis is probably the way to go; answer questions 
like: exactly what are you trying to compare? Do the ROIs have a similar 
number of voxels? Are they spatially very distinct or perhaps overlapping?

> BTW, how do you recommend correcting for multiple comparisons? For
> example, I run 100 searchlights. Making a Bonferroni correction
> (0.05/100 = 0.0005) results in a very stringent threshold. Consider my
> case with the mean values, which is based on only 1000 permutations.
> With a 0.0005 threshold I need to get a classification accuracy of
> 0.75+ (!). My data are not that good :( What are people doing for whole
> brain when the number of searchlights is in the tens of thousands...
For ROI-based analyses with only a few ROIs Bonferroni is fine. But I 
have gone back to parametric statistics for searchlight analyses, using 
the FDR/cluster-size/etc. stats built into SPM. Kriegeskorte describes 
some permutation tests in the original searchlight paper, but most 
people seem to use parametric stats adapted from GLM fMRI analyses.
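To see why FDR is much less severe than Bonferroni at searchlight scale, 
here is a sketch of the Benjamini-Hochberg step-up procedure next to a 
Bonferroni cutoff (the p-values are simulated for illustration, not from 
any real analysis):

```python
import numpy as np

def bh_fdr(pvals, q=0.05):
    """Benjamini-Hochberg step-up: boolean mask of rejected hypotheses."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m      # q*i/m for sorted rank i
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])      # largest rank with p_(i) <= q*i/m
        reject[order[:k + 1]] = True          # reject everything up to rank k
    return reject

# Simulated example: 100 tests, of which 10 have genuine effects.
rng = np.random.default_rng(1)
p = np.concatenate([rng.uniform(0, 0.001, 10),   # "real" effects
                    rng.uniform(0, 1, 90)])      # null tests

bonf = p < 0.05 / p.size   # Bonferroni: 0.05/100 = 0.0005
fdr = bh_fdr(p, q=0.05)    # BH always rejects at least as many as Bonferroni
```

With tens of thousands of searchlights the Bonferroni threshold becomes 
unattainable for realistic accuracies, which is one reason FDR or 
cluster-level statistics are the common choice.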


More information about the Pkg-ExpPsy-PyMVPA mailing list