[pymvpa] high prediction rate in a permutation test
J.A. Etzel
jetzel at artsci.wustl.edu
Thu May 19 18:51:05 UTC 2011
On 5/19/2011 1:35 AM, Vadim Axel wrote:
> Yes, I agree with you. However, I somehow feel that reporting
> significance based on permutation values is more cumbersome than
> t-tests. Consider the case that out of 10 subjects 8 have a significant
> result (based on permutation) and the two remaining do not. What should I
> say in my results? Does the ROI discriminate between the two classes? When I
> use a group t-test everything is simple - the result is true or false for
> the whole group. Now, suppose that I have more than one ROI and I want
> to compare their results. Though I can show the average prediction rate
> across subjects, I am afraid that when I start to report, for each ROI,
> for how many subjects it was significant and for how many not,
> everybody (including myself) would be confused....
Yes, more detail is required when reporting a permutation test; I like
to see a description of the label permutation scheme and number of
permutations, at minimum.
For describing a within-subjects analysis (accuracy calculated for each
subject separately, but you want to draw conclusions about the group, not
just about each person individually), my usual strategy is to calculate a
p-value for the across-subjects mean accuracy, using the permutations that
were run for each subject. You can then report a single p-value for the
across-subjects mean, plus the individual subjects' p-values as well if
you want.
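(As a rough numpy sketch of a single subject's permutation p-value - the
function name and numbers here are made up, and in practice the permutation
accuracies come from rerunning the classifier on the relabeled data:)

    import numpy as np

    def permutation_p(true_acc, perm_accs):
        # one-sided p-value: proportion of permutation accuracies at least
        # as large as the true accuracy, counting the true labeling itself
        perm_accs = np.asarray(perm_accs)
        return (np.sum(perm_accs >= true_acc) + 1.0) / (len(perm_accs) + 1.0)

    # stand-in numbers: true accuracy of 0.62 vs. 1000 permutation accuracies
    np.random.seed(0)
    print(permutation_p(0.62, np.random.normal(0.5, 0.05, 1000)))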
Specifically, I pre-calculate my label permutations and use the same
permutations for every subject (as far as possible, allowing for missing
data). This gives (say) 1000 accuracies for each person: the accuracy for
subject 1 under label rearrangement 1, subject 2 under rearrangement 1,
and so on. I use those 1000 accuracies to get the p-value for each
person's true accuracy. But you can also use them to build a group null
distribution by averaging the accuracies across subjects within each
permutation (the mean of subject 1 rearrangement 1, subject 2
rearrangement 1, etc.), then comparing the real across-subjects mean
accuracy against that distribution.
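(Again only a sketch, with made-up array names: perm_accs[s, k] would hold
subject s's accuracy under label rearrangement k, and true_accs[s] the
accuracy with the real labels.)

    import numpy as np

    n_subjects, n_perms = 10, 1000
    np.random.seed(1)
    # stand-in data; in a real analysis these come from the classifier
    perm_accs = np.random.normal(0.5, 0.05, (n_subjects, n_perms))
    true_accs = np.random.normal(0.58, 0.04, n_subjects)

    # group null: the across-subjects mean accuracy for each rearrangement
    group_null = perm_accs.mean(axis=0)
    true_mean = true_accs.mean()

    # one-sided p-value for the real across-subjects mean accuracy
    p_group = (np.sum(group_null >= true_mean) + 1.0) / (n_perms + 1.0)

    # per-subject p-values from the same permutation accuracies
    p_subj = (np.sum(perm_accs >= true_accs[:, None], axis=1) + 1.0) / (n_perms + 1.0)

    print(p_group)
    print(p_subj)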
Comparing the results from multiple ROIs is tricky; I don't know that
I've seen a really satisfactory general answer. Building up a test for
each particular analysis is probably the way to go, answering questions
like: exactly what are you trying to compare? Do the ROIs have a similar
number of voxels? Are they spatially very distinct, or perhaps overlapping?
> BTW, how do you recommend correcting for multiple comparisons? For
> example, I run 100 searchlights. Making a Bonferroni correction
> (0.05/100 = 0.0005) results in a very high threshold. Consider my case
> with the mean values, which is based on only 1000 tests. At a 0.0005
> threshold I would need a classification accuracy of 0.75+ (!). My data
> are not that good :( What are people doing for whole brain, where the
> number of searchlights is in the tens of thousands...
For ROI-based analyses with only a few ROIs, Bonferroni is fine. But I
have gone back to parametric statistics for searchlight analyses, using
the FDR/cluster-size/etc. stats built into SPM. Kriegeskorte describes
some permutation tests in the original searchlight paper, but most people
seem to use parametric stats adapted from GLM fMRI analyses.
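(Just to illustrate how Bonferroni and FDR compare on the same set of
p-values - this is not the SPM route, and it assumes statsmodels'
multipletests plus made-up p-values for 100 searchlights:)

    import numpy as np
    from statsmodels.stats.multitest import multipletests

    # made-up p-values: a handful of small ones, the rest roughly uniform
    np.random.seed(2)
    pvals = np.concatenate([np.random.uniform(0, 0.005, 10),
                            np.random.uniform(0, 1, 90)])

    bonf = multipletests(pvals, alpha=0.05, method='bonferroni')[0]
    fdr = multipletests(pvals, alpha=0.05, method='fdr_bh')[0]
    print('Bonferroni keeps %d of 100, FDR keeps %d of 100'
          % (bonf.sum(), fdr.sum()))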
Jo