[pymvpa] high prediction rate in a permutation test

J.A. Etzel jetzel at artsci.wustl.edu
Thu May 19 22:13:55 UTC 2011


On 5/19/2011 4:29 PM, Vadim Axel wrote:
> Thanks a lot, Jo.
>
> I just wanted to make sure I understand your suggestions:
>
> 1. If I understand correctly, the way you propose to report the
> permutation results is the one explained here:
> www.bcn-nic.nl/txt/people/publications/Etzel2009.pdf When
> you refer to "single p-value for the across-subjects mean" you mean
> that based on permutation test I establish significance individually
> for each subject and then I just average those p-vals across
> subjects?
Yes, I describe some of these methods in that paper. But no, I don't
mean averaging the p-values across subjects.

Here's a toy example (within-subjects classification):
subject 1 - real accuracy 0.6; perm 1 accuracy 0.51, perm 2 accuracy 0.5, ...
subject 2 - real accuracy 0.65; perm 1 accuracy 0.5, perm 2 accuracy 0.52, ...
...
subject n - real accuracy 0.55; perm 1 accuracy 0.52, perm 2 accuracy 0.48, ...

We'd calculate the across-subjects accuracy as:
((subject 1 accuracy) + (subject 2 accuracy) + ... + (subject n accuracy)) / n

We'd calculate the permutation p-value for each person as:
subject 1: (# perms with accuracy > 0.6) / (# perms)
subject 2: (# perms with accuracy > 0.65) / (# perms)
etc

The across-subjects permutation p-value is:
step 1: calculate the average across-subjects accuracy for each label
permutation
	perm 1: (0.51 + 0.5 + ... + 0.52)/n
	perm 2: (0.5 + 0.52 + ... + 0.48)/n
	etc.
step 2: calculate the permutation p-value for the average as:
(# permutation averages with accuracy > real average accuracy) / (# perms)
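
Here's a rough sketch of those steps in Python/NumPy (the numbers are
just illustrative placeholders; the real and permutation accuracies are
assumed to already exist from whatever cross-validation and
label-shuffling scheme was run):

import numpy as np

real_acc = np.array([0.60, 0.65, 0.55])    # one real accuracy per subject
# perm_acc[i, j] = accuracy of subject i under label permutation j
perm_acc = np.array([[0.51, 0.50, 0.49],
                     [0.50, 0.52, 0.51],
                     [0.52, 0.48, 0.50]])
n_perms = perm_acc.shape[1]

# per-subject p-values: proportion of permutations whose accuracy
# exceeds that subject's real accuracy
subject_p = (perm_acc > real_acc[:, None]).sum(axis=1) / n_perms

# step 1: average across subjects, once for the real labels and once
# for each label permutation
real_mean = real_acc.mean()
perm_means = perm_acc.mean(axis=0)          # one mean per permutation

# step 2: proportion of permutation means that exceed the real mean
group_p = (perm_means > real_mean).sum() / n_perms

print(subject_p, real_mean, group_p)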

Whether this makes sense of course depends on the particular design and
questions. The key is to try to always compare the value of interest
(e.g. mean across-subjects accuracy) to a fair set of permutations:
accuracies that came from the exact same procedure as the value of
interest, except for the class labels.

> 2. The FDR search light case: I first establish the significance in
> whatever way for each light separately, without any correction. Then,
> I pass my p-vals vector to FDR routine while I provide desired FDR
> threshold. At the end I get back which lights are significant after
> FDR correction and which are not. Correct?
I think it's sometimes done that way. But more often I've seen people
treat the searchlight accuracy of each voxel as if it were an
"activation" and then run "normal" GLM-type statistical tests on the
accuracy maps. That's appealing, since searchlight accuracy maps are
necessarily smoothed (in the sense of overlapping searchlights), so
taking the spatial organization into account is useful. The standard
fMRI packages (SPM, whatever) have stats in place to handle FDR/etc. in
a neuroimaging-sensible manner (e.g. thresholds on the number of
voxels).
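
For the first route (correcting the per-searchlight p-values directly,
as in your question), a minimal sketch using the Benjamini-Hochberg FDR
routine from statsmodels might look like this (the p-value vector is a
placeholder for whatever your per-searchlight tests produce):

import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
sl_pvals = rng.uniform(size=1000)   # placeholder: one uncorrected p per light

# reject[i] is True where searchlight i survives FDR at q = 0.05
reject, pvals_fdr, _, _ = multipletests(sl_pvals, alpha=0.05,
                                        method='fdr_bh')
print("searchlights surviving FDR:", int(reject.sum()))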

Any other opinions on second-level searchlight analysis?

Jo


