[pymvpa] high prediction rate in a permutation test

Wed May 18 20:43:46 UTC 2011

On 5/18/2011 2:55 PM, Yaroslav Halchenko wrote:
>
> On Wed, 18 May 2011, J.A. Etzel wrote:
>> The curves look reasonable to me; sometimes the tails of the
>> permutation distribution can be quite long.
>
> yeap -- looks quite symmetric, as it should (it could have been
> visualized a bit better if you had set the bins so that the middle
> of the center bin falls exactly at 0.5). Now it is hard to say how
> much of a positive bias toward 0.6 there is (where theoretically
> there should be none, afaik)
I agree; I would be worried if the *middle* of the permutation
distribution was around 0.6, but a wide distribution such that 0.6 is in
the top 0.05 can happen.
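For reference, here is a minimal numpy-only sketch of the two quantities discussed above. The first is the one-sided permutation p-value, i.e. the fraction of permuted accuracies at or above the observed one. The second is a set of histogram bin edges placed so that one bin is centered exactly on 0.5. The function names are hypothetical and not PyMVPA API. The (k + 1)/(n + 1) convention is one common choice and is assumed here.

```python
import numpy as np


def permutation_p_value(observed, null_scores):
    """One-sided p-value: share of permuted scores >= the observed score.

    The observed labeling is counted as one more permutation, the common
    (k + 1) / (n + 1) convention (assumed here), so p is never exactly 0.
    """
    null_scores = np.asarray(null_scores, dtype=float)
    k = int(np.sum(null_scores >= observed))
    return (k + 1) / (null_scores.size + 1)


def centered_bin_edges(center=0.5, width=0.05, n_each_side=10):
    """Histogram bin edges with one bin centered exactly on `center`.

    Edges sit at center + (j - 0.5) * width for j = -n_each_side .. n_each_side + 1,
    giving 2 * n_each_side + 1 bins; the middle one spans center +/- width / 2.
    """
    j = np.arange(-n_each_side, n_each_side + 2)
    return center + (j - 0.5) * width


# Usage with a synthetic null (hypothetical data, for illustration only):
rng = np.random.default_rng(0)
null_scores = rng.binomial(20, 0.5, size=1000) / 20  # chance accuracies, 20 test trials
counts, edges = np.histogram(null_scores, bins=centered_bin_edges())
print(permutation_p_value(0.6, null_scores))
```

With these edges every accuracy that is a multiple of 0.05 lands mid-bin. Any shift of the null distribution away from 0.5 is then visible directly in the histogram.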

>> Randomizing the real data labels is often the best strategy,
>> because you want to make sure the permuted data sets have the same
>> structure (as much as possible) as the real data. For example, if
>> you're partitioning on the runs, you should permute the data labels
>> within each run. Similarly, if you need to omit some examples for
>> balance
>
> within each run -- this is applicable only if trials are independent
> (trial order is truly random, no BOLD spillover, etc.). A more
> stringent test imho, if there is an equal number of trials across
> runs, is to permute the items that are truly independent (as they
> must be in a correct design): the sequences of trials across runs,
> i.e. take the sequence of labels from run 1 and place it into run X,
> and so on across all runs. That should account for possible
> inter-trial dependencies within runs, and thus I would expect the
> distribution to get even slightly wider (than if permuted within each run)
Not sure I follow... do you mean taking the order of trials from one run
and copying it to another, then partitioning on the runs?
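One way to make the two schemes concrete is the sketch below. It assumes plain numpy arrays of labels and run ids, with samples stored in trial order within each run. The function names are hypothetical, not PyMVPA calls. The first function shuffles labels inside each run. The second is one reading of the run-sequence proposal: it moves each run's whole ordered label sequence into another run.

```python
import numpy as np


def permute_within_runs(labels, runs, rng):
    """Shuffle labels independently inside each run; run membership is kept."""
    labels = np.asarray(labels)
    runs = np.asarray(runs)
    permuted = labels.copy()
    for run in np.unique(runs):
        idx = np.flatnonzero(runs == run)
        # Positions idx receive the labels of a shuffled copy of idx.
        permuted[idx] = labels[rng.permutation(idx)]
    return permuted


def permute_run_sequences(labels, runs, rng):
    """Reassign whole label sequences between runs (one reading of the
    proposal): run i receives the trial-ordered labels of run pi(i).
    Assumes samples are in trial order and every run has the same length."""
    labels = np.asarray(labels)
    runs = np.asarray(runs)
    run_ids = np.unique(runs)
    sequences = [labels[runs == r] for r in run_ids]
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all runs must have the same number of trials")
    order = rng.permutation(len(run_ids))
    permuted = labels.copy()
    for dest, src in zip(run_ids, order):
        permuted[runs == dest] = sequences[src]
    return permuted


# Usage on toy data (hypothetical labels, 3 runs of 4 trials):
rng = np.random.default_rng(1)
labels = np.array(["A", "A", "B", "B", "A", "B", "A", "B", "B", "B", "A", "A"])
runs = np.repeat([1, 2, 3], 4)
print(permute_within_runs(labels, runs, rng))
print(permute_run_sequences(labels, runs, rng))
```

The second scheme keeps within-run trial order intact, and with it any dependencies between neighbouring trials. The price is that with R runs there are at most R! distinct relabelings (720 for 6 runs), so the resulting null distribution is coarse.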
>
>> Something to look at when trying to figure out the difference in
>> your averaged or not-averaged results might be the block
>> structure.
>
> please correct me if I am wrong -- under permutation of sample
> labels, those must differ regardless of block structure, simply due
> to the change in the number of trials (just compare binomial
> distributions for 2 trials vs 4 ;) )
Yes, the change in the variance of the permutation distribution could be
just from the smaller number of samples. But I can imagine setting up
dodgy classifications of individual trials from block designs that could
also make the permutation distributions change (not that Vadim did
that!), so I wanted to mention double-checking the not-averaged
partitioning scheme.
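To put a number on the 2-vs-4 trials remark, here is a sketch under the idealized assumption used there. Each test trial is taken as independent and classified correctly with probability 0.5, so accuracy is Binomial(n, 0.5)/n. This ignores the exact structure of label permutation. It only illustrates how the spread of the chance distribution depends on the number of test trials.

```python
from math import comb


def chance_accuracy_distribution(n_trials, p=0.5):
    """Exact distribution of accuracy k / n_trials when each of n_trials
    independent test trials is correct with probability p (pure chance)."""
    return {
        k / n_trials: comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(n_trials + 1)
    }


def chance_accuracy_sd(n_trials, p=0.5):
    """Standard deviation of k / n under Binomial(n, p): sqrt(p(1 - p) / n)."""
    return (p * (1 - p) / n_trials) ** 0.5


for n in (2, 4):
    print(n, chance_accuracy_distribution(n), round(chance_accuracy_sd(n), 3))
```

With 2 test trials, chance accuracy is 0, 0.5 or 1 with probabilities 0.25, 0.5 and 0.25 (SD about 0.35). With 4 trials the SD drops to 0.25. Averaging trials into fewer samples therefore widens the permutation distribution even without any block-structure effect.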

Jo