[pymvpa] high prediction rate in a permutation test

Fri May 20 15:02:03 UTC 2011

On Fri, 20 May 2011, Thorsten Kranz wrote:
> Yes, the number of permutations is calculated by "!", but you have to
> keep in mind that from these permutations, you have

> 5! = 120 realisations where a given run is assigned its own labels
> and accordingly
> 4! = 24 with 2 times own labels
> 3! = 6 with 3 times own labels
> etc.
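With 6 runs there are 6! = 720 ways to reassign whole label sequences to runs. The counts above can be checked by brute force. This is a minimal sketch in plain Python (not PyMVPA code), assuming 6 runs. Note that each count is for a *specific* run (or set of runs) keeping its labels, so the sets overlap and are not counts of "exactly k runs fixed":

```python
from itertools import permutations
from math import factorial

N_RUNS = 6  # assumed: six runs, as in the design discussed in this thread

def count_fixing(runs_fixed, n=N_RUNS):
    """Count permutations of n runs' label sequences in which every run
    listed in `runs_fixed` still receives its own original sequence."""
    return sum(
        all(perm[r] == r for r in runs_fixed)
        for perm in permutations(range(n))
    )

# Brute-force counts should equal the closed form (n - k)!
for k in range(1, 4):
    fixed = tuple(range(k))  # runs 0..k-1 keep their own labels
    print(k, count_fixing(fixed), factorial(N_RUNS - k))
```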

yeap, but what would be the effect? e.g. if you have 120 realizations
where 1 run has its own labels while the other 5 are shuffled -- if you do
cross-validation, your results should show exactly the same "chance"
performance: if that 1 run ends up in the training set and the classifier
manages to exploit the fact that 1 out of 5 training runs carries real
information, the testing set would still be randomly labeled, so if the
design was correctly randomized, performance on it should be as random
as before, while the permutation would still account for possible
run-order effects which were not destroyed.

With 2 runs having their original labels, the logic pretty much follows,
except that in some cases there will be 1 out of 5 training runs with real
labels, while the 1 testing run also has real labels; so if the classifier
is uber cool and manages to learn anything despite 80% of the training
data being junk, it might indeed generalize well to the held-out run with
real labels, but on average it would still be quite random... although
in the end it would probably contribute some positively weighted tail...
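How often does that "tail" case occur? A minimal sketch, assuming 6 runs and all 720 run-level permutations being used, that tallies how many runs keep their own label sequence in each permutation:

```python
from collections import Counter
from itertools import permutations

N_RUNS = 6  # assumed number of runs

def fixed_point_histogram(n):
    """For every permutation of n runs, count how many runs keep their
    own label sequence; return {count: number_of_permutations}."""
    hist = Counter(
        sum(p[i] == i for i in range(n)) for p in permutations(range(n))
    )
    return dict(sorted(hist.items()))

hist = fixed_point_histogram(N_RUNS)
print(hist)  # {0: 265, 1: 264, 2: 135, 3: 40, 4: 15, 6: 1}

# Fraction of permutations in which at least 2 runs keep their labels
total = sum(hist.values())
print(sum(c for k, c in hist.items() if k >= 2) / total)
```

Exactly 5 fixed runs is impossible (the sixth would have to be fixed too), which is why that count is missing from the histogram.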

let me think about it more ;)

> two conditions, each 25*6 trials => 150 trials cond. A, 150 trials cond. B
> => use combinatorics, binomial coefficient =>
> In [1]: from scipy.misc import comb
> In [2]: comb(300,150)
> Out[2]: array(9.3759702772810688e+88)
> So this is a lot of combinations, and this is really safe.
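For reference, the same number can be computed exactly with the standard library. In newer SciPy releases `scipy.misc.comb` (used in the quoted session) has been moved to `scipy.special.comb`; the sketch below sidesteps that and uses `math.comb` (Python 3.8+). It counts balanced trial-level labellings only, regardless of whether the trials are actually exchangeable:

```python
from math import comb

# 150 trials of condition A and 150 of condition B (25 trials x 6 runs each):
# the number of ways to choose which 150 of the 300 trials are labelled A.
n_label_assignments = comb(300, 150)
print(f"{n_label_assignments:.4e}")  # roughly 9.376e+88, as in the quoted session
```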

;-) well, it is a number of combinations, but they might be "illegal".
Non-parametric permutation testing requires the permuted units to be
independent.  If you believe/assume/guarantee_somehow that your samples
are independent and free from order/run effects -- then go ahead.  If
not -- you might choose a permutation unit for which you can guarantee
independence, and that was my intent behind the suggested permutation of
label sequences across runs.
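A minimal sketch of that run-level scheme in plain Python. `permute_label_sequences_across_runs` is a hypothetical helper, not part of PyMVPA's API. It assumes every run has the same number of trials, keeps the order of labels within each sequence intact, and only shuffles which run receives which sequence:

```python
import random

def permute_label_sequences_across_runs(labels_per_run, rng=random):
    """Hypothetical helper: treat each run's label sequence as one
    exchangeable unit and reassign whole sequences to runs at random.
    Assumes all runs contain the same number of trials."""
    order = list(range(len(labels_per_run)))
    rng.shuffle(order)
    return [list(labels_per_run[i]) for i in order]

# Toy design (assumed): 3 runs, 4 trials each, conditions 'A'/'B'
runs = [["A", "B", "B", "A"], ["B", "A", "A", "B"], ["A", "A", "B", "B"]]
permuted = permute_label_sequences_across_runs(runs, random.Random(0))
print(permuted)
```

Because the permutation unit is a whole run, any within-run order effects travel with the labels, so they are present in the null distribution as well.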

> Maybe one should restrict these realisations "to the limitations of
> the paradigm" in terms of trial sequences.

right, but wouldn't that be somewhat difficult to characterize/specify?

alternatively, the experiment could be designed with such a run-level
permutation in mind, i.e. having a sufficient number of independent
runs (e.g. 10! = 3,628,800 or 11! = 39,916,800 possible permutations)
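The arithmetic behind that suggestion, as a minimal sketch: the number of run-level permutations grows as n!, and (assuming an exhaustive test that counts the original labelling among the permutations) the smallest attainable p-value is 1/n!:

```python
from math import factorial

# Distinct run-level permutations for n independent runs, and the smallest
# p-value an exhaustive permutation test could report (assumed: 1 / n!).
for n_runs in (6, 10, 11):
    n_perm = factorial(n_runs)
    print(n_runs, n_perm, 1 / n_perm)
```

With the 6 runs discussed above this gives only 720 permutations, i.e. a p-value floor of 1/720.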

-- 
=------------------------------------------------------------------=
Keep in touch                                     www.onerussian.com
Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic