[pymvpa] Cross validation and permutation test

Yaroslav Halchenko debian at onerussian.com
Sun Mar 13 01:39:55 UTC 2011


On Sun, 13 Mar 2011, Meng Liang wrote:
>    What I meant by 'quicker' is that I don't have to perform the
>    permutation for each split separately. Unless the permutation number
>    specified in MCNullDist embedded within CrossValidationTransferError
>    (the code you suggested in the last email) is also for each split,
>    i.e., permutations=1000 for an OddEvenSplit actually means 2000
>    permutations in total? But the number of distribution samples obtained
>    is still 1000.

yes -- it is just the total number of permutations: it first permutes
the labels and then computes the measure (i.e.
CrossValidationTransferError in your case), so it is just 1000
permutations in total now.  Before, you did the permutations
independently within each split, thus acquiring 2000 permutations in
total.
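
For the record, here is a minimal numpy-only sketch of that "permute the
labels once for the whole dataset, then cross-validate" scheme.  This is
deliberately not the PyMVPA API: the toy data, the nearest-mean stand-in
classifier and the cv_accuracy helper are all made up for illustration.

import numpy as np

rng = np.random.RandomState(0)

# toy data: 40 samples, 10 features, binary labels, alternating "chunks"
X = rng.randn(40, 10)
y = np.repeat([0, 1], 20)
chunks = np.arange(40) % 2

def cv_accuracy(X, y, chunks):
    # cross-validated accuracy of a trivial nearest-mean classifier,
    # standing in for whatever measure the cross-validation computes
    accs = []
    for test_chunk in np.unique(chunks):
        train, test = chunks != test_chunk, chunks == test_chunk
        means = np.array([X[train & (y == c)].mean(axis=0) for c in (0, 1)])
        dists = ((X[test][:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        accs.append((dists.argmin(axis=1) == y[test]).mean())
    return np.mean(accs)

# MCNullDist-style scheme: permute the labels once for the WHOLE dataset,
# then run the full cross-validation -- 1000 permutations, 1000 null samples.
# (The old per-split scheme would instead permute within each of the two
# splits independently, yielding 2 x 1000 null samples.)
null = np.array([cv_accuracy(X, rng.permutation(y), chunks)
                 for _ in range(1000)])

p = (np.sum(null >= cv_accuracy(X, y, chunks)) + 1.0) / (len(null) + 1)
print("permutation p-value: %.3f" % p)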

>    > > Since my MVPA is run on each subject, the final P
>    > > value will be determined by performing a t test (against 50%) or
>    > > non-parametric test (e.g. Wilcoxon signed rank test) on prediction
>    > > rates across all subjects. Please let me know if there is anything
>    > > incorrect.
>    > Everything sounds correct in terms of my knowledge of currently  accepted
>    > practices ;-)
>    Thanks for your confirmation.

clarification:  I said "accepted practices" without disclosing my
personal view, which is that a t-test here would be inappropriate and
possibly detrimental... Aspects I have in mind:

*  imagine that there is a confound in the design (condition order
   effect) or in the data (design-correlated motion) leading to
   performance that is only slightly, but consistently across subjects,
   above chance -- so you might end up with a "significant" 0.51 accuracy
   across subjects.  A t-test would assign "significance" without taking
   the "effect size" into account.
   
*  strictly speaking, the "mean classification accuracy" per subject,
   taken across subjects, is not generally a set of means of samples
   drawn from the same distribution of "subject performances" -- each is
   just a single sample from some such distribution (which is bound to
   lie in the [0, 1.0] range), so there is no guarantee that the values
   follow a normal distribution (test for normality before testing for
   significance?)

   Depending on the study/design/processing/confounds you might get quite
   a wide distribution of subject performances (classification accuracies).
   E.g. consider having obtained classification accuracies of
   0.6, 0.7, 0.8, 0.9, 1.0 -- all "significant" on their own, but
   probably not so significant if you do a t-test against the chance
   performance of 0.5 ;)  So, in this case you might take the result as
   'non-significant' even though there is clearly good performance (see
   the quick check sketched below, after the N.B.).

  N.B. Some readers of this list might even suggest using some inferior
  classifier which would not provide as wide a range of good performances,
  with the aim of keeping the results all "lowerish" to gain some
  'smoothness' and make t-testing more comforting ;)
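
For the hypothetical accuracies above, here is how one might run the
two tests (plus a normality check) with scipy.stats -- nothing
PyMVPA-specific, just standard calls on made-up numbers:

import numpy as np
from scipy import stats

# hypothetical per-subject accuracies from the example above
accs = np.array([0.6, 0.7, 0.8, 0.9, 1.0])

# parametric: one-sample t-test against the 0.5 chance level
t, p_t = stats.ttest_1samp(accs, 0.5)

# non-parametric alternative: Wilcoxon signed-rank test on accs - 0.5
w, p_w = stats.wilcoxon(accs - 0.5)

# normality check before trusting the t-test at all
# (with 5 subjects it has next to no power, which is part of the problem)
sw, p_norm = stats.shapiro(accs)

print("t-test p=%.3f  Wilcoxon p=%.3f  Shapiro-Wilk p=%.3f"
      % (p_t, p_w, p_norm))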

-- 
=------------------------------------------------------------------=
Keep in touch                                     www.onerussian.com
Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic


