[pymvpa] Cross validation and permutation test

Meng Liang meng.liang at hotmail.co.uk
Sat Mar 12 18:35:38 UTC 2011


Dear Yaroslav,
Thanks very much for your help! I wasn't sure whether I could use MCNullDist with CrossValidatedTransferError in the same way as with TransferError. Now I can make the code much simpler and run it much more quickly!

PS: I think the examples in my data should be well independent. It is a slow event-related design: the inter-stimulus interval is at least 9 seconds, and only one fMRI volume (acquired about 9 s after each stimulus) per stimulus/trial was fed into the classifier, i.e. only one example per stimulus. My purpose in performing a permutation test here is to see what the null distribution looks like, i.e. whether it is centred at 50%, to confirm that no unknown technical problem shifts the chance level away from 50%. Since my MVPA is run on each subject separately, the final p value will be determined by performing a t test (against 50%) or a non-parametric test (e.g. the Wilcoxon signed-rank test) on the prediction rates across all subjects. Please let me know if anything is incorrect.
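For the group-level step described above, a minimal sketch using SciPy (the per-subject accuracy values are hypothetical, purely for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical per-subject prediction accuracies (one value per subject)
accuracies = np.array([0.58, 0.62, 0.55, 0.60, 0.53, 0.57, 0.64, 0.51])

# One-sample t test of the group mean accuracy against chance level (50%)
t_stat, p_t = stats.ttest_1samp(accuracies, 0.5)

# Non-parametric alternative: Wilcoxon signed-rank test on accuracy - 0.5
w_stat, p_w = stats.wilcoxon(accuracies - 0.5)

print(p_t, p_w)
```

Both tests ask the same question at the group level; the Wilcoxon variant drops the normality assumption at the cost of some power.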
Best,
Meng

> Date: Fri, 11 Mar 2011 22:29:52 -0500
> From: debian at onerussian.com
> To: pkg-exppsy-pymvpa at lists.alioth.debian.org
> Subject: Re: [pymvpa] Cross validation and permutation test
> 
> >    Thanks, Yaroslav, I've had a glance at the tutorial for 0.6 version and
> >    it looks like I do have to learn twice... Is there a PDF file of the
> >    tutorial? - it would be easier to read and search for specific
> >    commands.
> 
> ha -- we just thought no one would be interested in PDF any more, so we
> haven't fixed the issues that emerged while building it... ok -- demand is here,
> we will fix it up and let you know
> 
> >    For performing permutation test for cross validation on 0.4 version, if
> >    there isn't an easy way to do the permutation directly
> >    using CrossValidatedTransferError
> 
> there is -- but then not per split but on the average error at once:
> CrossValidatedTransferError is just another Measure, so it can take
> null_dist as an argument the same way you did for TransferError itself:
> 
>         # Let's do the same for CVTE
>         cvte = CrossValidatedTransferError(
>             TransferError(clf=l_clf),
>             OddEvenSplitter(),
>             null_dist=MCNullDist(permutations=num_perm,
>                                  tail='left',
>                                  enable_states=['dist_samples']))
>         cv_err = cvte(train)
> 
> NB this is a snippet from our unittests... for some reason we haven't included
> something like that in the examples, e.g.
> 
>  http://v04.pymvpa.org/examples/permutation_test.html
> 
> but there are issues with such testing if your samples might not be fully independent...
> 
> in 0.6 we are going to make it more flexible to allow more adequate permutation
> testing -- starting with permuting only within the training split, and then,
> if there is a belief that even that is not enough, permuting the chunk labels
> at once instead of permuting independent samples -- that would allow for
> possibly more conservative, but more trustworthy, testing.
> 
> watch out for announcements on updates to
> http://www.pymvpa.org/tutorial_significance.html
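The chunk-wise permutation idea mentioned above can be sketched in plain NumPy, independently of the PyMVPA API. The labels and chunks here are hypothetical, and for simplicity each chunk carries a single label; the point is that labels are shuffled as whole chunks, never per sample:

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical dataset: 12 samples in 4 chunks, one label per chunk
chunks = np.repeat([0, 1, 2, 3], 3)   # chunk id for each sample
labels = np.repeat([0, 1, 0, 1], 3)   # original label for each sample

def permute_chunkwise(labels, chunks, rng):
    """Shuffle labels at the chunk level: all samples within a chunk
    keep sharing the same (possibly reassigned) label."""
    uniq = np.unique(chunks)
    # one representative label per chunk
    chunk_labels = np.array([labels[chunks == c][0] for c in uniq])
    permuted = rng.permutation(chunk_labels)
    new_labels = labels.copy()
    for c, l in zip(uniq, permuted):
        new_labels[chunks == c] = l
    return new_labels

new = permute_chunkwise(labels, chunks, rng)
print(new)
```

Because whole chunks swap labels, temporal dependence within a chunk cannot leak into the null distribution, which is what makes this scheme more conservative than permuting samples independently.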
> 
> >, is the following code (using TransferError) correct?
> 
> >    nd = MCNullDist(permutations=2500, tail='left',
> >                    enable_states=['dist_samples'])
> >    terr = TransferError(clf, null_dist=nd)
> >    cv_index = 0
> >    splitter = NFoldSplitter(cvtype=1)
> >    for wdata, vdata in splitter(dataset):
> >        err = terr(vdata, wdata)
> >        Error[SubjIndex, cv_index] = err
> >        Pvalue[SubjIndex, cv_index] = terr.null_prob
> >        Distribution_normalizedMeanSD = nd.dist_samples
> >        nd.clean()
> >        cv_index += 1
> 
> >    #===========
> 
> >    In this way, I can get the null distribution for each split (i.e.
> >    dist_samples). Since the prediction error of the cross-validation is the
> >    average of the prediction errors across all splits, I suppose I will need
> >    to generate another null distribution for the mean prediction error by
> >    sampling from the null distributions of all splits, calculating the mean
> >    of the new samples, and repeating the procedure, e.g., 1000 times. Is
> >    that correct?
> 
> looks and sounds correct. But once again, if you are not really
> interested in per-split assessment, just use the construct above ;-)
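The resampling scheme described in the quoted question can be sketched in plain NumPy. The per-split null arrays are simulated here around a chance error of 0.5; in practice each row would come from that split's nd.dist_samples:

```python
import numpy as np

rng = np.random.RandomState(0)

n_splits, n_perm = 6, 2500
# Stand-in for the per-split null distributions (rows = splits)
split_nulls = rng.normal(0.5, 0.05, size=(n_splits, n_perm))

# Null distribution of the *mean* error: on each of 1000 iterations,
# draw one sample from every split's null and average across splits
n_iter = 1000
cols = rng.randint(0, n_perm, size=(n_splits, n_iter))
draws = split_nulls[np.arange(n_splits)[:, None], cols]
mean_null = draws.mean(axis=0)

# Left-tail p value for an observed mean error (lower error = better)
observed_mean_error = 0.35
p = (mean_null <= observed_mean_error).mean()
```

This mirrors the "sample one value per split, average, repeat" procedure; the left tail matches the tail='left' choice used for the per-split MCNullDist.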
> 
> -- 
> =------------------------------------------------------------------=
> Keep in touch                                     www.onerussian.com
> Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic
> 
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa