[pymvpa] Cross validation and permutation test
Yaroslav Halchenko
debian at onerussian.com
Sat Mar 12 03:29:52 UTC 2011
> Thanks, Yaroslav, I've had a glance at the tutorial for 0.6 version and
> it looks like I do have to learn twice... Is there a PDF file of the
> tutorial? - it would be easier to read and search for specific
> commands.
ha -- we just thought no one would be interested in a PDF any more, so we
haven't fixed the issues that emerged while building it... ok -- the demand is
here, we will fix it up and let you know
> For performing permutation test for cross validation on 0.4 version, if
> there isn't an easy way to do the permutation directly
> using CrossValidatedTransferError
there is -- but then not per split, only on the average error across splits at
once: CrossValidatedTransferError is just yet another Measure, so it can take
null_dist as an argument, the same way you did for TransferError itself:
# Let's do the same for CVTE
cvte = CrossValidatedTransferError(
    TransferError(clf=l_clf),
    OddEvenSplitter(),
    null_dist=MCNullDist(permutations=num_perm,
                         tail='left',
                         enable_states=['dist_samples']))
cv_err = cvte(train)
NB this is a snippet from our unittests... for some reason we haven't included
something like that in the examples, e.g.
http://v04.pymvpa.org/examples/permutation_test.html
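once it has run, pulling the results out should work the same way as for
TransferError in your code below -- an untested sketch, assuming the usual
0.4 state-variable access and that you keep a reference to the MCNullDist
instance (nd) instead of constructing it inline:

nd = MCNullDist(permutations=num_perm, tail='left',
                enable_states=['dist_samples'])
cvte = CrossValidatedTransferError(TransferError(clf=l_clf),
                                   OddEvenSplitter(), null_dist=nd)
cv_err = cvte(train)
p = cvte.null_prob         # p-value of the mean CV error under the null
samples = nd.dist_samples  # the num_perm errors forming the null distribution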
but there are issues with such testing if your samples might not be fully
independent... in 0.6 we are going to make it more flexible to allow more
adequate permutation testing -- starting with permuting only within the
training split, and then, if there is a belief that even that is not enough,
permuting the labels of whole chunks at once instead of independent samples --
that would allow for possibly more conservative, but more trustworthy testing.
watch out for announcements on the updates of
http://www.pymvpa.org/tutorial_significance.html
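meanwhile, the first of those variants (permuting labels only within the
training split) can be approximated by hand in 0.4 -- a rough, untested
sketch, assuming Dataset.permuteLabels(status, perchunk=...) behaves as
described (True permutes, False restores the original labels):

import numpy as N

terr = TransferError(clf)            # plain transfer error, no null_dist here
splitter = NFoldSplitter(cvtype=1)
null_cv_errors = []
for i in xrange(num_perm):
    errs = []
    for wdata, vdata in splitter(dataset):
        wdata.permuteLabels(True, perchunk=True)  # shuffle only training labels
        errs.append(terr(vdata, wdata))
        wdata.permuteLabels(False)                # restore original labels
    null_cv_errors.append(N.mean(errs))

the fraction of null_cv_errors at or below the observed mean error would then
give a left-tail p-value.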
>, is the following code (using TransferError) correct?
> nd = MCNullDist(permutations=2500, tail='left',
>                 enable_states=['dist_samples'])
> terr = TransferError(clf, null_dist=nd)
> cv_index = 0
> splitter = NFoldSplitter(cvtype=1)
> for wdata, vdata in splitter(dataset):
>     err = terr(vdata, wdata)
>     Error[SubjIndex, cv_index] = err
>     Pvalue[SubjIndex, cv_index] = terr.null_prob
>     Distribution_normalizedMeanSD = nd.dist_samples
>     nd.clean()
>     cv_index += 1
> #===========
> In this way, I can get the null distribution for each split (i.e.
> dist_samples). The prediction error of the cross-validation is the
> average of the prediction errors of all splits, so I suppose I will need
> to generate another null distribution for the mean prediction error by
> sampling from these null distributions of all splits and then
> calculating the mean of the new samples, repeating the procedure for,
> e.g., 1000 times? Is it correct?
looks and sounds correct. But once again, if you are not really
interested in per-split assessment, just use the construct above ;-)
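fwiw, that resampling could look like the following (a sketch; dists would be
your nsplits x nperm array stacking the per-split dist_samples, and
observed_mean_error the actual mean CV error -- both names are mine, not
pymvpa's):

import numpy as N

nboot = 1000
nsplits, nperm = dists.shape
null_means = N.empty(nboot)
for b in xrange(nboot):
    # draw one permutation error per split, then average across splits
    picks = N.random.randint(0, nperm, size=nsplits)
    null_means[b] = N.mean(dists[N.arange(nsplits), picks])
# left tail: smaller error is better, so count null means <= observed
p = N.mean(null_means <= observed_mean_error)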
--
=------------------------------------------------------------------=
Keep in touch www.onerussian.com
Yaroslav Halchenko www.ohloh.net/accounts/yarikoptic