[pymvpa] Cross validation and permutation test
Yaroslav Halchenko
debian at onerussian.com
Sat Mar 12 03:29:52 UTC 2011
> Thanks, Yaroslav, I've had a glance at the tutorial for 0.6 version and
> it looks like I do have to learn twice... Is there a PDF file of the
> tutorial? - it would be easier to read and search for specific
> commands.
ha -- we just thought no one would be interested in a PDF any more, so we
haven't fixed the issues that emerged while building it... ok -- the demand is
here, we will fix it up and let you know
> For performing permutation test for cross validation on 0.4 version, if
> there isn't an easy way to do the permutation directly
> using CrossValidatedTransferError
there is -- but then not per split, only on the average error across splits at
once: CrossValidatedTransferError is just yet another Measure, so it can take
null_dist as an argument, the same way you did for TransferError itself:
# Let's do the same for CVTE
cvte = CrossValidatedTransferError(
    TransferError(clf=l_clf),
    OddEvenSplitter(),
    null_dist=MCNullDist(permutations=num_perm,
                         tail='left',
                         enable_states=['dist_samples']))
cv_err = cvte(train)
NB this is a snippet from our unittests... for some reason we haven't included
something like that in the examples, e.g.
http://v04.pymvpa.org/examples/permutation_test.html
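once it has run, pulling the results out should work the same way as for
TransferError in your code below -- an untested sketch, assuming the usual
0.4 state-variable access and that you keep a reference to the MCNullDist
instance (nd) instead of constructing it inline:

nd = MCNullDist(permutations=num_perm, tail='left',
                enable_states=['dist_samples'])
cvte = CrossValidatedTransferError(TransferError(clf=l_clf),
                                   OddEvenSplitter(), null_dist=nd)
cv_err = cvte(train)
p = cvte.null_prob         # p-value of the mean CV error under the null
samples = nd.dist_samples  # the num_perm errors forming the null distribution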
but there are issues with such testing if your samples might not be fully
independent... in 0.6 we are going to make it more flexible to allow more
adequate permutation testing -- starting with permuting only within the
training split, and then, if there is a belief that even that is not enough,
permuting the labels of whole chunks at once instead of independent samples --
that would allow for possibly more conservative, but more trustworthy testing.
watch out for announcements on the updates of
http://www.pymvpa.org/tutorial_significance.html
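meanwhile, the first of those variants (permuting labels only within the
training split) can be approximated by hand in 0.4 -- a rough, untested
sketch, assuming Dataset.permuteLabels(status, perchunk=...) behaves as
described (True permutes, False restores the original labels):

import numpy as N

terr = TransferError(clf)            # plain transfer error, no null_dist here
splitter = NFoldSplitter(cvtype=1)
null_cv_errors = []
for i in xrange(num_perm):
    errs = []
    for wdata, vdata in splitter(dataset):
        wdata.permuteLabels(True, perchunk=True)  # shuffle only training labels
        errs.append(terr(vdata, wdata))
        wdata.permuteLabels(False)                # restore original labels
    null_cv_errors.append(N.mean(errs))

the fraction of null_cv_errors at or below the observed mean error would then
give a left-tail p-value.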
>, is the following code (using TransferError) correct?
> nd = MCNullDist(permutations=2500, tail='left',
>                 enable_states=['dist_samples'])
> terr = TransferError(clf, null_dist=nd)
> cv_index = 0
> splitter = NFoldSplitter(cvtype=1)
> for wdata, vdata in splitter(dataset):
>     err = terr(vdata, wdata)
>     Error[SubjIndex, cv_index] = err
>     Pvalue[SubjIndex, cv_index] = terr.null_prob
>     Distribution_normalizedMeanSD = nd.dist_samples
>     nd.clean()
>     cv_index += 1
> #===========
> In this way, I can get the null distribution for each split (i.e.
> dist_samples). The prediction error of the cross-validation is the
> average of the prediction errors of all splits, so I suppose I will need
> to generate another null distribution for the mean prediction error by
> sampling from these null distributions of all splits and then
> calculating the mean of the new samples, repeating the procedure for,
> e.g., 1000 times? Is it correct?
looks and sounds correct. But once again, if you are not really
interested in per-split assessment, just use the construct above ;-)
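fwiw, that resampling could look like the following (a sketch; dists would be
your nsplits x nperm array stacking the per-split dist_samples, and
observed_mean_error the actual mean CV error -- both names are mine, not
pymvpa's):

import numpy as N

nboot = 1000
nsplits, nperm = dists.shape
null_means = N.empty(nboot)
for b in xrange(nboot):
    # draw one permutation error per split, then average across splits
    picks = N.random.randint(0, nperm, size=nsplits)
    null_means[b] = N.mean(dists[N.arange(nsplits), picks])
# left tail: smaller error is better, so count null means <= observed
p = N.mean(null_means <= observed_mean_error)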
--
=------------------------------------------------------------------=
Keep in touch www.onerussian.com
Yaroslav Halchenko www.ohloh.net/accounts/yarikoptic