[pymvpa] RFE & Permutation

Yaroslav Halchenko debian at onerussian.com
Fri Jan 29 14:38:28 UTC 2010


btw -- a few hints.

If you have some assumptions about the chance distribution (e.g. that
you indeed have independent samples in testing, etc.), then you could
revert to parametric testing.  E.g., if I think that it should be close
to a binomial distribution, then with a sufficient number of trials (as
in your case) it is relatively well approximated by a normal.  Then,
instead of using the default Nonparametric distribution estimator in
MCNullDist, you can use something like

import scipy.stats
from mvpa.clfs.stats import MCNullDist

null_dist = MCNullDist(scipy.stats.norm, permutations=100, tail='left')

That would fit a normal distribution to the data from 100 permutations
and assess the p-value from it.
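
E.g., plugged into the full pipeline it would look something like this
(just a sketch assuming the 0.4-style API; NFoldSplitter is only an
example splitter, clf and dataset stand for your classifier and labeled
dataset, and 'null_prob' is the state carrying the resulting p-value in
that release):

import scipy.stats
from mvpa.suite import CrossValidatedTransferError, TransferError, \
     NFoldSplitter, MCNullDist

null_dist = MCNullDist(scipy.stats.norm, permutations=100, tail='left')
cv = CrossValidatedTransferError(
    TransferError(clf),             # clf -- your classifier
    NFoldSplitter(),                # or whatever splitter you use
    null_dist=null_dist)
error = cv(dataset)                 # cross-validation + 100 permutations
print cv.null_prob                  # p-value from the fitted normal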

NB: The normal approximates the binomial quite well for a reasonable
number of trials.  The call above, though, doesn't apply the continuity
correction (http://en.wikipedia.org/wiki/Continuity_correction) yet,
but that effect is negligible for a reasonable sample size.
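
For the record, the correction just shifts the cutoff by half a trial
before plugging it into the normal CDF; a toy comparison with plain
scipy (the n/k numbers here are made up):

import scipy.stats

n, p = 100, 0.5                     # 100 trials at chance level 0.5
k = 40                              # observed number of errors
exact = scipy.stats.binom.cdf(k, n, p)
mu, sd = n * p, (n * p * (1 - p)) ** 0.5
plain = scipy.stats.norm.cdf(k, mu, sd)            # no correction
corrected = scipy.stats.norm.cdf(k + 0.5, mu, sd)  # closer to 'exact'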

Moreover, let's say I know that mean performance by chance should be
0.5; then I can help the fit by fixing the mean at that value
(unfortunately, for that you would need to use maint/0.4, yoh/0.4, or
yoh/master with the fix I submitted yesterday):

from mvpa.clfs.stats import rv_semifrozen

null_dist = MCNullDist(rv_semifrozen(scipy.stats.norm, loc=0.5),
                       permutations=100, tail='left')
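
To spell out what fixing the mean buys: with loc pinned at 0.5, only
the scale remains to be estimated, and its ML estimate uses deviations
from 0.5 rather than from the (possibly biased) sample mean.  Roughly
this (dist_samples here is just a stand-in for the errors your
permutations produced):

import numpy as np

samples = np.asarray(dist_samples)  # errors from the permutations
loc = 0.5                           # mean pinned at chance level
scale = np.sqrt(np.mean((samples - loc) ** 2))  # ML estimate of sigma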

The advantage of such parametric tests is exactly in your case -- very
low p-values, where you simply don't have enough resolution from
non-parametric testing (to get any p-value as low as 10^(-x) you would
need to run 10^x permutations); e.g., in your case you simply can't get
a p-value below 0.001, since you are doing only 1000 permutations.
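
A toy demonstration of that resolution limit (fake numbers, plain
scipy/numpy):

import numpy as np
import scipy.stats

null = np.random.normal(0.5, 0.05, 1000)   # 1000 permutation errors
observed = 0.2                             # a very low observed error
p_nonparam = np.mean(null <= observed)     # 0.0 -- can't go below 1/1000
loc, scale = scipy.stats.norm.fit(null)
p_param = scipy.stats.norm.cdf(observed, loc, scale)  # ~1e-9, resolvable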

On the other hand, parametric testing approximates the non-parametric
results well even when the tested value (the error) lies in the heavy
part of the distribution.

I hope this is of some value ;)

On Wed, 27 Jan 2010, Yaroslav Halchenko wrote:

> could you also enable storing all estimates from MC... i.e.

>         cv = CrossValidatedTransferError(
>             TransferError(clf),
>             splitter,
>             null_dist=MCNullDist(permutations=no_permutations,
>                                  tail='left',
>                                  enable_states=['dist_samples']),
>             enable_states=['confusion'])

> weird enough -- either I am not thinking straight or something is strange --
> the chance distribution after permutation on our test data is indeed quite
> biased towards high values (which are errors), although I would expect it to
> be centered at chance (i.e. 0.5, since I did binary classification).

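(Following up on my own question above: with 'dist_samples' enabled as
in that snippet, the raw per-permutation estimates can be pulled out
after the cross-validation has run, to eyeball where the chance
distribution actually sits.  A sketch -- I'm assuming here that the
null distribution object stays reachable as the cv's null_dist
attribute, with states exposed as plain attributes as in 0.4:)

import numpy as np

error = cv(dataset)                 # dataset -- your labeled dataset
samples = np.asarray(cv.null_dist.dist_samples)
print samples.mean(), samples.std() # centered at 0.5, or biased high?
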
-- 
                                  .-.
=------------------------------   /v\  ----------------------------=
Keep in touch                    // \\     (yoh@|www.)onerussian.com
Yaroslav Halchenko              /(   )\               ICQ#: 60653192
                   Linux User    ^^-^^    [175555]
