[pymvpa] multiple comparison in classification: theoretical question

Francisco Pereira francisco.pereira at gmail.com
Tue Apr 29 21:20:29 UTC 2014


Hi Vadim,

If you try one analysis -- let's say classification -- and use
cross-validation, you get one result and an associated p-value under a
particular null hypothesis. Your result might have occurred by chance, and
the p-value tells you the probability of obtaining a result at least as
extreme as yours if the null were true. This is exactly the situation you
would have without cross-validation; using it only changes how the p-value
is computed.
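
For concreteness, here is a minimal sketch of what such a p-value looks
like when computed by permutation testing under cross-validation. I am
using scikit-learn and synthetic data purely for brevity (PyMVPA's
MCNullDist does the analogous thing); nothing here is specific to your
dataset:

    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import permutation_test_score

    # Synthetic stand-in data: 100 samples, 50 features, random labels,
    # so the true accuracy is at chance.
    rng = np.random.RandomState(0)
    X = rng.randn(100, 50)
    y = rng.randint(0, 2, 100)

    # One analysis, one result, one p-value: the labels are shuffled
    # many times and the cross-validated accuracy recomputed under each
    # shuffle, giving an empirical null distribution for the observed
    # accuracy.
    score, perm_scores, pvalue = permutation_test_score(
        LinearSVC(), X, y, cv=5, n_permutations=1000, random_state=0)

    print("accuracy = %.3f, p = %.4f" % (score, pvalue))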

If you were to run the analysis multiple times, or with multiple options,
you would have a set of results instead of a single one, and your concern
would be entirely valid. That is where you would need a multiple comparison
correction and, possibly, to report all the results (the correction can
take into account the fact that they are likely correlated).
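
A rough sketch of the arithmetic behind your 19-attempts worry, plus the
simplest correction (Bonferroni). This assumes independent tests -- in
practice yours would be correlated, which makes the uncorrected inflation
smaller but the Bonferroni bound still valid; the p-values below are
hypothetical:

    import numpy as np

    # With 20 independent tests at alpha = 0.05, the chance of at least
    # one false positive is 1 - 0.95**20, roughly 0.64.
    print(1 - 0.95 ** 20)

    # Bonferroni: compare each p-value against alpha / m. Conservative,
    # but valid under arbitrary dependence between the tests.
    pvals = np.array([0.004, 0.03, 0.05, 0.20])  # hypothetical results
    alpha = 0.05
    m = len(pvals)
    print(pvals < alpha / m)  # only p = 0.004 survives correction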

A separate issue is how to use cross-validation to get around the problem
of trying multiple analysis options (e.g. regularization parameter
settings). The idea there is to run a nested cross-validation within each
training set to evaluate those options, pick one setting, and then use that
setting to analyse the test set (repeating this for every outer
cross-validation fold).
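
A minimal sketch of that nested scheme, again with scikit-learn and
synthetic data for brevity (the candidate C values are just examples):

    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import GridSearchCV, cross_val_score

    rng = np.random.RandomState(0)
    X = rng.randn(100, 50)
    y = rng.randint(0, 2, 100)

    # Inner loop: on each training set, try the candidate regularization
    # settings and pick the best via its own cross-validation.
    inner = GridSearchCV(LinearSVC(),
                         param_grid={'C': [0.01, 0.1, 1, 10]},
                         cv=5)

    # Outer loop: evaluate the whole selection procedure on held-out
    # folds it never saw, so the reported score is not biased by the
    # parameter search.
    scores = cross_val_score(inner, X, y, cv=5)
    print(scores.mean())

The key point is that the parameter search happens entirely inside the
training portion of each outer fold, so the outer test sets never influence
which setting gets picked.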

cheers,
Francisco


On Tue, Apr 29, 2014 at 5:06 PM, Vadim Axel <axel.vadim at gmail.com> wrote:

> Hi guys,
>
> For a given dataset, in a statistical analysis where all the data are
> analyzed together (no cross-validation), if I change some analysis
> parameter and rerun the analysis, I should (at least in theory) lower my
> significance threshold. In other words, if I get a significant result at
> p=0.05 after having already tried 19 different analysis options, that
> result might be purely due to chance. My question: if the data are tested
> with cross-validation (as in pattern classification), does that mean I can
> try a million different options and be fine? Intuitively, it still seems
> to me that parameters can be fitted to the data even with
> cross-validation, so the result would be biased, though probably less so
> than without cross-validation.
>
> What do you think?
>
> Thanks!
> Vadim
>
>
>
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
>

