[pymvpa] multiple comparison in classification: theoretical question

Yaroslav Halchenko debian at onerussian.com
Mon May 5 14:59:19 UTC 2014


On Mon, 05 May 2014, Emanuele Olivetti wrote:
> >>    A separate issue is how to use cross-validation to get around the issue of
> >>    trying multiple analyses options (e.g. regularization parameter settings).
> >>    The idea there is running nested cross-validations within the training set
> >>    to test those analysis options, pick one setting and then use that setting
> >>    to analyse the test set (and repeat this for every cross-validation fold).
> >+1  but nested cross-validation with model selection might become too
> >'flexible' and thus again overfit your data.  So I guess it would all
> >depend on how big the space of explored models is.

> I mildly disagree. If your goal is to find the best model, then - in the
> limit, given the ranking you can get with nested cross-validation and
> taking into account the variance of that estimate (and thus ties) - I
> would not expect overfitting, whatever the size of the space of models.
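
For concreteness, a minimal sketch of that nested scheme (using scikit-learn
rather than PyMVPA, on made-up placeholder data): an inner cross-validation
on each training set picks the regularization parameter, and the outer loop
estimates generalization with that choice.

import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
X, y = rng.randn(80, 50), rng.randint(0, 2, 80)   # placeholder noise data

# inner CV: pick C within each outer training set
inner = GridSearchCV(LinearSVC(), param_grid={'C': [0.01, 0.1, 1, 10]}, cv=5)
# outer CV: generalization estimate for the CV-tuned model
outer_scores = cross_val_score(inner, X, y, cv=5)
print(outer_scores.mean())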

My comment was primarily on the fact (which you also confirmed) that by
exploring many models and choosing the 'best' one you would just pick the
one in the tail of the distribution, and it might actually be a quite
inferior model which simply happened to provide the best estimate on the
training data.  The more models you explore, the further into the tail you
might go.  With an infinite number of models to explore, you can overfit
heavily.
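
To illustrate that 'tail' effect -- just a toy sketch with scikit-learn, not
PyMVPA, on purely random data -- the expected cross-validated accuracy of any
single model stays near chance, but the maximum across many candidate models
is optimistically biased, and more so the more candidates you try:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
X = rng.randn(60, 20)                  # 60 'samples', 20 pure-noise features
y = rng.randint(0, 2, 60)              # random labels -> no real signal

# candidate 'models': one classifier family across many hyperparameter settings
candidates = [KNeighborsClassifier(n_neighbors=k) for k in range(1, 31)]
scores = [cross_val_score(clf, X, y, cv=5).mean() for clf in candidates]

print("chance level:      0.50")
print("mean over models:  %.2f" % np.mean(scores))   # stays near chance
print("best single model: %.2f" % np.max(scores))    # optimistically biased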

Indeed, taking variance (not just accuracy) into account when choosing among
the models might provide an improvement (if not an ultimate remedy)
in practice... need to try/explore (thanks for the reference) ;)
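
One simple variant of that (just my sketch, not something taken from the
reference): treat all candidates within one standard error of the best as
ties, and break the tie towards the earliest (presumably simplest) candidate:

import numpy as np

def pick_with_ties(mean_scores, se_scores):
    # mean_scores, se_scores: per-candidate CV mean and standard error,
    # with candidates assumed ordered from simplest to most complex
    mean_scores = np.asarray(mean_scores)
    se_scores = np.asarray(se_scores)
    best = np.argmax(mean_scores)
    threshold = mean_scores[best] - se_scores[best]
    tied = np.flatnonzero(mean_scores >= threshold)
    return tied[0]     # prefer the first (simplest) tied candidate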

> >another idea which I have exercised once but haven't pushed forward yet, so
> >I would not mind collaborating with someone interested:  derive statistics
> >on the 'mean' (could be weighted/sum) generalization performance(s)
> >across different estimators.  In an ongoing study I got literally sick
> >of trying to make sense of different models (varying classification
> >schemes/feature extractions), which is why I did statistics across
> >subjects on the 'mean' among those per-subject metrics: here is the
> >poster presenting those results:
> >http://haxbylab.dartmouth.edu/publications/HGG+12_sfn12_famfaces.png
> >There were obvious 'cons' in how I did it, but I think it can serve as
> >the base for doing it 'right'.

> >By relying on the average performance across different metrics I hoped to
> >reduce the noise (false positives) of any single estimate, while bringing
> >out results which might be weak but are consistently detected by various
> >models.

> I guess that in practice this is acceptable. But in principle
> you are putting different things into the averaging process. 

;-) that is what I meant with "cons"...

> What qualifies an algorithm, e.g. a feature extraction/selection or
> classification algorithm or anything else, to enter the set you are
> considering and averaging over?  Would a random feature selection
> algorithm or a random classifier enter your set of different models?
> If not, why? :)

I would say that any reasonably considered algorithm could/should
enter... the tricky part here would be how to do the "averaging"
adequately -- e.g. some normalization of the metric values before
averaging.  For significance assessment, MC (Monte Carlo) permutation
testing would be carried out as usual anyway.
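
A rough sketch of one way I could imagine doing it (illustrative only, not an
existing PyMVPA helper): express each model's score relative to its own
permutation null, average those normalized scores, and take the p-value of
the average from the same (jointly permuted) null:

import numpy as np

def normalized_average(observed, null):
    # observed: (n_models,) scores; null: (n_models, n_perms) scores obtained
    # by recomputing every model under the same label permutations
    observed, null = np.asarray(observed), np.asarray(null)
    mu, sd = null.mean(axis=1), null.std(axis=1)
    z_obs = (observed - mu) / sd                  # per-model normalized score
    z_null = (null - mu[:, None]) / sd[:, None]   # same normalization of the null
    avg_obs = z_obs.mean()
    avg_null = z_null.mean(axis=0)                # null distribution of the average
    p = (np.sum(avg_null >= avg_obs) + 1.0) / (avg_null.size + 1.0)
    return avg_obs, p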

-- 
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Research Scientist,            Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        


