[pymvpa] cross-validation

Yaroslav Halchenko debian at onerussian.com
Tue Mar 16 13:43:07 UTC 2010


On Tue, 16 Mar 2010, Emanuele Olivetti wrote:
> dataset coming from the same distribution. In case of SVMs (and some
> others) and leave-one-out this estimator is proved to be almost
> unbiased [0]. Yarik, you mentioned issues with bias in this case, can
> you send me some references on that? I am eagerly collecting
> information on the topic.
yes -- I did mention bias... and yes there is one, like with any other estimate
;) In general LOO is indeed known to provide a small (and small is
generally != 0) bias and often a large variance of the estimate (as you pointed
out below).  But let's look into the 'fine print' of the
theorem/proposition you've mentioned:

,---
| 1. The term "almost" refers to the fact that the leave-one-out error
| provides an estimate for training on sets of size m-1 rather than
| m, cf. Proposition 7.4.
`---
So, like any other estimator, LOO does have its bias, which is known to be quite
small... and this note points to what is probably the critical
point on small data set sizes: it is an estimate of the error
for training on m-1 samples, not on m... let's keep that in mind for now.
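
Just to make explicit what is being estimated -- a minimal sketch of the LOO loop
(assuming numpy and scikit-learn are available; the toy data sizes and the linear
SVM are my own choices, not anything from the book):

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
m, nfeatures = 20, 50
X = rng.randn(m, nfeatures)
y = np.repeat([0, 1], m // 2)

errors = []
for i in range(m):
    train = np.setdiff1d(np.arange(m), [i])
    clf = SVC(kernel='linear', C=1.0).fit(X[train], y[train])
    # each term is the error of a classifier trained on m-1 samples,
    # not on the full m -- that is where the "almost" comes from
    errors.append(clf.predict(X[[i]])[0] != y[i])

print("LOO error estimate:", np.mean(errors))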

and now the proposition itself:

,---
| Proposition 7.4 The expectation of the number of Support Vectors obtained
| during training on a training set of size m, divided by m, is an upper bound
| on the expected probability of test error of the SVM trained on training sets
| of size m-1.
`---
and that is what makes SVMs great -- a decision boundary derived from relatively
few samples out of the whole large population is known to provide
good generalization performance.  But then what happens if, once again, we have
a small sample size and a large feature space?  We get lots of SVs (in my
experience you can easily end up with all of your samples as SVs, even with a
relatively hard margin, i.e. high C), and the theoretical bound becomes wider; hence
"almost" may lose its meaning of "close to none", because none of the estimates
remain reliable (variance is high).
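
A quick way to see how loose the bound gets in that regime -- a sketch with
made-up sizes (scikit-learn again, nothing specific to PyMVPA):

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.RandomState(1)
m, nfeatures = 16, 1000              # few samples, many features
X = rng.randn(m, nfeatures)
y = np.repeat([0, 1], m // 2)

clf = SVC(kernel='linear', C=100.0)  # "hard" margin via a large C
clf.fit(X, y)
sv_fraction = len(clf.support_) / float(m)   # the bound from Proposition 7.4

loo_error = 1 - cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
print("fraction of SVs (the bound):", sv_fraction)   # typically 1.0 here
print("LOO error estimate:", loo_error)

once every sample is an SV, the bound only tells you that the expected error
is <= 1, which is not much of a guarantee.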

going back to note 1., it becomes very important if you have a small sample
size which is balanced across classes, and you then assess performance after
training on m-1 samples, where the training set becomes unbalanced:

> In the leave-one-out case you have just one example to compute the
> error rate on the test set which can be (as Yarik said) 0% or
> 100%. This is a poor estimate of the error rate for that fold, but it
> will be used just to compute the final average, which is then OK.
and you can easily end up with an average of 100% error (I got that quite often with an
SVM when doing this evil thing, i.e. taking just a single sample out) or 0% (never
crafted that one myself, but it is easy to construct if you already have a setup giving
100% error ;) ).  So here comes your LOO bias, with literally no variance,
and it is still perfectly consistent with the bounds, which are now very wide ;)
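
That pattern is easy to reproduce on pure noise -- a sketch (toy sizes of my
choosing; the exact number will vary with the seed, but it tends to land
noticeably below chance):

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.RandomState(42)
m, nfeatures = 20, 100
X = rng.randn(m, nfeatures)          # no structure at all
y = np.repeat([0, 1], m // 2)        # perfectly balanced labels

# leaving one sample out leaves the training set unbalanced against the
# left-out class, so on noise the SVM tends to vote for the other class
acc = cross_val_score(SVC(kernel='linear'), X, y, cv=LeaveOneOut()).mean()
print("LOO accuracy on noise (chance = 0.5):", acc)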

> is not a problem: in cross-validation the variance depends on the
> number/size of the folds so it is quite arbitrary (and of little use
> in most of practical cases) since you are free decide this
> number/size.
well, not exactly in the case of fMRI (read: non-independent samples), where
you are actually quite restricted in what you can take as take-out pieces.  Imagine
a database of handwritten digits (e.g. MNIST) where exactly the same picture
of each digit was present in the data set multiple times: what would happen to
your LOO bias if you didn't take care to keep "the same" pictures together
(in terms of training/testing splits)?  See the sketch below.

and that is imho one of the nuisances of applying any generalization estimate /
mixed effects analysis to fMRI data.
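
To make the duplicated-pictures point concrete -- a sketch (the duplication
scheme and sizes are made up) comparing plain LOO with splits that keep all
copies of a picture on the same side:

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, LeaveOneGroupOut, cross_val_score

rng = np.random.RandomState(0)
n_pictures, n_copies, nfeatures = 10, 4, 50
base = rng.randn(n_pictures, nfeatures)             # unique "pictures", pure noise
X = np.repeat(base, n_copies, axis=0)               # each picture appears n_copies times
y = np.repeat(np.arange(n_pictures) % 2, n_copies)  # arbitrary binary labels
groups = np.repeat(np.arange(n_pictures), n_copies) # which picture a sample came from

clf = SVC(kernel='linear')
# plain LOO: identical copies of the test picture sit in the training set,
# so the estimate looks great even though there is nothing to learn
print("LOO:", cross_val_score(clf, X, y, cv=LeaveOneOut()).mean())
# leave-one-picture-out: copies stay together, estimate drops back to chance
print("leave-one-picture-out:",
      cross_val_score(clf, X, y, cv=LeaveOneGroupOut(), groups=groups).mean())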

> As for the small sample set case I am reading about it and testing
> some code. It seems to lead to high bias under some regimes.
here we go.... to get consistent results, just analyze some nice noise
without structure, so the SVM has no chance to learn anything: the bound gets
wide, and you get your bias.

> But it
> sounds conceivable to me since small samples can't represent properly
> the problem except in very simple cases. Again if you have literature
> on this please let me know.

I never really investigated this topic deeply enough to have good
references that I have digested myself and found worth sharing, sorry ;) I was just
sharing my own experience (and what others (e.g. advisors, teachers) put into
my brain).

but this one might be interesting on a related point (if you don't have it in your toolbelt yet):
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.109.1930&rep=rep1&type=pdf
An Analysis of the Anti-Learning Phenomenon Of the Class Symmetric Polyhedron

>  I can share mine if interested.
sure!
-- 
                                  .-.
=------------------------------   /v\  ----------------------------=
Keep in touch                    // \\     (yoh@|www.)onerussian.com
Yaroslav Halchenko              /(   )\               ICQ#: 60653192
                   Linux User    ^^-^^    [175555]




