[pymvpa] retraining
Yaroslav Halchenko
debian at onerussian.com
Wed May 13 03:41:37 UTC 2009
On Sat, 11 Apr 2009, Scott Gorlin wrote:
> Hmm... I'm not sure I understand your suggestion. Won't the support
> vectors change with each new chunk in cross validation? Or at least the
> coefficients won't be identical. Is there a paper you know of which
> describes this? It seems like training on the whole set will ruin the
> notion of iid distribution of the output error on each fold.
Let me give a little example...
Let's say you have 10 samples (with indexes 0 1 2 3 4 5 6 7 8 9).
You take ALL of them and train an SVM, which selects only (3 4 6) to
be SVs. Now, if you train on any subsample which includes (3 4 6)
plus any of the other samples out of those 10, you will get THE SAME
result, since those additional samples are not SVs (they do not
constrain the solution even when you train on all the samples, so
dropping them changes nothing).
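To make that concrete, here is a minimal sketch -- using scikit-learn's
SVC rather than PyMVPA, with made-up data, just for illustration: the
model retrained on the SVs plus a couple of non-SV samples ends up with
the same decision function as the model trained on everything.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.RandomState(0)
    # two well-separated classes, 5 samples each
    X = np.vstack([rng.randn(5, 2) - 2, rng.randn(5, 2) + 2])
    y = np.array([0] * 5 + [1] * 5)

    full = SVC(kernel='linear', C=1.0).fit(X, y)
    sv = full.support_                    # indexes of the SVs
    non_sv = np.setdiff1d(np.arange(len(y)), sv)

    # keep all SVs plus a couple of non-SV samples: the non-SVs do not
    # constrain the solution, so the retrained model is the same
    keep = np.sort(np.concatenate([sv, non_sv[:2]]))
    sub = SVC(kernel='linear', C=1.0).fit(X[keep], y[keep])

    print(np.allclose(full.decision_function(X),
                      sub.decision_function(X)))   # -> True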
So, fast CV is possible if you first train the SVM on all the samples and
then go through each training/testing split (a rough sketch of this loop
follows below):

- if the testing split contains anything that is known to be an SV when
  trained on all samples, you have to retrain the SVM on the training part
  of that split and estimate results on the testing part;

- if the testing split does not contain any SVs (of the SVM trained on
  all samples) -- then you know that those samples were classified
  correctly ;) (otherwise they would fall inside the margin or even be
  misclassified, and in both cases they would be SVs of the original
  trained-on-all-samples SVM).
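Here is roughly what that loop looks like -- again a sketch in
scikit-learn terms rather than actual PyMVPA or SVMlight code, with
names of my own choosing (fast_cv_accuracy etc.), just to show the
bookkeeping:

    from sklearn.model_selection import LeaveOneOut
    from sklearn.svm import SVC

    def fast_cv_accuracy(X, y, cv=None, C=1.0):
        """CV that retrains only when the testing split contains SVs
        of the SVM trained on all samples."""
        if cv is None:
            cv = LeaveOneOut()
        full = SVC(kernel='linear', C=C).fit(X, y)
        sv = set(full.support_.tolist())
        correct = 0
        for train_idx, test_idx in cv.split(X, y):
            if sv.isdisjoint(test_idx.tolist()):
                # no testing sample is an SV of the full model: removing
                # them does not change the solution, and non-SVs are
                # classified correctly, so no retraining is needed
                correct += len(test_idx)
            else:
                # the testing split holds at least one SV, so the solution
                # may change -- retrain on the training part
                clf = SVC(kernel='linear', C=C).fit(X[train_idx],
                                                    y[train_idx])
                correct += int((clf.predict(X[test_idx]) == y[test_idx]).sum())
        return correct / len(y)

With few SVs relative to the number of samples, most splits hit the first
branch and you skip the retraining entirely -- that is where the speedup
comes from.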
Makes sense? Sorry, I don't remember whether there is a paper describing
this, but SVMlight has this logic built in -- so you could check the
author's (Thorsten Joachims) book or papers (I believe he has quite a
few) -- maybe there is something more descriptive there.
--
.-.
=------------------------------ /v\ ----------------------------=
Keep in touch // \\ (yoh@|www.)onerussian.com
Yaroslav Halchenko /( )\ ICQ#: 60653192
Linux User ^^-^^ [175555]