[pymvpa] retraining
Yaroslav Halchenko
debian at onerussian.com
Wed May 13 03:41:37 UTC 2009
On Sat, 11 Apr 2009, Scott Gorlin wrote:
> Hmm... I'm not sure I understand your suggestion. Won't the support
> vectors change with each new chunk in cross validation? Or at least the
> coefficients won't be identical. Is there a paper you know of which
> describes this? It seems like training on the whole set will ruin the
> notion of iid distribution of the output error on each fold.
Let me give a little example...
Let's say you have 10 samples (with indexes 0 1 2 3 4 5 6 7 8 9).
You take ALL of them and train an SVM, which selects only (3 4 6) to
be SVs. Now, if you train on any subsample which includes (3 4 6)
plus any of the other samples out of those 10, you will get THE SAME
result, since those additional samples are not SVs (they do not
constrain the solution even when you train on all the samples, so
dropping them changes nothing).
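To make that concrete, here is a minimal sketch -- using scikit-learn's
SVC rather than PyMVPA, with made-up data, just for illustration: the
model retrained on the SVs plus a couple of non-SV samples ends up with
the same decision function as the model trained on everything.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.RandomState(0)
    # two well-separated classes, 5 samples each
    X = np.vstack([rng.randn(5, 2) - 2, rng.randn(5, 2) + 2])
    y = np.array([0] * 5 + [1] * 5)

    full = SVC(kernel='linear', C=1.0).fit(X, y)
    sv = full.support_                    # indexes of the SVs
    non_sv = np.setdiff1d(np.arange(len(y)), sv)

    # keep all SVs plus a couple of non-SV samples: the non-SVs do not
    # constrain the solution, so the retrained model is the same
    keep = np.sort(np.concatenate([sv, non_sv[:2]]))
    sub = SVC(kernel='linear', C=1.0).fit(X[keep], y[keep])

    print(np.allclose(full.decision_function(X),
                      sub.decision_function(X)))   # -> True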
So, fast CV is possible if you first train the SVM on all the samples and
then go through each training/testing split (a rough sketch of this loop
follows below):

- if the testing split contains anything that is known to be an SV when
  trained on all samples, you have to retrain the SVM on the training part
  of that split and estimate results on the testing part;

- if the testing split does not contain any SVs (of the SVM trained on
  all samples) -- then you know that those samples were classified
  correctly ;) (otherwise they would fall inside the margin or even be
  misclassified, and in both cases they would be SVs of the original
  trained-on-all-samples SVM).
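Here is roughly what that loop looks like -- again a sketch in
scikit-learn terms rather than actual PyMVPA or SVMlight code, with
names of my own choosing (fast_cv_accuracy etc.), just to show the
bookkeeping:

    from sklearn.model_selection import LeaveOneOut
    from sklearn.svm import SVC

    def fast_cv_accuracy(X, y, cv=None, C=1.0):
        """CV that retrains only when the testing split contains SVs
        of the SVM trained on all samples."""
        if cv is None:
            cv = LeaveOneOut()
        full = SVC(kernel='linear', C=C).fit(X, y)
        sv = set(full.support_.tolist())
        correct = 0
        for train_idx, test_idx in cv.split(X, y):
            if sv.isdisjoint(test_idx.tolist()):
                # no testing sample is an SV of the full model: removing
                # them does not change the solution, and non-SVs are
                # classified correctly, so no retraining is needed
                correct += len(test_idx)
            else:
                # the testing split holds at least one SV, so the solution
                # may change -- retrain on the training part
                clf = SVC(kernel='linear', C=C).fit(X[train_idx],
                                                    y[train_idx])
                correct += int((clf.predict(X[test_idx]) == y[test_idx]).sum())
        return correct / len(y)

With few SVs relative to the number of samples, most splits hit the first
branch and you skip the retraining entirely -- that is where the speedup
comes from.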
Makes sense? Sorry, I don't remember whether there is a paper describing
this, but SVMlight has this logic built in -- so you could check the
author's (Thorsten Joachims) book or papers (I believe he has quite a
few) -- maybe there is something more descriptive there.
--
.-.
=------------------------------ /v\ ----------------------------=
Keep in touch // \\ (yoh@|www.)onerussian.com
Yaroslav Halchenko /( )\ ICQ#: 60653192
Linux User ^^-^^ [175555]