[pymvpa] retraining

Scott Gorlin gorlins at MIT.EDU
Sun Apr 12 00:53:56 UTC 2009



Yaroslav Halchenko wrote:
> Another aspect you might benefit from in the case of SVM is the fact
> that some samples do not influence the SVM solution (i.e. the
> non-support-vector ones). So you can speed up n-fold (or pure
> leave-one-out) cross-validation considerably if there are only a few
> SVs -- the same strategy is used by SVMlight:
>
> 1. train SVM on all samples
>
> 2. in n-fold testing, check whether the testing set includes any of the
> support vectors. If not -- then you already know the result would be
> the same as if you had trained the SVM without them ;)
>
> This strategy gives an especially large speedup if the number of chunks
> is large (or each sample is its own chunk, as in leave-one-out) and the
> number of SVs is small.
>
>   
Hmm... I'm not sure I understand your suggestion.  Won't the support 
vectors change with each new chunk in cross-validation?  Or at least, 
won't the coefficients differ?  Is there a paper you know of that 
describes this?  It seems like training on the whole set would ruin the 
assumption that the output error on each fold is i.i.d.
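
For concreteness, the shortcut I take you to be proposing looks roughly 
like this (a hypothetical sketch -- it uses scikit-learn's SVC instead 
of PyMVPA, purely to illustrate the idea):

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.datasets import make_classification
    from sklearn.model_selection import LeaveOneOut

    X, y = make_classification(n_samples=60, n_features=10, random_state=0)

    # Step 1: train once on all samples; record support-vector indices.
    full = SVC(kernel='linear', C=1.0).fit(X, y)
    sv = set(full.support_)

    errors = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        if not sv.intersection(test_idx):
            # The held-out sample is not a support vector, so (per the
            # claim) retraining without it would give the identical
            # decision function -- reuse the full model's prediction.
            pred = full.predict(X[test_idx])
        else:
            # The held-out sample is a support vector: retrain properly.
            fold = SVC(kernel='linear', C=1.0).fit(X[train_idx], y[train_idx])
            pred = fold.predict(X[test_idx])
        errors += np.sum(pred != y[test_idx])

    print("LOO error rate: %.3f" % (errors / float(len(y))))

Is that roughly what you have in mind?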

> Maybe just think about accounting for such a scenario as well?
>
> sorry for not being quite on-point with the reply, but I will get
> through your email/code sometime later, whenever I get a chance ;)
>
>   
No problem... VSS is looming anyway, so polished code isn't my priority 
now either ;)

-S


