[pymvpa] retraining
Scott Gorlin
gorlins at MIT.EDU
Wed May 13 17:38:57 UTC 2009
I'll look into this... I understand your logic in the case of a
hard-margin classifier, but it's not intuitively obvious to me that this
will be true for a soft margin. Then again, I've been on the beach all
week ;)
Yaroslav Halchenko wrote:
> On Sat, 11 Apr 2009, Scott Gorlin wrote:
>
>> Hmm... I'm not sure I understand your suggestion. Won't the support
>> vectors change with each new chunk in cross validation? Or at least the
>> coefficients won't be identical. Is there a paper you know of which
>> describes this? It seems like training on the whole set will ruin the
>> notion of iid distribution of the output error on each fold.
>>
>
> Let me just give 1 little example...
>
> Let's say you have 10 samples (with indexes 0 1 2 3 4 5 6 7 8 9).
> Then you take ALL of them and train an SVM which selects only (3 4 6) to
> be SVs. Now, if you take any subsample which includes (3 4 6) plus any
> of the other samples out of those 10, you will get THE SAME result,
> since those additional samples are not SVs (even when trained on all the
> samples).
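>
> A minimal sketch of that observation (in scikit-learn terms rather than
> PyMVPA, with made-up data and names, purely for illustration): dropping
> the non-SV samples leaves the separating hyperplane unchanged.
>
> import numpy as np
> from sklearn.svm import SVC
>
> rng = np.random.RandomState(0)
> X = rng.randn(10, 2)
> y = np.array([0] * 5 + [1] * 5)
> X[y == 1] += 4.0                     # well separated, so only a few SVs
>
> full = SVC(kernel='linear', C=1.0).fit(X, y)
> svs = set(full.support_)             # indexes of the SVs
>
> # retrain on the SVs plus a couple of arbitrary non-SV samples
> keep = sorted(svs | {0, 1})
> sub = SVC(kernel='linear', C=1.0).fit(X[keep], y[keep])
>
> # same hyperplane, up to the solver's numerical tolerance
> print(np.allclose(full.coef_, sub.coef_, atol=1e-4),
>       np.allclose(full.intercept_, sub.intercept_, atol=1e-4))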
>
> So, fast CV is possible if you first train the SVM on all the samples and
> then go through each training/testing split:
>
> if the testing split contains any sample known to be an SV when trained
> on all samples, you have to retrain the SVM on the training part of that
> split and estimate results on the testing part;
>
> if the testing split does not contain any SVs (from the SVM trained on
> all samples) -- then you know those samples were classified correctly ;)
> (otherwise they would fall within the margin or even be misclassified,
> and in either case they would have been SVs in the original
> trained-on-all-samples SVM)
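>
> That shortcut in (hypothetical) code -- scikit-learn style again rather
> than the actual svmlight/PyMVPA implementation, and the function and
> split handling are just a sketch:
>
> import numpy as np
> from sklearn.svm import SVC
>
> def fast_cv_errors(X, y, splits, C=1.0):
>     # Count CV errors, retraining only for folds whose test part
>     # contains SVs of the model trained on ALL samples.
>     full = SVC(kernel='linear', C=C).fit(X, y)
>     svs = set(full.support_)
>     n_errors = 0
>     for train_idx, test_idx in splits:
>         if svs.isdisjoint(test_idx):
>             # no SVs in the test split -> the full model already
>             # classifies all of them correctly, so skip retraining
>             continue
>         clf = SVC(kernel='linear', C=C).fit(X[train_idx], y[train_idx])
>         n_errors += int(np.sum(clf.predict(X[test_idx]) != y[test_idx]))
>     return n_errors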
>
>
> Makes sense? Sorry, I don't remember if there is any paper, but
> svmlight has this logic built in -- so you could check the author's book
> or papers (I believe he has quite a few) -- maybe there is something more
> descriptive.
>
>
>