[pymvpa] retraining
Scott Gorlin
gorlins at MIT.EDU
Wed May 13 17:38:57 UTC 2009
I'll look into this... I understand your logic in the case of a
hard-margin classifier, but it's not intuitively obvious to me that this
will be true for a soft margin. Then again, I've been on the beach all
week ;)
Yaroslav Halchenko wrote:
> On Sat, 11 Apr 2009, Scott Gorlin wrote:
>
>> Hmm... I'm not sure I understand your suggestion. Won't the support
>> vectors change with each new chunk in cross validation? Or at least the
>> coefficients won't be identical. Is there a paper you know of which
>> describes this? It seems like training on the whole set will ruin the
>> notion of iid distribution of the output error on each fold.
>>
>
> Let me just give 1 little example...
>
> Let's say you have 10 samples (with indexes 0 1 2 3 4 5 6 7 8 9).
> Then you take ALL of them and train an SVM which selects only (3 4 6) to
> be SVs. Now, if you take any subsample which includes (3 4 6) plus any
> of the other samples out of those 10, you will get THE SAME result,
> since those additional samples are not SVs (even when trained on all the
> samples).
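>
> A minimal sketch of that observation (in scikit-learn terms rather than
> PyMVPA, with made-up data and names, purely for illustration): dropping
> the non-SV samples leaves the separating hyperplane unchanged.
>
> import numpy as np
> from sklearn.svm import SVC
>
> rng = np.random.RandomState(0)
> X = rng.randn(10, 2)
> y = np.array([0] * 5 + [1] * 5)
> X[y == 1] += 4.0                     # well separated, so only a few SVs
>
> full = SVC(kernel='linear', C=1.0).fit(X, y)
> svs = set(full.support_)             # indexes of the SVs
>
> # retrain on the SVs plus a couple of arbitrary non-SV samples
> keep = sorted(svs | {0, 1})
> sub = SVC(kernel='linear', C=1.0).fit(X[keep], y[keep])
>
> # same hyperplane, up to the solver's numerical tolerance
> print(np.allclose(full.coef_, sub.coef_, atol=1e-4),
>       np.allclose(full.intercept_, sub.intercept_, atol=1e-4))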
>
> So, fast CV is possible if you first train the SVM on all the samples and
> then go through each training/testing split:
>
> if the testing split contains any sample known to be an SV when trained
> on all samples, you have to retrain the SVM on the training part of that
> split and estimate results on the testing part;
>
> if the testing split does not contain any SVs (from the SVM trained on
> all samples) -- then you know those samples were classified correctly ;)
> (otherwise they would fall within the margin or even be misclassified,
> and in either case they would have been SVs in the original
> trained-on-all-samples SVM)
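>
> That shortcut in (hypothetical) code -- scikit-learn style again rather
> than the actual svmlight/PyMVPA implementation, and the function and
> split handling are just a sketch:
>
> import numpy as np
> from sklearn.svm import SVC
>
> def fast_cv_errors(X, y, splits, C=1.0):
>     # Count CV errors, retraining only for folds whose test part
>     # contains SVs of the model trained on ALL samples.
>     full = SVC(kernel='linear', C=C).fit(X, y)
>     svs = set(full.support_)
>     n_errors = 0
>     for train_idx, test_idx in splits:
>         if svs.isdisjoint(test_idx):
>             # no SVs in the test split -> the full model already
>             # classifies all of them correctly, so skip retraining
>             continue
>         clf = SVC(kernel='linear', C=C).fit(X[train_idx], y[train_idx])
>         n_errors += int(np.sum(clf.predict(X[test_idx]) != y[test_idx]))
>     return n_errors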
>
>
> Makes sense? Sorry, I don't remember if there is any paper, but
> svmlight has this logic built in -- so you could check the author's book
> or papers (I believe he has quite a few) -- maybe there is something more
> descriptive.
>
>
>