[pymvpa] retraining

Yaroslav Halchenko debian at onerussian.com
Sat Apr 11 18:13:37 UTC 2009


Hi Scott!

Thanks a lot for your contributions!!! I am too swamped with everything
at the moment to give them any careful look -- but all sounds very nice
and worthwhile.

Another aspect you might benefit from in the case of SVM is the fact that
some samples do not influence the SVM solution (i.e. the non-support-vector
ones). So you can speed up n-fold (or pure leave-one-out) cross-validation
considerably if there are only a few SVs -- the same strategy is used by
SVMlight:

1. train SVM on all samples

2. in n-fold testing, check whether the testing set includes any of the
support vectors. If not -- then you already know the result would be the
same as if you had trained the SVM without those samples ;)

This strategy gives an especially large speed-up when the number of chunks
is large (or each sample is its own chunk, as in leave-one-out) and the
number of SVs is small.
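
In code the trick is roughly this (just a sketch -- train_svm and the
sv_idx attribute are placeholders for whatever the backend exposes, not
actual PyMVPA API):

    import numpy as N

    def loo_error_with_sv_shortcut(samples, labels, train_svm):
        # Train once on all samples and note which became support vectors.
        full_model = train_svm(samples, labels)
        sv = set(full_model.sv_idx)
        errors = 0
        for i in range(len(samples)):
            if i not in sv:
                # Dropping a non-SV leaves the solution unchanged, so the
                # full model already gives the leave-one-out prediction.
                pred = full_model.predict(samples[i:i+1])[0]
            else:
                keep = N.setdiff1d(N.arange(len(samples)), [i])
                model = train_svm(samples[keep], labels[keep])
                pred = model.predict(samples[i:i+1])[0]
            errors += int(pred != labels[i])
        return errors / float(len(samples))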

Maybe think about accounting for such a scenario as well?

Sorry for not being quite to the point with this reply, but I will get
through your email/code some time later, whenever I get a chance ;)

Cheers
Yarik

On Wed, 08 Apr 2009, Scott Gorlin wrote:

> Okay, this email is admittedly way too long... but hopefully some of you  
> will find it as exciting as I do!
>
> I looked through some of the retraining stuff, and (for my purposes at  
> least) it doesn't seem to address the issue - the main reason I need to  
> accelerate retraining is for cross-validation, specifically model  
> selection, which repeatedly changes the training data set etc. (and  
> basically forces full retraining in each implementation I looked at).
>
> To this end, I have an implementation of a cached-kernel SVM that  
> greatly accelerates n-fold CV, since it only computes the kernel once for  
> an entire CV session.  This can theoretically be interfaced with model  
> selection to change model params quickly and kernel params slowly, and  
> speed up the entire process further.  I'm having some odd errors with  
> alioth right now and can't push (don't know if it's server-side or  
> something's actually wrong with my repo - I'm getting weird sha1 file  
> permission errors, so I'll try again tomorrow).
>
> This seems like something which would be of great general interest,  
> especially w.r.t. model selection, so let me spell out the architecture  
> if anyone cares to comment (or wait for my push):
>
> CachedSVM inherits from sg.SVM.  It doesn't override __init__, so you can  
> create any shogun SVM implementation you want.
>
> It implements cache(self, dataset), which takes in *all* data that the clf  
> is meant to be used with.  This:
> 1) creates an instance of the kernel defined in __init__ as normal, i.e.  
> still respecting the normal kernel params, but stores the kernel matrix  
> using Kernel.get_kernel_matrix()
> 2) strips the dataset down via dout = dset.selectFeatures([0]); dout.samples  
> = N.arange(dset.nsamples); return dout
>
> Usage: dcached = clf.cache(dset); CrossValidatedTransferError(clf,  
> ...)(dcached)
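>
> In rough code, cache() comes down to something like this (a sketch of the
> steps above, not the pushed code; _compute_kernel is a placeholder for the
> kernel setup sg.SVM already does):
>
>     import numpy as N
>
>     def cache(self, dset):
>         # Build the kernel exactly as sg.SVM would, then keep the full
>         # sample-by-sample matrix around.
>         kernel = self._compute_kernel(dset)        # placeholder name
>         self._cached_kmat = kernel.get_kernel_matrix()
>         # Strip the dataset to a single dummy feature whose values are
>         # just row indices into the cached matrix.
>         dout = dset.selectFeatures([0])
>         dout.samples = N.arange(dset.nsamples).reshape(-1, 1)
>         return dout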
>
> _train(self, dset) and _predict(self, samples):
> 1) Accept only integer samples, i.e. those returned by cache(), which  
> indicate the stored sample index
> 2) Build a new kernel using shogun.Kernel.CustomKernel(), taking the  
> rows/columns of the cached kernel matrix indicated by the samples array
> 3) other training/predicting steps basically copied from sg.SVM
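>
> The kernel-slicing step boils down to something like this (sketch only;
> set_full_kernel_matrix_from_full is the shogun CustomKernel method I'm
> relying on, and the lhs/rhs orientation of the test block should be
> double-checked against your shogun version):
>
>     import numpy as N
>     from shogun.Kernel import CustomKernel
>
>     def make_train_kernel(kmat, train_idx):
>         # Square block of the cached matrix for the training samples.
>         k = CustomKernel()
>         k.set_full_kernel_matrix_from_full(kmat[N.ix_(train_idx, train_idx)])
>         return k
>
>     def make_test_kernel(kmat, train_idx, test_idx):
>         # Train-by-test block: shogun keeps the training samples on the
>         # lhs (rows) and the test samples on the rhs (columns).
>         k = CustomKernel()
>         k.set_full_kernel_matrix_from_full(kmat[N.ix_(train_idx, test_idx)])
>         return k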
>
> On an NFoldSplitter of normalFeatureDataset(perlabel=500,  
> nfeatures=2000, nchunks=4):
> Kernel cache: 2.152509 s
> Cached CV: 0.492900 s
> Normal CV: 12.817234 s
>
> using CrossValidatedTransferError with sg.SVM() and the corresponding  
> default params in mine.
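>
> For reference, the comparison was along these lines (PyMVPA 0.x style;
> CachedSVM is the proposed class, so this is a sketch rather than code you
> can run today):
>
>     import time
>     from mvpa.suite import *   # normalFeatureDataset, NFoldSplitter, ...
>
>     dset = normalFeatureDataset(perlabel=500, nfeatures=2000, nchunks=4)
>
>     clf = CachedSVM()          # proposed class; same defaults as sg.SVM()
>     t0 = time.time()
>     dcached = clf.cache(dset)  # the one-off kernel computation
>     t1 = time.time()
>     cv = CrossValidatedTransferError(TransferError(clf), NFoldSplitter())
>     err = cv(dcached)          # fast: kernel rows just get re-indexed
>     t2 = time.time()
>     print 'cache: %.3f s, cached CV: %.3f s' % (t1 - t0, t2 - t1)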
>
> From my perspective, the benefits of this architecture are:
>
> 1) Inheriting from sg.SVM allows for, in theory, any known SVM  
> implementation, based on any shogun kernel
> 2) Otherwise acts as standard clf
> 3) Easy to take parameters from this (i.e. after CV model selection) and  
> plop them into a normal shogun clf of the same type
> 4) Bootstrapping should be awesomely fast
>
> Drawbacks:
> 1) Much of _train and _predict is copy/pasted from the sg.SVM class  
> since it couldn't be directly inherited.  It's a bit messy and will  
> require some attention, but this isn't necessarily critical.  The main  
> issue is that I don't know all the details of why each bit in those  
> routines is important, so I may inadvertently snip out a critical bit.
> 2) Requires explicitly caching a dataset (or raw samples).  Would be  
> better to handle this automatically, but I have the intuition that this  
> may slow things down.  Since I'm also building a parameter selection  
> utility, it would be nice to have it act identically to a normal clf -  
> or maybe I'll just have a subclass of the psel since it is based on CV  
> error and could be directly designed to work with this.
> 3) Shogun-dependent.  Dunno yet about writing a libsvm version; I see  
> there is a precomputed-kernel option, but I started with shogun due to the  
> notes below.  Either way, it looks like we'd need a version for each.
> 4) Shogun-SVM dependent.  Would be great to make it general for any  
> kernel-based clf, but for now it looks like things will be  
> implementation-dependent.
>
> Alternate strategies:
> 1) Add 'custom' to sg.SVM._KNOWN_KERNELS.  However, the interface isn't  
> the same (CustomKernel can't be called with lhs, rhs, so I'd have to  
> modify sg.SVM._train and _predict), and this doesn't remove the need for a  
> cache() method, since _train is never called with the full data set.
> 2) Inherit from another target which provides a better kernel  
> abstraction.  To my knowledge, none exist, and I can't imagine it being  
> worth refactoring sg.SVM just for this class.  Libsvm backend looks like  
> it would have the same issues.
> 3) Write a kernel abstraction class, or even just a class that exposes  
> the shogun.Kernel interface, to handle the caching directly.  Add this  
> to the known kernel types in sg.SVM.  Not sure if this has a general use  
> case, since CustomKernel is the only class whose API differs.
> 4) One possibility for automatic caching is a runtime hash of each new  
> input sample.  I expect this to degrade performance, but I don't have an  
> intuition for how much.  Depending on the implementation, it may also  
> cause problems with the cache growing inside a loop.  But it would allow  
> for complete automation without a cache() method -- see the sketch below.
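>
> A minimal sketch of that hashing idea (all names illustrative, nothing
> here is PyMVPA API):
>
>     import hashlib
>     import numpy as N
>
>     class SampleHashCache(object):
>         """Map samples to stable row indices via a content hash."""
>         def __init__(self):
>             self._index = {}   # sha1 hexdigest -> row index
>             self._rows = []    # samples in insertion order
>
>         def index_of(self, sample):
>             key = hashlib.sha1(N.asarray(sample).tostring()).hexdigest()
>             if key not in self._index:
>                 # Unseen sample: the kernel cache would have to grow a
>                 # new row/column here -- the 'growing in a loop' worry.
>                 self._index[key] = len(self._rows)
>                 self._rows.append(N.asarray(sample))
>             return self._index[key]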
>
> Use cases (where it's better than a standard clf):
> 1) CV
> 2) Bootstrapping/jack-knifing for clfs which don't implement retrain(),  
> or where the training data changes
> 3) Others?  One could also do something similar for any predefined custom  
> kernel, but I can't think of a real use for that at the moment.
>
> Fortunately this all turns out to be minimal work - not including the stuff  
> copied from sg.SVM, I'm only looking at a handful of lines of code.  I  
> love Python :)
>
> Thoughts?  Sorry for the wind...
>
> Btw yarik -- congratulations!!
> -s
>
>
> Yaroslav Halchenko wrote:
>> heh heh... indeed... PhD defense tomorrow... computers/software do not
>> cooperate, etc...

>> so... retraining -- it is only partially implemented in PyMVPA right now

>> sg.SVM
>> SVR

>> have some retraining capabilities, and in sg.SVM it is somewhat messy...
>> in SVR it is not quite working (as I discovered recently while
>> changing kernel parameters).


>> but those are the pieces of code you can check out to start crafting
>> your way through 'retraining' -- it should help in many cases (lots of
>> features, not that many samples -- i.e. whenever lots of time is spent
>> precomputing the kernels). It would be really nice to have in SMLR... but
>> it is not there.  Meta classifiers (split, featureselection) also do not
>> have it at all now (IIRC).

>> I will come back to it after this Wed ;) but you are welcome to inspect
>> what is already done and suggest what should be done

>> On Mon, 30 Mar 2009, Michael Hanke wrote:


>>> Hi Scott,



>>> On Mon, Mar 30, 2009 at 02:01:06PM -0400, Scott Gorlin wrote:

>>>> Hi guys,



>>>> What are the latest thoughts on precalculated/retrainable 
>>>> classifiers?  E.g., they could greatly speed up cross-validated model 
>>>> selection... it looks like libsvm's precomputed kernel isn't 
>>>> implemented though; is this due to a wrapper limitation?  Or are 
>>>> people generally using other classifiers which have clf.retrain() 
>>>> implemented?



>>>> If I were to implement this, any tips on where to begin?



>>> Yarik is the expert on classifier retraining, but he is currently quite
>>> busy and it might take a while until he can reply...




>>> Michael

>
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa
>

-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student  Ph.D. @ CS Dept. NJIT
Office: (973) 353-1412 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW:     http://www.linkedin.com/in/yarik        


