[Pkg-exppsy-pymvpa] model selection and other questions

Yaroslav Halchenko debian at onerussian.com
Thu Apr 3 02:04:08 UTC 2008

On Wed, 02 Apr 2008, Emanuele Olivetti wrote:
> GPR is about regression (like SVR). I plan to work quite a bit on
> regression (the humble sister of classification :) ) so it would be
> nice to have the right place for it in PyMVPA.
Even now you can implement regressors as a subclass of Classifier. Actually,
having some regression code and an example dataset to apply it to would be
great for teasing out of Classifier the pieces which do not pertain to
regression (uniquelabels etc.). Per introduced RidgeReg (ridge regression)
some time ago, but so far I haven't had a chance to use it...
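Since ridge regression came up, here is a minimal sketch of the closed-form
solution, assuming a simple train/predict interface -- this is hypothetical
illustration, not PyMVPA's actual RidgeReg API:

```python
import numpy as np

class SimpleRidge:
    """Minimal ridge regression sketch: w = (X'X + lm*I)^-1 X'y.
    Hypothetical interface; not PyMVPA's RidgeReg."""

    def __init__(self, lm=1.0):
        self.lm = lm  # regularization strength

    def train(self, X, y):
        # append a constant column so the intercept is fitted too
        Xb = np.column_stack([X, np.ones(len(X))])
        A = Xb.T @ Xb + self.lm * np.eye(Xb.shape[1])
        self.w = np.linalg.solve(A, Xb.T @ y)
        return self

    def predict(self, X):
        Xb = np.column_stack([X, np.ones(len(X))])
        return Xb @ self.w
```

With a tiny `lm` this reduces to ordinary least squares; larger values shrink
the weights toward zero.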

> I'm still in the very early stages of understanding your package. For
> whatever it may help, I'll try to give you feedback on regression
> since it is one of my main interests.
Thank you in advance -- I think we need input on that.

> There are some GPR implementations available [0] but none of
oops... either I am too tired or the URL was omitted from the footer.

> them seems to be in python.
Well -- maybe there is a reasonable one to adopt and to bind via SWIG or...

> Mine is not yet public because was
> implemented to solve a specific problem over the last 2 years [1].
oops... again... I guess it had to do with PBAIC? ;)

> It is quite usable by me, but hard for anyone else due to the quick-and-dirty
> coding. Much of my time during the last 2 years was spent on a different
> >...<
> By the way GPR seems to perform nicely on fMRI data :)
It had better ;-) I guess you've done a comparison to SVM regression, since
you've chosen this one specifically over it... was it much superior?

> >> About what you call "sensitivity analyzers" (which we name "feature
> >> raters"),
> > And I guess (Michael?) if we stop at FeaturewiseMeasure name as the
> I'd prefer to use standard naming for machine learning terms. You
> could object asking "which is the standard naming"... :) 
Exactly. I wish there was such a 100%-proof term.
IIRC, in ML it is common to describe it as
"sensitivity", which is linked to
sensitivity analysis (http://en.wikipedia.org/wiki/Sensitivity_analysis)
and is simply all over the ANN literature (just google "ANN" and "sensitivity" ;-))

But "sensitivity" is the wrong term for ANOVA, which is model-free, whereas
sensitivity assumes something to operate on (i.e. the sensitivity of an ANN
to its hidden-layer inputs, etc.). Thus we are thinking about renaming the
superclass to something else.

> . Just take
> into account that regression has its feature raters too. I have one.
Sure -- anything and everything is sensitive to different factors, thus
some raters might help us say what it is most sensitive to ;-)

> > to describe at least some of the internals of PyMVPA, thus you can start
> > off adopting your algorithms into PyMVPA within your own clone of git
> > repository and feed us only with desired pieces whenever you feel so ;-)
> I have no problem in delivering everything I can. The problem is
> to make it usable.
Just let others judge the usability and help you along.
You probably already have a clone of our repository; just create
eo/master (or whatever abbreviation you like for your name) and start by
pulling at least something in and composing a tiny testcase for it. If
there is really nothing specific to it -- i.e. it is just a
classifier -- just add it to the corresponding group in
tests/tests_warehouse_clfs.py and it might already get exercised by some
basic "common" unittests. Then just push your branch back to our
repository (as Michael pointed out, we are quite open about providing write
access to it) or send us a patch (git-format-patch and git-send-email are
quite handy). Anything would count!

Actually, we usually use <Name>/master for development of things which should
definitely make it into master. For tentative branches we use the _tent/
prefix; thus having a _tent/gpr branch would be great!

The point is -- just start! Code might be far from perfect at the
beginning and tailored to something specific -- but spare eyes and hands
might come in handy ;-) Just push, fetch and merge often ;-)

> Thank you for pointing me to the dev guide. I
> will read it for sure. I really appreciate that you have started
> explicitly defining coding guidelines etc. This seems quite unusual
> compared to other machine learning packages.
well... since we work in philosophically different environments (read:
emacs vs vim vs ...), reaching a firm agreement was needed ;-) Our naming
conventions (camelCase mixed with non-camelCase) are somewhat evil but at
least consistent (thanks to pylint).
Unfortunately the devguide is far from complete, but it should be a
good helper to stay "in style" ;)

> I know of MLOSS and that article. I really think it is a worthwhile
> initiative and I want my effort to reach that level.
imho OSS should be the only way to go in most cases for software
developed in government-funded research and education. It is so sad to
see how much development effort has been wasted on disjoint little
projects which died soon after their author graduated or left.

> > http://pkg-exppsy.alioth.debian.org/pymvpa/devguide.html
> > http://pkg-exppsy.alioth.debian.org/pymvpa/api/index.html
> Thanks again for the pointers.
Also, there is a discussion on the scipy mailing list, and a Google
Summer of Code proposal from Anton,
which might be worth looking at/discussing.

> > we are open to discussion any time (though some times we might get
> > busy... like now I should have done something else ... not PyMVPA
> > related ;-)). But for quick and interesting discussion we should
> > really do voice + VNC sessions. Are you on Skype by any chance?
> Interesting. I'll mail you privately my skype ID. But give me a bit of
> time to jump on PyMVPA: I don't want to waste your time.
I think it might be very productive, thus it wouldn't be a waste... I have
really wasted time in much worse ways before ;-) Since interest is
rising, maybe we could set up a general meeting (I mean voice +
VNC) when everyone is available to participate... and invite
Anton and any other interested party as well...

> There are many ways to define "best" parameters for a learning
> model: maximize accuracy, maximize marginal likelihood, Bayesian
> methods, a priori knowledge etc. Each of them can be reached with
> different techniques (standard optimization, grid search, MCMC etc.).
> I would prefer to have several algorithms to do that task, each with
> their own strength and weakness. Or at least an architecture to plug
> them easily when needed.
Right on the spot. Defining a clear interface should help everyone out,
and I am not sure what the specification of those should be at the moment.
Since you seem to have better exposure to what is needed, it would be
great indeed if you provided some input/ideas.
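As one concrete (and deliberately naive) flavour of such a plug-in point, a
grid search over a scoring callback could be as small as this -- the names
here are my own sketch, not an existing PyMVPA interface:

```python
from itertools import product

def grid_search(score, grid):
    """Exhaustively evaluate every combination in `grid` (a dict of
    parameter name -> list of candidate values) and return the pair
    (best_score, best_params). `score` is any callable mapping a
    params dict to a number to maximize -- e.g. cross-validated
    accuracy or a marginal likelihood."""
    best_score, best_params = None, None
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        s = score(params)
        if best_score is None or s > best_score:
            best_score, best_params = s, params
    return best_score, best_params
```

Other strategies (gradient-based optimization, MCMC, ...) would then just be
alternative implementations behind the same "give me the best params for this
score" contract.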

> My current code is quite simple: GPR can express analytically the
> marginal likelihood of labels given (training) data as a function of
> hyperparameters.
sounds sweet.
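To make the idea concrete: for a GP with kernel matrix K over n training
points and targets y, the log marginal likelihood is
log p(y|X) = -1/2 y' K^-1 y - 1/2 log|K| - (n/2) log(2 pi). A numpy sketch
with an RBF kernel (the hyperparameter names are my assumptions, not
Emanuele's code):

```python
import numpy as np

def gpr_log_marginal_likelihood(X, y, length_scale=1.0, sigma_f=1.0, sigma_n=0.1):
    """log p(y | X, hyperparameters) for a GP with an RBF kernel.
    Illustrative sketch; hyperparameter names are assumptions."""
    # squared distances between all pairs of samples
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = sigma_f ** 2 * np.exp(-0.5 * d2 / length_scale ** 2)
    K += sigma_n ** 2 * np.eye(len(X))      # noise variance on the diagonal
    L = np.linalg.cholesky(K)               # stable solve via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()      # = -0.5 * log|K|
            - 0.5 * len(X) * np.log(2 * np.pi))
```

Maximizing this quantity over (length_scale, sigma_f, sigma_n) is then the
model-selection step being described.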

> using SVMs one year ago (from SciPy sandbox, now scikits.learn)
> maximizing the accuracy on training data this time, but in the same
> way. It was not efficient, but it worked.
Accuracy on training data -- is that the training error or some other
internal assessment (e.g. leave-one-out cross-validation on the
training data)?
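In case it helps frame the question: a generic leave-one-out estimate on the
training data can be sketched like this, with train/predict callables as
placeholders for whatever learner is used:

```python
import numpy as np

def loo_mse(train, predict, X, y):
    """Leave-one-out mean squared error: hold out each sample in turn,
    fit on the rest, and score the prediction for the held-out sample.
    `train(X, y)` returns a model; `predict(model, X)` returns predictions."""
    errors = []
    for i in range(len(X)):
        keep = np.arange(len(X)) != i           # boolean mask without sample i
        model = train(X[keep], y[keep])
        errors.append((predict(model, X[i:i + 1])[0] - y[i]) ** 2)
    return float(np.mean(errors))
```

Unlike the plain training error, this never scores a model on a sample it was
fitted on, which is why the distinction above matters.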

Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student  Ph.D. @ CS Dept. NJIT
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW:     http://www.linkedin.com/in/yarik        
