[Pkg-exppsy-pymvpa] model selection and other questions

Yaroslav Halchenko debian at onerussian.com
Tue Apr 1 15:30:21 UTC 2008


Hi Emanuele,

Let me follow up to keep this fruitful discussion rolling. And I am sorry if
my reply is somewhat sketchy and hard to digest in places ;-)

> > algorithms. So in the future PyMVPA will be based on functionality in
> > scikits.learn, but the actual user interface with its workflow will not
> > necessarily be part of it.
> and has hope to gain popularity. My current candidates are
> scikits.learn and, more recently, PyMVPA. The learn module seems
> to lack a global architecture a bit (most probably because it is an
> explicit a choice)
Imho it should be that way, since a toolkit should be just a set of tools
on top of which higher-level frameworks (such as PyMVPA ;-)) can be
developed.

> in order not to encumber new contributions and be
> simple. PyMVPA  is definitely more workflow oriented, and (good news!)
Of course I would advocate that you contribute to PyMVPA. Multiple reasons:
we are trying to push PyMVPA to be usable, and used, by people.
Besides that, we hope that the internal structure of the framework is
good enough for contributors to extend it with additional ML tools.

> Since you plan to be based on scikits.learn in (near?) future
It is not that we are planning to be based on it -- rather, we will use
scikits.learn functionality whenever it becomes usable ;-) One of the
design decisions of PyMVPA is that we don't want to reinvent the wheel.
Thus if scikits.learn provides implementations of some algorithms, we had
better make use of them instead of recoding them. It is somewhat
pathetic that virtually every Python toolbox out there which uses libsvm
needs its own custom SWIG wrapper to get the needed libsvm interface
exposed. That is, afaik, due to the fact that the libsvm authors don't
accept outside patches for their library, which is imho really sad and
demolishes a big part of the open-source advantage of the project. In
PyMVPA, any ideas/submissions are welcome if they are reasonable.

> , here is a new question: what do you suggest to do to me and other
> possible contributors? Deliver something first to scikits.learn or to
> try to address directly PyMVPA?
Well -- whichever way you think is best, but scikits.learn at the moment is
not that active, and it is not clear whether it will remain in its current
development shape. In our case we are planning to contribute the relevant
parts of PyMVPA which should be part of scikits.learn whenever it gets
moving again: we have a few classifier implementations missing from
scikits.learn (e.g. SMLR). That would make them usable by a wider range of
users, and would also offload some of the maintenance burden from PyMVPA.
Also, as I stated on the scipy mailing list, I think we have got a few
interesting design decisions which scikits.learn could adopt to benefit
its users. Thus, even if you contribute now to PyMVPA, it doesn't mean
that all your contributions will get stuck there. Since scikits.learn is
for a wider audience, it would make sense to offload as much of the
generic ML machinery as possible from PyMVPA into scikits.learn -- not now
though, but whenever it is ready to accept it, so that we could use it
easily without having to adapt our bindings to it weekly.

> I work with probabilistic models and I have some code on Gaussian
> process regression (GPR) [0] (classification will be available in near
> future). Since GPR is a kernel method it shares many underlying idea
> with other kernel methods (e.g., SVM) it can be part of a common
> architecture.
Yet one more thing to refactor a bit in PyMVPA is to accommodate ML
regressors. They are already there, but the base class for all classifiers
is currently Classifier, and it makes use of ConfusionMatrix for some state
variables. That should be different for regressors, since there is no
confusion matrix per se.
And of course, having new ML tools would be simply GREAT!
Did you find GPR performing nicely for fMRI data? Is there a Python
implementation available under some OSS license? ;)
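[Editorial sketch] For readers unfamiliar with GPR: the prediction step
discussed above can be written in a few lines of plain NumPy. This is a
minimal textbook sketch with a squared-exponential kernel; the function and
parameter names are illustrative and are not PyMVPA (or scikits.learn) API:

```python
import numpy as np

def se_kernel(X1, X2, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel matrix between two sets of points."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)

def gpr_predict(X_train, y_train, X_test, noise=0.1, **kernel_params):
    """GP regression: posterior mean and variance at the test points."""
    # Kernel on training data, plus observation-noise variance on the diagonal
    K = se_kernel(X_train, X_train, **kernel_params)
    K = K + noise ** 2 * np.eye(len(X_train))
    Ks = se_kernel(X_train, X_test, **kernel_params)
    Kss = se_kernel(X_test, X_test, **kernel_params)
    # Posterior mean: Ks^T K^{-1} y ; posterior variance: diag(Kss - Ks^T K^{-1} Ks)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks.T @ alpha
    v = np.linalg.solve(K, Ks)
    var = np.diag(Kss - Ks.T @ v)
    return mean, var
```

With a small noise level the posterior mean interpolates the training
targets almost exactly, which is a handy sanity check for any
implementation.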

> About what you call "sensitivity analyzers" (which we name "feature
> raters"),
Naming in PyMVPA is still somewhat unstable, and SensitivityAnalyzer is
one of the candidates for name refactoring ;-) Actually, in our case
SensitivityAnalyzer is just a FeaturewiseDatasetMeasure class...
But it is not performing rating by default (actually we are yet to add a
simple Rater transformer) -- it rather produces some raw measure per feature.

And I guess (Michael?) if we settle on FeaturewiseMeasure as the name of
the base class, we could refactor ClassifierBasedSensitivityAnalyzer simply
into ClassifierSensitivity, because that is what it is (that is, btw, what
confused Dr. Hanson in the naming scheme -- ANOVA is not a sensitivity per
se)...
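[Editorial sketch] To make the measure-vs-rater distinction concrete: a
featurewise measure returns one raw value per feature (here, one-way ANOVA
F-scores, which is exactly the "ANOVA is not a sensitivity per se" case),
and a separate "rater" step turns those raw values into a rating. The
function names below are illustrative only, not PyMVPA classes:

```python
import numpy as np

def anova_fscore(data, labels):
    """Featurewise-measure-style function: raw one-way ANOVA F-score per
    feature (columns of `data`), with no rating or thresholding applied."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    grand_mean = data.mean(axis=0)
    ss_between = np.zeros(data.shape[1])
    ss_within = np.zeros(data.shape[1])
    for c in classes:
        grp = data[labels == c]
        ss_between += len(grp) * (grp.mean(axis=0) - grand_mean) ** 2
        ss_within += ((grp - grp.mean(axis=0)) ** 2).sum(axis=0)
    df_between = len(classes) - 1
    df_within = len(labels) - len(classes)
    return (ss_between / df_between) / (ss_within / df_within)

def rank_rater(scores):
    """A trivial 'Rater' transformer: raw featurewise scores -> ranks
    (0 = least informative feature)."""
    return np.argsort(np.argsort(scores))
```

The point of the split is that any rater can be chained after any
featurewise measure, classifier-based or not.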

> I have implemented the I-Relief algorithm [1] for multi-class
> problems, together with some simpler rater based on mutual information
> or linear regression. More recently I implemented the GPR-ARD [2]
> algorithm and applied to neuroimaging data.
Everything sounds really cool, and it is great that you are thinking
about exposing your implementation to the world ;-) If you decide to
contribute to PyMVPA -- let us know, and we will provide you with full
access to the repository. We also have developer guidelines
http://pkg-exppsy.alioth.debian.org/pymvpa/devguide.html
which describe at least some of the internals of PyMVPA, so you can start
off adapting your algorithms into PyMVPA within your own clone of the git
repository and feed us only the desired pieces whenever you feel like it ;-)

I am not sure if you are aware of it, but let me bring to your attention
http://mloss.org/about/ which was created primarily to advocate the
open-sourcing of scientific software for various reasons. There is a
paper which accompanies it
http://jmlr.csail.mit.edu/papers/v8/sonnenburg07a.html
which is worth glancing over.

> In these days I'm trying/reading PyMVP and scikits.learn code. If you
> have suggestions on how/where to start please let me know. I'm somewhat
> proficient in Python, but not that much.
Well... the PyMVPA links you must know by now:
http://pkg-exppsy.alioth.debian.org/pymvpa/
which leads to
http://pkg-exppsy.alioth.debian.org/pymvpa/devguide.html
http://pkg-exppsy.alioth.debian.org/pymvpa/api/index.html

> > So, there probably won't be a separate algorithm sitting in top of the
> > classifiers (like libsvm grid search script), but another meta-classifier
> > that enables parameter optimization for every other type of classifier
> > and additionally can be used as a classifier on its own.
> The model selection problem is quite crucial IMHO.
Let me add -- unbiased model selection ;-) It is indeed critical ;-)
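[Editorial sketch] "Unbiased" here means the data used to pick the model
must never overlap the data used to estimate its performance -- i.e. nested
cross-validation, which is what a parameter-optimizing meta-classifier
effectively has to do. A minimal self-contained illustration with a plain
k-NN classifier (all names are illustrative, not any toolbox's API):

```python
import numpy as np

def knn_error(X_tr, y_tr, X_te, y_te, k):
    """Misclassification rate of a plain k-nearest-neighbour classifier."""
    errors = 0
    for x, y in zip(X_te, y_te):
        d = ((X_tr - x) ** 2).sum(axis=1)
        nearest = y_tr[np.argsort(d)[:k]]
        errors += np.bincount(nearest).argmax() != y
    return errors / len(y_te)

def nested_cv_error(X, y, ks=(1, 3, 5), n_outer=5, n_inner=4, seed=0):
    """Outer CV estimates generalization error; an inner CV on each outer
    training split picks k, so the outer test fold never influences
    model selection -- that is what keeps the estimate unbiased."""
    rng = np.random.RandomState(seed)
    outer = np.array_split(rng.permutation(len(y)), n_outer)
    errors = []
    for i in range(n_outer):
        test = outer[i]
        train = np.concatenate([outer[j] for j in range(n_outer) if j != i])
        # Inner CV over the training portion only
        inner = np.array_split(train, n_inner)
        best_k, best_err = ks[0], np.inf
        for k in ks:
            errs = []
            for m in range(n_inner):
                val = inner[m]
                fit = np.concatenate([inner[n] for n in range(n_inner) if n != m])
                errs.append(knn_error(X[fit], y[fit], X[val], y[val], k))
            if np.mean(errs) < best_err:
                best_k, best_err = k, np.mean(errs)
        errors.append(knn_error(X[train], y[train], X[test], y[test], best_k))
    return np.mean(errors)
```

Selecting parameters on the same folds that produce the reported accuracy
is exactly the bias this construction avoids.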

> I completely agree on
> having a generic tool to deal with all algorithms and different ways of
> doing it (grid search, different optimization algorithms etc.). If you
> are discussing on what to do, please let me know: I'm really interested.
We are open to discussion any time (though sometimes we might get
busy... like now, when I should be doing something else... not PyMVPA
related ;-)). But for a quick and interesting discussion we should really
do voice + VNC sessions. Are you on Skype by any chance?

> Currently I'm using a really nice optimization tool (OpenOpt, from
> scikits) and optimizing the marginal likelihood of train data (approximating
> Bayesian model selection) for selecting the model instance of GPR. I want to
> extend this part (e.g.: optimizing accuracy with cross-validation) and
> enforce
> the use of OpenOpt which is definitely better than scipy.optimize.
We are still deliberating on the best way to handle classifier
parameters so as to provide an efficient and easy way to optimize things.
If you could share some pieces of code which make use of
OpenOpt -- that would be terrific -- it would allow us to see the
situation from a wider perspective.
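[Editorial sketch] For concreteness, the marginal-likelihood approach
mentioned above looks roughly like this: minimize the negative GP log
evidence over log-hyperparameters. This sketch uses scipy.optimize purely
so it is self-contained; swapping in an OpenOpt solver would be the
drop-in change the thread advocates. Function names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_evidence(log_params, X, y):
    """Negative GP log marginal likelihood for a squared-exponential kernel.
    log_params = (log length-scale, log noise std); optimizing in log-space
    keeps both hyperparameters positive."""
    ell, noise = np.exp(log_params)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Small jitter keeps the Cholesky factorization numerically stable
    K = np.exp(-0.5 * d2 / ell ** 2) + (noise ** 2 + 1e-8) * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # -log p(y|X) = 1/2 y^T K^{-1} y + 1/2 log|K| + n/2 log(2*pi)
    return (0.5 * y @ alpha + np.log(np.diag(L)).sum()
            + 0.5 * len(X) * np.log(2 * np.pi))

def select_hyperparams(X, y, init=(0.0, -1.0)):
    """Type-II maximum likelihood: maximize the evidence over
    (length-scale, noise std), approximating Bayesian model selection."""
    res = minimize(neg_log_evidence, np.asarray(init), args=(X, y),
                   method='Nelder-Mead')
    return np.exp(res.x)  # back to (length-scale, noise std)
```

Replacing the evidence with a cross-validated accuracy gives the other
model-selection criterion discussed above, with the same optimizer
machinery.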

> A last question about the license:
> - NumPy/SciPy enforces the (modified) BSD license
> - scikits adresses a wider range but mainly (modified) BSD and GPL (v2?v3?)
> - PyMVPA is distributed under the MIT (X11) license.
> - shogun is GPL (v3)

> Which is the reason of MIT/X11 license instead of (modified) BSD or GPL?
Michael is much better aware of the license situation, so I will be quiet on
this one; but so far all copyright holders of the code are in contact
and flexible in terms of which license to choose. We have already thought
about dual-licensing it so that PyMVPA could be used with shogun (GPL drags
us into the GPL world).

> Kind Regards,
Cheers

> Emanuele

> [0]: http://www.gaussianprocess.org/
> [1]: http://plaza.ufl.edu/sunyijun/Paper/PAMI_1.pdf
> [2]: it is very similar to "Linear SVM Weights" but uses a
> squared exponential kernel instead
> http://books.nips.cc/papers/files/nips08/0514.pdf





> _______________________________________________
> Pkg-exppsy-pymvpa mailing list
> Pkg-exppsy-pymvpa at lists.alioth.debian.org
> http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa


-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student  Ph.D. @ CS Dept. NJIT
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW:     http://www.linkedin.com/in/yarik        
