[pymvpa] adding GPR: some questions

Emanuele Olivetti emanuele at relativita.com
Mon May 19 15:05:52 UTC 2008

Dear all,

I haven't disappeared. I've been pretty busy (because of my forthcoming
daughter in these weeks, and other new tasks). But I've spent some time
bridging Gaussian Process Regression to PyMVPA.

My log so far:
- cloned git master
- branched master to "eo/master" as suggested by Yaroslav
- added a class with (basic) implementation of Gaussian Process
Regression (GPR)
- added a Kernel class and subclass KernelSquaredExponential to be fed
to GPR
- added hopefully compliant docstrings
- cleaned up many parts to comply with most pylint warnings
- added GPR to the standard unit tests. One fails.
- set up a little test, even though not in unittest style (it currently
  lives in gpr.py). Will be moved to the proper place later.
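
For reference, the kernel part can be sketched roughly as follows. This is a
simplified, NumPy-only illustration and not the actual code in eo/master:
apart from the class names Kernel and KernelSquaredExponential, everything
here (the compute method, the length_scale and sigma_f parameters) is an
assumption made for the example.

```python
import numpy as np


class Kernel(object):
    """Base class; subclasses are expected to implement compute().

    NOTE: the method name `compute` is hypothetical, chosen for this sketch.
    """

    def compute(self, data1, data2=None):
        raise NotImplementedError


class KernelSquaredExponential(Kernel):
    """Squared exponential kernel:
    k(x, y) = sigma_f^2 * exp(-|x - y|^2 / (2 * length_scale^2))
    """

    def __init__(self, length_scale=1.0, sigma_f=1.0):
        # Hypothetical hyperparameter names, not taken from the branch.
        self.length_scale = length_scale
        self.sigma_f = sigma_f

    def compute(self, data1, data2=None):
        """Return the kernel matrix between the rows of data1 and data2."""
        data1 = np.atleast_2d(np.asarray(data1, dtype=float))
        if data2 is None:
            data2 = data1
        else:
            data2 = np.atleast_2d(np.asarray(data2, dtype=float))
        # Pairwise squared Euclidean distances via broadcasting.
        diff = data1[:, np.newaxis, :] - data2[np.newaxis, :, :]
        sq_dist = np.sum(diff ** 2, axis=2)
        return self.sigma_f ** 2 * np.exp(-0.5 * sq_dist / self.length_scale ** 2)


# Usage: kernel matrix of a small two-sample dataset.
samples = np.array([[0.0, 0.0, 1.0],
                    [1.0, 0.0, 0.0]])
gram = KernelSquaredExponential().compute(samples)
```

The resulting matrix is what a GPR object would consume, so keeping the
kernel as a separate class lets other kernels be swapped in later.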

To be honest, I've spent almost all my time getting up to speed with the
tools and following the dev guidelines, so the "new" code is pretty simple.
I expect to be quicker as time goes on.

Now some questions:
- where should I put the variance of each prediction? GPR computes the mean
and variance of each prediction. The mean is returned by _predict. What about
the variance?
- would you like to have a look at my commits/patches and send me feedback?
- Yaroslav mentioned sending patches by email. I'm still learning git
and fighting with git-format-patch. Any hints?
- is it OK to name my two new files "kernel.py" and "gpr.py"? I can't
find rules on this, but the names look consistent with the other files.
- one of the unit tests fails:
FAIL: testCorrectDimensionsOrder (test_clf.ClassifiersTests)
Traceback (most recent call last):
  File "~/work/pymvpa/git/pymvpa/tests/tests_warehouse.py", line 50, in
    method(*args_, **kwargs_)
  File "~/work/pymvpa/git/pymvpa/tests/test_clf.py", line 365, in
    (`clf`, traindata.samples, clf.training_confusion.percentCorrect))
AssertionError: Classifier
enabled_states=['training_time', 'predicting_time',
'training_confusion', 'predictions', 'trained_labels']) must have 100%
correct learning on [[ 0.  0.  1.]
 [ 1.  0.  0.]]. Has 0.000000 on clf =
enabled_states=['training_time', 'predicting_time',
'training_confusion', 'predictions', 'trained_labels'])

It seems that GPR is not able to learn this simple dataset (and neither is
RidgeReg, as the source says). Is this a big problem?
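
To make the variance question above concrete, here is a minimal sketch of
the two quantities GPR produces for each test point. It uses the standard
textbook equations with a Cholesky factorization and plain NumPy. It is not
the code in gpr.py: the function names, the sigma_noise parameter and the
fixed squared exponential kernel are all assumptions made for illustration.

```python
import numpy as np


def sq_exp_kernel(a, b, length_scale=1.0, sigma_f=1.0):
    """Squared exponential kernel matrix between the rows of a and b."""
    diff = a[:, np.newaxis, :] - b[np.newaxis, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    return sigma_f ** 2 * np.exp(-0.5 * sq_dist / length_scale ** 2)


def gpr_predict(train_x, train_y, test_x, sigma_noise=0.1):
    """Return (mean, variance) of the GP posterior at each row of test_x.

    Hypothetical helper for this sketch; sigma_noise is the assumed
    standard deviation of the observation noise.
    """
    n = len(train_x)
    k_train = sq_exp_kernel(train_x, train_x) + sigma_noise ** 2 * np.eye(n)
    k_cross = sq_exp_kernel(test_x, train_x)            # shape (n_test, n)
    k_test_diag = np.diag(sq_exp_kernel(test_x, test_x))

    # Solve with the Cholesky factor instead of inverting k_train.
    chol = np.linalg.cholesky(k_train)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, train_y))

    mean = np.dot(k_cross, alpha)
    v = np.linalg.solve(chol, k_cross.T)                # shape (n, n_test)
    variance = k_test_diag - np.sum(v ** 2, axis=0)
    return mean, variance


# Usage: three training points, predictions near and far from them.
train_x = np.array([[0.0], [1.0], [2.0]])
train_y = np.sin(train_x[:, 0])
test_x = np.array([[0.0], [1.0], [2.0], [10.0]])
mean, variance = gpr_predict(train_x, train_y, test_x)
```

The mean is one number per test sample, like the output of _predict, and
the variance is a second array of the same length that is small near the
training data and grows towards the prior variance far from it. Note also
that with a nonzero noise term the predicted mean on the training samples
is close to, but not exactly equal to, the training targets.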


