[pymvpa] adding GPR: some questions

Mon May 19 15:05:52 UTC 2008

Dear all,

I haven't disappeared. I've been pretty busy with my daughter's upcoming
arrival (expected in the next few weeks) and with other new tasks. Still,
I've spent some time bridging Gaussian Process Regression into PyMVPA.

My log so far:
- cloned git master
- branched master to "eo/master", as suggested by Yaroslav
- added a class with a (basic) implementation of Gaussian Process
  Regression (GPR)
- added a Kernel class and a subclass, KernelSquaredExponential, to be
  fed to GPR
- added docstrings that are hopefully compliant
- cleaned up many parts to address most pylint warnings
- added GPR to the standard unit tests. One fails.
- set up a small test, though not in unittest style (it lives in the
  __main__ block of gpr.py). I will move it to the proper place later.

To be honest, I've spent most of my time getting up to speed with the
tools and following the development guidelines, so the "new" code is
fairly simple. I expect to get quicker as time goes on.

Now some questions:
- where should I put the variance of each prediction? GPR computes both
  the mean and the variance of each prediction. The mean is returned by
  _predict, but what about the variance?
- would you like to have a look at my commits/patches and send me
  feedback?
- Yaroslav suggested sending patches by email. I'm still learning git
  and fighting with git-format-patch. Any hints?
- is it OK to name my two new files "kernel.py" and "gpr.py"? I couldn't
  find any rules on this, but the names look consistent with the other
  files.
- one of the unit tests fails:
======================================================================
FAIL: testCorrectDimensionsOrder (test_clf.ClassifiersTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "~/work/pymvpa/git/pymvpa/tests/tests_warehouse.py", line 50, in do_sweep
    method(*args_, **kwargs_)
  File "~/work/pymvpa/git/pymvpa/tests/test_clf.py", line 365, in testCorrectDimensionsOrder
    (`clf`, traindata.samples, clf.training_confusion.percentCorrect))
AssertionError: Classifier GPR(kernel=Squared_Exponential_kernel(length_scale=0.010000), enabled_states=['training_time', 'predicting_time', 'training_confusion', 'predictions', 'trained_labels']) must have 100% correct learning on [[ 0.  0.  1.]
 [ 1.  0.  0.]]. Has 0.000000 on clf = GPR(kernel=Squared_Exponential_kernel(length_scale=0.010000), enabled_states=['training_time', 'predicting_time', 'training_confusion', 'predictions', 'trained_labels'])

======================================================================
It seems that GPR is not able to learn this simple dataset (and,
according to the source, neither is RidgeReg). Is this a big problem?
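To make the variance question concrete, here is a sketch of the standard GPR predictive equations. It computes the mean and the variance from the same Cholesky factor of K + noise^2 I. The class SimpleGPR, its noise parameter, the variances attribute, and the nearest_label helper are all hypothetical. Storing the variances in an attribute after predicting is just one possible design, not PyMVPA's API. The sketch also illustrates one guess at why the test reports 0%, which has not been checked against the actual test. With observation noise, the posterior mean at a training point is shrunk toward zero, so predictions such as 0.990 never exactly equal a label such as 1. Mapping each prediction to the nearest known label would avoid that. The labels [1, 2] below are assumed; the test's real labels are not shown in the traceback.

```python
import numpy as np


def squared_exponential_kernel(data1, data2, length_scale=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * length_scale^2)); returns (n1, n2)."""
    diff = np.asarray(data1, float)[:, None, :] - np.asarray(data2, float)[None, :, :]
    return np.exp(-0.5 * np.sum(diff ** 2, axis=-1) / length_scale ** 2)


class SimpleGPR:
    """Hypothetical minimal GPR that keeps predictive variances next to the means."""

    def __init__(self, length_scale=1.0, noise=0.1):
        self.length_scale = length_scale
        self.noise = noise          # observation noise standard deviation (assumed)
        self.variances = None       # filled in by predict()

    def train(self, samples, labels):
        self._train = np.asarray(samples, dtype=float)
        y = np.asarray(labels, dtype=float)
        K = squared_exponential_kernel(self._train, self._train, self.length_scale)
        # Cholesky factor: L L^T = K + noise^2 I
        self._L = np.linalg.cholesky(K + self.noise ** 2 * np.eye(len(y)))
        # alpha = (K + noise^2 I)^{-1} y via two solves with the triangular factors
        # (generic np.linalg.solve is used here only to avoid a SciPy dependency)
        self._alpha = np.linalg.solve(self._L.T, np.linalg.solve(self._L, y))

    def predict(self, samples):
        k_star = squared_exponential_kernel(self._train, samples, self.length_scale)
        mean = k_star.T @ self._alpha                # predictive mean, one per sample
        v = np.linalg.solve(self._L, k_star)         # shape (n_train, n_test)
        # Predictive variance of the latent function; k(x*, x*) = 1 for this kernel
        self.variances = 1.0 - np.sum(v ** 2, axis=0)
        return mean


def nearest_label(predictions, labels):
    """Map each continuous prediction to the closest known label (hypothetical)."""
    uniq = np.unique(labels)
    idx = np.argmin(np.abs(np.asarray(predictions)[:, None] - uniq[None, :]), axis=1)
    return uniq[idx]
```

On the two samples from the failing test, with length_scale=0.01 and noise=0.1, the kernel matrix is effectively the identity. Each prediction is therefore its label divided by 1.01, and each variance is 1 - 1/1.01 (about 0.0099). An exact comparison with the labels fails, while nearest_label recovers them.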

Regards,

Emanuele