[pymvpa] adding GPR: some questions

Mon May 19 15:32:31 UTC 2008

Hi Emanuele,

THANK YOU for the fascinating news.

> I'm not disappeared. I'm pretty busy because of my forthcoming
> daughter (expected in these weeks) and other new tasks. But I've
> spent some time to bridge Gaussian Process Regression to PyMVPA.
CONGRATULATIONS!!!! We have 2 experienced child carers on the team
(Michael and yours truly), and the 3rd active member is in a similar
situation to yours -- just give him a few more months ;-) So if anything,
we might serve as a good source of information on the baby side as well ;-)

Let me first give you a short update -- recently we have done quite a
bit of refactoring and introduced new features (mostly internal, i.e. not
altering the user-visible API) which are not yet in the master branch.
Don't worry about them -- we will take care of tuning your code so that it
takes advantage of those changes ;-) Just stay tuned (I am sorry if this
update is short on details).

> To be honest I've spent almost all time to jump on tools and follow dev
> guidelines. So the "new" code is pretty simple. I expect to be quicker as
> time goes on.
Thank you very much for your patience and for giving PyMVPA a try ;-)

Let me now go through the questions:
> - where should I put the variance of each prediction? GPR compute mean and
> variance of each prediction. Mean is returned by _predict. What about
> variance?
Since none of the other classifiers/regressors computes a variance (at
least not one that is visible from outside), I think the best option is
simply to add another StateVariable to your GPR class and store the
values there. Does that sound like a good fit? ;-)
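
To make the idea concrete, here is a minimal, self-contained sketch of a
regressor that returns the predictive means and keeps the predictive
variances on the side. It is not the PyMVPA implementation: ToyGPR,
sq_exp_kernel, solve and the predicted_variances attribute are all
hypothetical names, and a plain attribute stands in for a StateVariable.

```python
import math


def sq_exp_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two feature vectors."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * length_scale ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


class ToyGPR:
    """Hypothetical stand-in for a GPR class (not PyMVPA's)."""

    def __init__(self, length_scale=1.0, noise=1e-6):
        self.length_scale = length_scale
        self.noise = noise
        # Plain attribute standing in for a PyMVPA StateVariable.
        self.predicted_variances = None

    def _k(self, a, b):
        return sq_exp_kernel(a, b, self.length_scale)

    def train(self, samples, labels):
        self._X = [list(s) for s in samples]
        n = len(self._X)
        # Kernel matrix of the training samples plus noise on the diagonal.
        self._K = [[self._k(self._X[i], self._X[j])
                    + (self.noise if i == j else 0.0)
                    for j in range(n)] for i in range(n)]
        self._alpha = solve(self._K, list(labels))

    def predict(self, samples):
        means, variances = [], []
        for x in samples:
            k_star = [self._k(x, xi) for xi in self._X]
            means.append(sum(k * a for k, a in zip(k_star, self._alpha)))
            v = solve(self._K, k_star)
            variances.append(self._k(x, x)
                             - sum(k * vi for k, vi in zip(k_star, v)))
        # Means are returned; variances are stored for later inspection.
        self.predicted_variances = variances
        return means
```

After calling predict(), the caller reads the variances from
predicted_variances, so the return value of the prediction method stays the
same as for the other classifiers/regressors.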

> - would you like to have a look to my commits/patches and send me feedback?
I would be thrilled to do so, or we can simply add you to the exppsy
project on alioth, so you can push your changes (i.e. your whole branch)
directly. That would be the most effective way, I guess. If you don't
mind proceeding that way -- just register on
http://alioth.debian.org/
and give us your login name (or request to join the exppsy team; I
think there must be a 'link' to do that).
Then we will grant you access to the git repository.

> - Yaroslav mentioned to send patches by email. I'm still learning git
> and fighting with git-format-patch. Any hint?
IMHO it works fine; what obstacles did you hit? (git gives us hiccups in
various ways from time to time, so it would be no surprise if you hit
some other stone on the way.) Usually I just run git-format-patch and
send that battery of patches from my regular mail client. Or you could
try something like

git-format-patch --stdout master | git-send-email --to yoh at onerussian.com

> - is it OK to name "kernel.py" and "gpr.py" my two new files? I can't
> find rules on this but looks like other files.
You mean under mvpa/clfs, right? Then sure! We kept thinking about
crafting our own kernel.py, but so far there was no urgent need, so we
didn't do it. With your kernel.py in place we might see better whether
we need to elaborate on it for other uses.

> - one of the unit tests fails:
> ======================================================================
> FAIL: testCorrectDimensionsOrder (test_clf.ClassifiersTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "~/work/pymvpa/git/pymvpa/tests/tests_warehouse.py", line 50, in
> do_sweep
>     method(*args_, **kwargs_)
>   File "~/work/pymvpa/git/pymvpa/tests/test_clf.py", line 365, in
> testCorrectDimensionsOrder
>     (`clf`, traindata.samples, clf.training_confusion.percentCorrect))
> AssertionError: Classifier
> GPR(kernel=Squared_Exponential_kernel(length_scale=0.010000),
> enabled_states=['training_time', 'predicting_time',
> 'training_confusion', 'predictions', 'trained_labels']) must have 100%
> correct learning on [[ 0.  0.  1.]
>  [ 1.  0.  0.]]. Has 0.000000 on clf =
> GPR(kernel=Squared_Exponential_kernel(length_scale=0.010000),
> enabled_states=['training_time', 'predicting_time',
> 'training_confusion', 'predictions', 'trained_labels'])

> ======================================================================
> It seems that GPR is not able to learn a simple dataset (as well as
> RidgeReg, as the source says). Is this a big problem?
The only problem is that someone needs to sit down and tune the unittest
so that it finally works for all the classifiers ;-) So don't worry about
it for now, and just add another "or" clause to
if isinstance(clf, RidgeReg):
;-)
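
A minimal sketch of what that exemption could look like. The classes below
are empty, hypothetical stand-ins (the real ones live in PyMVPA), and
is_exempt is an illustrative helper name, not something in test_clf.py.
Passing a tuple to isinstance is equivalent to chaining the checks with "or".

```python
class RidgeReg:
    """Hypothetical stand-in for the ridge regression class."""


class GPR:
    """Hypothetical stand-in for the new GPR class."""


class SomeClassifier:
    """Hypothetical stand-in for any classifier that is still checked."""


def is_exempt(clf):
    """Return True if clf should skip the 100%-correct training check."""
    # Same as: isinstance(clf, RidgeReg) or isinstance(clf, GPR)
    return isinstance(clf, (RidgeReg, GPR))
```

In the test itself, the body of the existing "if isinstance(...)" branch
would stay as it is; only the condition grows by one class.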

Once again -- thanks for all your work!

On Mon, 19 May 2008, Emanuele Olivetti wrote:

> Dear all,

> I'm not disappeared. I'm pretty busy because of my forthcoming daughter
> (expected in these weeks) and other new tasks. But I've spent some time
> to bridge Gaussian Process Regression to PyMVPA.

> My log so far:
> - cloned git master
> - branched master to "eo/master" as suggested by Yaroslav
> - added a class with (basic) implementation of Gaussian Process
>   Regression (GPR)
> - added a Kernel class and subclass KernelSquaredExponential to be fed
>   to GPR
> - added hopefully compliant docstrings
> - clean up many parts to comply most of pylint warnings
> - added GPR to standard unit tests. One fails.
> - set up a little test, even though not in unittest style (it's in the
>   __main__ of gpr.py). Will be moved in the proper place later.

> To be honest I've spent almost all time to jump on tools and follow dev
> guidelines. So the "new" code is pretty simple. I expect to be quicker as
> time goes on.

> Now some questions:
> - where should I put the variance of each prediction? GPR compute mean and
> variance of each prediction. Mean is returned by _predict. What about
> variance?
> - would you like to have a look to my commits/patches and send me feedback?
> - Yaroslav mentioned to send patches by email. I'm still learning git
> and fighting with git-format-patch. Any hint?
> - is it OK to name "kernel.py" and "gpr.py" my two new files? I can't
> find rules on this but looks like other files.
> - one of the unit tests fails:
> ======================================================================
> FAIL: testCorrectDimensionsOrder (test_clf.ClassifiersTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "~/work/pymvpa/git/pymvpa/tests/tests_warehouse.py", line 50, in
> do_sweep
>     method(*args_, **kwargs_)
>   File "~/work/pymvpa/git/pymvpa/tests/test_clf.py", line 365, in
> testCorrectDimensionsOrder
>     (`clf`, traindata.samples, clf.training_confusion.percentCorrect))
> AssertionError: Classifier
> GPR(kernel=Squared_Exponential_kernel(length_scale=0.010000),
> enabled_states=['training_time', 'predicting_time',
> 'training_confusion', 'predictions', 'trained_labels']) must have 100%
> correct learning on [[ 0.  0.  1.]
>  [ 1.  0.  0.]]. Has 0.000000 on clf =
> GPR(kernel=Squared_Exponential_kernel(length_scale=0.010000),
> enabled_states=['training_time', 'predicting_time',
> 'training_confusion', 'predictions', 'trained_labels'])

> ======================================================================
> It seems that GPR is not able to learn a simple dataset (as well as
> RidgeReg, as the source says). Is this a big problem?

> Regards,

> Emanuele


-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student  Ph.D. @ CS Dept. NJIT
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW:     http://www.linkedin.com/in/yarik