[pymvpa] GPR woes

Yaroslav Halchenko yoh at psychology.rutgers.edu
Sat Mar 28 19:43:33 UTC 2009


fair enough... btw, are chunks 5 and maybe 1 the ones which gave you a
headache?

On Sat, 28 Mar 2009, Per B. Sederberg wrote:

> Here's the summary:

> In [41]: print dat.summary()
> Dataset / float32 2142 x 29155
> uniq: 6 chunks 2 labels
> stats: mean=-4.0314e-06 std=0.916641 var=0.84023 min=-1.23648 max=43.8952

> Counts of labels in each chunk:
>   chunks\labels  0   1
>                 --- ---
>        0        173 184
>        1        168 189
>        2        184 173
>        3        188 169
>        4        173 184
>        5        196 161

> Summary per label across chunks
>   label mean  std min max #chunks
>    0     180 9.81 168 196    6
>    1     177 9.81 161 189    6

> Summary per chunk across labels
>   chunk mean  std min max #labels
>    0     178  5.5 173 184    2
>    1     178 10.5 168 189    2
>    2     178  5.5 173 184    2
>    3     178  9.5 169 188    2
>    4     178  5.5 173 184    2
>    5     178 17.5 161 196    2


> It does look like there are some outliers in there based on the min
> and max above.  Note that I have already zscored the data at this
> point.
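[A hypothetical sketch of the outlier check Per mentions — flagging samples in an already-zscored data matrix whose values fall far outside the expected range, as the max of ~43.9 in the summary suggests. The random data, the planted value, and the cutoff of 6.0 are all illustrative assumptions, not from the thread:]

```python
import numpy as np

# Sketch (not PyMVPA code): flag samples in a zscored data matrix that
# contain values far outside the expected range for standard scores.
rng = np.random.default_rng(0)
data = rng.standard_normal((100, 50))   # stand-in for the real dataset
data[3, 7] = 40.0                       # plant one extreme value

cutoff = 6.0                            # assumed threshold, not from the thread
outlier_rows = np.unique(np.nonzero(np.abs(data) > cutoff)[0])
print(outlier_rows)                     # rows with at least one extreme value
```

One could then drop `outlier_rows` from the dataset before retesting whether the regularization is still needed.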

> I'm getting some really good classification now with the
> regularization parameter in there.  At some point I may try looking
> for outliers, explicitly removing them, and then seeing if I still
> need the regularization.  The good news is that the regularization
> seems to have no effect on the chunks of the data that didn't need it
> before, so I'm not really hurting anything by having it there (at
> least for this dataset).

> Best,
> P

> 2009/3/28 Yaroslav Halchenko <debian at onerussian.com>:
> > Hypothesis:
> > I think that one of the possible causes for such behavior is that
> > your features are not normed to lie in approximately the same range.
> > Assume that you have one feature which is 100 and the rest are
> > "proper" ones in [-5, 5] (zscores?), or that you have some
> > outliers/excursions from this 'normed' case.  Then, while doing
> > summations/dot-products, that heavy feature would demolish the effect
> > of variance in the "proper" features.

> > What is the histogram of your features? or just

> > print dataset.summary()

> > ?

> > On Sat, 28 Mar 2009, Per B. Sederberg wrote:

> >> Howdy all:

> >> I've got it working for my data (had to add what I thought was a
> >> pretty hefty regularization of .001 before it stopped giving me the
> >> error, but it turns out I get the same classification performance with
> >> a value of 2.0).

> >> I made it so that regularization is always applied, but that it
> >> defaults to 0.0, so the default behavior of the classifier is
> >> unchanged.  Here's the commit message:

> >> --------------------
> >> I'm not sure what to call it, so I put it all.  Anyway, I just added
> >> the regularization parameter to be used all the time by GPR.  It
> >> still defaults to 0.0, so there should be no change for people
> >> currently using GPR with no issues.  However, this enables manually
> >> setting the regularization if you run into the "not positive
> >> definite" error.  I figured that if you were going to use
> >> regularization, you would want to use it all the time and not have
> >> different values on different folds of your data.
> >> ----------------------


> >> Thanks for all the input today.  I now have working GPR classification :)

> >> Best,
> >> Per


> >> On Sat, Mar 28, 2009 at 12:48 PM, Per B. Sederberg <persed at princeton.edu> wrote:
> >> > On Sat, Mar 28, 2009 at 12:06 PM, Scott Gorlin <gorlins at mit.edu> wrote:
> >> >> you could try 10x the calculated eps :)

> >> >> Depending on how long it takes to run your classifier, why don't you try
> >> >> varying epsilon logarithmically and plotting the error as a function of
> >> >> epsilon.  Most likely you'll see a local min around some optimum value of
> >> >> regularization, and then it will plateau out as epsilon increases - this
> >> >> should, at least, convince you that a high epsilon isn't that bad (if it is,
> >> >> then your data probably aren't as separable as they should be...)
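[Scott's sweep could be sketched like this; `run_crossvalidation` is a hypothetical stand-in for whatever per-epsilon transfer-error computation is already in place, and its toy error curve (minimum near 1e-2) is purely illustrative of the expected local-min-then-plateau shape:]

```python
import numpy as np

# Vary the regularization parameter logarithmically, record the error
# at each value, and locate the minimum.
def run_crossvalidation(epsilon):
    # hypothetical stand-in: toy error curve with a minimum at 1e-2
    return (np.log10(epsilon) + 2.0) ** 2

epsilons = np.logspace(-6, 2, num=9)        # 1e-6 ... 1e2
errors = [run_crossvalidation(e) for e in epsilons]
best = epsilons[int(np.argmin(errors))]
print(best)
```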


> >> > A great idea!  I'll expose the regularization parameter as a variable
> >> > to GPR and try it out.

> >> >> All that high epsilon does in Tikhonov regularization is penalize steep
> >> >> hyperplanes, as it adds the norm of a function to the loss, so the results
> >> >> will be smoother (and theoretically generalize better).  So, it should be
> >> >> set so that  epsilon*function norm is ballpark equal to the raw classifier
> >> >> loss, so that it makes a difference (though I don't know the exact form in
> >> >> the GPR implementation; i'm just trying to say that there is an effective
> >> >> range which isn't necessarily approaching 0).  As long as Xfold validation
> >> >> still works with high epsilon, there's nothing to worry about - in fact,
> >> >> that's what it's designed to do.


> >> > So, this raises the question of whether we should have some level
> >> > of regularization all the time.  The main difficulty would be
> >> > figuring out what it should be (ballpark value) for different
> >> > datasets (i.e., whether there is some principled first pass for
> >> > setting it based on the features of your data).
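[One conceivable first pass — my own assumption, not something the thread settled on — is to tie the jitter to the scale of the kernel matrix itself, e.g. a small multiple of the mean of its diagonal, so the same relative regularization transfers across datasets of different scales:]

```python
import numpy as np

# Hypothetical heuristic: default jitter proportional to the mean
# diagonal of the kernel matrix C, so its relative size is constant
# across datasets of different scales.
def default_jitter(C, rel=1e-10):
    return rel * float(np.mean(np.diag(C)))

C = 50.0 * np.eye(3)        # toy kernel matrix with diagonal entries of 50
print(default_jitter(C))    # rel * 50
```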

> >> > Thanks for all the info,
> >> > Per


> >> >> if you still get a linalg error with very high epsilon, then there's
> >> >> probably a different problem ;)

> >> >> Per B. Sederberg wrote:

> >> >>> hi s&e:

> >> >>> this is almost exactly what I did, but I still get the same
> >> >>> error.  In numpy you get the precision with:

> >> >>> np.finfo(np.float64).eps

> >> >>> I tried using that value, tried actually checking the eps of the
> >> >>> C matrix's dtype, and also tried using the float32 eps, which is
> >> >>> many orders of magnitude greater than the float64 eps.  All to no
> >> >>> avail, so I'm worried my data will need a much larger eps.  What
> >> >>> are the bad consequences I could run into by adding in a major
> >> >>> regularization?
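[For reference, the two machine epsilons being compared here — note that the float32 eps is roughly nine orders of magnitude larger than the float64 one, so the dtype of the kernel matrix strongly affects how much jitter counts as "small":]

```python
import numpy as np

# Machine epsilon for the two float dtypes discussed in the thread.
eps64 = np.finfo(np.float64).eps
eps32 = np.finfo(np.float32).eps
print(eps64)   # ~2.22e-16
print(eps32)   # ~1.19e-07
```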

> >> >>> thanks,
> >> >>> p

> >> >>> ps: in case it's informative, I have over 1000 samples so the kernel
> >> >>> matrix is quite large...

> >> >>> On 3/28/09, Scott Gorlin <gorlins at mit.edu> wrote:


> >> >>>> I haven't checked the python syntax, but in Matlab, the eps function can
> >> >>>> take a number and return the smallest recordable numerical difference at
> >> >>>> that level.  If there is an equivalent in numpy then (say 2x) this would
> >> >>>> be a good alternative default epsilon (calculated at train time)
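[numpy does have such an equivalent: `np.spacing(x)` returns the distance from `x` to the next representable float at that magnitude, like Matlab's `eps(x)`. A sketch of the suggested data-dependent default, with the "2x" factor from above; the kernel-matrix maximum of 1000.0 is just a stand-in value:]

```python
import numpy as np

# np.spacing(x) is numpy's analogue of Matlab's eps(x).  A
# data-dependent default epsilon could then be twice the spacing at the
# largest kernel-matrix entry, computed at train time.
C_max = 1000.0                       # stand-in for the kernel matrix maximum
jitter = 2 * np.spacing(C_max)
print(np.spacing(1.0) == np.finfo(np.float64).eps)  # spacing at 1.0 is eps
print(jitter > np.finfo(np.float64).eps)            # larger at magnitude 1000
```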

> >> >>>> Emanuele Olivetti wrote:


> >> >>>>> Dear Per and Scott,

> >> >>>>> You are both right, and I feel quite guilty since I did not
> >> >>>>> spend time on GPR as I promised a long time ago (basically lack
> >> >>>>> of time).  The epsilon issue is exactly what Scott says: I
> >> >>>>> discussed the same issue with Yarik some time ago but then did
> >> >>>>> not work on a principled solution.  I'm fine if you patch the
> >> >>>>> current code to increase epsilon to some extent.  Any ideas
> >> >>>>> about alternative/principled solutions?

> >> >>>>> Moreover, I'm (slowly) chasing another numerical instability,
> >> >>>>> due, I guess, to underflow in summation, which could affect the
> >> >>>>> marginal likelihood computation (which means it is highly
> >> >>>>> relevant for GPR).  This is more likely to happen when the
> >> >>>>> dataset is large, so be prepared for that ;-)

> >> >>>>> As far as I remember, I forced the kernel matrix to be double
> >> >>>>> precision for exactly the reasons you mentioned, but checking
> >> >>>>> this fact more carefully is highly desirable.

> >> >>>>> Unfortunately I'm not going to work on GPR very soon, even though
> >> >>>>> I'd love to. So any patch is warmly welcome, as well as bug reports ;-)

> >> >>>>> Best,

> >> >>>>> Emanuele


> >> >>>>> Per B. Sederberg wrote:


> >> >>>>>> Hi Scott:

> >> >>>>>> You are correct on all accounts.  epsilon is currently hard-coded and
> >> >>>>>> either of your proposed solutions would help to make GPR work more
> >> >>>>>> frequently/consistently.  I'll try out some combination of both and
> >> >>>>>> report back.

> >> >>>>>> Best,
> >> >>>>>> Per

> >> >>>>>> On Sat, Mar 28, 2009 at 9:46 AM, Scott Gorlin <gorlins at mit.edu> wrote:



> >> >>>>>>> Epsilon is the Tikhonov regularization parameter, no?
> >> >>>>>>> (Haven't read much about this classifier, but it looks
> >> >>>>>>> similar to the RLS solution to me...)  As such it should be
> >> >>>>>>> adjustable to enable punishing overfit solutions, so maybe it
> >> >>>>>>> should actually be a formal parameter.

> >> >>>>>>> If the default returns a LinAlg error, you could also stick
> >> >>>>>>> the Cholesky decomposition in a finite loop where epsilon is
> >> >>>>>>> doubled each time (though not too many iterations...)
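[The retry idea above could be sketched like this — add jitter to the kernel matrix and double it until the Cholesky factorization succeeds, capped at a finite number of attempts. Function and variable names here are illustrative, not PyMVPA's:]

```python
import numpy as np
from scipy.linalg import cholesky, LinAlgError

# Retry the Cholesky factorization with geometrically growing jitter.
def robust_cholesky(C, eps0=1.0e-20, max_tries=40):
    eps = eps0
    I = np.eye(C.shape[0])
    for _ in range(max_tries):
        try:
            return cholesky(C + eps * I, lower=True)
        except LinAlgError:
            eps *= 2.0          # double the jitter and try again
    raise LinAlgError("matrix not positive definite even with jitter")

# A rank-deficient (hence not strictly positive definite) kernel matrix:
v = np.arange(1.0, 5.0)
C = np.outer(v, v)              # rank 1; a plain cholesky(C) fails
L = robust_cholesky(C)
print(np.allclose(L @ L.T, C, atol=1e-6))
```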

> >> >>>>>>> On a wild tangent, also check that your kernel matrix (here
> >> >>>>>>> self._C) is double precision, since 1e-20 is close to the
> >> >>>>>>> precision limit for single floats.

> >> >>>>>>> Per B. Sederberg wrote:



> >> >>>>>>>> Hi Everybody (and Emanuele in particular):

> >> >>>>>>>> I've tried GPR on about 5 different datasets and I've never actually
> >> >>>>>>>> gotten it to run through an entire cross validation process.  I
> >> >>>>>>>> always
> >> >>>>>>>> get the following error:

> >> >>>>>>>> /home/per/lib/python/mvpa/clfs/gpr.pyc in _train(self, data)
> >> >>>>>>>>     308             except SLAError:
> >> >>>>>>>>     309                 epsilon = 1.0e-20 * N.eye(self._C.shape[0])
> >> >>>>>>>> --> 310                 self._L = SLcholesky(self._C + epsilon, lower=True)
> >> >>>>>>>>     311                 self._LL = (self._L, True)
> >> >>>>>>>>     312                 pass

> >> >>>>>>>> /usr/lib/python2.5/site-packages/scipy/linalg/decomp.pyc in cholesky(a, lower, overwrite_a)
> >> >>>>>>>>     552     potrf, = get_lapack_funcs(('potrf',),(a1,))
> >> >>>>>>>>     553     c,info = potrf(a1,lower=lower,overwrite_a=overwrite_a,clean=1)
> >> >>>>>>>> --> 554     if info>0: raise LinAlgError, "matrix not positive definite"
> >> >>>>>>>>     555     if info<0: raise ValueError,\
> >> >>>>>>>>     556        'illegal value in %-th argument of internal potrf'%(-info)

> >> >>>>>>>> LinAlgError: matrix not positive definite

> >> >>>>>>>> I think Yarik mentioned changing epsilon or something like that, but
> >> >>>>>>>> it seems to me that having to change source code to tweak a
> >> >>>>>>>> classifier
> >> >>>>>>>> to not crash is not ideal.

> >> >>>>>>>> Do you have any suggestions for how to fix this?  I've been
> >> >>>>>>>> tantalized
> >> >>>>>>>> with some very good transfer errors in the folds that GPR was able to
> >> >>>>>>>> complete before crashing.  Is there a permanent change we could make
> >> >>>>>>>> to the GPR code to make it more stable across datasets?  Or could we
> >> >>>>>>>> turn epsilon into a configurable variable when setting up the
> >> >>>>>>>> classifier?  If so, what should I change epsilon to in order to see
> >> >>>>>>>> if
> >> >>>>>>>> I can get it to run?

> >> >>>>>>>> Thanks,
> >> >>>>>>>> Per

> >> >>>>>>>> _______________________________________________
> >> >>>>>>>> Pkg-ExpPsy-PyMVPA mailing list
> >> >>>>>>>> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> >> >>>>>>>> http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa



















-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student  Ph.D. @ CS Dept. NJIT
Office: (973) 353-1412 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW:     http://www.linkedin.com/in/yarik        


