[pymvpa] GPR woes

Scott Gorlin gorlins at MIT.EDU
Sat Mar 28 16:06:37 UTC 2009


you could try 10x the calculated eps :)

Depending on how long it takes to run your classifier, why don't you try 
varying epsilon logarithmically and plotting the error as a function of 
epsilon?  Most likely you'll see a local minimum around some optimal value 
of regularization, and then the error will plateau as epsilon increases - 
this should, at least, convince you that a high epsilon isn't that bad 
(if it is, then your data probably aren't as separable as they should be...)
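A minimal sketch of such a sweep, using a toy Tikhonov-regularized least-squares problem as a stand-in for the actual GPR/cross-validation pipeline (all of the data and names here are illustrative, not PyMVPA API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the real pipeline: Tikhonov-regularized least squares
# on noisy synthetic data, with held-out error measured for each epsilon.
X = rng.standard_normal((80, 40))
w_true = rng.standard_normal(40)
y = X @ w_true + 0.5 * rng.standard_normal(80)
Xtr, ytr, Xte, yte = X[:40], y[:40], X[40:], y[40:]

def heldout_error(epsilon):
    # solve (Xtr' Xtr + epsilon*I) w = Xtr' y, then score on the held-out half
    w = np.linalg.solve(Xtr.T @ Xtr + epsilon * np.eye(40), Xtr.T @ ytr)
    return np.mean((Xte @ w - yte) ** 2)

epsilons = np.logspace(-8, 4, 13)            # vary epsilon logarithmically
errors = [heldout_error(e) for e in epsilons]
best_eps = epsilons[int(np.argmin(errors))]  # the dip before the plateau
# plotting errors vs. epsilons on a log axis shows the min-then-plateau shape
```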

All a high epsilon does in Tikhonov regularization is penalize steep 
hyperplanes, since it adds the norm of the function to the loss, so the 
results will be smoother (and should theoretically generalize better).  So it 
should be set such that epsilon * the function norm is ballpark equal to the 
raw classifier loss, so that it actually makes a difference (though I don't 
know the exact form in the GPR implementation; I'm just trying to say that 
there is an effective range which isn't necessarily approaching 0).  As 
long as X-fold validation still works with a high epsilon, there's nothing 
to worry about - in fact, that's exactly what it's designed to do.
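The "penalize steep hyperplanes" part can be seen concretely in a toy linear example (again generic Tikhonov, not the GPR code itself): the larger epsilon is, the smaller the norm of the fitted function.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 10))
y = rng.standard_normal(50)

def tikhonov_weights(epsilon):
    # minimize ||X w - y||^2 + epsilon * ||w||^2 via the normal equations
    return np.linalg.solve(X.T @ X + epsilon * np.eye(X.shape[1]), X.T @ y)

norms = [np.linalg.norm(tikhonov_weights(e)) for e in (1e-6, 1.0, 1e3)]
# heavier regularization -> flatter (smaller-norm) solution
```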

if you still get a linalg error with very high epsilon, then there's 
probably a different problem ;)
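For reference, the finite doubling loop suggested further down the thread might look something like this (using `scipy.linalg.cholesky` in place of PyMVPA's `SLcholesky` alias; the function name here is made up):

```python
import numpy as np
from numpy.linalg import LinAlgError
from scipy.linalg import cholesky

def jittered_cholesky(C, eps0=None, max_tries=20):
    """Retry the Cholesky factorization with exponentially growing diagonal
    jitter until C + eps*I is numerically positive definite (or give up)."""
    eps = float(np.finfo(C.dtype).eps) if eps0 is None else eps0
    I = np.eye(C.shape[0])
    for _ in range(max_tries):          # finite loop, as suggested
        try:
            return cholesky(C + eps * I, lower=True), eps
        except LinAlgError:
            eps *= 2.0                  # double epsilon each time
    raise LinAlgError("not positive definite even with jitter %g" % eps)
```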

Per B. Sederberg wrote:
> hi s&e:
>
> this is almost exactly what I did, but I still get the same error.  In
> numpy you get the precision with:
>
> np.finfo(np.float64).eps
>
> I tried using that value (actually checking the eps of the C
> matrix's dtype), and also tried using the float32 eps, which is more than
> 2 times the float64 eps.  All to no avail, so I'm worried my data will
> need a much larger eps.  What are the bad consequences I could run
> into by adding in a major regularization?
>
> thanks,
> p
>
> ps: in case it's informative, I have over 1000 samples so the kernel
> matrix is quite large...
>
> On 3/28/09, Scott Gorlin <gorlins at mit.edu> wrote:
>   
>> I haven't checked the python syntax, but in Matlab, the eps function can
>> take a number and return the smallest representable numerical difference at
>> that level.  If there is an equivalent in numpy, then (say) 2x that would
>> be a good alternative default epsilon (calculated at train time).
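There is one, for the record: numpy's `np.spacing(x)` returns the distance from `x` to the nearest adjacent representable float, just like Matlab's `eps(x)`:

```python
import numpy as np

# numpy analogue of Matlab's eps(x): gap to the next representable float
print(np.spacing(1.0))              # same as np.finfo(np.float64).eps
print(np.spacing(np.float32(1.0)))  # float32 gap, much larger
```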
>>
>> Emanuele Olivetti wrote:
>>     
>>> Dear Per and Scott,
>>>
>>> You are both right, and I feel quite guilty since I did not spend
>>> time on GPR as I promised a long time ago (basically lack of time).  The
>>> epsilon issue is exactly what Scott says: I discussed the same issue
>>> with Yarick some time ago but then did not work on a principled solution.
>>> I'm fine if you patch the current code to increase epsilon up to a
>>> certain extent.  Any ideas about alternative/principled solutions?
>>>
>>> Moreover, I'm (slowly) chasing another numerical instability due, I
>>> guess, to underflow in a summation, which could affect the marginal
>>> likelihood computation (which means it is highly relevant for GPR).
>>> This is more likely to happen when the dataset is large, so be
>>> prepared for that ;-)
>>>
>>> As far as I remember I forced the kernel matrix to be double precision
>>> for exactly the reasons you mentioned, but checking this fact more
>>> carefully is highly desirable.
>>>
>>> Unfortunately I'm not going to work on GPR very soon, even though
>>> I'd love to. So any patch is warmly welcome, as well as bug reports ;-)
>>>
>>> Best,
>>>
>>> Emanuele
>>>
>>>
>>> Per B. Sederberg wrote:
>>>       
>>>> Hi Scott:
>>>>
>>>> You are correct on all counts.  Epsilon is currently hard-coded, and
>>>> either of your proposed solutions would help make GPR work more
>>>> frequently/consistently.  I'll try out some combination of both and
>>>> report back.
>>>>
>>>> Best,
>>>> Per
>>>>
>>>> On Sat, Mar 28, 2009 at 9:46 AM, Scott Gorlin <gorlins at mit.edu> wrote:
>>>>
>>>>         
>>>>> Epsilon is the Tikhonov regularization parameter, no?  (Haven't read
>>>>> much about this classifier, but it looks similar to the RLS solution to
>>>>> me...).  As such it should be adjustable to enable punishing overfit
>>>>> solutions, so maybe it should actually be a formal parameter.
>>>>>
>>>>> If the default returns a LinAlg error, you could also stick the
>>>>> Cholesky decomposition in a finite loop where epsilon is doubled
>>>>> each time (though not too many iterations...)
>>>>>
>>>>> On a wild tangent, also check that your kernel matrix (here self._C) is
>>>>> double precision, since 1e-20 is far below the relative precision of
>>>>> single floats.
>>>>>
>>>>> Per B. Sederberg wrote:
>>>>>
>>>>>           
>>>>>> Hi Everybody (and Emanuele in particular):
>>>>>>
>>>>>> I've tried GPR on about 5 different datasets and I've never actually
>>>>>> gotten it to run through an entire cross validation process.  I always
>>>>>> get the following error:
>>>>>>
>>>>>> /home/per/lib/python/mvpa/clfs/gpr.pyc in _train(self, data)
>>>>>>     308             except SLAError:
>>>>>>     309                 epsilon = 1.0e-20 * N.eye(self._C.shape[0])
>>>>>> --> 310                 self._L = SLcholesky(self._C + epsilon, lower=True)
>>>>>>     311                 self._LL = (self._L, True)
>>>>>>     312                 pass
>>>>>>
>>>>>> /usr/lib/python2.5/site-packages/scipy/linalg/decomp.pyc in cholesky(a, lower, overwrite_a)
>>>>>>     552     potrf, = get_lapack_funcs(('potrf',),(a1,))
>>>>>>     553     c,info = potrf(a1,lower=lower,overwrite_a=overwrite_a,clean=1)
>>>>>> --> 554     if info>0: raise LinAlgError, "matrix not positive definite"
>>>>>>     555     if info<0: raise ValueError,\
>>>>>>     556        'illegal value in %-th argument of internal potrf'%(-info)
>>>>>>
>>>>>> LinAlgError: matrix not positive definite
>>>>>>
>>>>>> I think Yarik mentioned changing epsilon or something like that, but
>>>>>> it seems to me that having to change source code just to keep a
>>>>>> classifier from crashing is not ideal.
>>>>>>
>>>>>> Do you have any suggestions for how to fix this?  I've been tantalized
>>>>>> by some very good transfer errors in the folds that GPR was able to
>>>>>> complete before crashing.  Is there a permanent change we could make
>>>>>> to the GPR code to make it more stable across datasets?  Or could we
>>>>>> turn epsilon into a configurable variable when setting up the
>>>>>> classifier?  If so, what should I change epsilon to in order to see if
>>>>>> I can get it to run?
>>>>>>
>>>>>> Thanks,
>>>>>> Per
>>>>>>
>>>>>> _______________________________________________
>>>>>> Pkg-ExpPsy-PyMVPA mailing list
>>>>>> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
>>>>>> http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa
>>>>>>
>>>>>>
>>>>>>             
>>>>>
>>>>>
>>>>>           
>>>>
>>>>
>>>>         


