[pymvpa] GPR woes
Yaroslav Halchenko
yoh at psychology.rutgers.edu
Sat Mar 28 19:43:33 UTC 2009
Fair enough... btw, are chunks 5 and maybe 1 the ones that gave you a
headache?
On Sat, 28 Mar 2009, Per B. Sederberg wrote:
> Here's the summary:
> In [41]: print dat.summary()
> Dataset / float32 2142 x 29155
> uniq: 6 chunks 2 labels
> stats: mean=-4.0314e-06 std=0.916641 var=0.84023 min=-1.23648 max=43.8952
> Counts of labels in each chunk:
> chunks\labels    0    1
>                 ---  ---
>             0   173  184
>             1   168  189
>             2   184  173
>             3   188  169
>             4   173  184
>             5   196  161
> Summary per label across chunks
>   label  mean   std  min  max  #chunks
>     0     180  9.81  168  196        6
>     1     177  9.81  161  189        6
> Summary per chunk across labels
>   chunk  mean   std  min  max  #labels
>     0     178   5.5  173  184        2
>     1     178  10.5  168  189        2
>     2     178   5.5  173  184        2
>     3     178   9.5  169  188        2
>     4     178   5.5  173  184        2
>     5     178  17.5  161  196        2
> It does look like there are some outliers in there based on the min
> and max above. Note that I have already zscored the data at this
> point.
> I'm getting some really good classification now with the
> regularization parameter in there. At some point I may try looking
> for outliers and explicitly removing them and then seeing if I need
> the regularization. The good news is that the regularization seems
> to have no effect on the chunks of the data that didn't need it
> before, so I'm not really hurting anything by having it there (at
> least for this dataset).
> Best,
> P
> 2009/3/28 Yaroslav Halchenko <debian at onerussian.com>:
> > Hypothesis:
> > I think one possible cause for such behavior is that your features are
> > not normed to lie in approximately the same range... Assume you have one
> > feature which is around 100 while the rest are "proper" ones in [-5, 5]
> > (z-scores?), or that you have some outliers/excursions from this 'normed'
> > case. Then, when doing summations/dot-products, that heavy feature would
> > demolish the effect of variance in the "proper" features.
> > What is the histogram of your features? Or just
> > print dataset.summary()
> > ?
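> > A minimal sketch of such a check (assuming the samples are reachable as a
> > plain NumPy array, e.g. dataset.samples; the cutoff of 5 is just illustrative):
> >
> >   import numpy as np
> >   X = dataset.samples                  # nsamples x nfeatures
> >   fmax = np.abs(X).max(axis=0)         # largest absolute value per feature
> >   print(np.histogram(fmax, bins=20))   # how the per-feature ranges are spread
> >   print((fmax > 5).sum())              # features poking out of the [-5, 5] band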
> > On Sat, 28 Mar 2009, Per B. Sederberg wrote:
> >> Howdy all:
> >> I've got it working for my data (had to add what I thought was a
> >> pretty hefty regularization of .001 before it stopped giving me the
> >> error, but it turns out I get the same classification performance with
> >> a value of 2.0).
> >> I made it so that regularization is always applied, but that it
> >> defaults to 0.0, so the default behavior of the classifier is
> >> unchanged. Here's the commit message:
> >> --------------------
> >> I'm not sure what to call it, so I put it all. Anyway, I just added
> >> the regularization parameter to be used all the time by GPR. It still
> >> defaults to 0.0, so there should be no change for people currently
> >> using GPR with no issues. However, this enables manually setting the
> >> regularization if you run into the "not positive definite" error. I
> >> figured that if you were going to use regularization, you would want
> >> to use it all the time and not have different values on different
> >> folds of your data.
> >> ----------------------
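> >> Roughly, the idea boils down to something like this (just a sketch, not the
> >> committed diff; the helper name is made up, while N and SLcholesky are the
> >> aliases gpr.py already uses):
> >>
> >>   import numpy as N
> >>   from scipy.linalg import cholesky as SLcholesky
> >>
> >>   def factorize_kernel(C, regularization=0.0):
> >>       # add a constant diagonal term to the kernel matrix before the
> >>       # Cholesky decomposition; regularization=0.0 keeps the old behaviour
> >>       return SLcholesky(C + regularization * N.eye(C.shape[0]), lower=True)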
> >> Thanks for all the input today. I now have working GPR classification :)
> >> Best,
> >> Per
> >> On Sat, Mar 28, 2009 at 12:48 PM, Per B. Sederberg <persed at princeton.edu> wrote:
> >> > On Sat, Mar 28, 2009 at 12:06 PM, Scott Gorlin <gorlins at mit.edu> wrote:
> >> >> you could try 10x the calculated eps :)
> >> >> Depending on how long it takes to run your classifier, why don't you try
> >> >> varying epsilon logarithmically and plotting the error as a function of
> >> >> epsilon. Most likely you'll see a local min around some optimum value of
> >> >> regularization, and then it will plateau out as epsilon increases - this
> >> >> should, at least, convince you that a high epsilon isn't that bad (if it is,
> >> >> then your data probably aren't as separable as they should be...)
> >> > A great idea! I'll expose the regularization parameter as a variable
> >> > to GPR and try it out.
> >> >> All that a high epsilon does in Tikhonov regularization is penalize steep
> >> >> hyperplanes, as it adds the norm of the function to the loss, so the results
> >> >> will be smoother (and theoretically generalize better). So, it should be
> >> >> set so that epsilon*function norm is ballpark equal to the raw classifier
> >> >> loss, so that it makes a difference (though I don't know the exact form in
> >> >> the GPR implementation; I'm just trying to say that there is an effective
> >> >> range which isn't necessarily approaching 0). As long as Xfold validation
> >> >> still works with high epsilon, there's nothing to worry about - in fact,
> >> >> that's what it's designed to do.
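> >> >> Something like this would do for the sweep (a sketch; run_cv is a stand-in
> >> >> for whatever returns your transfer error):
> >> >>
> >> >>   import numpy as np
> >> >>   import pylab
> >> >>
> >> >>   epsilons = np.logspace(-10, 1, 12)             # regularization from 1e-10 to 10
> >> >>   errors = [run_cv(epsilon=e) for e in epsilons] # stand-in cross-validation call
> >> >>   pylab.semilogx(epsilons, errors, 'o-')
> >> >>   pylab.xlabel('epsilon')
> >> >>   pylab.ylabel('transfer error')
> >> >>   pylab.show()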
> >> > So, this raises the question of whether we should have some level of
> >> > regularization all the time. The main difficulty would be figuring
> >> > out what it should be (ballpark value) for different datasets (i.e.,
> >> > whether there is some principled first pass for setting it based on
> >> > the features of your data).
> >> > Thanks for all the info,
> >> > Per
> >> >> if you still get a linalg error with very high epsilon, then there's
> >> >> probably a different problem ;)
> >> >> Per B. Sederberg wrote:
> >> >>> Hi S&E:
> >> >>> This is almost exactly what I did, but I still get the same error. In
> >> >>> numpy you get the precision with:
> >> >>> np.finfo(np.float64).eps
> >> >>> I tried using that value (actually checking the eps for the C matrix's
> >> >>> dtype), and also tried using the float32 eps, which is much greater than
> >> >>> 2 times the float64 eps. All to no avail, so I'm worried my data will
> >> >>> need a much larger eps. What are the bad consequences I could run into
> >> >>> by adding in a major regularization?
> >> >>> Thanks,
> >> >>> P
> >> >>> PS: In case it's informative, I have over 1000 samples, so the kernel
> >> >>> matrix is quite large...
> >> >>> On 3/28/09, Scott Gorlin <gorlins at mit.edu> wrote:
> >> >>>> I haven't checked the python syntax, but in Matlab, the eps function can
> >> >>>> take a number and return the smallest recordable numerical difference at
> >> >>>> that level. If there is an equivalent in numpy, then (say) 2x this would
> >> >>>> be a good alternative default epsilon (calculated at train time).
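> >> >>>> The closest numpy equivalent would be np.spacing(x) (if the installed
> >> >>>> numpy has it); otherwise scaling the dtype eps by the magnitude of the
> >> >>>> matrix gets roughly the same effect. A sketch (the helper name is made up):
> >> >>>>
> >> >>>>   import numpy as np
> >> >>>>
> >> >>>>   def default_epsilon(C, factor=2.0):
> >> >>>>       # roughly Matlab's factor*eps(max(abs(diag(C)))), computed at train time
> >> >>>>       return factor * np.finfo(C.dtype).eps * np.abs(np.diag(C)).max()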
> >> >>>> Emanuele Olivetti wrote:
> >> >>>>> Dear Per and Scott,
> >> >>>>> You are both right, and I feel quite guilty since I did not spend
> >> >>>>> time on GPR as I promised a long time ago (basically lack of time). The
> >> >>>>> epsilon issue is exactly what Scott says: I discussed the same issue
> >> >>>>> with Yarik some time ago but then did not work on a principled
> >> >>>>> solution.
> >> >>>>> I'm fine if you patch the current code to increase epsilon up to a
> >> >>>>> certain extent. Any ideas about alternative/principled solutions?
> >> >>>>> Moreover, I'm (slowly) chasing another numerical instability due, I
> >> >>>>> guess, to underflow in a summation, which could affect the marginal
> >> >>>>> likelihood computation (which means it is highly relevant for GPR). This
> >> >>>>> is more likely to happen when the dataset is large, so be prepared for
> >> >>>>> that ;-)
> >> >>>>> As far as I remember, I forced the kernel matrix to be double precision
> >> >>>>> for exactly the reasons you mentioned, but checking this fact more
> >> >>>>> accurately is highly desirable.
> >> >>>>> Unfortunately I'm not going to work on GPR very soon, even though
> >> >>>>> I'd love to. So any patch is warmly welcome, as well as bug reports ;-)
> >> >>>>> Best,
> >> >>>>> Emanuele
> >> >>>>> Per B. Sederberg wrote:
> >> >>>>>> Hi Scott:
> >> >>>>>> You are correct on all counts. Epsilon is currently hard-coded, and
> >> >>>>>> either of your proposed solutions would help to make GPR work more
> >> >>>>>> frequently/consistently. I'll try out some combination of both and
> >> >>>>>> report back.
> >> >>>>>> Best,
> >> >>>>>> Per
> >> >>>>>> On Sat, Mar 28, 2009 at 9:46 AM, Scott Gorlin <gorlins at mit.edu> wrote:
> >> >>>>>>> Epsilon is the Tikhonov regularization parameter, no? (I haven't read
> >> >>>>>>> much about this classifier, but it looks similar to the RLS solution to
> >> >>>>>>> me...) As such, it should be adjustable to enable penalizing overfit
> >> >>>>>>> solutions, so maybe it should actually be a formal parameter.
> >> >>>>>>> If the default returns a LinAlg error, you could also stick the
> >> >>>>>>> Cholesky decomposition in a finite loop where epsilon is doubled each
> >> >>>>>>> time (though not for too many iterations...).
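> >> >>>>>>> For instance (a sketch; scipy's cholesky raises numpy's LinAlgError
> >> >>>>>>> when the matrix isn't positive definite):
> >> >>>>>>>
> >> >>>>>>>   import numpy as N
> >> >>>>>>>   from numpy.linalg import LinAlgError
> >> >>>>>>>   from scipy.linalg import cholesky
> >> >>>>>>>
> >> >>>>>>>   def jittered_cholesky(C, epsilon=1.0e-20, max_tries=20):
> >> >>>>>>>       I = N.eye(C.shape[0])
> >> >>>>>>>       for _ in range(max_tries):
> >> >>>>>>>           try:
> >> >>>>>>>               return cholesky(C + epsilon * I, lower=True)
> >> >>>>>>>           except LinAlgError:
> >> >>>>>>>               epsilon *= 2.0        # double the jitter and retry
> >> >>>>>>>       raise LinAlgError("still not positive definite at epsilon=%g" % epsilon)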
> >> >>>>>>> On a wild tangent, also check that your kernel matrix (here self._C) is
> >> >>>>>>> double precision, since 1e-20 is close to the precision limit for single
> >> >>>>>>> floats.
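> >> >>>>>>> The check itself is cheap, e.g.:
> >> >>>>>>>
> >> >>>>>>>   import numpy as N
> >> >>>>>>>   if C.dtype != N.float64:    # C standing in for the kernel matrix (self._C)
> >> >>>>>>>       C = C.astype(N.float64)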
> >> >>>>>>> Per B. Sederberg wrote:
> >> >>>>>>>> Hi Everybody (and Emanuele in particular):
> >> >>>>>>>> I've tried GPR on about 5 different datasets and I've never actually
> >> >>>>>>>> gotten it to run through an entire cross validation process. I always
> >> >>>>>>>> get the following error:
> >> >>>>>>>> /home/per/lib/python/mvpa/clfs/gpr.pyc in _train(self, data)
> >> >>>>>>>>     308         except SLAError:
> >> >>>>>>>>     309             epsilon = 1.0e-20 * N.eye(self._C.shape[0])
> >> >>>>>>>> --> 310             self._L = SLcholesky(self._C + epsilon, lower=True)
> >> >>>>>>>>     311             self._LL = (self._L, True)
> >> >>>>>>>>     312             pass
> >> >>>>>>>> /usr/lib/python2.5/site-packages/scipy/linalg/decomp.pyc in cholesky(a, lower, overwrite_a)
> >> >>>>>>>>     552     potrf, = get_lapack_funcs(('potrf',),(a1,))
> >> >>>>>>>>     553     c,info = potrf(a1,lower=lower,overwrite_a=overwrite_a,clean=1)
> >> >>>>>>>> --> 554     if info>0: raise LinAlgError, "matrix not positive definite"
> >> >>>>>>>>     555     if info<0: raise ValueError,\
> >> >>>>>>>>     556         'illegal value in %-th argument of internal potrf'%(-info)
> >> >>>>>>>> LinAlgError: matrix not positive definite
> >> >>>>>>>> I think Yarik mentioned changing epsilon or something like that, but
> >> >>>>>>>> it seems to me that having to change source code to keep a classifier
> >> >>>>>>>> from crashing is not ideal.
> >> >>>>>>>> Do you have any suggestions for how to fix this? I've been tantalized
> >> >>>>>>>> by some very good transfer errors in the folds that GPR was able to
> >> >>>>>>>> complete before crashing. Is there a permanent change we could make
> >> >>>>>>>> to the GPR code to make it more stable across datasets? Or could we
> >> >>>>>>>> turn epsilon into a configurable parameter when setting up the
> >> >>>>>>>> classifier? If so, what should I change epsilon to in order to see if
> >> >>>>>>>> I can get it to run?
> >> >>>>>>>> Thanks,
> >> >>>>>>>> Per
--
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student Ph.D. @ CS Dept. NJIT
Office: (973) 353-1412 | FWD: 82823 | Fax: (973) 353-1171
101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW: http://www.linkedin.com/in/yarik