[pymvpa] SMLR needs fixing?
Yaroslav Halchenko
debian at onerussian.com
Wed Jun 4 15:37:00 UTC 2008
> Fascinating!
heh... indeed ;)
> Is it the case that you have different numbers of samples for each label,
> and specifically for the two that you are switching? If so, then the
> models it is using to fit the data are actually different for the two
> conditions, and I would expect the differences you see.
hm... need to check the SMLR paper again -- I don't remember it having such
a dependence on the number of instances or on the order of the labels/classes.
My guess is that some normalization is lacking somewhere?
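One place the asymmetry could sneak in (just a sketch of my guess, not PyMVPA code): if only m-1 weight vectors are fit, the remaining label serves as a fixed reference class. The two parametrizations give identical probabilities, but the L1 norm that a Laplacian sparsity prior penalizes is not invariant to which class gets pinned to zero, so a sparsity-penalized fit can depend on the label order:

```python
import numpy as np

def softmax_probs(W, x):
    """Class probabilities for a weight matrix W (n_classes x n_features)."""
    z = W @ x
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy weights for a 3-class, 1-feature problem (made-up numbers).
W = np.array([[2.0], [0.5], [-1.0]])
x = np.array([1.0])

# Pinning class k as the reference (forcing its weights to zero) amounts to
# subtracting row k from every row -- the probabilities do not change:
W_ref0 = W - W[0]
W_ref1 = W - W[1]
assert np.allclose(softmax_probs(W, x), softmax_probs(W_ref0, x))
assert np.allclose(softmax_probs(W, x), softmax_probs(W_ref1, x))

# ...but the L1 norm that a Laplacian sparsity prior penalizes differs
# depending on which class is the reference:
print(np.abs(W_ref0).sum())   # 4.5
print(np.abs(W_ref1).sum())   # 3.0
```

(All the names and numbers above are made up for illustration; the point is only that the penalized norm changes with the choice of reference class.)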
> How about we set the default to be to always train the full model?
I don't mind ;) it would probably remove some confusion among users who
would otherwise run into the same misery. Michael?
> Best,
> Per
> On Wed, Jun 4, 2008 at 11:19 AM, Yaroslav Halchenko
> <debian at onerussian.com> wrote:
> > SMLR confuses me a bit... it seems that classification (generalization,
> > actually, but I don't think that is a problem anywhere outside of SMLR,
> > although it could be) depends heavily on the order of the labels... sorry
> > for being cryptic -- in a dataset with 5 labels, which I code as numbers
> > from 0 to 4, here are 2 outcomes, where in the 2nd one labels 0 and 4 are
> > interchanged (as you can see from the total number of samples for that
> > category in P):
> > ----------.
> > predictions\targets     0     1     2     3     4
> >             `------  ----- ----- ----- ----- -----    P'   N'  FP  FN   PPV   NPV   TPR   SPC   FDR   MCC
> >                   0    105     0     0     0     1   106  303   1   0  0.99     1     1     1  0.01  0.95
> >                   1      0    82     0     0    22   104  328  22   2  0.79  0.99  0.98  0.94  0.21  0.83
> >                   2      0     0    84     0     4    88  324   4   0  0.95     1     1  0.99  0.05  0.93
> >                   3      0     0     0    84     4    88  324   4   0  0.95     1     1  0.99  0.05  0.93
> >                   4      0     2     0     0    53    55  386   2  31  0.96  0.92  0.63  0.99  0.04  0.74
> >         Per target:  ----- ----- ----- ----- -----
> >                   P    105    84    84    84    84
> >                   N    336   357   357   357   357
> >                  TP    105    82    84    84    53
> >                  TN    303   326   324   324   355
> >            SUMMARY:  ----- ----- ----- ----- -----
> >                 ACC   0.93
> >                ACC%  92.52
> >          # of sets       4
> > ----------.
> > predictions\targets     0     1     2     3     4
> >             `------  ----- ----- ----- ----- -----    P'   N'  FP   FN   PPV   NPV   TPR   SPC   FDR   MCC
> >                   0     84     1     0     0    28   113  256  29    0  0.74     1     1   0.9  0.26  0.73
> >                   1      0    83     0     0    23   106  258  23    1  0.78     1  0.99  0.92  0.22  0.74
> >                   2      0     0    84     0    21   105  256  21    0   0.8     1     1  0.92   0.2  0.76
> >                   3      0     0     0    84    28   112  256  28    0  0.75     1     1   0.9  0.25  0.73
> >                   4      0     0     0     0     5     5  435   0  100     1  0.77  0.05     1     0  0.19
> >         Per target:  ----- ----- ----- ----- -----
> >                   P     84    84    84    84   105
> >                   N    357   357   357   357   336
> >                  TP     84    83    84    84     5
> >                  TN    256   257   256   256   335
> >            SUMMARY:  ----- ----- ----- ----- -----
> >                 ACC   0.77
> >                ACC%   77.1
> >          # of sets       4
> > and if fit_all_weights == True, then results are consistent and much nicer:
> > ----------.
> > predictions\targets     0     1     2     3     4
> >             `------  ----- ----- ----- ----- -----    P'   N'  FP  FN   PPV   NPV   TPR   SPC   FDR   MCC
> >                   0    105     0     0     0     0   105  334   0   0     1     1     1     1     0     1
> >                   1      0    83     0     0     0    83  357   0   1     1     1  0.99     1     0  0.99
> >                   2      0     0    84     1     0    85  355   1   0  0.99     1     1     1  0.01  0.99
> >                   3      0     0     0    83     0    83  357   0   1     1     1  0.99     1     0  0.99
> >                   4      0     1     0     0    84    85  355   1   0  0.99     1     1     1  0.01  0.99
> >         Per target:  ----- ----- ----- ----- -----
> >                   P    105    84    84    84    84
> >                   N    336   357   357   357   357
> >                  TP    105    83    84    83    84
> >                  TN    334   356   355   356   355
> >            SUMMARY:  ----- ----- ----- ----- -----
> >                 ACC      1
> >                ACC%  99.55
> >          # of sets       4
> > ----------.
> > predictions\targets     0     1     2     3     4
> >             `------  ----- ----- ----- ----- -----    P'   N'  FP  FN   PPV   NPV   TPR   SPC   FDR   MCC
> >                   0     84     1     0     0     0    85  355   1   0  0.99     1     1     1  0.01  0.99
> >                   1      0    83     0     0     0    83  357   0   1     1     1  0.99     1     0  0.99
> >                   2      0     0    84     1     0    85  355   1   0  0.99     1     1     1  0.01  0.99
> >                   3      0     0     0    83     0    83  357   0   1     1     1  0.99     1     0  0.99
> >                   4      0     0     0     0   105   105  334   0   0     1     1     1     1     0     1
> >         Per target:  ----- ----- ----- ----- -----
> >                   P     84    84    84    84   105
> >                   N    357   357   357   357   336
> >                  TP     84    83    84    83   105
> >                  TN    355   356   355   356   334
> >            SUMMARY:  ----- ----- ----- ----- -----
> >                 ACC      1
> >                ACC%  99.55
> >          # of sets       4
> > I just wanted to check whether anyone has observed something like this
> > with SMLR before (before I dive into figuring out what is going on), or
> > maybe Per has a clue right away, because it seems to be just a little
> > issue in computing the probability for 'the other label'..?
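For reference, here is my own sketch (not the actual PyMVPA code) of the reference-class parametrization I suspect is in play when fit_all_weights == False: only m-1 weight vectors are estimated, and 'the other label' gets whatever probability mass is left over. If that leftover term is computed wrongly, errors would pile up on exactly one label, which is what the tables show:

```python
import numpy as np

def smlr_probs(W, x):
    """Probabilities under a reference-class parametrization: W holds only
    m-1 weight vectors, and the last label's probability is the leftover
    mass 1 / (1 + sum_k exp(w_k . x)).  Hypothetical sketch only."""
    z = np.exp(W @ x)                    # shape (m-1,)
    denom = 1.0 + z.sum()
    return np.concatenate([z, [1.0]]) / denom

W = np.array([[1.0, -0.5],
              [0.2,  0.3]])             # 2 explicit classes -> 3 labels total
x = np.array([0.5, 1.0])
p = smlr_probs(W, x)
assert np.isclose(p.sum(), 1.0)         # leftover mass makes them sum to 1
```

(`smlr_probs`, `W`, and `x` are invented for illustration; the real implementation would need checking against this.)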
--
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student Ph.D. @ CS Dept. NJIT
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW: http://www.linkedin.com/in/yarik