[pymvpa] SMLR needs fixing?

Yaroslav Halchenko debian at onerussian.com
Wed Jun 4 15:37:00 UTC 2008


> Fascinating!
heh... indeed ;)

> Is it the case that you have different numbers of samples for each
> label, and specifically for the two that you are switching?  If so,
> then the models it is using to fit the data are actually different for
> the two conditions, and I would expect the differences you see.
hm... I need to check the SMLR paper again -- I don't remember seeing such
a dependence on the number of instances or on the order of the
labels/classes. My guess is that some normalization is lacking somewhere?
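If I remember the parameterization from the SMLR paper correctly, only
K-1 weight vectors are estimated and the last class serves as an
implicit reference with its weights pinned at zero -- which would make
whichever label ends up last "special", and would match the pattern
below, where it is always label 4's column that falls apart. A minimal
sketch of that K-1 scheme (my reading of the model, not PyMVPA's actual
code):

    import numpy as np

    def smlr_style_probs(W, x):
        # W: (K-1, n_features) weights for the first K-1 classes; the
        # K-th class is an implicit reference whose weights stay at 0.
        scores = np.concatenate([W.dot(x), [0.0]])  # reference scores 0
        scores -= scores.max()                      # numerical stability
        e = np.exp(scores)
        return e / e.sum()                          # P(y=k|x), k = 0..K-1

Since the sparsity penalty then applies only to the K-1 fitted vectors,
the reference class is treated asymmetrically, and permuting the label
values changes the model that gets fit.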


> How about we set the default to be to always train the full model?
I don't mind ;) It would probably remove some confusion for users who
would otherwise run into the same misery. Michael?
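In the meantime, the workaround is simply to request the full model at
construction time; a minimal sketch (the import path is my guess from
the current module layout, adjust to your version):

    # Workaround sketch: have SMLR fit weights for all K classes rather
    # than only K-1.
    from mvpa.clfs.smlr import SMLR

    clf = SMLR(fit_all_weights=True)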

> Best,
> Per

> On Wed, Jun 4, 2008 at 11:19 AM, Yaroslav Halchenko
> <debian at onerussian.com> wrote:
> > SMLR confuses me a bit... it seems that classification (generalization,
> > actually, but I don't think that distinction is a problem anywhere
> > outside of SMLR, although it could be) is heavily dependent on the
> > order of the labels. Sorry for being cryptic -- in a dataset with 5
> > labels, which I code as numbers from 0 to 4, here are 2 outcomes,
> > where in the 2nd one labels 0 and 4 are interchanged (as you can see
> > from the total number of samples for each category in the P row):

> > ----------.
> > predictions\targets    0      1      2      3      4
> >            `------  -----  -----  -----  -----  -----  P'  N' FP FN  PPV  NPV  TPR  SPC  FDR  MCC
> >         0            105     0      0      0      1   106 303  1  0 0.99   1    1    1  0.01 0.95
> >         1             0     82      0      0     22   104 328 22  2 0.79 0.99 0.98 0.94 0.21 0.83
> >         2             0      0     84      0      4    88 324  4  0 0.95   1    1  0.99 0.05 0.93
> >         3             0      0      0     84      4    88 324  4  0 0.95   1    1  0.99 0.05 0.93
> >         4             0      2      0      0     53    55 386  2 31 0.96 0.92 0.63 0.99 0.04 0.74
> >    Per target:      -----  -----  -----  -----  -----
> >         P            105    84     84     84     84
> >         N            336    357    357    357    357
> >         TP           105    82     84     84     53
> >         TN           303    326    324    324    355
> >      SUMMARY:       -----  -----  -----  -----  -----
> >        ACC          0.93
> >        ACC%         92.52
> >     # of sets         4

> > ----------.
> > predictions\targets    0      1      2      3      4
> >            `------  -----  -----  -----  -----  -----  P'  N' FP  FN  PPV  NPV  TPR  SPC  FDR  MCC
> >         0            84      1      0      0     28   113 256 29  0  0.74   1    1   0.9 0.26 0.73
> >         1             0     83      0      0     23   106 258 23  1  0.78   1  0.99 0.92 0.22 0.74
> >         2             0      0     84      0     21   105 256 21  0   0.8   1    1  0.92  0.2 0.76
> >         3             0      0      0     84     28   112 256 28  0  0.75   1    1   0.9 0.25 0.73
> >         4             0      0      0      0      5    5  435  0 100   1  0.77 0.05   1    0  0.19
> >    Per target:      -----  -----  -----  -----  -----
> >         P            84     84     84     84     105
> >         N            357    357    357    357    336
> >         TP           84     83     84     84      5
> >         TN           256    257    256    256    335
> >      SUMMARY:       -----  -----  -----  -----  -----
> >        ACC          0.77
> >        ACC%         77.1
> >     # of sets         4

> > and with fit_all_weights == True, the results are consistent and much nicer:

> > ----------.
> > predictions\targets    0      1      2      3      4
> >            `------  -----  -----  -----  -----  -----  P'  N' FP FN  PPV NPV  TPR SPC  FDR  MCC
> >         0            105     0      0      0      0   105 334  0  0   1   1    1   1    0    1
> >         1             0     83      0      0      0    83 357  0  1   1   1  0.99  1    0  0.99
> >         2             0      0     84      1      0    85 355  1  0 0.99  1    1   1  0.01 0.99
> >         3             0      0      0     83      0    83 357  0  1   1   1  0.99  1    0  0.99
> >         4             0      1      0      0     84    85 355  1  0 0.99  1    1   1  0.01 0.99
> >    Per target:      -----  -----  -----  -----  -----
> >         P            105    84     84     84     84
> >         N            336    357    357    357    357
> >         TP           105    83     84     83     84
> >         TN           334    356    355    356    355
> >      SUMMARY:       -----  -----  -----  -----  -----
> >        ACC            1
> >        ACC%         99.55
> >     # of sets         4

> > ----------.
> > predictions\targets    0      1      2      3      4
> >            `------  -----  -----  -----  -----  -----  P'  N' FP FN  PPV NPV  TPR SPC  FDR  MCC
> >         0            84      1      0      0      0    85 355  1  0 0.99  1    1   1  0.01 0.99
> >         1             0     83      0      0      0    83 357  0  1   1   1  0.99  1    0  0.99
> >         2             0      0     84      1      0    85 355  1  0 0.99  1    1   1  0.01 0.99
> >         3             0      0      0     83      0    83 357  0  1   1   1  0.99  1    0  0.99
> >         4             0      0      0      0     105  105 334  0  0   1   1    1   1    0    1
> >    Per target:      -----  -----  -----  -----  -----
> >         P            84     84     84     84     105
> >         N            357    357    357    357    336
> >         TP           84     83     84     83     105
> >         TN           355    356    355    356    334
> >      SUMMARY:       -----  -----  -----  -----  -----
> >        ACC            1
> >        ACC%         99.55
> >     # of sets         4



> > I just wanted to check if anyone has observed something like that with
> > SMLR before (before I dive into figuring out what is going on), or
> > maybe Per has a clue right away, since it seems to be just a little
> > issue in computing the probability for 'the other label'..?
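For the record, the swapped run above amounts to nothing more than
permuting two label values before training; a minimal sketch of such a
relabeling (the labels array here is a hypothetical stand-in for the
real targets):

    import numpy as np

    labels = np.array([0, 1, 2, 3, 4] * 3)  # hypothetical targets, coded 0..4
    swapped = labels.copy()
    swapped[labels == 0] = 4                 # interchange labels 0 and 4
    swapped[labels == 4] = 0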




-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student  Ph.D. @ CS Dept. NJIT
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW:     http://www.linkedin.com/in/yarik        


