[pymvpa] SMLR needs fixing?

Wed Jun 4 15:19:59 UTC 2008

SMLR confuses me a bit: classification (generalization, actually; I don't think
this is a problem anywhere outside of SMLR, although it could be) seems to
depend heavily on the order of the labels. Sorry for being cryptic. In a dataset
with 5 labels, which I code as numbers from 0 to 4, here are 2 outcomes; in the
2nd one labels 0 and 4 are interchanged (as you can see from the total number of
samples for that category in P):

----------.
predictions\targets    0      1      2      3      4
            `------  -----  -----  -----  -----  -----  P'  N' FP FN  PPV  NPV  TPR  SPC  FDR  MCC
         0            105     0      0      0      1   106 303  1  0 0.99   1    1    1  0.01 0.95
         1             0     82      0      0     22   104 328 22  2 0.79 0.99 0.98 0.94 0.21 0.83
         2             0      0     84      0      4    88 324  4  0 0.95   1    1  0.99 0.05 0.93
         3             0      0      0     84      4    88 324  4  0 0.95   1    1  0.99 0.05 0.93
         4             0      2      0      0     53    55 386  2 31 0.96 0.92 0.63 0.99 0.04 0.74
    Per target:      -----  -----  -----  -----  -----
         P            105    84     84     84     84
         N            336    357    357    357    357
         TP           105    82     84     84     53
         TN           303    326    324    324    355
      SUMMARY:       -----  -----  -----  -----  -----
        ACC          0.93
        ACC%         92.52
     # of sets         4

----------.
predictions\targets    0      1      2      3      4
            `------  -----  -----  -----  -----  -----  P'  N' FP  FN  PPV  NPV  TPR  SPC  FDR  MCC
         0            84      1      0      0     28   113 256 29  0  0.74   1    1   0.9 0.26 0.73
         1             0     83      0      0     23   106 258 23  1  0.78   1  0.99 0.92 0.22 0.74
         2             0      0     84      0     21   105 256 21  0   0.8   1    1  0.92  0.2 0.76
         3             0      0      0     84     28   112 256 28  0  0.75   1    1   0.9 0.25 0.73
         4             0      0      0      0      5    5  435  0 100   1  0.77 0.05   1    0  0.19
    Per target:      -----  -----  -----  -----  -----
         P            84     84     84     84     105
         N            357    357    357    357    336
         TP           84     83     84     84      5
         TN           256    257    256    256    335
      SUMMARY:       -----  -----  -----  -----  -----
        ACC          0.77
        ACC%         77.1
     # of sets         4

If fit_all_weights == True, the results are consistent and much nicer:

----------.
predictions\targets    0      1      2      3      4
            `------  -----  -----  -----  -----  -----  P'  N' FP FN  PPV NPV  TPR SPC  FDR  MCC
         0            105     0      0      0      0   105 334  0  0   1   1    1   1    0    1
         1             0     83      0      0      0    83 357  0  1   1   1  0.99  1    0  0.99
         2             0      0     84      1      0    85 355  1  0 0.99  1    1   1  0.01 0.99
         3             0      0      0     83      0    83 357  0  1   1   1  0.99  1    0  0.99
         4             0      1      0      0     84    85 355  1  0 0.99  1    1   1  0.01 0.99
    Per target:      -----  -----  -----  -----  -----
         P            105    84     84     84     84
         N            336    357    357    357    357
         TP           105    83     84     83     84
         TN           334    356    355    356    355
      SUMMARY:       -----  -----  -----  -----  -----
        ACC            1
        ACC%         99.55
     # of sets         4

----------.
predictions\targets    0      1      2      3      4
            `------  -----  -----  -----  -----  -----  P'  N' FP FN  PPV NPV  TPR SPC  FDR  MCC
         0            84      1      0      0      0    85 355  1  0 0.99  1    1   1  0.01 0.99
         1             0     83      0      0      0    83 357  0  1   1   1  0.99  1    0  0.99
         2             0      0     84      1      0    85 355  1  0 0.99  1    1   1  0.01 0.99
         3             0      0      0     83      0    83 357  0  1   1   1  0.99  1    0  0.99
         4             0      0      0      0     105  105 334  0  0   1   1    1   1    0    1
    Per target:      -----  -----  -----  -----  -----
         P            84     84     84     84     105
         N            357    357    357    357    336
         TP           84     83     84     83     105
         TN           355    356    355    356    334
      SUMMARY:       -----  -----  -----  -----  -----
        ACC            1
        ACC%         99.55
     # of sets         4
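
For anyone who wants to repeat the comparison, the relabeling is just a swap of
two label values, and the ACC figures can be recomputed from the confusion
matrices directly. The snippet below is only a sketch: `swap_labels` and
`accuracy` are hypothetical helper names, not PyMVPA functions, and the matrix
is typed in by hand from the first table above (rows are predictions, columns
are targets).

```python
import numpy as np


def swap_labels(labels, a, b):
    """Return a copy of `labels` with the values `a` and `b` interchanged.

    Hypothetical helper for illustration; not part of PyMVPA.
    """
    labels = np.asarray(labels)
    out = labels.copy()
    out[labels == a] = b
    out[labels == b] = a
    return out


def accuracy(confusion):
    """Fraction of correctly classified samples: trace / total count."""
    confusion = np.asarray(confusion, dtype=float)
    return np.trace(confusion) / confusion.sum()


# First table above, copied by hand (rows: predictions, columns: targets).
conf_first = np.array([
    [105,  0,  0,  0,  1],
    [  0, 82,  0,  0, 22],
    [  0,  0, 84,  0,  4],
    [  0,  0,  0, 84,  4],
    [  0,  2,  0,  0, 53],
])

# Column sums give the per-target sample counts (the P row of the table).
per_target = conf_first.sum(axis=0)

# Accuracy in percent, rounded the same way as the ACC% row.
acc_percent = round(accuracy(conf_first) * 100, 2)
```

Calling `swap_labels(labels, 0, 4)` on the label vector before training is
enough to set up the second condition; nothing else about the data changes.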

I just wanted to check whether anyone has observed something like this before
with SMLR (before I dive into figuring out what is going on), or maybe Per has a
clue right away, because it seems to be just a small issue in computing the
probability for 'the other label'?
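
To make the 'other label' remark concrete: in the textbook multinomial logistic
model, fitting weights for only K-1 classes means the remaining class acts as a
reference with implicit zero weights, and its probability is derived from the
others. The sketch below assumes that standard formulation and that the
reference class is the last label; it is not taken from the SMLR code in
PyMVPA, and `softmax_probs` is a hypothetical name. The `fit_all_weights`
argument only mirrors the option mentioned above.

```python
import numpy as np


def softmax_probs(X, W, fit_all_weights):
    """Class probabilities for a multinomial logistic model (illustrative only).

    X : (n_samples, n_features) data
    W : (n_features, K) weights if fit_all_weights is True,
        (n_features, K-1) otherwise
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    scores = np.dot(X, W)
    if not fit_all_weights:
        # Assumption: the class without fitted weights is the last label and
        # has a fixed score of 0, i.e. an unnormalized probability of exp(0).
        zeros = np.zeros((X.shape[0], 1))
        scores = np.hstack([scores, zeros])
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=1, keepdims=True)
    expd = np.exp(scores)
    return expd / expd.sum(axis=1, keepdims=True)
```

In this parameterization one class is treated differently from the rest, so
once a sparsity penalty is applied to W the fit is no longer guaranteed to be
invariant to which label ends up as the reference. That would be one possible
place to look for an order dependence.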

-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student  Ph.D. @ CS Dept. NJIT
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW:     http://www.linkedin.com/in/yarik