[pymvpa] SMLR needs fixing?

Per B. Sederberg persed at princeton.edu
Wed Jun 4 15:27:52 UTC 2008


Fascinating!

Is it the case that you have a different number of samples for each
label, and specifically for the two that you are switching?  If so,
then the models it uses to fit the data are actually different for the
two conditions, and I would expect the differences you see.

Perhaps the safe thing to do whenever your N differs across labels is
to just train the full model.  Although it is slightly slower, the
weights you get from training the full model are actually meaningful.
They are not meaningful for multiclass classification if you train the
N-1 model.
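To illustrate the asymmetry (this is a hypothetical sketch, not PyMVPA's actual SMLR implementation): the full model keeps one weight vector per class, while the N-1 model fixes one "reference" class's weights at zero. Without a sparsity prior the two parameterizations are equivalent up to a shift; with one, they are not, so which label ends up as the reference matters.

```python
# Hypothetical sketch: multinomial logistic probabilities under the
# full parameterization (one weight vector per class) versus the N-1
# "reference class" parameterization (last class's weights fixed at 0).
import numpy as np

def softmax(scores):
    # numerically stable softmax over the last axis
    z = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=3)            # one sample, 3 features
W_full = rng.normal(size=(3, 4))  # full model: weights for all 4 classes

# Equivalent N-1 model: subtract the last class's column from every
# column, so the reference class's score is identically zero.
W_ref = W_full - W_full[:, [-1]]

p_full = softmax(x @ W_full)
p_ref = softmax(x @ W_ref)

# Softmax is invariant to a per-sample constant shift of the scores,
# so unregularized probabilities agree.  A sparsity penalty, however,
# is applied to W_full and W_ref differently, so the fitted solution
# depends on which class was chosen as the reference.
print(np.allclose(p_full, p_ref))  # True
```

In PyMVPA terms this is presumably what the `fit_all_weights=True` option in SMLR switches on, at the cost of some extra training time.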

How about we make training the full model the default?

Best,
Per

On Wed, Jun 4, 2008 at 11:19 AM, Yaroslav Halchenko
<debian at onerussian.com> wrote:
> SMLR confuses me a bit... it seems that classification (generalization,
> actually, but I don't think that is a problem anywhere outside of SMLR,
> although it could be) depends heavily on the order of labels.  Sorry for
> being cryptic: in a dataset with 5 labels, which I code as numbers 0 to
> 4, here are two outcomes; in the second one, labels 0 and 4 are
> interchanged (as you can see from the total number of samples for that
> category in P).
>
> ----------.
> predictions\targets    0      1      2      3      4
>            `------  -----  -----  -----  -----  -----  P'  N' FP FN  PPV  NPV  TPR  SPC  FDR  MCC
>         0            105     0      0      0      1   106 303  1  0 0.99   1    1    1  0.01 0.95
>         1             0     82      0      0     22   104 328 22  2 0.79 0.99 0.98 0.94 0.21 0.83
>         2             0      0     84      0      4    88 324  4  0 0.95   1    1  0.99 0.05 0.93
>         3             0      0      0     84      4    88 324  4  0 0.95   1    1  0.99 0.05 0.93
>         4             0      2      0      0     53    55 386  2 31 0.96 0.92 0.63 0.99 0.04 0.74
>    Per target:      -----  -----  -----  -----  -----
>         P            105    84     84     84     84
>         N            336    357    357    357    357
>         TP           105    82     84     84     53
>         TN           303    326    324    324    355
>      SUMMARY:       -----  -----  -----  -----  -----
>        ACC          0.93
>        ACC%         92.52
>     # of sets         4
>
> ----------.
> predictions\targets    0      1      2      3      4
>            `------  -----  -----  -----  -----  -----  P'  N' FP  FN  PPV  NPV  TPR  SPC  FDR  MCC
>         0            84      1      0      0     28   113 256 29  0  0.74   1    1   0.9 0.26 0.73
>         1             0     83      0      0     23   106 258 23  1  0.78   1  0.99 0.92 0.22 0.74
>         2             0      0     84      0     21   105 256 21  0   0.8   1    1  0.92  0.2 0.76
>         3             0      0      0     84     28   112 256 28  0  0.75   1    1   0.9 0.25 0.73
>         4             0      0      0      0      5    5  435  0 100   1  0.77 0.05   1    0  0.19
>    Per target:      -----  -----  -----  -----  -----
>         P            84     84     84     84     105
>         N            357    357    357    357    336
>         TP           84     83     84     84      5
>         TN           256    257    256    256    335
>      SUMMARY:       -----  -----  -----  -----  -----
>        ACC          0.77
>        ACC%         77.1
>     # of sets         4
>
> and if fit_all_weights == True, then the results are consistent and much nicer:
>
> ----------.
> predictions\targets    0      1      2      3      4
>            `------  -----  -----  -----  -----  -----  P'  N' FP FN  PPV NPV  TPR SPC  FDR  MCC
>         0            105     0      0      0      0   105 334  0  0   1   1    1   1    0    1
>         1             0     83      0      0      0    83 357  0  1   1   1  0.99  1    0  0.99
>         2             0      0     84      1      0    85 355  1  0 0.99  1    1   1  0.01 0.99
>         3             0      0      0     83      0    83 357  0  1   1   1  0.99  1    0  0.99
>         4             0      1      0      0     84    85 355  1  0 0.99  1    1   1  0.01 0.99
>    Per target:      -----  -----  -----  -----  -----
>         P            105    84     84     84     84
>         N            336    357    357    357    357
>         TP           105    83     84     83     84
>         TN           334    356    355    356    355
>      SUMMARY:       -----  -----  -----  -----  -----
>        ACC            1
>        ACC%         99.55
>     # of sets         4
>
> ----------.
> predictions\targets    0      1      2      3      4
>            `------  -----  -----  -----  -----  -----  P'  N' FP FN  PPV NPV  TPR SPC  FDR  MCC
>         0            84      1      0      0      0    85 355  1  0 0.99  1    1   1  0.01 0.99
>         1             0     83      0      0      0    83 357  0  1   1   1  0.99  1    0  0.99
>         2             0      0     84      1      0    85 355  1  0 0.99  1    1   1  0.01 0.99
>         3             0      0      0     83      0    83 357  0  1   1   1  0.99  1    0  0.99
>         4             0      0      0      0     105  105 334  0  0   1   1    1   1    0    1
>    Per target:      -----  -----  -----  -----  -----
>         P            84     84     84     84     105
>         N            357    357    357    357    336
>         TP           84     83     84     83     105
>         TN           355    356    355    356    334
>      SUMMARY:       -----  -----  -----  -----  -----
>        ACC            1
>        ACC%         99.55
>     # of sets         4
>
>
>
> I just wanted to check whether anyone has observed something like that
> before with SMLR (before I dive into figuring out what is going on), or
> maybe Per has a clue right away, because it seems to be just a little
> issue in computing the probability for 'the other label'..?
>
> --
> Yaroslav Halchenko
> Research Assistant, Psychology Department, Rutgers-Newark
> Student  Ph.D. @ CS Dept. NJIT
> Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
>        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
> WWW:     http://www.linkedin.com/in/yarik
>
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa
>


