[pymvpa] TreeClassifier and decision trees

Yaroslav Halchenko debian at onerussian.com
Tue Feb 15 03:14:38 UTC 2011


On Tue, 15 Feb 2011, Thorsten Kranz wrote:
> If I have 4 labels in my data, the tree I want to use might look like:

>             /\
>            /  \
>           /    \
>          3    / \
>              1  /\
>                2 4

So what happens, as you correctly pointed out, is that there is a heavy
class imbalance at every node except the final (2 vs 4) split; that is
why the SVM, whenever the decision is not obvious, falls back to
majority-label-takes-all behavior.

Therefore it first chooses (1,2,4), then (2,4), and only then decides
correctly between the two labels, which reach the classifier balanced.
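To see the imbalance concretely, here is a plain-Python sketch (no
PyMVPA needed) that tallies, for each binary node of the tree above,
how many training samples land on each side, assuming a hypothetical
equal-sized training set with n samples per label:

```python
def node_balance(groups, n_per_label=10):
    """Return (left, right) sample counts for one binary node.

    groups is a pair of label lists; counts assume n_per_label
    samples per label (a hypothetical balanced training set).
    """
    left, right = groups
    return len(left) * n_per_label, len(right) * n_per_label

# The three binary decisions in the tree above:
nodes = {
    "root":  ([3], [1, 2, 4]),  # 3 vs everything else: 1-to-3 imbalance
    "inner": ([1], [2, 4]),     # 1 vs {2, 4}: 1-to-2 imbalance
    "leaf":  ([2], [4]),        # the only balanced node
}

for name, groups in nodes.items():
    print(name, node_balance(groups))
```

Only the last node sees balanced training data, which is why only the
2-vs-4 decision comes out right.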

The logical fix would be per-label weighting to compensate (check out
weight_label in SVM), or some other classifier which is not prone to
such "race" conditions, e.g. GNB... but your example brought up an
"interesting" use case which exposes problems with TreeClassifier's
assumptions (e.g. there should be no dangling single-class choice)
and GNB's inability to train on a single label... more tomorrow;
meanwhile you can try something like

# assuming the usual PyMVPA wildcard import, e.g.
# from mvpa.suite import *
clf = GNB
tclf = TreeClassifier(clf(),
                      {"g3": ([3], SVM()),
                       "g6": ([1, 2, 4], TreeClassifier(clf(),
                             {"g1": ([1], SVM()),
                              "g5": ([2, 4], clf())}))})
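For intuition, the cascade of decisions such a nested tree makes at
prediction time can be mimicked in plain Python. The per-node
"deciders" below are hypothetical stand-ins (simple feature
thresholds), not GNB/SVM; only the flow of a sample through the
grouped nodes mirrors the structure above:

```python
def predict(sample, node):
    """Walk the tree: a node is either a final label (int) or a
    (decider, {group_name: (labels, subnode)}) pair."""
    if isinstance(node, int):
        return node
    decider, groups = node
    labels, subnode = groups[decider(sample)]
    if len(labels) == 1:           # dangling single-class choice
        return labels[0]
    return predict(sample, subnode)

# Hypothetical deciders: each just thresholds one feature.
root = (lambda s: "g3" if s[0] < 0 else "g6",
        {"g3": ([3], 3),
         "g6": ([1, 2, 4],
                (lambda s: "g1" if s[1] < 0 else "g5",
                 {"g1": ([1], 1),
                  "g5": ([2, 4],
                         (lambda s: "g2" if s[2] < 0 else "g4",
                          {"g2": ([2], 2),
                           "g4": ([4], 4)}))}))})

print(predict((1, 1, -1), root))   # -> 2
```

A sample reaching label 2 passes three decisions in a row; an error at
any upstream (imbalanced) node can never be corrected further down.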

-- 
=------------------------------------------------------------------=
Keep in touch                                     www.onerussian.com
Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic



More information about the Pkg-ExpPsy-PyMVPA mailing list