[pymvpa] TreeClassifier and decision trees
Yaroslav Halchenko
debian at onerussian.com
Tue Feb 15 03:14:38 UTC 2011
On Tue, 15 Feb 2011, Thorsten Kranz wrote:
> If I have 4 labels in my data, the tree I want to use might look like:
>       /\
>      /  \
>     /    \
>    3    / \
>        1   /\
>           2  4
So what happens is, as you correctly pointed out, that there is heavy
class imbalance at every node except the final one (2 vs 4); that is why
the classifier, whenever the decision is not obvious, goes for
majority-label-takes-all with SVM. So first it chooses (1,2,4), then
(2,4), and only then decides correctly between the two classes which
reach the classifier balanced.
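To see the effect in isolation, here is a small pure-Python illustration (not PyMVPA code, just a stand-in for any majority-biased rule): a degenerate classifier that always answers with the most frequent training label will always pick the bigger group at an imbalanced node.

```python
from collections import Counter

def majority_label(train_labels):
    """Return the most frequent label seen during 'training'."""
    return Counter(train_labels).most_common(1)[0][0]

# Root node deciding {3} vs {1,2,4}: the second group contributes
# three times as many samples, so a majority-biased rule, when the
# decision is not obvious, leans toward it every time.
node_labels = ["g3"] + ["g6"] * 3
print(majority_label(node_labels))  # -> 'g6'
```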
The logical fix would be per-label weighting to compensate (check out
weight_label in SVM), or some other classifier which is not prone to such
"race" conditions, e.g. GNB... but your example brought up an
"interesting" use case which exposes problems with TreeClassifier's
assumptions (e.g. there should be no dangling single-class choice)
and GNB's inability to train on a single label... more tomorrow;
meanwhile you can try something like
clf = GNB
tclf = TreeClassifier(clf(),
           {"g3": ([3], SVM()),
            "g6": ([1, 2, 4], TreeClassifier(clf(),
                      {"g1": ([1], SVM()),
                       "g5": ([2, 4], clf())}))})
--
=------------------------------------------------------------------=
Keep in touch www.onerussian.com
Yaroslav Halchenko www.ohloh.net/accounts/yarikoptic
More information about the Pkg-ExpPsy-PyMVPA mailing list