[pymvpa] TreeClassifier and decision trees

Yaroslav Halchenko debian at onerussian.com
Tue Feb 15 22:51:25 UTC 2011


On Mon, 14 Feb 2011, Yaroslav Halchenko wrote:
> and GNB inability to train on a single label.... more tomorrow,
> meanwhile you can try something like

> clf = GNB
> tclf = TreeClassifier(clf(),
>                       {"g3": ([3], SVM()),
>                        "g6": ([1,2,4], TreeClassifier(clf(),
>                              {"g1": ([1], SVM()),
>                               "g5": ([2,4],clf())}))})

Upon inspection, unfortunately, our scheme for per-label weighting is
far from optimal and, for some reason, is not even in effect for the
libsvm bindings -- I will leave that as a TODO. Meanwhile, I have pushed
into the maint/0.4 branch the ability to omit classifiers in trailing
nodes such as g3 (i.e. to put None there instead of SVM()), and made
GNB robust enough to cope with just a single label.
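
To illustrate why a trailing node such as g3 needs no classifier of its
own, here is a minimal pure-Python sketch. It is not PyMVPA code, and the
helper names group_of and needs_classifier are made up for this
illustration. The point is only that a group holding a single label is
already fully decided by the parent split, so just the groups with two or
more labels require a further classifier.

```python
# Toy illustration only; this does not use or reproduce PyMVPA internals.
# The group layout mirrors the top level of the tree quoted above.
groups = {"g3": [3], "g6": [1, 2, 4]}


def group_of(label, groups):
    """Return the name of the group that contains the given label."""
    for name, labels in groups.items():
        if label in labels:
            return name
    raise ValueError("label %r is not in any group" % (label,))


def needs_classifier(labels):
    """A leaf needs its own classifier only if it must tell labels apart."""
    return len(set(labels)) > 1


for name in sorted(groups):
    print(name, "needs a classifier:", needs_classifier(groups[name]))
```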

So that you do not have to mess with installing from sources, you could try
the following (if you have shogun available) on your installation of PyMVPA:

tclf = TreeClassifier(sg.SVM(C=(-1., -3.)),
                      {"g3": ([3], SVM()),
                       "g6": ([1,2,4], TreeClassifier(
                           sg.SVM(C=(-1., -2)),
                           {"g1": ([1], SVM()),
                            "g5": ([2,4], SVM())}))})

This would use shogun's SVM implementation where balancing is necessary and
would balance using per-class C values.  Unfortunately (once again our fault),
due to the use of dictionaries (which do not guarantee the order of keys), which
C value gets associated with which branch, although deterministic, might not
correspond to the -1/+1 label assignments for the corresponding split.
That is why, if you see that you are still getting a 'winner takes all' situation
-- just swap -1, -3 and -1, -2 accordingly in the specification of C values.
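
The ordering problem can be shown with a small pure-Python sketch. It is
not how PyMVPA or shogun assign the values internally: assign_c is a
hypothetical helper that simply pairs a tuple of C values with group names
in whatever order the names arrive, and which pairing counts as the
"intended" one is chosen arbitrarily here. That is enough to see why
swapping the tuple repairs a mismatched pairing.

```python
def assign_c(c_values, group_order):
    """Pair each C value with a group name, position by position."""
    if len(c_values) != len(group_order):
        raise ValueError("need exactly one C value per group")
    return dict(zip(group_order, c_values))


c_values = (-1., -3.)

# The same C tuple lands on different branches depending on key order.
intended = assign_c(c_values, ["g3", "g6"])
flipped = assign_c(c_values, ["g6", "g3"])

# Swapping the tuple restores the intended pairing for the flipped order.
repaired = assign_c(c_values[::-1], ["g6", "g3"])

print(intended)
print(flipped)
print(repaired == intended)
```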

We will make this more coherent in the next release:
https://github.com/PyMVPA/PyMVPA/issues/issue/40
https://github.com/PyMVPA/PyMVPA/issues/issue/41

P.S. I am not sure what your goal was for TreeClassifier, but in my
experiments it did not provide any generalization improvement over
straight multiclass SVMs/SMLR.

-- 
=------------------------------------------------------------------=
Keep in touch                                     www.onerussian.com
Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic


