[pymvpa] TreeClassifier and decision trees

Thorsten Kranz thorstenkranz at googlemail.com
Tue Feb 15 23:07:04 UTC 2011


Hey Yarik,

thanks a lot for your work. It's really nice to get such quick
responses to problems with a library.

The installation from source is not a problem; I am only testing this
approach on one machine, so I won't have to do it on all the computers
where PyMVPA is installed.

Actually, I had already used TreeClassifier before, for classification
of iEEG data, in a very multiclass setting with 16 class labels. These
labels correspond to 16 different stimuli, which come from four
categories. For cross-validation, I balanced the data to have an equal
number of trials for each class (due to artifact rejection this isn't
the case in my dataset) using a custom splitter object.
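
(Just for completeness: the balancing inside that splitter essentially
amounts to subsampling every class down to the smallest per-class
trial count. A rough stand-alone numpy sketch of that step, not the
actual splitter code and with a made-up helper name:)

import numpy as np

def balance_indices(labels, rng=np.random):
    # keep only as many trials per class as the rarest class has
    labels = np.asarray(labels)
    n_min = min((labels == c).sum() for c in np.unique(labels))
    keep = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        keep.extend(rng.permutation(idx)[:n_min])
    return np.sort(keep)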

The first approach of course was a straight SVM (SMLR didn't perform
too well), but a TreeClassifier, deciding first on the category and
then on the exact label, improved the results. Not by orders of
magnitude, but noticeably.
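
Roughly, the tree had this structure; the concrete assignment of the
16 labels to the four categories is just made up here for
illustration:

from mvpa.suite import *   # PyMVPA's usual catch-all import

# first level decides on the category, second level on the exact label
tclf = TreeClassifier(SVM(),
                      {"cat1": ([1, 2, 3, 4],     SVM()),
                       "cat2": ([5, 6, 7, 8],     SVM()),
                       "cat3": ([9, 10, 11, 12],  SVM()),
                       "cat4": ([13, 14, 15, 16], SVM())})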

What I am trying to do now is a bit more complicated. I don't know yet
if it makes sense; I will have to try it tomorrow, and I will let you
know if it works.

Once again thank you for your support, greetings,

Thorsten



2011/2/15 Yaroslav Halchenko <debian at onerussian.com>:
>
> On Mon, 14 Feb 2011, Yaroslav Halchenko wrote:
>> and GNB's inability to train on a single label... more tomorrow,
>> meanwhile you can try something like
>
>> clf = GNB
>> tclf = TreeClassifier(clf(),
>>                       {"g3": ([3], SVM()),
>>                        "g6": ([1,2,4], TreeClassifier(clf(),
>>                              {"g1": ([1], SVM()),
>>                               "g5": ([2,4],clf())}))})
>
> upon inspection, unfortunately our scheme for per-label weighting is
> far from optimal and for some reason is not even in effect for the
> libsvm bindings -- I will leave it as a TODO. Meanwhile I have pushed
> into the maint/0.4 branch the ability to not specify classifiers in
> trailing nodes such as g3 (i.e. just having None there instead of
> SVM()), and a fix for GNB to be able to cope with just a single label.
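>
> With that, the tree from the snippet above could, for instance, be
> written as (just a sketch):
>
> # None marks a trailing single-label node that needs no classifier
> clf = GNB
> tclf = TreeClassifier(clf(),
>                       {"g3": ([3], None),
>                        "g6": ([1,2,4], TreeClassifier(clf(),
>                              {"g1": ([1], None),
>                               "g5": ([2,4], clf())}))})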
>
> So that you do not need to mess with installing from sources, you could try
> the following (if you have shogun available) on your installation of PyMVPA:
>
> tclf = TreeClassifier(sg.SVM(C=(-1., -3.)),
>                       {"g3": ([3], SVM()),
>                        "g6": ([1,2,4], TreeClassifier(
>                            sg.SVM(C=(-1., -2.)),
>                            {"g1": ([1], SVM()),
>                             "g5": ([2,4], SVM())}))})
>
> which would use shogun's SVM implementation where balancing is necessary and
> would balance using per-class C values.  Unfortunately, once again our fault:
> due to the use of dictionaries (which do not guarantee the order of keys),
> which C value gets associated with which branch is deterministic but might
> not correspond to the -1/+1 label assignments of the corresponding split.
> That is why, if you see that you are still getting a 'winner takes all'
> situation, just swap -1, -3 and -1, -2 accordingly in the specification
> of the C values.
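>
> For example, with the C values of both splits swapped, that would read:
>
> tclf = TreeClassifier(sg.SVM(C=(-3., -1.)),
>                       {"g3": ([3], SVM()),
>                        "g6": ([1,2,4], TreeClassifier(
>                            sg.SVM(C=(-2., -1.)),
>                            {"g1": ([1], SVM()),
>                             "g5": ([2,4], SVM())}))})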
>
> We will fix it up to become more coherent in the next release:
> https://github.com/PyMVPA/PyMVPA/issues/issue/40
> https://github.com/PyMVPA/PyMVPA/issues/issue/41
>
> P.S. I am not sure what your goal was for TreeClassifier, but in my
> experimenting with it, it did not provide any generalization improvement over
> straight multiclass SVMs/SMLR.
>
> --
> =------------------------------------------------------------------=
> Keep in touch                                     www.onerussian.com
> Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic
>
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa
>


