[pymvpa] TreeClassifier and decision trees

Mon Feb 14 23:23:10 UTC 2011

Hi all,

I am trying to reproduce rather complicated decision trees with your
TreeClassifier class. I'm still using the 0.4 branch.

If I have 4 labels in my data, the tree I want to use might look like:

            /\
           /  \
          /    \
         3    / \
             1  /\
               2 4

I now have two questions:
1) How do I define the corresponding TreeClassifier when some branches
contain only one label? My attempt:
 TreeClassifier(SVM(), {"g3": ([3], SVM()),
                        "g6": ([1, 2, 4], TreeClassifier(SVM(),
                            {"g1": ([1], SVM()), "g5": ([2, 4], SVM())}))})
 It seems strange to me to supply a classifier for "g3" when there is
only one label in "g3". The same goes for "g1".
2) When I do so, with the same number of samples for every label, the
predictions are always one of (2, 4), so there seems to be a bias. For
decision one, there are always more samples in "g6" than in "g3"; for
decision 3, "g5" always outnumbers "g1". The training sets are
therefore unbalanced.
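
To make the imbalance in question 2 concrete, here is a minimal sketch in
plain Python. It makes no PyMVPA calls: the nested-dict layout TREE and the
helper group_sizes are hypothetical names of my own, not part of the
TreeClassifier API, and 10 samples per label is just an assumed number.

```python
# Hypothetical plain-Python mirror of the tree above; this is NOT PyMVPA API.
# Each node maps a group name to (labels, subtree), with subtree None when
# the group is not split into further named groups.
TREE = {
    "g3": ([3], None),
    "g6": ([1, 2, 4], {
        "g1": ([1], None),
        "g5": ([2, 4], None),
    }),
}


def group_sizes(tree, samples_per_label):
    """Return one dict per node: {group name: number of training samples}."""
    # Each group receives the samples of all labels assigned to it.
    sizes = dict((name, len(labels) * samples_per_label)
                 for name, (labels, _) in tree.items())
    result = [sizes]
    # Recurse into groups that are themselves split into named groups.
    for name, (labels, subtree) in sorted(tree.items()):
        if subtree is not None:
            result.extend(group_sizes(subtree, samples_per_label))
    return result


# Assumed example: 10 samples for each of the 4 labels.
for node in group_sizes(TREE, 10):
    print(sorted(node.items()))
```

With equal samples per label this gives 10 vs. 30 samples for "g3" vs. "g6",
and 10 vs. 20 for "g1" vs. "g5". Only the final 2-vs-4 decision inside "g5"
sees balanced classes.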

I hope my explanation was understandable.

Many greetings,

Thorsten