[pymvpa] Splitter and nperlabel='equal'
Yaroslav Halchenko
debian at onerussian.com
Fri Mar 26 19:00:20 UTC 2010
Sorry for being didactic today, but why don't you simply look at what
the splitter gives you? I.e., something like what I do below on a silly
dataset:
In [21]: print ds.summary()
Dataset / float64 10 x 1
uniq: 2 labels 2 chunks
stats: mean=-0.0652967 std=1.17973 var=1.39176 min=-2.27894 max=1.58153
Counts of labels in each chunk:
  chunks\labels  0   1
                --- ---
        0        1   2
        1        3   4
Summary per label across chunks
  label  mean  std  min  max  #chunks
    0     2     1    1    3      2
    1     3     1    2    4      2
Summary per chunk across labels
  chunk  mean  std  min  max  #labels
    0    1.5   0.5   1    2      2
    1    3.5   0.5   3    4      2
In [22]: print '\n'.join([d.summary() for d in list(NFoldSplitter()(ds))[0]])
Dataset / float64 7 x 1
uniq: 2 labels 1 chunks
stats: mean=-0.402729 std=1.1694 var=1.36749 min=-2.27894 max=1.25512
Counts of labels in each chunk:
  chunks\labels  0   1
                --- ---
        1        3   4
Summary per label across chunks
  label  mean  std  min  max  #chunks
    0     3     0    3    3      1
    1     4     0    4    4      1
Summary per chunk across labels
  chunk  mean  std  min  max  #labels
    1    3.5   0.5   3    4      2
Dataset / float64 3 x 1
uniq: 2 labels 1 chunks
stats: mean=0.722045 std=0.750201 var=0.562802 min=-0.246372 max=1.58153
Counts of labels in each chunk:
  chunks\labels  0   1
                --- ---
        0        1   2
Summary per label across chunks
  label  mean  std  min  max  #chunks
    0     1     0    1    1      1
    1     2     0    2    2      1
Summary per chunk across labels
  chunk  mean  std  min  max  #labels
    0    1.5   0.5   1    2      2
In [23]: print '\n'.join([d.summary() for d in list(NFoldSplitter(nperlabel='equal')(ds))[0]])
Dataset / float64 6 x 1
uniq: 2 labels 1 chunks
stats: mean=-0.612826 std=1.13421 var=1.28642 min=-2.27894 max=1.25512
Counts of labels in each chunk:
  chunks\labels  0   1
                --- ---
        1        3   3
Summary per label across chunks
  label  mean  std  min  max  #chunks
    0     3     0    3    3      1
    1     3     0    3    3      1
Summary per chunk across labels
  chunk  mean  std  min  max  #labels
    1     3     0    3    3      2
Dataset / float64 2 x 1
uniq: 2 labels 1 chunks
stats: mean=0.292305 std=0.538677 var=0.290173 min=-0.246372 max=0.830983
Counts of labels in each chunk:
  chunks\labels  0   1
                --- ---
        0        1   1
Summary per label across chunks
  label  mean  std  min  max  #chunks
    0     1     0    1    1      1
    1     1     0    1    1      1
Summary per chunk across labels
  chunk  mean  std  min  max  #labels
    0     1     0    1    1      2
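In the output above, nperlabel='equal' balances each dataset of the split on its own: the training part goes from 3/4 to 3/3 samples per label and the testing part from 1/2 to 1/1. The same effect can be imitated in plain Python. This is only a rough sketch and not PyMVPA's implementation: equalize_labels is a made-up helper, and the label/chunk lists are just one layout that reproduces the counts in the first summary.

```python
import random
from collections import Counter, defaultdict


def equalize_labels(labels, rng):
    """Return sorted indices that keep the same number of samples per label.

    Every label is randomly downsampled to the size of the rarest label.
    (Hypothetical helper for illustration, not part of PyMVPA.)
    """
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    n_keep = min(len(idxs) for idxs in by_label.values())
    kept = []
    for idxs in by_label.values():
        kept.extend(rng.sample(idxs, n_keep))
    return sorted(kept)


# Assumed layout matching the first summary:
# chunk 0 has 1 x label 0 and 2 x label 1; chunk 1 has 3 x label 0 and 4 x label 1.
labels = [0, 1, 1, 0, 0, 0, 1, 1, 1, 1]
chunks = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

rng = random.Random(0)
test_chunk = 0  # first fold: chunk 0 is left out for testing

train = [l for l, c in zip(labels, chunks) if c != test_chunk]
test = [l for l, c in zip(labels, chunks) if c == test_chunk]

# Balance the training and testing parts independently of each other.
train_counts = Counter(train[i] for i in equalize_labels(train, rng))
test_counts = Counter(test[i] for i in equalize_labels(test, rng))

print(train_counts)  # 3 samples of each label
print(test_counts)   # 1 sample of each label
```

Which samples get dropped is random, so only the per-label counts are stable between runs, not the selected indices.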
On Fri, 26 Mar 2010, Timothy Vickery wrote:
> Hi,
> I have a couple of questions about the nperlabel parameter of the
> Splitter class (NFoldSplitter, actually). I have unequal numbers of
> each class within each scan, and also across scans, so I have been
> manually balancing the number of exemplars used from each class in each
> chunk by throwing out random trials from the over-represented class
> before classification. I'd like to take advantage of the
> nperlabel='equal' option on my splitter to do this for me, but I have a
> couple of questions about how this affects the error rate, which I
> could not figure out from the documentation (sorry if I missed
> something obvious):
> - Suppose I am using NFoldSplitter to leave one chunk out, and I have
> 11 examples of C1 and 13 examples of C2 in chunk 1, but only 8 C1 and
> 10 C2 for chunk 2. Will the NFoldSplitter with nperlabel='equal' force
> the number of examples of each category from each chunk down to 8? Or
> will it use 11 of each class for chunk 1, and 8 of each class for chunk
> 2?
> - If it is the latter (balanced separately within chunks), how is the
> error rate determined with the CrossValidatedTransferError class? Does
> the error rate reflect the simple average error across folds (error run
> 1 + error run 2)/2, or is the average weighted by the number of
> exemplars from each fold (equivalent to the total error / total number
> of tests)? If it is averaging fold performance, is there a way to force
> it to report the overall test case performance, instead? The simple
> average over fold performance would seem to be skewed by better or
> worse performance on chunk 2 in the example above, since it has fewer
> test cases.
> Thanks for your help!
> -Tim
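The two averages mentioned in the second question differ whenever the folds contain different numbers of test samples. A small sketch of the arithmetic in plain Python 3 follows. The error counts are invented for illustration, the fold sizes assume 11 and 8 samples per class after balancing, and nothing here says which of the two CrossValidatedTransferError actually reports.

```python
def mean_of_fold_errors(folds):
    """Simple average of per-fold error rates: every fold counts equally."""
    return sum(n_err / n_test for n_err, n_test in folds) / len(folds)


def pooled_error(folds):
    """Total errors / total tests: every test sample counts equally."""
    total_err = sum(n_err for n_err, _ in folds)
    total_tests = sum(n_test for _, n_test in folds)
    return total_err / total_tests


# (number of errors, number of test samples) per fold.
# Fold sizes: 2 x 11 = 22 and 2 x 8 = 16; the error counts are hypothetical.
folds = [(4, 22), (6, 16)]

simple = mean_of_fold_errors(folds)
pooled = pooled_error(folds)

print("mean of fold errors: %.4f" % simple)
print("pooled error:        %.4f" % pooled)
```

With these made-up counts the smaller fold has the higher error rate, so the simple mean comes out above the pooled error. The two agree only when all folds have the same number of test samples or the same error rate.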
--
                                  .-.
=------------------------------   /v\  ----------------------------=
Keep in touch                    // \\     (yoh@|www.)onerussian.com
Yaroslav Halchenko              /(   )\               ICQ#: 60653192
                   Linux User    ^^-^^    [175555]