[pymvpa] 2fold balanced partitioner
basile pinsard
basile.pinsard at gmail.com
Tue Jul 12 20:54:19 UTC 2016
Hi everybody,
This seems pretty simple, but I cannot find a way to get a sensible 2-fold
partitioner with balanced targets when feeding it a balanced dataset
(4 classes x 8 samples).
Ideally, every sample would also be used for testing the same number of
times.
    ChainNode([
        NFoldPartitioner(
            attr='chunks',
            cvtype=.5,
            count=128,
            selection_strategy='random'),
        Sifter([('partitions', 2),
                ('targets', dict(balanced=True))])])
does generate balanced partitions, but it yields a variable number of CV
folds, which is a problem.
    ChainNode([
        NFoldPartitioner(
            attr='chunks',
            cvtype=.5,
            count=32,
            selection_strategy='random'),
        Balancer(
            amount='equal',
            attr='targets',
            count=1,
            apply_selection=False,
            limit=['partitions'],
            include_offlimit=True)])
does balance the partitions by eliminating some samples, but this reduces
the number of samples in the training/testing sets, and not in a consistent
way across folds.
FactorialPartitioner does the job, but its count parameter has no effect
(the generate method is overridden), so the combinatorial enumeration yields
thousands of splits, which is a bit much.
Would there be a way to repeatedly split at random, putting half of each
class's samples into each of the two partitions?
Or should we maybe make FactorialPartitioner respect the Partitioner
prototype (count/strategy parameters)?
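To make the random half-split idea concrete, here is a minimal plain-Python
sketch of the behaviour I have in mind. It does not use any PyMVPA API: the
function name balanced_half_splits and its parameters are made up for
illustration, and it assumes every class has an even number of samples.

```python
import random
from collections import defaultdict


def balanced_half_splits(targets, n_repeats, seed=None):
    """Yield n_repeats lists of partition labels (1 or 2), one per sample.

    Hypothetical helper, not part of PyMVPA. In each repetition, half of
    every class's samples are labelled 1 and the other half 2.
    """
    rng = random.Random(seed)

    # Group sample indices by their target (class) value.
    by_class = defaultdict(list)
    for idx, target in enumerate(targets):
        by_class[target].append(idx)

    # Assumption: an exact half/half split must be possible for each class.
    for indices in by_class.values():
        if len(indices) % 2:
            raise ValueError("each class needs an even number of samples")

    for _ in range(n_repeats):
        partitions = [0] * len(targets)
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            half = len(shuffled) // 2
            for idx in shuffled[:half]:
                partitions[idx] = 1
            for idx in shuffled[half:]:
                partitions[idx] = 2
        yield partitions


# Balanced toy dataset as in my case: 4 classes x 8 samples.
targets = [c for c in "ABCD" for _ in range(8)]
splits = list(balanced_half_splits(targets, n_repeats=16, seed=0))
```

If each repetition is used twice (train on 1 and test on 2, then the
reverse), the number of folds is fixed at 2 * n_repeats and every sample is
tested exactly once per repetition.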
Thanks.
basile