[pymvpa] 2fold balanced partitioner

basile pinsard basile.pinsard at gmail.com
Tue Jul 12 20:54:19 UTC 2016


Hi everybody,

This seems like it should be simple, but I cannot find a way to build a
sensible 2-fold partitioner with balanced targets, given a balanced dataset
(4 classes x 8 samples).
Ideally, every sample would also be used for testing the same number of
times.

ChainNode([
    NFoldPartitioner(
        attr='chunks',
        cvtype=.5,
        count=128,
        selection_strategy='random'),
    Sifter([('partitions', 2),
            ('targets', dict(balanced=True))])])
This does generate balanced partitions, but because the Sifter drops the
random splits that are not balanced, the number of cross-validation folds
varies, which is a problem.


ChainNode([
    NFoldPartitioner(
        attr='chunks',
        cvtype=.5,
        count=32,
        selection_strategy='random'),
    Balancer(
        amount='equal',
        attr='targets',
        count=1,
        apply_selection=False,
        limit=['partitions'],
        include_offlimit=True)])
This balances the partitions by discarding some samples, but that reduces
the number of samples in the training/testing sets, and not consistently
across folds.

FactorialPartitioner does the job, but its count parameter has no effect
(the generate method is overridden), so it yields the full combinatorial
set of splits, thousands of them, which is a bit much.

Would there be a way to repeatedly draw random splits that put half of each
class's samples in each of the two partitions?
Or should we make FactorialPartitioner respect the Partitioner prototype
(count/strategy parameters)?
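
For illustration, the kind of split being asked for can be sketched in plain
Python, outside of PyMVPA. The function name balanced_half_splits and its
parameters are hypothetical and not part of PyMVPA. The sketch also ignores
chunks entirely and assumes every class has an even number of samples.

```python
import random
from collections import defaultdict


def balanced_half_splits(targets, count, seed=None):
    """Yield `count` random 2-way partitions, each balanced per class.

    Each yielded list holds one partition label (1 or 2) per sample.
    Within every class, half of the samples get label 1 and the other
    half get label 2. Hypothetical helper, not a PyMVPA API.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, target in enumerate(targets):
        by_class[target].append(idx)
    for indices in by_class.values():
        if len(indices) % 2:
            raise ValueError("each class needs an even number of samples")

    for _ in range(count):
        partitions = [2] * len(targets)
        for indices in by_class.values():
            # Draw half of this class's samples for partition 1.
            for idx in rng.sample(indices, len(indices) // 2):
                partitions[idx] = 1
        yield partitions


# 4 classes x 8 samples, as in the dataset described above.
targets = [cls for cls in "ABCD" for _ in range(8)]
splits = list(balanced_half_splits(targets, count=16, seed=0))
```

The number of splits is exactly count, and every split keeps all samples.
If each split is used in both directions (train on partition 1 and test on
partition 2, then the reverse), every sample is tested once per split, so
all samples are tested the same number of times. Independent random draws
can repeat a split; this sketch does not deduplicate them.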

Thanks.

basile