[pymvpa] custom partitioner

Wed May 18 02:26:11 UTC 2016

On Tue, 17 May 2016, Wolfgang Pauli wrote:

> Hi,

> I think I just had a shocking revelation. I tried to do cross-validation
> with a custom partitioner like so:

> splt_rule = [([0,1],[6,7]),([1,2],[4,7]),([2,3],[4,5]),([3,0],[5,6])]
> partitioner = CustomPartitioner(splitrule=splt_rule, attr='chunks')

> For example, I thought that during cross-validation the classifier would
> be trained on chunks 0 and 1 and tested on chunks 6 and 7. However, when I
> actually generated the partitions and looked at which chunks are in each
> partition, I found that the partitioner would actually create three
> partitions: two as specified by the split rule, and one containing the
> remaining chunks.

> I.e., instead of getting e.g. ([0,1],[6,7]), I would get
> ([2,3,4,5],[0,1],[6,7]).

> Is this correct? How can I keep it from creating that third partition with
> the remaining items?

Well, in general, partitioners are implemented to be fast and to have no
memory impact. To achieve that, they just assign the partitioning as a new
sample attribute and do not split the dataset into partitions. That job is
done later by a Splitter, e.g. within CrossValidation. Within
CrossValidation, that splitter cares only about the partitions labeled 1
(for training) and 2 (for testing) and ignores the others, so the extra
partition with the remaining chunks is expected and simply goes unused.
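To make that division of labor concrete, here is a minimal pure-Python sketch. It is not PyMVPA's code: the helpers `partition_attr` and `split_train_test` are hypothetical and only mimic the roles of the partitioner (label every sample) and the splitter (pick out training and testing). It assumes the labeling convention 1 = training, 2 = testing, 0 = chunks the split rule does not mention, and a toy dataset with one sample per chunk.

```python
# Hypothetical sketch, not PyMVPA code: a "partitioner" only builds a label
# list, and a "splitter" later picks out partitions 1 and 2.
# Assumed convention: 1 = training, 2 = testing, 0 = not in the split rule.

def partition_attr(chunks, train_chunks, test_chunks):
    """Return one partition label per sample; no samples are copied."""
    labels = []
    for c in chunks:
        if c in train_chunks:
            labels.append(1)
        elif c in test_chunks:
            labels.append(2)
        else:
            labels.append(0)
    return labels


def split_train_test(samples, partitions):
    """Select training (1) and testing (2) samples; ignore any other label."""
    train = [s for s, p in zip(samples, partitions) if p == 1]
    test = [s for s, p in zip(samples, partitions) if p == 2]
    return train, test


# Toy dataset (assumed): one sample per chunk, chunks 0..7.
chunks = list(range(8))
samples = ["sample_%d" % c for c in chunks]
splt_rule = [([0, 1], [6, 7]), ([1, 2], [4, 7]), ([2, 3], [4, 5]), ([3, 0], [5, 6])]

for train_chunks, test_chunks in splt_rule:
    parts = partition_attr(chunks, train_chunks, test_chunks)
    train, test = split_train_test(samples, parts)
    print(parts, train, test)
```

For the first rule, `parts` comes out as `[1, 1, 0, 0, 0, 0, 2, 2]`: chunks 2 to 5 form the extra group labeled 0, and `split_train_test` never touches it.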

If you really need to select only partitions 1 and 2 right away, I guess
you could just use something like

partitioned_ds.select(partitions=[1, 2])

?
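Outside PyMVPA, the effect of such a selection can be sketched as a plain filter on the partition labels. This is a hypothetical pure-Python illustration, not the `select` implementation; `select_partitions` and `group_by_partition` are made-up helpers, and the labels assume 1 = training, 2 = testing, 0 = chunks the split rule does not mention. Grouping the unfiltered labels by value yields three groups, while grouping after the filter leaves only training and testing.

```python
# Hypothetical sketch: keep only samples whose partition label is in `keep`,
# then group chunk values by label. Assumed labels: 1 = training,
# 2 = testing, 0 = chunks the split rule did not mention.

def select_partitions(chunks, partitions, keep=(1, 2)):
    """Return (chunks, partitions) restricted to samples labeled in `keep`."""
    pairs = [(c, p) for c, p in zip(chunks, partitions) if p in keep]
    return [c for c, _ in pairs], [p for _, p in pairs]


def group_by_partition(chunks, partitions):
    """Group chunk values by partition label, in sorted label order."""
    groups = {}
    for c, p in zip(chunks, partitions):
        groups.setdefault(p, []).append(c)
    return [groups[p] for p in sorted(groups)]


chunks = list(range(8))
partitions = [1, 1, 0, 0, 0, 0, 2, 2]  # assumed labels for the rule ([0, 1], [6, 7])

print(group_by_partition(chunks, partitions))
# [[2, 3, 4, 5], [0, 1], [6, 7]]  -> three groups, same lists as in the question

sel_chunks, sel_parts = select_partitions(chunks, partitions)
print(group_by_partition(sel_chunks, sel_parts))
# [[0, 1], [6, 7]]  -> only training and testing remain
```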

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik