[pymvpa] balancing leave-one-out

Tue Aug 19 04:49:02 UTC 2014

Hi Ben,

   I did something nominally similar using Sifter (inspired by an 
earlier post from Yarick: 
http://lists.alioth.debian.org/pipermail/pkg-exppsy-pymvpa/2012q4/002322.html). 
Here's how I think you can set up your partitioning scheme.

# Assumes PyMVPA 2 and NumPy are installed.
from mvpa2.suite import *
import numpy as np

# Make 4 dummy subjects and give them subject and group ids.
ds_all = [normal_feature_dataset(nlabels=4, snr=100, perlabel=10,
                                 nfeatures=5, nchunks=5) for _ in range(4)]
for i in range(0,4,2):
     ds_all[i].sa['subject'] = np.repeat('s%s' %i, len(ds_all[i]))
     ds_all[i].sa['group'] = np.repeat('groupA', len(ds_all[i]))

for i in range(1,4,2):
     ds_all[i].sa['subject'] = np.repeat('s%s' %i, len(ds_all[i]))
     ds_all[i].sa['group'] = np.repeat('groupB', len(ds_all[i]))

# vstack the datasets
ds_all = vstack(ds_all, a=0)

# Setup the partitioner
npart = NFoldPartitioner(cvtype=2, attr='subject')

# Sift through the partitions, excluding those where the test partition
# doesn't have one of each group.
# Should scale up to more than one group id.
sift = Sifter([('partitions', 2),
               ('group', dict(uvalues=ds_all.sa['group'].unique,
                              balanced=True))])

# Combine npart and sifter
part = ChainNode([npart, sift], space = 'partitions')

# Check partitions (partition 2 is the test set)
for i, split in enumerate(part.generate(ds_all)):
    print('Partition 2, Split %d:' % i)
    print(split[split.sa.partitions == 2].sa['subject'].unique)
    print(split[split.sa.partitions == 2].sa['group'].unique)
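
If it helps to see which splits should survive, here is a rough plain-Python sketch of the same idea, with no PyMVPA involved. It is only an illustration of the logic as I understand it, not how Sifter is actually implemented. The subject_group dict, the samples_per_subject value, and the balanced_test_sets helper are all made up for this example, and I'm assuming every subject contributes the same number of samples.

```python
# Plain-Python illustration only; this is not PyMVPA's implementation.
from collections import Counter
from itertools import combinations

# Hypothetical toy layout mirroring the dummy data:
# even-numbered subjects in groupA, odd-numbered in groupB.
subject_group = {'s0': 'groupA', 's1': 'groupB',
                 's2': 'groupA', 's3': 'groupB'}
samples_per_subject = 40  # assumed equal for every subject


def balanced_test_sets(subject_group, n_samples, n_out=2):
    """Return held-out subject tuples whose samples cover every group equally."""
    all_groups = set(subject_group.values())
    kept = []
    # Like cvtype=2: try every way of holding out n_out subjects.
    for held_out in combinations(sorted(subject_group), n_out):
        counts = Counter()
        for subj in held_out:
            counts[subject_group[subj]] += n_samples
        # Like the sifter: every group present, with equal sample counts.
        if set(counts) == all_groups and len(set(counts.values())) == 1:
            kept.append(held_out)
    return kept


kept = balanced_test_sets(subject_group, samples_per_subject)
for i, held_out in enumerate(kept):
    print('Split %d: test subjects %s' % (i, ', '.join(held_out)))
```

With two subjects per group, this keeps 4 of the 6 possible pairs and drops the two same-group pairs (s0 with s2, s1 with s3). For more groups you would set n_out to the number of groups. If subjects had unequal sample counts, the equal-count check in this sketch would reject those pairs, so that is worth keeping an eye on with real data.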

# And of course...
clf = LinearCSVMC(space = 'group')

On 8/18/2014 10:46 PM, Ben Acland wrote:
> Okay, forget that. Simpler version of the same problem:
>
> I'd now like to leave out one subject from each group (I'm trying to decode 'sub_group'). Seems like it should be easy enough. Anyone have the quick answer on this?
>
> Ben