[pymvpa] Dataset with multidimensional feature vector per voxel

Yaroslav Halchenko debian at onerussian.com
Thu Nov 19 14:40:50 UTC 2015


On Thu, 19 Nov 2015, Ulrike Kuhl wrote:

> Dear Yaroslav, dear all,

> I might have solved the balancing problem using PyMVPA's 'Balancer' (duh!).
> I extended the code of the partitioner like this:

> npart = ChainNode([
>     NFoldPartitioner(len(DS_noisy.sa['targets'].unique),
>                      attr='chunks'),
>     ## so it should select only those splits where we took 1 from
>     ## each of the target categories, leaving things in balance
>     Sifter([('partitions', 2),
>             ('targets',
>              {'uvalues': DS_noisy.sa['targets'].unique,
>               'balanced': True})
>             ]),
>     Balancer(attr='targets', count=1, limit='partitions',
>              apply_selection=True)
>     ], space='partitions')


> The classification result on noisy data looks perfect even with imbalanced group sizes -- is it correct to do it like this?

It is correct, BUT how imbalanced is your imbalance? In the example you
gave, the groups were balanced, so no Balancer was needed.
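
To check how imbalanced the targets actually are, a quick count like this
should tell you (just plain numpy; return_counts needs numpy >= 1.9):

import numpy as np
values, counts = np.unique(DS_noisy.sa.targets, return_counts=True)
# how many samples per target category
print(dict(zip(values, counts)))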

Since Balancer randomly subsamples your samples, you might want to

1. call mvpa2.seed(1) (or any other number) to get reproducible results,

2. increase count to > 1 so you get a more stable estimate.
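
E.g., a minimal sketch of your ChainNode with both suggestions applied
(count=10 is an arbitrary choice, pick what fits your data):

mvpa2.seed(1)  # fix the RNG so the random subsampling is reproducible
npart = ChainNode([
    NFoldPartitioner(len(DS_noisy.sa['targets'].unique), attr='chunks'),
    Sifter([('partitions', 2),
            ('targets', {'uvalues': DS_noisy.sa['targets'].unique,
                         'balanced': True})]),
    # count=10: generate 10 random balanced selections per partitioning,
    # so the cross-validation averages over more (sub)folds
    Balancer(attr='targets', count=10, limit='partitions',
             apply_selection=True),
    ], space='partitions')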

> Also, I would still like to know how I can see what the individual partitions look like.

Something like

[part.sa.partitions for part in npart.generate(DS_noisy)]

? But since apply_selection=True, you would see them already subsampled.
If you use apply_selection=False, I think this should work:

[part.sa.balanced_set for part in npart.generate(DS_noisy)]
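
And for eyeballing what ends up in training vs. testing per fold, a small
loop like this should do (a sketch assuming apply_selection=True, so the
generated datasets are already subsampled; by PyMVPA convention partition
value 2 marks the testing set):

for i, part in enumerate(npart.generate(DS_noisy)):
    print("fold %i" % i)
    for p in (1, 2):  # 1 = training, 2 = testing
        mask = part.sa.partitions == p
        print("  partition %i targets: %s"
              % (p, part.sa.targets[mask]))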

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        


