[pymvpa] Dataset with multidimensional feature vector per voxel
    Yaroslav Halchenko 
    debian at onerussian.com
       
    Thu Nov 19 14:40:50 UTC 2015
    
    
  
On Thu, 19 Nov 2015, Ulrike Kuhl wrote:
> Dear Yaroslav, dear all,
> I might have solved the balancing problem using PyMVPA's 'Balancer' (duh!).
> I extended the code of the partitioner like this:
> npart = ChainNode([
>     NFoldPartitioner(len(DS_noisy.sa['targets'].unique),
>                      attr='chunks'),
>     ## so it should select only those splits where we took 1 from
>     ## each of the targets categories, leaving things in balance
>     Sifter([('partitions', 2),
>             ('targets',
>              {'uvalues': DS_noisy.sa['targets'].unique,
>               'balanced': True})]),
>     Balancer(attr='targets', count=1, limit='partitions',
>              apply_selection=True)
>     ], space='partitions')
> The classification result on noisy data looks perfect even with imbalanced group sizes - is it correct to do it like this?
It is correct, BUT how imbalanced is your imbalance?  In the example you
gave the targets were balanced, so no Balancer was due.
Since Balancer randomly subsamples your samples, you might want to
1. mvpa2.seed(1)  # or any other number
   to get reproducible results
2. increase count to > 1 so you get a more stable estimate
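
A minimal sketch combining both tweaks, assuming DS_noisy and the
ChainNode from your snippet above (count=5 is an arbitrary choice for
illustration, not a recommendation):

  import mvpa2
  from mvpa2.suite import *

  mvpa2.seed(1)   # fix the RNG so Balancer's subsampling is reproducible

  npart = ChainNode([
      NFoldPartitioner(len(DS_noisy.sa['targets'].unique),
                       attr='chunks'),
      Sifter([('partitions', 2),
              ('targets',
               {'uvalues': DS_noisy.sa['targets'].unique,
                'balanced': True})]),
      # count=5: generate 5 random balanced selections per sifted
      # split, so the cross-validation error gets averaged over more
      # subsamplings
      Balancer(attr='targets', count=5, limit='partitions',
               apply_selection=True)
      ], space='partitions')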
> Also, I would still like to know how I can see what the individual partitions look like.
Something like

  [part.sa.partitions for part in npart.generate(DS_noisy)]

?  But since apply_selection=True, you would see them already
subsampled.  If you use apply_selection=False, I think this should work:

  [part.sa.balanced_set for part in npart.generate(DS_noisy)]
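
For eyeballing the splits, here is a slightly fuller sketch; it assumes
the npart ChainNode from above with apply_selection=False on the
Balancer (otherwise there is no .sa.balanced_set to look at):

  for i, pds in enumerate(npart.generate(DS_noisy)):
      # 'partitions' marks each sample as training (1) or testing (2);
      # 'balanced_set' marks which samples the Balancer selected
      print("split %d: partitions=%s balanced_set=%s"
            % (i, pds.sa.partitions, pds.sa.balanced_set))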
-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        
    
    