[pymvpa] Balancing strategy

Yaroslav Halchenko debian at onerussian.com
Sat Dec 7 05:36:08 UTC 2013


On Fri, 06 Dec 2013, J.A. Etzel wrote:

> I don't fully understand what you're asking about; to summarize: you
> want to do leave-one-run-out cross-validation. You don't want to
> include all scans, because of poor image quality. But omitting scans
> causes imbalance.

> I think your concern is that you end up with unequal numbers of
> examples of each task type in each run after getting rid of the bad
> images? If the imbalance isn't too bad (e.g. 10 examples of one
> class in the run, 8 of the other), my usual strategy is to subset
> the larger class (e.g. only using 8 of the 10 examples). Since there
> are many ways to do the subsetting, I usually suggest doing 10
> different random subsets (e.g. examples c(1:6,9,10); 2:9) and
> averaging over the subsets. 

which is accomplished by Balancer... e.g. 

    # assuming PyMVPA 2.x; the suite module re-exports these classes
    from mvpa2.suite import ChainNode, NFoldPartitioner, Balancer

    partitioner = ChainNode([NFoldPartitioner(cvtype=1),
                             # draw 10 random balanced selections within
                             # each partition (equal counts per target)
                             Balancer(attr='targets',
                                      count=10,
                                      limit='partitions',
                                      apply_selection=True
                                      )],
                            space='partitions')

Cheers,

> But if the imbalance is quite bad (e.g.
> only 1 or 2 examples of a class left in a run) I sometimes change
> the partitioning (e.g. leave-two-sequentially-presented-runs-out) to
> get the balance a bit closer.

i.e. assign coarser chunks (coarsen_chunks could be of help here) and then
apply the same partitioner as above ;)
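The coarsening step can be sketched with plain NumPy (PyMVPA's coarsen_chunks does something along these lines on a dataset's chunks attribute); the run and trial counts below are made up for illustration:

```python
import numpy as np

# hypothetical design: 8 runs (chunks), 4 trials each
chunks = np.repeat(np.arange(8), 4)

# merge each pair of sequentially presented runs into one coarser chunk,
# turning leave-one-run-out into leave-two-runs-out
coarse_chunks = chunks // 2

# 4 coarse chunks remain, each spanning two original runs
print(len(np.unique(coarse_chunks)))  # 4
```

With the coarser chunks assigned, the same ChainNode/Balancer partitioner shown earlier can be used unchanged.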

-- 
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Senior Research Associate,     Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        



More information about the Pkg-ExpPsy-PyMVPA mailing list