[pymvpa] Balancing strategy
Yaroslav Halchenko
debian at onerussian.com
Sat Dec 7 05:36:08 UTC 2013
On Fri, 06 Dec 2013, J.A. Etzel wrote:
> I don't fully understand what you're asking about; to summarize: you
> want to do leave-one-run-out cross-validation. You don't want to
> include all scans, because of poor image quality. But omitting scans
> causes imbalance.
> I think your concern is that you end up with unequal numbers of
> examples of each task type in each run after getting rid of the bad
> images? If the imbalance isn't too bad (e.g. 10 examples of one
> class in the run, 8 of the other), my usual strategy is to subset
> the larger class (e.g. only using 8 of the 10 examples). Since there
> are many ways to do the subsetting, I usually suggest doing 10
> different random subsets (e.g. examples c(1:6,9,10); 2:9) and
> averaging over the subsets.
which is accomplished by Balancer... e.g.
    partitioner = ChainNode([NFoldPartitioner(cvtype=1),
                             Balancer(attr='targets',
                                      count=10,
                                      limit='partitions',
                                      apply_selection=True)],
                            space='partitions')
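For readers without PyMVPA at hand, the idea behind this Balancer setup (randomly subset the larger class down to the size of the smaller one, several times) can be sketched in plain NumPy; `balanced_subset_indices` is a hypothetical helper, not part of the PyMVPA API:

```python
import numpy as np

rng = np.random.default_rng(0)

def balanced_subset_indices(labels, rng):
    """Return indices keeping min-class-count examples of each class."""
    labels = np.asarray(labels)
    per_class = {c: np.flatnonzero(labels == c) for c in np.unique(labels)}
    n_min = min(len(idx) for idx in per_class.values())
    keep = np.concatenate([rng.choice(idx, n_min, replace=False)
                           for idx in per_class.values()])
    return np.sort(keep)

# Toy imbalanced run: 10 examples of class 'A', 8 of class 'B'
labels = np.array(['A'] * 10 + ['B'] * 8)

# Draw 10 random balanced subsets (count=10 above); one would train/test
# on each and average the resulting accuracies.
subsets = [balanced_subset_indices(labels, rng) for _ in range(10)]
```

Each subset keeps all 8 'B' examples plus a random 8 of the 10 'A' examples, so every fold is exactly balanced.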
Cheers,
> But if the imbalance is quite bad (e.g.
> only 1 or 2 examples of a class left in a run) I sometimes change
> the partitioning (e.g. leave-two-sequentially-presented-runs-out) to
> get the balance a bit closer.
i.e. assign coarser chunks (coarsen_chunks could be of help here) and then apply
the same balancing as above ;)
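The coarsening step itself is just a relabeling of the chunks attribute; a minimal NumPy sketch of the idea (merging adjacent runs into fewer, coarser chunks, not PyMVPA's actual coarsen_chunks implementation) might look like:

```python
import numpy as np

def coarsen(chunks, nchunks):
    """Map fine-grained chunk (run) labels onto `nchunks` coarser groups,
    keeping originally adjacent runs together."""
    chunks = np.asarray(chunks)
    uniq = np.unique(chunks)                    # sorted original run labels
    groups = np.array_split(uniq, nchunks)      # split runs into nchunks groups
    mapping = {run: i for i, grp in enumerate(groups) for run in grp}
    return np.array([mapping[c] for c in chunks])

# Eight runs, two samples each -> four coarser chunks of two runs apiece
chunks = np.repeat(np.arange(8), 2)
coarse = coarsen(chunks, 4)
```

Leave-one-out over the coarse chunks then amounts to leave-two-sequential-runs-out, which gives each partition more examples per class to balance against.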
--
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Senior Research Associate, Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik