[pymvpa] No samples of a class in a chunk (J.A. Etzel)

Fri Aug 9 10:59:12 UTC 2013

Dear Jo,

Thank you very much for your input! I'll play around with different 
chunk combinations and see whether I can find a solution that is better 
balanced.

Jan

On 07.08.2013 14:00, pkg-exppsy-pymvpa-request at lists.alioth.debian.org 
wrote:
> It doesn't look like anyone's replied to this yet, so here's my two cents.
>
> I think of this sort of situation as a case of imbalance - there aren't
> equal numbers of examples of each class in each training/testing set
> (aka chunk). This happens in all sorts of situations, such as when the
> trials included depend on participant behavior (e.g. only correctly
> performed trials are analyzed).
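Before picking a strategy, it can help to tabulate how many examples of each class fall in each chunk. The sketch below assumes the class labels and chunk ids are available as two parallel Python lists (mirroring the `targets` and `chunks` sample attributes of a PyMVPA dataset). The function names are hypothetical and the data are made up for illustration.

```python
from collections import Counter, defaultdict


def class_counts_per_chunk(targets, chunks):
    """Return {chunk: Counter({class: n_samples})} for paired label lists."""
    counts = defaultdict(Counter)
    for target, chunk in zip(targets, chunks):
        counts[chunk][target] += 1
    return dict(counts)


def imbalanced_chunks(targets, chunks):
    """List chunks where a class is missing or the classes have unequal counts."""
    classes = set(targets)
    flagged = []
    for chunk, counter in sorted(class_counts_per_chunk(targets, chunks).items()):
        per_class = [counter.get(cls, 0) for cls in classes]
        if min(per_class) != max(per_class):
            flagged.append(chunk)
    return flagged


# Hypothetical example: two classes over three runs; run 3 has no "B" samples.
targets = ["A", "A", "B", "B", "A", "B", "B", "A", "A", "A"]
chunks = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
print(class_counts_per_chunk(targets, chunks))
print(imbalanced_chunks(targets, chunks))  # -> [2, 3]
```

A chunk with a class count of zero (run 3 here) is the case where per-chunk cross-validation breaks down outright; unequal but nonzero counts (run 2) are the milder case.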
>
> There isn't a universally appropriate strategy to regain balance, but
> either the chunks or the examples will need to be changed.
>
> For example, in one dataset we wanted to do leave-one-run-out
> cross-validation, but the imbalance was too great (e.g. some runs with
> very few examples), so we combined runs and did leave-three-runs-out
> cross-validation. We combined temporally adjacent runs (e.g. 1-3, 4-6,
> 7-9) to make sure we didn't somehow inflate the accuracy. Depending on
> the design, you could potentially partition on something other than the
> runs to give more flexibility. If the imbalance is not too great (e.g.
> 10 of one class and 12 of the other), my usual practice is to randomly
> subsample the larger class, repeating the whole analysis a few times
> (leaving out different examples each time).
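The two strategies in the paragraph above (merging temporally adjacent runs into larger chunks, then randomly subsampling the larger class and repeating with different random draws) can be sketched in plain Python. This is a minimal sketch, not PyMVPA code: it assumes runs are numbered from 1, that balancing is done within each merged chunk, and the helpers `merge_adjacent_runs` and `balanced_subset` are hypothetical names.

```python
import random
from collections import Counter, defaultdict


def merge_adjacent_runs(runs, group_size=3):
    """Map runs numbered from 1 to merged chunk ids: runs 1-3 -> 0, 4-6 -> 1, ..."""
    return [(run - 1) // group_size for run in runs]


def balanced_subset(targets, chunks, rng):
    """Return sorted sample indices with equal class counts in every chunk.

    Within each chunk, every class is randomly subsampled down to the size of
    the smallest class in that chunk; if a class is missing from a chunk,
    no samples from that chunk are kept.
    """
    classes = sorted(set(targets))
    indices = defaultdict(list)
    for i, (target, chunk) in enumerate(zip(targets, chunks)):
        indices[(chunk, target)].append(i)
    keep = []
    for chunk in sorted(set(chunks)):
        n = min(len(indices[(chunk, cls)]) for cls in classes)
        for cls in classes:
            keep.extend(rng.sample(indices[(chunk, cls)], n))
    return sorted(keep)


# Hypothetical data: 9 runs of 4 trials each, 3 "A" and 1 "B" per run.
runs = [run for run in range(1, 10) for _ in range(4)]
targets = ["A", "A", "A", "B"] * 9
chunks = merge_adjacent_runs(runs)  # 0 for runs 1-3, 1 for 4-6, 2 for 7-9

# Repeat the subsampling with different seeds so different examples are left out.
for seed in range(3):
    keep = balanced_subset(targets, chunks, random.Random(seed))
    print(seed, Counter((chunks[i], targets[i]) for i in keep))
    # ...run leave-one-chunk-out cross-validation on the samples in `keep`...
```

Leaving one merged chunk out is the same as leaving three adjacent runs out. Passing an explicit `random.Random(seed)` keeps each repetition reproducible.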
>
> By changing the examples I mean strategies like averaging across
> examples within a run (or fitting parameter estimate images), so that
> instead of classifying with individual trials you have a fixed number of
> summary images (e.g. beta weights, averages) per person. In my
> experience this can really help performance, even though the number of
> samples is greatly reduced.
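Averaging within a run can be sketched as grouping trials by (run, class) and taking the mean of each feature. This plain-Python sketch assumes each sample is a list of feature values (e.g. voxel intensities). It shows simple averaging only, not fitting parameter estimate (beta) images, and `average_per_chunk_and_class` is a hypothetical name.

```python
from collections import defaultdict


def average_per_chunk_and_class(samples, targets, chunks):
    """Average feature vectors that share the same (chunk, class) pair.

    Returns (summary_samples, summary_targets, summary_chunks), with one row
    per (chunk, class) combination present in the data.
    """
    groups = defaultdict(list)
    for vec, target, chunk in zip(samples, targets, chunks):
        groups[(chunk, target)].append(vec)
    out_samples, out_targets, out_chunks = [], [], []
    for chunk, target in sorted(groups):
        vecs = groups[(chunk, target)]
        # Mean of each feature (column) across the trials in this group.
        out_samples.append([sum(values) / len(vecs) for values in zip(*vecs)])
        out_targets.append(target)
        out_chunks.append(chunk)
    return out_samples, out_targets, out_chunks


# Hypothetical 2-feature data: run 1 has three "A" trials and one "B" trial.
samples = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 1.0]]
targets = ["A", "A", "A", "B"]
chunks = [1, 1, 1, 1]
avg, avg_t, avg_c = average_per_chunk_and_class(samples, targets, chunks)
# avg -> [[3.0, 4.0], [0.0, 1.0]], avg_t -> ['A', 'B'], avg_c -> [1, 1]
```

Averaging gives every class the same number of rows in each run, but a class that is entirely absent from a run still has no summary row there, so that case still calls for merging runs or dropping the run.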
>
> good luck,
> Jo