[pymvpa] No samples of a class in a chunk

Tue Aug 6 21:04:12 UTC 2013

It doesn't look like anyone's replied to this yet, so here's my two cents.

I think of this sort of situation as a case of imbalance: there aren't 
equal numbers of examples of each class in each training/testing set 
(aka chunk). This happens in all sorts of situations, for example when 
trial inclusion depends on participant behavior (e.g. only 
correctly-performed trials are kept).

There isn't a universally appropriate strategy to regain balance, but 
either the chunks or the examples will need to be changed.

For example, in one dataset we wanted to do leave-one-run-out 
cross-validation, but the imbalance was too great (e.g. some runs with 
very few examples), so we combined runs, for leave-three-runs-out 
cross-validation. We combined temporally adjacent runs (e.g. 1-3, 4-6, 
7-9) to make sure we didn't somehow inflate the accuracy. Depending on 
the design, you could potentially partition on something other than the 
runs to give more flexibility. If the imbalance is not too great (e.g. 
10 of one class and 12 of the other), my usual practice is to subset the 
larger class at random, repeating the whole thing a few times (leaving 
out different examples).
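
As a rough sketch of those two ideas, here is plain Python (not PyMVPA 
code). The function names merge_adjacent_runs and subsample_to_balance 
are made up for illustration, and the sketch assumes runs are numbered 
from 1 and that targets is a simple list of class labels.

```python
import random
from collections import defaultdict


def merge_adjacent_runs(runs, group_size=3):
    """Relabel 1-based run numbers as coarser chunks (runs 1-3 -> 0, 4-6 -> 1, ...)."""
    return [(run - 1) // group_size for run in runs]


def subsample_to_balance(targets, rng):
    """Return sorted indices that keep the same number of examples per class.

    Every class is cut down to the size of the smallest class by drawing
    indices at random (without replacement) using the supplied rng.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(targets):
        by_class[label].append(idx)
    n_keep = min(len(indices) for indices in by_class.values())
    keep = []
    for indices in by_class.values():
        keep.extend(rng.sample(indices, n_keep))
    return sorted(keep)


# Nine runs become three chunks for leave-three-runs-out cross-validation.
chunks = merge_adjacent_runs([1, 2, 3, 4, 5, 6, 7, 8, 9])

# 10 examples of one class and 12 of the other; repeat with different seeds
# so that different examples are left out each time.
targets = ["a"] * 10 + ["b"] * 12
subsets = [subsample_to_balance(targets, random.Random(seed)) for seed in range(5)]
```

Each entry of subsets would then be used for one complete run of the 
cross-validation, and the resulting accuracies averaged over the 
repeats.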

By changing the examples I mean strategies like averaging across 
examples within a run (or fitting parameter estimate images), so that 
instead of classifying with individual trials you have a fixed number of 
summary images (e.g. beta weights, averages) per person. In my 
experience this can really help performance, even though the number of 
samples is greatly reduced.
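
The averaging version could look something like the sketch below, 
again in plain Python rather than PyMVPA. The helper name 
average_within_run is hypothetical, and it assumes each sample is a 
list of voxel values with one target label and one run number per 
sample. Fitting parameter estimate images would need a GLM and is not 
shown.

```python
from collections import defaultdict


def average_within_run(samples, targets, runs):
    """Average all samples sharing a (run, target) pair into one summary sample.

    Returns (mean_samples, mean_targets, mean_runs), ordered by run and then
    by target label. A class with no samples in a run simply yields no
    summary sample for that run.
    """
    groups = defaultdict(list)
    for vec, label, run in zip(samples, targets, runs):
        groups[(run, label)].append(vec)

    mean_samples, mean_targets, mean_runs = [], [], []
    for run, label in sorted(groups):
        vecs = groups[(run, label)]
        # Average feature by feature (column-wise) across the grouped samples.
        mean_samples.append([sum(col) / len(vecs) for col in zip(*vecs)])
        mean_targets.append(label)
        mean_runs.append(run)
    return mean_samples, mean_targets, mean_runs


# Three trials in run 1 (two of class "a", one of class "b"), two voxels each.
samples = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
means, labels, run_ids = average_within_run(samples, ["a", "a", "b"], [1, 1, 1])
```

After this step every run contributes at most one example per class, 
so the classes are balanced wherever both were present in the run. 
Runs that are missing a class still need to be dropped or merged.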

good luck,
Jo

On 8/2/2013 6:49 AM, Jan Derrfuss wrote:
> Hello,
>
> I would like to run an exploratory analysis where the presence of
> samples in the chunks was not under experimental control. As a result,
> there are chunks where only one of the two target classes I'm decoding
> is present in the chunk the classifier is tested on (there is also a
> single chunk in one subject where neither of the two classes is
> present). I'm running a searchlight analysis with 6-fold
> cross-validation, use a linear SVM, and compute the mean true positive
> rate.
>
> Is there a preferred way to deal with such a situation?
>
> Jan
>
> PS. The temperature here in the Lower Rhine region is currently 33 °C
> (91 °F). Wherever you are, I hope it's colder there! :-)

-- 
Joset A. Etzel, Ph.D.
Research Analyst
Cognitive Control & Psychopathology Lab
Washington University in St. Louis
http://mvpa.blogspot.com/