[pymvpa] No samples of a class in a chunk
J.A. Etzel
jetzel at artsci.wustl.edu
Tue Aug 6 21:04:12 UTC 2013
It doesn't look like anyone's replied to this yet, so here's my two cents.
I think of this sort of situation as a case of imbalance: there aren't
equal numbers of examples of each class in each training/testing set
(aka chunk). This happens in all sorts of situations, for instance when
the trials that get included depend on participant behavior (e.g. only
correctly-performed trials).
There isn't a universally appropriate strategy to regain balance, but
either the chunks or the examples will need to be changed.
For example, in one dataset we wanted to do leave-one-run-out
cross-validation, but the imbalance was too great (e.g. some runs had
very few examples), so we combined runs and did leave-three-runs-out
cross-validation instead. We combined temporally adjacent runs (e.g. 1-3,
4-6, 7-9) to make sure we didn't somehow inflate the accuracy. Depending
on the design, you could potentially partition on something other than
the runs to give more flexibility.
If the imbalance is not too great (e.g. 10 examples of one class and 12
of the other), my usual practice is to subset the larger class at random,
repeating the whole analysis a few times (leaving out different examples
each time).
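To make the two chunk-level ideas above concrete, here is a rough sketch in plain NumPy. It does not use PyMVPA's own API, and the function names (merge_adjacent_runs, balanced_subsets) and their parameters are made up for illustration. It assumes run labels are integers that sort in temporal order, and it leaves open whether you balance the whole dataset or only each training set.

```python
import numpy as np


def merge_adjacent_runs(runs, group_size=3):
    """Relabel runs so that temporally adjacent runs share one chunk label.

    Assumes run labels sort in temporal order (e.g. 1..9).
    """
    runs = np.asarray(runs)
    unique_runs = np.unique(runs)                  # sorted run labels
    run_rank = np.searchsorted(unique_runs, runs)  # 0-based position of each run
    return run_rank // group_size                  # runs 1-3 -> 0, 4-6 -> 1, ...


def balanced_subsets(targets, n_repeats=5, seed=0):
    """Yield sorted index arrays in which every class is cut down, at random,
    to the size of the smallest class. Each repeat draws a fresh subset."""
    targets = np.asarray(targets)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(targets, return_counts=True)
    n_keep = counts.min()
    for _ in range(n_repeats):
        keep = [
            rng.choice(np.flatnonzero(targets == c), size=n_keep, replace=False)
            for c in classes
        ]
        yield np.sort(np.concatenate(keep))


# Usage: nine runs become three chunks; 10 vs. 12 examples become 10 vs. 10.
chunks = merge_adjacent_runs(np.arange(1, 10))
targets = np.array(["a"] * 10 + ["b"] * 12)
for idx in balanced_subsets(targets, n_repeats=3):
    pass  # run the cross-validation on the examples in idx, then average results
```

The results from the repeats would then be averaged, so that no single random choice of left-out examples drives the outcome.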
By changing the examples I mean strategies like averaging across
examples within a run (or fitting parameter estimate images), so that
instead of classifying with individual trials you have a fixed number of
summary images (e.g. beta weights, averages) per person. In my
experience this can really help performance, even though the number of
samples is greatly reduced.
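A minimal sketch of the averaging idea, again in plain NumPy rather than PyMVPA's API: the function name average_within_chunks is hypothetical, and samples is assumed to be an examples-by-features array. Each class present in a chunk is reduced to one mean example. A class that is absent from a chunk is simply skipped, so averaging alone does not cure a chunk that is missing a class.

```python
import numpy as np


def average_within_chunks(samples, targets, chunks):
    """Average all examples sharing a (chunk, target) pair into one summary example."""
    samples = np.asarray(samples, dtype=float)
    targets = np.asarray(targets)
    chunks = np.asarray(chunks)

    out_samples, out_targets, out_chunks = [], [], []
    for chunk in np.unique(chunks):
        for target in np.unique(targets):
            mask = (chunks == chunk) & (targets == target)
            if not mask.any():
                continue  # this class has no examples in this chunk
            out_samples.append(samples[mask].mean(axis=0))
            out_targets.append(target)
            out_chunks.append(chunk)
    return np.vstack(out_samples), np.array(out_targets), np.array(out_chunks)


# Usage: four trials in two runs collapse to three summary examples.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = np.array(["a", "a", "b", "a"])
runs = np.array([0, 0, 0, 1])
X_avg, y_avg, runs_avg = average_within_chunks(X, y, runs)
```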
good luck,
Jo
On 8/2/2013 6:49 AM, Jan Derrfuss wrote:
> Hello,
>
> I would like to run an exploratory analysis where the presence of
> samples in the chunks was not under experimental control. As a result,
> there are chunks where only one of the two target classes I'm decoding
> is present in the chunk the classifier is tested on (there is also a
> single chunk in one subject where neither of the two classes is
> present). I'm running a searchlight analysis with 6-fold
> cross-validation, use a linear SVM, and compute the mean true positive
> rate.
>
> Is there a preferred way to deal with such a situation?
>
> Jan
>
> PS. The temperature here in the Lower Rhine region is currently 33 °C
> (91 °F). Wherever you are, I hope it's colder there! :-)
--
Joset A. Etzel, Ph.D.
Research Analyst
Cognitive Control & Psychopathology Lab
Washington University in St. Louis
http://mvpa.blogspot.com/