[pymvpa] No samples of a class in a chunk

jason gors jason.d.gors at dartmouth.edu
Thu Aug 8 14:28:47 UTC 2013

> ------------------------------
> Message: 2
> Date: Wed, 07 Aug 2013 15:02:03 -0500
> From: "J.A. Etzel" <jetzel at artsci.wustl.edu>
> To: pkg-exppsy-pymvpa at lists.alioth.debian.org
> Subject: Re: [pymvpa] Pkg-ExpPsy-PyMVPA Digest, Vol 66, Issue 4
> Message-ID: <5202A7BB.9050208 at artsci.wustl.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> Yes, I'm thinking of the usual assumption that adjacent runs may be more
> similar because of temporal dependencies (scanner drifts, subject
> fatigue, etc.).
> My suggestion for combining temporally adjacent runs is partly motivated
> by a desire to keep things simple: people often figure that
> leave-one-run-out is a reasonably sensible and conservative
> cross-validation scheme. But when we can't do that (such as in my
> example, when we have a lot of short runs and massive imbalance in the
> number of trials/run), the next best (simplest) thing seems to be combining
> adjacent runs. Pragmatically, there are far fewer ways to combine
> adjacent runs than random run subsets, which also makes the choice more
> appealing (if I saw that a single random cross-validation scheme had
> been used, I'd wonder why).
> But I don't want to claim that combining temporally adjacent runs is the
> only legitimate technique, or even to be preferred on strict theoretical
> (as opposed to pragmatic) grounds! I suggested it as something
> relatively straightforward (and hopefully safe) to consider.
> Jo
> Also, to cite myself, a few years ago I played with different
> cross-validation schemes systematically, including suggestions for
> justifying partitioning other than on the runs, in "The impact of
> certain methodological choices on multivariate analysis of fMRI data
> with support vector machines".
> http://dx.doi.org/10.1016/j.neuroimage.2010.08.050

Okey dokey.  The reason I ask is that I am currently in this very
situation: because of subject performance on the task, I am going to have
to exclude some of the trials from each run, so I was in the process of
deciding how to proceed.  One option I was kicking around was, as you
suggest, to combine runs to get better, more representative samples of
each stimulus category.  I was just curious whether there was some
methodological reason for the combined runs to be temporally adjacent.
My intuition was the opposite: rather than preventing inflated accuracy
scores, combining adjacent runs seemed to me like it might limit the
potential accuracy scores (same thing, just a different way of looking at
it).  I know this is the approach folks take in time-series predictive
analysis, but I hadn't considered it for fMRI data, so I was just
wondering.  Also, thanks for the link; I was wondering how folding on
subsets of the data other than just the runs would work... I'll have a
look.
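
For what it's worth, here is a minimal sketch of the adjacent-run merging
discussed above, assuming each sample carries an integer run label and a
class label.  It uses plain NumPy rather than any PyMVPA call; the helper
names `merge_adjacent_runs` and `classes_missing_per_chunk` and the toy
labels are made up for illustration.

```python
import numpy as np


def merge_adjacent_runs(runs, group_size):
    """Map each sample's run label to a merged chunk label.

    Runs are ordered by their (sorted) label, taken here as acquisition
    order, and every block of `group_size` consecutive runs shares one
    chunk label (0, 1, 2, ...).
    """
    runs = np.asarray(runs)
    unique_runs = np.unique(runs)                 # sorted run labels
    # Position of each sample's run within the sorted run list.
    run_index = np.searchsorted(unique_runs, runs)
    return run_index // group_size


def classes_missing_per_chunk(chunks, targets):
    """Return {chunk: set of classes with no samples in that chunk}."""
    chunks = np.asarray(chunks)
    targets = np.asarray(targets)
    all_classes = set(np.unique(targets).tolist())
    missing = {}
    for c in np.unique(chunks):
        present = set(np.unique(targets[chunks == c]).tolist())
        if present != all_classes:
            missing[c.item()] = all_classes - present
    return missing


# Toy data (hypothetical): six short runs, unbalanced after dropping trials.
runs = np.array([1, 1, 2, 3, 3, 4, 5, 5, 6, 6])
targets = np.array(['a', 'b', 'a', 'b', 'b', 'a', 'a', 'b', 'b', 'a'])

# Leave-one-run-out would fail here: some runs lack a class entirely.
print(classes_missing_per_chunk(runs, targets))

# Merging pairs of adjacent runs gives three chunks, each with both classes.
chunks = merge_adjacent_runs(runs, group_size=2)
print(chunks)
print(classes_missing_per_chunk(chunks, targets))
```

In PyMVPA, I would expect the merged labels to be usable by assigning them
to the dataset's chunks attribute (`ds.sa.chunks`) before running an
n-fold partitioner, but I have not checked that against this exact case.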


Jason Gors
Dartmouth College
Dept. of Psychological and Brain Sciences
6207 Moore Hall
Hanover, NH 03755
Phone: (603) 646-9689    Fax: (603) 646-1419