[pymvpa] Suspicious results

Francisco Pereira francisco.pereira at gmail.com
Tue Mar 1 00:01:59 UTC 2011


Jo and Michael both make really good points. If you have some natural
division that acts as an epoch (one that is class-balanced, with more
time than usual between trials), then leaving each epoch out in turn
would keep the classes balanced and also ensure that any temporal
correlation between successive examples doesn't hurt you (it only
changes how you compute the significance of the result). If you can't
separate the examples into epochs, you could always drop one or two
examples between the training and test sets.
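
For illustration, a minimal PyMVPA-style sketch of both ideas, assuming an
mvpa2 dataset `ds` whose sa.chunks attribute codes class-balanced epochs;
the gap size and the drop_neighbours helper are assumptions for
illustration, not part of the original suggestion:

    # minimal sketch, assuming an mvpa2 dataset `ds` with sa.targets
    # (class labels) and sa.chunks (epoch labels)
    import numpy as np
    from mvpa2.suite import NFoldPartitioner

    # leave one epoch out per fold; the extra time between epochs keeps
    # temporally correlated neighbours of the test examples out of training
    partitioner = NFoldPartitioner(attr='chunks')

    # if clean epochs are not available: hypothetical helper that drops
    # training samples within `gap` positions of any test sample
    def drop_neighbours(train_idx, test_idx, gap=2):
        train_idx = np.asarray(train_idx)
        test_idx = np.asarray(test_idx)
        keep = np.all(np.abs(train_idx[:, None] - test_idx[None, :]) > gap,
                      axis=1)
        return train_idx[keep]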


Francisco

On Mon, Feb 28, 2011 at 6:31 PM, J.A. Etzel <jetzel at artsci.wustl.edu> wrote:
> Lots of good advice in this thread. I'll mention a few additional things (in
> no particular order).
>
> First, I'd check the data and chunk labeling for errors; having mislabeled
> cases can lead to some very bizarre patterns in the results (and I'd call
> yours highly suspicious) ... and I'm speaking from experience. :)
>
> When temporal compression is done by averaging (as it sounds like you did;
> what was the TR and event length? How many images were averaged for each
> event? How much time-forwarding?), detrending the voxels is, in my
> experience, nearly always required. I like to plot some voxels (intensity
> in each summary image, in acquisition order). Often you can see clear
> "jumps" in these plots at breaks (e.g. between blocks) or trends over time.
> Some image preprocessing software tries to minimize these trends (e.g. by
> fitting a linear model or temporal filtering), but something has to be done
> about them, as they pretty much always occur (partly due to scanner drift,
> motion, etc.). You could check a couple of my methods papers
> (http://dx.doi.org/10.1016/j.neuroimage.2010.08.050 and
> http://dx.doi.org/10.1016/j.brainres.2009.05.090) for more details.
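
For concreteness, a minimal sketch of that kind of voxel plot and detrending
with PyMVPA (mvpa2), assuming a dataset `ds` of summary images in
acquisition order with a sa.chunks attribute; the voxel index and polynomial
order are arbitrary choices for illustration:

    # minimal sketch: inspect one voxel over time, then detrend per chunk
    import pylab as pl
    from mvpa2.suite import poly_detrend

    voxel = 1000                      # arbitrary voxel to inspect
    pl.plot(ds.samples[:, voxel])     # look for jumps at breaks and slow drift
    pl.xlabel('summary image (acquisition order)')
    pl.ylabel('intensity')

    # remove a linear trend separately within each chunk (modifies ds in place)
    poly_detrend(ds, polyord=1, chunks_attr='chunks')
    pl.plot(ds.samples[:, voxel])     # re-plot to check the trend is gone
    pl.show()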
>
> If I'm understanding properly, the data are from one 15-minute run,
> containing 38 events of each of two types. As Francisco already mentioned,
> you need to be very, very careful about incorporating the ordering and
> timing of these events into your analysis. If you have less than 10 seconds
> between events (which you probably do, at least sometimes) you will almost
> certainly have quite a bit of overlap in the BOLD responses from the
> different events. That's not necessarily fatal to an analysis, but it has
> to be taken into account.
>
> Why use a non-linear SVM? As a first step it's often good to start with
> something linear (e.g. a linear SVM) or distance-based (e.g. Mahalanobis)
> in the searchlights. Even if there's a theoretical reason for using
> something nonlinear or a specific classifier, I'd be inclined to try a
> linear SVM for comparison.
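
A hedged sketch of what a linear-SVM searchlight could look like in PyMVPA
(mvpa2), assuming libsvm support is available; the radius and the accuracy
error function are arbitrary choices, not recommendations from the thread:

    # minimal sketch, assuming an mvpa2 fMRI dataset `ds` with targets/chunks
    import numpy as np
    from mvpa2.suite import (CrossValidation, NFoldPartitioner, LinearCSVMC,
                             sphere_searchlight, mean_sample)

    clf = LinearCSVMC()               # linear SVM as the baseline classifier
    cv = CrossValidation(clf, NFoldPartitioner(),
                         errorfx=lambda p, t: np.mean(p == t))
    sl = sphere_searchlight(cv, radius=3, postproc=mean_sample())
    sl_map = sl(ds)                   # one mean accuracy per searchlight centre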
>
> Also, if you have 38 examples of each type in your run, I'd probably do
> some sort of partitioning other than leave-one-out; perhaps leave-5-out.
> It's nice to have a large training set, but if the testing set is too
> small, the variance can get large, hurting significance and stability. The
> timing concerns will influence this quite a bit, though: for example, you
> may need to partition so that adjacent trials are always in the same
> partition.
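
One possible way to set up that kind of partitioning, shown only as a sketch
(the fold size of 5 and the 'folds' attribute name are assumptions; real
fold boundaries would also need to respect class balance and timing):

    # minimal sketch: group adjacent trials into pseudo-chunks so each fold
    # holds out a contiguous block of trials rather than a single example
    import numpy as np
    from mvpa2.suite import CrossValidation, NFoldPartitioner, LinearCSVMC

    fold_size = 5
    ds.sa['folds'] = np.arange(ds.nsamples) // fold_size

    cv = CrossValidation(LinearCSVMC(), NFoldPartitioner(attr='folds'),
                         errorfx=lambda p, t: np.mean(p == t))
    res = cv(ds)                      # res.samples: one accuracy per fold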
>
> good luck,
> Jo
>
>


