[pymvpa] Suspicious results
J.A. Etzel
jetzel at artsci.wustl.edu
Mon Feb 28 23:31:58 UTC 2011
Lots of good advice in this thread. I'll mention a few additional things
(in no particular order).
First, I'd check the target (condition) and chunk labeling for errors; having
mislabeled cases can lead to some very bizarre patterns in the results
(and I'd call yours highly suspicious) ... and I'm speaking from
experience. :)
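If it helps, here's the kind of quick check I mean; a minimal sketch,
assuming a PyMVPA 2.x dataset ds with the usual targets and chunks
sample attributes:

    # count samples per condition and per chunk; mislabelings often
    # show up as lopsided counts or an implausible ordering
    from collections import Counter
    print(Counter(ds.sa.targets))          # samples per condition
    print(Counter(ds.sa.chunks))           # samples per chunk
    print(ds.summary())                    # PyMVPA's built-in overview
    # eyeball the (chunk, target) sequence in acquisition order
    print(list(zip(ds.sa.chunks, ds.sa.targets)))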
When temporal compression is done by averaging (as it sounds like you
did; what was the TR and event length? How many images were averaged for
each event? How much time-forwarding, i.e. shifting to allow for the
hemodynamic delay?) detrending the voxels is, in my
experience, nearly always required. I like to plot some voxels
(intensity in each summary image, in acquisition order). Often you can
see clear "jumps" in these plots at breaks (e.g. between blocks) or
trends over time. Some image preprocessing software tries to remove
these trends (e.g. by fitting a linear model or by temporal filtering),
but one way or another something has to be done about them, since they
pretty much always occur (partly due to scanner drift, motion, etc.).
You could
check a couple of my methods papers
(http://dx.doi.org/10.1016/j.neuroimage.2010.08.050 and
http://dx.doi.org/10.1016/j.brainres.2009.05.090) for more details.
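Here's roughly what I mean as a sketch, again assuming a PyMVPA 2.x
dataset ds with samples in acquisition order (the voxel indices are
arbitrary, and poly_detrend is PyMVPA's polynomial detrender):

    import pylab as pl
    from mvpa2.mappers.detrend import poly_detrend

    # plot a few voxels' intensities across the images, in acquisition
    # order; jumps at breaks or slow drifts should be obvious by eye
    for vox in [0, 100, 500]:
        pl.plot(ds.samples[:, vox], label='voxel %d' % vox)
    pl.xlabel('image (acquisition order)')
    pl.ylabel('intensity')
    pl.legend()
    pl.show()

    # linear detrending within each chunk, in place; in practice you'd
    # usually do this on the run's timeseries before any averaging
    poly_detrend(ds, polyord=1, chunks_attr='chunks')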
If I'm understanding properly, the data are from one 15-minute run
containing 38 events of each of two types. As Francisco already mentioned,
you need to be very, very careful about incorporating the ordering and
timing of these events into your analysis. If you have less than 10
seconds between events (which you probably do, at least sometimes) you
will almost certainly have quite a bit of overlap in the BOLD responses
from the different events. That's not necessarily fatal to an analysis,
but it has to be taken into account.
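If you have the event onsets handy (in seconds; the onsets variable
here is assumed, not something from your script), a couple of lines
will show how tight the spacing actually is:

    import numpy as np
    gaps = np.diff(np.sort(np.asarray(onsets)))
    print('shortest gap: %.1f s' % gaps.min())
    print('gaps under 10 s: %d of %d' % ((gaps < 10).sum(), len(gaps)))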
Why use a non-linear SVM? As a first step it's often good to start with
something linear (e.g. a linear SVM) or distance-based (e.g.
Mahalanobis) in the searchlights. Even if there's a theoretical reason
for using something nonlinear or a specific classifier, I'd be inclined
to try a linear SVM for comparison.
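Something along these lines would do it, using standard PyMVPA 2.x
pieces (the searchlight radius is just an example):

    from mvpa2.suite import (LinearCSVMC, CrossValidation,
                             NFoldPartitioner, sphere_searchlight,
                             mean_sample)

    clf = LinearCSVMC()                           # plain linear SVM
    cv = CrossValidation(clf, NFoldPartitioner(), # leave-one-chunk-out
                         postproc=mean_sample())  # average across folds
    sl = sphere_searchlight(cv, radius=3)         # radius in voxels
    res = sl(ds)               # one mean error per searchlight center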
Also, if you have 38 examples of each type in your run, I'd probably do
some sort of partitioning other than leave-one-out; perhaps leave-5-out.
It's nice to have a large training set, but if the testing set is too
small the variance can get large, hurting significance and stability.
The timing concerns will influence this quite a bit, though: for
example, you may need to partition so that adjacent trials are always in
the same partition.
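One way to set that up, as a sketch (assuming ds holds the trials in
acquisition order; trials_per_chunk is an arbitrary choice): overwrite
chunks so that blocks of consecutive trials share a chunk, then leave
one such chunk out at a time.

    import numpy as np
    from mvpa2.suite import NFoldPartitioner

    trials_per_chunk = 5
    # consecutive trials get the same chunk value: 0,0,0,0,0,1,1,...
    ds.sa['chunks'] = np.arange(ds.nsamples) // trials_per_chunk
    partitioner = NFoldPartitioner()  # leave-one-chunk-out = leave-5-out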
good luck,
Jo