[pymvpa] Help needed untangling searchlight accuracy bias

Mike E. Klein michaeleklein at gmail.com
Thu Oct 20 18:45:17 UTC 2011


Hi everyone,

Thanks in advance for any help and sorry for the length of this:

I've been performing a searchlight analysis on fMRI data from an
auditory experiment I conducted. Running a whole-brain searchlight on
binary comparisons shows a consistent rightward bias in the accuracies:
typically there's a smallish peak centered directly on 50% and a much larger
peak around 65%. I'm pretty sure something fishy is going on, because when I
scramble the order of my runs in the attribute file I get very similar
histogram plots. Also, I reran some of my analyses with a searchlight radius
of 0 and still see this huge rightward bias (the majority of voxels show
above-chance performance). This looks to be the case for all the individual
participants I've tested so far.
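
For reference, the histograms I'm describing are just the per-voxel accuracy
distributions from the searchlight output, roughly like this (a minimal
sketch; the file name is a placeholder and it assumes nibabel and matplotlib):

# Sketch: histogram of per-voxel searchlight accuracies
# (placeholder file name; assumes nibabel and matplotlib are installed)
import nibabel as nib
import matplotlib.pyplot as plt

acc = nib.load('searchlight_accuracy_map.nii').get_data().astype(float)
acc = acc[acc > 0]                  # keep only voxels inside the analysis mask

plt.hist(acc.ravel(), bins=50)
plt.axvline(50, linestyle='--')     # chance level for a binary comparison
plt.xlabel('searchlight accuracy (%)')
plt.ylabel('number of voxels')
plt.show()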

I've read the entire "suspicious results" thread from earlier this year and
don't seem to be making any of the mistakes suggested there. Likewise,
MRIcron reports the scaling slope (scl_slope) of both the 4D input and the
3D searchlight output as 1.000.
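
(In case it's useful, this is roughly how that slope can be double-checked
with nibabel as well; a sketch with placeholder file names:)

# Quick check of the NIfTI scaling fields (placeholder file names; assumes nibabel)
import nibabel as nib

for fname in ['bold_input_4d.nii.gz', 'searchlight_output_3d.nii']:
    hdr = nib.load(fname).get_header()
    print('%s: scl_slope=%s, scl_inter=%s'
          % (fname, hdr['scl_slope'], hdr['scl_inter']))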

I'm going to briefly describe my experimental design and the order of steps
in my script. I feel as if I'm doing something majorly wrong, but I can't
spot it. Thanks in advance for your time and help!
The experiment had 9 short runs of a few minutes each. Images were
acquired via sparse sampling (due to the auditory stimuli): 2 seconds
to acquire an image, then 7.5 seconds of "quiet" in which to present
the next sound. While there could be a slight contamination of the n+1
volume with only 9.5 seconds of inter-scan interval, randomization
should attenuate this. Each run had 39 volumes. Several of these were
"rest," 3 were cue sounds to perform an are-you-awake orthogonal task
(these were labelled uniquely and discarded in pymvpa), and the rest
were the experimental sounds. The experimental sounds were formed in a
3x3 design, although for all these initial analyses I'm collapsing
across one of the dimensions (pitch-related): both dimensions were
randomized and balanced for order/number of trials in the same manner.
So there were 9 unique sounds (3x3), which were each presented 3x per
run, making up 27 of the 39 volumes. Each repetition of these 9 unique
sounds (in a semi-random order) was separated from the next by a
couple of rest trials, so the experiment can also be seen as having 27
"little" runs or 9 "big" runs. I've initially gone with the 27-run
route (i.e. created an attribute file with 27 chunks), mainly for the
somewhat strange reason that (a) I couldn't get the LinearCSVMC
classifier to work without using pymvpa's "mean_group_sample" first
and (b) if I ran such averaging with only 9 runs, that would leave
very few (though probably cleaner) examples for the classifier to work
on. Since I'm collapsing across one of the two dimensions,
run-averaging using the "big" runs would leave only 3 conditions X 1
example X 9 runs.
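
For concreteness, the run-averaging I mean is roughly the following (a
sketch; it assumes PyMVPA's standard mean_group_sample mapper and that
'targets' and 'chunks' are the sample attributes):

# Sketch of the averaging step with the 27 "little" chunks: one averaged
# sample per (target, chunk) combination (assumes the mvpa2 namespace)
from mvpa2.suite import mean_group_sample

averager = mean_group_sample(['targets', 'chunks'])
ds_avg = dataset.get_mapped(averager)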

My ultimate goal in this experiment is to attempt to show areas of
discriminability that are common to all three pairwise classifiers in
this first dimension, but don't carry the same kind of information in
that second (and for now, disregarded) dimension. My current
searchlight problem is that, even with a radius of 0 voxels, and even when
using permuted class labels that should give the SVM classifier garbage
information, all three pairwise classifiers show a vaguely normal-looking
accuracy peak at 60-65% correct. Note that each time I run the script I
remove one of the 3 targets/volumes of interest, so I'm attempting a binary
classification that should have a chance level of 50%. There's also a
smaller but visible peak with center directly on 50%. Even stranger,
if I call these 3 pairwise searchlights A,B and C, and the
searchlights that result from the same scripts but run with a permuted
attribute file A*, B* and C*, it's very clear that the histograms for
A and A* look almost identical (likewise for B and B*, C and C*). Even
between subjects, it's very clear which of the 3 pairwise analyses has
been performed just by looking at the searchlight accuracy histogram's
characteristic shape. I've attached a few screenshots of these. All of
this makes me believe that somehow my training and testing data must
be getting polluted.
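
For what it's worth, instead of hand-editing the attribute file I could also
permute the labels inside the script; a sketch, assuming PyMVPA's
AttributePermutator is available in the version I'm running:

# Sketch of an in-script label permutation: shuffling targets only within
# each chunk keeps the chunk structure intact (assumes AttributePermutator)
from mvpa2.generators.permutation import AttributePermutator

permutator = AttributePermutator('targets', limit='chunks', count=1)
ds_permuted = permutator(dataset)  # one copy with targets shuffled within chunks
# ...then run the identical searchlight on ds_permuted and compare histograms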

Finally, my steps in pymvpa (the timeseries has already been motion
corrected in FSL; I've also tried this with unsmoothed and slightly
smoothed data):

1. Load attributes and dataset, remove targets from the orthogonal task
2. Preprocess: run average, poly detrend, zscore against rest (all of these
by chunk)
3. Remove "rest" examples and the examples from the one of the 3 conditions
I'm not comparing this time through (a rough code sketch of steps 1-3
appears after the searchlight code below)
4. Define and run the classifier and searchlight (I'll cut and paste the
actual code for this part):

# [7] define a classifier and then combine classifier and
# cross-validation scheme
clf = LinearCSVMC()
# by default, NFold leaves one chunk out at a time
cv = CrossValidation(clf, NFoldPartitioner())

# [8] set up searchlight
sl = sphere_searchlight(cv, radius=0, space='voxel_indices',
                        center_ids=center_ids, postproc=mean_sample())

# [9] run searchlight on ds
s1_map = sl(dataset)

# [10] convert into percent-wise accuracies
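# (CrossValidation returns error rates by default, so flip error -> accuracy first)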
s1_map.samples *= -1
s1_map.samples += 1
s1_map.samples *= 100

# [11] put accuracies into Nifti
niftiresults = map2nifti(s1_map, imghdr=dataset.a.imghdr)
niftiresults.to_filename('results/searchlight_radius0/outputFile.nii')
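
And for completeness, steps 1-3 above look roughly like this (a sketch only;
the file names, mask, and target labels are placeholders for my real ones,
and it assumes the usual mvpa2.suite calls):

# Sketch of steps 1-3 (placeholder file names and labels; assumes mvpa2.suite)
import numpy as np
from mvpa2.suite import (SampleAttributes, fmri_dataset, mean_group_sample,
                         poly_detrend, zscore)

# 1. load attributes (27 chunks) and the motion-corrected timeseries,
#    then drop the orthogonal-task cue volumes
attr = SampleAttributes('attributes_27chunks.txt')
dataset = fmri_dataset(samples='bold_mc.nii.gz', targets=attr.targets,
                       chunks=attr.chunks, mask='brain_mask.nii.gz')
dataset = dataset[dataset.sa.targets != 'cue']

# 2. preprocess by chunk, in the order listed above: run average,
#    poly detrend, z-score against rest
dataset = dataset.get_mapped(mean_group_sample(['targets', 'chunks']))
poly_detrend(dataset, polyord=1, chunks_attr='chunks')
zscore(dataset, chunks_attr='chunks', param_est=('targets', ['rest']))

# 3. keep only the two conditions being compared this time through
dataset = dataset[np.in1d(dataset.sa.targets, ['condA', 'condB'])]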


THANK YOU again for any help you can provide! I'm happy to provide
more detail if it's needed.
Best,
Mike Klein
-------------- next part --------------
A non-text attachment was scrubbed...
Name: histogramsJPG.zip
Type: application/zip
Size: 153742 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/pkg-exppsy-pymvpa/attachments/20111020/2443df57/attachment-0001.zip>

