[pymvpa] detecting changes between fMRI sessions: a new mapper?

Emanuele Olivetti emanuele at relativita.com
Thu Sep 18 11:14:05 UTC 2008


Some observations below, inline.

Yaroslav Halchenko wrote:
> So I see 2 questions here actually:
>
> 1. technical -- how to combine features of two (or more datasets) into 1
> dataset. For now we have some way to ease the life of a user:
> MetaDataset...

Unfortunately I have no real clue on the technical side, currently.
Your ideas seem fine to me :)

>
> 2. neuroscientific -- how to contrast two sessions... That is imho more
>    complicated. I had a limited experience with 1 dataset where
>    different conditions were collected in different runs... so the
>    problem was to remove such 'run-specificity' so clf don't rely on
>    simple misalignment or trend to classify samples. apparently detrend
>    + zscoring helped -- at the end sensitivity maps were quite sensible.
>

Good point. In the experiment example that I was briefly describing,
each session is indeed made of just one run. But differences across runs in
a multi-run experiment are expected to be smaller than (or rather, somewhat
different from) differences across sessions that are weeks apart.

>    In Emanuele's example situation is more interesting, although could
>    be done indeed just simple way (in addition to suggested): double
>    number of labels -- ie. dog1 (for session 1) dog2(for session 2).
>    Do regular analysis, and then look at the sensitivity maps between
>    dog1-vs-dog2. If dog category wasn't trained for at home -- I hope
>    that generalization and sensitivity would be at-chance. If it was
>    trained at home -- should be sensible...
>
>    noone would know which one is the best way to proceed unless tried
>    them all I guess ;-)
>

Unfortunately, for the specific experiment I was mentioning,
each picture is shown no more than once per session. So
you can't label samples with "dog" or "dog1"/"dog2", since there is
at most one sample per label (per session). Just to give some
figures: there is a pool of - say - 100 different pictures. 60-80
of them are drawn at random and shown once to the subject in the
first session. Then the subject trains at home on half of them (maybe plus
some others not used before) and then, during the second session,
the pictures of the first session are presented again (maybe in a
different order).

So the only useful labelling here seems to be "trained"
and "not trained", without taking into account which picture
was actually shown in each sample. In fact, the semantic content
of the picture is information that can be thrown away once
the dataset is built.
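
A tiny sketch of what I mean by throwing the picture identity away
(the picture ids and the trained set below are hypothetical, just to
illustrate the labelling):

  import numpy as np

  # shown: picture ids presented to one subject, in presentation order
  shown = ['dog', 'house', 'cat', 'tree']
  # trained_at_home: picture ids practised between the two sessions
  trained_at_home = {'dog', 'house'}

  labels = np.array(['trained' if p in trained_at_home else 'not trained'
                     for p in shown])
  # from here on the picture ids themselves are no longer needed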


A naive (crazy?) but maybe interesting idea to prepare a dataset
for a classifier could be to define ROIs using a known atlas
(say the Harvard-Oxford cortical atlas ;) ). Then average all voxels
in a ROI to get a "per ROI" average BOLD
signal, reducing ~40k voxels to ~50 features/ROIs. Then
subtract the average BOLD of the same ROI between the two sessions
to get the differences across sessions (maybe z-scored or similar).
Since the data are mapped onto a standard atlas you can even numpy.vstack()
different subjects together and get something like this:

sample | deltaROI1 | deltaROI2 | ... | deltaROIN | label
--------------------------------------------------------------
S1dog  |           |           | ... |           | trained
--------------------------------------------------------------
S1house|           |           | ... |           | trained
--------------------------------------------------------------
S1cat  |           |           | ... |           | not trained
--------------------------------------------------------------
S1tree |           |           | ... |           | not trained
--------------------------------------------------------------
S1...  |           |           | ... |           | trained
--------------------------------------------------------------
S2dog  |           |           | ... |           | not trained
--------------------------------------------------------------
S2...  |           |           | ... |           | trained
--------------------------------------------------------------
SNdog  |           |           | ... |           | trained
--------------------------------------------------------------
SN...  |           |           | ... |           | not trained
--------------------------------------------------------------
(Sn means subject 'n')

which, given the figures above, is a manageable dataset of 60x10
samples (assuming N=10 subjects) with just ~50 features, one per ROI.
A classifier trained on this dataset (assuming that averages don't
destroy all relevant information :D ), in case of success, would
tell whether "trained/untrained" is a predictable class label, which ROIs
are relevant for the prediction, and (maybe naively :) ) which
ROIs are related to neuroplasticity for this task.
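
A rough numpy-only sketch of how such a dataset could be assembled
(atlas_labels, bold_s1/bold_s2 and the per-subject bookkeeping are
hypothetical names just to fix the idea; the resulting samples/labels
arrays could then be wrapped into a pymvpa Dataset as usual):

  import numpy as np

  def roi_average(bold, atlas_labels, roi_ids):
      # average voxels within each ROI: (n_samples, n_voxels) -> (n_samples, n_rois)
      return np.array([bold[:, atlas_labels == r].mean(axis=1)
                       for r in roi_ids]).T

  def delta_rois(bold_s1, bold_s2, atlas_labels):
      # per-ROI difference between the two sessions
      # (assumes the same picture order in both sessions)
      roi_ids = np.unique(atlas_labels[atlas_labels > 0])
      return roi_average(bold_s2, atlas_labels, roi_ids) - \
             roi_average(bold_s1, atlas_labels, roi_ids)

  def build_dataset(subjects):
      # subjects: list of (bold_s1, bold_s2, atlas_labels, labels) tuples,
      # one per subject, all in the same standard space
      samples = np.vstack([delta_rois(b1, b2, atlas)
                           for b1, b2, atlas, _ in subjects])
      labels = np.concatenate([lab for _, _, _, lab in subjects])
      return samples, labels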


Ciao,

Emanuele




