[pymvpa] Peculiar case of cross-validation performance

Yaroslav Halchenko debian at onerussian.com
Wed Feb 1 14:49:13 UTC 2017


On Wed, 01 Feb 2017, Vadim Axelrod wrote:

> I have encountered a rather peculiar scenario. My experiment consists of
> two conditions and I have ~40 subjects. A standard group-level
> random-effects t-contrast resulted in a very significant cluster
> (p<0.001, cluster-size corrected p<0.05). Now, for this cluster and the
> same data, I run, for each subject, an SVM classification between the two
> conditions (leave-one-session-out cross-validation). Classification is
> done with a single dimension (the average response of the ROI). So for
> each subject I get a hit rate, which I submit to a group t-test against
> 0.5. The surprising thing is that, even despite clear double-dipping, I
> fail to get significant above-chance decoding in this cluster (the hit
> rate is only ~0.51-0.52). How can this be? Because classification does
> not care about the direction of a difference, I thought it should always
> be more sensitive than a directional comparison of activations. Thinking
> and simulating led me to believe it is probably not a bug. Consider an
> extreme case in which, in every one of my subjects, condition_1 is
> slightly above condition_2. A group t-test will show a highly significant
> difference, but in individual classification my predictions will
> fluctuate around 0.5 with only a slight upward bias, which would not be
> enough to reach significance above 0.5 at the group level. Indeed, in my
> ROI the difference was in one direction for 90% of the subjects. Does all
> this make sense to you? If so, what does it tell us about the reliability
> of a standard random-effects group-level analysis?
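
FWIW, here is a minimal simulation of the extreme case you describe (all
numbers are invented, and the "classifier" is just a nearest-class-mean
threshold on the single ROI-mean feature): a tiny but consistent effect
gives a large group t-value on the mean differences while per-subject
accuracies hover only slightly above 0.5.

import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
n_subj, n_runs, n_trials = 40, 8, 10     # hypothetical design
effect = 0.1                             # within-subject effect in SD units

mean_diff, accuracy = [], []
for _ in range(n_subj):
    # ROI-average response per trial: runs x trials, one array per condition
    c1 = effect + rng.randn(n_runs, n_trials)
    c2 = rng.randn(n_runs, n_trials)
    mean_diff.append(c1.mean() - c2.mean())

    # leave-one-run-out CV with a nearest-class-mean rule on the one feature
    fold_acc = []
    for test in range(n_runs):
        train = np.arange(n_runs) != test
        thr = 0.5 * (c1[train].mean() + c2[train].mean())
        hits = np.concatenate([c1[test] > thr, c2[test] <= thr])
        fold_acc.append(hits.mean())
    accuracy.append(np.mean(fold_acc))

t, p = stats.ttest_1samp(mean_diff, 0.0)
print('group t-test on mean differences: t=%.2f p=%.2g' % (t, p))
t, p = stats.ttest_1samp(accuracy, 0.5)
print('mean accuracy %.3f; group t-test vs 0.5: t=%.2f p=%.2g'
      % (np.mean(accuracy), t, p))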

There could be multiple reasons I guess:

- choice of the classifier.

SVM is non-parametric (it does not model the class distributions), thus
somewhat more flexible, and it might be the least sensitive to univariate
effects...  To match the assumptions of the univariate GLM/t-test more
closely you could try GNB (which, on that single feature, should probably
behave similarly to LDA?) or mean-one-nearest-neighbor, i.e. a nearest
class mean rule (probably the best detector if your signal is normally
distributed with equal variance), and see whether you get a more
consistent "significant" excursion (see the first sketch after this list).

- Ideally you should then also get an effect in a "random effects" style
analysis when classifying across subjects (since you are using just the
mean of the ROI within each subject, you could easily carry that out
across subjects; see the second sketch after this list).

- Another related question: do you see a significant result when running
a t-test against 0 on the per-subject difference between conditions in
the average ROI response?  Maybe the averaging somehow annihilates the
effect (a quick check is sketched after this list).

- Also preprocessing:
    - what do you take for classification (per-trial betas or ...?)
    - how you z-score (if you do) within each subject might introduce
      noise across your samples for classification (e.g. if you had only
      a few samples per condition within each run and z-scored within
      each run based on those few samples alone); see the last sketch
      after this list.
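
Re the classifier choice -- a minimal sketch, assuming ds is your
per-subject PyMVPA dataset with the single ROI-mean feature, condition
labels in sa.targets, and session/run in sa.chunks:

import numpy as np
from mvpa2.clfs.svm import LinearCSVMC
from mvpa2.clfs.gnb import GNB
from mvpa2.clfs.knn import kNN
from mvpa2.generators.partition import NFoldPartitioner
from mvpa2.measures.base import CrossValidation

classifiers = [
    ('linear SVM', LinearCSVMC()),
    ('GNB', GNB()),        # closer to the GLM/t-test assumptions
    ('1-NN', kNN(k=1)),    # crude stand-in for a nearest-mean rule
]

for name, clf in classifiers:
    # leave-one-session-out cross-validation, reporting accuracy
    cv = CrossValidation(clf, NFoldPartitioner(attr='chunks'),
                         errorfx=lambda p, t: np.mean(p == t))
    res = cv(ds)
    print('%-10s mean accuracy: %.3f' % (name, np.mean(res)))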
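
Re classifying across subjects -- a sketch, where subj_means is a
hypothetical (n_subjects, 2) array holding each subject's ROI mean for
condition_1 and condition_2:

import numpy as np
from mvpa2.datasets.base import Dataset
from mvpa2.clfs.svm import LinearCSVMC
from mvpa2.generators.partition import NFoldPartitioner
from mvpa2.measures.base import CrossValidation

n_subj = subj_means.shape[0]
# one sample per subject and condition; subject id as the chunk, so the
# partitioner below does leave-one-subject-out cross-validation
ds_group = Dataset(subj_means.reshape(-1, 1),
                   sa={'targets': ['cond1', 'cond2'] * n_subj,
                       'chunks': np.repeat(np.arange(n_subj), 2)})
# (you would probably want to remove per-subject baseline differences,
# e.g. by demeaning within each subject, before classifying)

cv = CrossValidation(LinearCSVMC(), NFoldPartitioner(attr='chunks'),
                     errorfx=lambda p, t: np.mean(p == t))
print('across-subject accuracy: %.3f' % np.mean(cv(ds_group)))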
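
Re the t-test against 0 -- just something like this, assuming cond1_mean
and cond2_mean are length-n_subjects arrays of the per-subject ROI-average
response per condition:

import numpy as np
from scipy import stats

diff = np.asarray(cond1_mean) - np.asarray(cond2_mean)  # paired differences
t, p = stats.ttest_1samp(diff, 0.0)                     # == paired t-test
print('t=%.2f p=%.4g; %.0f%% of subjects with diff > 0'
      % (t, p, 100.0 * np.mean(diff > 0)))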
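
Re the z-scoring -- a sketch of the options I have in mind (the 'rest'
baseline label below is hypothetical):

from mvpa2.mappers.zscore import zscore

# per-run z-scoring, estimating mean/std from all samples of that run;
# with only a few samples per condition that estimate is itself noisy
zscore(ds, chunks_attr='chunks')

# alternative: estimate the normalization parameters from a neutral
# baseline condition only, so the task conditions do not drive it
# zscore(ds, chunks_attr='chunks', param_est=('targets', ['rest']))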

So the best might be: if you share a dataset with the extract of that ROI
before any pre-processing, together with the complete analysis script,
then interested parties could play with it to determine the underlying
culprit.

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        


