[pymvpa] Peculiar case of cross-validation performance
Vadim Axelrod
axel.vadim at gmail.com
Wed Feb 1 19:46:12 UTC 2017
Thanks, Yarik, for the ideas! In fact, it is not that I am particularly
interested in this specific case; I was curious in principle what the reason
for it could be. So, as far as I understand, I am dealing with a suboptimal
classification setup, and I thought that maybe cross-validation is simply a
more conservative way to evaluate the difference.
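To convince myself of this, I played with a toy simulation along these lines
(plain numpy/scipy; the effect size, noise level, and trial counts are
arbitrary and not taken from my data):

# Toy simulation: in every subject condition_1 sits only slightly above
# condition_2. Compare a group t-test on per-subject mean differences with a
# group t-test on per-subject leave-one-out hit rates.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.RandomState(0)
n_subjects, n_trials, effect, noise = 40, 40, 0.1, 1.0

mean_diffs, hit_rates = [], []
for _ in range(n_subjects):
    c1 = effect + noise * rng.randn(n_trials)   # condition_1 trials
    c2 = noise * rng.randn(n_trials)            # condition_2 trials
    mean_diffs.append(c1.mean() - c2.mean())

    # leave one trial per condition out; classify each left-out trial by the
    # nearer of the two training means (a simple 1-D nearest-mean classifier)
    correct = 0
    for i in range(n_trials):
        m1 = np.delete(c1, i).mean()
        m2 = np.delete(c2, i).mean()
        correct += int(abs(c1[i] - m1) < abs(c1[i] - m2))
        correct += int(abs(c2[i] - m2) < abs(c2[i] - m1))
    hit_rates.append(correct / (2.0 * n_trials))

print('group t-test, mean differences vs 0 :', ttest_1samp(mean_diffs, 0.0))
print('group t-test, hit rates vs 0.5      :', ttest_1samp(hit_rates, 0.5))

Whether the gap between the two tests shows up depends, of course, on the
effect size and noise level one picks.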
Per your suggestions:
1. LDA gives a result similar to the SVM (low prediction accuracy).
2. I z-score the data (since it is one dimension, there is only one
possibility). But when I drop any preprocessing, the prediction is even
worse.
3. I classify the raw volumes (no GLM).
4. The t-test is significant when I simply compare the data that go into the
classification. In other words, for each subject I save one data point per
condition and then run a t-test between conditions at the group level.
I am using MATLAB, so sharing the code would be messy; below is a rough
Python/PyMVPA sketch of the procedure instead.
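To be clear, this is only a schematic outline -- the array names, data layout,
and PyMVPA calls are my guess at an equivalent, not the actual analysis code:

import numpy as np
from scipy.stats import ttest_1samp
from mvpa2.suite import *   # dataset_wizard, LinearCSVMC, NFoldPartitioner, ...

def subject_hit_rate(roi_mean, conditions, sessions):
    # roi_mean: 1-D array with one ROI-average value per trial/beta
    ds = dataset_wizard(samples=roi_mean[:, None],   # a single feature
                        targets=conditions,          # condition label per sample
                        chunks=sessions)             # session label per sample
    cv = CrossValidation(LinearCSVMC(),              # linear SVM
                         NFoldPartitioner(),         # leave-one-session-out
                         errorfx=lambda p, t: np.mean(p == t))  # hit rate
    return np.mean(cv(ds).samples)                   # mean hit rate across folds

# group level: one hit rate per subject, tested against chance
# hit_rates = [subject_hit_rate(x, y, s) for (x, y, s) in per_subject_data]
# print(ttest_1samp(hit_rates, 0.5))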
On Wed, Feb 1, 2017 at 4:49 PM, Yaroslav Halchenko <debian at onerussian.com>
wrote:
>
> On Wed, 01 Feb 2017, Vadim Axelrod wrote:
>
> > I have encountered a rather peculiar scenario. My experiment consists of
> > two conditions and I have ~40 subjects. A standard group-level
> > random-effects t-contrast resulted in a very significant cluster
> > (p<0.001, cluster-size correction p<0.05). Now, for this cluster and for
> > the same data, for each subject I run an SVM classification between the
> > two conditions (leave-one-session-out cross-validation). Classification
> > is done with one dimension (the average response of the ROI). So, for
> > each subject I get a hit rate, which I submit to a group t-test vs. 0.5.
> > The surprising thing is that, even despite a clear double-dipping, I fail
> > to get significant above-chance decoding in this cluster (the hit rate is
> > only ~0.51-0.52). How can this be? Because classification does not care
> > about the direction of a difference, I thought that it should always be
> > more sensitive than a directional comparison of activations. Thinking and
> > simulating led me to think that it is probably not a bug. Consider an
> > extreme case in which, in every one of my subjects, condition_1 is
> > slightly above condition_2. A group t-test will show a highly significant
> > difference, but in individual classification my prediction will fluctuate
> > around 0.5 with a slight above-0.5 bias, which would not be enough to
> > reach significance above 0.5 at the group level. Indeed, in the case of
> > my ROI, for 90% of the subjects the difference was in one direction. Does
> > all this make sense to you? If so, what does it tell us about the
> > reliability of a standard random-effects group-level analysis?
>
> There could be multiple reasons, I guess:
>
> - choice of the classifier.
>
> SVM is non-parametric (it has no model of the distributions), thus somewhat
> more flexible, and it might be the least sensitive to univariate effects...
> To match the assumptions of the univariate GLM/t-test more closely, you
> could try GNB (which on that one feature should probably be similar to
> LDA?) or mean-one-nearest-neighbor (probably the best detector if your
> signal is normally distributed with equal variance), and see whether you
> get a more consistent "significant" excursion.
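(Just to illustrate this suggestion: swapping the classifier is roughly a
one-line change in PyMVPA. As far as I know there is no nearest-mean
classifier under that exact name, so kNN(k=1) stands in for it below; ds is
assumed to be the single-feature dataset from the sketch above.)

import numpy as np
from mvpa2.suite import *

# ds: dataset with a single ROI-average feature, 'targets' and 'chunks' set
for clf in (LinearCSVMC(),   # linear SVM
            GNB(),           # Gaussian naive Bayes, closer to GLM/t-test assumptions
            kNN(k=1)):       # nearest neighbour, a crude stand-in for nearest-mean
    cv = CrossValidation(clf, NFoldPartitioner(),
                         errorfx=lambda p, t: np.mean(p == t))
    print(clf.__class__.__name__, np.mean(cv(ds).samples))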
>
> - Ideally you should then also get an effect in a "random effects" analysis
> while classifying across subjects (since you are using just the mean of the
> ROI within each subject, you could easily carry that classification out
> across subjects).
>
> - another related question is -- do you see a significant t-test result
> when testing the difference between conditions in the ROI average against
> 0? maybe somehow the averaging also annihilates the effect
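(For reference, that check is a couple of lines in scipy; cond1_means and
cond2_means below are hypothetical arrays with one ROI-average value per
subject and condition.)

import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

print(ttest_1samp(np.asarray(cond1_means) - np.asarray(cond2_means), 0.0))
print(ttest_rel(cond1_means, cond2_means))   # the equivalent paired t-test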
>
> - also preprocessing:
>   - what do you take for classification (per-trial betas or ...?)
>   - how you z-score (if you do) within each subject
>     might introduce noise across your samples for classification (e.g.
>     if you had just a few samples per condition within each run and
>     z-scored within each run based on only those few samples)
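(Schematically, the two z-scoring variants in PyMVPA would be the following,
assuming a per-subject dataset ds with a 'chunks' run attribute as above;
zscore() works in place, so one would apply only one of the two.)

from mvpa2.suite import *

zscore(ds, chunks_attr='chunks')    # mean/std estimated separately within each run
# zscore(ds, chunks_attr=None)      # mean/std estimated once from all samples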
>
> So the best option might be -- if you share a dataset with the extract of
> that ROI before any pre-processing, together with the complete analysis
> script -- then interested parties could play with it to determine the
> underlying culprit.
>
> --
> Yaroslav O. Halchenko
> Center for Open Neuroscience http://centerforopenneuroscience.org
> Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
> Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
> WWW: http://www.linkedin.com/in/yarik