[pymvpa] GNBSearchlight below/above chance accuracy ... again

basile pinsard basile.pinsard at gmail.com
Fri Jul 22 18:39:49 UTC 2016

Hi PyMVPA community,

I would like some advice on a problem I have using PyMVPA.
My pipeline includes a Searchlight on BOLD data, for which I used the
optimized GNBSearchlight because I plan to run ~100 permutations to perform
statistical testing and it is the only one offering reasonable processing
time (or maybe the optimized KNN).
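
For reference, a minimal sketch of how a permutation p-value is commonly
computed from a set of null scores. The post does not say which formula is
used, so the "+1" convention and the function name below are assumptions for
illustration, not part of PyMVPA.

```python
def permutation_p_value(observed, null_scores):
    """One-sided p-value: fraction of null scores at least as large as observed.

    The +1 in numerator and denominator counts the observed labelling as one
    of the permutations (assumed convention, not taken from the post).
    """
    n_extreme = sum(1 for s in null_scores if s >= observed)
    return (1 + n_extreme) / (1 + len(null_scores))


# With ~100 permutations the smallest reachable p-value is 1/101.
smallest = permutation_p_value(0.9, [0.5] * 100)
print(smallest)
```

Under this convention, ~100 permutations cannot yield a p-value below about
0.01, which matters once many searchlight centers are compared.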

I have 2 classes x 8 samples each (1 sample per chunk). The partitioner
I use (thanks @Yaroslav) is:
prtnr_2fold_factpart = FactorialPartitioner(
This way I repeatedly take out 2 samples of each of the 2 classes for
testing and train on the remaining 2x6 samples. 'equidistant' allows all
the samples to be tested approximately the same number of times, so they
are equally represented in the final accuracy score.
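
A plain-Python sketch of the fold structure described above, written without
PyMVPA. It enumerates every way of holding out 2 samples per class and then
keeps an evenly spaced subset. The function name, the number of kept folds
(16) and the spacing rule are assumptions for illustration; this is not how
FactorialPartitioner is implemented.

```python
from itertools import combinations, product


def two_per_class_folds(n_per_class=8, n_test=2, n_folds=16):
    """Return (all candidate test folds, an evenly spaced subset of them)."""
    class_a = list(range(n_per_class))                   # sample indices, class A
    class_b = list(range(n_per_class, 2 * n_per_class))  # sample indices, class B
    # Every fold holds out n_test samples of each class; the rest is training.
    all_folds = [a + b for a, b in product(combinations(class_a, n_test),
                                           combinations(class_b, n_test))]
    # Keep n_folds folds spread evenly over the enumeration (hypothetical rule).
    step = len(all_folds) / n_folds
    picked = [all_folds[int(i * step)] for i in range(n_folds)]
    return all_folds, picked


all_folds, picked = two_per_class_folds()
print(len(all_folds), len(picked))  # C(8,2) * C(8,2) = 784 candidate folds
```

Each kept fold tests 4 samples and trains on the remaining 12.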

The problem is that the distribution of accuracy in the searchlight map is
very wide, with significantly below-chance classification, and the results
are very variable across scans/subjects.

To check whether there was a problem in the analysis, I replaced my BOLD
signal with random data drawn from a normal distribution, thus removing any
potential temporal dependency (even though the design used DeBruijn cycles
to balance carry-over effects) that could also result from the GLM
(GLM-LSS, Mumford 2012), detrending, or something else.

As a result, I get accuracies from ~10% to ~90%, far below and above the
chance range expected from the normal approximation to the binomial
distribution (25-75%).
It seems that, whether because of the design, the pipeline or the algorithm,
information is found by chance in the random data.
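
The 25-75% range can be reproduced with a short calculation, assuming that
the 16 samples count as 16 independent test predictions at p = 0.5. That
independence is an assumption: with overlapping cross-validation folds the
predictions are not truly independent, so the real spread may be wider.

```python
import math


def chance_interval(n_test, p=0.5, z=1.96):
    """Normal approximation to the binomial: p +/- z * sqrt(p * (1 - p) / n)."""
    sd = math.sqrt(p * (1 - p) / n_test)
    return p - z * sd, p + z * sd


lo, hi = chance_interval(16)  # 16 = 2 classes x 8 samples (assumed n)
print(f"{lo:.3f} to {hi:.3f}")
```

This interval describes a single test; across thousands of searchlight
centers, some accuracies outside it are expected by chance alone.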

I took a neighborhood where I got these results and ran a cross-validation
using the same partitioner but with GNB, LinearCSVMC and LDA.
GNB gives the same accuracy, so the optimized GNBSearchlight is not what
causes this.
LinearCSVMC and LDA give about chance (50%) accuracy for the same

This can be reproduced by creating a random dataset from scratch with 2
classes and randomly selecting some features:
for i in range(1000)])
for i in range(1000)])
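
A self-contained sketch of this kind of check, in plain Python rather than
PyMVPA. It is not the code from the original message: the Gaussian naive
Bayes below is a minimal reimplementation with per-class variances, and the
sizes (200 features, subsets of 30, 200 draws) and the simplified 8-fold
scheme are all hypothetical choices.

```python
import math
import random


def gnb_fit(rows, labels):
    """Per-class, per-feature mean and variance (independent Gaussians)."""
    model = {}
    for c in sorted(set(labels)):
        cols = list(zip(*[r for r, l in zip(rows, labels) if l == c]))
        params = []
        for col in cols:
            m = sum(col) / len(col)
            v = sum((x - m) ** 2 for x in col) / len(col) + 1e-9
            params.append((m, v))
        model[c] = params
    return model


def gnb_predict(model, row):
    """Class with the highest Gaussian log-likelihood (equal priors)."""
    def loglik(params):
        return sum(-0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                   for x, (m, v) in zip(row, params))
    return max(model, key=lambda c: loglik(model[c]))


def cv_accuracy(data, labels, folds):
    """Fraction of correct test predictions pooled over all folds."""
    correct = total = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in range(len(data)) if i not in held_out]
        model = gnb_fit([data[i] for i in train], [labels[i] for i in train])
        for i in test_idx:
            correct += gnb_predict(model, data[i]) == labels[i]
            total += 1
    return correct / total


rng = random.Random(0)
n_features, subset_size, n_draws = 200, 30, 200   # hypothetical sizes
labels = [0] * 8 + [1] * 8                        # 2 classes x 8 samples
data = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in labels]
# Simplified folds: hold out 2 samples per class, each sample tested twice.
folds = [(k, (k + 1) % 8, 8 + k, 8 + (k + 1) % 8) for k in range(8)]

accs = []
for _ in range(n_draws):
    feats = rng.sample(range(n_features), subset_size)  # random "neighborhood"
    subset = [[row[f] for f in feats] for row in data]
    accs.append(cv_accuracy(subset, labels, folds))

print(min(accs), sum(accs) / len(accs), max(accs))
```

Comparing the printed minimum and maximum against the single-test chance
range shows how far the extremes spread when many feature subsets are tried
on the same random data.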

So is there something specific to GNB that gives this kind of lucky
overfitting of random data when it is used many times, as in a Searchlight?
Also, as these lucky features are included in multiple overlapping
neighborhoods, this results in nice blobs in the searchlight map whose
size depends on the radius.
I tried GNB with and without common_variance (thus piecewise quadratic
or linear) and the results are quite similar.
Has anybody been using it to produce sensible results?
Maybe it works better with more than 2 classes.
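
For reference, the quadratic versus linear distinction follows from the
two-class Gaussian naive Bayes log-likelihood ratio. The notation below
(feature index j, class means \mu_{jc}, variances \sigma_{jc}^2) is
introduced here for illustration and assumes equal class priors.

```latex
% Per-class variances: the discriminant is quadratic in x_j
\log\frac{p(x \mid c=1)}{p(x \mid c=0)}
  = \sum_j \left[ -\tfrac{1}{2}\log\frac{\sigma_{j1}^2}{\sigma_{j0}^2}
      - \frac{(x_j-\mu_{j1})^2}{2\sigma_{j1}^2}
      + \frac{(x_j-\mu_{j0})^2}{2\sigma_{j0}^2} \right]

% Common variance (\sigma_{j1} = \sigma_{j0} = \sigma_j):
% the x_j^2 terms cancel and the discriminant is linear in x_j
\log\frac{p(x \mid c=1)}{p(x \mid c=0)}
  = \sum_j \left[ \frac{\mu_{j1}-\mu_{j0}}{\sigma_j^2}\, x_j
      - \frac{\mu_{j1}^2-\mu_{j0}^2}{2\sigma_j^2} \right]
```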

LDA, when applied to more features than samples, is incredibly slow, and is
thus unrealistic for a searchlight, even more so with permutation testing.
But I have seen it used in many papers (maybe not with permutations,
though), so I wonder whether it is the PyMVPA implementation or my Python
setup.
Do you think an optimized LDA searchlight would be possible, or is there a
lengthy computation (e.g. matrix inversion) that cannot be factorized?

Otherwise, what kind of classifier would you recommend that would not be
too computationally intensive? Or do I just have to live with the cost?

Many thanks for any ideas about this.

Basile Pinsard

PhD candidate,
Laboratoire d'Imagerie Biomédicale, UMR S 1146 / UMR 7371, Sorbonne
Universités, UPMC, INSERM, CNRS
Brain-Cognition-Behaviour Doctoral School, ED3C, UPMC, Sorbonne
Biomedical Sciences Doctoral School, Faculty of Medicine, Université de
CRIUGM, Université de Montréal