[pymvpa] GNBSearchlight below/above chance accuracy ... again

basile pinsard basile.pinsard at gmail.com
Mon Aug 1 18:07:49 UTC 2016


it seems my email never got through moderation.
=]

On Tue, Jul 26, 2016 at 10:30 AM, basile pinsard <basile.pinsard at gmail.com>
wrote:

> Hi Richard, Yaroslav,
>
> Thanks for your replies.
>
>
> On Sat, Jul 23, 2016 at 9:49 AM, Yaroslav Halchenko <debian at onerussian.com
> > wrote:
>
>>
>> On Fri, 22 Jul 2016, basile pinsard wrote:
>>
>> > Hi PyMVPA community,
>>
>> > I wanted to ask for advice on a problem I have using PyMVPA.
>> > My pipeline includes a searchlight on BOLD data, for which I used the
>> > optimized GNBSearchlight because I plan to run ~100 permutations to
>> > perform statistical testing and it is the only one offering reasonable
>> > processing time (or maybe the optimized KNN).
>>
>> > I have 2 classes x 8 samples for each (1 sample per chunk); the
>> > partitioner (thanks @Yaroslav) I use is:
>> > prtnr_2fold_factpart = FactorialPartitioner(
>> >     NFoldPartitioner(cvtype=2,attr='chunks'),
>> >     attr='targets',
>> >     selection_strategy='equidistant',
>> >     count=32)
>> > this way I repeatedly take out 2 samples of each of the 2 classes for
>> > testing and train on the remaining 2x6 samples; 'equidistant' ensures all
>> > the samples are tested approximately the same number of times, and thus
>> > are equally represented in the final accuracy score.
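>> > As a quick sanity check (just a sketch, assuming ds here stands for the
>> > full dataset), one can count how often each chunk ends up in the testing
>> > partition:
>> >
>> > from collections import Counter
>> > test_counts = Counter()
>> > for part in prtnr_2fold_factpart.generate(ds):
>> >     # partitions == 2 marks the testing samples of each generated split
>> >     test_counts.update(part.sa.chunks[part.sa.partitions == 2])
>> > print(test_counts)  # each chunk should appear about the same number of times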
>>
>> > The problem is that the distribution of accuracies in the searchlight map
>> > is very wide, with significantly below-chance classification, and the
>> > results are very variable across scans/subjects.
>>
>> first let's make sure that we are looking at the correct thing -- the
>> default errorfx=mean_mismatch_error, i.e. it is an error (lower is
>> better ;) ), not accuracy
>>
>
> Yes, the measure is mean_match_accuracy, so higher is better.
>
>
>>
>> > So what I did to check if there was any problem in the analysis was to
>> > replace my BOLD signal with random data drawn from a normal distribution,
>> > thus removing any potential temporal dependency (even though the design
>> > used De Bruijn cycles to balance carry-over effects) that could also
>> > result from the GLM (GLM-LSS, Mumford 2012), detrending, or other steps.
>>
>> > As a result, I get accuracies ranging from ~10% to ~90%, far below/above
>> > the chance range expected from a normal approximation to the binomial
>> > distribution (25-75%).
>>
>> as already mentioned, the parametric assumption might not apply here since
>> trials might all be dependent, and it depends on what distribution you are
>> looking at.  e.g. if looking at the distribution of results across
>> searchlights, not across random data samples for the same dataset/ROI,
>> you should expect lots of dependencies across different spatial
>> locations.  Even if all data is randomly generated, neighboring
>> searchlights would use the same voxels... although that should matter
>> less in the case of random data.
>>
>
> Yes, the random data no longer has any temporal dependency, nor spatial
> dependency, except within the searchlight radius, which causes each voxel
> to be used multiple times.
> The way I interpret this is that, with 2 classes x 8 samples, when drawing
> random voxel data a large number of times it can happen by chance that some
> voxels' values are correlated with the targets.
> By reusing such a voxel in multiple neighboring searchlights, this accuracy
> is repeated, creating blobs of high accuracy and inflating the tails of the
> accuracy distribution.
> Certainly, with more classes and/or more samples, such structured random
> noise is less likely to appear, so the accuracy distribution on random data
> should be narrower.
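> As a toy check (just a sketch, not my actual pipeline; numbers are
> illustrative), with 16 balanced labels the best of 10000 random features
> already correlates quite strongly with the targets:
>
> import numpy as np
> rng = np.random.RandomState(0)
> targets = np.tile([0, 1], 8)                  # 2 classes x 8 samples
> feats = rng.normal(size=(16, 10000))          # random "voxels"
> tc = (targets - targets.mean()) / targets.std()
> fc = (feats - feats.mean(0)) / feats.std(0)
> r = np.abs(fc.T.dot(tc)) / 16.0               # Pearson r of each feature vs. labels
> print(r.max())                                # often above 0.7 purely by chance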
>
> It seems to depend on the classifier, as some of them allow one
> feature/voxel to outweigh the others.
> However, high classification accuracy on random data can also be obtained
> with LinearCSVMC and LDA, though not at the same locations.
> Running LDA on a searchlight where GNB found spuriously high accuracy often
> yields roughly chance accuracy.
> But when I modified the LDA code to add the regularization that is often
> used in MVPA (adding a percentage of the trace to the diagonal of the
> covariance matrix, which pushes the voxels toward independence), it again
> produces high accuracy values.
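> The regularization I added is essentially the usual shrinkage toward a
> scaled identity; a minimal sketch of what I mean (alpha and the function
> name are just for illustration):
>
> import numpy as np
>
> def shrink_cov(X, alpha=0.1):
>     # regularize the covariance estimate by adding a fraction (alpha) of the
>     # average variance (trace / n_features) to each diagonal element
>     S = np.cov(X, rowvar=False)   # samples x features -> features x features
>     p = S.shape[0]
>     return S + alpha * (np.trace(S) / p) * np.eye(p)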
>
> I also ran a small simulation on random data, repeatedly running
> cross-validation on subsets of the features with different classifiers
> (GNB, LinearCSVMC, kNN), with a variable number of samples (2*8, 2*16,
> 2*32), and with a variable "searchlight" size of 16, 32 or 64 random
> features. All classifiers have comparable distributions; Linear SVM's is
> slightly wider than the other two.
> For the same set of features, the accuracy is correlated between
> classifiers, but it can still vary a lot.
> Having more samples narrows the accuracy distribution, so it seems to be a
> small-sample-size issue.
> Varying the searchlight size does not change the distribution much, so the
> curse of dimensionality does not seem to be the problem here.
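> The core of that simulation was roughly the following (a sketch; the exact
> counts and parameters varied):
>
> import numpy as np
> from mvpa2.suite import (dataset_wizard, CrossValidation, GNB, LinearCSVMC,
>                          kNN, NFoldPartitioner, FactorialPartitioner,
>                          mean_match_accuracy)
>
> accs = {}
> for n_per_class in (8, 16, 32):
>     ds = dataset_wizard(np.random.normal(size=(2 * n_per_class, 10000)),
>                         targets=[0, 1] * n_per_class,
>                         chunks=np.arange(2 * n_per_class))
>     prtnr = FactorialPartitioner(NFoldPartitioner(cvtype=2, attr='chunks'),
>                                  attr='targets',
>                                  selection_strategy='equidistant', count=32)
>     for clf in (GNB(common_variance=True), LinearCSVMC(), kNN(k=1)):
>         cvte = CrossValidation(clf, prtnr, errorfx=mean_match_accuracy)
>         for n_feat in (16, 32, 64):
>             # accuracy distribution over random "searchlights" of n_feat voxels
>             accs[(n_per_class, clf.__class__.__name__, n_feat)] = [
>                 cvte(ds[:, np.random.randint(0, ds.nfeatures, size=n_feat)]
>                      ).samples.mean()
>                 for _ in range(200)]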
>
>
>> > It seems that, whether due to the design, the pipeline or the algorithm,
>> > information is found by chance in the random data.
>>
>> > I took the neighborhood where I got these results and ran a
>> > cross-validation using the same partitioner but with GNB, LinearCSVMC and
>> > LDA.
>> > GNB gives the same accuracy, so the optimized GNBSearchlight is not what
>> > causes this.
>> > LinearCSVMC and LDA give about chance (50%) accuracy for the same
>> > neighborhood.
>>
>> that is on real data, right?  how did you pre-process it?
>>
>
> No, this is on random data from a normal distribution.
> As for the preprocessing of my real data, it is similar to the HCP
> pipeline, with 'grayordinates' for the cortical surfaces and subcortical
> ROI voxels.
> I tried multiple detrending techniques.
> I also tried multiple GLM estimation approaches.
>
>
>> could you share the plots of distributions (just curious)
>>
> Please find enclosed the distributions of the searchlight accuracy maps
> across subjects.
> As you can see, they are not always perfectly centered around chance, so
> there is a bias in the data/MVPA that I cannot explain.
>
> So the question that remains is: what is significant in a searchlight map,
> when random data can create a blob the size of the searchlight radius with
> accuracy as high as 90%?
> Maybe the 2-class design should be avoided, or the number of samples should
> be proportional to the number of features we use.
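> One option I am considering is a permutation-based max-statistic: re-run
> the searchlight on target-permuted data and compare the real map against
> the distribution of the maximum accuracy over the whole map. A rough sketch
> (assuming ds is the dataset and sl the GNBSearchlight measure already set
> up):
>
> from mvpa2.generators.permutation import AttributePermutator
>
> permutator = AttributePermutator('targets', count=100)
> max_null = []
> for ds_perm in permutator.generate(ds):
>     sl_map = sl(ds_perm)            # searchlight accuracy map, permuted targets
>     max_null.append(sl_map.samples.max())
> # searchlights in the real map exceeding, e.g., the 95th percentile of
> # max_null would then be significant in a map-wise corrected sense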
>
> Thanks.
>
>
>
>
>> > This can be reproduced by creating a random dataset from scratch with 2
>> > classes and randomly selecting some features:
>> > ds_rand2 = dataset_wizard(
>> >     np.random.normal(size=(16, 10000)),
>> >     targets=[0, 1] * 8,
>> >     chunks=np.arange(16))
>> > cvte = CrossValidation(
>> >     GNB(common_variance=True),
>> >     prtnr_2fold_factpart,
>> >     errorfx=mean_match_accuracy)
>> >
>> > np.max([cvte(ds_rand2[:, np.random.randint(0, ds_rand2.nfeatures, size=64)]).samples.mean()
>> >         for i in range(1000)])
>> > 0.8828125
>> > np.min([cvte(ds_rand2[:, np.random.randint(0, ds_rand2.nfeatures, size=64)]).samples.mean()
>> >         for i in range(1000)])
>> > 0.1484375
>>
>> > So is there something specific to GNB that gives this kind of lucky
>> > overfitting of random data when used many times, as in a searchlight?
>> > Also, as these lucky features are included in multiple overlapping
>> > neighborhoods, they result in nice blobs in the searchlight map whose
>> > size depends on the radius.
>> > I tried GNB with and without common_variance (thus piecewise quadratic or
>> > linear boundaries) and the results are quite similar.
>> > Has anybody been using it to produce sensible results?
>> > Maybe it works better with more than 2 classes.
>>
>> > LDA, when applied to more features than samples, is incredibly slow, thus
>> > unrealistic for a searchlight and even more so with permutation testing,
>> > but I have seen it used in many papers (maybe not with permutations
>> > though), so I wonder if it is the PyMVPA algorithm or my Python setup.
>> > Do you think an optimized LDA searchlight would be possible, or is there
>> > a lengthy computation (e.g. matrix inversion) that cannot be factored out?
>>
>> try first that m1nn as Richard recommended.  It is simple but powerful
>> if your data distribution matches its assumptions, which are really close
>> to the ones behind LDA ;)
>>
>> > Otherwise, what kind of classifier would you recommend that would not be
>> > too computationally intensive? Or do I simply have to deal with that?
>>
>> > Many thanks for any idea about that.
>> --
>> Yaroslav O. Halchenko
>> Center for Open Neuroscience     http://centerforopenneuroscience.org
>> Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
>> Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
>> WWW:   http://www.linkedin.com/in/yarik
>>
>> _______________________________________________
>> Pkg-ExpPsy-PyMVPA mailing list
>> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
>> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
>>
>
>
>
> --
> Basile Pinsard
>
> PhD candidate,
> Laboratoire d'Imagerie Biomédicale, UMR S 1146 / UMR 7371, Sorbonne
> Universités, UPMC, INSERM, CNRS
> Brain-Cognition-Behaviour Doctoral School, ED3C, UPMC, Sorbonne
> Universités
> Biomedical Sciences Doctoral School, Faculty of Medicine, Université de
> Montréal
> CRIUGM, Université de Montréal
>



-- 
Basile Pinsard

PhD candidate,
Laboratoire d'Imagerie Biomédicale, UMR S 1146 / UMR 7371, Sorbonne
Universités, UPMC, INSERM, CNRS
Brain-Cognition-Behaviour Doctoral School, ED3C, UPMC, Sorbonne
Universités
Biomedical Sciences Doctoral School, Faculty of Medicine, Université de
Montréal
CRIUGM, Université de Montréal