[pymvpa] null classification performance in the presence of strong univariate signal??

MS Al-Rawi rawi707 at yahoo.com
Tue Sep 9 12:52:01 UTC 2014


Hi all, 

and thanks for the interesting discussion. I don't know if what I am proposing would also be interesting, but I lean more toward Nick's opinion, after clarifying some issues (regardless of the z-scoring issue, which I think is the major one here). So, here are the options:

1- GLM beta (first level, yielding 19 volumes), then GLM (second level) [typical univariate analysis]
2- GLM beta (first level, yielding 19 volumes), then SearchLight (second level) [David's approach]
3- SearchLight weights (first level, within subject, yielding 19 volumes), then SearchLight (second level) [this might not work well, though I am not sure]
4- SearchLight weights (first level, within subject, yielding 19 volumes), then GLM (second level)


To do group-level SearchLight analysis, option 4 might be the better choice.
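To make the two-level idea concrete, here is a minimal pure-Python sketch (not PyMVPA code; the accuracy values are made up) of one common second-level test: a one-sample t-test of per-subject searchlight accuracies against chance at a single voxel. For option 4, the analogous test would compare searchlight weights against zero instead.

```python
import math

def one_sample_t(values, mu0=0.5):
    """One-sample t statistic of `values` against the chance level `mu0`."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return (mean - mu0) / math.sqrt(var / n)

# Hypothetical per-subject searchlight accuracies at one voxel (19 subjects)
accuracies = [0.61, 0.55, 0.58, 0.63, 0.52, 0.60, 0.57, 0.66, 0.54, 0.59,
              0.62, 0.56, 0.58, 0.64, 0.53, 0.60, 0.57, 0.61, 0.55]
t = one_sample_t(accuracies)  # clearly above chance for these toy values
```

In practice this would be run at every voxel of the stacked subject maps (or handed to a proper second-level tool), but the per-voxel arithmetic is just this.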

All the best,
-Rawi

On Monday, September 8, 2014 11:22 PM, David Soto <d.soto.b at gmail.com> wrote:


>
>
>Hi Jo, thanks... indeed I aimed to do leave-one-out cross-validation so that the classifier is trained to differentiate A from B on the PEs of 18 subjects and then tested on the remaining subject...
>
>I do this by using the following code which I think is all that is needed
>
>from mvpa2.suite import (LinearNuSVMC, NFoldPartitioner,
>                         CrossValidation, sphere_searchlight, mean_sample)
>
>clf = LinearNuSVMC()
>partitioner = NFoldPartitioner()
>cv = CrossValidation(clf, partitioner)
>sl = sphere_searchlight(cv, radius=3, space='voxel_indices', postproc=mean_sample())
>
>
>When I plot the searchlight maps what I see is a random
>distribution centered around 45-50% accuracy, and the blobs are scattered throughout the brain (see pic attached)
>
>
>Because the input 4D nii file contains a concatenation of 19 PEs in condition A and 19 PEs in condition B (across the 19 subjects), as Nick mentioned 
it is not possible to z-score by chunk (here the subjects); however, the 
same noisy pattern is found when I z-score by condition (A & B) or across the whole group of images....
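A toy illustration of why per-chunk z-scoring is problematic here (plain Python with made-up numbers, not PyMVPA's zscore): with only two samples per subject, one PE per condition, z-scoring within a chunk collapses every subject's pair to roughly +/-0.707 regardless of how big the difference was, whereas z-scoring the whole set per voxel preserves the A-vs-B separation.

```python
import math

def zscore(values):
    """Standardize values to zero mean, unit (sample) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

# One voxel's hypothetical PEs: 19 subjects in condition A, then 19 in B
cond_a = [2.0 + 0.1 * i for i in range(19)]
cond_b = [1.0 + 0.1 * i for i in range(19)]

# Z-scoring the voxel across the whole dataset keeps the A-vs-B difference...
whole = zscore(cond_a + cond_b)
diff_whole = sum(whole[:19]) / 19 - sum(whole[19:]) / 19

# ...whereas z-scoring within each chunk (2 samples: one A, one B per subject)
# maps every pair to (+0.707..., -0.707...), discarding the magnitude.
per_chunk = [zscore([a, b]) for a, b in zip(cond_a, cond_b)]
```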
>
>
>So again, this null MVPA result occurs in the presence of strong univariate signal in a t-test comparing the PEs in conditions A and B across the 19 
subjects....
>
>
>Something seems wrong but I just can't find what....
>The 4D image with the concatenated PEs across conditions and subjects is fine, as using the same input
>file I can run a t-test in FSL using the command line and get a nice frontoparietal cluster....
>
>
>The PyMVPA code also seems OK to me... I'll try to give the other troubleshooting ideas you suggested a go. Still, I think the null result I am getting is pretty weird - all seems fine... below is the output for targets and chunks
>cheers, david
>ds.targets
>Out[2]: 
>array(['cued', 'cued', 'cued', 'cued', 'cued', 'cued', 'cued', 'cued',
>       'cued', 'cued', 'cued', 'cued', 'cued', 'cued', 'cued', 'cued',
>       'cued', 'cued', 'cued', 'uncued', 'uncued', 'uncued', 'uncued',
>       'uncued', 'uncued', 'uncued', 'uncued', 'uncued', 'uncued',
>       'uncued', 'uncued', 'uncued', 'uncued', 'uncued', 'uncued',
>       'uncued', 'uncued', 'uncued'], 
>      dtype='|S6')
>
>ds.chunks
>Out[3]: 
>array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,
>        11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,   0.,   1.,   2.,
>         3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,  12.,  13.,
>        14.,  15.,  16.,  17.,  18.])
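Given those targets and chunks, leave-one-chunk-out partitioning (sketched here in plain Python to mirror what NFoldPartitioner should produce on this dataset) yields 19 folds, each training on 36 samples from 18 subjects and testing on the held-out subject's 2 samples, one per class:

```python
# Reconstruct the targets/chunks layout shown above: chunks 0-18 each
# appear twice, once per condition.
targets = ['cued'] * 19 + ['uncued'] * 19
chunks = list(range(19)) * 2

# Leave-one-chunk-out: every fold holds out one subject's two samples.
folds = []
for held_out in sorted(set(chunks)):
    train = [i for i, c in enumerate(chunks) if c != held_out]
    test = [i for i, c in enumerate(chunks) if c == held_out]
    folds.append((train, test))
```

Each test set contains exactly one 'cued' and one 'uncued' sample, so the folds are perfectly balanced, as Jo notes below.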
>
>
>On Mon, Sep 8, 2014 at 3:02 PM, J.A. Etzel <jetzel at artsci.wustl.edu> wrote:
>
>
>>On 9/8/2014 4:31 AM, Nick Oosterhof wrote:
>>
>>>On Sep 8, 2014, at 11:21 AM, David Soto <d.soto.b at gmail.com> wrote:
>>>If I understand correctly, you have two samples per subject (one for
>>>each condition), and each value for a chunk corresponds to one
>>>subject. With those parameters you would be doing between-subject
>>>classification. Are you sure that is what you want?
>>>
>>>I'm asking, because /almost/ all MVPA analyses (hyperalignment (TM) being an
>>>exception) are done within subject. If you have not done so
>>>yet, I strongly suggest doing within-subject analysis first before
>>>trying between-subject analysis.
>>>
>>I disagree a bit about needing to start with within-subjects analysis: this strikes me as a situation in which the between-subjects analysis should work quite well. The strong mass-univariate results suggest there is a consistent difference in the estimates across the people (higher values in condition A than B), and the between-subjects analysis should pick this up; particularly a searchlight analysis (particularly since it only uses a few voxels and so is in some ways closer to a mass-univariate analysis), and particularly with a linear classifier (which is sensitive to consistent differences in individual voxels).
>>
>>I wouldn't bother with the kNN classifier during the troubleshooting, but just stick with the linear SVM (kNN tend to be unpredictably bad with fMRI data). I'm out of practice reading pyMVPA code, but assume you set it up as leave-one-subject-out cross-validation (train on 18 people, test on 1)? Since each person contributes 1 parameter estimate image to the dataset, the classes should be perfectly balanced in all training and testing sets (which is good).
>>
>>What sort of pattern are you seeing after the searchlight analysis? Everything more-or-less at chance, or something else?
>>
>>My first guess is that there is some sort of bug in the analysis code or parameter estimate image labeling/extracting; my second guess is that the normalizing went wrong, or removed your signal (do you see rings or 'doughnuts' around GLM peak areas in the searchlight maps?).
>>
>>For troubleshooting, you could try picking a few voxels with very strong (and not) effects in the GLM, then plotting the values from the parameter estimate images for those same voxels. You should see a strong difference between the two classes in the plots for the 'good' (strong GLM effect) voxels, and less of a difference at the 'bad' (weak GLM effect) voxels. Depending on how everything was set up, you could fit a linear model on the individual voxels to match against the original GLM, and run linear SVM on those individual voxels as well: the accuracies should be consistent with the GLM values (ie better accuracy on 'good' voxels).
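Jo's single-voxel check could look something like this self-contained sketch (plain Python with hypothetical values; a midpoint-threshold rule stands in for the linear SVM):

```python
def loso_accuracy(a_vals, b_vals):
    """Leave-one-subject-out accuracy of a midpoint-threshold classifier
    on a single voxel; a_vals/b_vals hold one PE per subject."""
    n = len(a_vals)
    correct = 0
    for held in range(n):
        # Drop the held-out subject from both conditions before training.
        train_a = [v for i, v in enumerate(a_vals) if i != held]
        train_b = [v for i, v in enumerate(b_vals) if i != held]
        thresh = (sum(train_a) / (n - 1) + sum(train_b) / (n - 1)) / 2
        correct += a_vals[held] > thresh   # predict A above the midpoint
        correct += b_vals[held] <= thresh  # predict B at or below it
    return correct / (2 * n)

# 'Good' voxel: consistent A > B difference across all subjects
good_a = [2.1, 2.3, 2.0, 2.4, 2.2, 2.5, 2.1, 2.3, 2.2, 2.4]
good_b = [1.1, 1.3, 1.0, 1.4, 1.2, 1.5, 1.1, 1.3, 1.2, 1.4]
acc = loso_accuracy(good_a, good_b)  # well above chance (0.5) here
```

If voxels with strong GLM effects come out near chance under a check like this, the problem is more likely in the labeling/extraction than in the classifier setup.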
>>
>>good luck,
>>Jo
>>
>>
>>
>>
>>-- 
>>Joset A. Etzel, Ph.D.
>>Research Analyst
>>Cognitive Control & Psychopathology Lab
>>Washington University in St. Louis
>>http://mvpa.blogspot.com/
>>
>>
>>_______________________________________________
>>Pkg-ExpPsy-PyMVPA mailing list
>>Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
>>http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
>>
>
>
>-- 
>
>http://www1.imperial.ac.uk/medicine/people/d.soto/
>
>
>
>


