[pymvpa] conceptual and coding questions about fMRI MVPA example

Michael Notter michaelnotter at hotmail.com
Fri Oct 31 13:19:16 UTC 2014


Hi all, 
 
Sorry for the long text, but I'm in need of help and some clarification. Unfortunately, neither the tutorial, the forum nor the examples were able to help me. I would be happy to summarize all the provided answers and knowledge in a nice and complete fMRI example. 
 
You can find the whole script that I'm using as an IPython notebook here: http://nbviewer.ipython.org/urls/dl.dropbox.com/s/hpfdw9et24vaakj/pymvpa_searchlight_haxby.ipynb. The data I'm using is provided as a download in the script. 
 
 
Conceptual Questions: 
--------------------- 
 
1. Classifiers vs. Searchlight: 
My biggest confusion is about the difference between a Classifier and a Searchlight and how to cross-validate them. A classifier takes a dataset and learns a weight for each feature to optimally distinguish between labels. The accuracy of this distinction can be examined by cross-validation and visualized with a confusion matrix. In PyMVPA, Searchlights belong to the measures module. They are a way to repeatedly apply a classifier to a restricted area (the searchlight sphere). This way, the feature weights learned at a given location are also influenced by the neighboring features. Would this therefore lead to a map of weights, i.e. a classifier in itself, that we could cross-validate? Or is this wrong?
Is the searchlight only a way to calculate a classifier's accuracy in a restricted area, i.e. a measure that performs a locally restricted cross-validation? Is there no classifier that learns its weights depending on its neighbors? Or, if a Searchlight creates weight maps and therefore has classifier characteristics, how can one cross-validate a Searchlight?
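
For concreteness, here is a minimal sketch of how I currently understand the nesting (assuming `ds` is a preprocessed fMRI Dataset with targets and chunks set, and using LinearCSVMC just as an arbitrary classifier): the cross-validation sits inside the searchlight, so the result is one accuracy per sphere center rather than one weight map:

    import numpy as np
    from mvpa2.suite import *

    # cross-validated accuracy as the measure to run in every sphere
    cv = CrossValidation(LinearCSVMC(), NFoldPartitioner(),
                         errorfx=lambda p, t: np.mean(p == t),
                         postproc=mean_sample())
    # apply that measure to a 3-voxel-radius sphere around every feature
    sl = sphere_searchlight(cv, radius=3)
    res = sl(ds)  # one mean cross-validated accuracy per sphere center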
 
2. Whole Brain vs. Feature Selection: 
Computation time aside, a whole-brain approach does not mean a better analysis just because we consider more data points. It is rather the opposite: an increase in features means an increase in the number of weights to estimate, which leads to an increased chance of overfitting the model. The goal therefore is to have as few features as possible while maintaining high accuracy. Is that correct?
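
As an illustration of what I mean by reducing the feature count, this is roughly how I select features in the notebook (a minimal sketch, assuming `ds` is the full-brain Dataset; OneWayAnova and the 500-feature cutoff are just the choices I happened to make):

    from mvpa2.suite import *

    # score every feature with a univariate ANOVA across conditions,
    # then keep only the 500 highest-scoring ones
    fsel = SensitivityBasedFeatureSelection(
        OneWayAnova(),
        FixedNElementTailSelector(500, tail='upper', mode='select'))
    fsel.train(ds)
    ds_fs = fsel(ds)  # reduced dataset: same samples, 500 features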
 
3. Location of label specific patterns: 
If I'm interested in finding out where specific patterns are located (such as here http://nilearn.github.io/auto_examples/plot_haxby_masks.html#example-plot-haxby-masks-py), the way to go is binary classification, correct? Meaning, if I want to find the location for each label and their overlap, I always need some kind of comparison to a baseline, correct? Without a baseline, e.g. scrambledpix, an interpretation would be difficult. So I would train a classifier on 'Baseline', 'House' and 'Face' and then look at the weights of 'House vs. Baseline' and 'Face vs. Baseline' and see where they overlap. And this is exactly the purpose of the sensitivity measure, get_sensitivity_analyzer(), correct?
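
In code, this is how I picture that step (a minimal sketch under the assumptions above; 'house' and 'scrambledpix' stand in for one label/baseline pair):

    import numpy as np
    from mvpa2.suite import *

    # restrict the dataset to one condition and the baseline
    ds_pair = ds[np.array([t in ('house', 'scrambledpix')
                           for t in ds.sa.targets])]
    # train the classifier and extract one weight per feature
    clf = LinearCSVMC()
    sensana = clf.get_sensitivity_analyzer()
    sens = sensana(ds_pair)  # weights for 'house vs. baseline'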
 
 
Coding Questions: 
----------------- 
 
1. Correct Cross Validation with Feature Selection: (see notebook-cell 12) 
I'm not sure whether my Method2 or Method3 is the correct way to cross-validate a classifier after feature selection has been applied. If I understand it correctly, Method2 trains the classifier on the reduced 500 features but cross-validates on all features, while Method3 also does the cross-validation only on the 500 features. If that is the case, Method3 is the wrong approach and an instance of double-dipping, correct?
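
For comparison, this is the variant that I believe avoids the circularity entirely (a sketch, not what my notebook currently does): wrapping the selection inside the classifier, so the 500 features are re-selected from the training portion of every fold:

    from mvpa2.suite import *

    fsel = SensitivityBasedFeatureSelection(
        OneWayAnova(),
        FixedNElementTailSelector(500, tail='upper', mode='select'))
    # the selection is trained anew on each fold's training data only
    fclf = FeatureSelectionClassifier(LinearCSVMC(), fsel)
    cv = CrossValidation(fclf, NFoldPartitioner())
    err = cv(ds)  # the test chunk never influences the selection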
 
2. How can I specify center_ids with GNBSearchlight or Searchlight: (see notebook-cell 17) 
How can I tell Searchlight or GNBSearchlight to only consider specific features, e.g. the "relevant" voxels defined by the feature selection? I suppose that I have to pass such a list to an IndexQueryEngine parameter, but I can't figure out how.
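
My current guess (a sketch; `roi_ids` is a hypothetical array of feature indices into `ds`, e.g. from the selection step) is that the convenience wrappers take such a list as center_ids rather than through the query engine:

    import numpy as np
    from mvpa2.suite import *

    roi_ids = np.arange(500)  # placeholder: indices of the selected voxels
    cv = CrossValidation(LinearCSVMC(), NFoldPartitioner())
    # spheres are only centered on the listed feature indices;
    # sphere_gnbsearchlight seems to accept the same argument
    sl = sphere_searchlight(cv, radius=3, center_ids=roi_ids)
    res = sl(ds)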
 
3. sphere_gnbsearchlight vs. GNBSearchlight: (see notebook-cell 17) 
What is the difference between sphere_gnbsearchlight and GNBSearchlight? I know that the second is a class, while the first is a convenience function that creates an instance of GNBSearchlight. The first is a "shortcut", while the second is the full version. But what are the differences, advantages, and disadvantages?
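
To make sure we are talking about the same thing, here is how I read the two spellings (a sketch; if I understand correctly, the function merely builds the sphere query engine for you):

    from mvpa2.suite import *

    # shortcut: the sphere query engine is constructed internally
    sl1 = sphere_gnbsearchlight(GNB(), NFoldPartitioner(), radius=3)

    # explicit: same searchlight, but the query engine is under my control
    qe = IndexQueryEngine(voxel_indices=Sphere(3))
    sl2 = GNBSearchlight(GNB(), NFoldPartitioner(), qe)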
 
4. Execution Time: (see notebook-cell 18+19) 
Is there a more direct way to access the training time of a classifier or searchlight algorithm? Or is the execution time only available indirectly, as debugging information?
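
Right now I simply wrap the call in wall-clock timing (sketch below); I could not find a public attribute that stores the training time:

    import time
    from mvpa2.suite import *

    sl = sphere_gnbsearchlight(GNB(), NFoldPartitioner(), radius=3)
    t0 = time.time()
    res = sl(ds)
    print('searchlight ran for %.1f s' % (time.time() - t0))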
 
5. Multicore GNB: (see notebook-cell 17) 
Is it possible to run sphere_gnbsearchlight or GNBSearchlight in parallel? 
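
For the generic searchlight I know about the nproc argument (sketch below; I believe it requires the pprocess package to be installed), but I am not sure whether the vectorized GNBSearchlight honors it:

    from mvpa2.suite import *

    cv = CrossValidation(GNB(), NFoldPartitioner())
    # distribute sphere computations over four worker processes
    sl = sphere_searchlight(cv, radius=3, nproc=4)
    res = sl(ds)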
 
6. Transform feature selected Map back to NIfTI: (see notebook-cell 25) 
If I have a dataset consisting only of the selected top 500 features (called ds_fs in the notebook), or if I have the searchlight output trained on this reduced dataset, how can I transform this data back into the original "brain space" and save it as a NIfTI? Currently, I receive the following error: shape mismatch: value array of shape (1,500) could not be broadcast to indexing result of shape (23984,1). I know that the way to go is to use the mapper correctly, but I can't get it to work.
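
This is what I have tried so far (a sketch; it assumes ds_fs still carries the full mapper chain, which I believe is the case when the selection is applied as a mapper rather than by plain slicing, and `sens` stands for any result with 500 features; `selected_ids` in the fallback is a hypothetical index array):

    import numpy as np
    from mvpa2.suite import *

    # if ds_fs.a.mapper still contains the feature-selection step,
    # map2nifti can reverse the 500 values into the full volume
    img = map2nifti(ds_fs, sens.samples)
    img.to_filename('sensitivity.nii.gz')

    # manual fallback if the mapper chain was broken by plain slicing:
    # scatter the values into a full-length vector first
    full = np.zeros(ds.nfeatures)
    full[selected_ids] = sens.samples[0]  # selected_ids: hypothetical
    map2nifti(ds, full).to_filename('sensitivity_manual.nii.gz')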
 
7. RepeatedMeasure vs. SplitClassifier: (see notebook-cell 28+29) 
It seems that both approaches of calculating the sensitivity maps shown in cell 28 and 29 lead to the exact same results. Why use one or the other? 
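
For reference, this is roughly how I understand the two routes (a sketch; the ChainNode/Splitter restriction to the training partition is my assumption about what makes them equivalent):

    from mvpa2.suite import *

    clf = LinearCSVMC()

    # route 1: repeat the sensitivity measure over the training partitions
    part = ChainNode([NFoldPartitioner(),
                      Splitter('partitions', attr_values=[1])])
    sens1 = RepeatedMeasure(clf.get_sensitivity_analyzer(), part)(ds)

    # route 2: let a SplitClassifier manage the folds itself
    sclf = SplitClassifier(clf, NFoldPartitioner())
    sens2 = sclf.get_sensitivity_analyzer()(ds)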
 
8. Combining sensitivity maps of each fold: (see notebook-cell 30+31) 
If I've calculated the sensitivity maps as shown in cell 28 or 29, how can I "sum up" or average the sensitivity maps across all folds? I suppose one way to go is the get_mapped(maxofabs_sample()) function, as shown in cell 31. But how can it be applied so that it takes the maximum within each binary comparison rather than within a specific fold?
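
My best guess so far is grouping by the comparison label before collapsing (a sketch, assuming the pair label of each per-fold map is stored in sens.sa.targets):

    from mvpa2.suite import *

    # average across folds within each binary comparison, i.e. collapse
    # all samples that share the same 'targets' pair label
    sens_per_pair = sens.get_mapped(mean_group_sample(['targets']))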
 
9. Cross Validation of Searchlight: (see notebook-cell 32-34) 
Depending on the answer to my 1st conceptual question: if I want to do a Searchlight cross-validation, how can I do that (or is the nesting always the other way around, with the cross-validation inside the searchlight, as in my sketch under conceptual question 1)? I tried to implement the code as mentioned in the forum (https://www.mail-archive.com/pkg-exppsy-pymvpa@lists.alioth.debian.org/msg02558.html) but was not able to get it to run. See the error in cell 34.
 
Thank you for reading. Any help is much appreciated.
- Michael