[pymvpa] Binary classification

Tue Mar 7 13:37:43 UTC 2017

> On 7 Mar 2017, at 14:23, jasi3k <jasi3k at gmail.com> wrote:
> 
> Dear Users,
> 
> I am running a searchlight analysis with NFoldPartitioner on my fMRI beta data (5 different targets, 4 chunks). I would like to perform a binary classification between my first target and the rest of my data. Would it make sense to rename the 4 remaining targets (e.g. to "other") in the attributes file and run the searchlight analysis?

That should work, though it is probably easier to first load your dataset with the original sample attributes and then, for this particular analysis, relabel the four remaining targets in code with a common label. If you later change the analysis, you only have to change that last relabelling step rather than the attributes file.
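
A minimal sketch of that relabelling step, in plain Python so it runs without any fMRI data. The label names ("t1" to "t5"), the layout of one beta per target and chunk, and the helper to_one_vs_rest are all made up for illustration; they are not part of PyMVPA.

```python
# Hypothetical labels: 5 targets x 4 chunks, one beta image per target and chunk.
original_targets = ["t1", "t2", "t3", "t4", "t5"] * 4
chunks = [c for c in range(4) for _ in range(5)]


def to_one_vs_rest(targets, positive, rest_label="other"):
    """Return a new label list; every label except `positive` becomes `rest_label`."""
    return [t if t == positive else rest_label for t in targets]


# The original labels are left untouched, so a different analysis can reuse them.
binary_targets = to_one_vs_rest(original_targets, positive="t1")

# Assumption: with a PyMVPA dataset `ds`, the analogous step would be to
# overwrite the targets sample attribute (ds.sa.targets) with the relabelled
# values after loading, leaving the attributes file on disk as it is.
```

Switching to, say, "t2" versus the rest would then only require changing the `positive` argument.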

> 
> I would expect that accuracy higher than 0.5 would mean that voxels contribute to the classification task even though targets marked as "other" are different.
> 
> Would this be correct?

Not necessarily, as a classifier could use the frequency of each class to make a prediction. Consider a class-counting classifier that ignores all neural data: it just counts how often each target occurs in the training set and uses the most frequent one as its prediction for all samples in the test set. In this case, all samples in the test set would be labelled "other". If 80% of the samples in the test set are "other", then accuracy would be 80%, for a classifier that does not even consider the neural data.
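
To make the numbers concrete, here is a small sketch of such a class-counting classifier under leave-one-chunk-out cross-validation. It assumes one sample per original target in each of the 4 chunks (so 1 "first" and 4 "other" samples per chunk); your actual design may differ. The function name majority_baseline_accuracy is made up for this example.

```python
from collections import Counter

# Assumed design: per chunk, 1 sample of the first target and 4 relabelled "other".
targets = ["first", "other", "other", "other", "other"] * 4
chunks = [c for c in range(4) for _ in range(5)]


def majority_baseline_accuracy(targets, chunks):
    """Leave-one-chunk-out accuracy of a classifier that never looks at the data."""
    fold_accuracies = []
    for test_chunk in sorted(set(chunks)):
        train = [t for t, c in zip(targets, chunks) if c != test_chunk]
        test = [t for t, c in zip(targets, chunks) if c == test_chunk]
        # Predict the most frequent training label for every test sample.
        prediction = Counter(train).most_common(1)[0][0]
        correct = sum(1 for t in test if t == prediction)
        fold_accuracies.append(correct / len(test))
    return sum(fold_accuracies) / len(fold_accuracies)


baseline = majority_baseline_accuracy(targets, chunks)
print(baseline)  # about 0.8, well above 0.5, without using any voxel values
```

Under these assumptions the baseline to beat is therefore about 0.8, not 0.5.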

To avoid this problem you may have to balance the number of samples in each class. However, with only 4 chunks this does not give much power to detect any effects, even if there is a difference between the first and the other targets. Maybe you could use a partitioning scheme where, for every fold, you select a random subset of the "other" samples, equal in number to the samples of the first target. Alternatively, you could consider a split-half correlation approach.
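
A rough sketch of the balanced partitioning idea, again in plain Python rather than PyMVPA. It assumes the same layout as before (1 "first" and 4 "other" samples per chunk) and balances the training and test parts of each fold separately; the helpers balance and balanced_folds are made up for illustration and are not PyMVPA functions.

```python
import random


def balance(indices, targets, positive, rng):
    """Keep all `positive` samples plus an equally sized random subset of the rest."""
    pos = [i for i in indices if targets[i] == positive]
    rest = [i for i in indices if targets[i] != positive]
    # Assumes there are at least as many "rest" samples as positive ones.
    return sorted(pos + rng.sample(rest, len(pos)))


def balanced_folds(targets, chunks, positive, seed=0):
    """Return (train_indices, test_indices) per leave-one-chunk-out fold."""
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    folds = []
    for test_chunk in sorted(set(chunks)):
        train = [i for i, c in enumerate(chunks) if c != test_chunk]
        test = [i for i, c in enumerate(chunks) if c == test_chunk]
        folds.append((balance(train, targets, positive, rng),
                      balance(test, targets, positive, rng)))
    return folds


# Assumed design: per chunk, 1 sample of the first target and 4 relabelled "other".
targets = ["first", "other", "other", "other", "other"] * 4
chunks = [c for c in range(4) for _ in range(5)]
folds = balanced_folds(targets, chunks, positive="first")
```

With this layout each fold trains on 6 samples and tests on 2, which illustrates the lack of power mentioned above. Repeating the random selection several times (different seeds) and averaging would use more of the "other" samples.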