[pymvpa] Interpreting mis-classification

Fri Sep 11 14:55:21 UTC 2009

I'm not sure that is the case (if I understand your question correctly).

In my experience, accuracies on permuted-label data sets (true 
data, only the labels permuted, maintaining run, subject, etc. 
structure) range from the upper 40s to low 50s; I don't think I've 
ever seen an across-subjects average below 45% or above 55% 
(or even near those values). It's been a while, but I've looked at the 
permutation test results in individual subjects as well, and still 
found highly significant below-chance accuracy.

If permuted-label data produce near-50% accuracy (mean very close to 
50%, distribution comfortably spanning 50%, and a decent number of 
permutations performed), that does not strike me as a situation where 
a chance performance estimate of 50% was optimistic.
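
For anyone who wants to try the same kind of check, here is a rough 
sketch of a within-run label permutation test. It is not PyMVPA code 
and not my actual analysis: it uses plain NumPy, a nearest-centroid 
classifier as a stand-in for whatever classifier you really use, 
leave-one-run-out cross-validation, and pure-noise toy data. All 
function names are made up for illustration.

```python
import numpy as np

def nearest_centroid_cv_accuracy(X, y, runs):
    """Leave-one-run-out accuracy of a simple nearest-centroid classifier."""
    correct = 0
    for test_run in np.unique(runs):
        train = runs != test_run
        test = ~train
        classes = np.unique(y[train])
        # One mean pattern (centroid) per class, from training runs only.
        centroids = np.array([X[train & (y == c)].mean(axis=0) for c in classes])
        # Euclidean distance from each test sample to each centroid.
        dists = np.linalg.norm(X[test][:, None, :] - centroids[None, :, :], axis=2)
        pred = classes[dists.argmin(axis=1)]
        correct += np.sum(pred == y[test])
    return correct / len(y)

def permute_within_runs(y, runs, rng):
    """Shuffle labels separately inside each run, keeping the run structure."""
    y_perm = y.copy()
    for r in np.unique(runs):
        idx = np.flatnonzero(runs == r)
        y_perm[idx] = rng.permutation(y[idx])
    return y_perm

def permutation_null(X, y, runs, n_perm=200, seed=0):
    """Accuracies obtained after relabeling the true data n_perm times."""
    rng = np.random.default_rng(seed)
    return np.array([
        nearest_centroid_cv_accuracy(X, permute_within_runs(y, runs, rng), runs)
        for _ in range(n_perm)
    ])

# Toy data (assumption): 6 runs, 10 samples per class per run, 20 features,
# and no real signal, so accuracy should hover around 50%.
toy_rng = np.random.default_rng(42)
n_runs, n_per_class, n_feat = 6, 10, 20
runs = np.repeat(np.arange(n_runs), 2 * n_per_class)
y = np.tile(np.repeat([0, 1], n_per_class), n_runs)
X = toy_rng.normal(size=(len(y), n_feat))

observed = nearest_centroid_cv_accuracy(X, y, runs)
null = permutation_null(X, y, runs)

# One-sided p-value for *below*-chance accuracy: the fraction of
# permutations scoring at or below the observed (true-label) accuracy.
p_below = (np.sum(null <= observed) + 1) / (len(null) + 1)
print("observed accuracy:", observed)
print("null mean:", null.mean(), "null range:", null.min(), null.max())
print("p (below chance):", p_below)
```

If the null mean sits near 0.5 and the true-label accuracy still falls 
in the lower tail, then the 50% chance estimate is probably not the 
problem, which is the situation I was describing above.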

What I meant by data/labeling errors were simple errors. I was once 
shocked to find a set of exceptionally low accuracies (< 10%), which 
ended up being due to a coding error that affected the labels.

Lately I've been asking most everyone I meet who does multivariate 
analyses of fMRI data if they've run into below-chance accuracies, and 
if so, what they ended up doing about it. So far my impression is that 
this is very common, and there is no cure-all or clear path on what to 
do. Any other impressions/views?

Jo

Yaroslav Halchenko wrote:
> On Fri, 11 Sep 2009, Jo Etzel wrote:
>> In the other cases I have not been able to find any errors (though
>> of course they may still exist!). I sometimes find classifications
>> in the range of 30-45% (balanced two-class) in some subjects, while
>> other subjects are classifying above chance. I have tried various
>> types of scaling, partitioning, and classifiers, but have not had
>> much luck; often accuracies stay below chance regardless.
> 
> So, once again, may be it is that 'chance' performance estimation is too
> "optimistic"? or, as you pointed out, data/labeling is not compliant
> with the taken assumption for chance performance estimation?
>