[pymvpa] On below-chance classification (Anti-learning, encore)

J.A. Etzel jetzel at artsci.wustl.edu
Thu Jan 31 19:44:14 UTC 2013

First, I'll echo Rawi's permutation suggestion. In "typical" fMRI MVPA 
the permutation distribution should be more-or-less normal in shape and 
centered on chance. Being centered on chance, some accuracies will be 
below chance, but I would expect relatively few to be below .4 (or above 
.6). If the permutation distribution is centered on chance, a one-sided 
permutation test (real accuracy greater than chance) will give any 
below-chance accuracy a p-value of MORE than 0.5, so you won't get 
below-chance accuracies coming out significant, even if the real value 
is in the extreme left tail of the distribution.
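
A minimal sketch of that arithmetic, under loud assumptions: the null 
distribution here is simulated by coin-flip guessing on 80 two-class 
trials, which is only a stand-in for re-running a real classifier on 
label-permuted data, and the names one_sided_p and simulated_null are 
made up for this illustration.

```python
import random

def one_sided_p(real_acc, null_accs):
    """Proportion of null accuracies >= the real accuracy.

    The +1 terms count the real labeling as one of the permutations.
    """
    n_ge = sum(1 for a in null_accs if a >= real_acc)
    return (n_ge + 1) / (len(null_accs) + 1)

def simulated_null(n_trials=80, n_perms=1000, seed=0):
    """Stand-in null (assumption): accuracy of random guessing on n_trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(n_trials)) / n_trials
            for _ in range(n_perms)]

null = simulated_null()
for acc in (0.40, 0.50, 0.62):
    print(f"accuracy {acc:.2f}: one-sided p = {one_sided_p(acc, null):.3f}")
```

Because the test only counts null accuracies at or above the real one, 
an accuracy in the left tail lands near p = 1 rather than near p = 0.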

People do these differently; I showed a bit how I often set things up at 
http://mvpa.blogspot.com/2012/09/permutation-testing-one-person-only.html , 
which also has some null distributions.

My strategy when running into below-chance accuracies (to be clear, I 
don't worry about accuracies unless they're below .45 or so for 
two-class classification) is to check for errors (like mislabeling 
examples or runs), then change the classification parameters 
(particularly the cross-validation scheme), then perhaps the temporal 

I've always been able to find either an error or an alternative scheme 
that eliminates the troublesome below-chance accuracies. I'm certainly not 
suggesting that you should try every possible combination - too many 
experimenter degrees of freedom is a very dangerous thing in terms of 
false positives. But avoiding extreme imbalance in the datasets can 
really help (for example, if you have 16 runs but only a few examples in 
each run you might have much better results by leaving four runs out at 
a time instead of just one; if you have thousands of voxels but ten 
examples, using fewer features can help).
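
To make the leave-four-runs-out idea concrete, here is a rough sketch 
of the partitioning only. It assumes runs are grouped into consecutive, 
non-overlapping blocks (other groupings are possible), and the function 
name leave_n_runs_out and the 16-runs-by-3-examples design are 
hypothetical.

```python
def leave_n_runs_out(run_ids, n_out):
    """Yield (train_indices, test_indices), holding out n_out whole runs per fold."""
    runs = sorted(set(run_ids))
    if len(runs) % n_out != 0:
        raise ValueError("number of runs must be divisible by n_out")
    for start in range(0, len(runs), n_out):
        held_out = set(runs[start:start + n_out])
        test = [i for i, r in enumerate(run_ids) if r in held_out]
        train = [i for i, r in enumerate(run_ids) if r not in held_out]
        yield train, test

# Hypothetical design: 16 runs with 3 examples in each run.
run_ids = [run for run in range(16) for _ in range(3)]
for n_out in (1, 4):
    folds = list(leave_n_runs_out(run_ids, n_out))
    print(f"{n_out} run(s) out: {len(folds)} folds, "
          f"{len(folds[0][1])} test examples per fold")
```

Holding out whole runs keeps examples from the same run out of both 
training and testing, while the larger test sets make each fold's 
accuracy less jumpy.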

Another question: how many subjects are showing below-chance 
classification? One or two out of twenty is probably unsurprising (and 
the result usually is still considered significant at the group level if 
the other 18 people classify well).

If almost all subjects are below-chance I'd dig hard for errors and 
imbalance rather than make interpretations. You might have to get 
creative; sometimes tests like making random number datafiles then 
running them through your procedure can turn up subtle problems (like if 
one class is getting scaled differently than the other).
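
A sketch of what such a random-data test could look like, with 
everything in it assumed for illustration: the dataset dimensions, the 
nearest-centroid classifier, and the helper names are hypothetical 
stand-ins for your own procedure. Pure-noise data should average out to 
chance, and a per-class spread check can expose a class that is being 
scaled differently.

```python
import random
import statistics

def make_random_dataset(n_runs=8, per_class=5, n_voxels=20, seed=0, scale_b=1.0):
    """Pure-noise examples as (run, label, voxel_values); scale_b mimics a scaling bug."""
    rng = random.Random(seed)
    data = []
    for run in range(n_runs):
        for label in ("a", "b"):
            scale = scale_b if label == "b" else 1.0
            for _ in range(per_class):
                data.append((run, label,
                             [rng.gauss(0, 1) * scale for _ in range(n_voxels)]))
    return data

def centroid(rows):
    """Voxel-wise mean of a list of examples."""
    return [statistics.fmean(col) for col in zip(*rows)]

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def leave_one_run_out_accuracy(data):
    """Nearest-centroid classification with leave-one-run-out cross-validation."""
    correct = 0
    for test_run in sorted({run for run, _, _ in data}):
        train = [d for d in data if d[0] != test_run]
        test = [d for d in data if d[0] == test_run]
        cents = {lab: centroid([x for _, l, x in train if l == lab])
                 for lab in ("a", "b")}
        for _, label, x in test:
            pred = min(cents, key=lambda lab: sq_dist(x, cents[lab]))
            correct += pred == label
    return correct / len(data)

def class_spread(data):
    """Standard deviation of all voxel values, computed separately per class."""
    return {lab: statistics.pstdev([v for _, l, x in data if l == lab for v in x])
            for lab in ("a", "b")}

accs = [leave_one_run_out_accuracy(make_random_dataset(seed=s)) for s in range(20)]
print("mean accuracy on noise:", round(statistics.fmean(accs), 3))
print("per-class spread with a scaling bug:",
      class_spread(make_random_dataset(scale_b=3.0)))
```

If the mean accuracy on noise sits well away from chance, or the two 
classes show clearly different spreads, the problem is in the procedure 
rather than in the brain data.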

good luck,

PS - If you have a simulation strategy for creating fMRI-like datasets 
with below-chance accuracy I'd be curious to see it.

On 1/31/2013 7:29 AM, Jacob Itzhacki wrote:
> Dear Rawi and fellow PyMVPAers,
> Thanks for your prompt response. Apologies once again for the
> difficulties, which I ascribe to finding this counterintuitive.
> That said, I have considered your suggestion and I have a couple of
> questions regarding it:
> - First off, what to do about significant (p<0.01) classifications
> that hover around chance level? In the case of 4-way cross-validations
> (25% chance) there is a (seemingly) much improved chance that
> the significance threshold is reached even as classification hovers
> around or is exactly at chance level.
> - Would we be able to treat the differing significance spectrum as
> individual datapoints or would it have to be a dichotomous statistic (e.g.
> p<0.01, yes or no?)?
> Moreover, going back to the original question, is it safe to say that in
> a below chance classification performance, even though the classifier is
> seemingly doing the opposite of what we are expecting, it is actually
> "learning" and hence there was information to learn from?
> Regards,
> Jacob Itzhacki
> On Thu, Jan 31, 2013 at 10:35 AM, MS Al-Rawi <rawi707 at yahoo.com> wrote:
>     I believe below-chance accuracy is a natural phenomenon in
>     classification theory. This issue is obvious when one finds the
>     permutation distribution of some problem, where in a two-category
>     problem, the distribution has a peak around 50% accuracy, so there
>     will always be some (or, a lot of) below-chance values. This case is
>     more likely to happen when the dataset has few samples, and probably
>     high dimensional data. I am not sure if any procedure to relabel the
>     data, or any other fine-tuned algorithm would be considered as
>     'tweaking the results'.
>      >My question is this: How much (statistical?) merit would it be to
>     come with some sort of index to show how much a given classification
>     accuracy is off from absolute chance for this classification?
>     A p-value via permutation testing is the better candidate to answer
>     this question, eg, p<0.01.
>     Regards,
>     -Rawi
>      >________________________________
>      > From: Jacob Itzhacki <jitzhacki at gmail.com>
>      >To: pkg-exppsy-pymvpa at lists.alioth.debian.org
>      >Sent: Thursday, January 31, 2013 9:20 AM
>      >Subject: [pymvpa] On below-chance classification (Anti-learning,
>     encore)
>      >
>      >
>      >Dear all,
>      >
>      >
>      >First off, pardon me if anything of what I say might already be
>     described somewhere else. I've done quite a bit of searching and
>     reading on the subject (e.g. including Dr. Kowalczyk's lecture) but it
>     is always possible to have overlooked something in this internet age.
>     After reading as much as I could about the problem I've noticed that
>     the workarounds proposed don't really fix the problem, which I am
>     facing quite a bit, to the point that around 1/3 of classifications
>     are below chance accuracy (38-42% for 2-way or 17%-19% for
>     4-way). I would like to have some feedback on an idea I've had to
>     try to still have this data be useful.
>      >
>      >
>      >My question is this: How much (statistical?) merit would it be to
>     come with some sort of index to show how much a given classification
>     accuracy is off from absolute chance for this classification?
>      >
>      >
>      >Elaborating, it would be displaying the absolute value of the
>     subtraction of the resulting accuracy from chance level. Say, for a
>     2-way classification (with 50% chance level), in which you obtain
>     accuracies of 38% and 62% in 2 different instances the difference
>     from chance for both would be 12% which would make them equivalent.
>      >
>      >
>      >Please offer as much criticism as you can to this approach.
>      >
>      >
>      >Thanks in advance,
>      >
>      >
>      >Jacob
>      >
>      >
>      >
>      >
>      >PS. For completeness' sake, I'll first list the things I've tried.
>      >
>      >
>      >I'm running the classification on fMRI data obtained from a
>     paradigm that gives the following classification opportunities:
>      >
>      >
>      >a. 4 categories, with 40 trials each at its fullest use (160 trials)
>      >b. 2 categories, with 80 trials each, by combining two categories
>     into one.
>      >c. 2 categories, with 40 trials each, by disregarding 2 of the
>     conditions.
>      >
>      >
>      >I am also using a total of 8 different ROIs.
>      >
>      >
>      >I have tried reordering the trials on one of the subjects, however
>     this results in above chance accuracies in one analysis and below in
>     the other for the same ROI which gets rather frustrating if I wanted
>     to do some sort of averaging by the end. However, there seems to be
>     some consistency in which classification moves away from chance
>     which leads me once again to believe that there is in fact some
>     learning even in the below-chance classifications but the seeming
>     anti-learning baffles me. What does it mean?! (And how is it even
>     possible? O.o)
>      >
>      >
>      >Thanks again.


Joset A. Etzel, Ph.D.
Research Analyst
Cognitive Control & Psychopathology Lab
Washington University in St. Louis
