[pymvpa] On below-chance classification (Anti-learning, encore)
Jacob Itzhacki
jitzhacki at gmail.com
Thu Jan 31 20:00:42 UTC 2013
You all have my gratitude for offering your (precious!) insight.
I will try to post about whatever workaround I come up with.
Cheers,
Jacob
On Thu, Jan 31, 2013 at 8:44 PM, J.A. Etzel <jetzel at artsci.wustl.edu> wrote:
> First, I'll echo Rawi's permutation suggestion. In "typical" fMRI MVPA the
> permutation distribution should be more-or-less normal in shape and
> centered on chance. Being centered on chance, some accuracies will be below
> chance, but I would expect relatively few to be below .4 (or above .6).
> Assuming the permutation distribution is centered on chance, doing a
> one-sided permutation test (real accuracy greater than chance) will result
> in a p-value of MORE than 0.5, so you won't get below-chance accuracies
> coming out significant, even if the real value is in the extreme left tail
> of the distribution.
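>
> A minimal sketch of that one-sided test in plain numpy (the null
> accuracies below are simulated stand-ins for real permutation results):
>
>     import numpy as np
>
>     # stand-in null distribution: in practice these are accuracies from
>     # re-running the classification on label-permuted data
>     rng = np.random.RandomState(0)
>     null_accs = rng.normal(0.5, 0.04, 1000)
>     real_acc = 0.38  # an observed below-chance accuracy
>
>     # one-sided test: proportion of permutations doing at least as well
>     p = (np.sum(null_accs >= real_acc) + 1.0) / (len(null_accs) + 1.0)
>     print(p)  # well above 0.5: a below-chance value never comes out significant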
>
> People do these differently; I showed a bit how I often set things up at
> http://mvpa.blogspot.com/2012/09/permutation-testing-one-person-only.html,
> which also has some null distributions.
>
> My strategy when running into below-chance accuracies (to be clear, I
> don't worry about accuracies unless they're below .45 or so for two-class
> classification) is to check for errors (like mislabeling examples or runs),
> then change the classification parameters (particularly the
> cross-validation scheme), then perhaps the temporal compression.
>
> I've always been able to either find an error or alternative scheme that
> eliminates the troublesome below-chance accuracies. I'm certainly not
> suggesting that you should try every possible combination - too many
> experimenter degrees of freedom is a very dangerous thing in terms of false
> positives. But avoiding extreme imbalance in the datasets can really help
> (for example, if you have 16 runs but only a few examples in each run you
> might have much better results by leaving four runs out at a time instead
> of just one; if you have thousands of voxels but ten examples, using fewer
> features can help).
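>
> For instance, here is a rough sketch of the leave-four-runs-out idea in
> PyMVPA 2.x, assuming a dataset ds whose chunks attribute codes the 16 runs:
>
>     from mvpa2.suite import *
>
>     # collapse the 16 run chunks into 4 groups, so each cross-validation
>     # fold trains on 12 runs and tests on the 4 left out
>     ds.sa['rungroup'] = ds.sa.chunks % 4
>     cv = CrossValidation(LinearCSVMC(),
>                          NFoldPartitioner(attr='rungroup'),
>                          errorfx=mean_match_accuracy)
>     res = cv(ds)
>     print(np.mean(res))  # mean accuracy across the 4 folds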
>
> Another question: how many subjects are showing below-chance
> classification? One or two out of twenty is probably unsurprising (and
> usually is considered significant at the group level if the other 18 people
> classify well).
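>
> (A quick sketch of that group-level logic, treating each subject's
> above-chance result as a coin flip and using a binomial sign test;
> scipy assumed:)
>
>     from scipy.stats import binom_test
>
>     # 18 of 20 subjects above chance: how surprising is that if every
>     # subject were really flipping a fair coin?
>     print(binom_test(18, 20, 0.5))  # ~0.0004, significant at the group level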
>
> If most or all subjects are below chance, I'd dig hard for errors and
> imbalance rather than try to interpret the results. You might have to get
> creative; sometimes tests like making random-number datafiles and running
> them through your procedure can turn up subtle problems (like one class
> getting scaled differently than the other).
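>
> A sketch of that random-data check, with stand-in sizes (save the noise
> in whatever format your pipeline reads, then classify as usual):
>
>     import numpy as np
>
>     # pure-noise "datafiles" with the real label structure: after running
>     # these through the full procedure, accuracies should scatter
>     # symmetrically around chance (0.25 here); a systematic offset points
>     # to a bug such as one class being scaled differently
>     rng = np.random.RandomState(1)
>     noise_data = rng.randn(160, 500)  # 160 trials, 500 "voxels"
>     labels = np.repeat(['a', 'b', 'c', 'd'], 40)
>     np.savetxt('random_data.txt', noise_data)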
>
> good luck,
> Jo
>
> PS - If you have a simulation strategy for creating fMRI-like datasets
> with below-chance accuracy I'd be curious to see it.
>
>
>
> On 1/31/2013 7:29 AM, Jacob Itzhacki wrote:
>
>> Dear Rawi and fellow PyMVPAers,
>>
>> Thanks for your prompt response. Apologies once again for the
>> difficulties, which I ascribe to finding this counterintuitive.
>>
>> That said, I have considered your suggestion and I have a couple of
>> questions regarding it:
>> - First off, what should we do about significant (p<0.01) classifications
>> that hover around chance level? In the case of 4-way classification
>> (25% chance level) there seems to be a much greater chance that the
>> significance threshold is reached even while classification hovers
>> around, or sits exactly at, chance level.
>> - Would we be able to treat the differing significance values as
>> individual data points, or would it have to be a dichotomous statistic
>> (e.g. p<0.01: yes or no)?
>>
>>
>> Moreover, going back to the original question: is it safe to say that
>> with below-chance classification performance, even though the classifier
>> is seemingly doing the opposite of what we expect, it is actually
>> "learning", and hence there was information to learn from?
>>
>> Regards,
>>
>> Jacob Itzhacki
>>
>> On Thu, Jan 31, 2013 at 10:35 AM, MS Al-Rawi <rawi707 at yahoo.com> wrote:
>>
>> I believe below-chance accuracy is a natural phenomenon in
>> classification theory. This issue is obvious when one computes the
>> permutation distribution for some problem: in a two-category
>> problem, the distribution has a peak around 50% accuracy, so there
>> will always be some (or a lot of) below-chance values. This case is
>> more likely to happen when the dataset has few samples and probably
>> high-dimensional data. I am not sure whether any procedure to relabel
>> the data, or any other fine-tuned algorithm, would be considered
>> 'tweaking the results'.
>>
>> >My question is this: how much (statistical?) merit would there be in
>> coming up with some sort of index showing how far a given classification
>> accuracy is from absolute chance?
>>
>> A p-value obtained via permutation testing is the best candidate to
>> answer this question, e.g. p<0.01.
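>>
>> A minimal sketch of how this might look in PyMVPA 2.x, assuming an
>> existing dataset ds (with targets and chunks) and a classifier clf:
>>
>>     from mvpa2.suite import *
>>
>>     # estimate the null distribution by Monte-Carlo permutation of the
>>     # target labels, then test the observed error against its left tail
>>     permutator = AttributePermutator('targets', count=1000)
>>     null_dist = MCNullDist(permutator, tail='left')
>>     cv = CrossValidation(clf, NFoldPartitioner(),
>>                          postproc=mean_sample(),
>>                          null_dist=null_dist)
>>     err = cv(ds)
>>     print(cv.ca.null_prob)  # permutation p-value, e.g. p<0.01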
>>
>> Regards,
>> -Rawi
>>
>>
>> >________________________________
>> > From: Jacob Itzhacki <jitzhacki at gmail.com>
>> >To: pkg-exppsy-pymvpa at lists.alioth.debian.org
>> >Sent: Thursday, January 31, 2013 9:20 AM
>> >Subject: [pymvpa] On below-chance classification (Anti-learning, encore)
>> >
>> >
>> >Dear all,
>> >
>> >
>> >First off, pardon me if any of what I say is already described
>> somewhere else; I've done quite a bit of searching and reading on the
>> subject (e.g. including Dr. Kowalczyk's lecture), but it is always
>> possible to have missed something in this internet age. After reading
>> as much as I could about the problem, I've noticed that the proposed
>> workarounds don't really fix it. I am facing it quite a bit, to the
>> point that around 1/3 of classifications are below chance accuracy
>> (38-42% for 2-way or 17-19% for 4-way). I would like some feedback on
>> an idea I've had to try to still make this data useful.
>> >
>> >
>> >My question is this: how much (statistical?) merit would there be in
>> coming up with some sort of index showing how far a given classification
>> accuracy is from absolute chance?
>> >
>> >
>> >Elaborating: it would display the absolute value of the difference
>> between the resulting accuracy and chance level. Say, for a 2-way
>> classification (with a 50% chance level) in which you obtain
>> accuracies of 38% and 62% in two different instances, the difference
>> from chance for both would be 12%, which would make them equivalent.
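>>
>> >In code, the proposed index would be no more than (a sketch):
>>
>>     def chance_distance(acc, chance):
>>         """Absolute deviation of an accuracy from chance level."""
>>         return abs(acc - chance)
>>
>>     # 38% and 62% in a 2-way task (50% chance) come out equivalent:
>>     print(chance_distance(0.38, 0.5), chance_distance(0.62, 0.5))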
>> >
>> >
>> >Please offer as much criticism of this approach as you can.
>> >
>> >
>> >Thanks in advance,
>> >
>> >
>> >Jacob
>> >
>> >
>> >
>> >
>> >PS. For completeness' sake, I'll first list the things I've tried.
>> >
>> >
>> >I'm running the classification on fMRI data obtained from a
>> paradigm that gives the following classification opportunities:
>> >
>> >
>> >a. 4 categories, with 40 trials each at its fullest use (160 trials)
>> >b. 2 categories with 80 trials each, by collapsing pairs of
>> categories into one.
>> >c. 2 categories, with 40 trials each, by disregarding 2 of the
>> conditions.
>> >
>> >
>> >I am also using a total of 8 different ROIs.
>> >
>> >
>> >I have tried reordering the trials for one of the subjects; however,
>> this results in above-chance accuracies in one analysis and
>> below-chance in the other for the same ROI, which gets rather
>> frustrating if I want to do some sort of averaging at the end. Still,
>> there seems to be some consistency in which classifications move away
>> from chance, which leads me once again to believe that there is in
>> fact some learning even in the below-chance cases, but the seeming
>> anti-learning baffles me. What does it mean?! (And how is it even
>> possible? O.o)
>> >
>> >
>> >Thanks again.
>>
>
>
>>
> --
> Joset A. Etzel, Ph.D.
> Research Analyst
> Cognitive Control & Psychopathology Lab
> Washington University in St. Louis
> http://mvpa.blogspot.com/
>
>
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
>