[pymvpa] On below-chance classification (Anti-learning, encore)

Jacob Itzhacki jitzhacki at gmail.com
Thu Jan 31 13:29:29 UTC 2013


Dear Rawi and fellow PyMVPAers,

Thanks for your prompt response. Apologies once again for the difficulties,
which I ascribe to finding all of this rather counterintuitive.

That said, I have considered your suggestion and I have a couple of
questions regarding it:
- First off, what should be done about significant (p<0.01) classifications
that hover around chance level? In the case of 4-way cross-validation (25%
chance), it seems much more likely that the significance threshold is
reached even while classification sits at, or hovers around, chance level
(see the sketch after these two questions).
- Second, would we be able to treat the resulting spectrum of significance
values as individual data points, or would it have to be a dichotomous
statistic (e.g., p<0.01: yes or no)?
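
To make the first question concrete: under a plain binomial test (a toy
model that assumes independent trials, which cross-validated fMRI trials
are not, so treat it only as an illustration), accuracy has to climb well
above 25% before p<0.01 is reached with the 160 trials of the paradigm
quoted below:

from scipy.stats import binom

n_trials, chance = 160, 0.25      # 4-way design, 25% chance level
for acc in (0.25, 0.28, 0.31, 0.34):
    k = int(round(acc * n_trials))
    # one-tailed P(X >= k) under the null hypothesis of guessing at chance
    p = binom.sf(k - 1, n_trials, chance)
    print("accuracy %.2f -> p = %.4f" % (acc, p))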


Moreover, going back to the original question: with below-chance
classification performance, even though the classifier is seemingly doing
the opposite of what we expect, is it safe to say that it is actually
"learning", and hence that there was information to learn from?

Regards,

Jacob Itzhacki

On Thu, Jan 31, 2013 at 10:35 AM, MS Al-Rawi <rawi707 at yahoo.com> wrote:

> I believe below-chance accuracy is a natural phenomenon in classification
> theory. This becomes obvious when one computes the permutation
> distribution for a problem: in a two-category problem, the distribution
> has a peak around 50% accuracy, so there will always be some (or a lot
> of) below-chance values. This is more likely to happen when the dataset
> has few samples and, probably, high-dimensional data. I am not sure
> whether any procedure to relabel the data, or any other fine-tuned
> algorithm, would be considered 'tweaking the results'.
>
> >My question is this: how much (statistical) merit would there be in
> coming up with some sort of index showing how far a given classification
> accuracy is from absolute chance?
>
> A p-value obtained via permutation testing is the best candidate to
> answer this question, e.g., p<0.01.
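>
> In PyMVPA this can be set up roughly as follows (a sketch following the
> manual's Monte-Carlo testing example; `ds` and `clf` stand for your
> dataset and classifier):
>
> from mvpa2.suite import *
>
> # permute the target labels 1000 times to estimate the null distribution
> permutator = AttributePermutator('targets', count=1000)
> null_dist = MCNullDist(permutator, tail='left', enable_ca=['dist_samples'])
> cv = CrossValidation(clf, NFoldPartitioner(),
>                      errorfx=mean_mismatch_error,
>                      postproc=mean_sample(),
>                      null_dist=null_dist)
> err = cv(ds)
> # probability of the observed error under the permutation null
> print(cv.ca.null_prob)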
>
> Regards,
> -Rawi
>
>
> >________________________________
> > From: Jacob Itzhacki <jitzhacki at gmail.com>
> >To: pkg-exppsy-pymvpa at lists.alioth.debian.org
> >Sent: Thursday, January 31, 2013 9:20 AM
> >Subject: [pymvpa] On below-chance classification (Anti-learning, encore)
> >
> >
> >Dear all,
> >
> >
> >First off, pardon me if anything I say is already described somewhere
> else; I've done quite a bit of searching and reading on the subject
> (e.g., including Dr. Kowalczyk's lecture), but it is always possible to
> have missed something in this internet age. After reading as much as I
> could about the problem, I've noticed that the proposed workarounds don't
> really fix it. I am facing it quite a bit, to the point that around 1/3
> of my classifications are below chance accuracy (38-42% for 2-way, or
> 17-19% for 4-way). I would like some feedback on an idea I've had to try
> to still make this data useful.
> >
> >
> >My question is this: how much (statistical) merit would there be in
> coming up with some sort of index showing how far a given classification
> accuracy is from absolute chance?
> >
> >
> >Elaborating: it would display the absolute value of the difference
> between the resulting accuracy and chance level. Say, for a 2-way
> classification (with 50% chance level) in which you obtain accuracies of
> 38% and 62% in two different instances, the distance from chance would be
> 12% in both cases, which would make them equivalent.
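> >
> >In code the index itself is trivial (a hypothetical helper, not anything
> in PyMVPA), which is part of why I wonder about its merit:
> >
> >def chance_distance(accuracy, chance_level):
> >    # absolute deviation from chance: 0.38 and 0.62 both map to 0.12
> >    return abs(accuracy - chance_level)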
> >
> >
> >Please offer as much criticism as you can to this approach.
> >
> >
> >Thanks in advance,
> >
> >
> >Jacob
> >
> >
> >
> >
> >PS. For completeness' sake, I'll first list the things I've tried.
> >
> >
> >I'm running the classification on fMRI data obtained from a paradigm that
> gives the following classification opportunities:
> >
> >
> >a. 4 categories, with 40 trials each at its fullest use (160 trials);
> >b. 2 categories, with 80 trials each, formed by merging the four
> categories into two pairs;
> >c. 2 categories, with 40 trials each, formed by disregarding two of the
> conditions (see the sketch below).
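> >
> >For reference, options b and c can be expressed in PyMVPA along these
> lines (a sketch with hypothetical condition names c1..c4 and a loaded
> dataset ds):
> >
> >import numpy as np
> >
> ># b. merge the four conditions into two pairs of 80 trials each
> >ds.sa.targets[ds.sa.targets == 'c2'] = 'c1'
> >ds.sa.targets[ds.sa.targets == 'c4'] = 'c3'
> ># c. alternatively, keep only two of the four conditions
> >ds = ds[np.in1d(ds.sa.targets, ['c1', 'c3'])]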
> >
> >
> >I am also using a total of 8 different ROIs.
> >
> >
> >I have tried reordering the trials for one of the subjects; however,
> this results in above-chance accuracies in one analysis and below-chance
> in the other for the same ROI, which gets rather frustrating if I want to
> do some sort of averaging at the end. That said, there seems to be some
> consistency in the direction in which classification moves away from
> chance, which leads me once again to believe that there is in fact some
> learning even in the below-chance classifications; the seeming
> anti-learning still baffles me. What does it mean?! (And how is it even
> possible? O.o)
> >
> >
> >Thanks again.
>
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
>

