[pymvpa] Train and test on different classes from a dataset

Fri Feb 1 20:41:22 UTC 2013

Thanks Michael,

It is a nice example showing that we had better permute the testing set
as well.  Otherwise, the stronger our signal is, the worse we become at
detecting it (kinda awkward, isn't it?).

E.g. with 2 chunks and SNR 30.00 (i.e. where the signal is obviously
there and classification on unpermuted labels is probably a stable 100%)
-- we would not be able to tell whether that is "significant" if we keep
the original labels in the testing set while doing permutation testing.
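
To make the 2-chunk, SNR 30 case concrete, here is a minimal sketch in
plain Python.  It is not the attached script and not PyMVPA code: it
assumes a single feature, a nearest-mean classifier as a stand-in for a
real one, one train/test split, and SNR treated as a plain offset
between the class means.  All function names are made up for this
illustration.

```python
import random

def make_chunk(n_per_class, snr, rng):
    """One chunk of 1-D data: unit Gaussian noise, class 'B' shifted by `snr`.

    Treating SNR as an additive offset is an assumption of this sketch.
    """
    labels = ["A"] * n_per_class + ["B"] * n_per_class
    values = [rng.gauss(0.0, 1.0) + (snr if lab == "B" else 0.0) for lab in labels]
    return values, labels

def mean_of(values, labels, wanted):
    picked = [v for v, lab in zip(values, labels) if lab == wanted]
    return sum(picked) / len(picked)

def accuracy(train, test):
    """Fit a nearest-mean classifier on `train` and score it on `test`."""
    (train_x, train_y), (test_x, test_y) = train, test
    mean_a = mean_of(train_x, train_y, "A")
    mean_b = mean_of(train_x, train_y, "B")
    hits = 0
    for x, y in zip(test_x, test_y):
        pred = "A" if abs(x - mean_a) <= abs(x - mean_b) else "B"
        hits += pred == y
    return hits / len(test_y)

def null_accuracies(train, test, n_perm, permute_test, rng):
    """Accuracies under label permutation within each chunk.

    Training labels are always shuffled; testing labels only if
    `permute_test` is true.
    """
    (train_x, train_y), (test_x, test_y) = train, test
    null = []
    for _ in range(n_perm):
        perm_train_y = rng.sample(train_y, len(train_y))
        perm_test_y = rng.sample(test_y, len(test_y)) if permute_test else test_y
        null.append(accuracy((train_x, perm_train_y), (test_x, perm_test_y)))
    return null

def p_value(observed, null):
    """One-sided permutation p-value with the usual +1 correction."""
    return (1 + sum(a >= observed for a in null)) / (1 + len(null))

rng = random.Random(0)
train = make_chunk(50, 30.0, rng)   # chunk 1: training
test = make_chunk(50, 30.0, rng)    # chunk 2: testing
observed = accuracy(train, test)
p_train_only = p_value(observed, null_accuracies(train, test, 200, False, rng))
p_both = p_value(observed, null_accuracies(train, test, 200, True, rng))
print("observed accuracy:", observed)
print("p (train labels permuted only):", p_train_only)
print("p (train and test permuted):  ", p_both)
```

With a signal this strong, any slight class imbalance in the shuffled
training labels is enough to orient the classifier, so against the
original testing labels a permuted run should score either close to 0
or close to 1.  Roughly half of the null accuracies then match the
observed one, and the p-value should end up nowhere near small.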

Now (if you are already asleep, I will try doing it later) -- just extend
it into a power (or ROC detection) analysis, which is what we are
actually after here.  Add SNR=0.00, collect both types of samples, and
describe each plot with its power to detect the true signal using both
kinds of permutation.  Ideally your power should grow with SNR ;)
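
A rough sketch of what such a power analysis could look like, again in
plain Python rather than PyMVPA.  The assumptions are mine: one
feature, two chunks (one for training, one for testing), a nearest-mean
classifier, SNR as an additive offset, and a 0.05 threshold.  The
helper names, the SNR grid and the number of simulated datasets are
arbitrary choices for illustration.  At SNR=0.00 the reported rate is
the false-positive rate; at SNR>0 it is the power.

```python
import random

def simulate_p_value(snr, n_per_class, n_perm, permute_test, rng):
    """Simulate one 2-chunk dataset and return its permutation p-value."""
    def chunk():
        y = [0] * n_per_class + [1] * n_per_class
        x = [rng.gauss(0.0, 1.0) + snr * lab for lab in y]
        return x, y

    def accuracy(train_x, train_y, test_x, test_y):
        # Nearest-mean classifier on a single feature.
        means = []
        for lab in (0, 1):
            vals = [v for v, l in zip(train_x, train_y) if l == lab]
            means.append(sum(vals) / len(vals))
        preds = [int(abs(v - means[1]) < abs(v - means[0])) for v in test_x]
        return sum(p == t for p, t in zip(preds, test_y)) / len(test_y)

    (train_x, train_y), (test_x, test_y) = chunk(), chunk()
    observed = accuracy(train_x, train_y, test_x, test_y)
    exceed = 0
    for _ in range(n_perm):
        perm_train = rng.sample(train_y, len(train_y))
        perm_test = rng.sample(test_y, len(test_y)) if permute_test else test_y
        exceed += accuracy(train_x, perm_train, test_x, perm_test) >= observed
    return (1 + exceed) / (1 + n_perm)

def detection_rate(snr, permute_test, rng, n_datasets=20, n_per_class=20,
                   n_perm=100, alpha=0.05):
    """Fraction of simulated datasets declared significant at level `alpha`."""
    hits = sum(
        simulate_p_value(snr, n_per_class, n_perm, permute_test, rng) < alpha
        for _ in range(n_datasets)
    )
    return hits / n_datasets

rng = random.Random(0)
rates = {}
for snr in (0.0, 0.5, 3.0, 30.0):
    for scheme, permute_test in (("train_only", False), ("train_and_test", True)):
        rates[(snr, scheme)] = detection_rate(snr, permute_test, rng)
        print(f"SNR={snr:5.2f}  {scheme:14s}  rate={rates[(snr, scheme)]:.2f}")
```

Note that the two schemes are run on independently simulated datasets
here; a more careful comparison would apply both schemes to the same
datasets and use many more than 20 of them per SNR.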

On Fri, 01 Feb 2013, Michael Hanke wrote:

> On Thu, Jan 31, 2013 at 02:13:14PM -0600, J.A. Etzel wrote:
> > Why do you say in the tutorial that "Doing a whole-dataset
> > permutation is a common mistake ..." ? I don't see that permuting
> > the test set labels hurts the inter-sample dependencies ... won't I
> > still have (say) 5 A and 5 B in my test set?

> I am attaching some code and a figure. This is a modified version of

> http://pymvpa.org/examples/permutation_test.html

> I ran 24 permutation analyses for 12 combinations of number of
> chunks/runs/... and SNR. In the figure you can see MC sample histograms
> for all these combinations (always using 200 permutations). The greenish
> bars represent the permutation results from permuting both training and
> testing portion of the data (note that only within-chunk permutation was
> done -- although this should have no effect on this data). The blueish
> histogram is the same analysis but only the training set has been
> permuted (I can't think of any good reason why one would only permute the
> testing set -- except for speed ;-).

> The input data is pure noise, plus a bit of univariate signal (according
> to SNR) added to two of three features. In all simulations there are 200
> samples in the dataset, but either grouped in 2, 3 or 5 chunks.
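
For readers without the attachment, a dataset of this shape could be
generated roughly as follows.  This is only a sketch under assumptions
of mine, not the attached code: SNR is taken as an additive offset on
the signal-carrying features of one class, label pairs are dealt out to
chunks round-robin (so chunk sizes differ slightly when 200 does not
divide evenly), and `make_dataset` with all its parameter names is
hypothetical.

```python
import random

def make_dataset(n_samples=200, n_chunks=5, snr=1.0, n_features=3,
                 signal_features=(0, 1), seed=0):
    """Noise dataset with a univariate signal in some features.

    Returns (samples, labels, chunks).  Class 1 gets `snr` added to each
    feature listed in `signal_features`; the remaining features stay
    pure noise.  How SNR maps to an offset is an assumption.
    """
    rng = random.Random(seed)
    samples, labels, chunks = [], [], []
    for i in range(n_samples):
        label = i % 2                    # alternate the two classes
        chunk = (i // 2) % n_chunks      # one sample pair per chunk, round-robin
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        if label == 1:
            for f in signal_features:
                row[f] += snr
        samples.append(row)
        labels.append(label)
        chunks.append(chunk)
    return samples, labels, chunks

samples, labels, chunks = make_dataset(n_chunks=3, snr=30.0)
for k in sorted(set(chunks)):
    size = sum(1 for c in chunks if c == k)
    print(f"chunk {k}: {size} samples")
```

Dealing out whole label pairs keeps the two classes balanced within
every chunk, which is what within-chunk permutation relies on.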

> I am using the SNR parameter in this simulation as a way to increase
> within-category similarity. In a real dataset inter-sample similarity
> could have many reasons, of course.

> The dashed line shows the theoretical chance performance at 0.5, the red
> line the empirical performance for the unpermuted dataset.

> Now tell me that it doesn't make a difference what portion of the data
> you permute ;-) Depending on the actual number of chunks and data
> consistency the "permutability" of the dataset varies quite a bit -- but
> this is only reflected in the distributions when the testing portion is
> not permuted as well. For example, look at the upper right (high sample
> similarity, smallish training portion): in a significant portion of all
> permutations the training dataset isn't "properly" permuted at all
> (within-category label swapping); in the other extreme case the labels
> are swapped entirely between categories. This can happen with small
> datasets and large chunks -- however, the green histogram doesn't tell me
> about it at all.

> [BTW sorry for the poor quality of the figure, but I was hoping to be
>  gentle to the listserver. If you run the attached code, it will generate
>  a more beautiful one]

> Please point me to any conceptual or technical mistake you can think of
> -- this topic comes up frequently, the more critical feedback the better...

> Cheers,

> Michael
-- 
Yaroslav O. Halchenko
Postdoctoral Fellow,   Department of Psychological and Brain Sciences
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik