[pymvpa] Train and test on different classes from a dataset

Michael Hanke mih at debian.org
Tue Feb 5 08:11:38 UTC 2013


On Mon, Feb 04, 2013 at 07:37:51PM -0500, Yaroslav Halchenko wrote:
> I have tortured your script a bit to bootstrap multiple cases and add
> plotting ROCs (even cheated and used scikit-learn for that since we
> apparently need an overhaul of ROCCurve).  As you see, keeping the
> testing portion intact results in lower detection power

Thanks for the update!

Lower detection power? Do you mean the 4% difference from the
theoretical maximum? I can live with that. And that difference exists
because it is, conceptually, quite a different test IMHO.

Let us ignore the cross-validation case for a second and focus, for
simplicity, on just two datasets -- one for training and one for testing
(although cross-validation doesn't change the argument).

Why are we doing permutation analysis? Because we want to know how
likely it is to observe a specific prediction performance on a
particular dataset under the null hypothesis (H0), i.e. how good a
classifier can get at predicting our empirical data when the training
data did not contain the signal of interest -- aka chance performance.
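
To make "how likely" concrete, here is a minimal sketch (plain NumPy,
purely illustrative -- not PyMVPA's own machinery) of turning a null
distribution of prediction accuracies into an empirical p-value for an
observed accuracy:

    import numpy as np

    def permutation_pvalue(observed_acc, null_accs):
        # Fraction of H0 (null) accuracies at least as good as the
        # observed one, with +1 in numerator and denominator so the
        # estimate can never be exactly zero.
        null_accs = np.asarray(null_accs)
        return (1 + np.sum(null_accs >= observed_acc)) / (1.0 + null_accs.size)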

We assume that the training and the testing datasets are rather similar --
both generated by random sampling from the same underlying signal
distribution. If we permute the training dataset, we hope that this will
destroy the signal of interest. Hence, if we train a classifier on such
permuted datasets it will, in the long run, not have learned what we are
interested in -- no signal: H0.

For all these classifiers trained on permuted data we want to know how
well they can discriminate our empirical data (aka testing data) -- more
precisely, the pristine testing data -- because, in general, we do _not_
want to know how well a classifier trained on no signal can discriminate
some other dataset with no signal (aka permuted testing data).
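
To illustrate the scheme (a rough scikit-learn sketch with made-up toy
data -- not the actual script from this thread, and not PyMVPA's
permutation machinery): permute only the training labels, retrain, and
always score against the untouched test set.

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.RandomState(0)

    # Toy two-class data with a weak signal (purely illustrative).
    X_train, X_test = rng.randn(40, 10), rng.randn(40, 10)
    y_train, y_test = rng.randint(0, 2, 40), rng.randint(0, 2, 40)
    X_train[y_train == 1] += 0.5
    X_test[y_test == 1] += 0.5

    # Observed performance: train and test on the pristine data.
    observed_acc = LinearSVC().fit(X_train, y_train).score(X_test, y_test)

    # Null distribution: destroy the signal in the *training* labels
    # only, retrain, and evaluate against the pristine test data.
    null_accs = [LinearSVC().fit(X_train, rng.permutation(y_train))
                            .score(X_test, y_test)
                 for _ in range(1000)]

    # Empirical p-value as sketched above.
    p = (1 + sum(a >= observed_acc for a in null_accs)) / (1.0 + len(null_accs))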

It is true that, depending on the structure of the testing data,
permuting it will destroy the contained signal with varying degrees of
efficiency. But again, we want to answer the question of how well a
classifier could perform by chance when predicting our actual empirical
test data -- not some dataset that could be generated from it by
degrading whatever signal of interest it may contain.
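
For contrast, a variant of the same toy sketch (again purely
illustrative) that also permutes the test labels shows the difference:
it estimates how well an H0 classifier separates yet another signal-free
dataset, not how well it could do by chance on the actual empirical test
data.

    # Alternative scheme: also degrade the *test* labels. This answers a
    # different question -- chance performance on degraded data rather
    # than on the actual empirical test set.
    null_accs_permuted_test = [
        LinearSVC().fit(X_train, rng.permutation(y_train))
                   .score(X_test, rng.permutation(y_test))
        for _ in range(1000)]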


Michael

-- 
Michael Hanke
http://mih.voxindeserto.de


