[pymvpa] Train and test on different classes from a dataset
J.A. Etzel
jetzel at artsci.wustl.edu
Tue Feb 5 14:27:12 UTC 2013
If you are doing some sort of cross-modal or cross-day analysis (e.g.,
training on data collected under one set of conditions and testing on
data collected under another), I agree that permuting only the training
data can be quite sensible.
But when we have only a single dataset (e.g., asking whether conditions
a and b can be distinguished in this set of trials, partitioning on the
runs), I don't fully agree with this characterization:
> For all these classifiers trained on permuted data we want to know how
> well they can discriminate our empirical data (aka testing data) -- more
> precisely the pristine testing data. Because, in general, we do _not_
> want to know how well a classifier trained on no signal can discriminate
> any other dataset with no signal (aka permuted testing data).
Perhaps the difference is that I tend to think of each complete set of
cross-validation folds as the unit that should be permuted. For example,
suppose I have four runs and I'm partitioning on the runs. The
true-labeled accuracy is then computed by averaging the four fold
accuracies (train on runs 2-4 and test on run 1, train on runs 1, 3, and
4 and test on run 2, and so on); a minimal sketch of that setup follows.
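Just to make the setup concrete, here is a minimal sketch in plain numpy
(not PyMVPA code; the nearest-centroid classifier and the names
run_cv_accuracy / nearest_centroid_predict are purely illustrative):

import numpy as np

def nearest_centroid_predict(train_x, train_y, test_x):
    # Assign each test row to the class whose training-set centroid is closest.
    classes = np.unique(train_y)
    centroids = np.vstack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

def run_cv_accuracy(x, y, runs):
    # Partition on the runs: hold out one run at a time, train on the rest,
    # and average the per-fold accuracies into a single estimate.
    accs = []
    for r in np.unique(runs):
        test = (runs == r)
        pred = nearest_centroid_predict(x[~test], y[~test], x[test])
        accs.append(np.mean(pred == y[test]))
    return np.mean(accs)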
My instinct is that the permutations should follow that same pattern:
ONE permutation mean should come from permuting the labels, running the
full cross-validation, and averaging over the folds (the four runs, in
this case). In other words, to preserve the linkage between the
cross-validation folds in the permutation test (e.g., a classifier
trained on runs 1-3 will likely be somewhat similar to one trained on
runs 1, 2, and 4), we need to permute the *entire* dataset at once (all
four runs), not just the training data (three runs at a time). A sketch
of this scheme, building on the snippet above, is below.
I hope this argument is somewhat clear; I'd like to make a picture and a
fuller example, but I have to get something else done first.
Unfortunately, I haven't been able to dig into the code you've sent yet
either; hopefully tomorrow.
Jo
On 2/5/2013 7:33 AM, Francisco Pereira wrote:
> I'm catching up with this long thread and all I can say is I fully
> concur with Michael, in particular:
>
> On Tue, Feb 5, 2013 at 3:11 AM, Michael Hanke <mih at debian.org> wrote:
>>
>> Why are we doing permutation analysis? Because we want to know how
>> likely it is to observe a specific prediction performance on a
>> particular dataset under the H0 hypothesis, i.e. how good can a
>> classifier get at predicting our empirical data when the training
>> did not contain the signal of interest -- aka chance performance.
>
> Permuting the test set might make sense, perhaps, if you wanted to
> make a statement about the result variability over all possible test
> sets of that size if H0 was true.
>
> Francisco
>
--
Joset A. Etzel, Ph.D.
Research Analyst
Cognitive Control & Psychopathology Lab
Washington University in St. Louis
http://mvpa.blogspot.com/