[pymvpa] RFE question 2.0

Yaroslav Halchenko debian at onerussian.com
Wed Nov 19 15:15:50 UTC 2008


Sorry for the delay

Well -- you did indeed hit a bug, which I could argue is a feature ;-)

What you were trying to do is to cheat (maybe unintentionally, given
the somewhat hard-to-grasp parametrization/use of RFE) ;-)

What was happening: RFE tried to determine its stopping point by using
the testing dataset of a split generated within
CrossValidatedTransferError (CVTE below). But CVTE was not providing
such a testing dataset, since that would amount to 'cherry-picking'.
I provided a patch to 'close the eyes' and allow such an analysis
within CVTE, but I am not sure it is actually a good idea. To
explicitly allow such a scenario I added an argument to CVTE,
expose_testdataset, which needs to be set to True (see the added
unittest based on your example).

To do RFE in an unbiased way in your scenario: in clfs.warehouse (where
I guess you took the RFE sample for your code), the sample classifier
with RFE is wrapped within SplitClassifier, so within each outer
training split it splits again to determine the stopping point in
an unbiased way. SplitClassifier itself then makes a decision by
voting across multiple classifiers (one per inner split).
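
To make the nesting concrete, here is a minimal pure-Python sketch. It
is not PyMVPA code: nfold_splits, nested_splits and majority_vote are
hypothetical helper names used only for illustration. The point is that
the inner splits are drawn from the outer training chunks only, so the
outer testing chunk is never seen while the stopping point is chosen.

```python
from collections import Counter


def nfold_splits(chunks):
    """Yield (training, testing) pairs, leaving out one chunk at a time."""
    for i, test in enumerate(chunks):
        yield chunks[:i] + chunks[i + 1:], test


def nested_splits(chunks):
    """Yield (outer_test, inner_train, inner_test) for every nested split.

    Inner splits are built from the outer training chunks only, so the
    outer testing chunk never influences the stopping point.
    """
    for outer_train, outer_test in nfold_splits(chunks):
        for inner_train, inner_test in nfold_splits(outer_train):
            yield outer_test, inner_train, inner_test


def majority_vote(predictions):
    """Return the label predicted most often by the inner classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical example: four chunks (e.g. four runs of an experiment).
chunks = [0, 1, 2, 3]
splits = list(nested_splits(chunks))
```

Each entry of splits corresponds to one classifier that would be
trained with RFE; the predictions of the classifiers sharing the same
outer testing chunk are what a voting step like majority_vote combines.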

The negative side-effect: such a scenario multiplies the computing
time. If both CrossValidatedTransferError and the SplitClassifier it
runs on use NFoldSplitter, you end up training N * (N-1) classifiers
using RFE, which takes a lot of time. To overcome this you can use
some other splitter for SplitClassifier (e.g. OddEvenSplitter) or just
limit the number of splits produced with the splitter's additional
parameter 'count' (e.g. count=2 to get only 2 splits out of
NFoldSplitter). That parameter was added recently, so it is present
only in git, but it will be in the upcoming release.
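
As a rough back-of-the-envelope check of that cost, the arithmetic can
be sketched in plain Python. The function rfe_trainings is a
hypothetical name, not part of PyMVPA, and the choice of 8 chunks is
just an assumed example.

```python
def rfe_trainings(n_outer, n_inner=None):
    """Count the classifiers trained with RFE in nested cross-validation.

    n_outer: number of outer (CVTE) splits.
    n_inner: number of inner (SplitClassifier) splits per outer split;
             None means leave-one-out over the n_outer - 1 training chunks.
    """
    if n_inner is None:
        n_inner = n_outer - 1
    return n_outer * n_inner


full = rfe_trainings(8)        # NFoldSplitter on both levels: 8 * 7
limited = rfe_trainings(8, 2)  # inner splits limited to 2: 8 * 2
```

With these assumed numbers, limiting the inner splitter to 2 splits
cuts the number of RFE runs from 56 to 16.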

Fixes are pushed into git. If you need particular patches, let me
know, although you can easily get them from my branch.

But I would advise you to refactor your setup so that it does not
"cheat" ;)

Alternatively you can come up with some other stopping criterion to
use, but then we might need to extend the RFE code to "fit it" into
the framework.

I hope my explanation is comprehensible and helpful. Let me know.

On Mon, 17 Nov 2008, James M. Hughes wrote:

> Sorry to spam the list, but is it possible that there's a problem
> using RFE w/ CrossValidatedTransferError?
-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student  Ph.D. @ CS Dept. NJIT
Office: (973) 353-1412 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW:     http://www.linkedin.com/in/yarik        


