[pymvpa] Using FeatureSelectionClassifier for feature elimination

James M. Hughes james.m.hughes at Dartmouth.EDU
Wed Jul 16 18:58:35 UTC 2008


Hi all,

Continuing a conversation started yesterday, I have a couple of  
questions about performing RFE using a FeatureSelectionClassifier.

I'm aware that if I have a dataset I need to split it into 3 parts;
however, I am still confused about the role each of those parts is
supposed to play.  I'd like to tackle this question first, then move
on to the specifics of PyMVPA's implementation.

In my original understanding, the three-way dataset split was as  
follows:

(1) Split for RFE (half of which is used to train the classifier, the  
other half to determine accuracy to pick features)
(2) Split for training the classifier (with a subset of the features)
(3) Split for testing the trained classifier

However, it would seem (by Yaroslav's email) that in fact we need  
these three splits:
(1) Split for RFE training
(2) Split for RFE validation
(3) Split for testing (i.e. generalization)
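To make the second scheme concrete, here is a minimal sketch of such a
three-way split in plain Python. It does not use PyMVPA at all: the
function name three_way_split, the default fractions, and the seed are
assumptions made purely for illustration.

```python
import random


def three_way_split(n_samples, fractions=(0.5, 0.25, 0.25), seed=0):
    """Return three disjoint lists of sample indices:
    (RFE training, RFE validation, testing).

    The fractions are illustrative defaults, not a recommendation.
    """
    indices = list(range(n_samples))
    # Shuffle with a private, seeded generator so the split is reproducible.
    random.Random(seed).shuffle(indices)

    n_train = int(fractions[0] * n_samples)
    n_valid = int(fractions[1] * n_samples)

    rfe_train = indices[:n_train]
    rfe_valid = indices[n_train:n_train + n_valid]
    # Whatever remains is held out and never seen during feature selection.
    test = indices[n_train + n_valid:]
    return rfe_train, rfe_valid, test


rfe_train, rfe_valid, test = three_way_split(100)
```

The point of the sketch is only that the three parts are disjoint, so
the samples used for testing play no role in choosing features.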

Part of my confusion is that we would seem to be using the same
split (1) both to train the classifier that selects features and to
train the classifier that is then tested (i.e. used for
generalization).  Is this a valid approach?

Finally, with respect to PyMVPA: if we use a
FeatureSelectionClassifier (and presumably use it in the same way
other Classifier objects are used), what do clf.train and clf.predict
mean?

As I understand it, we can provide a testdataset to the  
FeatureSelectionClassifier, which results in the following:

First, we initialize a classifier of type FeatureSelectionClassifier,
passing split (2) above as the testdataset.  Then we call clf.train on
split (1) above, and finally clf.predict on split (3) above, in order
to test generalization.  I assume that, since we are using a
FeatureSelectionClassifier, clf.predict automatically restricts the
data to the features that were selected during training.

Is my understanding of this correct?
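To make the question concrete, the workflow I have in mind looks
roughly like the toy sketch below. It is plain Python and is NOT
PyMVPA's FeatureSelectionClassifier: the class name, the
nearest-centroid classifier, the "centroid spread" sensitivity
measure, and the rule of dropping one feature per step are all
assumptions chosen to keep the example small. It only illustrates the
three roles: train on split (1), pick features using split (2), and
predict on split (3) with the remembered features.

```python
class ToyFeatureSelectingClassifier:
    """Hypothetical stand-in for illustration; not the PyMVPA class."""

    def __init__(self, validation_samples, validation_labels):
        # Split (2): used only to decide which feature set to keep.
        self.validation_samples = validation_samples
        self.validation_labels = validation_labels
        self.selected = None
        self.centroids = None

    @staticmethod
    def _fit_centroids(samples, labels, features):
        # Per-class mean over the given feature indices.
        sums, counts = {}, {}
        for row, label in zip(samples, labels):
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1
        return {lab: [v / counts[lab] for v in acc]
                for lab, acc in sums.items()}

    @staticmethod
    def _predict_with(centroids, samples, features):
        predictions = []
        for row in samples:
            point = [row[f] for f in features]

            def dist(label):
                return sum((p - c) ** 2
                           for p, c in zip(point, centroids[label]))

            # Nearest centroid; sorted() makes tie-breaking deterministic.
            predictions.append(min(sorted(centroids), key=dist))
        return predictions

    def _validation_error(self, centroids, features):
        predicted = self._predict_with(
            centroids, self.validation_samples, features)
        wrong = sum(p != t
                    for p, t in zip(predicted, self.validation_labels))
        return wrong / len(self.validation_labels)

    def train(self, samples, labels):
        # Split (1): fit the classifier, then eliminate features one by one.
        features = list(range(len(samples[0])))
        centroids = self._fit_centroids(samples, labels, features)
        best = (self._validation_error(centroids, features),
                features, centroids)
        while len(features) > 1:
            classes = sorted(centroids)
            # Toy sensitivity: how far apart the class means are per feature.
            spread = [max(centroids[c][k] for c in classes)
                      - min(centroids[c][k] for c in classes)
                      for k in range(len(features))]
            weakest = spread.index(min(spread))
            features = features[:weakest] + features[weakest + 1:]
            centroids = self._fit_centroids(samples, labels, features)
            error = self._validation_error(centroids, features)
            if error <= best[0]:  # on ties, prefer the smaller feature set
                best = (error, features, centroids)
        _, self.selected, self.centroids = best

    def predict(self, samples):
        # Split (3): the features remembered from training are applied here.
        return self._predict_with(self.centroids, samples, self.selected)


# Tiny made-up data: feature 0 separates the classes, feature 1 does not.
train_x = [[0.0, 5.0], [0.2, 4.0], [1.0, 5.0], [1.2, 4.0]]
train_y = [0, 0, 1, 1]
valid_x = [[0.1, 4.5], [1.1, 4.5]]
valid_y = [0, 1]
test_x = [[0.05, 9.0], [0.95, -3.0]]

clf = ToyFeatureSelectingClassifier(valid_x, valid_y)
clf.train(train_x, train_y)
predictions = clf.predict(test_x)
```

In this sketch the caller passes full-width test samples to predict
and the object itself restricts them to clf.selected, which is the
behaviour I am assuming for FeatureSelectionClassifier.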

Sorry for the confusion -- but I'd be very happy to write some of the  
documentation for this stuff if/when I get a better handle on it.

Thanks,
James.


