[pymvpa] Using FeatureSelectionClassifier for feature elimination
James M. Hughes
james.m.hughes at Dartmouth.EDU
Wed Jul 16 18:58:35 UTC 2008
Hi all,
Continuing a conversation started yesterday, I have a couple of
questions about performing RFE using a FeatureSelectionClassifier.
I understand that I need to split my dataset into three parts;
however, I am still somewhat confused about the role each
of those parts is supposed to play. I'd like to try to tackle this
question first, then move on to specifics for PyMVPA's implementation.
In my original understanding, the three-way dataset split was as
follows:
(1) Split for RFE (half of which is used to train the classifier, the
other half to determine accuracy to pick features)
(2) Split for training the classifier (with a subset of the features)
(3) Split for testing the trained classifier
However, judging by Yaroslav's email, it seems that we in fact need
these three splits:
(1) Split for RFE training
(2) Split for RFE validation
(3) Split for testing (i.e. generalization)
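A minimal sketch of the three-way split in the second scheme, written in plain Python rather than PyMVPA code. The function name `three_way_split` and the 50/25/25 proportions are illustrative assumptions, and it assumes samples can be shuffled independently of one another. What matters is only that the three index sets are disjoint.

```python
import random


def three_way_split(n_samples, fractions=(0.5, 0.25, 0.25), seed=0):
    """Hypothetical helper: shuffle sample indices and cut them into
    disjoint (rfe_train, rfe_validation, test) index lists."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    train = idx[:n_train]                   # split (1): fit classifiers during RFE
    val = idx[n_train:n_train + n_val]      # split (2): score candidate feature sets
    test = idx[n_train + n_val:]            # split (3): never seen during RFE
    return train, val, test


train, val, test = three_way_split(20)
print(len(train), len(val), len(test))  # 10 5 5
```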
Part of my confusion comes from the fact that we would seem to be using
the same split (1) both to train the classifier that selects features and
to train the classifier whose generalization we then test. Is this a
valid approach?
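To make the roles in the second scheme concrete, here is a toy sketch in plain Python (not PyMVPA's API). It assumes a nearest-centroid classifier and uses the spread of the class centroids along each feature as a stand-in sensitivity measure; both choices, and all function names, are illustrative assumptions. Split (1) fits every classifier, split (2) only scores candidate feature subsets, and split (3) is used exactly once at the end.

```python
import random


def centroids(X, y, feats):
    """Nearest-centroid 'training': per-class mean of each feature in feats."""
    out = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        out[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    return out


def predict(cents, X, feats):
    """Assign each sample to the class whose centroid is closest."""
    preds = []
    for x in X:
        v = [x[f] for f in feats]
        preds.append(min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(v, cents[c]))))
    return preds


def accuracy(pred, y):
    return sum(p == t for p, t in zip(pred, y)) / len(y)


def rfe(X_train, y_train, X_val, y_val, n_features):
    """Recursive feature elimination scored on a separate validation split."""
    feats = list(range(n_features))
    best_acc, best_feats = -1.0, feats[:]
    while feats:
        cents = centroids(X_train, y_train, feats)           # fit on split (1) only
        acc = accuracy(predict(cents, X_val, feats), y_val)  # score on split (2)
        if acc >= best_acc:  # '>=' so ties favour the smaller, later set
            best_acc, best_feats = acc, feats[:]
        # Stand-in sensitivity: spread of the class centroids along each feature.
        spread = [max(c[i] for c in cents.values()) - min(c[i] for c in cents.values())
                  for i in range(len(feats))]
        del feats[spread.index(min(spread))]  # drop the least sensitive feature
    return best_feats


def make_data(n, n_features, rng):
    """Synthetic two-class data where only features 0 and 1 are informative."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        x[0] += 3.0 * label
        x[1] -= 3.0 * label
        X.append(x)
        y.append(label)
    return X, y


rng = random.Random(1)
X, y = make_data(90, 6, rng)
tr, va, te = slice(0, 30), slice(30, 60), slice(60, 90)

feats = rfe(X[tr], y[tr], X[va], y[va], n_features=6)
# Split (3) is touched exactly once, after selection has finished.
final = centroids(X[tr], y[tr], feats)
test_acc = accuracy(predict(final, X[te], feats), y[te])
print("selected features:", feats, "test accuracy:", round(test_acc, 2))
```

In this sketch split (1) is indeed reused to fit the final classifier; the test samples never influence which features survive, so only split (3) provides the generalization estimate.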
Finally, with respect to PyMVPA: if we use a
FeatureSelectionClassifier (presumably in the same way as any other
Classifier object), what exactly do clf.train and clf.predict do?
As I understand it, we can provide a testdataset to the
FeatureSelectionClassifier, and the workflow is as follows:
first, we create a classifier of type FeatureSelectionClassifier,
passing split (2) above as the testdataset. Then we call
clf.train on split (1) above, and then clf.predict on split (3) above
to estimate generalization. I assume that, since we are using a
FeatureSelectionClassifier, clf.predict automatically uses the same
features that were selected during training.
Is my understanding correct?
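The train/predict contract described above can be mimicked with a small wrapper. This is a hypothetical stand-in in plain Python, not PyMVPA's FeatureSelectionClassifier: `SelectThenClassify`, `NearestCentroid` and `rank_and_validate` are invented for illustration, and the selector is a simple rank-then-validate filter rather than true RFE. It only shows the idea that `train` stores the selected feature ids and `predict` reuses them without being told.

```python
def project(X, ids):
    """Keep only the columns listed in ids."""
    return [[x[i] for i in ids] for x in X]


class NearestCentroid:
    """Toy base classifier: predicts the class whose mean vector is closest."""

    def train(self, X, y):
        self.means = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            self.means[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.means, key=lambda c: dist2(x, self.means[c])) for x in X]


def rank_and_validate(X, y, Xval, yval, make_clf):
    """Hypothetical selector: rank features by class-mean spread on the training
    data, then keep the smallest top-k set that scores best on (Xval, yval)."""
    base = make_clf()
    base.train(X, y)
    n = len(X[0])
    spread = [max(m[i] for m in base.means.values()) - min(m[i] for m in base.means.values())
              for i in range(n)]
    ranking = sorted(range(n), key=lambda i: -spread[i])
    best_k, best_acc = 1, -1.0
    for k in range(1, n + 1):
        ids = ranking[:k]
        clf = make_clf()
        clf.train(project(X, ids), y)
        pred = clf.predict(project(Xval, ids))
        acc = sum(p == t for p, t in zip(pred, yval)) / len(yval)
        if acc > best_acc:  # strict '>' so ties keep the smaller set
            best_k, best_acc = k, acc
    return sorted(ranking[:best_k])


class SelectThenClassify:
    """Hypothetical wrapper: selection happens inside train(); predict() reuses it."""

    def __init__(self, make_clf, selector, testdataset):
        self.make_clf = make_clf
        self.selector = selector
        self.testdataset = testdataset  # (X, y) used only to choose features
        self.selected_ids = None

    def train(self, X, y):
        Xval, yval = self.testdataset
        self.selected_ids = self.selector(X, y, Xval, yval, self.make_clf)
        self.clf = self.make_clf()
        self.clf.train(project(X, self.selected_ids), y)

    def predict(self, X):
        # The feature ids chosen during train() are applied here automatically.
        return self.clf.predict(project(X, self.selected_ids))


train_X, train_y = [[0, 5], [1, 3], [10, 4], [11, 6]], [0, 0, 1, 1]  # split (1)
val = ([[0, 6], [10, 3]], [0, 1])                                   # split (2)
clf = SelectThenClassify(NearestCentroid, rank_and_validate, testdataset=val)
clf.train(train_X, train_y)
print(clf.selected_ids)                 # [0]
print(clf.predict([[2, 9], [9, 0]]))    # split (3): [0, 1]
```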
Sorry for the confusion, but I'd be very happy to write some of the
documentation for this if/when I get a better handle on it.
Thanks,
James.
More information about the Pkg-ExpPsy-PyMVPA mailing list