[pymvpa] RFE question 2.0
James M. Hughes
james.m.hughes at Dartmouth.EDU
Thu Nov 20 01:37:39 UTC 2008
On Nov 19, 2008, at 5:19 PM, Yaroslav Halchenko wrote:
> the thing is that you should use
> SplitClassifier not around your basic SVM but around that
> FeatureSelectionClassifier -- look at the source of
> mvpa.clfs.warehouse.
> RFE classifiers are commented out, but they are there -- look at the one which
> starts with SplitClassifier
OK, I found this section of warehouse.py and am currently testing it out, but
I still have a question: can I use CVTE (CrossValidatedTransferError) with the
SplitClassifier(FeatureSelectionClassifier(RFE)) combo? Otherwise it's
irritating to have to come up with the data splits myself and then call train
and test on the split classifier...
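Roughly what I mean by having to do it by hand -- just a sketch, assuming the
usual splitter iteration and train/predict interface (variable names made up):

import numpy as N
errors = []
for wdata, vdata in NFoldSplitter(cvtype=1)(dataset):
    split_clf.train(wdata)
    predictions = split_clf.predict(vdata.samples)
    errors.append(N.mean(N.array(predictions) != vdata.labels))
print 'mean error:', N.mean(errors)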
Also, I'm not entirely sure whether I understand the role of
FeatureSelectionClassifier here (and I think this may have been part
of my problem to begin with) -- is this the *same* classifier we'll
use to perform generalization? Or is it *just* there for feature
selection? The way I currently understand it, the split classifier is
passing the FeatSelClassifier some training data used to perform
feature selection (in this case RFE), and then it's testing on the
held-out exemplars for that split. If this is the case, then how is
the internal splitting happening for RFE to be statistically valid? (i.e., is
the training data itself split further to obtain a stopping set for RFE?)
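To make my worry concrete: for any one training split wdata (with vdata held
out purely for generalization testing), I would naively expect something like

    seldata, stopdata = list(NFoldSplitter(cvtype=1)(wdata))[0]

to happen internally, so that RFE eliminates features using seldata and decides
when to stop based on the transfer error on stopdata, without ever touching
vdata. (The names and the particular inner splitter here are just my
illustration, not a claim about what PyMVPA actually does.)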
Here's my code now (note that I'm not sure whether CVTE is really appropriate
here, as I'm still confused about the functionality) -- there are so many
levels!!!
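To keep the levels straight, the nesting below works out to roughly this
(remaining arguments elided):

CrossValidatedTransferError(
    TransferError(
        SplitClassifier(
            FeatureSelectionClassifier(
                clf=LinearCSVMC(),
                feature_selection=RFE(sensitivity_analyzer=OneWayAnova(),
                                      transfer_error=TransferError(rfesvm),
                                      ...)))),
    NFoldSplitter(cvtype=1))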
# Python 2; assuming the catch-all suite import pulls in everything used
# below (classifiers, RFE, splitters, debug)
from mvpa.suite import *

def do_rfe(dataset, percent):
    debug.active = ['CLF']

    # separate SVM instance used by RFE to compute the transfer error
    rfesvm = LinearCSVMC()

    # ANOVA sensitivities; keep the top `percent` of features at each step.
    # update_sensitivity=False: sensitivities are computed once, not
    # re-estimated at each elimination step (the sensitivity analyzer is
    # not the classifier being trained anyway)
    FeatureSelection = RFE(sensitivity_analyzer=OneWayAnova(),
                           transfer_error=TransferError(rfesvm),
                           feature_selector=FractionTailSelector(
                               percent / 100.0, mode='select', tail='upper'),
                           update_sensitivity=False)

    # final classifier, trained/tested on the features selected via RFE
    clf = FeatureSelectionClassifier(clf=LinearCSVMC(),
                                     feature_selection=FeatureSelection,
                                     enable_states=['confusion'])
    #clf.states.enable('feature_ids')

    split_clf = SplitClassifier(clf=clf, enable_states=['confusion'])
    split_clf.states.enable('feature_ids')

    cv = CrossValidatedTransferError(TransferError(split_clf),
                                     NFoldSplitter(cvtype=1),
                                     enable_states=['confusion'])
    error = cv(dataset)
    print 'Error: ' + repr(error)

    return split_clf.confusion, split_clf.feature_ids
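For completeness, I then call this along the lines of (10 is just an example
percentage):

confusion, feature_ids = do_rfe(dataset, 10)
print confusion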
Many thanks for any input!
Cheers,
James.