[pymvpa] FeatureSelectionClassifier (in RFE) occasionally returns full features set
Vadim Axel
axel.vadim at gmail.com
Sat Apr 25 08:10:42 UTC 2009
Hi,
I implemented some simple RFE logic, similar to what was described here:
http://www.pymvpa.org/featsel.html
At the end of the classification procedure, I verify which features were
selected, based on what is described here:
http://www.pymvpa.org/faq.html#how-do-i-know-which-features-were-finally-selected-by-a-classifier-doing-feature-selection
Now the problem: sometimes the resulting number of selected features is
exactly the number requested (I use FixedNElementTailSelector), whereas in
other cases, for a completely unknown reason, I get the full set of
features. The issue is really weird: for two sessions of a subject I get a
selected feature set, but for two other sessions of the same subject I get
the full feature set. I suspect the problem lies in updating the
feature_ids variable rather than in the classification itself, because the
classification error rate was pretty low.
My code is attached. Is there any problem with it?
I can also upload my dataset (~50 MB zip); I did not manage to reproduce
the issue with a smaller amount of data.
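For clarity, this is the behaviour I expect from the selector, sketched as my own minimal numpy stand-in (not the PyMVPA implementation itself):

```python
import numpy as N

def fixed_n_upper_tail(sensitivities, n):
    # Stand-in for FixedNElementTailSelector(n, tail='upper', mode='select'):
    # return the ids of the n elements with the largest sensitivity values.
    order = N.argsort(sensitivities)  # ids, ascending by sensitivity
    return N.sort(order[-n:])         # ids of the n largest, in id order

sens = N.array([0.1, 0.9, 0.3, 0.7, 0.5])
selected = fixed_n_upper_tail(sens, 3)  # -> ids 1, 3, 4
```

So with N_FEATURES = 30 I would expect feature_ids.size == 30 after training, never the full set.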
Thanks for your help,
Vadim
-------------- next part --------------
from mvpa.suite import *
import os
# data dir
data_dir = "Absolute path to where all the data (volumes, mask and design) is stored"
ROI_mask = "F_31_-70_1"
data_file_feature_select = "do_ns_07_09"
attr_file_feature_select = 'fosdesign_3_4_split_2.txt'
# load the dataset and its sample attributes
attr_feature_select = SampleAttributes(os.path.join(data_dir, attr_file_feature_select))
dataset_feature_select = NiftiDataset(
    samples=os.path.join(data_dir, data_file_feature_select),
    labels=attr_feature_select.labels,
    chunks=attr_feature_select.chunks,
    mask=os.path.join(data_dir, ROI_mask))
#dataset_feature_select = MaskedDataset(samples=N.random.normal(size=(480,5,5,5)),
#                                       labels=attr_feature_select.labels)
# do chunkswise linear detrending on dataset
detrend(dataset_feature_select, perchunk=True, model='linear')
# zscore dataset relative to baseline ('rest') mean
zscore(dataset_feature_select, perchunk=True, baselinelabels=[0],
       targetdtype='float32')
# select class 1 and 2 for this demo analysis
# would work with full datasets (just a little slower)
dataset_feature_select = dataset_feature_select.selectSamples(
    N.array([l in [1, 2] for l in dataset_feature_select.labels],
            dtype='bool'))
N_FEATURES = 30
rfesvm_split = SplitClassifier(LinearCSVMC())
clf = FeatureSelectionClassifier(
    clf=LinearCSVMC(),
    # on features selected via RFE
    feature_selection=RFE(
        # based on sensitivity of a classifier which does splitting
        # internally, and whose internal error we use
        sensitivity_analyzer=rfesvm_split.getSensitivityAnalyzer(),
        transfer_error=ConfusionBasedError(
            rfesvm_split,
            confusion_state="confusion"),
        # keep the N_FEATURES features with the highest sensitivity
        feature_selector=FixedNElementTailSelector(
            N_FEATURES, tail='upper', mode='select'),
        # alternatively, discard 20% of features at each step:
        #feature_selector=FractionTailSelector(
        #    0.2, mode='discard', tail='lower'),
        # update sensitivity at each step
        update_sensitivity=True),
    descr='LinSVM+RFE(splits_avg)',
    enable_states=['feature_ids'])
clf.train(dataset_feature_select)
print clf.feature_ids.size
#cv = CrossValidatedTransferError(
#    TransferError(clf),
#    OddEvenSplitter(),
#    enable_states=['confusion'],
#    harvest_attribs=['transerror.clf.getSensitivityAnalyzer(force_training=False)()'])
## and run it
#error = cv(dataset_feature_select)
#print clf.feature_ids.size