[pymvpa] RFE problem w/ Multi-Class SVM classifier
Mark Lescroart
LESCROAR at USC.EDU
Sat Nov 21 02:11:20 UTC 2009
Hello,
I'd like to perform some feature selection (using Recursive Feature
Elimination) on a data set I'm analyzing, but I haven't been able to
make it work.
I could not find any full example of how to use (rather than just
create and/or train) a FeatureSelectionClassifier; I think a full
example would be useful. The one example in the documentation showing
how to train a FeatureSelectionClassifier did it by calling
clf.train(dataset)
... and then calling dataset.selectFeatures(clf.feature_ids)
This didn't work for me (see the code and errors below). I was working
with a different classifier (linear SVM multi-class instead of kNN),
and I was working with a slightly different data set (masked data set
loaded from a Matlab matrix), but it seems that the same principles
should apply. What am I doing wrong?
I suspect my problem may have something to do with the (bug?) that I
wrote to you about previously
(http://lists.alioth.debian.org/pipermail/pkg-exppsy-pymvpa/2009q4/000806.html).
To review, the function clf.getSensitivityAnalyzer(), rather than
combining feature sensitivities across comparisons of the data (this
is a multi-class classifier), was combining across features. Thus I
got 3 sensitivity values (for the comparisons of 1vs2, 1vs3, and 2vs3)
rather than 649 values (1 per feature (voxel) in my data set). I was
able to read out the feature sensitivities by calling
clf.getSensitivityAnalyzer(transformer=None,combiner=None),
but now it seems like the RFE algorithm needs a correct combiner to
work. I could not find any documentation on other arguments to provide
besides "None" (combiner=??).
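The shape mismatch described above can be sketched in plain Python, without any PyMVPA calls. Everything here is illustrative: the numbers in `sens`, the two helper functions, and the choice of "sum of absolute values" as the combining rule are assumptions, not PyMVPA's actual combiner. With one row of sensitivities per pairwise comparison, combining down the rows yields one value per feature (what RFE would need), whereas combining along each row yields one value per comparison (the 3 values reported above).

```python
# Toy per-pair sensitivities: 3 pairwise comparisons x 5 features.
# Stand-in numbers only, not output from PyMVPA.
pairs = ["1vs2", "1vs3", "2vs3"]
sens = [
    [0.5, -0.1, 0.0, 0.2, -0.4],   # 1vs2
    [-0.3, 0.2, 0.1, 0.0, 0.6],    # 1vs3
    [0.1, 0.4, -0.2, 0.3, 0.0],    # 2vs3
]

def combine_across_pairs(table):
    """One value per feature: sum of absolute sensitivities over pairs."""
    n_features = len(table[0])
    return [sum(abs(row[j]) for row in table) for j in range(n_features)]

def combine_across_features(table):
    """One value per pair: sum of absolute sensitivities over features."""
    return [sum(abs(v) for v in row) for row in table]

per_feature = combine_across_pairs(sens)     # length = number of features
per_pair = combine_across_features(sens)     # length = number of pairs

print(len(per_feature))
print(len(per_pair))
```

In the real data set the table would be 3 x 649, so the first function would return 649 values and the second only 3.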
Help? Any idea what's going on?
The code I'm using and the error messages I get are provided below.
Thanks (again) for your time,
Mark
~~~~~~~~~~~~~
from scipy.io import loadmat
from mvpa.suite import *
DatFile = 'WholeBrainMatFile.mat'  # 4-D .mat file of 2x2x2 voxels - 440x80x60x69
MaskFile = 'ROI_Mask.mat'  # Contains a mask for 649 voxels in the Lateral Occipital area
AttrFile = 'ConditionLabels.txt'
D = loadmat(DatFile)
Data = D['Data']
M = loadmat(MaskFile)
MaskMat = M['Mask']
attr = SampleAttributes(AttrFile)
# create masked data set
PyDat = MaskedDataset(samples=Data, labels=attr.labels,
                      chunks=attr.chunks, mask=MaskMat)
zscore(PyDat,perchunk=True,targetdtype='float32')
# PyDat is: <Dataset / float32 440 x 649 uniq: 8 chunks 3 labels>
# Now: feature selection:
splitter = NFoldSplitter(cvtype=1)
rfesvm_split = SplitClassifier(LinearCSVMC(),splitter)
FtSelClf = FeatureSelectionClassifier(
    # use a linear SVM classifier:
    clf = LinearCSVMC(),
    # on features selected via RFE
    feature_selection = RFE(
        # based on sensitivity of a clf which does splitting internally
        sensitivity_analyzer=rfesvm_split.getSensitivityAnalyzer(),
        #transformer=None
        transfer_error=ConfusionBasedError(
            rfesvm_split,
            confusion_state="confusion"),
        # and whose internal error we use
        feature_selector=FractionTailSelector(
            0.2, mode='discard', tail='lower'),
        # remove 20% of features at each step
        enable_states=['feature_ids'],
        # update sensitivity at each step
        update_sensitivity=True),
    descr='LinSVM+RFE(splits_avg)')
# Option 1: simple training and check on feature IDs
print FtSelClf.trained # prints "False"
FtSelClf.train(PyDat)
print FtSelClf.trained # prints "True"
print FtSelClf.feature_ids
# (Generates error - see below)
# Option 2: Run cross-validated transfer error
terr = TransferError(FtSelClf)
splitter = NFoldSplitter(cvtype=1)
cvterr = CrossValidatedTransferError(
    terr,
    splitter)
Err = cvterr(PyDat)
print Err
# (Also generates error - having NOT run option 1)
To be clear - I only used EITHER Option 1 or Option 2 (one or the
other was always commented out when I ran the code).
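For readers unfamiliar with the procedure being configured, the elimination loop can be sketched in plain Python. This is a simplified illustration under stated assumptions, not PyMVPA's RFE: the sensitivity measure is a stand-in (absolute difference of class means per feature, for two classes) rather than linear SVM weights, the loop stops at a fixed feature count instead of using a transfer error to pick the best step, and the names `mean_diff_sensitivity`, `rfe`, and `min_features` are hypothetical. It does mirror the two settings used in the script: the lowest-scoring 20% of the remaining features are discarded at each step, and the sensitivities are recomputed after every step.

```python
import random

def mean_diff_sensitivity(samples, labels):
    """Stand-in sensitivity: |mean(class a) - mean(class b)| per feature."""
    classes = sorted(set(labels))
    a = [s for s, l in zip(samples, labels) if l == classes[0]]
    b = [s for s, l in zip(samples, labels) if l == classes[1]]
    n_feat = len(samples[0])
    return [abs(sum(r[j] for r in a) / len(a) - sum(r[j] for r in b) / len(b))
            for j in range(n_feat)]

def rfe(samples, labels, fraction=0.2, min_features=2):
    """Repeatedly discard the lowest-scoring fraction of remaining features."""
    kept = list(range(len(samples[0])))  # ids in the ORIGINAL feature space
    while len(kept) > min_features:
        # Restrict the data to the surviving features.
        current = [[row[j] for j in kept] for row in samples]
        # Recompute sensitivities on the reduced data at every step.
        sens = mean_diff_sensitivity(current, labels)
        # Drop at least one feature, but never go below min_features.
        n_drop = max(1, int(fraction * len(kept)))
        n_drop = min(n_drop, len(kept) - min_features)
        # Positions within `kept`, ordered from lowest to highest sensitivity.
        order = sorted(range(len(kept)), key=lambda i: sens[i])
        drop = set(order[:n_drop])
        kept = [fid for pos, fid in enumerate(kept) if pos not in drop]
    return kept

# Toy data: 40 samples x 10 features; only features 0 and 1 carry signal.
random.seed(0)
labels = [0] * 20 + [1] * 20
samples = []
for l in labels:
    row = [random.gauss(0, 1) for _ in range(10)]
    row[0] += 5.0 * l
    row[1] -= 5.0 * l
    samples.append(row)

selected = rfe(samples, labels)
print(selected)
```

Note that the sensitivity list must always have exactly one entry per surviving feature; if it had one entry per class pair instead, the positions chosen by the selector would no longer correspond to columns of the data.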
Option 1 gives the error:
Traceback (most recent call last):
  File "./FeatureSelection_Example.py", line 77, in <module>
    print FtSelClf.feature_ids
  File "/opt/local/lib/python2.5/site-packages/mvpa/misc/state.py", line 1099, in __getattribute__
    return collections[known_attribs[index]].getvalue(index)
  File "/opt/local/lib/python2.5/site-packages/mvpa/misc/state.py", line 353, in getvalue
    return self._items[index].value
  File "/opt/local/lib/python2.5/site-packages/mvpa/misc/attributes.py", line 66, in _getVirtual
    return self._get()
  File "/opt/local/lib/python2.5/site-packages/mvpa/misc/attributes.py", line 227, in _get
    raise UnknownStateError("Unknown yet value of %s" % (self.name))
mvpa.misc.exceptions.UnknownStateError: Exception: Unknown yet value of feature_ids
And Option 2 gives the error:
Traceback (most recent call last):
  File "./FeatureSelection_Example.py", line 81, in <module>
    Err = cvterr(PyDat)
  File "/opt/local/lib/python2.5/site-packages/mvpa/measures/base.py", line 105, in __call__
    result = self._call(dataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/algorithms/cvtranserror.py", line 173, in _call
    result = transerror(split[1], split[0])
  File "/opt/local/lib/python2.5/site-packages/mvpa/clfs/transerror.py", line 1283, in __call__
    self._precall(testdataset, trainingdataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/clfs/transerror.py", line 1239, in _precall
    self.__clf.train(trainingdataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/clfs/base.py", line 354, in train
    result = self._train(dataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/clfs/meta.py", line 1058, in _train
    self.__testdataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/featsel/rfe.py", line 268, in __call__
    wdataset = wdataset.selectFeatures(selected_ids)
  File "/opt/local/lib/python2.5/site-packages/mvpa/datasets/mapped.py", line 130, in selectFeatures
    sdata = Dataset.selectFeatures(self, ids=ids, sort=sort)
  File "/opt/local/lib/python2.5/site-packages/mvpa/datasets/base.py", line 1018, in selectFeatures
    new_data['samples'] = self._data['samples'][:, ids]
IndexError: index (2) out of range (0<=index<1) in dimension 1
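Read literally, the final line of this traceback says that feature id 2 was requested along dimension 1 (the feature axis) of a samples array that had only one column at that point. The sketch below reproduces that class of failure with plain Python lists; it is only a reading of the message, not a diagnosis of where inside RFE the ids and the data went out of step. The `select_features` helper and the toy values are hypothetical, and the exception text differs from the NumPy message above.

```python
# One-feature samples table: 3 samples x 1 feature (toy values).
samples = [[0.7], [1.2], [-0.4]]

# Feature ids to keep; id 2 is valid only if there are at least 3 features.
selected_ids = [0, 2]

def select_features(table, ids):
    """Return the table restricted to the given column ids."""
    return [[row[j] for j in ids] for row in table]

try:
    select_features(samples, selected_ids)
    outcome = "ok"
except IndexError as err:
    # The requested id points past the last available column.
    outcome = "IndexError: %s" % err

print(outcome)
```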
~~~~~~~~~~~~~~~~~~~~~~~~~~
Mark Lescroart
(say it LESS-qua)
University of Southern California
Neuroscience Graduate Program
Image Understanding Lab
Email: mark.lescroart at usc.edu
Cell: (213) 447-0752