[pymvpa] RFE memory issue
Thorsten Kranz
thorstenkranz at googlemail.com
Fri Nov 19 08:15:12 UTC 2010
Hello,
before asking my question, I first want to introduce myself briefly.
My name is Thorsten Kranz, I work on fMRI, EEG and iEEG data at the
Clinic for Epileptology, University of Bonn, Germany.
I have a problem using RFE. When I use the following classifier

    rfesvm_split = SplitClassifier(LinearCSVMC())
    clfr = FeatureSelectionClassifier(
        LinearCSVMC(descr="libsvm.LinSVM(C=def)", probability=1),
        RFE(
            sensitivity_analyzer=rfesvm_split.getSensitivityAnalyzer(),
            transfer_error=ConfusionBasedError(
                rfesvm_split,
                confusion_state="confusion"),
                # and whose internal error we use
            feature_selector=FractionTailSelector(
                0.5, mode='discard', tail='lower'),
            update_sensitivity=True,  # update sensitivity at each step
            # disable_states=["errors", "nfeatures"]
            ),
        descr='LinSVM+RFE(splits_avg)',
        enable_states=["feature_ids"]
        )

in a simple cross-validation (for testing, just with an OddEvenSplitter)

    cvterr = CrossValidatedTransferError(
        terr, OddEvenSplitter(),
        enable_states=['confusion'],
        harvest_attribs=["transerror.clf.feature_ids"])

on a dataset with 66500 features:

    Dataset / float64 379 x 66500 uniq: 379 chunks 17 labels

the memory usage climbs to more than 10 GB, causing heavy swapping
and slow performance. When I increase the number of features further,
even all of my RAM and swap are not sufficient.
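
For scale, here is a rough estimate of the raw samples array alone. This is plain arithmetic on the shape and dtype reported above, not a PyMVPA call, and the variable names are only illustrative.

```python
# Back-of-the-envelope size of the samples array (379 x 66500, float64).
n_samples, n_features = 379, 66500
bytes_per_value = 8  # float64

dataset_bytes = n_samples * n_features * bytes_per_value
dataset_mib = dataset_bytes / (1024.0 ** 2)

# How many copies of that array would add up to the observed ~10 GB?
observed_bytes = 10 * 10 ** 9
copies = observed_bytes / float(dataset_bytes)

print("samples array: %.1f MiB" % dataset_mib)
print("10 GB is roughly %.0f copies of it" % copies)
```

So the samples themselves come to roughly 192 MiB, and 10 GB corresponds to something on the order of fifty copies of that array.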
I'm using PyMVPA on a 64-bit Ubuntu Maverick machine, mvpa version

    In [35]: mvpa.__version__
    Out[35]: '0.4.5'

from your repositories.
What am I doing wrong? Am I expecting too much from RFE? Even though
my data are from iEEG rather than fMRI, typical fMRI datasets should
be of the same size, if not larger.
Is the number of labels (17) the problem?
Should I disable some states of RFE? I assume history is needed, and
errors and nfeatures shouldn't be too big...
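
To check that last guess, here is a small sketch of how many elimination steps a 50% discard per step would take. It does not use PyMVPA; it assumes that the number of discarded features is rounded down (with at least one removed per step) and that elimination continues until a single feature is left. Both are assumptions about the schedule, not something verified against the RFE code.

```python
# Hypothetical halving schedule mimicking FractionTailSelector(0.5, mode='discard').
# Rounding and the stop-at-one-feature rule are assumptions, see text above.
fraction = 0.5
n = 66500
counts = [n]  # number of features remaining after each step

while n > 1:
    n_discard = max(1, int(n * fraction))
    n -= n_discard
    counts.append(n)

n_steps = len(counts) - 1
print("elimination steps: %d" % n_steps)
print("features remaining per step: %s" % counts)
```

Under these assumptions there are only 17 elimination steps, so states that store one scalar per step would stay tiny compared to the samples array.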
Thanks in advance, greetings,
Thorsten Kranz