[pymvpa] RFE memory issue

Fri Nov 19 08:15:12 UTC 2010

Hello,

before asking my question, I first want to introduce myself briefly.
My name is Thorsten Kranz, I work on fMRI, EEG and iEEG data at the
Clinic for Epileptologie, University of Bonn, Germany.

I have a problem using RFE. When I use the following classifier

        from mvpa.suite import *   # PyMVPA 0.4.x convenience import

        rfesvm_split = SplitClassifier(LinearCSVMC())
        clfr = FeatureSelectionClassifier(
            LinearCSVMC(descr="libsvm.LinSVM(C=def)", probability=1),
            RFE(
                sensitivity_analyzer=rfesvm_split.getSensitivityAnalyzer(),
                transfer_error=ConfusionBasedError(
                    rfesvm_split,
                    confusion_state="confusion"),
                    # and whose internal error we use
                feature_selector=FractionTailSelector(
                    0.5, mode='discard', tail='lower'),
                update_sensitivity=True,  # update sensitivity at each step
                #disable_states=["errors", "nfeatures"]
            ),
            descr='LinSVM+RFE(splits_avg)',
            enable_states=["feature_ids"]
        )

in a simple cross-validation (for testing, even with just a simple OddEvenSplitter)

        cvterr = CrossValidatedTransferError(
            terr, OddEvenSplitter(),
            enable_states=['confusion'],
            harvest_attribs=["transerror.clf.feature_ids"])

on a dataset with 66500 features:
        Dataset / float64 379 x 66500 uniq: 379 chunks 17 labels

the memory usage climbs to more than 10 GB, causing heavy swapping
and slow performance. When I increase the number of features further,
even all of my RAM and swap are not sufficient.
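
For scale, here is a rough estimate of how large the samples array
itself is. This is only a sketch: it assumes dense float64 storage
(8 bytes per value) with no extra copies, it reads "10 GB" as
10 * 1024**3 bytes, and the helper name is made up for illustration.

```python
# Back-of-the-envelope size of the samples array alone.
# Assumptions: dense storage, float64 = 8 bytes per value, no copies.

def dense_array_bytes(n_samples, n_features, bytes_per_value=8):
    """Return the size in bytes of a dense n_samples x n_features array."""
    return n_samples * n_features * bytes_per_value

n_samples, n_features = 379, 66500      # from the dataset summary
data_bytes = dense_array_bytes(n_samples, n_features)
data_mib = data_bytes / 1024.0 ** 2

observed_bytes = 10 * 1024 ** 3         # "more than 10 GB", read as 10 GiB
ratio = observed_bytes / float(data_bytes)

print("samples array: %.1f MiB" % data_mib)
print("observed usage is at least %.0f times that" % ratio)
```

Under these assumptions the samples take roughly 192 MiB, so the
observed usage is more than 50 times the size of the raw data.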

I'm using PyMVPA on a 64-bit Ubuntu Maverick machine, with mvpa version

In [35]: mvpa.__version__
Out[35]: '0.4.5'

from your repositories.

What am I doing wrong? Am I expecting too much from RFE? Even though
my data are from iEEG rather than fMRI, typical fMRI datasets should
be of the same size, if not larger.

Is the number of labels (17) the problem?
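
To put a number on that: libsvm handles multiclass problems with
one-vs-one training, so 17 labels mean 136 binary problems. The
sketch below additionally assumes, without this having been checked
against PyMVPA 0.4.5, that one float64 weight vector over all
features is kept per label pair. The function name is hypothetical.

```python
# Count the one-vs-one problems for a multiclass SVM and estimate
# the size of one set of per-pair weight vectors.
# Assumption (unverified for PyMVPA 0.4.5): one float64 weight
# vector over all features is stored per label pair.

def n_pairwise_problems(n_labels):
    """Number of unordered label pairs, i.e. n * (n - 1) / 2."""
    return n_labels * (n_labels - 1) // 2

n_labels, n_features = 17, 66500
n_pairs = n_pairwise_problems(n_labels)
weights_bytes = n_pairs * n_features * 8      # 8 bytes per float64
weights_mib = weights_bytes / 1024.0 ** 2

print("pairwise problems: %d" % n_pairs)
print("one set of pairwise weights: %.1f MiB" % weights_mib)
```

If that assumption holds, a single set of pairwise weights is about
69 MiB, and every additional copy (per split, per RFE step) would
add the same amount again.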

Should I disable some of RFE's states? The history state is needed,
and errors and nfeatures shouldn't be too big...
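
A quick sketch of why those two states should stay small: it counts
the elimination steps when half of the features are discarded each
time. It assumes that the number of discarded features is rounded
down at every step and that elimination runs all the way to a single
feature; the actual RFE run may stop earlier, depending on its
stopping criterion. The function name is made up for illustration.

```python
# Feature counts seen by RFE when a fixed fraction is discarded
# per step.
# Assumptions: the discard count is floor(fraction * n), and the
# loop runs until one feature is left (an upper bound on the steps).

def rfe_feature_counts(n_features, fraction=0.5):
    """Return the list of feature counts, starting with n_features."""
    counts = [n_features]
    while counts[-1] > 1:
        n_discard = int(counts[-1] * fraction)   # floor for positive values
        if n_discard == 0:                       # nothing left to discard
            break
        counts.append(counts[-1] - n_discard)
    return counts

counts = rfe_feature_counts(66500, fraction=0.5)
n_steps = len(counts) - 1

print("feature counts: %s" % counts)
print("elimination steps: %d" % n_steps)
```

Under these assumptions there are at most 17 elimination steps, so
per-step states such as errors and nfeatures would hold at most 18
numbers each.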

Thanks in advance, greetings,

Thorsten Kranz