[pymvpa] FeatureSelectionClassifier (in RFE) occasionally returns full features set

Mon Apr 27 21:48:26 UTC 2009

>    I see. The error rate was the best with the full set of features, so no
>    features were selected.
yep... but not only that -- you haven't done actual RFE, you have just
selected 30 features right away on the first step -- please see my
previous email

>    the selection of fixed number of features using RFE. More specifically:
>    1. I would like to get 30 features, based on which I get the best
>    prediction.
ah... well... that is way too difficult a task ;) the search space is a
choice of 30 out of 3022 ;)  some heuristics (RFE, IFS) might get close
to that set (which we don't know) but do not guarantee it
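
To give a feeling for the size of that search space, here is a quick
count of the possible subsets. It is plain Python (3.8+ for math.comb),
nothing PyMVPA-specific; the variable names are just for illustration.

```python
import math

n_features = 3022   # full feature set size mentioned above
subset_size = 30    # desired number of features

# Number of distinct 30-feature subsets an exhaustive search would face.
n_subsets = math.comb(n_features, subset_size)
print(f"{n_subsets:.3e} candidate subsets")
```

The result has over 70 decimal digits, so training one classifier per
subset is out of the question.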

> I don't care that with 31 features (or 3022) I will get a
>    better prediction. Isn't your graph Fig.1 in "Full Brain
>    Classification: There Is No 'Face' Identification Area" paper was the
>    result of such analysis?
nope -- we ran RFE to improve the prediction/generalization power, since
it was sometimes quite bad when done full-brain.

> In addition, attached the output which I get
>    from classification of another dataset, which resulted in desired 30
>    features. However, I am confused to understand what I see there. If I
>    read out it correctly, starting from step 1, one subset of 30 features
>    was selected and classified all the time. What about the other possible
>    subsets?
as I mentioned above -- there is no way to explore (in a reasonable
time-span) all subsets of 30 features out of 2000.

and by jumping immediately to 30 features you haven't done RFE per se --
you have just run a FeatureSelectionClassifier based on the weights of
the SVM
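
To make the difference concrete, here is a minimal pure-Python sketch.
It is not PyMVPA code: a perceptron stands in for the SVM as the source
of linear weights, and the names train_weights, one_shot_select and
rfe_select are made up for this illustration. One-shot selection ranks
the features once on the full set; RFE retrains after every elimination
step, so the ranking can change as features drop out.

```python
def train_weights(X, y, features, epochs=20, lr=0.1):
    """Perceptron weights restricted to `features` (stand-in for SVM weights)."""
    w = {f: 0.0 for f in features}
    for _ in range(epochs):
        for row, label in zip(X, y):          # label is +1 or -1
            score = sum(w[f] * row[f] for f in features)
            if label * score <= 0:            # misclassified: nudge the weights
                for f in features:
                    w[f] += lr * label * row[f]
    return w


def one_shot_select(X, y, k):
    """Rank all features once by |weight| and keep the top k."""
    features = list(range(len(X[0])))
    w = train_weights(X, y, features)
    ranked = sorted(features, key=lambda f: abs(w[f]), reverse=True)
    return sorted(ranked[:k])


def rfe_select(X, y, k, drop_fraction=0.2):
    """Retrain, drop the weakest fraction, repeat until k features remain."""
    features = list(range(len(X[0])))
    while len(features) > k:
        w = train_weights(X, y, features)
        n_drop = max(1, int(len(features) * drop_fraction))
        n_drop = min(n_drop, len(features) - k)   # never go below k
        ranked = sorted(features, key=lambda f: abs(w[f]))  # weakest first
        features = sorted(ranked[n_drop:])
    return features
```

Both functions return k feature indices, and neither guarantees the
best possible subset; the two can disagree because RFE re-estimates the
weights on every reduced feature set.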

> How RFE knows that it is the best one?

it doesn't... if you look at the docs of RFE, there is a parameter
'bestdetector' which defines when to stop (by default, iirc, once the
minimal error has not improved within 10 steps)
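
The idea behind such a stopping rule can be sketched in a few lines of
plain Python. This is only an illustration of "stop once the minimum
has not improved for N steps", not the PyMVPA implementation; the
function name and the patience argument are hypothetical, and the
default of 10 just mirrors the number recalled above.

```python
def best_step_with_patience(errors, patience=10):
    """Return the index of the RFE step to keep.

    Scans the per-step errors in order and stops as soon as `patience`
    steps have passed since the last new minimum.
    """
    best_idx = 0
    for i, err in enumerate(errors):
        if err < errors[best_idx]:
            best_idx = i                  # new minimum: remember this step
        elif i - best_idx >= patience:
            break                         # no improvement for too long
    return best_idx
```

Note that a short patience can stop before a later, lower minimum is
reached, so the chosen step is "best seen so far", not a global optimum.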

> It just picked the best
>    ranks from original 3022?
yeap

> I am not sure that it is very optimal. 
probably not optimal at all ;) who knows ;)

> If you
>    have some working example of correct / optimal RFE usage, I would very
>    appreciate you sending me.
hm... the one in the documentation should be fine I guess, although it
is actually a bit different from the original RFE by Guyon... there she
did not actually do a proper unbiased selection of the stopping point, etc.

I would just recommend resorting to some other feature selection
strategy, or using a classifier with inherent sparse regularization
(feature selection) like SMLR... you can also easily train an SVM on
the features selected by SMLR (I think there is an example of that
somewhere in the documentation)

>    2. Unfortunately, even after reading Guyon 2002, I feel that I don't
>    fully understand RFE algorithm. Particularly, what is the size of the
>    original features subset, that algorithm starts with?
arbitrary size

> Does it really
>    start with full features set, although for 1000 voxels it is an evident
>    overfitting?
yes

SVM and others are quite well regularized one way or another, so they
might perform nicely even with a limited number of samples and large
input dimensionality; so it might not overfit that badly even on a
full-brain dataset... it all depends on the data

> The solution with 3022 voxels, which I got, is not going
>    to generalize well (given that I have 480 trials only), what is a
>    benefit from such a solution?
see above ;) you might be lucky and generalize nicely

have you tried SMLR as well?

> Any reference, which will clarify me
>    all those issues are more than welcomed.
I hope that I've clarified them to some extent here... not sure what
could be the best reference on feature selection in fMRI -- there is no
gold-standard ground truth yet ;)

>    I really consider using PyMVPA, because I was impressed by robustness
>    of this software.
nice to hear that it doesn't crash on you too often ;)

> However, although your doc is well written and
>    organized, I am still got stuck in some places.
contributions are welcome to tune up the documentation to be even more
verbose and comprehensible. There is no stopping point in perfecting
anything ;)
-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student  Ph.D. @ CS Dept. NJIT
Office: (973) 353-1412 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW:     http://www.linkedin.com/in/yarik