[pymvpa] FeatureSelectionClassifier (in RFE) occasionally returns full features set
Yaroslav Halchenko
debian at onerussian.com
Sun Apr 26 01:04:05 UTC 2009
actually I should have discovered the problem before asking you to
upload the data...
in your code you use
N_FEATURES = 30
...
feature_selector=FixedNElementTailSelector(N_FEATURES,
tail='upper', mode='select'),
so you aren't doing RFE per se ;) You select 30 features right at the
first step of RFE. Those 30 features then give a higher generalization
error than the full set, so the initial dataset with all features is
returned as the result.
To see this, you just had to enable the RFE debug target (or all RFE ones)
with
debug.active += ['RFE.*']
which shows what is happening:
In [12]:## working on region in file /tmp/python-8102meB.py...
[RFEC ] DBG: Step 0: nfeatures=3022
[RFEC ] DBG: Step 0: nfeatures=3022 error=0.2125 best/stop=1/0
[RFEC_] DBG: Sensitivity: [-0.00507313 0.00025722 0.00159871 ..., -0.00212875 0.00078268
-0.00027174], nfeatures_selected=30, selected_ids: [ 120 338 341 356 462 472 483 501 517 571 573 574 594 612 619
634 635 636 659 676 677 760 778 779 796 872 1109 1338 1545 1677]
[RFEC ] DBG: Step 1: nfeatures=30
[RFEC ] DBG: Step 1: nfeatures=30 error=0.2500 best/stop=0/0
[RFEC_] DBG: Sensitivity: [ 0.09779742 0.16359045 0.02775154 0.09486282 -0.0804099 -0.04392221
-0.06721182 0.09752928 0.03872871 0.08811431 0.14541801 0.13167303
0.13925132 0.03046704 0.04748648 0.09525846 -0.04226041 0.06917038
0.03207438 0.06333298 0.01423283 0.02703152 0.16574083 0.05634531
0.11383484 0.03402658 0.07105218 -0.02116503 0.24369252 0.20591227], nfeatures_selected=30, selected_ids: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29]
[RFEC ] DBG: Step 2: nfeatures=30
[RFEC ] DBG: Step 2: nfeatures=30 error=0.2500 best/stop=0/0
[RFEC_] DBG: Sensitivity: [ 0.09779742 0.16359045 0.02775154 0.09486282 -0.0804099 -0.04392221
-0.06721182 0.09752928 0.03872871 0.08811431 0.14541801 0.13167303
0.13925132 0.03046704 0.04748648 0.09525846 -0.04226041 0.06917038
0.03207438 0.06333298 0.01423283 0.02703152 0.16574083 0.05634531
0.11383484 0.03402658 0.07105218 -0.02116503 0.24369252 0.20591227], nfeatures_selected=30, selected_ids: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29]
[RFEC ] DBG: Step 3: nfeatures=30
[RFEC ] DBG: Step 3: nfeatures=30 error=0.2500 best/stop=0/0
[RFEC_] DBG: Sensitivity: [ 0.09779742 0.16359045 0.02775154 0.09486282 -0.0804099 -0.04392221
-0.06721182 0.09752928 0.03872871 0.08811431 0.14541801 0.13167303
0.13925132 0.03046704 0.04748648 0.09525846 -0.04226041 0.06917038
0.03207438 0.06333298 0.01423283 0.02703152 0.16574083 0.05634531
0.11383484 0.03402658 0.07105218 -0.02116503 0.24369252 0.20591227], nfeatures_selected=30, selected_ids: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29]
....
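The bookkeeping visible in the log above can be reduced to a small toy model. This is only a sketch in plain Python, not PyMVPA code: the step/nfeatures/error triples are copied from the debug output, while the helper name pick_best and the rule that the earliest step wins ties are assumptions made for illustration.

```python
# Toy model of the selection rule suggested by the debug log; not PyMVPA code.
# Each entry is (step, nfeatures, error) as printed by the RFE debug target.
history = [
    (0, 3022, 0.2125),
    (1, 30, 0.2500),
    (2, 30, 0.2500),
    (3, 30, 0.2500),
]


def pick_best(history):
    """Return the entry with the lowest error (hypothetical helper).

    Assumption: on ties the earliest step is kept.
    """
    best = history[0]
    for entry in history[1:]:
        if entry[2] < best[2]:
            best = entry
    return best


best_step, best_nfeatures, best_error = pick_best(history)
print(best_step, best_nfeatures, best_error)
```

Since the 30-feature subset never beats the error of step 0, the full 3022-feature set stays the "best" one, which matches the behaviour reported in the original question.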
See the original RFE definition for how to actually do RFE ;) Or just try
SMLR, which might be more efficient, who knows ;)
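For contrast with jumping to 30 features in one step, here is a minimal sketch of gradual elimination, where only a fraction of the least sensitive features is discarded per step and the best-scoring subset is remembered. It is plain Python under stated assumptions, not the PyMVPA implementation: rfe_sketch, sensitivity_fn, error_fn, the 20% fraction and the toy data are all hypothetical names and values chosen for illustration.

```python
def rfe_sketch(n_features, sensitivity_fn, error_fn, fraction=0.2, min_features=1):
    """Gradual elimination sketch (hypothetical helper, not the PyMVPA API).

    Each step discards the `fraction` of remaining features with the lowest
    absolute sensitivity and keeps track of the subset with the lowest error.
    """
    selected = list(range(n_features))
    best_ids, best_error = list(selected), error_fn(selected)
    while len(selected) > min_features:
        sens = sensitivity_fn(selected)  # one value per remaining feature
        n_drop = max(1, int(len(selected) * fraction))
        n_drop = min(n_drop, len(selected) - min_features)
        # positions ordered by absolute sensitivity, lowest first
        order = sorted(range(len(selected)), key=lambda i: abs(sens[i]))
        drop = set(order[:n_drop])
        selected = [f for i, f in enumerate(selected) if i not in drop]
        err = error_fn(selected)
        if err <= best_error:  # assumption: prefer the smaller subset on ties
            best_ids, best_error = list(selected), err
    return best_ids, best_error


# Toy problem (made-up numbers): features 0..4 are informative, the rest noise.
INFORMATIVE = set(range(5))


def toy_sensitivity(ids):
    return [1.0 if f in INFORMATIVE else 0.001 * (f % 10 + 1) for f in ids]


def toy_error(ids):
    n_info = sum(1 for f in ids if f in INFORMATIVE)
    n_noise = len(ids) - n_info
    return 0.5 - 0.08 * n_info + 0.004 * n_noise


best_ids, best_error = rfe_sketch(50, toy_sensitivity, toy_error)
print(sorted(best_ids), round(best_error, 4))
```

In this toy setup the error keeps improving while noise features are removed and gets worse once informative ones start to go, so the remembered subset ends up smaller than the full set. With real data there is of course no guarantee of that.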
On Sat, 25 Apr 2009, Yaroslav Halchenko wrote:
> at first I thought that I know what is the reason, but then I realized
> that it shouldn't be... didn't test though. to expedite things would you
> mind uploading your data + code to the address I will provide you in a
> followup email? ;)
> On Sat, 25 Apr 2009, Vadim Axel wrote:
> > Hi,
> > I implemented some simple RFE logic, similar to what was described
> > here: [1]http://www.pymvpa.org/featsel.html
> > At the end of the classification procedure, I verify the features
> > that were selected based on what was described here:
> > [2]http://www.pymvpa.org/faq.html#how-do-i-know-which-features-were-finally-selected-by-a-classifier-doing-feature-selection
> > Now the problem: sometimes the resulting number of selected features is
> > exactly the number requested (I use FixedNElementTailSelector),
> > whereas in other cases, for a completely unknown reason, I get the full
> > set of features. The issue is really weird, since for two sessions of
> > a subject I get the selected feature set, but for two other sessions of
> > the same subject I get the full feature set. I suspect that the problem
> > might be in updating the feature_ids variable and not in classification,
> > because the classification error rate was pretty low.
> > My code is attached. Is there any problem with it?
> > I can also upload my dataset (~50 Mb zip). I didn't succeed to
> > reproduce it with smaller amount of data.
> > Thanks for your help,
> > Vadim
> > Links
> > 1. http://www.pymvpa.org/featsel.html
> > 2. http://www.pymvpa.org/faq.html#how-do-i-know-which-features-were-finally-selected-by-a-classifier-doing-feature-selection
--
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student Ph.D. @ CS Dept. NJIT
Office: (973) 353-1412 | FWD: 82823 | Fax: (973) 353-1171
101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW: http://www.linkedin.com/in/yarik