[pymvpa] RFE and dataset splits

Wed Jun 22 18:11:02 UTC 2011

Hi All,
Has anyone had experience with RFE in PyMVPA 0.6.x? I'm still trying to
figure out RFE, and it seems I must be missing something. Here is part of
the debug output while RFE is in progress:

[RFEC] DBG:            Step 0: nfeatures=135168
[RFEC] DBG:            Step 0: nfeatures=135168 error=0.5000 best/stop=1/0
[RFEC] DBG:            Step 1: nfeatures=67584
[RFEC] DBG:            Step 1: nfeatures=67584 error=0.5000 best/stop=0/0
[RFEC] DBG:            Step 2: nfeatures=33792
[RFEC] DBG:            Step 2: nfeatures=33792 error=0.5000 best/stop=0/0
[RFEC] DBG:            Step 3: nfeatures=16896
[RFEC] DBG:            Step 3: nfeatures=16896 error=0.5000 best/stop=0/0
[RFEC] DBG:            Step 4: nfeatures=8448
[RFEC] DBG:            Step 4: nfeatures=8448 error=0.5000 best/stop=0/0
[RFEC] DBG:            Step 5: nfeatures=4224
[RFEC] DBG:            Step 5: nfeatures=4224 error=0.5000 best/stop=0/0
[RFEC] DBG:            Step 6: nfeatures=2112
[RFEC] DBG:            Step 6: nfeatures=2112 error=0.5000 best/stop=0/0
[RFEC] DBG:            Step 7: nfeatures=1056
[RFEC] DBG:            Step 7: nfeatures=1056 error=0.5000 best/stop=0/0
[RFEC] DBG:            Step 8: nfeatures=528
[RFEC] DBG:            Step 8: nfeatures=528 error=0.5000 best/stop=0/0
[RFEC] DBG:            Step 9: nfeatures=264
[RFEC] DBG:            Step 9: nfeatures=264 error=0.5000 best/stop=0/0
[RFEC] DBG:            Step 10: nfeatures=132
[RFEC] DBG:            Step 10: nfeatures=132 error=0.5000 best/stop=0/1
...this goes on 24 times (I have 24 runs/chunks, each with two targets). The
main thing I am confused about is why the error would be 0.5 every time.
Shouldn't it sometimes get both targets right or both wrong (i.e., an error
of 0 or 1)?
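
To make my expectation concrete, here is a minimal sketch in plain Python
(not PyMVPA code). It assumes each held-out chunk contains exactly two
samples, one per target; the labels 'A' and 'B' and the helper
mismatch_error are made up for illustration.

```python
import itertools


def mismatch_error(predicted, targets):
    """Fraction of predictions that differ from the true targets."""
    mismatches = sum(p != t for p, t in zip(predicted, targets))
    return mismatches / float(len(targets))


# Hypothetical labels: one held-out chunk with one sample per target.
targets = ('A', 'B')

# Enumerate every pair of predictions a binary classifier could make
# and record the resulting error for each pair.
possible_errors = {}
for predicted in itertools.product(targets, repeat=2):
    possible_errors[predicted] = mismatch_error(predicted, targets)
```

Under these assumptions the only possible values are 0, 0.5 and 1, and 0.5
occurs exactly when both test samples receive the same predicted label.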

Perhaps the code might help?

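from mvpa.suite import *  # PyMVPA 0.6.x namespace
import numpy as np

rfesvm_split = LinearCSVMC()
debug.active = ['RFEC']  # print per-step RFE debug messages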
fs = \
    RFE(rfesvm_split.get_sensitivity_analyzer(),
        ProxyMeasure(rfesvm_split,
                postproc=BinaryFxNode(mean_mismatch_error, 'targets')),
        Splitter('chunks'),
        fselector=FractionTailSelector(
            0.50,
            mode='select', tail='upper'),
        stopping_criterion=NBackHistoryStopCrit(BestDetector(), 10),
        update_sensitivity=True)

# Train the final classifier on the features selected via RFE.
# (update_sensitivity=True above: update sensitivity at each step, since
# we're not using the same CLF as sensitivity analyzer)
clf = FeatureSelectionClassifier(
    LinearCSVMC(),
    fs)

#cv = SplitClassifier(clf)
# Note: this errorfx returns the fraction of matching predictions
# (accuracy), not the mismatch error.
cvte = CrossValidation(clf, NFoldPartitioner(),
    errorfx=lambda p, t: np.mean(p == t),
    postproc=mean_sample(),
    enable_ca=['confusion', 'stats'])
cv_results = cvte(avgds)
print np.mean(cv_results)
print cvte.ca.stats.matrix
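
For reference, the feature counts in the debug output above are consistent
with keeping the upper 50% of features at each step. The following is a
minimal sketch in plain Python, not PyMVPA's implementation: it assumes the
count is simply multiplied by the fraction and truncated at every step, and
halving_schedule is a hypothetical helper name.

```python
def halving_schedule(nfeatures, fraction=0.5, nsteps=10):
    """Feature counts for nsteps rounds of keeping `fraction` of features."""
    counts = [nfeatures]
    for _ in range(nsteps):
        # Assumption: the selector keeps int(count * fraction) features.
        counts.append(int(counts[-1] * fraction))
    return counts


# Starting value taken from "Step 0: nfeatures=135168" in the log.
schedule = halving_schedule(135168)
```

With these assumptions the schedule runs from 135168 at step 0 down to 132
at step 10, which matches the nfeatures values in the log.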

I would greatly appreciate any ideas! Thank you!

Kimberly Zhou