[pymvpa] RFE and dataset splits

Thu Jul 7 15:04:08 UTC 2011

just to catch up on things:  

 cons: indeed, the recipe was not correctly adapted from 0.4.x
 pros: no changes needed in PyMVPA itself

see attached a complete script with a few ways of doing RFE, depending on what
you take for the sensitivity and for the error used to decide when to stop.  I
was also going to implement a "biased" version, close to the original RFE
version from Guyon, where the stopping point is chosen by having a sneak peek
at the cross-validation held-out set.  That would have negative consequences
for significance testing, which is a cornerstone of machine learning in
neuroimaging... so I haven't done that -- only truly unbiased, although
possibly slightly overfitting, versions are available ;-)
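
to make the "unbiased" idea concrete, here is a minimal sketch in plain Python.
It is NOT the attached dataset_rfe.py and does not use PyMVPA.  The toy data,
the nearest-class-mean classifier, the mean-difference "sensitivity" and all
function names are assumptions made up for illustration.  The only point is
that the stopping point is picked on a split of the training portion, so the
held-out samples are touched exactly once, at the very end:

```python
import random

def make_data(n_samples=80, n_features=30, n_informative=5, seed=0):
    """Toy two-class data: only the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 1.0 if label else -1.0
        X.append(row)
        y.append(label)
    return X, y

def class_means(X, y, feats):
    """Per-class mean of every feature listed in feats."""
    means = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = {f: sum(r[f] for r in rows) / len(rows) for f in feats}
    return means

def error(means, X, y, feats):
    """Error rate of a nearest-class-mean classifier restricted to feats."""
    wrong = 0
    for x, t in zip(X, y):
        dist = {c: sum((x[f] - m[f]) ** 2 for f in feats)
                for c, m in means.items()}
        wrong += (min(dist, key=dist.get) != t)
    return wrong / len(y)

def rfe(X_tr, y_tr, X_val, y_val, drop_fraction=0.2):
    """Drop the least sensitive features step by step and return the
    feature set with the lowest error on the inner validation split."""
    feats = list(range(len(X_tr[0])))
    best_feats, best_err = list(feats), float("inf")
    while True:
        means = class_means(X_tr, y_tr, feats)
        err = error(means, X_val, y_val, feats)
        if err <= best_err:  # ties favour the smaller feature set
            best_feats, best_err = list(feats), err
        if len(feats) == 1:
            break
        # stand-in sensitivity: absolute difference between class means
        ranked = sorted(feats, key=lambda f: abs(means[1][f] - means[0][f]))
        n_drop = max(1, int(len(feats) * drop_fraction))
        feats = sorted(ranked[n_drop:])
    return best_feats, best_err

X, y = make_data()
# outer split: the last 20 samples are held out and never seen by RFE
X_train, y_train, X_test, y_test = X[:60], y[:60], X[60:], y[60:]
# inner split of the training portion only, used to pick the stopping point
selected, inner_err = rfe(X_train[:40], y_train[:40],
                          X_train[40:], y_train[40:])
# refit on the full training portion with the chosen features, test once
final_means = class_means(X_train, y_train, selected)
test_err = error(final_means, X_test, y_test, selected)
print("kept %d features, held-out error %.2f" % (len(selected), test_err))
```

the "biased" variant would instead pass X_test/y_test as the validation split
to rfe(), which is exactly the sneak peek that invalidates significance
testing on the reported error.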

we have yet to update the webpage describing RFE in 0.6.x

on the dataset you provided, here is what I got with the script (d'oh -- I
forgot that zscoring was commented out):

SMLR: 58.33% ACC
SVM: 64.58% ACC
RFE (overfit): 66.67% ACC
RFE (overfit less?): 66.67% ACC
RFE (half/splits+rank order): 64.58% ACC

altogether -- the RFE results are comparable with SMLR, and I bet that with a
simple nested cross-validation to choose the lm parameter for SMLR you might
get better results at a lower computing cost
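
for reference, nested cross-validation over a penalty could look roughly like
the sketch below.  Again plain Python, not PyMVPA: the classifier is a
soft-thresholded class-mean difference used as a hypothetical sparse stand-in
(it is not SMLR), and the toy data, the lm grid and all function names are
assumptions for illustration only.  The inner loop picks lm using the outer
training portion alone; the outer test fold is used only for the final score:

```python
import random

def make_data(n_samples=60, n_features=20, n_informative=4, seed=1):
    """Toy two-class data: only the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 1.0 if label else -1.0
        X.append(row)
        y.append(label)
    return X, y

def fit(X, y, lm):
    """Sparse linear stand-in: soft-threshold the class-mean difference by lm."""
    n_feat = len(X[0])
    mean = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        mean[label] = [sum(r[f] for r in rows) / len(rows)
                       for f in range(n_feat)]
    diff = [m1 - m0 for m0, m1 in zip(mean[0], mean[1])]
    # a larger lm zeroes more weights, i.e. gives a sparser model
    w = [max(abs(d) - lm, 0.0) * (1 if d > 0 else -1) for d in diff]
    mid = [(m0 + m1) / 2.0 for m0, m1 in zip(mean[0], mean[1])]
    return w, mid

def accuracy(model, X, y):
    """Fraction of samples whose sign of the linear score matches the label."""
    w, mid = model
    hits = 0
    for x, t in zip(X, y):
        score = sum(wi * (xi - mi) for wi, xi, mi in zip(w, x, mid))
        hits += (int(score > 0) == t)
    return hits / len(y)

def folds(n, k):
    """Yield (train_idx, test_idx) for k interleaved folds."""
    for i in range(k):
        yield ([j for j in range(n) if j % k != i],
               [j for j in range(n) if j % k == i])

def mean_cv_accuracy(X, y, lm, k):
    """Average accuracy of the stand-in classifier over k folds."""
    accs = []
    for tr, te in folds(len(y), k):
        model = fit([X[j] for j in tr], [y[j] for j in tr], lm)
        accs.append(accuracy(model, [X[j] for j in te], [y[j] for j in te]))
    return sum(accs) / len(accs)

X, y = make_data()
candidates = [0.0, 0.25, 0.5, 1.0]  # hypothetical grid of penalties
outer_accs, chosen = [], []
for tr, te in folds(len(y), 5):
    X_tr, y_tr = [X[j] for j in tr], [y[j] for j in tr]
    # inner CV sees only the outer training portion
    best_lm = max(candidates,
                  key=lambda lm: mean_cv_accuracy(X_tr, y_tr, lm, 3))
    model = fit(X_tr, y_tr, best_lm)
    outer_accs.append(accuracy(model, [X[j] for j in te], [y[j] for j in te]))
    chosen.append(best_lm)
print("chosen lm per fold:", chosen)
print("nested-CV accuracy: %.2f" % (sum(outer_accs) / len(outer_accs)))
```

each candidate costs one cheap fit per inner fold, with no iterative
elimination loop, which is where the lower computing cost would come from.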

On Wed, 06 Jul 2011, Yaroslav Halchenko wrote:

> neh -- that was a good call from me but still not it -- just read it
> nevertheless ;)

-- 
=------------------------------------------------------------------=
Keep in touch                                     www.onerussian.com
Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dataset_rfe.py
Type: text/x-python
Size: 12162 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/pkg-exppsy-pymvpa/attachments/20110707/692e692a/attachment.py>