[pymvpa] RFE memory issue

Fri Nov 19 14:09:12 UTC 2010

Welcome Thorsten,

On Fri, 19 Nov 2010, Thorsten Kranz wrote:
>         rfesvm_split = SplitClassifier(LinearCSVMC())
>         >...<
> on a dataset with 66500 features:
>         Dataset / float64 379 x 66500 uniq: 379 chunks 17 labels

The reason is that SplitClassifier uses NFoldSplitter by default, and you
have 379 chunks... Also, by its nature, SplitClassifier constructs and
keeps all 379 of those classifiers (1 per split) in memory... so you can do
the counting ;-)

workarounds:
- reduce the number of chunks by grouping trials together
- use OddEvenSplitter or HalfSplitter within rfesvm_split instead of
  NFoldSplitter
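
A minimal sketch of the first workaround in plain Python, just to show the
idea of merging many chunk labels into a few coarser ones. The function name
coarsen_chunks and the choice of 10 groups are hypothetical and not part of
PyMVPA; it assumes the chunk labels are sortable and that neighbouring
chunks may be merged:

```python
from collections import Counter


def coarsen_chunks(chunks, n_groups):
    """Map each original chunk label onto one of n_groups coarser labels.

    Hypothetical helper (not a PyMVPA function): consecutive original
    chunks are assigned to the same coarse group.
    """
    unique = sorted(set(chunks))
    # Integer arithmetic spreads the original chunks evenly over the groups.
    mapping = {c: (i * n_groups) // len(unique) for i, c in enumerate(unique)}
    return [mapping[c] for c in chunks]


# One sample per chunk, as in the dataset summary quoted above.
chunks = list(range(379))
coarse = coarsen_chunks(chunks, 10)

# Number of distinct chunks before and after, and the resulting group sizes.
n_before = len(set(chunks))
n_after = len(set(coarse))
group_sizes = Counter(coarse)
print(n_before, "->", n_after)
```

With 10 coarse chunks a leave-one-chunk-out scheme would need 10 splits
instead of 379, so correspondingly fewer classifiers would be kept around.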

also

- why not give SMLR a try?
  at least it should be much faster than a "correctly" RFE-ed SVM like the
  one you are trying to do... you might also like to use SMLR just for
  feature selection prior to using the SVM (with pairwise voting),
  effectively replacing RFE with SMLR as the feature selection step

> Is the number of labels (17) the problem?
not directly, although it contributes: SVM does pairwise classification,
and thus effectively constructs 17*16/2 = 136 binary classifiers in each split
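
To make the counting concrete, here is a rough back-of-the-envelope sketch
using only the numbers from the dataset summary quoted above. It assumes one
split per chunk (leave-one-chunk-out) and one binary classifier per pair of
labels; it does not try to estimate how much memory a single trained
classifier actually takes, since that depends on the implementation:

```python
# Numbers taken from the quoted dataset summary.
n_samples = 379
n_features = 66500
n_chunks = 379
n_labels = 17

# Leave-one-chunk-out: one split (and one kept classifier) per chunk.
n_splits_nfold = n_chunks

# Pairwise (one-vs-one) multiclass: one binary classifier per label pair.
n_pairs = n_labels * (n_labels - 1) // 2

# Binary classifiers held in memory across all splits.
total_binary_nfold = n_splits_nfold * n_pairs

# Same count if only two splits are used (e.g. an odd/even scheme).
total_binary_two_splits = 2 * n_pairs

# Size of the raw float64 data matrix itself.
dataset_bytes = n_samples * n_features * 8

print("pairs per split:", n_pairs)
print("binary classifiers, 379 splits:", total_binary_nfold)
print("binary classifiers, 2 splits:", total_binary_two_splits)
print("dataset size: %.1f MiB" % (dataset_bytes / 2.0 ** 20))
```

So the data alone is roughly 192 MiB, and the default setup ends up holding
51544 binary classifiers, versus 272 with a two-split scheme.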

> history is needed, errors and
> nfeatures shouldn't be too big...
yep

-- 
                                  .-.
=------------------------------   /v\  ----------------------------=
Keep in touch                    // \\     (yoh@|www.)onerussian.com
Yaroslav Halchenko              /(   )\               ICQ#: 60653192
                   Linux User    ^^-^^    [175555]