[pymvpa] RFE memory issue
Yaroslav Halchenko
debian at onerussian.com
Fri Nov 19 14:09:12 UTC 2010
Welcome Thorsten,
On Fri, 19 Nov 2010, Thorsten Kranz wrote:
> rfesvm_split = SplitClassifier(LinearCSVMC())
> >...<
> on a dataset with 66500 features:
> Dataset / float64 379 x 66500 uniq: 379 chunks 17 labels
The reason is: SplitClassifier by default uses NFoldSplitter, and you
have 379 chunks... Also, due to its nature, SplitClassifier constructs and
keeps all those 379 classifiers (1 per split) in memory... so you can do
the counting ;-)
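To make the counting concrete, here is a back-of-the-envelope estimate in
Python. It is only a sketch: it assumes the worst case, in which every
per-split classifier ends up holding a full copy of its training data (e.g.
all samples become support vectors). That assumption is mine, not a measured
figure, and RFE's own per-step copies would come on top of it.

```python
# Rough memory estimate for SplitClassifier + NFoldSplitter on the dataset
# from the question: 379 samples x 66500 float64 features, 379 chunks.
n_samples = 379
n_features = 66500
bytes_per_value = 8  # float64
n_chunks = 379

dataset_bytes = n_samples * n_features * bytes_per_value

# Leave-one-chunk-out yields one split (hence one classifier) per chunk.
n_splits = n_chunks
# One sample per chunk here, so each split trains on all but one sample.
train_samples_per_split = n_samples - 1

# ASSUMPTION (worst case): each per-split classifier retains its whole
# training set, e.g. because every sample ends up as a support vector.
per_split_bytes = train_samples_per_split * n_features * bytes_per_value
total_bytes = n_splits * per_split_bytes

GiB = 1024 ** 3
print(f"dataset: {dataset_bytes / GiB:.2f} GiB")
print(f"worst case over {n_splits} splits: {total_bytes / GiB:.1f} GiB")
```

Even if the real per-classifier footprint is much smaller than this bound,
multiplying it by 379 is what makes NFold on this many chunks expensive.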
workarounds:
- reduce the number of chunks by grouping trials together
- use OddEvenSplitter or HalfSplitter within rfesvm_split instead of NFoldSplitter
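Both workarounds boil down to fewer splits, hence fewer classifiers kept in
memory. A minimal plain-Python sketch of the first one (merging neighbouring
chunk ids into a few coarser chunks) and of the resulting split counts;
`group_chunks` and `n_splits` are hypothetical helpers for illustration, not
PyMVPA functions, and the "two splits" for the odd/even and half splitters
reflects my understanding of those splitters, so treat it as an assumption.

```python
def group_chunks(chunks, n_groups):
    """Hypothetical helper: map each chunk id onto one of n_groups coarser chunks.

    Consecutive chunk ids (in sorted order) are merged, which assumes that
    neighbouring ids are sensible to pool (e.g. trials adjacent in time).
    """
    uniq = sorted(set(chunks))
    if not 1 <= n_groups <= len(uniq):
        raise ValueError("n_groups must be between 1 and the number of chunks")
    rank = {c: i for i, c in enumerate(uniq)}
    # Integer scaling spreads the sorted chunk ids evenly over n_groups bins.
    return [rank[c] * n_groups // len(uniq) for c in chunks]


def n_splits(n_chunks, splitter):
    """Hypothetical helper: number of splits, i.e. classifiers SplitClassifier keeps."""
    if splitter == "nfold":
        return n_chunks  # leave one chunk out per split
    if splitter in ("oddeven", "half"):
        return 2  # ASSUMPTION: two complementary splits
    raise ValueError(f"unknown splitter: {splitter}")


chunks = list(range(379))  # one chunk per trial, as in the question
coarse = group_chunks(chunks, 10)
print(n_splits(len(set(chunks)), "nfold"), n_splits(len(set(coarse)), "nfold"))
```

Grouping by position is the simplest choice; grouping by run or session, if
that information is available, would usually be the more meaningful one.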
also
- why not give SMLR a try?
at least it should be much faster than a "correctly" RFEed SVM like the one you
are trying to do.... you might also like to use SMLR just for the sake
of feature selection prior to using SVM (pairwise, with voting), effectively
replacing RFE with SMLR as the feature selection step
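A minimal sketch of the "sparse multinomial model for feature selection, then
a pairwise SVM" idea. Note that it uses scikit-learn, not PyMVPA: an
L1-penalized multinomial LogisticRegression stands in for SMLR (a related
sparse logistic model, not the SMLR implementation), and the synthetic data
and the value of C are arbitrary choices for illustration only.

```python
# Sketch only: scikit-learn stand-ins, not PyMVPA's SMLR / LinearCSVMC.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: many features, only a few of them informative.
X, y = make_classification(n_samples=200, n_features=500, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

pipe = make_pipeline(
    StandardScaler(),
    # Sparse (L1) multinomial logistic regression; features whose weights
    # are (near) zero across all classes are dropped.
    SelectFromModel(LogisticRegression(penalty="l1", solver="saga", C=0.1,
                                       max_iter=5000)),
    # SVC handles multiple classes one-vs-one, i.e. pairwise with voting.
    SVC(kernel="linear"),
)
pipe.fit(X, y)

n_selected = pipe.named_steps["selectfrommodel"].get_support().sum()
print(f"kept {n_selected} of {X.shape[1]} features")
```

The sparse model is fit once per training set, instead of retraining the SVM
at every RFE step, which is where the speed-up would come from.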
> Is the number of labels (17) the problem?
not directly, although it contributes: since SVM does pairwise
classification, it effectively constructs 17*16/2 = 136 binary classifiers
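The pairwise count, and how it multiplies with the number of splits, in a few
lines of Python. This assumes a one-vs-one multiclass SVM as described above
and the 379 NFold splits from the question.

```python
from itertools import combinations

labels = list(range(17))  # the 17 labels from the question
# One binary SVM per unordered pair of labels (one-vs-one).
n_binary = len(list(combinations(labels, 2)))
assert n_binary == len(labels) * (len(labels) - 1) // 2

n_splits = 379  # NFoldSplitter over 379 chunks
print(n_binary, n_binary * n_splits)  # prints: 136 51544
```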
> history is needed, errors and
> nfeatures shouldn't be too big...
yeap
More information about the Pkg-ExpPsy-PyMVPA
mailing list