[pymvpa] PCA transformation prior to SVM classification

Mon Nov 29 18:27:05 UTC 2010

Hi Jakob!

Definitely! Many classifiers perform poorly when the input is too
high-dimensional.

It is hard to say where the critical limits are, but in my experience,
e.g. for SVM, a few hundred features may be fine while a few thousand
are not. In such cases you should definitely do feature selection.
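
To make the idea of selecting features before classification concrete,
here is a minimal, dependency-free Python sketch. It is not PyMVPA code:
the names make_data, class_means, select_top_k and accuracy are
hypothetical, the data are synthetic, and a simple nearest-centroid
classifier stands in for the SVM so that no external library is needed.
The feature score (absolute class-mean difference over pooled standard
deviation) is just one possible univariate criterion.

```python
import random


def make_data(n_samples, n_features, n_informative, shift, rng):
    """Toy two-class data; only the first n_informative features carry signal."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        sign = 1.0 if label == 1 else -1.0
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += sign * shift
        X.append(row)
        y.append(label)
    return X, y


def class_means(X, y, features):
    """Per-class mean of each listed feature."""
    means = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        means[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
    return means


def select_top_k(X, y, k):
    """Rank features by |mean difference| / pooled std; return the k best indices."""
    all_features = list(range(len(X[0])))
    means = class_means(X, y, all_features)
    scores = []
    for j in all_features:
        sq_dev = sum((row[j] - means[lab][j]) ** 2 for row, lab in zip(X, y))
        pooled_std = (sq_dev / len(X)) ** 0.5 or 1.0  # guard against zero spread
        scores.append(abs(means[1][j] - means[0][j]) / pooled_std)
    ranked = sorted(all_features, key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def accuracy(X_train, y_train, X_test, y_test, features):
    """Nearest-centroid accuracy using only the given feature indices."""
    means = class_means(X_train, y_train, features)
    correct = 0
    for row, label in zip(X_test, y_test):
        dist = {
            lab: sum((row[j] - m) ** 2 for j, m in zip(features, means[lab]))
            for lab in (0, 1)
        }
        predicted = min(dist, key=dist.get)
        correct += predicted == label
    return correct / len(X_test)


rng = random.Random(0)
X_train, y_train = make_data(100, 1000, 5, 1.0, rng)
X_test, y_test = make_data(200, 1000, 5, 1.0, rng)

# Select features on the training set only, so the test set stays untouched.
selected = select_top_k(X_train, y_train, k=5)
acc_all = accuracy(X_train, y_train, X_test, y_test, list(range(1000)))
acc_selected = accuracy(X_train, y_train, X_test, y_test, selected)

print("selected features:", selected)
print("accuracy, all 1000 features:", acc_all)
print("accuracy, 5 selected features:", acc_selected)
```

Note that the selection is computed from the training samples only;
selecting on the full dataset before splitting would leak test
information into the classifier. Whether selection actually helps, and
how many features to keep, depends on the data and the classifier.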

Other classifiers behave differently; SMLR, for example, performs a
kind of feature selection by itself.

What do the experts think about critical numbers of inputs?

Greetings,

Thorsten

2010/11/29 Jakob Scherer <jakob.scherer at gmail.com>:
> 2010/11/29 Yaroslav Halchenko <debian at onerussian.com>:
>>
>> On Mon, 29 Nov 2010, Jakob Scherer wrote:
>>> > Actually, it depends... e.g. if the underlying classifier's regularization is
>>> > invariant to the transformation (e.g. margin width), then yes -- there should
>>> > be no effect.  But if it is sensitive to it (e.g. feature selection, like in
>>> > SMLR), then you might gain an advantage since, as in the case of SMLR, the goal
>>> > of having fewer important features might be achieved with higher
>>> > generalization.
>>> A follow-up question; is the inverse true too: can having fewer
>>> important features lead to a higher generalization?
>>
>> if you are asking:
>>
>> * "can having fewer important features among a bulk of irrelevant features"
>>  then I guess the answer is "No"
>>
>> * "can having fewer features (just important ones)..."
>>  then the answer is "oh Yes" -- that is the goal of feature selection
>>  procedures, to distill the feature set so only important ones are left
>>
>> or did I misunderstand entirely?
>
> Actually I wanted to ask: is it possible to get higher performance
> by feature selection?