[pymvpa] PCA transformation prior to SVM classification

Jakob Scherer jakob.scherer at gmail.com
Mon Nov 29 08:09:37 UTC 2010


Hi everyone,
First off: thanks for the rapid and enlightening replies.

On Thu, Nov 25, 2010 at 5:40 PM, Andreas Mueller
<amueller at ais.uni-bonn.de> wrote:
> Hi there.
> First of: I don't think this is a PyMVPA question.
> There are some machine learning communities like
> http://metaoptimize.com/qa/ and http://agbs.kyb.tuebingen.mpg.de/km/bb/.

I'll post questions like these there in the future. Thanks for
pointing me towards these lists.

On Thu, Nov 25, 2010 at 6:22 PM, Emanuele Olivetti
<emanuele at relativita.com> wrote:
> About CSP... it is using class-labels. Isn't it?

Actually no, it does not use class labels. I had a look at the
algorithm published in Liao, Yao, Wu & Li (2007), "Combining Spatial
Filters for the Classification of Single-Trial EEG in a Finger
Movement Task", IEEE Trans. Biomed. Eng. (where they introduce DSP as well).

On Thu, Nov 25, 2010 at 11:27 PM, Yaroslav Halchenko
<debian at onerussian.com> wrote:
> As you could see -- it would be just a memory-based classifier with
> cheap train and expensive predict ;)

OK, got it. Thanks for the nice explanation.
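
(As an aside, here is a toy illustration of the "cheap train /
expensive predict" point in general terms; the 1-nearest-neighbour
classifier below is just my own example of a memory-based scheme, not
anything specific from this thread.)

import numpy as np

# Toy memory-based classifier (1-nearest-neighbour): "training" merely
# stores the data, while prediction scans the whole training set.
class OneNN(object):

    def train(self, samples, labels):
        # Cheap: nothing is learned, everything is memorized.
        self.samples = np.asarray(samples, dtype=float)
        self.labels = np.asarray(labels)

    def predict(self, queries):
        # Expensive: O(n_train * n_queries * n_features) distance scan.
        predictions = []
        for q in np.asarray(queries, dtype=float):
            dists = np.sum((self.samples - q) ** 2, axis=1)
            predictions.append(self.labels[np.argmin(dists)])
        return np.array(predictions)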

>> >accuracy, i am looking at transformations (such as time-frequency
>> >decomposition) on the data prior to feeding it to the classifier.
>
> btw, if you have pywt module available, you could give a try to
> WaveletPacketMapper
> and
> WaveletTransformationMapper

I'll have a look at them. For now I've been using EEGLAB and then
reading the transformed data into PyMVPA.
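
(For reference, this is roughly what a wavelet-packet decomposition
looks like with pywt used directly, rather than through PyMVPA's
WaveletPacketMapper; the trial array shape, wavelet name and
decomposition level are made up for illustration.)

import numpy as np
import pywt

# Illustrative only: 40 single-channel trials of 512 time samples each.
trials = np.random.randn(40, 512)

def wavelet_packet_features(trial, wavelet='db4', maxlevel=4):
    # Decompose one trial into wavelet-packet nodes at the deepest level
    # and concatenate their coefficients into one feature vector.
    wp = pywt.WaveletPacket(data=trial, wavelet=wavelet,
                            mode='symmetric', maxlevel=maxlevel)
    nodes = wp.get_level(maxlevel, order='freq')
    return np.concatenate([node.data for node in nodes])

# One feature vector per trial, ready to feed to a classifier.
features = np.vstack([wavelet_packet_features(t) for t in trials])
print(features.shape)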

>> >PCA, CSP (common spatial patterns), DSP (discriminative spatial
>> >patterns) and the like.
>> As far as I know, PCA is mainly used to reduce the dimensionality and
>> therefore the computational cost of the SVM.
>> Since this is only a linear transform, I doubt that it will improve results.
>
> actually it depends... e.g. if underlying classifier's regularization is
> invariant to the transformation (e.g. margin width), then yeap -- there should
> be no effect.  But if it is sensitive to it (e.g. feature selection, like in
> SMLR), then you might get advantage since, like in the case of SMLR, the goal
> of having fewer important features might be achieved with higher
> generalization.
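
(To make that concrete, here is a rough sketch of the two situations
using scikit-learn instead of PyMVPA, on synthetic data; the pipeline,
the parameter values and the use of L1-penalised logistic regression as
a stand-in for a sparse, feature-selecting classifier like SMLR are all
my own assumptions for illustration.)

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-in for a trials x features dataset (illustrative only).
X, y = make_classification(n_samples=200, n_features=300,
                           n_informative=10, random_state=0)

# Margin-based classifier: a plain PCA rotation (all components kept)
# should leave its behaviour largely unchanged.
svm = LinearSVC(C=1.0, max_iter=10000)
svm_pca = make_pipeline(PCA(n_components=None),
                        LinearSVC(C=1.0, max_iter=10000))

# Sparse, feature-selecting classifier (L1-penalised logistic regression
# as a rough analogue of SMLR): it may profit from a transformation that
# concentrates the signal into a few components.
sparse = LogisticRegression(penalty='l1', solver='liblinear')
sparse_pca = make_pipeline(PCA(n_components=20),
                           LogisticRegression(penalty='l1', solver='liblinear'))

for name, clf in [('SVM', svm), ('SVM+PCA', svm_pca),
                  ('sparse', sparse), ('sparse+PCA', sparse_pca)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print('%-12s %.3f' % (name, scores.mean()))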

A follow-up question: is the inverse also true, i.e. can having fewer
important features lead to higher generalization?

best,
jakob


