[pymvpa] PCA transformation prior to SVM classification

Thu Nov 25 17:22:22 UTC 2010

Hi,

On 11/25/2010 04:56 PM, Jakob Scherer wrote:
> i now have 2 questions:
>
> 1. running PCA, CSP, etc on the whole dataset _prior_ to feeding it to a
> classifier: looks to me like a case of 'double-dipping', as all trials
> (training and test) are used to identify the components. thus all
> trials in the dataset given to the classifier are actually
> inter-dependent. am i right there?
>

About PCA: you are not double-dipping, since PCA takes only the brain
data as input, not the stimuli (or whatever you want to predict). It
is an "unsupervised" method, so it is safe to run it on the whole dataset.
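
To make the "unsupervised" point concrete, here is a minimal sketch,
assuming NumPy and synthetic data. The helper names fit_pca and
apply_pca are made up for illustration and are not PyMVPA API. Note
that the labels y never enter the fit:

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA from the data alone; no labels are passed in."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def apply_pca(X, mean, components):
    """Project trials onto the fitted principal axes."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 100))   # 40 trials x 100 features (synthetic)
y = np.repeat([0, 1], 20)        # labels exist but PCA never sees them

mean, components = fit_pca(X, n_components=5)
Z = apply_pca(X, mean, components)
print(Z.shape)  # (40, 5)
```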

As long as the pre-processing is unsupervised, you are safe. When
you use class labels in some way, you may incur double-dipping
if you are not careful.

About CSP... it uses class labels, doesn't it? If so, the caution
above applies.
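
The risk with label-based pre-processing can be sketched on pure
noise. This is an illustration under stated assumptions, not CSP
itself: it uses NumPy, synthetic data, a hypothetical label-based
feature selection (select_by_labels) and a tiny nearest-mean
classifier as a stand-in. Selecting features on all trials before
cross-validation tends to give optimistic accuracy, while selecting
inside each training split should stay near chance:

```python
import numpy as np

def select_by_labels(X, y, k):
    """Hypothetical supervised step: keep the k features whose class means differ most."""
    diff = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
    return np.argsort(diff)[-k:]

def nearest_mean_predict(X_train, y_train, X_test):
    """Stand-in classifier: assign each test trial to the closer class mean."""
    m0 = X_train[y_train == 0].mean(axis=0)
    m1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - m0) ** 2).sum(axis=1)
    d1 = ((X_test - m1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def cv_accuracy(X, y, folds, k, select_inside):
    # With select_inside=False the selection sees all trials (double-dipping).
    global_idx = select_by_labels(X, y, k)
    correct = 0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        idx = select_by_labels(X[train], y[train], k) if select_inside else global_idx
        pred = nearest_mean_predict(X[train][:, idx], y[train], X[test][:, idx])
        correct += (pred == y[test]).sum()
    return correct / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 1000))   # pure noise: no real class information
y = np.tile([0, 1], 20)
folds = np.array_split(rng.permutation(40), 5)

biased_acc = cv_accuracy(X, y, folds, k=10, select_inside=False)
unbiased_acc = cv_accuracy(X, y, folds, k=10, select_inside=True)
print(biased_acc, unbiased_acc)
```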

> 2. if 1. is true, then one could still run the PCA (etc.) just on the
> training set in each split*, and then run an SVM. does this make any
> sense,

Yes, you could. But it would be sub-optimal (for PCA), in my opinion.
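
For reference, the per-split variant could look roughly like this. It
is a sketch assuming NumPy and synthetic data; fit_pca is a
hypothetical helper, not PyMVPA API, and the classifier step is only
indicated by a comment:

```python
import numpy as np

def fit_pca(X, n_components):
    """Estimate the mean and principal axes from the given trials only."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 100))    # 40 trials x 100 features (synthetic)
folds = np.array_split(rng.permutation(40), 4)

shapes = []
for test in folds:
    train = np.setdiff1d(np.arange(len(X)), test)
    # Mean and components come from the training trials only ...
    mean, components = fit_pca(X[train], n_components=5)
    # ... and are applied unchanged to training and test trials.
    Z_train = (X[train] - mean) @ components.T
    Z_test = (X[test] - mean) @ components.T
    shapes.append((Z_train.shape, Z_test.shape))
    # An SVM (or any classifier) would be trained on Z_train and tested on Z_test here.

print(shapes[0])  # ((30, 5), (10, 5))
```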

> or is a suitable SVM kernel already taking care of this?
>

There is actually a kernel that takes care of that, but it is a work
in progress... :-) If you are interested, send me an email.

Best,

Emanuele

P.S.: you might be interested in this for handling supervised
pre-processing and accounting for double-dipping (bias):
http://dx.doi.org/10.1109/WBD.2010.9