[pymvpa] PCA transformation prior to SVM classification

Andreas Mueller amueller at ais.uni-bonn.de
Thu Nov 25 16:40:05 UTC 2010


Hi there.
First off: I don't think this is a PyMVPA question.
There are some machine learning communities like
http://metaoptimize.com/qa/ and http://agbs.kyb.tuebingen.mpg.de/km/bb/.
I would suggest that you ask your machine learning questions there.
> in order to improve
> accuracy, i am looking at transformations (such as time-frequency
> decomposition) on the data prior to feeding it to the classifier.
> i stumbled upon some methods mostly used in the BCI domain, such as
> PCA, CSP (common spatial patterns), DSP (discriminative spatial
> patterns) and the like.
>    
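As far as I know, PCA is mainly used to reduce the dimensionality and
therefore the computational cost of the SVM.
Since it is only a linear transform (a centering plus a rotation, if all
components are kept), I doubt that it will improve results: a rotation
leaves the inner products and distances between patterns unchanged, so a
linear or RBF-kernel SVM is faced with the same problem as before.
I am not familiar with CSP and DSP, so I cannot comment on those.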
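A quick numerical check of that point, as a sketch assuming only NumPy: PCA that keeps all components just centers and rotates the data, so the linear kernel (Gram matrix) of the centered patterns and their pairwise squared distances (all an RBF kernel depends on) come out identical. The centering itself is a translation, which the SVM's bias term absorbs. The data and sizes are arbitrary.

```python
import numpy as np

def full_pca_projection(X):
    """Center X and rotate it onto all of its principal axes."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are orthonormal principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc, Xc @ Vt.T

def sq_dists(A):
    """Pairwise squared Euclidean distances between the rows of A."""
    sq = (A ** 2).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2 * A @ A.T

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))   # 20 patterns, 5 features (arbitrary)
Xc, Z = full_pca_projection(X)

# Linear kernel before and after the rotation.
K_before = Xc @ Xc.T
K_after = Z @ Z.T

print(np.allclose(K_before, K_after))
print(np.allclose(sq_dists(Xc), sq_dists(Z)))
```

Only dropping components actually changes what the SVM sees; whether that helps depends on whether the discarded low-variance directions carry noise or signal.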
> i now have 2 questions:
>
> 1. running PCA, CSP, etc on the whole dataset _prior_ feeding it to a
> classifier: looks to me as a case of 'double-dipping', as all trials
> (training and test) are used to identify the components. thus all
> trials in the dataset given to the classifier are actually
> inter-dependent. am i right there?
>    
Whether you want to use the test set for feature extraction depends on
the task.
Since you are doing only unsupervised feature extraction (the class labels
of the test trials are never used), it is in principle OK to do.
But if you want to apply your method as an online algorithm, it is
impractical to recompute the PCA every time a new pattern arrives.

> 2. if 1. is true, then one could still run the PCA (,etc) just on the
> training set in each split*, and then run a SVM. does this make any
> sense, or is a suited svm-kernel already taking care of this?
>
>    
You can run PCA on each split separately. But then you have to be careful
to compute the transformation matrix (and the mean used for centering)
only once for each split, namely using only the training data, and then
apply the same transformation to the test data.
Do NOT do a separate PCA on the test data.
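A minimal sketch of that recipe using only NumPy, assuming a simple k-fold split: the PCA parameters (mean and principal axes) are estimated on the training folds and then reused unchanged on the held-out fold. To keep it self-contained, a nearest-centroid classifier stands in for the SVM. The helper names (`fit_pca`, `apply_pca`, `nearest_centroid_accuracy`, `cross_validate`) and the synthetic data are hypothetical, not PyMVPA API.

```python
import numpy as np

def fit_pca(X_train, n_components):
    """Estimate the PCA transformation from training data only."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:n_components]

def apply_pca(X, mean, components):
    """Apply an already fitted transformation (no refitting)."""
    return (X - mean) @ components.T

def nearest_centroid_accuracy(Z_train, y_train, Z_test, y_test):
    """Stand-in classifier (hypothetical replacement for the SVM)."""
    classes = np.unique(y_train)
    centroids = np.array([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    predictions = classes[dists.argmin(axis=1)]
    return (predictions == y_test).mean()

def cross_validate(X, y, n_folds=5, n_components=3, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # 1. Fit PCA on the training part of this split only.
        mean, comps = fit_pca(X[train_mask], n_components)
        # 2. Apply the SAME transformation to training and test data.
        Z_train = apply_pca(X[train_mask], mean, comps)
        Z_test = apply_pca(X[test_idx], mean, comps)
        scores.append(nearest_centroid_accuracy(
            Z_train, y[train_mask], Z_test, y[test_idx]))
    return np.array(scores)

# Synthetic two-class data (arbitrary): class 1 is shifted on every feature.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 10)) + 1.5 * y[:, None]
scores = cross_validate(X, y)
print(scores.mean())
```

The same `apply_pca` call is how a new pattern would be handled in an online setting: it is projected with the stored mean and axes instead of recomputing the PCA.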

Cheers,
Andy



More information about the Pkg-ExpPsy-PyMVPA mailing list