[pymvpa] PCA transformation prior to SVM classification

Yaroslav Halchenko debian at onerussian.com
Thu Nov 25 22:27:23 UTC 2010


Thank you Andreas and Emanuele for the replies,

Indeed, there might be better forums for finding absolute knowledge in
machine learning, but, well, at times we might be helpful too ;)

On Thu, 25 Nov 2010, Emanuele Olivetti wrote:
> About PCA, you are not double-dipping, since PCA uses just brain
> data, not stimuli (or whatever you want to predict) as input. It
> is an "unsupervised" method, so it is safe to use it on the whole dataset.

agree! In particular if the goal is just generalization estimation.  I am
not sure it would be ok if I were interested in the "relevance" of any
particular feature, as diagnosed by classifier sensitivities and the
corresponding loadings on the PCA components.

To make it clear why it is ok for generalization assessment and why there
is no double-dipping:

imagine I create a classifier which simply stores the labeled data obtained
during training.  When new data arrive for prediction, it takes both the
training data (without labels) and the new data (for which no labels are
provided), computes a PCA on the pooled samples, then trains a
corresponding classifier on the PCA-transformed training data together with
the stored labels, and applies the same PCA projection to the new data
before predicting.  At no point do the labels of the to-be-predicted data
enter the procedure; see the sketch below.

As you can see -- it would just be a memory-based classifier with
cheap train and expensive predict ;)
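
Here is a minimal sketch of such a memory-based classifier, written with
scikit-learn rather than PyMVPA just to keep it short; the class and
parameter names are mine and purely illustrative:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

class LazyPCASVM(object):
    """Stores training data at train time; at predict time fits PCA on the
    pooled (unlabeled) training + test samples, then trains an SVM on the
    PCA-projected training data and predicts on the projected test data."""

    def __init__(self, n_components=10):
        self.n_components = n_components

    def train(self, X_train, y_train):
        # "cheap train": just memorize the labeled data
        self.X_train, self.y_train = X_train, y_train

    def predict(self, X_test):
        # "expensive predict": PCA on all samples; labels are never used
        pca = PCA(n_components=self.n_components)
        pca.fit(np.vstack((self.X_train, X_test)))
        clf = SVC(kernel='linear')
        clf.fit(pca.transform(self.X_train), self.y_train)
        return clf.predict(pca.transform(X_test))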

On Thu, 25 Nov 2010, Andreas Mueller wrote:
> >accuracy, i am looking at transformations (such as time-frequency
> >decomposition) on the data prior to feeding it to the classifier.

btw, if you have the pywt module available, you could give
WaveletPacketMapper and WaveletTransformationMapper a try
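
Just to illustrate the kind of time-frequency features such a
decomposition yields (this is not the PyMVPA mappers themselves, just pywt
used directly; the choice of 'db4', the level, and the shapes are
arbitrary assumptions on my side):

import numpy as np
import pywt

def wavelet_features(samples, wavelet='db4', level=3):
    """Decompose each 1-D time course and concatenate the wavelet
    coefficients into a flat feature vector per sample."""
    out = []
    for ts in samples:
        coeffs = pywt.wavedec(ts, wavelet, level=level)
        out.append(np.concatenate(coeffs))
    return np.array(out)

X = np.random.randn(20, 128)      # 20 samples, 128 time points each
X_wav = wavelet_features(X)       # features to feed to the classifier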

> >PCA, CSP (common spatial patterns), DSP (discriminative spatial
> >patterns) and the like.
> As far as I know, PCA is mainly used to reduce the dimensionality and
> therefore the computational cost of the SVM.
> Since this is only a linear transform, I doubt that it will improve results.

actually it depends... e.g. if the underlying classifier's regularization is
invariant to the transformation (e.g. margin width under a rotation), then
yep -- there should be no effect.  But if it is sensitive to it (e.g. feature
selection, like in SMLR), then you might gain an advantage, since, as in the
case of SMLR, concentrating the signal into fewer important features might
lead to better generalization.  A toy illustration is sketched below.
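
To make the rotation-invariance point concrete, here is a toy sketch with
scikit-learn; SMLR is not available there, so L1-penalized logistic
regression stands in for a sparsity-regularized classifier, and the dataset
and parameters are arbitrary:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=5, random_state=0)
# Full-rank PCA is just a rotation (plus centering) of the data
X_rot = PCA(n_components=50).fit_transform(X)

svm = SVC(kernel='linear', C=1.0)
l1 = LogisticRegression(penalty='l1', solver='liblinear', C=1.0)

for name, clf in [('linear SVM', svm), ('L1 logistic', l1)]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    acc_rot = cross_val_score(clf, X_rot, y, cv=5).mean()
    print('%s: original %.3f, PCA-rotated %.3f' % (name, acc, acc_rot))

# The margin-based SVM should give (essentially) the same accuracy in both
# cases, while the sparsity-regularized model can differ, since its penalty
# is not rotation-invariant.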


-- 
=------------------------------------------------------------------=
Keep in touch                                     www.onerussian.com
Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic


