[pymvpa] SVM classification of data with temporal correlation

Vadim Axel axel.vadim at gmail.com
Thu Dec 3 21:56:35 UTC 2009


Thank you Francisco and Emanuele for your encouragement :)
1. I use my run sessions as the folds in cross-validation, so I think I should
be fine. At least I do not have any temporal-correlation leakage from the
training set into the test set (first sketch below).
2. I understand the concern with regard to the binomial accuracy test. I think
it might be more robust to build a null distribution by running the
classification ~1000 times on data with scrambled labels (a sort of
non-parametric permutation test; second sketch below). Just recently I
encountered a classification that, for some unknown reason, scored far above
chance even with scrambled labels, so a binomial test would have missed this.
3. Averaging within a block indeed reduces the temporal correlation. I just ran
a test on two different data sets, one with 10 s of fixation between blocks and
the other with no fixation at all (third sketch below). Surprisingly, the
temporal correlation between the blocks was similar in both (~0.1). Though that
is much better than for the raw data, it is still far from what uncorrelated
Gaussian noise would give. Anyway, if the only consequence is lower
performance, I am fine with it.
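
To make point 1 concrete, here is a rough sketch of the leave-one-run-out
setup, with one chunk per run as Emanuele describes below. The data arrays are
made-up placeholders, and the calls assume the PyMVPA 0.4-style API
(Dataset / NFoldSplitter / CrossValidatedTransferError), so names may need
adjusting for other versions:

# Minimal sketch: leave-one-run-out cross-validation with one chunk per run,
# so no run contributes samples to both the training and the test set.
# Placeholder data; assumes the PyMVPA 0.4-style API.
import numpy as N
from mvpa.suite import *

n_runs, n_per_run, n_voxels = 8, 10, 500
samples = N.random.randn(n_runs * n_per_run, n_voxels)      # fake voxel data
labels  = N.tile(N.repeat([0, 1], n_per_run // 2), n_runs)  # 5 samples/class/run
chunks  = N.repeat(N.arange(n_runs), n_per_run)             # chunk == run index

ds  = Dataset(samples=samples, labels=labels, chunks=chunks)
clf = LinearCSVMC()
cv  = CrossValidatedTransferError(TransferError(clf), NFoldSplitter())
print "mean transfer error:", cv(ds)

With chunks set to the run index, the splitter leaves one whole run out per
fold, so no run ends up on both sides of the split.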
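For point 2, a sketch of the scrambled-labels test, continuing from the objects
defined in the first sketch (samples, labels, chunks, ds, cv). The 1000
permutations and the shuffling-within-run choice are just what I would pick,
nothing PyMVPA-specific:

# Scrambled-labels (permutation) test: rerun the same cross-validation ~1000
# times with labels shuffled within each run, then compare the true error
# against the resulting null distribution.
true_error = cv(ds)           # assumed to return the mean error as a float
null_errors = []
for i in xrange(1000):
    perm = labels.copy()
    for r in N.unique(chunks):
        idx = chunks == r
        perm[idx] = N.random.permutation(perm[idx])   # shuffle within this run
    null_errors.append(cv(Dataset(samples=samples, labels=perm, chunks=chunks)))

# one-sided p-value: fraction of permutations doing at least as well
# (i.e. with an error at least as low) as the real labels
p = N.mean(N.asarray(null_errors) <= true_error)
print "true error: %.3f  permutation p: %.3f" % (true_error, p)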
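And for point 3, roughly what I did to check the residual temporal correlation
after block averaging, in plain numpy with made-up data; block_labels stands in
for the real block index of each volume, numbered in temporal order:

# Average the volumes within each block, then look at the correlation between
# temporally adjacent block averages. All arrays here are placeholders.
import numpy as N

n_blocks, n_per_block, n_voxels = 16, 5, 500
samples = N.random.randn(n_blocks * n_per_block, n_voxels)  # fake voxel data
block_labels = N.repeat(N.arange(n_blocks), n_per_block)    # block index per volume

ids = N.unique(block_labels)                                # sorted == temporal order here
block_means = N.array([samples[block_labels == b].mean(axis=0) for b in ids])

# correlation between each pair of temporally adjacent block averages
adjacent_r = [N.corrcoef(block_means[i], block_means[i + 1])[0, 1]
              for i in range(len(ids) - 1)]
print "mean correlation between adjacent block averages: %.3f" % N.mean(adjacent_r)

On pure Gaussian noise like this the result is ~0, which is the baseline I was
comparing my ~0.1 against.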


On Thu, Dec 3, 2009 at 11:17 PM, Emanuele Olivetti
<emanuele at relativita.com> wrote:

> Vadim Axel wrote:
>
>> Hi,
>> ...
>>
>> What do you think about this issue? Does ignoring temporal correlation
>> just decrease the prediction rate, or does it cast doubt on the results in
>> general?
>>
>
> SVM will underperform on non-iid data because it does not exploit the
> temporal dependencies; underperform in the sense that a classifier that did
> exploit them could do better. As far as I remember, some generalization
> bounds do not hold for SVM when the data are not iid. Nevertheless, it is
> quite common for data to be non-iid and for classifiers that assume iid data
> to be used on them anyway.
>
> As far as I know there are several schemes to reduce the impact of the
> temporal dependencies between fMRI volumes. Averaging over blocks is one of
> them. For example, in [0] they use the beta values estimated for each trial
> instead of the raw BOLD signal. Many other strategies can be conceived.
>
> As a basic rule, just be sure that you don't put highly temporally
> correlated samples in both the training and the test set, which in your case
> means avoiding samples from the same block being split between them. PyMVPA
> has the concept of a "chunk" for that: during cross-validation, all samples
> from the same chunk go either to the training set or to the test set. This
> helps, for example, when you want to test the error rate of your binary
> classifier with the binomial test.
>
> HTH,
>
> E.
>
> [0]: http://www.citeulike.org/user/librain/article/3140982
>
>

