[pymvpa] feature sensitivity in MVPA run on EEG data

Marius 't Hart mariusthart at gmail.com
Fri Mar 7 19:42:04 UTC 2014


Hi everyone,

I finally had time to try out some of the suggestions made here. As a 
short reminder, my participants did a motor preparation task, using cues 
provided at the start of a preparation interval to optimally respond to 
the full stimulus at the end of the interval. I would expect that near 
the end of the interval, preparation is largest. For simplicity, I am 
now only looking at the full and 'empty' cues and the EEG signal on Cz 
and Pz during the preparation interval.

First, I tried z-scoring (dividing the whole signal by the variance of 
the signal in the baseline period). This has the effect that right from 
the start of the preparation interval the Linear SVM performs near its 
maximum, with only a shallow increase over time (the maximum depends on 
the baseline interval used). In particular, it seems unlikely to me that 
performance is well above chance at the very first interval. I do observe 
that some participants blinked a lot right before trial onset, which may 
affect this step in the analysis, even though it should be roughly equal 
for each condition, as the baseline period is always before the first 
cue is presented. I'm not sure how to interpret or handle this.
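
For concreteness, the scaling step is essentially the following (a rough 
NumPy sketch with made-up array names; the real data are trials x 
channels x time points):

import numpy as np

def scale_by_baseline(eeg, baseline):
    # eeg: array of shape (n_trials, n_channels, n_timepoints)
    # baseline: slice selecting the pre-cue time points of each trial
    base_var = eeg[:, :, baseline].var(axis=-1, keepdims=True)
    # divide each trial/channel by the variance of its own baseline
    # (a textbook z-score would instead subtract the baseline mean and
    # divide by the baseline standard deviation)
    return eeg / base_var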

Then I also tried averaging across several trials, by averaging each 
set of N consecutive trials within the same condition and discarding the 
remaining trials. Across conditions and subjects, the minimum number of 
trials acceptable for analysis was 70 (one subject had 104; the average 
was close to 90). When I average across 23 trials (so that 
there are a minimum of 3 targets within each condition) the noisiness of 
the data should be minimal, but the Linear SVM performs at 0% for many 
participants across the whole preparation interval and on average at 
slightly above 20%... well below chance! Something must be terribly 
wrong there. When averaging sets of around 5 trials (so that there are 
35 targets or more in each condition), performance looks better. Using sets 
of 10, 15 and 20 trials progressively decreases performance. So it seems 
that somewhere around 5 there is an optimum. I'm not sure how to pick a 
good value here, without trying them all and picking the one that 
performs best.
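
For reference, the averaging itself is essentially this (a rough sketch 
with placeholder names; in the real analysis it runs separately per 
condition and per subject):

import numpy as np

def average_consecutive(trials, n_avg):
    # trials: array of shape (n_trials, n_features) for one condition;
    # average each block of n_avg consecutive trials, drop the leftovers
    n_sets = trials.shape[0] // n_avg
    kept = trials[:n_sets * n_avg]
    return kept.reshape(n_sets, n_avg, trials.shape[1]).mean(axis=1)

# e.g. with the minimum of 70 trials in a condition, n_avg=23 leaves
# 3 averaged samples for that condition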

Any advice on these two issues would be hugely appreciated.

Brian: I'll see if I can do PLR. Also, say that one of my two channels 
is used for "detecting noise": how would that affect the results? And if 
I added a channel that should not be informative at all (i.e. one that 
only samples the actual noise), would that channel be more likely to be 
picked as the "noise-cancellation channel"? Trying to wrap my head around 
this.

Best,
Marius

On 14-01-28 07:33 AM, Brian Murphy wrote:
> Hi,
>
> just jumping into this discussion a bit late...
>
>> Tying in to another discussion, could it be beneficial to first average
>> every 5 trials or so? In a way this reduces noise, so the performance
>> would most likely go up - as might the informativeness of feature
>> sensitivity. The downside is that you no longer have predictions on a
>> trial by trial basis.
> If you have enough data to get away with it (i.e. you will still have enough
> cases to train on), then yes, it is worth trying, with a very important
> caveat: that you are interested in time-domain signals. Obviously a
> straight trialwise averaging will wash out any interesting spectral
> activity which isn't phase-locked (and given your task, precise
> phase-locking seems unlikely). But anyway, averaging might clean up the
> sensitivity maps. Then again, from the paper-writing point of view,
> keeping things as simple as possible is always preferable.
>
>>>>> Also: what preprocessing did you do? Any z-scoring, baseline correction etc?
>>>> I do baseline correction, but no Z-scoring. Should I do Z-scoring? If so, over all data, within electrode or within trial?
> I'm not an SVM expert, so this might not be relevant - but for many
> classifiers, the weights are only interpretable as sensitivity measures
> if the underlying variable is on a similar scale. So, for the sake of
> argument, if your Cz was twice as loud as your Pz (unlikely, I know),
> then its weights would be scaled down and not be directly comparable.
> So yes, for sensitivity analyses z-scoring of some kind would be
> advisable - there are several ways, e.g. ideally you would do this based
> on the *clean* rest periods (you've done manual artefact rejection - so
> that should be possible). But for EEG data you can often just z-score
> based on the whole signal time-course. [I see Nick O has made similar
> suggestions]
>
>
>> That doesn't look like what I expected - but I find it hard to judge if
>> what I'm doing is actually correct.
> There are a few reasons that could account for the differences you see between the ERPs and the sensitivity maps:
>   - different scaling of the input signals (as above)
>   - more/less variance in the signals (looking at the ERPs, it looks like particular periods have better or worse separation between the conditions, but it is not just the magnitude of this difference that matters, but rather its magnitude *relative to the variance* across trials)
>   - models may also give weights to features that are good descriptions of noise, so that noise can be factored out of other condition-informative features. See this paper for details, also on how to normalise the sensitivity maps to compensate for this effect:
> http://www.citeulike.org/user/emanueleolivetti/article/12177881
>
> Regarding classifiers, LinSVM is good, but my preference would be a regularized logistic regression (e.g. PLR), as I've yet to find a situation in which any variety of SVM gives me a decisive performance advantage. Also, consider the idea behind SVMs, which is to find a hyperplane that best separates the boundary cases. If these boundary cases are representative of the conditions in general, that is just fine. But if they are outliers in some sense, then maybe not.
>
> Brian
>



