[pymvpa] Justification for trial averaging?
Brian Murphy
brian.murphy at qub.ac.uk
Thu Jan 23 17:52:38 UTC 2014
Hi,
I'm not sure about the motivation for averaging in that particular paper
- if I had to guess, it might be that they chose a simple exposition to
present what at that time was a completely novel approach.
But averaging can work as a simple but effective method to improve the
signal/noise ratio in individual test or training cases. The
corresponding trade-off is that you have fewer cases to train from, and
would need a higher rate of success to pass any significance threshold
you might have.
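For concreteness, the averaging itself only takes a couple of lines in
PyMVPA. A minimal sketch, assuming ds is an already loaded dataset with
'targets' and 'chunks' sample attributes (as in the tutorial data):

  # average all samples that share the same (category, run) combination
  from mvpa2.mappers.fx import mean_group_sample

  averager = mean_group_sample(['targets', 'chunks'])
  ds_avg = ds.get_mapped(averager)

  # far fewer samples afterwards: one per category per run
  print('%s -> %s' % (ds.shape, ds_avg.shape))

The collapse in sample count is exactly the trade-off mentioned above:
better signal-to-noise per sample, but fewer samples to train and test on.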
A simple experiment might be to compare the classification accuracy,
significance thereof, and sensitivity maps of the following (a rough code
sketch follows the list):
- train on averages, test on averages
- train on trials, test on averages (could get similar classification
accuracy if your classifiers are dealing well with noise, collinearity,
etc.)
- train on trials, test on trials (should get lower classification
accuracy, but be similarly significant)
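Something along the following lines would do for the accuracy part (a rough
sketch only, assuming the same ds as above, a linear SVM from the PyMVPA
suite, and a plain leave-one-run-out split on 'chunks'; the significance
testing and sensitivity maps are left out):

  import numpy as np
  from mvpa2.suite import LinearCSVMC
  from mvpa2.mappers.fx import mean_group_sample

  clf = LinearCSVMC()
  averager = mean_group_sample(['targets', 'chunks'])

  def accuracy(train, test):
      # train from scratch on 'train', return proportion correct on 'test'
      clf.train(train)
      return np.mean(np.asarray(clf.predict(test.samples)) == test.sa.targets)

  accs = {'avg/avg': [], 'trial/avg': [], 'trial/trial': []}
  for test_chunk in np.unique(ds.sa.chunks):
      train = ds[ds.sa.chunks != test_chunk]
      test = ds[ds.sa.chunks == test_chunk]
      accs['avg/avg'].append(accuracy(train.get_mapped(averager),
                                      test.get_mapped(averager)))
      accs['trial/avg'].append(accuracy(train, test.get_mapped(averager)))
      accs['trial/trial'].append(accuracy(train, test))

  for scheme, vals in accs.items():
      print('%s: mean accuracy %.2f over %i folds'
            % (scheme, np.mean(vals), len(vals)))

Keep in mind that the averaged test sets are tiny (one sample per category
per held-out run), so the binomial threshold for calling an accuracy
significant is correspondingly higher in those two conditions.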
Brian
On Thu, 2014-01-23 at 17:37 +0000, Shane Hoversten wrote:
>
> I have a question about trial averaging in MVPA, by which I mean
> taking the average response of a certain stimulus class, and using
> this average value as input to the classifier, instead of feeding it
> the responses from the individual trials themselves.
>
> For instance, in the original Haxby experiment[1] (referred to in the
> PyMVPA documentation and tutorial) each subject does two runs, and
> each run produces 12 time series, each of which includes 8 blocks, one
> for each stimulus category ('bottle' 'cat' 'chair' 'face' 'house'
> 'scissors' 'scrambledpix' 'shoe'). I had some trouble following
> exactly what they’re collecting in each block, but the block is 24
> seconds long, so it’s a bunch of exemplars of the category in
> question.
>
> But in the ‘mappers’ section of the tutorial[2] the data is collapsed
> into 2 runs x 8 samples per run. So the responses for all the stimuli
> in each category (‘faces’, ‘scissors’, etc.) are averaged across the
> blocks in all 12 training sessions, producing 1 canonical sample for
> each of the categories (for each of the 2 runs). And these ‘canonical
> samples’ are what is being used for classification purposes.
>
> The question is, why do it this way? The practice seems to be widely
> used (although I can’t cite another reference off the top of my
> head). It seems to me that this amounts to pre-classification, where
> you’re taking a ‘typical’ face/scissors/whatever, and seeing if the
> classifier can distinguish between these different kinds of
> typicality. But forming decision boundaries over features is exactly
> what a classifier is meant to do, so why not just throw all these
> different exemplars into the mix, and let the classifier figure out
> its own notion of prototypicality? And if you’re going to
> pre-classify, why pick the average response? Why not take some kind
> of lower-dimensional input: the first several eigenvectors, or
> something else?
>
> I understand that this can be empirically answered (try a bunch of
> things; do what works best) but could someone enlighten me as to the
> theoretical justification of one choice over another?
>
> [1] http://www.sciencemag.org/content/293/5539/2425.abstract
> [2] http://www.pymvpa.org/tutorial_mappers.html
>
>
>
> Shane
--
Dr. Brian Murphy
Lecturer (Assistant Professor)
Knowledge & Data Engineering (EEECS)
Queen's University Belfast
brian.murphy at qub.ac.uk