[pymvpa] glm for MVPA

Christopher J Markiewicz effigies at bu.edu
Mon Feb 15 16:07:43 UTC 2016

On 02/15/2016 10:20 AM, basile pinsard wrote:
> Hi pymvpa users and developers.
> I have multiple questions regarding the use of GLM to model events for
> MVPA analysis which are not limited to PyMVPA.
> First: in the NiPy GLM mapper, the measures extracted are beta weights
> from the GLM fit
> <https://github.com/bpinsard/PyMVPA/blob/master/mvpa2/mappers/glm/nipy_glm.py#L50>;
> is that common for MVPA? The StatsModelGLM mappers return t/p/z... values
> from the model. Is using the t-statistic more relevant?
> The following paper, Decoding information in the human hippocampus: A
> user's guide
> <http://www.sciencedirect.com/science/article/pii/S0028393212002953>,
> says: "In summary, the pre-processing method of choice at
> present appears to be the use of the GLM to produce /t/-values as the
> input to MVPA analyses. However, it is important to note that this does
> not invalidate the use of other approaches such as raw BOLD or betas,
> rather the evidence suggests that these approaches may be sub-optimal,
> reducing the power of the analysis, making it more difficult to observe
> significant results."
> What do you think?

I don't remember reading the referenced article, but I've done a little
playing in this area. While I haven't computed per-event t-scores, I have
used one t-statistic per condition per run, and found a very similar
spatial pattern of results to that from per-event betas, but with much
higher classification accuracy. For whatever that's worth.
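
For concreteness, the per-run t-statistic computation can be sketched as a
plain OLS fit; this is a minimal NumPy illustration with synthetic data
(function and variable names are mine, not PyMVPA's):

```python
import numpy as np

def run_tstats(Y, X):
    """OLS fit of design X (time x regressors) to data Y (time x voxels);
    returns the t-statistic of each regressor at each voxel."""
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    resid = Y - X @ beta
    dof = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = (resid ** 2).sum(axis=0) / dof  # voxel-wise residual variance
    se = np.sqrt(np.outer(np.diag(np.linalg.pinv(X.T @ X)), sigma2))
    return beta / se

rng = np.random.default_rng(0)
X = np.column_stack([rng.standard_normal(100), np.ones(100)])  # regressor + intercept
B = np.array([[2.0, 0.0],   # voxel 0 responds to the regressor, voxel 1 doesn't
              [0.0, 0.0]])
Y = X @ B + rng.standard_normal((100, 2))
t = run_tstats(Y, X)        # shape (n_regressors, n_voxels)
```

Fitting this once per condition per run gives one t-map per condition per
run, which is what I fed to the classifier.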

With per-event betas (z-scored using the ZScoreMapper built on control
conditions), I get distributions of CV accuracies around chance, which
is fine for our analysis strategy. Personally, I don't quite know what
to make of ~50% minimum classification on a problem where chance is 33%.
That said, we use non-parametric significance testing, while there may
be parametric methods that work better on t-score-based MVPA.
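
As I understand it, z-scoring against control conditions amounts to
estimating each feature's mean and standard deviation from the control
samples only, then applying those parameters to all samples. A minimal
sketch of that idea (illustrative names, not the actual ZScoreMapper API):

```python
import numpy as np

def zscore_by_control(samples, is_control):
    """Z-score each feature using mean/std estimated from the control
    samples only, then apply those parameters to all samples."""
    mu = samples[is_control].mean(axis=0)
    sd = samples[is_control].std(axis=0, ddof=1)
    sd = np.where(sd == 0, 1.0, sd)            # guard against constant features
    return (samples - mu) / sd

betas = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0],
                  [7.0, 8.0]])
is_control = np.array([False, True, True, False])  # rows 1-2 are control trials
z = zscore_by_control(betas, is_control)
```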

Not sure if that's relevant to you, but that's what I think. :-)

> Second: I have developed another custom method using the
> least-squares-separate (LS-S) approach, which fits one model per
> event/block/regressor of interest and was shown to provide more stable
> estimates of the patterns and improved classification in Mumford et al.
> 2012. However, for each block I want to model 2 regressors, 1 for the
> instruction phase and 1 for the execution phase, which are consecutive.
> So the procedure I use is 1 regressor for the block/phase that I want to
> model + 1 regressor for each phase; is that correct?

I've used a similar strategy, also inspired by the Mumford et al. paper.
I'm not sure I'm understanding you correctly, so I'll just describe my
approach:

Each trial has an A event and a B event, and is described by a single
stimulus type. If I have 8 trial types, then I have 16 "conditions"
[(cond1, A), (cond1, B), ...]. For each condition, I perform one
least-squares estimate: one column for each condition of non-interest,
and the condition of interest expanded into one column per event.
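
That per-condition expansion can be sketched in NumPy roughly as follows
(a noise-free synthetic demo; the function and variable names are mine,
not from any package):

```python
import numpy as np

def lss_betas(Y, event_regressors, cond_of_events):
    """Least-squares-separate variant: for each condition, fit one model
    in which that condition's events each get their own column, and every
    other condition is collapsed into a single summed column.

    Y: (n_timepoints, n_voxels); event_regressors: (n_timepoints, n_events);
    cond_of_events: (n_events,) condition label per event.
    Returns (n_events, n_voxels) per-event betas."""
    conds = np.unique(cond_of_events)
    betas = np.zeros((event_regressors.shape[1], Y.shape[1]))
    for c in conds:
        of_interest = cond_of_events == c
        X_int = event_regressors[:, of_interest]       # one column per event
        rest = [event_regressors[:, cond_of_events == o].sum(axis=1)
                for o in conds if o != c]              # one column per other condition
        X = np.column_stack([X_int] + rest + [np.ones(Y.shape[0])])
        b = np.linalg.lstsq(X, Y, rcond=None)[0]
        betas[of_interest] = b[:of_interest.sum()]
    return betas

# Noise-free demo: 4 non-overlapping boxcar events in two conditions
n_t = 40
R = np.zeros((n_t, 4))
for i in range(4):
    R[10 * i:10 * i + 5, i] = 1.0
cond = np.array([0, 1, 0, 1])
Y = R @ np.array([[1.0], [2.0], [3.0], [4.0]])  # true per-event amplitudes
b = lss_betas(Y, R, cond)                       # recovers 1, 2, 3, 4
```

In a real design the regressors would of course be HRF-convolved and
overlapping; the point is only the column layout of each fit.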

It sounds like you might be doing something slightly different?

> I would be interested to include these in PyMVPA in the future, as LS-A
> (the A stands for All) is not adequate for rapid event designs with
> correlated regressors.

Since it turns out I'm not the only one who's needed to develop this,
this seems like a good idea to me. If you're interested in my input, you
can ping me on GitHub (@effigies) when you start to add it.
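
On the LS-A point: the instability with correlated regressors can be seen
directly from the design, since the variance of each OLS beta scales with
the corresponding diagonal entry of inv(X'X), which grows as columns
become collinear. A small synthetic illustration (names are mine):

```python
import numpy as np

def beta_var_scale(X):
    """Relative variance of each OLS beta: diag((X'X)^-1)."""
    return np.diag(np.linalg.inv(X.T @ X))

rng = np.random.default_rng(1)
n = 200
a = rng.standard_normal(n)
b_orth = rng.standard_normal(n)                    # roughly orthogonal to a
b_corr = 0.95 * a + 0.05 * rng.standard_normal(n)  # nearly collinear with a

v_orth = beta_var_scale(np.column_stack([a, b_orth]))
v_corr = beta_var_scale(np.column_stack([a, b_corr]))
# v_corr is far larger than v_orth: the correlated design inflates
# the variance of every beta estimate, which is what makes LS-A
# unstable in rapid event-related designs.
```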

Christopher J Markiewicz
Ph.D. Candidate, Quantitative Neuroscience Laboratory
Boston University

More information about the Pkg-ExpPsy-PyMVPA mailing list