[pymvpa] Combinatorial MVPA
billbrod at gmail.com
Thu Dec 10 19:34:33 UTC 2015
Sorry, I realize my description of the problem wasn't very clear.
ds.summary() looks like this (after dropping the time points without
Dataset: 54x26 at float64, <sa: chunks,targets>, <fa: network_ids>
stats: mean=0.0879673 std=1.08415 var=1.17538 min=-3.56651 max=4.62532
No details due to large number of targets or chunks. Increase maxc and maxt
Summary for chunks across targets
chunks mean std min max #targets
1 0.333 0.471 0 1 18
2 0.333 0.471 0 1 18
3 0.333 0.471 0 1 18
Number of unique targets > 20 thus no sequence statistics
"Volume for each volume" was a typo, I meant value for each volume. For
this subject, the full dataset is 1452 x 26, where we have 1452 time points
(across all runs) and 26 time courses. We then label each time point
with the reaction time if it falls during an event (we're regressing to
the reaction time) or 'other' if it doesn't. After dropping the 'other'
time points, we are left with 54 time points across 3 runs.
Re: Richard's comment, we're interested in problem 1: we want to evaluate
the predictive power of each feature. We hypothesize that three of these
time courses are much more important than the rest. Based on earlier work
in the lab (Carter et al. 2012
<https://www.sciencemag.org/content/337/6090/109.full>), I was thinking of
using something similar to the Unique Combinatorial Performance (UCP) to
evaluate the contribution of each time course. UCP was used for pairs of
ROIs: For each ROI i, the UCP is the average additional classification
accuracy observed when a model was constructed using data from i with a
second ROI j (for every j). Since we're looking at individual time courses,
I was thinking of looking at the difference in accuracies between analyses
containing time course i and those not containing time course i (or those
containing time courses i and j and those not containing time courses i and
j). Does that make sense?
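To make the two comparisons above concrete, here is a minimal sketch in plain Python. It is not the implementation from Carter et al. and not a PyMVPA API: the function names (pairwise_ucp, with_without_difference) and the score_fn argument are hypothetical. score_fn stands in for whatever cross-validated performance measure one computes on a given subset of time courses, and the sketch assumes that "additional" accuracy means the gain of the pair (i, j) over a model using j alone. The additive toy score at the bottom is invented purely to exercise the code.

```python
from itertools import combinations
from statistics import mean


def pairwise_ucp(n_features, score_fn):
    """UCP-like score: for each feature i, the mean gain in performance
    of the pair (i, j) over feature j alone, averaged over every j != i.

    score_fn (hypothetical) takes a sorted tuple of feature indices and
    returns a performance value for a model built on those features.
    """
    single = {j: score_fn((j,)) for j in range(n_features)}
    pair = {p: score_fn(p) for p in combinations(range(n_features), 2)}
    ucp = []
    for i in range(n_features):
        gains = [pair[tuple(sorted((i, j)))] - single[j]
                 for j in range(n_features) if j != i]
        ucp.append(mean(gains))
    return ucp


def with_without_difference(n_features, score_fn, subset_size):
    """For each feature i, mean performance over all subsets of the given
    size that contain i, minus the mean over those that do not.

    Requires 1 <= subset_size < n_features so both groups are non-empty.
    """
    if not 1 <= subset_size < n_features:
        raise ValueError("subset_size must be in [1, n_features - 1]")
    scores = {s: score_fn(s)
              for s in combinations(range(n_features), subset_size)}
    diffs = []
    for i in range(n_features):
        with_i = [v for s, v in scores.items() if i in s]
        without_i = [v for s, v in scores.items() if i not in s]
        diffs.append(mean(with_i) - mean(without_i))
    return diffs


# Toy stand-in for a real cross-validated score: each feature simply
# adds a fixed, made-up amount to the performance.
toy_weights = [3, 1, 0, 0]


def toy_score(subset):
    return sum(toy_weights[i] for i in subset)


ucp = pairwise_ucp(len(toy_weights), toy_score)
diffs = with_without_difference(len(toy_weights), toy_score, subset_size=2)
```

With 26 time courses there are 325 pairs, so either function needs only a few hundred model fits when subset_size is 2; larger subset sizes grow combinatorially. In practice score_fn would wrap the actual cross-validation on the selected columns of the dataset.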
I agree that feature selection methods don't seem appropriate here and
something like what I outlined above seems intuitively appealing -- does it
sound like a reasonable way to evaluate the predictive power of each time
course?
On Wed, Dec 9, 2015 at 7:10 PM, Richard Dinga <dinga92 at gmail.com> wrote:
> Bill Broderick wrote:
> > However, to determine which timecourse is contributing the most to the
> > classifier's performance,
> > see which timecourses or which combination
> > of time courses caused the greatest drop in performance when removed.
> I wrote:
> > You might take a look at Relief algorithm (also implemented in PyMVPA),
> > that is less hacky approach to your feature weighting problem.
> Yaroslav Halchenko wrote:
> > there is yet another black hole of methods to assess contribution of
> > each feature to performance of the classifier. The irelief, which was
> > mentioned is one of them...
> > So what is your classification performance if you just do
> > classification on all features? which one could you obtain if you do
> > feature selection, e.g. with SplitRFE (which would eliminate features to
> > attain best performance within each cv folds in nested cv)
> I think there are (at least) 2 separate problems.
> 1. How to evaluate predictive power for every feature in order to
> interpret data
> 2. How to evaluate importance of features for a classifier in order to
> understand a model and possibly select set of features to get best
> Feature selection methods like Lasso or RFE (as far as I know) would omit
> most of the redundant/highly correlated features, therefore making 1.
> impossible. It still might be a good idea for other reasons.