[pymvpa] Combinatorial MVPA
billbrod at gmail.com
Thu Dec 10 19:42:21 UTC 2015
I should note, I created a modified Splitter (called
CombinatorialSplitter) which returns the datasets that I want. The only
edits are to lines 119 and 133 in splitters.py:

119: for isplit, split in enumerate(cfgs):
---> for isplit, split in
133: filter_ = splattr_data == split
---> filter_ = np.array([i in split for i in splattr_data])
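For reference, the combinatorial filtering logic can be mimicked outside of PyMVPA. This is a minimal stand-alone sketch (plain NumPy, illustrative names only -- not the actual CombinatorialSplitter code):

```python
from itertools import combinations
import numpy as np

def combinatorial_splits(values, k):
    """Yield one boolean feature mask per k-sized combination of the
    unique values of a feature attribute (e.g. fa.network_ids)."""
    values = np.asarray(values)
    for combo in combinations(np.unique(values), k):
        # same idea as the modified filter_ above: keep features whose
        # attribute value falls in the current combination
        yield np.array([v in combo for v in values])

# 3 unique ids, keep 2 at a time -> choose(3, 2) = 3 masks
masks = list(combinatorial_splits([0, 0, 1, 2], 2))
```

Each mask selects the feature columns belonging to one combination of attribute values, which is what the modified filter_ line computes per split.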
My plan then is to run the analysis like so (each dataset has
fa['network_ids'], with a number identifying each network; I haven't fully
tested this yet):

splitter = CombinatorialSplitter('network_ids', 25)
# clf, nf as normal
res = []
for i in splitter.generate(ds):
    res.append(...)  # run the clf/nf analysis on each split
res_all = hstack(res)
Then res_all is 3 x 26 (3 runs; 26 columns because 26 choose 1 = 26), with
a separate row for each run and a column for each combination, and each
column has fa.network_ids identifying which networks were included. My plan
was then to use the data to perform a calculation like the one I described
in the last email.
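To make the plan concrete, here is a self-contained sketch of the per-combination loop. All names here are illustrative (score_fn stands in for the usual cross-validated clf/nf measure; this is not PyMVPA API):

```python
import numpy as np
from itertools import combinations

# toy data: 6 samples x 4 features; features tagged with 3 network ids
X = np.arange(24, dtype=float).reshape(6, 4)
network_ids = np.array([0, 0, 1, 2])

def masks_for(values, k):
    # one boolean feature mask per k-sized combination of unique ids
    for combo in combinations(np.unique(values), k):
        yield np.array([v in combo for v in values])

# stand-in score: just the number of columns kept; in the real analysis
# this would be the cross-validated accuracy for the feature subset
def score_fn(Xsub):
    return Xsub.shape[1]

scores, included = [], []
for m in masks_for(network_ids, 2):
    scores.append(score_fn(X[:, m]))
    included.append(frozenset(network_ids[m].tolist()))
```

The end result pairs one score per combination with the set of network ids that went into it, which is the information needed for the contribution calculation described below.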
On Thu, Dec 10, 2015 at 2:34 PM, Bill Broderick <billbrod at gmail.com> wrote:
> Sorry, I realize my description of the problem wasn't very clear.
> ds.summary() looks like this (after dropping the time points without an
> event):
> Dataset: 54x26 at float64, <sa: chunks,targets>, <fa: network_ids>
> stats: mean=0.0879673 std=1.08415 var=1.17538 min=-3.56651 max=4.62532
> No details due to large number of targets or chunks. Increase maxc and
> maxt if desired
> Summary for chunks across targets
> chunks mean std min max #targets
> 1 0.333 0.471 0 1 18
> 2 0.333 0.471 0 1 18
> 3 0.333 0.471 0 1 18
> Number of unique targets > 20 thus no sequence statistics
> "Volume for each volume" was a typo, I meant value for each volume. For
> this subject, the full dataset is 1452 x 26, where we have 1452 time points
> (across all runs) and 26 time courses. We then label each time point with
> the reaction time if it happens during an event (we're regressing to
> the reaction time) or 'other' if it doesn't. We then are left with 54 time
> points, across 3 runs.
> Re: Richard's comment, we're interested in problem 1: we want to evaluate
> the predictive power of each feature. We hypothesize that three of these
> time courses are much more important than the rest. Based on earlier work
> in the lab (Carter et al. 2012
> <https://www.sciencemag.org/content/337/6090/109.full>), I was thinking
> of using something similar to the Unique Combinatorial Performance (UCP) to
> evaluate the contribution of each time course. UCP was used for pairs of
> ROIs: For each ROI i, the UCP is the average additional classification
> accuracy observed when a model was constructed using data from i with a
> second ROI j (for every j). Since we're looking at individual time courses,
> I was thinking of looking at the difference in accuracies between analyses
> containing time course i and those not containing time course i (or those
> containing time courses i and j and those not containing time courses i and
> j, etc.).
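> A toy sketch of that contribution measure (made-up accuracies and
> hypothetical names -- not the UCP code from Carter et al.):

```python
import numpy as np

# hypothetical per-combination results: each entry pairs the set of time
# courses included in a subset with the accuracy obtained for that subset
results = [({0, 1}, 0.80), ({0, 2}, 0.78), ({1, 2}, 0.60)]

def contribution(i, results):
    """Mean accuracy of subsets containing time course i minus mean
    accuracy of subsets not containing it."""
    with_i = [acc for s, acc in results if i in s]
    without_i = [acc for s, acc in results if i not in s]
    return np.mean(with_i) - np.mean(without_i)

# time course 0 appears in both high-accuracy subsets
delta0 = contribution(0, results)  # approx. 0.79 - 0.60 = 0.19
```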
> Does that make sense?
> I agree that feature selection methods don't seem appropriate here, and
> something like what I outlined above seems intuitively appealing -- does it
> sound like a reasonable way to evaluate the predictive power of each time
> course?
> On Wed, Dec 9, 2015 at 7:10 PM, Richard Dinga <dinga92 at gmail.com> wrote:
>> Bill Broderick wrote:
>> > However, to determine which timecourse is contributing the most to the
>> > classifier's performance, [...] see which timecourses or which combination
>> > of time courses caused the greatest drop in performance when removed.
>> I wrote:
>> > You might take a look at Relief algorithm (also implemented in PyMVPA),
>> > that is less hacky approach to your feature weighting problem.
>> Yaroslav Halchenko wrote:
>> > there is yet another black hole of methods to assess contribution of
>> > each feature to performance of the classifier. The irelief, which was
>> > mentioned is one of them...
>> > So what is your classification performance if you just do
>> > classification on all features? Which could you obtain if you do
>> > feature selection, e.g. with SplitRFE (which would eliminate features to
>> > attain the best performance within each cv fold in nested cv)?
>> I think there are (at least) 2 separate problems.
>> 1. How to evaluate predictive power for every feature in order to
>> interpret data
>> 2. How to evaluate the importance of features for a classifier in order to
>> understand a model and possibly select a set of features to get the best
>> performance.
>> Feature selection methods like Lasso or RFE would (as far as I know) omit
>> most redundant/highly correlated features, therefore making 1. impossible.
>> It still might be a good idea for other reasons.