[pymvpa] Permutation testing and Nipype

Tue Aug 11 21:18:27 UTC 2015

On Tue, 11 Aug 2015, Bill Broderick wrote:

> Okay, so I did a little more investigating of this and I cannot
> replicate my original problem. Now it's looking like it's taking a
> long time just because the permutation testing is taking a long time.

it does!

> At the bottom of this message is the script I used for testing the
> timing. Using python 2.7.6 and PyMVPA version 2.4.0, I time the script
> as follows:

>      python2.7 -O -m timeit -n 1 -r 1 'import test' 'test.main()'

> The dataset I'm loading in has 3504 trials that we're using and 29462 voxels.

> I get the following times:
>      perm_num=1, ids=(0,1)  : 161sec
>      perm_num=1, ids=(0,2)  : 316sec
>      perm_num=1, ids=(0,3)  : 531sec
>      perm_num=1, ids=(0,4)  : 687sec
>      perm_num=5, ids=(0,1)  : 435sec

> Which makes me realize that there's no way I can get 100 permutations
> and 5 searchlights (which is about what I was looking at earlier) in
> 1.5 hours.

Depends on classifier/searchlight size/# of chunks etc.  But indeed --
unlikely ;)
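
A back-of-the-envelope extrapolation from the timings quoted above makes the
point. This is only a sketch: it assumes runtime grows linearly with perm_num
and with the number of searchlight centers, which the quoted numbers support
only roughly (161, 316, 531, 687 sec for 1 to 4 centers). The helper name
estimate_seconds is made up for illustration.

```python
# Timings quoted above (seconds), both for a single searchlight center.
t_perm1 = 161.0   # perm_num=1, ids=(0,1)
t_perm5 = 435.0   # perm_num=5, ids=(0,1)

# Assumption: runtime = fixed_cost + perm_num * cost_per_perm
cost_per_perm = (t_perm5 - t_perm1) / (5 - 1)
fixed_cost = t_perm1 - cost_per_perm


def estimate_seconds(perm_num, n_centers):
    """Rough estimate; assumes cost also scales linearly with centers."""
    return n_centers * (fixed_cost + perm_num * cost_per_perm)


total = estimate_seconds(perm_num=100, n_centers=5)
print("per permutation: %.1f s" % cost_per_perm)
print("100 permutations x 5 centers: %.1f hours" % (total / 3600.0))
```

Under these assumptions each permutation costs about 68.5 sec per center, and
100 permutations over 5 centers come to roughly 9.6 hours, well beyond a 1.5
hour limit.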

> I don't know what changed -- going back through my commits
> I haven't changed any of the relevant code since then; it's possible I
> made a mistake and accidentally did 10 permutations or something like
> that.

> Regardless, this is still taking way too long. Does anyone have any
> idea how to speed it up? 

If you want to do statistical assessment through permutation (and not e.g.
a sign-flipping technique ;) ), then you will need to wait a bit.

> It looks like it's a good idea to have jobs
> run a bunch of permutations in one function, but split up the
> searchlights, which is what I'm doing at the moment, but I still need
> to do something else to speed it up.

> Thanks,
> Bill

> test.py script:

> def main(perm_num=5,ids=(0,1)):
>     from mvpa2.suite import (h5load, LinearCSVMC, Repeater,
>         AttributePermutator, NFoldPartitioner, CrossValidation, ChainNode,
>         MCNullDist, sphere_searchlight)

>     ds=h5load('dataset.hdf5')
>     clf=LinearCSVMC()
>     repeater=Repeater(count=perm_num)
>     permutator = AttributePermutator('targets',limit={'partitions':1},count=1)
>     nf = NFoldPartitioner(attr='subject',cvtype=1,count=None,selection_strategy='random')
>     null_cv = CrossValidation(clf,ChainNode([nf,permutator],space=nf.get_space()))
>     distr_est = MCNullDist(repeater,tail='left',measure=null_cv,enable_ca=['dist_samples'])
>     cv = CrossValidation(clf,nf,null_dist=distr_est,pass_attr=[('ca.null_prob','fa',1)])
>     print 'running...'
>     sl = sphere_searchlight(cv,radius=3,center_ids=range(ids[0],ids[1]),enable_ca='roi_sizes',pass_attr=[('ca.roi_sizes','fa')])
>     res=sl(ds)

Please see my response to Roni from a few minutes ago: just collect up to 50
permutations per subject and then use GroupClusterThreshold to do
bootstrapping across subjects' permutation results.
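
To illustrate the idea of bootstrapping across subjects' permutation results,
here is a minimal pure-Python sketch. It is not the GroupClusterThreshold
code and it leaves out the cluster-level thresholding entirely; the function
names, the toy data, and the right-tailed test on accuracies are all
assumptions made for the example.

```python
import random


def bootstrap_group_null(perm_maps, n_bootstrap, rng):
    """perm_maps[s][p] is the map (list of floats) for subject s, permutation p.

    Each bootstrap sample picks one permutation map per subject at random
    and averages them into a group-level null map.
    """
    n_features = len(perm_maps[0][0])
    null_maps = []
    for _ in range(n_bootstrap):
        picks = [rng.choice(subject_maps) for subject_maps in perm_maps]
        avg = [sum(m[i] for m in picks) / float(len(picks))
               for i in range(n_features)]
        null_maps.append(avg)
    return null_maps


def featurewise_p(observed, null_maps):
    """Right-tail p-value per feature, with the usual +1 correction."""
    n = len(null_maps)
    return [(1 + sum(1 for m in null_maps if m[i] >= obs)) / float(n + 1)
            for i, obs in enumerate(observed)]


# Toy data (made up): 3 subjects, 50 permutation maps each, 4 features,
# accuracies scattered around a chance level of 0.5.
rng = random.Random(0)
perm_maps = [[[rng.gauss(0.5, 0.05) for _ in range(4)]
              for _ in range(50)]
             for _ in range(3)]

null_maps = bootstrap_group_null(perm_maps, n_bootstrap=1000, rng=rng)
observed = [0.50, 0.52, 0.70, 0.49]  # hypothetical group-mean accuracies
p_values = featurewise_p(observed, null_maps)
print(p_values)
```

The point of the design is that 50 permutations per subject can yield a much
larger group-level null distribution, since every bootstrap sample is a new
combination of per-subject permutation maps.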

-- 
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Research Scientist,            Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik