[pymvpa] parallelizing permutation testing

Ben Acland benacland at gmail.com
Fri Aug 22 11:47:10 UTC 2014


Is there a way to run permutation testing using multiple processes?

For reference, here's the business end of my script:

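# Assumed aliases (the imports are not shown in this excerpt):
import numpy as np
import mvpa2.suite as mp

rep = mp.Repeater(count=PERM_COUNT)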
perm = mp.AttributePermutator('sub_group', limit={'partitions': 1}, count=1)
part = mp.NFoldPartitioner(cvtype=2, attr="subject")
sift = mp.Sifter([('partitions',2), 
    ('sub_group', dict(uvalues=np.unique(clf_ds.sa.sub_group),
        balanced=True))])
clf = mp.LinearCSVMC(space='sub_group')
# TODO: break here, make a dupe generator, see what it makes
gen = mp.ChainNode([part, sift, perm], space=part.get_space())
null_cv = mp.CrossValidation(
    clf,
    mp.ChainNode([part, sift, perm],
                 space=part.get_space()),
    postproc=mp.mean_sample())
distr_est = mp.MCNullDist(rep,
                          tail='left',
                          measure=null_cv,
                          enable_ca=['dist_samples'])
cvmcc = mp.CrossValidation(clf,
                           part,
                           postproc=mp.mean_sample(),
                           null_dist=distr_est,
                           enable_ca=['stats'])
result = cvmcc(clf_ds)

But if PERM_COUNT is at 1000 or 2000, that's a lot of time for the other 1799 available cores to sit staring at the one core I'm using and say, "dude, chill out!"

I'm not looking for cluster-specific answers - an approach that would work for a single machine would suffice, and would bring my wall time requirements down dramatically. This seems to be an "embarrassingly parallel" problem - is there an embarrassingly easy solution that I'm missing?
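
To make the "embarrassingly parallel" structure concrete, here is a minimal sketch of the general idea, not a PyMVPA recipe. It splits the permutation count into independent batches, gives each batch its own seed, and pools the resulting null samples. Everything in it is an assumption made for illustration: the toy statistic stands in for the cross-validated measure, the function names are hypothetical, and the +1 correction in the p-value is one common convention that may differ from what MCNullDist computes.

```python
import random
from multiprocessing import Pool


def split_counts(total, n_chunks):
    """Split `total` permutations into n_chunks near-equal batch sizes."""
    base, extra = divmod(total, n_chunks)
    return [base + (1 if i < extra else 0) for i in range(n_chunks)]


def toy_statistic(labels, values):
    """Hypothetical stand-in for the real measure: difference of group means."""
    a = [v for l, v in zip(labels, values) if l == 0]
    b = [v for l, v in zip(labels, values) if l == 1]
    return sum(a) / len(a) - sum(b) / len(b)


def null_batch(job):
    """Run one batch of label permutations with its own seeded RNG."""
    seed, count, labels, values = job
    rng = random.Random(seed)
    shuffled = list(labels)
    samples = []
    for _ in range(count):
        rng.shuffle(shuffled)  # permute the labels, keep the data fixed
        samples.append(toy_statistic(shuffled, values))
    return samples


def null_distribution(labels, values, n_perm, n_chunks, base_seed=0, mapper=map):
    """Build the null distribution from independent batches.

    `mapper` is the builtin map (serial) by default; passing a pool's
    map method runs the same batches in separate processes.
    """
    counts = split_counts(n_perm, n_chunks)
    jobs = [(base_seed + i, c, labels, values)
            for i, c in enumerate(counts) if c > 0]
    batches = mapper(null_batch, jobs)
    return [s for batch in batches for s in batch]


def left_tail_p(observed, null_samples):
    """Left-tail p-value estimate with a +1 correction (an assumed convention)."""
    hits = sum(1 for s in null_samples if s <= observed)
    return (hits + 1) / (len(null_samples) + 1)


def parallel_null_distribution(labels, values, n_perm, n_workers, base_seed=0):
    """Same computation, one batch per worker process on a single machine."""
    with Pool(n_workers) as pool:
        return null_distribution(labels, values, n_perm, n_workers,
                                 base_seed=base_seed, mapper=pool.map)
```

parallel_null_distribution should be called from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Because each batch is seeded by its index, the serial and parallel paths produce the same samples for the same base_seed and chunk count. Whether the PyMVPA objects above can be shipped to worker processes in the same way is not established here; in that setting each worker would presumably run its own null-distribution estimate with a smaller count and the collected samples would be pooled afterwards.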

Ben