[pymvpa] parallelizing permutation testing

Ben Acland benacland at gmail.com
Fri Aug 22 12:03:59 UTC 2014


Dang, the last two times I submitted this, it appeared under another thread. Trying one last time, then I'll stop spamming.

Is there a way to run permutation testing using multiple processes?

For reference, here's the business end of my script:

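import numpy as np
import mvpa2.suite as mp  # assumption: `mp` is the PyMVPA suite import

# clf_ds (the dataset) and PERM_COUNT are defined earlier in the script.

# Re-run the permuted cross-validation PERM_COUNT times.
rep = mp.Repeater(count=PERM_COUNT)
# Shuffle 'sub_group' labels in the training partition (1) only, once per repeat.
perm = mp.AttributePermutator('sub_group', limit={'partitions': 1}, count=1)
# Leave-two-subjects-out folds.
part = mp.NFoldPartitioner(cvtype=2, attr="subject")
# Keep only folds whose testing partition (2) is balanced across sub_group values.
sift = mp.Sifter([('partitions', 2),
                  ('sub_group', dict(uvalues=np.unique(clf_ds.sa.sub_group),
                                     balanced=True))])
clf = mp.LinearCSVMC(space='sub_group')
# TODO: break here, make a dupe generator, see what it makes
gen = mp.ChainNode([part, sift, perm], space=part.get_space())
# Cross-validation on permuted labels: one sample of the null distribution per run.
null_cv = mp.CrossValidation(
    clf,
    mp.ChainNode([part, sift, perm],
                 space=part.get_space()),
    postproc=mp.mean_sample())
# Monte Carlo null distribution built by repeating null_cv.
distr_est = mp.MCNullDist(rep,
                          tail='left',
                          measure=null_cv,
                          enable_ca=['dist_samples'])
# The actual (unpermuted) cross-validation, evaluated against the null distribution.
cvmcc = mp.CrossValidation(clf,
                           part,
                           postproc=mp.mean_sample(),
                           null_dist=distr_est,
                           enable_ca=['stats'])
result = cvmcc(clf_ds)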

But with PERM_COUNT at 1000 or 2000, that's a lot of time for the other 1799 available cores to sit staring at the one core I'm using, saying, "dude, chill out!"

I'm not looking for cluster-specific answers - an approach that would work for a single machine would suffice, and would bring my wall time requirements down dramatically. This seems to be an "embarrassingly parallel" problem - is there an embarrassingly easy solution that I'm missing?
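
To make the "embarrassingly parallel" idea concrete, here is a minimal sketch of the general pattern: split the permutation count into chunks, let each worker process compute its chunk of null-distribution samples with its own seeded RNG, then pool the samples. This is not PyMVPA code. The statistic (a difference of group means), the function names, and the left-tail p-value convention are all made up for illustration, and whether MCNullDist can be split up and recombined this way is exactly the open question.

```python
import random
from multiprocessing import Pool


def split_counts(total, n_chunks):
    """Split `total` permutations into n_chunks nearly equal chunk sizes."""
    base, extra = divmod(total, n_chunks)
    return [base + (1 if i < extra else 0) for i in range(n_chunks)]


def mean_difference(values, labels):
    """Toy statistic (hypothetical): mean of group 1 minus mean of group 0."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    return sum(g1) / float(len(g1)) - sum(g0) / float(len(g0))


def null_chunk(job):
    """Compute one chunk of null samples using a private, seeded RNG."""
    values, labels, count, seed = job
    rng = random.Random(seed)
    shuffled = list(labels)
    samples = []
    for _ in range(count):
        rng.shuffle(shuffled)  # permute the labels, keep the data fixed
        samples.append(mean_difference(values, shuffled))
    return samples


def parallel_null_dist(values, labels, n_perm, n_workers, base_seed=0):
    """Build a null distribution of n_perm samples across n_workers processes."""
    counts = [c for c in split_counts(n_perm, n_workers) if c > 0]
    # Each chunk gets a distinct seed so workers do not repeat each other.
    jobs = [(values, labels, c, base_seed + i) for i, c in enumerate(counts)]
    if n_workers == 1:
        chunks = [null_chunk(job) for job in jobs]  # no pool needed
    else:
        pool = Pool(processes=n_workers)
        try:
            chunks = pool.map(null_chunk, jobs)
        finally:
            pool.close()
            pool.join()
    # Pool the per-worker samples into one flat null distribution.
    return [s for chunk in chunks for s in chunk]


def left_tail_p(observed, null_samples):
    """Fraction of null samples <= observed (one simple left-tail convention)."""
    hits = sum(1 for s in null_samples if s <= observed)
    return hits / float(len(null_samples))
```

A call would look like parallel_null_dist(values, labels, n_perm=2000, n_workers=8), made from under an `if __name__ == "__main__":` guard on platforms that spawn rather than fork worker processes. The chunks are independent of one another, so the wall time should shrink roughly in proportion to the number of workers, minus process startup overhead.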

Ben

