[pymvpa] parallelizing permutation testing
Ben Acland
benacland at gmail.com
Fri Aug 22 12:03:59 UTC 2014
Dang, the last two times I submitted this, it appeared under another thread. Trying one last time, then I'll stop spamming.
Is there a way to run permutation testing using multiple processes?
For reference, here's the business end of my script:
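    # Assumed aliases (not shown in the excerpt): mp is mvpa2.suite, np is numpy.
    # PERM_COUNT and clf_ds are defined earlier in the script.
    import numpy as np
    import mvpa2.suite as mp

    # Repeat the permuted cross-validation PERM_COUNT times.
    rep = mp.Repeater(count=PERM_COUNT)
    # Shuffle 'sub_group' labels in the training portion only (partitions == 1).
    perm = mp.AttributePermutator('sub_group', limit={'partitions': 1}, count=1)
    part = mp.NFoldPartitioner(cvtype=2, attr="subject")
    # Keep only folds whose test set has balanced 'sub_group' values.
    sift = mp.Sifter([('partitions', 2),
                      ('sub_group', dict(uvalues=np.unique(clf_ds.sa.sub_group),
                                         balanced=True))])
    clf = mp.LinearCSVMC(space='sub_group')
    # TODO: break here, make a dupe generator, see what it makes
    gen = mp.ChainNode([part, sift, perm], space=part.get_space())
    # Cross-validation on permuted labels, used to build the null distribution.
    null_cv = mp.CrossValidation(
        clf,
        mp.ChainNode([part, sift, perm],
                     space=part.get_space()),
        postproc=mp.mean_sample())
    distr_est = mp.MCNullDist(rep,
                              tail='left',
                              measure=null_cv,
                              enable_ca=['dist_samples'])
    # Cross-validation on the real labels, scored against the null distribution.
    cvmcc = mp.CrossValidation(clf,
                               part,
                               postproc=mp.mean_sample(),
                               null_dist=distr_est,
                               enable_ca=['stats'])
    result = cvmcc(clf_ds)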
But if PERM_COUNT is at 1000 or 2000, that's a lot of time for the other 1799 available cores to sit staring at the one core I'm using and say, "dude, chill out!"
I'm not looking for cluster-specific answers - an approach that would work for a single machine would suffice, and would bring my wall time requirements down dramatically. This seems to be an "embarrassingly parallel" problem - is there an embarrassingly easy solution that I'm missing?
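To make the "embarrassingly parallel" part concrete, here is a minimal sketch of the general pattern I have in mind, using only the Python standard library. It is not PyMVPA code: split_count, null_chunk and parallel_null are hypothetical helper names, and the statistic (a difference of group means on shuffled labels) is only a stand-in for one permuted cross-validation run. Each worker gets its own seed, so the chunks don't repeat the same shuffles.

```python
import random
from multiprocessing import Pool


def split_count(total, n_chunks):
    """Split `total` permutations into `n_chunks` nearly equal chunk sizes."""
    base, extra = divmod(total, n_chunks)
    return [base + (1 if i < extra else 0) for i in range(n_chunks)]


def null_chunk(args):
    """Compute `count` permuted statistics inside one worker.

    The statistic is a hypothetical stand-in (difference of group means),
    not a PyMVPA measure.
    """
    values, labels, count, seed = args
    rng = random.Random(seed)  # separate random stream per chunk
    labels = list(labels)      # copy so the caller's labels stay untouched
    out = []
    for _ in range(count):
        rng.shuffle(labels)
        g1 = [v for v, l in zip(values, labels) if l == 1]
        g0 = [v for v, l in zip(values, labels) if l == 0]
        out.append(sum(g1) / float(len(g1)) - sum(g0) / float(len(g0)))
    return out


def parallel_null(values, labels, perm_count, n_procs, seed=0):
    """Build a null distribution of `perm_count` samples with `n_procs` workers."""
    jobs = [(values, labels, n, seed + i)
            for i, n in enumerate(split_count(perm_count, n_procs)) if n > 0]
    if n_procs == 1:
        chunks = [null_chunk(job) for job in jobs]  # serial fallback
    else:
        pool = Pool(n_procs)
        try:
            chunks = pool.map(null_chunk, jobs)
        finally:
            pool.close()
            pool.join()
    # Flatten the per-worker lists into one null distribution.
    return [stat for chunk in chunks for stat in chunk]
```

With n_procs greater than 1, the call to parallel_null should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. What I don't know is whether MCNullDist offers a supported way to do this kind of splitting and merging.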
Ben