[pymvpa] Time for single permutation varies between a few seconds and several minutes

Wed Apr 8 17:02:21 UTC 2015

Thank you, Yaroslav. So, I've looked into this a bit more, and I've 
arrived at the conclusion that the old code I was using suffered from 
the label permutation issue described here: 
http://www.pymvpa.org/tutorial_significance.html. Here are the relevant 
parts of the code (let me know if something is missing though):

import os
from mvpa2.suite import *
from time import strftime

(load data, preprocessing)

clf = LinearCSVMC(C=1)

cv = CrossValidation(clf, NFoldPartitioner(cvtype=1),
                     errorfx=lambda p, t: np.mean(p == t),
                     enable_ca=["stats"])

sl = sphere_searchlight(cv, radius=3, postproc=mean_sample(),
                        nproc=nProcs)

res_permutation = np.zeros((n_permutations, ds.nfeatures))

for i in range(n_permutations):

     print "   Running permutation " + str(i) + " " + strftime("%H:%M:%S")

     permutator = AttributePermutator('targets', count=1, assure=True)

     ds_tmp = permutator(ds)

     result_p = sl(ds_tmp)

     res_permutation[i,:] = result_p.samples[0]

(plus some post-processing to get the p values)
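
A minimal sketch of what such post-processing might look like (not the exact code; `permutation_p_values` and its argument names are made up for illustration, and it assumes higher values mean better performance, as with the accuracy errorfx above):

```python
def permutation_p_values(observed, null_samples):
    """Right-tailed permutation p value per feature.

    observed     -- sequence with one observed accuracy per feature
    null_samples -- sequence of rows, one row of accuracies per permutation
    """
    n_perm = len(null_samples)
    p_values = []
    for j, obs in enumerate(observed):
        # Count permutations that did at least as well as the real labels.
        n_ge = sum(1 for row in null_samples if row[j] >= obs)
        # The +1 in numerator and denominator counts the observed result
        # as one member of the null distribution, so p is never exactly 0.
        p_values.append((n_ge + 1.0) / (n_perm + 1.0))
    return p_values
```

Here `null_samples` would correspond to the rows of res_permutation, and `observed` to the searchlight result on the unpermuted dataset.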

So, based on the manual, I changed the code to:

repeater = Repeater(count=10)  # just 10 for testing
partitioner = NFoldPartitioner(cvtype=1)
permutator = AttributePermutator('targets', limit={'partitions': 1},
                                 count=1)

null_cv = CrossValidation(clf,
                             ChainNode(
                                     [partitioner, permutator],
                                     space=partitioner.get_space()),
                             postproc=mean_sample())

distr_est = MCNullDist(repeater, tail='left',
                         measure=null_cv,
                         enable_ca=['dist_samples'])

cv_mc_corr = CrossValidation(clf,
                             partitioner,
                             postproc=mean_sample(),
                             null_dist=distr_est,
                             enable_ca=['stats'])

sl = sphere_searchlight(cv_mc_corr, radius=2, postproc=mean_sample(),
                        nproc=nProcs)

result = sl(ds)

This code now runs for about 15 min (which is about the time it took to 
run 10 permutations with the old code).

1. Is the new code correct? (And, if yes, how do I, er, actually get the 
p values?)

2. Is there a way to figure out how long the individual permutations take?
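
For an explicit loop like the one in the old code, per-permutation durations can be recorded directly, roughly as in the sketch below (`run_timed` and `run_once` are hypothetical names, not PyMVPA functions). Whether something comparable can be hooked into MCNullDist is the open part of the question.

```python
import time


def run_timed(n_iterations, run_once):
    """Call run_once(i) for each iteration and return the durations in seconds."""
    durations = []
    for i in range(n_iterations):
        start = time.time()
        run_once(i)  # e.g. permute the targets and run the searchlight once
        durations.append(time.time() - start)
    return durations
```

The returned list holds one wall-clock duration per iteration, so slow permutations can be picked out afterwards instead of being read off printed timestamps.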

Thanks again,
Jan

P.S. Oops, just noticed the other replies (sorry, I get the digest and 
therefore tend to lag behind). Thank you for your replies, Nick and Jo. 
Apart from system processes, no other processes were running. Not sure 
if hard disk issues could explain the differences...