[pymvpa] Parallelization

Matteo Visconti di Oleggio Castello matteo.visconti at gmail.com
Sat Nov 11 21:44:55 UTC 2017


Hi Marco,

In your case, I would recommend looking into joblib to parallelize
your for loops (https://pythonhosted.org/joblib/parallel.html).

As an example, here's a gist containing part of PyMVPA's nested_cv
example, in which I parallelized the loop across partitions. I believe this
is what you want to do in your case, since you have many more folds.

Here's the gist:
https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0
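
The core pattern is roughly the following (a minimal sketch, not the gist
itself; it reuses fds, partitionerCD, and the select_best_clf helper from
your script quoted below, and the n_jobs value is arbitrary):

from joblib import Parallel, delayed

def select_and_test(partitions):
    # hypothetical helper: model selection on the training half of one
    # partition set, then transfer of the winning classifier to the test half
    dstrain, dstest = list(Splitter('partitions').generate(partitions))
    best_clf, best_error = select_best_clf(dstrain, clfswh['!gnpp', '!skl'])
    tm = TransferMeasure(best_clf, Splitter('partitions'),
                         postproc=BinaryFxNode(mean_mismatch_error,
                                               space='targets'),
                         enable_ca=['stats'])
    tm(partitions)
    return tm.ca.stats

# the folds are independent, so each partition set can run in parallel
all_stats = Parallel(n_jobs=8)(
    delayed(select_and_test)(partitions)
    for partitions in partitionerCD.generate(fds))

confusion = ConfusionMatrix()
for stats in all_stats:
    confusion += stats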



On Fri, Nov 10, 2017 at 3:13 PM, marco tettamanti <mrctttmnt at gmail.com>
wrote:

> Dear Matteo,
> thank you for your willingness to look into my code.
>
> This is taken almost verbatim from
> http://dev.pymvpa.org/examples/nested_cv.html, except for the
> leave-one-pair-out partitioning and a slight reduction in the number of
> classifiers (the original example has around 45).
>
> Any help or suggestion would be greatly appreciated!
> All the best,
> Marco
>
>
> ########## * ##########
> ##########
>
> PyMVPA:
>  Version:       2.6.3
>  Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785
>  Path:          /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
>  Version control (GIT):
>  GIT information could not be obtained due "/usr/lib/python2.7/dist-packages/mvpa2/..
> is not under GIT"
> SYSTEM:
>  OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2
> (2017-10-15)
>
>
> print fds.summary()
> Dataset: 36x534@float32, <sa: chunks,targets,time_coords,time_indices>,
>  <fa: voxel_indices>, <a: imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
> stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
> No details due to large number of targets or chunks. Increase maxc and
> maxt if desired
> Summary for targets across chunks
>   targets mean std min max #chunks
>     C      0.5 0.5  0   1     18
>     D      0.5 0.5  0   1     18
>
>
> # Evaluate the prevalent best classifier with nested cross-validation
> verbose.level = 5
>
> # Leave-one-pair-out partitioner: NFoldPartitioner(cvtype=2) generates all
> # pairs of left-out chunks, and the Sifter keeps only those partitions whose
> # testing half (partitions == 2) contains both a 'C' and a 'D' target
> # (here: 18*18 = 324 folds).
> partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
>                            Sifter([('partitions', 2),
>                                    ('targets', ['C', 'D'])])],
>                           space='partitions')
> # inspect the training half of each partition set
> for fds_ in partitionerCD.generate(fds):
>     training = fds[fds_.sa.partitions == 1]
>     #print list(zip(training.sa.chunks, training.sa.targets))
> # inspect the testing half of each partition set
> for fds_ in partitionerCD.generate(fds):
>     testing = fds[fds_.sa.partitions == 2]
>     #print list(zip(testing.sa.chunks, testing.sa.targets))
>
> # Helper function (the inner cross-validation applies partitionerCD to
> # dstrain rather than to fds):
> def select_best_clf(dstrain_, clfs):
>     """Select best model according to CVTE
>     Helper function which we will use twice -- once for proper nested
>     cross-validation, and once to see how big an optimistic bias due
>     to model selection could be if we simply provide an entire dataset.
>     Parameters
>     ----------
>     dstrain_ : Dataset
>     clfs : list of Classifiers
>       Which classifiers to explore
>     Returns
>     -------
>     best_clf, best_error
>     """
>     best_error = None
>     for clf in clfs:
>         cv = CrossValidation(clf, partitionerCD)
>         # unfortunately we don't have ability to reassign clf atm
>         # cv.transerror.clf = clf
>         try:
>             error = np.mean(cv(dstrain_))
>         except LearnerError:
>             # skip this classifier if the data was not appropriate and it
>             # failed to learn/predict at all
>             continue
>         if best_error is None or error < best_error:
>             best_clf = clf
>             best_error = error
>         verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
>     verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
>             % (len(clfs), best_clf.descr, best_error))
>     return best_clf, best_error
>
> #Estimate error using nested CV for model selection:
> best_clfs = {}
> confusion = ConfusionMatrix()
> verbose(1, "Estimating error using nested CV for model selection")
> partitioner = partitionerCD
> splitter = Splitter('partitions')
> for isplit, partitions in enumerate(partitioner.generate(fds)):
>     verbose(2, "Processing split #%i" % isplit)
>     dstrain, dstest = list(splitter.generate(partitions))
>     best_clf, best_error = select_best_clf(dstrain, clfswh['!gnpp','!skl'])
>     best_clfs[best_clf.descr] = best_clfs.get(best_clf.descr, 0) + 1
>     # now that we have the best classifier, lets assess its transfer
>     # to the testing dataset while training on entire training
>     tm = TransferMeasure(best_clf, splitter,
>                          postproc=BinaryFxNode(mean_mismatch_error,
>                                                space='targets'),
>                          enable_ca=['stats'])
>     tm(partitions)
>     confusion += tm.ca.stats
>
> ##########
> ########## * ##########
>
>
>
>
>
>
> On 10/11/2017 15:43, Matteo Visconti di Oleggio Castello wrote:
>
> What do you mean by "cycling over approx. 40 different classifiers"? Are
> you testing different classifiers? If so, one possibility is to create a
> script that takes the classifier type as an argument and runs the
> classification across all folds. That way you can submit 40 jobs and
> parallelize across classifiers, as sketched below.
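>
> A driver script along those lines might look as follows (a hypothetical
> sketch; selecting the classifier from the warehouse by its descr and
> reusing your fds/partitionerCD setup are assumptions on my part):
>
> # run_clf.py -- usage: python run_clf.py "<classifier descr>"
> import sys
> import numpy as np
> from mvpa2.suite import *
>
> clf_descr = sys.argv[1]
> # pick the requested classifier out of the warehouse by its description
> clf = [c for c in clfswh['!gnpp', '!skl'] if c.descr == clf_descr][0]
> # ... load fds and build partitionerCD as in your script ...
> cv = CrossValidation(clf, partitionerCD)
> print '%s: mean error %.3f' % (clf_descr, np.mean(cv(fds)))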
>
> If that's not the case, then since the folds are independent and
> deterministic, I would create a script that performs the classification on
> blocks of folds (say folds 1 to 30, 31 to 60, etc.) and submit a separate
> job per block, parallelizing there; see the sketch below.
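>
> For example (a hypothetical sketch; start/stop come from the command line,
> and islice picks out just the folds assigned to this job):
>
> import sys
> from itertools import islice
>
> start, stop = int(sys.argv[1]), int(sys.argv[2])
> # process only folds start..stop-1; merge the per-block results afterwards
> for isplit, partitions in enumerate(
>         islice(partitionerCD.generate(fds), start, stop), start):
>     dstrain, dstest = list(splitter.generate(partitions))
>     # ... model selection and testing for this fold, as in your loop ...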
>
> If you send a snippet of the code you're using, it will be easier to see
> which points are good candidates for parallelization.
>
>
> On 10/11/2017 09:57, marco tettamanti wrote:
>
> Dear Matteo and Nick,
> thank you for your responses.
> I'll take this occasion to ask some follow-up questions, because I am
> struggling to make pymvpa2 computations faster and more efficient.
>
> I often find myself giving up on a particular analysis because it would take
> far more time than I can bear (weeks, months!). This happens particularly with
> searchlight permutation testing (gnbsearchlight is much faster, but does not
> support pprocess) and with nested cross-validation.
> As for the latter, for example, I recently wanted to run nested cross-validation
> on a sample of 18 patients and 18 controls (1 image per subject), training the
> classifiers to discriminate patients from controls in a leave-one-pair-out
> partitioning scheme. This yields 18*18=324 folds. For a small ROI of 36 voxels,
> cycling over approx. 40 different classifiers takes about 2 hours per fold on a
> decent Dell PowerEdge T430 server with 128GB RAM. That means approx. 27 days
> for all 324 folds!
> The same server is equipped with 32 CPUs. With full parallelization, the same
> analysis could be completed in less than one day. This is the reason for my
> interest in, and questions about, parallelization.
>
> Is there anything you experts do in such situations to speed up the
> computation or make it more efficient?
>
> Thank you again and best wishes,
> Marco
>
>
>
> On 10/11/2017 10:07, Nick Oosterhof wrote:
>
> There have been some plans / minor attempts to parallelise more of the
> codebase, but as far as I know we only support pprocess, and only for (1)
> searchlight; (2) surface-based voxel selection; and (3) hyperalignment. I
> do remember that parallelisation of other functions was challenging, partly
> due to getting the conditional attributes set right, but that was a long
> time ago.
>
>
> On 09/11/2017 18:35, Matteo Visconti di Oleggio Castello wrote:
>
> Hi Marco,
> AFAIK, there is no support for parallelization at the level of
> cross-validation. Usually for a small ROI (such as a searchlight) and with
> standard CV schemes, the process is quite fast, and the bottleneck is
> really the number of searchlights to be computed (for which parallelization
> exists).
>
> In my experience, we tend to parallelize at the level of individual
> participants; for example, we might set up a searchlight analysis with as
> many processes as you can get (the nproc argument), and then submit one
> such job for every participant to a cluster (using either Torque or
> Condor), as in the sketch below.
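>
> For instance, the per-participant job could boil down to something like
> this (a sketch; the classifier, radius, and nproc values are placeholders):
>
> cv = CrossValidation(clf, NFoldPartitioner())
> # nproc > 1 enables the pprocess-based parallelization within this job
> sl = sphere_searchlight(cv, radius=3, nproc=8)
> res = sl(ds)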
>
> HTH,
> Matteo
>
> On 09/11/2017 10:08, marco tettamanti wrote:
>
> Dear all,
> forgive me if this has already been asked in the past, but I was wondering
> whether there have been any developments in the meantime.
>
> Is there any chance that one can apply parallel computing generally
> (multiple CPUs or clusters) with pymvpa2, beyond what is already implemented
> for searchlight (pprocess)? That is, also for general cross-validation,
> nested cross-validation, permutation testing, RFE, etc.?
>
> Has anyone had successful experience with parallelization schemes such as
> ipyparallel, Condor, or others?
>
> Thank you and best wishes!
> Marco
>
>
>
> --
> Marco Tettamanti, Ph.D.
> Nuclear Medicine Department & Division of Neuroscience
> IRCCS San Raffaele Scientific Institute
> Via Olgettina 58
> I-20132 Milano, Italy
> Phone: +39-02-26434888
> Fax: +39-02-26434892
> Email: tettamanti.marco at hsr.it
> Skype: mtettamanti
> http://scholar.google.it/citations?user=x4qQl4AAAAAJ
>
>
>
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
>



-- 
Matteo Visconti di Oleggio Castello
Ph.D. Candidate in Cognitive Neuroscience
Dartmouth College

+1 (603) 646-8665
mvdoc.me || github.com/mvdoc || linkedin.com/in/matteovisconti

