[pymvpa] Parallelization

marco tettamanti mrctttmnt at gmail.com
Mon Nov 13 08:12:38 UTC 2017


Dear Matteo,
thanks a million, this is precisely the kind of thing I was looking for: it
works like a charm!
Ciao,
Marco

> On 11/11/2017 21:44, Matteo Visconti di Oleggio Castello wrote:
>
> Hi Marco,
>
> in your case, I would then recommend looking into joblib to parallelize
> your for loops (https://pythonhosted.org/joblib/parallel.html).
>
> As an example, here's a gist containing part of PyMVPA's nested_cv
> example, where I parallelized the loop across partitions. I feel this is
> what you might want to do in your case, since you have a lot more folds.
>
> Here's the gist:
> https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0
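>
> The core pattern is roughly the following (a minimal sketch, not the full
> version in the gist; it reuses the select_best_clf helper, splitter, and
> partitionerCD from your code below, and process_fold is just an
> illustrative name):
>
> from joblib import Parallel, delayed
>
> def process_fold(partitions):
>     # model selection on the training half of this outer fold
>     dstrain, dstest = list(splitter.generate(partitions))
>     best_clf, best_error = select_best_clf(dstrain, clfswh['!gnpp', '!skl'])
>     # assess transfer of the winning classifier to the testing half
>     tm = TransferMeasure(best_clf, splitter,
>                          postproc=BinaryFxNode(mean_mismatch_error,
>                                                space='targets'),
>                          enable_ca=['stats'])
>     tm(partitions)
>     return best_clf.descr, tm.ca.stats
>
> # the outer folds are independent, so they can run on separate cores
> results = Parallel(n_jobs=-1)(
>     delayed(process_fold)(partitions)
>     for partitions in partitionerCD.generate(fds))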
>
>
> On 10/11/2017 21:13, marco tettamanti wrote:
>> Dear Matteo,
>> thank you for the willingness to look into my code.
>>
>> This is taken almost verbatim from 
>> http://dev.pymvpa.org/examples/nested_cv.html, except for the 
>> leave-one-pair-out partitioning, and a slight reduction in the number of 
>> classifiers (in the original example, there are around 45).
>>
>> Any help or suggestion would be greatly appreciated!
>> All the best,
>> Marco
>>
>>
>> ########## * ##########
>> ##########
>>
>> PyMVPA:
>>  Version:       2.6.3
>>  Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785
>>  Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
>>  Version control (GIT):
>>  GIT information could not be obtained due 
>> "/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
>> SYSTEM:
>>  OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 (2017-10-15)
>>
>>
>> print fds.summary()
>> Dataset: 36x534 at float32, <sa: chunks,targets,time_coords,time_indices>, <fa: 
>> voxel_indices>, <a: imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
>> stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
>> No details due to large number of targets or chunks. Increase maxc and maxt 
>> if desired
>> Summary for targets across chunks
>>   targets mean std min max #chunks
>>     C      0.5 0.5  0   1     18
>>     D      0.5 0.5  0   1     18
>>
>>
>> #Evaluate prevalent best classifier with nested crossvalidation
>> verbose.level = 5
>>
>> partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
>>                            Sifter([('partitions', 2),
>>                                    ('targets', ['C', 'D'])])],
>>                           space='partitions')
>> # inspection only: loop over the folds to check the leave-one-pair-out
>> # partitioning (each iteration overwrites training/testing)
>> # training partitions
>> for fds_ in partitionerCD.generate(fds):
>>     training = fds[fds_.sa.partitions == 1]
>>     #print list(zip(training.sa.chunks, training.sa.targets))
>> # testing partitions
>> for fds_ in partitionerCD.generate(fds):
>>     testing = fds[fds_.sa.partitions == 2]
>>     #print list(zip(testing.sa.chunks, testing.sa.targets))
>>
>> # Helper function (partitionerCD acting recursively on dstrain, rather
>> # than on fds):
>> def select_best_clf(dstrain_, clfs):
>>     """Select best model according to CVTE
>>     Helper function which we will use twice -- once for proper nested
>>     cross-validation, and once to see how big an optimistic bias due
>>     to model selection could be if we simply provide an entire dataset.
>>     Parameters
>>     ----------
>>     dstrain_ : Dataset
>>     clfs : list of Classifiers
>>       Which classifiers to explore
>>     Returns
>>     -------
>>     best_clf, best_error
>>     """
>>     best_error = None
>>     for clf in clfs:
>>         cv = CrossValidation(clf, partitionerCD)
>>         # unfortunately we don't have ability to reassign clf atm
>>         # cv.transerror.clf = clf
>>         try:
>>             error = np.mean(cv(dstrain_))
>>         except LearnerError:
>>             # skip the classifier if data was not appropriate and it
>>             # failed to learn/predict at all
>>             continue
>>         if best_error is None or error < best_error:
>>             best_clf = clf
>>             best_error = error
>>         verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
>>     verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
>>             % (len(clfs), best_clf.descr, best_error))
>>     return best_clf, best_error
>>
>> #Estimate error using nested CV for model selection:
>> best_clfs = {}
>> confusion = ConfusionMatrix()
>> verbose(1, "Estimating error using nested CV for model selection")
>> partitioner = partitionerCD
>> splitter = Splitter('partitions')
>> for isplit, partitions in enumerate(partitionerCD.generate(fds)):
>>     verbose(2, "Processing split #%i" % isplit)
>>     dstrain, dstest = list(splitter.generate(partitions))
>>     best_clf, best_error = select_best_clf(dstrain, clfswh['!gnpp','!skl'])
>>     best_clfs[best_clf.descr] = best_clfs.get(best_clf.descr, 0) + 1
>>     # now that we have the best classifier, let's assess its transfer
>>     # to the testing dataset while training on the entire training set
>>     tm = TransferMeasure(best_clf, splitter,
>>                          postproc=BinaryFxNode(mean_mismatch_error,
>>                                                space='targets'),
>>                          enable_ca=['stats'])
>>     tm(partitions)
>>     confusion += tm.ca.stats
>>
>> ##########
>> ########## * ##########
>>
>>
>>
>>
>>
>>
>>> On 10/11/2017 15:43, Matteo Visconti di Oleggio Castello wrote:
>>>
>>> What do you mean by "cycling over approx 40 different classifiers"? Are
>>> you testing different classifiers? If that's the case, a possibility is to
>>> create a script that takes the classifier type as an argument and runs the
>>> classification across all folds. That way you can submit 40 jobs and
>>> parallelize across classifiers.
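>>>
>>> Concretely, such a script could look roughly like this (a rough sketch:
>>> load_your_dataset is a placeholder for however you build your dataset,
>>> and you would plug in your own partitioner instead of the plain
>>> NFoldPartitioner):
>>>
>>> # run_one_clf.py -- one job = one classifier, run across all folds
>>> import sys
>>> import numpy as np
>>> from mvpa2.suite import *
>>>
>>> clf_idx = int(sys.argv[1])              # which classifier this job handles
>>> fds = load_your_dataset()               # placeholder
>>> clf = clfswh['!gnpp', '!skl'][clf_idx]
>>> cv = CrossValidation(clf, NFoldPartitioner(cvtype=2))
>>> np.savetxt('errors_clf%02d.txt' % clf_idx, np.asarray(cv(fds)))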
>>>
>>> If that's not the case, then because the folds are independent and
>>> deterministic, I would create a script that performs the classification on
>>> blocks of folds (say folds 1 to 30, 31 to 60, etc.), and then submit a
>>> separate job for each block, so as to parallelize there.
>>>
>>> I think that if you send a snippet of the code you're using, it will be
>>> easier to see which points are good candidates for parallelization.
>>>
>>>
>>> On 10/11/2017 09:57, marco tettamanti wrote:
>>>> Dear Matteo and Nick,
>>>> thank you for your responses.
>>>> I take the occasion to ask some follow-up questions, because I am struggling to
>>>> make pymvpa2 computations faster and more efficient.
>>>>
>>>> I often find myself in the situation of giving up on a particular analysis,
>>>> because it is going to take far more time than I can bear (weeks, months!). This
>>>> happens particularly with searchlight permutation testing (gnbsearchlight is
>>>> much faster, but does not support pprocess), and nested cross-validation.
>>>> As for the latter, for example, I recently wanted to run nested cross-validation
>>>> in a sample of 18 patients and 18 controls (1 image x subject), training the
>>>> classifiers to discriminate patients from controls in a leave-one-pair-out
>>>> partitioning scheme. This yields 18*18=324 folds. For a small ROI of 36 voxels,
>>>> cycling over approx 40 different classifiers takes about 2 hours for each fold
>>>> on a decent PowerEdge T430 Dell server with 128GB RAM. This means approx. 27
>>>> days for all 324 folds!
>>>> The same server is equipped with 32 CPUs. With full parallelization, the same
>>>> analysis might be completed in less than one day. This is the reason for my
>>>> interest and questions about parallelization.
>>>>
>>>> Is there anything that you experts do in such situations to speed up the
>>>> computation or make it more efficient?
>>>>
>>>> Thank you again and best wishes,
>>>> Marco
>>>>
>>>>
>>>>> On 10/11/2017 10:07, Nick Oosterhof wrote:
>>>>>
>>>>> There have been some plans / minor attempts at making more of PyMVPA run
>>>>> in parallel, but as far as I know we only support pprocess, and only for (1)
>>>>> searchlight; (2) surface-based voxel selection; and (3) hyperalignment. I
>>>>> do remember that parallelising other functions was challenging, partly due
>>>>> to getting the conditional attributes set right, but that was a long time
>>>>> ago.
>>>>>
>>>>>> On 09/11/2017 18:35, Matteo Visconti di Oleggio Castello wrote:
>>>>>>
>>>>>> Hi Marco,
>>>>>> AFAIK, there is no support for parallelization at the level of
>>>>>> cross-validation. Usually for a small ROI (such as a searchlight) and with
>>>>>> standard CV schemes, the process is quite fast, and the bottleneck is
>>>>>> really the number of searchlights to be computed (for which parallelization
>>>>>> exists).
>>>>>>
>>>>>> In my experience, we tend to parallelize at the level of individual
>>>>>> participants; for example, we might set up a searchlight analysis with
>>>>>> however many processes (nproc) you can get, and then submit one such job
>>>>>> for every participant to a cluster (using either Torque or Condor).
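>>>>>>
>>>>>> For instance, roughly (a sketch; the hdf5 file names, classifier, and
>>>>>> partitioner are placeholders for your own setup):
>>>>>>
>>>>>> import sys
>>>>>> from mvpa2.suite import *
>>>>>>
>>>>>> subj = sys.argv[1]                     # subject ID passed by the job script
>>>>>> fds = h5load('fds_sub%s.hdf5' % subj)  # hypothetical per-subject dataset
>>>>>> cv = CrossValidation(LinearCSVMC(), NFoldPartitioner())
>>>>>> # nproc > 1 distributes the searchlight spheres over cores via pprocess
>>>>>> sl = sphere_searchlight(cv, radius=3, nproc=8)
>>>>>> h5save('sl_sub%s.hdf5' % subj, sl(fds))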
>>>>>>
>>>>>> HTH,
>>>>>> Matteo
>>>>>>
>>>>>> On 09/11/2017 10:08, marco tettamanti wrote:
>>>>>>> Dear all,
>>>>>>> forgive me if this has already been asked in the past, but I was wondering
>>>>>>> whether there have been any developments in the meantime.
>>>>>>>
>>>>>>> Is there any chance that one can generally apply parallel computing (multiple
>>>>>>> CPUs or clusters) with pymvpa2, beyond what is already implemented for
>>>>>>> searchlight (pprocess)? That is, also for general cross-validation, nested
>>>>>>> cross-validation, permutation testing, RFE, etc.?
>>>>>>>
>>>>>>> Has anyone had successful experience with parallelization schemes such as
>>>>>>> ipyparallel, condor, or others?
>>>>>>>
>>>>>>> Thank you and best wishes!
>>>>>>> Marco
>>>>>>>
>>>
>>> -- 
>>> Marco Tettamanti, Ph.D.
>>> Nuclear Medicine Department & Division of Neuroscience
>>> IRCCS San Raffaele Scientific Institute
>>> Via Olgettina 58
>>> I-20132 Milano, Italy
>>> Phone ++39-02-26434888
>>> Fax ++39-02-26434892
>>> Email: tettamanti.marco at hsr.it
>>> Skype: mtettamanti
>>> http://scholar.google.it/citations?user=x4qQl4AAAAAJ
>>
