[pymvpa] Parallelization
marco tettamanti
mrctttmnt at gmail.com
Thu Nov 23 11:07:48 UTC 2017
Dear Matteo (and others),
sorry, I am again asking for your help!
I have experimented with the analysis of my dataset using an adaptation of your
joblib-based gist.
As I wrote before, it works perfectly, except with some classifiers: SVM
classifiers always cause the code to terminate with an error.
If I set:

myclassif = clfswh['!gnpp', '!skl', '!svm']  # Note: 'gnpp' and 'skl' were excluded for independent reasons

the code runs through without errors.
However, with:
myclassif=clfswh['!gnpp','!skl']
I get the following error:
MaybeEncodingError: Error sending result:
'[TransferMeasure(measure=SVM(svm_impl='C_SVC', kernel=LinearLSKernel(), weight=[], probability=1, weight_label=[]), splitter=Splitter(space='partitions'), postproc=BinaryFxNode(space='targets'), enable_ca=['stats'])]'.
Reason: 'TypeError("can't pickle SwigPyObject objects",)'
After googling for what may cause this particular error, I found that the
situation improves slightly (i.e. more splits are executed, sometimes even all
of them) by importing the following:

import os
from sklearn.externals.joblib import Parallel, delayed
from sklearn.externals.joblib.parallel import parallel_backend

and then wrapping the 'Parallel(n_jobs=2)' call in:

with parallel_backend('threading'):
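Concretely, the parallel call then becomes (a minimal sketch, using the same
names as in the full code at the end of this message):

from sklearn.externals.joblib import Parallel, delayed
from sklearn.externals.joblib.parallel import parallel_backend

# run one split per job, but let joblib use threads instead of processes
with parallel_backend('threading'):
    tms = Parallel(n_jobs=2)(
        delayed(_run_one_partition)(isplit, partitions)
        for isplit, partitions in enumerate(partitionerCD.generate(fds)))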
However, also in this case, the code invariably terminates with a long error
message (I only report an extract here; I can send the whole message if useful):
<type 'str'>: (<type 'exceptions.UnicodeEncodeError'>, UnicodeEncodeError('ascii',
u'JoblibAttributeError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
/usr/lib/python2.7/runpy.py in _run_module_as_main(mod_name='ipykernel_launcher', alter_argv=1)
    169     pkg_name = mod_name.rpartition('.')[0]
    170     main_globals = sys.modules["__main__"].__dict__
    171     if alter_argv:
    172         sys.argv[0] = fname
    173     return _run_code(code, main_globals, None,
--> 174
I think I have more or less understood that the problem lies in pickling the
results of the parallelized jobs (the SVM classifier carries a SWIG-wrapped
libsvm object that cannot be pickled), but I have no clue whether and how this
can be solved.
Do you have any suggestions?
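One idea I have not tried yet would be to return only plain, picklable objects
from each parallel worker (the confusion stats and the name of the winning
classifier), instead of the whole TransferMeasure, which holds the SWIG-wrapped
libsvm object. A rough, untested sketch of what I mean, reusing the helpers
from the full code below (the name _run_one_partition_picklable is just a
placeholder):

def _run_one_partition_picklable(isplit, partitions, classifiers=myclassif):
    # same work as _run_one_partition below, but return only plain,
    # picklable pieces (confusion stats and the winning classifier's
    # description) rather than the TransferMeasure object itself
    verbose(2, "Processing split #%i" % isplit)
    dstrain, dstest = list(splitter.generate(partitions))
    best_clf, best_error = select_best_clf(dstrain, classifiers)
    tm = TransferMeasure(best_clf, splitter,
                         postproc=BinaryFxNode(mean_mismatch_error, space='targets'),
                         enable_ca=['stats'])
    tm(partitions)
    return tm.ca.stats, best_clf.descr

results = Parallel(n_jobs=2)(
    delayed(_run_one_partition_picklable)(isplit, partitions)
    for isplit, partitions in enumerate(partitionerCD.generate(fds)))

# aggregate the per-split results in the parent process
confusion = ConfusionMatrix()
best_clfs = {}
for stats, descr in results:
    confusion += stats
    best_clfs[descr] = best_clfs.get(descr, 0) + 1

I do not know whether the ConfusionMatrix in tm.ca.stats pickles cleanly in all
cases, but at least the SVM object itself would no longer need to be sent back
to the parent process.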
Thank you and very best wishes,
Marco
p.s. Here is the full code again:
########## * ##########
##########
PyMVPA:
Version: 2.6.3
Hash: 9c07e8827819aaa79ff15d2db10c420a876d7785
Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
Version control (GIT):
GIT information could not be obtained due
"/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
SYSTEM:
OS: posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 (2017-10-15)
print fds.summary()
Dataset: 36x534 at float32, <sa: chunks,targets,time_coords,time_indices>, <fa:
voxel_indices>, <a: imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
No details due to large number of targets or chunks. Increase maxc and maxt if
desired
Summary for targets across chunks
targets mean std min max #chunks
C 0.5 0.5 0 1 18
D 0.5 0.5 0 1 18
# Evaluate prevalent best classifier with nested crossvalidation
verbose.level = 5

partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
                           Sifter([('partitions', 2), ('targets', ['C', 'D'])])],
                          space='partitions')

# training partitions
for fds_ in partitionerCD.generate(fds):
    training = fds[fds_.sa.partitions == 1]
    #print list(zip(training.sa.chunks, training.sa.targets))

# testing partitions
for fds_ in partitionerCD.generate(fds):
    testing = fds[fds_.sa.partitions == 2]
    #print list(zip(testing.sa.chunks, testing.sa.targets))
# Helper function (partitionerCD recursively acting on dstrain, rather than on fds):
def select_best_clf(dstrain_, clfs):
    """Select best model according to CVTE

    Helper function which we will use twice -- once for proper nested
    cross-validation, and once to see how big an optimistic bias due
    to model selection could be if we simply provide an entire dataset.

    Parameters
    ----------
    dstrain_ : Dataset
    clfs : list of Classifiers
      Which classifiers to explore

    Returns
    -------
    best_clf, best_error
    """
    best_error = None
    for clf in clfs:
        cv = CrossValidation(clf, partitionerCD)
        # unfortunately we don't have ability to reassign clf atm
        # cv.transerror.clf = clf
        try:
            error = np.mean(cv(dstrain_))
        except LearnerError, e:
            # skip the classifier if data was not appropriate and it
            # failed to learn/predict at all
            continue
        if best_error is None or error < best_error:
            best_clf = clf
            best_error = error
        verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
    verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
            % (len(clfs), best_clf.descr, best_error))
    return best_clf, best_error
# This function will run all classifiers for one single partition
myclassif = clfswh['!gnpp', '!skl'][5:6]  # Testing a single SVM classifier

def _run_one_partition(isplit, partitions, classifiers=myclassif):  # see §§
    verbose(2, "Processing split #%i" % isplit)
    dstrain, dstest = list(splitter.generate(partitions))
    best_clf, best_error = select_best_clf(dstrain, classifiers)
    # now that we have the best classifier, lets assess its transfer
    # to the testing dataset while training on entire training
    tm = TransferMeasure(best_clf, splitter,
                         postproc=BinaryFxNode(mean_mismatch_error, space='targets'),
                         enable_ca=['stats'])
    tm(partitions)
    return tm
from joblib import Parallel, delayed  # import needed for the Parallel call below (assuming the standalone joblib package)
#import os
#from sklearn.externals.joblib import Parallel, delayed
#from sklearn.externals.joblib.parallel import parallel_backend

# Parallel estimate of error using nested CV for model selection
confusion = ConfusionMatrix()
verbose(1, "Estimating error using nested CV for model selection")
partitioner = partitionerCD
splitter = Splitter('partitions')

# Here we are using joblib Parallel to parallelize each partition.
# Set n_jobs to the number of available cores (or how many you want to use).
#with parallel_backend('threading'):
#    tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
                         for isplit, partitions in
                         enumerate(partitionerCD.generate(fds)))
# Parallel returns a list with the results of each parallel loop, so we need to
# unravel it to get the confusion matrix
best_clfs = {}
for tm in tms:
    confusion += tm.ca.stats
    best_clfs[tm.measure.descr] = best_clfs.get(tm.measure.descr, 0) + 1
##########
########## * ##########
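At the end, 'confusion' holds the aggregated confusion matrix across all splits
and 'best_clfs' counts how often each classifier won, so a simple

print confusion
print best_clfs

shows the overall result.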
On 13/11/2017 09:12, marco tettamanti wrote:
> Dear Matteo,
> thank you so much, this is precisely the kind of thing I was looking for: it works
> like a charm!
> Ciao,
> Marco
>
>> On 11/11/2017 21:44, Matteo Visconti di Oleggio Castello wrote:
>>
>> Hi Marco,
>>
>> in your case, I would then recommend looking into joblib to parallelize
>> your for loops (https://pythonhosted.org/joblib/parallel.html).
>>
>> As an example, here's a gist containing part of PyMVPA's nested_cv
>> example where I parallelized the loop across partitions. I feel this is
>> what you might want to do in your case, since you have a lot more folds.
>>
>> Here's the gist:
>> https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0
>>
>>
>> On 10/11/2017 21:13, marco tettamanti wrote:
>>> Dear Matteo,
>>> thank you for the willingness to look into my code.
>>>
>>> This is taken almost verbatim from
>>> http://dev.pymvpa.org/examples/nested_cv.html, except for the
>>> leave-one-pair-out partitioning, and a slight reduction in the number of
>>> classifiers (in the original example, there are around 45).
>>>
>>> Any help or suggestion would be greatly appreciated!
>>> All the best,
>>> Marco
>>>
>>>
>>> ########## * ##########
>>> ##########
>>>
>>> PyMVPA:
>>> Version: 2.6.3
>>> Hash: 9c07e8827819aaa79ff15d2db10c420a876d7785
>>> Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
>>> Version control (GIT):
>>> GIT information could not be obtained due
>>> "/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
>>> SYSTEM:
>>> OS: posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 (2017-10-15)
>>>
>>>
>>> print fds.summary()
>>> Dataset: 36x534 at float32, <sa: chunks,targets,time_coords,time_indices>, <fa:
>>> voxel_indices>, <a: imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
>>> stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
>>> No details due to large number of targets or chunks. Increase maxc and maxt
>>> if desired
>>> Summary for targets across chunks
>>> targets mean std min max #chunks
>>> C 0.5 0.5 0 1 18
>>> D 0.5 0.5 0 1 18
>>>
>>>
>>> #Evaluate prevalent best classifier with nested crossvalidation
>>> verbose.level = 5
>>>
>>> partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
>>>                            Sifter([('partitions', 2), ('targets', ['C', 'D'])])],
>>>                           space='partitions')
>>> # training partitions
>>> for fds_ in partitionerCD.generate(fds):
>>>     training = fds[fds_.sa.partitions == 1]
>>>     #print list(zip(training.sa.chunks, training.sa.targets))
>>> # testing partitions
>>> for fds_ in partitionerCD.generate(fds):
>>>     testing = fds[fds_.sa.partitions == 2]
>>>     #print list(zip(testing.sa.chunks, testing.sa.targets))
>>>
>>> # Helper function (partitionerCD recursively acting on dstrain, rather than on fds):
>>> def select_best_clf(dstrain_, clfs):
>>>     """Select best model according to CVTE
>>>
>>>     Helper function which we will use twice -- once for proper nested
>>>     cross-validation, and once to see how big an optimistic bias due
>>>     to model selection could be if we simply provide an entire dataset.
>>>
>>>     Parameters
>>>     ----------
>>>     dstrain_ : Dataset
>>>     clfs : list of Classifiers
>>>       Which classifiers to explore
>>>
>>>     Returns
>>>     -------
>>>     best_clf, best_error
>>>     """
>>>     best_error = None
>>>     for clf in clfs:
>>>         cv = CrossValidation(clf, partitionerCD)
>>>         # unfortunately we don't have ability to reassign clf atm
>>>         # cv.transerror.clf = clf
>>>         try:
>>>             error = np.mean(cv(dstrain_))
>>>         except LearnerError, e:
>>>             # skip the classifier if data was not appropriate and it
>>>             # failed to learn/predict at all
>>>             continue
>>>         if best_error is None or error < best_error:
>>>             best_clf = clf
>>>             best_error = error
>>>         verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
>>>     verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
>>>             % (len(clfs), best_clf.descr, best_error))
>>>     return best_clf, best_error
>>>
>>> # Estimate error using nested CV for model selection:
>>> best_clfs = {}
>>> confusion = ConfusionMatrix()
>>> verbose(1, "Estimating error using nested CV for model selection")
>>> partitioner = partitionerCD
>>> splitter = Splitter('partitions')
>>> for isplit, partitions in enumerate(partitionerCD.generate(fds)):
>>>     verbose(2, "Processing split #%i" % isplit)
>>>     dstrain, dstest = list(splitter.generate(partitions))
>>>     best_clf, best_error = select_best_clf(dstrain, clfswh['!gnpp', '!skl'])
>>>     best_clfs[best_clf.descr] = best_clfs.get(best_clf.descr, 0) + 1
>>>     # now that we have the best classifier, lets assess its transfer
>>>     # to the testing dataset while training on entire training
>>>     tm = TransferMeasure(best_clf, splitter,
>>>                          postproc=BinaryFxNode(mean_mismatch_error, space='targets'),
>>>                          enable_ca=['stats'])
>>>     tm(partitions)
>>>     confusion += tm.ca.stats
>>>
>>> ##########
>>> ########## * ##########
>>>
>>>
>>>
>>>
>>>
>>>
>>>> On 10/11/2017 15:43, Matteo Visconti di Oleggio Castello wrote:
>>>>
>>>> What do you mean by "cycling over approx 40 different classifiers"? Are
>>>> you testing different classifiers? If that's the case, a possibility is to
>>>> create a script that takes as an argument the type of classifier and runs the
>>>> classification across all folds. In that way you can submit 40 jobs and
>>>> parallelize across classifiers.
>>>>
>>>> If that's not the case, then since the folds are independent and deterministic
>>>> I would create a script that performs the classification on blocks of folds
>>>> (say folds 1 to 30, 31 to 60, etc.), and then submit different jobs, so as
>>>> to parallelize there.
>>>>
>>>> I think that if you send a snippet of the code you're using, it will be
>>>> easier to see which are good points for parallelization.
>>>>
>>>>
>>>> On 10/11/2017 09:57, marco tettamanti wrote:
>>>>> Dear Matteo and Nick,
>>>>> thank you for your responses.
>>>>> I take the occasion to ask some follow-up questions, because I am struggling to
>>>>> make pymvpa2 computations faster and more efficient.
>>>>>
>>>>> I often find myself in the situation of giving up on a particular analysis,
>>>>> because it is going to take far more time than I can bear (weeks, months!). This
>>>>> happens particularly with searchlight permutation testing (gnbsearchlight is
>>>>> much faster, but does not support pprocess), and nested cross-validation.
>>>>> As for the latter, for example, I recently wanted to run nested cross-validation
>>>>> in a sample of 18 patients and 18 controls (1 image per subject), training the
>>>>> classifiers to discriminate patients from controls in a leave-one-pair-out
>>>>> partitioning scheme. This yields 18*18=324 folds. For a small ROI of 36 voxels,
>>>>> cycling over approx 40 different classifiers takes about 2 hours for each fold
>>>>> on a decent PowerEdge T430 Dell server with 128GB RAM. This means approx. 27
>>>>> days for all 324 folds!
>>>>> The same server is equipped with 32 CPUs. With full parallelization, the same
>>>>> analysis may be completed in less than one day. This is the reason for my
>>>>> interest in, and questions about, parallelization.
>>>>>
>>>>> Is there anything that you experts do in such situations to speed up the
>>>>> computation or make it more efficient?
>>>>>
>>>>> Thank you again and best wishes,
>>>>> Marco
>>>>>
>>>>>
>>>>>> On 10/11/2017 10:07, Nick Oosterhof wrote:
>>>>>>
>>>>>> There have been some plans / minor attempts at using parallelisation more
>>>>>> widely, but as far as I know we only support pprocess, and only for (1)
>>>>>> searchlight; (2) surface-based voxel selection; and (3) hyperalignment. I
>>>>>> do remember that parallelisation of other functions was challenging, due to
>>>>>> issues with getting the conditional attributes set right, but this was a long
>>>>>> time ago.
>>>>>>
>>>>>>> On 09/11/2017 18:35, Matteo Visconti di Oleggio Castello wrote:
>>>>>>>
>>>>>>> Hi Marco,
>>>>>>> AFAIK, there is no support for parallelization at the level of
>>>>>>> cross-validation. Usually for a small ROI (such as a searchlight) and with
>>>>>>> standard CV schemes, the process is quite fast, and the bottleneck is
>>>>>>> really the number of searchlights to be computed (for which parallelization
>>>>>>> exists).
>>>>>>>
>>>>>>> In my experience, we tend to parallelize at the level of individual
>>>>>>> participants; for example we might set up a searchlight analysis with
>>>>>>> however many n_procs you can have, and then submit one such job for every
>>>>>>> participant to a cluster (using either torque or condor).
>>>>>>>
>>>>>>> HTH,
>>>>>>> Matteo
>>>>>>>
>>>>>>> On 09/11/2017 10:08, marco tettamanti wrote:
>>>>>>>> Dear all,
>>>>>>>> forgive me if this has already been asked in the past, but I was wondering
>>>>>>>> whether there has been any development in the meantime.
>>>>>>>>
>>>>>>>> Is there any chance that one can generally apply parallel computing (multiple
>>>>>>>> CPUs or clusters) with pymvpa2, in addition to what is already implemented for
>>>>>>>> searchlight (pprocess)? That is, also for general cross-validation, nested
>>>>>>>> cross-validation, permutation testing, RFE, etc.?
>>>>>>>>
>>>>>>>> Has anyone had successful experience with parallelization schemes such as
>>>>>>>> ipyparallel, condor, or others?
>>>>>>>>
>>>>>>>> Thank you and best wishes!
>>>>>>>> Marco
>>>>>>>>
>>>>
>>>> --
>>>> Marco Tettamanti, Ph.D.
>>>> Nuclear Medicine Department & Division of Neuroscience
>>>> IRCCS San Raffaele Scientific Institute
>>>> Via Olgettina 58
>>>> I-20132 Milano, Italy
>>>> Phone ++39-02-26434888
>>>> Fax ++39-02-26434892
>>>> Email: tettamanti.marco at hsr.it
>>>> Skype: mtettamanti
>>>> http://scholar.google.it/citations?user=x4qQl4AAAAAJ
>>>
>