[pymvpa] Parallelization

marco tettamanti mrctttmnt at gmail.com
Fri Nov 24 20:57:50 UTC 2017


Dear Matteo,
thank you for your kind reply!

Yes, I do have the latest versions of joblib (0.11) and sklearn (0.19.1); see the 
bottom of this email.

The problem seems to be independent of whether I run the code in jupyter, or invoke 
ipython or python directly in the console.

I am now wondering whether there may be something wrong in my snippet.
When I first ran your gist, I encountered:

         UnboundLocalError: local variable 'best_clf' referenced before assignment

which I solved by moving the best_clfs declaration a few lines down:

-----------------------------------------
#best_clfs = {}  #moved down 7 lines
confusion = ConfusionMatrix()
verbose(1, "Estimating error using nested CV for model selection")
partitioner = partitionerCD
splitter = Splitter('partitions')
tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
                         for isplit, partitions in enumerate(partitionerCD.generate(fds)))
best_clfs = {}
for tm in tms:
     confusion += tm.ca.stats
     best_clfs[tm.measure.descr] = best_clfs.get(tm.measure.descr, 0) + 1
-----------------------------------------

But now, running the snippet in ipython/python specifically to test the SVM 
parallelization issue, the same error message popped up again:

          UnboundLocalError: local variable 'best_clf' referenced before assignment

Could this be the culprit? As a reminder, the full snippet I am using is included 
in my previous email.
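
If it helps, my guess (and it is only a guess) is that the error actually originates 
inside select_best_clf rather than in the snippet above: best_clf is only assigned 
inside the loop when a classifier improves on best_error, so if every classifier in 
the list raises LearnerError (or the list ends up empty), the final verbose() call 
references best_clf before it was ever assigned. A minimal guard could look like 
this (untested sketch):

-----------------------------------------
def select_best_clf(dstrain_, clfs):
    """Select best model according to CVTE -- same as before, but guarded."""
    best_clf, best_error = None, None
    for clf in clfs:
        cv = CrossValidation(clf, partitionerCD)
        try:
            error = np.mean(cv(dstrain_))
        except LearnerError, e:
            # skip classifiers that fail to learn/predict on this data
            continue
        if best_error is None or error < best_error:
            best_clf, best_error = clf, error
        verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
    if best_clf is None:
        # fail loudly instead of hitting an UnboundLocalError further down
        raise RuntimeError("No classifier could be trained on this split")
    verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
            % (len(clfs), best_clf.descr, best_error))
    return best_clf, best_error
-----------------------------------------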

Thank you and very best wishes,
Marco


In [21]: mvpa2.wtf(exclude=['runtime','process']) ##other possible arguments 
(['sources',
Out[21]:
Current date:   2017-11-24 21:23
PyMVPA:
  Version:       2.6.3
  Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785
  Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
  Version control (GIT):
  GIT information could not be obtained due 
"/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
SYSTEM:
  OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 (2017-10-15)
  Distribution:  debian/buster/sid
EXTERNALS:
  Present:       atlas_fsl, cPickle, ctypes, good 
scipy.stats.rv_continuous._reduce_func(floc,fscale), good 
scipy.stats.rv_discrete.ppf, griddata, gzip, h5py, hdf5, ipython, joblib, 
liblapack.so, libsvm, libsvm verbosity control, lxml, matplotlib, mdp, mdp ge 
2.4, mock, nibabel, nose, numpy, numpy_correct_unique, pprocess, pylab, pylab 
plottable, pywt, pywt wp reconstruct, reportlab, running ipython env, scipy, 
skl, statsmodels
  Absent:        afni-3dinfo, atlas_pymvpa, cran-energy, datalad, elasticnet, 
glmnet, good scipy.stats.rdist, hcluster, lars, mass, nipy, nipy.neurospin, 
numpydoc, openopt, pywt wp reconstruct fixed, rpy2, scipy.weave, sg ge 0.6.4, sg 
ge 0.6.5, sg_fixedcachesize, shogun, shogun.krr, shogun.lightsvm, shogun.mpd, 
shogun.svmocas, shogun.svrlight, weave
  Versions of critical externals:
   ctypes      : 1.1.0
   h5py        : 2.7.1
   hdf5        : 1.10.0
   ipython     : 5.5.0
   joblib      : 0.11
   lxml        : 4.1.0
   matplotlib  : 2.0.0
   mdp         : 3.5
   mock        : 2.0.0
   nibabel     : 2.3.0dev
   numpy       : 1.13.1
   pprocess    : 0.5
   pywt        : 0.5.1
   reportlab   : 3.4.0
   scipy       : 0.19.1
   skl         : 0.19.1
  Matplotlib backend: TkAgg


> On 24/11/2017 17:32, Matteo Visconti di Oleggio Castello wrote:
>
> Hi Marco,
>
> Some ideas, in random order:
>
> - what version of sklearn/joblib are you using? I would make sure to use
> the latest version (0.11), perhaps not importing it from sklearn (unless
> you have the latest sklearn version, 0.19.1)
> - are you running the code in a jupyter notebook? There might be some
> issues with that (see https://github.com/joblib/joblib/issues/174). As a
> test you might try to convert your notebook to a script and then run it
>
>
>
> On 23/11/2017 12:07, marco tettamanti wrote:
>> Dear Matteo (and others),
>> sorry, I am again asking for your help!
>>
>> I have experimented with the analysis of my dataset using an adaptation of 
>> your joblib-based gist.
>> As I wrote before, it works perfectly with most classifiers, but SVM classifiers 
>> always cause the code to terminate with an error.
>>
>> If I set:
>>         myclassif=clfswh['!gnpp','!skl','!svm']    #'gnpp' and 'skl' were excluded for independent reasons
>> the code runs through without errors.
>>
>> However, with:
>>         myclassif=clfswh['!gnpp','!skl']
>> I get the following error:
>>         MaybeEncodingError: Error sending result:
>>         '[TransferMeasure(measure=SVM(svm_impl='C_SVC', kernel=LinearLSKernel(),
>>          weight=[], probability=1, weight_label=[]), splitter=Splitter(space='partitions'),
>>          postproc=BinaryFxNode(space='targets'), enable_ca=['stats'])]'.
>>         Reason: 'TypeError("can't pickle SwigPyObject objects",)'
>>
>> After googling for what may cause this particular error, I have found that 
>> the situation improves slightly (i.e. more splits executed, sometimes even 
>> all splits) by importing the following:
>>         import os
>>         from sklearn.externals.joblib import Parallel, delayed
>>         from sklearn.externals.joblib.parallel import parallel_backend
>> and then specifying just before 'Parallel(n_jobs=2)':
>>         with parallel_backend('threading'):
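>>
>> Putting those pieces together, the parallel call then looks roughly like this
>> (this is just the assembled version of the lines above):
>>         import os
>>         from sklearn.externals.joblib import Parallel, delayed
>>         from sklearn.externals.joblib.parallel import parallel_backend
>>
>>         with parallel_backend('threading'):
>>             tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
>>                                      for isplit, partitions in enumerate(partitionerCD.generate(fds)))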
>> However, also in this case, the code invariably terminates with a long error
>> message (I only report an extract here, but I can send the whole error message
>> if needed):
>>         <type 'str'>: (<type 'exceptions.UnicodeEncodeError'>, 
>> UnicodeEncodeError('ascii',
>> u'JoblibAttributeError\n___________________________________________________________________________\nMultiprocessing
>> exception:\n...........................................................................\n/usr/lib/python2.7/runpy.py 
>> in
>>        _run_module_as_main(mod_name=\'ipykernel_launcher\', 
>> alter_argv=1)\n    169     pkg_name = mod_name.rpartition(\'.\')[0]\n    170
>>        main_globals = sys.modules["__main__"].__dict__\n 171     if 
>> alter_argv:\n    172         sys.argv[0] = fname\n    173     return 
>> _run_code(code,
>>        main_globals, None,\n--> 174
>>
>>
>> I think I have more or less understood that the problem is due to some failure in
>> pickling the parallelized jobs, but I have no clue whether or how it can be solved.
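>> One workaround I am considering (just an untested sketch on my side) would be to
>> return only picklable things from the worker -- e.g. the confusion stats and the
>> classifier description -- instead of the whole TransferMeasure that wraps the
>> SWIG-based SVM, along these lines:
>>
>>         def _run_one_partition_picklable(isplit, partitions, classifiers=myclassif):
>>             verbose(2, "Processing split #%i" % isplit)
>>             dstrain, dstest = list(splitter.generate(partitions))
>>             best_clf, best_error = select_best_clf(dstrain, classifiers)
>>             tm = TransferMeasure(best_clf, splitter,
>>                                  postproc=BinaryFxNode(mean_mismatch_error, space='targets'),
>>                                  enable_ca=['stats'])
>>             tm(partitions)
>>             # the stats object and the descr string are plain Python/numpy data,
>>             # so they should (I assume) survive pickling, unlike the SVM itself
>>             return tm.ca.stats, best_clf.descr
>>
>>         results = Parallel(n_jobs=2)(delayed(_run_one_partition_picklable)(isplit, partitions)
>>                                      for isplit, partitions in enumerate(partitionerCD.generate(fds)))
>>         confusion = ConfusionMatrix()
>>         best_clfs = {}
>>         for stats, descr in results:
>>             confusion += stats
>>             best_clfs[descr] = best_clfs.get(descr, 0) + 1
>>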
>> Do you have any suggestions?
>>
>> Thank you and very best wishes,
>> Marco
>>
>> p.s. This is again the full code:
>>
>> ########## * ##########
>> ##########
>>
>> PyMVPA:
>>  Version:       2.6.3
>>  Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785
>>  Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
>>  Version control (GIT):
>>  GIT information could not be obtained due 
>> "/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
>> SYSTEM:
>>  OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 (2017-10-15)
>>
>>
>> print fds.summary()
>> Dataset: 36x534 at float32, <sa: chunks,targets,time_coords,time_indices>, <fa: 
>> voxel_indices>, <a: imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
>> stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
>> No details due to large number of targets or chunks. Increase maxc and maxt 
>> if desired
>> Summary for targets across chunks
>>   targets mean std min max #chunks
>>     C      0.5 0.5  0   1     18
>>     D      0.5 0.5  0   1     18
>>
>>
>> #Evaluate prevalent best classifier with nested crossvalidation
>> verbose.level = 5
>>
>> partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'), 
>> Sifter([('partitions', 2), ('targets', ['C', 'D'])])], space='partitions')
>> # training partitions
>> for fds_ in partitionerCD.generate(fds):
>>     training = fds[fds_.sa.partitions == 1]
>>     #print list(zip(training.sa.chunks, training.sa.targets))
>> # testing partitions
>> for fds_ in partitionerCD.generate(fds):
>>     testing = fds[fds_.sa.partitions == 2]
>>     #print list(zip(testing.sa.chunks, testing.sa.targets))
>>
>> #Helper function (partitionerCD recursively acting on dstrain, rather than on 
>> fds):
>> def select_best_clf(dstrain_, clfs):
>>     """Select best model according to CVTE
>>     Helper function which we will use twice -- once for proper nested
>>     cross-validation, and once to see how big an optimistic bias due
>>     to model selection could be if we simply provide an entire dataset.
>>     Parameters
>>     ----------
>>     dstrain_ : Dataset
>>     clfs : list of Classifiers
>>       Which classifiers to explore
>>     Returns
>>     -------
>>     best_clf, best_error
>>     """
>>     best_error = None
>>     for clf in clfs:
>>         cv = CrossValidation(clf, partitionerCD)
>>         # unfortunately we don't have ability to reassign clf atm
>>         # cv.transerror.clf = clf
>>         try:
>>             error = np.mean(cv(dstrain_))
>>         except LearnerError, e:
>>             # skip the classifier if data was not appropriate and it
>>             # failed to learn/predict at all
>>             continue
>>         if best_error is None or error < best_error:
>>             best_clf = clf
>>             best_error = error
>>         verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
>>     verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
>>             % (len(clfs), best_clf.descr, best_error))
>>     return best_clf, best_error
>>
>> # This function will run all classifiers for one single partitions
>> myclassif=clfswh['!gnpp','!skl'][5:6]  #Testing a single SVM classifier
>> def _run_one_partition(isplit, partitions, classifiers=myclassif): #see §§
>>     verbose(2, "Processing split #%i" % isplit)
>>     dstrain, dstest = list(splitter.generate(partitions))
>>     best_clf, best_error = select_best_clf(dstrain, classifiers)
>>     # now that we have the best classifier, lets assess its transfer
>>     # to the testing dataset while training on entire training
>>     tm = TransferMeasure(best_clf, splitter,
>>                          postproc=BinaryFxNode(mean_mismatch_error, space='targets'),
>>                          enable_ca=['stats'])
>>     tm(partitions)
>>     return tm
>>
>> #import os
>> #from sklearn.externals.joblib import Parallel, delayed
>> #from sklearn.externals.joblib.parallel import parallel_backend
>>
>> # Parallel estimate error using nested CV for model selection
>> confusion = ConfusionMatrix()
>> verbose(1, "Estimating error using nested CV for model selection")
>> partitioner = partitionerCD
>> splitter = Splitter('partitions')
>> # Here we are using joblib Parallel to parallelize each partition
>> # Set n_jobs to the number of available cores (or how many you want to use)
>> #with parallel_backend('threading'):
>> #    tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
>> tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
>>                          for isplit, partitions in enumerate(partitionerCD.generate(fds)))
>> # Parallel retuns a list with the results of each parallel loop, so we need to
>> # unravel it to get the confusion matrix
>> best_clfs = {}
>> for tm in tms:
>>     confusion += tm.ca.stats
>>     best_clfs[tm.measure.descr] = best_clfs.get(tm.measure.descr, 0) + 1
>>
>> ##########
>> ########## * ##########
>>
>>
>>
>>
>>
>>
>>
>> On 13/11/2017 09:12, marco tettamanti wrote:
>>> Dear Matteo,
>>> thanks a million, this is precisely the kind of thing I was looking for: it
>>> works like a charm!
>>> Ciao,
>>> Marco
>>>
>>>> On 11/11/2017 21:44, Matteo Visconti di Oleggio Castello wrote:
>>>>
>>>> Hi Marco,
>>>>
>>>> in your case, I would then recommend looking into joblib to parallelize
>>>> your for loops (https://pythonhosted.org/joblib/parallel.html).
>>>>
>>>> As an example, here's a gist containing part of PyMVPA's nested_cv
>>>> example where I parallelized the loop across partitions. I feel this is
>>>> what you might want to do in your case, since you have a lot more folds.
>>>>
>>>> Here's the gist:
>>>> https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0
>>>>
>>>>
>>>> On 10/11/2017 21:13, marco tettamanti wrote:
>>>>> Dear Matteo,
>>>>> thank you for the willingness to look into my code.
>>>>>
>>>>> This is taken almost verbatim from 
>>>>> http://dev.pymvpa.org/examples/nested_cv.html, except for the 
>>>>> leave-one-pair-out partitioning, and a slight reduction in the number of 
>>>>> classifiers (in the original example, there are around 45).
>>>>>
>>>>> Any help or suggestion would be greatly appreciated!
>>>>> All the best,
>>>>> Marco
>>>>>
>>>>>
>>>>> ########## * ##########
>>>>> ##########
>>>>>
>>>>> PyMVPA:
>>>>>  Version:       2.6.3
>>>>>  Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785
>>>>>  Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
>>>>>  Version control (GIT):
>>>>>  GIT information could not be obtained due 
>>>>> "/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
>>>>> SYSTEM:
>>>>>  OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 (2017-10-15)
>>>>>
>>>>>
>>>>> print fds.summary()
>>>>> Dataset: 36x534 at float32, <sa: chunks,targets,time_coords,time_indices>, 
>>>>> <fa: voxel_indices>, <a: 
>>>>> imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
>>>>> stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
>>>>> No details due to large number of targets or chunks. Increase maxc and 
>>>>> maxt if desired
>>>>> Summary for targets across chunks
>>>>>   targets mean std min max #chunks
>>>>>     C      0.5 0.5  0   1     18
>>>>>     D      0.5 0.5  0   1     18
>>>>>
>>>>>
>>>>> #Evaluate prevalent best classifier with nested crossvalidation
>>>>> verbose.level = 5
>>>>>
>>>>> partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'), 
>>>>> Sifter([('partitions', 2), ('targets', ['C', 'D'])])], space='partitions')
>>>>> # training partitions
>>>>> for fds_ in partitionerCD.generate(fds):
>>>>>     training = fds[fds_.sa.partitions == 1]
>>>>>     #print list(zip(training.sa.chunks, training.sa.targets))
>>>>> # testing partitions
>>>>> for fds_ in partitionerCD.generate(fds):
>>>>>     testing = fds[fds_.sa.partitions == 2]
>>>>>     #print list(zip(testing.sa.chunks, testing.sa.targets))
>>>>>
>>>>> #Helper function (partitionerCD recursively acting on dstrain, rather than 
>>>>> on fds):
>>>>> def select_best_clf(dstrain_, clfs):
>>>>>     """Select best model according to CVTE
>>>>>     Helper function which we will use twice -- once for proper nested
>>>>>     cross-validation, and once to see how big an optimistic bias due
>>>>>     to model selection could be if we simply provide an entire dataset.
>>>>>     Parameters
>>>>>     ----------
>>>>>     dstrain_ : Dataset
>>>>>     clfs : list of Classifiers
>>>>>       Which classifiers to explore
>>>>>     Returns
>>>>>     -------
>>>>>     best_clf, best_error
>>>>>     """
>>>>>     best_error = None
>>>>>     for clf in clfs:
>>>>>         cv = CrossValidation(clf, partitionerCD)
>>>>>         # unfortunately we don't have ability to reassign clf atm
>>>>>         # cv.transerror.clf = clf
>>>>>         try:
>>>>>             error = np.mean(cv(dstrain_))
>>>>>         except LearnerError, e:
>>>>>             # skip the classifier if data was not appropriate and it
>>>>>             # failed to learn/predict at all
>>>>>             continue
>>>>>         if best_error is None or error < best_error:
>>>>>             best_clf = clf
>>>>>             best_error = error
>>>>>         verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
>>>>>     verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
>>>>>             % (len(clfs), best_clf.descr, best_error))
>>>>>     return best_clf, best_error
>>>>>
>>>>> #Estimate error using nested CV for model selection:
>>>>> best_clfs = {}
>>>>> confusion = ConfusionMatrix()
>>>>> verbose(1, "Estimating error using nested CV for model selection")
>>>>> partitioner = partitionerCD
>>>>> splitter = Splitter('partitions')
>>>>> for isplit, partitions in enumerate(partitionerCD.generate(fds)):
>>>>>     verbose(2, "Processing split #%i" % isplit)
>>>>>     dstrain, dstest = list(splitter.generate(partitions))
>>>>>     best_clf, best_error = select_best_clf(dstrain, clfswh['!gnpp','!skl'])
>>>>>     best_clfs[best_clf.descr] = best_clfs.get(best_clf.descr, 0) + 1
>>>>>     # now that we have the best classifier, lets assess its transfer
>>>>>     # to the testing dataset while training on entire training
>>>>>     tm = TransferMeasure(best_clf, splitter,
>>>>>                          postproc=BinaryFxNode(mean_mismatch_error, space='targets'),
>>>>>                          enable_ca=['stats'])
>>>>>     tm(partitions)
>>>>>     confusion += tm.ca.stats
>>>>>
>>>>> ##########
>>>>> ########## * ##########
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On 10/11/2017 15:43, Matteo Visconti di Oleggio Castello wrote:
>>>>>>
>>>>>> What do you mean by "cycling over approx 40 different classifiers"? Are
>>>>>> you testing different classifiers? If that's the case, a possibility is to
>>>>>> create a script that takes the classifier type as an argument and runs the
>>>>>> classification across all folds. In that way you can submit 40 jobs and
>>>>>> parallelize across classifiers.
>>>>>>
>>>>>> If that's not the case, then because the folds are independent and deterministic
>>>>>> I would create a script that performs the classification on blocks of folds
>>>>>> (say folds 1 to 30, 31 to 60, etc.), and then submit separate jobs, so as to
>>>>>> parallelize there.
>>>>>>
>>>>>> I think that if you send a snippet of the code you're using, it will be more
>>>>>> evident which points are good candidates for parallelization.
>>>>>>
>>>>>>
>>>>>> On 10/11/2017 09:57, marco tettamanti wrote:
>>>>>>> Dear Matteo and Nick,
>>>>>>> thank you for your responses.
>>>>>>> I take the occasion to ask some follow-up questions, because I am struggling to
>>>>>>> make pymvpa2 computations faster and more efficient.
>>>>>>>
>>>>>>> I often find myself giving up on a particular analysis, because it is going
>>>>>>> to take far more time than I can bear (weeks, months!). This happens
>>>>>>> particularly with searchlight permutation testing (gnbsearchlight is much
>>>>>>> faster, but does not support pprocess) and nested cross-validation.
>>>>>>> As for the latter, for example, I recently wanted to run nested cross-validation
>>>>>>> in a sample of 18 patients and 18 controls (1 image per subject), training the
>>>>>>> classifiers to discriminate patients from controls in a leave-one-pair-out
>>>>>>> partitioning scheme. This yields 18*18=324 folds. For a small ROI of 36 voxels,
>>>>>>> cycling over approx 40 different classifiers takes about 2 hours for each fold
>>>>>>> on a decent PowerEdge T430 Dell server with 128GB RAM. This means approx. 27
>>>>>>> days for all 324 folds!
>>>>>>> The same server is equipped with 32 CPUs. With full parallelization, the same
>>>>>>> analysis may be completed in less than one day. This is the reason for my
>>>>>>> interest in, and questions about, parallelization.
>>>>>>>
>>>>>>> Is there anything that you experts do in such situations to speed up or make the
>>>>>>> computation more efficient?
>>>>>>>
>>>>>>> Thank you again and best wishes,
>>>>>>> Marco
>>>>>>>
>>>>>>>
>>>>>>>> On 10/11/2017 10:07, Nick Oosterhof wrote:
>>>>>>>>
>>>>>>>> There have been some plans / minor attempts to use parallelisation more
>>>>>>>> broadly, but as far as I know we only support pprocess, and only for (1)
>>>>>>>> searchlight; (2) surface-based voxel selection; and (3) hyperalignment. I
>>>>>>>> do remember that parallelisation of other functions was challenging, partly
>>>>>>>> due to getting the conditional attributes set right, but that was a long
>>>>>>>> time ago.
>>>>>>>>
>>>>>>>>> On 09/11/2017 18:35, Matteo Visconti di Oleggio Castello wrote:
>>>>>>>>>
>>>>>>>>> Hi Marco,
>>>>>>>>> AFAIK, there is no support for parallelization at the level of
>>>>>>>>> cross-validation. Usually for a small ROI (such as a searchlight) and with
>>>>>>>>> standard CV schemes, the process is quite fast, and the bottleneck is
>>>>>>>>> really the number of searchlights to be computed (for which parallelization
>>>>>>>>> exists).
>>>>>>>>>
>>>>>>>>> In my experience, we tend to parallelize at the level of individual
>>>>>>>>> participants; for example we might set up a searchlight analysis with
>>>>>>>>> however many n_procs you can have, and then submit one such job for every
>>>>>>>>> participant to a cluster (using either torque or condor).
>>>>>>>>>
>>>>>>>>> HTH,
>>>>>>>>> Matteo
>>>>>>>>>
>>>>>>>>> On 09/11/2017 10:08, marco tettamanti wrote:
>>>>>>>>>> Dear all,
>>>>>>>>>> forgive me if this has already been asked in the past, but I was wondering
>>>>>>>>>> whether there have been any developments in the meantime.
>>>>>>>>>>
>>>>>>>>>> Is there any chance that one can generally apply parallel computing (multiple
>>>>>>>>>> CPUs or clusters) with pymvpa2, in addition to what is already implemented for
>>>>>>>>>> searchlight (pprocess)? That is, also for general cross-validation, nested
>>>>>>>>>> cross-validation, permutation testing, RFE, etc.?
>>>>>>>>>>
>>>>>>>>>> Has anyone had successful experience with parallelization schemes such as
>>>>>>>>>> ipyparallel, condor or else?
>>>>>>>>>>
>>>>>>>>>> Thank you and best wishes!
>>>>>>>>>> Marco
>>>>>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> Marco Tettamanti, Ph.D.
>>>>>> Nuclear Medicine Department & Division of Neuroscience
>>>>>> IRCCS San Raffaele Scientific Institute
>>>>>> Via Olgettina 58
>>>>>> I-20132 Milano, Italy
>>>>>> Phone ++39-02-26434888
>>>>>> Fax ++39-02-26434892
>>>>>> Email:tettamanti.marco at hsr.it
>>>>>> Skype: mtettamanti
>>>>>> http://scholar.google.it/citations?user=x4qQl4AAAAAJ
>>>>>
>>>
