[pymvpa] Parallelization

Matteo Visconti di Oleggio Castello matteo.visconti at gmail.com
Tue Nov 28 20:45:47 UTC 2017


Hi Marco,

I think there are a bunch of conflated issues here.

- First, there was an error in my code, and that's why you got the error
"UnboundLocalError: local variable 'best_clf' referenced before assignment".
I updated the gist, so the example code for running the parallelization should
now be correct and should work as a blueprint for your own code
(https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0).

- You are correct to switch the backend to 'threading' in this particular case,
because of the pickling error; see the short sketch at the end of this list.

- However, I think that the nested_cv.py example didn't work from the start,
even without parallelization. Its last change was 6 years ago, and I'm afraid
things have changed in the meantime without the code being updated. I opened an
issue on GitHub to keep track of it
(https://github.com/PyMVPA/PyMVPA/issues/559).
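
To make the 'threading' point concrete, this is roughly how the threading
backend wraps the Parallel call from the gist. It is only a minimal sketch
reusing the names from your snippet (_run_one_partition, partitionerCD, fds);
with threads the speed-up depends on how much of the time is spent in compiled
code (libsvm, numpy) that releases the GIL:

    from sklearn.externals.joblib import Parallel, delayed
    from sklearn.externals.joblib.parallel import parallel_backend

    # The 'threading' backend keeps the workers inside a single process, so the
    # classifier objects never have to be pickled (the "can't pickle
    # SwigPyObject objects" error comes from pickling the libsvm wrapper).
    with parallel_backend('threading'):
        tms = Parallel(n_jobs=2)(
            delayed(_run_one_partition)(isplit, partitions)
            for isplit, partitions in enumerate(partitionerCD.generate(fds)))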



On Fri, Nov 24, 2017 at 3:57 PM, marco tettamanti <mrctttmnt at gmail.com>
wrote:

> Dear Matteo,
> thank you for kindly replying!
>
> Yes, I do have the latest versions of joblib (0.11) and sklearn (0.19.1),
> see at the bottom of this email.
>
> The problem seems independent of whether I run the code in jupyter, or invoke
> ipython or python directly in the console.
>
> I am now wondering whether there may be something wrong in my snippet.
> When first running your gist, I encountered:
>
>         UnboundLocalError: local variable 'best_clf' referenced before assignment
>
> which I solved by moving the best_clfs declaration a few lines down:
>
> -----------------------------------------
> #best_clfs = {}  #moved down 7 lines
> confusion = ConfusionMatrix()
> verbose(1, "Estimating error using nested CV for model selection")
> partitioner = partitionerCD
> splitter = Splitter('partitions')
> tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
>                          for isplit, partitions in enumerate(partitionerCD.generate(fds)))
> best_clfs = {}
> for tm in tms:
>     confusion += tm.ca.stats
>     best_clfs[tm.measure.descr] = best_clfs.get(tm.measure.descr, 0) + 1
> -----------------------------------------
>
> But now, running the snippet in ipython/python specifically for the SVM
> parallelization issue, I saw the error message popping up again:
>
>          UnboundLocalError: local variable 'best_clf' referenced before
> assignment
>
> May this be the culprit? As a reminder, the full snippet I am using is
> included in my previous email.
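>
> Incidentally, while staring at select_best_clf again, I noticed that the same
> error could in principle also arise inside that function itself: if every
> classifier in the loop raises LearnerError and is skipped, best_clf is never
> assigned, and the final verbose(3, ...) call would then fail with exactly this
> UnboundLocalError. I am not sure this is what happens here, but purely as a
> sketch (not tested), a more defensive version could look like:
>
>     def select_best_clf(dstrain_, clfs):
>         """Select the best model according to CVTE (defensive sketch)."""
>         best_clf, best_error = None, None
>         for clf in clfs:
>             cv = CrossValidation(clf, partitionerCD)
>             try:
>                 error = np.mean(cv(dstrain_))
>             except LearnerError:
>                 # skip classifiers that cannot learn/predict on this data
>                 continue
>             if best_error is None or error < best_error:
>                 best_clf, best_error = clf, error
>         if best_clf is None:
>             raise RuntimeError("no classifier could be trained on this split")
>         return best_clf, best_error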
>
> Thank you and very best wishes,
> Marco
>
>
> In [21]: mvpa2.wtf(exclude=['runtime','process']) ##other possible
> arguments (['sources',
> Out[21]:
> Current date:   2017-11-24 21:23
> PyMVPA:
>  Version:       2.6.3
>  Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785
>  Path:          /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
>  Version control (GIT):
>  GIT information could not be obtained due "/usr/lib/python2.7/dist-packages/mvpa2/..
> is not under GIT"
> SYSTEM:
>  OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2
> (2017-10-15)
>  Distribution:  debian/buster/sid
> EXTERNALS:
>  Present:       atlas_fsl, cPickle, ctypes, good
> scipy.stats.rv_continuous._reduce_func(floc,fscale), good
> scipy.stats.rv_discrete.ppf, griddata, gzip, h5py, hdf5, ipython, joblib,
> liblapack.so, libsvm, libsvm verbosity control, lxml, matplotlib, mdp, mdp
> ge 2.4, mock, nibabel, nose, numpy, numpy_correct_unique, pprocess, pylab,
> pylab plottable, pywt, pywt wp reconstruct, reportlab, running ipython env,
> scipy, skl, statsmodels
>  Absent:        afni-3dinfo, atlas_pymvpa, cran-energy, datalad,
> elasticnet, glmnet, good scipy.stats.rdist, hcluster, lars, mass, nipy,
> nipy.neurospin, numpydoc, openopt, pywt wp reconstruct fixed, rpy2,
> scipy.weave, sg ge 0.6.4, sg ge 0.6.5, sg_fixedcachesize, shogun,
> shogun.krr, shogun.lightsvm, shogun.mpd, shogun.svmocas, shogun.svrlight,
> weave
>  Versions of critical externals:
>   ctypes      : 1.1.0
>   h5py        : 2.7.1
>   hdf5        : 1.10.0
>   ipython     : 5.5.0
>   joblib      : 0.11
>   lxml        : 4.1.0
>   matplotlib  : 2.0.0
>   mdp         : 3.5
>   mock        : 2.0.0
>   nibabel     : 2.3.0dev
>   numpy       : 1.13.1
>   pprocess    : 0.5
>   pywt        : 0.5.1
>   reportlab   : 3.4.0
>   scipy       : 0.19.1
>   skl         : 0.19.1
>  Matplotlib backend: TkAgg
>
>
> On 24/11/2017 17:32, Matteo Visconti di Oleggio Castello wrote:
>
> Hi Marco,
>
> Some ideas, in no particular order:
>
> - What version of sklearn/joblib are you using? I would make sure to use the
> latest joblib (0.11), perhaps importing it directly rather than from sklearn
> (unless you have the latest sklearn version, 0.19.1).
> - Are you running the code in a jupyter notebook? There might be some issues
> with that (see https://github.com/joblib/joblib/issues/174). As a test you
> might convert the notebook to a script (e.g. with "jupyter nbconvert --to
> script") and run it from the command line.
>
>
>
> On 23/11/2017 12:07, marco tettamanti wrote:
>
> Dear Matteo (and others),
> sorry, I am again asking for your help!
>
> I have experimented with the analysis of my dataset using an adaptation of
> your joblib-based gist.
> As I wrote before, it works perfectly except with some classifiers: SVM
> classifiers always cause the code to terminate with an error.
>
> If I set:
>         myclassif=clfswh['!gnpp','!skl','svm']    #Note that 'gnpp' and 'skl' were excluded for independent reasons
> the code runs through without errors.
>
> However, with:
>         myclassif=clfswh['!gnpp','!skl']
> I get the following error:
>         MaybeEncodingError: Error sending result:
>         '[TransferMeasure(measure=SVM(svm_impl='C_SVC', kernel=LinearLSKernel(),
>         weight=[], probability=1, weight_label=[]), splitter=Splitter(space='partitions'),
>         postproc=BinaryFxNode(space='targets'), enable_ca=['stats'])]'.
>         Reason: 'TypeError("can't pickle SwigPyObject objects",)'
>
> After googling for what may cause this particular error, I have found that
> the situation improves slightly (i.e. more splits executed, sometimes even
> all splits) by importing the following:
>         import os
>         from sklearn.externals.joblib import Parallel, delayed
>         from sklearn.externals.joblib.parallel import parallel_backend
> and then specifying just before 'Parallel(n_jobs=2)':
>         with parallel_backend('threading'):
> However, even in this case the code invariably terminates with a long error
> message (I only report an extract here, but I can send the whole message if
> needed):
>         <type 'str'>: (<type 'exceptions.UnicodeEncodeError'>, UnicodeEncodeError('ascii',
>         u'JoblibAttributeError\n___________________________________________________________________________\n
>         Multiprocessing exception:\n...........................................................................\n
>         /usr/lib/python2.7/runpy.py in _run_module_as_main(mod_name=\'ipykernel_launcher\', alter_argv=1)\n
>             169     pkg_name = mod_name.rpartition(\'.\')[0]\n
>             170     main_globals = sys.modules["__main__"].__dict__\n
>             171     if alter_argv:\n
>             172         sys.argv[0] = fname\n
>             173     return _run_code(code, main_globals, None,\n
>         --> 174
>
>
> I think I have sort of understood that the problem is due to a failure in
> pickling the parallelized jobs, but I have no clue whether and how it can be
> solved.
> Do you have any suggestions?
>
> Thank you and very best wishes,
> Marco
>
> p.s. This is again the full code:
>
> ########## * ##########
> ##########
>
> PyMVPA:
>  Version:       2.6.3
>  Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785
>  Path:          /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
>  Version control (GIT):
>  GIT information could not be obtained due "/usr/lib/python2.7/dist-packages/mvpa2/..
> is not under GIT"
> SYSTEM:
>  OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2
> (2017-10-15)
>
>
> print fds.summary()
> Dataset: 36x534 at float32, <sa: chunks,targets,time_coords,time_indices>,
> <fa: voxel_indices>, <a: imgaffine,imghdr,imgtype,
> mapper,voxel_dim,voxel_eldim>
> stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
> No details due to large number of targets or chunks. Increase maxc and
> maxt if desired
> Summary for targets across chunks
>   targets mean std min max #chunks
>     C      0.5 0.5  0   1     18
>     D      0.5 0.5  0   1     18
>
>
> #Evaluate prevalent best classifier with nested crossvalidation
> verbose.level = 5
>
> partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
>                            Sifter([('partitions', 2), ('targets', ['C', 'D'])])],
>                           space='partitions')
> # training partitions
> for fds_ in partitionerCD.generate(fds):
>     training = fds[fds_.sa.partitions == 1]
>     #print list(zip(training.sa.chunks, training.sa.targets))
> # testing partitions
> for fds_ in partitionerCD.generate(fds):
>     testing = fds[fds_.sa.partitions == 2]
>     #print list(zip(testing.sa.chunks, testing.sa.targets))
>
> # Helper function (partitionerCD recursively acting on dstrain, rather than on fds):
> def select_best_clf(dstrain_, clfs):
>     """Select best model according to CVTE
>     Helper function which we will use twice -- once for proper nested
>     cross-validation, and once to see how big an optimistic bias due
>     to model selection could be if we simply provide an entire dataset.
>     Parameters
>     ----------
>     dstrain_ : Dataset
>     clfs : list of Classifiers
>       Which classifiers to explore
>     Returns
>     -------
>     best_clf, best_error
>     """
>     best_error = None
>     for clf in clfs:
>         cv = CrossValidation(clf, partitionerCD)
>         # unfortunately we don't have ability to reassign clf atm
>         # cv.transerror.clf = clf
>         try:
>             error = np.mean(cv(dstrain_))
>         except LearnerError, e:
>             # skip the classifier if data was not appropriate and it
>             # failed to learn/predict at all
>             continue
>         if best_error is None or error < best_error:
>             best_clf = clf
>             best_error = error
>         verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
>     verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
>             % (len(clfs), best_clf.descr, best_error))
>     return best_clf, best_error
>
> # This function will run all classifiers for one single partition
> myclassif=clfswh['!gnpp','!skl'][5:6]  #Testing a single SVM classifier
> def _run_one_partition(isplit, partitions, classifiers=myclassif): #see §§
>     verbose(2, "Processing split #%i" % isplit)
>     dstrain, dstest = list(splitter.generate(partitions))
>     best_clf, best_error = select_best_clf(dstrain, classifiers)
>     # now that we have the best classifier, lets assess its transfer
>     # to the testing dataset while training on entire training
>     tm = TransferMeasure(best_clf, splitter,
>                          postproc=BinaryFxNode(mean_mismatch_error, space='targets'),
>                          enable_ca=['stats'])
>     tm(partitions)
>     return tm
>
> #import os
> #from sklearn.externals.joblib import Parallel, delayed
> #from sklearn.externals.joblib.parallel import parallel_backend
>
> # Parallel estimate error using nested CV for model selection
> confusion = ConfusionMatrix()
> verbose(1, "Estimating error using nested CV for model selection")
> partitioner = partitionerCD
> splitter = Splitter('partitions')
> # Here we are using joblib Parallel to parallelize each partition
> # Set n_jobs to the number of available cores (or how many you want to use)
> #with parallel_backend('threading'):
> #    tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
> tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
>                          for isplit, partitions in enumerate(partitionerCD.generate(fds)))
> # Parallel returns a list with the results of each parallel loop, so we need
> # to unravel it to get the confusion matrix
> best_clfs = {}
> for tm in tms:
>     confusion += tm.ca.stats
>     best_clfs[tm.measure.descr] = best_clfs.get(tm.measure.descr, 0) + 1
>
> ##########
> ########## * ##########
>
>
>
>
>
>
>
> On 13/11/2017 09:12, marco tettamanti wrote:
>
> Dear Matteo,
> thanks a million, this is precisely the kind of thing I was looking for: it
> works like a charm!
> Ciao,
> Marco
>
> On 11/11/2017 21:44, Matteo Visconti di Oleggio Castello wrote:
>
> Hi Marco,
>
> in your case, I would then recommend looking into joblib to parallelize
> your for loops (https://pythonhosted.org/joblib/parallel.html).
>
> As an example, here's a gist containing part of PyMVPA's nested_cv example
> where I parallelized the loop across partitions. I feel this is what you
> might want to do in your case, since you have many more folds.
>
> Here's the gist:
> https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0
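>
> The core pattern in that gist is just joblib's Parallel/delayed idiom, roughly
> like this (a generic sketch, not the exact gist code; process_fold, partitioner
> and ds stand for your per-fold function, partitioner and dataset):
>
>     from joblib import Parallel, delayed
>
>     def process_fold(isplit, partitions):
>         # placeholder body: here you would run model selection and the
>         # transfer measure for this fold, and return the result
>         return isplit
>
>     results = Parallel(n_jobs=8)(
>         delayed(process_fold)(isplit, partitions)
>         for isplit, partitions in enumerate(partitioner.generate(ds)))
>
> Each fold is dispatched to a worker and the return values come back as a list,
> so you can aggregate the confusion matrices afterwards.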
>
>
> On 10/11/2017 21:13, marco tettamanti wrote:
>
> Dear Matteo,
> thank you for the willingness to look into my code.
>
> This is taken almost verbatim from http://dev.pymvpa.org/examples/nested_cv.html,
> except for the leave-one-pair-out partitioning and a slight reduction in the
> number of classifiers (in the original example there are around 45).
>
> Any help or suggestion would be greatly appreciated!
> All the best,
> Marco
>
>
> ########## * ##########
> ##########
>
> PyMVPA:
>  Version:       2.6.3
>  Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785
>  Path:          /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
>  Version control (GIT):
>  GIT information could not be obtained due "/usr/lib/python2.7/dist-packages/mvpa2/..
> is not under GIT"
> SYSTEM:
>  OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2
> (2017-10-15)
>
>
> print fds.summary()
> Dataset: 36x534 at float32, <sa: chunks,targets,time_coords,time_indices>,
> <fa: voxel_indices>, <a: imgaffine,imghdr,imgtype,
> mapper,voxel_dim,voxel_eldim>
> stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
> No details due to large number of targets or chunks. Increase maxc and
> maxt if desired
> Summary for targets across chunks
>   targets mean std min max #chunks
>     C      0.5 0.5  0   1     18
>     D      0.5 0.5  0   1     18
>
>
> #Evaluate prevalent best classifier with nested crossvalidation
> verbose.level = 5
>
> partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
>                            Sifter([('partitions', 2), ('targets', ['C', 'D'])])],
>                           space='partitions')
> # training partitions
> for fds_ in partitionerCD.generate(fds):
>     training = fds[fds_.sa.partitions == 1]
>     #print list(zip(training.sa.chunks, training.sa.targets))
> # testing partitions
> for fds_ in partitionerCD.generate(fds):
>     testing = fds[fds_.sa.partitions == 2]
>     #print list(zip(testing.sa.chunks, testing.sa.targets))
>
> # Helper function (partitionerCD recursively acting on dstrain, rather than on fds):
> def select_best_clf(dstrain_, clfs):
>     """Select best model according to CVTE
>     Helper function which we will use twice -- once for proper nested
>     cross-validation, and once to see how big an optimistic bias due
>     to model selection could be if we simply provide an entire dataset.
>     Parameters
>     ----------
>     dstrain_ : Dataset
>     clfs : list of Classifiers
>       Which classifiers to explore
>     Returns
>     -------
>     best_clf, best_error
>     """
>     best_error = None
>     for clf in clfs:
>         cv = CrossValidation(clf, partitionerCD)
>         # unfortunately we don't have ability to reassign clf atm
>         # cv.transerror.clf = clf
>         try:
>             error = np.mean(cv(dstrain_))
>         except LearnerError, e:
>             # skip the classifier if data was not appropriate and it
>             # failed to learn/predict at all
>             continue
>         if best_error is None or error < best_error:
>             best_clf = clf
>             best_error = error
>         verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
>     verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
>             % (len(clfs), best_clf.descr, best_error))
>     return best_clf, best_error
>
> #Estimate error using nested CV for model selection:
> best_clfs = {}
> confusion = ConfusionMatrix()
> verbose(1, "Estimating error using nested CV for model selection")
> partitioner = partitionerCD
> splitter = Splitter('partitions')
> for isplit, partitions in enumerate(partitionerCD.generate(fds)):
>     verbose(2, "Processing split #%i" % isplit)
>     dstrain, dstest = list(splitter.generate(partitions))
>     best_clf, best_error = select_best_clf(dstrain, clfswh['!gnpp','!skl'])
>     best_clfs[best_clf.descr] = best_clfs.get(best_clf.descr, 0) + 1
>     # now that we have the best classifier, lets assess its transfer
>     # to the testing dataset while training on entire training
>     tm = TransferMeasure(best_clf, splitter,
>                          postproc=BinaryFxNode(mean_mismatch_error, space='targets'),
>                          enable_ca=['stats'])
>     tm(partitions)
>     confusion += tm.ca.stats
>
> ##########
> ########## * ##########
>
>
>
>
>
>
> On 10/11/2017 15:43, Matteo Visconti di Oleggio Castello wrote:
>
> What do you mean by "cycling over approx 40 different classifiers"? Are you
> testing different classifiers? If that's the case, one possibility is to
> create a script that takes the classifier type as an argument and runs the
> classification across all folds. That way you can submit 40 jobs and
> parallelize across classifiers.
>
> If that's not the case, then, because the folds are independent and
> deterministic, I would create a script that performs the classification on
> blocks of folds (say folds 1 to 30, 31 to 60, etc.), and then submit the
> blocks as separate jobs, so you parallelize there (see the rough sketch
> below).
>
> I think that if you send a snippet of the code you're using, it will be
> easier to see which points are good candidates for parallelization.
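>
> To illustrate the blocks-of-folds idea, something along these lines (a purely
> hypothetical sketch; run_one_fold, partitioner and fds stand for whatever your
> per-fold function, partitioner and dataset are):
>
>     # run_folds.py: classify only folds [start, stop) and save the results
>     import sys
>     import cPickle
>
>     start, stop = int(sys.argv[1]), int(sys.argv[2])
>     results = []
>     for isplit, partitions in enumerate(partitioner.generate(fds)):
>         if start <= isplit < stop:
>             results.append(run_one_fold(isplit, partitions))
>     with open('folds_%i_%i.pkl' % (start, stop), 'wb') as f:
>         cPickle.dump(results, f)
>
> which you would then submit as separate jobs ("python run_folds.py 0 30",
> "python run_folds.py 30 60", and so on) and merge the saved results at the end.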
>
>
> On 10/11/2017 09:57, marco tettamanti wrote:
>
> Dear Matteo and Nick,
> thank you for your responses.
> I take the opportunity to ask some follow-up questions, because I am
> struggling to make pymvpa2 computations faster and more efficient.
>
> I often find myself giving up on a particular analysis because it would take
> far more time than I can bear (weeks, months!). This happens particularly
> with searchlight permutation testing (gnbsearchlight is much faster, but does
> not support pprocess), and with nested cross-validation.
> As for the latter, for example, I recently wanted to run nested
> cross-validation on a sample of 18 patients and 18 controls (1 image per
> subject), training the classifiers to discriminate patients from controls in
> a leave-one-pair-out partitioning scheme. This yields 18*18=324 folds. For a
> small ROI of 36 voxels, cycling over approx. 40 different classifiers takes
> about 2 hours per fold on a decent PowerEdge T430 Dell server with 128GB RAM.
> That means approx. 27 days for all 324 folds!
> The same server is equipped with 32 CPUs. With full parallelization, the same
> analysis might be completed in less than one day. This is the reason for my
> interest in, and questions about, parallelization.
>
> Is there anything that you experts do in such situations to speed up or make the
> computation more efficient?
>
> Thank you again and best wishes,
> Marco
>
>
>
> On 10/11/2017 10:07, Nick Oosterhof wrote:
>
> There have been some plans / minor attempts to use parallelisation more
> widely, but as far as I know we only support pprocess, and only for (1)
> searchlight; (2) surface-based voxel selection; and (3) hyperalignment. I do
> remember that parallelisation of other functions was challenging, partly
> because of getting the conditional attributes set right, but that was a long
> time ago.
>
>
> On 09/11/2017 18:35, Matteo Visconti di Oleggio Castello wrote:
>
> Hi Marco,
> AFAIK, there is no support for parallelization at the level of
> cross-validation. Usually for a small ROI (such as a searchlight) and with
> standard CV schemes, the process is quite fast, and the bottleneck is really
> the number of searchlights to be computed (for which parallelization exists).
>
> In my experience, we tend to parallelize at the level of individual
> participants; for example, we might set up a searchlight analysis with as
> many processes as you have available (see the sketch below), and then submit
> one such job for every participant to a cluster (using either torque or
> condor).
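>
> For example, the searchlight call itself can already use multiple cores via
> pprocess, along these lines (a sketch; the radius and nproc values are
> arbitrary, and cv/fds stand for the cross-validation measure and the
> per-subject dataset):
>
>     # nproc uses pprocess under the hood to split the searchlight spheres
>     sl = sphere_searchlight(cv, radius=3, nproc=8)
>     res = sl(fds)
>
> The per-participant jobs are then distributed by the cluster scheduler.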
>
> HTH,
> Matteo
>
> On 09/11/2017 10:08, marco tettamanti wrote:
>
> Dear all,
> forgive me if this has already been asked in the past, but I was wondering
> whether there has been any development in the meantime.
>
> Is there any chance that one can apply parallel computing (multiple CPUs or
> clusters) with pymvpa2 in general, in addition to what is already implemented
> for searchlight (pprocess)? That is, also for general cross-validation,
> nested cross-validation, permutation testing, RFE, etc.?
>
> Has anyone had successful experience with parallelization schemes such as
> ipyparallel, condor, or others?
>
> Thank you and best wishes!
> Marco
>
>
>
> --
> Marco Tettamanti, Ph.D.
> Nuclear Medicine Department & Division of Neuroscience
> IRCCS San Raffaele Scientific Institute
> Via Olgettina 58
> I-20132 Milano, Italy
> Phone +39-02-26434888
> Fax +39-02-26434892
> Email: tettamanti.marco at hsr.it
> Skype: mtettamanti
> http://scholar.google.it/citations?user=x4qQl4AAAAAJ
>
>
>
>
>
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
>



-- 
Matteo Visconti di Oleggio Castello
Ph.D. Candidate in Cognitive Neuroscience
Dartmouth College

+1 (603) 646-8665
mvdoc.me || github.com/mvdoc || linkedin.com/in/matteovisconti