[pymvpa] Parallelization

marco tettamanti mrctttmnt at gmail.com
Wed Nov 29 11:26:19 UTC 2017


Hi Matteo,
thank you again for your invaluable help!

I have spotted a small typo in your gist. Line 105 should read:
     for tm in tms_parallel:


Apart from the problems with the nested_cv.py example that you refer to, I 
am experiencing some more trouble:
- backend='threading' does not seem to parallelize on my machine, only 
backend='multiprocessing'
- while your 'nested_cv_parallel.py' gist runs smoothly, when I adapt it to my 
dataset, partitioner, etc., I get the following error:

     tms_parallel, best_clfs_parallel = zip(*out_parallel)
TypeError: zip argument #1 must support iteration
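
For what it's worth, my understanding is that zip(*out_parallel) can only
unravel the output if every parallel call returns a tuple. Here is a rough
sketch of what I believe the gist expects (names taken from the gist, so this
is an assumption about my adaptation rather than a verified fix):

     # each call must return a (tm, best_clf) pair for zip() to unravel;
     # returning a bare tm gives "zip argument #1 must support iteration"
     out_parallel = Parallel(n_jobs=2)(
         delayed(_run_one_partition)(isplit, partitions)
         for isplit, partitions in enumerate(partitioner.generate(fds)))
     tms_parallel, best_clfs_parallel = zip(*out_parallel)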


I guess I am putting my nested analysis on standby for the moment. Hopefully 
these issues will soon be solved.

Thank you for everything!
Marco


> On 28/11/2017 15:45, Matteo Visconti di Oleggio Castello wrote:
>
> Hi Marco,
>
> I think there are a bunch of conflated issues here.
>
> - First, there was an error in my code, and that's why you got the error
> "UnboundLocalError: local variable 'best_clf' referenced before assignment".
> I updated the gist, and now the example code for running the parallelization
> should be ok and should work as a blueprint for your code
> (https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0).
>
> - You are correct in changing the backend to 'threading' for this
> particular case because of the pickling error.
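>
> Something along these lines (an untested sketch, reusing the imports and
> names from your snippet) should force the backend:
>
>      from sklearn.externals.joblib import Parallel, delayed
>      from sklearn.externals.joblib.parallel import parallel_backend
>
>      # 'threading' avoids pickling the SWIG-wrapped libsvm objects
>      with parallel_backend('threading'):
>          tms = Parallel(n_jobs=2)(
>              delayed(_run_one_partition)(isplit, partitions)
>              for isplit, partitions in enumerate(partitionerCD.generate(fds)))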
>
> - However, I think that the example for nested_cv.py didn't work from the
> start, even without parallelization. The last change was 6 years ago, and
> I'm afraid that things have changed in the meantime and the code wasn't
> updated. I
> opened an issue on github to keep track of it (
> https://github.com/PyMVPA/PyMVPA/issues/559).
>
> On 24/11/2017 21:57, marco tettamanti wrote:
>> Dear Matteo,
>> thank you for kindly replying!
>>
>> Yes, I do have the latest versions of joblib (0.11) and sklearn (0.19.1); see 
>> the bottom of this email.
>>
>> The problem seems independent of whether I run the code in jupyter, or invoke 
>> ipython or python directly in the console.
>>
>> I am now wondering whether there may be something wrong in my snippet.
>> When first running your gist, I encountered this error:
>>
>>         UnboundLocalError: local variable 'best_clf' referenced before assignment
>>
>> which I solved by moving the best_clfs declaration a few lines down:
>>
>> -----------------------------------------
>> #best_clfs = {}  #moved down 7 lines
>> confusion = ConfusionMatrix()
>> verbose(1, "Estimating error using nested CV for model selection")
>> partitioner = partitionerCD
>> splitter = Splitter('partitions')
>> tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
>>                          for isplit, partitions
>>                          in enumerate(partitionerCD.generate(fds)))
>> best_clfs = {}
>> for tm in tms:
>>     confusion += tm.ca.stats
>>     best_clfs[tm.measure.descr] = best_clfs.get(tm.measure.descr, 0) + 1
>> -----------------------------------------
>>
>> But now, running the snippet in ipython/python specifically for the SVM 
>> parallelization issue, I saw the error message pop up again:
>>
>>          UnboundLocalError: local variable 'best_clf' referenced before assignment
>>
>> Might this be the culprit? As a reminder, the full snippet I am using is 
>> included in my previous email.
>>
>> Thank you and very best wishes,
>> Marco
>>
>>
>> In [21]: mvpa2.wtf(exclude=['runtime','process']) ##other possible arguments 
>> (['sources',
>> Out[21]:
>> Current date:   2017-11-24 21:23
>> PyMVPA:
>>  Version:       2.6.3
>>  Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785
>>  Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
>>  Version control (GIT):
>>  GIT information could not be obtained due 
>> "/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
>> SYSTEM:
>>  OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 (2017-10-15)
>>  Distribution:  debian/buster/sid
>> EXTERNALS:
>>  Present:       atlas_fsl, cPickle, ctypes, good 
>> scipy.stats.rv_continuous._reduce_func(floc,fscale), good 
>> scipy.stats.rv_discrete.ppf, griddata, gzip, h5py, hdf5, ipython, joblib, 
>> liblapack.so, libsvm, libsvm verbosity control, lxml, matplotlib, mdp, mdp ge 
>> 2.4, mock, nibabel, nose, numpy, numpy_correct_unique, pprocess, pylab, pylab 
>> plottable, pywt, pywt wp reconstruct, reportlab, running ipython env, scipy, 
>> skl, statsmodels
>>  Absent:        afni-3dinfo, atlas_pymvpa, cran-energy, datalad, elasticnet, 
>> glmnet, good scipy.stats.rdist, hcluster, lars, mass, nipy, nipy.neurospin, 
>> numpydoc, openopt, pywt wp reconstruct fixed, rpy2, scipy.weave, sg ge 0.6.4, 
>> sg ge 0.6.5, sg_fixedcachesize, shogun, shogun.krr, shogun.lightsvm, 
>> shogun.mpd, shogun.svmocas, shogun.svrlight, weave
>>  Versions of critical externals:
>>   ctypes      : 1.1.0
>>   h5py        : 2.7.1
>>   hdf5        : 1.10.0
>>   ipython     : 5.5.0
>>   joblib      : 0.11
>>   lxml        : 4.1.0
>>   matplotlib  : 2.0.0
>>   mdp         : 3.5
>>   mock        : 2.0.0
>>   nibabel     : 2.3.0dev
>>   numpy       : 1.13.1
>>   pprocess    : 0.5
>>   pywt        : 0.5.1
>>   reportlab   : 3.4.0
>>   scipy       : 0.19.1
>>   skl         : 0.19.1
>>  Matplotlib backend: TkAgg
>>
>>
>>> On 24/11/2017 17:32, Matteo Visconti di Oleggio Castello wrote:
>>>
>>> Hi Marco,
>>>
>>> some ideas, in no particular order:
>>>
>>> - what version of sklearn/joblib are you using? I would make sure to use
>>> the latest version (0.11), perhaps not importing it from sklearn (unless
>>> you have the latest sklearn version, 0.19.1)
>>> - are you running the code in a jupyter notebook? There might be some
>>> issues with that (see https://github.com/joblib/joblib/issues/174). As a
>>> test you might try to convert your notebook to a script and then run it
>>>
>>>
>>>
>>> On 23/11/2017 12:07, marco tettamanti wrote:
>>>> Dear Matteo (and others),
>>>> sorry, I am again asking for your help!
>>>>
>>>> I have experimented with the analysis of my dataset using an adaptation of 
>>>> your joblib-based gist.
>>>> As I wrote before, it works perfectly with most classifiers, but SVM 
>>>> classifiers always cause the code to terminate with an error.
>>>>
>>>> If I set:
>>>>         myclassif=clfswh['!gnpp','!skl','svm']    #Note that 'gnpp' and
>>>>         #'skl' were excluded for independent reasons
>>>> the code runs through without errors.
>>>>
>>>> However, with:
>>>>         myclassif=clfswh['!gnpp','!skl']
>>>> I get the following error:
>>>>         MaybeEncodingError: Error sending result:
>>>>         '[TransferMeasure(measure=SVM(svm_impl='C_SVC', kernel=LinearLSKernel(),
>>>>         weight=[], probability=1, weight_label=[]),
>>>>         splitter=Splitter(space='partitions'),
>>>>         postproc=BinaryFxNode(space='targets'), enable_ca=['stats'])]'.
>>>>         Reason: 'TypeError("can't pickle SwigPyObject objects",)'
>>>>
>>>> After googling for what may cause this particular error, I found that the 
>>>> situation improves slightly (i.e. more splits are executed, sometimes even 
>>>> all of them) by importing the following:
>>>>         import os
>>>>         from sklearn.externals.joblib import Parallel, delayed
>>>>         from sklearn.externals.joblib.parallel import parallel_backend
>>>> and then specifying just before 'Parallel(n_jobs=2)':
>>>>         with parallel_backend('threading'):
>>>> However, also in this case, the code invariably terminates with a long 
>>>> error message (I only report an extract here, but I can send the whole 
>>>> message if needed):
>>>>         <type 'str'>: (<type 'exceptions.UnicodeEncodeError'>, UnicodeEncodeError('ascii',
>>>>         u'JoblibAttributeError\n___________________________________________________________________________\nMultiprocessing
>>>>         exception:\n...........................................................................\n/usr/lib/python2.7/runpy.py in
>>>>         _run_module_as_main(mod_name=\'ipykernel_launcher\', alter_argv=1)\n    169     pkg_name = mod_name.rpartition(\'.\')[0]\n
>>>>         170         main_globals = sys.modules["__main__"].__dict__\n    171     if alter_argv:\n    172         sys.argv[0] = fname\n
>>>>         173     return _run_code(code, main_globals, None,\n--> 174
>>>>
>>>> I think I have sort of understood that the problem is due to some failure 
>>>> in pickling the parallelized jobs, but I have no clue whether and how it 
>>>> can be solved.
>>>> Do you have any suggestions?
>>>>
>>>> Thank you and very best wishes,
>>>> Marco
>>>>
>>>> P.S. This is again the full code:
>>>>
>>>> ########## * ##########
>>>> ##########
>>>>
>>>> PyMVPA:
>>>>  Version:       2.6.3
>>>>  Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785
>>>>  Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
>>>>  Version control (GIT):
>>>>  GIT information could not be obtained due 
>>>> "/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
>>>> SYSTEM:
>>>>  OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 (2017-10-15)
>>>>
>>>>
>>>> print fds.summary()
>>>> Dataset: 36x534 at float32, <sa: chunks,targets,time_coords,time_indices>, 
>>>> <fa: voxel_indices>, <a: imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
>>>> stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
>>>> No details due to large number of targets or chunks. Increase maxc and maxt 
>>>> if desired
>>>> Summary for targets across chunks
>>>>   targets mean std min max #chunks
>>>>     C      0.5 0.5  0   1     18
>>>>     D      0.5 0.5  0   1     18
>>>>
>>>>
>>>> #Evaluate prevalent best classifier with nested crossvalidation
>>>> verbose.level = 5
>>>>
>>>> partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
>>>>                            Sifter([('partitions', 2), ('targets', ['C', 'D'])])],
>>>>                           space='partitions')
>>>> # training partitions
>>>> for fds_ in partitionerCD.generate(fds):
>>>>     training = fds[fds_.sa.partitions == 1]
>>>>     #print list(zip(training.sa.chunks, training.sa.targets))
>>>> # testing partitions
>>>> for fds_ in partitionerCD.generate(fds):
>>>>     testing = fds[fds_.sa.partitions == 2]
>>>>     #print list(zip(testing.sa.chunks, testing.sa.targets))
>>>>
>>>> #Helper function (partitionerCD recursively acting on dstrain, rather than 
>>>> on fds):
>>>> def select_best_clf(dstrain_, clfs):
>>>>     """Select best model according to CVTE
>>>>     Helper function which we will use twice -- once for proper nested
>>>>     cross-validation, and once to see how big an optimistic bias due
>>>>     to model selection could be if we simply provide an entire dataset.
>>>>     Parameters
>>>>     ----------
>>>>     dstrain_ : Dataset
>>>>     clfs : list of Classifiers
>>>>       Which classifiers to explore
>>>>     Returns
>>>>     -------
>>>>     best_clf, best_error
>>>>     """
>>>>     best_clf = None  # bind the name to avoid UnboundLocalError if all clfs fail
>>>>     best_error = None
>>>>     for clf in clfs:
>>>>         cv = CrossValidation(clf, partitionerCD)
>>>>         # unfortunately we don't have ability to reassign clf atm
>>>>         # cv.transerror.clf = clf
>>>>         try:
>>>>             error = np.mean(cv(dstrain_))
>>>>         except LearnerError, e:
>>>>             # skip the classifier if data was not appropriate and it
>>>>             # failed to learn/predict at all
>>>>             continue
>>>>         if best_error is None or error < best_error:
>>>>             best_clf = clf
>>>>             best_error = error
>>>>         verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
>>>>     verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
>>>>             % (len(clfs), best_clf.descr, best_error))
>>>>     return best_clf, best_error
>>>>
>>>> # This function will run all classifiers for one single partition
>>>> myclassif=clfswh['!gnpp','!skl'][5:6]  #Testing a single SVM classifier
>>>> def _run_one_partition(isplit, partitions, classifiers=myclassif): #see §§
>>>>     verbose(2, "Processing split #%i" % isplit)
>>>>     dstrain, dstest = list(splitter.generate(partitions))
>>>>     best_clf, best_error = select_best_clf(dstrain, classifiers)
>>>>     # now that we have the best classifier, lets assess its transfer
>>>>     # to the testing dataset while training on entire training
>>>>     tm = TransferMeasure(best_clf, splitter,
>>>>                          postproc=BinaryFxNode(mean_mismatch_error, space='targets'),
>>>>                          enable_ca=['stats'])
>>>>     tm(partitions)
>>>>     return tm
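>>>>
>>>> # Untested idea: multiprocessing has to pickle whatever this helper returns,
>>>> # and the TransferMeasure above wraps a SWIG libsvm object that cannot be
>>>> # pickled. Returning only picklable pieces (the stats plus the classifier
>>>> # description string) might sidestep the MaybeEncodingError:
>>>> #     return tm.ca.stats, best_clf.descr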
>>>>
>>>> #import os
>>>> #from sklearn.externals.joblib import Parallel, delayed
>>>> #from sklearn.externals.joblib.parallel import parallel_backend
>>>>
>>>> # Estimate the error in parallel using nested CV for model selection
>>>> confusion = ConfusionMatrix()
>>>> verbose(1, "Estimating error using nested CV for model selection")
>>>> partitioner = partitionerCD
>>>> splitter = Splitter('partitions')
>>>> # Here we are using joblib Parallel to parallelize each partition
>>>> # Set n_jobs to the number of available cores (or how many you want to use)
>>>> #with parallel_backend('threading'):
>>>> #    tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
>>>> tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
>>>>                          for isplit, partitions
>>>>                          in enumerate(partitionerCD.generate(fds)))
>>>> # Parallel returns a list with the results of each parallel loop, so we need to
>>>> # unravel it to get the confusion matrix
>>>> best_clfs = {}
>>>> for tm in tms:
>>>>     confusion += tm.ca.stats
>>>>     best_clfs[tm.measure.descr] = best_clfs.get(tm.measure.descr, 0) + 1
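>>>>
>>>> # Afterwards, one could inspect the outcome (sketch):
>>>> #     print confusion   # aggregated confusion matrix across folds
>>>> #     print best_clfs   # how often each classifier won the model selection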
>>>>
>>>> ##########
>>>> ########## * ##########
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 13/11/2017 09:12, marco tettamanti wrote:
>>>>> Dear Matteo,
>>>>> thanks a million, this is precisely the kind of thing I was looking for: it 
>>>>> works like a charm!
>>>>> Ciao,
>>>>> Marco
>>>>>
>>>>>> On 11/11/2017 21:44, Matteo Visconti di Oleggio Castello wrote:
>>>>>>
>>>>>> Hi Marco,
>>>>>>
>>>>>> in your case, I would then recommend looking into joblib to parallelize
>>>>>> your for loops (https://pythonhosted.org/joblib/parallel.html).
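>>>>>>
>>>>>> The basic pattern is just (a toy sketch):
>>>>>>
>>>>>>      from joblib import Parallel, delayed
>>>>>>
>>>>>>      def square(x):
>>>>>>          return x ** 2
>>>>>>
>>>>>>      # runs square() in 2 worker processes; returns [0, 1, 4, ..., 81]
>>>>>>      results = Parallel(n_jobs=2)(delayed(square)(i) for i in range(10))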
>>>>>>
>>>>>> As an example, here's a gist containing part of PyMVPA's nested_cv
>>>>>> example where I parallelized the loop across partitions. I feel this is
>>>>>> what you might want to do in your case, since you have a lot more folds.
>>>>>>
>>>>>> Here's the gist:
>>>>>> https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0
>>>>>>
>>>>>>
>>>>>> On 10/11/2017 21:13, marco tettamanti wrote:
>>>>>>> Dear Matteo,
>>>>>>> thank you for your willingness to look into my code.
>>>>>>>
>>>>>>> This is taken almost verbatim from 
>>>>>>> http://dev.pymvpa.org/examples/nested_cv.html, except for the 
>>>>>>> leave-one-pair-out partitioning, and a slight reduction in the number of 
>>>>>>> classifiers (the original example has around 45).
>>>>>>>
>>>>>>> Any help or suggestion would be greatly appreciated!
>>>>>>> All the best,
>>>>>>> Marco
>>>>>>>
>>>>>>>
>>>>>>> ########## * ##########
>>>>>>> ##########
>>>>>>>
>>>>>>> PyMVPA:
>>>>>>>  Version:       2.6.3
>>>>>>>  Hash: 9c07e8827819aaa79ff15d2db10c420a876d7785
>>>>>>>  Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
>>>>>>>  Version control (GIT):
>>>>>>>  GIT information could not be obtained due 
>>>>>>> "/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
>>>>>>> SYSTEM:
>>>>>>>  OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 
>>>>>>> (2017-10-15)
>>>>>>>
>>>>>>>
>>>>>>> print fds.summary()
>>>>>>> Dataset: 36x534 at float32, <sa: chunks,targets,time_coords,time_indices>, 
>>>>>>> <fa: voxel_indices>, <a: 
>>>>>>> imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
>>>>>>> stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
>>>>>>> No details due to large number of targets or chunks. Increase maxc and 
>>>>>>> maxt if desired
>>>>>>> Summary for targets across chunks
>>>>>>>   targets mean std min max #chunks
>>>>>>>     C      0.5 0.5  0   1     18
>>>>>>>     D      0.5 0.5  0   1     18
>>>>>>>
>>>>>>>
>>>>>>> #Evaluate prevalent best classifier with nested crossvalidation
>>>>>>> verbose.level = 5
>>>>>>>
>>>>>>> partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
>>>>>>>                            Sifter([('partitions', 2), ('targets', ['C', 'D'])])],
>>>>>>>                           space='partitions')
>>>>>>> # training partitions
>>>>>>> for fds_ in partitionerCD.generate(fds):
>>>>>>>     training = fds[fds_.sa.partitions == 1]
>>>>>>>     #print list(zip(training.sa.chunks, training.sa.targets))
>>>>>>> # testing partitions
>>>>>>> for fds_ in partitionerCD.generate(fds):
>>>>>>>     testing = fds[fds_.sa.partitions == 2]
>>>>>>>     #print list(zip(testing.sa.chunks, testing.sa.targets))
>>>>>>>
>>>>>>> #Helper function (partitionerCD recursively acting on dstrain, rather 
>>>>>>> than on fds):
>>>>>>> def select_best_clf(dstrain_, clfs):
>>>>>>>     """Select best model according to CVTE
>>>>>>>     Helper function which we will use twice -- once for proper nested
>>>>>>>     cross-validation, and once to see how big an optimistic bias due
>>>>>>>     to model selection could be if we simply provide an entire dataset.
>>>>>>>     Parameters
>>>>>>>     ----------
>>>>>>>     dstrain_ : Dataset
>>>>>>>     clfs : list of Classifiers
>>>>>>>       Which classifiers to explore
>>>>>>>     Returns
>>>>>>>     -------
>>>>>>>     best_clf, best_error
>>>>>>>     """
>>>>>>>     best_clf = None  # bind the name to avoid UnboundLocalError if all clfs fail
>>>>>>>     best_error = None
>>>>>>>     for clf in clfs:
>>>>>>>         cv = CrossValidation(clf, partitionerCD)
>>>>>>>         # unfortunately we don't have ability to reassign clf atm
>>>>>>>         # cv.transerror.clf = clf
>>>>>>>         try:
>>>>>>>             error = np.mean(cv(dstrain_))
>>>>>>>         except LearnerError, e:
>>>>>>>             # skip the classifier if data was not appropriate and it
>>>>>>>             # failed to learn/predict at all
>>>>>>>             continue
>>>>>>>         if best_error is None or error < best_error:
>>>>>>>             best_clf = clf
>>>>>>>             best_error = error
>>>>>>>         verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
>>>>>>>     verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
>>>>>>>             % (len(clfs), best_clf.descr, best_error))
>>>>>>>     return best_clf, best_error
>>>>>>>
>>>>>>> #Estimate error using nested CV for model selection:
>>>>>>> best_clfs = {}
>>>>>>> confusion = ConfusionMatrix()
>>>>>>> verbose(1, "Estimating error using nested CV for model selection")
>>>>>>> partitioner = partitionerCD
>>>>>>> splitter = Splitter('partitions')
>>>>>>> for isplit, partitions in enumerate(partitionerCD.generate(fds)):
>>>>>>>     verbose(2, "Processing split #%i" % isplit)
>>>>>>>     dstrain, dstest = list(splitter.generate(partitions))
>>>>>>>     best_clf, best_error = select_best_clf(dstrain, clfswh['!gnpp','!skl'])
>>>>>>>     best_clfs[best_clf.descr] = best_clfs.get(best_clf.descr, 0) + 1
>>>>>>>     # now that we have the best classifier, lets assess its transfer
>>>>>>>     # to the testing dataset while training on entire training
>>>>>>>     tm = TransferMeasure(best_clf, splitter,
>>>>>>>                          postproc=BinaryFxNode(mean_mismatch_error, space='targets'),
>>>>>>>                          enable_ca=['stats'])
>>>>>>>     tm(partitions)
>>>>>>>     confusion += tm.ca.stats
>>>>>>>
>>>>>>> ##########
>>>>>>> ########## * ##########
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On 10/11/2017 15:43, Matteo Visconti di Oleggio Castello wrote:
>>>>>>>>
>>>>>>>> What do you mean by "cycling over approx 40 different classifiers"? Are
>>>>>>>> you testing different classifiers? If that's the case, a possibility is to
>>>>>>>> create a script that takes the type of classifier as an argument and runs
>>>>>>>> the classification across all folds. That way you can submit 40 jobs and
>>>>>>>> parallelize across classifiers.
>>>>>>>>
>>>>>>>> If that's not the case, then because the folds are independent and
>>>>>>>> deterministic, I would create a script that performs the classification
>>>>>>>> on blocks of folds (say folds 1 to 30, 31 to 60, etc.), and then submit
>>>>>>>> different jobs, so as to parallelize there.
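>>>>>>>>
>>>>>>>> As a sketch (with hypothetical command-line handling), such a script could
>>>>>>>> take the block of folds to run as arguments:
>>>>>>>>
>>>>>>>>      import sys
>>>>>>>>      first, last = int(sys.argv[1]), int(sys.argv[2])
>>>>>>>>      for isplit, partitions in enumerate(partitioner.generate(fds)):
>>>>>>>>          if not (first <= isplit <= last):
>>>>>>>>              continue  # this fold belongs to another job
>>>>>>>>          # ... run the classification for this fold ...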
>>>>>>>>
>>>>>>>> I think that if you send a snippet of the code you're using, it will be
>>>>>>>> easier to see which points are good candidates for parallelization.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/11/2017 09:57, marco tettamanti wrote:
>>>>>>>>> Dear Matteo and Nick,
>>>>>>>>> thank you for your responses.
>>>>>>>>> I take this occasion to ask some follow-up questions, because I am struggling
>>>>>>>>> to make pymvpa2 computations faster and more efficient.
>>>>>>>>>
>>>>>>>>> I often find myself giving up on a particular analysis, because it is going to
>>>>>>>>> take far more time than I can bear (weeks, months!). This happens particularly
>>>>>>>>> with searchlight permutation testing (gnbsearchlight is much faster, but does
>>>>>>>>> not support pprocess), and with nested cross-validation.
>>>>>>>>> As for the latter, for example, I recently wanted to run nested cross-validation
>>>>>>>>> on a sample of 18 patients and 18 controls (1 image per subject), training the
>>>>>>>>> classifiers to discriminate patients from controls in a leave-one-pair-out
>>>>>>>>> partitioning scheme. This yields 18*18=324 folds. For a small ROI of 36 voxels,
>>>>>>>>> cycling over approx. 40 different classifiers takes about 2 hours for each fold
>>>>>>>>> on a decent PowerEdge T430 Dell server with 128GB RAM. This means approx. 27
>>>>>>>>> days for all 324 folds!
>>>>>>>>> The same server is equipped with 32 CPUs. With full parallelization, the same
>>>>>>>>> analysis might be completed in less than one day. This is the reason for my
>>>>>>>>> interest in, and questions about, parallelization.
>>>>>>>>>
>>>>>>>>> Is there anything that you experts do in such situations to speed up the
>>>>>>>>> computation or make it more efficient?
>>>>>>>>>
>>>>>>>>> Thank you again and best wishes,
>>>>>>>>> Marco
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 10/11/2017 10:07, Nick Oosterhof wrote:
>>>>>>>>>>
>>>>>>>>>> There have been some plans / minor attempts to use parallelisation more
>>>>>>>>>> widely, but as far as I know we only support pprocess, and only for (1)
>>>>>>>>>> searchlight; (2) surface-based voxel selection; and (3) hyperalignment. I
>>>>>>>>>> do remember that parallelisation of other functions was challenging due to
>>>>>>>>>> getting the conditional attributes set right, but this was a long time
>>>>>>>>>> ago.
>>>>>>>>>>
>>>>>>>>>>> On 09/11/2017 18:35, Matteo Visconti di Oleggio Castello wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Marco,
>>>>>>>>>>> AFAIK, there is no support for parallelization at the level of
>>>>>>>>>>> cross-validation. Usually for a small ROI (such as a searchlight) and with
>>>>>>>>>>> standard CV schemes, the process is quite fast, and the bottleneck is
>>>>>>>>>>> really the number of searchlights to be computed (for which parallelization
>>>>>>>>>>> exists).
>>>>>>>>>>>
>>>>>>>>>>> In my experience, we tend to parallelize at the level of individual
>>>>>>>>>>> participants; for example, we might set up a searchlight analysis with
>>>>>>>>>>> however many processes (nproc) you can get, and then submit one such job
>>>>>>>>>>> for every participant to a cluster (using either torque or condor).
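>>>>>>>>>>>
>>>>>>>>>>> For example (from memory, so double-check the exact signature in the
>>>>>>>>>>> docs), the searchlight call can be given several cores via nproc:
>>>>>>>>>>>
>>>>>>>>>>>      from mvpa2.suite import (sphere_searchlight, CrossValidation,
>>>>>>>>>>>                               NFoldPartitioner, LinearCSVMC)
>>>>>>>>>>>      cv = CrossValidation(LinearCSVMC(), NFoldPartitioner())
>>>>>>>>>>>      # nproc > 1 spreads searchlight centers across cores via pprocess
>>>>>>>>>>>      sl = sphere_searchlight(cv, radius=3, nproc=8)
>>>>>>>>>>>      res = sl(fds)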
>>>>>>>>>>>
>>>>>>>>>>> HTH,
>>>>>>>>>>> Matteo
>>>>>>>>>>>
>>>>>>>>>>> On 09/11/2017 10:08, marco tettamanti wrote:
>>>>>>>>>>>> Dear all,
>>>>>>>>>>>> forgive me if this has already been asked in the past, but I was wondering
>>>>>>>>>>>> whether there have been any developments in the meantime.
>>>>>>>>>>>>
>>>>>>>>>>>> Is there any chance that one can generally apply parallel computing (multiple
>>>>>>>>>>>> CPUs or clusters) with pymvpa2, in addition to what is already implemented for
>>>>>>>>>>>> searchlight (pprocess)? That is, also for general cross-validation, nested
>>>>>>>>>>>> cross-validation, permutation testing, RFE, etc.?
>>>>>>>>>>>>
>>>>>>>>>>>> Has anyone had successful experience with parallelization schemes such as
>>>>>>>>>>>> ipyparallel, condor, or others?
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you and best wishes!
>>>>>>>>>>>> Marco
>>>>>>>>>>>>
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> Marco Tettamanti, Ph.D.
>>>>>>>> Nuclear Medicine Department & Division of Neuroscience
>>>>>>>> IRCCS San Raffaele Scientific Institute
>>>>>>>> Via Olgettina 58
>>>>>>>> I-20132 Milano, Italy
>>>>>>>> Phone ++39-02-26434888
>>>>>>>> Fax ++39-02-26434892
>>>>>>>> Email: tettamanti.marco at hsr.it
>>>>>>>> Skype: mtettamanti
>>>>>>>> http://scholar.google.it/citations?user=x4qQl4AAAAAJ
>>>>>>>
>>>>>
>>
