[pymvpa] Parallelization
marco tettamanti
mrctttmnt at gmail.com
Wed Nov 29 11:26:19 UTC 2017
Hi Matteo,
thank you again for your invaluable help!
I have spotted a small typo in your gist. Line 105 should read:
for tm in tms_parallel:
Apart from the problems with the nested_cv.py example that you refer to, I
am experiencing two further problems:
- backend='threading' does not seem to parallelize on my machine; only
backend='multiprocessing' does
- while your 'nested_cv_parallel.py' gist runs smoothly, when I adapt it to my
dataset, partitioner, etc., I get the following error:
tms_parallel, best_clfs_parallel = zip(*out_parallel)
TypeError: zip argument #1 must support iteration
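A plausible cause, sketched here with hypothetical values: zip(*out_parallel)
requires every element of out_parallel to itself be iterable, so the
parallelized worker must return a tuple (e.g. "return tm, best_clf"); a worker
that returns a bare TransferMeasure per split produces exactly this TypeError.
-----------------------------------------
out_parallel = [(0, 'clfA'), (1, 'clfB')]    # worker returned tuples
tms_parallel, best_clfs_parallel = zip(*out_parallel)   # works

out_parallel = [0, 1]    # worker returned a single object per split
# zip(*out_parallel)     # TypeError: zip argument #1 must support iteration
-----------------------------------------
As for backend='threading': threads share a single Python interpreter, so
CPU-bound classifiers serialize on the GIL, which would explain why only
backend='multiprocessing' shows real speedups here.
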
I guess I am putting my nested analysis on standby for the moment. Hopefully
these issues will soon be solved.
Thank you for everything!
Marco
> On 28/11/2017 15:45, Matteo Visconti di Oleggio Castello wrote:
>
> Hi Marco,
>
> I think there are a bunch of conflated issues here.
>
> - First, there was an error in my code, and that's why you got the error
> "UnboundLocalError: local variable 'best_clf' referenced before assignment".
> I updated the gist, and now the example code for running the parallelization
> should be OK and should work as a blueprint for your code
> (https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0).
>
> - You are correct in changing the backend to 'threading' for this
> particular case because of the pickling error (a sketch of why appears
> after this list).
>
> - However, I think that the example for nested_cv.py didn't work from the
> start, even without parallelization. The last change was 6 years ago, and
> I'm afraid that things have changed in the meantime and the code wasn't
> updated. I opened an issue on GitHub to keep track of it
> (https://github.com/PyMVPA/PyMVPA/issues/559).
>
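> A sketch of why 'threading' sidesteps the pickling error (illustrative
> only, not PyMVPA-specific): the multiprocessing backend must pickle results
> to ship them back across process boundaries, and SWIG-wrapped objects such
> as libsvm models cannot be pickled; threads share one address space, so
> nothing needs pickling.
>
> import pickle
>
> class FakeSwigObject(object):       # hypothetical stand-in for a SWIG proxy
>     def __reduce__(self):
>         raise TypeError("can't pickle SwigPyObject objects")
>
> try:
>     pickle.dumps(FakeSwigObject())  # what the multiprocessing backend must do
> except TypeError as e:
>     print(e)                        # the same message reported below
>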
> On 24/11/2017 21:57, marco tettamanti wrote:
>> Dear Matteo,
>> thank you for kindly replying!
>>
>> Yes, I do have the latest versions of joblib (0.11) and sklearn (0.19.1); see
>> the bottom of this email.
>>
>> The problem seems independent of whether I run the code in jupyter, or
>> invoke ipython or python directly in the console.
>>
>> I am now wondering whether there may be something wrong in my snippet.
>> When first running your gist, I encountered this error:
>>
>> UnboundLocalError: local variable 'best_clf' referenced before assignment
>>
>> which I worked around by moving the best_clfs declaration a few lines down:
>>
>> -----------------------------------------
>> #best_clfs = {}   # moved down 7 lines
>> confusion = ConfusionMatrix()
>> verbose(1, "Estimating error using nested CV for model selection")
>> partitioner = partitionerCD
>> splitter = Splitter('partitions')
>> tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
>>                          for isplit, partitions in
>>                          enumerate(partitionerCD.generate(fds)))
>> best_clfs = {}
>> for tm in tms:
>>     confusion += tm.ca.stats
>>     best_clfs[tm.measure.descr] = best_clfs.get(tm.measure.descr, 0) + 1
>> -----------------------------------------
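>>
>> As a hedged aside on where that UnboundLocalError likely comes from: in
>> select_best_clf(), best_clf is only assigned when a classifier trains
>> successfully, so if every classifier raises LearnerError on a given split,
>> the closing verbose()/return lines touch best_clf before any assignment.
>> A pure-Python reproduction with stand-in names:
>>
>> def select_best(clfs):
>>     best_error = None
>>     for clf in clfs:
>>         try:
>>             error = clf()           # stands in for np.mean(cv(dstrain_))
>>         except ValueError:          # stands in for LearnerError
>>             continue
>>         if best_error is None or error < best_error:
>>             best_clf, best_error = clf, error
>>     return best_clf, best_error     # fails if no classifier succeeded
>>
>> def always_fails():
>>     raise ValueError("could not learn")
>>
>> try:
>>     select_best([always_fails])
>> except UnboundLocalError as e:
>>     print(e)    # local variable 'best_clf' referenced before assignment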
>>
>> But now, running the snippet in ipython/python specifically to test the SVM
>> parallelization issue, I saw the same error message pop up again:
>>
>> UnboundLocalError: local variable 'best_clf' referenced before
>> assignment
>>
>> Could this be the culprit? As a reminder, the full snippet I am using is
>> included in my previous email.
>>
>> Thank you and very best wishes,
>> Marco
>>
>>
>> In [21]: mvpa2.wtf(exclude=['runtime','process']) ##other possible arguments
>> (['sources',
>> Out[21]:
>> Current date: 2017-11-24 21:23
>> PyMVPA:
>> Version: 2.6.3
>> Hash: 9c07e8827819aaa79ff15d2db10c420a876d7785
>> Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
>> Version control (GIT):
>> GIT information could not be obtained due
>> "/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
>> SYSTEM:
>> OS: posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 (2017-10-15)
>> Distribution: debian/buster/sid
>> EXTERNALS:
>> Present: atlas_fsl, cPickle, ctypes, good
>> scipy.stats.rv_continuous._reduce_func(floc,fscale), good
>> scipy.stats.rv_discrete.ppf, griddata, gzip, h5py, hdf5, ipython, joblib,
>> liblapack.so, libsvm, libsvm verbosity control, lxml, matplotlib, mdp, mdp ge
>> 2.4, mock, nibabel, nose, numpy, numpy_correct_unique, pprocess, pylab, pylab
>> plottable, pywt, pywt wp reconstruct, reportlab, running ipython env, scipy,
>> skl, statsmodels
>> Absent: afni-3dinfo, atlas_pymvpa, cran-energy, datalad, elasticnet,
>> glmnet, good scipy.stats.rdist, hcluster, lars, mass, nipy, nipy.neurospin,
>> numpydoc, openopt, pywt wp reconstruct fixed, rpy2, scipy.weave, sg ge 0.6.4,
>> sg ge 0.6.5, sg_fixedcachesize, shogun, shogun.krr, shogun.lightsvm,
>> shogun.mpd, shogun.svmocas, shogun.svrlight, weave
>> Versions of critical externals:
>> ctypes : 1.1.0
>> h5py : 2.7.1
>> hdf5 : 1.10.0
>> ipython : 5.5.0
>> joblib : 0.11
>> lxml : 4.1.0
>> matplotlib : 2.0.0
>> mdp : 3.5
>> mock : 2.0.0
>> nibabel : 2.3.0dev
>> numpy : 1.13.1
>> pprocess : 0.5
>> pywt : 0.5.1
>> reportlab : 3.4.0
>> scipy : 0.19.1
>> skl : 0.19.1
>> Matplotlib backend: TkAgg
>>
>>
>>> On 24/11/2017 17:32, Matteo Visconti di Oleggio Castello wrote:
>>>
>>> Hi Marco,
>>>
>>> Some ideas, in no particular order:
>>>
>>> - What version of sklearn/joblib are you using? I would make sure to use
>>> the latest joblib version (0.11), perhaps not importing it from sklearn
>>> (unless you have the latest sklearn version, 0.19.1).
>>> - Are you running the code in a jupyter notebook? There might be some
>>> issues with that (see https://github.com/joblib/joblib/issues/174). As a
>>> test, you might try converting your notebook to a script and running it.
>>>
>>>
>>>
>>> On 23/11/2017 12:07, marco tettamanti wrote:
>>>> Dear Matteo (and others),
>>>> sorry, I am again asking for your help!
>>>>
>>>> I have experimented with the analysis of my dataset using an adaptation of
>>>> your joblib-based gist.
>>>> As I wrote before, it works perfectly with most classifiers, but SVM
>>>> classifiers always cause the code to terminate with an error.
>>>>
>>>> If I set:
>>>> myclassif=clfswh['!gnpp','!skl','svm']   # note that 'gnpp' and
>>>> 'skl' were excluded for independent reasons
>>>> the code runs through without errors.
>>>>
>>>> However, with:
>>>> myclassif=clfswh['!gnpp','!skl']
>>>> I get the following error:
>>>> MaybeEncodingError: Error sending result:
>>>> '[TransferMeasure(measure=SVM(svm_impl='C_SVC', kernel=LinearLSKernel(),
>>>> weight=[], probability=1,
>>>> weight_label=[]), splitter=Splitter(space='partitions'),
>>>> postproc=BinaryFxNode(space='targets'), enable_ca=['stats'])]'. Reason:
>>>> 'TypeError("can't
>>>> pickle SwigPyObject objects",)'
>>>>
>>>> After googling for what may cause this particular error, I have found that
>>>> the situation improves slightly (i.e. more splits executed, sometimes even
>>>> all splits) by importing the following:
>>>> import os
>>>> from sklearn.externals.joblib import Parallel, delayed
>>>> from sklearn.externals.joblib.parallel import parallel_backend
>>>> and then specifying just before 'Parallel(n_jobs=2)':
>>>> with parallel_backend('threading'):
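>>>>
>>>> Assembled in one place, that workaround looks like this (a sketch assuming
>>>> sklearn 0.19.x, where joblib is vendored under sklearn.externals; work()
>>>> is a trivial placeholder for _run_one_partition):
>>>>
>>>> from sklearn.externals.joblib import Parallel, delayed
>>>> from sklearn.externals.joblib.parallel import parallel_backend
>>>>
>>>> def work(x):              # placeholder for _run_one_partition
>>>>     return x * x
>>>>
>>>> with parallel_backend('threading'):
>>>>     out = Parallel(n_jobs=2)(delayed(work)(i) for i in range(8))
>>>> print(out)                # [0, 1, 4, 9, 16, 25, 36, 49]
>>>>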
>>>> However, also in this case, the code invariably terminates with a long
>>>> error message (I only report an extract here, but I can send the whole
>>>> error message if needed):
>>>> <type 'str'>: (<type 'exceptions.UnicodeEncodeError'>,
>>>> UnicodeEncodeError('ascii',
>>>> u'JoblibAttributeError\n___________________________________________________________________________\nMultiprocessing
>>>> exception:\n...........................................................................\n/usr/lib/python2.7/runpy.py
>>>> in
>>>> _run_module_as_main(mod_name=\'ipykernel_launcher\', alter_argv=1)\n
>>>> 169 pkg_name = mod_name.rpartition(\'.\')[0]\n 170
>>>> main_globals = sys.modules["__main__"].__dict__\n 171 if
>>>> alter_argv:\n 172 sys.argv[0] = fname\n 173 return
>>>> _run_code(code,
>>>> main_globals, None,\n--> 174
>>>>
>>>>
>>>> I think I have more or less understood that the problem is due to some
>>>> failure in pickling the parallelized jobs, but I have no clue whether and
>>>> how it can be solved.
>>>> Do you have any suggestions?
>>>>
>>>> Thank you and very best wishes,
>>>> Marco
>>>>
>>>> p.s. This is again the full code:
>>>>
>>>> ########## * ##########
>>>> ##########
>>>>
>>>> PyMVPA:
>>>> Version: 2.6.3
>>>> Hash: 9c07e8827819aaa79ff15d2db10c420a876d7785
>>>> Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
>>>> Version control (GIT):
>>>> GIT information could not be obtained due
>>>> "/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
>>>> SYSTEM:
>>>> OS: posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 (2017-10-15)
>>>>
>>>>
>>>> print fds.summary()
>>>> Dataset: 36x534 at float32, <sa: chunks,targets,time_coords,time_indices>,
>>>> <fa: voxel_indices>, <a: imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
>>>> stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
>>>> No details due to large number of targets or chunks. Increase maxc and maxt
>>>> if desired
>>>> Summary for targets across chunks
>>>> targets mean std min max #chunks
>>>> C 0.5 0.5 0 1 18
>>>> D 0.5 0.5 0 1 18
>>>>
>>>>
>>>> #Evaluate prevalent best classifier with nested crossvalidation
>>>> verbose.level = 5
>>>>
>>>> partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
>>>>                            Sifter([('partitions', 2),
>>>>                                    ('targets', ['C', 'D'])])],
>>>>                           space='partitions')
>>>> # training partitions
>>>> for fds_ in partitionerCD.generate(fds):
>>>>     training = fds[fds_.sa.partitions == 1]
>>>>     #print list(zip(training.sa.chunks, training.sa.targets))
>>>> # testing partitions
>>>> for fds_ in partitionerCD.generate(fds):
>>>>     testing = fds[fds_.sa.partitions == 2]
>>>>     #print list(zip(testing.sa.chunks, testing.sa.targets))
>>>>
>>>> #Helper function (partitionerCD recursively acting on dstrain, rather than
>>>> on fds):
>>>> def select_best_clf(dstrain_, clfs):
>>>>     """Select best model according to CVTE
>>>>
>>>>     Helper function which we will use twice -- once for proper nested
>>>>     cross-validation, and once to see how big an optimistic bias due
>>>>     to model selection could be if we simply provide an entire dataset.
>>>>
>>>>     Parameters
>>>>     ----------
>>>>     dstrain_ : Dataset
>>>>     clfs : list of Classifiers
>>>>         Which classifiers to explore
>>>>
>>>>     Returns
>>>>     -------
>>>>     best_clf, best_error
>>>>     """
>>>>     best_error = None
>>>>     for clf in clfs:
>>>>         cv = CrossValidation(clf, partitionerCD)
>>>>         # unfortunately we don't have ability to reassign clf atm
>>>>         # cv.transerror.clf = clf
>>>>         try:
>>>>             error = np.mean(cv(dstrain_))
>>>>         except LearnerError, e:
>>>>             # skip the classifier if data was not appropriate and it
>>>>             # failed to learn/predict at all
>>>>             continue
>>>>         if best_error is None or error < best_error:
>>>>             best_clf = clf
>>>>             best_error = error
>>>>         verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
>>>>     verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
>>>>             % (len(clfs), best_clf.descr, best_error))
>>>>     return best_clf, best_error
>>>>
>>>> # This function will run all classifiers for one single partition
>>>> myclassif = clfswh['!gnpp','!skl'][5:6]   # testing a single SVM classifier
>>>> def _run_one_partition(isplit, partitions, classifiers=myclassif):   #see §§
>>>>     verbose(2, "Processing split #%i" % isplit)
>>>>     dstrain, dstest = list(splitter.generate(partitions))
>>>>     best_clf, best_error = select_best_clf(dstrain, classifiers)
>>>>     # now that we have the best classifier, let's assess its transfer
>>>>     # to the testing dataset while training on the entire training set
>>>>     tm = TransferMeasure(best_clf, splitter,
>>>>                          postproc=BinaryFxNode(mean_mismatch_error,
>>>>                                                space='targets'),
>>>>                          enable_ca=['stats'])
>>>>     tm(partitions)
>>>>     return tm
>>>>
>>>> #import os
>>>> #from sklearn.externals.joblib import Parallel, delayed
>>>> #from sklearn.externals.joblib.parallel import parallel_backend
>>>>
>>>> # Estimate the error in parallel, using nested CV for model selection
>>>> confusion = ConfusionMatrix()
>>>> verbose(1, "Estimating error using nested CV for model selection")
>>>> partitioner = partitionerCD
>>>> splitter = Splitter('partitions')
>>>> # Here we are using joblib Parallel to parallelize over partitions.
>>>> # Set n_jobs to the number of available cores (or how many you want to use)
>>>> #with parallel_backend('threading'):
>>>> #    tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
>>>> tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
>>>>                          for isplit, partitions in
>>>>                          enumerate(partitionerCD.generate(fds)))
>>>> # Parallel returns a list with the results of each parallel loop, so we
>>>> # need to unravel it to get the confusion matrix
>>>> best_clfs = {}
>>>> for tm in tms:
>>>>     confusion += tm.ca.stats
>>>>     best_clfs[tm.measure.descr] = best_clfs.get(tm.measure.descr, 0) + 1
>>>>
>>>> ##########
>>>> ########## * ##########
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 13/11/2017 09:12, marco tettamanti wrote:
>>>>> Dear Matteo,
>>>>> thank you so much, this is precisely the kind of thing I was looking for:
>>>>> it works like a charm!
>>>>> Ciao,
>>>>> Marco
>>>>>
>>>>>> On 11/11/2017 21:44, Matteo Visconti di Oleggio Castello wrote:
>>>>>>
>>>>>> Hi Marco,
>>>>>>
>>>>>> in your case, I would then recommend looking into joblib to parallelize
>>>>>> your for loops (https://pythonhosted.org/joblib/parallel.html).
>>>>>>
>>>>>> As an example, here's a gist containing part of PyMVPA's nested_cv
>>>>>> example, where I parallelized the loop across partitions. I feel this is
>>>>>> what you might want to do in your case, since you have many more folds.
>>>>>>
>>>>>> Here's the gist:
>>>>>> https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0
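>>>>>>
>>>>>> The core pattern the gist builds on is joblib's generator-expression
>>>>>> style, where each fold is computed independently (a sketch; fold_fn and
>>>>>> its return value are hypothetical):
>>>>>>
>>>>>> from joblib import Parallel, delayed
>>>>>>
>>>>>> def fold_fn(i):                 # one cross-validation fold
>>>>>>     return i % 2                # hypothetical per-fold result
>>>>>>
>>>>>> results = Parallel(n_jobs=4)(delayed(fold_fn)(i) for i in range(324))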
>>>>>>
>>>>>>
>>>>>> On 10/11/2017 21:13, marco tettamanti wrote:
>>>>>>> Dear Matteo,
>>>>>>> thank you for your willingness to look into my code.
>>>>>>>
>>>>>>> This is taken almost verbatim from
>>>>>>> http://dev.pymvpa.org/examples/nested_cv.html, except for the
>>>>>>> leave-one-pair-out partitioning and a slight reduction in the number of
>>>>>>> classifiers (in the original example there are around 45).
>>>>>>>
>>>>>>> Any help or suggestion would be greatly appreciated!
>>>>>>> All the best,
>>>>>>> Marco
>>>>>>>
>>>>>>>
>>>>>>> ########## * ##########
>>>>>>> ##########
>>>>>>>
>>>>>>> PyMVPA:
>>>>>>> Version: 2.6.3
>>>>>>> Hash: 9c07e8827819aaa79ff15d2db10c420a876d7785
>>>>>>> Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
>>>>>>> Version control (GIT):
>>>>>>> GIT information could not be obtained due
>>>>>>> "/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
>>>>>>> SYSTEM:
>>>>>>> OS: posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2
>>>>>>> (2017-10-15)
>>>>>>>
>>>>>>>
>>>>>>> print fds.summary()
>>>>>>> Dataset: 36x534 at float32, <sa: chunks,targets,time_coords,time_indices>,
>>>>>>> <fa: voxel_indices>, <a:
>>>>>>> imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
>>>>>>> stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
>>>>>>> No details due to large number of targets or chunks. Increase maxc and
>>>>>>> maxt if desired
>>>>>>> Summary for targets across chunks
>>>>>>> targets mean std min max #chunks
>>>>>>> C 0.5 0.5 0 1 18
>>>>>>> D 0.5 0.5 0 1 18
>>>>>>>
>>>>>>>
>>>>>>> #Evaluate prevalent best classifier with nested crossvalidation
>>>>>>> verbose.level = 5
>>>>>>>
>>>>>>> partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
>>>>>>>                            Sifter([('partitions', 2),
>>>>>>>                                    ('targets', ['C', 'D'])])],
>>>>>>>                           space='partitions')
>>>>>>> # training partitions
>>>>>>> for fds_ in partitionerCD.generate(fds):
>>>>>>>     training = fds[fds_.sa.partitions == 1]
>>>>>>>     #print list(zip(training.sa.chunks, training.sa.targets))
>>>>>>> # testing partitions
>>>>>>> for fds_ in partitionerCD.generate(fds):
>>>>>>>     testing = fds[fds_.sa.partitions == 2]
>>>>>>>     #print list(zip(testing.sa.chunks, testing.sa.targets))
>>>>>>>
>>>>>>> #Helper function (partitionerCD recursively acting on dstrain, rather
>>>>>>> than on fds):
>>>>>>> def select_best_clf(dstrain_, clfs):
>>>>>>>     """Select best model according to CVTE
>>>>>>>
>>>>>>>     Helper function which we will use twice -- once for proper nested
>>>>>>>     cross-validation, and once to see how big an optimistic bias due
>>>>>>>     to model selection could be if we simply provide an entire dataset.
>>>>>>>
>>>>>>>     Parameters
>>>>>>>     ----------
>>>>>>>     dstrain_ : Dataset
>>>>>>>     clfs : list of Classifiers
>>>>>>>         Which classifiers to explore
>>>>>>>
>>>>>>>     Returns
>>>>>>>     -------
>>>>>>>     best_clf, best_error
>>>>>>>     """
>>>>>>>     best_error = None
>>>>>>>     for clf in clfs:
>>>>>>>         cv = CrossValidation(clf, partitionerCD)
>>>>>>>         # unfortunately we don't have ability to reassign clf atm
>>>>>>>         # cv.transerror.clf = clf
>>>>>>>         try:
>>>>>>>             error = np.mean(cv(dstrain_))
>>>>>>>         except LearnerError, e:
>>>>>>>             # skip the classifier if data was not appropriate and it
>>>>>>>             # failed to learn/predict at all
>>>>>>>             continue
>>>>>>>         if best_error is None or error < best_error:
>>>>>>>             best_clf = clf
>>>>>>>             best_error = error
>>>>>>>         verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
>>>>>>>     verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
>>>>>>>             % (len(clfs), best_clf.descr, best_error))
>>>>>>>     return best_clf, best_error
>>>>>>>
>>>>>>> #Estimate error using nested CV for model selection:
>>>>>>> best_clfs = {}
>>>>>>> confusion = ConfusionMatrix()
>>>>>>> verbose(1, "Estimating error using nested CV for model selection")
>>>>>>> partitioner = partitionerCD
>>>>>>> splitter = Splitter('partitions')
>>>>>>> for isplit, partitions in enumerate(partitionerCD.generate(fds)):
>>>>>>>     verbose(2, "Processing split #%i" % isplit)
>>>>>>>     dstrain, dstest = list(splitter.generate(partitions))
>>>>>>>     best_clf, best_error = select_best_clf(dstrain, clfswh['!gnpp','!skl'])
>>>>>>>     best_clfs[best_clf.descr] = best_clfs.get(best_clf.descr, 0) + 1
>>>>>>>     # now that we have the best classifier, lets assess its transfer
>>>>>>>     # to the testing dataset while training on entire training
>>>>>>>     tm = TransferMeasure(best_clf, splitter,
>>>>>>>                          postproc=BinaryFxNode(mean_mismatch_error,
>>>>>>>                                                space='targets'),
>>>>>>>                          enable_ca=['stats'])
>>>>>>>     tm(partitions)
>>>>>>>     confusion += tm.ca.stats
>>>>>>>
>>>>>>> ##########
>>>>>>> ########## * ##########
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On 10/11/2017 15:43, Matteo Visconti di Oleggio Castello wrote:
>>>>>>>>
>>>>>>>> What do you mean by "cycling over approx 40 different classifiers"? Are
>>>>>>>> you testing different classifiers? If that's the case, one possibility is to
>>>>>>>> create a script that takes the type of classifier as an argument and runs the
>>>>>>>> classification across all folds. That way you can submit 40 jobs and
>>>>>>>> parallelize across classifiers.
>>>>>>>>
>>>>>>>> If that's not the case, then because the folds are independent and
>>>>>>>> deterministic, I would create a script that performs the classification on
>>>>>>>> blocks of folds (say folds 1 to 30, 31 to 60, etc.), and then submit
>>>>>>>> different jobs, so as to parallelize there.
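>>>>>>>>
>>>>>>>> For instance (a sketch; run_fold and the CLI wiring are hypothetical,
>>>>>>>> and one such script would be submitted per block of folds):
>>>>>>>>
>>>>>>>> import sys
>>>>>>>>
>>>>>>>> def run_fold(i):
>>>>>>>>     pass                        # classification for fold i goes here
>>>>>>>>
>>>>>>>> if __name__ == '__main__':
>>>>>>>>     first, last = int(sys.argv[1]), int(sys.argv[2])
>>>>>>>>     for i in range(first, last + 1):   # e.g. python block.py 31 60
>>>>>>>>         run_fold(i)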
>>>>>>>>
>>>>>>>> I think that if you send a snippet of the code you're using, it will be
>>>>>>>> easier to see which points are good candidates for parallelization.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/11/2017 09:57, marco tettamanti wrote:
>>>>>>>>> Dear Matteo and Nick,
>>>>>>>>> thank you for your responses.
>>>>>>>>> I take this occasion to ask some follow-up questions, because I am
>>>>>>>>> struggling to make pymvpa2 computations faster and more efficient.
>>>>>>>>>
>>>>>>>>> I often find myself giving up on a particular analysis because it is going
>>>>>>>>> to take far more time than I can bear (weeks, months!). This happens
>>>>>>>>> particularly with searchlight permutation testing (gnbsearchlight is much
>>>>>>>>> faster, but does not support pprocess) and with nested cross-validation.
>>>>>>>>> As for the latter, for example, I recently wanted to run nested cross-validation
>>>>>>>>> in a sample of 18 patients and 18 controls (1 image per subject), training the
>>>>>>>>> classifiers to discriminate patients from controls in a leave-one-pair-out
>>>>>>>>> partitioning scheme. This yields 18*18=324 folds. For a small ROI of 36 voxels,
>>>>>>>>> cycling over approx. 40 different classifiers takes about 2 hours for each fold
>>>>>>>>> on a decent PowerEdge T430 Dell server with 128GB RAM. This means approx. 27
>>>>>>>>> days for all 324 folds!
>>>>>>>>> The same server is equipped with 32 CPUs. With full parallelization, the same
>>>>>>>>> analysis could be completed in less than one day. This is the reason for my
>>>>>>>>> interest in, and questions about, parallelization.
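>>>>>>>>>
>>>>>>>>> The arithmetic behind those figures, as a quick check:
>>>>>>>>>
>>>>>>>>> folds = 18 * 18            # leave-one-pair-out: 324 folds
>>>>>>>>> hours = folds * 2.0        # ~2 h per fold, run serially
>>>>>>>>> print(hours / 24.0)        # -> 27.0 days on one core
>>>>>>>>> print(hours / 32 / 24.0)   # -> ~0.84 days with 32-way parallelism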
>>>>>>>>>
>>>>>>>>> Is there anything that you experts do in such situations to speed up the
>>>>>>>>> computation or make it more efficient?
>>>>>>>>>
>>>>>>>>> Thank you again and best wishes,
>>>>>>>>> Marco
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 10/11/2017 10:07, Nick Oosterhof wrote:
>>>>>>>>>>
>>>>>>>>>> There have been some plans / minor attempts to use parallelisation more
>>>>>>>>>> widely, but as far as I know we only support pprocess, and only for (1)
>>>>>>>>>> searchlight; (2) surface-based voxel selection; and (3) hyperalignment. I
>>>>>>>>>> do remember that parallelisation of other functions was challenging due to
>>>>>>>>>> getting the conditional attributes set right, but this was a long time
>>>>>>>>>> ago.
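>>>>>>>>>>
>>>>>>>>>> For the searchlight case, the pprocess-based parallelization is exposed
>>>>>>>>>> via the nproc argument (a minimal sketch, assuming pprocess is installed
>>>>>>>>>> and fds is a dataset loaded elsewhere):
>>>>>>>>>>
>>>>>>>>>> from mvpa2.suite import (LinearCSVMC, CrossValidation,
>>>>>>>>>>                          NFoldPartitioner, sphere_searchlight)
>>>>>>>>>>
>>>>>>>>>> cv = CrossValidation(LinearCSVMC(), NFoldPartitioner())
>>>>>>>>>> sl = sphere_searchlight(cv, radius=3, nproc=4)   # 4 worker processes
>>>>>>>>>> # res = sl(fds)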
>>>>>>>>>>
>>>>>>>>>>> On 09/11/2017 18:35, Matteo Visconti di Oleggio Castello wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Marco,
>>>>>>>>>>> AFAIK, there is no support for parallelization at the level of
>>>>>>>>>>> cross-validation. Usually for a small ROI (such as a searchlight) and with
>>>>>>>>>>> standard CV schemes, the process is quite fast, and the bottleneck is
>>>>>>>>>>> really the number of searchlights to be computed (for which parallelization
>>>>>>>>>>> exists).
>>>>>>>>>>>
>>>>>>>>>>> In my experience, we tend to parallelize at the level of individual
>>>>>>>>>>> participants; for example, we might set up a searchlight analysis with
>>>>>>>>>>> however many n_procs we have available, and then submit one such job for
>>>>>>>>>>> every participant to a cluster (using either torque or condor).
>>>>>>>>>>>
>>>>>>>>>>> HTH,
>>>>>>>>>>> Matteo
>>>>>>>>>>>
>>>>>>>>>>> On 09/11/2017 10:08, marco tettamanti wrote:
>>>>>>>>>>>> Dear all,
>>>>>>>>>>>> forgive me if this has already been asked in the past, but I was wondering
>>>>>>>>>>>> whether there has been any development in the meantime.
>>>>>>>>>>>>
>>>>>>>>>>>> Is there any chance that one can generally apply parallel computing (multiple
>>>>>>>>>>>> CPUs or clusters) with pymvpa2, beyond what is already implemented for
>>>>>>>>>>>> searchlight (pprocess)? That is, also for general cross-validation, nested
>>>>>>>>>>>> cross-validation, permutation testing, RFE, etc.?
>>>>>>>>>>>>
>>>>>>>>>>>> Has anyone had successful experience with parallelization schemes such as
>>>>>>>>>>>> ipyparallel, condor, or others?
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you and best wishes!
>>>>>>>>>>>> Marco
>>>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Marco Tettamanti, Ph.D.
>>>>>>>> Nuclear Medicine Department & Division of Neuroscience
>>>>>>>> IRCCS San Raffaele Scientific Institute
>>>>>>>> Via Olgettina 58
>>>>>>>> I-20132 Milano, Italy
>>>>>>>> Phone ++39-02-26434888
>>>>>>>> Fax ++39-02-26434892
>>>>>>>> Email: tettamanti.marco at hsr.it
>>>>>>>> Skype: mtettamanti
>>>>>>>> http://scholar.google.it/citations?user=x4qQl4AAAAAJ
>>>>>>>
>>>>>
>>