[pymvpa] Optimization Results
Valentin Haenel
valentin.haenel at gmx.de
Mon May 11 11:27:24 UTC 2009
Hi,
when I started my lab rotation here in Berlin two months ago, I was given the
task of evaluating PyMVPA as a possible alternative to the Matlab+SPM combo. To
this end I replicated part of the following study:
http://www.ncbi.nlm.nih.gov/pubmed/19111624
In particular, I took the FIR betas and ran a searchlight on them. I soon
realized that PyMVPA was quite a bit slower than Matlab, so I sat down to
improve things (hence all the optimization commits in the val/* branches on
alioth).
All the comparisons were done on the same machine:
2 x DualCore AMD Opteron 2220 (4 cores)
16246 MB RAM
I ran a searchlight for all 8 FIR time bins for a single subject. The
resulting accuracy maps are almost the same (99.99%).
Matlab+SPM:                                            22 min

Here is a list of what I managed to squeeze out of PyMVPA:

1. No improvements                                 2 h 42 min
2. _svm.py: remove for loops in Python code        2 h 31 min
3. Searchlight index cache (multiple datasets)     2 h  8 min
4. 2 and 3 combined                                1 h 55 min
5. 4 + LIBSVM wrapper optimization                 1 h  2 min
6. 5 + comment out deepcopy in base.py                 57 min
7. 6 + python -O switch                                53 min
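
To put the timings above in relation to each other, here is a small sketch that converts them to minutes and prints the speedup relative to step 1 and the remaining gap to Matlab+SPM. The numbers are copied from the list; the labels are shortened, and the helper name `speedup` is only made up for this illustration.

```python
# Wall-clock times copied from the list above, in minutes.
MATLAB_SPM_MIN = 22
pymvpa_min = [
    ("1. No improvements", 2 * 60 + 42),
    ("2. _svm.py loop removal", 2 * 60 + 31),
    ("3. Searchlight index cache", 2 * 60 + 8),
    ("4. 2 and 3 combined", 1 * 60 + 55),
    ("5. 4 + LIBSVM wrapper optimization", 1 * 60 + 2),
    ("6. 5 + deepcopy commented out", 57),
    ("7. 6 + python -O", 53),
]


def speedup(before_min, after_min):
    """Return how many times faster `after_min` is than `before_min`."""
    return before_min / after_min


baseline = pymvpa_min[0][1]  # step 1, the unmodified run
for label, minutes in pymvpa_min:
    print(f"{label:36s} {minutes:4d} min  "
          f"{speedup(baseline, minutes):4.2f}x vs. step 1  "
          f"{minutes / MATLAB_SPM_MIN:4.2f}x Matlab+SPM time")
```

By this arithmetic, step 7 is roughly 3x faster than step 1 but still takes about 2.4x as long as Matlab+SPM.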
My current profiling round came up with the following:
--------------------------------------------------------------------------------
Sat May 9 13:11:32 2009 runprof6
2342453314 function calls (2308610267 primitive calls) in 10262.765 CPU seconds
Ordered by: internal time
List reduced from 3531 to 40 due to restriction <40>
ncalls tottime percall cumtime percall filename:lineno(function)
553055208/534337859 1295.902 0.000 1413.098 0.000 state.py:306(__getattribute__)
120986219 621.773 0.000 1048.780 0.000 state.py:257(_checkIndex)
65194885 594.444 0.000 1539.628 0.000 state.py:682(isEnabled)
30235296/27355744 403.370 0.000 1132.826 0.000 state.py:377(_action)
107353443/96915067 313.182 0.000 502.194 0.000 state.py:1084(__getattribute__)
204855562 289.925 0.000 289.925 0.000 {method 'has_key' of 'dict' objects}
39594431 282.294 0.000 471.186 0.000 verbosity.py:505(__call__)
5759104 273.423 0.000 1693.267 0.000 state.py:408(reset)
14397760 220.419 0.000 485.896 0.000 _svm.py:179(convert2SVMNode)
11608194 161.626 0.000 341.870 0.000 mask.py:216(isValidInId)
25915986 160.144 0.000 1338.768 0.000 state.py:773(<lambda>)
1439776 159.463 0.000 1286.389 0.001 svm.py:125(_train)
2879552 157.217 0.000 418.094 0.000 base.py:1131(selectSamples)
1439776 149.204 0.000 532.763 0.000 _svm.py:206(__init__)
28795520 146.755 0.000 146.755 0.000 {mvpa.clfs.libsvmc._svmc.svm_node_array_set}
90912297/89472064 133.970 0.000 138.536 0.000 {len}
76353402 121.160 0.000 121.160 0.000 attributes.py:248(isEnabled)
1439780 120.549 0.000 2015.755 0.001 _svmbase.py:220(__repr__)
3599448 120.319 0.000 282.827 0.000 base.py:103(__init__)
1439776 108.342 0.000 211.624 0.000 _svm.py:91(__init__)
9974753 107.149 0.000 107.149 0.000 mask.py:221(getOutId)
11608194 104.920 0.000 121.404 0.000 support.py:306(isInVolume)
359944 99.851 0.000 9511.643 0.026 cvtranserror.py:117(_call)
28795582 95.811 0.000 193.214 0.000 attributes.py:94(reset)
1439776 92.656 0.000 880.621 0.001 svm.py:191(_predict)
54352246 90.825 0.000 90.825 0.000 verbosity.py:343(<lambda>)
14397771 83.953 0.000 240.313 0.000 state.py:329(__getitem__)
31675997 83.106 0.000 83.106 0.000 {range}
8278713 78.118 0.000 400.513 0.000 state.py:371(setvalue)
2879554 76.175 0.000 1504.840 0.001 state.py:756(_getEnabled)
14757770 74.728 0.000 498.424 0.000 state.py:1105(__setattr__)
4319352 73.530 0.000 158.611 0.000 function_base.py:900(unique)
1439776 72.381 0.000 72.381 0.000 {mvpa.clfs.libsvmc._svmc.svm_train}
1439776 72.335 0.000 592.177 0.000 splitters.py:283(splitDataset)
1439776 68.823 0.000 6321.527 0.004 transerror.py:1220(_precall)
11158265 67.908 0.000 257.093 0.000 attributes.py:231(_set)
39594431 66.210 0.000 66.210 0.000 verbosity.py:344(<lambda>)
42522592 66.051 0.000 66.051 0.000 {isinstance}
12957993 65.684 0.000 405.324 0.000 state.py:766(<lambda>)
12957993 65.623 0.000 406.477 0.000 state.py:768(<lambda>)
--------------------------------------------------------------------------------
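
For anyone who wants to produce a report in the same shape (ordered by internal time, restricted to the top 40 entries), a minimal sketch with the standard library's cProfile and pstats modules could look like the following. The function `run_analysis` is a hypothetical stand-in for the actual searchlight run, not part of PyMVPA, and the sketch is written for a current Python 3 rather than the interpreter used for the run above.

```python
import cProfile
import io
import pstats


def run_analysis():
    """Hypothetical stand-in for the real workload (e.g. a searchlight)."""
    total = 0
    for i in range(10000):
        total += len(str(i))
    return total


# Collect profiling data only around the call of interest.
profiler = cProfile.Profile()
profiler.enable()
result = run_analysis()
profiler.disable()

# Sort by time spent inside each function itself ("tottime", shown as
# "internal time" in the report header) and keep the 40 top entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("tottime").print_stats(40)
report = stream.getvalue()
print(report)
```

Sorting by "cumulative" instead would put the entry points (such as the cross-validation call) at the top, which is useful for a different question, namely which high-level call is responsible for the time.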
So it looks like the 'Collection' class in state.py is using up a lot of time,
not because of its implementation, but because of the sheer number of function
calls. One suggestion from Tiziano Zito was to refactor the Collection class
and maybe inherit from one of the builtin types, such as set. However, we are
unsure what the 'Collection' class is actually supposed to do, so any hints
regarding this would be greatly appreciated.
V-