[pymvpa] Optimization Results

Yaroslav Halchenko debian at onerussian.com
Mon May 11 14:28:21 UTC 2009


Thank you Valentin,

Indeed, as Michael has pointed out, in our craving for being generic we
sacrificed quite a bit of performance in places... Although it is not
that important at this stage, would you mind completing your
performance table with

8. 1 + python -O switch
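
For reference, what the -O switch changes: it sets __debug__ to False and
strips assert statements and "if __debug__:" blocks at compile time, so
guarded diagnostics cost nothing. A minimal sketch, where debug_log and
train_step are made-up names for illustration and not PyMVPA functions:

```python
import sys

calls = []


def debug_log(msg):
    # Stand-in for an expensive diagnostic call (hypothetical helper).
    calls.append(msg)


def train_step(x):
    # Under "python -O" this whole block is removed at compile time,
    # so neither the call nor the string formatting is executed.
    if __debug__:
        debug_log("train_step called with %r" % (x,))
    return x * 2


result = train_step(21)
print("optimize flag:", sys.flags.optimize, "debug calls:", len(calls))
```

Running the same file with and without -O shows whether the guarded call
was executed.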

Searchlight is somewhat of a beast since it runs the same processing
pipeline over and over again (like crossvalidationtransfererror ;)), and
that causes a lot of what one could consider redundant work. As you
noticed from profiling, a lot of the evilness of the code in state.py
starts to come out. Indeed, it would be great to have it refactored, but
it might be quite a b..ch -- ClassWithCollections (it used to be
Statefull, hence the file name state.py) provides a somewhat generic
way to handle

1. .states -- optionally enabled computing/storage
2. .params -- parameters of the classifiers/whatever
3. .kernel_params -- parameters of kernel used by a classifier

The AttributesCollector metaclass takes care of picking up all the
StateVariable, Parameter, and KernelParameter static definitions from
the class and packing them into the corresponding collection per
instance, adjusting the docstring along the way. The collections also
have some nice ways to list members with descriptions (.listing, I
believe) and provide additional functionality (like checking whether
any parameter of a classifier was changed... although now I am not sure
whether I should have just done a simple comparison within a classifier
against the previous set of parameters). And the major slowdown actually
comes from the fact that we tried to make a state variable 'blah'
available not only as

instance.states.blah or instance.params.blah

but also as

instance.blah

to make the user more comfortable. Hence all that mess in the
__get.. __set.. methods.
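
To make the idea concrete, here is a heavily simplified sketch of the
pattern, not the actual state.py code. It assumes Python 3 syntax, only
handles .states, and uses a hypothetical Collector metaclass as a stand-in
for AttributesCollector. It also relies on the __getattr__ fallback, which
is only called when normal lookup fails, whereas the profile quoted below
shows __getattribute__, which intercepts every attribute access.

```python
class StateVariable:
    """Class-level declaration of an optionally stored result (simplified)."""
    def __init__(self, doc="", enabled=True):
        self.doc = doc
        self.enabled = enabled


class Collector(type):
    """Hypothetical metaclass: move StateVariable declarations out of the
    class body and into a per-class dict, cls._state_defs."""
    def __new__(mcs, name, bases, namespace):
        defs = {}
        for base in bases:
            defs.update(getattr(base, "_state_defs", {}))
        for key, value in list(namespace.items()):
            if isinstance(value, StateVariable):
                defs[key] = namespace.pop(key)
        namespace["_state_defs"] = defs
        return super().__new__(mcs, name, bases, namespace)


class Collection:
    """Per-instance storage for the declared state variables."""
    def __init__(self, defs):
        # Bypass our own __setattr__ for the two bookkeeping dicts.
        object.__setattr__(self, "_enabled",
                           {k: d.enabled for k, d in defs.items()})
        object.__setattr__(self, "_values", {})

    def __contains__(self, name):
        return name in self._enabled

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails.
        enabled = self.__dict__.get("_enabled", {})
        values = self.__dict__.get("_values", {})
        if enabled.get(name) and name in values:
            return values[name]
        raise AttributeError("%r is unknown, disabled or not set" % name)

    def __setattr__(self, name, value):
        if name not in self._enabled:
            raise AttributeError(name)
        # Values for disabled states are silently dropped.
        if self._enabled[name]:
            self._values[name] = value


class ClassWithCollections(metaclass=Collector):
    def __init__(self):
        object.__setattr__(self, "states", Collection(self._state_defs))

    def __getattr__(self, name):
        # Forward instance.blah to instance.states.blah.
        states = self.__dict__.get("states")
        if states is not None and name in states:
            return getattr(states, name)
        raise AttributeError(name)

    def __setattr__(self, name, value):
        states = self.__dict__.get("states")
        if states is not None and name in states:
            setattr(states, name, value)
        else:
            object.__setattr__(self, name, value)


class Classifier(ClassWithCollections):
    predictions = StateVariable(doc="last predictions", enabled=True)
    training_time = StateVariable(doc="seconds spent training", enabled=False)


clf = Classifier()
clf.predictions = [1, 0, 1]   # forwarded to clf.states
clf.training_time = 0.5       # dropped, since this state is disabled
print(clf.states.predictions, clf.predictions)
```

Even in this reduced form, every read or write of a state goes through two
or three extra Python-level calls, which is harmless per call but adds up
when it happens hundreds of millions of times.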

There might be a cleaner solution somehow... But once again -- all those
.states issues come back at us primarily in searchlight, since there we
have small problems to train/generalize (just a few voxels in a sphere)
and lots of the overhead comes from all that house-keeping like states,
which otherwise does not add much overhead.
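
A rough back-of-the-envelope check of that claim, using only the tottime
figures from the profile quoted below. It sums the state.py entries that
made it into the top-40 listing and compares them with the time spent in
the actual libsvm training call:

```python
# Figures copied from the profile quoted below (tottime column, seconds).
TOTAL_CPU_SECONDS = 10262.765

state_py_tottime = [
    1295.902,  # state.py:306(__getattribute__)
    621.773,   # state.py:257(_checkIndex)
    594.444,   # state.py:682(isEnabled)
    403.370,   # state.py:377(_action)
    313.182,   # state.py:1084(__getattribute__)
    273.423,   # state.py:408(reset)
    160.144,   # state.py:773(<lambda>)
    83.953,    # state.py:329(__getitem__)
    78.118,    # state.py:371(setvalue)
    76.175,    # state.py:756(_getEnabled)
    74.728,    # state.py:1105(__setattr__)
    65.684,    # state.py:766(<lambda>)
    65.623,    # state.py:768(<lambda>)
]
svm_train_tottime = 72.381  # {mvpa.clfs.libsvmc._svmc.svm_train}

state_share = sum(state_py_tottime) / TOTAL_CPU_SECONDS
train_share = svm_train_tottime / TOTAL_CPU_SECONDS

print("state.py (listed entries): %.1f%% of CPU time" % (100 * state_share))
print("svm_train itself:          %.1f%% of CPU time" % (100 * train_share))
```

With those numbers the listed state.py entries come to roughly 40% of the
total CPU time, against well under 1% for svm_train. Only the 40 most
expensive functions are listed, so the state.py figure is a lower bound.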

I guess we should start cherry-picking or just merging your changes ;)

Thanks once again!

On Mon, 11 May 2009, Valentin Haenel wrote:

> Hi,

> when I started my lab rotation here in Berlin 2 months ago I was given the task
> of evaluating PyMVPA as a possible alternative to the Matlab+SPM combo. To this
> end I replicated part of the following study:

> http://www.ncbi.nlm.nih.gov/pubmed/19111624

> In particular I took the FIR betas and ran a searchlight on these. I soon
> realized that PyMVPA was quite a bit slower than Matlab, and so I sat down to
> improve things (hence all the optimization commits in the val/* branches on
> alioth).

> All the comparisons were done on the same machine:

> 2 x DualCore AMD Opteron 2220 (4 cores)
> 16246 MB RAM

> I ran a searchlight for all 8 FIR timebins for a single subject. And the
> resulting accuracy maps are almost the same (99.99%)

> Matlab+SPM = 22 min

> Here is a list of what I managed to squeeze out:

> PyMVPA:

> 1. No improvements                                 2h 42 min
> 2. _svm.py - remove for loops in python code       2h 31 min
> 3. Searchlight index cache (multiple datasets)     2h  8 min
> 4. 2 and 3 combined                                1h 55 min

> 5. 4 + LIBSVM Wrapper optimization                 1h  2 min
> 6. 5 + comment out deepcopy in base.py                57 min
> 7. 6 + python -O switch                               53 min


> My current profiling round came up with the following:

> --------------------------------------------------------------------------------

> Sat May  9 13:11:32 2009    runprof6

>          2342453314 function calls (2308610267 primitive calls) in 10262.765 CPU seconds

>    Ordered by: internal time
>    List reduced from 3531 to 40 due to restriction <40>

>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
> 553055208/534337859 1295.902    0.000 1413.098    0.000 state.py:306(__getattribute__)
> 120986219  621.773    0.000 1048.780    0.000 state.py:257(_checkIndex)
>  65194885  594.444    0.000 1539.628    0.000 state.py:682(isEnabled)
> 30235296/27355744  403.370    0.000 1132.826    0.000 state.py:377(_action)
> 107353443/96915067  313.182    0.000  502.194    0.000 state.py:1084(__getattribute__)
> 204855562  289.925    0.000  289.925    0.000 {method 'has_key' of 'dict' objects}
>  39594431  282.294    0.000  471.186    0.000 verbosity.py:505(__call__)
>   5759104  273.423    0.000 1693.267    0.000 state.py:408(reset)
>  14397760  220.419    0.000  485.896    0.000 _svm.py:179(convert2SVMNode)
>  11608194  161.626    0.000  341.870    0.000 mask.py:216(isValidInId)
>  25915986  160.144    0.000 1338.768    0.000 state.py:773(<lambda>)
>   1439776  159.463    0.000 1286.389    0.001 svm.py:125(_train)
>   2879552  157.217    0.000  418.094    0.000 base.py:1131(selectSamples)
>   1439776  149.204    0.000  532.763    0.000 _svm.py:206(__init__)
>  28795520  146.755    0.000  146.755    0.000 {mvpa.clfs.libsvmc._svmc.svm_node_array_set}
> 90912297/89472064  133.970    0.000  138.536    0.000 {len}
>  76353402  121.160    0.000  121.160    0.000 attributes.py:248(isEnabled)
>   1439780  120.549    0.000 2015.755    0.001 _svmbase.py:220(__repr__)
>   3599448  120.319    0.000  282.827    0.000 base.py:103(__init__)
>   1439776  108.342    0.000  211.624    0.000 _svm.py:91(__init__)
>   9974753  107.149    0.000  107.149    0.000 mask.py:221(getOutId)
>  11608194  104.920    0.000  121.404    0.000 support.py:306(isInVolume)
>    359944   99.851    0.000 9511.643    0.026 cvtranserror.py:117(_call)
>  28795582   95.811    0.000  193.214    0.000 attributes.py:94(reset)
>   1439776   92.656    0.000  880.621    0.001 svm.py:191(_predict)
>  54352246   90.825    0.000   90.825    0.000 verbosity.py:343(<lambda>)
>  14397771   83.953    0.000  240.313    0.000 state.py:329(__getitem__)
>  31675997   83.106    0.000   83.106    0.000 {range}
>   8278713   78.118    0.000  400.513    0.000 state.py:371(setvalue)
>   2879554   76.175    0.000 1504.840    0.001 state.py:756(_getEnabled)
>  14757770   74.728    0.000  498.424    0.000 state.py:1105(__setattr__)
>   4319352   73.530    0.000  158.611    0.000 function_base.py:900(unique)
>   1439776   72.381    0.000   72.381    0.000 {mvpa.clfs.libsvmc._svmc.svm_train}
>   1439776   72.335    0.000  592.177    0.000 splitters.py:283(splitDataset)
>   1439776   68.823    0.000 6321.527    0.004 transerror.py:1220(_precall)
>  11158265   67.908    0.000  257.093    0.000 attributes.py:231(_set)
>  39594431   66.210    0.000   66.210    0.000 verbosity.py:344(<lambda>)
>  42522592   66.051    0.000   66.051    0.000 {isinstance}
>  12957993   65.684    0.000  405.324    0.000 state.py:766(<lambda>)
>  12957993   65.623    0.000  406.477    0.000 state.py:768(<lambda>)

> --------------------------------------------------------------------------------

> So it looks like the 'Collection' class in state.py is using up a lot of time,
> not because of the implementation, but because of the number of function calls.
> One of the suggestions Tiziano Zito had was to refactor the collections class
> and maybe inherit from one of the builtin types such as Set. However, we are
> unsure as to what the 'Collection' class is actually supposed to do, so any
> hints regarding this would be greatly appreciated.


> V-



> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa


-- 
                                  .-.
=------------------------------   /v\  ----------------------------=
Keep in touch                    // \\     (yoh@|www.)onerussian.com
Yaroslav Halchenko              /(   )\               ICQ#: 60653192
                   Linux User    ^^-^^    [175555]




