[pymvpa] GroupClusterThreshold memory usage

Nick Oosterhof n.n.oosterhof at googlemail.com
Thu Sep 3 16:28:16 UTC 2015


> On 03 Sep 2015, at 18:04, Bill Broderick <billbrod at gmail.com> wrote:
> 
> I'm trying to run group cluster thresholding using the defaults of
> GroupClusterThreshold (100000 bootstraps) and I'm running into memory
> issues. In the documentation, it looks like either n_proc (to split
> the load across several nodes on our cluster) or n_blocks would help,
> but it's not clear to me how to use these parameters. 

Peak memory usage is on the order of (n_bootstrap * n_features / n_blocks), where n_features is the number of features in the dataset.
For example, setting n_blocks=1000 reduces memory consumption by about a factor of 1,000 compared to n_blocks=1.
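
As a rough back-of-the-envelope check, that scaling can be sketched in plain Python. The function name and the bytes_per_value default are my own assumptions (the dtype actually used internally may differ), and the dataset size is made up, so treat the numbers as order-of-magnitude only:

```python
def estimate_peak_bytes(n_bootstrap, n_features, n_blocks=1, bytes_per_value=1):
    """Order-of-magnitude estimate of peak memory in bytes.

    Follows the n_bootstrap * n_features / n_blocks scaling; bytes_per_value
    is an assumed storage size per value, not taken from PyMVPA itself.
    """
    return n_bootstrap * n_features / float(n_blocks) * bytes_per_value


# Hypothetical dataset with 50,000 features and 100000 bootstraps
one_block = estimate_peak_bytes(100000, 50000, n_blocks=1)
many_blocks = estimate_peak_bytes(100000, 50000, n_blocks=1000)

# Ratio between the two estimates, i.e. the reduction factor
print(one_block / many_blocks)
```

Only the ratio between the two estimates is meaningful here; the absolute values depend entirely on the assumed bytes_per_value.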

I'm not sure how the Parallel module behaves, but using n_proc processes may actually multiply memory demands by a factor of n_proc. To keep memory consumption low, I would suggest starting with n_proc=1 and trying higher values for n_blocks.
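
If you have a fixed memory budget, one way to pick a starting value for n_blocks is sketched below. This is only an illustration: the helper name, the bytes_per_value default and the budget are hypothetical, and the multiplication by n_proc is the pessimistic guess from above, not something I have verified.

```python
import math


def min_n_blocks(n_bootstrap, n_features, budget_bytes, n_proc=1, bytes_per_value=1):
    """Smallest n_blocks whose estimated peak memory fits in budget_bytes.

    Assumes peak memory ~ n_bootstrap * n_features / n_blocks values, and
    (pessimistically, unverified) that n_proc processes multiply it by n_proc.
    """
    total = n_bootstrap * n_features * bytes_per_value * n_proc
    return max(1, int(math.ceil(total / float(budget_bytes))))


# Hypothetical: 50,000 features, 100000 bootstraps, roughly 1 GB budget
blocks_single = min_n_blocks(100000, 50000, budget_bytes=10**9)
blocks_four = min_n_blocks(100000, 50000, budget_bytes=10**9, n_proc=4)
print(blocks_single)
print(blocks_four)
```

The result is a lower bound to start from; in practice I would pass a somewhat larger n_blocks together with n_proc=1 and watch the actual memory use.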

