[pymvpa] Memory Error

Nick Oosterhof n.n.oosterhof at googlemail.com
Mon Nov 24 17:10:58 UTC 2014


On 24 Nov 2014, at 16:35, Thomas Nickson <thomas.nickson at gmail.com> wrote:

> I'm using some structural data, 133 subjects totalling about 350 MB, on a machine with 32 GB of RAM. I'm trying to set up a basic algorithm to make sure that everything works okay. I have chosen ridge regression and I'm attempting to use a HalfPartitioner, running all of the code from an IPython notebook.
> 
>   from mvpa2.clfs.ridge import RidgeReg
>   a = HalfPartitioner(attr='runtype')
>   clf = RidgeReg()
>   cv = CrossValidation(clf, a)
>   cv_results = cv(ds) 
> 
> However, I get the following memory error, even when I use a very minimal subset such as 2 subjects.
> 
> ---------------------------------------------------------------------------
> MemoryError
>  […]
> /home/orkney_01/tnickson/Programming/pyVirtualEnv/lib/python2.6/site-packages/mvpa2/clfs/ridge.pyc in _train(self, data)
>      76             if self.__lm is None:
>      77                 # Not specified, so calculate based on .05*nfeatures
> ---> 78                 Lambda = .05*data.nfeatures*np.eye(data.nfeatures)
>      79             else:
>      80                 # use the provided penalty

NumPy tries to allocate a matrix here of size data.nfeatures x data.nfeatures.
So with 50k features (not unusual for an fMRI dataset) in double precision (8 bytes per value), one would need 20 GB for that matrix alone.
There could also be other places in the code that require a lot of memory. In addition, if you use structural data, you may well have even more features than the 50k in this example.
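A quick back-of-the-envelope check of that figure (plain Python arithmetic, not PyMVPA code; the 50k feature count is the illustrative number from above):

```python
# Memory needed for the nfeatures x nfeatures penalty matrix
# that ridge.py builds with np.eye(data.nfeatures):
nfeatures = 50000
bytes_per_value = 8  # float64 / double precision

matrix_bytes = nfeatures * nfeatures * bytes_per_value
print(matrix_bytes / 1e9)  # 20.0 -- i.e. 20 GB for that single matrix
```

The footprint grows quadratically with the number of features, which is why even a modest increase in voxel count can push an analysis past the available RAM.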

The bottom line: it’s not inconceivable to get a memory error in your analysis.

Depending on your research question, you could consider one of the following:
- use a different classifier
- use a searchlight analysis
- downsample the data (to reduce the number of features)
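As an aside on why a different classifier can sidestep the problem entirely: ridge regression has an equivalent "dual" formulation, w = Xᵀ(XXᵀ + λI)⁻¹y, which only ever materializes an nsamples x nsamples matrix. With 133 subjects and tens of thousands of features, that is a tiny matrix instead of a 20 GB one. A minimal NumPy sketch (illustrative only, not the PyMVPA RidgeReg implementation; the data sizes are made up):

```python
import numpy as np

def ridge_dual(X, y, lam):
    """Ridge weights via the dual form: w = X.T @ (X X.T + lam*I)^-1 @ y.
    Only an (nsamples x nsamples) matrix is ever built."""
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 500))   # 20 samples, 500 features (toy sizes)
y = rng.standard_normal(20)
lam = 0.05 * X.shape[1]              # same .05*nfeatures default as ridge.py

w_dual = ridge_dual(X, y, lam)

# Cross-check against the primal solution, (X.T X + lam*I)^-1 X.T y,
# which is only feasible to build at this toy size:
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(np.allclose(w_dual, w_primal))  # True
```

The two solutions agree because (XᵀX + λI)Xᵀ = Xᵀ(XXᵀ + λI); the dual form simply chooses the smaller of the two matrices to invert.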






More information about the Pkg-ExpPsy-PyMVPA mailing list