[pymvpa] optimal way of loading the whole-brain data

Yaroslav Halchenko debian at onerussian.com
Wed May 7 13:22:06 UTC 2014


On Wed, 07 May 2014, Dmitry Smirnov wrote:
>    Indeed, when I gzipped the images, it took a few minutes to load the whole
>    thing:

>    [DS_] DBG{0.000 sec}:      Duplicating samples shaped (350, 91, 109, 91)
>    [DS_] DBG{0.001 sec}:      Create new dataset instance for copy
>    [DS_] DBG{0.000 sec}:      Return dataset copy (ID: 150670160) of source
>    (ID: 59680656)
>    [DS_] DBG{4.083 sec}:   Selecting feature/samples of (350, 902629)
>    [DS_] DBG{2.064 sec}:   Selected feature/samples (350, 902629)
>    [DS_] DBG{0.423 sec}:  Selecting feature/samples of (350, 228483)
>    [DS_] DBG{0.000 sec}:  Selected feature/samples (350, 228483)
>    [DS_] DBG{59.208 sec}:      Duplicating samples shaped (350, 91, 109, 91)
>    [DS_] DBG{0.000 sec}:      Create new dataset instance for copy
>    [DS_] DBG{0.000 sec}:      Return dataset copy (ID: 121150032) of source
>    (ID: 59680656)
>    [DS_] DBG{4.123 sec}:   Selecting feature/samples of (350, 902629)
>    [DS_] DBG{1.985 sec}:   Selected feature/samples (350, 902629)
>    [DS_] DBG{0.309 sec}:  Selecting feature/samples of (350, 228483)
>    [DS_] DBG{0.000 sec}:  Selected feature/samples (350, 228483)
>    [DS_] DBG{57.703 sec}:      Duplicating samples shaped (350, 91, 109, 91)
>    [DS_] DBG{0.000 sec}:      Create new dataset instance for copy
>    [DS_] DBG{0.000 sec}:      Return dataset copy (ID: 150670160) of source
>    (ID: 121093840)
>    [DS_] DBG{4.384 sec}:   Selecting feature/samples of (350, 902629)
>    [DS_] DBG{2.056 sec}:   Selected feature/samples (350, 902629)
>    [DS_] DBG{0.293 sec}:  Selecting feature/samples of (350, 228483)
>    [DS_] DBG{0.000 sec}:  Selected feature/samples (350, 228483)
>    [DS_] DBG{57.575 sec}:      Duplicating samples shaped (350, 91, 109, 91)
>    [DS_] DBG{0.000 sec}:      Create new dataset instance for copy
>    [DS_] DBG{0.000 sec}:      Return dataset copy (ID: 150670160) of source
>    (ID: 121093840)
>    [DS_] DBG{4.094 sec}:   Selecting feature/samples of (350, 902629)
>    [DS_] DBG{2.273 sec}:   Selected feature/samples (350, 902629)
>    [DS_] DBG{0.384 sec}:  Selecting feature/samples of (350, 228483)
>    [DS_] DBG{0.000 sec}:  Selected feature/samples (350, 228483)
>    [DS_] DBG{62.976 sec}:      Duplicating samples shaped (350, 91, 109, 91)
>    [DS_] DBG{0.000 sec}:      Create new dataset instance for copy
>    [DS_] DBG{0.000 sec}:      Return dataset copy (ID: 121150032) of source
>    (ID: 59680656)
>    [DS_] DBG{4.122 sec}:   Selecting feature/samples of (350, 902629)
>    [DS_] DBG{2.143 sec}:   Selected feature/samples (350, 902629)
>    [DS_] DBG{0.353 sec}:  Selecting feature/samples of (350, 228483)
>    [DS_] DBG{0.000 sec}:  Selected feature/samples (350, 228483)

For the record, I filed
https://github.com/nipy/nibabel/issues/238
Could you please also share the output (cut/paste) of an mvpa2.wtf() call?
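(For reference, mvpa2.wtf() returns a string report of versions,
dependencies, and configuration, so a minimal sketch for grabbing it
would be:

    import mvpa2
    print(mvpa2.wtf())

and then paste the printed report into your reply.)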

Could you also time an analogous run directly on the uncompressed .nii?
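Something like this rough timing sketch would do (the file names here
are hypothetical placeholders for your data):

    import time
    from mvpa2.datasets.mri import fmri_dataset

    t0 = time.time()
    # load straight from the uncompressed NIfTI for comparison
    ds = fmri_dataset('bold.nii', mask='mask.nii')
    print('load took %.1f sec' % (time.time() - t0))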

>    A follow-up question: if nibabel by default memmaps the uncompressed
>    images, can you guide me on how I can work around that behavior?
>    I don't want to gzip all my data, and in my current context, I'd like
>    to be able to load the whole dataset into memory at once. Is there
>    some simple way to do it?

Well -- the simplest way would be to sacrifice those few hours of
loading once, as you already did, then dump the resulting datasets into
HDF5 files (h5save), and in your script load/use those instead of
reconstructing everything from the NIfTIs again... let me know if this
would not work for you -- then we could come up with other
alternatives ;-)
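A minimal sketch of that workflow (file names are hypothetical):

    from mvpa2.datasets.mri import fmri_dataset
    from mvpa2.base.hdf5 import h5save, h5load

    # one-time expensive conversion from NIfTI
    ds = fmri_dataset('bold.nii.gz', mask='mask.nii.gz')
    h5save('bold.hdf5', ds, compression=9)  # gzip-compress inside HDF5

    # subsequent runs: fast reload from the HDF5 copy instead
    ds = h5load('bold.hdf5')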

-- 
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Research Scientist,            Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        


