[pymvpa] optimal way of loading the whole-brain data
Yaroslav Halchenko
debian at onerussian.com
Wed May 7 13:22:06 UTC 2014
On Wed, 07 May 2014, Dmitry Smirnov wrote:
> Indeed, when I gzipped the images, it took a few minutes to load the whole
> thing:
> [DS_] DBG{0.000 sec}: Duplicating samples shaped (350, 91, 109, 91)
> [DS_] DBG{0.001 sec}: Create new dataset instance for copy
> [DS_] DBG{0.000 sec}: Return dataset copy (ID: 150670160) of source
> (ID: 59680656)
> [DS_] DBG{4.083 sec}: Selecting feature/samples of (350, 902629)
> [DS_] DBG{2.064 sec}: Selected feature/samples (350, 902629)
> [DS_] DBG{0.423 sec}: Selecting feature/samples of (350, 228483)
> [DS_] DBG{0.000 sec}: Selected feature/samples (350, 228483)
> [DS_] DBG{59.208 sec}: Duplicating samples shaped (350, 91, 109, 91)
> [DS_] DBG{0.000 sec}: Create new dataset instance for copy
> [DS_] DBG{0.000 sec}: Return dataset copy (ID: 121150032) of source
> (ID: 59680656)
> [DS_] DBG{4.123 sec}: Selecting feature/samples of (350, 902629)
> [DS_] DBG{1.985 sec}: Selected feature/samples (350, 902629)
> [DS_] DBG{0.309 sec}: Selecting feature/samples of (350, 228483)
> [DS_] DBG{0.000 sec}: Selected feature/samples (350, 228483)
> [DS_] DBG{57.703 sec}: Duplicating samples shaped (350, 91, 109, 91)
> [DS_] DBG{0.000 sec}: Create new dataset instance for copy
> [DS_] DBG{0.000 sec}: Return dataset copy (ID: 150670160) of source
> (ID: 121093840)
> [DS_] DBG{4.384 sec}: Selecting feature/samples of (350, 902629)
> [DS_] DBG{2.056 sec}: Selected feature/samples (350, 902629)
> [DS_] DBG{0.293 sec}: Selecting feature/samples of (350, 228483)
> [DS_] DBG{0.000 sec}: Selected feature/samples (350, 228483)
> [DS_] DBG{57.575 sec}: Duplicating samples shaped (350, 91, 109, 91)
> [DS_] DBG{0.000 sec}: Create new dataset instance for copy
> [DS_] DBG{0.000 sec}: Return dataset copy (ID: 150670160) of source
> (ID: 121093840)
> [DS_] DBG{4.094 sec}: Selecting feature/samples of (350, 902629)
> [DS_] DBG{2.273 sec}: Selected feature/samples (350, 902629)
> [DS_] DBG{0.384 sec}: Selecting feature/samples of (350, 228483)
> [DS_] DBG{0.000 sec}: Selected feature/samples (350, 228483)
> [DS_] DBG{62.976 sec}: Duplicating samples shaped (350, 91, 109, 91)
> [DS_] DBG{0.000 sec}: Create new dataset instance for copy
> [DS_] DBG{0.000 sec}: Return dataset copy (ID: 121150032) of source
> (ID: 59680656)
> [DS_] DBG{4.122 sec}: Selecting feature/samples of (350, 902629)
> [DS_] DBG{2.143 sec}: Selected feature/samples (350, 902629)
> [DS_] DBG{0.353 sec}: Selecting feature/samples of (350, 228483)
> [DS_] DBG{0.000 sec}: Selected feature/samples (350, 228483)
For the record, I filed
https://github.com/nipy/nibabel/issues/238
Could you please also share the output (cut/paste) of an mvpa2.wtf() call?
Could you also analogously time a run directly on the uncompressed .nii?
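E.g. something like this (just a minimal sketch -- the filenames here
are hypothetical placeholders):

    import time
    import mvpa2
    from mvpa2.datasets.mri import fmri_dataset

    # system/configuration report to paste into the reply
    print(mvpa2.wtf())

    # time loading straight from the uncompressed NIfTI
    t0 = time.time()
    ds = fmri_dataset('bold.nii', mask='mask.nii')
    print('loaded in %.1f sec' % (time.time() - t0))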
> A follow-up question -- if nibabel by default memmaps the uncompressed
> images, can you guide me on how I might get around that behavior?
> I don't want to gzip all my data, and in my current context, I'd like to be
> able to load the whole dataset into memory at once. Is there some simple
> way to do it?
Well -- the simplest way would be to sacrifice those few hours of
loading once, as you did, then dump the resulting datasets into HDF5
files (h5save), and in the script load/use those (h5load) instead of
reconstructing everything from the NIfTIs again... Let me know if this
would not work for you -- then we could come up with other
alternatives ;-)
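Something along these lines (again just a sketch -- the filenames and
the mask are hypothetical, and the compression keyword is optional):

    import os
    from mvpa2.datasets.mri import fmri_dataset
    from mvpa2.base.hdf5 import h5save, h5load

    cache_fn = 'bold_ds.hdf5'                 # hypothetical cache file
    if not os.path.exists(cache_fn):
        # slow one-time construction from the NIfTI
        ds = fmri_dataset('bold.nii', mask='mask.nii')
        h5save(cache_fn, ds, compression='gzip')
    else:
        ds = h5load(cache_fn)                 # fast on subsequent runs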
--
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Research Scientist, Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik