[pymvpa] optimal way of loading the whole-brain data
Dmitry Smirnov
dmi.smirnov07 at gmail.com
Thu May 8 11:57:28 UTC 2014
Alright!
Below are the outputs. Speaking of the solution, I would still prefer
some way that avoids making additional copies of the data.
This is not urgent for me, so if the suggested modification that lets
the user load an uncompressed .nii directly into memory appears in the
near future, that will be enough for me :)
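
In the meantime, the only workaround I can think of for getting a fully
in-memory array out of an uncompressed .nii with the nibabel version I
have is to copy the memmapped data explicitly; a rough sketch (the run
path is just a placeholder):

import numpy as np
import nibabel as nib

# with an uncompressed .nii, nibabel hands back a memmapped array
img = nib.load('%s/run1/epi.nii' % dataroot)
# np.array() (unlike np.asarray()) forces a real copy, so the data
# ends up fully in RAM instead of being paged in from disk on access
data = np.array(img.get_data())

If I am not mistaken, newer nibabel releases also accept
nib.load(fname, mmap=False) directly, but that is not available in the
1.3.0 shown below.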
Here is the timing for the uncompressed .nii, which again took an
enormous amount of time:
import time
from mvpa2.suite import fmri_dataset

# time the assembly of the whole-brain dataset across all runs
start = time.time()
fds = fmri_dataset(samples=['%s/run%i/epi.nii' % (dataroot, r + 1)
                            for r in range(runs)],
                   targets=targets,
                   chunks=selector,
                   mask='/MNI152_T1_2mm_brain_mask.nii')
end = time.time()
print end - start
[DS_] DBG{0.000 sec}: Duplicating samples shaped (1750, 91, 109, 91)
[DS_] DBG{0.001 sec}: Create new dataset instance for copy
[DS_] DBG{0.000 sec}: Return dataset copy (ID: 103268432) of source (ID: 45836304)
[DS_] DBG{5.597 sec}: Selecting feature/samples of (1750, 902629)
[DS_] DBG{14.003 sec}: Selected feature/samples (1750, 902629)
>>> 14120.6634622
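
Given that the gzipped images loaded in just a few minutes (see the log
quoted below), maybe the least painful stop-gap is to re-save
everything as .nii.gz once; a minimal sketch, reusing the paths from my
snippet above:

import nibabel as nib

for r in range(runs):
    src = '%s/run%i/epi.nii' % (dataroot, r + 1)
    # nibabel infers gzip compression from the .nii.gz extension
    nib.save(nib.load(src), src + '.gz')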
Call to mvpa2.wtf():
Current date: 2014-05-08 14:51
PyMVPA:
Version: 2.2.0
Hash: ad955620e460965ce83c652bc690bea4dc2e21eb
Path: /usr/lib/pymodules/python2.7/mvpa2/__init__.pyc
Version control (GIT):
GIT information could not be obtained due "/usr/lib/pymodules/python2.7/mvpa2/.. is not under GIT"
SYSTEM:
OS: posix Linux 2.6.32-46-generic #107-Ubuntu SMP Fri Mar 22 20:15:42 UTC 2013
Distribution: Ubuntu/12.04/precise
EXTERNALS:
Present: atlas_fsl, cPickle, ctypes, good scipy.stats.rdist, good
scipy.stats.rv_continuous._reduce_func(floc,fscale), good
scipy.stats.rv_discrete.ppf, griddata, gzip, h5py, ipython, liblapack.so,
libsvm, libsvm verbosity control, lxml, matplotlib, mdp, mdp ge 2.4,
nibabel, nipy, nose, numpy, numpy_correct_unique, openopt, pylab, pylab
plottable, pywt, pywt wp reconstruct, reportlab, scipy, sg ge 0.6.4, sg ge
0.6.5, sg_fixedcachesize, shogun, shogun.krr, shogun.mpd, shogun.svmocas,
skl, statsmodels, weave
Absent: atlas_pymvpa, cran-energy, elasticnet, glmnet, hcluster,
lars, mass, nipy.neurospin, pprocess, pywt wp reconstruct fixed, rpy2,
running ipython env, shogun.lightsvm, shogun.svrlight
Versions of critical externals:
shogun:full : 1.1.0_02ce3cd_2011-12-12_08:17_
shogun:rev : 2941901
reportlab : 2.5
shogun : 1.1.0
nibabel : 1.3.0
matplotlib : 1.1.1rc
scipy : 0.9.0
nipy : 0.3.0
ipython : 0.12.1
skl : 0.14.1
mdp : 3.4
numpy : 1.6.1
ctypes : 1.1.0
matplotlib : 1.1.1rc
lxml : 2.3.2
nifti : failed to query due to "nifti is not a known dependency key."
numpy : 1.6.1
openopt : 0.38
openopt : failed to query due to "No module named scikits.openopt"
pywt : 0.2.0
shogun : v1.1.0_02ce3cd_2011-12-12_08:17_
Matplotlib backend: Qt4Agg
RUNTIME:
PyMVPA Environment Variables:
PYTHONPATH :
":/usr/lib/python2.7/lib-old:/usr/local/lib/python2.7/dist-packages:/usr/lib/pymodules/python2.7/openopt/:/u/smirnod1/kernel:/home/smirnod1/.local/lib/python2.7/site-packages:/usr/lib/python2.7/lib-tk:/usr/lib/python2.7/lib-dynload:/usr/lib/python2.7/plat-linux2:/usr/lib/python2.7/dist-packages/gtk-2.0:/home/smirnod1/.python27_compiled:/usr/lib/python2.7/dist-packages/wx-2.8-gtk2-unicode:/usr/lib/python2.7/dist-packages:/usr/lib/pymodules/python2.7:/usr/lib/python2.7/dist-packages/gst-0.10:/usr/lib/python2.7:/usr/lib/python2.7/dist-packages/spyderlib/utils/external:/u/smirnod1:/usr/lib/python2.7/dist-packages/PIL:/usr/lib/pymodules/python2.7/openopt/kernel"
PYTHONSTARTUP :
"/usr/lib/python2.7/dist-packages/spyderlib/scientific_startup.py"
PyMVPA Runtime Configuration:
[general]
verbose = 1
[externals]
have running ipython env = no
have numpy = yes
have scipy = yes
have matplotlib = yes
have h5py = yes
have reportlab = yes
have weave = yes
have good scipy.stats.rdist = yes
have good scipy.stats.rv_discrete.ppf = yes
have good scipy.stats.rv_continuous._reduce_func(floc,fscale) = yes
have pylab = yes
have lars = no
have elasticnet = no
have glmnet = no
have skl = yes
have ctypes = yes
have libsvm = yes
have shogun = yes
have sg ge 0.6.5 = yes
have shogun.mpd = yes
have shogun.lightsvm = no
have shogun.svrlight = no
have shogun.krr = yes
have shogun.svmocas = yes
have sg_fixedcachesize = yes
have openopt = yes
have nibabel = yes
have mdp = yes
have mdp ge 2.4 = yes
have statsmodels = yes
have pywt = yes
have cpickle = yes
have gzip = yes
have cran-energy = no
have griddata = yes
have nipy.neurospin = no
have lxml = yes
have atlas_fsl = yes
have atlas_pymvpa = no
have hcluster = no
have ipython = yes
have liblapack.so = yes
have libsvm verbosity control = yes
have mass = no
have nipy = yes
have nose = yes
have numpy_correct_unique = yes
have pprocess = no
have pylab plottable = yes
have pywt wp reconstruct = yes
have pywt wp reconstruct fixed = no
have rpy2 = no
have sg ge 0.6.4 = yes
Process Information:
Name: /usr/bin/python
State: R (running)
Tgid: 31307
Pid: 31307
PPid: 31273
TracerPid: 0
Uid: 1021772 1021772 1021772 1021772
Gid: 310001 310001 310001 310001
FDSize: 128
Groups: 310001 310002 310020 310044
VmPeak: 38029860 kB
VmSize: 4351092 kB
VmLck: 0 kB
VmHWM: 29355884 kB
VmRSS: 3333720 kB
VmData: 3606424 kB
VmStk: 272 kB
VmExe: 2504 kB
VmLib: 137680 kB
VmPTE: 7916 kB
Threads: 19
SigQ: 0/1162705
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000001001000
SigCgt: 0000000180010002
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: ffffffff
Cpus_allowed_list: 0-31
Mems_allowed: 00000000,00000003
Mems_allowed_list: 0-1
voluntary_ctxt_switches: 3086144
nonvoluntary_ctxt_switches: 53383
2014-05-07 16:22 GMT+03:00 Yaroslav Halchenko <debian at onerussian.com>:
>
> On Wed, 07 May 2014, Dmitry Smirnov wrote:
> > Indeed, when I gzipped the images, it took a few minutes to load the
> > whole thing:
>
> > [DS_] DBG{0.000 sec}: Duplicating samples shaped (350, 91, 109, 91)
> > [DS_] DBG{0.001 sec}: Create new dataset instance for copy
> > [DS_] DBG{0.000 sec}: Return dataset copy (ID: 150670160) of source (ID: 59680656)
> > [DS_] DBG{4.083 sec}: Selecting feature/samples of (350, 902629)
> > [DS_] DBG{2.064 sec}: Selected feature/samples (350, 902629)
> > [DS_] DBG{0.423 sec}: Selecting feature/samples of (350, 228483)
> > [DS_] DBG{0.000 sec}: Selected feature/samples (350, 228483)
> > [DS_] DBG{59.208 sec}: Duplicating samples shaped (350, 91, 109, 91)
> > [DS_] DBG{0.000 sec}: Create new dataset instance for copy
> > [DS_] DBG{0.000 sec}: Return dataset copy (ID: 121150032) of source (ID: 59680656)
> > [DS_] DBG{4.123 sec}: Selecting feature/samples of (350, 902629)
> > [DS_] DBG{1.985 sec}: Selected feature/samples (350, 902629)
> > [DS_] DBG{0.309 sec}: Selecting feature/samples of (350, 228483)
> > [DS_] DBG{0.000 sec}: Selected feature/samples (350, 228483)
> > [DS_] DBG{57.703 sec}: Duplicating samples shaped (350, 91, 109, 91)
> > [DS_] DBG{0.000 sec}: Create new dataset instance for copy
> > [DS_] DBG{0.000 sec}: Return dataset copy (ID: 150670160) of source (ID: 121093840)
> > [DS_] DBG{4.384 sec}: Selecting feature/samples of (350, 902629)
> > [DS_] DBG{2.056 sec}: Selected feature/samples (350, 902629)
> > [DS_] DBG{0.293 sec}: Selecting feature/samples of (350, 228483)
> > [DS_] DBG{0.000 sec}: Selected feature/samples (350, 228483)
> > [DS_] DBG{57.575 sec}: Duplicating samples shaped (350, 91, 109, 91)
> > [DS_] DBG{0.000 sec}: Create new dataset instance for copy
> > [DS_] DBG{0.000 sec}: Return dataset copy (ID: 150670160) of source (ID: 121093840)
> > [DS_] DBG{4.094 sec}: Selecting feature/samples of (350, 902629)
> > [DS_] DBG{2.273 sec}: Selected feature/samples (350, 902629)
> > [DS_] DBG{0.384 sec}: Selecting feature/samples of (350, 228483)
> > [DS_] DBG{0.000 sec}: Selected feature/samples (350, 228483)
> > [DS_] DBG{62.976 sec}: Duplicating samples shaped (350, 91, 109, 91)
> > [DS_] DBG{0.000 sec}: Create new dataset instance for copy
> > [DS_] DBG{0.000 sec}: Return dataset copy (ID: 121150032) of source (ID: 59680656)
> > [DS_] DBG{4.122 sec}: Selecting feature/samples of (350, 902629)
> > [DS_] DBG{2.143 sec}: Selected feature/samples (350, 902629)
> > [DS_] DBG{0.353 sec}: Selecting feature/samples of (350, 228483)
> > [DS_] DBG{0.000 sec}: Selected feature/samples (350, 228483)
>
> For the record, I filed
> https://github.com/nipy/nibabel/issues/238
> Could you please also share the output (cut/paste) of an mvpa2.wtf() call?
>
> Could you also analogously time the run directly on an uncompressed .nii?
>
> > A follow-up question - if nibabel by default memmaps uncompressed
> > images, can you guide me on how I could work around that behavior?
> > I don't want to gzip all my data, and in my current context I'd like
> > to be able to load the whole dataset into memory at once. Is there
> > some simple way to do it?
>
> Well -- the simplest way would be to sacrifice those few hours of
> loading once, as you did, then dump the datasets into HDF5 files
> (h5save), and in your script load/use those instead of reconstructing
> everything from the NIfTIs again... let me know if this would not
> work for you -- then we could come up with other alternatives ;-)
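
This sounds workable; here is a rough sketch of the caching I have in
mind (the cache file name is just a placeholder):

import os
from mvpa2.suite import fmri_dataset, h5save, h5load

cache = 'fds_cache.hdf5'  # placeholder cache file name
if os.path.exists(cache):
    # fast path: restore the previously assembled dataset from HDF5
    fds = h5load(cache)
else:
    # slow path: pay the NIfTI loading cost once...
    fds = fmri_dataset(samples=['%s/run%i/epi.nii' % (dataroot, r + 1)
                                for r in range(runs)],
                       targets=targets,
                       chunks=selector,
                       mask='/MNI152_T1_2mm_brain_mask.nii')
    # ...and cache the result for all later runs of the script
    h5save(cache, fds)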
>
> --
> Yaroslav O. Halchenko, Ph.D.
> http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
> Research Scientist, Psychological and Brain Sciences Dept.
> Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
> Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
> WWW: http://www.linkedin.com/in/yarik
>
--
Dmitry Smirnov (MSc.)
PhD Candidate, Brain & Mind Laboratory <http://becs.aalto.fi/bml/>
BECS, Aalto University School of Science
00076 AALTO, FINLAND
mobile: +358 50 3015072
email: dmitry.smirnov at aalto.fi