[pymvpa] optimal way of loading the whole-brain data

Dmitry Smirnov dmi.smirnov07 at gmail.com
Thu May 8 11:57:28 UTC 2014


Alright!
Below are the outputs. As for the solution, I would still try to find a way
that avoids making extra copies of the data.
This is not urgent for me, so if, for example, the suggested modification to
let the user load an uncompressed .nii into memory appears in the near
future, that would be enough for me :)
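In the meantime, the cache-to-HDF5 approach suggested below looks workable.
Here is a rough sketch of the cache-once/load-fast pattern using plain h5py
(PyMVPA's h5save()/h5load() do the equivalent for full Dataset objects;
cache_path and build_samples below are just illustrative names):

```python
import os
import tempfile

import h5py
import numpy as np

# Illustrative cache location; in practice this would live next to the data.
cache_path = os.path.join(tempfile.mkdtemp(), "fds_cache.hdf5")

def build_samples():
    # Stand-in for the expensive fmri_dataset(...) call.
    return np.random.RandomState(0).rand(10, 100)

# Pay the loading cost once, then persist to HDF5.
if not os.path.exists(cache_path):
    samples = build_samples()
    with h5py.File(cache_path, "w") as f:
        f.create_dataset("samples", data=samples)

# Subsequent runs load straight from HDF5 instead of re-parsing NIfTIs.
with h5py.File(cache_path, "r") as f:
    cached = f["samples"][()]

print(cached.shape)
```

With PyMVPA itself the pattern would be the same, only with
h5save(cache_path, fds) / fds = h5load(cache_path) in place of the h5py calls.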


Here is the timing for the uncompressed .nii; it again took an enormous
amount of time:

start = time.time()
fds = fmri_dataset(samples=['%s/run%i/epi.nii' % (dataroot, r + 1)
                            for r in range(runs)],
                   targets=targets,
                   chunks=selector,
                   mask='/MNI152_T1_2mm_brain_mask.nii')
end = time.time()
print end - start

[DS_] DBG{0.000 sec}:      Duplicating samples shaped (1750, 91, 109, 91)
[DS_] DBG{0.001 sec}:      Create new dataset instance for copy
[DS_] DBG{0.000 sec}:      Return dataset copy (ID: 103268432) of source
(ID: 45836304)
[DS_] DBG{5.597 sec}:   Selecting feature/samples of (1750, 902629)
[DS_] DBG{14.003 sec}:   Selected feature/samples (1750, 902629)
>>> 14120.6634622
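For what it's worth, the slowdown pattern is consistent with nibabel
memory-mapping uncompressed files: element access pages data in lazily from
disk instead of reading it once. Forcing an explicit in-memory copy before
further processing avoids that. A minimal pure-NumPy illustration of the
difference (the raw .dat file here is just a stand-in for an uncompressed
volume):

```python
import os
import tempfile

import numpy as np

# Stand-in for an uncompressed .nii: a small raw array on disk.
path = os.path.join(tempfile.mkdtemp(), "epi.dat")
data = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
data.tofile(path)

# Memory-mapped view: data stay on disk and are paged in lazily
# (roughly what nibabel does by default for uncompressed images).
mapped = np.memmap(path, dtype=np.float32, mode="r", shape=(2, 3, 4))

# An explicit copy pulls everything into RAM up front, so later
# element-wise access no longer touches the disk.
in_memory = np.array(mapped)

print(type(mapped).__name__)      # memmap
print(type(in_memory).__name__)   # ndarray
```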

Call to mvpa2.wtf():

Current date:   2014-05-08 14:51
PyMVPA:
 Version:       2.2.0
 Hash:          ad955620e460965ce83c652bc690bea4dc2e21eb
 Path:          /usr/lib/pymodules/python2.7/mvpa2/__init__.pyc
 Version control (GIT):
 GIT information could not be obtained due
"/usr/lib/pymodules/python2.7/mvpa2/.. is not under GIT"
SYSTEM:
 OS:            posix Linux 2.6.32-46-generic #107-Ubuntu SMP Fri Mar 22
20:15:42 UTC 2013
 Distribution:  Ubuntu/12.04/precise
EXTERNALS:
 Present:       atlas_fsl, cPickle, ctypes, good scipy.stats.rdist, good
scipy.stats.rv_continuous._reduce_func(floc,fscale), good
scipy.stats.rv_discrete.ppf, griddata, gzip, h5py, ipython, liblapack.so,
libsvm, libsvm verbosity control, lxml, matplotlib, mdp, mdp ge 2.4,
nibabel, nipy, nose, numpy, numpy_correct_unique, openopt, pylab, pylab
plottable, pywt, pywt wp reconstruct, reportlab, scipy, sg ge 0.6.4, sg ge
0.6.5, sg_fixedcachesize, shogun, shogun.krr, shogun.mpd, shogun.svmocas,
skl, statsmodels, weave
 Absent:        atlas_pymvpa, cran-energy, elasticnet, glmnet, hcluster,
lars, mass, nipy.neurospin, pprocess, pywt wp reconstruct fixed, rpy2,
running ipython env, shogun.lightsvm, shogun.svrlight
 Versions of critical externals:
  shogun:full : 1.1.0_02ce3cd_2011-12-12_08:17_
  shogun:rev  : 2941901
  reportlab   : 2.5
  shogun      : 1.1.0
  nibabel     : 1.3.0
  matplotlib  : 1.1.1rc
  scipy       : 0.9.0
  nipy        : 0.3.0
  ipython     : 0.12.1
  skl         : 0.14.1
  mdp         : 3.4
  numpy       : 1.6.1
  ctypes      : 1.1.0
  matplotlib  : 1.1.1rc
  lxml        : 2.3.2
  nifti       : failed to query due to "nifti is not a known dependency
key."
  numpy       : 1.6.1
  openopt     : 0.38
  openopt     : failed to query due to "No module named scikits.openopt"
  pywt        : 0.2.0
  shogun      : v1.1.0_02ce3cd_2011-12-12_08:17_
 Matplotlib backend: Qt4Agg
RUNTIME:
 PyMVPA Environment Variables:
  PYTHONPATH          :
":/usr/lib/python2.7/lib-old:/usr/local/lib/python2.7/dist-packages:/usr/lib/pymodules/python2.7/openopt/:/u/smirnod1/kernel:/home/smirnod1/.local/lib/python2.7/site-packages:/usr/lib/python2.7/lib-tk:/usr/lib/python2.7/lib-dynload:/usr/lib/python2.7/plat-linux2:/usr/lib/python2.7/dist-packages/gtk-2.0:/home/smirnod1/.python27_compiled:/usr/lib/python2.7/dist-packages/wx-2.8-gtk2-unicode:/usr/lib/python2.7/dist-packages:/usr/lib/pymodules/python2.7:/usr/lib/python2.7/dist-packages/gst-0.10:/usr/lib/python2.7:/usr/lib/python2.7/dist-packages/spyderlib/utils/external:/u/smirnod1:/usr/lib/python2.7/dist-packages/PIL:/usr/lib/pymodules/python2.7/openopt/kernel"
  PYTHONSTARTUP       :
"/usr/lib/python2.7/dist-packages/spyderlib/scientific_startup.py"
 PyMVPA Runtime Configuration:
  [general]
  verbose = 1

  [externals]
  have running ipython env = no
  have numpy = yes
  have scipy = yes
  have matplotlib = yes
  have h5py = yes
  have reportlab = yes
  have weave = yes
  have good scipy.stats.rdist = yes
  have good scipy.stats.rv_discrete.ppf = yes
  have good scipy.stats.rv_continuous._reduce_func(floc,fscale) = yes
  have pylab = yes
  have lars = no
  have elasticnet = no
  have glmnet = no
  have skl = yes
  have ctypes = yes
  have libsvm = yes
  have shogun = yes
  have sg ge 0.6.5 = yes
  have shogun.mpd = yes
  have shogun.lightsvm = no
  have shogun.svrlight = no
  have shogun.krr = yes
  have shogun.svmocas = yes
  have sg_fixedcachesize = yes
  have openopt = yes
  have nibabel = yes
  have mdp = yes
  have mdp ge 2.4 = yes
  have statsmodels = yes
  have pywt = yes
  have cpickle = yes
  have gzip = yes
  have cran-energy = no
  have griddata = yes
  have nipy.neurospin = no
  have lxml = yes
  have atlas_fsl = yes
  have atlas_pymvpa = no
  have hcluster = no
  have ipython = yes
  have liblapack.so = yes
  have libsvm verbosity control = yes
  have mass = no
  have nipy = yes
  have nose = yes
  have numpy_correct_unique = yes
  have pprocess = no
  have pylab plottable = yes
  have pywt wp reconstruct = yes
  have pywt wp reconstruct fixed = no
  have rpy2 = no
  have sg ge 0.6.4 = yes
 Process Information:
  Name:    /usr/bin/python
  State:    R (running)
  Tgid:    31307
  Pid:    31307
  PPid:    31273
  TracerPid:    0
  Uid:    1021772    1021772    1021772    1021772
  Gid:    310001    310001    310001    310001
  FDSize:    128
  Groups:    310001 310002 310020 310044
  VmPeak:    38029860 kB
  VmSize:     4351092 kB
  VmLck:           0 kB
  VmHWM:    29355884 kB
  VmRSS:     3333720 kB
  VmData:     3606424 kB
  VmStk:         272 kB
  VmExe:        2504 kB
  VmLib:      137680 kB
  VmPTE:        7916 kB
  Threads:    19
  SigQ:    0/1162705
  SigPnd:    0000000000000000
  ShdPnd:    0000000000000000
  SigBlk:    0000000000000000
  SigIgn:    0000000001001000
  SigCgt:    0000000180010002
  CapInh:    0000000000000000
  CapPrm:    0000000000000000
  CapEff:    0000000000000000
  CapBnd:    ffffffffffffffff
  Cpus_allowed:    ffffffff
  Cpus_allowed_list:    0-31
  Mems_allowed:    00000000,00000003
  Mems_allowed_list:    0-1
  voluntary_ctxt_switches:    3086144
  nonvoluntary_ctxt_switches:    53383



2014-05-07 16:22 GMT+03:00 Yaroslav Halchenko <debian at onerussian.com>:

>
> On Wed, 07 May 2014, Dmitry Smirnov wrote:
> >    Indeed, when I gzipped the images, it took a few minutes to load the
> whole
> >    thing:
>
> >    [DS_] DBG{0.000 sec}:      Duplicating samples shaped (350, 91, 109,
> 91)
> >    [DS_] DBG{0.001 sec}:      Create new dataset instance for copy
> >    [DS_] DBG{0.000 sec}:      Return dataset copy (ID: 150670160) of
> source
> >    (ID: 59680656)
> >    [DS_] DBG{4.083 sec}:   Selecting feature/samples of (350, 902629)
> >    [DS_] DBG{2.064 sec}:   Selected feature/samples (350, 902629)
> >    [DS_] DBG{0.423 sec}:  Selecting feature/samples of (350, 228483)
> >    [DS_] DBG{0.000 sec}:  Selected feature/samples (350, 228483)
> >    [DS_] DBG{59.208 sec}:      Duplicating samples shaped (350, 91, 109,
> 91)
> >    [DS_] DBG{0.000 sec}:      Create new dataset instance for copy
> >    [DS_] DBG{0.000 sec}:      Return dataset copy (ID: 121150032) of
> source
> >    (ID: 59680656)
> >    [DS_] DBG{4.123 sec}:   Selecting feature/samples of (350, 902629)
> >    [DS_] DBG{1.985 sec}:   Selected feature/samples (350, 902629)
> >    [DS_] DBG{0.309 sec}:  Selecting feature/samples of (350, 228483)
> >    [DS_] DBG{0.000 sec}:  Selected feature/samples (350, 228483)
> >    [DS_] DBG{57.703 sec}:      Duplicating samples shaped (350, 91, 109,
> 91)
> >    [DS_] DBG{0.000 sec}:      Create new dataset instance for copy
> >    [DS_] DBG{0.000 sec}:      Return dataset copy (ID: 150670160) of
> source
> >    (ID: 121093840)
> >    [DS_] DBG{4.384 sec}:   Selecting feature/samples of (350, 902629)
> >    [DS_] DBG{2.056 sec}:   Selected feature/samples (350, 902629)
> >    [DS_] DBG{0.293 sec}:  Selecting feature/samples of (350, 228483)
> >    [DS_] DBG{0.000 sec}:  Selected feature/samples (350, 228483)
> >    [DS_] DBG{57.575 sec}:      Duplicating samples shaped (350, 91, 109,
> 91)
> >    [DS_] DBG{0.000 sec}:      Create new dataset instance for copy
> >    [DS_] DBG{0.000 sec}:      Return dataset copy (ID: 150670160) of
> source
> >    (ID: 121093840)
> >    [DS_] DBG{4.094 sec}:   Selecting feature/samples of (350, 902629)
> >    [DS_] DBG{2.273 sec}:   Selected feature/samples (350, 902629)
> >    [DS_] DBG{0.384 sec}:  Selecting feature/samples of (350, 228483)
> >    [DS_] DBG{0.000 sec}:  Selected feature/samples (350, 228483)
> >    [DS_] DBG{62.976 sec}:      Duplicating samples shaped (350, 91, 109,
> 91)
> >    [DS_] DBG{0.000 sec}:      Create new dataset instance for copy
> >    [DS_] DBG{0.000 sec}:      Return dataset copy (ID: 121150032) of
> source
> >    (ID: 59680656)
> >    [DS_] DBG{4.122 sec}:   Selecting feature/samples of (350, 902629)
> >    [DS_] DBG{2.143 sec}:   Selected feature/samples (350, 902629)
> >    [DS_] DBG{0.353 sec}:  Selecting feature/samples of (350, 228483)
> >    [DS_] DBG{0.000 sec}:  Selected feature/samples (350, 228483)
>
> For the record filed
> https://github.com/nipy/nibabel/issues/238
> could you please also share output (cut/paste) of mvpa2.wtf() call?
>
> could you also time analogously  the run directly on uncompressed .nii?
>
> >    A follow-up question - if nibabel by default memmaps uncompressed
> >    images, can you guide me how to work around that behavior? I don't
> >    want to gzip all my data, and in my current context I'd like to be
> >    able to load the whole dataset into memory at once. Is there some
> >    simple way to do it?
>
> well -- the simplest way would be to sacrifice those few hours of
> loading as you did, and then dump those datasets into hdf5 files
> (h5save) and then in the script load/use those instead of reconstructing
> all the way again from nifti's.... let me know if this would not work
> for you -- then we could come up with other alternatives ;-)
>
> --
> Yaroslav O. Halchenko, Ph.D.
> http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
> Research Scientist,            Psychological and Brain Sciences Dept.
> Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
> Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
> WWW:   http://www.linkedin.com/in/yarik
>
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
>



-- 

Dmitry Smirnov (MSc.)
PhD Candidate, Brain & Mind Laboratory <http://becs.aalto.fi/bml/>
BECS, Aalto University School of Science
00076 AALTO, FINLAND
mobile: +358 50 3015072
email: dmitry.smirnov at aalto.fi