[pymvpa] postproc=mean_sample(), is there a bug?

Mon Aug 17 14:49:11 UTC 2015

Dear all,
I have encountered a strange behaviour in running a Monte Carlo permutation 
analysis, though this may just be my lack of understanding.

Running the analysis without "postproc=mean_sample()" to collect p-values for 
each fold seems to produce sensible results:

-----------------
repeats=100
repeater = Repeater(count=repeats)
permutator = AttributePermutator('targets', limit={'partitions': 1}, count=1)
null_cv = CrossValidation(clf, ChainNode([partitioner, permutator], 
space=partitioner.get_space()))
distr_est = MCNullDist(repeater, tail='left', measure=null_cv, 
enable_ca=['dist_samples'])
cv_mc = CrossValidation(clf, partitioner, null_dist=distr_est, 
enable_ca=['confusion', 'stats'])
mcerr = cv_mc(fds)

In [72]: print cv_mc.ca.stats.stats['ACC']
0.537037037037

In [73]: p_mc = cv_mc.ca.null_prob

In [74]: print np.ravel(p_mc)
[ 0.66  0.34  0.59  0.42  0.28  0.14  0.3   0.06  0.64  0.03  0.34 0.26
   0.59  0.12  0.34  0.09  0.27  0.1 ]

In [75]: print np.mean(np.ravel(p_mc))
0.309444444444
-----------------

However, the same snippet, but with the addition of "postproc=mean_sample()", 
yields a quite strange p-value, if compared to the mean p-value across folds. 
What got me suspicious, is that this p-value always remains constant, 
independently of whether I repeat the analysis n-times (as if no random 
permutation was actually occurring), and independently of whether I use a 
different permutator (e.g. AttributePermutator('targets', count=1, limit='chunks')):

-----------------
repeats=100
repeater = Repeater(count=repeats)
permutator = AttributePermutator('targets', limit={'partitions': 1}, count=1)
null_cv = CrossValidation(clf, ChainNode([partitioner, permutator], 
space=partitioner.get_space()), postproc=mean_sample())
distr_est = MCNullDist(repeater, tail='left', measure=null_cv, 
enable_ca=['dist_samples'])
cv_mc = CrossValidation(clf, partitioner, postproc=mean_sample(), 
null_dist=distr_est, enable_ca=['confusion', 'stats'])
mcerr = cv_mc(fds)

In [50]: print cv_mc.ca.stats.stats['ACC']
0.537037037037

In [51]: p_mc = cv_mc.ca.null_prob

In [52]: print np.ravel(p_mc)
[ 0.00980392]

In [53]: print np.mean(np.ravel(p_mc))
0.00980392156863
-----------------

Can anybody reproduce this behaviour or explain what I am doing wrong?
Below you find some info on my (toy-)dataset and on my installation.

Thank you and all the best!
Marco

-- 
Marco Tettamanti, Ph.D.
Nuclear Medicine Department & Division of Neuroscience
San Raffaele Scientific Institute
Via Olgettina 58
I-20132 Milano, Italy
Phone ++39-02-26434888
Fax ++39-02-26434892
Email: tettamanti.marco at hsr.it
Skype: mtettamanti
http://scholar.google.it/citations?user=x4qQl4AAAAAJ

In [98]: print fds.summary()
Dataset: 108x100 at float32, <sa: chunks,targets,time_coords,time_indices>, <fa: 
voxel_indices>, <a: imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
stats: mean=-0.0109682 std=0.992155 var=0.984372 min=-3.62952 max=4.32006

Counts of targets in each chunk:
   chunks\targets Whatd Whend Whetd
                   ---   ---   ---
        0.0         2     2     2
        1.0         2     2     2
        2.0         2     2     2
        3.0         2     2     2
        4.0         2     2     2
        5.0         2     2     2
        6.0         2     2     2
        7.0         2     2     2
        8.0         2     2     2
        9.0         2     2     2
       10.0         2     2     2
       11.0         2     2     2
       12.0         2     2     2
       13.0         2     2     2
       14.0         2     2     2
       15.0         2     2     2
       16.0         2     2     2
       17.0         2     2     2

Summary for targets across chunks
   targets mean std min max #chunks
   Whatd     2   0   2   2     18
   Whend     2   0   2   2     18
   Whetd     2   0   2   2     18

Summary for chunks across targets
   chunks mean std min max #targets
     0      2   0   2   2      3
     1      2   0   2   2      3
     2      2   0   2   2      3
     3      2   0   2   2      3
     4      2   0   2   2      3
     5      2   0   2   2      3
     6      2   0   2   2      3
     7      2   0   2   2      3
     8      2   0   2   2      3
     9      2   0   2   2      3
    10      2   0   2   2      3
    11      2   0   2   2      3
    12      2   0   2   2      3
    13      2   0   2   2      3
    14      2   0   2   2      3
    15      2   0   2   2      3
    16      2   0   2   2      3
    17      2   0   2   2      3
Sequence statistics for 108 entries from set ['Whatd', 'Whend', 'Whetd']
Counter-balance table for orders up to 2:
Targets/Order O1        |  O2        |
     Whatd:    35  1  0  |  34  2  0  |
     Whend:     0 35  1  |   0 34  2  |
     Whetd:     0  0 35  |   0  0 34  |
Correlations: min=-0.5 max=0.96 mean=-0.0093 sum(abs)=47

In [99]: mvpa2.wtf()
Out[99]:
Current date:   2015-08-17 16:32
PyMVPA:
  Version:       2.3.1
  Hash:          d1da5a749dc9cc606bd7f425d93d25464bf43454
  Path:          /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
  Version control (GIT):
  GIT information could not be obtained due 
"/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
SYSTEM:
  OS:            posix Linux 4.1.0-1-amd64 #1 SMP Debian 4.1.3-1 (2015-08-03)
  Distribution:  debian/stretch/sid
EXTERNALS:
  Present:       atlas_fsl, cPickle, ctypes, good 
scipy.stats.rv_continuous._reduce_func(floc,fscale), good 
scipy.stats.rv_discrete.ppf, griddata, gzip, h5py, hdf5, ipython, joblib, 
liblapack.so, libsvm, libsvm verbosity control, lxml, matplotlib, mdp, mdp ge 
2.4, mock, nibabel, nipy, nose, numpy, numpy_correct_unique, pprocess, pylab, 
pylab plottable, pywt, pywt wp reconstruct, reportlab, running ipython env, 
scipy, sg ge 0.6.4, sg ge 0.6.5, sg_fixedcachesize, shogun, shogun.mpd, 
shogun.svmocas, skl, statsmodels, weave
  Absent:        atlas_pymvpa, cran-energy, elasticnet, glmnet, good 
scipy.stats.rdist, hcluster, lars, mass, nipy.neurospin, numpydoc, openopt, pywt 
wp reconstruct fixed, rpy2, shogun.krr, shogun.lightsvm, shogun.svrlight
  Versions of critical externals:
   ctypes      : 1.1.0
   h5py        : 2.5.0
   hdf5        : 1.8.13
   ipython     : 2.3.0
   lxml        : 3.4.4
   matplotlib  : 1.4.2
   mdp         : 3.4
   mock        : 1.3.0
   nibabel     : 2.0.1
   nipy        : 0.4.0.dev
   numpy       : 1.8.2
   pprocess    : 0.5
   reportlab   : 3.2.0
   scipy       : 0.14.1
   shogun      : 3.2.0
   shogun:full : 3.2.0_2014-2-17_18:46
   shogun:rev  : 197120
   skl         : 0.16.1
  Matplotlib backend: TkAgg
RUNTIME:
  PyMVPA Environment Variables:
   PYTHONPATH          : 
":/usr/lib/python2.7/lib-old:/usr/local/lib/python2.7/dist-packages:/usr/lib/python2.7/dist-packages/wx-3.0-gtk2:/usr/lib/python2.7/plat-x86_64-linux-gnu:/usr/lib/python2.7/lib-tk:/usr/lib/python2.7/lib-dynload:/usr/bin:.:/home/marco/.ipython:/usr/lib/python2.7/dist-packages:/usr/lib/pymodules/python2.7:/usr/lib/python2.7/dist-packages/IPython/extensions:/usr/lib/python2.7:/usr/lib/python2.7/dist-packages/PILcompat:/home/marco/.cache/scipy/python27_compiled:/usr/lib/python2.7/dist-packages/gtk-2.0:/home/marco/data/bicocca/MVPA_IntentionToMove/mvpa_itm"
  PyMVPA Runtime Configuration:
   [general]
   verbose = 1

   [externals]
   have running ipython env = yes
   have ipython = yes
   have numpy = yes
   have scipy = yes
   have matplotlib = yes
   have h5py = yes
   have reportlab = yes
   have weave = yes
   have good scipy.stats.rdist = no
   have good scipy.stats.rv_discrete.ppf = yes
   have good scipy.stats.rv_continuous._reduce_func(floc,fscale) = yes
   have pylab = yes
   have lars = no
   have elasticnet = no
   have glmnet = no
   have skl = yes
   have ctypes = yes
   have libsvm = yes
   have shogun = yes
   have sg ge 0.6.5 = yes
   have shogun.mpd = yes
   have shogun.lightsvm = no
   have shogun.svrlight = no
   have shogun.krr = no
   have shogun.svmocas = yes
   have sg_fixedcachesize = yes
   have openopt = no
   have nibabel = yes
   have mdp = yes
   have mdp ge 2.4 = yes
   have nipy = yes
   have statsmodels = yes
   have pywt = yes
   have cpickle = yes
   have gzip = yes
   have cran-energy = no
   have griddata = yes
   have nipy.neurospin = no
   have lxml = yes
   have atlas_fsl = yes
   have atlas_pymvpa = no
   have hcluster = no
   have hdf5 = yes
   have joblib = yes
   have liblapack.so = yes
   have libsvm verbosity control = yes
   have mass = no
   have mock = yes
   have nose = yes
   have numpy_correct_unique = yes
   have numpydoc = no
   have pprocess = yes
   have pylab plottable = yes
   have pywt wp reconstruct = yes
   have pywt wp reconstruct fixed = no
   have rpy2 = no
   have sg ge 0.6.4 = yes
  Process Information:
   Name: ipython
   State:        R (running)
   Tgid: 8970
   Ngid: 0
   Pid:  8970
   PPid: 2053
   TracerPid:    0
   Uid:  1000    1000    1000    1000
   Gid:  1000    1000    1000    1000
   FDSize:       256
   Groups:       6 7 20 24 25 27 29 30 44 46 100 104 113 114 116 121 124 132 139 
999 1000
   NStgid:       8970
   NSpid:        8970
   NSpgid:       8970
   NSsid:        2053
   VmPeak:        1748880 kB
   VmSize:        1099484 kB
   VmLck:               0 kB
   VmPin:               0 kB
   VmHWM:          883520 kB
   VmRSS:          225784 kB
   VmData:         376948 kB
   VmStk:             136 kB
   VmExe:            3220 kB
   VmLib:          119892 kB
   VmPTE:            1732 kB
   VmPMD:              16 kB
   VmSwap:              0 kB
   Threads:      11
   SigQ: 0/126979
   SigPnd:       0000000000000000
   ShdPnd:       0000000000000000
   SigBlk:       0000000000000000
   SigIgn:       0000000001001000
   SigCgt:       0000000180000002
   CapInh:       0000000000000000
   CapPrm:       0000000000000000
   CapEff:       0000000000000000
   CapBnd:       0000003fffffffff
   Seccomp:      0
   Cpus_allowed: ff
   Cpus_allowed_list:    0-7
   Mems_allowed: 00000000,00000001
   Mems_allowed_list:    0
   voluntary_ctxt_switches:      90582
   nonvoluntary_ctxt_switches:   5294