[pymvpa] postproc=mean_sample(), is there a bug?
marco tettamanti
mrctttmnt at gmail.com
Mon Aug 17 14:49:11 UTC 2015
Dear all,
I have encountered a strange behaviour in running a Monte Carlo permutation
analysis, though this may just be my lack of understanding.
Running the analysis without "postproc=mean_sample()" to collect p-values for
each fold seems to produce sensible results:
-----------------
repeats=100
repeater = Repeater(count=repeats)
permutator = AttributePermutator('targets', limit={'partitions': 1}, count=1)
null_cv = CrossValidation(clf, ChainNode([partitioner, permutator],
space=partitioner.get_space()))
distr_est = MCNullDist(repeater, tail='left', measure=null_cv,
enable_ca=['dist_samples'])
cv_mc = CrossValidation(clf, partitioner, null_dist=distr_est,
enable_ca=['confusion', 'stats'])
mcerr = cv_mc(fds)
In [72]: print cv_mc.ca.stats.stats['ACC']
0.537037037037
In [73]: p_mc = cv_mc.ca.null_prob
In [74]: print np.ravel(p_mc)
[ 0.66 0.34 0.59 0.42 0.28 0.14 0.3 0.06 0.64 0.03 0.34 0.26
0.59 0.12 0.34 0.09 0.27 0.1 ]
In [75]: print np.mean(np.ravel(p_mc))
0.309444444444
-----------------
However, the same snippet, but with the addition of "postproc=mean_sample()",
yields a quite strange p-value, if compared to the mean p-value across folds.
What got me suspicious, is that this p-value always remains constant,
independently of whether I repeat the analysis n-times (as if no random
permutation was actually occurring), and independently of whether I use a
different permutator (e.g. AttributePermutator('targets', count=1, limit='chunks')):
-----------------
repeats=100
repeater = Repeater(count=repeats)
permutator = AttributePermutator('targets', limit={'partitions': 1}, count=1)
null_cv = CrossValidation(clf, ChainNode([partitioner, permutator],
space=partitioner.get_space()), postproc=mean_sample())
distr_est = MCNullDist(repeater, tail='left', measure=null_cv,
enable_ca=['dist_samples'])
cv_mc = CrossValidation(clf, partitioner, postproc=mean_sample(),
null_dist=distr_est, enable_ca=['confusion', 'stats'])
mcerr = cv_mc(fds)
In [50]: print cv_mc.ca.stats.stats['ACC']
0.537037037037
In [51]: p_mc = cv_mc.ca.null_prob
In [52]: print np.ravel(p_mc)
[ 0.00980392]
In [53]: print np.mean(np.ravel(p_mc))
0.00980392156863
-----------------
Can anybody reproduce this behaviour or explain what I am doing wrong?
Below you find some info on my (toy-)dataset and on my installation.
Thank you and all the best!
Marco
--
Marco Tettamanti, Ph.D.
Nuclear Medicine Department & Division of Neuroscience
San Raffaele Scientific Institute
Via Olgettina 58
I-20132 Milano, Italy
Phone ++39-02-26434888
Fax ++39-02-26434892
Email: tettamanti.marco at hsr.it
Skype: mtettamanti
http://scholar.google.it/citations?user=x4qQl4AAAAAJ
In [98]: print fds.summary()
Dataset: 108x100 at float32, <sa: chunks,targets,time_coords,time_indices>, <fa:
voxel_indices>, <a: imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
stats: mean=-0.0109682 std=0.992155 var=0.984372 min=-3.62952 max=4.32006
Counts of targets in each chunk:
chunks\targets Whatd Whend Whetd
--- --- ---
0.0 2 2 2
1.0 2 2 2
2.0 2 2 2
3.0 2 2 2
4.0 2 2 2
5.0 2 2 2
6.0 2 2 2
7.0 2 2 2
8.0 2 2 2
9.0 2 2 2
10.0 2 2 2
11.0 2 2 2
12.0 2 2 2
13.0 2 2 2
14.0 2 2 2
15.0 2 2 2
16.0 2 2 2
17.0 2 2 2
Summary for targets across chunks
targets mean std min max #chunks
Whatd 2 0 2 2 18
Whend 2 0 2 2 18
Whetd 2 0 2 2 18
Summary for chunks across targets
chunks mean std min max #targets
0 2 0 2 2 3
1 2 0 2 2 3
2 2 0 2 2 3
3 2 0 2 2 3
4 2 0 2 2 3
5 2 0 2 2 3
6 2 0 2 2 3
7 2 0 2 2 3
8 2 0 2 2 3
9 2 0 2 2 3
10 2 0 2 2 3
11 2 0 2 2 3
12 2 0 2 2 3
13 2 0 2 2 3
14 2 0 2 2 3
15 2 0 2 2 3
16 2 0 2 2 3
17 2 0 2 2 3
Sequence statistics for 108 entries from set ['Whatd', 'Whend', 'Whetd']
Counter-balance table for orders up to 2:
Targets/Order O1 | O2 |
Whatd: 35 1 0 | 34 2 0 |
Whend: 0 35 1 | 0 34 2 |
Whetd: 0 0 35 | 0 0 34 |
Correlations: min=-0.5 max=0.96 mean=-0.0093 sum(abs)=47
In [99]: mvpa2.wtf()
Out[99]:
Current date: 2015-08-17 16:32
PyMVPA:
Version: 2.3.1
Hash: d1da5a749dc9cc606bd7f425d93d25464bf43454
Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
Version control (GIT):
GIT information could not be obtained due
"/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
SYSTEM:
OS: posix Linux 4.1.0-1-amd64 #1 SMP Debian 4.1.3-1 (2015-08-03)
Distribution: debian/stretch/sid
EXTERNALS:
Present: atlas_fsl, cPickle, ctypes, good
scipy.stats.rv_continuous._reduce_func(floc,fscale), good
scipy.stats.rv_discrete.ppf, griddata, gzip, h5py, hdf5, ipython, joblib,
liblapack.so, libsvm, libsvm verbosity control, lxml, matplotlib, mdp, mdp ge
2.4, mock, nibabel, nipy, nose, numpy, numpy_correct_unique, pprocess, pylab,
pylab plottable, pywt, pywt wp reconstruct, reportlab, running ipython env,
scipy, sg ge 0.6.4, sg ge 0.6.5, sg_fixedcachesize, shogun, shogun.mpd,
shogun.svmocas, skl, statsmodels, weave
Absent: atlas_pymvpa, cran-energy, elasticnet, glmnet, good
scipy.stats.rdist, hcluster, lars, mass, nipy.neurospin, numpydoc, openopt, pywt
wp reconstruct fixed, rpy2, shogun.krr, shogun.lightsvm, shogun.svrlight
Versions of critical externals:
ctypes : 1.1.0
h5py : 2.5.0
hdf5 : 1.8.13
ipython : 2.3.0
lxml : 3.4.4
matplotlib : 1.4.2
mdp : 3.4
mock : 1.3.0
nibabel : 2.0.1
nipy : 0.4.0.dev
numpy : 1.8.2
pprocess : 0.5
reportlab : 3.2.0
scipy : 0.14.1
shogun : 3.2.0
shogun:full : 3.2.0_2014-2-17_18:46
shogun:rev : 197120
skl : 0.16.1
Matplotlib backend: TkAgg
RUNTIME:
PyMVPA Environment Variables:
PYTHONPATH :
":/usr/lib/python2.7/lib-old:/usr/local/lib/python2.7/dist-packages:/usr/lib/python2.7/dist-packages/wx-3.0-gtk2:/usr/lib/python2.7/plat-x86_64-linux-gnu:/usr/lib/python2.7/lib-tk:/usr/lib/python2.7/lib-dynload:/usr/bin:.:/home/marco/.ipython:/usr/lib/python2.7/dist-packages:/usr/lib/pymodules/python2.7:/usr/lib/python2.7/dist-packages/IPython/extensions:/usr/lib/python2.7:/usr/lib/python2.7/dist-packages/PILcompat:/home/marco/.cache/scipy/python27_compiled:/usr/lib/python2.7/dist-packages/gtk-2.0:/home/marco/data/bicocca/MVPA_IntentionToMove/mvpa_itm"
PyMVPA Runtime Configuration:
[general]
verbose = 1
[externals]
have running ipython env = yes
have ipython = yes
have numpy = yes
have scipy = yes
have matplotlib = yes
have h5py = yes
have reportlab = yes
have weave = yes
have good scipy.stats.rdist = no
have good scipy.stats.rv_discrete.ppf = yes
have good scipy.stats.rv_continuous._reduce_func(floc,fscale) = yes
have pylab = yes
have lars = no
have elasticnet = no
have glmnet = no
have skl = yes
have ctypes = yes
have libsvm = yes
have shogun = yes
have sg ge 0.6.5 = yes
have shogun.mpd = yes
have shogun.lightsvm = no
have shogun.svrlight = no
have shogun.krr = no
have shogun.svmocas = yes
have sg_fixedcachesize = yes
have openopt = no
have nibabel = yes
have mdp = yes
have mdp ge 2.4 = yes
have nipy = yes
have statsmodels = yes
have pywt = yes
have cpickle = yes
have gzip = yes
have cran-energy = no
have griddata = yes
have nipy.neurospin = no
have lxml = yes
have atlas_fsl = yes
have atlas_pymvpa = no
have hcluster = no
have hdf5 = yes
have joblib = yes
have liblapack.so = yes
have libsvm verbosity control = yes
have mass = no
have mock = yes
have nose = yes
have numpy_correct_unique = yes
have numpydoc = no
have pprocess = yes
have pylab plottable = yes
have pywt wp reconstruct = yes
have pywt wp reconstruct fixed = no
have rpy2 = no
have sg ge 0.6.4 = yes
Process Information:
Name: ipython
State: R (running)
Tgid: 8970
Ngid: 0
Pid: 8970
PPid: 2053
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 256
Groups: 6 7 20 24 25 27 29 30 44 46 100 104 113 114 116 121 124 132 139
999 1000
NStgid: 8970
NSpid: 8970
NSpgid: 8970
NSsid: 2053
VmPeak: 1748880 kB
VmSize: 1099484 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 883520 kB
VmRSS: 225784 kB
VmData: 376948 kB
VmStk: 136 kB
VmExe: 3220 kB
VmLib: 119892 kB
VmPTE: 1732 kB
VmPMD: 16 kB
VmSwap: 0 kB
Threads: 11
SigQ: 0/126979
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000001001000
SigCgt: 0000000180000002
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
Seccomp: 0
Cpus_allowed: ff
Cpus_allowed_list: 0-7
Mems_allowed: 00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 90582
nonvoluntary_ctxt_switches: 5294
More information about the Pkg-ExpPsy-PyMVPA
mailing list