[pymvpa] question about detrend and zscore

Yaroslav Halchenko debian at onerussian.com
Wed Oct 7 22:44:13 UTC 2009


;)

So... the input data comes from some distribution (in your example, a
normal distribution in each feature), which is symmetric.
Whenever we split the samples into 2 groups, each group will have a mean
close to 0, but not exactly 0 (off by up to some standard error of the
mean estimate, etc.).

But then, with zscore or with detrend, we enforce the mean to become
exactly 0 in each feature (taken across the samples of both
labels/groups)... Below you can find my version of your script, just
refactored a bit and printing the min/max of those means across features:

Total mean min/max: -0.05/0.05  Correlate (not corrcoef necessarily): -0.02
Total mean min/max: -0.00/0.00  Correlate (not corrcoef necessarily): -2.67

But those means are for the full "cloud" (i.e. without splitting the
samples into labels 1 vs labels 0).

Now, imagine an arbitrary feature which has mean 0. Take any half of
the samples and compute their mean... it will no longer be 0, but some
value X. What must the mean of the other half be then??? Correct: -X,
so that X + (-X) = 0 ;-)
Since the two halves are equal in size, the overall mean is
(X + m_other)/2 = 0, hence m_other = -X exactly. So across features the
per-group means lie on the line m1 = -m0, i.e. they are perfectly
negatively correlated.
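
Here is a minimal numpy-only sketch of that argument (toy random data
and names of my own making, not your dataset): before per-feature
demeaning the two half-means are roughly independent; after demeaning
they are exact negatives of each other, so their correlation across
features is exactly -1.

import numpy as N

N.random.seed(42)
nsamples, nfeatures = 200, 500
X = N.random.randn(nsamples, nfeatures)

def half_means(data):
    ##per-feature means of the two halves (stand-ins for labels 0/1)##
    return (data[:nsamples // 2].mean(axis=0),
            data[nsamples // 2:].mean(axis=0))

##before demeaning: each half-mean has std 1/sqrt(100) = 0.1, and the
##two halves are roughly independent##
m0, m1 = half_means(X)
print N.corrcoef(m0, m1)[0, 1]    # small, near 0

##after per-feature demeaning (the part of zscore/detrend that matters)##
m0, m1 = half_means(X - X.mean(axis=0))
print N.allclose(m0, -m1)         # True: m1 == -m0 in every feature
print N.corrcoef(m0, m1)[0, 1]    # exactly -1.0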

Here is the code:

import numpy as N
import pylab as P
from mvpa.suite import *

##make fake data##
ds = normalFeatureDataset(perlabel=100, nlabels=2, nchunks=2,
                          nfeatures=500, snr=2.5)


def our_stats(ds):
    ##split data based on labels...this can be done several ways##
    m0 = N.mean(ds.select(labels=0).samples, 0)
    m1 = N.mean(ds.select(labels=1).samples, 0)
    P.plot(m0, m1, 'o')
    ##show correlation##
    tmean = N.mean(ds.samples, 0)
    print "Total mean min/max: %.2f/%2f  Correlate(not corrcoef necessarily): %.2f " % \
          (min(tmean), max(tmean), N.correlate(m0,m1))

##show voxel averages before detrend/zscore##
P.subplot(2,1,1)
our_stats(ds)

##detrend/zscore##
detrend(ds, perchunk=False, model='linear') #had to turn off perchunk b/c of discontinuities in random data
zscore(ds, perchunk=False)

##show voxel averages after detrend/zscore##
P.subplot(2, 1, 2)
our_stats(ds)

P.show()
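
By the way, the "not corrcoef necessarily" label is there because
N.correlate(m0, m1) is just the unnormalized dot product of the two
mean vectors, so its magnitude depends on their scale (hence the -2.67
above). If you want an actual Pearson correlation in [-1, 1], a small
sketch reusing ds after the detrend/zscore above:

##Pearson correlation between the two per-label mean vectors##
m0 = N.mean(ds.select(labels=0).samples, 0)
m1 = N.mean(ds.select(labels=1).samples, 0)
print N.corrcoef(m0, m1)[0, 1]    # close to -1 after detrend/zscore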



On Wed, 07 Oct 2009, John Clithero wrote:

> Hi,
> I have written a brief script (copied below) that makes some fake data
> (similar to the example on the website) and essentially replicates my
> problem... this tells me that I am almost surely doing something very
> stupid in my code, so hopefully someone can point that out :-)
> If you run this code you should get a plot of the noisy data before
> detrend/zscore and then a strongly negatively correlated plot after
> detrend/zscore.  This result holds up for lots of different values in
> the fake data, and also with no detrend / no zscore.
> I hope this makes my mistake more obvious!
> John

> import os, sys, glob
> import numpy as N
> import pylab as P
> from mvpa.suite import *

> ##make fake data##
> ds = normalFeatureDataset(perlabel=100, nlabels=2, nchunks=2,
> 	nfeatures=500,
> 	snr=2.5)
> fake_dataset = MaskedDataset(samples=ds.samples, labels=ds.labels,
> 	chunks=ds.chunks)
> ##split data based on labels...this can be done several ways##
> pre_dataset0 = fake_dataset.selectSamples(
> 	N.array([l in [0] for l in fake_dataset.labels], dtype='bool'))
> pre_dataset1 = fake_dataset.selectSamples(
> 	N.array([l in [1] for l in fake_dataset.labels], dtype='bool'))
> ##show voxel averages before detrend/zscore##
> P.subplot(2,1,1)
> m0 = N.mean(pre_dataset0.samples,0)
> m1 = N.mean(pre_dataset1.samples,0)
> P.plot(m0,m1,'o')
> ##show correlation##
> print N.correlate(m0,m1)
> ##detrend/zscore##
> # had to turn off perchunk b/c of discontinuities in random data
> detrend(fake_dataset, perchunk=False, model='linear')
> zscore(fake_dataset, perchunk=True)
> ##split data based on labels...this can be done several ways##
> dataset0 = fake_dataset.selectSamples(
> 	N.array([l in [0] for l in fake_dataset.labels], dtype='bool'))
> dataset1 = fake_dataset.selectSamples(
> 	N.array([l in [1] for l in fake_dataset.labels], dtype='bool'))
> ##del dataset##
> a0 = N.mean(dataset0.samples,0)
> a1 = N.mean(dataset1.samples,0)
> ##plot voxel averages after detrending/zscore##
> P.subplot(2,1,2)
> P.plot(a0,a1,'o')
> P.show()
> ##show correlation##
> print N.correlate(a0, a1) # will be strongly negative



> On Wed, Oct 7, 2009 at 9:26 AM, Yaroslav Halchenko
> <debian at onerussian.com> wrote:
> > heh heh... hard to say anything definite ;) detrend/zscore should not
> > introduce spurious correlations, so it must be in the data somehow...
> > Maybe motion was too strongly correlated with the design? (you could
> > load the motion-correction parameters and just use them to see if
> > there is a correlation)

> > Maybe you could also provide the actual data?

> > with something like (before any zscore/detrend):

> > Hamster(dataset=dataset.selectFeatures([0,1]), summary=dataset.summary()).dump('/tmp/2pymvpa.nii.gz')

> > (maybe increase the number of selected features ;))

> > and send that /tmp/2pymvpa.nii.gz file




-- 
                                  .-.
=------------------------------   /v\  ----------------------------=
Keep in touch                    // \\     (yoh@|www.)onerussian.com
Yaroslav Halchenko              /(   )\               ICQ#: 60653192
                   Linux User    ^^-^^    [175555]




