[pymvpa] effect size (in lieu of zscore)

Mike E. Klein michaeleklein at gmail.com
Fri Dec 23 04:15:24 UTC 2011


Hi Yaroslav,

Thanks for the help! I've answered your questions from this message inline
below. I'm currently running a couple of analyses to address the questions
from your 2nd message… should be ready to go by tomorrow.

Best,
Mike

On Thu, Dec 22, 2011 at 12:31 PM, Yaroslav Halchenko
<debian at onerussian.com>wrote:

> Hi Mike,
>
> First of all, thanks for looking into the details and sharing them here --
> it is an important and under-explored topic imho.  My comments are below
> -- sorry if they are not very well structured
>
> >    With old z-scoring, C=-5:
> >    Accuracies are around 60% with a heavy selection bias towards the rest
> >    condition (chooses "rest" correctly 27/27 times, but also chooses
> rest for
> >    21/27 sound conditions).
>
> am I understanding you right -- you have multi-class classification
> (multiple 'sound conditions') here? or did you collapse all non-rest
> conditions into 1?
>

Sorry, I should have made this clearer. For this sanity-check test, I got
rid of 2 of my 3 "sound" conditions. The experiment involved 81
presentations of each of 3 sound conditions (243 total), and 81
presentations of rest/baseline. After some averaging, this is reduced to
27 of each. For this basic test, I was comparing 27 examples of one of
these sounds against the 27 rest examples (just a single pairwise
classifier)… so everything should be balanced in numbers. For the
problematic subject I'm currently looking at, I had to throw out a couple
of experimental runs, so the numbers are actually 21 and 21, which is
reflected in the summary output below.
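
(For reference, the averaging was along these lines -- a minimal sketch,
with 'avg_group' as a hypothetical sample attribute marking which
presentations get collapsed together:)

from mvpa2.mappers.fx import mean_group_sample

# average all samples that share the same target within each averaging
# group; 'avg_group' is a hypothetical attribute marking the presentations
# that get collapsed into a single example
dataset = dataset.get_mapped(mean_group_sample(['targets', 'avg_group']))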


>
> by "any of the sound conditions vs. the silence condition at 98-100" did
> you mean separate pair-wise classifiers or multi-class?
>
> what is the output of print dataset.summary()?
>

*I'm pasting it here:*

In [37]: print dataset.summary()
Dataset: 42x43049 at float64, <sa: chunks,targets,time_coords,time_indices>,
<fa: thr_glm,voxel_indices>, <a:
imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
stats: mean=0.00947037 std=0.609248 var=0.371183 min=-2.06233 max=2.07801

Counts of targets in each chunk:
  chunks\targets major rest
                  ---   ---
       0.0         1     1
       1.0         1     1
       2.0         1     1
       3.0         1     1
       4.0         1     1
       5.0         1     1
       6.0         1     1
       7.0         1     1
       8.0         1     1
       9.0         1     1
      10.0         1     1
      11.0         1     1
      12.0         1     1
      13.0         1     1
      14.0         1     1
      15.0         1     1
      16.0         1     1
      17.0         1     1
      18.0         1     1
      19.0         1     1
      20.0         1     1

Summary for targets across chunks
  targets mean std min max #chunks
  major     1   0   1   1     21
   rest     1   0   1   1     21

Summary for chunks across targets
  chunks mean std min max #targets
    0      1   0   1   1      2
    1      1   0   1   1      2
    2      1   0   1   1      2
    3      1   0   1   1      2
    4      1   0   1   1      2
    5      1   0   1   1      2
    6      1   0   1   1      2
    7      1   0   1   1      2
    8      1   0   1   1      2
    9      1   0   1   1      2
   10      1   0   1   1      2
   11      1   0   1   1      2
   12      1   0   1   1      2
   13      1   0   1   1      2
   14      1   0   1   1      2
   15      1   0   1   1      2
   16      1   0   1   1      2
   17      1   0   1   1      2
   18      1   0   1   1      2
   19      1   0   1   1      2
   20      1   0   1   1      2
Sequence statistics for 42 entries from set ['major', 'rest']
Counter-balance table for orders up to 2:
Targets/Order O1     |  O2     |
    major:     0 21  |  20  0  |
    rest:     20  0  |   0 20  |
Correlations: min=-1 max=1 mean=-0.024 sum(abs)=41




> 1. with a dominance of rest-condition samples (if you haven't
> balanced them out) the classifier might just be preferring the 'rest'
> condition overall
>
> 2. with multi-class you might also be hitting the present problem of
> non-arbitrary tie-breaking -- thus leading to a collapse into the
> 'rest' condition.  With the just-released mvpa2 -- how does it look if
> you try SMLR or kNN?
>

So I don't think either of these applies, as the problem should be balanced
and is not multi-class.
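
I can still give SMLR and kNN a try, though -- a rough sketch of the
comparison (illustrative, untuned defaults):

from mvpa2.suite import SMLR, kNN, CrossValidation, NFoldPartitioner

# compare classifiers with leave-one-chunk-out cross-validation;
# the parameters are illustrative defaults, not tuned values
for clf in (SMLR(), kNN(k=5)):
    cv = CrossValidation(clf, NFoldPartitioner(), enable_ca=['stats'])
    results = cv(dataset)
    print clf, cv.ca.stats.stats['ACC']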


>
>
> >    With old z-scoring, C=-1:
> >    Same as above, except with accuracies around 54%
> >    With NO z-scoring, C=-5 or C=-1:
> >    Accuracies are 98-100%
> >    With C=-5 (or C=-1) and zscore(dataset, chunks_attr='chunks',
> >    dtype='float32'):
> >    Accuracies are about 98%
>
> >    So it looks like:
> >    (a) Using C=-5 (as opposed to C=-1) helps a little with the
> >    zscore-against-rest method. It might help across the board, but
> >    there's a ceiling effect with the other combinations.
>
> just to make it clear -- by 'old z-scoring' you meant z-scoring against
> the rest condition, right?  and in the new one you just z-scored across
> all conditions (including rest) and it led to good generalization...
>

Yes…that's correct.


>
> question: what was the exact line you used to z-score against the rest
> condition? we might also like to check that everything is implemented as
> it should be, but it might also simply be that, due to the smaller number
> of trials for the rest condition alone, estimates of mean/variance were
> not stable, thus leading to noisy 'standardized' samples and lower
> performance.
>

*The old line was:*
zscore(dataset, chunks_attr='chunks', param_est=('targets', ['rest']),
       dtype='float32')
which I just ripped off from one of the website's tutorial pages. The new
line removed the param_est=('targets', ['rest']) argument and left
everything else the same.
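
Spelled out, the two variants are (only one of them is actually applied to
the dataset, of course):

from mvpa2.suite import zscore

# old: per-chunk mean/std estimated from the 'rest' samples only
zscore(dataset, chunks_attr='chunks', param_est=('targets', ['rest']),
       dtype='float32')

# new: per-chunk mean/std estimated from all samples
zscore(dataset, chunks_attr='chunks', dtype='float32')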

I, too, am worried about a noisy baseline, though I'd have thought that 25%
of all trials should be plenty. (Additionally, it seems to act as a normal
baseline in the GLM… auditory+ regions are activated in the sound > silence
contrast.)
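
If it helps, I can quantify that -- a quick sketch of how many rest samples
feed each per-chunk estimate, and how their mean/std look (run on the
dataset just before z-scoring):

import numpy as np

# count the 'rest' samples each chunk contributes to the parameter
# estimates, and print their mean/std as a stability check
rest = dataset[dataset.sa.targets == 'rest']
for chunk in dataset.sa['chunks'].unique:
    sel = rest[rest.sa.chunks == chunk]
    print chunk, sel.nsamples, np.mean(sel.samples), np.std(sel.samples)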



>
> >    (b) There's a huge difference between whether I zscore against rest
> >    or with the whole time series. I'm not sure what's up... running
> >    sounds > silence GLMs in FSL show obvious responses in the expected
> >    brain regions.
>

Just to be clear: 9 runs, each containing 39 volumes (this was slow
event-related sparse sampling). 3 volumes per run were used for an
orthogonal behavioral task and thrown out. The remaining 36 (in each run)
were 9xsilence, 9xSound1, 9xSound2, and 9xSound3. For my "against silence"
MVPA sanity check above, I threw out 2 of the sound conditions, so the
remaining sound and silence should be balanced. For my more empirically
interesting analyses, I've been throwing out rest and one of the sound
conditions, so the classifier sees 2 balanced sound conditions.
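
Concretely, the subsetting is along these lines (a sketch, selecting the
two conditions to keep -- e.g. 'major' vs. 'rest' for the sanity check):

import numpy as np

# keep only the two conditions of interest for a pairwise classifier
keep = np.array([t in ('major', 'rest') for t in dataset.sa.targets])
pairwise_ds = dataset[keep]
print pairwise_ds.summary()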


>
> so once again, knowing the number of samples in each chunk and how the
> z-scoring against rest was done would help us get a better clue
>

Thanks again! -Mike


>
>
> --
> =------------------------------------------------------------------=
> Keep in touch                                     www.onerussian.com
> Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic
>