[pymvpa] effect size (in lieu of zscore)

Mike E. Klein michaeleklein at gmail.com
Fri Dec 23 19:16:41 UTC 2011


Just a couple of updates and questions:

1. For this one particular subject, I'm still seeing the strange negative
peak in the chance distribution, even without any z-scoring. The shape
looks remarkably similar with or without zscoring (whether I use the raw
values or the effect sizes as input). My confusion here is that, even if
I did several things wrong in my code, I'd expect no worse than a
regular-old-looking chance distribution (centered on 0). There are about
40,000 3.5 mm isotropic voxels in that subject's brain mask, so plenty of
observations. Just eyeballing, the peak is centered at about -8%, and the
bulk (95%) of observations fall between about -32% and +22% … so it's a
notable shift.
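
For what it's worth, here's how I've been quantifying that shift rather
than eyeballing it. This is just a sketch: null_accs stands in for the
array of permutation accuracies (in percent relative to chance), and is
simulated below only so the snippet runs standalone.

    import numpy as np

    # placeholder for the real chance distribution (percent vs. chance);
    # simulated noise roughly matching what I'm seeing in this subject
    null_accs = np.random.randn(40000) * 13 - 8

    lo, hi = np.percentile(null_accs, [2.5, 97.5])
    print("mean shift: %.1f%%" % null_accs.mean())
    print("central 95%% of observations: %.1f%% to %.1f%%" % (lo, hi))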

2. As a temporary workaround for this 9 vs. 27 chunk issue (from the prior
message), I'm doing detrending and zscoring with my 9 "real" runs, saving
the output time series as a nifti (without removing any of the volumes),
and then reloading it as a new fmri_dataset. This second dataset then gets
the attribute file with 27 chunks, which get "run"-averaged. This method
doesn't allow me to throw out my 3 "distractor" volumes in each run (which
are tied to a behavioral response and may impact the zscores), so it's not
optimal.
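
In code, the round-trip looks roughly like this (a sketch from memory, so
please correct me if I've botched any of the mvpa2 calls; the file and
attribute names are made up):

    from mvpa2.datasets.mri import fmri_dataset, map2nifti
    from mvpa2.mappers.detrend import poly_detrend
    from mvpa2.mappers.zscore import zscore
    from mvpa2.misc.io import SampleAttributes

    # pass 1: detrend + zscore with the 9 "real" runs as chunks
    attr9 = SampleAttributes('attrs_9runs.txt')        # hypothetical file
    ds = fmri_dataset('bold.nii.gz', targets=attr9.targets,
                      chunks=attr9.chunks, mask='brain_mask.nii.gz')
    poly_detrend(ds, polyord=1, chunks_attr='chunks')
    zscore(ds, chunks_attr='chunks', dtype='float32')

    # save the preprocessed time series without dropping any volumes
    map2nifti(ds).to_filename('bold_preproc.nii.gz')

    # pass 2: reload with the 27-chunk attribute file for run-averaging
    attr27 = SampleAttributes('attrs_27chunks.txt')    # hypothetical file
    ds27 = fmri_dataset('bold_preproc.nii.gz', targets=attr27.targets,
                        chunks=attr27.chunks, mask='brain_mask.nii.gz')

(I suspect the round-trip could be avoided entirely by keeping both
labelings on one dataset, e.g. ds.sa['runs'] = attr9.chunks, zscoring with
chunks_attr='runs', and partitioning on the 27-level chunks via
NFoldPartitioner(attr='chunks'), but I haven't tried that.)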

This last one is a bit more theoretical (so perhaps more interesting to
people!).

3. I'm using a 3X3 design. So while I technically have 9 unique sounds, for
this part of my study I collapse in one direction (fundamental frequency),
giving me three classes of sounds (each class of which has a variable
fundamental frequency). Each of my 9 runs has the 9 unique sounds, each
repeated thrice. My current run-averaging scheme has me breaking up each
run into 3 smaller portions ("chunklets" if you will…) by time (early part
of the run, middle period, late period) and averaging within a chunklet.
This means that a single example formed within a "chunklet"
is made from 3 fMRI volumes, each following a sound from one experimental
condition (but from different levels of the orthogonal condition:
fundamental frequency). So, from my perspective, I'm feeding the SVM
examples which, to my eye, are all "equivalent" - each example represents a
single category/level in one dimension, and a "smear" of levels in the
other dimension.

A second way I could be doing this, but haven't been, is to not define
"chunklets" by time within a run (early, middle, late), but instead by
level of the orthogonal condition (low fundamental frequency, mid FF, high
FF). A single example for the SVM, after averaging, would then represent
sounds from a single experimental condition *and* a single FF (so not
smeared over either dimension). These averaged examples would presumably be
*cleaner* (since they combine volumes from identical acoustic
stimuli), but each would be less representative of the category-of-interest
as a whole. I'm not sure what makes more sense for a support vector
machine: to work on cleaner (but less representative) examples, or more
variable (but more representative) examples.
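
In PyMVPA terms, I believe the two schemes differ only in how the grouping
attribute is built before averaging. A rough sketch, assuming integer-coded
chunks, where time_third and ff_level are hypothetical per-volume arrays of
0/1/2 that I'd construct from the run timing and the stimulus list:

    from mvpa2.mappers.fx import mean_group_sample

    # Scheme A: chunklets = early/middle/late thirds of each run
    ds.sa['chunklet'] = ds.sa.chunks * 3 + time_third

    # Scheme B: chunklets = levels of the orthogonal FF condition
    # ds.sa['chunklet'] = ds.sa.chunks * 3 + ff_level

    # average all volumes sharing a target within each chunklet
    ds_avg = ds.get_mapped(mean_group_sample(['targets', 'chunklet']))

Either way the averaging machinery would be identical, so it really is just
a question of which grouping gives the SVM better examples.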

Hopefully all of the above made sense. Happy holidays, everyone, and thanks
as always!

-Mike

On Thu, Dec 22, 2011 at 11:43 PM, Mike E. Klein <michaeleklein at gmail.com> wrote:

> OK, I think I'm starting to get this… after mightily confusing myself!
>
> So my subjects had 9 experimental "runs," but I recoded this into 27
> chunks (by merely dividing each run into 3 by time: 1st, 2nd and 3rd thirds).
>
> The reason I did this was that, if I run-averaged over the entire "real"
> runs, the classifier would be left with only 18 comparisons to make. So
> the classifier would have to perform incredibly well to surpass
> significance thresholds in a binomial n=18 situation. 27 chunks (so 54
> tests) seemed like a much better bet. I couldn't seem to get the software
> to work without *any* run-averaging (162 tests); it would hang at
> searchlight progress = 0%. (Though this seemed like not the best way to go
> for other reasons as well.)
>
> So you're absolutely right… I only have 3 baselines (per chunk) to zscore
> against.
>
> Would you recommend that I zscore within each of the 9 "real" runs, but
> then run-average, test and do LOOCV with the larger number of chunks? (If
> so, I'm not sure how to do this… it seems like it would take parallel
> attribute file associations.)
>
> Thanks again for all the help!
>
> Mike
>
>
> On Thu, Dec 22, 2011 at 11:26 PM, Yaroslav Halchenko <
> debian at onerussian.com> wrote:
>
>>
>> On Thu, 22 Dec 2011, Mike E. Klein wrote:
>> > got rid of 2 of my 3 "sound" conditions. The experiment involved 81
>> > presentations of each of 3 sound conditions (243 total), and 81
>> > presentations of rest/baseline. After some averaging, this is
>> reduced to
>> > 27 of each.
>> > ...
>> > *The old line was:*
>> > zscore(dataset, chunks_attr='chunks', param_est=('targets', ['rest']),
>> > dtype='float32')
>> > which I just ripped off from one of the website's tutorial pages.
>> > The new line removed the "param_est=('targets', ['rest'])" and left
>> > everything else the same.
>>
>> so here is the catch ;-) from above I deduce that you had only 81/27=3
>> samples of 'rest' within each chunk... so mean/variance estimation
>> was not really "good" (I think some time recently I even added a warning
>> for such cases [1] -- which version are you using? ;-))
>>
>> [1]
>> https://github.com/PyMVPA/PyMVPA/blob/master/mvpa2/mappers/zscore.py#L149
>>
>> > Just to be clear: 9 runs, each containing 39 volumes (this was slow
>> > event-related sparse sampling). 3 of these were used for an orthogonal
>> > behavioral task and thrown out. The remaining 36 (in each run) were
>> > 9xsilence, 9xSound1, 9xSound2, and 9xSound3. For my "against silence"
>> MVPA
>> > sanity check above, I threw out 2 of the sound conditions, so the
>> remaining
>> > sound and silence should be balanced. For my more empirically
>> interesting
>> > analyses, I've been throwing out the rest and one of the sound
>> conditions,
>> > so the classifier looks at 2 balanced sound conditions.
>>
>> d'oh -- I wrote above first before reading this... but I guess ok -- so
>> you used 3 volumes after each stimulus onset, which is why for 3 'rest'
>> conditions in each run you had 9 volumes? Then pretty much the same
>> logic applies to "why it didn't work", although the warning obviously
>> wouldn't be triggered for such cases
>>
>>
>> --
>> =------------------------------------------------------------------=
>> Keep in touch                                     www.onerussian.com
>> Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic
>>
>
>