[pymvpa] effect size (in lieu of zscore)

Thu Dec 29 21:01:39 UTC 2011

(I've attached a picture of one of the distributions. I still can't fathom
how it ends up with this negative shift, even if I'm actually doing several
things incorrectly!)

-Mike

On Fri, Dec 23, 2011 at 2:16 PM, Mike E. Klein <michaeleklein at gmail.com>wrote:

> Just a couple of updates and questions:
>
> 1. For this one particular subject, I'm still seeing the strange negative
> peak to the chance distribution, even without any z-scoring. The shape
> looks remarkably similar with or without zscoring (whether I use the raw
> values or the effect sizes as input). I think my confusion here is, even if
> I did several things wrong in my code, I'd expect no worse than a
> regular-old looking chance distribution (centered on 0). There's about
> 40,000 3.5 isotrophic voxels in that subject's brain mask, so plenty of
> observations. Just eyeballing, the peak is centered at about -8%, and the
> bulk (95%) of observations fall between about -32% and +22% … so it's a
> notable shift.
>
> 2. As a temporary workaround for this 9 vs. 27 chunk issue (from the prior
> message), I'm doing detrending and zscoring with my 9 "real" runs, saving
> the output time series as a nifti (without removing any of the volumes),
> and then reloading it as a new fmri_dataset. This second dataset then gets
> the attribute file with 27 chunks, which get "run"-averaged. This method
> doesn't allow me to throw out my 3 "distractor" volumes in each run (which
> are tied to a behavioral response and may impact the zscores), so it's not
> optimal.
>
> This last one is a bit more theoretical (so perhaps more interesting to
> people!).
>
> 3. I'm using a 3X3 design. So while I technically have 9 unique sounds,
> for this part of my study I collapse in one direction (fundamental
> frequency), giving me three classes of sounds (each class of which has a
> variable fundamental frequency). Each of my 9 runs has the 9 unique sounds,
> each repeated thrice. My current run-averaging scheme has me breaking up
> each run into 3 smaller portions ("chunklets" if you will…) by time (early
> part of the run, middle period, late period) and averaging within a
> chunklet. What this means is that a single example that is formed within a
> "chunklet" is made from 3 fMRI volumes, each following a sound from one
> experimental condition (but from different levels of the orthogonal
> condition: fundamental frequency). So, from my perspective, I'm feeding the
> SVM examples which, to my eye, are all "equivalent" - each example
> represents a single category/level in one dimension, and a "smear" of
> levels in the other dimension.
> A second way I could be doing this, but haven't been, is to not define
> "chunklets" by time within a run (early, middle, late), but instead by
> level of the orthogonal condition (low fundamental frequency, mid FF, high
> FF). A single example for the SVM, after averaging, would then represent
> sounds from a single experimental condition *and* a single FF (so not
> smeared over either dimension). These averaged examples would presumably be
> *cleaner* (due to the combining of volumes from identical acoustic
> stimuli), but would each be less representative of the category-of-interest
> as a whole. I'm not sure what makes more sense for a support vector
> machine: to work on cleaner (but less representative) examples, or more
> variable (but more representative) examples.
>
> Hopefully all of the above made sense. Happy holidays, everyone, and
> thanks as always!
>
> -Mike
>
>
>
>
> On Thu, Dec 22, 2011 at 11:43 PM, Mike E. Klein <michaeleklein at gmail.com>wrote:
>
>> OK, I think I'm starting to get this… after mightily confusing myself!
>>
>> So my subjects had 9 experimental "runs," but I recoded this into 27
>> chunks (by merely diving each run into 3 by time: 1st, 2nd and 3rd thirds).
>>
>> The reason why I did this was, if I run-averaged over the entire "real"
>> runs, it would only leave the classifier with 18 comparisons to make. So
>> the classifier would have to perform incredibly well to surpass
>> significance thresholds in a binomial n=18 situation. 27 chunks (so 54
>> tests) seemed like a much better bet. I couldn't seem to get the software
>> to work without *any* run-averaging (162 tests), it would hang on
>> searchlight progress = 0%. (Though this seemed like not the best way to go
>> for other reasons as well.)
>>
>> So you're absolutely right… I only have 3 baselines (per chunk) to zscore
>> against.
>>
>> Would you recommend that I zscore within each of the 9 "real" runs, but
>> then run-average, test and do LOOCV with the larger number of chunks? (If
>> so, I'm not sure how to do this… it seems like it would take parallel
>> attribute file associations.)
>>
>> Thanks again for all the help!
>>
>> Mike
>>
>>
>> On Thu, Dec 22, 2011 at 11:26 PM, Yaroslav Halchenko <
>> debian at onerussian.com> wrote:
>>
>>>
>>> On Thu, 22 Dec 2011, Mike E. Klein wrote:
>>> > got rid of 2 of my 3 "sound" conditions. The experiment involved 81
>>> > presentations of each of 3 sound conditions (243 total), and 81
>>> > presentations of of rest/baseline. After some averaging, this is
>>> reduced to
>>> > 27 of each.
>>> > ...
>>> > *The old line was:*
>>> > zscore(dataset, chunks_attr='chunks', param_est=('targets', ['rest']),
>>> > dtype='float32')
>>> > which I just ripped off from one of the website's tutorial pages.
>>> > The new line removed the "param_est=('targets', ['rest'])" and left
>>> > everything else the same.
>>>
>>> so  here is the catch ;-)  from above I deduce that you had only 81/27=3
>>> samples of 'rest' within each chunk... so mean/variance estimation
>>> was not really "good" (I think some time recently I even added a warning
>>> for such cases [1] -- which version are you using? ;-))
>>>
>>> [1]
>>> https://github.com/PyMVPA/PyMVPA/blob/master/mvpa2/mappers/zscore.py#L149
>>>
>>> > Just to be clear: 9 runs, each containing 39 volumes (this was slow
>>> > event-related sparse sampling). 3 of these were used for an orthogonal
>>> > behavioral task and thrown out. The remaining 36 (in each run) were
>>> > 9xsilence, 9xSound1, 9xSound2, and 9xSound3. For my "against silence"
>>> MVPA
>>> > sanity check above, I threw out 2 of the sound conditions, so the
>>> remaining
>>> > sound and silence should be balanced. For my more empirically
>>> interesting
>>> > analyses, I've been throwing out the rest and one of the sound
>>> conditions,
>>> > so the classifier looks at 2 balanced sound conditions.
>>>
>>> d'oh -- I wrote above first before reading this... but I guess ok -- so
>>> you used 3 volumes after each stimuli onset, that is why for 3 'rest'
>>> conditions in each run you had 9 volumes?  then pretty much the same
>>> logic applies on "why it didn't work", although the warning obviously
>>> wouldn't be triggered for such cases
>>>
>>>
>>> --
>>> =------------------------------------------------------------------=
>>> Keep in touch                                     www.onerussian.com
>>> Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic
>>>
>>> _______________________________________________
>>> Pkg-ExpPsy-PyMVPA mailing list
>>> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
>>> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.alioth.debian.org/pipermail/pkg-exppsy-pymvpa/attachments/20111229/addd97ec/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: distrib.png
Type: image/png
Size: 16470 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/pkg-exppsy-pymvpa/attachments/20111229/addd97ec/attachment-0001.png>