[pymvpa] normalization: zscore by example?
Yaroslav Halchenko
debian at onerussian.com
Mon Nov 7 23:20:38 UTC 2011
On Mon, 07 Nov 2011, Mike E. Klein wrote:
> - My reason for wanting to do this is because of a relative paucity of
> examples (9 per run, 3 categories, 9 runs), which I plan to reduce in
> number further by some averaging. It seems (to me) that the zscoring would
> be more accurate when considering thousands of voxels (per example) as
> opposed to several voxels (per run).
oy... a full answer with all the pros and cons would take a while... there
are published papers in which such "preprocessing" was in place...
I even debated it with one of the authors but never got around to submitting
any kind of critique... in the case of Francisco's paper -- I guess I missed
that particular piece, or it was meant only for classification (which,
as I said, is ok)
really coarse reasoning from me: the problem with zscoring (or
even just plain demeaning) across voxels is that it leaks information
among voxels... e.g. consider the most obvious simplified example
where you have 2 voxels, one of which is informative and one is not.
after demeaning (to keep things simple) they both become informative
with respect to the condition of interest, so if you were to judge
"which voxel is informative" you would be misled
in the case of the full brain, where the majority of voxels are not
informative and only a few (unless you have a really simple
contrast/paradigm) are informative, zscoring shouldn't be as detrimental,
but the fact remains valid -- mean/std might be stimulus-dependent, so you
would leak that information into each voxel (possibly even removing effects
from informative voxels). So distributed effects with a large "mass" will
be reflected as diagnostic information in the mean of the volumes, and
would then be introduced into every voxel in the volume.
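a minimal sketch of that leak, in plain Python with made-up numbers (the
helper names and the toy values are hypothetical, purely for illustration;
only demeaning is shown, not full zscoring): first the 2-voxel case, then a
10-voxel volume where 4 voxels respond to the condition

```python
from statistics import mean


def make_example(condition, n_voxels, n_informative):
    """Toy volume: the first n_informative voxels read 1.0 under "x", all else 0.0."""
    return [1.0 if (v < n_informative and condition == "x") else 0.0
            for v in range(n_voxels)]


def demean(volume):
    """Subtract the across-voxel mean of one volume from each of its voxels."""
    m = mean(volume)
    return [v - m for v in volume]


def condition_effects(n_voxels, n_informative, preprocess=lambda vol: vol):
    """Per-voxel difference between condition "x" and condition "y"."""
    x = preprocess(make_example("x", n_voxels, n_informative))
    y = preprocess(make_example("y", n_voxels, n_informative))
    return [a - b for a, b in zip(x, y)]


# 2 voxels, only the first one informative.
raw_2 = condition_effects(2, 1)            # [1.0, 0.0]
leaky_2 = condition_effects(2, 1, demean)  # [0.5, -0.5]

# 10 voxels, 4 informative: the volume mean depends on the condition
# (0.4 under "x", 0.0 under "y"), and demeaning pushes that difference
# into all 6 voxels that carried no information before.
raw_10 = condition_effects(10, 4)
leaky_10 = condition_effects(10, 4, demean)
```

in both cases the uninformative voxels end up with a non-zero condition
difference after demeaning, and the effect in the informative voxels
shrinks (from 1.0 to 0.5 with 2 voxels, to about 0.6 with 10).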
altogether -- to be on the safe side I would not do anything like that ;)
as for zscoring -- I would just zscore each voxel across time within each
run before extracting/averaging any samples you want to use for
classification...
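a rough sketch of what I mean by per-run zscoring, again in plain Python so
it runs without anything installed (the function name, the toy data, and the
choice of the population standard deviation are my assumptions for the
example, not a description of what any library does internally):

```python
from statistics import mean, pstdev


def zscore_per_run(samples, runs):
    """Z-score every voxel (column) across the volumes (rows) of each run separately."""
    zscored = [None] * len(samples)
    for run in set(runs):
        rows = [i for i, r in enumerate(runs) if r == run]
        # Per-voxel mean and population standard deviation within this run.
        columns = list(zip(*(samples[i] for i in rows)))
        stats = [(mean(col), pstdev(col)) for col in columns]
        for i in rows:
            # A voxel that is constant within a run is mapped to 0.0.
            zscored[i] = [(v - m) / s if s > 0 else 0.0
                          for v, (m, s) in zip(samples[i], stats)]
    return zscored


# Hypothetical toy data: 2 runs x 3 volumes, 2 voxels; run 2 has a shifted baseline.
runs = [1, 1, 1, 2, 2, 2]
samples = [
    [1.0, 10.0], [2.0, 20.0], [3.0, 30.0],
    [101.0, 5.0], [102.0, 7.0], [103.0, 9.0],
]
zscored = zscore_per_run(samples, runs)
```

each voxel is normalized only against its own time course within a run, so
nothing is shared among voxels; averaging of examples would come after this
step. in PyMVPA itself, zscore(ds, chunks_attr='chunks') should do the
equivalent, assuming your chunks correspond to runs.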
hope it makes sense...
> - I guess I don't see why zscoring this way would render a searchlight
> invalid. I'm looking at the Pereira paper clip that was referenced
> recently in the listserv: "In the example study, we normalized each
> example (row) to have mean 0 and standard deviation 1. The idea in this
> case is to reduce the effect of large, image-wide signal changes. Another
> possibility would be to normalize each feature (column) to have mean 0 and
> standard deviation 1, either across the entire experiment or within
> examples coming from the same run."
--
=------------------------------------------------------------------=
Keep in touch www.onerussian.com
Yaroslav Halchenko www.ohloh.net/accounts/yarikoptic