[pymvpa] question about detrend and zscore

Mon Oct 5 15:28:04 UTC 2009

Hey,

On Mon, Oct 05, 2009 at 11:05:34AM -0400, John Clithero wrote:
> Hi all,
> 
> I have a naive question about some of the preprocessing steps in PyMVPA.
> 
> I am loading in my data and detrending as follows (similar to some
> examples listed):
> 
> ##Load Data##
> dataset = NiftiDataset(samples=wb_file,
>                        labels=attr.labels,
>                        chunks=attr.chunks,
>                        mask=os.path.join(roidir, 'wb.nii.gz'))
> 
> ##Detrend Data##
> detrend(dataset, perchunk=True, model='linear')
> zscore(dataset, targetdtype='float32')
> 
> I have two types of trials (A and B from the labels).
> If I plot the average voxel value of A trials versus B trials, I get a
> perfectly negatively correlated line.
> In other words, if mean(sample voxel on A trials) = .5, then
> mean(sample voxel on B trials) = -.5. This is true for all voxels.
>
> I have looked over miscfx.py, but I thought I would send an email to see
> if (1) this is what "should happen"

Your description indicates a substantial univariate effect in the signal
of each voxel in your mask (if the above scenario is true for every
voxel in the dataset). Z-scoring transforms the data of each voxel to
have zero mean, hence any baseline offset is removed and the means of
the two classes end up on opposite sides of zero. That can happen if
there is such a signal in the data, but it need not happen (e.g. if
there is a more complex multivariate signal, or no signal at all).
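
To illustrate the z-scoring point, here is a minimal sketch in plain
Python. It is not PyMVPA's own zscore(); the helper names and the toy
values are made up for this example. It also shows a piece of arithmetic
worth keeping in mind: once a voxel has zero mean, two classes with the
same number of trials (and no other labels in the data) must have class
means of equal size and opposite sign. Only the size of the offset
depends on the signal. If z-scoring is done per chunk, the same holds
within every chunk that is balanced.

```python
from statistics import mean, pstdev


def zscore_series(values):
    """Scale one voxel's samples to zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def class_means(values, labels):
    """Return the mean of the values for each distinct label."""
    return {
        lab: mean([v for v, l in zip(values, labels) if l == lab])
        for lab in sorted(set(labels))
    }


# Hypothetical toy data: one voxel, four A trials and four B trials,
# with A trials sitting slightly above B trials on a common baseline.
labels = ["A", "B", "A", "B", "A", "B", "A", "B"]
voxel = [103.0, 100.5, 102.0, 99.0, 104.0, 101.0, 103.5, 100.0]

means = class_means(zscore_series(voxel), labels)
print(means)  # the two class means mirror each other around zero
```

With unequal trial counts, or with additional labels such as rest
periods in the dataset, the two means would no longer be exact mirror
images.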

If you provide some more information about the nature of 'wb_file'
(preprocessing done to it outside PyMVPA) and this ROI (e.g. size) it
might be possible to figure out some more aspects of this problem.

> and (2) if so, what the idea is
> for making such a split before running a classifier.

If this question refers to why normalization is useful:

Quote from:

  Pereira, F., Mitchell, T. & Botvinick, M. (in press). Machine learning
  classifiers and fMRI: A tutorial overview. Neuroimage.

| A final issue to consider in the construction of examples is that of
| preprocessing. By this we do not mean the usual preprocessing of
| neuroimaging data, e.g. motion correction or detrending, a topic covered
| in great depth in Strother, (2006). We mean, rather, that done on the
| examples, considered as a matrix where each row is an example (or each
| column is a feature). In the example study, we normalized each example
| (row) to have mean 0 and standard deviation 1. The idea in this case is
| to reduce the effect of large, image-wide signal changes. Another
| possibility would be to normalize each feature (column) to have mean 0
| and standard deviation 1, either across the entire experiment or within
| examples coming from the same run. This is worth considering if there is
| a chance that some voxels will have much wider variation in signal
| amplitude than others. Although a linear classifier can in principle
| compensate for this to some degree by scaling the coefficient for each
| voxel, there are situations where it will not and thus this
| normalization will help. Our own informal experience suggests either row
| or column normalization is generally beneficial and should be tried in
| turn.
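
To make the row versus column distinction in the quote concrete, here is
a minimal sketch in plain Python. The toy matrix and the function names
are hypothetical and come neither from the paper nor from PyMVPA. Rows
are examples, columns are features (voxels).

```python
from statistics import mean, pstdev


def zscore_series(values):
    """Scale a list of numbers to zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def normalize_rows(matrix):
    """Normalize each example (row) across its features."""
    return [zscore_series(row) for row in matrix]


def normalize_columns(matrix):
    """Normalize each feature (column) across all examples."""
    columns = [zscore_series(list(col)) for col in zip(*matrix)]
    # Transpose back so that rows are examples again.
    return [list(row) for row in zip(*columns)]


# Hypothetical toy data: 4 examples x 3 voxels. The third voxel varies
# over a much wider range than the other two.
data = [
    [1.0, 2.0, 30.0],
    [2.0, 1.0, 10.0],
    [3.0, 4.0, 50.0],
    [4.0, 3.0, 20.0],
]

by_row = normalize_rows(data)
by_col = normalize_columns(data)
```

After column normalization all three voxels vary on the same scale, which
is the case the quote describes for voxels with very different signal
amplitude. Normalizing within each run instead of across the whole
experiment would apply normalize_columns() to the examples of one run at
a time.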

Cheers,

Michael

-- 
GPG key:  1024D/3144BE0F Michael Hanke
http://mih.voxindeserto.de