[pymvpa] ANNOUNCEMENT: bugfix 0.4.7 release -- data scaling issue

Yaroslav Halchenko debian at onerussian.com
Tue Mar 8 20:58:35 UTC 2011


Dear PyMVPA-ers,

Instead of sticking our heads into the sand and pretending that there
is nothing to worry about, we have decided to announce the
availability of the 0.4.7 release, which fixes a critical issue that
has affected SOME of our users.

This issue was revealed while recently troubleshooting the "suspicious
results" Nynke van der Laan had posted to the list (so thank you Nynke
for raising the concern!).  We think/hope that this issue affects only
a few PyMVPA users, not the majority.  In our experience (we dealt
with files obtained from SPM, AFNI, FSL), the scaling header fields
were not set to anything but the defaults of 1.0 for scale and 0.0 for
intercept (or 0.0 for scale altogether).

Depending on the software you used to obtain the volumes, here is our
take on the probability of you being affected:

BrainVoyager:
      for Analyze files, those seem not to carry scaling parameters
      according to [1], but a report from Roberto Guidotti suggests
      they might, thus                            -- medium probability
FreeSurfer: on files we have, none had != 1.0/0.0 -- low probability
FSL:  most probably  scaling/intercept == 1.0/0.0 -- low probability
NiBabel: if you have managed to preprocess data while using nibabel to
      load/save - you are nearly guaranteed to have scaling/intercept
      set                                         -- high probability
SPM:  you might have scaling/intercept != 1.0/0.0 -- medium probability

N.B. The treatment of scaling in Analyze files is
     vendor/software-specific [1], so we have not handled those, and
     hope that no one uses Analyze nowadays.

But even if those fields were set, implying that scaling was
necessary, they should have had no effect as long as they were the
same within each chunk (run) and you used zscoring to preprocess your
data, since the differences get removed by zscoring.  And even if the
scaling differed across volumes, most probably it was quite consistent
and did not reflect the effect of interest, so at worst you got no
results before and could obtain them now with proper scaling.
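
To illustrate that last point, here is a minimal NumPy sketch (not
PyMVPA-specific, all names are ours) showing that a slope/intercept
applied uniformly within a chunk vanishes after per-chunk zscoring:

  import numpy as np

  rng = np.random.RandomState(0)
  raw = rng.randn(10, 5)              # 10 volumes x 5 voxels of one chunk
  scaled = 3.5 * raw + 100.0          # one scl_slope/scl_inter for the whole chunk

  def zscore_chunk(x):
      # zero mean, unit variance per voxel within the chunk
      return (x - x.mean(axis=0)) / x.std(axis=0)

  print(np.allclose(zscore_chunk(raw), zscore_chunk(scaled)))   # -> True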

Since nibabel (used in the 0.6 series) unconditionally applies the
data transformation parameters (scaling and intercept) whenever they
are defined, users of the PyMVPA 0.6 series are safe regardless of
whether scaling is present and can ignore this email -- their data
gets properly scaled on load/save operations.
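
For instance, with nibabel the stored slope/intercept are applied
transparently whenever the data array is requested (a sketch against
the current nibabel API; the file name is just the example used
further below):

  import nibabel as nib

  img = nib.load('/tmp/1.nii.gz')
  print(img.header['scl_slope'], img.header['scl_inter'])  # whatever is stored
  data = img.get_fdata()           # scl_slope/scl_inter already applied here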

Quick work-around
=================

If you are affected and want to avoid upgrading to 0.4.7 (or to the
most recent head of maint/0.5 if you are using the 0.5 series):

   As a quick resolution we advise you to convert your files, using
   your favorite tool (or the little Python script below), into NIfTI
   files that need no scaling/intercept, prior to loading them into
   PyMVPA for processing.

   If you had the same scaling/intercept across files and you were
   saving resultant maps using map2nifti -- you could simply reset
   those fields in the original "Nifti" dataset's niftihdr, e.g.
   ds.niftihdr['scl_slope'] = 1.0; ds.niftihdr['scl_inter'] = 0.0;
   prior to calling map2nifti, which uses that dataset's niftihdr for
   mapping data back to NIfTI (as sketched right below).
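
   A minimal sketch of that second work-around with the 0.4.x API and
   the usual mvpa.suite imports (the dataset construction, labels,
   chunks and the `sens` map are hypothetical placeholders; the two
   header assignments are the essential part):

     from mvpa.suite import NiftiDataset, map2nifti

     # labels/chunks are placeholders defined elsewhere
     ds = NiftiDataset(samples='bold.nii.gz', labels=labels,
                       chunks=chunks, mask='mask.nii.gz')
     # ... analysis producing e.g. a per-voxel map `sens` ...

     # reset the scaling fields so that other software does not
     # re-apply the original scl_slope/scl_inter to the unscaled map
     ds.niftihdr['scl_slope'] = 1.0
     ds.niftihdr['scl_inter'] = 0.0
     map2nifti(ds, sens).save('sens.nii.gz')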

Long saga
=========

Who is affected
---------------

"Scaling" issue is relevant for those who use PyNifti (0.4.x series
and 0.5, whenever nibabel is not installed/used) to access
NIfTI/Analyze files, and who was loading/saving them in following two
possible use cases:

U1 load data from multiple NIfTI files which have different
  scaling/intercept field values, implying necessary and different
  scaling.

  The absence of correct data scaling could affect the analysis if
  either multiple volumes were provided to the NiftiDataset
  constructor, or multiple datasets were concatenated while being
  loaded from files with different scaling parameters (see the sketch
  after these two use cases).  In the latter case, "incorrect scaling"
  could have no impact if the loaded files had different
  scaling/intercept but there was just one file per chunk, and
  per-chunk zscoring was carried out to preprocess the data (thus
  different scaling and offsets among chunks were eliminated).

U2 rely on post-analysis of spatial maps (e.g. accuracies or
  sensitivities) obtained with PyMVPA and saved with the map2nifti
  function to be processed elsewhere, IF AND ONLY IF the original data
  came from files with the scaling/intercept fields set to anything
  other than 1.0 or 0.0 for scaling, or, in the case of non-zero
  scaling, an intercept other than 0.0.

  In that case the data was saved to NIfTI files without any scaling
  applied, but with the original scaling/intercept parameters carried
  over from the original dataset/NIfTI file.  Consequently, the
  scaling could have been applied by the software used to load those
  NIfTI files (with the now incoherent scaling/intercept) for
  post-analysis, thus possibly producing incorrect results.
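
To make U1 concrete, here is a rough sketch of such a
load-and-concatenate pattern with the 0.4.x API (file names, labels,
the mask, and the use of in-place dataset merging via += are
illustrative assumptions):

  from mvpa.suite import NiftiDataset, zscore

  ds = None
  for chunk, fn in enumerate(['run1.nii.gz', 'run2.nii.gz', 'run3.nii.gz']):
      # run_labels is a placeholder defined elsewhere
      d = NiftiDataset(samples=fn, labels=run_labels[chunk],
                       chunks=chunk, mask='mask.nii.gz')
      if ds is None:
          ds = d
      else:
          ds += d                  # concatenate runs into one dataset
  zscore(ds, perchunk=True)        # removes per-chunk offsets/scales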

What could have been "the effect" (once again, only if your files
originally had scl_slope/scl_inter defined):

U1 Not accounting for the scale and offset could act as a 'volume
   normalization' if the scaling/intercept were consistently computed
   by the original software based on the dynamic range of the data.
   That in turn affects all the voxels in the volume, possibly
   introducing or ruining the effect (depending on the effect and the
   analysis) in every voxel taken separately.

   If you did not care about per-feature/voxel effects and just did
   full-brain classification and

   * had not got results before -- now you might get them
   * had got classification results -- you might lose them with
     scaling of the data upon load, but should get them back by
     normalizing each volume into the same range of values (e.g. from
     -1 to 1; see the sketch after U2 below)

   If you cared about per-feature/voxel sensitivities or searchlight
   accuracies, or loaded a masked ROI -- the results (if you got them)
   might not be quite correct, since they could have been affected by
   the values of other voxels, possibly not even from the masked ROI.
   Those analyses would need to be recomputed using PyMVPA 0.4.7 (or
   0.6.x), or after scaling the data within the files so that no
   scaling needs to be applied while loading the data into PyMVPA.

U2 If it was a full-brain analysis, as in the case of Nynke van der
   Laan, the effects were global and could be spotted easily (unless
   the data was initially masked to an ROI already known to carry the
   effects of interest).  Otherwise, you could get a consistent
   bias/scaling in your ROI without detecting that it is invalid.
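
A minimal NumPy sketch of the per-volume range normalization mentioned
under U1 (a samples x voxels array, each volume mapped to [-1, 1]; the
function name is ours):

  import numpy as np

  def norm_range(samples):
      # rescale each volume (one row of a samples x voxels array) to [-1, 1]
      lo = samples.min(axis=1)[:, np.newaxis]
      hi = samples.max(axis=1)[:, np.newaxis]
      return 2.0 * (samples - lo) / (hi - lo) - 1.0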


0.4.7 release
=============

The 0.4.7 release of PyMVPA addresses both issues:

U1 if the NIfTI header specifies scaling -- it gets applied to the
   data by default upon loading (unless the new constructor argument
   `scale_data` is explicitly set to False; see the example right
   after this list)

U2 upon map2nifti, both scl_slope and scl_inter are reset to 1.0 and
   0.0 respectively, to reflect that no scaling is needed for the
   provided data.
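
For U1 this means that, with 0.4.7, loading now applies the stored
scaling by default, while the previous behaviour remains reachable via
the new argument (file and label/chunk names below are placeholders):

   from mvpa.suite import NiftiDataset

   # scaling from the NIfTI header is applied by default in 0.4.7
   ds = NiftiDataset(samples='bold.nii.gz', labels=labels, chunks=chunks)

   # old, unscaled behaviour
   ds_raw = NiftiDataset(samples='bold.nii.gz', labels=labels,
                         chunks=chunks, scale_data=False)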

0.4.7 should be available from http://neuro.debian.net for Debian-based
systems (thus apt-get update; apt-get install python-mvpa), a Windows
installer and source tarball are available from
https://alioth.debian.org/frs/?group_id=30954, and the GIT repositories
have been updated and tagged.

How to check the files
======================

We hope that not many of you were actually affected by this issue,
but we advise you to verify your data -- just inspect the headers of
your files for set scl_* fields:

* if scl_slope == 0.0 -- you are ok,
  since scaling/offset should be applied only if scl_slope != 0
* if scl_slope == 1.0 and scl_inter == 0.0 -- you are ok

Otherwise, it depends on whether those scales/intercepts were the same
across all files, and on your analysis and how you load/preprocess the
data (see above).

To check the headers of your NIfTI files you can use any available
tool (e.g. fslhd from FSL, or nifti_tool from the NIfTI tools), or
just Python with PyNifti, e.g. from the command line, to visualize
them for /tmp/1.nii.gz:

 python -c "import sys, nifti as n; h=n.NiftiImage(sys.argv[1]).header; print h['scl_slope'], h['scl_inter']" /tmp/1.nii.gz

To convert a file with scaling parameters into its scaled version,
with scl_slope and scl_inter set to 1.0 and 0.0, e.g.:

 python -c "import sys, nifti as n; ni=n.NiftiImage(sys.argv[1]); ni.data = ni.getScaledData(); h=ni.header; h['scl_slope'] = 1.0; h['scl_inter'] = 0.0; ni.updateFromDict(h); ni.save(sys.argv[2])" ./EEGtemplates/iskull.nii /tmp/iskull.nii

[1] http://nifti.nimh.nih.gov/nifti-1/documentation/Analyze_usage.pdf

With best regards,
-- 
PyMVPA Team