[pymvpa] ANNOUNCEMENT: bugfix 0.4.7 release -- data scaling issue
Yaroslav Halchenko
debian at onerussian.com
Tue Mar 8 20:58:35 UTC 2011
Dear PyMVPA-ers,
Instead of sticking our heads into the sand and pretending that there
is nothing to worry about, we have decided to announce the
availability of the 0.4.7 release, which fixes a critical issue
potentially affecting SOME of our users.
This issue was revealed while recently troubleshooting the "suspicious
results" Nynke van der Laan had posted to the list (so thank you Nynke
for raising the concern!). We think/hope that this issue affects not
the majority of PyMVPA users, but only a few. In our experience (we
have dealt with files obtained from SPM, AFNI, FSL), the scaling
header fields were not set to anything but the defaults of 1.0 for
scale and 0.0 for intercept (or 0.0 for scale altogether).
Depending on the software you used to obtain the volumes, here is our
take on the probability of you being affected:
BrainVoyager:
    if Analyze files, those according to [1] seem not to carry
    scaling parameters. But according to a report from Roberto
    Guidotti they might, thus -- medium probability
FreeSurfer: of the files we have, none had scaling != 1.0/0.0 -- low probability
FSL: most probably scaling/intercept == 1.0/0.0 -- low probability
NiBabel: if you managed to preprocess your data using nibabel to
    load/save it -- you are nearly guaranteed to have scaling/intercept
    set -- high probability
SPM: you might have scaling/intercept != 1.0/0.0 -- medium probability
N.B. In Analyze files the treatment of scaling is vendor/software-specific
    [1], so we have not taken care of those, and hope that no one
    uses Analyze nowadays.
But even if you had those fields set, implying necessary scaling: if
they were the same within each chunk (run) and you used zscoring to
preprocess your data -- they should have no effect, since such
differences get removed by zscoring. And even if the scaling differed
across volumes, most probably it was quite consistent and did not
reflect the effect of interest, so at worst you got no results before,
while you could obtain them now with proper scaling.
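To illustrate why per-chunk zscoring cancels a constant (positive)
slope and intercept, here is a toy check -- the numbers are made up,
and this is plain NumPy rather than PyMVPA code:

    import numpy as np
    raw = np.random.randn(10, 5)          # pretend: 10 volumes x 5 voxels of one chunk
    scaled = 2.5 * raw + 100.0            # same scl_slope/scl_inter for the whole chunk
    zscore = lambda x: (x - x.mean(axis=0)) / x.std(axis=0)
    print np.allclose(zscore(raw), zscore(scaled))   # -> True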
Since nibabel (used in the 0.6 series) unconditionally applies the data
transformation parameters (scaling and intercept) if they are defined,
users of the PyMVPA 0.6 series are safe regardless of whether there is
scaling and can ignore this email -- their data got properly scaled on
load/save operations.
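If you want to double-check a particular file with nibabel itself,
something along these lines should do (a sketch; img.nii.gz is a
placeholder filename):

    import nibabel as nb
    img = nb.load('img.nii.gz')
    hdr = img.get_header()
    print hdr['scl_slope'], hdr['scl_inter']
    print img.get_data().dtype   # floating point if scaling got applied on load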
Quick work-around
=================
If you are affected and want to avoid upgrading to 0.4.7 (or to the
most recent head of maint/0.5 for the 0.5 series you might be using):
As a quick resolution we advise you to convert your files, using
your favorite tool (or the little Python one-liner below), into NIfTI
files requiring no scaling/intercept, prior to loading them into
PyMVPA for processing.
If you had the same scaling/intercept across files but were
saving resultant maps using map2nifti -- you could simply reset
those fields in the original "Nifti" dataset's niftihdr, e.g.
ds.niftihdr['scl_slope'] = 1.0; ds.niftihdr['scl_inter'] = 0.0;
prior to calling map2nifti, which uses that dataset's niftihdr for
mapping the data back to NIfTI.
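Put together, a minimal sketch of that second workaround (the dataset
construction and the 'sens' map are placeholders here; only the
niftihdr reset before map2nifti is the actual point):

    from mvpa.suite import NiftiDataset, map2nifti
    ds = NiftiDataset(samples='bold.nii.gz', labels=labels, mask='mask.nii.gz')
    # ... your analysis, producing e.g. a per-voxel map 'sens' ...
    ds.niftihdr['scl_slope'] = 1.0   # declare the stored values as already scaled
    ds.niftihdr['scl_inter'] = 0.0
    map2nifti(ds, sens).save('sens.nii.gz')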
Long saga
=========
Who is affected
---------------
"Scaling" issue is relevant for those who use PyNifti (0.4.x series
and 0.5, whenever nibabel is not installed/used) to access
NIfTI/Analyze files, and who was loading/saving them in following two
possible use cases:
U1 load data from multiple NIfTI files which have different
   scaling/intercept field values, implying necessary and different
   scaling.
   The absence of correct scaling could affect the analysis if either
   multiple volumes were provided to the NiftiDataset constructor, or
   multiple datasets were concatenated while being loaded from files
   with different scaling parameters. In the latter case, the
   "incorrect scaling" could have no impact if the loaded files had
   different scaling/intercept but there was just 1 file per chunk, and
   per-chunk zscoring was carried out to preprocess the data (thus
   different scalings and offsets among chunks were eliminated).
U2 rely on post-analysis of spatial maps (e.g. accuracies or
   sensitivities) obtained with PyMVPA and saved with the map2nifti
   function to be processed elsewhere, IF AND ONLY IF the original data
   came from files with the scaling/intercept fields set to something
   other than 1 or 0 for scaling, or other than 0 for intercept in the
   case of non-0 scaling.
   In that case data was saved to NIfTI files without any scaling
   applied, but with the original scaling/intercept parameters coming
   from the original dataset/NIfTI file. Consequently, they could have
   been applied by the software used to load those NIfTI files (with
   incoherent scaling/intercept) for post-analysis, thus possibly
   producing incorrect results.
What could have been "the effect" (once again, only if your files
originally had scl_slope/scl_inter defined):
U1 Not accounting for scale and offset could be considered as 'volume
normalization' if scaling/intercept were consistently computed by
the original software based on the dynamic range of the data.
That in turn affects all the voxels in the volume, possibly
introducing or ruining the effect (depending on the effect and the
analysis) in every voxel taken separately.
   If you did not care about per-feature/voxel effects and just did
   full-brain classification and
   * haven't got results before -- now you might get them
   * have got classification results -- you might lose them with
     scaling of the data upon load, but should get them back by
     normalizing each volume into the same range of values (e.g. from
     -1 to 1; see the sketch after this list)
   If you cared about per-feature/voxel sensitivities or searchlight
   accuracies, or loaded a masked ROI -- your results (if you got them)
   might not be quite correct, since they could have been affected by
   the values of the other voxels, possibly not even from the masked
   ROI. Those analyses would need to be recomputed using PyMVPA 0.4.7
   (or 0.6.x), or after scaling the data within the files so that no
   scaling needs to be applied while loading the data into PyMVPA.
U2 If it was a full-brain analysis, as in the case of Nynke van der
   Laan, the effects were global and could be spotted easily (unless
   the data was initially masked to an ROI already known to carry the
   effects of interest). Otherwise, you could get a consistent
   bias/scaling in your ROI without detecting that it is invalid.
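For the per-volume normalization mentioned under U1, a toy NumPy sketch
(a generic recipe, not a specific PyMVPA call; 'data' stands for a
volumes-by-voxels array):

    import numpy as np
    def to_unit_range(data):
        # rescale every volume (row) into [-1, 1] independently
        mins = data.min(axis=1)[:, None]
        spans = data.max(axis=1)[:, None] - mins
        spans[spans == 0] = 1.0          # guard against flat volumes
        return 2.0 * (data - mins) / spans - 1.0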
0.4.7 release
=============
of PyMVPA addresses both issues:
U1 if the NIfTI header specifies scaling -- it gets applied to the data
   by default upon loading (unless the new constructor argument
   `scale_data` is explicitly set to False)
U2 upon map2nifti, both scl_slope and scl_inter are reset to 1.0 and
   0.0 accordingly, to reflect that no scaling is needed for the
   provided data.
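For illustration, the two behaviours under 0.4.7 would look roughly
like this (a sketch; the filenames and the labels variable are
placeholders, only the scale_data argument is the new bit):

    from mvpa.suite import NiftiDataset
    # default: scl_slope/scl_inter from the header get applied to the data
    ds = NiftiDataset(samples='bold.nii.gz', labels=labels, mask='mask.nii.gz')
    # previous (raw) behaviour, if you explicitly want unscaled values
    ds_raw = NiftiDataset(samples='bold.nii.gz', labels=labels,
                          mask='mask.nii.gz', scale_data=False)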
0.4.7 should be available from http://neuro.debian.net for Debian-based
systems (thus apt-get update; apt-get install python-mvpa); the
Windows installer and source tarball are available from
https://alioth.debian.org/frs/?group_id=30954, and the GIT repositories
have been updated and tagged.
How to check the files
======================
We hope that not many of you actually got affected by this issue,
but we advise you to verify your data -- just inspect the headers of
your files for set scl_* fields:
* if scl_slope == 0.0 -- you are ok,
  since per the NIfTI standard scaling/offset should be applied only if
  scl_slope != 0
* if scl_slope == 1.0 and scl_inter == 0.0 -- you are ok
Otherwise, it depends on whether those scales/intercepts were the same
across all files, and on your analysis and how you load/preprocess the
data (see above).
To check the headers of your NIfTI files you can use any available
tool (e.g. fslhd from FSL, or nifti_tool from the NIfTI toolset), or
just Python with PyNifti, e.g. from the command line, to visualize them
for /tmp/1.nii.gz:
python -c "import sys, nifti as n; h=n.NiftiImage(sys.argv[1]).header; print h['scl_slope'], h['scl_inter']" /tmp/1.nii.gz
and to convert a file with scaling parameters into an already-scaled
version with scl_slope and scl_inter reset to 1.0 and 0.0
(getScaledData() is what actually applies the scaling before the data
gets written out), e.g.:
python -c "import sys, nifti as n; ni=n.NiftiImage(sys.argv[1]); h=ni.header; data=ni.getScaledData(); h['scl_slope'] = 1.0; h['scl_inter'] = 0; n.NiftiImage(data, h).save(sys.argv[2])" ./EEGtemplates/iskull.nii /tmp/iskull.nii
[1] http://nifti.nimh.nih.gov/nifti-1/documentation/Analyze_usage.pdf
With best regards,
--
PyMVPA Team