[med-svn] [Git][med-team/heudiconv][upstream] New upstream version 0.8.0
Yaroslav Halchenko
gitlab@salsa.debian.org
Mon Apr 20 15:45:18 BST 2020
Yaroslav Halchenko pushed to branch upstream at Debian Med / heudiconv
Commits:
255bc455 by Yaroslav Halchenko at 2020-04-20T08:57:30-04:00
New upstream version 0.8.0
- - - - -
27 changed files:
- .travis.yml
- CHANGELOG.md
- + Makefile
- README.rst
- docs/conf.py
- docs/heuristics.rst
- docs/installation.rst
- docs/tutorials.rst
- docs/usage.rst
- heudiconv/bids.py
- heudiconv/cli/run.py
- heudiconv/convert.py
- heudiconv/dicoms.py
- heudiconv/external/dlad.py
- heudiconv/external/tests/test_dlad.py
- heudiconv/heuristics/reproin.py
- heudiconv/heuristics/reproin_validator.cfg
- heudiconv/heuristics/test_reproin.py
- heudiconv/info.py
- heudiconv/parser.py
- + heudiconv/tests/data/phantom.dcm
- heudiconv/tests/test_dicoms.py
- heudiconv/tests/test_main.py
- heudiconv/tests/test_regression.py
- heudiconv/tests/utils.py
- heudiconv/utils.py
- + utils/prep_release
Changes:
=====================================
.travis.yml
=====================================
@@ -1,10 +1,10 @@
# vim ft=yaml
language: python
python:
- - 2.7
- 3.5
- 3.6
- 3.7
+ - 3.8
cache:
- apt
=====================================
CHANGELOG.md
=====================================
@@ -4,6 +4,62 @@ All notable changes to this project will be documented (for humans) in this file
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
+## [0.8.0] - 2020-04-15
+
+### Enhancements
+
+- Centralized saving of .json files. Indentation of some files could
+ change now from previous versions where it could have used `3`
+ spaces. Now indentation should be consistently `2` for .json files
+ we produce/modify ([#436][]) (note: dcm2niix uses tabs for indentation)
+- ReproIn heuristic: support SBRef and phase data ([#387][])
+- Set the "TaskName" field in .json sidecar files for multi-echo data
+ ([#420][])
+- Provide an informative exception if command needs heuristic to be
+ specified ([#437][])
+
+### Refactored
+
+- `embed_nifti` was refactored into `embed_dicom_and_nifti_metadata`,
+  which no longer creates a `.nii` file if one does not exist
+  already ([#432][])
+
+### Fixed
+
+- Skip datalad-based tests if no datalad available ([#430][])
+- Search heuristic file path first so we do not pick up a python
+ module if name conflicts ([#434][])
+
+## [0.7.0] - 2020-03-20
+
+### Removed
+
+- Python 2 support/testing
+
+### Enhancement
+
+- `-g` option gained two new modes: `all` and `custom`. In the case of `all`,
+  all provided DICOMs will be treated as coming from a single scanning session.
+  `custom` instructs to use the `.grouping` value (could be a DICOM attribute or
+  a callable) provided by the heuristic ([#359][]).
+- Stop before reading pixel data while gathering metadata from DICOMs ([#404][])
+- reproin heuristic:
+  - In addition to the original "md5sum of the study_description" keys,
+    `protocols2fix` can now contain (applied after the md5sum-matching ones)
+    1) a regular expression searched for in study_description,
+    2) an empty string as a "catch all".
+    This feature could be used to easily provide remapping into reproin
+    naming (documentation is to come to http://github.com/ReproNim/reproin)
+    ([#425][])
+
+### Fixed
+
+- Use nan, not None for absent echo value in sorting
+- reproin heuristic: cast seqinfos into a list to be able to modify it from
+  an overloaded heuristic ([#419][])
+- No spurious errors from the logger upon a warning about `etelemetry`
+ absence ([#407][])
+
## [0.6.0] - 2019-12-16
This is largely a bug fix. Metadata and order of `_key-value` fields in BIDS
@@ -271,6 +327,7 @@ TODO Summary
[#348]: https://github.com/nipy/heudiconv/issues/348
[#351]: https://github.com/nipy/heudiconv/issues/351
[#352]: https://github.com/nipy/heudiconv/issues/352
+[#359]: https://github.com/nipy/heudiconv/issues/359
[#360]: https://github.com/nipy/heudiconv/issues/360
[#364]: https://github.com/nipy/heudiconv/issues/364
[#369]: https://github.com/nipy/heudiconv/issues/369
@@ -279,4 +336,15 @@ TODO Summary
[#376]: https://github.com/nipy/heudiconv/issues/376
[#379]: https://github.com/nipy/heudiconv/issues/379
[#380]: https://github.com/nipy/heudiconv/issues/380
+[#387]: https://github.com/nipy/heudiconv/issues/387
[#390]: https://github.com/nipy/heudiconv/issues/390
+[#404]: https://github.com/nipy/heudiconv/issues/404
+[#407]: https://github.com/nipy/heudiconv/issues/407
+[#419]: https://github.com/nipy/heudiconv/issues/419
+[#420]: https://github.com/nipy/heudiconv/issues/420
+[#425]: https://github.com/nipy/heudiconv/issues/425
+[#430]: https://github.com/nipy/heudiconv/issues/430
+[#432]: https://github.com/nipy/heudiconv/issues/432
+[#434]: https://github.com/nipy/heudiconv/issues/434
+[#436]: https://github.com/nipy/heudiconv/issues/436
+[#437]: https://github.com/nipy/heudiconv/issues/437
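The centralized .json saving noted at the top of the 0.8.0 entries is what the `heudiconv/bids.py` hunks below implement: callers stop passing `indent` themselves and defer to one helper. A minimal sketch of the idea, assuming a helper along the lines of heudiconv's `save_json` (signature illustrative, not necessarily the exact upstream one)::

    import json

    def save_json(filename, data, indent=2, sort_keys=True):
        """Single choke point for .json output so every file we
        produce/modify gets the same, consistent indentation."""
        with open(filename, 'w') as f:
            json.dump(data, f, indent=indent, sort_keys=sort_keys)

    # callers no longer decide on indentation per call site:
    save_json('task-rest_bold.json', {'TaskName': 'rest'})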
=====================================
Makefile
=====================================
@@ -0,0 +1,6 @@
+all:
+ echo 'nothing by default'
+
+prep_release:
+ # take previous one, and replace with the next one
+ utils/prep_release
=====================================
README.rst
=====================================
@@ -4,7 +4,7 @@
`a heuristic-centric DICOM converter`
-.. image:: https://img.shields.io/badge/docker-nipy/heudiconv:0.5.4-brightgreen.svg?logo=docker&style=flat
+.. image:: https://img.shields.io/badge/docker-nipy/heudiconv:latest-brightgreen.svg?logo=docker&style=flat
:target: https://hub.docker.com/r/nipy/heudiconv/tags/
:alt: Our Docker image
=====================================
docs/conf.py
=====================================
@@ -26,7 +26,7 @@ author = 'Heudiconv team'
# The short X.Y version
version = ''
# The full version, including alpha/beta/rc tags
-release = '0.6.0'
+release = '0.8.0'
# -- General configuration ---------------------------------------------------
=====================================
docs/heuristics.rst
=====================================
@@ -68,3 +68,20 @@ DICOMs where this function returns ``True`` will be filtered out.
Further processing on ``seqinfos`` to deduce/customize subject, session, and locator.
A dictionary of {"locator": locator, "session": session, "subject": subject} is returned.
+
+---------------------------------------------------------------
+``grouping`` string or ``grouping(files, dcmfilter, seqinfo)``
+---------------------------------------------------------------
+
+Whenever ``--grouping custom`` (``-g custom``) is used, this attribute or callable
+will be used to inform how to group the DICOMs into separate groups. From
+`original PR#359 <https://github.com/nipy/heudiconv/pull/359>`_::
+
+ grouping = 'AcquisitionDate'
+
+or::
+
+ def grouping(files, dcmfilter, seqinfo):
+ seqinfos = collections.OrderedDict()
+ ...
+ return seqinfos # ordered dict containing seqinfo objects: list of DICOMs
\ No newline at end of file
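A slightly fuller sketch of the callable form: per the signature above it receives the DICOM file names, the heuristic's ``filter_dicom`` callable (or ``None``), and the ``SeqInfo`` class, and must return an ordered mapping of groups to file lists. The ``AcquisitionDate`` key and the direct use of ``pydicom`` below are illustrative only::

    import collections
    import pydicom

    def grouping(files, dcmfilter, seqinfo):
        # bucket files by acquisition date, honoring the heuristic's
        # DICOM filter (a True return means "ignore this file")
        buckets = collections.defaultdict(list)
        for f in files:
            d = pydicom.dcmread(f, stop_before_pixels=True)
            if dcmfilter is not None and dcmfilter(d):
                continue
            buckets[d.get('AcquisitionDate', '')].append(f)
        # a real heuristic would build seqinfo objects as the keys;
        # the required shape is OrderedDict(seqinfo -> list of DICOMs)
        return collections.OrderedDict(sorted(buckets.items()))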
=====================================
docs/installation.rst
=====================================
@@ -26,7 +26,7 @@ If `Docker <https://docs.docker.com/install/>`_ is available on your system, you
can visit `our page on Docker Hub <https://hub.docker.com/r/nipy/heudiconv/tags>`_
to view available releases. To pull the latest release, run::
- $ docker pull nipy/heudiconv:0.6.0
+ $ docker pull nipy/heudiconv:0.8.0
Singularity
@@ -35,4 +35,4 @@ If `Singularity <https://www.sylabs.io/singularity/>`_ is available on your syst
you can use it to pull and convert our Docker images! For example, to pull and
build the latest release, you can run::
- $ singularity pull docker://nipy/heudiconv:0.6.0
+ $ singularity pull docker://nipy/heudiconv:0.8.0
=====================================
docs/tutorials.rst
=====================================
@@ -7,7 +7,7 @@ other users' tutorials covering their experience with ``heudiconv``.
- `YouTube tutorial <https://www.youtube.com/watch?v=O1kZAuR7E00>`_ by `James Kent <https://github.com/jdkent>`_.
-- `Walkthrough <http://reproducibility.stanford.edu/bids-tutorial-series-part-2a/>`_ by the `Standard Center for Reproducible Neuroscience <http://reproducibility.stanford.edu/>`_.
+- `Walkthrough <http://reproducibility.stanford.edu/bids-tutorial-series-part-2a/>`_ by the `Stanford Center for Reproducible Neuroscience <http://reproducibility.stanford.edu/>`_.
- `U of A Neuroimaging Core <https://neuroimaging-core-docs.readthedocs.io/en/latest/pages/heudiconv.html>`_ by `Dianne Patterson <https://github.com/dkp>`_.
=====================================
docs/usage.rst
=====================================
@@ -82,7 +82,7 @@ The second script processes a DICOM directory with ``heudiconv`` using the built
DCMDIR=${DCMDIRS[${SLURM_ARRAY_TASK_ID}]}
echo Submitted directory: ${DCMDIR}
- IMG="/singularity-images/heudiconv-0.6.0-dev.sif"
+ IMG="/singularity-images/heudiconv-0.8.0-dev.sif"
CMD="singularity run -B ${DCMDIR}:/dicoms:ro -B ${OUTDIR}:/output -e ${IMG} --files /dicoms/ -o /output -f reproin -c dcm2niix -b notop --minmeta -l ."
printf "Command:\n${CMD}\n"
@@ -97,7 +97,7 @@ This script creates the top-level bids files (e.g.,
set -eu
OUTDIR=${1}
- IMG="/singularity-images/heudiconv-0.6.0-dev.sif"
+ IMG="/singularity-images/heudiconv-0.8.0-dev.sif"
CMD="singularity run -B ${OUTDIR}:/output -e ${IMG} --files /output -f reproin --command populate-templates"
printf "Command:\n${CMD}\n"
=====================================
heudiconv/bids.py
=====================================
@@ -171,7 +171,7 @@ def populate_aggregated_jsons(path):
act = "Generating"
lgr.debug("%s %s", act, task_file)
fields.update(placeholders)
- save_json(task_file, fields, indent=2, sort_keys=True, pretty=True)
+ save_json(task_file, fields, sort_keys=True, pretty=True)
def tuneup_bids_json_files(json_files):
@@ -193,7 +193,7 @@ def tuneup_bids_json_files(json_files):
# Let's hope no word 'Date' comes within a study name or smth like
# that
raise ValueError("There must be no dates in .json sidecar")
- save_json(jsonfile, json_, indent=2)
+ save_json(jsonfile, json_)
# Load the beast
seqtype = op.basename(op.dirname(jsonfile))
@@ -223,7 +223,7 @@ def tuneup_bids_json_files(json_files):
was_readonly = is_readonly(json_phasediffname)
if was_readonly:
set_readonly(json_phasediffname, False)
- save_json(json_phasediffname, json_, indent=2)
+ save_json(json_phasediffname, json_)
if was_readonly:
set_readonly(json_phasediffname)
@@ -259,8 +259,7 @@ def add_participant_record(studydir, subject, age, sex):
("Description", "(TODO: adjust - by default everyone is in "
"control group)")])),
]),
- sort_keys=False,
- indent=2)
+ sort_keys=False)
# Add a new participant
with open(participants_tsv, 'a') as f:
f.write(
@@ -373,8 +372,7 @@ def add_rows_to_scans_keys_file(fn, newrows):
("LongName", "Random string"),
("Description", "md5 hash of UIDs")])),
]),
- sort_keys=False,
- indent=2)
+ sort_keys=False)
header = ['filename', 'acq_time', 'operator', 'randstr']
# prepare all the data rows
=====================================
heudiconv/cli/run.py
=====================================
@@ -62,6 +62,7 @@ def process_extra_commands(outdir, args):
for f in args.files:
treat_infofile(f)
elif args.command == 'ls':
+ ensure_heuristic_arg(args)
heuristic = load_heuristic(args.heuristic)
heuristic_ls = getattr(heuristic, 'ls', None)
for f in args.files:
@@ -78,6 +79,7 @@ def process_extra_commands(outdir, args):
% (str(study_session), len(sequences), suf)
)
elif args.command == 'populate-templates':
+ ensure_heuristic_arg(args)
heuristic = load_heuristic(args.heuristic)
for f in args.files:
populate_bids_templates(f, getattr(heuristic, 'DEFAULT_FIELDS', {}))
@@ -88,16 +90,21 @@ def process_extra_commands(outdir, args):
for name_desc in get_known_heuristics_with_descriptions().items():
print("- %s: %s" % name_desc)
elif args.command == 'heuristic-info':
- from ..utils import get_heuristic_description, get_known_heuristic_names
- if not args.heuristic:
- raise ValueError("Specify heuristic using -f. Known are: %s"
- % ', '.join(get_known_heuristic_names()))
+ ensure_heuristic_arg(args)
+ from ..utils import get_heuristic_description
print(get_heuristic_description(args.heuristic, full=True))
else:
raise ValueError("Unknown command %s", args.command)
return
+def ensure_heuristic_arg(args):
+ from ..utils import get_known_heuristic_names
+ if not args.heuristic:
+ raise ValueError("Specify heuristic using -f. Known are: %s"
+ % ', '.join(get_known_heuristic_names()))
+
+
def main(argv=None):
parser = get_parser()
args = parser.parse_args(argv)
@@ -124,7 +131,6 @@ def main(argv=None):
if args.debug:
setup_exceptionhook()
-
process_args(args)
@@ -154,8 +160,7 @@ def get_parser():
'If not provided, DICOMS would first be "sorted" and '
'subject IDs deduced by the heuristic')
parser.add_argument('-c', '--converter',
- default='dcm2niix',
- choices=('dcm2niix', 'none'),
+ choices=('dcm2niix', 'none'), default='dcm2niix',
help='tool to use for DICOM conversion. Setting to '
'"none" disables the actual conversion step -- useful'
'for testing heuristics.')
@@ -219,7 +224,7 @@ def get_parser():
help='custom actions to be performed on provided '
'files instead of regular operation.')
parser.add_argument('-g', '--grouping', default='studyUID',
- choices=('studyUID', 'accession_number'),
+ choices=('studyUID', 'accession_number', 'all', 'custom'),
help='How to group dicoms (default: by studyUID)')
parser.add_argument('--minmeta', action='store_true',
help='Exclude dcmstack meta information in sidecar '
@@ -248,11 +253,11 @@ def process_args(args):
outdir = op.abspath(args.outdir)
- import etelemetry
try:
+ import etelemetry
latest = etelemetry.get_project("nipy/heudiconv")
except Exception as e:
- lgr.warning("Could not check for version updates: ", e)
+ lgr.warning("Could not check for version updates: %s", str(e))
latest = {"version": 'Unknown'}
lgr.info(INIT_MSG(packname=__packagename__,
@@ -343,7 +348,8 @@ def process_args(args):
seqinfo=seqinfo,
min_meta=args.minmeta,
overwrite=args.overwrite,
- dcmconfig=args.dcmconfig,)
+ dcmconfig=args.dcmconfig,
+ grouping=args.grouping,)
lgr.info("PROCESSING DONE: {0}".format(
str(dict(subject=sid, outdir=study_outdir, session=session))))
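Two robustness fixes stand out in this file: the heuristic check is factored out into ``ensure_heuristic_arg``, and the ``etelemetry`` import moves inside the ``try`` block so a missing package degrades to a warning instead of a crash. The latter pattern, condensed into a standalone sketch (helper name made up)::

    import logging

    lgr = logging.getLogger(__name__)

    def latest_version_info():
        try:
            import etelemetry  # imported lazily: absence is non-fatal
            return etelemetry.get_project("nipy/heudiconv")
        except Exception as e:
            # lazy %-formatting, matching the corrected lgr.warning call
            lgr.warning("Could not check for version updates: %s", str(e))
            return {"version": "Unknown"}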
=====================================
heudiconv/convert.py
=====================================
@@ -2,8 +2,10 @@ import filelock
import os
import os.path as op
import logging
+from math import nan
import shutil
import sys
+import re
from .utils import (
read_config,
@@ -72,8 +74,7 @@ def conversion_info(subject, outdir, info, filegroup, ses):
try:
files = filegroup[item]
except KeyError:
- PY3 = sys.version_info[0] >= 3
- files = filegroup[(str if PY3 else unicode)(item)]
+ files = filegroup[str(item)]
outprefix = template.format(**parameters)
convert_info.append((op.join(outpath, outprefix),
outtype, files))
@@ -81,8 +82,8 @@ def conversion_info(subject, outdir, info, filegroup, ses):
def prep_conversion(sid, dicoms, outdir, heuristic, converter, anon_sid,
- anon_outdir, with_prov, ses, bids_options, seqinfo, min_meta,
- overwrite, dcmconfig):
+ anon_outdir, with_prov, ses, bids_options, seqinfo,
+ min_meta, overwrite, dcmconfig, grouping):
if dicoms:
lgr.info("Processing %d dicoms", len(dicoms))
elif seqinfo:
@@ -158,16 +159,17 @@ def prep_conversion(sid, dicoms, outdir, heuristic, converter, anon_sid,
# So either it would need to be brought back or reconsidered altogether
# (since no sample data to test on etc)
else:
- # TODO -- might have been done outside already!
- # MG -- will have to try with both dicom template, files
assure_no_file_exists(target_heuristic_filename)
safe_copyfile(heuristic.filename, target_heuristic_filename)
if dicoms:
seqinfo = group_dicoms_into_seqinfos(
dicoms,
+ grouping,
file_filter=getattr(heuristic, 'filter_files', None),
dcmfilter=getattr(heuristic, 'filter_dicom', None),
- grouping=None)
+ flatten=True,
+ custom_grouping=getattr(heuristic, 'grouping', None))
+
seqinfo_list = list(seqinfo.keys())
filegroup = {si.series_id: x for si, x in seqinfo.items()}
dicominfo_file = op.join(idir, 'dicominfo%s.tsv' % ses_suffix)
@@ -322,16 +324,18 @@ def convert(items, converter, scaninfo_suffix, custom_callable, with_prov,
% (outname)
)
+ # add the taskname field to the json file(s):
+ add_taskname_to_infofile(bids_outfiles)
+
if len(bids_outfiles) > 1:
lgr.warning("For now not embedding BIDS and info generated "
".nii.gz itself since sequence produced "
"multiple files")
elif not bids_outfiles:
lgr.debug("No BIDS files were produced, nothing to embed to then")
- elif outname:
+ elif outname and not min_meta:
embed_metadata_from_dicoms(bids_options, item_dicoms, outname, outname_bids,
- prov_file, scaninfo, tempdirs, with_prov,
- min_meta)
+ prov_file, scaninfo, tempdirs, with_prov)
if scaninfo and op.exists(scaninfo):
lgr.info("Post-treating %s file", scaninfo)
treat_infofile(scaninfo)
@@ -517,6 +521,8 @@ def save_converted_files(res, item_dicoms, bids_options, outtype, prefix, outnam
bids_files = (sorted(res.outputs.bids)
if len(res.outputs.bids) == len(res_files)
else [None] * len(res_files))
+ # preload since will be used in multiple spots
+ bids_metas = [load_json(b) for b in bids_files if b]
### Do we have a multi-echo series? ###
# Some Siemens sequences (e.g. CMRR's MB-EPI) set the label 'TE1',
@@ -530,19 +536,17 @@ def save_converted_files(res, item_dicoms, bids_options, outtype, prefix, outnam
# Check for varying echo times
echo_times = sorted(list(set(
- load_json(b).get('EchoTime', None)
- for b in bids_files
+ b.get('EchoTime', nan)
+ for b in bids_metas
if b
)))
is_multiecho = len(echo_times) > 1
### Loop through the bids_files, set the output name and save files
- for fl, suffix, bids_file in zip(res_files, suffixes, bids_files):
+ for fl, suffix, bids_file, bids_meta in zip(res_files, suffixes, bids_files, bids_metas):
# TODO: monitor conversion duration
- if bids_file:
- fileinfo = load_json(bids_file)
# set the prefix basename for this specific file (we'll modify it,
# and we don't want to modify it for all the bids_files):
@@ -551,11 +555,18 @@ def save_converted_files(res, item_dicoms, bids_options, outtype, prefix, outnam
# _sbref sequences reconstructing magnitude and phase generate
# two NIfTI files IN THE SAME SERIES, so we cannot just add
# the suffix, if we want to be bids compliant:
- if bids_file and this_prefix_basename.endswith('_sbref'):
+ if bids_meta and this_prefix_basename.endswith('_sbref') \
+ and len(suffixes) > len(echo_times):
+ if len(suffixes) != len(echo_times)*2:
+ lgr.warning(
+ "Got %d suffixes for %d echo times, which isn't "
+ "multiple of two as if it was magnitude + phase pairs",
+ len(suffixes), len(echo_times)
+ )
# Check to see if it is magnitude or phase reconstruction:
- if 'M' in fileinfo.get('ImageType'):
+ if 'M' in bids_meta.get('ImageType'):
mag_or_phase = 'magnitude'
- elif 'P' in fileinfo.get('ImageType'):
+ elif 'P' in bids_meta.get('ImageType'):
mag_or_phase = 'phase'
else:
mag_or_phase = suffix
@@ -584,12 +595,12 @@ def save_converted_files(res, item_dicoms, bids_options, outtype, prefix, outnam
# (Note: it can be _sbref and multiecho, so don't use "elif"):
# For multi-echo sequences, we have to specify the echo number in
# the file name:
- if bids_file and is_multiecho:
+ if bids_meta and is_multiecho:
# Get the EchoNumber from json file info. If not present, use EchoTime
- if 'EchoNumber' in fileinfo.keys():
- echo_number = fileinfo['EchoNumber']
+ if 'EchoNumber' in bids_meta:
+ echo_number = bids_meta['EchoNumber']
else:
- echo_number = echo_times.index(fileinfo['EchoTime']) + 1
+ echo_number = echo_times.index(bids_meta['EchoTime']) + 1
supported_multiecho = ['_bold', '_phase', '_epi', '_sbref', '_T1w', '_PDT2']
# Now, decide where to insert it.
@@ -629,3 +640,32 @@ def save_converted_files(res, item_dicoms, bids_options, outtype, prefix, outnam
except TypeError as exc: ##catch lists
raise TypeError("Multiple BIDS sidecars detected.")
return bids_outfiles
+
+
+def add_taskname_to_infofile(infofiles):
+ """Add the "TaskName" field to json files corresponding to func images.
+
+ Parameters
+ ----------
+ infofiles : list with json filenames or single filename
+
+ Returns
+ -------
+ """
+
+ # in case they pass a string with a path:
+ if not isinstance(infofiles, list):
+ infofiles = [infofiles]
+
+ for infofile in infofiles:
+ meta_info = load_json(infofile)
+ try:
+ meta_info['TaskName'] = (re.search('(?<=_task-)\w+',
+ op.basename(infofile))
+ .group(0).split('_')[0])
+ except AttributeError:
+ lgr.warning("Failed to find task field in {0}.".format(infofile))
+ continue
+
+ # write to outfile
+ save_json(infofile, meta_info)
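The new ``add_taskname_to_infofile`` relies on a single lookbehind regex plus a trailing split; a quick standalone check of why both parts are needed (filename made up)::

    import re

    fname = 'sub-01_ses-01_task-rest_run-1_bold.json'
    match = re.search(r'(?<=_task-)\w+', fname)
    # \w also matches underscores, so group(0) is 'rest_run' here;
    # the .split('_')[0] in the code trims it back to the task label
    assert match.group(0) == 'rest_run'
    assert match.group(0).split('_')[0] == 'rest'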
=====================================
heudiconv/dicoms.py
=====================================
@@ -6,25 +6,162 @@ from collections import OrderedDict
import tarfile
from .external.pydicom import dcm
-from .utils import SeqInfo, load_json, set_readonly
+from .utils import (
+ get_typed_attr,
+ load_json,
+ save_json,
+ SeqInfo,
+ set_readonly,
+)
+
+import warnings
+with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ # suppress warning
+ import nibabel.nicom.dicomwrappers as dw
lgr = logging.getLogger(__name__)
+total_files = 0
-def group_dicoms_into_seqinfos(files, file_filter, dcmfilter, grouping):
+
+def create_seqinfo(mw, series_files, series_id):
+ """Generate sequence info
+
+ Parameters
+ ----------
+ mw: MosaicWrapper
+ series_files: list
+ series_id: str
+ """
+ dcminfo = mw.dcm_data
+ accession_number = dcminfo.get('AccessionNumber')
+
+ # TODO: do not group echoes by default
+ size = list(mw.image_shape) + [len(series_files)]
+ if len(size) < 4:
+ size.append(1)
+
+ # parse DICOM for seqinfo fields
+ TR = get_typed_attr(dcminfo, "RepetitionTime", float, -1000) / 1000
+ TE = get_typed_attr(dcminfo, "EchoTime", float, -1)
+ refphys = get_typed_attr(dcminfo, "ReferringPhysicianName", str, "")
+ image_type = get_typed_attr(dcminfo, "ImageType", tuple, ())
+ is_moco = 'MOCO' in image_type
+ series_desc = get_typed_attr(dcminfo, "SeriesDescription", str, "")
+
+ if dcminfo.get([0x18, 0x24]):
+ # GE and Philips
+ sequence_name = dcminfo[0x18, 0x24].value
+ elif dcminfo.get([0x19, 0x109c]):
+ # Siemens
+ sequence_name = dcminfo[0x19, 0x109c].value
+ else:
+ sequence_name = ""
+
+ # initialized in `group_dicoms_into_seqinfos`
+ global total_files
+ total_files += len(series_files)
+
+ seqinfo = SeqInfo(
+ total_files_till_now=total_files,
+ example_dcm_file=op.basename(series_files[0]),
+ series_id=series_id,
+ dcm_dir_name=op.basename(op.dirname(series_files[0])),
+ series_files=len(series_files),
+ unspecified="",
+ dim1=size[0],
+ dim2=size[1],
+ dim3=size[2],
+ dim4=size[3],
+ TR=TR,
+ TE=TE,
+ protocol_name=dcminfo.ProtocolName,
+ is_motion_corrected=is_moco,
+ is_derived='derived' in [x.lower() for x in image_type],
+ patient_id=dcminfo.get('PatientID'),
+ study_description=dcminfo.get('StudyDescription'),
+ referring_physician_name=refphys,
+ series_description=series_desc,
+ sequence_name=sequence_name,
+ image_type=image_type,
+ accession_number=accession_number,
+ # For demographics to populate BIDS participants.tsv
+ patient_age=dcminfo.get('PatientAge'),
+ patient_sex=dcminfo.get('PatientSex'),
+ date=dcminfo.get('AcquisitionDate'),
+ series_uid=dcminfo.get('SeriesInstanceUID')
+ )
+ return seqinfo
+
+
+def validate_dicom(fl, dcmfilter):
+ """
+ Parse DICOM attributes. Returns None if not valid.
+ """
+ mw = dw.wrapper_from_file(fl, force=True, stop_before_pixels=True)
+ # clean series signature
+ for sig in ('iop', 'ICE_Dims', 'SequenceName'):
+ try:
+ del mw.series_signature[sig]
+ except KeyError:
+ pass
+ # Workaround for protocol name in private siemens csa header
+ if not getattr(mw.dcm_data, 'ProtocolName', '').strip():
+ mw.dcm_data.ProtocolName = parse_private_csa_header(
+ mw.dcm_data, 'ProtocolName', 'tProtocolName'
+ ) if mw.is_csa else ''
+ try:
+ series_id = (
+ int(mw.dcm_data.SeriesNumber), mw.dcm_data.ProtocolName
+ )
+ except AttributeError as e:
+ lgr.warning(
+ 'Ignoring %s since not quite a "normal" DICOM: %s', fl, e
+ )
+ return
+ if dcmfilter is not None and dcmfilter(mw.dcm_data):
+ lgr.warning("Ignoring %s because of DICOM filter", fl)
+ return
+ if mw.dcm_data[0x0008, 0x0016].repval in (
+ 'Raw Data Storage',
+ 'GrayscaleSoftcopyPresentationStateStorage'
+ ):
+ return
+ try:
+ file_studyUID = mw.dcm_data.StudyInstanceUID
+ except AttributeError:
+ lgr.info("File {} is missing any StudyInstanceUID".format(fl))
+ file_studyUID = None
+ return mw, series_id, file_studyUID
+
+
+def group_dicoms_into_seqinfos(files, grouping, file_filter=None,
+ dcmfilter=None, flatten=False,
+ custom_grouping=None):
"""Process list of dicoms and return seqinfo and file group
`seqinfo` contains per-sequence extract of fields from DICOMs which
will be later provided into heuristics to decide on filenames
+
Parameters
----------
files : list of str
List of files to consider
+ grouping : {'studyUID', 'accession_number', 'all', 'custom'}
+ How to group DICOMs for conversion. If 'custom', see `custom_grouping`
+ parameter.
file_filter : callable, optional
Applied to each item of filenames. Should return True if file needs to be
kept, False otherwise.
dcmfilter : callable, optional
If called on dcm_data and returns True, it is used to set series_id
- grouping : {'studyUID', 'accession_number', None}, optional
- what to group by: studyUID or accession_number
+ flatten : bool, optional
+ Creates a flattened `seqinfo` with corresponding DICOM files. True when
+ invoked with `dicom_dir_template`.
+ custom_grouping: str or callable, optional
+ Grouping key defined within the heuristic. Can be the name of a
+ DICOM attribute, or a method that handles more complex groupings.
+
+
Returns
-------
seqinfo : list of list
@@ -33,101 +170,74 @@ def group_dicoms_into_seqinfos(files, file_filter, dcmfilter, grouping):
filegrp : dict
`filegrp` is a dictionary with files grouped per sequence
"""
- allowed_groupings = ['studyUID', 'accession_number', None]
+ allowed_groupings = ['studyUID', 'accession_number', 'all', 'custom']
if grouping not in allowed_groupings:
raise ValueError('I do not know how to group by {0}'.format(grouping))
per_studyUID = grouping == 'studyUID'
- per_accession_number = grouping == 'accession_number'
+ # per_accession_number = grouping == 'accession_number'
lgr.info("Analyzing %d dicoms", len(files))
groups = [[], []]
mwgroup = []
-
studyUID = None
- # for sanity check that all DICOMs came from the same
- # "study". If not -- what is the use-case? (interrupted acquisition?)
- # and how would then we deal with series numbers
- # which would differ already
+
if file_filter:
nfl_before = len(files)
files = list(filter(file_filter, files))
nfl_after = len(files)
lgr.info('Filtering out {0} dicoms based on their filename'.format(
nfl_before-nfl_after))
- for fidx, filename in enumerate(files):
- import nibabel.nicom.dicomwrappers as dw
- # TODO after getting a regression test check if the same behavior
- # with stop_before_pixels=True
- mw = dw.wrapper_from_data(dcm.read_file(filename, force=True))
-
- for sig in ('iop', 'ICE_Dims', 'SequenceName'):
- try:
- del mw.series_signature[sig]
- except:
- pass
-
- try:
- file_studyUID = mw.dcm_data.StudyInstanceUID
- except AttributeError:
- lgr.info("File {} is missing any StudyInstanceUID".format(filename))
- file_studyUID = None
-
- # Workaround for protocol name in private siemens csa header
- try:
- mw.dcm_data.ProtocolName
- except AttributeError:
- if not getattr(mw.dcm_data, 'ProtocolName', '').strip():
- mw.dcm_data.ProtocolName = parse_private_csa_header(
- mw.dcm_data, 'ProtocolName', 'tProtocolName'
- ) if mw.is_csa else ''
- try:
- series_id = (int(mw.dcm_data.SeriesNumber),
- mw.dcm_data.ProtocolName)
- file_studyUID = mw.dcm_data.StudyInstanceUID
+ if grouping == 'custom':
+ if custom_grouping is None:
+ raise RuntimeError("Custom grouping is not defined in heuristic")
+ if callable(custom_grouping):
+ return custom_grouping(files, dcmfilter, SeqInfo)
+ grouping = custom_grouping
+ study_customgroup = None
+
+ removeidx = []
+ for idx, filename in enumerate(files):
+ mwinfo = validate_dicom(filename, dcmfilter)
+ if mwinfo is None:
+ removeidx.append(idx)
+ continue
+ mw, series_id, file_studyUID = mwinfo
+ if per_studyUID:
+ series_id = series_id + (file_studyUID,)
- if not per_studyUID:
- # verify that we are working with a single study
+ if flatten:
+ if per_studyUID:
if studyUID is None:
studyUID = file_studyUID
- elif not per_accession_number:
- assert studyUID == file_studyUID, (
- "Conflicting study identifiers found [{}, {}].".format(
- studyUID, file_studyUID
- ))
- except AttributeError as exc:
- lgr.warning('Ignoring %s since not quite a "normal" DICOM: %s',
- filename, exc)
- series_id = (-1, 'none')
- file_studyUID = None
-
- if not series_id[0] < 0:
- if dcmfilter is not None and dcmfilter(mw.dcm_data):
- series_id = (-1, mw.dcm_data.ProtocolName)
-
- # filter out unwanted non-image-data DICOMs by assigning
- # a series number < 0 (see test below)
- if not series_id[0] < 0 and mw.dcm_data[0x0008, 0x0016].repval in (
- 'Raw Data Storage',
- 'GrayscaleSoftcopyPresentationStateStorage'):
- series_id = (-1, mw.dcm_data.ProtocolName)
-
- if per_studyUID:
- series_id = series_id + (file_studyUID,)
+ assert studyUID == file_studyUID, (
+ "Conflicting study identifiers found [{}, {}]."
+ .format(studyUID, file_studyUID)
+ )
+ elif custom_grouping:
+ file_customgroup = mw.dcm_data.get(grouping)
+ if study_customgroup is None:
+ study_customgroup = file_customgroup
+ assert study_customgroup == file_customgroup, (
+ "Conflicting {0} found: [{1}, {2}]"
+ .format(grouping, study_customgroup, file_customgroup)
+ )
ingrp = False
+ # check if same series was already converted
for idx in range(len(mwgroup)):
- # same = mw.is_same_series(mwgroup[idx])
if mw.is_same_series(mwgroup[idx]):
- # the same series should have the same study uuid
- assert (mwgroup[idx].dcm_data.get('StudyInstanceUID', None)
- == file_studyUID)
+ if grouping != 'all':
+ assert (
+ mwgroup[idx].dcm_data.get('StudyInstanceUID') == file_studyUID
+ ), "Same series found for multiple different studies"
ingrp = True
- if series_id[0] >= 0:
- series_id = (mwgroup[idx].dcm_data.SeriesNumber,
- mwgroup[idx].dcm_data.ProtocolName)
- if per_studyUID:
- series_id = series_id + (file_studyUID,)
+ series_id = (
+ mwgroup[idx].dcm_data.SeriesNumber,
+ mwgroup[idx].dcm_data.ProtocolName
+ )
+ if per_studyUID:
+ series_id = series_id + (file_studyUID,)
groups[0].append(series_id)
groups[1].append(idx)
@@ -138,135 +248,64 @@ def group_dicoms_into_seqinfos(files, file_filter, dcmfilter, grouping):
group_map = dict(zip(groups[0], groups[1]))
- total = 0
- seqinfo = OrderedDict()
+ if removeidx:
+ # remove non DICOMS from files
+ for idx in sorted(removeidx, reverse=True):
+ del files[idx]
+ seqinfos = OrderedDict()
# for the next line to make any sense the series_id needs to
# be sortable in a way that preserves the series order
for series_id, mwidx in sorted(group_map.items()):
- if series_id[0] < 0:
- # skip our fake series with unwanted files
- continue
mw = mwgroup[mwidx]
- if mw.image_shape is None:
- # this whole thing has now image data (maybe just PSg DICOMs)
- # nothing to see here, just move on
- continue
- dcminfo = mw.dcm_data
- series_files = [files[i] for i, s in enumerate(groups[0])
- if s == series_id]
- # turn the series_id into a human-readable string -- string is needed
- # for JSON storage later on
+ series_files = [files[i] for i, s in enumerate(groups[0]) if s == series_id]
if per_studyUID:
studyUID = series_id[2]
series_id = series_id[:2]
- accession_number = dcminfo.get('AccessionNumber')
-
series_id = '-'.join(map(str, series_id))
+ if mw.image_shape is None:
+ # this whole thing has no image data (maybe just PSg DICOMs)
+ # nothing to see here, just move on
+ continue
+ seqinfo = create_seqinfo(mw, series_files, series_id)
- size = list(mw.image_shape) + [len(series_files)]
- total += size[-1]
- if len(size) < 4:
- size.append(1)
-
- # MG - refactor into util function
- try:
- TR = float(dcminfo.RepetitionTime) / 1000.
- except (AttributeError, ValueError):
- TR = -1
- try:
- TE = float(dcminfo.EchoTime)
- except (AttributeError, ValueError):
- TE = -1
- try:
- refphys = str(dcminfo.ReferringPhysicianName)
- except AttributeError:
- refphys = ''
- try:
- image_type = tuple(dcminfo.ImageType)
- except AttributeError:
- image_type = ''
- try:
- series_desc = dcminfo.SeriesDescription
- except AttributeError:
- series_desc = ''
-
- motion_corrected = 'MOCO' in image_type
-
- if dcminfo.get([0x18,0x24], None):
- # GE and Philips scanners
- sequence_name = dcminfo[0x18,0x24].value
- elif dcminfo.get([0x19, 0x109c], None):
- # Siemens scanners
- sequence_name = dcminfo[0x19, 0x109c].value
- else:
- sequence_name = 'Not found'
-
- info = SeqInfo(
- total,
- op.split(series_files[0])[1],
- series_id,
- op.basename(op.dirname(series_files[0])),
- '-', '-',
- size[0], size[1], size[2], size[3],
- TR, TE,
- dcminfo.ProtocolName,
- motion_corrected,
- 'derived' in [x.lower() for x in dcminfo.get('ImageType', [])],
- dcminfo.get('PatientID'),
- dcminfo.get('StudyDescription'),
- refphys,
- series_desc, # We try to set this further up.
- sequence_name,
- image_type,
- accession_number,
- # For demographics to populate BIDS participants.tsv
- dcminfo.get('PatientAge'),
- dcminfo.get('PatientSex'),
- dcminfo.get('AcquisitionDate'),
- dcminfo.get('SeriesInstanceUID')
- )
- # candidates
- # dcminfo.AccessionNumber
- # len(dcminfo.ReferencedImageSequence)
- # len(dcminfo.SourceImageSequence)
- # FOR demographics
if per_studyUID:
- key = studyUID.split('.')[-1]
- elif per_accession_number:
- key = accession_number
+ key = studyUID
+ elif grouping == 'accession_number':
+ key = mw.dcm_data.get("AccessionNumber")
+ elif grouping == 'all':
+ key = 'all'
+ elif custom_grouping:
+ key = mw.dcm_data.get(custom_grouping)
else:
key = ''
lgr.debug("%30s %30s %27s %27s %5s nref=%-2d nsrc=%-2d %s" % (
key,
- info.series_id,
- series_desc,
- dcminfo.ProtocolName,
- info.is_derived,
- len(dcminfo.get('ReferencedImageSequence', '')),
- len(dcminfo.get('SourceImageSequence', '')),
- info.image_type
+ seqinfo.series_id,
+ seqinfo.series_description,
+ mw.dcm_data.ProtocolName,
+ seqinfo.is_derived,
+ len(mw.dcm_data.get('ReferencedImageSequence', '')),
+ len(mw.dcm_data.get('SourceImageSequence', '')),
+ seqinfo.image_type
))
- if per_studyUID:
- if studyUID not in seqinfo:
- seqinfo[studyUID] = OrderedDict()
- seqinfo[studyUID][info] = series_files
- elif per_accession_number:
- if accession_number not in seqinfo:
- seqinfo[accession_number] = OrderedDict()
- seqinfo[accession_number][info] = series_files
+
+ if not flatten:
+ if key not in seqinfos:
+ seqinfos[key] = OrderedDict()
+ seqinfos[key][seqinfo] = series_files
else:
- seqinfo[info] = series_files
+ seqinfos[seqinfo] = series_files
if per_studyUID:
lgr.info("Generated sequence info for %d studies with %d entries total",
- len(seqinfo), sum(map(len, seqinfo.values())))
- elif per_accession_number:
+ len(seqinfos), sum(map(len, seqinfos.values())))
+ elif grouping == 'accession_number':
lgr.info("Generated sequence info for %d accession numbers with %d "
- "entries total", len(seqinfo), sum(map(len, seqinfo.values())))
+ "entries total", len(seqinfos), sum(map(len, seqinfos.values())))
else:
- lgr.info("Generated sequence info with %d entries", len(seqinfo))
- return seqinfo
+ lgr.info("Generated sequence info with %d entries", len(seqinfos))
+ return seqinfos
def get_dicom_series_time(dicom_list):
@@ -353,14 +392,10 @@ def compress_dicoms(dicom_list, out_prefix, tempdirs, overwrite):
return outtar
-def embed_nifti(dcmfiles, niftifile, infofile, bids_info, min_meta):
- """
-
- If `niftifile` doesn't exist, it gets created out of the `dcmfiles` stack,
- and json representation of its meta_ext is returned (bug since should return
- both niftifile and infofile?)
+def embed_dicom_and_nifti_metadata(dcmfiles, niftifile, infofile, bids_info):
+ """Embed metadata from nifti (affine etc) and dicoms into infofile (json)
- if `niftifile` exists, its affine's orientation information is used while
+ `niftifile` should exist. Its affine's orientation information is used while
establishing new `NiftiImage` out of dicom stack and together with `bids_info`
(if provided) is dumped into json `infofile`
@@ -369,69 +404,52 @@ def embed_nifti(dcmfiles, niftifile, infofile, bids_info, min_meta):
dcmfiles
niftifile
infofile
- bids_info
- min_meta
-
- Returns
- -------
- niftifile, infofile
+ bids_info: dict
+ Additional metadata to be embedded. `infofile` is overwritten if it
+ exists, so here you could pass some metadata which would overload (at
+ the first level of the dict structure, no recursive fancy updates)
+ what is obtained from nifti and dicoms
"""
# imports for nipype
import nibabel as nb
- import os
import os.path as op
import json
import re
+ from heudiconv.utils import save_json
+
+ from heudiconv.external.dcmstack import ds
+ stack = ds.parse_and_stack(dcmfiles, force=True).values()
+ if len(stack) > 1:
+ raise ValueError('Found multiple series')
+ # may be odict now - iter to be safe
+ stack = next(iter(stack))
+
+ if not op.exists(niftifile):
+ raise NotImplementedError(
+ "%s does not exist. "
+ "We are not producing new nifti files here any longer. "
+ "Use dcm2niix directly or .convert.nipype_convert helper ."
+ % niftifile
+ )
- if not min_meta:
- from heudiconv.external.dcmstack import ds
- stack = ds.parse_and_stack(dcmfiles, force=True).values()
- if len(stack) > 1:
- raise ValueError('Found multiple series')
- # may be odict now - iter to be safe
- stack = next(iter(stack))
-
- #Create the nifti image using the data array
- if not op.exists(niftifile):
- nifti_image = stack.to_nifti(embed_meta=True)
- nifti_image.to_filename(niftifile)
- return ds.NiftiWrapper(nifti_image).meta_ext.to_json()
-
- orig_nii = nb.load(niftifile)
- aff = orig_nii.affine
- ornt = nb.orientations.io_orientation(aff)
- axcodes = nb.orientations.ornt2axcodes(ornt)
- new_nii = stack.to_nifti(voxel_order=''.join(axcodes), embed_meta=True)
- meta = ds.NiftiWrapper(new_nii).meta_ext.to_json()
-
- meta_info = None if min_meta else json.loads(meta)
+ orig_nii = nb.load(niftifile)
+ aff = orig_nii.affine
+ ornt = nb.orientations.io_orientation(aff)
+ axcodes = nb.orientations.ornt2axcodes(ornt)
+ new_nii = stack.to_nifti(voxel_order=''.join(axcodes), embed_meta=True)
+ meta_info = ds.NiftiWrapper(new_nii).meta_ext.to_json()
+ meta_info = json.loads(meta_info)
if bids_info:
+ meta_info.update(bids_info)
- if min_meta:
- meta_info = bids_info
- else:
- # make nice with python 3 - same behavior?
- meta_info = meta_info.copy()
- meta_info.update(bids_info)
- # meta_info = dict(meta_info.items() + bids_info.items())
- try:
- meta_info['TaskName'] = (re.search('(?<=_task-)\w+',
- op.basename(infofile))
- .group(0).split('_')[0])
- except AttributeError:
- pass
# write to outfile
- with open(infofile, 'wt') as fp:
- json.dump(meta_info, fp, indent=3, sort_keys=True)
-
- return niftifile, infofile
+ save_json(infofile, meta_info)
def embed_metadata_from_dicoms(bids_options, item_dicoms, outname, outname_bids,
- prov_file, scaninfo, tempdirs, with_prov,
- min_meta):
+ prov_file, scaninfo, tempdirs, with_prov):
"""
Enhance sidecar information file with more information from DICOMs
@@ -445,7 +463,6 @@ def embed_metadata_from_dicoms(bids_options, item_dicoms, outname, outname_bids,
scaninfo
tempdirs
with_prov
- min_meta
Returns
-------
@@ -458,14 +475,13 @@ def embed_metadata_from_dicoms(bids_options, item_dicoms, outname, outname_bids,
item_dicoms = list(map(op.abspath, item_dicoms))
embedfunc = Node(Function(input_names=['dcmfiles', 'niftifile', 'infofile',
- 'bids_info', 'min_meta'],
+ 'bids_info',],
output_names=['outfile', 'meta'],
- function=embed_nifti),
+ function=embed_dicom_and_nifti_metadata),
name='embedder')
embedfunc.inputs.dcmfiles = item_dicoms
embedfunc.inputs.niftifile = op.abspath(outname)
embedfunc.inputs.infofile = op.abspath(scaninfo)
- embedfunc.inputs.min_meta = min_meta
embedfunc.inputs.bids_info = load_json(op.abspath(outname_bids)) if (bids_options is not None) else None
embedfunc.base_dir = tmpdir
cwd = os.getcwd()
@@ -520,5 +536,5 @@ def parse_private_csa_header(dcm_data, public_attr, private_attr, default=None):
val = parsedhdr[private_attr].replace(' ', '')
except Exception as e:
lgr.debug("Failed to parse CSA header: %s", str(e))
- val = default if default else ''
+ val = default or ""
return val
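With the refactoring above, ``grouping`` is now a required positional argument of ``group_dicoms_into_seqinfos``, non-DICOM files are pruned via ``removeidx``, and heuristic-driven grouping arrives through ``custom_grouping``. A minimal sketch of a call the way ``heudiconv/parser.py`` now makes it (paths hypothetical)::

    from glob import glob
    from heudiconv.dicoms import group_dicoms_into_seqinfos

    files = sorted(glob('/data/dicoms/**/*.dcm', recursive=True))
    seqinfos = group_dicoms_into_seqinfos(
        files,
        'studyUID',            # or 'accession_number', 'all', 'custom'
        file_filter=None,      # heuristic's filter_files, if defined
        dcmfilter=None,        # heuristic's filter_dicom, if defined
        custom_grouping=None,  # heuristic's .grouping, for -g custom
    )
    # without flatten=True the result is keyed by group:
    # {study UID -> OrderedDict(seqinfo -> list of DICOM files)}
    for key, group in seqinfos.items():
        print(key, len(group))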
=====================================
heudiconv/external/dlad.py
=====================================
@@ -10,7 +10,7 @@ from ..utils import create_file_if_missing
lgr = logging.getLogger(__name__)
-MIN_VERSION = '0.7'
+MIN_VERSION = '0.12.4'
def prepare_datalad(studydir, outdir, sid, session, seqinfo, dicoms, bids):
@@ -34,23 +34,20 @@ def prepare_datalad(studydir, outdir, sid, session, seqinfo, dicoms, bids):
def add_to_datalad(topdir, studydir, msg, bids):
"""Do all necessary preparations (if were not done before) and save
"""
- from datalad.api import create
+ import datalad.api as dl
from datalad.api import Dataset
from datalad.support.annexrepo import AnnexRepo
from datalad.support.external_versions import external_versions
assert external_versions['datalad'] >= MIN_VERSION, (
- "Need datalad >= {}".format(MIN_VERSION)) # add to reqs
+ "Need datalad >= {}".format(MIN_VERSION)) # add to reqs
- create_kwargs = {}
- if external_versions['datalad'] >= '0.10':
- create_kwargs['fake_dates'] = True # fake dates by default
studyrelpath = op.relpath(studydir, topdir)
assert not studyrelpath.startswith(op.pardir) # so we are under
# now we need to test and initiate a DataLad dataset all along the path
curdir_ = topdir
superds = None
- subdirs = [''] + studyrelpath.split(op.sep)
+ subdirs = [''] + [d for d in studyrelpath.split(op.sep) if d != os.curdir]
for isubdir, subdir in enumerate(subdirs):
curdir_ = op.join(curdir_, subdir)
ds = Dataset(curdir_)
@@ -58,12 +55,12 @@ def add_to_datalad(topdir, studydir, msg, bids):
lgr.info("Initiating %s", ds)
# would require annex > 20161018 for correct operation on annex v6
# need to add .gitattributes first anyways
- ds_ = create(curdir_, dataset=superds,
+ ds_ = dl.create(curdir_, dataset=superds,
force=True,
- no_annex=True,
+ # initiate annex only at the bottom repository
+ no_annex=isubdir<(len(subdirs)-1),
+ fake_dates=True,
# shared_access='all',
- annex_version=6,
- **create_kwargs
)
assert ds == ds_
assert ds.is_installed()
@@ -93,17 +90,13 @@ def add_to_datalad(topdir, studydir, msg, bids):
with open(gitattributes_path, 'wb') as f:
f.write('\n'.join(known_attrs).encode('utf-8'))
- # so for mortals it just looks like a regular directory!
- if not ds.config.get('annex.thin'):
- ds.config.add('annex.thin', 'true', where='local')
- # initialize annex there if not yet initialized
- AnnexRepo(ds.path, init=True)
+
# ds might have memories of having ds.repo GitRepo
- superds = None
- del ds
- ds = Dataset(studydir)
+ superds = Dataset(topdir)
+ assert op.realpath(ds.path) == op.realpath(studydir)
+ assert isinstance(ds.repo, AnnexRepo)
# Add doesn't have all the options of save such as msg and supers
- ds.add('.gitattributes', to_git=True, save=False)
+ ds.save(path=['.gitattributes'], message="Custom .gitattributes", to_git=True)
dsh = dsh_path = None
if op.lexists(op.join(ds.path, '.heudiconv')):
dsh_path = op.join(ds.path, '.heudiconv')
@@ -120,7 +113,6 @@ def add_to_datalad(topdir, studydir, msg, bids):
else:
dsh = ds.create(path='.heudiconv',
force=True,
- **create_kwargs
# shared_access='all'
)
# Since .heudiconv could contain sensitive information
@@ -146,7 +138,7 @@ def add_to_datalad(topdir, studydir, msg, bids):
mark_sensitive(ds, '*/*/anat') # within ses/subj
if dsh_path:
mark_sensitive(ds, '.heudiconv') # entire .heudiconv!
- ds.save(message=msg, recursive=True, super_datasets=True)
+ superds.save(path=ds.path, message=msg, recursive=True)
assert not ds.repo.dirty
# TODO: they are still appearing as native annex symlinked beasts
@@ -185,4 +177,4 @@ def mark_sensitive(ds, path_glob):
init=dict([('distribution-restrictions', 'sensitive')]),
recursive=True)
if inspect.isgenerator(res):
- res = list(res)
\ No newline at end of file
+ res = list(res)
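The dataset-nesting loop above now initiates an annex only at the bottom repository and fakes commit dates throughout. Condensed, the pattern looks roughly like this (requires datalad >= 0.12.4 per ``MIN_VERSION``; paths made up, and the superdataset bookkeeping is simplified relative to the real function)::

    import os.path as op
    import datalad.api as dl
    from datalad.api import Dataset

    topdir = '/data/bids'
    studyrelpath = 'locator/study'
    curdir_ = topdir
    superds = None
    subdirs = [''] + studyrelpath.split(op.sep)
    for isubdir, subdir in enumerate(subdirs):
        curdir_ = op.join(curdir_, subdir)
        ds = Dataset(curdir_)
        if not ds.is_installed():
            ds = dl.create(
                curdir_, dataset=superds, force=True,
                # initiate annex only at the bottom repository
                no_annex=isubdir < (len(subdirs) - 1),
                fake_dates=True,  # avoid leaking acquisition times
            )
        superds = ds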
=====================================
heudiconv/external/tests/test_dlad.py
=====================================
@@ -1,10 +1,13 @@
from ..dlad import mark_sensitive
-from datalad.api import Dataset
from ...utils import create_tree
+import pytest
+
+dl = pytest.importorskip('datalad.api')
+
def test_mark_sensitive(tmpdir):
- ds = Dataset(str(tmpdir)).create(force=True)
+ ds = dl.Dataset(str(tmpdir)).create(force=True)
create_tree(
str(tmpdir),
{
=====================================
heudiconv/heuristics/reproin.py
=====================================
@@ -126,6 +126,10 @@ from glob import glob
import logging
lgr = logging.getLogger('heudiconv')
+# pythons before 3.7 didn't have re.Pattern, it was some protected
+# _sre.SRE_Pattern, so let's just sample a class of the compiled regex
+re_Pattern = re.compile('.').__class__
+
# Terminology to harmonise and use to name variables etc
# experiment
# subject
@@ -372,14 +376,14 @@ def get_study_hash(seqinfo):
return md5sum(get_study_description(seqinfo))
-def fix_canceled_runs(seqinfo, accession2run=fix_accession2run):
+def fix_canceled_runs(seqinfo):
"""Function that adds cancelme_ to known bad runs which were forgotten
"""
accession_number = get_unique(seqinfo, 'accession_number')
- if accession_number in accession2run:
+ if accession_number in fix_accession2run:
lgr.info("Considering some runs possibly marked to be "
"canceled for accession %s", accession_number)
- badruns = accession2run[accession_number]
+ badruns = fix_accession2run[accession_number]
badruns_pattern = '|'.join(badruns)
for i, s in enumerate(seqinfo):
if re.match(badruns_pattern, s.series_id):
@@ -391,39 +395,65 @@ def fix_canceled_runs(seqinfo, accession2run=fix_accession2run):
return seqinfo
-def fix_dbic_protocol(seqinfo, keys=series_spec_fields, subsdict=protocols2fix):
- """Ad-hoc fixup for existing protocols
+def fix_dbic_protocol(seqinfo):
+ """Ad-hoc fixup for existing protocols.
+
+ It will operate in 3 stages on `protocols2fix` records.
+ 1. consider a record which has md5sum of study_description
+ 2. apply all substitutions, where key is a regular expression which
+ successfully searches (not necessarily matches, so anchor appropriately)
+ study_description
+ 3. apply "catch all" substitutions in the key containing an empty string
+
+ 3. is somewhat redundant since `re.compile('.*')` could match any, but is
+ kept for simplicity of its specification.
"""
+
study_hash = get_study_hash(seqinfo)
+ study_description = get_study_description(seqinfo)
- if study_hash not in subsdict:
- raise ValueError("I don't know how to fix {0}".format(study_hash))
+ # We will consider first study specific (based on hash)
+ if study_hash in protocols2fix:
+ _apply_substitutions(seqinfo,
+ protocols2fix[study_hash],
+ 'study (%s) specific' % study_hash)
+ # Then go through all regexps returning regex "search" result
+ # on study_description
+ for sub, substitutions in protocols2fix.items():
+ if isinstance(sub, re_Pattern) and sub.search(study_description):
+ _apply_substitutions(seqinfo,
+ substitutions,
+ '%r regex matching' % sub.pattern)
+ # and at the end - global
+ if '' in protocols2fix:
+ _apply_substitutions(seqinfo, protocols2fix[''], 'global')
- # need to replace both protocol_name series_description
- substitutions = subsdict[study_hash]
+ return seqinfo
+
+
+def _apply_substitutions(seqinfo, substitutions, subs_scope):
+ lgr.info("Considering %s substitutions", subs_scope)
for i, s in enumerate(seqinfo):
fixed_kwargs = dict()
- for key in keys:
- value = getattr(s, key)
+ # need to replace both protocol_name series_description
+ for key in series_spec_fields:
+ oldvalue = value = getattr(s, key)
# replace all I need to replace
for substring, replacement in substitutions:
value = re.sub(substring, replacement, value)
+ if oldvalue != value:
+ lgr.info(" %s: %r -> %r", key, oldvalue, value)
fixed_kwargs[key] = value
# namedtuples are immutable
seqinfo[i] = s._replace(**fixed_kwargs)
- return seqinfo
-
def fix_seqinfo(seqinfo):
"""Just a helper on top of both fixers
"""
# add cancelme to known bad runs
seqinfo = fix_canceled_runs(seqinfo)
- study_hash = get_study_hash(seqinfo)
- if study_hash in protocols2fix:
- lgr.info("Fixing up protocol for {0}".format(study_hash))
- seqinfo = fix_dbic_protocol(seqinfo)
+ seqinfo = fix_dbic_protocol(seqinfo)
return seqinfo
@@ -484,10 +514,10 @@ def infotodict(seqinfo):
# 3 - Image IOD specific specialization (optional)
dcm_image_iod_spec = s.image_type[2]
image_type_seqtype = {
- 'P': 'fmap', # phase
+ # Note: P and M are too generic to make a decision here, could be
+ # for different seqtypes (bold, fmap, etc)
'FMRI': 'func',
'MPR': 'anat',
- # 'M': 'func', "magnitude" -- can be for scout, anat, bold, fmap
'DIFFUSION': 'dwi',
'MIP_SAG': 'anat', # angiography
'MIP_COR': 'anat', # angiography
@@ -540,29 +570,55 @@ def infotodict(seqinfo):
# prefix = ''
prefix = ''
+ #
+ # Figure out the seqtype_label (BIDS _suffix)
+ #
+ # If none was provided -- let's deduce it from the information we find:
# analyze s.protocol_name (series_id is based on it) for full name mapping etc
- if seqtype == 'func' and not seqtype_label:
- if '_pace_' in series_spec:
- seqtype_label = 'pace' # or should it be part of seq-
- else:
- # assume bold by default
- seqtype_label = 'bold'
-
- if seqtype == 'fmap' and not seqtype_label:
- if not dcm_image_iod_spec:
- raise ValueError("Do not know image data type yet to make decision")
- seqtype_label = {
- # might want explicit {file_index} ?
- # _epi for pepolar fieldmaps, see
- # https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/01-magnetic-resonance-imaging-data.html#case-4-multiple-phase-encoded-directions-pepolar
- 'M': 'epi' if 'dir' in series_info else 'magnitude',
- 'P': 'phasediff',
- 'DIFFUSION': 'epi', # according to KODI those DWI are the EPIs we need
- }[dcm_image_iod_spec]
-
- # label for dwi as well
- if seqtype == 'dwi' and not seqtype_label:
- seqtype_label = 'dwi'
+ if not seqtype_label:
+ if seqtype == 'func':
+ if '_pace_' in series_spec:
+ seqtype_label = 'pace' # or should it be part of seq-
+ elif 'P' in s.image_type:
+ seqtype_label = 'phase'
+ elif 'M' in s.image_type:
+ seqtype_label = 'bold'
+ else:
+ # assume bold by default
+ seqtype_label = 'bold'
+ elif seqtype == 'fmap':
+ # TODO: support phase1 phase2 like in "Case 2: Two phase images ..."
+ if not dcm_image_iod_spec:
+ raise ValueError("Do not know image data type yet to make decision")
+ seqtype_label = {
+ # might want explicit {file_index} ?
+ # _epi for pepolar fieldmaps, see
+ # https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/01-magnetic-resonance-imaging-data.html#case-4-multiple-phase-encoded-directions-pepolar
+ 'M': 'epi' if 'dir' in series_info else 'magnitude',
+ 'P': 'phasediff',
+ 'DIFFUSION': 'epi', # according to KODI those DWI are the EPIs we need
+ }[dcm_image_iod_spec]
+ elif seqtype == 'dwi':
+ # label for dwi as well
+ seqtype_label = 'dwi'
+
+ #
+ # Even if seqtype_label was provided, for some data we might need to override,
+ # since they are complementary files produced along-side with original
+ # ones.
+ #
+ if s.series_description.endswith('_SBRef'):
+ seqtype_label = 'sbref'
+
+ if not seqtype_label:
+ # Might be provided by the bids ending within series_spec, we would
+ # just want to check if that the last element is not _key-value pair
+ bids_ending = series_info.get('bids', None)
+ if not bids_ending \
+ or "-" in bids_ending.split('_')[-1]:
+ lgr.warning(
+ "We ended up with an empty label/suffix for %r",
+ series_spec)
run = series_info.get('run')
if run is not None:
@@ -741,6 +797,16 @@ def get_unique(seqinfos, attr):
# hits, or maybe we could just somehow demarcate that it will be a multisession
# one and so then later value parsed (again) in infotodict would be used???
def infotoids(seqinfos, outdir):
+ # In python 3.7.5 we would obtain odict_keys() object which would be
+ # immutable, and we would not be able to perform any substitutions if
+ # needed. So let's make it into a regular list
+ if isinstance(seqinfos, dict) or hasattr(seqinfos, 'keys'):
+ # just some checks for a paranoid Yarik
+ raise TypeError(
+ "Expected list-like structure here, not associative array. Got %s"
+ % type(seqinfos)
+ )
+ seqinfos = list(seqinfos)
# decide on subjid and session based on patient_id
lgr.info("Processing sequence infos to deduce study/session")
study_description = get_study_description(seqinfos)
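Given the three-stage lookup ``fix_dbic_protocol`` now performs, a single ``protocols2fix`` mapping can mix all three key kinds. A small sketch mirroring the docstring's stages (the substitutions themselves are made up; ``md5sum`` is the helper this heuristic already uses for study hashes)::

    import re
    from heudiconv.heuristics.reproin import md5sum

    protocols2fix = {
        # stage 1: md5sum of a specific study_description
        md5sum('dbic^mystudy'): [
            (r'scout_run\+', 'scout'),
        ],
        # stage 2: compiled regex searched in study_description
        re.compile(r'^dbic\^qa'): [
            ('run-life[0-9]', 'run+_task-life'),
        ],
        # stage 3: catch-all with the empty-string key, applied last
        '': [
            ('[^A-Za-z0-9_+-]', ''),
        ],
    }

Each value is a list of ``(regex, replacement)`` pairs applied with ``re.sub`` to both ``protocol_name`` and ``series_description``.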
=====================================
heudiconv/heuristics/reproin_validator.cfg
=====================================
@@ -1,6 +1,7 @@
{
"ignore": [
- "TOTAL_READOUT_TIME_NOT_DEFINED"
+ "TOTAL_READOUT_TIME_NOT_DEFINED",
+ "CUSTOM_COLUMN_WITHOUT_DESCRIPTION"
],
"warn": [],
"error": [],
=====================================
heudiconv/heuristics/test_reproin.py
=====================================
@@ -2,6 +2,10 @@
# Tests for reproin.py
#
from collections import OrderedDict
+from mock import patch
+import re
+
+from . import reproin
from .reproin import (
filter_files,
fix_canceled_runs,
@@ -78,7 +82,8 @@ def test_fix_canceled_runs():
'accession1': ['^01-', '^03-']
}
- seqinfo_ = fix_canceled_runs(seqinfo, fake_accession2run)
+ with patch.object(reproin, 'fix_accession2run', fake_accession2run):
+ seqinfo_ = fix_canceled_runs(seqinfo)
for i, s in enumerate(seqinfo_, 1):
output = runname
@@ -106,16 +111,20 @@ def test_fix_dbic_protocol():
'nochangeplease',
'nochangeeither')
-
seqinfos = [seq1, seq2]
- keys = ['field1']
- subsdict = {
+ protocols2fix = {
md5sum('mystudy'):
- [('scout_run\+', 'scout'),
+ [('scout_run\+', 'THESCOUT-runX'),
('run-life[0-9]', 'run+_task-life')],
+ re.compile('^my.*'):
+ [('THESCOUT-runX', 'THESCOUT')],
+ # rely on 'catch-all' to fix up above scout
+ '': [('THESCOUT', 'scout')]
}
- seqinfos_ = fix_dbic_protocol(seqinfos, keys=keys, subsdict=subsdict)
+ with patch.object(reproin, 'protocols2fix', protocols2fix), \
+ patch.object(reproin, 'series_spec_fields', ['field1']):
+ seqinfos_ = fix_dbic_protocol(seqinfos)
assert(seqinfos[1] == seqinfos_[1])
# field2 shouldn't have changed since I didn't pass it
assert(seqinfos_[0] == FakeSeqInfo(accession_number,
@@ -124,8 +133,9 @@ def test_fix_dbic_protocol():
seq1.field2))
# change also field2 please
- keys = ['field1', 'field2']
- seqinfos_ = fix_dbic_protocol(seqinfos, keys=keys, subsdict=subsdict)
+ with patch.object(reproin, 'protocols2fix', protocols2fix), \
+ patch.object(reproin, 'series_spec_fields', ['field1', 'field2']):
+ seqinfos_ = fix_dbic_protocol(seqinfos)
assert(seqinfos[1] == seqinfos_[1])
# now everything should have changed
assert(seqinfos_[0] == FakeSeqInfo(accession_number,
=====================================
heudiconv/info.py
=====================================
@@ -1,4 +1,4 @@
-__version__ = "0.6.0"
+__version__ = "0.8.0"
__author__ = "HeuDiConv team and contributors"
__url__ = "https://github.com/nipy/heudiconv"
__packagename__ = 'heudiconv'
@@ -12,14 +12,13 @@ CLASSIFIERS = [
'Environment :: Console',
'Intended Audience :: Science/Research',
'License :: OSI Approved :: Apache Software License',
- 'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',
'Topic :: Scientific/Engineering'
]
-PYTHON_REQUIRES = ">=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*"
+PYTHON_REQUIRES = ">=3.5"
REQUIRES = [
'nibabel',
@@ -27,7 +26,7 @@ REQUIRES = [
'nipype >=1.0.0; python_version > "3.0"',
'nipype >=1.0.0,!=1.2.1,!=1.2.2; python_version == "2.7"',
'pathlib',
- 'dcmstack>=0.7',
+ 'dcmstack>=0.8',
'etelemetry',
'filelock>=3.0.12',
]
@@ -43,7 +42,7 @@ TESTS_REQUIRES = [
EXTRA_REQUIRES = {
'tests': TESTS_REQUIRES,
'extras': [], # Requires patched version ATM ['dcmstack'],
- 'datalad': ['datalad']
+ 'datalad': ['datalad >=0.12.3']
}
# Flatten the lists
=====================================
heudiconv/parser.py
=====================================
@@ -161,14 +161,16 @@ def get_study_sessions(dicom_dir_template, files_opt, heuristic, outdir,
files_ += files_ex
# sort all DICOMS using heuristic
- # TODO: this one is not grouping by StudyUID but may be we should!
- seqinfo_dict = group_dicoms_into_seqinfos(files_,
+ seqinfo_dict = group_dicoms_into_seqinfos(
+ files_,
+ grouping,
file_filter=getattr(heuristic, 'filter_files', None),
dcmfilter=getattr(heuristic, 'filter_dicom', None),
- grouping=grouping)
+ custom_grouping=getattr(heuristic, 'grouping', None)
+ )
if sids:
- if not (len(sids) == 1 and len(seqinfo_dict) == 1):
+ if len(sids) != 1:
raise RuntimeError(
"We were provided some subjects (%s) but "
"we can deal only "
@@ -208,17 +210,21 @@ def get_study_sessions(dicom_dir_template, files_opt, heuristic, outdir,
# TODO: probably infotoids is doomed to do more and possibly
# split into multiple sessions!!!! but then it should be provided
# full seqinfo with files which it would place into multiple groups
- lgr.info("Study session for %s" % str(ids))
study_session_info = StudySessionInfo(
ids.get('locator'),
ids.get('session', session) or session,
sid or ids.get('subject', None)
)
+ lgr.info("Study session for %r", study_session_info)
+
if study_session_info in study_sessions:
- #raise ValueError(
- lgr.warning(
- "We already have a study session with the same value %s"
- % repr(study_session_info))
- continue # skip for now
+ if grouping != 'all':
+ # MG - should this blow up to mimic -d invocation?
+ lgr.warning(
+ "Existing study session with the same values (%r)."
+ " Skipping DICOMS %s",
+ study_session_info, *seqinfo.values()
+ )
+ continue
study_sessions[study_session_info] = seqinfo
return study_sessions
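
The call above now threads the CLI -g value through positionally and picks
up a heuristic-provided hook as custom_grouping. Per the changelog, with
-g custom a heuristic exposes a .grouping value that is either a DICOM
attribute name or a callable; a minimal sketch of the attribute form (the
callable variant's signature is not shown in this diff, so it is omitted):

    # hypothetical heuristic module, used together with: heudiconv -g custom ...
    grouping = 'AccessionNumber'  # any DICOM attribute name; a callable also works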
=====================================
heudiconv/tests/data/phantom.dcm
=====================================
Binary files /dev/null and b/heudiconv/tests/data/phantom.dcm differ
=====================================
heudiconv/tests/test_dicoms.py
=====================================
@@ -5,8 +5,12 @@ import pytest
from heudiconv.external.pydicom import dcm
from heudiconv.cli.run import main as runner
-from heudiconv.dicoms import parse_private_csa_header, embed_nifti
-from .utils import TESTS_DATA_PATH
+from heudiconv.convert import nipype_convert
+from heudiconv.dicoms import parse_private_csa_header, embed_dicom_and_nifti_metadata
+from .utils import (
+ assert_cwd_unchanged,
+ TESTS_DATA_PATH,
+)
# Public: Private DICOM tags
DICOM_FIELDS_TO_TEST = {
@@ -15,7 +19,7 @@ DICOM_FIELDS_TO_TEST = {
def test_private_csa_header(tmpdir):
dcm_file = op.join(TESTS_DATA_PATH, 'axasc35.dcm')
- dcm_data = dcm.read_file(dcm_file)
+ dcm_data = dcm.read_file(dcm_file, stop_before_pixels=True)
for pub, priv in DICOM_FIELDS_TO_TEST.items():
# ensure missing public tag
with pytest.raises(AttributeError):
@@ -26,35 +30,37 @@ def test_private_csa_header(tmpdir):
runner(['--files', dcm_file, '-c' 'none', '-f', 'reproin'])
-def test_nifti_embed(tmpdir):
+@assert_cwd_unchanged(ok_to_chdir=True) # so we cd back after tmpdir.chdir
+def test_embed_dicom_and_nifti_metadata(tmpdir):
"""Test dcmstack's additional fields"""
tmpdir.chdir()
# set up testing files
dcmfiles = [op.join(TESTS_DATA_PATH, 'axasc35.dcm')]
infofile = 'infofile.json'
- # 1) nifti does not exist
- out = embed_nifti(dcmfiles, 'nifti.nii', 'infofile.json', None, False)
- # string -> json
- out = json.loads(out)
- # should have created nifti file
- assert op.exists('nifti.nii')
+ out_prefix = str(tmpdir / "nifti")
+ # 1) nifti does not exist -- no longer supported
+ with pytest.raises(NotImplementedError):
+ embed_dicom_and_nifti_metadata(dcmfiles, out_prefix + '.nii.gz', infofile, None)
+    # we should produce the nifti using our "standard" way
+ nipype_out, prov_file = nipype_convert(
+ dcmfiles, prefix=out_prefix, with_prov=False,
+ bids_options=None, tmpdir=str(tmpdir))
+ niftifile = nipype_out.outputs.converted_files
+
+ assert op.exists(niftifile)
# 2) nifti exists
- nifti, info = embed_nifti(dcmfiles, 'nifti.nii', 'infofile.json', None, False)
- assert op.exists(nifti)
- assert op.exists(info)
- with open(info) as fp:
+ embed_dicom_and_nifti_metadata(dcmfiles, niftifile, infofile, None)
+ assert op.exists(infofile)
+ with open(infofile) as fp:
out2 = json.load(fp)
- assert out == out2
-
# 3) with existing metadata
bids = {"existing": "data"}
- nifti, info = embed_nifti(dcmfiles, 'nifti.nii', 'infofile.json', bids, False)
- with open(info) as fp:
+ embed_dicom_and_nifti_metadata(dcmfiles, niftifile, infofile, bids)
+ with open(infofile) as fp:
out3 = json.load(fp)
- assert out3["existing"]
- del out3["existing"]
- assert out3 == out2 == out
+ assert out3.pop("existing") == "data"
+ assert out3 == out2
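
As the rewritten test shows, embed_dicom_and_nifti_metadata(dcmfiles,
niftifile, infofile, bids) no longer creates the NIfTI itself: the file
must already exist (e.g. produced via nipype_convert/dcm2niix), otherwise
NotImplementedError is raised. A usage sketch with hypothetical paths:

    import os.path as op
    from heudiconv.dicoms import embed_dicom_and_nifti_metadata

    dcmfiles = ['/data/dicoms/axasc35.dcm']  # hypothetical inputs
    niftifile = '/data/out/nifti.nii.gz'     # must already exist
    assert op.exists(niftifile)
    # last argument: optional dict of pre-existing (BIDS) metadata, or None
    embed_dicom_and_nifti_metadata(dcmfiles, niftifile, 'sidecar.json',
                                   {'existing': 'data'})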
=====================================
heudiconv/tests/test_main.py
=====================================
@@ -1,6 +1,9 @@
# TODO: break this up by modules
-from heudiconv.cli.run import main as runner
+from heudiconv.cli.run import (
+ main as runner,
+ process_args,
+)
from heudiconv import __version__
from heudiconv.utils import (create_file_if_missing,
set_readonly,
@@ -32,8 +35,7 @@ def test_main_help(stdout):
assert stdout.getvalue().startswith("usage: ")
-@patch('sys.stderr' if sys.version_info[:2] <= (3, 3) else
-       'sys.stdout', new_callable=StringIO)
+@patch('sys.stdout', new_callable=StringIO)
def test_main_version(std):
with pytest.raises(SystemExit):
runner(['--version'])
@@ -63,6 +65,17 @@ def test_populate_bids_templates(tmpdir):
# it should also be available as a command
os.unlink(str(description_file))
+
+ # it must fail if no heuristic was provided
+ with pytest.raises(ValueError) as cme:
+ runner([
+ '--command', 'populate-templates',
+ '--files', str(tmpdir)
+ ])
+ assert str(cme.value).startswith("Specify heuristic using -f. Known are:")
+ assert "convertall," in str(cme.value)
+ assert not description_file.exists()
+
runner([
'--command', 'populate-templates', '-f', 'convertall',
'--files', str(tmpdir)
@@ -271,3 +284,16 @@ def test_cache(tmpdir):
assert (cachedir / 'dicominfo.tsv').exists()
assert (cachedir / 'S01.auto.txt').exists()
assert (cachedir / 'S01.edit.txt').exists()
+
+
+def test_no_etelemetry():
+    # smoke test at large - just verifying that nothing crashes if etelemetry is absent
+ class args:
+ outdir = '/dev/null'
+ command = 'ls'
+ heuristic = 'reproin'
+ files = [] # Nothing to list
+
+    # must not fail if etelemetry is not found
+ with patch.dict('sys.modules', {'etelemetry': None}):
+ process_args(args)
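
The patch.dict trick above works because Python treats a None entry in
sys.modules as a failed import: any subsequent "import etelemetry" raises
ImportError, simulating the package being absent without uninstalling it.
A self-contained illustration:

    from unittest.mock import patch

    with patch.dict('sys.modules', {'etelemetry': None}):
        try:
            import etelemetry
        except ImportError:
            print("etelemetry unavailable inside the patched block")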
=====================================
heudiconv/tests/test_regression.py
=====================================
@@ -1,27 +1,27 @@
"""Testing conversion with conversion saved on datalad"""
-import json
from glob import glob
+import os
import os.path as op
import pytest
+from heudiconv.cli.run import main as runner
+from heudiconv.external.pydicom import dcm
+from heudiconv.utils import load_json
+# testing utilities
+from .utils import fetch_data, gen_heudiconv_args, TESTS_DATA_PATH
+
have_datalad = True
try:
- from datalad import api # to pull and grab data
from datalad.support.exceptions import IncompleteResultsError
except ImportError:
have_datalad = False
-from heudiconv.cli.run import main as runner
-from heudiconv.utils import load_json
-# testing utilities
-from .utils import fetch_data, gen_heudiconv_args
-
+@pytest.mark.skipif(not have_datalad, reason="no datalad")
@pytest.mark.parametrize('subject', ['sub-sid000143'])
@pytest.mark.parametrize('heuristic', ['reproin.py'])
@pytest.mark.parametrize('anon_cmd', [None, 'anonymize_script.py'])
-@pytest.mark.skipif(not have_datalad, reason="no datalad")
def test_conversion(tmpdir, subject, heuristic, anon_cmd):
tmpdir.chdir()
try:
@@ -32,17 +32,17 @@ def test_conversion(tmpdir, subject, heuristic, anon_cmd):
pytest.skip("Failed to fetch test data: %s" % str(exc))
outdir = tmpdir.mkdir('out').strpath
- args = gen_heudiconv_args(datadir,
- outdir,
- subject,
- heuristic,
- anon_cmd,
- template=op.join('sourcedata/{subject}/*/*/*.tgz'))
- runner(args) # run conversion
+ args = gen_heudiconv_args(
+ datadir, outdir, subject, heuristic, anon_cmd,
+ template=op.join('sourcedata/{subject}/*/*/*.tgz')
+ )
+ runner(args) # run conversion
# verify functionals were converted
- assert glob('{}/{}/func/*'.format(outdir, subject)) == \
- glob('{}/{}/func/*'.format(datadir, subject))
+ assert (
+ glob('{}/{}/func/*'.format(outdir, subject)) ==
+ glob('{}/{}/func/*'.format(datadir, subject))
+ )
# compare some json metadata
json_ = '{}/task-rest_acq-24mm64sl1000tr32te600dyn_bold.json'.format
@@ -52,6 +52,7 @@ def test_conversion(tmpdir, subject, heuristic, anon_cmd):
for key in keys:
assert orig[key] == conv[key]
+
@pytest.mark.skipif(not have_datalad, reason="no datalad")
def test_multiecho(tmpdir, subject='MEEPI', heuristic='bids_ME.py'):
tmpdir.chdir()
@@ -62,7 +63,7 @@ def test_multiecho(tmpdir, subject='MEEPI', heuristic='bids_ME.py'):
outdir = tmpdir.mkdir('out').strpath
args = gen_heudiconv_args(datadir, outdir, subject, heuristic)
- runner(args) # run conversion
+ runner(args) # run conversion
# check if we have echo functionals
echoes = glob(op.join('out', 'sub-' + subject, 'func', '*echo*nii.gz'))
@@ -81,3 +82,43 @@ def test_multiecho(tmpdir, subject='MEEPI', heuristic='bids_ME.py'):
events = glob(op.join('out', 'sub-' + subject, 'func', '*events.tsv'))
for event in events:
assert 'echo-' not in event
+
+
+ at pytest.mark.parametrize('subject', ['merged'])
+def test_grouping(tmpdir, subject):
+ dicoms = [
+ op.join(TESTS_DATA_PATH, fl) for fl in ['axasc35.dcm', 'phantom.dcm']
+ ]
+ # ensure DICOMs are different studies
+ studyuids = {
+ dcm.read_file(fl, stop_before_pixels=True).StudyInstanceUID for fl
+ in dicoms
+ }
+ assert len(studyuids) == len(dicoms)
+ # symlink to common location
+ outdir = tmpdir.mkdir('out')
+ datadir = tmpdir.mkdir(subject)
+ for fl in dicoms:
+ os.symlink(fl, (datadir / op.basename(fl)).strpath)
+
+ template = op.join("{subject}/*.dcm")
+ hargs = gen_heudiconv_args(
+ tmpdir.strpath,
+ outdir.strpath,
+ subject,
+ 'convertall.py',
+ template=template
+ )
+
+ with pytest.raises(AssertionError):
+ runner(hargs)
+
+ # group all found DICOMs under subject, despite conflicts
+ hargs += ["-g", "all"]
+ runner(hargs)
+ assert len([fl for fl in outdir.visit(fil='run0*')]) == 4
+ tsv = (outdir / 'participants.tsv')
+ assert tsv.check()
+ lines = tsv.open().readlines()
+ assert len(lines) == 2
+ assert lines[1].split('\t')[0] == 'sub-{}'.format(subject)
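
For reference, the conversion this test drives corresponds roughly to the
following invocation through the Python entry point (paths are hypothetical,
and gen_heudiconv_args may add further flags such as the converter):

    from heudiconv.cli.run import main as runner

    runner([
        '-f', 'convertall.py',
        '-s', 'merged',
        '-d', '/tmp/{subject}/*.dcm',  # hypothetical template
        '-o', '/tmp/out',
        '-g', 'all',  # treat DICOMs from different studies as one session
    ])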
=====================================
heudiconv/tests/utils.py
=====================================
@@ -1,9 +1,17 @@
+from functools import wraps
+import os
import os.path as op
+import sys
+
import heudiconv.heuristics
+
HEURISTICS_PATH = op.join(heudiconv.heuristics.__path__[0])
TESTS_DATA_PATH = op.join(op.dirname(__file__), 'data')
+import logging
+lgr = logging.getLogger(__name__)
+
def gen_heudiconv_args(datadir, outdir, subject, heuristic_file,
anon_cmd=None, template=None, xargs=None):
@@ -52,9 +60,57 @@ def fetch_data(tmpdir, dataset, getpath=None):
"""
from datalad import api
targetdir = op.join(tmpdir, op.basename(dataset))
- api.install(path=targetdir,
+ ds = api.install(path=targetdir,
source='http://datasets-tests.datalad.org/{}'.format(dataset))
getdir = targetdir + (op.sep + getpath if getpath is not None else '')
- api.get(getdir)
+ ds.get(getdir)
return targetdir
+
+
+def assert_cwd_unchanged(ok_to_chdir=False):
+ """Decorator to test whether the current working directory remains unchanged
+
+ Provenance: based on the one in datalad, but simplified.
+
+ Parameters
+ ----------
+ ok_to_chdir: bool, optional
+        If True, allow the wrapped function to chdir; the decorator then does
+        not raise an exception but only returns to the original directory
+ """
+
+ def decorator(func=None): # =None to avoid pytest treating it as a fixture
+ @wraps(func)
+ def newfunc(*args, **kwargs):
+ cwd_before = os.getcwd()
+ exc = None
+ try:
+ return func(*args, **kwargs)
+ except Exception as exc_:
+ exc = exc_
+ finally:
+ try:
+ cwd_after = os.getcwd()
+ except OSError as e:
+ lgr.warning("Failed to getcwd: %s" % e)
+ cwd_after = None
+
+ if cwd_after != cwd_before:
+ os.chdir(cwd_before)
+ if not ok_to_chdir:
+ lgr.warning(
+ "%s changed cwd to %s. Mitigating and changing back to %s"
+ % (func, cwd_after, cwd_before))
+ # If there was already exception raised, we better reraise
+ # that one since it must be more important, so not masking it
+ # here with our assertion
+ if exc is None:
+ assert cwd_before == cwd_after, \
+ "CWD changed from %s to %s" % (cwd_before, cwd_after)
+
+ if exc is not None:
+ raise exc
+ return newfunc
+
+ return decorator
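
A small usage sketch of the new decorator: with ok_to_chdir=True the wrapped
function may chdir freely and the decorator silently restores the original
directory afterwards; with the default False it also warns and asserts.

    import os
    import tempfile
    from heudiconv.tests.utils import assert_cwd_unchanged

    @assert_cwd_unchanged(ok_to_chdir=True)
    def work_in_tmp():
        os.chdir(tempfile.gettempdir())

    before = os.getcwd()
    work_in_tmp()
    assert os.getcwd() == before  # restored by the decorator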
=====================================
heudiconv/utils.py
=====================================
@@ -19,10 +19,7 @@ from nipype.utils.filemanip import which
import logging
lgr = logging.getLogger(__name__)
-if sys.version_info[0] > 2:
- from json.decoder import JSONDecodeError
-else:
- JSONDecodeError = ValueError
+from json.decoder import JSONDecodeError
seqinfo_fields = [
@@ -30,8 +27,8 @@ seqinfo_fields = [
'example_dcm_file', # 1
'series_id', # 2
'dcm_dir_name', # 3
- 'unspecified2', # 4
- 'unspecified3', # 5
+ 'series_files', # 4
+ 'unspecified', # 5
'dim1', 'dim2', 'dim3', 'dim4', # 6, 7, 8, 9
'TR', 'TE', # 10, 11
'protocol_name', # 12
@@ -47,7 +44,7 @@ seqinfo_fields = [
'patient_age', # 22
'patient_sex', # 23
'date', # 24
- 'series_uid' # 25
+ 'series_uid', # 25
]
SeqInfo = namedtuple('SeqInfo', seqinfo_fields)
@@ -115,9 +112,7 @@ def anonymize_sid(sid, anon_sid_cmd):
cmd = [anon_sid_cmd, sid]
shell_return = check_output(cmd)
- if all([sys.version_info[0] > 2,
- isinstance(shell_return, bytes),
- isinstance(sid, str)]):
+ if isinstance(shell_return, bytes) and isinstance(sid, str):
anon_sid = shell_return.decode()
else:
anon_sid = shell_return
@@ -193,7 +188,7 @@ def assure_no_file_exists(path):
os.unlink(path)
-def save_json(filename, data, indent=4, sort_keys=True, pretty=False):
+def save_json(filename, data, indent=2, sort_keys=True, pretty=False):
"""Save data to a json file
Parameters
@@ -208,11 +203,25 @@ def save_json(filename, data, indent=4, sort_keys=True, pretty=False):
"""
assure_no_file_exists(filename)
+ dumps_kw = dict(sort_keys=sort_keys, indent=indent)
+ j = None
+ if pretty:
+ try:
+ j = json_dumps_pretty(data, **dumps_kw)
+ except AssertionError as exc:
+ pretty = False
+ lgr.warning(
+ "Prettyfication of .json failed (%s). "
+ "Original .json will be kept as is. Please share (if you "
+ "could) "
+ "that file (%s) with HeuDiConv developers"
+ % (str(exc), filename)
+ )
+ if not pretty:
+ j = _canonical_dumps(data, **dumps_kw)
+ assert j is not None # one way or another it should have been set to a str
with open(filename, 'w') as fp:
- fp.write(
- (json_dumps_pretty if pretty else _canonical_dumps)(
- data, sort_keys=sort_keys, indent=indent)
- )
+ fp.write(j)
def json_dumps_pretty(j, indent=2, sort_keys=True):
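
With this change the default indent drops from 4 to 2 spaces (matching the
changelog note), and pretty=True degrades gracefully: if json_dumps_pretty
raises an AssertionError, the canonical dump is written instead and a
warning is logged. Usage sketch (filename hypothetical):

    from heudiconv.utils import save_json

    # any pre-existing file at that path is removed first
    # (assure_no_file_exists), then the pretty or canonical dump is written
    save_json('sidecar.json', {'TaskName': 'rest'}, pretty=True)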
@@ -257,25 +266,9 @@ def json_dumps_pretty(j, indent=2, sort_keys=True):
def treat_infofile(filename):
"""Tune up generated .json file (slim down, pretty-print for humans).
"""
- with open(filename) as f:
- j = json.load(f)
-
+ j = load_json(filename)
j_slim = slim_down_info(j)
- dumps_kw = dict(indent=2, sort_keys=True)
- try:
- j_pretty = json_dumps_pretty(j_slim, **dumps_kw)
- except AssertionError as exc:
- lgr.warning(
- "Prettyfication of .json failed (%s). "
- "Original .json will be kept as is. Please share (if you could) "
- "that file (%s) with HeuDiConv developers"
- % (str(exc), filename)
- )
- j_pretty = json.dumps(j_slim, **dumps_kw)
-
- set_readonly(filename, False)
- with open(filename, 'wt') as fp:
- fp.write(j_pretty)
+ save_json(filename, j_slim, sort_keys=True, pretty=True)
set_readonly(filename)
@@ -324,7 +317,7 @@ def load_heuristic(heuristic):
path, fname = op.split(heuristic_file)
try:
old_syspath = sys.path[:]
- sys.path.append(path)
+ sys.path.insert(0, path)
mod = __import__(fname.split('.')[0])
mod.filename = heuristic_file
finally:
@@ -490,8 +483,25 @@ def create_tree(path, tree, archives_leading_dir=True):
create_tree(full_name, load, archives_leading_dir=archives_leading_dir)
else:
with open(full_name, 'w') as f:
- if sys.version_info[0] == 2 and not isinstance(load, str):
- load = load.encode('utf-8')
f.write(load)
if executable:
os.chmod(full_name, os.stat(full_name).st_mode | stat.S_IEXEC)
+
+
+def get_typed_attr(obj, attr, _type, default=None):
+ """
+ Typecasts an object's named attribute. If the attribute cannot be
+ converted, the default value is returned instead.
+
+ Parameters
+ ----------
+ obj: Object
+ attr: Attribute
+ _type: Type
+ default: value, optional
+ """
+ try:
+ val = _type(getattr(obj, attr, default))
+ except (TypeError, ValueError):
+ return default
+ return val
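
get_typed_attr fetches an attribute and casts it, falling back to the
default whenever the cast raises TypeError or ValueError. For example:

    from heudiconv.utils import get_typed_attr

    class Opts:
        retries = '3'

    get_typed_attr(Opts, 'retries', int)       # -> 3
    get_typed_attr(Opts, 'missing', int, 0)    # -> 0  (attribute absent)
    get_typed_attr(Opts, 'retries', dict, {})  # -> {} (cast fails, default returned)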
=====================================
utils/prep_release
=====================================
@@ -0,0 +1,14 @@
+#!/bin/bash
+
+set -eu
+
+read -r newver oldver <<<$(sed -ne 's,## \[\([0-9\.]*\)\] .*,\1,gp' CHANGELOG.md | head -n 2 | tr '\n' ' ')
+
+echo "Old: $oldver New: $newver"
+curver=$(python -c 'import heudiconv; print(heudiconv.__version__)')
+# check
+test "$oldver" = "$curver"
+
+sed -i -e "s,${oldver//./\\.},$newver,g" \
+ docs/conf.py docs/installation.rst docs/usage.rst heudiconv/info.py
+
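
The sed pipeline above extracts the two most recent version headings from
CHANGELOG.md (newest first, per Keep a Changelog ordering). In Python terms
it is roughly:

    import re

    text = open('CHANGELOG.md').read()
    newver, oldver = re.findall(r'^## \[([0-9.]+)\] ', text, flags=re.M)[:2]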
View it on GitLab: https://salsa.debian.org/med-team/heudiconv/-/commit/255bc4551ee0024428361213bf0a6b77c4807cdd