[med-svn] [Git][med-team/heudiconv][upstream] New upstream version 0.8.0

Yaroslav Halchenko gitlab at salsa.debian.org
Mon Apr 20 15:45:18 BST 2020



Yaroslav Halchenko pushed to branch upstream at Debian Med / heudiconv


Commits:
255bc455 by Yaroslav Halchenko at 2020-04-20T08:57:30-04:00
New upstream version 0.8.0
- - - - -


27 changed files:

- .travis.yml
- CHANGELOG.md
- + Makefile
- README.rst
- docs/conf.py
- docs/heuristics.rst
- docs/installation.rst
- docs/tutorials.rst
- docs/usage.rst
- heudiconv/bids.py
- heudiconv/cli/run.py
- heudiconv/convert.py
- heudiconv/dicoms.py
- heudiconv/external/dlad.py
- heudiconv/external/tests/test_dlad.py
- heudiconv/heuristics/reproin.py
- heudiconv/heuristics/reproin_validator.cfg
- heudiconv/heuristics/test_reproin.py
- heudiconv/info.py
- heudiconv/parser.py
- + heudiconv/tests/data/phantom.dcm
- heudiconv/tests/test_dicoms.py
- heudiconv/tests/test_main.py
- heudiconv/tests/test_regression.py
- heudiconv/tests/utils.py
- heudiconv/utils.py
- + utils/prep_release


Changes:

=====================================
.travis.yml
=====================================
@@ -1,10 +1,10 @@
 # vim ft=yaml
 language: python
 python:
-  - 2.7
   - 3.5
   - 3.6
   - 3.7
+  - 3.8
 
 cache:
   - apt


=====================================
CHANGELOG.md
=====================================
@@ -4,6 +4,62 @@ All notable changes to this project will be documented (for humans) in this file
 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 
+## [0.8.0] - 2020-04-15
+
+### Enhancements
+
+- Centralized saving of .json files.  Indentation of some files may
+  change from previous versions, which in places used `3` spaces;
+  the .json files we produce/modify now consistently use `2`-space
+  indentation ([#436][]) (note: dcm2niix uses tabs for indentation)
+- ReproIn heuristic: support SBRef and phase data ([#387][])
+- Set the "TaskName" field in .json sidecar files for multi-echo data
+  ([#420][])
+- Provide an informative exception if command needs heuristic to be
+  specified ([#437][])
+
+### Refactored
+
+- `embed_nifti` was refactored into `embed_dicom_and_nifti_metadata`,
+  which no longer creates a `.nii` file if it does not already exist
+  ([#432][])
+
+### Fixed
+
+- Skip datalad-based tests if no datalad available ([#430][])
+- Search the heuristic file path first so we do not pick up a python
+  module in case of a name conflict ([#434][])
+
+## [0.7.0] - 2020-03-20
+
+### Removed
+
+- Python 2 support/testing
+
+### Enhancements
+
+- The `-g` option gained two new modes: `all` and `custom`. With `all`,
+  all provided DICOMs are treated as coming from a single scanning session.
+  `custom` instructs to use the `.grouping` value (a DICOM attribute or
+  a callable) provided by the heuristic ([#359][]).
+- Stop before reading pixel data while gathering metadata from DICOMs ([#404][])
+- reproin heuristic:
+  - In addition to the original "md5sum of the study_description" keys,
+    `protocols2fix` can now have (applied after the md5sum-matching ones)
+    1) a regular expression searched for in study_description,
+    2) an empty string as a "catch all".
+    This feature can be used to easily provide remapping into reproin
+    naming (documentation is to come to http://github.com/ReproNim/reproin)
+    ([#425][])
+
+### Fixed
+
+- Use nan, not None, for an absent echo value in sorting
+- reproin heuristic: cast seqinfos into a list to be able to modify it from
+  an overloaded heuristic ([#419][])
+- No spurious errors from the logger upon a warning about `etelemetry`
+  absence ([#407][])
+
 ## [0.6.0] - 2019-12-16
 
 This is largely a bug fix.  Metadata and order of `_key-value` fields in BIDS
@@ -271,6 +327,7 @@ TODO Summary
 [#348]: https://github.com/nipy/heudiconv/issues/348
 [#351]: https://github.com/nipy/heudiconv/issues/351
 [#352]: https://github.com/nipy/heudiconv/issues/352
+[#359]: https://github.com/nipy/heudiconv/issues/359
 [#360]: https://github.com/nipy/heudiconv/issues/360
 [#364]: https://github.com/nipy/heudiconv/issues/364
 [#369]: https://github.com/nipy/heudiconv/issues/369
@@ -279,4 +336,15 @@ TODO Summary
 [#376]: https://github.com/nipy/heudiconv/issues/376
 [#379]: https://github.com/nipy/heudiconv/issues/379
 [#380]: https://github.com/nipy/heudiconv/issues/380
+[#387]: https://github.com/nipy/heudiconv/issues/387
 [#390]: https://github.com/nipy/heudiconv/issues/390
+[#404]: https://github.com/nipy/heudiconv/issues/404
+[#407]: https://github.com/nipy/heudiconv/issues/407
+[#419]: https://github.com/nipy/heudiconv/issues/419
+[#420]: https://github.com/nipy/heudiconv/issues/420
+[#425]: https://github.com/nipy/heudiconv/issues/425
+[#430]: https://github.com/nipy/heudiconv/issues/430
+[#432]: https://github.com/nipy/heudiconv/issues/432
+[#434]: https://github.com/nipy/heudiconv/issues/434
+[#436]: https://github.com/nipy/heudiconv/issues/436
+[#437]: https://github.com/nipy/heudiconv/issues/437


=====================================
Makefile
=====================================
@@ -0,0 +1,6 @@
+all:
+	echo 'nothing by default'
+
+prep_release:
+	# take previous one, and replace with the next one
+	utils/prep_release


=====================================
README.rst
=====================================
@@ -4,7 +4,7 @@
 
 `a heuristic-centric DICOM converter`
 
-.. image:: https://img.shields.io/badge/docker-nipy/heudiconv:0.5.4-brightgreen.svg?logo=docker&style=flat
+.. image:: https://img.shields.io/badge/docker-nipy/heudiconv:latest-brightgreen.svg?logo=docker&style=flat
   :target: https://hub.docker.com/r/nipy/heudiconv/tags/
   :alt: Our Docker image
 


=====================================
docs/conf.py
=====================================
@@ -26,7 +26,7 @@ author = 'Heudiconv team'
 # The short X.Y version
 version = ''
 # The full version, including alpha/beta/rc tags
-release = '0.6.0'
+release = '0.8.0'
 
 
 # -- General configuration ---------------------------------------------------


=====================================
docs/heuristics.rst
=====================================
@@ -68,3 +68,20 @@ DICOMs where this function returns ``True`` will be filtered out.
 Further processing on ``seqinfos`` to deduce/customize subject, session, and locator.
 
 A dictionary of {"locator": locator, "session": session, "subject": subject} is returned.
+
+---------------------------------------------------------------
+``grouping`` string or ``grouping(files, dcmfilter, seqinfo)``
+---------------------------------------------------------------
+
+Whenever ``--grouping custom`` (``-g custom``) is used, this attribute or callable
+will be used to inform how to group the DICOMs into separate groups. From
+`original PR#359 <https://github.com/nipy/heudiconv/pull/359>`_::
+
+    grouping = 'AcquisitionDate'
+
+or::
+
+    def grouping(files, dcmfilter, seqinfo):
+        seqinfos = collections.OrderedDict()
+        ...
+        return seqinfos  # OrderedDict mapping seqinfo objects to lists of DICOMs
\ No newline at end of file
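
As an illustration of the callable form above, a minimal sketch (hypothetical,
not part of this release) that buckets DICOMs by their AcquisitionDate::

    import collections
    import nibabel.nicom.dicomwrappers as dw

    def grouping(files, dcmfilter, seqinfo):
        # hypothetical sketch; `seqinfo` is the SeqInfo class, which a
        # full implementation would use to construct the keys of the
        # returned mapping (here we key by AcquisitionDate for brevity)
        groups = collections.OrderedDict()
        for f in files:
            mw = dw.wrapper_from_file(f, force=True, stop_before_pixels=True)
            if dcmfilter is not None and dcmfilter(mw.dcm_data):
                continue  # honor the heuristic's DICOM filter
            groups.setdefault(mw.dcm_data.get('AcquisitionDate'), []).append(f)
        return groups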


=====================================
docs/installation.rst
=====================================
@@ -26,7 +26,7 @@ If `Docker <https://docs.docker.com/install/>`_ is available on your system, you
 can visit `our page on Docker Hub <https://hub.docker.com/r/nipy/heudiconv/tags>`_ 
 to view available releases. To pull the latest release, run::
 
-    $ docker pull nipy/heudiconv:0.6.0
+    $ docker pull nipy/heudiconv:0.8.0
 
 
 Singularity
@@ -35,4 +35,4 @@ If `Singularity <https://www.sylabs.io/singularity/>`_ is available on your syst
 you can use it to pull and convert our Docker images! For example, to pull and 
 build the latest release, you can run::
 
-    $ singularity pull docker://nipy/heudiconv:0.6.0
+    $ singularity pull docker://nipy/heudiconv:0.8.0


=====================================
docs/tutorials.rst
=====================================
@@ -7,7 +7,7 @@ other users' tutorials covering their experience with ``heudiconv``.
 
 - `YouTube tutorial <https://www.youtube.com/watch?v=O1kZAuR7E00>`_ by `James Kent <https://github.com/jdkent>`_.
 
-- `Walkthrough <http://reproducibility.stanford.edu/bids-tutorial-series-part-2a/>`_ by the `Standard Center for Reproducible Neuroscience <http://reproducibility.stanford.edu/>`_.
+- `Walkthrough <http://reproducibility.stanford.edu/bids-tutorial-series-part-2a/>`_ by the `Stanford Center for Reproducible Neuroscience <http://reproducibility.stanford.edu/>`_.
 
 - `U of A Neuroimaging Core <https://neuroimaging-core-docs.readthedocs.io/en/latest/pages/heudiconv.html>`_ by `Dianne Patterson <https://github.com/dkp>`_.
 


=====================================
docs/usage.rst
=====================================
@@ -82,7 +82,7 @@ The second script processes a DICOM directory with ``heudiconv`` using the built
     DCMDIR=${DCMDIRS[${SLURM_ARRAY_TASK_ID}]}
     echo Submitted directory: ${DCMDIR}
 
-    IMG="/singularity-images/heudiconv-0.6.0-dev.sif"
+    IMG="/singularity-images/heudiconv-0.8.0-dev.sif"
     CMD="singularity run -B ${DCMDIR}:/dicoms:ro -B ${OUTDIR}:/output -e ${IMG} --files /dicoms/ -o /output -f reproin -c dcm2niix -b notop --minmeta -l ."
 
     printf "Command:\n${CMD}\n"
@@ -97,7 +97,7 @@ This script creates the top-level bids files (e.g.,
     set -eu
 
     OUTDIR=${1}
-    IMG="/singularity-images/heudiconv-0.6.0-dev.sif"
+    IMG="/singularity-images/heudiconv-0.8.0-dev.sif"
     CMD="singularity run -B ${OUTDIR}:/output -e ${IMG} --files /output -f reproin --command populate-templates"
 
     printf "Command:\n${CMD}\n"


=====================================
heudiconv/bids.py
=====================================
@@ -171,7 +171,7 @@ def populate_aggregated_jsons(path):
             act = "Generating"
         lgr.debug("%s %s", act, task_file)
         fields.update(placeholders)
-        save_json(task_file, fields, indent=2, sort_keys=True, pretty=True)
+        save_json(task_file, fields, sort_keys=True, pretty=True)
 
 
 def tuneup_bids_json_files(json_files):
@@ -193,7 +193,7 @@ def tuneup_bids_json_files(json_files):
             # Let's hope no word 'Date' comes within a study name or smth like
             # that
             raise ValueError("There must be no dates in .json sidecar")
-        save_json(jsonfile, json_, indent=2)
+        save_json(jsonfile, json_)
 
     # Load the beast
     seqtype = op.basename(op.dirname(jsonfile))
@@ -223,7 +223,7 @@ def tuneup_bids_json_files(json_files):
             was_readonly = is_readonly(json_phasediffname)
             if was_readonly:
                 set_readonly(json_phasediffname, False)
-            save_json(json_phasediffname, json_, indent=2)
+            save_json(json_phasediffname, json_)
             if was_readonly:
                 set_readonly(json_phasediffname)
 
@@ -259,8 +259,7 @@ def add_participant_record(studydir, subject, age, sex):
                         ("Description", "(TODO: adjust - by default everyone is in "
                             "control group)")])),
                 ]),
-                sort_keys=False,
-                indent=2)
+                sort_keys=False)
     # Add a new participant
     with open(participants_tsv, 'a') as f:
         f.write(
@@ -373,8 +372,7 @@ def add_rows_to_scans_keys_file(fn, newrows):
                         ("LongName", "Random string"),
                         ("Description", "md5 hash of UIDs")])),
                 ]),
-                sort_keys=False,
-                indent=2)
+                sort_keys=False)
 
     header = ['filename', 'acq_time', 'operator', 'randstr']
     # prepare all the data rows
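
The `indent=2` arguments could be dropped above because `save_json` now
appears to default to 2-space indentation (per the [#436] changelog entry).
A minimal sketch of the assumed behavior, not the actual
`heudiconv.utils.save_json`::

    import json

    def save_json(filename, data, indent=2, sort_keys=True):
        # assumed default: consistent 2-space indentation for all
        # .json files heudiconv produces/modifies
        with open(filename, 'w') as f:
            json.dump(data, f, indent=indent, sort_keys=sort_keys)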


=====================================
heudiconv/cli/run.py
=====================================
@@ -62,6 +62,7 @@ def process_extra_commands(outdir, args):
         for f in args.files:
             treat_infofile(f)
     elif args.command == 'ls':
+        ensure_heuristic_arg(args)
         heuristic = load_heuristic(args.heuristic)
         heuristic_ls = getattr(heuristic, 'ls', None)
         for f in args.files:
@@ -78,6 +79,7 @@ def process_extra_commands(outdir, args):
                     % (str(study_session), len(sequences), suf)
                 )
     elif args.command == 'populate-templates':
+        ensure_heuristic_arg(args)
         heuristic = load_heuristic(args.heuristic)
         for f in args.files:
             populate_bids_templates(f, getattr(heuristic, 'DEFAULT_FIELDS', {}))
@@ -88,16 +90,21 @@ def process_extra_commands(outdir, args):
         for name_desc in get_known_heuristics_with_descriptions().items():
             print("- %s: %s" % name_desc)
     elif args.command == 'heuristic-info':
-        from ..utils import get_heuristic_description, get_known_heuristic_names
-        if not args.heuristic:
-            raise ValueError("Specify heuristic using -f. Known are: %s"
-                             % ', '.join(get_known_heuristic_names()))
+        ensure_heuristic_arg(args)
+        from ..utils import get_heuristic_description
         print(get_heuristic_description(args.heuristic, full=True))
     else:
         raise ValueError("Unknown command %s", args.command)
     return
 
 
+def ensure_heuristic_arg(args):
+    from ..utils import get_known_heuristic_names
+    if not args.heuristic:
+        raise ValueError("Specify heuristic using -f. Known are: %s"
+                         % ', '.join(get_known_heuristic_names()))
+
+
 def main(argv=None):
     parser = get_parser()
     args = parser.parse_args(argv)
@@ -124,7 +131,6 @@ def main(argv=None):
 
     if args.debug:
         setup_exceptionhook()
-
     process_args(args)
 
 
@@ -154,8 +160,7 @@ def get_parser():
                         'If not provided, DICOMS would first be "sorted" and '
                         'subject IDs deduced by the heuristic')
     parser.add_argument('-c', '--converter',
-                        default='dcm2niix',
-                        choices=('dcm2niix', 'none'),
+                        choices=('dcm2niix', 'none'), default='dcm2niix',
                         help='tool to use for DICOM conversion. Setting to '
                         '"none" disables the actual conversion step -- useful'
                         'for testing heuristics.')
@@ -219,7 +224,7 @@ def get_parser():
                         help='custom actions to be performed on provided '
                         'files instead of regular operation.')
     parser.add_argument('-g', '--grouping', default='studyUID',
-                        choices=('studyUID', 'accession_number'),
+                        choices=('studyUID', 'accession_number', 'all', 'custom'),
                         help='How to group dicoms (default: by studyUID)')
     parser.add_argument('--minmeta', action='store_true',
                         help='Exclude dcmstack meta information in sidecar '
@@ -248,11 +253,11 @@ def process_args(args):
 
     outdir = op.abspath(args.outdir)
 
-    import etelemetry
     try:
+        import etelemetry
         latest = etelemetry.get_project("nipy/heudiconv")
     except Exception as e:
-        lgr.warning("Could not check for version updates: ", e)
+        lgr.warning("Could not check for version updates: %s", str(e))
         latest = {"version": 'Unknown'}
 
     lgr.info(INIT_MSG(packname=__packagename__,
@@ -343,7 +348,8 @@ def process_args(args):
                         seqinfo=seqinfo,
                         min_meta=args.minmeta,
                         overwrite=args.overwrite,
-                        dcmconfig=args.dcmconfig,)
+                        dcmconfig=args.dcmconfig,
+                        grouping=args.grouping,)
 
         lgr.info("PROCESSING DONE: {0}".format(
             str(dict(subject=sid, outdir=study_outdir, session=session))))
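
With the expanded `--grouping` choices above, a run that treats all provided
DICOMs as a single scanning session could look like (paths hypothetical)::

    $ heudiconv --files /data/dicoms -o /data/bids -f reproin -c dcm2niix -b -g all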


=====================================
heudiconv/convert.py
=====================================
@@ -2,8 +2,10 @@ import filelock
 import os
 import os.path as op
 import logging
+from math import nan
 import shutil
 import sys
+import re
 
 from .utils import (
     read_config,
@@ -72,8 +74,7 @@ def conversion_info(subject, outdir, info, filegroup, ses):
                 try:
                     files = filegroup[item]
                 except KeyError:
-                    PY3 = sys.version_info[0] >= 3
-                    files = filegroup[(str if PY3 else unicode)(item)]
+                    files = filegroup[str(item)]
                 outprefix = template.format(**parameters)
                 convert_info.append((op.join(outpath, outprefix),
                                     outtype, files))
@@ -81,8 +82,8 @@ def conversion_info(subject, outdir, info, filegroup, ses):
 
 
 def prep_conversion(sid, dicoms, outdir, heuristic, converter, anon_sid,
-                   anon_outdir, with_prov, ses, bids_options, seqinfo, min_meta,
-                   overwrite, dcmconfig):
+                    anon_outdir, with_prov, ses, bids_options, seqinfo, 
+                    min_meta, overwrite, dcmconfig, grouping):
     if dicoms:
         lgr.info("Processing %d dicoms", len(dicoms))
     elif seqinfo:
@@ -158,16 +159,17 @@ def prep_conversion(sid, dicoms, outdir, heuristic, converter, anon_sid,
         # So either it would need to be brought back or reconsidered altogether
         # (since no sample data to test on etc)
     else:
-        # TODO -- might have been done outside already!
-        # MG -- will have to try with both dicom template, files
         assure_no_file_exists(target_heuristic_filename)
         safe_copyfile(heuristic.filename, target_heuristic_filename)
         if dicoms:
             seqinfo = group_dicoms_into_seqinfos(
                 dicoms,
+                grouping,
                 file_filter=getattr(heuristic, 'filter_files', None),
                 dcmfilter=getattr(heuristic, 'filter_dicom', None),
-                grouping=None)
+                flatten=True,
+                custom_grouping=getattr(heuristic, 'grouping', None))
+
         seqinfo_list = list(seqinfo.keys())
         filegroup = {si.series_id: x for si, x in seqinfo.items()}
         dicominfo_file = op.join(idir, 'dicominfo%s.tsv' % ses_suffix)
@@ -322,16 +324,18 @@ def convert(items, converter, scaninfo_suffix, custom_callable, with_prov,
                         % (outname)
                     )
 
+        # add the taskname field to the json file(s):
+        add_taskname_to_infofile(bids_outfiles)
+
         if len(bids_outfiles) > 1:
             lgr.warning("For now not embedding BIDS and info generated "
                         ".nii.gz itself since sequence produced "
                         "multiple files")
         elif not bids_outfiles:
             lgr.debug("No BIDS files were produced, nothing to embed to then")
-        elif outname:
+        elif outname and not min_meta:
             embed_metadata_from_dicoms(bids_options, item_dicoms, outname, outname_bids,
-                                       prov_file, scaninfo, tempdirs, with_prov,
-                                       min_meta)
+                                       prov_file, scaninfo, tempdirs, with_prov)
         if scaninfo and op.exists(scaninfo):
             lgr.info("Post-treating %s file", scaninfo)
             treat_infofile(scaninfo)
@@ -517,6 +521,8 @@ def save_converted_files(res, item_dicoms, bids_options, outtype, prefix, outnam
         bids_files = (sorted(res.outputs.bids)
                       if len(res.outputs.bids) == len(res_files)
                       else [None] * len(res_files))
+        # preload since will be used in multiple spots
+        bids_metas = [load_json(b) for b in bids_files if b]
 
         ###   Do we have a multi-echo series?   ###
         #   Some Siemens sequences (e.g. CMRR's MB-EPI) set the label 'TE1',
@@ -530,19 +536,17 @@ def save_converted_files(res, item_dicoms, bids_options, outtype, prefix, outnam
 
         # Check for varying echo times
         echo_times = sorted(list(set(
-            load_json(b).get('EchoTime', None)
-            for b in bids_files
+            b.get('EchoTime', nan)
+            for b in bids_metas
             if b
         )))
 
         is_multiecho = len(echo_times) > 1
 
         ### Loop through the bids_files, set the output name and save files
-        for fl, suffix, bids_file in zip(res_files, suffixes, bids_files):
+        for fl, suffix, bids_file, bids_meta in zip(res_files, suffixes, bids_files, bids_metas):
 
             # TODO: monitor conversion duration
-            if bids_file:
-                fileinfo = load_json(bids_file)
 
             # set the prefix basename for this specific file (we'll modify it,
             # and we don't want to modify it for all the bids_files):
@@ -551,11 +555,18 @@ def save_converted_files(res, item_dicoms, bids_options, outtype, prefix, outnam
             # _sbref sequences reconstructing magnitude and phase generate
             # two NIfTI files IN THE SAME SERIES, so we cannot just add
             # the suffix, if we want to be bids compliant:
-            if bids_file and this_prefix_basename.endswith('_sbref'):
+            if bids_meta and this_prefix_basename.endswith('_sbref') \
+                    and len(suffixes) > len(echo_times):
+                if len(suffixes) != len(echo_times)*2:
+                    lgr.warning(
+                        "Got %d suffixes for %d echo times, which isn't "
+                        "multiple of two as if it was magnitude + phase pairs",
+                        len(suffixes), len(echo_times)
+                    )
                 # Check to see if it is magnitude or phase reconstruction:
-                if 'M' in fileinfo.get('ImageType'):
+                if 'M' in bids_meta.get('ImageType'):
                     mag_or_phase = 'magnitude'
-                elif 'P' in fileinfo.get('ImageType'):
+                elif 'P' in bids_meta.get('ImageType'):
                     mag_or_phase = 'phase'
                 else:
                     mag_or_phase = suffix
@@ -584,12 +595,12 @@ def save_converted_files(res, item_dicoms, bids_options, outtype, prefix, outnam
             # (Note: it can be _sbref and multiecho, so don't use "elif"):
             # For multi-echo sequences, we have to specify the echo number in
             # the file name:
-            if bids_file and is_multiecho:
+            if bids_meta and is_multiecho:
                 # Get the EchoNumber from json file info.  If not present, use EchoTime
-                if 'EchoNumber' in fileinfo.keys():
-                    echo_number = fileinfo['EchoNumber']
+                if 'EchoNumber' in bids_meta:
+                    echo_number = bids_meta['EchoNumber']
                 else:
-                    echo_number = echo_times.index(fileinfo['EchoTime']) + 1
+                    echo_number = echo_times.index(bids_meta['EchoTime']) + 1
 
                 supported_multiecho = ['_bold', '_phase', '_epi', '_sbref', '_T1w', '_PDT2']
                 # Now, decide where to insert it.
@@ -629,3 +640,32 @@ def save_converted_files(res, item_dicoms, bids_options, outtype, prefix, outnam
             except TypeError as exc:  ##catch lists
                 raise TypeError("Multiple BIDS sidecars detected.")
     return bids_outfiles
+
+
+def add_taskname_to_infofile(infofiles):
+    """Add the "TaskName" field to json files corresponding to func images.
+
+    Parameters
+    ----------
+    infofiles : list with json filenames or single filename
+
+    """
+
+    # in case they pass a string with a path:
+    if not isinstance(infofiles, list):
+        infofiles = [infofiles]
+
+    for infofile in infofiles:
+        meta_info = load_json(infofile)
+        try:
+            meta_info['TaskName'] = (re.search(r'(?<=_task-)\w+',
+                                                op.basename(infofile))
+                                     .group(0).split('_')[0])
+        except AttributeError:
+            lgr.warning("Failed to find task field in {0}.".format(infofile))
+            continue
+
+        # write to outfile
+        save_json(infofile, meta_info)
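
The lookbehind regex in `add_taskname_to_infofile` grabs everything after
`_task-`, and the trailing `.split('_')[0]` is needed because `\w` also
matches underscores. A standalone check with a hypothetical filename::

    import os.path as op
    import re

    fname = '/out/sub-01/func/sub-01_task-rest_run-1_bold.json'  # hypothetical
    # the regex matches 'rest_run'; split('_')[0] trims it to 'rest'
    task = re.search(r'(?<=_task-)\w+', op.basename(fname)).group(0).split('_')[0]
    assert task == 'rest'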


=====================================
heudiconv/dicoms.py
=====================================
@@ -6,25 +6,162 @@ from collections import OrderedDict
 import tarfile
 
 from .external.pydicom import dcm
-from .utils import SeqInfo, load_json, set_readonly
+from .utils import (
+    get_typed_attr,
+    load_json,
+    save_json,
+    SeqInfo,
+    set_readonly,
+)
+
+import warnings
+with warnings.catch_warnings():
+    warnings.simplefilter("ignore")
+    # suppress warning
+    import nibabel.nicom.dicomwrappers as dw
 
 lgr = logging.getLogger(__name__)
+total_files = 0
 
-def group_dicoms_into_seqinfos(files, file_filter, dcmfilter, grouping):
+
+def create_seqinfo(mw, series_files, series_id):
+    """Generate sequence info
+
+    Parameters
+    ----------
+    mw: MosaicWrapper
+    series_files: list
+    series_id: str
+    """
+    dcminfo = mw.dcm_data
+    accession_number = dcminfo.get('AccessionNumber')
+
+    # TODO: do not group echoes by default
+    size = list(mw.image_shape) + [len(series_files)]
+    if len(size) < 4:
+        size.append(1)
+
+    # parse DICOM for seqinfo fields
+    TR = get_typed_attr(dcminfo, "RepetitionTime", float, -1000) / 1000
+    TE = get_typed_attr(dcminfo, "EchoTime", float, -1)
+    refphys = get_typed_attr(dcminfo, "ReferringPhysicianName", str, "")
+    image_type = get_typed_attr(dcminfo, "ImageType", tuple, ())
+    is_moco = 'MOCO' in image_type
+    series_desc = get_typed_attr(dcminfo, "SeriesDescription", str, "")
+
+    if dcminfo.get([0x18, 0x24]):
+        # GE and Philips
+        sequence_name = dcminfo[0x18, 0x24].value
+    elif dcminfo.get([0x19, 0x109c]):
+        # Siemens
+        sequence_name = dcminfo[0x19, 0x109c].value
+    else:
+        sequence_name = ""
+
+    # initialized in `group_dicoms_into_seqinfos`
+    global total_files
+    total_files += len(series_files)
+
+    seqinfo = SeqInfo(
+        total_files_till_now=total_files,
+        example_dcm_file=op.basename(series_files[0]),
+        series_id=series_id,
+        dcm_dir_name=op.basename(op.dirname(series_files[0])),
+        series_files=len(series_files),
+        unspecified="",
+        dim1=size[0],
+        dim2=size[1],
+        dim3=size[2],
+        dim4=size[3],
+        TR=TR,
+        TE=TE,
+        protocol_name=dcminfo.ProtocolName,
+        is_motion_corrected=is_moco,
+        is_derived='derived' in [x.lower() for x in image_type],
+        patient_id=dcminfo.get('PatientID'),
+        study_description=dcminfo.get('StudyDescription'),
+        referring_physician_name=refphys,
+        series_description=series_desc,
+        sequence_name=sequence_name,
+        image_type=image_type,
+        accession_number=accession_number,
+        # For demographics to populate BIDS participants.tsv
+        patient_age=dcminfo.get('PatientAge'),
+        patient_sex=dcminfo.get('PatientSex'),
+        date=dcminfo.get('AcquisitionDate'),
+        series_uid=dcminfo.get('SeriesInstanceUID')
+    )
+    return seqinfo
+
+
+def validate_dicom(fl, dcmfilter):
+    """
+    Parse DICOM attributes. Returns None if not valid.
+    """
+    mw = dw.wrapper_from_file(fl, force=True, stop_before_pixels=True)
+    # clean series signature
+    for sig in ('iop', 'ICE_Dims', 'SequenceName'):
+        try:
+            del mw.series_signature[sig]
+        except KeyError:
+            pass
+    # Workaround for protocol name in private siemens csa header
+    if not getattr(mw.dcm_data, 'ProtocolName', '').strip():
+        mw.dcm_data.ProtocolName = parse_private_csa_header(
+            mw.dcm_data, 'ProtocolName', 'tProtocolName'
+        ) if mw.is_csa else ''
+    try:
+        series_id = (
+            int(mw.dcm_data.SeriesNumber), mw.dcm_data.ProtocolName
+        )
+    except AttributeError as e:
+        lgr.warning(
+            'Ignoring %s since not quite a "normal" DICOM: %s', fl, e
+        )
+        return
+    if dcmfilter is not None and dcmfilter(mw.dcm_data):
+        lgr.warning("Ignoring %s because of DICOM filter", fl)
+        return
+    if mw.dcm_data[0x0008, 0x0016].repval in (
+        'Raw Data Storage',
+        'GrayscaleSoftcopyPresentationStateStorage'
+    ):
+        return
+    try:
+        file_studyUID = mw.dcm_data.StudyInstanceUID
+    except AttributeError:
+        lgr.info("File {} is missing any StudyInstanceUID".format(fl))
+        file_studyUID = None
+    return mw, series_id, file_studyUID
+
+
+def group_dicoms_into_seqinfos(files, grouping, file_filter=None,
+                               dcmfilter=None, flatten=False,
+                               custom_grouping=None):
     """Process list of dicoms and return seqinfo and file group
     `seqinfo` contains a per-sequence extract of fields from DICOMs which
     will later be provided to heuristics to decide on filenames
+
     Parameters
     ----------
     files : list of str
       List of files to consider
+    grouping : {'studyUID', 'accession_number', 'all', 'custom'}
+      How to group DICOMs for conversion. If 'custom', see `custom_grouping`
+      parameter.
     file_filter : callable, optional
       Applied to each item of filenames. Should return True if file needs to be
       kept, False otherwise.
     dcmfilter : callable, optional
       If called on dcm_data and returns True, it is used to set series_id
-    grouping : {'studyUID', 'accession_number', None}, optional
-        what to group by: studyUID or accession_number
+    flatten : bool, optional
+      Creates a flattened `seqinfo` with corresponding DICOM files. True when
+      invoked with `dicom_dir_template`.
+    custom_grouping: str or callable, optional
+      Grouping key defined within a heuristic. Can be the name of a
+      DICOM attribute, or a method that handles more complex groupings.
+
     Returns
     -------
     seqinfo : list of list
@@ -33,101 +170,74 @@ def group_dicoms_into_seqinfos(files, file_filter, dcmfilter, grouping):
     filegrp : dict
       `filegrp` is a dictionary with files grouped per sequence
     """
-    allowed_groupings = ['studyUID', 'accession_number', None]
+    allowed_groupings = ['studyUID', 'accession_number', 'all', 'custom']
     if grouping not in allowed_groupings:
         raise ValueError('I do not know how to group by {0}'.format(grouping))
     per_studyUID = grouping == 'studyUID'
-    per_accession_number = grouping == 'accession_number'
+    # per_accession_number = grouping == 'accession_number'
     lgr.info("Analyzing %d dicoms", len(files))
 
     groups = [[], []]
     mwgroup = []
-
     studyUID = None
-    # for sanity check that all DICOMs came from the same
-    # "study".  If not -- what is the use-case? (interrupted acquisition?)
-    # and how would then we deal with series numbers
-    # which would differ already
+
     if file_filter:
         nfl_before = len(files)
         files = list(filter(file_filter, files))
         nfl_after = len(files)
         lgr.info('Filtering out {0} dicoms based on their filename'.format(
             nfl_before-nfl_after))
-    for fidx, filename in enumerate(files):
-        import nibabel.nicom.dicomwrappers as dw
-        # TODO after getting a regression test check if the same behavior
-        #      with stop_before_pixels=True
-        mw = dw.wrapper_from_data(dcm.read_file(filename, force=True))
-
-        for sig in ('iop', 'ICE_Dims', 'SequenceName'):
-            try:
-                del mw.series_signature[sig]
-            except:
-                pass
-
-        try:
-            file_studyUID = mw.dcm_data.StudyInstanceUID
-        except AttributeError:
-            lgr.info("File {} is missing any StudyInstanceUID".format(filename))
-            file_studyUID = None
-
-        # Workaround for protocol name in private siemens csa header
-        try:
-            mw.dcm_data.ProtocolName
-        except AttributeError:
-            if not getattr(mw.dcm_data, 'ProtocolName', '').strip():
-                mw.dcm_data.ProtocolName = parse_private_csa_header(
-                    mw.dcm_data, 'ProtocolName', 'tProtocolName'
-                    ) if mw.is_csa else ''
 
-        try:
-            series_id = (int(mw.dcm_data.SeriesNumber),
-                         mw.dcm_data.ProtocolName)
-            file_studyUID = mw.dcm_data.StudyInstanceUID
+    if grouping == 'custom':
+        if custom_grouping is None:
+            raise RuntimeError("Custom grouping is not defined in heuristic")
+        if callable(custom_grouping):
+            return custom_grouping(files, dcmfilter, SeqInfo)
+        grouping = custom_grouping
+        study_customgroup = None
+
+    removeidx = []
+    for idx, filename in enumerate(files):
+        mwinfo = validate_dicom(filename, dcmfilter)
+        if mwinfo is None:
+            removeidx.append(idx)
+            continue
+        mw, series_id, file_studyUID = mwinfo
+        if per_studyUID:
+            series_id = series_id + (file_studyUID,)
 
-            if not per_studyUID:
-                # verify that we are working with a single study
+        if flatten:
+            if per_studyUID:
                 if studyUID is None:
                     studyUID = file_studyUID
-                elif not per_accession_number:
-                    assert studyUID == file_studyUID, (
-                    "Conflicting study identifiers found [{}, {}].".format(
-                    studyUID, file_studyUID
-                    ))
-        except AttributeError as exc:
-            lgr.warning('Ignoring %s since not quite a "normal" DICOM: %s',
-                        filename, exc)
-            series_id = (-1, 'none')
-            file_studyUID = None
-
-        if not series_id[0] < 0:
-            if dcmfilter is not None and dcmfilter(mw.dcm_data):
-                series_id = (-1, mw.dcm_data.ProtocolName)
-
-        # filter out unwanted non-image-data DICOMs by assigning
-        # a series number < 0 (see test below)
-        if not series_id[0] < 0 and mw.dcm_data[0x0008, 0x0016].repval in (
-                'Raw Data Storage',
-                'GrayscaleSoftcopyPresentationStateStorage'):
-            series_id = (-1, mw.dcm_data.ProtocolName)
-
-        if per_studyUID:
-            series_id = series_id + (file_studyUID,)
+                assert studyUID == file_studyUID, (
+                    "Conflicting study identifiers found [{}, {}]."
+                    .format(studyUID, file_studyUID)
+                )
+            elif custom_grouping:
+                file_customgroup = mw.dcm_data.get(grouping)
+                if study_customgroup is None:
+                    study_customgroup = file_customgroup
+                assert study_customgroup == file_customgroup, (
+                    "Conflicting {0} found: [{1}, {2}]"
+                    .format(grouping, study_customgroup, file_customgroup)
+                )
 
         ingrp = False
+        # check if the same series was already seen
         for idx in range(len(mwgroup)):
-            # same = mw.is_same_series(mwgroup[idx])
             if mw.is_same_series(mwgroup[idx]):
-                # the same series should have the same study uuid
-                assert (mwgroup[idx].dcm_data.get('StudyInstanceUID', None)
-                        == file_studyUID)
+                if grouping != 'all':
+                    assert (
+                        mwgroup[idx].dcm_data.get('StudyInstanceUID') == file_studyUID
+                    ), "Same series found for multiple different studies"
                 ingrp = True
-                if series_id[0] >= 0:
-                    series_id = (mwgroup[idx].dcm_data.SeriesNumber,
-                                 mwgroup[idx].dcm_data.ProtocolName)
-                    if per_studyUID:
-                        series_id = series_id + (file_studyUID,)
+                series_id = (
+                    mwgroup[idx].dcm_data.SeriesNumber,
+                    mwgroup[idx].dcm_data.ProtocolName
+                )
+                if per_studyUID:
+                    series_id = series_id + (file_studyUID,)
                 groups[0].append(series_id)
                 groups[1].append(idx)
 
@@ -138,135 +248,64 @@ def group_dicoms_into_seqinfos(files, file_filter, dcmfilter, grouping):
 
     group_map = dict(zip(groups[0], groups[1]))
 
-    total = 0
-    seqinfo = OrderedDict()
+    if removeidx:
+        # remove non DICOMS from files
+        for idx in sorted(removeidx, reverse=True):
+            del files[idx]
 
+    seqinfos = OrderedDict()
     # for the next line to make any sense the series_id needs to
     # be sortable in a way that preserves the series order
     for series_id, mwidx in sorted(group_map.items()):
-        if series_id[0] < 0:
-            # skip our fake series with unwanted files
-            continue
         mw = mwgroup[mwidx]
-        if mw.image_shape is None:
-            # this whole thing has now image data (maybe just PSg DICOMs)
-            # nothing to see here, just move on
-            continue
-        dcminfo = mw.dcm_data
-        series_files = [files[i] for i, s in enumerate(groups[0])
-                        if s == series_id]
-        # turn the series_id into a human-readable string -- string is needed
-        # for JSON storage later on
+        series_files = [files[i] for i, s in enumerate(groups[0]) if s == series_id]
         if per_studyUID:
             studyUID = series_id[2]
             series_id = series_id[:2]
-        accession_number = dcminfo.get('AccessionNumber')
-
         series_id = '-'.join(map(str, series_id))
+        if mw.image_shape is None:
+            # this whole thing has no image data (maybe just PSg DICOMs)
+            # nothing to see here, just move on
+            continue
+        seqinfo = create_seqinfo(mw, series_files, series_id)
 
-        size = list(mw.image_shape) + [len(series_files)]
-        total += size[-1]
-        if len(size) < 4:
-            size.append(1)
-
-        # MG - refactor into util function
-        try:
-            TR = float(dcminfo.RepetitionTime) / 1000.
-        except (AttributeError, ValueError):
-            TR = -1
-        try:
-            TE = float(dcminfo.EchoTime)
-        except (AttributeError, ValueError):
-            TE = -1
-        try:
-            refphys = str(dcminfo.ReferringPhysicianName)
-        except AttributeError:
-            refphys = ''
-        try:
-            image_type = tuple(dcminfo.ImageType)
-        except AttributeError:
-            image_type = ''
-        try:
-            series_desc = dcminfo.SeriesDescription
-        except AttributeError:
-            series_desc = ''
-
-        motion_corrected = 'MOCO' in image_type
-
-        if dcminfo.get([0x18,0x24], None):
-            # GE and Philips scanners
-            sequence_name = dcminfo[0x18,0x24].value
-        elif dcminfo.get([0x19, 0x109c], None):
-            # Siemens scanners
-            sequence_name = dcminfo[0x19, 0x109c].value
-        else:
-            sequence_name = 'Not found'
-
-        info = SeqInfo(
-            total,
-            op.split(series_files[0])[1],
-            series_id,
-            op.basename(op.dirname(series_files[0])),
-            '-', '-',
-            size[0], size[1], size[2], size[3],
-            TR, TE,
-            dcminfo.ProtocolName,
-            motion_corrected,
-            'derived' in [x.lower() for x in dcminfo.get('ImageType', [])],
-            dcminfo.get('PatientID'),
-            dcminfo.get('StudyDescription'),
-            refphys,
-            series_desc,  # We try to set this further up.
-            sequence_name,
-            image_type,
-            accession_number,
-            # For demographics to populate BIDS participants.tsv
-            dcminfo.get('PatientAge'),
-            dcminfo.get('PatientSex'),
-            dcminfo.get('AcquisitionDate'),
-            dcminfo.get('SeriesInstanceUID')
-        )
-        # candidates
-        # dcminfo.AccessionNumber
-        #   len(dcminfo.ReferencedImageSequence)
-        #   len(dcminfo.SourceImageSequence)
-        # FOR demographics
         if per_studyUID:
-            key = studyUID.split('.')[-1]
-        elif per_accession_number:
-            key = accession_number
+            key = studyUID
+        elif grouping == 'accession_number':
+            key = mw.dcm_data.get("AccessionNumber")
+        elif grouping == 'all':
+            key = 'all'
+        elif custom_grouping:
+            key = mw.dcm_data.get(custom_grouping)
         else:
             key = ''
         lgr.debug("%30s %30s %27s %27s %5s nref=%-2d nsrc=%-2d %s" % (
             key,
-            info.series_id,
-            series_desc,
-            dcminfo.ProtocolName,
-            info.is_derived,
-            len(dcminfo.get('ReferencedImageSequence', '')),
-            len(dcminfo.get('SourceImageSequence', '')),
-            info.image_type
+            seqinfo.series_id,
+            seqinfo.series_description,
+            mw.dcm_data.ProtocolName,
+            seqinfo.is_derived,
+            len(mw.dcm_data.get('ReferencedImageSequence', '')),
+            len(mw.dcm_data.get('SourceImageSequence', '')),
+            seqinfo.image_type
         ))
-        if per_studyUID:
-            if studyUID not in seqinfo:
-                seqinfo[studyUID] = OrderedDict()
-            seqinfo[studyUID][info] = series_files
-        elif per_accession_number:
-            if accession_number not in seqinfo:
-                seqinfo[accession_number] = OrderedDict()
-            seqinfo[accession_number][info] = series_files
+
+        if not flatten:
+            if key not in seqinfos:
+                seqinfos[key] = OrderedDict()
+            seqinfos[key][seqinfo] = series_files
         else:
-            seqinfo[info] = series_files
+            seqinfos[seqinfo] = series_files
 
     if per_studyUID:
         lgr.info("Generated sequence info for %d studies with %d entries total",
-                 len(seqinfo), sum(map(len, seqinfo.values())))
-    elif per_accession_number:
+                 len(seqinfos), sum(map(len, seqinfos.values())))
+    elif grouping == 'accession_number':
         lgr.info("Generated sequence info for %d accession numbers with %d "
-                 "entries total", len(seqinfo), sum(map(len, seqinfo.values())))
+                 "entries total", len(seqinfos), sum(map(len, seqinfos.values())))
     else:
-        lgr.info("Generated sequence info with %d entries", len(seqinfo))
-    return seqinfo
+        lgr.info("Generated sequence info with %d entries", len(seqinfos))
+    return seqinfos
 
 
 def get_dicom_series_time(dicom_list):
@@ -353,14 +392,10 @@ def compress_dicoms(dicom_list, out_prefix, tempdirs, overwrite):
     return outtar
 
 
-def embed_nifti(dcmfiles, niftifile, infofile, bids_info, min_meta):
-    """
-
-    If `niftifile` doesn't exist, it gets created out of the `dcmfiles` stack,
-    and json representation of its meta_ext is returned (bug since should return
-    both niftifile and infofile?)
+def embed_dicom_and_nifti_metadata(dcmfiles, niftifile, infofile, bids_info):
+    """Embed metadata from nifti (affine etc) and dicoms into infofile (json)
 
-    if `niftifile` exists, its affine's orientation information is used while
+    `niftifile` should exist. Its affine's orientation information is used while
     establishing new `NiftiImage` out of dicom stack and together with `bids_info`
     (if provided) is dumped into json `infofile`
 
@@ -369,69 +404,52 @@ def embed_nifti(dcmfiles, niftifile, infofile, bids_info, min_meta):
     dcmfiles
     niftifile
     infofile
-    bids_info
-    min_meta
-
-    Returns
-    -------
-    niftifile, infofile
+    bids_info: dict
+      Additional metadata to be embedded. `infofile` is overwritten if it
+      exists, so here you could pass metadata which would override (at the
+      first level of the dict structure, no recursive fancy updates) what is
+      obtained from the nifti and dicoms
 
     """
     # imports for nipype
     import nibabel as nb
-    import os
     import os.path as op
     import json
     import re
+    from heudiconv.utils import save_json
+
+    from heudiconv.external.dcmstack import ds
+    stack = ds.parse_and_stack(dcmfiles, force=True).values()
+    if len(stack) > 1:
+        raise ValueError('Found multiple series')
+    # may be odict now - iter to be safe
+    stack = next(iter(stack))
+
+    if not op.exists(niftifile):
+        raise NotImplementedError(
+            "%s does not exist. "
+            "We are not producing new nifti files here any longer. "
+            "Use dcm2niix directly or .convert.nipype_convert helper ."
+            % niftifile
+        )
 
-    if not min_meta:
-        from heudiconv.external.dcmstack import ds
-        stack = ds.parse_and_stack(dcmfiles, force=True).values()
-        if len(stack) > 1:
-            raise ValueError('Found multiple series')
-        # may be odict now - iter to be safe
-        stack = next(iter(stack))
-
-        #Create the nifti image using the data array
-        if not op.exists(niftifile):
-            nifti_image = stack.to_nifti(embed_meta=True)
-            nifti_image.to_filename(niftifile)
-            return ds.NiftiWrapper(nifti_image).meta_ext.to_json()
-
-        orig_nii = nb.load(niftifile)
-        aff = orig_nii.affine
-        ornt = nb.orientations.io_orientation(aff)
-        axcodes = nb.orientations.ornt2axcodes(ornt)
-        new_nii = stack.to_nifti(voxel_order=''.join(axcodes), embed_meta=True)
-        meta = ds.NiftiWrapper(new_nii).meta_ext.to_json()
-
-    meta_info = None if min_meta else json.loads(meta)
+    orig_nii = nb.load(niftifile)
+    aff = orig_nii.affine
+    ornt = nb.orientations.io_orientation(aff)
+    axcodes = nb.orientations.ornt2axcodes(ornt)
+    new_nii = stack.to_nifti(voxel_order=''.join(axcodes), embed_meta=True)
+    meta_info = ds.NiftiWrapper(new_nii).meta_ext.to_json()
+    meta_info = json.loads(meta_info)
 
     if bids_info:
+        meta_info.update(bids_info)
 
-        if min_meta:
-            meta_info = bids_info
-        else:
-            # make nice with python 3 - same behavior?
-            meta_info = meta_info.copy()
-            meta_info.update(bids_info)
-            # meta_info = dict(meta_info.items() + bids_info.items())
-        try:
-            meta_info['TaskName'] = (re.search('(?<=_task-)\w+',
-                                               op.basename(infofile))
-                                     .group(0).split('_')[0])
-        except AttributeError:
-            pass
     # write to outfile
-    with open(infofile, 'wt') as fp:
-        json.dump(meta_info, fp, indent=3, sort_keys=True)
-
-    return niftifile, infofile
+    save_json(infofile, meta_info)
 
 
 def embed_metadata_from_dicoms(bids_options, item_dicoms, outname, outname_bids,
-                               prov_file, scaninfo, tempdirs, with_prov,
-                               min_meta):
+                               prov_file, scaninfo, tempdirs, with_prov):
     """
     Enhance sidecar information file with more information from DICOMs
 
@@ -445,7 +463,6 @@ def embed_metadata_from_dicoms(bids_options, item_dicoms, outname, outname_bids,
     scaninfo
     tempdirs
     with_prov
-    min_meta
 
     Returns
     -------
@@ -458,14 +475,13 @@ def embed_metadata_from_dicoms(bids_options, item_dicoms, outname, outname_bids,
     item_dicoms = list(map(op.abspath, item_dicoms))
 
     embedfunc = Node(Function(input_names=['dcmfiles', 'niftifile', 'infofile',
-                                           'bids_info', 'min_meta'],
+                                           'bids_info',],
                               output_names=['outfile', 'meta'],
-                              function=embed_nifti),
+                              function=embed_dicom_and_nifti_metadata),
                      name='embedder')
     embedfunc.inputs.dcmfiles = item_dicoms
     embedfunc.inputs.niftifile = op.abspath(outname)
     embedfunc.inputs.infofile = op.abspath(scaninfo)
-    embedfunc.inputs.min_meta = min_meta
     embedfunc.inputs.bids_info = load_json(op.abspath(outname_bids)) if (bids_options is not None) else None
     embedfunc.base_dir = tmpdir
     cwd = os.getcwd()
@@ -520,5 +536,5 @@ def parse_private_csa_header(dcm_data, public_attr, private_attr, default=None):
         val = parsedhdr[private_attr].replace(' ', '')
     except Exception as e:
         lgr.debug("Failed to parse CSA header: %s", str(e))
-        val = default if default else ''
+        val = default or ""
     return val


=====================================
heudiconv/external/dlad.py
=====================================
@@ -10,7 +10,7 @@ from ..utils import create_file_if_missing
 
 lgr = logging.getLogger(__name__)
 
-MIN_VERSION = '0.7'
+MIN_VERSION = '0.12.4'
 
 
 def prepare_datalad(studydir, outdir, sid, session, seqinfo, dicoms, bids):
@@ -34,23 +34,20 @@ def prepare_datalad(studydir, outdir, sid, session, seqinfo, dicoms, bids):
 def add_to_datalad(topdir, studydir, msg, bids):
     """Do all necessary preparations (if were not done before) and save
     """
-    from datalad.api import create
+    import datalad.api as dl
     from datalad.api import Dataset
     from datalad.support.annexrepo import AnnexRepo
     from datalad.support.external_versions import external_versions
     assert external_versions['datalad'] >= MIN_VERSION, (
-      "Need datalad >= {}".format(MIN_VERSION))  # add to reqs
+        "Need datalad >= {}".format(MIN_VERSION))  # add to reqs
 
-    create_kwargs = {}
-    if external_versions['datalad'] >= '0.10':
-        create_kwargs['fake_dates'] = True  # fake dates by default
 
     studyrelpath = op.relpath(studydir, topdir)
     assert not studyrelpath.startswith(op.pardir)  # so we are under
     # now we need to test and initiate a DataLad dataset all along the path
     curdir_ = topdir
     superds = None
-    subdirs = [''] + studyrelpath.split(op.sep)
+    subdirs = [''] + [d for d in studyrelpath.split(op.sep) if d != os.curdir]
     for isubdir, subdir in enumerate(subdirs):
         curdir_ = op.join(curdir_, subdir)
         ds = Dataset(curdir_)
@@ -58,12 +55,12 @@ def add_to_datalad(topdir, studydir, msg, bids):
             lgr.info("Initiating %s", ds)
             # would require annex > 20161018 for correct operation on annex v6
             # need to add .gitattributes first anyways
-            ds_ = create(curdir_, dataset=superds,
+            ds_ = dl.create(curdir_, dataset=superds,
                          force=True,
-                         no_annex=True,
+                         # initiate annex only at the bottom repository
+                         no_annex=isubdir<(len(subdirs)-1),
+                         fake_dates=True,
                          # shared_access='all',
-                         annex_version=6,
-                         **create_kwargs
                          )
             assert ds == ds_
         assert ds.is_installed()
@@ -93,17 +90,13 @@ def add_to_datalad(topdir, studydir, msg, bids):
     with open(gitattributes_path, 'wb') as f:
         f.write('\n'.join(known_attrs).encode('utf-8'))
 
-    # so for mortals it just looks like a regular directory!
-    if not ds.config.get('annex.thin'):
-        ds.config.add('annex.thin', 'true', where='local')
-    # initialize annex there if not yet initialized
-    AnnexRepo(ds.path, init=True)
+
     # ds might have memories of having ds.repo GitRepo
-    superds = None
-    del ds
-    ds = Dataset(studydir)
+    superds = Dataset(topdir)
+    assert op.realpath(ds.path) == op.realpath(studydir)
+    assert isinstance(ds.repo, AnnexRepo)
     # Add doesn't have all the options of save such as msg and supers
-    ds.add('.gitattributes', to_git=True, save=False)
+    ds.save(path=['.gitattributes'], message="Custom .gitattributes", to_git=True)
     dsh = dsh_path = None
     if op.lexists(op.join(ds.path, '.heudiconv')):
         dsh_path = op.join(ds.path, '.heudiconv')
@@ -120,7 +113,6 @@ def add_to_datalad(topdir, studydir, msg, bids):
             else:
                 dsh = ds.create(path='.heudiconv',
                                 force=True,
-                                **create_kwargs
                                 # shared_access='all'
                                 )
         # Since .heudiconv could contain sensitive information
@@ -146,7 +138,7 @@ def add_to_datalad(topdir, studydir, msg, bids):
     mark_sensitive(ds, '*/*/anat')  # within ses/subj
     if dsh_path:
         mark_sensitive(ds, '.heudiconv')  # entire .heudiconv!
-    ds.save(message=msg, recursive=True, super_datasets=True)
+    superds.save(path=ds.path, message=msg, recursive=True)
 
     assert not ds.repo.dirty
     # TODO:  they are still appearing as native annex symlinked beasts
@@ -185,4 +177,4 @@ def mark_sensitive(ds, path_glob):
         init=dict([('distribution-restrictions', 'sensitive')]),
         recursive=True)
     if inspect.isgenerator(res):
-        res = list(res)
\ No newline at end of file
+        res = list(res)
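
The reworked nesting above creates a DataLad dataset per path component but
initializes annex only at the bottom-most (study) dataset. Conceptually
(paths hypothetical)::

    import datalad.api as dl

    # superdatasets along the path carry no annex ...
    superds = dl.create('/tmp/out', no_annex=True, fake_dates=True)
    # ... while the leaf study dataset gets a regular annex
    study = dl.create('/tmp/out/study1', dataset=superds, fake_dates=True)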


=====================================
heudiconv/external/tests/test_dlad.py
=====================================
@@ -1,10 +1,13 @@
 from ..dlad import mark_sensitive
-from datalad.api import Dataset
 from ...utils import create_tree
 
+import pytest
+
+dl = pytest.importorskip('datalad.api')
+
 
 def test_mark_sensitive(tmpdir):
-    ds = Dataset(str(tmpdir)).create(force=True)
+    ds = dl.Dataset(str(tmpdir)).create(force=True)
     create_tree(
         str(tmpdir),
         {


=====================================
heudiconv/heuristics/reproin.py
=====================================
@@ -126,6 +126,10 @@ from glob import glob
 import logging
 lgr = logging.getLogger('heudiconv')
 
+# Pythons before 3.7 didn't have re.Pattern (it was the protected
+# _sre.SRE_Pattern), so just take the class of a compiled regex
+re_Pattern = re.compile('.').__class__
+
 # Terminology to harmonise and use to name variables etc
 # experiment
 #  subject
@@ -372,14 +376,14 @@ def get_study_hash(seqinfo):
     return md5sum(get_study_description(seqinfo))
 
 
-def fix_canceled_runs(seqinfo, accession2run=fix_accession2run):
+def fix_canceled_runs(seqinfo):
     """Function that adds cancelme_ to known bad runs which were forgotten
     """
     accession_number = get_unique(seqinfo, 'accession_number')
-    if accession_number in accession2run:
+    if accession_number in fix_accession2run:
         lgr.info("Considering some runs possibly marked to be "
                  "canceled for accession %s", accession_number)
-        badruns = accession2run[accession_number]
+        badruns = fix_accession2run[accession_number]
         badruns_pattern = '|'.join(badruns)
         for i, s in enumerate(seqinfo):
             if re.match(badruns_pattern, s.series_id):
@@ -391,39 +395,65 @@ def fix_canceled_runs(seqinfo, accession2run=fix_accession2run):
     return seqinfo
 
 
-def fix_dbic_protocol(seqinfo, keys=series_spec_fields, subsdict=protocols2fix):
-    """Ad-hoc fixup for existing protocols
+def fix_dbic_protocol(seqinfo):
+    """Ad-hoc fixup for existing protocols.
+
+    It will operate in 3 stages on `protocols2fix` records.
+    1. consider a record which has md5sum of study_description
+    2. apply all substitutions, where key is a regular expression which
+       successfully searches (not necessarily matches, so anchor appropriately)
+       study_description
+    3. apply "catch all" substitutions in the key containing an empty string
+
+    Stage 3 is somewhat redundant, since `re.compile('.*')` would match
+    anything, but it is kept for the simplicity of its specification.
     """
+
     study_hash = get_study_hash(seqinfo)
+    study_description = get_study_description(seqinfo)
 
-    if study_hash not in subsdict:
-        raise ValueError("I don't know how to fix {0}".format(study_hash))
+    # We will consider first study specific (based on hash)
+    if study_hash in protocols2fix:
+        _apply_substitutions(seqinfo,
+                             protocols2fix[study_hash],
+                             'study (%s) specific' % study_hash)
+    # Then go through all regexps returning regex "search" result
+    # on study_description
+    for sub, substitutions in protocols2fix.items():
+        if isinstance(sub, re_Pattern) and sub.search(study_description):
+            _apply_substitutions(seqinfo,
+                                 substitutions,
+                                 '%r regex matching' % sub.pattern)
+    # and at the end, the global ("catch all") substitutions
+    if '' in protocols2fix:
+        _apply_substitutions(seqinfo, protocols2fix[''], 'global')
 
-    # need to replace both protocol_name series_description
-    substitutions = subsdict[study_hash]
+    return seqinfo
+
+
+def _apply_substitutions(seqinfo, substitutions, subs_scope):
+    lgr.info("Considering %s substitutions", subs_scope)
     for i, s in enumerate(seqinfo):
         fixed_kwargs = dict()
-        for key in keys:
-            value = getattr(s, key)
+        # need to replace both protocol_name and series_description
+        for key in series_spec_fields:
+            oldvalue = value = getattr(s, key)
             # replace all I need to replace
             for substring, replacement in substitutions:
                 value = re.sub(substring, replacement, value)
+            if oldvalue != value:
+                lgr.info(" %s: %r -> %r", key, oldvalue, value)
             fixed_kwargs[key] = value
         # namedtuples are immutable
         seqinfo[i] = s._replace(**fixed_kwargs)
 
-    return seqinfo
-
 
 def fix_seqinfo(seqinfo):
     """Just a helper on top of both fixers
     """
     # add cancelme to known bad runs
     seqinfo = fix_canceled_runs(seqinfo)
-    study_hash = get_study_hash(seqinfo)
-    if study_hash in protocols2fix:
-        lgr.info("Fixing up protocol for {0}".format(study_hash))
-        seqinfo = fix_dbic_protocol(seqinfo)
+    seqinfo = fix_dbic_protocol(seqinfo)
     return seqinfo
 
 
@@ -484,10 +514,10 @@ def infotodict(seqinfo):
             # 3 - Image IOD specific specialization (optional)
             dcm_image_iod_spec = s.image_type[2]
             image_type_seqtype = {
-                'P': 'fmap',   # phase
+                # Note: P and M are too generic to make a decision here; they
+                #  could belong to different seqtypes (bold, fmap, etc.)
                 'FMRI': 'func',
                 'MPR': 'anat',
-                # 'M': 'func',  "magnitude"  -- can be for scout, anat, bold, fmap
                 'DIFFUSION': 'dwi',
                 'MIP_SAG': 'anat',  # angiography
                 'MIP_COR': 'anat',  # angiography
@@ -540,29 +570,55 @@ def infotodict(seqinfo):
         #     prefix = ''
         prefix = ''
 
+        #
+        # Figure out the seqtype_label (BIDS _suffix)
+        #
+        # If none was provided -- let's deduce it from the information we find:
         # analyze s.protocol_name (series_id is based on it) for full name mapping etc
-        if seqtype == 'func' and not seqtype_label:
-            if '_pace_' in series_spec:
-                seqtype_label = 'pace'  # or should it be part of seq-
-            else:
-                # assume bold by default
-                seqtype_label = 'bold'
-
-        if seqtype == 'fmap' and not seqtype_label:
-            if not dcm_image_iod_spec:
-                raise ValueError("Do not know image data type yet to make decision")
-            seqtype_label = {
-                # might want explicit {file_index}  ?
-                # _epi for pepolar fieldmaps, see
-                # https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/01-magnetic-resonance-imaging-data.html#case-4-multiple-phase-encoded-directions-pepolar
-                'M': 'epi' if 'dir' in series_info else 'magnitude',
-                'P': 'phasediff',
-                'DIFFUSION': 'epi',  # according to KODI those DWI are the EPIs we need
-            }[dcm_image_iod_spec]
-
-        # label for dwi as well
-        if seqtype == 'dwi' and not seqtype_label:
-            seqtype_label = 'dwi'
+        if not seqtype_label:
+            if seqtype == 'func':
+                if '_pace_' in series_spec:
+                    seqtype_label = 'pace'  # or should it be part of seq-
+                elif 'P' in s.image_type:
+                    seqtype_label = 'phase'
+                elif 'M' in s.image_type:
+                    seqtype_label = 'bold'
+                else:
+                    # assume bold by default
+                    seqtype_label = 'bold'
+            elif seqtype == 'fmap':
+                # TODO: support phase1 phase2 like in "Case 2: Two phase images ..."
+                if not dcm_image_iod_spec:
+                    raise ValueError("Do not know image data type yet to make decision")
+                seqtype_label = {
+                    # might want explicit {file_index}  ?
+                    # _epi for pepolar fieldmaps, see
+                    # https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/01-magnetic-resonance-imaging-data.html#case-4-multiple-phase-encoded-directions-pepolar
+                    'M': 'epi' if 'dir' in series_info else 'magnitude',
+                    'P': 'phasediff',
+                    'DIFFUSION': 'epi',  # according to KODI those DWI are the EPIs we need
+                }[dcm_image_iod_spec]
+            elif seqtype == 'dwi':
+                # label for dwi as well
+                seqtype_label = 'dwi'
+
+        #
+        # Even if seqtype_label was provided, for some data we might need to
+        # override it, since they are complementary files produced alongside
+        # the original ones.
+        #
+        if s.series_description.endswith('_SBRef'):
+            seqtype_label = 'sbref'
+
+        if not seqtype_label:
+            # Might be provided by the bids ending within series_spec; we just
+            # want to check that the last element is not a _key-value pair
+            bids_ending = series_info.get('bids', None)
+            if not bids_ending \
+                    or "-" in bids_ending.split('_')[-1]:
+                lgr.warning(
+                    "We ended up with an empty label/suffix for %r",
+                    series_spec)
 
         run = series_info.get('run')
         if run is not None:
@@ -741,6 +797,16 @@ def get_unique(seqinfos, attr):
 # hits, or maybe we could just somehow demarcate that it will be a multisession
 # one and so then the value parsed later (again) in infotodict would be used???
 def infotoids(seqinfos, outdir):
+    # In python 3.7.5 we would obtain an odict_keys() object which would be
+    # immutable, and we would not be able to perform any substitutions if
+    # needed.  So let's make it into a regular list
+    if isinstance(seqinfos, dict) or hasattr(seqinfos, 'keys'):
+        # just some checks for a paranoid Yarik
+        raise TypeError(
+            "Expected list-like structure here, not associative array. Got %s"
+            % type(seqinfos)
+        )
+    seqinfos = list(seqinfos)
     # decide on subjid and session based on patient_id
     lgr.info("Processing sequence infos to deduce study/session")
     study_description = get_study_description(seqinfos)
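
To make the staged matching concrete: a heuristic's `protocols2fix` mapping
may now mix all three kinds of keys. A minimal sketch (the md5sum stand-in
and the substitutions mirror the tests further below):

    import hashlib
    import re

    def md5sum(s):
        # stand-in for the heuristic's helper which hashes study_description
        return hashlib.md5(s.encode()).hexdigest()

    protocols2fix = {
        # stage 1: one specific study, keyed by the md5sum of its description
        md5sum('mystudy'): [(r'scout_run\+', 'scout')],
        # stage 2: any study whose description this regex successfully searches
        re.compile('^my.*'): [('run-life[0-9]', 'run+_task-life')],
        # stage 3: catch-all substitutions, applied to every study at the end
        '': [('THESCOUT', 'scout')],
    }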


=====================================
heudiconv/heuristics/reproin_validator.cfg
=====================================
@@ -1,6 +1,7 @@
 {
     "ignore": [
-       "TOTAL_READOUT_TIME_NOT_DEFINED"
+       "TOTAL_READOUT_TIME_NOT_DEFINED",
+       "CUSTOM_COLUMN_WITHOUT_DESCRIPTION"
     ],
     "warn": [],
     "error": [],


=====================================
heudiconv/heuristics/test_reproin.py
=====================================
@@ -2,6 +2,10 @@
 # Tests for reproin.py
 #
 from collections import OrderedDict
+from mock import patch
+import re
+
+from . import reproin
 from .reproin import (
     filter_files,
     fix_canceled_runs,
@@ -78,7 +82,8 @@ def test_fix_canceled_runs():
         'accession1': ['^01-', '^03-']
     }
 
-    seqinfo_ = fix_canceled_runs(seqinfo, fake_accession2run)
+    with patch.object(reproin, 'fix_accession2run', fake_accession2run):
+        seqinfo_ = fix_canceled_runs(seqinfo)
 
     for i, s in enumerate(seqinfo_, 1):
         output = runname
@@ -106,16 +111,20 @@ def test_fix_dbic_protocol():
                        'nochangeplease',
                        'nochangeeither')
 
-
     seqinfos = [seq1, seq2]
-    keys = ['field1']
-    subsdict = {
+    protocols2fix = {
         md5sum('mystudy'):
-            [('scout_run\+', 'scout'),
+            [('scout_run\+', 'THESCOUT-runX'),
              ('run-life[0-9]', 'run+_task-life')],
+        re.compile('^my.*'):
+            [('THESCOUT-runX', 'THESCOUT')],
+        # rely on 'catch-all' to fix up above scout
+        '': [('THESCOUT', 'scout')]
     }
 
-    seqinfos_ = fix_dbic_protocol(seqinfos, keys=keys, subsdict=subsdict)
+    with patch.object(reproin, 'protocols2fix', protocols2fix), \
+            patch.object(reproin, 'series_spec_fields', ['field1']):
+        seqinfos_ = fix_dbic_protocol(seqinfos)
     assert(seqinfos[1] == seqinfos_[1])
     # field2 shouldn't have changed since I didn't pass it
     assert(seqinfos_[0] == FakeSeqInfo(accession_number,
@@ -124,8 +133,9 @@ def test_fix_dbic_protocol():
                                        seq1.field2))
 
     # change also field2 please
-    keys = ['field1', 'field2']
-    seqinfos_ = fix_dbic_protocol(seqinfos, keys=keys, subsdict=subsdict)
+    with patch.object(reproin, 'protocols2fix', protocols2fix), \
+            patch.object(reproin, 'series_spec_fields', ['field1', 'field2']):
+        seqinfos_ = fix_dbic_protocol(seqinfos)
     assert(seqinfos[1] == seqinfos_[1])
     # now everything should have changed
     assert(seqinfos_[0] == FakeSeqInfo(accession_number,


=====================================
heudiconv/info.py
=====================================
@@ -1,4 +1,4 @@
-__version__ = "0.6.0"
+__version__ = "0.8.0"
 __author__ = "HeuDiConv team and contributors"
 __url__ = "https://github.com/nipy/heudiconv"
 __packagename__ = 'heudiconv'
@@ -12,14 +12,13 @@ CLASSIFIERS = [
     'Environment :: Console',
     'Intended Audience :: Science/Research',
     'License :: OSI Approved :: Apache Software License',
-    'Programming Language :: Python :: 2.7',
     'Programming Language :: Python :: 3.5',
     'Programming Language :: Python :: 3.6',
     'Programming Language :: Python :: 3.7',
     'Topic :: Scientific/Engineering'
 ]
 
-PYTHON_REQUIRES = ">=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*"
+PYTHON_REQUIRES = ">=3.5"
 
 REQUIRES = [
     'nibabel',
@@ -27,7 +26,7 @@ REQUIRES = [
     'nipype >=1.0.0; python_version > "3.0"',
     'nipype >=1.0.0,!=1.2.1,!=1.2.2; python_version == "2.7"',
     'pathlib',
-    'dcmstack>=0.7',
+    'dcmstack>=0.8',
     'etelemetry',
     'filelock>=3.0.12',
 ]
@@ -43,7 +42,7 @@ TESTS_REQUIRES = [
 EXTRA_REQUIRES = {
     'tests': TESTS_REQUIRES,
     'extras': [],  # Requires patched version ATM ['dcmstack'],
-    'datalad': ['datalad']
+    'datalad': ['datalad >=0.12.3']
 }
 
 # Flatten the lists


=====================================
heudiconv/parser.py
=====================================
@@ -161,14 +161,16 @@ def get_study_sessions(dicom_dir_template, files_opt, heuristic, outdir,
             files_ += files_ex
 
         # sort all DICOMS using heuristic
-        # TODO:  this one is not grouping by StudyUID but may be we should!
-        seqinfo_dict = group_dicoms_into_seqinfos(files_,
+        seqinfo_dict = group_dicoms_into_seqinfos(
+            files_,
+            grouping,
             file_filter=getattr(heuristic, 'filter_files', None),
             dcmfilter=getattr(heuristic, 'filter_dicom', None),
-            grouping=grouping)
+            custom_grouping=getattr(heuristic, 'grouping', None)
+        )
 
         if sids:
-            if not (len(sids) == 1 and len(seqinfo_dict) == 1):
+            if len(sids) != 1:
                 raise RuntimeError(
                     "We were provided some subjects (%s) but "
                     "we can deal only "
@@ -208,17 +210,21 @@ def get_study_sessions(dicom_dir_template, files_opt, heuristic, outdir,
             # TODO:  probably infotoids is doomed to do more and possibly
             # split into multiple sessions!!!! but then it should be provided
             # full seqinfo with files which it would place into multiple groups
-            lgr.info("Study session for %s" % str(ids))
             study_session_info = StudySessionInfo(
                 ids.get('locator'),
                 ids.get('session', session) or session,
                 sid or ids.get('subject', None)
             )
+            lgr.info("Study session for %r", study_session_info)
+
             if study_session_info in study_sessions:
-                #raise ValueError(
-                lgr.warning(
-                    "We already have a study session with the same value %s"
-                    % repr(study_session_info))
-                continue # skip for now
+                if grouping != 'all':
+                    # MG - should this blow up to mimic -d invocation?
+                    lgr.warning(
+                        "Existing study session with the same values (%r)."
+                        " Skipping DICOMS %s",
+                        study_session_info, *seqinfo.values()
+                    )
+                    continue
             study_sessions[study_session_info] = seqinfo
     return study_sessions
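
Note the new call shape above: `grouping` is now positional, and a heuristic
may expose a module-level `grouping` attribute which is forwarded as
`custom_grouping`. A hypothetical heuristic-side sketch (the attribute value
is an illustrative assumption, not part of this diff):

    # In a heuristic module; heudiconv picks this up via
    # getattr(heuristic, 'grouping', None) as shown above.
    grouping = 'AccessionNumber'  # hypothetical DICOM attribute to group by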


=====================================
heudiconv/tests/data/phantom.dcm
=====================================
Binary files /dev/null and b/heudiconv/tests/data/phantom.dcm differ


=====================================
heudiconv/tests/test_dicoms.py
=====================================
@@ -5,8 +5,12 @@ import pytest
 
 from heudiconv.external.pydicom import dcm
 from heudiconv.cli.run import main as runner
-from heudiconv.dicoms import parse_private_csa_header, embed_nifti
-from .utils import TESTS_DATA_PATH
+from heudiconv.convert import nipype_convert
+from heudiconv.dicoms import parse_private_csa_header, embed_dicom_and_nifti_metadata
+from .utils import (
+    assert_cwd_unchanged,
+    TESTS_DATA_PATH,
+)
 
 # Public: Private DICOM tags
 DICOM_FIELDS_TO_TEST = {
@@ -15,7 +19,7 @@ DICOM_FIELDS_TO_TEST = {
 
 def test_private_csa_header(tmpdir):
     dcm_file = op.join(TESTS_DATA_PATH, 'axasc35.dcm')
-    dcm_data = dcm.read_file(dcm_file)
+    dcm_data = dcm.read_file(dcm_file, stop_before_pixels=True)
     for pub, priv in DICOM_FIELDS_TO_TEST.items():
         # ensure missing public tag
         with pytest.raises(AttributeError):
@@ -26,35 +30,37 @@ def test_private_csa_header(tmpdir):
         runner(['--files', dcm_file, '-c' 'none', '-f', 'reproin'])
 
 
-def test_nifti_embed(tmpdir):
+@assert_cwd_unchanged(ok_to_chdir=True)  # so we cd back after tmpdir.chdir
+def test_embed_dicom_and_nifti_metadata(tmpdir):
     """Test dcmstack's additional fields"""
     tmpdir.chdir()
     # set up testing files
     dcmfiles = [op.join(TESTS_DATA_PATH, 'axasc35.dcm')]
     infofile = 'infofile.json'
 
-    # 1) nifti does not exist
-    out = embed_nifti(dcmfiles, 'nifti.nii', 'infofile.json', None, False)
-    # string -> json
-    out = json.loads(out)
-    # should have created nifti file
-    assert op.exists('nifti.nii')
+    out_prefix = str(tmpdir / "nifti")
+    # 1) nifti does not exist -- no longer supported
+    with pytest.raises(NotImplementedError):
+        embed_dicom_and_nifti_metadata(dcmfiles, out_prefix + '.nii.gz', infofile, None)
 
+    # we should produce nifti using our "standard" ways
+    nipype_out, prov_file = nipype_convert(
+        dcmfiles, prefix=out_prefix, with_prov=False,
+        bids_options=None, tmpdir=str(tmpdir))
+    niftifile = nipype_out.outputs.converted_files
+
+    assert op.exists(niftifile)
     # 2) nifti exists
-    nifti, info = embed_nifti(dcmfiles, 'nifti.nii', 'infofile.json', None, False)
-    assert op.exists(nifti)
-    assert op.exists(info)
-    with open(info) as fp:
+    embed_dicom_and_nifti_metadata(dcmfiles, niftifile, infofile, None)
+    assert op.exists(infofile)
+    with open(infofile) as fp:
         out2 = json.load(fp)
 
-    assert out == out2
-
     # 3) with existing metadata
     bids = {"existing": "data"}
-    nifti, info = embed_nifti(dcmfiles, 'nifti.nii', 'infofile.json', bids, False)
-    with open(info) as fp:
+    embed_dicom_and_nifti_metadata(dcmfiles, niftifile, infofile, bids)
+    with open(infofile) as fp:
         out3 = json.load(fp)
 
-    assert out3["existing"]
-    del out3["existing"]
-    assert out3 == out2 == out
+    assert out3.pop("existing") == "data"
+    assert out3 == out2
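
The switch to `stop_before_pixels=True` in these tests makes pydicom stop
parsing before the PixelData element, so header-only metadata reads stay
cheap. A minimal standalone sketch using the same test file:

    from heudiconv.external.pydicom import dcm

    # Header-only read: skips the (potentially large) pixel data, which is
    # all that metadata gathering needs.
    meta = dcm.read_file('heudiconv/tests/data/axasc35.dcm',
                         stop_before_pixels=True)
    print(meta.StudyInstanceUID)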


=====================================
heudiconv/tests/test_main.py
=====================================
@@ -1,6 +1,9 @@
 # TODO: break this up by modules
 
-from heudiconv.cli.run import main as runner
+from heudiconv.cli.run import (
+    main as runner,
+    process_args,
+)
 from heudiconv import __version__
 from heudiconv.utils import (create_file_if_missing,
                              set_readonly,
@@ -32,8 +35,7 @@ def test_main_help(stdout):
     assert stdout.getvalue().startswith("usage: ")
 
 
-@patch('sys.stderr' if sys.version_info[:2] <= (3, 3) else
-       'sys.stdout', new_callable=StringIO)
+@patch('sys.stdout', new_callable=StringIO)
 def test_main_version(std):
     with pytest.raises(SystemExit):
         runner(['--version'])
@@ -63,6 +65,17 @@ def test_populate_bids_templates(tmpdir):
 
     # it should also be available as a command
     os.unlink(str(description_file))
+
+    # it must fail if no heuristic was provided
+    with pytest.raises(ValueError) as cme:
+        runner([
+            '--command', 'populate-templates',
+            '--files', str(tmpdir)
+        ])
+    assert str(cme.value).startswith("Specify heuristic using -f. Known are:")
+    assert "convertall," in str(cme.value)
+    assert not description_file.exists()
+
     runner([
         '--command', 'populate-templates', '-f', 'convertall',
         '--files', str(tmpdir)
@@ -271,3 +284,16 @@ def test_cache(tmpdir):
     assert (cachedir / 'dicominfo.tsv').exists()
     assert (cachedir / 'S01.auto.txt').exists()
     assert (cachedir / 'S01.edit.txt').exists()
+
+
+def test_no_etelemetry():
+    # smoke test at large - just verifying that there is no crash if etelemetry is absent
+    class args:
+        outdir = '/dev/null'
+        command = 'ls'
+        heuristic = 'reproin'
+        files = []  # Nothing to list
+
+    # must not fail if etelemetry is not found
+    with patch.dict('sys.modules', {'etelemetry': None}):
+        process_args(args)
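
test_no_etelemetry relies on a standard trick: forcing a module's sys.modules
entry to None makes any subsequent import of it raise ImportError. A minimal
sketch:

    from unittest.mock import patch

    # With the entry forced to None, `import etelemetry` behaves as if the
    # package were not installed at all.
    with patch.dict('sys.modules', {'etelemetry': None}):
        try:
            import etelemetry
        except ImportError:
            print('etelemetry treated as missing')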


=====================================
heudiconv/tests/test_regression.py
=====================================
@@ -1,27 +1,27 @@
 """Testing conversion with conversion saved on datalad"""
-import json
 from glob import glob
+import os
 import os.path as op
 
 import pytest
 
+from heudiconv.cli.run import main as runner
+from heudiconv.external.pydicom import dcm
+from heudiconv.utils import load_json
+# testing utilities
+from .utils import fetch_data, gen_heudiconv_args, TESTS_DATA_PATH
+
 have_datalad = True
 try:
-    from datalad import api # to pull and grab data
     from datalad.support.exceptions import IncompleteResultsError
 except ImportError:
     have_datalad = False
 
-from heudiconv.cli.run import main as runner
-from heudiconv.utils import load_json
-# testing utilities
-from .utils import fetch_data, gen_heudiconv_args
-
 
+@pytest.mark.skipif(not have_datalad, reason="no datalad")
 @pytest.mark.parametrize('subject', ['sub-sid000143'])
 @pytest.mark.parametrize('heuristic', ['reproin.py'])
 @pytest.mark.parametrize('anon_cmd', [None, 'anonymize_script.py'])
-@pytest.mark.skipif(not have_datalad, reason="no datalad")
 def test_conversion(tmpdir, subject, heuristic, anon_cmd):
     tmpdir.chdir()
     try:
@@ -32,17 +32,17 @@ def test_conversion(tmpdir, subject, heuristic, anon_cmd):
         pytest.skip("Failed to fetch test data: %s" % str(exc))
     outdir = tmpdir.mkdir('out').strpath
 
-    args = gen_heudiconv_args(datadir,
-                              outdir,
-                              subject,
-                              heuristic,
-                              anon_cmd,
-                              template=op.join('sourcedata/{subject}/*/*/*.tgz'))
-    runner(args) # run conversion
+    args = gen_heudiconv_args(
+        datadir, outdir, subject, heuristic, anon_cmd,
+        template=op.join('sourcedata/{subject}/*/*/*.tgz')
+    )
+    runner(args)  # run conversion
 
     # verify functionals were converted
-    assert glob('{}/{}/func/*'.format(outdir, subject)) == \
-           glob('{}/{}/func/*'.format(datadir, subject))
+    assert (
+        glob('{}/{}/func/*'.format(outdir, subject)) ==
+        glob('{}/{}/func/*'.format(datadir, subject))
+    )
 
     # compare some json metadata
     json_ = '{}/task-rest_acq-24mm64sl1000tr32te600dyn_bold.json'.format
@@ -52,6 +52,7 @@ def test_conversion(tmpdir, subject, heuristic, anon_cmd):
     for key in keys:
         assert orig[key] == conv[key]
 
+
 @pytest.mark.skipif(not have_datalad, reason="no datalad")
 def test_multiecho(tmpdir, subject='MEEPI', heuristic='bids_ME.py'):
     tmpdir.chdir()
@@ -62,7 +63,7 @@ def test_multiecho(tmpdir, subject='MEEPI', heuristic='bids_ME.py'):
 
     outdir = tmpdir.mkdir('out').strpath
     args = gen_heudiconv_args(datadir, outdir, subject, heuristic)
-    runner(args) # run conversion
+    runner(args)  # run conversion
 
     # check if we have echo functionals
     echoes = glob(op.join('out', 'sub-' + subject, 'func', '*echo*nii.gz'))
@@ -81,3 +82,43 @@ def test_multiecho(tmpdir, subject='MEEPI', heuristic='bids_ME.py'):
     events = glob(op.join('out', 'sub-' + subject, 'func', '*events.tsv'))
     for event in events:
         assert 'echo-' not in event
+
+
+@pytest.mark.parametrize('subject', ['merged'])
+def test_grouping(tmpdir, subject):
+    dicoms = [
+        op.join(TESTS_DATA_PATH, fl) for fl in ['axasc35.dcm', 'phantom.dcm']
+    ]
+    # ensure DICOMs are different studies
+    studyuids = {
+        dcm.read_file(fl, stop_before_pixels=True).StudyInstanceUID for fl
+        in dicoms
+    }
+    assert len(studyuids) == len(dicoms)
+    # symlink to common location
+    outdir = tmpdir.mkdir('out')
+    datadir = tmpdir.mkdir(subject)
+    for fl in dicoms:
+        os.symlink(fl, (datadir / op.basename(fl)).strpath)
+
+    template = op.join("{subject}/*.dcm")
+    hargs = gen_heudiconv_args(
+        tmpdir.strpath,
+        outdir.strpath,
+        subject,
+        'convertall.py',
+        template=template
+    )
+
+    with pytest.raises(AssertionError):
+        runner(hargs)
+
+    # group all found DICOMs under subject, despite conflicts
+    hargs += ["-g", "all"]
+    runner(hargs)
+    assert len([fl for fl in outdir.visit(fil='run0*')]) == 4
+    tsv = (outdir / 'participants.tsv')
+    assert tsv.check()
+    lines = tsv.open().readlines()
+    assert len(lines) == 2
+    assert lines[1].split('\t')[0] == 'sub-{}'.format(subject)


=====================================
heudiconv/tests/utils.py
=====================================
@@ -1,9 +1,17 @@
+from functools import wraps
+import os
 import os.path as op
+import sys
+
 import heudiconv.heuristics
 
+
 HEURISTICS_PATH = op.join(heudiconv.heuristics.__path__[0])
 TESTS_DATA_PATH = op.join(op.dirname(__file__), 'data')
 
+import logging
+lgr = logging.getLogger(__name__)
+
 
 def gen_heudiconv_args(datadir, outdir, subject, heuristic_file,
                        anon_cmd=None, template=None, xargs=None):
@@ -52,9 +60,57 @@ def fetch_data(tmpdir, dataset, getpath=None):
     """
     from datalad import api
     targetdir = op.join(tmpdir, op.basename(dataset))
-    api.install(path=targetdir,
+    ds = api.install(path=targetdir,
                 source='http://datasets-tests.datalad.org/{}'.format(dataset))
 
     getdir = targetdir + (op.sep + getpath if getpath is not None else '')
-    api.get(getdir)
+    ds.get(getdir)
     return targetdir
+
+
+def assert_cwd_unchanged(ok_to_chdir=False):
+    """Decorator to test whether the current working directory remains unchanged
+
+    Provenance: based on the one in datalad, but simplified.
+
+    Parameters
+    ----------
+    ok_to_chdir: bool, optional
+      If True, allow the decorated function to chdir; the decorator then does
+      not raise an exception on a changed directory but only returns to the
+      original one
+    """
+
+    def decorator(func=None):  # =None to avoid pytest treating it as a fixture
+        @wraps(func)
+        def newfunc(*args, **kwargs):
+            cwd_before = os.getcwd()
+            exc = None
+            try:
+                return func(*args, **kwargs)
+            except Exception as exc_:
+                exc = exc_
+            finally:
+                try:
+                    cwd_after = os.getcwd()
+                except OSError as e:
+                    lgr.warning("Failed to getcwd: %s" % e)
+                    cwd_after = None
+
+                if cwd_after != cwd_before:
+                    os.chdir(cwd_before)
+                    if not ok_to_chdir:
+                        lgr.warning(
+                            "%s changed cwd to %s. Mitigating and changing back to %s"
+                            % (func, cwd_after, cwd_before))
+                        # If there was already exception raised, we better reraise
+                        # that one since it must be more important, so not masking it
+                        # here with our assertion
+                        if exc is None:
+                            assert cwd_before == cwd_after, \
+                                     "CWD changed from %s to %s" % (cwd_before, cwd_after)
+
+                if exc is not None:
+                    raise exc
+        return newfunc
+
+    return decorator
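
A usage sketch of the new decorator, mirroring how test_dicoms.py applies it:

    from heudiconv.tests.utils import assert_cwd_unchanged

    @assert_cwd_unchanged(ok_to_chdir=True)
    def test_something(tmpdir):
        tmpdir.chdir()  # allowed; the decorator restores the original cwd

Without ok_to_chdir=True the decorator would still chdir back, but would warn
and, absent another exception, fail with an AssertionError.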


=====================================
heudiconv/utils.py
=====================================
@@ -19,10 +19,7 @@ from nipype.utils.filemanip import which
 import logging
 lgr = logging.getLogger(__name__)
 
-if sys.version_info[0] > 2:
-    from json.decoder import JSONDecodeError
-else:
-    JSONDecodeError = ValueError
+from json.decoder import JSONDecodeError
 
 
 seqinfo_fields = [
@@ -30,8 +27,8 @@ seqinfo_fields = [
     'example_dcm_file',      # 1
     'series_id',             # 2
     'dcm_dir_name',          # 3
-    'unspecified2',          # 4
-    'unspecified3',          # 5
+    'series_files',          # 4
+    'unspecified',           # 5
     'dim1', 'dim2', 'dim3', 'dim4', # 6, 7, 8, 9
     'TR', 'TE',              # 10, 11
     'protocol_name',         # 12
@@ -47,7 +44,7 @@ seqinfo_fields = [
     'patient_age',           # 22
     'patient_sex',           # 23
     'date',                  # 24
-    'series_uid'             # 25
+    'series_uid',            # 25
  ]
 
 SeqInfo = namedtuple('SeqInfo', seqinfo_fields)
@@ -115,9 +112,7 @@ def anonymize_sid(sid, anon_sid_cmd):
     cmd = [anon_sid_cmd, sid]
     shell_return = check_output(cmd)
 
-    if all([sys.version_info[0] > 2,
-            isinstance(shell_return, bytes),
-            isinstance(sid, str)]):
+    if isinstance(shell_return, bytes) and isinstance(sid, str):
         anon_sid = shell_return.decode()
     else:
         anon_sid = shell_return
@@ -193,7 +188,7 @@ def assure_no_file_exists(path):
         os.unlink(path)
 
 
-def save_json(filename, data, indent=4, sort_keys=True, pretty=False):
+def save_json(filename, data, indent=2, sort_keys=True, pretty=False):
     """Save data to a json file
 
     Parameters
@@ -208,11 +203,25 @@ def save_json(filename, data, indent=4, sort_keys=True, pretty=False):
 
     """
     assure_no_file_exists(filename)
+    dumps_kw = dict(sort_keys=sort_keys, indent=indent)
+    j = None
+    if pretty:
+        try:
+            j = json_dumps_pretty(data, **dumps_kw)
+        except AssertionError as exc:
+            pretty = False
+            lgr.warning(
+                "Prettyfication of .json failed (%s).  "
+                "Original .json will be kept as is.  Please share (if you "
+                "could) "
+                "that file (%s) with HeuDiConv developers"
+                % (str(exc), filename)
+            )
+    if not pretty:
+        j = _canonical_dumps(data, **dumps_kw)
+    assert j is not None  # one way or another it should have been set to a str
     with open(filename, 'w') as fp:
-        fp.write(
-            (json_dumps_pretty if pretty else _canonical_dumps)(
-                data, sort_keys=sort_keys, indent=indent)
-        )
+        fp.write(j)
 
 
 def json_dumps_pretty(j, indent=2, sort_keys=True):
@@ -257,25 +266,9 @@ def json_dumps_pretty(j, indent=2, sort_keys=True):
 def treat_infofile(filename):
     """Tune up generated .json file (slim down, pretty-print for humans).
     """
-    with open(filename) as f:
-        j = json.load(f)
-
+    j = load_json(filename)
     j_slim = slim_down_info(j)
-    dumps_kw = dict(indent=2, sort_keys=True)
-    try:
-        j_pretty = json_dumps_pretty(j_slim, **dumps_kw)
-    except AssertionError as exc:
-        lgr.warning(
-            "Prettyfication of .json failed (%s).  "
-            "Original .json will be kept as is.  Please share (if you could) "
-            "that file (%s) with HeuDiConv developers"
-            % (str(exc), filename)
-        )
-        j_pretty = json.dumps(j_slim, **dumps_kw)
-
-    set_readonly(filename, False)
-    with open(filename, 'wt') as fp:
-        fp.write(j_pretty)
+    save_json(filename, j_slim, sort_keys=True, pretty=True)
     set_readonly(filename)
 
 
@@ -324,7 +317,7 @@ def load_heuristic(heuristic):
         path, fname = op.split(heuristic_file)
         try:
             old_syspath = sys.path[:]
-            sys.path.append(path)
+            sys.path.insert(0, path)
             mod = __import__(fname.split('.')[0])
             mod.filename = heuristic_file
         finally:
@@ -490,8 +483,25 @@ def create_tree(path, tree, archives_leading_dir=True):
             create_tree(full_name, load, archives_leading_dir=archives_leading_dir)
         else:
             with open(full_name, 'w') as f:
-                if sys.version_info[0] == 2 and not isinstance(load, str):
-                    load = load.encode('utf-8')
                 f.write(load)
         if executable:
             os.chmod(full_name, os.stat(full_name).st_mode | stat.S_IEXEC)
+
+
+def get_typed_attr(obj, attr, _type, default=None):
+    """
+    Typecasts an object's named attribute. If the attribute cannot be
+    converted, the default value is returned instead.
+
+    Parameters
+    ----------
+    obj: Object
+    attr: Attribute
+    _type: Type
+    default: value, optional
+    """
+    try:
+        val = _type(getattr(obj, attr, default))
+    except (TypeError, ValueError):
+        return default
+    return val
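
A quick illustration of get_typed_attr's fallback behaviour (the Seq class is
a hypothetical stand-in for a SeqInfo-like object):

    from heudiconv.utils import get_typed_attr

    class Seq:
        TR = '2.0'

    get_typed_attr(Seq, 'TR', float, 0.0)  # '2.0' cast to float -> 2.0
    get_typed_attr(Seq, 'TE', float, 0.0)  # attribute absent -> float(0.0) -> 0.0
    get_typed_attr(Seq, 'TR', int)         # int('2.0') raises ValueError -> None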


=====================================
utils/prep_release
=====================================
@@ -0,0 +1,14 @@
+#!/bin/bash
+
+set -eu
+
+read -r newver oldver <<<$(sed -ne 's,## \[\([0-9\.]*\)\] .*,\1,gp' CHANGELOG.md | head -n 2 | tr '\n' ' ')
+
+echo "Old: $oldver  New: $newver"
+curver=$(python -c 'import heudiconv; print(heudiconv.__version__)')
+# check
+test "$oldver" = "$curver"
+
+sed -i -e "s,${oldver//./\\.},$newver,g" \
+    docs/conf.py docs/installation.rst docs/usage.rst heudiconv/info.py
+



View it on GitLab: https://salsa.debian.org/med-team/heudiconv/-/commit/255bc4551ee0024428361213bf0a6b77c4807cdd
