[med-svn] [Git][med-team/hdmf][upstream] New upstream version 3.0.1
Nilesh Patra (@nilesh)
gitlab at salsa.debian.org
Fri Jul 9 18:18:13 BST 2021
Nilesh Patra pushed to branch upstream at Debian Med / hdmf
Commits:
a6cddcdd by Nilesh Patra at 2021-07-09T22:19:42+05:30
New upstream version 3.0.1
- - - - -
25 changed files:
- PKG-INFO
- README.rst
- requirements-dev.txt
- requirements-min.txt
- requirements.txt
- setup.py
- src/hdmf.egg-info/PKG-INFO
- src/hdmf.egg-info/SOURCES.txt
- src/hdmf.egg-info/requires.txt
- src/hdmf/_version.py
- src/hdmf/backends/hdf5/h5_utils.py
- src/hdmf/backends/hdf5/h5tools.py
- src/hdmf/common/resources.py
- src/hdmf/common/table.py
- src/hdmf/testing/testcase.py
- src/hdmf/utils.py
- src/hdmf/validate/errors.py
- src/hdmf/validate/validator.py
- tests/unit/common/test_resources.py
- tests/unit/common/test_table.py
- tests/unit/test_io_hdf5_h5tools.py
- tests/unit/utils_test/test_utils.py
- + tests/unit/validator_tests/test_errors.py
- tests/unit/validator_tests/test_validate.py
- tox.ini
Changes:
=====================================
PKG-INFO
=====================================
@@ -1,127 +1,17 @@
Metadata-Version: 2.1
Name: hdmf
-Version: 2.5.8
+Version: 3.0.1
Summary: A package for standardizing hierarchical object data
Home-page: https://github.com/hdmf-dev/hdmf
Author: Andrew Tritt
Author-email: ajtritt at lbl.gov
License: BSD
-Description: ========================================
- The Hierarchical Data Modeling Framework
- ========================================
-
- The Hierarchical Data Modeling Framework, or *HDMF*, is a Python package for working with hierarchical data.
- It provides APIs for specifying data models, reading and writing data to different storage backends, and
- representing data with Python object.
-
- Documentation of HDMF can be found at https://hdmf.readthedocs.io
-
- Latest Release
- ==============
-
- .. image:: https://badge.fury.io/py/hdmf.svg
- :target: https://badge.fury.io/py/hdmf
-
- .. image:: https://anaconda.org/conda-forge/hdmf/badges/version.svg
- :target: https://anaconda.org/conda-forge/hdmf
-
-
- Build Status
- ============
-
- .. table::
-
- +---------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+
- | Linux | Windows and macOS |
- +=====================================================================+==================================================================================================+
- | .. image:: https://circleci.com/gh/hdmf-dev/hdmf.svg?style=shield | .. image:: https://dev.azure.com/hdmf-dev/hdmf/_apis/build/status/hdmf-dev.hdmf?branchName=dev |
- | :target: https://circleci.com/gh/hdmf-dev/hdmf | :target: https://dev.azure.com/hdmf-dev/hdmf/_build/latest?definitionId=1&branchName=dev |
- +---------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+
-
-
- **Conda**
-
- .. image:: https://circleci.com/gh/conda-forge/hdmf-feedstock.svg?style=shield
- :target: https://circleci.com/gh/conda-forge/hdmf-feedstock
-
-
- Overall Health
- ==============
-
- .. image:: https://github.com/hdmf-dev/hdmf/workflows/Run%20coverage/badge.svg
- :target: https://github.com/hdmf-dev/hdmf/actions?query=workflow%3A%22Run+coverage%22
-
- .. image:: https://codecov.io/gh/hdmf-dev/hdmf/branch/dev/graph/badge.svg
- :target: https://codecov.io/gh/hdmf-dev/hdmf
-
- .. image:: https://requires.io/github/hdmf-dev/hdmf/requirements.svg?branch=dev
- :target: https://requires.io/github/hdmf-dev/hdmf/requirements/?branch=dev
- :alt: Requirements Status
-
- .. image:: https://readthedocs.org/projects/hdmf/badge/?version=latest
- :target: https://hdmf.readthedocs.io/en/latest/?badge=latest
- :alt: Documentation Status
-
- Installation
- ============
-
- See the HDMF documentation for details http://hdmf.readthedocs.io/en/latest/getting_started.html#installation
-
- Code of Conduct
- ===============
-
- This project and everyone participating in it is governed by our `code of conduct guidelines <.github/CODE_OF_CONDUCT.md>`_. By participating, you are expected to uphold this code.
-
- Contributing
- ============
-
- For details on how to contribute to HDMF see our `contribution guidelines <docs/CONTRIBUTING.rst>`_.
-
- Citing HDMF
- ===========
-
- .. code-block:: bibtex
-
- @INPROCEEDINGS{9005648,
- author={A. J. {Tritt} and O. {Rübel} and B. {Dichter} and R. {Ly} and D. {Kang} and E. F. {Chang} and L. M. {Frank} and K. {Bouchard}},
- booktitle={2019 IEEE International Conference on Big Data (Big Data)},
- title={HDMF: Hierarchical Data Modeling Framework for Modern Science Data Standards},
- year={2019},
- volume={},
- number={},
- pages={165-179},
- doi={10.1109/BigData47090.2019.9005648}}
-
- LICENSE
- =======
-
- "hdmf" Copyright (c) 2017-2021, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.
- Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
-
- (1) Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
-
- (2) Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
-
- (3) Neither the name of the University of California, Lawrence Berkeley National Laboratory, U.S. Dept. of Energy nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
-
- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
- You are under no obligation whatsoever to provide any bug fixes, patches, or upgrades to the features, functionality or performance of the source code ("Enhancements") to anyone; however, if you choose to make your Enhancements available either publicly, or directly to Lawrence Berkeley National Laboratory, without imposing a separate written license agreement for such Enhancements, then you hereby grant the following license: a non-exclusive, royalty-free perpetual license to install, use, modify, prepare derivative works, incorporate into other computer software, distribute, and sublicense such enhancements or derivative works thereof, in binary and source code form.
-
- COPYRIGHT
- =========
-
- "hdmf" Copyright (c) 2017-2021, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.
- If you have questions about your rights to use or distribute this software, please contact Berkeley Lab's Innovation & Partnerships Office at IPO at lbl.gov.
-
- NOTICE. This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit other to do so.
-
Keywords: python HDF HDF5 cross-platform open-data data-format open-source open-science reproducible-research
Platform: UNKNOWN
Classifier: Programming Language :: Python
-Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
Classifier: License :: OSI Approved :: BSD License
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
@@ -131,3 +21,121 @@ Classifier: Operating System :: MacOS
Classifier: Operating System :: Unix
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Description-Content-Type: text/x-rst; charset=UTF-8
+
+========================================
+The Hierarchical Data Modeling Framework
+========================================
+
+The Hierarchical Data Modeling Framework, or *HDMF*, is a Python package for working with hierarchical data.
+It provides APIs for specifying data models, reading and writing data to different storage backends, and
+representing data with Python objects.
+
+Documentation of HDMF can be found at https://hdmf.readthedocs.io
+
+Latest Release
+==============
+
+.. image:: https://badge.fury.io/py/hdmf.svg
+ :target: https://badge.fury.io/py/hdmf
+
+.. image:: https://anaconda.org/conda-forge/hdmf/badges/version.svg
+ :target: https://anaconda.org/conda-forge/hdmf
+
+
+Build Status
+============
+
+.. table::
+
+ +---------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+
+ | Linux | Windows and macOS |
+ +=====================================================================+==================================================================================================+
+ | .. image:: https://circleci.com/gh/hdmf-dev/hdmf.svg?style=shield | .. image:: https://dev.azure.com/hdmf-dev/hdmf/_apis/build/status/hdmf-dev.hdmf?branchName=dev |
+ | :target: https://circleci.com/gh/hdmf-dev/hdmf | :target: https://dev.azure.com/hdmf-dev/hdmf/_build/latest?definitionId=1&branchName=dev |
+ +---------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+
+
+
+**Conda**
+
+.. image:: https://circleci.com/gh/conda-forge/hdmf-feedstock.svg?style=shield
+ :target: https://circleci.com/gh/conda-forge/hdmf-feedstock
+
+
+Overall Health
+==============
+
+.. image:: https://github.com/hdmf-dev/hdmf/workflows/Run%20coverage/badge.svg
+ :target: https://github.com/hdmf-dev/hdmf/actions?query=workflow%3A%22Run+coverage%22
+
+.. image:: https://codecov.io/gh/hdmf-dev/hdmf/branch/dev/graph/badge.svg
+ :target: https://codecov.io/gh/hdmf-dev/hdmf
+
+.. image:: https://requires.io/github/hdmf-dev/hdmf/requirements.svg?branch=dev
+ :target: https://requires.io/github/hdmf-dev/hdmf/requirements/?branch=dev
+ :alt: Requirements Status
+
+.. image:: https://readthedocs.org/projects/hdmf/badge/?version=latest
+ :target: https://hdmf.readthedocs.io/en/latest/?badge=latest
+ :alt: Documentation Status
+
+Installation
+============
+
+See the HDMF documentation for details http://hdmf.readthedocs.io/en/latest/getting_started.html#installation
+
+Code of Conduct
+===============
+
+This project and everyone participating in it is governed by our `code of conduct guidelines <.github/CODE_OF_CONDUCT.md>`_. By participating, you are expected to uphold this code.
+
+Contributing
+============
+
+For details on how to contribute to HDMF see our `contribution guidelines <docs/CONTRIBUTING.rst>`_.
+
+Citing HDMF
+===========
+
+* **Manuscript:**
+
+.. code-block:: bibtex
+
+ @INPROCEEDINGS{9005648,
+ author={A. J. {Tritt} and O. {Rübel} and B. {Dichter} and R. {Ly} and D. {Kang} and E. F. {Chang} and L. M. {Frank} and K. {Bouchard}},
+ booktitle={2019 IEEE International Conference on Big Data (Big Data)},
+ title={HDMF: Hierarchical Data Modeling Framework for Modern Science Data Standards},
+ year={2019},
+ volume={},
+ number={},
+ pages={165-179},
+ doi={10.1109/BigData47090.2019.9005648},
+ note={}}
+
+* **RRID:** (Hierarchical Data Modeling Framework, RRID:SCR_021303)
+
+
+LICENSE
+=======
+
+"hdmf" Copyright (c) 2017-2021, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.
+Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+
+(1) Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+
+(2) Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+
+(3) Neither the name of the University of California, Lawrence Berkeley National Laboratory, U.S. Dept. of Energy nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+You are under no obligation whatsoever to provide any bug fixes, patches, or upgrades to the features, functionality or performance of the source code ("Enhancements") to anyone; however, if you choose to make your Enhancements available either publicly, or directly to Lawrence Berkeley National Laboratory, without imposing a separate written license agreement for such Enhancements, then you hereby grant the following license: a non-exclusive, royalty-free perpetual license to install, use, modify, prepare derivative works, incorporate into other computer software, distribute, and sublicense such enhancements or derivative works thereof, in binary and source code form.
+
+COPYRIGHT
+=========
+
+"hdmf" Copyright (c) 2017-2021, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.
+If you have questions about your rights to use or distribute this software, please contact Berkeley Lab's Innovation & Partnerships Office at IPO at lbl.gov.
+
+NOTICE. This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit other to do so.
+
+
=====================================
README.rst
=====================================
@@ -72,6 +72,8 @@ For details on how to contribute to HDMF see our `contribution guidelines <docs/
Citing HDMF
===========
+* **Manuscript:**
+
.. code-block:: bibtex
@INPROCEEDINGS{9005648,
@@ -82,7 +84,11 @@ Citing HDMF
volume={},
number={},
pages={165-179},
- doi={10.1109/BigData47090.2019.9005648}}
+ doi={10.1109/BigData47090.2019.9005648},
+ note={}}
+
+* **RRID:** (Hierarchical Data Modeling Framework, RRID:SCR_021303)
+
LICENSE
=======
=====================================
requirements-dev.txt
=====================================
@@ -2,9 +2,9 @@
# compute coverage, and create test environments
codecov==2.1.11
coverage==5.5
-flake8==3.9.1
+flake8==3.9.2
flake8-debugger==4.0.0
flake8-print==4.0.0
-importlib-metadata==4.0.1
+importlib-metadata==4.6.1
python-dateutil==2.8.1
-tox==3.23.0
+tox==3.23.1
=====================================
requirements-min.txt
=====================================
@@ -1,8 +1,8 @@
# minimum versions of package dependencies for installing HDMF
-h5py==2.9 # support for setting attrs to lists of utf-8 added in 2.9
+h5py==2.10 # support for selection of datasets with list of indices added in 2.10
numpy==1.16
scipy==1.1
pandas==1.0.5
-ruamel.yaml==0.15
+ruamel.yaml==0.16
jsonschema==2.6.0
setuptools
=====================================
requirements.txt
=====================================
@@ -1,8 +1,8 @@
# pinned dependencies to reproduce an entire development environment to use HDMF
-h5py==2.10.0
-numpy==1.19.3
-scipy==1.5.4
-pandas==1.1.5
-ruamel.yaml==0.17.4
+h5py==3.3.0
+numpy==1.21.0
+scipy==1.7.0
+pandas==1.3.0
+ruamel.yaml==0.17.10
jsonschema==3.2.0
-setuptools==56.0.0
+setuptools==57.1.0
=====================================
setup.py
=====================================
@@ -12,11 +12,11 @@ print('found these packages:', pkgs)
schema_dir = 'common/hdmf-common-schema/common'
reqs = [
- 'h5py>=2.9,<3',
- 'numpy>=1.16,<1.21',
+ 'h5py>=2.10,<4',
+ 'numpy>=1.16,<1.22',
'scipy>=1.1,<2',
'pandas>=1.0.5,<2',
- 'ruamel.yaml>=0.15,<1',
+ 'ruamel.yaml>=0.16,<1',
'jsonschema>=2.6.0,<4',
'setuptools',
]
@@ -40,9 +40,9 @@ setup_args = {
'package_data': {'hdmf': ["%s/*.yaml" % schema_dir, "%s/*.json" % schema_dir]},
'classifiers': [
"Programming Language :: Python",
- "Programming Language :: Python :: 3.6",
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
+ "Programming Language :: Python :: 3.9",
"License :: OSI Approved :: BSD License",
"Development Status :: 5 - Production/Stable",
"Intended Audience :: Developers",
=====================================
src/hdmf.egg-info/PKG-INFO
=====================================
@@ -1,127 +1,17 @@
Metadata-Version: 2.1
Name: hdmf
-Version: 2.5.8
+Version: 3.0.1
Summary: A package for standardizing hierarchical object data
Home-page: https://github.com/hdmf-dev/hdmf
Author: Andrew Tritt
Author-email: ajtritt at lbl.gov
License: BSD
-Description: ========================================
- The Hierarchical Data Modeling Framework
- ========================================
-
- The Hierarchical Data Modeling Framework, or *HDMF*, is a Python package for working with hierarchical data.
- It provides APIs for specifying data models, reading and writing data to different storage backends, and
- representing data with Python object.
-
- Documentation of HDMF can be found at https://hdmf.readthedocs.io
-
- Latest Release
- ==============
-
- .. image:: https://badge.fury.io/py/hdmf.svg
- :target: https://badge.fury.io/py/hdmf
-
- .. image:: https://anaconda.org/conda-forge/hdmf/badges/version.svg
- :target: https://anaconda.org/conda-forge/hdmf
-
-
- Build Status
- ============
-
- .. table::
-
- +---------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+
- | Linux | Windows and macOS |
- +=====================================================================+==================================================================================================+
- | .. image:: https://circleci.com/gh/hdmf-dev/hdmf.svg?style=shield | .. image:: https://dev.azure.com/hdmf-dev/hdmf/_apis/build/status/hdmf-dev.hdmf?branchName=dev |
- | :target: https://circleci.com/gh/hdmf-dev/hdmf | :target: https://dev.azure.com/hdmf-dev/hdmf/_build/latest?definitionId=1&branchName=dev |
- +---------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+
-
-
- **Conda**
-
- .. image:: https://circleci.com/gh/conda-forge/hdmf-feedstock.svg?style=shield
- :target: https://circleci.com/gh/conda-forge/hdmf-feedstock
-
-
- Overall Health
- ==============
-
- .. image:: https://github.com/hdmf-dev/hdmf/workflows/Run%20coverage/badge.svg
- :target: https://github.com/hdmf-dev/hdmf/actions?query=workflow%3A%22Run+coverage%22
-
- .. image:: https://codecov.io/gh/hdmf-dev/hdmf/branch/dev/graph/badge.svg
- :target: https://codecov.io/gh/hdmf-dev/hdmf
-
- .. image:: https://requires.io/github/hdmf-dev/hdmf/requirements.svg?branch=dev
- :target: https://requires.io/github/hdmf-dev/hdmf/requirements/?branch=dev
- :alt: Requirements Status
-
- .. image:: https://readthedocs.org/projects/hdmf/badge/?version=latest
- :target: https://hdmf.readthedocs.io/en/latest/?badge=latest
- :alt: Documentation Status
-
- Installation
- ============
-
- See the HDMF documentation for details http://hdmf.readthedocs.io/en/latest/getting_started.html#installation
-
- Code of Conduct
- ===============
-
- This project and everyone participating in it is governed by our `code of conduct guidelines <.github/CODE_OF_CONDUCT.md>`_. By participating, you are expected to uphold this code.
-
- Contributing
- ============
-
- For details on how to contribute to HDMF see our `contribution guidelines <docs/CONTRIBUTING.rst>`_.
-
- Citing HDMF
- ===========
-
- .. code-block:: bibtex
-
- @INPROCEEDINGS{9005648,
- author={A. J. {Tritt} and O. {Rübel} and B. {Dichter} and R. {Ly} and D. {Kang} and E. F. {Chang} and L. M. {Frank} and K. {Bouchard}},
- booktitle={2019 IEEE International Conference on Big Data (Big Data)},
- title={HDMF: Hierarchical Data Modeling Framework for Modern Science Data Standards},
- year={2019},
- volume={},
- number={},
- pages={165-179},
- doi={10.1109/BigData47090.2019.9005648}}
-
- LICENSE
- =======
-
- "hdmf" Copyright (c) 2017-2021, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.
- Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
-
- (1) Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
-
- (2) Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
-
- (3) Neither the name of the University of California, Lawrence Berkeley National Laboratory, U.S. Dept. of Energy nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
-
- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
- You are under no obligation whatsoever to provide any bug fixes, patches, or upgrades to the features, functionality or performance of the source code ("Enhancements") to anyone; however, if you choose to make your Enhancements available either publicly, or directly to Lawrence Berkeley National Laboratory, without imposing a separate written license agreement for such Enhancements, then you hereby grant the following license: a non-exclusive, royalty-free perpetual license to install, use, modify, prepare derivative works, incorporate into other computer software, distribute, and sublicense such enhancements or derivative works thereof, in binary and source code form.
-
- COPYRIGHT
- =========
-
- "hdmf" Copyright (c) 2017-2021, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.
- If you have questions about your rights to use or distribute this software, please contact Berkeley Lab's Innovation & Partnerships Office at IPO at lbl.gov.
-
- NOTICE. This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit other to do so.
-
Keywords: python HDF HDF5 cross-platform open-data data-format open-source open-science reproducible-research
Platform: UNKNOWN
Classifier: Programming Language :: Python
-Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
Classifier: License :: OSI Approved :: BSD License
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
@@ -131,3 +21,121 @@ Classifier: Operating System :: MacOS
Classifier: Operating System :: Unix
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Description-Content-Type: text/x-rst; charset=UTF-8
+
+========================================
+The Hierarchical Data Modeling Framework
+========================================
+
+The Hierarchical Data Modeling Framework, or *HDMF*, is a Python package for working with hierarchical data.
+It provides APIs for specifying data models, reading and writing data to different storage backends, and
+representing data with Python objects.
+
+Documentation of HDMF can be found at https://hdmf.readthedocs.io
+
+Latest Release
+==============
+
+.. image:: https://badge.fury.io/py/hdmf.svg
+ :target: https://badge.fury.io/py/hdmf
+
+.. image:: https://anaconda.org/conda-forge/hdmf/badges/version.svg
+ :target: https://anaconda.org/conda-forge/hdmf
+
+
+Build Status
+============
+
+.. table::
+
+ +---------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+
+ | Linux | Windows and macOS |
+ +=====================================================================+==================================================================================================+
+ | .. image:: https://circleci.com/gh/hdmf-dev/hdmf.svg?style=shield | .. image:: https://dev.azure.com/hdmf-dev/hdmf/_apis/build/status/hdmf-dev.hdmf?branchName=dev |
+ | :target: https://circleci.com/gh/hdmf-dev/hdmf | :target: https://dev.azure.com/hdmf-dev/hdmf/_build/latest?definitionId=1&branchName=dev |
+ +---------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+
+
+
+**Conda**
+
+.. image:: https://circleci.com/gh/conda-forge/hdmf-feedstock.svg?style=shield
+ :target: https://circleci.com/gh/conda-forge/hdmf-feedstock
+
+
+Overall Health
+==============
+
+.. image:: https://github.com/hdmf-dev/hdmf/workflows/Run%20coverage/badge.svg
+ :target: https://github.com/hdmf-dev/hdmf/actions?query=workflow%3A%22Run+coverage%22
+
+.. image:: https://codecov.io/gh/hdmf-dev/hdmf/branch/dev/graph/badge.svg
+ :target: https://codecov.io/gh/hdmf-dev/hdmf
+
+.. image:: https://requires.io/github/hdmf-dev/hdmf/requirements.svg?branch=dev
+ :target: https://requires.io/github/hdmf-dev/hdmf/requirements/?branch=dev
+ :alt: Requirements Status
+
+.. image:: https://readthedocs.org/projects/hdmf/badge/?version=latest
+ :target: https://hdmf.readthedocs.io/en/latest/?badge=latest
+ :alt: Documentation Status
+
+Installation
+============
+
+See the HDMF documentation for details http://hdmf.readthedocs.io/en/latest/getting_started.html#installation
+
+Code of Conduct
+===============
+
+This project and everyone participating in it is governed by our `code of conduct guidelines <.github/CODE_OF_CONDUCT.md>`_. By participating, you are expected to uphold this code.
+
+Contributing
+============
+
+For details on how to contribute to HDMF see our `contribution guidelines <docs/CONTRIBUTING.rst>`_.
+
+Citing HDMF
+===========
+
+* **Manuscript:**
+
+.. code-block:: bibtex
+
+ @INPROCEEDINGS{9005648,
+ author={A. J. {Tritt} and O. {Rübel} and B. {Dichter} and R. {Ly} and D. {Kang} and E. F. {Chang} and L. M. {Frank} and K. {Bouchard}},
+ booktitle={2019 IEEE International Conference on Big Data (Big Data)},
+ title={HDMF: Hierarchical Data Modeling Framework for Modern Science Data Standards},
+ year={2019},
+ volume={},
+ number={},
+ pages={165-179},
+ doi={10.1109/BigData47090.2019.9005648},
+ note={}}
+
+* **RRID:** (Hierarchical Data Modeling Framework, RRID:SCR_021303)
+
+
+LICENSE
+=======
+
+"hdmf" Copyright (c) 2017-2021, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.
+Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+
+(1) Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+
+(2) Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+
+(3) Neither the name of the University of California, Lawrence Berkeley National Laboratory, U.S. Dept. of Energy nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+You are under no obligation whatsoever to provide any bug fixes, patches, or upgrades to the features, functionality or performance of the source code ("Enhancements") to anyone; however, if you choose to make your Enhancements available either publicly, or directly to Lawrence Berkeley National Laboratory, without imposing a separate written license agreement for such Enhancements, then you hereby grant the following license: a non-exclusive, royalty-free perpetual license to install, use, modify, prepare derivative works, incorporate into other computer software, distribute, and sublicense such enhancements or derivative works thereof, in binary and source code form.
+
+COPYRIGHT
+=========
+
+"hdmf" Copyright (c) 2017-2021, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.
+If you have questions about your rights to use or distribute this software, please contact Berkeley Lab's Innovation & Partnerships Office at IPO at lbl.gov.
+
+NOTICE. This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit other to do so.
+
+
=====================================
src/hdmf.egg-info/SOURCES.txt
=====================================
@@ -126,4 +126,5 @@ tests/unit/utils_test/test_docval.py
tests/unit/utils_test/test_labelleddict.py
tests/unit/utils_test/test_utils.py
tests/unit/validator_tests/__init__.py
+tests/unit/validator_tests/test_errors.py
tests/unit/validator_tests/test_validate.py
\ No newline at end of file
=====================================
src/hdmf.egg-info/requires.txt
=====================================
@@ -1,7 +1,7 @@
-h5py<3,>=2.9
-numpy<1.21,>=1.16
+h5py<4,>=2.10
+numpy<1.22,>=1.16
scipy<2,>=1.1
pandas<2,>=1.0.5
-ruamel.yaml<1,>=0.15
+ruamel.yaml<1,>=0.16
jsonschema<4,>=2.6.0
setuptools
=====================================
src/hdmf/_version.py
=====================================
@@ -8,11 +8,11 @@ import json
version_json = '''
{
- "date": "2021-06-16T13:54:41-0700",
+ "date": "2021-07-07T09:42:02-0700",
"dirty": false,
"error": null,
- "full-revisionid": "929ec93232bfa1069c764abc7b3b280ab0fc0c1e",
- "version": "2.5.8"
+ "full-revisionid": "935d9838bb4268768e9eaab2e56f7d5c936ef1f4",
+ "version": "3.0.1"
}
''' # END VERSION_JSON
=====================================
src/hdmf/backends/hdf5/h5_utils.py
=====================================
@@ -452,7 +452,7 @@ class H5DataIO(DataIO):
self.__allow_plugin_filters):
msg = "%s compression may not be supported by this version of h5py." % str(self.__iosettings['compression'])
if not self.__allow_plugin_filters:
- msg += "Set `allow_plugin_filters=True` to enable the use of dynamically-loaded plugin filters."
+ msg += " Set `allow_plugin_filters=True` to enable the use of dynamically-loaded plugin filters."
raise ValueError(msg)
# Check possible parameter collisions
if isinstance(self.data, Dataset):
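
The fix above just restores the missing space between the two sentences of the error message. For context, a hedged sketch of the call that reaches this check (the integer filter id 32001/Blosc is illustrative, and the plugin filter must still be installed for the write itself to succeed):

    from hdmf.backends.hdf5 import H5DataIO

    # Requesting a dynamically-loaded (plugin) filter without opting in raises the
    # ValueError whose message is corrected above; opting in skips the availability check.
    wrapped = H5DataIO(data=[1, 2, 3], compression=32001, allow_plugin_filters=True)
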
=====================================
src/hdmf/backends/hdf5/h5tools.py
=====================================
@@ -6,6 +6,7 @@ from functools import partial
from pathlib import Path
import numpy as np
+import h5py
from h5py import File, Group, Dataset, special_dtype, SoftLink, ExternalLink, Reference, RegionReference, check_dtype
from .h5_utils import (BuilderH5ReferenceDataset, BuilderH5RegionDataset, BuilderH5TableDataset, H5DataIO,
@@ -17,7 +18,7 @@ from ...build import (Builder, GroupBuilder, DatasetBuilder, LinkBuilder, BuildM
from ...container import Container
from ...data_utils import AbstractDataChunkIterator
from ...spec import RefSpec, DtypeSpec, NamespaceCatalog, GroupSpec, NamespaceBuilder
-from ...utils import docval, getargs, popargs, call_docval_func, get_data_shape, fmt_docval_args, get_docval
+from ...utils import docval, getargs, popargs, call_docval_func, get_data_shape, fmt_docval_args, get_docval, StrDataset
ROOT_NAME = 'root'
SPEC_LOC_ATTR = '.specloc'
@@ -26,6 +27,8 @@ H5_BINARY = special_dtype(vlen=bytes)
H5_REF = special_dtype(ref=Reference)
H5_REGREF = special_dtype(ref=RegionReference)
+H5PY_3 = h5py.__version__.startswith('3')
+
class HDF5IO(HDMFIO):
@@ -694,12 +697,12 @@ class HDF5IO(HDMFIO):
kwargs['dtype'] = d.dtype
else:
kwargs["data"] = scalar
- elif ndims == 1:
+ else:
d = None
if h5obj.dtype.kind == 'O' and len(h5obj) > 0:
- elem1 = h5obj[0]
+ elem1 = h5obj[tuple([0] * (h5obj.ndim - 1) + [0])]
if isinstance(elem1, (str, bytes)):
- d = h5obj
+ d = self._check_str_dtype(h5obj)
elif isinstance(elem1, RegionReference): # read list of references
d = BuilderH5RegionDataset(h5obj, self)
kwargs['dtype'] = d.dtype
@@ -714,12 +717,17 @@ class HDF5IO(HDMFIO):
else:
d = h5obj
kwargs["data"] = d
- else:
- kwargs["data"] = h5obj
ret = DatasetBuilder(name, **kwargs)
self.__set_written(ret)
return ret
+ def _check_str_dtype(self, h5obj):
+ dtype = h5obj.dtype
+ if dtype.kind == 'O':
+ if dtype.metadata.get('vlen') == str and H5PY_3:
+ return StrDataset(h5obj, None)
+ return h5obj
+
@classmethod
def __compound_dtype_to_list(cls, h5obj_dtype, dset_dtype):
ret = []
@@ -922,10 +930,8 @@ class HDF5IO(HDMFIO):
if isinstance(value, (set, list, tuple)):
tmp = tuple(value)
if len(tmp) > 0:
- if isinstance(tmp[0], str):
- value = [np.unicode_(s) for s in tmp]
- elif isinstance(tmp[0], bytes):
- value = [np.string_(s) for s in tmp]
+ if isinstance(tmp[0], (str, bytes)):
+ value = np.array(value, dtype=special_dtype(vlen=type(tmp[0])))
elif isinstance(tmp[0], Container): # a list of references
self.__queue_ref(self._make_attr_ref_filler(obj, key, tmp))
else:
@@ -938,6 +944,8 @@ class HDF5IO(HDMFIO):
else:
self.logger.debug("Setting %s '%s' attribute '%s' to %s"
% (obj.__class__.__name__, obj.name, key, value.__class__.__name__))
+ if isinstance(value, np.ndarray) and value.dtype.kind == 'U':
+ value = np.array(value, dtype=H5_TEXT)
obj.attrs[key] = value # a regular scalar
except Exception as e:
msg = "unable to write attribute '%s' on object '%s'" % (key, obj.name)
=====================================
src/hdmf/common/resources.py
=====================================
@@ -212,7 +212,7 @@ class ExternalResources(Container):
@docval({'name': 'obj', 'type': (int, Object), 'doc': 'the Object to that uses the Key'},
{'name': 'key', 'type': (int, Key), 'doc': 'the Key that the Object uses'})
- def _add_external_reference(self, **kwargs):
+ def _add_object_key(self, **kwargs):
"""
Specify that an object (i.e. container and field) uses a key to reference
an external resource
@@ -332,6 +332,7 @@ class ExternalResources(Container):
if not isinstance(key, Key):
key = self._add_key(key)
+ self._add_object_key(object_field, key)
if kwargs['resources_idx'] is not None and kwargs['resource_name'] is None and kwargs['resource_uri'] is None:
resource_table_idx = kwargs['resources_idx']
@@ -358,10 +359,36 @@ class ExternalResources(Container):
if add_entity:
entity = self._add_entity(key, resource_table_idx, entity_id, entity_uri)
- self._add_external_reference(object_field, key)
return key, resource_table_idx, entity
+ @docval({'name': 'container', 'type': (str, AbstractContainer),
+ 'doc': 'the Container/data object that is linked to resources/entities',
+ 'default': None},
+ {'name': 'field', 'type': str,
+ 'doc': 'the field of the Container',
+ 'default': None})
+ def get_object_resources(self, **kwargs):
+ """
+ Get all entities/resources associated with an object
+ """
+ container = kwargs['container']
+ field = kwargs['field']
+
+ keys = []
+ entities = []
+ if container is not None and field is not None:
+ object_field = self._check_object_field(container, field)
+ # Find all keys associated with the object
+ for row_idx in self.object_keys.which(objects_idx=object_field.idx):
+ keys.append(self.object_keys['keys_idx', row_idx])
+ # Find all the entities/resources for each key.
+ for key_idx in keys:
+ entity_idx = self.entities.which(keys_idx=key_idx)
+ entities.append(self.entities.__getitem__(entity_idx[0]))
+ df = pd.DataFrame(entities, columns=['keys_idx', 'resource_idx', 'entity_id', 'entity_uri'])
+ return df
+
@docval({'name': 'keys', 'type': (list, Key), 'default': None,
'doc': 'the Key(s) to get external resource data for'},
rtype=pd.DataFrame, returns='a DataFrame with keys and external resource data')
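
A hedged usage sketch of the new get_object_resources() query; the table, key, and entity values below are illustrative, and the add_ref() call assumes the existing ExternalResources API:

    from hdmf.common import DynamicTable, VectorData
    from hdmf.common.resources import ExternalResources

    er = ExternalResources(name='example_resources')
    genes = DynamicTable(name='genes', description='example table',
                         columns=[VectorData(name='symbol', description='gene symbols', data=['grik1'])])

    # Link the 'symbol' field of the table to an external entity.
    er.add_ref(container=genes, field='symbol', key='grik1',
               resource_name='NCBI Gene', resource_uri='https://www.ncbi.nlm.nih.gov/gene',
               entity_id='2897', entity_uri='https://www.ncbi.nlm.nih.gov/gene/2897')

    # New in 3.0: every entity/resource attached to that object/field, as a DataFrame
    # with columns keys_idx, resource_idx, entity_id, entity_uri.
    print(er.get_object_resources(container=genes, field='symbol'))
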
=====================================
src/hdmf/common/table.py
=====================================
@@ -9,7 +9,6 @@ from warnings import warn
import numpy as np
import pandas as pd
-from h5py import Dataset
from . import register_class, EXP_NAMESPACE
from ..container import Container, Data
@@ -157,7 +156,7 @@ class VectorIndex(VectorData):
def __getitem__(self, arg):
"""
- Select elements in this VectorIndex and retrieve the corrsponding data from the self.target VectorData
+ Select elements in this VectorIndex and retrieve the corresponding data from the self.target VectorData
:param arg: slice or integer index indicating the elements we want to select in this VectorIndex
:return: Scalar or list of values retrieved
@@ -791,25 +790,39 @@ class DynamicTable(Container):
raise KeyError(key)
return ret
- def get(self, key, default=None, df=True, **kwargs): # noqa: C901
- """
- Select a subset from the table
+ def get(self, key, default=None, df=True, index=True, **kwargs):
+ """Select a subset from the table.
+
+ If the table includes a DynamicTableRegion column, then by default,
+ the index/indices of the DynamicTableRegion will be returned. If ``df=True`` and ``index=False``,
+ then the returned pandas DataFrame will contain a nested DataFrame in each row of the
+ DynamicTableRegion column. If ``df=False`` and ``index=True``, then a list of lists will be returned
+ where the list containing the DynamicTableRegion column contains the indices of the DynamicTableRegion.
+ Note that in this case, the DynamicTable referenced by the DynamicTableRegion can be accessed through
+ the ``table`` attribute of the DynamicTableRegion object. ``df=False`` and ``index=False`` is
+ not yet supported.
:param key: Key defining which elements of the table to select. This may be one of the following:
1) string with the name of the column to select
2) a tuple consisting of (str, int) where the string identifies the column to select by name
and the int selects the row
- 3) int, list of ints, or slice selecting a set of full rows in the table
+ 3) int, list of ints, array, or slice selecting a set of full rows in the table. If an int is used, then
+ scalars are returned for each column that has a single value. If a list, array, or slice is used and
+ df=False, then lists are returned for each column, even if the list, array, or slice resolves to a
+ single row.
:return: 1) If key is a string, then return array with the data of the selected column
2) If key is a tuple of (int, str), then return the scalar value of the selected cell
- 3) If key is an int, list, np.ndarray, or slice, then return pandas.DataFrame consisting of one or
- more rows
+ 3) If key is an int, list, np.ndarray, or slice, then return pandas.DataFrame or lists
+ consisting of one or more rows
:raises: KeyError
"""
ret = None
+ if not df and not index:
+ # returning nested lists of lists for DTRs and ragged DTRs is complicated and not yet supported
+ raise ValueError('DynamicTable.get() with df=False and index=False is not yet supported.')
if isinstance(key, tuple):
# index by row and column --> return specific cell
arg1 = key[0]
@@ -828,104 +841,143 @@ class DynamicTable(Container):
else:
return default
else:
- # index by int, list, np.ndarray, or slice --> return pandas Dataframe consisting of one or more rows
- arg = key
- ret = OrderedDict()
- try:
- # index with a python slice or single int to select one or multiple rows
- if not (np.issubdtype(type(arg), np.integer) or isinstance(arg, (slice, list, np.ndarray))):
- raise KeyError("Key type not supported by DynamicTable %s" % str(type(arg)))
- if isinstance(arg, np.ndarray) and len(arg.shape) != 1:
- raise ValueError("cannot index DynamicTable with multiple dimensions")
- ret['id'] = self.id[arg]
- for name in self.colnames:
- col = self.__df_cols[self.__colids[name]]
- ret[name] = col.get(arg, df=df, **kwargs)
- except ValueError as ve:
- x = re.match(r"^Index \((.*)\) out of range \(.*\)$", str(ve))
- if x:
- msg = ("Row index %s out of range for %s '%s' (length %d)."
- % (x.groups()[0], self.__class__.__name__, self.name, len(self)))
- raise IndexError(msg)
- else: # pragma: no cover
- raise ve
- except IndexError as ie:
- if str(ie) == 'list index out of range':
- msg = ("Row index out of range for %s '%s' (length %d)."
- % (self.__class__.__name__, self.name, len(self)))
- raise IndexError(msg)
- else: # pragma: no cover
- raise ie
+ # index by int, list, np.ndarray, or slice -->
+ # return pandas Dataframe or lists consisting of one or more rows
+ sel = self.__get_selection_as_dict(key, df, index, **kwargs)
if df:
# reformat objects to fit into a pandas DataFrame
- id_index = ret.pop('id')
- if np.isscalar(id_index):
- id_index = [id_index]
- retdf = OrderedDict()
- for k in ret: # for each column
- if isinstance(ret[k], np.ndarray):
- if ret[k].ndim == 1:
- if len(id_index) == 1:
- # k is a multi-dimension column, and
- # only one element has been selected
- retdf[k] = [ret[k]]
- else:
- retdf[k] = ret[k]
- else:
- if len(id_index) == ret[k].shape[0]:
- # k is a multi-dimension column, and
- # more than one element has been selected
- retdf[k] = list(ret[k])
- else:
- raise ValueError('unable to convert selection to DataFrame')
- elif isinstance(ret[k], (list, tuple)):
- if len(id_index) == 1:
- # k is a multi-dimension column, and
- # only one element has been selected
- retdf[k] = [ret[k]]
- else:
- retdf[k] = ret[k]
- elif isinstance(ret[k], pd.DataFrame):
- retdf['%s_%s' % (k, ret[k].index.name)] = ret[k].index.values
- for col in ret[k].columns:
- newcolname = "%s_%s" % (k, col)
- retdf[newcolname] = ret[k][col].values
- else:
- retdf[k] = ret[k]
- ret = pd.DataFrame(retdf, index=pd.Index(name=self.id.name, data=id_index))
- # if isinstance(key, (int, np.integer)):
- # ret = ret.iloc[0]
+ if np.isscalar(key):
+ ret = self.__get_selection_as_df_single_row(sel)
+ else:
+ ret = self.__get_selection_as_df(sel)
else:
- ret = list(ret.values())
+ ret = list(sel.values())
return ret
+ def __get_selection_as_dict(self, arg, df, index, exclude=None, **kwargs):
+ """Return a dict mapping column names to values (lists/arrays or dataframes) for the given selection.
+ Uses each column's get() method, passing kwargs as necessary.
+
+ :param arg: key passed to get() to return one or more rows
+ :type arg: int, list, np.ndarray, or slice
+ """
+ if not (np.issubdtype(type(arg), np.integer) or isinstance(arg, (slice, list, np.ndarray))):
+ raise KeyError("Key type not supported by DynamicTable %s" % str(type(arg)))
+ if isinstance(arg, np.ndarray) and arg.ndim != 1:
+ raise ValueError("Cannot index DynamicTable with multiple dimensions")
+ if exclude is None:
+ exclude = set([])
+ ret = OrderedDict()
+ try:
+ # index with a python slice or single int to select one or multiple rows
+ ret['id'] = self.id[arg]
+ for name in self.colnames:
+ if name in exclude:
+ continue
+ col = self.__df_cols[self.__colids[name]]
+ if index and (isinstance(col, DynamicTableRegion) or
+ (isinstance(col, VectorIndex) and isinstance(col.target, DynamicTableRegion))):
+ # return indices (in list, array, etc.) for DTR and ragged DTR
+ ret[name] = col.get(arg, df=False, index=True, **kwargs)
+ else:
+ ret[name] = col.get(arg, df=df, index=index, **kwargs)
+ return ret
+ # if index is out of range, different errors can be generated depending on the dtype of the column
+ # but despite the differences, raise an IndexError from that error
+ except ValueError as ve:
+ # in h5py <2, if the column is an h5py.Dataset, a ValueError was raised
+ # in h5py 3+, this became an IndexError
+ x = re.match(r"^Index \((.*)\) out of range \(.*\)$", str(ve))
+ if x:
+ msg = ("Row index %s out of range for %s '%s' (length %d)."
+ % (x.groups()[0], self.__class__.__name__, self.name, len(self)))
+ raise IndexError(msg) from ve
+ else: # pragma: no cover
+ raise ve
+ except IndexError as ie:
+ x = re.match(r"^Index \((.*)\) out of range for \(.*\)$", str(ie))
+ if x:
+ msg = ("Row index %s out of range for %s '%s' (length %d)."
+ % (x.groups()[0], self.__class__.__name__, self.name, len(self)))
+ raise IndexError(msg)
+ elif str(ie) == 'list index out of range':
+ msg = ("Row index out of range for %s '%s' (length %d)."
+ % (self.__class__.__name__, self.name, len(self)))
+ raise IndexError(msg) from ie
+ else: # pragma: no cover
+ raise ie
+
+ def __get_selection_as_df_single_row(self, coldata):
+ """Return a pandas dataframe for the given row and columns with the id column as the index.
+
+ This is a special case of __get_selection_as_df where a single row was requested.
+
+ :param coldata: dict mapping column names to values (list/arrays or dataframes)
+ :type coldata: dict
+ """
+ id_index_orig = coldata.pop('id')
+ id_index = [id_index_orig]
+ df_input = OrderedDict()
+ for k in coldata: # for each column
+ if isinstance(coldata[k], (np.ndarray, list, tuple, pd.DataFrame)):
+ # wrap in a list because coldata[k] may be an array/list/tuple with multiple elements (ragged or
+ # multi-dim column) and pandas needs to have one element per index row (=1 in this case)
+ df_input[k] = [coldata[k]]
+ else: # scalar, don't wrap
+ df_input[k] = coldata[k]
+ ret = pd.DataFrame(df_input, index=pd.Index(name=self.id.name, data=id_index))
+ return ret
+
+ def __get_selection_as_df(self, coldata):
+ """Return a pandas dataframe for the given rows and columns with the id column as the index.
+
+ This is used when multiple row indices are selected (or a list/array/slice of a single index is passed to get).
+ __get_selection_as_df_single_row should be used if a single index is passed to get.
+
+ :param coldata: dict mapping column names to values (list/arrays or dataframes)
+ :type coldata: dict
+ """
+ id_index = coldata.pop('id')
+ df_input = OrderedDict()
+ for k in coldata: # for each column
+ if isinstance(coldata[k], np.ndarray) and coldata[k].ndim > 1:
+ df_input[k] = list(coldata[k]) # convert multi-dim array to list of inner arrays
+ elif isinstance(coldata[k], pd.DataFrame):
+ # multiple rows were selected and collapsed into a dataframe
+ # split up the rows of the df into a list of dataframes, one per row
+ # TODO make this more efficient
+ df_input[k] = [coldata[k].iloc[[i]] for i in range(len(coldata[k]))]
+ else:
+ df_input[k] = coldata[k]
+ ret = pd.DataFrame(df_input, index=pd.Index(name=self.id.name, data=id_index))
+ return ret
+
def __contains__(self, val):
"""
Check if the given value (i.e., column) exists in this table
"""
return val in self.__colids or val in self.__indices
- @docval({'name': 'exclude', 'type': set, 'doc': ' Set of columns to exclude from the dataframe', 'default': None})
+ @docval({'name': 'exclude', 'type': set, 'doc': 'Set of column names to exclude from the dataframe',
+ 'default': None},
+ {'name': 'index', 'type': bool,
+ 'doc': ('Whether to return indices for a DynamicTableRegion column. If False, nested dataframes will be '
+ 'returned.'),
+ 'default': False}
+ )
def to_dataframe(self, **kwargs):
"""
Produce a pandas DataFrame containing this table's data.
- """
- exclude = popargs('exclude', kwargs)
- if exclude is None:
- exclude = set([])
- data = OrderedDict()
- for name in self.colnames:
- if name in exclude:
- continue
- col = self.__df_cols[self.__colids[name]]
-
- if isinstance(col.data, (Dataset, np.ndarray)) and col.data.ndim > 1:
- data[name] = [x for x in col[:]]
- else:
- data[name] = col[:]
- return pd.DataFrame(data, index=pd.Index(name=self.id.name, data=self.id.data))
+ If this table contains a DynamicTableRegion column, by default the referenced rows are returned as nested DataFrames; pass ``index=True`` to return the region's row indices instead.
+
+ If exclude is None, this is equivalent to table.get(slice(None, None, None), index=False).
+ """
+ arg = slice(None, None, None) # select all rows
+ sel = self.__get_selection_as_dict(arg, df=True, **kwargs)
+ ret = self.__get_selection_as_df(sel)
+ return ret
@classmethod
@docval(
@@ -1077,11 +1129,15 @@ class DynamicTableRegion(VectorData):
:param arg: 1) tuple consisting of (str, int) where the string defines the column to select
and the int selects the row, 2) int or slice to select a subset of rows
- :param df: Boolean indicating whether we want to return the result as a pandas dataframe
+ :param index: Boolean indicating whether to return indices of the DTR (default False)
+ :param df: Boolean indicating whether to return the result as a pandas DataFrame (default True)
- :return: Result from self.table[....] with the appropritate selection based on the
+ :return: Result from self.table[....] with the appropriate selection based on the
rows selected by this DynamicTableRegion
"""
+ if not df and not index:
+ # returning nested lists of lists for DTRs and ragged DTRs is complicated and not yet supported
+ raise ValueError('DynamicTableRegion.get() with df=False and index=False is not yet supported.')
# treat the list of indices as data that can be indexed. then pass the
# result to the table to get the data
if isinstance(arg, tuple):
@@ -1093,13 +1149,13 @@ class DynamicTableRegion(VectorData):
raise IndexError('index {} out of bounds for data of length {}'.format(arg, len(self.data)))
ret = self.data[arg]
if not index:
- ret = self.table.get(ret, df=df, **kwargs)
+ ret = self.table.get(ret, df=df, index=index, **kwargs)
return ret
elif isinstance(arg, (list, slice, np.ndarray)):
idx = arg
# get the data at the specified indices
- if isinstance(self.data, (tuple, list)) and isinstance(idx, list):
+ if isinstance(self.data, (tuple, list)) and isinstance(idx, (list, np.ndarray)):
ret = [self.data[i] for i in idx]
else:
ret = self.data[idx]
@@ -1116,7 +1172,7 @@ class DynamicTableRegion(VectorData):
# of the list we are returning. This is carried out by the recursive method _index_lol
uniq = np.unique(ret)
lut = {val: i for i, val in enumerate(uniq)}
- values = self.table.get(uniq, df=df, **kwargs)
+ values = self.table.get(uniq, df=df, index=index, **kwargs)
if df:
ret = values.iloc[[lut[i] for i in ret]]
else:
@@ -1136,8 +1192,10 @@ class DynamicTableRegion(VectorData):
for col in result:
if isinstance(col, list):
if isinstance(col[0], list):
+ # list of columns that need to be sorted
ret.append(self._index_lol(col, index, lut))
else:
+ # list of elements, one for each row to return
ret.append([col[lut[i]] for i in index])
elif isinstance(col, np.ndarray):
ret.append(np.array([col[lut[i]] for i in index], dtype=col.dtype))
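
A short sketch of how the reworked get() behaves for a table with a DynamicTableRegion column (table contents are illustrative):

    from hdmf.common import DynamicTable, DynamicTableRegion, VectorData

    species = DynamicTable(name='species', description='referenced table',
                           columns=[VectorData(name='name', description='species name',
                                               data=['mouse', 'rat'])])
    region = DynamicTableRegion(name='species_ref', description='row in species',
                                data=[0, 1, 0], table=species)
    obs = DynamicTable(name='observations', description='referencing table', columns=[region])

    obs.get([0, 2])               # df=True, index=True (defaults): DTR column holds row indices
    obs.get([0, 2], index=False)  # DTR column holds nested DataFrames of the referenced rows
    obs.get(1)                    # single int: one-row DataFrame with scalar values per column
    # obs.get([0, 2], df=False, index=False) raises ValueError (not yet supported)
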
=====================================
src/hdmf/testing/testcase.py
=====================================
@@ -101,6 +101,10 @@ class TestCase(unittest.TestCase):
if isinstance(arr1, (float, np.floating)):
np.testing.assert_allclose(arr1, arr2)
else:
+ if isinstance(arr1, bytes):
+ arr1 = arr1.decode('utf-8')
+ if isinstance(arr2, bytes):
+ arr2 = arr2.decode('utf-8')
self.assertEqual(arr1, arr2) # scalar
else:
self.assertEqual(len(arr1), len(arr2))
@@ -109,7 +113,10 @@ class TestCase(unittest.TestCase):
if isinstance(arr2, np.ndarray) and len(arr2.dtype) > 1: # compound type
arr2 = arr2.tolist()
if isinstance(arr1, np.ndarray) and isinstance(arr2, np.ndarray):
- np.testing.assert_allclose(arr1, arr2)
+ if np.issubdtype(arr1.dtype, np.number):
+ np.testing.assert_allclose(arr1, arr2)
+ else:
+ np.testing.assert_array_equal(arr1, arr2)
else:
for sub1, sub2 in zip(arr1, arr2):
if isinstance(sub1, Container):
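
The switch to assert_array_equal matters because assert_allclose only handles numeric dtypes, while string arrays (common once data round-trips through h5py 3) need an exact comparison. A small illustration:

    import numpy as np

    a = np.array(['x', 'y'])
    np.testing.assert_array_equal(a, a)   # works for string and other non-numeric arrays
    # np.testing.assert_allclose(a, a)    # would raise a TypeError for non-numeric dtypes
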
=====================================
src/hdmf/utils.py
=====================================
@@ -8,6 +8,7 @@ from enum import Enum
import h5py
import numpy as np
+
__macros = {
'array_data': [np.ndarray, list, tuple, h5py.Dataset],
'scalar_data': [str, int, float, bytes, bool],
@@ -999,3 +1000,38 @@ class LabelledDict(dict):
def update(self, other):
"""update is not supported. A TypeError will be raised."""
raise TypeError('update is not supported for %s' % self.__class__.__name__)
+
+
+@docval_macro('array_data')
+class StrDataset(h5py.Dataset):
+ """Wrapper to decode strings on reading the dataset"""
+ def __init__(self, dset, encoding, errors='strict'):
+ self.dset = dset
+ if encoding is None:
+ encoding = h5py.h5t.check_string_dtype(dset.dtype).encoding
+ self.encoding = encoding
+ self.errors = errors
+
+ def __getattr__(self, name):
+ return getattr(self.dset, name)
+
+ def __repr__(self):
+ return '<StrDataset for %s>' % repr(self.dset)[1:-1]
+
+ def __len__(self):
+ return len(self.dset)
+
+ def __getitem__(self, args):
+ bytes_arr = self.dset[args]
+ # numpy.char.decode() seems like the obvious thing to use. But it only
+ # accepts numpy string arrays, not object arrays of bytes (which we
+ # return from HDF5 variable-length strings). And the numpy
+ # implementation is not faster than doing it with a loop; in fact, by
+ # not converting the result to a numpy unicode array, the
+ # naive way can be faster! (Comparing with numpy 1.18.4, June 2020)
+ if np.isscalar(bytes_arr):
+ return bytes_arr.decode(self.encoding, self.errors)
+
+ return np.array([
+ b.decode(self.encoding, self.errors) for b in bytes_arr.flat
+ ], dtype=object).reshape(bytes_arr.shape)
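A minimal usage sketch of the new wrapper, assuming h5py >= 3 (where variable-length string datasets are read back as bytes); the file name is hypothetical:

    import h5py
    from hdmf.utils import StrDataset

    with h5py.File('example.h5', 'w') as f:  # hypothetical file name
        dset = f.create_dataset('words', data=['a', 'bb', 'ccc'], dtype=h5py.string_dtype())
        wrapped = StrDataset(dset, encoding=None)  # encoding inferred from the dataset's dtype
        wrapped[0]   # 'a' (str), whereas dset[0] is b'a' under h5py >= 3
        wrapped[:]   # object array of str with the same shape as the raw read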
=====================================
src/hdmf/validate/errors.py
=====================================
@@ -25,10 +25,6 @@ class Error:
self.__name = getargs('name', kwargs)
self.__reason = getargs('reason', kwargs)
self.__location = getargs('location', kwargs)
- if self.__location is not None:
- self.__str = "%s (%s): %s" % (self.__name, self.__location, self.__reason)
- else:
- self.__str = "%s: %s" % (self.name, self.reason)
@property
def name(self):
@@ -45,14 +41,49 @@ class Error:
@location.setter
def location(self, loc):
self.__location = loc
- self.__str = "%s (%s): %s" % (self.__name, self.__location, self.__reason)
def __str__(self):
- return self.__str
+ return self.__format_str(self.name, self.location, self.reason)
+
+ @staticmethod
+ def __format_str(name, location, reason):
+ if location is not None:
+ return "%s (%s): %s" % (name, location, reason)
+ else:
+ return "%s: %s" % (name, reason)
def __repr__(self):
return self.__str__()
+ def __hash__(self):
+ """Returns the hash value of this Error
+
+ Note: if the location property is set after creation, the hash value will
+ change. Therefore, it is important to finalize the value of location
+ before getting the hash value.
+ """
+ return hash(self.__equatable_str())
+
+ def __equatable_str(self):
+ """A string representation of the error which can be used to check for equality
+
+ For a single error, name can end up being different depending on whether it is
+ generated from a base data type spec or from an inner type definition. These errors
+ should still be considered equal because they are caused by the same problem.
+
+ When a location is provided, we only consider the name of the field and drop the
+ rest of the spec name. However, when a location is not available, then we need to
+ use the fully-provided name.
+ """
+ if self.location is not None:
+ equatable_name = self.name.split('/')[-1]
+ else:
+ equatable_name = self.name
+ return self.__format_str(equatable_name, self.location, self.reason)
+
+ def __eq__(self, other):
+ return hash(self) == hash(other)
+
class DtypeError(Error):
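A small sketch of what this equality change enables (names here are illustrative): two errors reported for the same field at the same location compare equal even if one name carries a spec prefix, which is what lets the duplicate removal added to GroupValidator below drop them while preserving order:

    from collections import OrderedDict
    from hdmf.validate.errors import Error

    e1 = Error('foo', 'argument missing', 'a.b.c')      # name from the inner spec
    e2 = Error('Bar/foo', 'argument missing', 'a.b.c')  # same field, reported via the base type spec
    assert e1 == e2 and hash(e1) == hash(e2)

    # same dedup strategy as GroupValidator._remove_duplicates: first occurrence wins, order kept
    unique = list(OrderedDict((err, err) for err in [e1, e2]))
    assert unique == [e1]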
=====================================
src/hdmf/validate/validator.py
=====================================
@@ -2,7 +2,7 @@ import re
from abc import ABCMeta, abstractmethod
from copy import copy
from itertools import chain
-from collections import defaultdict
+from collections import defaultdict, OrderedDict
import numpy as np
@@ -14,6 +14,8 @@ from ..spec import Spec, AttributeSpec, GroupSpec, DatasetSpec, RefSpec, LinkSpe
from ..spec import SpecNamespace
from ..spec.spec import BaseStorageSpec, DtypeHelper
from ..utils import docval, getargs, call_docval_func, pystr, get_data_shape
+from ..query import ReferenceResolver
+
__synonyms = DtypeHelper.primary_dtype_synonyms
@@ -107,6 +109,8 @@ def get_type(data):
return 'region'
elif isinstance(data, ReferenceBuilder):
return 'object'
+ elif isinstance(data, ReferenceResolver):
+ return data.dtype
elif isinstance(data, np.ndarray):
if data.size == 0:
raise EmptyArrayError()
@@ -415,10 +419,9 @@ class GroupValidator(BaseStorageValidator):
returns='a list of Errors', rtype=list)
def validate(self, **kwargs): # noqa: C901
builder = getargs('builder', kwargs)
-
errors = super().validate(builder)
errors.extend(self.__validate_children(builder))
- return errors
+ return self._remove_duplicates(errors)
def __validate_children(self, parent_builder):
"""Validates the children of the group builder against the children in the spec.
@@ -491,8 +494,8 @@ class GroupValidator(BaseStorageValidator):
yield self.__construct_illegal_link_error(child_spec, parent_builder)
return # do not validate illegally linked objects
child_builder = child_builder.builder
- child_validator = self.__get_child_validator(child_spec)
- yield from child_validator.validate(child_builder)
+ for child_validator in self.__get_child_validators(child_spec):
+ yield from child_validator.validate(child_builder)
def __construct_illegal_link_error(self, child_spec, parent_builder):
name_of_erroneous = self.get_spec_loc(child_spec)
@@ -503,22 +506,48 @@ class GroupValidator(BaseStorageValidator):
def __cannot_be_link(spec):
return not isinstance(spec, LinkSpec) and not spec.linkable
- def __get_child_validator(self, spec):
- """Returns the appropriate validator for a child spec
-
- If a specific data type can be resolved, the validator is acquired from
- the ValidatorMap, otherwise a new Validator is created.
+ def __get_child_validators(self, spec):
+ """Returns the appropriate list of validators for a child spec
+
+ Because child specs can both inherit a data type via data_type_inc and also
+ modify that type without defining a new data type via data_type_def, we need
+ to validate against both the spec for the base data type and the spec at the
+ current level of the type hierarchy in case there have been any modifications.
+
+ If a specific data type can be resolved, a validator for that type is acquired
+ from the ValidatorMap and included in the returned validators. If the spec is
+ a GroupSpec or a DatasetSpec, then a new Validator is created and also
+ returned. If the spec is a LinkSpec, no additional Validator is returned
+ because the LinkSpec cannot add or modify fields and the target_type will be
+ validated by the Validator returned from the ValidatorMap.
"""
if _resolve_data_type(spec) is not None:
- return self.vmap.get_validator(_resolve_data_type(spec))
- elif isinstance(spec, GroupSpec):
- return GroupValidator(spec, self.vmap)
+ yield self.vmap.get_validator(_resolve_data_type(spec))
+
+ if isinstance(spec, GroupSpec):
+ yield GroupValidator(spec, self.vmap)
elif isinstance(spec, DatasetSpec):
- return DatasetValidator(spec, self.vmap)
+ yield DatasetValidator(spec, self.vmap)
+ elif isinstance(spec, LinkSpec):
+ return
else:
msg = "Unable to resolve a validator for spec %s" % spec
raise ValueError(msg)
+ @staticmethod
+ def _remove_duplicates(errors):
+ """Return a list of validation errors where duplicates have been removed
+
+ In some cases a child of a group needs to be validated against two specs which can
+ redundantly define the same fields/children. If the builder doesn't match the
+ spec, it is possible for duplicate errors to be generated.
+ """
+ ordered_errors = OrderedDict()
+ for error in errors:
+ ordered_errors[error] = error
+ return list(ordered_errors)
+
class SpecMatches:
"""A utility class to hold a spec and the builders matched to it"""
=====================================
tests/unit/common/test_resources.py
=====================================
@@ -3,31 +3,25 @@ import pandas as pd
from hdmf.common.resources import ExternalResources, Key, Resource
from hdmf import Data
from hdmf.testing import TestCase, H5RoundTripMixin
+import numpy as np
+import unittest
class TestExternalResources(H5RoundTripMixin, TestCase):
def setUpContainer(self):
er = ExternalResources('terms')
- key1 = er._add_key('key1')
- key2 = er._add_key('key1')
- resource1 = er._add_resource(resource='resource0', uri='resource_uri0')
er.add_ref(
- container='uuid1', field='field1', key=key1,
+ container='uuid1', field='field1', key='key1',
resource_name='resource11', resource_uri='resource_uri11',
entity_id="id11", entity_uri='url11')
er.add_ref(
- container='uuid2', field='field2', key=key2,
+ container='uuid2', field='field2', key='key2',
resource_name='resource21', resource_uri='resource_uri21', entity_id="id12", entity_uri='url21')
- er.add_ref(
- container='uuid3', field='field1', key='key1',
- resource_name='resource12', resource_uri='resource_uri12', entity_id="id13", entity_uri='url12')
- er.add_ref(
- container='uuid4', field='field2', key=key2, resources_idx=resource1,
- entity_id="id14", entity_uri='url23')
return er
+ @unittest.skip('Outdated due to privatization')
def test_piecewise_add(self):
er = ExternalResources('terms')
@@ -43,7 +37,7 @@ class TestExternalResources(H5RoundTripMixin, TestCase):
obj = er._add_object('object', 'species')
# This could also be wrapped up under NWBFile
- er._add_external_reference(obj, key)
+ er._add_object_key(obj, key)
self.assertEqual(er.keys.data, [('mouse',)])
self.assertEqual(er.entities.data,
@@ -53,20 +47,21 @@ class TestExternalResources(H5RoundTripMixin, TestCase):
def test_add_ref(self):
er = ExternalResources('terms')
data = Data(name="species", data=['Homo sapiens', 'Mus musculus'])
- resource1 = er._add_resource(resource='resource0', uri='resource_uri0')
er.add_ref(
container=data, field='', key='key1',
- resources_idx=resource1, entity_id='entity_id1', entity_uri='entity1')
+ resource_name='resource1', resource_uri='uri1',
+ entity_id='entity_id1', entity_uri='entity1')
self.assertEqual(er.keys.data, [('key1',)])
+ self.assertEqual(er.resources.data, [('resource1', 'uri1')])
self.assertEqual(er.entities.data, [(0, 0, 'entity_id1', 'entity1')])
self.assertEqual(er.objects.data, [(data.object_id, '')])
def test_add_ref_duplicate_resource(self):
er = ExternalResources('terms')
- resource1 = er._add_resource(resource='resource0', uri='resource_uri0')
er.add_ref(
container='uuid1', field='field1', key='key1',
- resources_idx=resource1, entity_id='entity_id1', entity_uri='entity1')
+ resource_name='resource0', resource_uri='uri0',
+ entity_id='entity_id1', entity_uri='entity1')
resource_list = er.resources.which(resource='resource0')
self.assertEqual(len(resource_list), 1)
@@ -153,13 +148,11 @@ class TestExternalResources(H5RoundTripMixin, TestCase):
def test_add_ref_same_keyname(self):
er = ExternalResources('terms')
- key1 = er._add_key('key1')
- key2 = er._add_key('key1')
er.add_ref(
- container='uuid1', field='field1', key=key1, resource_name='resource1',
+ container='uuid1', field='field1', key='key1', resource_name='resource1',
resource_uri='resource_uri1', entity_id="id11", entity_uri='url11')
er.add_ref(
- container='uuid2', field='field2', key=key2, resource_name='resource2',
+ container='uuid2', field='field2', key='key1', resource_name='resource2',
resource_uri='resource_uri2', entity_id="id12", entity_uri='url21')
er.add_ref(
container='uuid3', field='field3', key='key1', resource_name='resource3',
@@ -220,6 +213,38 @@ class TestExternalResources(H5RoundTripMixin, TestCase):
columns=['key_name', 'resources_idx', 'entity_id', 'entity_uri'])
pd.testing.assert_frame_equal(received, expected)
+ def test_get_object_resources(self):
+ er = ExternalResources('terms')
+ data = Data(name='data_name', data=np.array([('Mus musculus', 9, 81.0), ('Homo sapien', 3, 27.0)],
+ dtype=[('species', 'U14'), ('age', 'i4'), ('weight', 'f4')]))
+
+ er.add_ref(container=data, field='data/species', key='Mus musculus', resource_name='NCBI_Taxonomy',
+ resource_uri='https://www.ncbi.nlm.nih.gov/taxonomy',
+ entity_id='NCBI:txid10090',
+ entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090')
+ received = er.get_object_resources(data, 'data/species')
+ expected = pd.DataFrame(
+ data=[[0, 0, 'NCBI:txid10090', 'https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090']],
+ columns=['keys_idx', 'resource_idx', 'entity_id', 'entity_uri'])
+ pd.testing.assert_frame_equal(received, expected)
+
+ def test_object_key_uniqueness(self):
+ er = ExternalResources('terms')
+ data = Data(name='data_name', data=np.array([('Mus musculus', 9, 81.0), ('Homo sapien', 3, 27.0)],
+ dtype=[('species', 'U14'), ('age', 'i4'), ('weight', 'f4')]))
+
+ er.add_ref(container=data, field='data/species', key='Mus musculus', resource_name='NCBI_Taxonomy',
+ resource_uri='https://www.ncbi.nlm.nih.gov/taxonomy',
+ entity_id='NCBI:txid10090',
+ entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090')
+ existing_key = er.get_key('Mus musculus')
+ er.add_ref(container=data, field='data/species', key=existing_key, resource_name='resource2',
+ resource_uri='resource_uri2',
+ entity_id='entity2',
+ entity_uri='entity_uri2')
+
+ self.assertEqual(er.object_keys.data, [(0, 0)])
+
def test_check_object_field_add(self):
er = ExternalResources('terms')
data = Data(name="species", data=['Homo sapiens', 'Mus musculus'])
@@ -254,6 +279,7 @@ class TestExternalResourcesGetKey(TestCase):
with self.assertRaises(ValueError):
self.er.get_key('key2', 'uuid1', 'field1')
+ @unittest.skip('Outdated due to privatization')
def test_get_key_without_container(self):
self.er = ExternalResources('terms')
self.er._add_key('key1')
@@ -292,6 +318,7 @@ class TestExternalResourcesGetKey(TestCase):
with self.assertRaisesRegex(ValueError, "key 'bad_key' does not exist"):
self.er.get_key('bad_key')
+ @unittest.skip('Outdated due to privatization')
def test_get_key_same_keyname_all(self):
self.er = ExternalResources('terms')
key1 = self.er._add_key('key1')
@@ -308,19 +335,18 @@ class TestExternalResourcesGetKey(TestCase):
keys = self.er.get_key('key1')
- self.assertIsInstance(keys, list)
+ self.assertIsInstance(keys, Key)
self.assertEqual(keys[0].key, 'key1')
self.assertEqual(keys[1].key, 'key1')
def test_get_key_same_keyname_specific(self):
self.er = ExternalResources('terms')
- key1 = self.er._add_key('key1')
- key2 = self.er._add_key('key1')
+
self.er.add_ref(
- 'uuid1', 'field1', key1, resource_name='resource1',
+ 'uuid1', 'field1', 'key1', resource_name='resource1',
resource_uri='resource_uri1', entity_id="id11", entity_uri='url11')
self.er.add_ref(
- 'uuid2', 'field2', key2, resource_name='resource2',
+ 'uuid2', 'field2', 'key2', resource_name='resource2',
resource_uri='resource_uri2', entity_id="id12", entity_uri='url12')
self.er.add_ref(
'uuid1', 'field1', self.er.get_key('key1', 'uuid1', 'field1'), resource_name='resource3',
@@ -329,4 +355,4 @@ class TestExternalResourcesGetKey(TestCase):
keys = self.er.get_key('key1', 'uuid1', 'field1')
self.assertIsInstance(keys, Key)
self.assertEqual(keys.key, 'key1')
- self.assertEqual(self.er.keys.data, [('key1',), ('key1',)])
+ self.assertEqual(self.er.keys.data, [('key1',), ('key2',)])
=====================================
tests/unit/common/test_table.py
=====================================
@@ -1,15 +1,19 @@
from collections import OrderedDict
import h5py
import numpy as np
+import os
import pandas as pd
import unittest
-
from hdmf import Container
from hdmf.backends.hdf5 import H5DataIO, HDF5IO
+from hdmf.backends.hdf5.h5tools import H5_TEXT, H5PY_3
from hdmf.common import (DynamicTable, VectorData, VectorIndex, ElementIdentifiers, EnumData,
DynamicTableRegion, get_manager, SimpleMultiContainer)
from hdmf.testing import TestCase, H5RoundTripMixin, remove_test_file
+from hdmf.utils import StrDataset
+
+from tests.unit.utils import get_temp_filepath
class TestDynamicTable(TestCase):
@@ -565,6 +569,12 @@ Fields:
with self.assertRaisesWith(IndexError, msg):
table[5]
+ def test_no_df_nested(self):
+ table = self.with_columns_and_data()
+ msg = 'DynamicTable.get() with df=False and index=False is not yet supported.'
+ with self.assertRaisesWith(ValueError, msg):
+ table.get(0, df=False, index=False)
+
def test_multidim_col(self):
multidim_data = [
[[1, 2], [3, 4], [5, 6]],
@@ -827,6 +837,13 @@ class TestDynamicTableRegion(TestCase):
expected = expected % (id(dynamic_table_region), id(table))
self.assertEqual(str(dynamic_table_region), expected)
+ def test_no_df_nested(self):
+ table = self.with_columns_and_data()
+ dynamic_table_region = DynamicTableRegion('dtr', [0, 1, 2, 2], 'desc', table=table)
+ msg = 'DynamicTableRegion.get() with df=False and index=False is not yet supported.'
+ with self.assertRaisesWith(ValueError, msg):
+ dynamic_table_region.get(0, df=False, index=False)
+
class DynamicTableRegionRoundTrip(H5RoundTripMixin, TestCase):
@@ -880,6 +897,11 @@ class DynamicTableRegionRoundTrip(H5RoundTripMixin, TestCase):
table = mc.containers['table_with_dtr']
return table.get(arg)
+ def _get_nested(self, arg):
+ mc = self.roundtripContainer()
+ table = mc.containers['table_with_dtr']
+ return table.get(arg, index=False)
+
def _get_nodf(self, arg):
mc = self.roundtripContainer()
table = mc.containers['table_with_dtr']
@@ -900,24 +922,58 @@ class DynamicTableRegionRoundTrip(H5RoundTripMixin, TestCase):
self._getitem('boo')
def _assert_two_elem_df(self, rec):
- columns = ['foo', 'bar', 'baz', 'dtr_id', 'dtr_qux', 'dtr_quz']
- data = [[1, 10.0, 'cat', 0, 'qux_1', 'quz_1'],
- [2, 20.0, 'dog', 1, 'qux_2', 'quz_2']]
+ columns = ['foo', 'bar', 'baz', 'dtr']
+ data = [[1, 10.0, 'cat', 0],
+ [2, 20.0, 'dog', 1]]
exp = pd.DataFrame(data=data, columns=columns, index=pd.Series(name='id', data=[0, 1]))
pd.testing.assert_frame_equal(rec, exp, check_dtype=False)
def _assert_one_elem_df(self, rec):
- columns = ['foo', 'bar', 'baz', 'dtr_id', 'dtr_qux', 'dtr_quz']
- data = [[1, 10.0, 'cat', 0, 'qux_1', 'quz_1']]
+ columns = ['foo', 'bar', 'baz', 'dtr']
+ data = [[1, 10.0, 'cat', 0]]
exp = pd.DataFrame(data=data, columns=columns, index=pd.Series(name='id', data=[0]))
pd.testing.assert_frame_equal(rec, exp, check_dtype=False)
+ def _assert_two_elem_df_nested(self, rec):
+ nested_columns = ['qux', 'quz']
+ nested_data = [['qux_1', 'quz_1'], ['qux_2', 'quz_2']]
+ nested_df = pd.DataFrame(data=nested_data, columns=nested_columns, index=pd.Series(name='id', data=[0, 1]))
+
+ columns = ['foo', 'bar', 'baz']
+ data = [[1, 10.0, 'cat'],
+ [2, 20.0, 'dog']]
+ exp = pd.DataFrame(data=data, columns=columns, index=pd.Series(name='id', data=[0, 1]))
+
+ # remove nested dataframe and test each df separately
+ pd.testing.assert_frame_equal(rec['dtr'][0], nested_df.iloc[[0]])
+ pd.testing.assert_frame_equal(rec['dtr'][1], nested_df.iloc[[1]])
+ del rec['dtr']
+ pd.testing.assert_frame_equal(rec, exp, check_dtype=False)
+
+ def _assert_one_elem_df_nested(self, rec):
+ nested_columns = ['qux', 'quz']
+ nested_data = [['qux_1', 'quz_1'], ['qux_2', 'quz_2']]
+ nested_df = pd.DataFrame(data=nested_data, columns=nested_columns, index=pd.Series(name='id', data=[0, 1]))
+
+ columns = ['foo', 'bar', 'baz']
+ data = [[1, 10.0, 'cat']]
+ exp = pd.DataFrame(data=data, columns=columns, index=pd.Series(name='id', data=[0]))
+
+ # remove nested dataframe and test each df separately
+ pd.testing.assert_frame_equal(rec['dtr'][0], nested_df.iloc[[0]])
+ del rec['dtr']
+ pd.testing.assert_frame_equal(rec, exp, check_dtype=False)
+
#####################
# tests DynamicTableRegion.__getitem__
def test_getitem_int(self):
rec = self._getitem(0)
self._assert_one_elem_df(rec)
+ def test_getitem_list_single(self):
+ rec = self._getitem([0])
+ self._assert_one_elem_df(rec)
+
def test_getitem_list(self):
rec = self._getitem([0, 1])
self._assert_two_elem_df(rec)
@@ -932,6 +988,10 @@ class DynamicTableRegionRoundTrip(H5RoundTripMixin, TestCase):
rec = self._get(0)
self._assert_one_elem_df(rec)
+ def test_get_list_single(self):
+ rec = self._get([0])
+ self._assert_one_elem_df(rec)
+
def test_get_list(self):
rec = self._get([0, 1])
self._assert_two_elem_df(rec)
@@ -940,11 +1000,29 @@ class DynamicTableRegionRoundTrip(H5RoundTripMixin, TestCase):
rec = self._get(slice(0, 2, None))
self._assert_two_elem_df(rec)
+ #####################
+ # tests DynamicTableRegion.get, return a DataFrame with nested DataFrame
+ def test_get_nested_int(self):
+ rec = self._get_nested(0)
+ self._assert_one_elem_df_nested(rec)
+
+ def test_get_nested_list_single(self):
+ rec = self._get_nested([0])
+ self._assert_one_elem_df_nested(rec)
+
+ def test_get_nested_list(self):
+ rec = self._get_nested([0, 1])
+ self._assert_two_elem_df_nested(rec)
+
+ def test_get_nested_slice(self):
+ rec = self._get_nested(slice(0, 2, None))
+ self._assert_two_elem_df_nested(rec)
+
#####################
# tests DynamicTableRegion.get, DO NOT return a DataFrame
def test_get_nodf_int(self):
rec = self._get_nodf(0)
- exp = [0, 1, 10.0, 'cat', [0, 'qux_1', 'quz_1']]
+ exp = [0, 1, 10.0, 'cat', 0]
self.assertListEqual(rec, exp)
def _assert_list_of_ndarray_equal(self, l1, l2):
@@ -958,16 +1036,19 @@ class DynamicTableRegionRoundTrip(H5RoundTripMixin, TestCase):
else:
np.testing.assert_array_equal(a1, a2)
+ def test_get_nodf_list_single(self):
+ rec = self._get_nodf([0])
+ exp = [np.array([0]), np.array([1]), np.array([10.0]), np.array(['cat']), np.array([0])]
+ self._assert_list_of_ndarray_equal(exp, rec)
+
def test_get_nodf_list(self):
rec = self._get_nodf([0, 1])
- exp = [np.array([0, 1]), np.array([1, 2]), np.array([10.0, 20.0]), np.array(['cat', 'dog']),
- [np.array([0, 1]), np.array(['qux_1', 'qux_2']), np.array(['quz_1', 'quz_2'])]]
+ exp = [np.array([0, 1]), np.array([1, 2]), np.array([10.0, 20.0]), np.array(['cat', 'dog']), np.array([0, 1])]
self._assert_list_of_ndarray_equal(exp, rec)
def test_get_nodf_slice(self):
rec = self._get_nodf(slice(0, 2, None))
- exp = [np.array([0, 1]), np.array([1, 2]), np.array([10.0, 20.0]), np.array(['cat', 'dog']),
- [np.array([0, 1]), np.array(['qux_1', 'qux_2']), np.array(['quz_1', 'quz_2'])]]
+ exp = [np.array([0, 1]), np.array([1, 2]), np.array([10.0, 20.0]), np.array(['cat', 'dog']), np.array([0, 1])]
self._assert_list_of_ndarray_equal(exp, rec)
@@ -1366,74 +1447,420 @@ class TestIndexedEnumData(TestCase):
np.testing.assert_array_equal(idx[2], [['c', 'c'], ['c', 'c'], ['c', 'c'], ['c', 'c']])
-class TestIndexing(TestCase):
+class SelectionTestMixin:
def setUp(self):
- dt = DynamicTable(name='slice_test_table', description='a table to test slicing',
- id=[0, 1, 2])
- dt.add_column('foo', 'scalar column', data=np.array([0.0, 1.0, 2.0]))
- dt.add_column('bar', 'ragged column', index=np.array([2, 3, 6]),
- data=np.array(['r11', 'r12', 'r21', 'r31', 'r32', 'r33']))
- dt.add_column('baz', 'multi-dimension column',
- data=np.array([[10.0, 11.0, 12.0],
- [20.0, 21.0, 22.0],
- [30.0, 31.0, 32.0]]))
- self.table = dt
+ # table1 contains a non-ragged DTR and a ragged DTR, both of which point to table2
+ # table2 contains a non-ragged DTR and a ragged DTR, both of which point to table3
+ self.table3 = DynamicTable(
+ name='table3',
+ description='a test table',
+ id=[20, 21, 22]
+ )
+ self.table3.add_column('foo', 'scalar column', data=self._wrap([20.0, 21.0, 22.0]))
+ self.table3.add_column('bar', 'ragged column', index=self._wrap([2, 3, 6]),
+ data=self._wrap(['t11', 't12', 't21', 't31', 't32', 't33']))
+ self.table3.add_column('baz', 'multi-dimension column',
+ data=self._wrap([[210.0, 211.0, 212.0],
+ [220.0, 221.0, 222.0],
+ [230.0, 231.0, 232.0]]))
+ # generate expected dataframe for table3
+ data = OrderedDict()
+ data['foo'] = [20.0, 21.0, 22.0]
+ data['bar'] = [['t11', 't12'], ['t21'], ['t31', 't32', 't33']]
+ data['baz'] = [[210.0, 211.0, 212.0], [220.0, 221.0, 222.0], [230.0, 231.0, 232.0]]
+ idx = [20, 21, 22]
+ self.table3_df = pd.DataFrame(data=data, index=pd.Index(name='id', data=idx))
- def test_single_item(self):
- elem = self.table[0]
+ self.table2 = DynamicTable(
+ name='table2',
+ description='a test table',
+ id=[10, 11, 12]
+ )
+ self.table2.add_column('foo', 'scalar column', data=self._wrap([10.0, 11.0, 12.0]))
+ self.table2.add_column('bar', 'ragged column', index=self._wrap([2, 3, 6]),
+ data=self._wrap(['s11', 's12', 's21', 's31', 's32', 's33']))
+ self.table2.add_column('baz', 'multi-dimension column',
+ data=self._wrap([[110.0, 111.0, 112.0],
+ [120.0, 121.0, 122.0],
+ [130.0, 131.0, 132.0]]))
+ self.table2.add_column('qux', 'DTR column', table=self.table3, data=self._wrap([0, 1, 0]))
+ self.table2.add_column('corge', 'ragged DTR column', index=self._wrap([2, 3, 6]), table=self.table3,
+ data=self._wrap([0, 1, 2, 0, 1, 2]))
+ # TODO test when ragged DTR indices are not in ascending order
+
+ # generate expected dataframe for table2 *without DTR*
+ data = OrderedDict()
+ data['foo'] = [10.0, 11.0, 12.0]
+ data['bar'] = [['s11', 's12'], ['s21'], ['s31', 's32', 's33']]
+ data['baz'] = [[110.0, 111.0, 112.0], [120.0, 121.0, 122.0], [130.0, 131.0, 132.0]]
+ idx = [10, 11, 12]
+ self.table2_df = pd.DataFrame(data=data, index=pd.Index(name='id', data=idx))
+
+ self.table1 = DynamicTable(
+ name='table1',
+ description='a table to test slicing',
+ id=[0, 1]
+ )
+ self.table1.add_column('foo', 'scalar column', data=self._wrap([0.0, 1.0]))
+ self.table1.add_column('bar', 'ragged column', index=self._wrap([2, 3]),
+ data=self._wrap(['r11', 'r12', 'r21']))
+ self.table1.add_column('baz', 'multi-dimension column',
+ data=self._wrap([[10.0, 11.0, 12.0],
+ [20.0, 21.0, 22.0]]))
+ self.table1.add_column('qux', 'DTR column', table=self.table2, data=self._wrap([0, 1]))
+ self.table1.add_column('corge', 'ragged DTR column', index=self._wrap([2, 3]), table=self.table2,
+ data=self._wrap([0, 1, 2]))
+ self.table1.add_column('barz', 'ragged column of tuples (cpd type)', index=self._wrap([2, 3]),
+ data=self._wrap([(1.0, 11), (2.0, 12), (3.0, 21)]))
+
+ # generate expected dataframe for table1 *without DTR*
+ data = OrderedDict()
+ data['foo'] = self._wrap_check([0.0, 1.0])
+ data['bar'] = [self._wrap_check(['r11', 'r12']), self._wrap_check(['r21'])]
+ data['baz'] = [self._wrap_check([10.0, 11.0, 12.0]),
+ self._wrap_check([20.0, 21.0, 22.0])]
+ data['barz'] = [self._wrap_check([(1.0, 11), (2.0, 12)]), self._wrap_check([(3.0, 21)])]
+ idx = [0, 1]
+ self.table1_df = pd.DataFrame(data=data, index=pd.Index(name='id', data=idx))
+
+ def _check_two_rows_df(self, rec):
+ data = OrderedDict()
+ data['foo'] = self._wrap_check([0.0, 1.0])
+ data['bar'] = [self._wrap_check(['r11', 'r12']), self._wrap_check(['r21'])]
+ data['baz'] = [self._wrap_check([10.0, 11.0, 12.0]),
+ self._wrap_check([20.0, 21.0, 22.0])]
+ data['qux'] = self._wrap_check([0, 1])
+ data['corge'] = [self._wrap_check([0, 1]), self._wrap_check([2])]
+ data['barz'] = [self._wrap_check([(1.0, 11), (2.0, 12)]), self._wrap_check([(3.0, 21)])]
+ idx = [0, 1]
+ exp = pd.DataFrame(data=data, index=pd.Index(name='id', data=idx))
+ pd.testing.assert_frame_equal(rec, exp)
+
+ def _check_two_rows_df_nested(self, rec):
+ # first level: cache nested df cols and remove them before calling pd.testing.assert_frame_equal
+ qux_series = rec['qux']
+ corge_series = rec['corge']
+ del rec['qux']
+ del rec['corge']
+
+ idx = [0, 1]
+ pd.testing.assert_frame_equal(rec, self.table1_df.loc[idx])
+
+ # second level: compare the nested columns separately
+ self.assertEqual(len(qux_series), 2)
+ rec_qux1 = qux_series[0]
+ rec_qux2 = qux_series[1]
+ self._check_table2_first_row_qux(rec_qux1)
+ self._check_table2_second_row_qux(rec_qux2)
+
+ self.assertEqual(len(corge_series), 2)
+ rec_corge1 = corge_series[0]
+ rec_corge2 = corge_series[1]
+ self._check_table2_first_row_corge(rec_corge1)
+ self._check_table2_second_row_corge(rec_corge2)
+
+ def _check_one_row_df(self, rec):
data = OrderedDict()
- data['foo'] = 0.0
- data['bar'] = [np.array(['r11', 'r12'])]
- data['baz'] = [np.array([10.0, 11.0, 12.0])]
+ data['foo'] = self._wrap_check([0.0])
+ data['bar'] = [self._wrap_check(['r11', 'r12'])]
+ data['baz'] = [self._wrap_check([10.0, 11.0, 12.0])]
+ data['qux'] = self._wrap_check([0])
+ data['corge'] = [self._wrap_check([0, 1])]
+ data['barz'] = [self._wrap_check([(1.0, 11), (2.0, 12)])]
idx = [0]
exp = pd.DataFrame(data=data, index=pd.Index(name='id', data=idx))
- pd.testing.assert_frame_equal(elem, exp)
+ pd.testing.assert_frame_equal(rec, exp)
+
+ def _check_one_row_df_nested(self, rec):
+ # first level: cache nested df cols and remove them before calling pd.testing.assert_frame_equal
+ qux_series = rec['qux']
+ corge_series = rec['corge']
+ del rec['qux']
+ del rec['corge']
+
+ idx = [0]
+ pd.testing.assert_frame_equal(rec, self.table1_df.loc[idx])
+
+ # second level: compare the nested columns separately
+ self.assertEqual(len(qux_series), 1)
+ rec_qux = qux_series[0]
+ self._check_table2_first_row_qux(rec_qux)
+
+ self.assertEqual(len(corge_series), 1)
+ rec_corge = corge_series[0]
+ self._check_table2_first_row_corge(rec_corge)
+
+ def _check_table2_first_row_qux(self, rec_qux):
+ # second level: cache nested df cols and remove them before calling pd.testing.assert_frame_equal
+ qux_qux_series = rec_qux['qux']
+ qux_corge_series = rec_qux['corge']
+ del rec_qux['qux']
+ del rec_qux['corge']
+
+ qux_idx = [10]
+ pd.testing.assert_frame_equal(rec_qux, self.table2_df.loc[qux_idx])
+
+ # third level: compare the nested columns separately
+ self.assertEqual(len(qux_qux_series), 1)
+ pd.testing.assert_frame_equal(qux_qux_series[qux_idx[0]], self.table3_df.iloc[[0]])
+ self.assertEqual(len(qux_corge_series), 1)
+ pd.testing.assert_frame_equal(qux_corge_series[qux_idx[0]], self.table3_df.iloc[[0, 1]])
+
+ def _check_table2_second_row_qux(self, rec_qux):
+ # second level: cache nested df cols and remove them before calling pd.testing.assert_frame_equal
+ qux_qux_series = rec_qux['qux']
+ qux_corge_series = rec_qux['corge']
+ del rec_qux['qux']
+ del rec_qux['corge']
+
+ qux_idx = [11]
+ pd.testing.assert_frame_equal(rec_qux, self.table2_df.loc[qux_idx])
+
+ # third level: compare the nested columns separately
+ self.assertEqual(len(qux_qux_series), 1)
+ pd.testing.assert_frame_equal(qux_qux_series[qux_idx[0]], self.table3_df.iloc[[1]])
+ self.assertEqual(len(qux_corge_series), 1)
+ pd.testing.assert_frame_equal(qux_corge_series[qux_idx[0]], self.table3_df.iloc[[2]])
+
+ def _check_table2_first_row_corge(self, rec_corge):
+ # second level: cache nested df cols and remove them before calling pd.testing.assert_frame_equal
+ corge_qux_series = rec_corge['qux']
+ corge_corge_series = rec_corge['corge']
+ del rec_corge['qux']
+ del rec_corge['corge']
+
+ corge_idx = [10, 11]
+ pd.testing.assert_frame_equal(rec_corge, self.table2_df.loc[corge_idx])
+
+ # third level: compare the nested columns separately
+ self.assertEqual(len(corge_qux_series), 2)
+ pd.testing.assert_frame_equal(corge_qux_series[corge_idx[0]], self.table3_df.iloc[[0]])
+ pd.testing.assert_frame_equal(corge_qux_series[corge_idx[1]], self.table3_df.iloc[[1]])
+ self.assertEqual(len(corge_corge_series), 2)
+ pd.testing.assert_frame_equal(corge_corge_series[corge_idx[0]], self.table3_df.iloc[[0, 1]])
+ pd.testing.assert_frame_equal(corge_corge_series[corge_idx[1]], self.table3_df.iloc[[2]])
+
+ def _check_table2_second_row_corge(self, rec_corge):
+ # second level: cache nested df cols and remove them before calling pd.testing.assert_frame_equal
+ corge_qux_series = rec_corge['qux']
+ corge_corge_series = rec_corge['corge']
+ del rec_corge['qux']
+ del rec_corge['corge']
+
+ corge_idx = [12]
+ pd.testing.assert_frame_equal(rec_corge, self.table2_df.loc[corge_idx])
+
+ # third level: compare the nested columns separately
+ self.assertEqual(len(corge_qux_series), 1)
+ pd.testing.assert_frame_equal(corge_qux_series[corge_idx[0]], self.table3_df.iloc[[0]])
+ self.assertEqual(len(corge_corge_series), 1)
+ pd.testing.assert_frame_equal(corge_corge_series[corge_idx[0]], self.table3_df.iloc[[0, 1, 2]])
+
+ def _check_two_rows_no_df(self, rec):
+ self.assertEqual(rec[0], [0, 1])
+ np.testing.assert_array_equal(rec[1], self._wrap_check([0.0, 1.0]))
+ expected = [self._wrap_check(['r11', 'r12']), self._wrap_check(['r21'])]
+ self._assertNestedRaggedArrayEqual(rec[2], expected)
+ np.testing.assert_array_equal(rec[3], self._wrap_check([[10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]))
+ np.testing.assert_array_equal(rec[4], self._wrap_check([0, 1]))
+ expected = [self._wrap_check([0, 1]), self._wrap_check([2])]
+ for i, j in zip(rec[5], expected):
+ np.testing.assert_array_equal(i, j)
+
+ def _check_one_row_no_df(self, rec):
+ self.assertEqual(rec[0], 0)
+ self.assertEqual(rec[1], 0.0)
+ np.testing.assert_array_equal(rec[2], self._wrap_check(['r11', 'r12']))
+ np.testing.assert_array_equal(rec[3], self._wrap_check([10.0, 11.0, 12.0]))
+ self.assertEqual(rec[4], 0)
+ np.testing.assert_array_equal(rec[5], self._wrap_check([0, 1]))
+ np.testing.assert_array_equal(rec[6], self._wrap_check([(1.0, 11), (2.0, 12)]))
+
+ def _check_one_row_multiselect_no_df(self, rec):
+ # difference from _check_one_row_no_df is that everything is wrapped in a list
+ self.assertEqual(rec[0], [0])
+ self.assertEqual(rec[1], [0.0])
+ np.testing.assert_array_equal(rec[2], [self._wrap_check(['r11', 'r12'])])
+ np.testing.assert_array_equal(rec[3], [self._wrap_check([10.0, 11.0, 12.0])])
+ self.assertEqual(rec[4], [0])
+ np.testing.assert_array_equal(rec[5], [self._wrap_check([0, 1])])
+ np.testing.assert_array_equal(rec[6], [self._wrap_check([(1.0, 11), (2.0, 12)])])
+
+ def _assertNestedRaggedArrayEqual(self, arr1, arr2):
+ """
+ This is a helper function for _check_two_rows_no_df.
+ It compares arrays or lists containing numpy arrays that may be ragged
+ """
+ self.assertEqual(type(arr1), type(arr2))
+ self.assertEqual(len(arr1), len(arr2))
+ if isinstance(arr1, np.ndarray):
+ if arr1.dtype == object: # both are arrays containing arrays, lists, or h5py.Dataset strings
+ for i, j in zip(arr1, arr2):
+ self._assertNestedRaggedArrayEqual(i, j)
+ elif np.issubdtype(arr1.dtype, np.number):
+ np.testing.assert_allclose(arr1, arr2)
+ else:
+ np.testing.assert_array_equal(arr1, arr2)
+ elif isinstance(arr1, list):
+ for i, j in zip(arr1, arr2):
+ self._assertNestedRaggedArrayEqual(i, j)
+ else: # scalar
+ self.assertEqual(arr1, arr2)
+
+ def test_single_item(self):
+ rec = self.table1[0]
+ self._check_one_row_df(rec)
+
+ def test_single_item_nested(self):
+ rec = self.table1.get(0, index=False)
+ self._check_one_row_df_nested(rec)
def test_single_item_no_df(self):
- elem = self.table.get(0, df=False)
- self.assertEqual(elem[0], 0)
- self.assertEqual(elem[1], 0.0)
- np.testing.assert_array_equal(elem[2], np.array(['r11', 'r12']))
- np.testing.assert_array_equal(elem[3], np.array([10.0, 11.0, 12.0]))
+ rec = self.table1.get(0, df=False)
+ self._check_one_row_no_df(rec)
def test_slice(self):
- elem = self.table[0:2]
- data = OrderedDict()
- data['foo'] = [0.0, 1.0]
- data['bar'] = [np.array(['r11', 'r12']), np.array(['r21'])]
- data['baz'] = [np.array([10.0, 11.0, 12.0]),
- np.array([20.0, 21.0, 22.0])]
- idx = [0, 1]
- exp = pd.DataFrame(data=data, index=pd.Index(name='id', data=idx))
- pd.testing.assert_frame_equal(elem, exp)
+ rec = self.table1[0:2]
+ self._check_two_rows_df(rec)
+
+ def test_slice_nested(self):
+ rec = self.table1.get(slice(0, 2), index=False)
+ self._check_two_rows_df_nested(rec)
def test_slice_no_df(self):
- elem = self.table.get(slice(0, 2), df=False)
- self.assertEqual(elem[0], [0, 1])
- np.testing.assert_array_equal(elem[1], np.array([0.0, 1.0]))
- np.testing.assert_array_equal(elem[2][0], np.array(['r11', 'r12']))
- np.testing.assert_array_equal(elem[2][1], np.array(['r21']))
- np.testing.assert_array_equal(elem[3], np.array([[10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]))
+ rec = self.table1.get(slice(0, 2), df=False)
+ self._check_two_rows_no_df(rec)
+
+ def test_slice_single(self):
+ rec = self.table1[0:1]
+ self._check_one_row_df(rec)
+
+ def test_slice_single_nested(self):
+ rec = self.table1.get(slice(0, 1), index=False)
+ self._check_one_row_df_nested(rec)
+
+ def test_slice_single_no_df(self):
+ rec = self.table1.get(slice(0, 1), df=False)
+ self._check_one_row_multiselect_no_df(rec)
def test_list(self):
- elem = self.table[[0, 1]]
- data = OrderedDict()
- data['foo'] = [0.0, 1.0]
- data['bar'] = [np.array(['r11', 'r12']), np.array(['r21'])]
- data['baz'] = [np.array([10.0, 11.0, 12.0]),
- np.array([20.0, 21.0, 22.0])]
- idx = [0, 1]
- exp = pd.DataFrame(data=data, index=pd.Index(name='id', data=idx))
- pd.testing.assert_frame_equal(elem, exp)
+ rec = self.table1[[0, 1]]
+ self._check_two_rows_df(rec)
+
+ def test_list_nested(self):
+ rec = self.table1.get([0, 1], index=False)
+ self._check_two_rows_df_nested(rec)
def test_list_no_df(self):
- elem = self.table.get([0, 1], df=False)
- self.assertEqual(elem[0], [0, 1])
- np.testing.assert_array_equal(elem[1], np.array([0.0, 1.0]))
- np.testing.assert_array_equal(elem[2][0], np.array(['r11', 'r12']))
- np.testing.assert_array_equal(elem[2][1], np.array(['r21']))
- np.testing.assert_array_equal(elem[3], np.array([[10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]))
+ rec = self.table1.get([0, 1], df=False)
+ self._check_two_rows_no_df(rec)
+
+ def test_list_single(self):
+ rec = self.table1[[0]]
+ self._check_one_row_df(rec)
+
+ def test_list_single_nested(self):
+ rec = self.table1.get([0], index=False)
+ self._check_one_row_df_nested(rec)
+
+ def test_list_single_no_df(self):
+ rec = self.table1.get([0], df=False)
+ self._check_one_row_multiselect_no_df(rec)
+
+ def test_array(self):
+ rec = self.table1[np.array([0, 1])]
+ self._check_two_rows_df(rec)
+
+ def test_array_nested(self):
+ rec = self.table1.get(np.array([0, 1]), index=False)
+ self._check_two_rows_df_nested(rec)
+
+ def test_array_no_df(self):
+ rec = self.table1.get(np.array([0, 1]), df=False)
+ self._check_two_rows_no_df(rec)
+
+ def test_array_single(self):
+ rec = self.table1[np.array([0])]
+ self._check_one_row_df(rec)
+
+ def test_array_single_nested(self):
+ rec = self.table1.get(np.array([0]), index=False)
+ self._check_one_row_df_nested(rec)
+
+ def test_array_single_no_df(self):
+ rec = self.table1.get(np.array([0]), df=False)
+ self._check_one_row_multiselect_no_df(rec)
+
+ def test_to_dataframe_nested(self):
+ rec = self.table1.to_dataframe()
+ self._check_two_rows_df_nested(rec)
+
+ def test_to_dataframe(self):
+ rec = self.table1.to_dataframe(index=True)
+ self._check_two_rows_df(rec)
+
+
+class TestSelectionArray(SelectionTestMixin, TestCase):
+
+ def _wrap(self, my_list):
+ return np.array(my_list)
+
+ def _wrap_check(self, my_list):
+ return self._wrap(my_list)
+
+
+class TestSelectionList(SelectionTestMixin, TestCase):
+
+ def _wrap(self, my_list):
+ return my_list
+
+ def _wrap_check(self, my_list):
+ return self._wrap(my_list)
+
+
+class TestSelectionH5Dataset(SelectionTestMixin, TestCase):
+
+ def setUp(self):
+ self.path = get_temp_filepath()
+ self.file = h5py.File(self.path, 'w')
+ self.dset_counter = 0
+ super().setUp()
+
+ def tearDown(self):
+ super().tearDown()
+ self.file.close()
+ if os.path.exists(self.path):
+ os.remove(self.path)
+
+ def _wrap(self, my_list):
+ self.dset_counter = self.dset_counter + 1
+ kwargs = dict()
+ if isinstance(my_list[0], str):
+ kwargs['dtype'] = H5_TEXT
+ elif isinstance(my_list[0], tuple): # compound dtype
+ # normally for cpd dtype, __resolve_dtype__ takes a list of DtypeSpec objects
+ cpd_type = [dict(name='cpd_float', dtype=np.dtype('float64')),
+ dict(name='cpd_int', dtype=np.dtype('int32'))]
+ kwargs['dtype'] = HDF5IO.__resolve_dtype__(cpd_type, my_list[0])
+ dset = self.file.create_dataset('dset%d' % self.dset_counter, data=np.array(my_list, **kwargs))
+ if H5PY_3 and isinstance(my_list[0], str):
+ return StrDataset(dset, None) # return a wrapper to read data as str instead of bytes
+ else:
+ # NOTE: h5py.Dataset with compound dtype are read as numpy arrays with compound dtype, not tuples
+ return dset
+
+ def _wrap_check(self, my_list):
+ # getitem on h5dataset backed data will return np.array
+ kwargs = dict()
+ if isinstance(my_list[0], str):
+ kwargs['dtype'] = H5_TEXT
+ elif isinstance(my_list[0], tuple):
+ cpd_type = [dict(name='cpd_float', dtype=np.dtype('float64')),
+ dict(name='cpd_int', dtype=np.dtype('int32'))]
+ kwargs['dtype'] = np.dtype([(x['name'], x['dtype']) for x in cpd_type])
+ # compound dtypes with str are read as bytes, see https://github.com/h5py/h5py/issues/1751
+ return np.array(my_list, **kwargs)
class TestVectorIndex(TestCase):
=====================================
tests/unit/test_io_hdf5_h5tools.py
=====================================
@@ -9,7 +9,7 @@ import numpy as np
from h5py import SoftLink, HardLink, ExternalLink, File
from h5py import filters as h5py_filters
from hdmf.backends.hdf5 import H5DataIO
-from hdmf.backends.hdf5.h5tools import HDF5IO, ROOT_NAME, SPEC_LOC_ATTR
+from hdmf.backends.hdf5.h5tools import HDF5IO, ROOT_NAME, SPEC_LOC_ATTR, H5PY_3
from hdmf.backends.io import HDMFIO, UnsupportedOperation
from hdmf.backends.warnings import BrokenLinkWarning
from hdmf.build import (GroupBuilder, DatasetBuilder, BuildManager, TypeMap, ObjectMapper, OrphanContainerBuildError,
@@ -556,22 +556,28 @@ class H5IOTest(TestCase):
self.assertEqual(len(w), 0)
self.assertEqual(dset.io_settings['compression'], 'gzip')
# Make sure a warning is issued when using szip (even if installed)
+ warn_msg = ("szip compression may not be available on all installations of HDF5. Use of gzip is "
+ "recommended to ensure portability of the generated HDF5 files.")
if "szip" in h5py_filters.encode:
- with warnings.catch_warnings(record=True) as w:
+ with self.assertWarnsWith(UserWarning, warn_msg):
dset = H5DataIO(np.arange(30),
compression='szip',
compression_opts=('ec', 16))
- self.assertEqual(len(w), 1)
- self.assertEqual(dset.io_settings['compression'], 'szip')
+ self.assertEqual(dset.io_settings['compression'], 'szip')
else:
with self.assertRaises(ValueError):
- H5DataIO(np.arange(30), compression='szip', compression_opts=('ec', 16))
+ with self.assertWarnsWith(UserWarning, warn_msg):
+ dset = H5DataIO(np.arange(30),
+ compression='szip',
+ compression_opts=('ec', 16))
+ self.assertEqual(dset.io_settings['compression'], 'szip')
# Make sure a warning is issued when using lzf compression
- with warnings.catch_warnings(record=True) as w:
+ warn_msg = ("lzf compression may not be available on all installations of HDF5. Use of gzip is "
+ "recommended to ensure portability of the generated HDF5 files.")
+ with self.assertWarnsWith(UserWarning, warn_msg):
dset = H5DataIO(np.arange(30),
compression='lzf')
- self.assertEqual(len(w), 1)
- self.assertEqual(dset.io_settings['compression'], 'lzf')
+ self.assertEqual(dset.io_settings['compression'], 'lzf')
def test_error_on_unsupported_compression_filter(self):
# Make sure gzip does not raise an error
@@ -584,7 +590,8 @@ class H5IOTest(TestCase):
"recommended to ensure portability of the generated HDF5 files.")
if "szip" not in h5py_filters.encode:
with self.assertRaises(ValueError):
- H5DataIO(np.arange(30), compression='szip', compression_opts=('ec', 16))
+ with self.assertWarnsWith(UserWarning, warn_msg):
+ H5DataIO(np.arange(30), compression='szip', compression_opts=('ec', 16))
else:
try:
with self.assertWarnsWith(UserWarning, warn_msg):
@@ -715,6 +722,22 @@ class H5IOTest(TestCase):
with self.assertRaisesRegex(Exception, r"cannot add \S+ to [/\S]+ - could not determine type"):
self.io.__list_fill__(self.f, 'empty_dataset', [])
+ def test_read_str(self):
+ a = ['a', 'bb', 'ccc', 'dddd', 'e']
+ attr = 'foobar'
+ self.io.write_dataset(self.f, DatasetBuilder('test_dataset', a, attributes={'test_attr': attr}, dtype='text'))
+ self.io.close()
+ with HDF5IO(self.path, 'r') as io:
+ bldr = io.read_builder()
+ np.array_equal(bldr['test_dataset'].data[:], ['a', 'bb', 'ccc', 'dddd', 'e'])
+ np.array_equal(bldr['test_dataset'].attributes['test_attr'], attr)
+ if H5PY_3:
+ self.assertEqual(str(bldr['test_dataset'].data),
+ '<StrDataset for HDF5 dataset "test_dataset": shape (5,), type "|O">')
+ else:
+ self.assertEqual(str(bldr['test_dataset'].data),
+ '<HDF5 dataset "test_dataset": shape (5,), type "|O">')
+
def _get_manager():
@@ -2039,7 +2062,11 @@ class TestLoadNamespaces(TestCase):
'{"name":"my_attr","dtype":"text","doc":"an attr"}]},'
'{"data_type_def":"BiggerFoo","data_type_inc":"BigFoo","doc":"doc"}]}')
old_test_source = f['/specifications/test_core/0.1.0/test']
- old_test_source[()] = old_test_source[()][0:-2] + added_types # strip the ]} from end, then add to groups
+ # strip the ]} from end, then add to groups
+ if H5PY_3: # string datasets are returned as bytes
+ old_test_source[()] = old_test_source[()][0:-2].decode('utf-8') + added_types
+ else:
+ old_test_source[()] = old_test_source[()][0:-2] + added_types
new_ns = ('{"namespaces":[{"doc":"a test namespace","schema":['
'{"namespace":"test_core","my_data_types":["Foo"]},'
'{"source":"test-ext.extensions"}'
=====================================
tests/unit/utils_test/test_utils.py
=====================================
@@ -170,29 +170,29 @@ class TestGetDataShape(TestCase):
class TestToUintArray(TestCase):
def test_ndarray_uint(self):
- arr = np.array([0, 1, 2], dtype=np.uint)
+ arr = np.array([0, 1, 2], dtype=np.uint32)
res = to_uint_array(arr)
np.testing.assert_array_equal(res, arr)
def test_ndarray_int(self):
- arr = np.array([0, 1, 2], dtype=np.int)
+ arr = np.array([0, 1, 2], dtype=np.int32)
res = to_uint_array(arr)
np.testing.assert_array_equal(res, arr)
def test_ndarray_int_neg(self):
- arr = np.array([0, -1, 2], dtype=np.int)
+ arr = np.array([0, -1, 2], dtype=np.int32)
with self.assertRaisesWith(ValueError, 'Cannot convert negative integer values to uint.'):
to_uint_array(arr)
def test_ndarray_float(self):
- arr = np.array([0, 1, 2], dtype=np.float)
+ arr = np.array([0, 1, 2], dtype=np.float64)
with self.assertRaisesWith(ValueError, 'Cannot convert array of dtype float64 to uint.'):
to_uint_array(arr)
def test_list_int(self):
arr = [0, 1, 2]
res = to_uint_array(arr)
- expected = np.array([0, 1, 2], dtype=np.uint)
+ expected = np.array([0, 1, 2], dtype=np.uint32)
np.testing.assert_array_equal(res, expected)
def test_list_int_neg(self):
=====================================
tests/unit/validator_tests/test_errors.py
=====================================
@@ -0,0 +1,54 @@
+from unittest import TestCase
+
+from hdmf.validate.errors import Error
+
+
+class TestErrorEquality(TestCase):
+ def test_self_equality(self):
+ """Verify that one error equals itself"""
+ error = Error('foo', 'bad thing', 'a.b.c')
+ self.assertEqual(error, error)
+
+ def test_equality_with_same_field_values(self):
+ """Verify that two errors with the same field values are equal"""
+ err1 = Error('foo', 'bad thing', 'a.b.c')
+ err2 = Error('foo', 'bad thing', 'a.b.c')
+ self.assertEqual(err1, err2)
+
+ def test_not_equal_with_different_reason(self):
+ """Verify that two errors with a different reason are not equal"""
+ err1 = Error('foo', 'bad thing', 'a.b.c')
+ err2 = Error('foo', 'something else', 'a.b.c')
+ self.assertNotEqual(err1, err2)
+
+ def test_not_equal_with_different_name(self):
+ """Verify that two errors with a different name are not equal"""
+ err1 = Error('foo', 'bad thing', 'a.b.c')
+ err2 = Error('bar', 'bad thing', 'a.b.c')
+ self.assertNotEqual(err1, err2)
+
+ def test_not_equal_with_different_location(self):
+ """Verify that two errors with a different location are not equal"""
+ err1 = Error('foo', 'bad thing', 'a.b.c')
+ err2 = Error('foo', 'bad thing', 'd.e.f')
+ self.assertNotEqual(err1, err2)
+
+ def test_equal_with_no_location(self):
+ """Verify that two errors with no location but the same name are equal"""
+ err1 = Error('foo', 'bad thing')
+ err2 = Error('foo', 'bad thing')
+ self.assertEqual(err1, err2)
+
+ def test_not_equal_with_overlapping_name_when_no_location(self):
+ """Verify that two errors with an overlapping name but no location are
+ not equal
+ """
+ err1 = Error('foo', 'bad thing')
+ err2 = Error('x/y/foo', 'bad thing')
+ self.assertNotEqual(err1, err2)
+
+ def test_equal_with_overlapping_name_when_location_present(self):
+ """Verify that two errors with an overlapping name and a location are equal"""
+ err1 = Error('foo', 'bad thing', 'a.b.c')
+ err2 = Error('x/y/foo', 'bad thing', 'a.b.c')
+ self.assertEqual(err1, err2)
=====================================
tests/unit/validator_tests/test_validate.py
=====================================
@@ -4,13 +4,15 @@ from unittest import mock, skip
import numpy as np
from dateutil.tz import tzlocal
-from hdmf.build import GroupBuilder, DatasetBuilder, LinkBuilder
-from hdmf.spec import GroupSpec, AttributeSpec, DatasetSpec, SpecCatalog, SpecNamespace, LinkSpec
+from hdmf.build import GroupBuilder, DatasetBuilder, LinkBuilder, ReferenceBuilder, TypeMap, BuildManager
+from hdmf.spec import (GroupSpec, AttributeSpec, DatasetSpec, SpecCatalog, SpecNamespace,
+ LinkSpec, RefSpec, NamespaceCatalog, DtypeSpec)
from hdmf.spec.spec import ONE_OR_MANY, ZERO_OR_MANY, ZERO_OR_ONE
-from hdmf.testing import TestCase
+from hdmf.testing import TestCase, remove_test_file
from hdmf.validate import ValidatorMap
from hdmf.validate.errors import (DtypeError, MissingError, ExpectedArrayError, MissingDataType,
IncorrectQuantityError, IllegalLinkError)
+from hdmf.backends.hdf5 import HDF5IO
CORE_NAMESPACE = 'test_core'
@@ -422,7 +424,7 @@ class TestDtypeValidation(TestCase):
def test_bool_for_numeric(self):
"""Test that validator does not allow bool data where numeric is specified."""
self.set_up_spec('numeric')
- value = np.bool(1)
+ value = True
bar_builder = GroupBuilder('my_bar',
attributes={'data_type': 'Bar', 'attr1': value},
datasets=[DatasetBuilder('data', value)])
@@ -821,3 +823,189 @@ class TestMultipleChildrenAtDifferentLevelsOfInheritance(TestCase):
builder = GroupBuilder('my_baz', attributes={'data_type': 'Baz'}, datasets=datasets)
result = self.vmap.validate(builder)
self.assertEqual(len(result), 0)
+
+
+class TestExtendedIncDataTypes(TestCase):
+ """Test validation against specs where a data type is included via data_type_inc
+ and modified by adding new fields or constraining existing fields but is not
+ defined as a new type via data_type_inc.
+
+ For the purpose of this test class, we call a data type which is nested
+ inside a group an "inner" data type. When an inner data type inherits from a data type
+ via data_type_inc and has fields that are either added or modified from the base
+ data type, we are labeling that data type as an "extension". When the inner data
+ type extension does not define a new data type via data_type_def we say that it is
+ an "anonymous extension".
+
+ Anonymous data type extensions should be avoided in new specs, but they do
+ occur in existing NWB specs, so we need to allow and validate against them.
+ One example is the `Units.spike_times` dataset attached to Units in the `core`
+ nwb namespace, which extends `VectorData` via neurodata_type_inc but adds a new
+ attribute named `resolution` without defining a new data type via neurodata_type_def.
+ """
+
+ def setup_spec(self):
+ """Prepare a set of specs for tests which includes an anonymous data type extension"""
+ spec_catalog = SpecCatalog()
+ attr_foo = AttributeSpec(name='foo', doc='an attribute', dtype='text')
+ attr_bar = AttributeSpec(name='bar', doc='an attribute', dtype='numeric')
+ d1_spec = DatasetSpec(doc='type D1', data_type_def='D1', dtype='numeric',
+ attributes=[attr_foo])
+ d2_spec = DatasetSpec(doc='type D2', data_type_def='D2', data_type_inc=d1_spec)
+ g1_spec = GroupSpec(doc='type G1', data_type_def='G1',
+ datasets=[DatasetSpec(doc='D1 extension', data_type_inc=d1_spec,
+ attributes=[attr_foo, attr_bar])])
+ for spec in [d1_spec, d2_spec, g1_spec]:
+ spec_catalog.register_spec(spec, 'test.yaml')
+ self.namespace = SpecNamespace('a test namespace', CORE_NAMESPACE,
+ [{'source': 'test.yaml'}], version='0.1.0', catalog=spec_catalog)
+ self.vmap = ValidatorMap(self.namespace)
+
+ def test_missing_additional_attribute_on_anonymous_data_type_extension(self):
+ """Verify that a MissingError is returned when a required attribute from an
+ anonymous extension is not present
+ """
+ self.setup_spec()
+ dataset = DatasetBuilder('test_d1', 42.0, attributes={'data_type': 'D1', 'foo': 'xyz'})
+ builder = GroupBuilder('test_g1', attributes={'data_type': 'G1'}, datasets=[dataset])
+ result = self.vmap.validate(builder)
+ self.assertEqual(len(result), 1)
+ error = result[0]
+ self.assertIsInstance(error, MissingError)
+ self.assertTrue('G1/D1/bar' in str(error))
+
+ def test_validate_child_type_against_anonymous_data_type_extension(self):
+ """Verify that a MissingError is returned when a required attribute from an
+ anonymous extension is not present on a data type which inherits from the data
+ type included in the anonymous extension.
+ """
+ self.setup_spec()
+ dataset = DatasetBuilder('test_d2', 42.0, attributes={'data_type': 'D2', 'foo': 'xyz'})
+ builder = GroupBuilder('test_g1', attributes={'data_type': 'G1'}, datasets=[dataset])
+ result = self.vmap.validate(builder)
+ self.assertEqual(len(result), 1)
+ error = result[0]
+ self.assertIsInstance(error, MissingError)
+ self.assertTrue('G1/D1/bar' in str(error))
+
+ def test_redundant_attribute_in_spec(self):
+ """Test that only one MissingError is returned when an attribute is missing
+ which is redundantly defined in both a base data type and an inner data type
+ """
+ self.setup_spec()
+ dataset = DatasetBuilder('test_d2', 42.0, attributes={'data_type': 'D2', 'bar': 5})
+ builder = GroupBuilder('test_g1', attributes={'data_type': 'G1'}, datasets=[dataset])
+ result = self.vmap.validate(builder)
+ self.assertEqual(len(result), 1)
+
+
+class TestReferenceDatasetsRoundTrip(ValidatorTestBase):
+ """Test that no errors occur when when datasets containing references either in an
+ array or as part of a compound type are written out to file, read back in, and
+ then validated.
+
+ In order to support lazy reading on loading, datasets containing references are
+ wrapped in lazy-loading ReferenceResolver objects. These tests verify that the
+ validator can work with these ReferenceResolver objects.
+ """
+
+ def setUp(self):
+ self.filename = 'test_ref_dataset.h5'
+ super().setUp()
+
+ def tearDown(self):
+ remove_test_file(self.filename)
+ super().tearDown()
+
+ def getSpecs(self):
+ qux_spec = DatasetSpec(
+ doc='a simple scalar dataset',
+ data_type_def='Qux',
+ dtype='int',
+ shape=None
+ )
+ baz_spec = DatasetSpec(
+ doc='a dataset with a compound datatype that includes a reference',
+ data_type_def='Baz',
+ dtype=[
+ DtypeSpec('x', doc='x-value', dtype='int'),
+ DtypeSpec('y', doc='y-ref', dtype=RefSpec('Qux', reftype='object'))
+ ],
+ shape=None
+ )
+ bar_spec = DatasetSpec(
+ doc='a dataset of an array of references',
+ dtype=RefSpec('Qux', reftype='object'),
+ data_type_def='Bar',
+ shape=(None,)
+ )
+ foo_spec = GroupSpec(
+ doc='a base group for containing test datasets',
+ data_type_def='Foo',
+ datasets=[
+ DatasetSpec(doc='optional Bar', data_type_inc=bar_spec, quantity=ZERO_OR_ONE),
+ DatasetSpec(doc='optional Baz', data_type_inc=baz_spec, quantity=ZERO_OR_ONE),
+ DatasetSpec(doc='multiple qux', data_type_inc=qux_spec, quantity=ONE_OR_MANY)
+ ]
+ )
+ return (foo_spec, bar_spec, baz_spec, qux_spec)
+
+ def runBuilderRoundTrip(self, builder):
+ """Executes a round-trip test for a builder
+
+ 1. first writes the builder to file,
+ 2. then reads a new builder back from disk,
+ 3. and finally runs that builder through the validator.
+ The test is successful if there are no validation errors."""
+ ns_catalog = NamespaceCatalog()
+ ns_catalog.add_namespace(self.namespace.name, self.namespace)
+ typemap = TypeMap(ns_catalog)
+ self.manager = BuildManager(typemap)
+
+ with HDF5IO(self.filename, manager=self.manager, mode='w') as write_io:
+ write_io.write_builder(builder)
+
+ with HDF5IO(self.filename, manager=self.manager, mode='r') as read_io:
+ read_builder = read_io.read_builder()
+ errors = self.vmap.validate(read_builder)
+ self.assertEqual(len(errors), 0, errors)
+
+ def test_round_trip_validation_of_reference_dataset_array(self):
+ """Verify that a dataset builder containing an array of references passes
+ validation after a round trip"""
+ qux1 = DatasetBuilder('q1', 5, attributes={'data_type': 'Qux'})
+ qux2 = DatasetBuilder('q2', 10, attributes={'data_type': 'Qux'})
+ bar = DatasetBuilder(
+ name='bar',
+ data=[ReferenceBuilder(qux1), ReferenceBuilder(qux2)],
+ attributes={'data_type': 'Bar'},
+ dtype='object'
+ )
+ foo = GroupBuilder(
+ name='foo',
+ datasets=[bar, qux1, qux2],
+ attributes={'data_type': 'Foo'}
+ )
+ self.runBuilderRoundTrip(foo)
+
+ def test_round_trip_validation_of_compound_dtype_with_reference(self):
+ """Verify that a dataset builder containing data with a compound dtype
+ containing a reference passes validation after a round trip"""
+ qux1 = DatasetBuilder('q1', 5, attributes={'data_type': 'Qux'})
+ qux2 = DatasetBuilder('q2', 10, attributes={'data_type': 'Qux'})
+ baz = DatasetBuilder(
+ name='baz',
+ data=[(10, ReferenceBuilder(qux1))],
+ dtype=[
+ DtypeSpec('x', doc='x-value', dtype='int'),
+ DtypeSpec('y', doc='y-ref', dtype=RefSpec('Qux', reftype='object'))
+ ],
+ attributes={'data_type': 'Baz'}
+ )
+ foo = GroupBuilder(
+ name='foo',
+ datasets=[baz, qux1, qux2],
+ attributes={'data_type': 'Foo'}
+ )
+ self.runBuilderRoundTrip(foo)
=====================================
tox.ini
=====================================
@@ -4,7 +4,7 @@
# and then run "tox" from this directory.
[tox]
-envlist = py36, py37, py38, py39
+envlist = py37, py38, py39
[testenv]
usedevelop = True
@@ -22,29 +22,11 @@ commands =
# Env to create coverage report locally
[testenv:localcoverage]
-basepython = python3.8
+basepython = python3.9
commands =
python -m coverage run test.py -u
coverage html -d tests/coverage/htmlcov
-# Test with python 3.8, pinned dev reqs, and upgraded run requirements
-[testenv:py38-upgrade-dev]
-basepython = python3.8
-install_command =
- pip install -U -e . {opts} {packages}
-deps =
- -rrequirements-dev.txt
-commands = {[testenv]commands}
-
-# Test with python 3.8, pinned dev reqs, and pre-release run requirements
-[testenv:py38-upgrade-dev-pre]
-basepython = python3.8
-install_command =
- pip install -U --pre -e . {opts} {packages}
-deps =
- -rrequirements-dev.txt
-commands = {[testenv]commands}
-
# Test with python 3.9, pinned dev reqs, and upgraded run requirements
[testenv:py39-upgrade-dev]
basepython = python3.9
@@ -63,9 +45,9 @@ deps =
-rrequirements-dev.txt
commands = {[testenv]commands}
-# Test with python 3.6, pinned dev reqs, and minimum run requirements
-[testenv:py36-min-req]
-basepython = python3.6
+# Test with python 3.7, pinned dev reqs, and minimum run requirements
+[testenv:py37-min-req]
+basepython = python3.7
deps =
-rrequirements-dev.txt
-rrequirements-min.txt
@@ -77,10 +59,6 @@ commands =
python setup.py sdist
python setup.py bdist_wheel
-[testenv:build-py36]
-basepython = python3.6
-commands = {[testenv:build]commands}
-
[testenv:build-py37]
basepython = python3.7
commands = {[testenv:build]commands}
@@ -93,22 +71,6 @@ commands = {[testenv:build]commands}
basepython = python3.9
commands = {[testenv:build]commands}
-[testenv:build-py38-upgrade-dev]
-basepython = python3.8
-install_command =
- pip install -U -e . {opts} {packages}
-deps =
- -rrequirements-dev.txt
-commands = {[testenv:build]commands}
-
-[testenv:build-py38-upgrade-dev-pre]
-basepython = python3.8
-install_command =
- pip install -U --pre -e . {opts} {packages}
-deps =
- -rrequirements-dev.txt
-commands = {[testenv:build]commands}
-
[testenv:build-py39-upgrade-dev]
basepython = python3.9
install_command =
@@ -125,8 +87,8 @@ deps =
-rrequirements-dev.txt
commands = {[testenv:build]commands}
-[testenv:build-py36-min-req]
-basepython = python3.6
+[testenv:build-py37-min-req]
+basepython = python3.7
deps =
-rrequirements-dev.txt
-rrequirements-min.txt
@@ -150,11 +112,6 @@ deps =
commands =
python test.py --example
-[testenv:gallery-py36]
-basepython = python3.6
-deps = {[testenv:gallery]deps}
-commands = {[testenv:gallery]commands}
-
[testenv:gallery-py37]
basepython = python3.7
deps = {[testenv:gallery]deps}
@@ -170,26 +127,6 @@ basepython = python3.9
deps = {[testenv:gallery]deps}
commands = {[testenv:gallery]commands}
-# Test with python 3.8, pinned dev and doc reqs, and upgraded run requirements
-[testenv:gallery-py38-upgrade-dev]
-basepython = python3.8
-install_command =
- pip install -U -e . {opts} {packages}
-deps =
- -rrequirements-dev.txt
- -rrequirements-doc.txt
-commands = {[testenv:gallery]commands}
-
-# Test with python 3.8, pinned dev and doc reqs, and pre-release run requirements
-[testenv:gallery-py38-upgrade-dev-pre]
-basepython = python3.8
-install_command =
- pip install -U --pre -e . {opts} {packages}
-deps =
- -rrequirements-dev.txt
- -rrequirements-doc.txt
-commands = {[testenv:gallery]commands}
-
# Test with python 3.9, pinned dev and doc reqs, and upgraded run requirements
[testenv:gallery-py39-upgrade-dev]
basepython = python3.9
@@ -210,9 +147,9 @@ deps =
-rrequirements-doc.txt
commands = {[testenv:gallery]commands}
-# Test with python 3.6, pinned dev reqs, and minimum run requirements
-[testenv:gallery-py36-min-req]
-basepython = python3.6
+# Test with python 3.7, pinned dev reqs, and minimum run requirements
+[testenv:gallery-py37-min-req]
+basepython = python3.7
deps =
-rrequirements-dev.txt
-rrequirements-min.txt
View it on GitLab: https://salsa.debian.org/med-team/hdmf/-/commit/a6cddcdda5d32c767d765e04e0a2a321c5d24f1e