[med-svn] [Git][med-team/python-biom-format][upstream] New upstream version 2.1.9

Andreas Tille gitlab at salsa.debian.org
Thu Nov 5 18:20:54 GMT 2020



Andreas Tille pushed to branch upstream at Debian Med / python-biom-format


Commits:
29fca8e9 by Andreas Tille at 2020-11-05T19:06:55+01:00
New upstream version 2.1.9
- - - - -


12 changed files:

- .travis.yml
- ChangeLog.md
- MANIFEST.in
- biom/__init__.py
- biom/table.py
- biom/tests/test_parse.py
- biom/tests/test_table.py
- biom/util.py
- doc/conf.py
- doc/index.rst
- + pyproject.toml
- setup.py


Changes:

=====================================
.travis.yml
=====================================
@@ -1,9 +1,9 @@
 # Modified from https://github.com/biocore/scikit-bio
 language: python
 env:
-  - PYTHON_VERSION=3.6 WITH_DOCTEST=True 
-  - PYTHON_VERSION=3.7 WITH_DOCTEST=True 
-  - PYTHON_VERSION=3.8 WITH_DOCTEST=True 
+  - PYTHON_VERSION=3.6 WITH_DOCTEST=True
+  - PYTHON_VERSION=3.7 WITH_DOCTEST=True
+  - PYTHON_VERSION=3.8 WITH_DOCTEST=True
 before_install:
   - wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
   - chmod +x miniconda.sh
@@ -11,15 +11,15 @@ before_install:
   - export PATH=/home/travis/miniconda3/bin:$PATH
 install:
   - conda create --yes -n env_name python=$PYTHON_VERSION pip click numpy "scipy>=1.3.1" pep8 flake8 coverage future six "pandas>=0.20.0" nose h5py>=2.2.0 cython
-  - rm biom/*.c
   - source activate env_name
-  - if [ ${PYTHON_VERSION} = "3.6" ]; then pip install sphinx==1.2.2; fi
+  - if [ ${PYTHON_VERSION} = "3.7" ]; then pip install sphinx==1.2.2 "docutils<0.14"; fi
   - pip install coveralls
+  - pip install anndata
   - pip install -e . --no-deps
 script:
-  - make test 
+  - make test
   - biom show-install-info
-  - if [ ${PYTHON_VERSION} = "3.6" ]; then make -C doc html; fi
+  - if [ ${PYTHON_VERSION} = "3.7" ]; then make -C doc html; fi
   # we can only validate the tables if we have H5PY
   - for table in examples/*hdf5.biom; do echo ${table}; biom validate-table -i ${table}; done
   # validate JSON formatted tables


=====================================
ChangeLog.md
=====================================
@@ -1,10 +1,32 @@
 BIOM-Format ChangeLog
 =====================
 
+biom 2.1.9
+----------
+
+New features and support for Pandas >= 1.0, released on 5 November 2020.
+
+Important:
+
+* Cython and numpy are no longer required to be present before building, see [PR #840](https://github.com/biocore/biom-format/pull/840)
+
+New Features:
+
+* Added support for the AnnData format, see [PR #845](https://github.com/biocore/biom-format/pull/845)
+* Performance boost to `Table.remove_empty`. For large tables this cuts the running time from 20 seconds to ~1.1 seconds, see [PR #847](https://github.com/biocore/biom-format/pull/847)
+* A much faster way to merge tables (without metadata) has been added. For large tables, this takes a few minutes rather than a few hours. This method is implicitly invoked when calling `Table.merge` if unioning both axes, and the tables lack metadata. `Table.concat` is still much faster, but assumes one axis is disjoint. See [PR #848](https://github.com/biocore/biom-format/pull/848).
+
+* Simplify interaction with the concatenation method, allowing for passing in an individual table and support for a general `biom.concat(tables)` wrapper. See [PR #851](https://github.com/biocore/biom-format/pull/851).
+* Added support for parsing adjacency table structures, see [issue #823](https://github.com/biocore/biom-format/issues/823). 
+
+Bug fixes:
+
+* Support for pandas >= 1.0, see the comment and commits [here](https://github.com/biocore/biom-format/issues/837#issuecomment-721241751)
+
 biom 2.1.8
 ----------
 
-New features and bug fixes, released on 6 January 2020.
+New features and bug fixes, released on 28 January 2020.
 
 Important:
 
@@ -18,6 +40,7 @@ New Features:
 * The detailed report is no longer part of the table validator. See [issue #378](https://github.com/biocore/biom-format/issues/378).
 * `load_table` now accepts open file handles. See [issue #481](https://github.com/biocore/biom-format/issues/481).
 * `biom export-metadata` has been added to export metadata as TSV. See [issue #820](https://github.com/biocore/biom-format/issues/820).
+* `Table.to_tsv` has been modified to allow for `direct_io`. See [PR #836](https://github.com/biocore/biom-format/pull/836).
 
 Bug fixes:
 


=====================================
MANIFEST.in
=====================================
@@ -14,5 +14,6 @@ prune docs/_build
 global-exclude *.pyc
 global-exclude *.pyo
 global-exclude .git
+global-exclude *.c
 global-exclude *.so
 global-exclude .*.swp


=====================================
biom/__init__.py
=====================================
@@ -73,5 +73,26 @@ example_table = Table([[0, 1, 2], [3, 4, 5]], ['O1', 'O2'],
                        {'environment': 'A'}], input_is_dense=True)
 
 
+def concat(tables, *args, **kwargs):
+    """Wrapper for biom.Table.concat which requires a table instance
+
+    Parameters
+    ----------
+    tables : iterable of biom.Table, or a single biom.Table instance
+        Tables to concatenate
+
+    Raises
+    ------
+    DisjointIDError
+        If IDs over the axis are not disjoint.
+
+    Returns
+    -------
+    biom.Table
+        A table object reflecting the concatenation of the tables.
+    """
+    return tables[0].concat(tables[1:], *args, **kwargs)
+
+
 __all__ = ['Table', 'example_table', 'parse_table', 'load_table',
            '__format_version__', '__version__']
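
A minimal sketch of the wrapper pattern added above, using a hypothetical
`Block` class in place of `biom.Table` so that it runs without biom installed.
The committed one-liner indexes `tables[0]`, which presumes a sequence; the
extra normalization of a lone instance shown here is an assumption made to
match the docstring, not the upstream behaviour.

```python
class Block:
    """Hypothetical stand-in for biom.Table: holds a list of sample IDs."""

    def __init__(self, ids):
        self.ids = list(ids)

    def concat(self, others):
        # Mirror the diff: a single instance is wrapped in a list.
        if isinstance(others, self.__class__):
            others = [others]
        merged = list(self.ids)
        for other in others:
            overlap = set(merged) & set(other.ids)
            if overlap:
                # biom raises DisjointIDError here; ValueError is a stand-in.
                raise ValueError("IDs are not disjoint: %s" % sorted(overlap))
            merged.extend(other.ids)
        return Block(merged)


def concat(tables, *args, **kwargs):
    """Module-level wrapper: delegate to the first table's concat method."""
    if isinstance(tables, Block):
        # Assumption of this sketch: a lone table is a sequence of one.
        tables = [tables]
    tables = list(tables)
    return tables[0].concat(tables[1:], *args, **kwargs)


a = Block(['S1', 'S2'])
b = Block(['S3'])
print(concat([a, b]).ids)
```

Delegating to the first element keeps all concatenation logic in one method;
the wrapper only decides which object plays the role of `self`.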


=====================================
biom/table.py
=====================================
@@ -179,7 +179,7 @@ from copy import deepcopy
 from datetime import datetime
 from json import dumps
 from functools import reduce, partial
-from operator import itemgetter
+from operator import itemgetter, or_
 from future.builtins import zip
 from future.utils import viewitems
 from collections import defaultdict, Hashable, Iterable
@@ -187,7 +187,7 @@ from numpy import ndarray, asarray, zeros, newaxis
 from scipy.sparse import (coo_matrix, csc_matrix, csr_matrix, isspmatrix,
                           vstack, hstack)
 import pandas as pd
-
+import re
 import six
 from future.utils import string_types as _future_string_types
 from biom.exception import (TableException, UnknownAxisError, UnknownIDError,
@@ -1547,7 +1547,7 @@ class Table(object):
 
     def delimited_self(self, delim=u'\t', header_key=None, header_value=None,
                        metadata_formatter=str,
-                       observation_column_name=u'#OTU ID'):
+                       observation_column_name=u'#OTU ID', direct_io=None):
         """Return self as a string in a delimited form
 
         Default str output for the Table is just row/col ids and table data
@@ -1599,20 +1599,35 @@ class Table(object):
         else:
             output = ['# Constructed from biom file',
                       '%s%s%s' % (observation_column_name, delim, samp_ids)]
+
+        if direct_io is not None:
+            direct_io.writelines([i+"\n" for i in output])
+
         obs_metadata = self.metadata(axis='observation')
-        for obs_id, obs_values in zip(self.ids(axis='observation'),
+        iterable = self.ids(axis='observation')
+        end_line = '' if direct_io is None else '\n'
+
+        for obs_id, obs_values in zip(iterable,
                                       self._iter_obs()):
             str_obs_vals = delim.join(map(str, self._to_dense(obs_values)))
-
             obs_id = to_utf8(obs_id)
             if header_key and obs_metadata is not None:
                 md = obs_metadata[self._obs_index[obs_id]]
                 md_out = metadata_formatter(md.get(header_key, None))
-                output.append(
-                    u'%s%s%s\t%s' %
-                    (obs_id, delim, str_obs_vals, md_out))
+                output_row = u'%s%s%s\t%s%s' % \
+                    (obs_id, delim, str_obs_vals, md_out, end_line)
+
+                if direct_io is None:
+                    output.append(output_row)
+                else:
+                    direct_io.write(output_row)
             else:
-                output.append(u'%s%s%s' % (obs_id, delim, str_obs_vals))
+                output_row = u'%s%s%s%s' % \
+                            (obs_id, delim, str_obs_vals, end_line)
+                if direct_io is None:
+                    output.append(output_row)
+                else:
+                    direct_io.write((output_row))
 
         return '\n'.join(output)
 
@@ -3201,7 +3216,7 @@ class Table(object):
             axes = [axis]
 
         for ax in axes:
-            table.filter(lambda v, i, md: (v > 0).sum(), axis=ax)
+            table.filter(table.ids(axis=ax)[table.sum(axis=ax) > 0], axis=ax)
 
         return table
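
The replacement line above selects IDs from precomputed axis sums instead of
calling a Python lambda once per vector. A rough sketch of that idea with
plain lists follows; `keep_nonempty` is a hypothetical helper, not part of
biom. The old test counted positive entries and the new one checks the sum,
which coincide for non-negative counts.

```python
def keep_nonempty(ids, rows):
    """Return the IDs whose row sum is positive (hypothetical helper)."""
    sums = [sum(row) for row in rows]   # one pass to compute every sum
    return [i for i, s in zip(ids, sums) if s > 0]


ids = ['O1', 'O2', 'O3']
rows = [[0, 0, 0], [1, 0, 2], [0, 3, 0]]
print(keep_nonempty(ids, rows))
```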
 
@@ -3288,7 +3303,7 @@ class Table(object):
 
         Parameters
         ----------
-        others : iterable of biom.Table
+        others : iterable of biom.Table, or a single biom.Table instance
             Tables to concatenate
         axis : {'sample', 'observation'}, optional
             The axis to concatenate on. i.e., if axis is 'sample', then tables
@@ -3332,6 +3347,9 @@ class Table(object):
         O5	0.0	0.0	0.0	0.0	0.0	0.0	15.0	16.0	17.0
 
         """
+        if isinstance(others, self.__class__):
+            others = [others, ]
+
         # we grow along the opposite axis
         invaxis = self._invert_axis(axis)
         if axis == 'sample':
@@ -3442,6 +3460,67 @@ class Table(object):
 
         return concat
 
+    def _fast_merge(self, others):
+        """For simple merge operations it is faster to aggregate using pandas
+
+        Parameters
+        ----------
+        others : Table, or Iterable of Table
+            If a Table, then merge with that table. If an iterable, then merge
+            all of the tables
+        """
+        tables = [self] + others
+
+        # gather all identifiers across tables
+        all_features = reduce(or_, [set(t.ids(axis='observation'))
+                                    for t in tables])
+        all_samples = reduce(or_, [set(t.ids()) for t in tables])
+
+        # generate unique integer ids for the identifiers, and let's order
+        # it to be polite
+        feature_map = {i: idx for idx, i in enumerate(sorted(all_features))}
+        sample_map = {i: idx for idx, i in enumerate(sorted(all_samples))}
+
+        # produce a new stable order
+        get1 = lambda x: x[1]  # noqa
+        feature_order = [k for k, v in sorted(feature_map.items(), key=get1)]
+        sample_order = [k for k, v in sorted(sample_map.items(), key=get1)]
+
+        mi = []
+        values = []
+        for table in tables:
+            # these data are effectively [((row_index, col_index), value), ]
+            data_as_dok = table.matrix_data.todok()
+
+            # construct a map of the feature integer index to what it is in
+            # the full table
+            feat_ids = table.ids(axis='observation')
+            samp_ids = table.ids()
+            table_features = {idx: feature_map[i]
+                              for idx, i in enumerate(feat_ids)}
+            table_samples = {idx: sample_map[i]
+                             for idx, i in enumerate(samp_ids)}
+
+            for (f, s), v in data_as_dok.items():
+                # collect the indices and values, adjusting the indices as we
+                # go
+                mi.append((table_features[f], table_samples[s]))
+                values.append(v)
+
+        # construct a multiindex of the indices where the outer index is the
+        # feature and the inner index is the sample
+        mi = pd.MultiIndex.from_tuples(mi)
+        grouped = pd.Series(values, index=mi)
+
+        # aggregate the values where the outer and inner values in the
+        # multiindex are the same
+        collapsed_rcv = grouped.groupby(level=[0, 1]).sum()
+
+        # convert into a representation understood by the Table constructor
+        list_list = [[r, c, v] for (r, c), v in collapsed_rcv.items()]
+
+        return self.__class__(list_list, feature_order, sample_order)
+
     def merge(self, other, sample='union', observation='union',
               sample_metadata_f=prefer_self,
               observation_metadata_f=prefer_self):
@@ -3458,8 +3537,9 @@ class Table(object):
 
         Parameters
         ----------
-        other : biom.Table
-            The other table to merge with this one
+        other : biom.Table or Iterable of Table
+            The other table to merge with this one. If an iterable, the tables
+            are expected to not have metadata.
         sample : {'union', 'intersection'}, optional
         observation : {'union', 'intersection'}, optional
         sample_metadata_f : function, optional
@@ -3476,6 +3556,8 @@ class Table(object):
 
         Notes
         -----
+        - If ``sample_metadata_f`` and ``observation_metadata_f`` are None,
+            then a fast merge is applied.
         - There is an implicit type conversion to ``float``.
         - The return type is always that of ``self``
 
@@ -3505,6 +3587,19 @@ class Table(object):
         O3	10.0	10.0
 
         """
+        s_md = self.metadata()
+        o_md = self.metadata(axis='observation')
+        no_md = (s_md is None) and (o_md is None)
+        ignore_md = (sample_metadata_f is None) and \
+            (observation_metadata_f is None)
+
+        if no_md or ignore_md:
+            if sample == 'union' and observation == 'union':
+                if isinstance(other, (list, set, tuple)):
+                    return self._fast_merge(other)
+                else:
+                    return self._fast_merge([other, ])
+
         # determine the sample order in the resulting table
         if sample == 'union':
             new_samp_order = self._union_id_order(self.ids(), other.ids())
@@ -3972,7 +4067,7 @@ html
                                      for i in keep])
             # Create the new indptr
             indptr_subset = np.array([end - start
-                                     for start, end in indptr_indices])
+                                      for start, end in indptr_indices])
             indptr = np.empty(len(keep) + 1, dtype=np.int32)
             indptr[0] = 0
             indptr[1:] = indptr_subset.cumsum()
@@ -4045,13 +4140,62 @@ html
             mat = self.matrix_data.toarray()
             constructor = pd.DataFrame
         else:
-            mat = self.matrix_data
-            constructor = partial(pd.SparseDataFrame,
-                                  default_fill_value=0,
-                                  copy=True)
+            mat = self.matrix_data.copy()
+            constructor = partial(pd.DataFrame.sparse.from_spmatrix)
 
         return constructor(mat, index=index, columns=columns)
 
+    def to_anndata(self, dense=False, dtype="float32", transpose=True):
+        """Convert Table to AnnData format
+
+        Parameters
+        ----------
+        dense : bool, optional
+            If True, set adata.X as np.ndarray instead of sparse matrix.
+        dtype : str, optional
+            dtype used for storage in anndata object.
+        transpose : bool, optional
+            If True, transpose the anndata so that observations are columns
+
+        Returns
+        -------
+        anndata.AnnData
+            AnnData with matrix data and associated observation and
+            sample metadata.
+
+        Notes
+        -----
+        Nested metadata are not included.
+
+        Examples
+        --------
+        >>> from biom import example_table
+        >>> adata = example_table.to_anndata()
+        >>> adata
+        AnnData object with n_obs × n_vars = 3 × 2
+            obs: 'environment'
+            var: 'taxonomy_0', 'taxonomy_1'
+        """
+        try:
+            import anndata
+        except ImportError:
+            raise ImportError(
+                "Please install anndata package -- `pip install anndata`"
+            )
+        mat = self.matrix_data
+
+        if dense:
+            mat = mat.toarray()
+
+        var = self.metadata_to_dataframe("sample")
+        obs = self.metadata_to_dataframe("observation")
+
+        adata = anndata.AnnData(mat, obs=obs, var=var, dtype=dtype)
+        # Convention for scRNA-seq analysis in Python
+        adata = adata.transpose()
+
+        return adata
+
     def metadata_to_dataframe(self, axis):
         """Convert axis metadata to a Pandas DataFrame
 
@@ -4596,6 +4740,103 @@ html
                                       matrix_element_type, shape,
                                       u''.join(data), rows, columns])
 
+    @staticmethod
+    def from_adjacency(lines):
+        """Parse an adjacency format into BIOM
+
+        Parameters
+        ----------
+        lines : list, str, or file-like object
+            The tab delimited data to parse
+
+        Returns
+        -------
+        biom.Table
+            A BIOM ``Table`` object
+
+        Notes
+        -----
+        The input is expected to be of the form: observation, sample, value. A
+        header is not required, but if present, it must be of the form:
+
+        #OTU ID<tab>SampleID<tab>value
+
+        Raises
+        ------
+        ValueError
+            If the input is not an iterable or file-like object.
+        ValueError
+            If the data is incorrectly formatted.
+
+        Examples
+        --------
+        Parse tab separated adjacency data into a table:
+
+        >>> from biom.table import Table
+        >>> from io import StringIO
+        >>> data = 'a\\tb\\t1\\na\\tc\\t2\\nd\\tc\\t3'
+        >>> data_fh = StringIO(data)
+        >>> test_table = Table.from_adjacency(data_fh)
+        """
+        if not isinstance(lines, (list, tuple)):
+            if hasattr(lines, 'readlines'):
+                lines = lines.readlines()
+            elif hasattr(lines, 'splitlines'):
+                lines = lines.splitlines()
+            else:
+                raise ValueError("Not sure how to handle this input")
+
+        def is_num(item):
+            # from https://stackoverflow.com/a/23059703
+            numeric = re.compile(r'(?=.)([+-]?([0-9]*)(\.([0-9]+))?)([eE][+-]?\d+)?')  # noqa
+            match = numeric.match(item)
+            start, stop = match.span()
+            if (stop - start) == len(item):
+                return True
+            else:
+                return False
+
+        # sanity check and determine if we have a header or not
+        lh = lines[0].strip().split('\t')
+        if len(lh) != 3:
+            raise ValueError("Does not appear to be an adjacency format")
+        elif lh == ['#OTU ID', 'SampleID', 'value']:
+            include_line_zero = False
+        elif is_num(lh[2]):
+            # allow anything for columns 1 and 2, but test that column 3 is
+            # numeric
+            include_line_zero = True
+        else:
+            raise ValueError("Does not appear to be an adjacency format")
+
+        if not include_line_zero:
+            lines = lines[1:]
+
+        # extract the entities
+        observations = []
+        samples = []
+        values = []
+        for line in lines:
+            parts = line.split('\t')
+            assert len(parts) == 3
+            observations.append(parts[0])
+            samples.append(parts[1])
+            values.append(float(parts[2]))
+
+        # determine a stable order and index positioning for the identifiers
+        obs_order = sorted(set(observations))
+        samp_order = sorted(set(samples))
+        obs_index = {o: i for i, o in enumerate(obs_order)}
+        samp_index = {s: i for i, s in enumerate(samp_order)}
+
+        # fill the matrix
+        row = np.array([obs_index[obs] for obs in observations], dtype=int)
+        col = np.array([samp_index[samp] for samp in samples], dtype=int)
+        data = np.asarray(values)
+        mat = coo_matrix((data, (row, col)))
+
+        return Table(mat, obs_order, samp_order)
+
     @staticmethod
     def from_tsv(lines, obs_mapping, sample_mapping,
                  process_func, **kwargs):
@@ -4629,7 +4870,7 @@ html
         >>> test_table = Table.from_tsv(tsv_fh, None, None, func)
         """
         (sample_ids, obs_ids, data, t_md,
-            t_md_name) = Table._extract_data_from_tsv(lines, **kwargs)
+         t_md_name) = Table._extract_data_from_tsv(lines, **kwargs)
 
         # if we have it, keep it
         if t_md is None:
@@ -4797,7 +5038,9 @@ html
         return samp_ids, obs_ids, data, metadata, md_name
 
     def to_tsv(self, header_key=None, header_value=None,
-               metadata_formatter=str, observation_column_name='#OTU ID'):
+               metadata_formatter=str,
+               observation_column_name='#OTU ID',
+               direct_io=None):
         """Return self as a string in tab delimited form
 
         Default ``str`` output for the ``Table`` is just row/col ids and table
@@ -4815,6 +5058,10 @@ html
         observation_column_name : str, optional
             Defaults to "#OTU ID". The name of the first column in the output
             table, corresponding to the observation IDs.
+        direct_io : file or file-like object, optional
+            Defaults to ``None``. Must implement a ``write`` function. If
+            `direct_io` is not ``None``, the final output is written directly
+            to `direct_io` during processing.
 
         Returns
         -------
@@ -4838,10 +5085,13 @@ html
         #OTU ID	S1	S2	S3
         O1	0.0	0.0	1.0
         O2	1.0	3.0	42.0
+        >>> with open("result.tsv", "w") as f:
+        ...     table.to_tsv(direct_io=f)
         """
         return self.delimited_self(u'\t', header_key, header_value,
                                    metadata_formatter,
-                                   observation_column_name)
+                                   observation_column_name,
+                                   direct_io=direct_io)
 
 
 def coo_arrays_to_sparse(data, dtype=np.float64, shape=None):
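
The `direct_io` option added to `delimited_self` and `to_tsv` in this file
writes each row as soon as it is formatted rather than accumulating every row
in a list. A minimal sketch of that pattern, with a hypothetical `write_rows`
function standing in for the biom method; returning `None` in streaming mode
is an assumption of this sketch.

```python
import io


def write_rows(header, rows, delim='\t', direct_io=None):
    """Format rows as delimited text (hypothetical stand-in).

    With direct_io=None the text is built in memory and returned; otherwise
    each line is written to direct_io as soon as it is formatted.
    """
    lines = [delim.join(header)]
    if direct_io is not None:
        direct_io.write(lines[0] + '\n')

    for row_id, values in rows:
        line = delim.join([row_id] + [str(v) for v in values])
        if direct_io is None:
            lines.append(line)
        else:
            direct_io.write(line + '\n')

    # Assumption of this sketch: nothing is returned in streaming mode.
    return '\n'.join(lines) if direct_io is None else None


header = ['#OTU ID', 'S1', 'S2']
rows = [('O1', [0.0, 1.0]), ('O2', [3.0, 4.0])]

buf = io.StringIO()
write_rows(header, rows, direct_io=buf)
in_memory = write_rows(header, rows)
```

Streaming keeps peak memory proportional to one row instead of the whole
table, which is the motivation for passing a file handle.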


=====================================
biom/tests/test_parse.py
=====================================
@@ -229,6 +229,51 @@ class ParseTests(TestCase):
         self.assertEqual(tab.metadata(), None)
         self.assertEqual(tab.metadata(axis='observation'), None)
 
+    def test_parse_adjacency_weird_input(self):
+        with self.assertRaisesRegex(ValueError, "Not sure"):
+            Table.from_adjacency({'foo', 'bar'})
+
+    def test_parse_adjacency_bad_input(self):
+        with self.assertRaisesRegex(ValueError, "Does not appear"):
+            Table.from_adjacency(['a\tb\tc\n', 'd\te\tf\n'])
+
+        with self.assertRaisesRegex(ValueError, "Does not appear"):
+            Table.from_adjacency(['a\tb\n', 'd\te\n'])
+
+        with self.assertRaises(AssertionError):
+            Table.from_adjacency(['a\tb\t1\n', 'd\te\n'])
+
+    def test_parse_adjacency_table_header(self):
+        lines = ['#OTU ID\tSampleID\tvalue\n',
+                 'O1\tS1\t10\n',
+                 'O4\tS2\t1\n',
+                 'O3\tS3\t2\n',
+                 'O4\tS1\t5\n',
+                 'O2\tS2\t3\n']
+        exp = Table(np.array([[10, 0, 0],
+                              [0, 3, 0],
+                              [0, 0, 2],
+                              [5, 1, 0]]),
+                    ['O1', 'O2', 'O3', 'O4'],
+                    ['S1', 'S2', 'S3'])
+        obs = Table.from_adjacency(''.join(lines))
+        self.assertEqual(obs, exp)
+
+    def test_parse_adjacency_table_no_header(self):
+        lines = ['O1\tS1\t10\n',
+                 'O4\tS2\t1\n',
+                 'O3\tS3\t2\n',
+                 'O4\tS1\t5\n',
+                 'O2\tS2\t3\n']
+        exp = Table(np.array([[10, 0, 0],
+                              [0, 3, 0],
+                              [0, 0, 2],
+                              [5, 1, 0]]),
+                    ['O1', 'O2', 'O3', 'O4'],
+                    ['S1', 'S2', 'S3'])
+        obs = Table.from_adjacency(''.join(lines))
+        self.assertEqual(obs, exp)
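
The tests above exercise `Table.from_adjacency`. The same parsing steps can be
sketched in pure Python; `parse_adjacency` is a hypothetical function that
returns sorted IDs and a dense nested list rather than a `biom.Table`. Two
assumptions differ from the committed code: the numeric check uses
`re.fullmatch`, and malformed rows raise `ValueError` instead of tripping an
`assert`.

```python
import re

_NUMERIC = re.compile(r'[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?')
HEADER = ['#OTU ID', 'SampleID', 'value']


def parse_adjacency(text):
    """Parse 'observation<TAB>sample<TAB>value' lines into a dense matrix."""
    rows = [line.split('\t') for line in text.splitlines() if line.strip()]
    if not rows or len(rows[0]) != 3:
        raise ValueError("Does not appear to be an adjacency format")

    if rows[0] == HEADER:
        rows = rows[1:]                  # drop the optional header
    elif not _NUMERIC.fullmatch(rows[0][2].strip()):
        raise ValueError("Does not appear to be an adjacency format")

    for parts in rows:
        if len(parts) != 3:
            raise ValueError("Expected three tab-separated fields")

    # Sorted IDs give a stable row and column order.
    obs_order = sorted({r[0] for r in rows})
    samp_order = sorted({r[1] for r in rows})
    obs_index = {o: i for i, o in enumerate(obs_order)}
    samp_index = {s: i for i, s in enumerate(samp_order)}

    matrix = [[0.0] * len(samp_order) for _ in obs_order]
    for obs, samp, value in rows:
        # Repeated (observation, sample) pairs are summed in this sketch.
        matrix[obs_index[obs]][samp_index[samp]] += float(value)
    return obs_order, samp_order, matrix


text = 'O1\tS1\t10\nO4\tS2\t1\nO3\tS3\t2\nO4\tS1\t5\nO2\tS2\t3\n'
obs_ids, samp_ids, matrix = parse_adjacency(text)
```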
+
     @npt.dec.skipif(HAVE_H5PY is False, msg='H5PY is not installed')
     def test_parse_biom_table_hdf5(self):
         """Make sure we can parse a HDF5 table through the same loader"""


=====================================
biom/tests/test_table.py
=====================================
@@ -19,8 +19,9 @@ from scipy.sparse import lil_matrix, csr_matrix, csc_matrix
 import scipy.sparse
 import pandas.util.testing as pdt
 import pandas as pd
+import pytest
 
-from biom import example_table, load_table
+from biom import example_table, load_table, concat
 from biom.exception import (UnknownAxisError, UnknownIDError, TableException,
                             DisjointIDError)
 from biom.util import unzip, HAVE_H5PY, H5PY_VLEN_STR
@@ -37,6 +38,13 @@ np.random.seed(1234)
 if HAVE_H5PY:
     import h5py
 
+try:
+    import anndata
+    anndata.__version__
+    HAVE_ANNDATA = True
+except ImportError:
+    HAVE_ANNDATA = False
+
 __author__ = "Daniel McDonald"
 __copyright__ = "Copyright 2011-2017, The BIOM Format Development Team"
 __credits__ = ["Daniel McDonald", "Jai Ram Rideout", "Justin Kuczynski",
@@ -148,11 +156,37 @@ class SupportTests(TestCase):
         obs = table1.concat([table2, ], axis='sample')
         self.assertEqual(obs, exp)
 
+    def test_concat_single_table_nonlist(self):
+        table2 = example_table.copy()
+        table2.update_ids({'S1': 'S4', 'S2': 'S5', 'S3': 'S6'})
+
+        exp = Table(np.array([[0, 1, 2, 0, 1, 2],
+                              [3, 4, 5, 3, 4, 5]]),
+                    ['O1', 'O2'],
+                    ['S1', 'S2', 'S3', 'S4', 'S5', 'S6'],
+                    example_table.metadata(axis='observation'),
+                    list(example_table.metadata()) * 2)
+        obs = example_table.concat(table2, axis='sample')
+        self.assertEqual(obs, exp)
+
     def test_concat_empty(self):
         exp = example_table.copy()
         obs = example_table.concat([])
         self.assertEqual(obs, exp)
 
+    def test_concat_wrapper(self):
+        table2 = example_table.copy()
+        table2.update_ids({'S1': 'S4', 'S2': 'S5', 'S3': 'S6'})
+
+        exp = Table(np.array([[0, 1, 2, 0, 1, 2],
+                              [3, 4, 5, 3, 4, 5]]),
+                    ['O1', 'O2'],
+                    ['S1', 'S2', 'S3', 'S4', 'S5', 'S6'],
+                    example_table.metadata(axis='observation'),
+                    list(example_table.metadata()) * 2)
+        obs = concat([example_table, table2], axis='sample')
+        self.assertEqual(obs, exp)
+
     def test_concat_samples(self):
         table2 = example_table.copy()
         table2.update_ids({'S1': 'S4', 'S2': 'S5', 'S3': 'S6'})
@@ -1473,18 +1507,25 @@ class TableTests(TestCase):
              'tree': ('newick', '((4:0.1,5:0.1):0.2,(6:0.1,7:0.1):0.2):0.3;')})
 
     def test_to_dataframe(self):
-        exp = pd.SparseDataFrame(np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]),
-                                 index=['O1', 'O2'],
-                                 columns=['S1', 'S2', 'S3'],
-                                 default_fill_value=0.0)
+        mat = csr_matrix(np.array([[0.0, 1.0, 2.0],
+                                   [3.0, 4.0, 5.0]]))
+        exp = pd.DataFrame.sparse.from_spmatrix(mat,
+                                                index=['O1', 'O2'],
+                                                columns=['S1', 'S2', 'S3'])
         obs = example_table.to_dataframe()
-        pdt.assert_frame_equal(obs, exp)
+
+        # assert frame equal between sparse and dense frames wasn't working
+        # as expected
+        npt.assert_equal(obs.values, exp.values)
+        self.assertTrue(all(obs.index == exp.index))
+        self.assertTrue(all(obs.columns == exp.columns))
 
     def test_to_dataframe_is_sparse(self):
         df = example_table.to_dataframe()
         density = (float(example_table.matrix_data.getnnz()) /
                    np.prod(example_table.shape))
-        assert np.allclose(df.density, density)
+        df_density = (df > 0).sum().sum() / np.prod(df.shape)
+        assert np.allclose(df_density, density)
 
     def test_to_dataframe_dense(self):
         exp = pd.DataFrame(np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]),
@@ -1493,6 +1534,28 @@ class TableTests(TestCase):
         obs = example_table.to_dataframe(dense=True)
         pdt.assert_frame_equal(obs, exp)
 
+    @pytest.mark.skipif(not HAVE_ANNDATA, reason="anndata not installed")
+    def test_to_anndata_dense(self):
+        exp = example_table.to_dataframe(dense=True)
+        adata = example_table.to_anndata(dense=True, dtype='float64')
+        pdt.assert_frame_equal(adata.transpose().to_df(), exp)
+
+    @pytest.mark.skipif(not HAVE_ANNDATA, reason="anndata not installed")
+    def test_to_anndata_sparse(self):
+        adata = example_table.to_anndata(dense=False)
+        mat = example_table.matrix_data.toarray()
+        np.testing.assert_array_equal(adata.transpose().X.toarray(), mat)
+
+    @pytest.mark.skipif(not HAVE_ANNDATA, reason="anndata not installed")
+    def test_to_anndata_metadata(self):
+        adata = example_table.to_anndata()
+
+        obs_samp = example_table.metadata_to_dataframe(axis='sample')
+        obs_obs = example_table.metadata_to_dataframe(axis='observation')
+
+        pdt.assert_frame_equal(adata.obs, obs_samp)
+        pdt.assert_frame_equal(adata.var, obs_obs)
+
     def test_metadata_to_dataframe(self):
         exp_samp = pd.DataFrame(['A', 'B', 'A'], index=['S1', 'S2', 'S3'],
                                 columns=['environment'])
@@ -2262,6 +2325,30 @@ class SparseTableTests(TestCase):
         npt.assert_equal(obs_obs, exp_obs)
         npt.assert_equal(obs_whole, exp_whole)
 
+    def test_fast_merge(self):
+        data = {(0, 0): 10, (0, 1): 12, (1, 0): 14, (1, 1): 16}
+        exp = Table(data, ['1', '2'], ['a', 'b'])
+        obs = self.st1._fast_merge([self.st1])
+        self.assertEqual(obs, exp)
+
+    def test_fast_merge_multiple(self):
+        data = {(0, 0): 20, (0, 1): 24, (1, 0): 28, (1, 1): 32}
+        exp = Table(data, ['1', '2'], ['a', 'b'])
+        obs = self.st1._fast_merge([self.st1, self.st1, self.st1])
+        self.assertEqual(obs, exp)
+
+    def test_fast_merge_nonoverlapping(self):
+        t2 = self.st1.copy()
+        t2.update_ids({'a': 'd'}, inplace=True, strict=False)
+        t2.update_ids({'2': '3'}, axis='observation', inplace=True,
+                      strict=False)
+        exp = Table(np.array([[5, 12, 5],
+                              [7, 8, 0],
+                              [0, 8, 7]]), ['1', '2', '3'],
+                    ['a', 'b', 'd'])
+        obs = t2._fast_merge([self.st1])
+        self.assertEqual(obs, exp)
+
     def test_merge(self):
         """Merge two tables"""
         u = 'union'
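
The `_fast_merge` tests above check that values for shared IDs are summed and
that both axes are unioned. A minimal sketch of that aggregation follows. It
swaps the pandas `MultiIndex` groupby used in the commit for a plain
dictionary keyed by (observation ID, sample ID), and represents each table as
a hypothetical `(obs_ids, samp_ids, dok)` triple instead of a `biom.Table`.

```python
from collections import defaultdict


def fast_merge(tables):
    """Union-merge tables given as (obs_ids, samp_ids, dok) triples.

    dok maps (row_index, col_index) to a value, like a DOK sparse matrix.
    """
    totals = defaultdict(float)
    for obs_ids, samp_ids, dok in tables:
        for (r, c), value in dok.items():
            # Key by identifier so the same ID aligns across tables.
            totals[(obs_ids[r], samp_ids[c])] += value

    # Union of identifiers over all tables, sorted for a stable order.
    obs_order = sorted(set().union(*(t[0] for t in tables)))
    samp_order = sorted(set().union(*(t[1] for t in tables)))
    obs_index = {o: i for i, o in enumerate(obs_order)}
    samp_index = {s: i for i, s in enumerate(samp_order)}

    merged = {(obs_index[o], samp_index[s]): v
              for (o, s), v in totals.items()}
    return obs_order, samp_order, merged


values = {(0, 0): 5, (0, 1): 6, (1, 0): 7, (1, 1): 8}
st1 = (['1', '2'], ['a', 'b'], values)
t2 = (['1', '3'], ['d', 'b'], values)
obs_order, samp_order, merged = fast_merge([t2, st1])
```

Because aggregation is keyed by identifier, the input tables need not share
an ordering, and cells absent from `merged` are implicit zeros.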


=====================================
biom/util.py
=====================================
@@ -45,7 +45,7 @@ __url__ = "http://biom-format.org"
 __maintainer__ = "Daniel McDonald"
 __email__ = "daniel.mcdonald at colorado.edu"
 __format_version__ = (2, 1)
-__version__ = "2.1.8"
+__version__ = "2.1.9"
 
 
 def generate_subsamples(table, n, axis='sample', by_id=False):
@@ -202,9 +202,9 @@ def prefer_self(x, y):
     return x if x is not None else y
 
 
-def index_list(l):
+def index_list(item):
     """Takes a list and returns {l[idx]:idx}"""
-    return dict([(id_, idx) for idx, id_ in enumerate(l)])
+    return dict([(id_, idx) for idx, id_ in enumerate(item)])
 
 
 def load_biom_config():


=====================================
doc/conf.py
=====================================
@@ -59,15 +59,15 @@ master_doc = 'index'
 
 # General information about the project.
 project = u'biom-format'
-copyright = u'2011-2018 The BIOM Format Development Team'
+copyright = u'2011-2020 The BIOM Format Development Team'
 
 # The version info for the project you're documenting, acts as replacement for
 # |version| and |release|, also used in various other places throughout the
 # built documents.
 #
 # The full version, including alpha/beta/rc tags.
-version = "2.1.8"
-release = "2.1.8"
+version = "2.1.9"
+release = "2.1.9"
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.


=====================================
doc/index.rst
=====================================
@@ -75,7 +75,7 @@ To enable Bash tab completion of ``biom`` commands, add the following line to ``
 Citing the BIOM project
 =======================
 
-You can cite the BIOM format as follows (`link <http://www.gigasciencejournal.com/content/1/1/7>`_):
+You can cite the BIOM format as follows (`link <https://academic.oup.com/gigascience/article/1/1/2047-217X-1-7/2656152>`_):
 
 | The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome.
 | Daniel McDonald, Jose C. Clemente, Justin Kuczynski, Jai Ram Rideout, Jesse Stombaugh, Doug Wendel, Andreas Wilke, Susan Huse, John Hufnagle, Folker Meyer, Rob Knight, and J. Gregory Caporaso.


=====================================
pyproject.toml
=====================================
@@ -0,0 +1,2 @@
+[build-system]
+requires = ["setuptools","wheel", "numpy", "cython"]


=====================================
setup.py
=====================================
@@ -14,18 +14,8 @@ import sys
 from setuptools import setup, find_packages
 from setuptools.extension import Extension
 from setuptools.command.test import test as TestCommand
-
-try:
-    import numpy as np
-except ImportError:
-    raise ImportError("numpy must be installed prior to installing biom")
-
-
-try:
-    from Cython.Build import cythonize
-except ImportError:
-    raise ImportError("cython must be installed prior to installing biom")
-
+import numpy as np
+from Cython.Build import cythonize
 
 # Hack to prevent stupid "TypeError: 'NoneType' object is not callable" error
 # in multiprocessing/util.py _exit_function when running `python
@@ -43,7 +33,7 @@ __copyright__ = "Copyright 2011-2017, The BIOM Format Development Team"
 __credits__ = ["Greg Caporaso", "Daniel McDonald", "Jose Clemente",
                "Jai Ram Rideout", "Jorge Cañardo Alastuey", "Michael Hall"]
 __license__ = "BSD"
-__version__ = "2.1.8"
+__version__ = "2.1.9"
 __maintainer__ = "Daniel McDonald"
 __email__ = "mcdonadt at colorado.edu"
 
@@ -75,7 +65,6 @@ class PyTest(TestCommand):
 
         # import here, cause outside the eggs aren't loaded
         import pytest
-
         errno = pytest.main(shlex.split(self.pytest_args))
         sys.exit(errno)
 
@@ -123,7 +112,8 @@ extensions = cythonize(extensions)
 
 install_requires = ["click", "numpy >= 1.9.2", "future >= 0.16.0",
                     "scipy >= 1.3.1", 'pandas >= 0.20.0',
-                    "six >= 1.10.0", "cython >= 0.29"]
+                    "six >= 1.10.0", "cython >= 0.29", "h5py",
+                    "cython"]
 
 if sys.version_info[0] < 3:
     raise SystemExit("Python 2.7 is no longer supported")
@@ -140,7 +130,7 @@ setup(name='biom-format',
       maintainer_email=__email__,
       url='http://www.biom-format.org',
       packages=find_packages(),
-      tests_require=['pytest',
+      tests_require=['pytest < 5.3.4',
                      'pytest-cov',
                      'flake8',
                      'nose'],
@@ -148,7 +138,8 @@ setup(name='biom-format',
       ext_modules=extensions,
       include_dirs=[np.get_include()],
       install_requires=install_requires,
-      extras_require={'hdf5': ["h5py >= 2.2.0"]
+      extras_require={'hdf5': ["h5py >= 2.2.0"],
+                      'anndata': ["anndata"],
                       },
       classifiers=classifiers,
       cmdclass={"pytest": PyTest},



View it on GitLab: https://salsa.debian.org/med-team/python-biom-format/-/commit/29fca8e932fbf74620831f1cde788bb1dd0ae403


