[med-svn] [Git][med-team/python-biom-format][upstream] New upstream version 2.1.9
Andreas Tille
gitlab at salsa.debian.org
Thu Nov 5 18:20:54 GMT 2020
Andreas Tille pushed to branch upstream at Debian Med / python-biom-format
Commits:
29fca8e9 by Andreas Tille at 2020-11-05T19:06:55+01:00
New upstream version 2.1.9
- - - - -
12 changed files:
- .travis.yml
- ChangeLog.md
- MANIFEST.in
- biom/__init__.py
- biom/table.py
- biom/tests/test_parse.py
- biom/tests/test_table.py
- biom/util.py
- doc/conf.py
- doc/index.rst
- + pyproject.toml
- setup.py
Changes:
=====================================
.travis.yml
=====================================
@@ -1,9 +1,9 @@
# Modified from https://github.com/biocore/scikit-bio
language: python
env:
- - PYTHON_VERSION=3.6 WITH_DOCTEST=True
- - PYTHON_VERSION=3.7 WITH_DOCTEST=True
- - PYTHON_VERSION=3.8 WITH_DOCTEST=True
+ - PYTHON_VERSION=3.6 WITH_DOCTEST=True
+ - PYTHON_VERSION=3.7 WITH_DOCTEST=True
+ - PYTHON_VERSION=3.8 WITH_DOCTEST=True
before_install:
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
- chmod +x miniconda.sh
@@ -11,15 +11,15 @@ before_install:
- export PATH=/home/travis/miniconda3/bin:$PATH
install:
- conda create --yes -n env_name python=$PYTHON_VERSION pip click numpy "scipy>=1.3.1" pep8 flake8 coverage future six "pandas>=0.20.0" nose h5py>=2.2.0 cython
- - rm biom/*.c
- source activate env_name
- - if [ ${PYTHON_VERSION} = "3.6" ]; then pip install sphinx==1.2.2; fi
+ - if [ ${PYTHON_VERSION} = "3.7" ]; then pip install sphinx==1.2.2 "docutils<0.14"; fi
- pip install coveralls
+ - pip install anndata
- pip install -e . --no-deps
script:
- - make test
+ - make test
- biom show-install-info
- - if [ ${PYTHON_VERSION} = "3.6" ]; then make -C doc html; fi
+ - if [ ${PYTHON_VERSION} = "3.7" ]; then make -C doc html; fi
# we can only validate the tables if we have H5PY
- for table in examples/*hdf5.biom; do echo ${table}; biom validate-table -i ${table}; done
# validate JSON formatted tables
=====================================
ChangeLog.md
=====================================
@@ -1,10 +1,32 @@
BIOM-Format ChangeLog
=====================
+biom 2.1.9
+----------
+
+New features and support for Pandas >= 1.0, released on 5 November 2020.
+
+Important:
+
+* Cython and numpy are no longer required to be present before building, see [PR #840](https://github.com/biocore/biom-format/pull/840)
+
+New Features:
+
+* Added support for the AnnData format, see [PR #845](https://github.com/biocore/biom-format/pull/845)
+* Performance boost to `Table.remove_empty`. For large tables this cuts the running time from 20 seconds to ~1.1 seconds, see [PR #847](https://github.com/biocore/biom-format/pull/847)
+* A much faster way to merge tables (without metadata) has been added. For large tables, merging took a few minutes rather than a few hours. This method is implicitly invoked when calling `Table.merge` if both axes are unioned and the tables lack metadata. `Table.concat` is still much faster, but assumes one axis is disjoint. See [PR #848](https://github.com/biocore/biom-format/pull/848).
+
+* Simplified interaction with the concatenation method: an individual table can now be passed in, and a general `biom.concat(tables)` wrapper is supported. See [PR #851](https://github.com/biocore/biom-format/pull/851).
+* Added support for parsing adjacency table structures, see [issue #823](https://github.com/biocore/biom-format/issues/823).
+
+Bug fixes:
+
+* Support for pandas >= 1.0, see the comment and commits [here](https://github.com/biocore/biom-format/issues/837#issuecomment-721241751)
+
biom 2.1.8
----------
-New features and bug fixes, released on 6 January 2020.
+New features and bug fixes, released on 28 January 2020.
Important:
@@ -18,6 +40,7 @@ New Features:
* The detailed report is no longer part of the table validator. See [issue #378](https://github.com/biocore/biom-format/issues/378).
* `load_table` now accepts open file handles. See [issue #481](https://github.com/biocore/biom-format/issues/481).
* `biom export-metadata` has been added to export metadata as TSV. See [issue #820](https://github.com/biocore/biom-format/issues/820).
+* `Table.to_tsv` has been modified to allow for `direct_io`. See [PR #836](https://github.com/biocore/biom-format/pull/836).
Bug fixes:
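
The `direct_io` entry in the ChangeLog above describes writing rows to a file-like object as they are produced instead of collecting them all in memory first. The sketch below illustrates that pattern only. `write_delimited` is a hypothetical stand-in, not part of biom, and it uses only the standard library. As in the `delimited_self` change further down in this diff, the returned string holds only the header when a sink is supplied.

```python
import io


def write_delimited(header, rows, delim="\t", direct_io=None):
    """Hypothetical helper: buffer rows, or stream them to direct_io."""
    output = [delim.join(header)]
    if direct_io is not None:
        # the header goes to the sink immediately
        direct_io.write(output[0] + "\n")

    for row in rows:
        line = delim.join(str(v) for v in row)
        if direct_io is None:
            output.append(line)           # buffered: keep every row in memory
        else:
            direct_io.write(line + "\n")  # streamed: write and forget

    return "\n".join(output)


header = ["#OTU ID", "S1", "S2"]
rows = [("O1", 0.0, 1.0), ("O2", 3.0, 4.0)]

buffered = write_delimited(header, rows)

sink = io.StringIO()
returned = write_delimited(header, rows, direct_io=sink)
streamed = sink.getvalue()
```

With a real file, `sink` would be the handle from `open("result.tsv", "w")`; memory use then stays roughly constant regardless of the number of rows.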
=====================================
MANIFEST.in
=====================================
@@ -14,5 +14,6 @@ prune docs/_build
global-exclude *.pyc
global-exclude *.pyo
global-exclude .git
+global-exclude *.c
global-exclude *.so
global-exclude .*.swp
=====================================
biom/__init__.py
=====================================
@@ -73,5 +73,26 @@ example_table = Table([[0, 1, 2], [3, 4, 5]], ['O1', 'O2'],
{'environment': 'A'}], input_is_dense=True)
+def concat(tables, *args, **kwargs):
+ """Wrapper for biom.Table.concat which requires a table instance
+
+ Parameters
+ ----------
+ tables : iterable of biom.Table, or a single biom.Table instance
+ Tables to concatenate
+
+ Raises
+ ------
+ DisjointIDError
+ If IDs over the axis are not disjoint.
+
+ Returns
+ -------
+ biom.Table
+ A table object reflecting the concatenation of the tables.
+ """
+ return tables[0].concat(tables[1:], *args, **kwargs)
+
+
__all__ = ['Table', 'example_table', 'parse_table', 'load_table',
'__format_version__', '__version__']
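
Note that the `concat` wrapper above indexes `tables[0]` and slices `tables[1:]`, so as written it needs a sequence such as a list or tuple. The following is a minimal sketch of one way a wrapper could also accept a single table or a generator. `MiniTable` is a hypothetical stand-in for `biom.Table` that tracks only sample IDs, and it raises `ValueError` where the real library documents `DisjointIDError`.

```python
class MiniTable:
    """Hypothetical stand-in for biom.Table; holds only sample IDs."""

    def __init__(self, ids):
        self.ids = list(ids)

    def concat(self, others):
        # mirror the 2.1.9 change: accept a single table or an iterable
        if isinstance(others, self.__class__):
            others = [others]
        merged = list(self.ids)
        for other in others:
            overlap = set(merged) & set(other.ids)
            if overlap:
                raise ValueError("IDs are not disjoint: %s" % sorted(overlap))
            merged.extend(other.ids)
        return MiniTable(merged)


def concat(tables):
    """Wrapper sketch that tolerates one table, a sequence, or a generator."""
    if isinstance(tables, MiniTable):
        tables = [tables]
    tables = list(tables)  # materialize generators so indexing works
    if not tables:
        raise ValueError("need at least one table")
    return tables[0].concat(tables[1:])


a = MiniTable(["S1", "S2", "S3"])
b = MiniTable(["S4", "S5", "S6"])

from_list = concat([a, b]).ids
from_single = concat(a).ids
from_gen = concat(t for t in (a, b)).ids
```

Normalizing the argument once at the top keeps the rest of the wrapper identical to the committed one-liner.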
=====================================
biom/table.py
=====================================
@@ -179,7 +179,7 @@ from copy import deepcopy
from datetime import datetime
from json import dumps
from functools import reduce, partial
-from operator import itemgetter
+from operator import itemgetter, or_
from future.builtins import zip
from future.utils import viewitems
from collections import defaultdict, Hashable, Iterable
@@ -187,7 +187,7 @@ from numpy import ndarray, asarray, zeros, newaxis
from scipy.sparse import (coo_matrix, csc_matrix, csr_matrix, isspmatrix,
vstack, hstack)
import pandas as pd
-
+import re
import six
from future.utils import string_types as _future_string_types
from biom.exception import (TableException, UnknownAxisError, UnknownIDError,
@@ -1547,7 +1547,7 @@ class Table(object):
def delimited_self(self, delim=u'\t', header_key=None, header_value=None,
metadata_formatter=str,
- observation_column_name=u'#OTU ID'):
+ observation_column_name=u'#OTU ID', direct_io=None):
"""Return self as a string in a delimited form
Default str output for the Table is just row/col ids and table data
@@ -1599,20 +1599,35 @@ class Table(object):
else:
output = ['# Constructed from biom file',
'%s%s%s' % (observation_column_name, delim, samp_ids)]
+
+ if direct_io is not None:
+ direct_io.writelines([i+"\n" for i in output])
+
obs_metadata = self.metadata(axis='observation')
- for obs_id, obs_values in zip(self.ids(axis='observation'),
+ iterable = self.ids(axis='observation')
+ end_line = '' if direct_io is None else '\n'
+
+ for obs_id, obs_values in zip(iterable,
self._iter_obs()):
str_obs_vals = delim.join(map(str, self._to_dense(obs_values)))
-
obs_id = to_utf8(obs_id)
if header_key and obs_metadata is not None:
md = obs_metadata[self._obs_index[obs_id]]
md_out = metadata_formatter(md.get(header_key, None))
- output.append(
- u'%s%s%s\t%s' %
- (obs_id, delim, str_obs_vals, md_out))
+ output_row = u'%s%s%s\t%s%s' % \
+ (obs_id, delim, str_obs_vals, md_out, end_line)
+
+ if direct_io is None:
+ output.append(output_row)
+ else:
+ direct_io.write(output_row)
else:
- output.append(u'%s%s%s' % (obs_id, delim, str_obs_vals))
+ output_row = u'%s%s%s%s' % \
+ (obs_id, delim, str_obs_vals, end_line)
+ if direct_io is None:
+ output.append(output_row)
+ else:
+ direct_io.write((output_row))
return '\n'.join(output)
@@ -3201,7 +3216,7 @@ class Table(object):
axes = [axis]
for ax in axes:
- table.filter(lambda v, i, md: (v > 0).sum(), axis=ax)
+ table.filter(table.ids(axis=ax)[table.sum(axis=ax) > 0], axis=ax)
return table
@@ -3288,7 +3303,7 @@ class Table(object):
Parameters
----------
- others : iterable of biom.Table
+ others : iterable of biom.Table, or a single biom.Table instance
Tables to concatenate
axis : {'sample', 'observation'}, optional
The axis to concatenate on. i.e., if axis is 'sample', then tables
@@ -3332,6 +3347,9 @@ class Table(object):
O5 0.0 0.0 0.0 0.0 0.0 0.0 15.0 16.0 17.0
"""
+ if isinstance(others, self.__class__):
+ others = [others, ]
+
# we grow along the opposite axis
invaxis = self._invert_axis(axis)
if axis == 'sample':
@@ -3442,6 +3460,67 @@ class Table(object):
return concat
+ def _fast_merge(self, others):
+ """For simple merge operations it is faster to aggregate using pandas
+
+ Parameters
+ ----------
+ others : Table, or Iterable of Table
+ If a Table, then merge with that table. If an iterable, then merge
+ all of the tables
+ """
+ # merge() may pass a list, set, or tuple; normalize before concatenating
+ tables = [self] + list(others)
+
+ # gather all identifiers across tables
+ all_features = reduce(or_, [set(t.ids(axis='observation'))
+ for t in tables])
+ all_samples = reduce(or_, [set(t.ids()) for t in tables])
+
+ # generate unique integer ids for the identifiers, and let's order
+ # it to be polite
+ feature_map = {i: idx for idx, i in enumerate(sorted(all_features))}
+ sample_map = {i: idx for idx, i in enumerate(sorted(all_samples))}
+
+ # produce a new stable order
+ get1 = lambda x: x[1] # noqa
+ feature_order = [k for k, v in sorted(feature_map.items(), key=get1)]
+ sample_order = [k for k, v in sorted(sample_map.items(), key=get1)]
+
+ mi = []
+ values = []
+ for table in tables:
+ # these data are effectively [((row_index, col_index), value), ]
+ data_as_dok = table.matrix_data.todok()
+
+ # construct a map of the feature integer index to what it is in
+ # the full table
+ feat_ids = table.ids(axis='observation')
+ samp_ids = table.ids()
+ table_features = {idx: feature_map[i]
+ for idx, i in enumerate(feat_ids)}
+ table_samples = {idx: sample_map[i]
+ for idx, i in enumerate(samp_ids)}
+
+ for (f, s), v in data_as_dok.items():
+ # collect the indices and values, adjusting the indices as we
+ # go
+ mi.append((table_features[f], table_samples[s]))
+ values.append(v)
+
+ # construct a multiindex of the indices where the outer index is the
+ # feature and the inner index is the sample
+ mi = pd.MultiIndex.from_tuples(mi)
+ grouped = pd.Series(values, index=mi)
+
+ # aggregate the values where the outer and inner values in the
+ # multiindex are the same
+ collapsed_rcv = grouped.groupby(level=[0, 1]).sum()
+
+ # convert into a representation understood by the Table constructor
+ list_list = [[r, c, v] for (r, c), v in collapsed_rcv.items()]
+
+ return self.__class__(list_list, feature_order, sample_order)
+
def merge(self, other, sample='union', observation='union',
sample_metadata_f=prefer_self,
observation_metadata_f=prefer_self):
@@ -3458,8 +3537,9 @@ class Table(object):
Parameters
----------
- other : biom.Table
- The other table to merge with this one
+ other : biom.Table or Iterable of Table
+ The other table to merge with this one. If an iterable, the tables
+ are expected to not have metadata.
sample : {'union', 'intersection'}, optional
observation : {'union', 'intersection'}, optional
sample_metadata_f : function, optional
@@ -3476,6 +3556,8 @@ class Table(object):
Notes
-----
+ - If both axes are unioned, and either ``self`` has no metadata or both
+ ``sample_metadata_f`` and ``observation_metadata_f`` are None, then a
+ fast merge is applied.
- There is an implicit type conversion to ``float``.
- The return type is always that of ``self``
@@ -3505,6 +3587,19 @@ class Table(object):
O3 10.0 10.0
"""
+ s_md = self.metadata()
+ o_md = self.metadata(axis='observation')
+ no_md = (s_md is None) and (o_md is None)
+ ignore_md = (sample_metadata_f is None) and \
+ (observation_metadata_f is None)
+
+ if no_md or ignore_md:
+ if sample == 'union' and observation == 'union':
+ if isinstance(other, (list, set, tuple)):
+ return self._fast_merge(other)
+ else:
+ return self._fast_merge([other, ])
+
# determine the sample order in the resulting table
if sample == 'union':
new_samp_order = self._union_id_order(self.ids(), other.ids())
@@ -3972,7 +4067,7 @@ html
for i in keep])
# Create the new indptr
indptr_subset = np.array([end - start
- for start, end in indptr_indices])
+ for start, end in indptr_indices])
indptr = np.empty(len(keep) + 1, dtype=np.int32)
indptr[0] = 0
indptr[1:] = indptr_subset.cumsum()
@@ -4045,13 +4140,62 @@ html
mat = self.matrix_data.toarray()
constructor = pd.DataFrame
else:
- mat = self.matrix_data
- constructor = partial(pd.SparseDataFrame,
- default_fill_value=0,
- copy=True)
+ mat = self.matrix_data.copy()
+ constructor = partial(pd.DataFrame.sparse.from_spmatrix)
return constructor(mat, index=index, columns=columns)
+ def to_anndata(self, dense=False, dtype="float32", transpose=True):
+ """Convert Table to AnnData format
+
+ Parameters
+ ----------
+ dense : bool, optional
+ If True, set adata.X as np.ndarray instead of sparse matrix.
+ dtype : str, optional
+ dtype used for storage in anndata object.
+ transpose : bool, optional
+ If True, transpose the anndata so that observations are columns
+
+ Returns
+ -------
+ anndata.AnnData
+ AnnData with matrix data and associated observation and
+ sample metadata.
+
+ Notes
+ -----
+ Nested metadata are not included.
+
+ Examples
+ --------
+ >>> from biom import example_table
+ >>> adata = example_table.to_anndata()
+ >>> adata
+ AnnData object with n_obs × n_vars = 3 × 2
+ obs: 'environment'
+ var: 'taxonomy_0', 'taxonomy_1'
+ """
+ try:
+ import anndata
+ except ImportError:
+ raise ImportError(
+ "Please install anndata package -- `pip install anndata`"
+ )
+ mat = self.matrix_data
+
+ if dense:
+ mat = mat.toarray()
+
+ var = self.metadata_to_dataframe("sample")
+ obs = self.metadata_to_dataframe("observation")
+
+ adata = anndata.AnnData(mat, obs=obs, var=var, dtype=dtype)
+ # Convention for scRNA-seq analysis in Python
+ if transpose:
+ adata = adata.transpose()
+
+ return adata
+
def metadata_to_dataframe(self, axis):
"""Convert axis metadata to a Pandas DataFrame
@@ -4596,6 +4740,103 @@ html
matrix_element_type, shape,
u''.join(data), rows, columns])
+ @staticmethod
+ def from_adjacency(lines):
+ """Parse an adjacency format into BIOM
+
+ Parameters
+ ----------
+ lines : list, str, or file-like object
+ The tab delimited data to parse
+
+ Returns
+ -------
+ biom.Table
+ A BIOM ``Table`` object
+
+ Notes
+ -----
+ The input is expected to be of the form: observation, sample, value. A
+ header is not required, but if present, it must be of the form:
+
+ #OTU ID<tab>SampleID<tab>value
+
+ Raises
+ ------
+ ValueError
+ If the input is not an iterable or file-like object.
+ ValueError
+ If the data is incorrectly formatted.
+
+ Examples
+ --------
+ Parse tab separated adjacency data into a table:
+
+ >>> from biom.table import Table
+ >>> from io import StringIO
+ >>> data = 'a\\tb\\t1\\na\\tc\\t2\\nd\\tc\\t3'
+ >>> data_fh = StringIO(data)
+ >>> test_table = Table.from_adjacency(data_fh)
+ """
+ if not isinstance(lines, (list, tuple)):
+ if hasattr(lines, 'readlines'):
+ lines = lines.readlines()
+ elif hasattr(lines, 'splitlines'):
+ lines = lines.splitlines()
+ else:
+ raise ValueError("Not sure how to handle this input")
+
+ def is_num(item):
+ # from https://stackoverflow.com/a/23059703
+ numeric = re.compile(r'(?=.)([+-]?([0-9]*)(\.([0-9]+))?)([eE][+-]?\d+)?') # noqa
+ # fullmatch returns None for empty or only partially numeric input
+ return numeric.fullmatch(item) is not None
+
+ # sanity check and determine if we have a header or not
+ lh = lines[0].strip().split('\t')
+ if len(lh) != 3:
+ raise ValueError("Does not appear to be an adjacency format")
+ elif lh == ['#OTU ID', 'SampleID', 'value']:
+ include_line_zero = False
+ elif is_num(lh[2]):
+ # allow anything for columns 1 and 2, but test that column 3 is
+ # numeric
+ include_line_zero = True
+ else:
+ raise ValueError("Does not appear to be an adjacency format")
+
+ if not include_line_zero:
+ lines = lines[1:]
+
+ # extract the entities
+ observations = []
+ samples = []
+ values = []
+ for line in lines:
+ parts = line.split('\t')
+ assert len(parts) == 3
+ observations.append(parts[0])
+ samples.append(parts[1])
+ values.append(float(parts[2]))
+
+ # determine a stable order and index positioning for the identifiers
+ obs_order = sorted(set(observations))
+ samp_order = sorted(set(samples))
+ obs_index = {o: i for i, o in enumerate(obs_order)}
+ samp_index = {s: i for i, s in enumerate(samp_order)}
+
+ # fill the matrix
+ row = np.array([obs_index[obs] for obs in observations], dtype=int)
+ col = np.array([samp_index[samp] for samp in samples], dtype=int)
+ data = np.asarray(values)
+ mat = coo_matrix((data, (row, col)))
+
+ return Table(mat, obs_order, samp_order)
+
@staticmethod
def from_tsv(lines, obs_mapping, sample_mapping,
process_func, **kwargs):
@@ -4629,7 +4870,7 @@ html
>>> test_table = Table.from_tsv(tsv_fh, None, None, func)
"""
(sample_ids, obs_ids, data, t_md,
- t_md_name) = Table._extract_data_from_tsv(lines, **kwargs)
+ t_md_name) = Table._extract_data_from_tsv(lines, **kwargs)
# if we have it, keep it
if t_md is None:
@@ -4797,7 +5038,9 @@ html
return samp_ids, obs_ids, data, metadata, md_name
def to_tsv(self, header_key=None, header_value=None,
- metadata_formatter=str, observation_column_name='#OTU ID'):
+ metadata_formatter=str,
+ observation_column_name='#OTU ID',
+ direct_io=None):
"""Return self as a string in tab delimited form
Default ``str`` output for the ``Table`` is just row/col ids and table
@@ -4815,6 +5058,10 @@ html
observation_column_name : str, optional
Defaults to "#OTU ID". The name of the first column in the output
table, corresponding to the observation IDs.
+ direct_io : file or file-like object, optional
+ Defaults to ``None``. Must implement a ``write`` function. If
+ `direct_io` is not ``None``, the final output is written directly
+ to `direct_io` during processing.
Returns
-------
@@ -4838,10 +5085,13 @@ html
#OTU ID S1 S2 S3
O1 0.0 0.0 1.0
O2 1.0 3.0 42.0
+ >>> with open("result.tsv", "w") as f:
+ ... table.to_tsv(direct_io=f)
"""
return self.delimited_self(u'\t', header_key, header_value,
metadata_formatter,
- observation_column_name)
+ observation_column_name,
+ direct_io=direct_io)
def coo_arrays_to_sparse(data, dtype=np.float64, shape=None):
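
The `_fast_merge` method added in the `biom/table.py` changes above remaps each table's local row and column indices into a shared, sorted ID space and then sums values that land on the same coordinate. The sketch below shows that idea with the standard library only, using a dictionary where the commit uses a pandas `groupby`. The `(obs_ids, samp_ids, data)` triple layout and the example values are assumptions made for illustration.

```python
from collections import defaultdict


def fast_merge(tables):
    """Sum sparse tables given as (obs_ids, samp_ids, {(row, col): value})."""
    # gather all identifiers across tables, in a stable sorted order
    all_obs = sorted(set().union(*(t[0] for t in tables)))
    all_samp = sorted(set().union(*(t[1] for t in tables)))
    obs_index = {o: i for i, o in enumerate(all_obs)}
    samp_index = {s: i for i, s in enumerate(all_samp)}

    totals = defaultdict(float)
    for obs_ids, samp_ids, data in tables:
        for (r, c), v in data.items():
            # translate table-local indices to positions in the merged table
            key = (obs_index[obs_ids[r]], samp_index[samp_ids[c]])
            totals[key] += v

    return all_obs, all_samp, dict(totals)


values = {(0, 0): 5, (0, 1): 6, (1, 0): 7, (1, 1): 8}
t1 = (["1", "2"], ["a", "b"], values)
t2 = (["1", "3"], ["d", "b"], values)  # partially overlapping IDs

obs, samp, merged = fast_merge([t2, t1])
dense = [[merged.get((i, j), 0.0) for j in range(len(samp))]
         for i in range(len(obs))]
```

Because only nonzero entries are visited, the cost scales with the number of stored values rather than with the size of the merged matrix.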
=====================================
biom/tests/test_parse.py
=====================================
@@ -229,6 +229,51 @@ class ParseTests(TestCase):
self.assertEqual(tab.metadata(), None)
self.assertEqual(tab.metadata(axis='observation'), None)
+ def test_parse_adjacency_weird_input(self):
+ with self.assertRaisesRegex(ValueError, "Not sure"):
+ Table.from_adjacency({'foo', 'bar'})
+
+ def test_parse_adjacency_bad_input(self):
+ with self.assertRaisesRegex(ValueError, "Does not appear"):
+ Table.from_adjacency(['a\tb\tc\n', 'd\te\tf\n'])
+
+ with self.assertRaisesRegex(ValueError, "Does not appear"):
+ Table.from_adjacency(['a\tb\n', 'd\te\n'])
+
+ with self.assertRaises(AssertionError):
+ Table.from_adjacency(['a\tb\t1\n', 'd\te\n'])
+
+ def test_parse_adjacency_table_header(self):
+ lines = ['#OTU ID\tSampleID\tvalue\n',
+ 'O1\tS1\t10\n',
+ 'O4\tS2\t1\n',
+ 'O3\tS3\t2\n',
+ 'O4\tS1\t5\n',
+ 'O2\tS2\t3\n']
+ exp = Table(np.array([[10, 0, 0],
+ [0, 3, 0],
+ [0, 0, 2],
+ [5, 1, 0]]),
+ ['O1', 'O2', 'O3', 'O4'],
+ ['S1', 'S2', 'S3'])
+ obs = Table.from_adjacency(''.join(lines))
+ self.assertEqual(obs, exp)
+
+ def test_parse_adjacency_table_no_header(self):
+ lines = ['O1\tS1\t10\n',
+ 'O4\tS2\t1\n',
+ 'O3\tS3\t2\n',
+ 'O4\tS1\t5\n',
+ 'O2\tS2\t3\n']
+ exp = Table(np.array([[10, 0, 0],
+ [0, 3, 0],
+ [0, 0, 2],
+ [5, 1, 0]]),
+ ['O1', 'O2', 'O3', 'O4'],
+ ['S1', 'S2', 'S3'])
+ obs = Table.from_adjacency(''.join(lines))
+ self.assertEqual(obs, exp)
+
@npt.dec.skipif(HAVE_H5PY is False, msg='H5PY is not installed')
def test_parse_biom_table_hdf5(self):
"""Make sure we can parse a HDF5 table through the same loader"""
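
The adjacency tests above feed `Table.from_adjacency` lines of the form observation, sample, value. The sketch below walks through the same steps (optional header check, sorted ID order, matrix fill) with the standard library only. `parse_adjacency` is a hypothetical helper, not the biom implementation. It returns a dense nested list instead of a `Table`, skips blank lines, and raises `ValueError` on malformed rows where the committed code uses an `assert`.

```python
def parse_adjacency(text):
    """Parse 'observation<tab>sample<tab>value' rows into a dense matrix."""
    lines = [ln for ln in text.splitlines() if ln.strip()]

    # an optional header must match this exact form
    if lines and lines[0].split("\t") == ["#OTU ID", "SampleID", "value"]:
        lines = lines[1:]

    triples = []
    for ln in lines:
        parts = ln.split("\t")
        if len(parts) != 3:
            raise ValueError("expected 3 tab-separated fields: %r" % ln)
        triples.append((parts[0], parts[1], float(parts[2])))

    # determine a stable order and index position for the identifiers
    obs_order = sorted({o for o, _, _ in triples})
    samp_order = sorted({s for _, s, _ in triples})
    obs_index = {o: i for i, o in enumerate(obs_order)}
    samp_index = {s: i for i, s in enumerate(samp_order)}

    # fill the matrix; repeated (observation, sample) pairs are summed
    matrix = [[0.0] * len(samp_order) for _ in obs_order]
    for o, s, v in triples:
        matrix[obs_index[o]][samp_index[s]] += v

    return obs_order, samp_order, matrix


data = ("#OTU ID\tSampleID\tvalue\n"
        "O1\tS1\t10\nO4\tS2\t1\nO3\tS3\t2\nO4\tS1\t5\nO2\tS2\t3\n")
obs_ids, samp_ids, mat = parse_adjacency(data)
```

The input here is the same header-plus-five-rows data used in `test_parse_adjacency_table_header`.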
=====================================
biom/tests/test_table.py
=====================================
@@ -19,8 +19,9 @@ from scipy.sparse import lil_matrix, csr_matrix, csc_matrix
import scipy.sparse
import pandas.util.testing as pdt
import pandas as pd
+import pytest
-from biom import example_table, load_table
+from biom import example_table, load_table, concat
from biom.exception import (UnknownAxisError, UnknownIDError, TableException,
DisjointIDError)
from biom.util import unzip, HAVE_H5PY, H5PY_VLEN_STR
@@ -37,6 +38,13 @@ np.random.seed(1234)
if HAVE_H5PY:
import h5py
+try:
+ import anndata
+ anndata.__version__
+ HAVE_ANNDATA = True
+except ImportError:
+ HAVE_ANNDATA = False
+
__author__ = "Daniel McDonald"
__copyright__ = "Copyright 2011-2017, The BIOM Format Development Team"
__credits__ = ["Daniel McDonald", "Jai Ram Rideout", "Justin Kuczynski",
@@ -148,11 +156,37 @@ class SupportTests(TestCase):
obs = table1.concat([table2, ], axis='sample')
self.assertEqual(obs, exp)
+ def test_concat_single_table_nonlist(self):
+ table2 = example_table.copy()
+ table2.update_ids({'S1': 'S4', 'S2': 'S5', 'S3': 'S6'})
+
+ exp = Table(np.array([[0, 1, 2, 0, 1, 2],
+ [3, 4, 5, 3, 4, 5]]),
+ ['O1', 'O2'],
+ ['S1', 'S2', 'S3', 'S4', 'S5', 'S6'],
+ example_table.metadata(axis='observation'),
+ list(example_table.metadata()) * 2)
+ obs = example_table.concat(table2, axis='sample')
+ self.assertEqual(obs, exp)
+
def test_concat_empty(self):
exp = example_table.copy()
obs = example_table.concat([])
self.assertEqual(obs, exp)
+ def test_concat_wrapper(self):
+ table2 = example_table.copy()
+ table2.update_ids({'S1': 'S4', 'S2': 'S5', 'S3': 'S6'})
+
+ exp = Table(np.array([[0, 1, 2, 0, 1, 2],
+ [3, 4, 5, 3, 4, 5]]),
+ ['O1', 'O2'],
+ ['S1', 'S2', 'S3', 'S4', 'S5', 'S6'],
+ example_table.metadata(axis='observation'),
+ list(example_table.metadata()) * 2)
+ obs = concat([example_table, table2], axis='sample')
+ self.assertEqual(obs, exp)
+
def test_concat_samples(self):
table2 = example_table.copy()
table2.update_ids({'S1': 'S4', 'S2': 'S5', 'S3': 'S6'})
@@ -1473,18 +1507,25 @@ class TableTests(TestCase):
'tree': ('newick', '((4:0.1,5:0.1):0.2,(6:0.1,7:0.1):0.2):0.3;')})
def test_to_dataframe(self):
- exp = pd.SparseDataFrame(np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]),
- index=['O1', 'O2'],
- columns=['S1', 'S2', 'S3'],
- default_fill_value=0.0)
+ mat = csr_matrix(np.array([[0.0, 1.0, 2.0],
+ [3.0, 4.0, 5.0]]))
+ exp = pd.DataFrame.sparse.from_spmatrix(mat,
+ index=['O1', 'O2'],
+ columns=['S1', 'S2', 'S3'])
obs = example_table.to_dataframe()
- pdt.assert_frame_equal(obs, exp)
+
+ # assert frame equal between sparse and dense frames wasn't working
+ # as expected
+ npt.assert_equal(obs.values, exp.values)
+ self.assertTrue(all(obs.index == exp.index))
+ self.assertTrue(all(obs.columns == exp.columns))
def test_to_dataframe_is_sparse(self):
df = example_table.to_dataframe()
density = (float(example_table.matrix_data.getnnz()) /
np.prod(example_table.shape))
- assert np.allclose(df.density, density)
+ df_density = (df > 0).sum().sum() / np.prod(df.shape)
+ assert np.allclose(df_density, density)
def test_to_dataframe_dense(self):
exp = pd.DataFrame(np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]),
@@ -1493,6 +1534,28 @@ class TableTests(TestCase):
obs = example_table.to_dataframe(dense=True)
pdt.assert_frame_equal(obs, exp)
+ @pytest.mark.skipif(not HAVE_ANNDATA, reason="anndata not installed")
+ def test_to_anndata_dense(self):
+ exp = example_table.to_dataframe(dense=True)
+ adata = example_table.to_anndata(dense=True, dtype='float64')
+ pdt.assert_frame_equal(adata.transpose().to_df(), exp)
+
+ @pytest.mark.skipif(not HAVE_ANNDATA, reason="anndata not installed")
+ def test_to_anndata_sparse(self):
+ adata = example_table.to_anndata(dense=False)
+ mat = example_table.matrix_data.toarray()
+ np.testing.assert_array_equal(adata.transpose().X.toarray(), mat)
+
+ @pytest.mark.skipif(not HAVE_ANNDATA, reason="anndata not installed")
+ def test_to_anndata_metadata(self):
+ adata = example_table.to_anndata()
+
+ obs_samp = example_table.metadata_to_dataframe(axis='sample')
+ obs_obs = example_table.metadata_to_dataframe(axis='observation')
+
+ pdt.assert_frame_equal(adata.obs, obs_samp)
+ pdt.assert_frame_equal(adata.var, obs_obs)
+
def test_metadata_to_dataframe(self):
exp_samp = pd.DataFrame(['A', 'B', 'A'], index=['S1', 'S2', 'S3'],
columns=['environment'])
@@ -2262,6 +2325,30 @@ class SparseTableTests(TestCase):
npt.assert_equal(obs_obs, exp_obs)
npt.assert_equal(obs_whole, exp_whole)
+ def test_fast_merge(self):
+ data = {(0, 0): 10, (0, 1): 12, (1, 0): 14, (1, 1): 16}
+ exp = Table(data, ['1', '2'], ['a', 'b'])
+ obs = self.st1._fast_merge([self.st1])
+ self.assertEqual(obs, exp)
+
+ def test_fast_merge_multiple(self):
+ data = {(0, 0): 20, (0, 1): 24, (1, 0): 28, (1, 1): 32}
+ exp = Table(data, ['1', '2'], ['a', 'b'])
+ obs = self.st1._fast_merge([self.st1, self.st1, self.st1])
+ self.assertEqual(obs, exp)
+
+ def test_fast_merge_nonoverlapping(self):
+ t2 = self.st1.copy()
+ t2.update_ids({'a': 'd'}, inplace=True, strict=False)
+ t2.update_ids({'2': '3'}, axis='observation', inplace=True,
+ strict=False)
+ exp = Table(np.array([[5, 12, 5],
+ [7, 8, 0],
+ [0, 8, 7]]), ['1', '2', '3'],
+ ['a', 'b', 'd'])
+ obs = t2._fast_merge([self.st1])
+ self.assertEqual(obs, exp)
+
def test_merge(self):
"""Merge two tables"""
u = 'union'
=====================================
biom/util.py
=====================================
@@ -45,7 +45,7 @@ __url__ = "http://biom-format.org"
__maintainer__ = "Daniel McDonald"
__email__ = "daniel.mcdonald at colorado.edu"
__format_version__ = (2, 1)
-__version__ = "2.1.8"
+__version__ = "2.1.9"
def generate_subsamples(table, n, axis='sample', by_id=False):
@@ -202,9 +202,9 @@ def prefer_self(x, y):
return x if x is not None else y
-def index_list(l):
+def index_list(item):
"""Takes a list and returns {l[idx]:idx}"""
- return dict([(id_, idx) for idx, id_ in enumerate(l)])
+ return dict([(id_, idx) for idx, id_ in enumerate(item)])
def load_biom_config():
=====================================
doc/conf.py
=====================================
@@ -59,15 +59,15 @@ master_doc = 'index'
# General information about the project.
project = u'biom-format'
-copyright = u'2011-2018 The BIOM Format Development Team'
+copyright = u'2011-2020 The BIOM Format Development Team'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The full version, including alpha/beta/rc tags.
-version = "2.1.8"
-release = "2.1.8"
+version = "2.1.9"
+release = "2.1.9"
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
=====================================
doc/index.rst
=====================================
@@ -75,7 +75,7 @@ To enable Bash tab completion of ``biom`` commands, add the following line to ``
Citing the BIOM project
=======================
-You can cite the BIOM format as follows (`link <http://www.gigasciencejournal.com/content/1/1/7>`_):
+You can cite the BIOM format as follows (`link <https://academic.oup.com/gigascience/article/1/1/2047-217X-1-7/2656152>`_):
| The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome.
| Daniel McDonald, Jose C. Clemente, Justin Kuczynski, Jai Ram Rideout, Jesse Stombaugh, Doug Wendel, Andreas Wilke, Susan Huse, John Hufnagle, Folker Meyer, Rob Knight, and J. Gregory Caporaso.
=====================================
pyproject.toml
=====================================
@@ -0,0 +1,2 @@
+[build-system]
+requires = ["setuptools","wheel", "numpy", "cython"]
=====================================
setup.py
=====================================
@@ -14,18 +14,8 @@ import sys
from setuptools import setup, find_packages
from setuptools.extension import Extension
from setuptools.command.test import test as TestCommand
-
-try:
- import numpy as np
-except ImportError:
- raise ImportError("numpy must be installed prior to installing biom")
-
-
-try:
- from Cython.Build import cythonize
-except ImportError:
- raise ImportError("cython must be installed prior to installing biom")
-
+import numpy as np
+from Cython.Build import cythonize
# Hack to prevent stupid "TypeError: 'NoneType' object is not callable" error
# in multiprocessing/util.py _exit_function when running `python
@@ -43,7 +33,7 @@ __copyright__ = "Copyright 2011-2017, The BIOM Format Development Team"
__credits__ = ["Greg Caporaso", "Daniel McDonald", "Jose Clemente",
"Jai Ram Rideout", "Jorge Cañardo Alastuey", "Michael Hall"]
__license__ = "BSD"
-__version__ = "2.1.8"
+__version__ = "2.1.9"
__maintainer__ = "Daniel McDonald"
__email__ = "mcdonadt at colorado.edu"
@@ -75,7 +65,6 @@ class PyTest(TestCommand):
# import here, cause outside the eggs aren't loaded
import pytest
-
errno = pytest.main(shlex.split(self.pytest_args))
sys.exit(errno)
@@ -123,7 +112,8 @@ extensions = cythonize(extensions)
install_requires = ["click", "numpy >= 1.9.2", "future >= 0.16.0",
"scipy >= 1.3.1", 'pandas >= 0.20.0',
- "six >= 1.10.0", "cython >= 0.29"]
+ "six >= 1.10.0", "cython >= 0.29", "h5py",
+ "cython"]
if sys.version_info[0] < 3:
raise SystemExit("Python 2.7 is no longer supported")
@@ -140,7 +130,7 @@ setup(name='biom-format',
maintainer_email=__email__,
url='http://www.biom-format.org',
packages=find_packages(),
- tests_require=['pytest',
+ tests_require=['pytest < 5.3.4',
'pytest-cov',
'flake8',
'nose'],
@@ -148,7 +138,8 @@ setup(name='biom-format',
ext_modules=extensions,
include_dirs=[np.get_include()],
install_requires=install_requires,
- extras_require={'hdf5': ["h5py >= 2.2.0"]
+ extras_require={'hdf5': ["h5py >= 2.2.0"],
+ 'anndata': ["anndata"],
},
classifiers=classifiers,
cmdclass={"pytest": PyTest},
View it on GitLab: https://salsa.debian.org/med-team/python-biom-format/-/commit/29fca8e932fbf74620831f1cde788bb1dd0ae403