[med-svn] [Git][med-team/python-biom-format][upstream] New upstream version 2.1.8+dfsg
Andreas Tille
gitlab at salsa.debian.org
Mon Jan 20 11:09:17 GMT 2020
Andreas Tille pushed to branch upstream at Debian Med / python-biom-format
Commits:
d993b066 by Andreas Tille at 2020-01-20T11:34:29+01:00
New upstream version 2.1.8+dfsg
- - - - -
15 changed files:
- .travis.yml
- ChangeLog.md
- biom/cli/__init__.py
- + biom/cli/metadata_exporter.py
- biom/cli/table_validator.py
- biom/err.py
- biom/parse.py
- biom/table.py
- biom/tests/test_cli/test_validate_table.py
- biom/tests/test_err.py
- biom/tests/test_parse.py
- biom/tests/test_table.py
- biom/util.py
- doc/conf.py
- setup.py
Changes:
=====================================
.travis.yml
=====================================
@@ -1,27 +1,25 @@
-# Modified from https://github.com/biocore/scikit-bio/
+# Modified from https://github.com/biocore/scikit-bio
language: python
env:
- - PYTHON_VERSION=2.7 WITH_DOCTEST=False USE_CYTHON=True
- - PYTHON_VERSION=3.5 WITH_DOCTEST=True USE_CYTHON=True
- - PYTHON_VERSION=3.6 WITH_DOCTEST=True USE_CYTHON=True
- - PYTHON_VERSION=3.7 WITH_DOCTEST=True USE_CYTHON=True
+ - PYTHON_VERSION=3.6 WITH_DOCTEST=True
+ - PYTHON_VERSION=3.7 WITH_DOCTEST=True
+ - PYTHON_VERSION=3.8 WITH_DOCTEST=True
before_install:
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
- chmod +x miniconda.sh
- ./miniconda.sh -b
- export PATH=/home/travis/miniconda3/bin:$PATH
install:
- - conda create --yes -n env_name python=$PYTHON_VERSION pip click numpy scipy pep8 flake8 coverage future six "pandas>=0.20.0" nose h5py>=2.2.0 cython
+ - conda create --yes -n env_name python=$PYTHON_VERSION pip click numpy "scipy>=1.3.1" pep8 flake8 coverage future six "pandas>=0.20.0" nose h5py>=2.2.0 cython
- rm biom/*.c
- source activate env_name
- - if [ ${PYTHON_VERSION} = "2.7" ]; then pip install pyqi; fi
- - if [ ${PYTHON_VERSION} = "2.7" ]; then conda install --yes Sphinx=1.2.2; fi
+ - if [ ${PYTHON_VERSION} = "3.6" ]; then pip install sphinx==1.2.2; fi
- pip install coveralls
- pip install -e . --no-deps
script:
- make test
- biom show-install-info
- - if [ ${PYTHON_VERSION} = "2.7" ]; then make -C doc html; fi
+ - if [ ${PYTHON_VERSION} = "3.6" ]; then make -C doc html; fi
# we can only validate the tables if we have H5PY
- for table in examples/*hdf5.biom; do echo ${table}; biom validate-table -i ${table}; done
# validate JSON formatted tables
=====================================
ChangeLog.md
=====================================
@@ -1,21 +1,45 @@
BIOM-Format ChangeLog
=====================
+biom 2.1.8
+----------
+
+New features and bug fixes, released on 6 January 2020.
+
+Important:
+
+* Python 2.7 and 3.5 support has been dropped.
+* Python 3.8 support has been added into Travis CI.
+* The default for `Table.nonzero_counts` has changed: it now counts the number of nonzero entries rather than summing their values. See [issue #685](https://github.com/biocore/biom-format/issues/685).
+* We now require SciPy >= 1.3.1. See [issue #816](https://github.com/biocore/biom-format/issues/816).
+
+New Features:
+
+* The detailed report is no longer part of the table validator. See [issue #378](https://github.com/biocore/biom-format/issues/378).
+* `load_table` now accepts open file handles. See [issue #481](https://github.com/biocore/biom-format/issues/481).
+* `biom export-metadata` has been added to export metadata as TSV. See [issue #820](https://github.com/biocore/biom-format/issues/820).
+
+Bug fixes:
+
+* `Table.to_dataframe(dense=False)` now correctly produces sparse data frames (rather than accidentally dense ones, as before). See [issue #808](https://github.com/biocore/biom-format/issues/808).
+* Order of error evaluations was unstable in Python versions without implicit `OrderedDict`. See [issue #813](https://github.com/biocore/biom-format/issues/813). Thanks @gwarmstrong for identifying this bug.
+* `Table._extract_data_from_tsv` would fail if taxonomy was provided, and if the first row had the empty string for taxonomy. See [issue #827](https://github.com/biocore/biom-format/issues/827). Thanks @KasperSkytte for identifying this bug.
+
biom 2.1.7
----------
New features and bug fixes, released on 28 September 2018.
-Important:
+Important:
* Python 3.4 support has been dropped. We now only support Python 2.7, 3.5, 3.6 and 3.7.
* We will be dropping Python 2.7 support on the next release.
-* Pandas >= 0.20.0 is now the minimum required version.
+* Pandas >= 0.20.0 is now the minimum required version.
* pytest is now used instead of nose.
New Features:
-* Massive performance boost to `Table.collapse` with the default collapse function. The difference was 10s of milliseconds vs. minutes stemming from prior use of `operator.add`. See [issue #761](https://github.com/biocore/biom-format/issues/761).
+* Massive performance boost to `Table.collapse` with the default collapse function. The difference was 10s of milliseconds vs. minutes stemming from prior use of `operator.add`. See [issue #761](https://github.com/biocore/biom-format/issues/761).
* `Table.align_to` for aligning one table to another. This is useful in multi-omic analyses where multiple preparations have been performed on the same physical samples. This is essentially a helper method around `Table.sort_order`. See [issue #747](https://github.com/biocore/biom-format/issues/747).
* Added additional sanity checks when calling `Table.to_hdf5`, see [PR #769](https://github.com/biocore/biom-format/pull/769).
* `Table.subsample()` can optionally perform subsampling with replacement. See [issue #774](https://github.com/biocore/biom-format/issues/774).
@@ -38,7 +62,7 @@ New Features:
* `Table.from_hdf5` now supports a rapid subset in the event that metadata is
not needed. In benchmarking against the Earth Microbiome Project BIOM table,
the reduction in runtime was multiple orders of magnitude while additionally
- preserving substantial memory.
+ preserving substantial memory.
* `Table.rankdata` has been added to convert values to ranked abundances on
either axis. See [issue #645](https://github.com/biocore/biom-format/issues/639).
* Format of numbers in ``biom summarize-table`` output is now more readable and localized. See [issue #679](https://github.com/biocore/biom-format/issues/679).
@@ -96,8 +120,8 @@ Bug fixes:
* `biom --version` now prints the software version (previously the individual
commands did this, but not the base command).
* `Table.vlen_list_of_str_formatter` was considering a `str` to be valid for
- formatting resulting in an obscure error when a `str`, as opposed to a
- `list` of `str`, was used for taxonomy. See
+ formatting resulting in an obscure error when a `str`, as opposed to a
+ `list` of `str`, was used for taxonomy. See
[issue #709](https://github.com/biocore/biom-format/issues/709).
biom 2.1.4
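
As a quick illustration of the `Table.nonzero_counts` change noted in the 2.1.8 entries above, here is a minimal sketch (not part of the upstream diff); the toy table and its values are made up:

    import numpy as np
    from biom import Table

    # Toy table: two observations (O1, O2) across three samples (S1-S3).
    table = Table(np.array([[0, 2, 4], [3, 0, 5]]),
                  ['O1', 'O2'], ['S1', 'S2', 'S3'])

    # New 2.1.8 default (binary=True): count nonzero entries per sample -> 1, 1, 2
    print(table.nonzero_counts('sample'))
    # The previous behaviour is still available: sum the entry values -> 3, 2, 9
    print(table.nonzero_counts('sample', binary=False))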
=====================================
biom/cli/__init__.py
=====================================
@@ -30,6 +30,7 @@ def cli(ctx):
import_module('biom.cli.table_summarizer')
import_module('biom.cli.metadata_adder')
+import_module('biom.cli.metadata_exporter')
import_module('biom.cli.table_converter')
import_module('biom.cli.installation_informer')
import_module('biom.cli.table_subsetter')
=====================================
biom/cli/metadata_exporter.py
=====================================
@@ -0,0 +1,51 @@
+# -----------------------------------------------------------------------------
+# Copyright (c) 2011-2017, The BIOM Format Development Team.
+#
+# Distributed under the terms of the Modified BSD License.
+#
+# The full license is in the file COPYING.txt, distributed with this software.
+# -----------------------------------------------------------------------------
+
+import click
+
+from biom import load_table
+from biom.cli import cli
+
+
+@cli.command(name='export-metadata')
+@click.option('-i', '--input-fp', required=True,
+              type=click.Path(exists=True, dir_okay=False),
+              help='The input BIOM table')
+@click.option('-m', '--sample-metadata-fp', required=False,
+              type=click.Path(exists=False, dir_okay=False),
+              help='The sample metadata output file.')
+@click.option('--observation-metadata-fp', required=False,
+              type=click.Path(exists=False, dir_okay=False),
+              help='The observation metadata output file.')
+def export_metadata(input_fp, sample_metadata_fp, observation_metadata_fp):
+    """Export metadata as TSV.
+
+    Example usage:
+
+    Export metadata as TSV:
+
+    $ biom export-metadata -i otu_table.biom
+    --sample-metadata-fp sample.tsv
+    --observation-metadata-fp observation.tsv
+    """
+    table = load_table(input_fp)
+
+    if sample_metadata_fp:
+        _export_metadata(table, 'sample', input_fp, sample_metadata_fp)
+    if observation_metadata_fp:
+        _export_metadata(table, 'observation', input_fp,
+                         observation_metadata_fp)
+
+
+def _export_metadata(table, axis, input_fp, output_fp):
+    try:
+        metadata = table.metadata_to_dataframe(axis)
+        metadata.to_csv(output_fp, sep='\t')
+    except KeyError:
+        click.echo('File {} does not contain {} metadata'.format(input_fp,
+                                                                 axis))
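
As a quick illustration of what the new `export-metadata` command does, a minimal sketch (not part of the upstream diff) using the same `metadata_to_dataframe` call as `_export_metadata` above; the file names are hypothetical:

    from biom import load_table

    # Hypothetical input; any BIOM table carrying sample metadata would do.
    table = load_table('otu_table.biom')

    # Pull one axis' metadata into a DataFrame and write it as TSV, mirroring
    # `biom export-metadata -i otu_table.biom -m sample.tsv`.
    table.metadata_to_dataframe('sample').to_csv('sample.tsv', sep='\t')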
=====================================
biom/cli/table_validator.py
=====================================
@@ -29,9 +29,7 @@ from biom.util import HAVE_H5PY, biom_open, is_hdf5_file
' specification')
@click.option('-f', '--format-version', default=None,
help='The specific format version to validate against')
- at click.option('--detailed-report', is_flag=True, default=False,
- help='Include more details in the output report')
-def validate_table(input_fp, format_version, detailed_report):
+def validate_table(input_fp, format_version):
"""Validate a BIOM-formatted file.
Test a file for adherence to the Biological Observation Matrix (BIOM)
@@ -46,7 +44,7 @@ def validate_table(input_fp, format_version, detailed_report):
$ biom validate-table -i table.biom
"""
- valid, report = _validate_table(input_fp, format_version, detailed_report)
+ valid, report = _validate_table(input_fp, format_version)
click.echo("\n".join(report))
if valid:
# apparently silence is too quiet to be golden.
@@ -57,9 +55,8 @@ def validate_table(input_fp, format_version, detailed_report):
sys.exit(1)
-def _validate_table(input_fp, format_version=None, detailed_report=False):
- result = TableValidator()(table=input_fp, format_version=format_version,
- detailed_report=detailed_report)
+def _validate_table(input_fp, format_version=None):
+ result = TableValidator()(table=input_fp, format_version=format_version)
return result['valid_table'], result['report_lines']
@@ -108,23 +105,15 @@ class TableValidator(object):
raise IOError("h5py is not installed, can only validate JSON "
"tables")
- def __call__(self, table, format_version=None, detailed_report=False):
- return self.run(table=table, format_version=format_version,
- detailed_report=detailed_report)
+ def __call__(self, table, format_version=None):
+ return self.run(table=table, format_version=format_version)
def _validate_hdf5(self, **kwargs):
table = kwargs['table']
- # Need to make this an attribute so that we have this info during
- # validation.
- detailed_report = kwargs['detailed_report']
-
report_lines = []
valid_table = True
- if detailed_report:
- report_lines.append("Validating BIOM table...")
-
required_attrs = [
('format-url', self._valid_format_url),
('format-version', self._valid_hdf5_format_version),
@@ -154,9 +143,6 @@ class TableValidator(object):
report_lines.append("Missing attribute: '%s'" % required_attr)
continue
- if detailed_report:
- report_lines.append("Validating '%s'..." % required_attr)
-
status_msg = attr_validator(table)
if len(status_msg) > 0:
@@ -166,20 +152,12 @@ class TableValidator(object):
for group in required_groups:
if group not in table:
valid_table = False
- if detailed_report:
- report_lines.append("Missing group: %s" % group)
for dataset in required_datasets:
if dataset not in table:
valid_table = False
- if detailed_report:
- report_lines.append("Missing dataset: %s" % dataset)
if 'shape' in table.attrs:
- if detailed_report:
- report_lines.append("Validating 'shape' versus number of "
- "samples and observations...")
-
n_obs, n_samp = table.attrs['shape']
obs_ids = table.get('observation/ids', None)
samp_ids = table.get('sample/ids', None)
@@ -270,14 +248,10 @@ class TableValidator(object):
# Need to make this an attribute so that we have this info during
# validation.
self._format_version = kwargs['format_version']
- detailed_report = kwargs['detailed_report']
report_lines = []
valid_table = True
- if detailed_report:
- report_lines.append("Validating BIOM table...")
-
required_keys = [
('format', self._valid_format),
('format_url', self._valid_format_url),
@@ -299,9 +273,6 @@ class TableValidator(object):
report_lines.append("Missing field: '%s'" % key)
continue
- if detailed_report:
- report_lines.append("Validating '%s'..." % key)
-
status_msg = method(table_json)
if len(status_msg) > 0:
@@ -309,10 +280,6 @@ class TableValidator(object):
report_lines.append(status_msg)
if 'shape' in table_json:
- if detailed_report:
- report_lines.append("Validating 'shape' versus number of rows "
- "and columns...")
-
if ('rows' in table_json and
len(table_json['rows']) != table_json['shape'][0]):
valid_table = False
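
With `--detailed-report` removed, the report only lists problems; a rough sketch (not part of the upstream diff) of calling the validator programmatically, with a hypothetical path:

    from biom.cli.table_validator import _validate_table

    # Returns (valid, report_lines), as defined in the diff above; the report
    # is typically empty for a valid table.
    valid, report = _validate_table('table.biom')
    if not valid:
        print('\n'.join(report))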
=====================================
biom/err.py
=====================================
@@ -75,7 +75,8 @@ OBSMDSIZE = "Size of observation metadata differs from matrix size!"
SAMPMDSIZE = "Size of sample metadata differs from matrix size!"
-def _test_empty(t):
+# _zz_ so the sort order places this test last
+def _zz_test_empty(t):
"""Check if t is empty"""
return t.is_empty()
@@ -250,8 +251,9 @@ class ErrorProfile(object):
if not args:
args = self._test.keys()
- for errtype in args:
+ for errtype in sorted(args):
test = self._test.get(errtype, lambda: None)
+
if test(item):
return self._handle_error(errtype, item)
@@ -318,7 +320,7 @@ class ErrorProfile(object):
__errprof = ErrorProfile()
-__errprof.register('empty', EMPTY, 'ignore', _test_empty,
+__errprof.register('empty', EMPTY, 'ignore', _zz_test_empty,
exception=TableException)
__errprof.register('obssize', OBSSIZE, 'raise', _test_obssize,
exception=TableException)
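
For context on the checks being reordered here, a minimal sketch (not part of the upstream diff) of the public error-handling API; it assumes the default profile, where the 'empty' check is registered with 'ignore':

    from biom import Table
    from biom.err import errcheck, errstate

    empty = Table([], [], [])

    # With the default 'ignore' state this is a no-op.
    errcheck(empty)

    # Temporarily escalate empty-table detection to an exception.
    with errstate(empty='raise'):
        try:
            errcheck(empty)
        except Exception as err:
            print(err)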
=====================================
biom/parse.py
=====================================
@@ -12,6 +12,8 @@ from __future__ import division
import numpy as np
from future.utils import string_types
+import io
+import h5py
from biom.exception import BiomParseException, UnknownAxisError
from biom.table import Table
@@ -341,13 +343,14 @@ def parse_uc(fh):
return Table(data, observation_ids=observation_ids, sample_ids=sample_ids)
-def parse_biom_table(fp, ids=None, axis='sample', input_is_dense=False):
- r"""Parses the biom table stored in the filepath `fp`
+def parse_biom_table(file_obj, ids=None, axis='sample', input_is_dense=False):
+ r"""Parses the biom table stored in `file_obj`
Parameters
----------
- fp : file like
- File alike object storing the BIOM table
+ file_obj : file-like object, or list
+ file-like object storing the BIOM table (tab-delimited or JSON), or
+ a list of lines of the BIOM table in tab-delimited or JSON format
ids : iterable
The sample/observation ids of the samples/observations that we need
to retrieve from the biom table
@@ -360,7 +363,7 @@ def parse_biom_table(fp, ids=None, axis='sample', input_is_dense=False):
Returns
-------
Table
- The BIOM table stored at fp
+ The BIOM table stored at file_obj
Raises
------
@@ -391,34 +394,36 @@ def parse_biom_table(fp, ids=None, axis='sample', input_is_dense=False):
UnknownAxisError(axis)
try:
- return Table.from_hdf5(fp, ids=ids, axis=axis)
+ return Table.from_hdf5(file_obj, ids=ids, axis=axis)
except ValueError:
pass
except RuntimeError:
pass
- if hasattr(fp, 'read'):
- old_pos = fp.tell()
+ if hasattr(file_obj, 'read'):
+ old_pos = file_obj.tell()
# Read in characters until first non-whitespace
# If it is a {, then this is (most likely) JSON
- c = fp.read(1)
+ c = file_obj.read(1)
while c.isspace():
- c = fp.read(1)
+ c = file_obj.read(1)
if c == '{':
- fp.seek(old_pos)
- t = Table.from_json(json.load(fp, object_pairs_hook=OrderedDict),
+ file_obj.seek(old_pos)
+ t = Table.from_json(json.load(file_obj,
+ object_pairs_hook=OrderedDict),
input_is_dense=input_is_dense)
else:
- fp.seek(old_pos)
- t = Table.from_tsv(fp, None, None, lambda x: x)
- elif isinstance(fp, list):
+ file_obj.seek(old_pos)
+ t = Table.from_tsv(file_obj, None, None, lambda x: x)
+ elif isinstance(file_obj, list):
try:
- t = Table.from_json(json.loads(''.join(fp),
+ t = Table.from_json(json.loads(''.join(file_obj),
object_pairs_hook=OrderedDict),
input_is_dense=input_is_dense)
except ValueError:
- t = Table.from_tsv(fp, None, None, lambda x: x)
+ t = Table.from_tsv(file_obj, None, None, lambda x: x)
else:
- t = Table.from_json(json.loads(fp, object_pairs_hook=OrderedDict),
+ t = Table.from_json(json.loads(file_obj,
+ object_pairs_hook=OrderedDict),
input_is_dense=input_is_dense)
def subset_ids(data, id_, md):
@@ -632,7 +637,8 @@ def load_table(f):
Parameters
----------
- f : str
+ f : str or file-like object
+ The entity to parse
Returns
-------
@@ -655,9 +661,15 @@ def load_table(f):
>>> table = load_table('path/to/table.biom') # doctest: +SKIP
"""
- with biom_open(f) as fp:
+ if isinstance(f, (io.IOBase, h5py.File)):
try:
- table = parse_biom_table(fp)
+ table = parse_biom_table(f)
except (IndexError, TypeError):
raise TypeError("%s does not appear to be a BIOM file!" % f)
+ else:
+ with biom_open(f) as fp:
+ try:
+ table = parse_biom_table(fp)
+ except (IndexError, TypeError):
+ raise TypeError("%s does not appear to be a BIOM file!" % f)
return table
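
A short sketch (not part of the upstream diff) of the inputs `load_table` now accepts; the paths are hypothetical and the HDF5 case assumes h5py is installed:

    import h5py

    from biom import load_table

    # Path-based loading is unchanged.
    table = load_table('table.biom')

    # New in 2.1.8: an already-open handle is accepted as well.
    with h5py.File('table.biom', 'r') as fh:
        table = load_table(fh)

    # Text handles holding JSON (or classic TSV) tables also work.
    with open('table.json') as fh:
        table = load_table(fh)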
=====================================
biom/table.py
=====================================
@@ -178,7 +178,7 @@ import scipy.stats
from copy import deepcopy
from datetime import datetime
from json import dumps
-from functools import reduce
+from functools import reduce, partial
from operator import itemgetter
from future.builtins import zip
from future.utils import viewitems
@@ -2822,7 +2822,7 @@ class Table(object):
Parameters
----------
inplace : bool, optional
- Defaults to ``False``
+ Defaults to ``True``
Returns
-------
@@ -3103,7 +3103,7 @@ class Table(object):
for col_idx in indices[start:end]:
yield (obs_id, samp_ids[col_idx])
- def nonzero_counts(self, axis, binary=False):
+ def nonzero_counts(self, axis, binary=True):
"""Get nonzero summaries about an axis
Parameters
@@ -3111,7 +3111,7 @@ class Table(object):
axis : {'sample', 'observation', 'whole'}
The axis on which to count nonzero entries
binary : bool, optional
- Defaults to ``False``. If ``True``, return number of nonzero
+ Defaults to ``True``. If ``True``, return number of nonzero
entries. If ``False``, sum the values of the entries.
Returns
@@ -3252,26 +3252,26 @@ class Table(object):
alignable_o = self_o == other_o
alignable_s = self_s == other_s
- if axis is 'both' and not (alignable_o and alignable_s):
+ if axis == 'both' and not (alignable_o and alignable_s):
raise DisjointIDError("Cannot align both axes")
- elif axis is 'sample' and not alignable_s:
+ elif axis == 'sample' and not alignable_s:
raise DisjointIDError("Cannot align samples")
- elif axis is 'observation' and not alignable_o:
+ elif axis == 'observation' and not alignable_o:
raise DisjointIDError("Cannot align observations")
- elif axis is 'detect' and not (alignable_o or alignable_s):
+ elif axis == 'detect' and not (alignable_o or alignable_s):
raise DisjointIDError("Neither axis appears alignable")
- if axis is 'both':
+ if axis == 'both':
order = ['observation', 'sample']
- elif axis is 'detect':
+ elif axis == 'detect':
order = []
if alignable_s:
order.append('sample')
if alignable_o:
order.append('observation')
- elif axis is 'sample':
+ elif axis == 'sample':
order = ['sample']
- elif axis is 'observation':
+ elif axis == 'observation':
order = ['observation']
else:
raise UnknownAxisError("Unrecognized axis: %s" % axis)
@@ -3506,18 +3506,18 @@ class Table(object):
"""
# determine the sample order in the resulting table
- if sample is 'union':
+ if sample == 'union':
new_samp_order = self._union_id_order(self.ids(), other.ids())
- elif sample is 'intersection':
+ elif sample == 'intersection':
new_samp_order = self._intersect_id_order(self.ids(), other.ids())
else:
raise TableException("Unknown sample merge type: %s" % sample)
# determine the observation order in the resulting table
- if observation is 'union':
+ if observation == 'union':
new_obs_order = self._union_id_order(
self.ids(axis='observation'), other.ids(axis='observation'))
- elif observation is 'intersection':
+ elif observation == 'intersection':
new_obs_order = self._intersect_id_order(
self.ids(axis='observation'), other.ids(axis='observation'))
else:
@@ -4045,9 +4045,10 @@ html
mat = self.matrix_data.toarray()
constructor = pd.DataFrame
else:
- mat = [pd.SparseSeries(r.toarray().squeeze())
- for r in self.matrix_data.tocsr()]
- constructor = pd.SparseDataFrame
+ mat = self.matrix_data
+ constructor = partial(pd.SparseDataFrame,
+ default_fill_value=0,
+ copy=True)
return constructor(mat, index=index, columns=columns)
@@ -4688,6 +4689,14 @@ html
.. shownumpydoc
"""
+ def isfloat(value):
+ # see https://stackoverflow.com/a/20929881
+ try:
+ float(value)
+ return True
+ except ValueError:
+ return False
+
if not isinstance(lines, list):
try:
hasattr(lines, 'seek')
@@ -4706,37 +4715,28 @@ html
# Covers the case where the first line is the header
# and there is no indication of it (no comment character)
if not header:
- header = line.strip().split(delim)[1:]
+ header = line.rstrip().split(delim)[1:]
data_start = list_index + 1
else:
data_start = list_index
break
list_index += 1
header = line.strip().split(delim)[1:]
- # If the first line is the header, then we need to get the next
+
+ # If the first line is the header, then we need to get the data lines
# line for the "last column" check
if isinstance(lines, list):
- line = lines[data_start]
+ value_checks = lines[data_start:]
else:
lines.seek(0)
- for index in range(0, data_start + 1):
- line = lines.readline()
+ for index in range(0, data_start):
+ lines.readline()
+ value_checks = [line for line in lines]
# attempt to determine if the last column is non-numeric, ie, metadata
- first_values = line.strip().split(delim)
- last_value = first_values[-1]
- last_column_is_numeric = True
-
- if '.' in last_value:
- try:
- float(last_value)
- except ValueError:
- last_column_is_numeric = False
- else:
- try:
- int(last_value)
- except ValueError:
- last_column_is_numeric = False
+ last_values = [line.rsplit(delim, 1)[-1].strip()
+ for line in value_checks]
+ last_column_is_numeric = all([isfloat(i) for i in last_values])
# determine sample ids
if last_column_is_numeric:
@@ -4761,13 +4761,13 @@ html
lines = lines[data_start:]
for lineno, line in enumerate(lines, data_start):
- line = line.strip()
- if not line:
+ if not line.strip():
continue
if line.startswith('#'):
continue
- fields = line.strip().split(delim)
+ fields = line.split(delim)
+ fields[-1] = fields[-1].strip()
obs_ids.append(fields[0])
if last_column_is_numeric:
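
A quick sketch (not part of the upstream diff) of the `to_dataframe` fix, assuming a pandas version that still provides `SparseDataFrame`; the toy table is made up:

    import numpy as np
    from biom import Table

    table = Table(np.array([[0.0, 1.0, 2.0], [3.0, 0.0, 0.0]]),
                  ['O1', 'O2'], ['S1', 'S2', 'S3'])

    sparse_df = table.to_dataframe()           # genuinely sparse frame now
    dense_df = table.to_dataframe(dense=True)  # plain pd.DataFrame, as before

    # Density equals the fraction of stored (nonzero) values (0.5 here),
    # which is what the new test_to_dataframe_is_sparse test asserts.
    print(sparse_df.density)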
=====================================
biom/tests/test_cli/test_validate_table.py
=====================================
@@ -121,9 +121,8 @@ class TableValidatorTests(TestCase):
f.close()
self.to_remove.append('valid_test3')
- obs = self.cmd(table='valid_test3', detailed_report=True)
+ obs = self.cmd(table='valid_test3')
self.assertTrue(obs['valid_table'])
- self.assertTrue(len(obs['report_lines']) > 0)
def test_invalid(self):
"""Correctly invalidates a table that is... invalid."""
=====================================
biom/tests/test_err.py
=====================================
@@ -11,12 +11,14 @@
from unittest import TestCase, main
from copy import deepcopy
+import numpy as np
+
from biom import example_table, Table
from biom.exception import TableException
-from biom.err import (_test_empty, _test_obssize, _test_sampsize, _test_obsdup,
- _test_sampdup, _test_obsmdsize, _test_sampmdsize,
- errstate, geterr, seterr, geterrcall, seterrcall,
- errcheck, __errprof)
+from biom.err import (_zz_test_empty, _test_obssize, _test_sampsize,
+ _test_obsdup, _test_sampdup, _test_obsmdsize,
+ _test_sampmdsize, errstate, geterr, seterr, geterrcall,
+ seterrcall, errcheck, __errprof)
runtime_ep = __errprof
@@ -30,8 +32,8 @@ class ErrModeTests(TestCase):
self.ex_table = example_table.copy()
def test_test_empty(self):
- self.assertTrue(_test_empty(Table([], [], [])))
- self.assertFalse(_test_empty(self.ex_table))
+ self.assertTrue(_zz_test_empty(Table([], [], [])))
+ self.assertFalse(_zz_test_empty(self.ex_table))
def test_test_obssize(self):
self.assertFalse(_test_obssize(self.ex_table))
@@ -87,6 +89,17 @@ class ErrorProfileTests(TestCase):
self.assertTrue(isinstance(self.ep.test(self.ex_table, 'obssize'),
TableException))
+ def test_test_evaluation_order(self):
+ # issue 813
+ tab = Table(np.array([[1, 2], [3, 4]]), ['A', 'B'], ['C', 'D'])
+ tab._observation_ids = np.array(['A', 'A'], dtype='object')
+ tab._sample_ids = np.array(['B', 'B'], dtype='object')
+
+ self.assertEqual(self.ep.test(tab, 'obsdup', 'sampdup').args[0],
+ 'Duplicate observation IDs')
+ self.assertEqual(self.ep.test(tab, 'sampdup', 'obsdup').args[0],
+ 'Duplicate observation IDs')
+
def test_state(self):
self.ep.state = {'all': 'ignore'}
self.assertEqual(set(self.ep._state.values()), set(['ignore']))
=====================================
biom/tests/test_parse.py
=====================================
@@ -16,7 +16,8 @@ from unittest import TestCase, main
import numpy as np
import numpy.testing as npt
-from biom.parse import generatedby, MetadataMap, parse_biom_table, parse_uc
+from biom.parse import (generatedby, MetadataMap, parse_biom_table, parse_uc,
+ load_table)
from biom.table import Table
from biom.util import HAVE_H5PY, __version__
from biom.tests.long_lines import (uc_empty, uc_invalid_id, uc_minimal,
@@ -237,6 +238,32 @@ class ParseTests(TestCase):
Table.from_hdf5(h5py.File('test_data/test.biom'))
os.chdir(cwd)
+ @npt.dec.skipif(HAVE_H5PY is False, msg='H5PY is not installed')
+ def test_load_table_filepath(self):
+ cwd = os.getcwd()
+ if '/' in __file__[1:]:
+ os.chdir(__file__.rsplit('/', 1)[0])
+ load_table('test_data/test.biom')
+ os.chdir(cwd)
+
+ @npt.dec.skipif(HAVE_H5PY is False, msg='H5PY is not installed')
+ def test_load_table_inmemory(self):
+ cwd = os.getcwd()
+ if '/' in __file__[1:]:
+ os.chdir(__file__.rsplit('/', 1)[0])
+ load_table(h5py.File('test_data/test.biom'))
+ os.chdir(cwd)
+
+ def test_load_table_inmemory_json(self):
+ cwd = os.getcwd()
+ if '/' in __file__[1:]:
+ os.chdir(__file__.rsplit('/', 1)[0])
+ load_table(open('test_data/test.json'))
+ os.chdir(cwd)
+
+ def test_load_table_inmemory_stringio(self):
+ load_table(StringIO('\n'.join(self.classic_otu_table1_no_tax)))
+
def test_parse_biom_table(self):
"""tests for parse_biom_table when we do not have h5py"""
# This is a TSV as a list of lines
=====================================
biom/tests/test_table.py
=====================================
@@ -1475,10 +1475,17 @@ class TableTests(TestCase):
def test_to_dataframe(self):
exp = pd.SparseDataFrame(np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]),
index=['O1', 'O2'],
- columns=['S1', 'S2', 'S3'])
+ columns=['S1', 'S2', 'S3'],
+ default_fill_value=0.0)
obs = example_table.to_dataframe()
pdt.assert_frame_equal(obs, exp)
+ def test_to_dataframe_is_sparse(self):
+ df = example_table.to_dataframe()
+ density = (float(example_table.matrix_data.getnnz()) /
+ np.prod(example_table.shape))
+ assert np.allclose(df.density, density)
+
def test_to_dataframe_dense(self):
exp = pd.DataFrame(np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]),
index=['O1', 'O2'],
@@ -2228,9 +2235,9 @@ class SparseTableTests(TestCase):
exp_obs = np.array([14, 15, 0])
exp_whole = np.array([29])
- obs_samp = st.nonzero_counts('sample')
- obs_obs = st.nonzero_counts('observation')
- obs_whole = st.nonzero_counts('whole')
+ obs_samp = st.nonzero_counts('sample', binary=False)
+ obs_obs = st.nonzero_counts('observation', binary=False)
+ obs_whole = st.nonzero_counts('whole', binary=False)
npt.assert_equal(obs_samp, exp_samp)
npt.assert_equal(obs_obs, exp_obs)
@@ -3771,6 +3778,47 @@ class SparseTableTests(TestCase):
obs = Table._extract_data_from_tsv(input, dtype=int)
npt.assert_equal(obs, exp)
+ def test_extract_data_from_tsv_bad_metadata(self):
+ input = legacy_otu_table_bad_metadata.splitlines()
+ samp_ids = ['Fing', 'Key', 'NA']
+ obs_ids = ['0', '1', '7', '3', '4']
+ metadata = [
+ '',
+ 'Bacteria; Firmicutes; Alicyclobacillaceae; Bacilli; Lactobacillal'
+ 'es; Lactobacillales; Streptococcaceae; Streptococcus',
+ 'Bacteria; Actinobacteria; Actinobacteridae; Gordoniaceae; Coryneb'
+ 'acteriaceae',
+ 'Bacteria; Firmicutes; Alicyclobacillaceae; Bacilli; Staphylococca'
+ 'ceae',
+ 'Bacteria; Cyanobacteria; Chloroplasts; vectors']
+ md_name = 'Consensus Lineage'
+ data = [[0, 0, 19111], [0, 1, 44536], [0, 2, 42],
+ [1, 0, 1216], [1, 1, 3500], [1, 2, 6],
+ [2, 0, 1803], [2, 1, 1184], [2, 2, 2],
+ [3, 0, 1722], [3, 1, 4903], [3, 2, 17],
+ [4, 0, 589], [4, 1, 2074], [4, 2, 34]]
+
+ exp = (samp_ids, obs_ids, data, metadata, md_name)
+ obs = Table._extract_data_from_tsv(input, dtype=int)
+ npt.assert_equal(obs, exp)
+
+ # and assert the exact identified bug in #827 is resolved
+ input = extract_tsv_bug.splitlines()
+ samp_ids = ['s1', 's2']
+ obs_ids = ['1', '2', '3']
+ metadata = [
+ '',
+ 'k__test;p__test',
+ 'k__test;p__test']
+ md_name = 'taxonomy'
+ data = [[0, 0, 123], [0, 1, 32],
+ [1, 0, 315], [1, 1, 3],
+ [2, 1, 22]]
+
+ exp = (samp_ids, obs_ids, data, metadata, md_name)
+ obs = Table._extract_data_from_tsv(input, dtype=int)
+ npt.assert_equal(obs, exp)
+
def test_identify_bad_value(self):
pos = [str(i) for i in range(10)]
exp = (None, None)
@@ -4116,6 +4164,21 @@ ae; Corynebacteriaceae
aphylococcaceae
4\t589\t2074\t34\tBacteria; Cyanobacteria; Chloroplasts; vectors
"""
+legacy_otu_table_bad_metadata = u"""# some comment goes here
+#OTU id\tFing\tKey\tNA\tConsensus Lineage
+0\t19111\t44536\t42 \t
+1\t1216\t3500\t6\tBacteria; Firmicutes; Alicyclobacillaceae; Bacilli; La\
+ctobacillales; Lactobacillales; Streptococcaceae; Streptococcus
+7\t1803\t1184\t2\tBacteria; Actinobacteria; Actinobacteridae; Gordoniace\
+ae; Corynebacteriaceae
+3\t1722\t4903\t17\tBacteria; Firmicutes; Alicyclobacillaceae; Bacilli; St\
+aphylococcaceae
+4\t589\t2074\t34\tBacteria; Cyanobacteria; Chloroplasts; vectors
+"""
+extract_tsv_bug = """#OTU ID s1 s2 taxonomy
+1 123 32\t
+2 315 3 k__test;p__test
+3 0 22 k__test;p__test"""
otu_table1 = u"""# Some comment
#OTU ID\tFing\tKey\tNA\tConsensus Lineage
0\t19111\t44536\t42\tBacteria; Actinobacteria; Actinobacteridae; \
=====================================
biom/util.py
=====================================
@@ -9,7 +9,6 @@
# ----------------------------------------------------------------------------
import os
-import sys
import inspect
from contextlib import contextmanager
import io
@@ -27,12 +26,8 @@ try:
import h5py
HAVE_H5PY = True
- if sys.version_info.major == 2:
- H5PY_VLEN_STR = h5py.special_dtype(vlen=unicode) # noqa
- H5PY_VLEN_UNICODE = h5py.special_dtype(vlen=unicode) # noqa
- else:
- H5PY_VLEN_STR = h5py.special_dtype(vlen=str)
- H5PY_VLEN_UNICODE = h5py.special_dtype(vlen=str)
+ H5PY_VLEN_STR = h5py.special_dtype(vlen=str)
+ H5PY_VLEN_UNICODE = h5py.special_dtype(vlen=str)
except ImportError:
HAVE_H5PY = False
@@ -50,7 +45,7 @@ __url__ = "http://biom-format.org"
__maintainer__ = "Daniel McDonald"
__email__ = "daniel.mcdonald at colorado.edu"
__format_version__ = (2, 1)
-__version__ = "2.1.7"
+__version__ = "2.1.8"
def generate_subsamples(table, n, axis='sample', by_id=False):
@@ -390,7 +385,8 @@ def is_gzip(fp):
project, but we obtained permission from the authors of this function to
port it to the BIOM Format project (and keep it under BIOM's BSD license).
"""
- return open(fp, 'rb').read(2) == b'\x1f\x8b'
+ with open(fp, 'rb') as f:
+ return f.read(2) == b'\x1f\x8b'
@contextmanager
=====================================
doc/conf.py
=====================================
@@ -66,8 +66,8 @@ copyright = u'2011-2018 The BIOM Format Development Team'
# built documents.
#
# The full version, including alpha/beta/rc tags.
-version = "2.1.7"
-release = "2.1.7"
+version = "2.1.8"
+release = "2.1.8"
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
=====================================
setup.py
=====================================
@@ -9,7 +9,6 @@
# The full license is in the file COPYING.txt, distributed with this software.
# ----------------------------------------------------------------------------
-import os
import sys
from setuptools import setup, find_packages
@@ -21,6 +20,13 @@ try:
except ImportError:
raise ImportError("numpy must be installed prior to installing biom")
+
+try:
+ from Cython.Build import cythonize
+except ImportError:
+ raise ImportError("cython must be installed prior to installing biom")
+
+
# Hack to prevent stupid "TypeError: 'NoneType' object is not callable" error
# in multiprocessing/util.py _exit_function when running `python
# setup.py test` (see
@@ -37,7 +43,7 @@ __copyright__ = "Copyright 2011-2017, The BIOM Format Development Team"
__credits__ = ["Greg Caporaso", "Daniel McDonald", "Jose Clemente",
"Jai Ram Rideout", "Jorge CaƱardo Alastuey", "Michael Hall"]
__license__ = "BSD"
-__version__ = "2.1.7"
+__version__ = "2.1.8"
__maintainer__ = "Daniel McDonald"
__email__ = "mcdonadt at colorado.edu"
@@ -92,10 +98,9 @@ classes = """
Topic :: Software Development :: Libraries :: Application Frameworks
Topic :: Software Development :: Libraries :: Python Modules
Programming Language :: Python
- Programming Language :: Python :: 2.7
- Programming Language :: Python :: 3.4
- Programming Language :: Python :: 3.5
Programming Language :: Python :: 3.6
+ Programming Language :: Python :: 3.7
+ Programming Language :: Python :: 3.8
Programming Language :: Python :: Implementation :: CPython
Operating System :: OS Independent
Operating System :: POSIX :: Linux
@@ -104,8 +109,7 @@ classes = """
classifiers = [s.strip() for s in classes.split('\n') if s]
# Dealing with Cython
-USE_CYTHON = os.environ.get('USE_CYTHON', False)
-ext = '.pyx' if USE_CYTHON else '.c'
+ext = '.pyx'
extensions = [Extension("biom._filter",
["biom/_filter" + ext],
include_dirs=[np.get_include()]),
@@ -115,22 +119,15 @@ extensions = [Extension("biom._filter",
Extension("biom._subsample",
["biom/_subsample" + ext],
include_dirs=[np.get_include()])]
-
-if USE_CYTHON:
- from Cython.Build import cythonize
- extensions = cythonize(extensions)
+extensions = cythonize(extensions)
install_requires = ["click", "numpy >= 1.9.2", "future >= 0.16.0",
- "scipy >= 0.13.0", 'pandas >= 0.20.0',
- "six >= 1.10.0"]
+ "scipy >= 1.3.1", 'pandas >= 0.20.0',
+ "six >= 1.10.0", "cython >= 0.29"]
-# HACK: for backward-compatibility with QIIME 1.9.x, pyqi must be installed.
-# pyqi is not used anymore in this project.
if sys.version_info[0] < 3:
- install_requires.append("pyqi")
- import warnings
- warnings.warn("Python 2.7 support will be removed on the next release",
- DeprecationWarning)
+ raise SystemExit("Python 2.7 is no longer supported")
+
setup(name='biom-format',
version=__version__,
View it on GitLab: https://salsa.debian.org/med-team/python-biom-format/commit/d993b066e9879c3f852b79a16221b748d9cdf615