[med-svn] [Git][med-team/python-xopen][master] 6 commits: New upstream version 1.2.0
Nilesh Patra (@nilesh)
gitlab at salsa.debian.org
Tue Sep 28 16:57:16 BST 2021
Nilesh Patra pushed to branch master at Debian Med / python-xopen
Commits:
79cdd80a by Nilesh Patra at 2021-09-28T21:20:35+05:30
New upstream version 1.2.0
- - - - -
4b692083 by Nilesh Patra at 2021-09-28T21:20:35+05:30
Update upstream source from tag 'upstream/1.2.0'
Update to upstream version '1.2.0'
with Debian dir d31410f56b3d0b6885cf8f201a59c720dcc55275
- - - - -
70d8fbb5 by Nilesh Patra at 2021-09-28T21:21:11+05:30
d/copyright: Update copyright holder
- - - - -
76a9bc2a by Nilesh Patra at 2021-09-28T21:23:30+05:30
d/copyright: Change MIT => Expat
- - - - -
e3c2a1e4 by Nilesh Patra at 2021-09-28T21:23:51+05:30
Bump Standards-Version to 4.6.0 (no changes needed)
- - - - -
fb5e8fb4 by Nilesh Patra at 2021-09-28T21:24:36+05:30
Upload to unstable
- - - - -
18 changed files:
- .github/workflows/ci.yml
- LICENSE
- PKG-INFO
- README.rst
- debian/changelog
- debian/control
- debian/copyright
- pyproject.toml
- setup.cfg
- setup.py
- src/xopen.egg-info/PKG-INFO
- src/xopen.egg-info/SOURCES.txt
- src/xopen.egg-info/requires.txt
- src/xopen/__init__.py
- src/xopen/_version.py
- + tests/file.txt.test
- tests/test_xopen.py
- tox.ini
Changes:
=====================================
.github/workflows/ci.yml
=====================================
@@ -4,7 +4,7 @@ on: [push, pull_request]
jobs:
lint:
- timeout-minutes: 5
+ timeout-minutes: 10
runs-on: ubuntu-latest
strategy:
matrix:
@@ -22,12 +22,12 @@ jobs:
run: tox -e ${{ matrix.toxenv }}
test:
- timeout-minutes: 5
+ timeout-minutes: 10
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
- python-version: [3.6, 3.7, 3.8, 3.9, pypy3]
+ python-version: ["3.6", "3.7", "3.8", "3.9", "pypy-3.7"]
include:
- os: macos-latest
python-version: 3.7
@@ -35,13 +35,12 @@ jobs:
python-version: 3.7
with-isal: true
steps:
- - name: Install pigz
- run: >
- if [[ ${{ startsWith(matrix.os, 'macos') }} = true ]]; then
- brew install pigz;
- else
- sudo apt-get install pigz;
- fi
+ - name: Install pigz and pbzip2 MacOS
+ if: startsWith(matrix.os, 'macos')
+ run: brew install pigz pbzip2
+ - name: Install pigz and pbzip2 Linux
+ if: startsWith(matrix.os, 'ubuntu')
+ run: sudo apt-get install pigz pbzip2
- name: Install isal
if: matrix.with-isal && !startsWith(matrix.os, 'macos')
run: sudo apt-get install isal libisal-dev
@@ -62,7 +61,7 @@ jobs:
uses: codecov/codecov-action@v1
deploy:
- timeout-minutes: 5
+ timeout-minutes: 10
runs-on: ubuntu-latest
needs: [lint, test]
if: startsWith(github.ref, 'refs/tags')
@@ -76,8 +75,8 @@ jobs:
python-version: 3.7
- name: Make distributions
run: |
- python setup.py sdist
- python -m pip wheel --no-deps -w dist/ .
+ python -m pip install build
+ python -m build
ls -l dist/
- name: Publish to PyPI
uses: pypa/gh-action-pypi-publish@v1.4.1
=====================================
LICENSE
=====================================
@@ -1,4 +1,4 @@
-Copyright (c) 2010-2019 Marcel Martin <mail at marcelm.net>
+Copyright (c) 2010-2021 xopen developers
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
=====================================
PKG-INFO
=====================================
@@ -1,181 +1,194 @@
Metadata-Version: 2.1
Name: xopen
-Version: 1.1.0
+Version: 1.2.0
Summary: Open compressed files transparently
-Home-page: https://github.com/marcelm/xopen/
-Author: Marcel Martin
+Home-page: https://github.com/pycompression/xopen/
+Author: Marcel Martin et al.
Author-email: mail at marcelm.net
License: MIT
-Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
- :target: https://travis-ci.org/marcelm/xopen
- :alt:
-
- .. image:: https://img.shields.io/pypi/v/xopen.svg?branch=master
- :target: https://pypi.python.org/pypi/xopen
-
- .. image:: https://img.shields.io/conda/v/conda-forge/xopen.svg
- :target: https://anaconda.org/conda-forge/xopen
- :alt:
-
- .. image:: https://codecov.io/gh/marcelm/xopen/branch/master/graph/badge.svg
- :target: https://codecov.io/gh/marcelm/xopen
- :alt:
-
- =====
- xopen
- =====
-
- This small Python module provides an ``xopen`` function that works like the
- built-in ``open`` function, but can also deal with compressed files.
- Supported compression formats are gzip, bzip2 and xz. They are automatically
- recognized by their file extensions `.gz`, `.bz2` or `.xz`.
-
- The focus is on being as efficient as possible on all supported Python versions.
- For example, ``xopen`` uses ``pigz``, which is a parallel version of ``gzip``,
- to open ``.gz`` files, which is faster than using the built-in ``gzip.open``
- function. ``pigz`` can use multiple threads when compressing, but is also faster
- when reading ``.gz`` files, so it is used both for reading and writing if it is
- available. For gzip compression levels 1 to 3,
- `igzip <https://github.com/intel/isa-l/>`_ is used for an even greater speedup.
-
- For use cases where using only the main thread is desired xopen can be used
- with ``threads=0``. This will use `python-isal
- <https://github.com/pycompression/python-isal>`_ (which binds isa-l) if
- python-isal is installed (automatic on Linux systems, as it is a requirement).
- For installation instructions for python-isal please
- checkout the `python-isal homepage
- <https://github.com/pycompression/python-isal>`_. If python-isal is not
- available ``gzip.open`` is used.
-
- This module has originally been developed as part of the `Cutadapt
- tool <https://cutadapt.readthedocs.io/>`_ that is used in bioinformatics to
- manipulate sequencing data. It has been in successful use within that software
- for a few years.
-
- ``xopen`` is compatible with Python versions 3.6 and later.
-
-
- Usage
- -----
-
- Open a file for reading::
-
- from xopen import xopen
-
- with xopen('file.txt.xz') as f:
- content = f.read()
-
- Or without context manager::
-
- from xopen import xopen
-
- f = xopen('file.txt.xz')
- content = f.read()
- f.close()
-
- Open a file in binary mode for writing::
-
- from xopen import xopen
-
- with xopen('file.txt.gz', mode='wb') as f:
- f.write(b'Hello')
-
-
- Credits
- -------
-
- The name ``xopen`` was taken from the C function of the same name in the
- `utils.h file which is part of
- BWA <https://github.com/lh3/bwa/blob/83662032a2192d5712996f36069ab02db82acf67/utils.h>`_.
-
- Kyle Beauchamp <https://github.com/kyleabeauchamp/> has contributed support for
- appending to files.
-
- Ruben Vorderman <https://github.com/rhpvorderman/> contributed improvements to
- make reading and writing gzipped files faster.
-
- Benjamin Vaisvil <https://github.com/bvaisvil> contributed support for
- format detection from content.
-
- Some ideas were taken from the `canopener project <https://github.com/selassid/canopener>`_.
- If you also want to open S3 files, you may want to use that module instead.
-
-
- Changes
- -------
- v1.1.0
- ~~~~~~
- * Python 3.5 support is dropped.
- * On Linux systems, `python-isal <https://github.com/pycompression/python-isal>`_
- is now added as a requirement. This will speed up the reading of gzip files
- significantly when no external processes are used.
-
- v1.0.0
- ~~~~~~
- * If installed, the ``igzip`` program (part of
- `Intel ISA-L <https://github.com/intel/isa-l/>`_) is now used for reading
- and writing gzip-compressed files at compression levels 1-3, which results
- in a significant speedup.
-
- v0.9.0
- ~~~~~~
- * When the file name extension of a file to be opened for reading is not
- available, the content is inspected (if possible) and used to determine
- which compression format applies.
- * This release drops Python 2.7 and 3.4 support. Python 3.5 or later is
- now required.
-
- v0.8.4
- ~~~~~~
- * When reading gzipped files, force ``pigz`` to use only a single process.
- ``pigz`` cannot use multiple cores anyway when decompressing. By default,
- it would use extra I/O processes, which slightly reduces wall-clock time,
- but increases CPU time. Single-core decompression with ``pigz`` is still
- about twice as fast as regular ``gzip``.
- * Allow ``threads=0`` for specifying that no external ``pigz``/``gzip``
- process should be used (then regular ``gzip.open()`` is used instead).
-
- v0.8.3
- ~~~~~~
- * When reading gzipped files, let ``pigz`` use at most four threads by default.
- This limit previously only applied when writing to a file.
- * Support Python 3.8
-
- v0.8.0
- ~~~~~~
- * Speed improvements when iterating over gzipped files.
-
- v0.6.0
- ~~~~~~
- * For reading from gzipped files, xopen will now use a ``pigz`` subprocess.
- This is faster than using ``gzip.open``.
- * Python 2 support will be dropped in one of the next releases.
-
- v0.5.0
- ~~~~~~
- * By default, pigz is now only allowed to use at most four threads. This hopefully reduces
- problems some users had with too many threads when opening many files at the same time.
- * xopen now accepts pathlib.Path objects.
-
-
- Contributors
- ------------
-
- * Marcel Martin
- * Ruben Vorderman
- * For more contributors, see <https://github.com/marcelm/xopen/graphs/contributors>
-
-
- Links
- -----
-
- * `Source code <https://github.com/marcelm/xopen/>`_
- * `Report an issue <https://github.com/marcelm/xopen/issues>`_
- * `Project page on PyPI (Python package index) <https://pypi.python.org/pypi/xopen/>`_
-
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.6
Provides-Extra: dev
+License-File: LICENSE
+
+.. image:: https://github.com/pycompression/xopen/workflows/CI/badge.svg
+ :target: https://github.com/pycompression/xopen
+ :alt:
+
+.. image:: https://img.shields.io/pypi/v/xopen.svg?branch=master
+ :target: https://pypi.python.org/pypi/xopen
+
+.. image:: https://img.shields.io/conda/v/conda-forge/xopen.svg
+ :target: https://anaconda.org/conda-forge/xopen
+ :alt:
+
+.. image:: https://codecov.io/gh/pycompression/xopen/branch/main/graph/badge.svg
+ :target: https://codecov.io/gh/pycompression/xopen
+ :alt:
+
+=====
+xopen
+=====
+
+This small Python module provides an ``xopen`` function that works like the
+built-in ``open`` function, but can also deal with compressed files.
+Supported compression formats are gzip, bzip2 and xz. They are automatically
+recognized by their file extensions `.gz`, `.bz2` or `.xz`.
+
+The focus is on being as efficient as possible on all supported Python versions.
+For example, ``xopen`` uses ``pigz``, which is a parallel version of ``gzip``,
+to open ``.gz`` files, which is faster than using the built-in ``gzip.open``
+function. ``pigz`` can use multiple threads when compressing, but is also faster
+when reading ``.gz`` files, so it is used both for reading and writing if it is
+available. For gzip compression levels 1 to 3,
+`igzip <https://github.com/intel/isa-l/>`_ is used for an even greater speedup.
+
+For use cases where using only the main thread is desired xopen can be used
+with ``threads=0``. This will use `python-isal
+<https://github.com/pycompression/python-isal>`_ (which binds isa-l) if
+python-isal is installed (automatic on Linux systems, as it is a requirement).
+For installation instructions for python-isal please
+checkout the `python-isal homepage
+<https://github.com/pycompression/python-isal>`_. If python-isal is not
+available ``gzip.open`` is used.
+
+This module has originally been developed as part of the `Cutadapt
+tool <https://cutadapt.readthedocs.io/>`_ that is used in bioinformatics to
+manipulate sequencing data. It has been in successful use within that software
+for a few years.
+
+``xopen`` is compatible with Python versions 3.6 and later.
+
+
+Usage
+-----
+
+Open a file for reading::
+
+ from xopen import xopen
+
+ with xopen('file.txt.xz') as f:
+ content = f.read()
+
+Or without context manager::
+
+ from xopen import xopen
+
+ f = xopen('file.txt.xz')
+ content = f.read()
+ f.close()
+
+Open a file in binary mode for writing::
+
+ from xopen import xopen
+
+ with xopen('file.txt.gz', mode='wb') as f:
+ f.write(b'Hello')
+
+
+Credits
+-------
+
+The name ``xopen`` was taken from the C function of the same name in the
+`utils.h file which is part of
+BWA <https://github.com/lh3/bwa/blob/83662032a2192d5712996f36069ab02db82acf67/utils.h>`_.
+
+Kyle Beauchamp <https://github.com/kyleabeauchamp/> has contributed support for
+appending to files.
+
+Ruben Vorderman <https://github.com/rhpvorderman/> contributed improvements to
+make reading and writing gzipped files faster.
+
+Benjamin Vaisvil <https://github.com/bvaisvil> contributed support for
+format detection from content.
+
+Dries Schaumont <https://github.com/DriesSchaumont> contributed support for
+faster bz2 reading and writing using pbzip2.
+
+Some ideas were taken from the `canopener project <https://github.com/selassid/canopener>`_.
+If you also want to open S3 files, you may want to use that module instead.
+
+
+Changes
+-------
+
+v1.2.0
+~~~~~~
+
+* `pbzip2 <http://compression.ca/pbzip2/>`_ is now used to open ``.bz2`` files if
+ ``threads`` is greater than zero.
+
+v1.1.0
+~~~~~~
+* Python 3.5 support is dropped.
+* On Linux systems, `python-isal <https://github.com/pycompression/python-isal>`_
+ is now added as a requirement. This will speed up the reading of gzip files
+ significantly when no external processes are used.
+
+v1.0.0
+~~~~~~
+* If installed, the ``igzip`` program (part of
+ `Intel ISA-L <https://github.com/intel/isa-l/>`_) is now used for reading
+ and writing gzip-compressed files at compression levels 1-3, which results
+ in a significant speedup.
+
+v0.9.0
+~~~~~~
+* When the file name extension of a file to be opened for reading is not
+ available, the content is inspected (if possible) and used to determine
+ which compression format applies.
+* This release drops Python 2.7 and 3.4 support. Python 3.5 or later is
+ now required.
+
+v0.8.4
+~~~~~~
+* When reading gzipped files, force ``pigz`` to use only a single process.
+ ``pigz`` cannot use multiple cores anyway when decompressing. By default,
+ it would use extra I/O processes, which slightly reduces wall-clock time,
+ but increases CPU time. Single-core decompression with ``pigz`` is still
+ about twice as fast as regular ``gzip``.
+* Allow ``threads=0`` for specifying that no external ``pigz``/``gzip``
+ process should be used (then regular ``gzip.open()`` is used instead).
+
+v0.8.3
+~~~~~~
+* When reading gzipped files, let ``pigz`` use at most four threads by default.
+ This limit previously only applied when writing to a file.
+* Support Python 3.8
+
+v0.8.0
+~~~~~~
+* Speed improvements when iterating over gzipped files.
+
+v0.6.0
+~~~~~~
+* For reading from gzipped files, xopen will now use a ``pigz`` subprocess.
+ This is faster than using ``gzip.open``.
+* Python 2 support will be dropped in one of the next releases.
+
+v0.5.0
+~~~~~~
+* By default, pigz is now only allowed to use at most four threads. This hopefully reduces
+ problems some users had with too many threads when opening many files at the same time.
+* xopen now accepts pathlib.Path objects.
+
+
+Contributors
+------------
+
+* Marcel Martin
+* Ruben Vorderman
+* For more contributors, see <https://github.com/pycompression/xopen/graphs/contributors>
+
+
+Links
+-----
+
+* `Source code <https://github.com/pycompression/xopen/>`_
+* `Report an issue <https://github.com/pycompression/xopen/issues>`_
+* `Project page on PyPI (Python package index) <https://pypi.python.org/pypi/xopen/>`_
+
+
=====================================
README.rst
=====================================
@@ -1,5 +1,5 @@
-.. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
- :target: https://travis-ci.org/marcelm/xopen
+.. image:: https://github.com/pycompression/xopen/workflows/CI/badge.svg
+ :target: https://github.com/pycompression/xopen
:alt:
.. image:: https://img.shields.io/pypi/v/xopen.svg?branch=master
@@ -9,8 +9,8 @@
:target: https://anaconda.org/conda-forge/xopen
:alt:
-.. image:: https://codecov.io/gh/marcelm/xopen/branch/master/graph/badge.svg
- :target: https://codecov.io/gh/marcelm/xopen
+.. image:: https://codecov.io/gh/pycompression/xopen/branch/main/graph/badge.svg
+ :target: https://codecov.io/gh/pycompression/xopen
:alt:
=====
@@ -89,12 +89,22 @@ make reading and writing gzipped files faster.
Benjamin Vaisvil <https://github.com/bvaisvil> contributed support for
format detection from content.
+Dries Schaumont <https://github.com/DriesSchaumont> contributed support for
+faster bz2 reading and writing using pbzip2.
+
Some ideas were taken from the `canopener project <https://github.com/selassid/canopener>`_.
If you also want to open S3 files, you may want to use that module instead.
Changes
-------
+
+v1.2.0
+~~~~~~
+
+* `pbzip2 <http://compression.ca/pbzip2/>`_ is now used to open ``.bz2`` files if
+ ``threads`` is greater than zero.
+
v1.1.0
~~~~~~
* Python 3.5 support is dropped.
@@ -155,12 +165,12 @@ Contributors
* Marcel Martin
* Ruben Vorderman
-* For more contributors, see <https://github.com/marcelm/xopen/graphs/contributors>
+* For more contributors, see <https://github.com/pycompression/xopen/graphs/contributors>
Links
-----
-* `Source code <https://github.com/marcelm/xopen/>`_
-* `Report an issue <https://github.com/marcelm/xopen/issues>`_
+* `Source code <https://github.com/pycompression/xopen/>`_
+* `Report an issue <https://github.com/pycompression/xopen/issues>`_
* `Project page on PyPI (Python package index) <https://pypi.python.org/pypi/xopen/>`_
=====================================
debian/changelog
=====================================
@@ -1,3 +1,13 @@
+python-xopen (1.2.0-1) unstable; urgency=medium
+
+ * New upstream version 1.2.0
+ * d/copyright:
+ + Update copyright holder
+ + Change MIT => Expat
+ * Bump Standards-Version to 4.6.0 (no changes needed)
+
+ -- Nilesh Patra <nilesh at debian.org> Tue, 28 Sep 2021 21:23:58 +0530
+
python-xopen (1.1.0-1) unstable; urgency=medium
* New upstream version
=====================================
debian/control
=====================================
@@ -13,7 +13,7 @@ Build-Depends: debhelper-compat (= 13),
python3-setuptools-scm,
python3-pytest,
pigz
-Standards-Version: 4.5.1
+Standards-Version: 4.6.0
Vcs-Browser: https://salsa.debian.org/med-team/python-xopen
Vcs-Git: https://salsa.debian.org/med-team/python-xopen.git
Homepage: https://github.com/marcelm/xopen
=====================================
debian/copyright
=====================================
@@ -3,14 +3,14 @@ Upstream-Name: xopen
Source: https://github.com/marcelm/xopen/releases
Files: *
-Copyright: 2010-2016 Marcel Martin <mail at marcelm.net>
-License: MIT
+Copyright: 2010-2021 xopen developers
+License: Expat
Files: debian/*
Copyright: 2017 Andreas Tille <tille at debian.org>
-License: MIT
+License: Expat
-License: MIT
+License: Expat
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
=====================================
pyproject.toml
=====================================
@@ -1,2 +1,5 @@
[build-system]
-requires = ["setuptools", "wheel", "setuptools_scm"]
+requires = ["setuptools", "wheel", "setuptools_scm>=6.2"]
+
+[tool.setuptools_scm]
+write_to = "src/xopen/_version.py"
=====================================
setup.cfg
=====================================
@@ -1,16 +1,32 @@
-[bdist_wheel]
-universal = 1
+[metadata]
+name = xopen
+author = Marcel Martin et al.
+author_email = mail at marcelm.net
+url = https://github.com/pycompression/xopen/
+description = Open compressed files transparently
+long_description = file: README.rst
+license = MIT
+classifiers =
+ Development Status :: 5 - Production/Stable
+ License :: OSI Approved :: MIT License
+ Programming Language :: Python :: 3
-[coverage:run]
-parallel = True
-include =
- */site-packages/xopen/*
- tests/*
+[options]
+python_requires = >=3.6
+package_dir =
+ =src
+packages = find:
+install_requires =
+ isal>=0.9.0; platform_machine == "x86_64" or platform_machine == "AMD64" or platform_machine == "aarch64"
-[coverage:paths]
-source =
- src/
- **/site-packages/
+[options.packages.find]
+where = src
+
+[options.package_data]
+* = py.typed
+
+[options.extras_require]
+dev = pytest
[egg_info]
tag_build =
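
Note on the new install_requires marker in setup.cfg: under PEP 508, the
platform_machine marker variable corresponds to Python's platform.machine().
The following is only a minimal sketch of the same check, useful for seeing
whether the isal requirement would apply on a given machine. The helper name
isal_marker_matches is hypothetical and is not part of xopen or setuptools;
pip evaluates the real marker itself.

```python
import platform

# Machine names listed in the environment marker of the isal requirement.
ISAL_MACHINES = {"x86_64", "AMD64", "aarch64"}


def isal_marker_matches(machine=None):
    """Return True if the marker on the isal requirement would match `machine`.

    Hypothetical helper for illustration only. When `machine` is None, the
    value reported by the running interpreter is used.
    """
    if machine is None:
        machine = platform.machine()
    return machine in ISAL_MACHINES


if __name__ == "__main__":
    # Print the local machine name and whether the marker would select isal.
    print(platform.machine(), isal_marker_matches())
```

Compared with the old setup.py condition (sys_platform == "linux" and not
PyPy), the new marker selects by CPU architecture rather than by operating
system.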
=====================================
setup.py
=====================================
@@ -1,30 +1,3 @@
-import sys
-from setuptools import setup, find_packages
+from setuptools import setup
-with open('README.rst') as f:
- long_description = f.read()
-
-setup(
- name='xopen',
- use_scm_version={'write_to': 'src/xopen/_version.py'},
- setup_requires=['setuptools_scm'], # Support pip versions that don't know about pyproject.toml
- author='Marcel Martin',
- author_email='mail at marcelm.net',
- url='https://github.com/marcelm/xopen/',
- description='Open compressed files transparently',
- long_description=long_description,
- license='MIT',
- package_dir={'': 'src'},
- packages=find_packages('src'),
- package_data={"xopen": ["py.typed"]},
- extras_require={
- 'dev': ['pytest'],
- ':sys_platform=="linux" and python_implementation != "PyPy"': ['isal>=0.3.0']
- },
- python_requires='>=3.6',
- classifiers=[
- "Development Status :: 5 - Production/Stable",
- "License :: OSI Approved :: MIT License",
- "Programming Language :: Python :: 3",
- ]
-)
+setup(setup_requires=["setuptools_scm"])
=====================================
src/xopen.egg-info/PKG-INFO
=====================================
@@ -1,181 +1,194 @@
Metadata-Version: 2.1
Name: xopen
-Version: 1.1.0
+Version: 1.2.0
Summary: Open compressed files transparently
-Home-page: https://github.com/marcelm/xopen/
-Author: Marcel Martin
+Home-page: https://github.com/pycompression/xopen/
+Author: Marcel Martin et al.
Author-email: mail at marcelm.net
License: MIT
-Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
- :target: https://travis-ci.org/marcelm/xopen
- :alt:
-
- .. image:: https://img.shields.io/pypi/v/xopen.svg?branch=master
- :target: https://pypi.python.org/pypi/xopen
-
- .. image:: https://img.shields.io/conda/v/conda-forge/xopen.svg
- :target: https://anaconda.org/conda-forge/xopen
- :alt:
-
- .. image:: https://codecov.io/gh/marcelm/xopen/branch/master/graph/badge.svg
- :target: https://codecov.io/gh/marcelm/xopen
- :alt:
-
- =====
- xopen
- =====
-
- This small Python module provides an ``xopen`` function that works like the
- built-in ``open`` function, but can also deal with compressed files.
- Supported compression formats are gzip, bzip2 and xz. They are automatically
- recognized by their file extensions `.gz`, `.bz2` or `.xz`.
-
- The focus is on being as efficient as possible on all supported Python versions.
- For example, ``xopen`` uses ``pigz``, which is a parallel version of ``gzip``,
- to open ``.gz`` files, which is faster than using the built-in ``gzip.open``
- function. ``pigz`` can use multiple threads when compressing, but is also faster
- when reading ``.gz`` files, so it is used both for reading and writing if it is
- available. For gzip compression levels 1 to 3,
- `igzip <https://github.com/intel/isa-l/>`_ is used for an even greater speedup.
-
- For use cases where using only the main thread is desired xopen can be used
- with ``threads=0``. This will use `python-isal
- <https://github.com/pycompression/python-isal>`_ (which binds isa-l) if
- python-isal is installed (automatic on Linux systems, as it is a requirement).
- For installation instructions for python-isal please
- checkout the `python-isal homepage
- <https://github.com/pycompression/python-isal>`_. If python-isal is not
- available ``gzip.open`` is used.
-
- This module has originally been developed as part of the `Cutadapt
- tool <https://cutadapt.readthedocs.io/>`_ that is used in bioinformatics to
- manipulate sequencing data. It has been in successful use within that software
- for a few years.
-
- ``xopen`` is compatible with Python versions 3.6 and later.
-
-
- Usage
- -----
-
- Open a file for reading::
-
- from xopen import xopen
-
- with xopen('file.txt.xz') as f:
- content = f.read()
-
- Or without context manager::
-
- from xopen import xopen
-
- f = xopen('file.txt.xz')
- content = f.read()
- f.close()
-
- Open a file in binary mode for writing::
-
- from xopen import xopen
-
- with xopen('file.txt.gz', mode='wb') as f:
- f.write(b'Hello')
-
-
- Credits
- -------
-
- The name ``xopen`` was taken from the C function of the same name in the
- `utils.h file which is part of
- BWA <https://github.com/lh3/bwa/blob/83662032a2192d5712996f36069ab02db82acf67/utils.h>`_.
-
- Kyle Beauchamp <https://github.com/kyleabeauchamp/> has contributed support for
- appending to files.
-
- Ruben Vorderman <https://github.com/rhpvorderman/> contributed improvements to
- make reading and writing gzipped files faster.
-
- Benjamin Vaisvil <https://github.com/bvaisvil> contributed support for
- format detection from content.
-
- Some ideas were taken from the `canopener project <https://github.com/selassid/canopener>`_.
- If you also want to open S3 files, you may want to use that module instead.
-
-
- Changes
- -------
- v1.1.0
- ~~~~~~
- * Python 3.5 support is dropped.
- * On Linux systems, `python-isal <https://github.com/pycompression/python-isal>`_
- is now added as a requirement. This will speed up the reading of gzip files
- significantly when no external processes are used.
-
- v1.0.0
- ~~~~~~
- * If installed, the ``igzip`` program (part of
- `Intel ISA-L <https://github.com/intel/isa-l/>`_) is now used for reading
- and writing gzip-compressed files at compression levels 1-3, which results
- in a significant speedup.
-
- v0.9.0
- ~~~~~~
- * When the file name extension of a file to be opened for reading is not
- available, the content is inspected (if possible) and used to determine
- which compression format applies.
- * This release drops Python 2.7 and 3.4 support. Python 3.5 or later is
- now required.
-
- v0.8.4
- ~~~~~~
- * When reading gzipped files, force ``pigz`` to use only a single process.
- ``pigz`` cannot use multiple cores anyway when decompressing. By default,
- it would use extra I/O processes, which slightly reduces wall-clock time,
- but increases CPU time. Single-core decompression with ``pigz`` is still
- about twice as fast as regular ``gzip``.
- * Allow ``threads=0`` for specifying that no external ``pigz``/``gzip``
- process should be used (then regular ``gzip.open()`` is used instead).
-
- v0.8.3
- ~~~~~~
- * When reading gzipped files, let ``pigz`` use at most four threads by default.
- This limit previously only applied when writing to a file.
- * Support Python 3.8
-
- v0.8.0
- ~~~~~~
- * Speed improvements when iterating over gzipped files.
-
- v0.6.0
- ~~~~~~
- * For reading from gzipped files, xopen will now use a ``pigz`` subprocess.
- This is faster than using ``gzip.open``.
- * Python 2 support will be dropped in one of the next releases.
-
- v0.5.0
- ~~~~~~
- * By default, pigz is now only allowed to use at most four threads. This hopefully reduces
- problems some users had with too many threads when opening many files at the same time.
- * xopen now accepts pathlib.Path objects.
-
-
- Contributors
- ------------
-
- * Marcel Martin
- * Ruben Vorderman
- * For more contributors, see <https://github.com/marcelm/xopen/graphs/contributors>
-
-
- Links
- -----
-
- * `Source code <https://github.com/marcelm/xopen/>`_
- * `Report an issue <https://github.com/marcelm/xopen/issues>`_
- * `Project page on PyPI (Python package index) <https://pypi.python.org/pypi/xopen/>`_
-
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.6
Provides-Extra: dev
+License-File: LICENSE
+
+.. image:: https://github.com/pycompression/xopen/workflows/CI/badge.svg
+ :target: https://github.com/pycompression/xopen
+ :alt:
+
+.. image:: https://img.shields.io/pypi/v/xopen.svg?branch=master
+ :target: https://pypi.python.org/pypi/xopen
+
+.. image:: https://img.shields.io/conda/v/conda-forge/xopen.svg
+ :target: https://anaconda.org/conda-forge/xopen
+ :alt:
+
+.. image:: https://codecov.io/gh/pycompression/xopen/branch/main/graph/badge.svg
+ :target: https://codecov.io/gh/pycompression/xopen
+ :alt:
+
+=====
+xopen
+=====
+
+This small Python module provides an ``xopen`` function that works like the
+built-in ``open`` function, but can also deal with compressed files.
+Supported compression formats are gzip, bzip2 and xz. They are automatically
+recognized by their file extensions `.gz`, `.bz2` or `.xz`.
+
+The focus is on being as efficient as possible on all supported Python versions.
+For example, ``xopen`` uses ``pigz``, which is a parallel version of ``gzip``,
+to open ``.gz`` files, which is faster than using the built-in ``gzip.open``
+function. ``pigz`` can use multiple threads when compressing, but is also faster
+when reading ``.gz`` files, so it is used both for reading and writing if it is
+available. For gzip compression levels 1 to 3,
+`igzip <https://github.com/intel/isa-l/>`_ is used for an even greater speedup.
+
+For use cases where using only the main thread is desired xopen can be used
+with ``threads=0``. This will use `python-isal
+<https://github.com/pycompression/python-isal>`_ (which binds isa-l) if
+python-isal is installed (automatic on Linux systems, as it is a requirement).
+For installation instructions for python-isal please
+checkout the `python-isal homepage
+<https://github.com/pycompression/python-isal>`_. If python-isal is not
+available ``gzip.open`` is used.
+
+This module has originally been developed as part of the `Cutadapt
+tool <https://cutadapt.readthedocs.io/>`_ that is used in bioinformatics to
+manipulate sequencing data. It has been in successful use within that software
+for a few years.
+
+``xopen`` is compatible with Python versions 3.6 and later.
+
+
+Usage
+-----
+
+Open a file for reading::
+
+ from xopen import xopen
+
+ with xopen('file.txt.xz') as f:
+ content = f.read()
+
+Or without context manager::
+
+ from xopen import xopen
+
+ f = xopen('file.txt.xz')
+ content = f.read()
+ f.close()
+
+Open a file in binary mode for writing::
+
+ from xopen import xopen
+
+ with xopen('file.txt.gz', mode='wb') as f:
+ f.write(b'Hello')
+
+
+Credits
+-------
+
+The name ``xopen`` was taken from the C function of the same name in the
+`utils.h file which is part of
+BWA <https://github.com/lh3/bwa/blob/83662032a2192d5712996f36069ab02db82acf67/utils.h>`_.
+
+Kyle Beauchamp <https://github.com/kyleabeauchamp/> has contributed support for
+appending to files.
+
+Ruben Vorderman <https://github.com/rhpvorderman/> contributed improvements to
+make reading and writing gzipped files faster.
+
+Benjamin Vaisvil <https://github.com/bvaisvil> contributed support for
+format detection from content.
+
+Dries Schaumont <https://github.com/DriesSchaumont> contributed support for
+faster bz2 reading and writing using pbzip2.
+
+Some ideas were taken from the `canopener project <https://github.com/selassid/canopener>`_.
+If you also want to open S3 files, you may want to use that module instead.
+
+
+Changes
+-------
+
+v1.2.0
+~~~~~~
+
+* `pbzip2 <http://compression.ca/pbzip2/>`_ is now used to open ``.bz2`` files if
+ ``threads`` is greater than zero.
+
+v1.1.0
+~~~~~~
+* Python 3.5 support is dropped.
+* On Linux systems, `python-isal <https://github.com/pycompression/python-isal>`_
+ is now added as a requirement. This will speed up the reading of gzip files
+ significantly when no external processes are used.
+
+v1.0.0
+~~~~~~
+* If installed, the ``igzip`` program (part of
+ `Intel ISA-L <https://github.com/intel/isa-l/>`_) is now used for reading
+ and writing gzip-compressed files at compression levels 1-3, which results
+ in a significant speedup.
+
+v0.9.0
+~~~~~~
+* When the file name extension of a file to be opened for reading is not
+ available, the content is inspected (if possible) and used to determine
+ which compression format applies.
+* This release drops Python 2.7 and 3.4 support. Python 3.5 or later is
+ now required.
+
+v0.8.4
+~~~~~~
+* When reading gzipped files, force ``pigz`` to use only a single process.
+ ``pigz`` cannot use multiple cores anyway when decompressing. By default,
+ it would use extra I/O processes, which slightly reduces wall-clock time,
+ but increases CPU time. Single-core decompression with ``pigz`` is still
+ about twice as fast as regular ``gzip``.
+* Allow ``threads=0`` for specifying that no external ``pigz``/``gzip``
+ process should be used (then regular ``gzip.open()`` is used instead).
+
+v0.8.3
+~~~~~~
+* When reading gzipped files, let ``pigz`` use at most four threads by default.
+ This limit previously only applied when writing to a file.
+* Support Python 3.8
+
+v0.8.0
+~~~~~~
+* Speed improvements when iterating over gzipped files.
+
+v0.6.0
+~~~~~~
+* For reading from gzipped files, xopen will now use a ``pigz`` subprocess.
+ This is faster than using ``gzip.open``.
+* Python 2 support will be dropped in one of the next releases.
+
+v0.5.0
+~~~~~~
+* By default, pigz is now only allowed to use at most four threads. This hopefully reduces
+ problems some users had with too many threads when opening many files at the same time.
+* xopen now accepts pathlib.Path objects.
+
+
+Contributors
+------------
+
+* Marcel Martin
+* Ruben Vorderman
+* For more contributors, see <https://github.com/pycompression/xopen/graphs/contributors>
+
+
+Links
+-----
+
+* `Source code <https://github.com/pycompression/xopen/>`_
+* `Report an issue <https://github.com/pycompression/xopen/issues>`_
+* `Project page on PyPI (Python package index) <https://pypi.python.org/pypi/xopen/>`_
+
+
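Note on the v0.9.0 changelog entry above: inspecting the content usually means comparing the first bytes of a file against each format's magic number. The sketch below is not xopen's own code. `detect_format` and `MAGIC_NUMBERS` are hypothetical names chosen for illustration, and only the three signatures (gzip, bzip2, xz) come from the file formats themselves.

```python
import bz2
import gzip
import lzma
from typing import Optional

# Leading bytes ("magic numbers") of each supported container format.
MAGIC_NUMBERS = {
    b"\x1f\x8b": "gz",        # gzip
    b"BZh": "bz2",            # bzip2
    b"\xfd7zXZ\x00": "xz",    # xz
}


def detect_format(header: bytes) -> Optional[str]:
    """Return 'gz', 'bz2' or 'xz' if header starts with a known signature."""
    for magic, name in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return name
    return None  # unknown signature: treat as uncompressed
```

In practice only the first few bytes of the file would need to be read before passing them to such a function.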
=====================================
src/xopen.egg-info/SOURCES.txt
=====================================
@@ -22,6 +22,7 @@ tests/file.txt.bz2
tests/file.txt.bz2.test
tests/file.txt.gz
tests/file.txt.gz.test
+tests/file.txt.test
tests/file.txt.xz
tests/file.txt.xz.test
tests/hello.gz
=====================================
src/xopen.egg-info/requires.txt
=====================================
@@ -1,6 +1,6 @@
-[:sys_platform=="linux" and python_implementation != "PyPy"]
-isal>=0.3.0
+[:platform_machine == "x86_64" or platform_machine == "AMD64" or platform_machine == "aarch64"]
+isal>=0.9.0
[dev]
pytest
=====================================
src/xopen/__init__.py
=====================================
@@ -2,31 +2,40 @@
Open compressed files transparently.
"""
-__all__ = ["xopen", "PipedGzipWriter", "PipedGzipReader", "__version__"]
+__all__ = [
+ "xopen",
+ "PipedGzipReader",
+ "PipedGzipWriter",
+ "PipedIGzipReader",
+ "PipedIGzipWriter",
+ "PipedPigzReader",
+ "PipedPigzWriter",
+ "PipedPBzip2Reader",
+ "PipedPBzip2Writer",
+ "PipedPythonIsalReader",
+ "PipedPythonIsalWriter",
+ "__version__",
+]
import gzip
import sys
import io
import os
import bz2
-import time
+import lzma
import stat
import signal
import pathlib
import subprocess
import tempfile
+import time
from abc import ABC, abstractmethod
from subprocess import Popen, PIPE, DEVNULL
-from typing import Optional, TextIO, AnyStr, IO
+from typing import Optional, TextIO, AnyStr, IO, List, Set
from ._version import version as __version__
-try:
- import lzma
-except ImportError:
- lzma = None # type: ignore
-
try:
from isal import igzip, isal_zlib # type: ignore
except ImportError:
@@ -44,27 +53,12 @@ except ImportError:
fcntl = None # type: ignore
_MAX_PIPE_SIZE_PATH = pathlib.Path("/proc/sys/fs/pipe-max-size")
-if _MAX_PIPE_SIZE_PATH.exists():
+try:
_MAX_PIPE_SIZE = int(_MAX_PIPE_SIZE_PATH.read_text()) # type: Optional[int]
-else:
+except OSError: # Catches file not found and permission errors. Possible other errors too.
_MAX_PIPE_SIZE = None
-try:
- from os import fspath # Exists in Python 3.6+
-except ImportError:
- def fspath(path): # type: ignore
- if hasattr(path, "__fspath__"):
- return path.__fspath__()
- # Python 3.4 and 3.5 have pathlib, but do not support the file system
- # path protocol
- if pathlib is not None and isinstance(path, pathlib.Path):
- return str(path)
- if not isinstance(path, str):
- raise TypeError("path must be a string")
- return path
-
-
def _available_cpu_count() -> int:
"""
Number of available virtual or physical CPUs on this system
@@ -148,16 +142,16 @@ class Closing(ABC):
@abstractmethod
def close(self):
- pass
+ """Called when exiting the context manager"""
class PipedCompressionWriter(Closing):
"""
Write Compressed files by running an external process and piping into it.
"""
- def __init__(self, path, program: str, mode='wt',
+ def __init__(self, path, program_args: List[str], mode='wt',
compresslevel: Optional[int] = None,
- threads_flag: str = None,
+ threads_flag: Optional[str] = None,
threads: Optional[int] = None):
"""
mode -- one of 'w', 'wt', 'wb', 'a', 'at', 'ab'
@@ -174,11 +168,11 @@ class PipedCompressionWriter(Closing):
# TODO use a context manager
self.outfile = open(path, mode)
- self.closed = False
- self.name = path
- self._mode = mode
- self._program = program
- self._threads_flag = threads_flag
+ self.closed: bool = False
+ self.name: str = path
+ self._mode: str = mode
+ self._program_args: List[str] = program_args
+ self._threads_flag: Optional[str] = threads_flag
if threads is None:
threads = min(_available_cpu_count(), 4)
@@ -202,16 +196,16 @@ class PipedCompressionWriter(Closing):
self.__class__.__name__,
self.name,
self._mode,
- self._program,
+ " ".join(self._program_args),
self._threads,
)
def _open_process(
self, mode: str, compresslevel: Optional[int], threads: int, outfile: TextIO,
) -> Popen:
- program_args = [self._program]
+ program_args: List[str] = self._program_args[:] # prevent list aliasing
if threads != 0 and self._threads_flag is not None:
- program_args += [self._threads_flag, str(threads)]
+ program_args += [f"{self._threads_flag}{threads}"]
extra_args = []
if 'w' in mode and compresslevel is not None:
extra_args += ['-' + str(compresslevel)]
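Note on the hunk above: two things change in how the command line is assembled. The stored argument list is copied before it is extended, and the thread count is fused with its flag (`-p4` rather than `-p 4`). A minimal sketch of the same idea, assuming a hypothetical helper `build_command` that is not part of xopen and that ignores the `'w' in mode` check of the real method:

```python
from typing import List, Optional


def build_command(
    program_args: List[str],
    threads_flag: Optional[str],
    threads: int,
    compresslevel: Optional[int] = None,
) -> List[str]:
    # Copy first, so that appending never mutates the caller's list.
    args = program_args[:]
    if threads != 0 and threads_flag is not None:
        # Flag and value become a single argument, e.g. "-p4".
        args.append(f"{threads_flag}{threads}")
    if compresslevel is not None:
        args.append(f"-{compresslevel}")
    return args
```

Without the copy, every call would append another thread flag to the same shared list.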
@@ -240,7 +234,8 @@ class PipedCompressionWriter(Closing):
self.outfile.close()
if retcode != 0:
raise OSError(
- "Output {} process terminated with exit code {}".format(self._program, retcode))
+ "Output {} process terminated with exit code {}".format(
+ " ".join(self._program_args), retcode))
def __iter__(self): # type: ignore
# For compatibility with Pandas, which checks for an __iter__ method
@@ -256,10 +251,16 @@ class PipedCompressionReader(Closing):
Open a pipe to a process for reading a compressed file.
"""
+ # This exit code is not interpreted as an error when terminating the process
+ _allowed_exit_code: Optional[int] = -signal.SIGTERM
+ # If this message is printed on stderr on terminating the process,
+ # it is not interpreted as an error
+ _allowed_exit_message: Optional[bytes] = None
+
def __init__(
self,
path,
- program: str,
+ program_args: List[str],
mode: str = "r",
threads_flag: Optional[str] = None,
threads: Optional[int] = None,
@@ -269,8 +270,8 @@ class PipedCompressionReader(Closing):
"""
if mode not in ('r', 'rt', 'rb'):
raise ValueError("Mode is '{}', but it must be 'r', 'rt' or 'rb'".format(mode))
- self._program = program
- program_args = [program, '-cd', path]
+ self._program_args = program_args
+ program_args = program_args + ['-cd', path]
if threads_flag is not None:
if threads is None:
@@ -281,7 +282,7 @@ class PipedCompressionReader(Closing):
# using multiple threads while there is only a 10% gain in wall
# clock time.
threads = 1
- program_args += [threads_flag, str(threads)]
+ program_args += [f"{threads_flag}{threads}"]
self._threads = threads
self.process = Popen(program_args, stdout=PIPE, stderr=PIPE)
self.name = path
@@ -291,15 +292,11 @@ class PipedCompressionReader(Closing):
self._mode = mode
if 'b' not in mode:
- self._file = io.TextIOWrapper(self.process.stdout) # type: IO
+ self._file: IO = io.TextIOWrapper(self.process.stdout)
else:
self._file = self.process.stdout
- assert self.process.stderr is not None
- self._stderr = io.TextIOWrapper(self.process.stderr)
self.closed = False
- # Give the subprocess a little bit of time to report any errors (such as
- # a non-existing file)
- time.sleep(0.01)
+ self._wait_for_output_or_process_exit()
self._raise_if_error()
def __repr__(self):
@@ -307,7 +304,7 @@ class PipedCompressionReader(Closing):
self.__class__.__name__,
self.name,
self._mode,
- self._program,
+ " ".join(self._program_args),
self._threads,
)
@@ -316,16 +313,14 @@ class PipedCompressionReader(Closing):
return
self.closed = True
retcode = self.process.poll()
+ check_allowed_code_and_message = False
if retcode is None:
# still running
self.process.terminate()
- allow_sigterm = True
- else:
- allow_sigterm = False
- self.process.wait()
+ check_allowed_code_and_message = True
+ _, stderr_message = self.process.communicate()
self._file.close()
- self._raise_if_error(allow_sigterm=allow_sigterm)
- self._stderr.close()
+ self._raise_if_error(check_allowed_code_and_message, stderr_message)
def __iter__(self):
return self
@@ -333,21 +328,60 @@ class PipedCompressionReader(Closing):
def __next__(self) -> AnyStr:
return self._file.__next__()
- def _raise_if_error(self, allow_sigterm: bool = False) -> None:
+ def _wait_for_output_or_process_exit(self):
+ """
+ Wait for the process to produce at least some output, or has exited.
"""
- Raise IOError if process is not running anymore and the exit code is
- nonzero. If allow_sigterm is set and a SIGTERM exit code is
- encountered, no error is raised.
+ # The program may crash due to a non-existing file, internal error etc.
+ # In that case we need to check. However the 'time-to-crash' differs
+ # between programs. Some crash faster than others.
+ # Therefore we peek the first character(s) of stdout. Peek will return at
+ # least one byte of data, unless the buffer is empty or at EOF. If at EOF,
+ # we should wait for the program to exit. This way we ensure the program
+ # has at least decompressed some output, or stopped before we continue.
+
+ # stdout is io.BufferedReader if set to PIPE
+ while True:
+ first_output = self.process.stdout.peek(1) # type: ignore
+ if first_output or self.process.poll() is not None:
+ break
+ time.sleep(0.01)
+
+ def _raise_if_error(self, check_allowed_code_and_message: bool = False,
+ stderr_message: bytes = b"") -> None:
+ """
+ Raise OSError if process is not running anymore and the exit code is
+ nonzero. If check_allowed_code_and_message is set, OSError is not raised when
+ (1) the exit value of the process is equal to the value of the allowed_exit_code
+ attribute or (2) the allowed_exit_message attribute is set and it matches with
+ stderr_message.
"""
retcode = self.process.poll()
- if (
- retcode is not None and retcode != 0
- and not (allow_sigterm and retcode == -signal.SIGTERM)
- ):
- message = self._stderr.read().strip()
- self._file.close()
- self._stderr.close()
- raise OSError("{} (exit code {})".format(message, retcode))
+
+ if retcode is None:
+ # process still running
+ return
+ if retcode == 0:
+ # process terminated successfully
+ return
+
+ if check_allowed_code_and_message:
+ if retcode == self._allowed_exit_code:
+ # terminated with allowed exit code
+ return
+ if (
+ self._allowed_exit_message
+ and stderr_message.startswith(self._allowed_exit_message)
+ ):
+ # terminated with another exit code, but message is allowed
+ return
+
+ assert self.process.stderr is not None
+ if not stderr_message:
+ stderr_message = self.process.stderr.read()
+
+ self._file.close()
+ raise OSError("{!r} (exit code {})".format(stderr_message, retcode))
def read(self, *args) -> AnyStr:
return self._file.read(*args)
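Note on `_wait_for_output_or_process_exit` above: the fixed 10 ms sleep is replaced by a loop that returns as soon as the child has either produced output or exited. The sketch below reproduces that loop outside the class, using the running Python interpreter as a stand-in child process. `wait_for_output_or_exit` and `run_and_peek` are hypothetical names, not xopen API.

```python
import subprocess
import sys
import time


def wait_for_output_or_exit(process: subprocess.Popen) -> bytes:
    """Return the first buffered output, or b"" if the child exited without any."""
    while True:
        # stdout is an io.BufferedReader when opened with PIPE, so peek() exists.
        first_output = process.stdout.peek(1)
        if first_output or process.poll() is not None:
            return first_output
        time.sleep(0.01)


def run_and_peek(code: str):
    """Run a Python snippet as a child process; return (first output, exit code)."""
    process = subprocess.Popen(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    try:
        first_output = wait_for_output_or_exit(process)
    finally:
        # Drain and close both pipes and reap the child.
        process.communicate()
    return first_output, process.returncode
```

A child that fails immediately yields empty output and a nonzero exit code, which is the case `_raise_if_error` then turns into an `OSError`.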
@@ -362,7 +396,10 @@ class PipedCompressionReader(Closing):
return self._file.seekable()
def peek(self, n: int = None):
- return self._file.peek(n) # type: ignore
+ if hasattr(self._file, "peek"):
+ return self._file.peek(n) # type: ignore
+ else:
+ raise AttributeError("Peek is not available when 'b' not in mode")
def readable(self) -> bool:
return self._file.readable()
@@ -375,6 +412,34 @@ class PipedCompressionReader(Closing):
class PipedGzipReader(PipedCompressionReader):
+ """
+ Open a pipe to gzip for reading a gzipped file.
+ """
+ def __init__(self, path, mode: str = "r"):
+ super().__init__(path, ["gzip"], mode)
+
+
+class PipedGzipWriter(PipedCompressionWriter):
+ """
+ Write gzip-compressed files by running an external gzip process and
+ piping into it. On Python 3, gzip.GzipFile is on par with gzip itself,
+ but running an external gzip can still reduce wall-clock time because
+ the compression happens in a separate process.
+ """
+ def __init__(self, path, mode: str = "wt", compresslevel: Optional[int] = None):
+ """
+ mode -- one of 'w', 'wt', 'wb', 'a', 'at', 'ab'
+ compresslevel -- compression level
+ threads (int) -- number of pigz threads. If this is set to None, a reasonable default is
+ used. At the moment, this means that the number of available CPU cores is used, capped
+ at four to avoid creating too many threads. Use 0 to let pigz use all available cores.
+ """
+ if compresslevel is not None and compresslevel not in range(1, 10):
+ raise ValueError("compresslevel must be between 1 and 9")
+ super().__init__(path, ["gzip"], mode, compresslevel, None)
+
+
+class PipedPigzReader(PipedCompressionReader):
"""
Open a pipe to pigz for reading a gzipped file. Even though pigz is mostly
used to speed up writing by using many compression threads, it is
@@ -382,21 +447,18 @@ class PipedGzipReader(PipedCompressionReader):
(ca. 2x speedup).
"""
def __init__(self, path, mode: str = "r", threads: Optional[int] = None):
- try:
- super().__init__(path, "pigz", mode, "-p", threads)
- except OSError:
- super().__init__(path, "gzip", mode, None, threads)
+ super().__init__(path, ["pigz"], mode, "-p", threads)
-class PipedGzipWriter(PipedCompressionWriter):
+class PipedPigzWriter(PipedCompressionWriter):
"""
- Write gzip-compressed files by running an external gzip or pigz process and
- piping into it. pigz is tried first. It is fast because it can compress using
- multiple cores. Also it is more efficient on one core.
- If pigz is not available, a gzip subprocess is used. On Python 3, gzip.GzipFile is on
- par with gzip itself, but running an external gzip can still reduce wall-clock
- time because the compression happens in a separate process.
+ Write gzip-compressed files by running an external pigz process and
+ piping into it. pigz can compress using multiple cores. It is also more
+ efficient than gzip on only one core. (But then igzip is even faster and
+ should be preferred if the compression level allows it.)
"""
+ _accepted_compression_levels: Set[int] = set(list(range(10)) + [11])
+
def __init__(
self,
path,
@@ -411,12 +473,37 @@ class PipedGzipWriter(PipedCompressionWriter):
used. At the moment, this means that the number of available CPU cores is used, capped
at four to avoid creating too many threads. Use 0 to let pigz use all available cores.
"""
- if compresslevel is not None and compresslevel not in range(1, 10):
- raise ValueError("compresslevel must be between 1 and 9")
- try:
- super().__init__(path, "pigz", mode, compresslevel, "-p", threads)
- except OSError:
- super().__init__(path, "gzip", mode, compresslevel, None, threads)
+ if compresslevel is not None and compresslevel not in self._accepted_compression_levels:
+ raise ValueError("compresslevel must be between 0 and 9 or 11")
+ super().__init__(path, ["pigz"], mode, compresslevel, "-p", threads)
+
+
+class PipedPBzip2Reader(PipedCompressionReader):
+ """
+ Open a pipe to pbzip2 for reading a bzipped file.
+ """
+
+ _allowed_exit_code = None
+ _allowed_exit_message = b"\n *Control-C or similar caught [sig=15], quitting..."
+
+ def __init__(self, path, mode: str = "r", threads: Optional[int] = None):
+ super().__init__(path, ["pbzip2"], mode, "-p", threads)
+
+
+class PipedPBzip2Writer(PipedCompressionWriter):
+ """
+ Write bzip2-compressed files by running an external pbzip2 process and
+ piping into it. pbzip2 can compress using multiple cores.
+ """
+
+ def __init__(
+ self,
+ path,
+ mode: str = "wt",
+ threads: Optional[int] = None,
+ ):
+ # Use default compression level for pbzip2: 9
+ super().__init__(path, ["pbzip2"], mode, 9, "-p", threads)
class PipedIGzipReader(PipedCompressionReader):
@@ -435,7 +522,7 @@ class PipedIGzipReader(PipedCompressionReader):
"This version of igzip does not support reading "
"concatenated gzip files and is therefore not "
"safe to use. See: https://github.com/intel/isa-l/issues/143")
- super().__init__(path, "igzip", mode)
+ super().__init__(path, ["igzip"], mode)
class PipedIGzipWriter(PipedCompressionWriter):
@@ -455,7 +542,19 @@ class PipedIGzipWriter(PipedCompressionWriter):
def __init__(self, path, mode: str = "wt", compresslevel: Optional[int] = None):
if compresslevel is not None and compresslevel not in range(0, 4):
raise ValueError("compresslevel must be between 0 and 3")
- super().__init__(path, "igzip", mode, compresslevel)
+ super().__init__(path, ["igzip"], mode, compresslevel)
+
+
+class PipedPythonIsalReader(PipedCompressionReader):
+ def __init__(self, path, mode: str = "r"):
+ super().__init__(path, [sys.executable, "-m", "isal.igzip"], mode)
+
+
+class PipedPythonIsalWriter(PipedCompressionWriter):
+ def __init__(self, path, mode: str = "wt", compresslevel: Optional[int] = None):
+ if compresslevel is not None and compresslevel not in range(0, 4):
+ raise ValueError("compresslevel must be between 0 and 3")
+ super().__init__(path, [sys.executable, "-m", "isal.igzip"], mode, compresslevel)
def _open_stdin_or_out(mode: str) -> IO:
@@ -465,38 +564,64 @@ def _open_stdin_or_out(mode: str) -> IO:
return open(std.fileno(), mode=mode, closefd=False)
-def _open_bz2(filename, mode: str) -> IO:
+def _open_bz2(filename, mode: str, threads: Optional[int]):
+ if threads != 0:
+ try:
+ if "r" in mode:
+ return PipedPBzip2Reader(filename, mode, threads)
+ else:
+ return PipedPBzip2Writer(filename, mode, threads)
+ except OSError:
+ pass # We try without threads.
+
return bz2.open(filename, mode)
def _open_xz(filename, mode: str) -> IO:
- if lzma is None:
- raise ImportError(
- "Cannot open xz files: The lzma module is not available (use Python 3.3 or newer)")
return lzma.open(filename, mode)
-def _open_gz_external(filename, mode, compresslevel, threads):
- if 'r' in mode:
- try:
- return PipedIGzipReader(filename, mode)
- except (OSError, ValueError):
- # No igzip installed or version does not support reading
- # concatenated files.
- return PipedGzipReader(filename, mode, threads=threads)
- else:
+def _open_external_gzip_reader(filename, mode, compresslevel, threads):
+ assert "r" in mode
+ try:
+ return PipedIGzipReader(filename, mode)
+ except (OSError, ValueError):
+ # No igzip installed or version does not support reading
+ # concatenated files.
+ pass
+ if igzip:
+ return PipedPythonIsalReader(filename, mode)
+ try:
+ return PipedPigzReader(filename, mode, threads=threads)
+ except OSError:
+ return PipedGzipReader(filename, mode)
+
+
+def _open_external_gzip_writer(filename, mode, compresslevel, threads):
+ assert "r" not in mode
+ try:
+ return PipedIGzipWriter(filename, mode, compresslevel)
+ except (OSError, ValueError):
+ # No igzip installed or compression level higher than 3
+ pass
+ if igzip: # We can use the CLI from isal.igzip
try:
- return PipedIGzipWriter(filename, mode, compresslevel)
- except (OSError, ValueError):
- # No igzip installed or compression level higher than 3
- return PipedGzipWriter(filename, mode, compresslevel,
- threads=threads)
+ return PipedPythonIsalWriter(filename, mode, compresslevel)
+ except ValueError: # Wrong compression level
+ pass
+ try:
+ return PipedPigzWriter(filename, mode, compresslevel, threads=threads)
+ except OSError:
+ return PipedGzipWriter(filename, mode, compresslevel)
def _open_gz(filename, mode: str, compresslevel, threads):
if threads != 0:
try:
- return _open_gz_external(filename, mode, compresslevel, threads)
+ if "r" in mode:
+ return _open_external_gzip_reader(filename, mode, compresslevel, threads)
+ else:
+ return _open_external_gzip_writer(filename, mode, compresslevel, threads)
except OSError:
pass # We try without threads.
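Note on the reader and writer selection above: the new helper functions try igzip, then the python-isal command line, then pigz, and finally plain gzip, moving on whenever a candidate cannot be started or rejects its arguments. The pattern can be sketched generically as below. `open_with_fallback` is a hypothetical helper, and unlike the real code it catches the same exception types for every candidate.

```python
from typing import Any, Callable, Optional, Sequence


def open_with_fallback(candidates: Sequence[Callable[..., Any]], *args: Any) -> Any:
    """Return the result of the first candidate that does not fail."""
    last_error: Optional[Exception] = None
    for opener in candidates:
        try:
            return opener(*args)
        except (OSError, ValueError) as error:
            # Program missing, not executable, or arguments rejected:
            # remember the error and try the next candidate.
            last_error = error
    if last_error is None:
        raise ValueError("no candidates given")
    raise last_error
```

Ordering the candidates from fastest to most widely available keeps the common case fast while still working on systems where only gzip is installed.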
@@ -570,17 +695,18 @@ def xopen(
bzip2 and xz. If the filename is '-', standard output (mode 'w') or
standard input (mode 'r') is returned.
- The file type is determined based on the filename: .gz is gzip, .bz2 is bzip2, .xz is
- xz/lzma and no compression assumed otherwise.
+ When writing, the file format is chosen based on the file name extension:
+ - .gz uses gzip compression
+ - .bz2 uses bzip2 compression
+ - .xz uses xz/lzma compression
+ - otherwise, no compression is used
+
+ When reading, if a file name extension is available, the format is detected
+ using it, but if not, the format is detected from the contents.
mode can be: 'rt', 'rb', 'at', 'ab', 'wt', or 'wb'. Also, the 't' can be omitted,
so instead of 'rt', 'wt' and 'at', the abbreviations 'r', 'w' and 'a' can be used.
- In Python 2, the 't' and 'b' characters are ignored.
-
- Append mode ('a', 'at', 'ab') is not available with BZ2 compression and
- will raise an error.
-
compresslevel is the compression level for writing to gzip files.
This parameter is ignored for the other compression formats. If set to
None (default), level 6 is used.
@@ -596,7 +722,7 @@ def xopen(
mode += 't'
if mode not in ('rt', 'rb', 'wt', 'wb', 'at', 'ab'):
raise ValueError("Mode '{}' not supported".format(mode))
- filename = fspath(filename)
+ filename = os.fspath(filename)
if filename == '-':
return _open_stdin_or_out(mode)
@@ -610,6 +736,6 @@ def xopen(
elif detected_format == "xz":
return _open_xz(filename, mode)
elif detected_format == "bz2":
- return _open_bz2(filename, mode)
+ return _open_bz2(filename, mode, threads)
else:
return open(filename, mode)
=====================================
src/xopen/_version.py
=====================================
@@ -1,5 +1,5 @@
# coding: utf-8
# file generated by setuptools_scm
# don't change, don't track in version control
-version = '1.1.0'
-version_tuple = (1, 1, 0)
+version = '1.2.0'
+version_tuple = (1, 2, 0)
=====================================
tests/file.txt.test
=====================================
@@ -0,0 +1,2 @@
+Testing, testing ...
+The second line.
=====================================
tests/test_xopen.py
=====================================
@@ -1,3 +1,6 @@
+import gzip
+import bz2
+import lzma
import io
import os
import random
@@ -7,17 +10,28 @@ import sys
import time
import pytest
from pathlib import Path
-
-from xopen import xopen, PipedCompressionWriter, PipedGzipReader, \
- PipedGzipWriter, _MAX_PIPE_SIZE, _can_read_concatenated_gz
-
-extensions = ["", ".gz", ".bz2"]
-
-try:
- import lzma
- extensions.append(".xz")
-except ImportError:
- lzma = None
+from contextlib import contextmanager
+from itertools import cycle
+
+from xopen import (
+ xopen,
+ PipedCompressionReader,
+ PipedCompressionWriter,
+ PipedGzipReader,
+ PipedGzipWriter,
+ PipedPBzip2Reader,
+ PipedPBzip2Writer,
+ PipedPigzReader,
+ PipedPigzWriter,
+ PipedIGzipReader,
+ PipedIGzipWriter,
+ PipedPythonIsalReader,
+ PipedPythonIsalWriter,
+ _MAX_PIPE_SIZE,
+ _can_read_concatenated_gz,
+ igzip,
+)
+extensions = ["", ".gz", ".bz2", ".xz"]
try:
import fcntl
@@ -32,6 +46,58 @@ CONTENT_LINES = ['Testing, testing ...\n', 'The second line.\n']
CONTENT = ''.join(CONTENT_LINES)
+def available_gzip_readers_and_writers():
+ readers = [
+ klass for prog, klass in [
+ ("gzip", PipedGzipReader),
+ ("pigz", PipedPigzReader),
+ ("igzip", PipedIGzipReader),
+ ]
+ if shutil.which(prog)
+ ]
+ if PipedIGzipReader in readers and not _can_read_concatenated_gz("igzip"):
+ readers.remove(PipedIGzipReader)
+
+ writers = [
+ klass for prog, klass in [
+ ("gzip", PipedGzipWriter),
+ ("pigz", PipedPigzWriter),
+ ("igzip", PipedIGzipWriter),
+ ]
+ if shutil.which(prog)
+ ]
+ if igzip is not None:
+ readers.append(PipedPythonIsalReader)
+ writers.append(PipedPythonIsalWriter)
+ return readers, writers
+
+
+PIPED_GZIP_READERS, PIPED_GZIP_WRITERS = available_gzip_readers_and_writers()
+
+
+def available_bzip2_readers_and_writers():
+ if shutil.which("pbzip2"):
+ return [PipedPBzip2Reader], [PipedPBzip2Writer]
+ return [], []
+
+
+PIPED_BZIP2_READERS, PIPED_BZIP2_WRITERS = available_bzip2_readers_and_writers()
+
+ALL_READERS_WITH_EXTENSION = list(zip(PIPED_GZIP_READERS, cycle([".gz"]))) + \
+ list(zip(PIPED_BZIP2_READERS, cycle([".bz2"])))
+ALL_WRITERS_WITH_EXTENSION = list(zip(PIPED_GZIP_WRITERS, cycle([".gz"]))) + \
+ list(zip(PIPED_BZIP2_WRITERS, cycle([".bz2"])))
+
+
+THREADED_READERS = set([(PipedPigzReader, ".gz"), (PipedPBzip2Reader, ".bz2")]) & \
+ set(ALL_READERS_WITH_EXTENSION)
+
+
+@pytest.fixture(params=PIPED_GZIP_WRITERS)
+def gzip_writer(request):
+ return request.param
+
+
@pytest.fixture(params=extensions)
def ext(request):
return request.param
@@ -42,40 +108,81 @@ def fname(request):
return request.param
-@pytest.fixture
-def lacking_pigz_permissions(tmp_path):
+@pytest.fixture(params=ALL_READERS_WITH_EXTENSION)
+def reader(request):
+ return request.param
+
+
+@pytest.fixture(params=THREADED_READERS)
+def threaded_reader(request):
+ return request.param
+
+
+@pytest.fixture(params=ALL_WRITERS_WITH_EXTENSION)
+def writer(request):
+ return request.param
+
+
+@contextmanager
+def disable_binary(tmp_path, binary_name):
"""
- Set PATH to a directory that contains a pigz binary with permissions set to 000.
- If no suitable pigz binary could be found, PATH is set to an empty directory
+ Find the location of the binary by its name, then set PATH to a directory that contains
+ the binary with permissions set to 000. If no suitable binary could be found,
+ PATH is set to an empty directory
"""
- pigz_path = shutil.which("pigz")
- if pigz_path:
- shutil.copy(pigz_path, str(tmp_path))
- os.chmod(str(tmp_path / "pigz"), 0)
+ try:
+ binary_path = shutil.which(binary_name)
+ if binary_path:
+ shutil.copy(binary_path, str(tmp_path))
+ os.chmod(str(tmp_path / binary_name), 0)
+ path = os.environ["PATH"]
+ os.environ["PATH"] = str(tmp_path)
+ yield
+ finally:
+ os.environ["PATH"] = path
+
+
+@pytest.fixture
+def lacking_pigz_permissions(tmp_path):
+ with disable_binary(tmp_path, "pigz"):
+ yield
+
+
+@pytest.fixture
+def lacking_pbzip2_permissions(tmp_path):
+ with disable_binary(tmp_path, "pbzip2"):
+ yield
+
- path = os.environ["PATH"]
- os.environ["PATH"] = str(tmp_path)
- yield
- os.environ["PATH"] = path
+@pytest.fixture(params=[1024, 2048, 4096])
+def create_large_file(tmpdir, request):
+ def _create_large_file(extension):
+ path = str(tmpdir.join(f"large{extension}"))
+ random_text = ''.join(random.choice('ABCDEFGHIJKLMNOPQRSTUVWXYZ\n') for _ in range(1024))
+ # Make the text a lot bigger in order to ensure that it is larger than the
+ # pipe buffer size.
+ random_text *= request.param
+ with xopen(path, 'w') as f:
+ f.write(random_text)
+ return path
+ return _create_large_file
@pytest.fixture
-def large_gzip(tmpdir):
- path = str(tmpdir.join("large.gz"))
- random_text = ''.join(random.choice('ABCDEFGHIJKLMNOPQRSTUVWXYZ\n') for _ in range(1024))
- # Make the text a lot bigger in order to ensure that it is larger than the
- # pipe buffer size.
- random_text *= 1024
- with xopen(path, 'w') as f:
- f.write(random_text)
- return path
+def create_truncated_file(create_large_file):
+ def _create_truncated_file(extension):
+ large_file = create_large_file(extension)
+ with open(large_file, 'a') as f:
+ f.truncate(os.stat(large_file).st_size - 10)
+ return large_file
+ return _create_truncated_file
@pytest.fixture
-def truncated_gzip(large_gzip):
- with open(large_gzip, 'a') as f:
- f.truncate(os.stat(large_gzip).st_size - 10)
- return large_gzip
+def xopen_without_igzip(monkeypatch):
+ import xopen # xopen local overrides xopen global variable
+ monkeypatch.setattr(xopen, "igzip", None)
+ return xopen.xopen
def test_xopen_text(fname):
@@ -92,6 +199,20 @@ def test_xopen_binary(fname):
assert lines[1] == b'The second line.\n', fname
+def test_xopen_binary_no_isal_no_threads(fname, xopen_without_igzip):
+ with xopen_without_igzip(fname, 'rb', threads=0) as f:
+ lines = list(f)
+ assert len(lines) == 2
+ assert lines[1] == b'The second line.\n', fname
+
+
+def test_xopen_binary_no_isal(fname, xopen_without_igzip):
+ with xopen_without_igzip(fname, 'rb', threads=1) as f:
+ lines = list(f)
+ assert len(lines) == 2
+ assert lines[1] == b'The second line.\n', fname
+
+
def test_no_context_manager_text(fname):
f = xopen(fname, 'rt')
lines = list(f)
@@ -111,7 +232,6 @@ def test_no_context_manager_binary(fname):
def test_readinto(fname):
- # Test whether .readinto() works
content = CONTENT.encode('utf-8')
with xopen(fname, 'rb') as f:
b = bytearray(len(content) + 100)
@@ -120,29 +240,25 @@ def test_readinto(fname):
assert b[:length] == content
-def test_pipedgzipreader_readinto():
- # Test whether PipedGzipReader.readinto works
+def test_reader_readinto(reader):
+ opener, extension = reader
content = CONTENT.encode('utf-8')
- with PipedGzipReader("tests/file.txt.gz", "rb") as f:
+ with opener(f"tests/file.txt{extension}", "rb") as f:
b = bytearray(len(content) + 100)
length = f.readinto(b)
assert length == len(content)
assert b[:length] == content
-def test_pipedgzipreader_textiowrapper():
- with PipedGzipReader("tests/file.txt.gz", "rb") as f:
+def test_reader_textiowrapper(reader):
+ opener, extension = reader
+ with opener(f"tests/file.txt{extension}", "rb") as f:
wrapped = io.TextIOWrapper(f)
assert wrapped.read() == CONTENT
-def test_detect_gzip_file_format_from_content():
- with xopen("tests/file.txt.gz.test", "rb") as fh:
- assert fh.readline() == CONTENT_LINES[0].encode("utf-8")
-
-
-def test_detect_bz2_file_format_from_content():
- with xopen("tests/file.txt.bz2.test", "rb") as fh:
+def test_detect_file_format_from_content(ext):
+ with xopen(f"tests/file.txt{ext}.test", "rb") as fh:
assert fh.readline() == CONTENT_LINES[0].encode("utf-8")
@@ -157,20 +273,23 @@ def test_readline_text(fname):
assert f.readline() == CONTENT_LINES[0]
-def test_readline_pipedgzipreader():
+def test_reader_readline(reader):
+ opener, extension = reader
first_line = CONTENT_LINES[0].encode('utf-8')
- with PipedGzipReader("tests/file.txt.gz", "rb") as f:
+ with opener(f"tests/file.txt{extension}", "rb") as f:
assert f.readline() == first_line
-def test_readline_text_pipedgzipreader():
- with PipedGzipReader("tests/file.txt.gz", "r") as f:
+def test_reader_readline_text(reader):
+ opener, extension = reader
+ with opener(f"tests/file.txt{extension}", "r") as f:
assert f.readline() == CONTENT_LINES[0]
@pytest.mark.parametrize("threads", [None, 1, 2])
-def test_pipedgzipreader_iter(threads):
- with PipedGzipReader("tests/file.txt.gz", mode="r", threads=threads) as f:
+def test_piped_reader_iter(threads, threaded_reader):
+ opener, extension = threaded_reader
+ with opener(f"tests/file.txt{extension}", mode="r", threads=threads) as f:
lines = list(f)
assert lines[0] == CONTENT_LINES[0]
@@ -188,8 +307,9 @@ def test_xopen_has_iter_method(ext, tmpdir):
assert hasattr(f, '__iter__')
-def test_pipedgzipwriter_has_iter_method(tmpdir):
- with PipedGzipWriter(str(tmpdir.join("out.gz"))) as f:
+def test_writer_has_iter_method(tmpdir, writer):
+ opener, extension = writer
+ with opener(str(tmpdir.join(f"out.{extension}"))) as f:
assert hasattr(f, '__iter__')
@@ -200,20 +320,24 @@ def test_iter_without_with(fname):
f.close()
-def test_pipedgzipreader_iter_without_with():
- it = iter(PipedGzipReader("tests/file.txt.gz"))
+def test_reader_iter_without_with(reader):
+ opener, extension = reader
+ it = iter(opener(f"tests/file.txt{extension}"))
assert CONTENT_LINES[0] == next(it)
@pytest.mark.parametrize("mode", ["rb", "rt"])
-def test_pipedgzipreader_close(large_gzip, mode):
- with PipedGzipReader(large_gzip, mode=mode) as f:
+def test_reader_close(mode, reader, create_large_file):
+ reader, extension = reader
+ large_file = create_large_file(extension)
+ with reader(large_file, mode=mode) as f:
f.readline()
time.sleep(0.2)
# The subprocess should be properly terminated now
-def test_partial_gzip_iteration_closes_correctly(large_gzip):
+@pytest.mark.parametrize("extension", [".gz", ".bz2"])
+def test_partial_iteration_closes_correctly(extension, create_large_file):
class LineReader:
def __init__(self, file):
self.file = xopen(file, "rb")
@@ -221,8 +345,8 @@ def test_partial_gzip_iteration_closes_correctly(large_gzip):
def __iter__(self):
wrapper = io.TextIOWrapper(self.file)
yield from wrapper
-
- f = LineReader(large_gzip)
+ large_file = create_large_file(extension)
+ f = LineReader(large_file)
next(iter(f))
f.file.close()
@@ -239,9 +363,9 @@ def test_write_to_nonexisting_dir(ext):
pass # pragma: no cover
-def test_invalid_mode():
+def test_invalid_mode(ext):
with pytest.raises(ValueError):
- with xopen("tests/file.txt.gz", mode="hallo"):
+ with xopen(f"tests/file.txt.{ext}", mode="hallo"):
pass # pragma: no cover
@@ -256,7 +380,16 @@ def test_invalid_compression_level(tmpdir):
with pytest.raises(ValueError) as e:
with xopen(path, mode="w", compresslevel=17) as f:
f.write("hello") # pragma: no cover
- assert "between 1 and 9" in e.value.args[0]
+ assert "compresslevel must be" in e.value.args[0]
+
+
+def test_invalid_compression_level_writers(gzip_writer, tmpdir):
+ # Currently only gzip writers handle compression levels
+ path = str(tmpdir.join("out.gz"))
+ with pytest.raises(ValueError) as e:
+ with gzip_writer(path, mode="w", compresslevel=17) as f:
+ f.write("hello") # pragma: no cover
+ assert "compresslevel must be" in e.value.args[0]
@pytest.mark.parametrize("ext", extensions)
@@ -310,34 +443,42 @@ class timeout:
signal.alarm(0)
-def test_truncated_gz(truncated_gzip):
+@pytest.mark.parametrize("extension", [".gz", ".bz2"])
+def test_truncated_file(extension, create_truncated_file):
+ truncated_file = create_truncated_file(extension)
with timeout(seconds=2):
with pytest.raises((EOFError, IOError)):
- f = xopen(truncated_gzip, "r")
+ f = xopen(truncated_file, "r")
f.read()
f.close() # pragma: no cover
-def test_truncated_gz_iter(truncated_gzip):
+@pytest.mark.parametrize("extension", [".gz", ".bz2"])
+def test_truncated_iter(extension, create_truncated_file):
+ truncated_file = create_truncated_file(extension)
with timeout(seconds=2):
with pytest.raises((EOFError, IOError)):
- f = xopen(truncated_gzip, 'r')
+ f = xopen(truncated_file, 'r')
for line in f:
pass
f.close() # pragma: no cover
-def test_truncated_gz_with(truncated_gzip):
+@pytest.mark.parametrize("extension", [".gz", ".bz2"])
+def test_truncated_with(extension, create_truncated_file):
+ truncated_file = create_truncated_file(extension)
with timeout(seconds=2):
with pytest.raises((EOFError, IOError)):
- with xopen(truncated_gzip, 'r') as f:
+ with xopen(truncated_file, 'r') as f:
f.read()
-def test_truncated_gz_iter_with(truncated_gzip):
+@pytest.mark.parametrize("extension", [".gz", ".bz2"])
+def test_truncated_iter_with(extension, create_truncated_file):
+ truncated_file = create_truncated_file(extension)
with timeout(seconds=2):
with pytest.raises((EOFError, IOError)):
- with xopen(truncated_gzip, 'r') as f:
+ with xopen(truncated_file, 'r') as f:
for line in f:
pass
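Note on the truncated-file tests above: they expect an EOFError or IOError when a compressed stream ends early. The sketch below reproduces that behaviour with the standard library only, assuming an in-memory stream cut to half its length. PAYLOAD and raises_eof_on_truncation are hypothetical names for illustration; this is not xopen's code path, which may go through an external process instead.

```python
import bz2
import gzip
import io

# Hypothetical sample data; any compressible payload would do.
PAYLOAD = b"Testing, testing ...\n" * 1000


def raises_eof_on_truncation(compress, open_func):
    """Return True if reading a half-length compressed stream raises EOFError."""
    data = compress(PAYLOAD)
    # Keep only the first half of the compressed bytes.
    truncated = io.BytesIO(data[: len(data) // 2])
    try:
        with open_func(truncated, "rb") as f:
            f.read()
    except EOFError:
        return True
    return False


for name, compress, open_func in [
    ("gzip", gzip.compress, gzip.open),
    ("bz2", bz2.compress, bz2.open),
]:
    print(name, raises_eof_on_truncation(compress, open_func))
```

The upstream tests accept IOError as well, presumably because a piped reader can report the failure differently from the in-process decompressors used here.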
@@ -347,29 +488,57 @@ def test_bare_read_from_gz():
assert f.read() == 'hello'
-def test_read_piped_gzip():
- with PipedGzipReader('tests/hello.gz', 'rt') as f:
- assert f.read() == 'hello'
+def test_readers_read(reader):
+ opener, extension = reader
+ with opener(f'tests/file.txt{extension}', 'rt') as f:
+ assert f.read() == CONTENT
-def test_write_pigz_threads(tmpdir):
- path = str(tmpdir.join('out.gz'))
+def test_write_threads(tmpdir, ext):
+ path = str(tmpdir.join(f'out.{ext}'))
with xopen(path, mode='w', threads=3) as f:
f.write('hello')
with xopen(path) as f:
assert f.read() == 'hello'
-def test_read_gzip_no_threads():
- import gzip
- with xopen("tests/hello.gz", "rb", threads=0) as f:
- assert isinstance(f, gzip.GzipFile), f
+def test_write_pigz_threads_no_isal(tmpdir, xopen_without_igzip):
+ path = str(tmpdir.join('out.gz'))
+ with xopen_without_igzip(path, mode='w', threads=3) as f:
+ f.write('hello')
+ with xopen_without_igzip(path) as f:
+ assert f.read() == 'hello'
+
+
+def test_read_no_threads(ext):
+ klasses = {
+ ".bz2": bz2.BZ2File,
+ ".gz": gzip.GzipFile,
+ ".xz": lzma.LZMAFile,
+ "": io.BufferedReader,
+ }
+ klass = klasses[ext]
+ with xopen(f"tests/file.txt{ext}", "rb", threads=0) as f:
+ assert isinstance(f, klass), f
+
+
+def test_write_no_threads(tmpdir, ext):
+ klasses = {
+ ".bz2": bz2.BZ2File,
+ ".gz": gzip.GzipFile,
+ ".xz": lzma.LZMAFile,
+ "": io.BufferedWriter,
+ }
+ klass = klasses[ext]
+ path = str(tmpdir.join(f"out.{ext}"))
+ with xopen(path, "wb", threads=0) as f:
+ assert isinstance(f, klass), f
-def test_write_gzip_no_threads(tmpdir):
+def test_write_gzip_no_threads_no_isal(tmpdir, xopen_without_igzip):
import gzip
path = str(tmpdir.join("out.gz"))
- with xopen(path, "wb", threads=0) as f:
+ with xopen_without_igzip(path, "wb", threads=0) as f:
assert isinstance(f, gzip.GzipFile), f
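Note on the threads=0 tests above: they assert that, without threads, the returned object is one of the standard library file classes. A rough stdlib-only analogue of that check is sketched here, assuming a simple dispatch on the file extension. OPENERS, EXPECTED, open_in_process and check_classes are hypothetical names and do not mirror xopen's internals.

```python
import bz2
import gzip
import io
import lzma
import os
import tempfile

# Hypothetical extension-to-opener table (not xopen's implementation).
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}

# Classes the tests expect for binary writing when no threads are used.
EXPECTED = {
    ".gz": gzip.GzipFile,
    ".bz2": bz2.BZ2File,
    ".xz": lzma.LZMAFile,
    "": io.BufferedWriter,
}


def open_in_process(path, mode):
    """Open path with the in-process opener matching its extension."""
    ext = os.path.splitext(path)[1]
    return OPENERS.get(ext, open)(path, mode)


def check_classes(directory):
    """Map each extension to whether the opened object has the expected class."""
    results = {}
    for ext, klass in EXPECTED.items():
        path = os.path.join(directory, "out" + ext)
        with open_in_process(path, "wb") as f:
            results[ext] = isinstance(f, klass)
    return results


with tempfile.TemporaryDirectory() as tmp:
    print(check_classes(tmp))
```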
@@ -417,13 +586,6 @@ def test_write_pathlib_binary(ext, tmpdir):
assert f.read() == b'hello'
-# lzma doesn’t work on PyPy3 at the moment
-if lzma is not None:
- def test_detect_xz_file_format_from_content():
- with xopen("tests/file.txt.xz.test", "rb") as fh:
- assert fh.readline() == CONTENT_LINES[0].encode("utf-8")
-
-
def test_concatenated_gzip_function():
assert _can_read_concatenated_gz("gzip") is True
assert _can_read_concatenated_gz("pigz") is True
@@ -446,12 +608,114 @@ def test_xopen_falls_back_to_gzip_open(lacking_pigz_permissions):
assert f.readline() == CONTENT_LINES[0].encode("utf-8")
-def test_open_many_gzip_writers(tmp_path):
+def test_xopen_falls_back_to_gzip_open_no_isal(lacking_pigz_permissions,
+ xopen_without_igzip):
+ with xopen_without_igzip("tests/file.txt.gz", "rb") as f:
+ assert f.readline() == CONTENT_LINES[0].encode("utf-8")
+
+
+def test_xopen_fals_back_to_gzip_open_write_no_isal(lacking_pigz_permissions,
+ xopen_without_igzip,
+ tmp_path):
+ tmp = tmp_path / "test.gz"
+ with xopen_without_igzip(tmp, "wb") as f:
+ f.write(b"hello")
+ assert gzip.decompress(tmp.read_bytes()) == b"hello"
+
+
+def test_xopen_falls_back_to_bzip2_open(lacking_pbzip2_permissions):
+ with xopen("tests/file.txt.bz2", "rb") as f:
+ assert f.readline() == CONTENT_LINES[0].encode("utf-8")
+
+
+def test_open_many_writers(tmp_path, ext):
files = []
for i in range(1, 61):
- path = tmp_path / "{:03d}.txt.gz".format(i)
+ path = tmp_path / f"{i:03d}.txt.{ext}"
f = xopen(path, "wb", threads=2)
f.write(b"hello")
files.append(f)
for f in files:
f.close()
+
+
+def test_pipedcompressionwriter_wrong_mode(tmpdir):
+ with pytest.raises(ValueError) as error:
+ PipedCompressionWriter(tmpdir.join("test"), ["gzip"], "xb")
+ error.match("Mode is 'xb', but it must be")
+
+
+def test_pipedcompressionwriter_wrong_program(tmpdir):
+ with pytest.raises(OSError):
+ PipedCompressionWriter(tmpdir.join("test"), ["XVXCLSKDLA"], "wb")
+
+
+def test_compression_level(tmpdir, gzip_writer):
+ # Currently only the gzip writers handle compression levels.
+ with gzip_writer(tmpdir.join("test.gz"), "wt", 2) as test_h:
+ test_h.write("test")
+ assert gzip.decompress(Path(tmpdir.join("test.gz")).read_bytes()) == b"test"
+
+
+def test_iter_method_writers(writer, tmpdir):
+ opener, extension = writer
+ test_path = tmpdir.join(f"test{extension}")
+ writer = opener(test_path, "wb")
+ assert iter(writer) == writer
+
+
+def test_next_method_writers(writer, tmpdir):
+ opener, extension = writer
+ test_path = tmpdir.join(f"test.{extension}")
+ writer = opener(test_path, "wb")
+ with pytest.raises(io.UnsupportedOperation) as error:
+ next(writer)
+ error.match('not readable')
+
+
+def test_pipedcompressionreader_wrong_mode():
+ with pytest.raises(ValueError) as error:
+ PipedCompressionReader("test", ["gzip"], "xb")
+ error.match("Mode is 'xb', but it must be")
+
+
+def test_piped_compression_reader_peek_binary(reader):
+ opener, extension = reader
+ filegz = Path(__file__).parent / f"file.txt{extension}"
+ with opener(filegz, "rb") as read_h:
+ # Peek returns at least the amount of characters but maybe more
+ # depending on underlying stream. Hence startswith not ==.
+ assert read_h.peek(1).startswith(b"T")
+
+
+@pytest.mark.parametrize("mode", ["r", "rt"])
+def test_piped_compression_reader_peek_text(reader, mode):
+ opener, extension = reader
+ compressed_file = Path(__file__).parent / f"file.txt{extension}"
+ with opener(compressed_file, mode) as read_h:
+ with pytest.raises(AttributeError):
+ read_h.peek(1)
+
+
+def writers_and_levels():
+ for writer in PIPED_GZIP_WRITERS:
+ if writer == PipedGzipWriter:
+ # Levels 1-9 are supported
+ yield from ((writer, i) for i in range(1, 10))
+ elif writer == PipedPigzWriter:
+ # Levels 0-9 + 11 are supported
+ yield from ((writer, i) for i in list(range(10)) + [11])
+ elif writer == PipedIGzipWriter or writer == PipedPythonIsalWriter:
+ # Levels 0-3 are supported
+ yield from ((writer, i) for i in range(4))
+ else:
+ raise NotImplementedError(f"Test should be implemented for "
+ f"{writer}") # pragma: no cover
+
+
+@pytest.mark.parametrize(["writer", "level"], writers_and_levels())
+def test_valid_compression_levels(writer, level, tmpdir):
+ test_file = tmpdir.join("test.gz")
+ with writer(test_file, "wb", level) as handle:
+ handle.write(b"test")
+ assert gzip.decompress(Path(test_file).read_bytes()) == b"test"
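Note on test_piped_compression_reader_peek_binary above: it compares with startswith because peek may hand back more bytes than were asked for. The sketch below shows the same behaviour on a plain io.BufferedReader, assuming an in-memory stream; the sample bytes are made up for illustration and the piped readers in xopen may buffer differently.

```python
import io

# Hypothetical sample content, wrapped in a buffered reader.
raw = io.BytesIO(b"Testing, testing ...\n")
reader = io.BufferedReader(raw)

# peek() does not advance the position and may return more than one byte,
# so compare with startswith rather than ==.
peeked = reader.peek(1)
first_line = reader.readline()

print(peeked.startswith(b"T"), first_line)
```

Because peek leaves the position untouched, the readline call that follows still returns the whole first line.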
=====================================
tox.ini
=====================================
@@ -7,10 +7,10 @@ deps =
coverage
setenv = PYTHONDEVMODE = 1
commands =
- coverage run --concurrency=multiprocessing -m pytest --doctest-modules src/xopen/ tests/
- coverage combine
+ coverage run --branch --source=xopen,tests -m pytest -v --doctest-modules tests
coverage report
coverage xml
+ coverage html
[testenv:isal]
deps =
@@ -19,13 +19,13 @@ deps =
isal
[testenv:flake8]
-basepython = python3.6
+basepython = python3.7
deps = flake8
commands = flake8 src/ tests/
skip_install = true
[testenv:mypy]
-basepython = python3.6
+basepython = python3.7
deps = mypy
commands = mypy src/
skip_install = true
View it on GitLab: https://salsa.debian.org/med-team/python-xopen/-/compare/4e2a2b99447ee32ef6fd86e3a4939f860eecdd8e...fb5e8fb41bc0e629c60be6b535c3642d1eb5865b